<center>COVID-19 DATA ANALYSIS</center>
<b>Objective:</b>
This project compares the COVID-19 situation across countries using data analysis and visualization.

Dataset:<br>
    Source:<br>
    -	This dataset was obtained from Kaggle. It contains information about the spread of COVID-19 as of July 27th, 2020.<br>
    -	The file country_wise_latest.csv is used for this analysis. It contains the following columns:<br>

        •	Country/Region
        •	Confirmed
        •	Deaths
        •	Recovered
        •	Active
        •	New cases
        •	New deaths
        •	New recovered
        •	Deaths / 100 Cases
        •	Recovered / 100 Cases
        •	Deaths / 100 Recovered
        •	Confirmed last week
        •	1 week change
        •	1 week % increase
        •	WHO Region
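Several of these columns appear to be derived from the raw counts. The sketch below recomputes them under that assumption. The formulas are inferred from the first rows of the file (they reproduce the published Afghanistan and Albania values after rounding to two decimals) rather than taken from the dataset's documentation, and the helper name `recompute_derived` is hypothetical.

```python
import pandas as pd


def recompute_derived(df: pd.DataFrame) -> pd.DataFrame:
    """Recompute the ratio and weekly-change columns from the raw counts.

    Assumption: these formulas mirror how the published columns were built;
    they are inferred from sample rows, not from official documentation.
    """
    out = pd.DataFrame(index=df.index)
    out["Active"] = df["Confirmed"] - df["Deaths"] - df["Recovered"]
    out["Deaths / 100 Cases"] = (df["Deaths"] / df["Confirmed"] * 100).round(2)
    out["Recovered / 100 Cases"] = (df["Recovered"] / df["Confirmed"] * 100).round(2)
    # Rows with Recovered == 0 would produce inf here
    out["Deaths / 100 Recovered"] = (df["Deaths"] / df["Recovered"] * 100).round(2)
    out["1 week change"] = df["Confirmed"] - df["Confirmed last week"]
    out["1 week % increase"] = (
        out["1 week change"] / df["Confirmed last week"] * 100
    ).round(2)
    return out


# Two rows copied from the first lines of country_wise_latest.csv
sample = pd.DataFrame({
    "Country/Region": ["Afghanistan", "Albania"],
    "Confirmed": [36263, 4880],
    "Deaths": [1269, 144],
    "Recovered": [25198, 2745],
    "Confirmed last week": [35526, 4171],
})
print(recompute_derived(sample))
```

Comparing the output with the published columns is a quick way to confirm the column meanings before relying on them in the charts.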

Data Cleaning/Preprocessing:
    Data structure:

In [7]:
import pandas as pd 
import numpy as np
import plotly.express as px
import plotly.offline as pyo
pyo.init_notebook_mode(connected=True)
# Load dataset
df = pd.read_csv('country_wise_latest.csv')
# Display the first 5 rows of the dataset
df.head()

Unnamed: 0,Country/Region,Confirmed,Deaths,Recovered,Active,New cases,New deaths,New recovered,Deaths / 100 Cases,Recovered / 100 Cases,Deaths / 100 Recovered,Confirmed last week,1 week change,1 week % increase,WHO Region
0,Afghanistan,36263,1269,25198,9796,106,10,18,3.5,69.49,5.04,35526,737,2.07,Eastern Mediterranean
1,Albania,4880,144,2745,1991,117,6,63,2.95,56.25,5.25,4171,709,17.0,Europe
2,Algeria,27973,1163,18837,7973,616,8,749,4.16,67.34,6.17,23691,4282,18.07,Africa
3,Andorra,907,52,803,52,10,0,0,5.73,88.53,6.48,884,23,2.6,Europe
4,Angola,950,41,242,667,18,1,0,4.32,25.47,16.94,749,201,26.84,Africa


In [8]:
# Check Na/Null values
df.isna().sum()

Country/Region            0
Confirmed                 0
Deaths                    0
Recovered                 0
Active                    0
New cases                 0
New deaths                0
New recovered             0
Deaths / 100 Cases        0
Recovered / 100 Cases     0
Deaths / 100 Recovered    0
Confirmed last week       0
1 week change             0
1 week % increase         0
WHO Region                0
dtype: int64

- This dataset does not contain any NA/null values.
- No imputation or row removal is needed, so the data is ready for analysis.
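A missing-value check does not catch every data problem. As a minimal sketch, assuming the column names listed above, a few further checks could count duplicated country names, negative counts, and rows where `Active` differs from `Confirmed - Deaths - Recovered`. The helper name `sanity_checks` is hypothetical.

```python
import pandas as pd

COUNT_COLUMNS = ["Confirmed", "Deaths", "Recovered", "Active"]


def sanity_checks(df: pd.DataFrame) -> dict:
    """Return the number of rows that fail each simple consistency check."""
    # The same country listed more than once
    duplicated = int(df["Country/Region"].duplicated().sum())
    # Any count column below zero
    negative = int((df[COUNT_COLUMNS] < 0).any(axis=1).sum())
    # Active should equal Confirmed minus Deaths minus Recovered
    mismatch = int(
        (df["Active"] != df["Confirmed"] - df["Deaths"] - df["Recovered"]).sum()
    )
    return {"duplicated": duplicated, "negative": negative, "active_mismatch": mismatch}


# Three rows copied from the head() output; they pass every check
sample = pd.DataFrame({
    "Country/Region": ["Afghanistan", "Albania", "Algeria"],
    "Confirmed": [36263, 4880, 27973],
    "Deaths": [1269, 144, 1163],
    "Recovered": [25198, 2745, 18837],
    "Active": [9796, 1991, 7973],
})
print(sanity_checks(sample))  # {'duplicated': 0, 'negative': 0, 'active_mismatch': 0}
```

Calling `sanity_checks(df)` on the full frame would apply the same checks to every country.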

In [9]:
# Create the initial choropleth map for 'Confirmed' cases
fig = px.choropleth(df,
                    locations="Country/Region",
                    locationmode="country names",
                    color="Confirmed",
                    hover_name="Country/Region",
                    title="World Map of Confirmed Cases",
                    color_continuous_scale=px.colors.sequential.Plasma)

# Update the layout to include dropdown menu
fig.update_layout(
    title=dict(x=0.5),
    height=500,
    margin={"r":0,"t":30,"l":25,"b":10},
    updatemenus=[{
        "buttons": [
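            {
                # Trace update (z values and hover label), then layout update
                # (map title and colour-bar title) so every label matches the metric
                "args": [{"z": [df["Confirmed"]],
                          "hovertemplate": "<b>%{hovertext}</b><br>Confirmed=%{z}<extra></extra>"},
                         {"title.text": "World Map of Confirmed Cases",
                          "coloraxis.colorbar.title.text": "Confirmed"}],
                "label": "Confirmed",
                "method": "update"
            },
            {
                "args": [{"z": [df["Deaths"]],
                          "hovertemplate": "<b>%{hovertext}</b><br>Deaths=%{z}<extra></extra>"},
                         {"title.text": "World Map of Deaths",
                          "coloraxis.colorbar.title.text": "Deaths"}],
                "label": "Deaths",
                "method": "update"
            },
            {
                "args": [{"z": [df["Recovered"]],
                          "hovertemplate": "<b>%{hovertext}</b><br>Recovered=%{z}<extra></extra>"},
                         {"title.text": "World Map of Recovered Cases",
                          "coloraxis.colorbar.title.text": "Recovered"}],
                "label": "Recovered",
                "method": "update"
            }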
        ],
        "direction": "down",
        "pad": {"r": 10, "t": 10},
        "showactive": True,
        "x": 0.17,
        "xanchor": "left",
        "y": 1.15,
        "yanchor": "top"
    }]
)
pyo.iplot(fig)
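The three dropdown buttons in the cell above differ only in the column they plot. A sketch of generating them from a dictionary of column names and titles, so adding a metric becomes a one-line change, is shown below. Each button uses Plotly's `update` method, whose `args` are a trace-update dict followed by a layout-update dict. The helper name `metric_buttons` is hypothetical, and the `coloraxis.colorbar.title.text` path assumes the colour scale sits on the shared `coloraxis` that `px.choropleth` creates for a continuous `color` column.

```python
def metric_buttons(df, metrics):
    """Build Plotly dropdown buttons that swap the choropleth's z values.

    df: any mapping from column name to a sequence (e.g. a pandas DataFrame)
    metrics: dict mapping a column name to the map title shown for it
    """
    buttons = []
    for column, title in metrics.items():
        buttons.append({
            "label": column,
            "method": "update",
            "args": [
                # Trace update: one z array per trace (the map has a single trace)
                {"z": [df[column]]},
                # Layout update: new map title and matching colour-bar title
                {"title.text": title, "coloraxis.colorbar.title.text": column},
            ],
        })
    return buttons


metrics = {
    "Confirmed": "World Map of Confirmed Cases",
    "Deaths": "World Map of Deaths",
    "Recovered": "World Map of Recovered Cases",
}
# A plain dict stands in for the DataFrame in this demo
demo = {"Confirmed": [36263, 4880], "Deaths": [1269, 144], "Recovered": [25198, 2745]}
buttons = metric_buttons(demo, metrics)
print([b["label"] for b in buttons])  # ['Confirmed', 'Deaths', 'Recovered']
```

In the notebook, the result would be passed as `fig.update_layout(updatemenus=[{"buttons": metric_buttons(df, metrics), "direction": "down", "x": 0.17, "y": 1.15}])`.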

In [10]:
# Plot the top 10 countries with the highest number of cases
sorted_df = df.sort_values(by='Confirmed', ascending=False)
plot1 = px.bar(sorted_df.head(10), 
                x='Country/Region', 
                y='Confirmed', 
                title='Top 10 Countries with Highest Case Numbers', 
                color='Confirmed',  
                hover_data=['Confirmed'])
plot1.update_layout(
    title={
        'x': 0.5,  # Centers the title
        'xanchor': 'center',  # Align the title to the center
        'yanchor': 'top'  # Optional: anchor title to the top
    },
    xaxis_title='Country',
    yaxis_title='Number of Cases',
    height = 600
)
pyo.iplot(plot1)

As of July 27th, 2020:
- The United States has the highest number of confirmed cases, exceeding 4 million. It stands out with a bright yellow bar.
- Brazil follows with approximately 2.5 million confirmed cases, represented by a red bar.
- India is next with nearly 1.5 million confirmed cases, shown in purple.
- Other countries like Russia, South Africa, and Mexico have progressively fewer confirmed cases, represented by bars in darker shades of purple and blue.
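Sorting the whole frame and calling `head(10)` works. pandas also provides `DataFrame.nlargest`, which returns the top rows directly (ties may be ordered differently than with `sort_values`). A minimal sketch of a reusable helper that the three bar-chart cells could share; the name `top_n` is hypothetical.

```python
import pandas as pd


def top_n(df: pd.DataFrame, column: str, n: int = 10) -> pd.DataFrame:
    """Return the country name and `column` for the n largest values, largest first."""
    return df.nlargest(n, column)[["Country/Region", column]]


# Rows copied from the head() output of country_wise_latest.csv
sample = pd.DataFrame({
    "Country/Region": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "Confirmed": [36263, 4880, 27973, 907, 950],
    "Deaths": [1269, 144, 1163, 52, 41],
})
print(top_n(sample, "Confirmed", 3))
```

With the full data, a chart cell could then call `px.bar(top_n(df, 'Deaths'), x='Country/Region', y='Deaths', color='Deaths')`.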

In [11]:
# Plot the top 10 countries with the highest number of deaths
sorted_df = df.sort_values(by='Deaths', ascending=False)
plot = px.bar(sorted_df.head(10),
              x='Country/Region',
              y='Deaths',
              title='Top 10 Countries With Highest Number of Deaths',
              color='Deaths',
              hover_data=['Deaths'])
plot.update_layout(
    title={
        'x': 0.5,  # Centers the title
        'xanchor': 'center',  # Align the title to the center
        'yanchor': 'top'  # Optional: anchor title to the top
    },
    xaxis_title='Country',
    yaxis_title='Number of Deaths',
    height=600
)
# Show this deaths chart (plot1 is the confirmed-cases chart from the previous cell)
pyo.iplot(plot)

- The United States also has the highest number of deaths, with around 150,000 deaths. This is again represented by a bright yellow bar.
- Brazil follows with close to 90,000 deaths, shown by a red bar.
- The United Kingdom and Mexico come next, each with significantly fewer deaths than the US and Brazil, represented by purple bars.
- Other countries like Italy, India, and France show progressively fewer deaths, represented by bars in darker purple and blue shades.

In [12]:
# Top 10 Countries With The Highest Number of Recovered
sorted_df = df.sort_values(by='Recovered', ascending=False)
plot = px.bar(sorted_df.head(10),
              x='Country/Region',
              y='Recovered',
              title='Top 10 Countries With Highest Number of Recovered',
              color='Recovered',
              hover_data=['Recovered'])
plot.update_layout(
    title={
        'x': 0.5,  # Centers the title
    },
    xaxis_title='Country',
    yaxis_title='Number of Recovered',
    height = 600
)
pyo.iplot(plot)

- Brazil leads with the highest number of recovered cases, exceeding 1.5 million. This is represented by the yellow bar.
- The United States follows with over 1 million recoveries, shown in an orange bar.
- India is in third place with a significant number of recoveries, around 1 million, represented by a pink bar.
- Russia is fourth, shown by a purple bar, with fewer recoveries than Brazil, the US, and India.
- Chile, Mexico, South Africa, Peru, Iran, and Pakistan have progressively fewer recoveries, represented by darker purple and blue bars.

- The countries with the highest number of confirmed cases (the US, Brazil, and India) also have the highest number of recoveries, which is consistent with their large populations and high infection rates.
- Brazil's lead in recoveries mainly reflects its large number of cases. The counts alone do not show how well each country managed the disease, since recoveries are recorded differently from country to country.
- The US and India are also recovering a large number of patients, reflecting the scale of their respective health responses and the extensive spread of the virus in these countries.
- Russia and other countries such as Chile and Mexico show significant but lower recovery counts, likely reflecting their smaller populations and/or different stages of the pandemic.
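Recovery counts and recovery rates can rank countries quite differently. A minimal sketch, assuming the dataset's `Recovered / 100 Cases` definition (recovered divided by confirmed, times 100), that ranks countries by rate instead of by count. The helper name `recovery_rate_ranking` is hypothetical, and the `min_cases` filter is an added assumption so that very small outbreaks do not dominate the ranking.

```python
import pandas as pd


def recovery_rate_ranking(df: pd.DataFrame, min_cases: int = 0) -> pd.DataFrame:
    """Rank countries by recovered cases per 100 confirmed cases, highest first."""
    subset = df[df["Confirmed"] >= min_cases].copy()
    subset["Recovered / 100 Cases"] = (
        subset["Recovered"] / subset["Confirmed"] * 100
    ).round(2)
    ranked = subset.sort_values("Recovered / 100 Cases", ascending=False)
    return ranked[["Country/Region", "Recovered", "Recovered / 100 Cases"]]


# Rows copied from the head() output of country_wise_latest.csv
sample = pd.DataFrame({
    "Country/Region": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "Confirmed": [36263, 4880, 27973, 907, 950],
    "Recovered": [25198, 2745, 18837, 803, 242],
})
print(recovery_rate_ranking(sample))
```

In this sample, Andorra has the fewest recoveries but the highest rate, which is why counts and rates should not be used interchangeably.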