# Covid-19 Data Analysis using Pandas

Author: Mohamed Oussama NAJI

Date: Jan 22, 2024

## Table of Contents
1. [Introduction](#introduction)
2. [Data Source](#data-source)
3. [Data Loading](#data-loading)
4. [Data Exploration](#data-exploration)
    - [Top 5 Rows](#top-5-rows)
    - [Dataset Information](#dataset-information)
    - [Missing Values](#missing-values)
5. [Data Analysis](#data-analysis)
    - [Confirmed Cases by Country](#confirmed-cases-by-country)
    - [Deaths by Country](#deaths-by-country)
    - [Recovered Cases by Country](#recovered-cases-by-country)
    - [Active Cases by Country](#active-cases-by-country)
    - [Latest Numbers by Country](#latest-numbers-by-country)
    - [Countries with No Recovered Cases](#countries-with-no-recovered-cases)
    - [Countries with No Confirmed Cases](#countries-with-no-confirmed-cases)
    - [Countries with No Deaths](#countries-with-no-deaths)
    - [Top 10 Countries with Confirmed Cases](#top-10-countries-with-confirmed-cases)
    - [Top 10 Countries with Active Cases](#top-10-countries-with-active-cases)
6. [Data Visualization](#data-visualization)
    - [Country-wise Total Cases](#country-wise-total-cases)
    - [USA: State-wise Deaths](#usa-state-wise-deaths)
    - [USA: State-wise Active Cases](#usa-state-wise-active-cases)
    - [USA: State-wise Confirmed Cases](#usa-state-wise-confirmed-cases)
    - [Worldwide Confirmed Cases Over Time](#worldwide-confirmed-cases-over-time)
7. [Results](#results)
8. [Conclusion](#conclusion)

## Introduction <a id="introduction"></a>

In this notebook, we analyze Covid-19 data with the Pandas library. The data is a single daily report (November 1, 2021) from the Johns Hopkins University CSSE repository, with cumulative confirmed cases, deaths, recovered cases, and active cases for countries and their provinces, states, or counties.


## Data Source <a id="data-source"></a>

The data comes from the daily reports of the Johns Hopkins University Center for Systems Science and Engineering (CSSE) COVID-19 repository:
https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_daily_reports

Each file is named `MM-DD-YYYY.csv`, with the date in UTC, and holds the cumulative counts reported up to that date.
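Because file names follow this pattern, the URL of any daily report can be built from its date. A minimal sketch, assuming the raw-file prefix used in the Data Loading cell; the helper name `daily_report_url` is hypothetical:

```python
from datetime import date

# Raw-file prefix taken from the URL used in the Data Loading cell
BASE_URL = ('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/'
            'csse_covid_19_data/csse_covid_19_daily_reports/')


def daily_report_url(report_date: date) -> str:
    """Return the raw CSV URL of the daily report for report_date (MM-DD-YYYY.csv)."""
    return BASE_URL + report_date.strftime('%m-%d-%Y') + '.csv'


print(daily_report_url(date(2021, 11, 1)))
```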


## Data Loading <a id="data-loading"></a>

In [None]:
import pandas as pd

url = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/11-01-2021.csv'
covidata = pd.read_csv(url)
print(covidata)
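By default `read_csv` keeps `Last_Update` as text. A sketch of parsing it while loading, via the `parse_dates` argument; so that it runs offline, it reads a made-up two-row CSV (hypothetical values, not taken from the report) instead of the URL:

```python
import io

import pandas as pd

# Hypothetical two-row sample mimicking a few columns of a daily report
sample_csv = io.StringIO(
    'Province_State,Country_Region,Last_Update,Confirmed,Deaths\n'
    'Alpha,Examplestan,2021-11-02 04:21:33,100,2\n'
    'Beta,Examplestan,2021-11-02 04:21:33,50,1\n'
)

# parse_dates converts Last_Update to a datetime64 column while reading
sample = pd.read_csv(sample_csv, parse_dates=['Last_Update'])
print(sample.dtypes)
```

The same argument should apply to the real file: `pd.read_csv(url, parse_dates=['Last_Update'])`.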

## Data Exploration <a id="data-exploration"></a>

### Top 5 Rows <a id="top-5-rows"></a>

In [None]:
covidata.head()

### Dataset Information <a id="dataset-information"></a>

In [None]:
covidata.info()

### Missing Values <a id="missing-values"></a>

In [None]:
covidata.isnull().sum()
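Raw counts of missing values are easier to compare as percentages of rows. The sketch below, on a hypothetical toy frame, also shows a side effect that matters for the country totals later on: `GroupBy.sum()` skips NaN, so a country whose values are all missing sums to 0 unless `min_count=1` is passed.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for covidata (not real report values)
toy = pd.DataFrame({
    'Country_Region': ['A', 'A', 'B', 'C'],
    'Confirmed': [10, 5, 7, 0],
    'Recovered': [np.nan, np.nan, 3, np.nan],
})

# Percentage of missing values per column, largest first
missing_pct = toy.isnull().mean().mul(100).sort_values(ascending=False)
print(missing_pct)

# Default sum: countries A and C (only NaN) both come out as 0.0
recovered_default = toy.groupby('Country_Region')['Recovered'].sum()
# min_count=1: a group needs at least one non-NaN value, otherwise the result stays NaN
recovered_strict = toy.groupby('Country_Region')['Recovered'].sum(min_count=1)
print(recovered_default, recovered_strict, sep='\n\n')
```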

## Data Analysis <a id="data-analysis"></a>

### Confirmed Cases by Country <a id="confirmed-cases-by-country"></a>

In [None]:
covidata.groupby('Country_Region')['Confirmed'].sum()

### Deaths by Country <a id="deaths-by-country"></a>

In [None]:
covidata.groupby('Country_Region')['Deaths'].sum()

### Recovered Cases by Country <a id="recovered-cases-by-country"></a>

In [None]:
covidata.groupby('Country_Region')['Recovered'].sum()

### Active Cases by Country <a id="active-cases-by-country"></a>

In [None]:
covidata.groupby('Country_Region')['Active'].sum()
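A daily report can contain several rows per country (provinces, states, or, for the US, counties), which is why these cells sum rather than pick a single row. One named-aggregation call can produce the totals in a single table and also count how many rows each country contributes. A sketch on hypothetical toy rows:

```python
import pandas as pd

# Hypothetical rows: country A is split into two provinces, country B has a single row
toy = pd.DataFrame({
    'Country_Region': ['A', 'A', 'B'],
    'Province_State': ['North', 'South', None],
    'Confirmed': [100, 40, 70],
    'Deaths': [3, 1, 2],
})

# Named aggregation: output column = (input column, aggregation function)
totals = toy.groupby('Country_Region').agg(
    Confirmed=('Confirmed', 'sum'),
    Deaths=('Deaths', 'sum'),
    Rows=('Confirmed', 'count'),  # number of sub-region rows per country
)
print(totals)
```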

### Latest Numbers by Country <a id="latest-numbers-by-country"></a>

In [None]:
# A daily report is one snapshot: sum all rows per country (.last() keeps only one sub-region row)
covidata.groupby('Country_Region')[['Confirmed', 'Deaths', 'Recovered', 'Active']].sum()
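To see why `GroupBy.last()` does not give a country total, compare it with `GroupBy.sum()` on a toy country split into two rows (made-up numbers):

```python
import pandas as pd

# Hypothetical rows: country A has two province rows, country B has one
toy = pd.DataFrame({
    'Country_Region': ['A', 'A', 'B'],
    'Confirmed': [100, 40, 70],
})

grouped = toy.groupby('Country_Region')['Confirmed']
print(grouped.last())  # A -> 40: only the last province row
print(grouped.sum())   # A -> 140: the country total
```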

### Countries with No Recovered Cases <a id="countries-with-no-recovered-cases"></a>

In [None]:
recovered = covidata.groupby('Country_Region')['Recovered'].sum()  # country totals; all-NaN sums to 0
recovered[recovered == 0].index.tolist()

### Countries with No Confirmed Cases <a id="countries-with-no-confirmed-cases"></a>

In [None]:
confirmed = covidata.groupby('Country_Region')['Confirmed'].sum()  # country totals; all-NaN sums to 0
confirmed[confirmed == 0].index.tolist()

### Countries with No Deaths <a id="countries-with-no-deaths"></a>

In [None]:
deaths = covidata.groupby('Country_Region')['Deaths'].sum()  # country totals; all-NaN sums to 0
deaths[deaths == 0].index.tolist()
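Filtering individual rows and filtering country totals can give different answers when a country is split into several rows: a row-level filter lists a country as soon as one of its provinces reports zero. A toy illustration with made-up numbers:

```python
import pandas as pd

# Hypothetical rows: country A has one province with zero deaths, but its total is not zero
toy = pd.DataFrame({
    'Country_Region': ['A', 'A', 'B'],
    'Deaths': [0, 12, 0],
})

# Row-level filter: lists A because one of its provinces reports 0
row_level = toy.loc[toy['Deaths'] == 0, 'Country_Region'].unique().tolist()

# Country-level filter: sum first, then keep countries whose total is 0
deaths = toy.groupby('Country_Region')['Deaths'].sum()
country_level = deaths[deaths == 0].index.tolist()

print(row_level)      # ['A', 'B']
print(country_level)  # ['B']
```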

### Top 10 Countries with Confirmed Cases <a id="top-10-countries-with-confirmed-cases"></a>

In [None]:
covidata.groupby('Country_Region')['Confirmed'].sum().nlargest(10)

### Top 10 Countries with Active Cases <a id="top-10-countries-with-active-cases"></a>

In [None]:
covidata.groupby('Country_Region')['Active'].sum().nlargest(10)
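`Series.nlargest` returns only the ranked column. Calling `DataFrame.nlargest` on a table of country totals keeps the other columns next to the ranking, and each country's share of the overall total can be added with `assign`. A sketch with hypothetical totals:

```python
import pandas as pd

# Hypothetical country totals (made-up numbers)
totals = pd.DataFrame(
    {'Confirmed': [500, 300, 800, 100], 'Deaths': [10, 9, 20, 1]},
    index=pd.Index(['A', 'B', 'C', 'D'], name='Country_Region'),
)

world_confirmed = totals['Confirmed'].sum()
# Rank by Confirmed while keeping Deaths, then add each country's share of the total in percent
top = totals.nlargest(3, 'Confirmed').assign(
    Share_pct=lambda d: (d['Confirmed'] / world_confirmed * 100).round(1)
)
print(top)
```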

## Data Visualization <a id="data-visualization"></a>

### Country-wise Total Cases <a id="country-wise-total-cases"></a>

In [None]:
import matplotlib.pyplot as plt

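# Sum to country totals first, then keep countries with more than 50,000 total deaths
country_totals = covidata.groupby('Country_Region')[['Confirmed', 'Deaths', 'Recovered', 'Active']].sum()
fiftyk = country_totals[country_totals['Deaths'] > 50000]
fiftyk.plot(kind='bar', figsize=(12, 6), ylabel='Cases')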
plt.show()

### USA: State-wise Deaths <a id="usa-state-wise-deaths"></a>

In [None]:
import plotly.express as px

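# US rows are county-level (Admin2), so sum them into one row per state before plotting
usa_data = (covidata[covidata['Country_Region'] == 'US']
            .groupby('Province_State', as_index=False)[['Confirmed', 'Deaths', 'Active']].sum())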

fig = px.bar(usa_data, x='Province_State', y='Deaths', title='State-wise Deaths in USA')
fig.show()

### USA: State-wise Active Cases <a id="usa-state-wise-active-cases"></a>

In [None]:
fig = px.bar(usa_data, x='Province_State', y='Active', title='State-wise Active Cases in USA')
fig.show()

### USA: State-wise Confirmed Cases <a id="usa-state-wise-confirmed-cases"></a>

In [None]:
fig = px.bar(usa_data, x='Province_State', y='Confirmed', title='State-wise Confirmed Cases in USA')
fig.show()
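With every state and territory on one axis, these bar charts are hard to read. Sorting the state totals and keeping the largest ones before plotting is one option; a sketch on hypothetical county rows, where the cutoff of 2 states stands in for whatever number suits the chart:

```python
import pandas as pd

# Hypothetical county-level rows for three states (made-up numbers)
counties = pd.DataFrame({
    'Province_State': ['S1', 'S1', 'S2', 'S3', 'S3'],
    'Deaths': [5, 7, 20, 1, 2],
})

# One row per state, largest first, keeping only the top 2
state_deaths = (counties.groupby('Province_State', as_index=False)['Deaths'].sum()
                .sort_values('Deaths', ascending=False)
                .head(2))
print(state_deaths)
```

Passing such a frame to `px.bar(..., x='Province_State', y='Deaths')` should draw the bars in the sorted order.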

### Worldwide Confirmed Cases Over Time <a id="worldwide-confirmed-cases-over-time"></a>

In [None]:
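# Note: one daily report is a single snapshot, so Last_Update values cluster around one report time
# and each country's line has very few points; several daily reports are needed for a real time series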
confirmed_cases = covidata.groupby(['Country_Region', 'Last_Update'])['Confirmed'].sum().reset_index()
confirmed_cases['Last_Update'] = pd.to_datetime(confirmed_cases['Last_Update'])

fig = px.line(confirmed_cases, x='Last_Update', y='Confirmed', color='Country_Region')
fig.show()
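A single daily report cannot show change over time. One way to build a time series is to combine several reports, tagging each with its date. A sketch under these assumptions: the reports are already loaded into a dict keyed by date, only `Country_Region` and `Confirmed` are used, and the function name `country_time_series` is hypothetical. The toy frames stand in for downloaded CSVs:

```python
import pandas as pd


def country_time_series(reports: dict) -> pd.DataFrame:
    """Combine daily reports (report date -> DataFrame) into per-country Confirmed counts by date.

    Each report holds cumulative totals, so summing Confirmed per country within
    one report gives that country's count on the report date.
    """
    frames = [df.assign(Report_Date=pd.Timestamp(day)) for day, df in reports.items()]
    combined = pd.concat(frames, ignore_index=True)
    return (combined.groupby(['Report_Date', 'Country_Region'], as_index=False)['Confirmed']
            .sum()
            .sort_values(['Country_Region', 'Report_Date'], ignore_index=True))


# Hypothetical reports for two dates (made-up numbers)
reports = {
    '2021-10-31': pd.DataFrame({'Country_Region': ['A', 'A', 'B'], 'Confirmed': [10, 5, 7]}),
    '2021-11-01': pd.DataFrame({'Country_Region': ['A', 'A', 'B'], 'Confirmed': [12, 6, 9]}),
}
series = country_time_series(reports)
print(series)
```

With real data, the dict could be filled with `pd.read_csv` calls on URLs built from the `MM-DD-YYYY.csv` naming convention, for example over `pd.date_range('2021-10-01', '2021-11-01')`, and the result plotted with `px.line(series, x='Report_Date', y='Confirmed', color='Country_Region')`. Reports from early 2020 use different column names (such as `Country/Region`), so a longer date range would need the columns renamed first.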

## Results <a id="results"></a>

Analysis and visualization of the November 1, 2021 daily report yielded the following key results:

1. The United States had the highest number of confirmed cases, deaths, and active cases among all countries.
2. Countries like Micronesia, Samoa, and Vanuatu had no reported Covid-19 cases.
3. The top 10 countries with the highest number of confirmed cases were the United States, India, Brazil, the United Kingdom, Russia, Turkey, France, Iran, Argentina, and Spain.
4. Within the United States, California, Texas, and Florida had the highest number of deaths, active cases, and confirmed cases.
5. In this report's cumulative totals, the United States, India, and Brazil had the highest numbers of confirmed cases; a single snapshot cannot show how these counts changed over time.

These results provide insights into the global impact of the Covid-19 pandemic and highlight the countries and regions most affected by the virus.
