# Covid-19 Data Analysis

### This notebook analyzes publicly available COVID-19 death counts compiled by the CDC and published on healthdata.gov. The data can be downloaded from https://healthdata.gov/dataset/provisional-covid-19-death-counts-sex-age-and-state



#### Import the necessary Python libraries

In [17]:
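import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import cufflinks as cf

# Seaborn/matplotlib styling for any static plots
sns.set(style="ticks")
# Use cufflinks in offline mode (no plotly account needed)
cf.go_offline()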

#### Load the data into a pandas DataFrame

In [4]:
df = pd.read_csv('./Provisional_COVID-19_Death.csv')

#### Get the shape of the DataFrame (rows, columns)

In [6]:
df.shape

(1416, 13)

#### Get a concise summary of the DataFrame (column dtypes and non-null counts)

In [7]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1416 entries, 0 to 1415
Data columns (total 13 columns):
Data as of                                  1416 non-null object
Start week                                  1416 non-null object
End Week                                    1416 non-null object
State                                       1416 non-null object
Sex                                         1416 non-null object
Age group                                   1416 non-null object
COVID-19 Deaths                             1141 non-null float64
Total Deaths                                1285 non-null float64
Pneumonia Deaths                            1089 non-null float64
Pneumonia and COVID-19 Deaths               1119 non-null float64
Influenza Deaths                            879 non-null float64
Pneumonia, Influenza, or COVID-19 Deaths    1063 non-null float64
Footnote                                    876 non-null object
dtypes: float64(6), object(7)
memory usage: 
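The non-null counts above show that every numeric column has missing values; `COVID-19 Deaths`, for example, has 1416 - 1141 = 275 missing entries, about 19% of the rows. A minimal sketch for tabulating this per column, assuming `df` is the DataFrame loaded above; the helper name `missing_summary` is hypothetical, and the demo uses only the two rows printed by `df.head(2)`.

```python
import numpy as np
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, most-missing first."""
    missing = frame.isna().sum()
    summary = pd.DataFrame({
        "missing": missing,
        "percent": (missing / len(frame) * 100).round(1),
    })
    return summary.sort_values("missing", ascending=False)


# Demo on (a subset of columns from) the two rows shown by df.head(2).
# In the notebook, call missing_summary(df) instead.
sample = pd.DataFrame({
    "State": ["United States", "United States"],
    "Sex": ["All", "All"],
    "Age group": ["Under 1 year", "1-4 years"],
    "COVID-19 Deaths": [9.0, 7.0],
    "Footnote": [np.nan, np.nan],
})
print(missing_summary(sample))
```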

#### Summary statistics for the numeric columns

In [8]:
df.describe()

| | COVID-19 Deaths | Total Deaths | Pneumonia Deaths | Pneumonia and COVID-19 Deaths | Influenza Deaths | Pneumonia, Influenza, or COVID-19 Deaths |
|---|---|---|---|---|---|---|
| count | 1141.0 | 1285.0 | 1089.0 | 1119.0 | 879.0 | 1063.0 |
| mean | 803.774759 | 8251.07 | 923.983471 | 353.882931 | 57.146758 | 1478.427093 |
| std | 5813.512354 | 62273.65 | 6469.131249 | 2535.556459 | 365.643951 | 10234.593491 |
| min | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 0.0 | 72.0 | 0.0 | 0.0 | 0.0 | 12.0 |
| 50% | 20.0 | 475.0 | 54.0 | 1.0 | 0.0 | 82.0 |
| 75% | 173.0 | 2651.0 | 286.0 | 76.0 | 21.0 | 419.5 |
| max | 114741.0 | 1324958.0 | 125868.0 | 49623.0 | 6492.0 | 196538.0 |
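`Total Deaths` is heavily right-skewed: the median is 475 and the 75th percentile is 2,651, but the mean is about 8,251 and the maximum is 1,324,958. Listing the largest rows shows which records drive these extremes; given that the first rows of the file are national totals (State `United States`, Sex `All`), they are likely aggregate rows rather than individual breakdowns. A minimal sketch; the helper name `largest_rows` is hypothetical and the demo uses the two rows printed by `df.head(2)`.

```python
import pandas as pd


def largest_rows(frame: pd.DataFrame, column: str, n: int = 5) -> pd.DataFrame:
    """Return the n rows with the largest values in `column`, plus identifying columns."""
    id_cols = ["State", "Sex", "Age group"]
    # nlargest skips NaN values, so missing counts never appear in the result
    return frame.nlargest(n, column)[id_cols + [column]]


# Demo on the two rows shown by df.head(2).
# In the notebook, call largest_rows(df, "Total Deaths").
sample = pd.DataFrame({
    "State": ["United States", "United States"],
    "Sex": ["All", "All"],
    "Age group": ["Under 1 year", "1-4 years"],
    "Total Deaths": [7174.0, 1361.0],
})
print(largest_rows(sample, "Total Deaths", n=1))
```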


#### Preview the first two rows

In [9]:
df.head(2)

| | Data as of | Start week | End Week | State | Sex | Age group | COVID-19 Deaths | Total Deaths | Pneumonia Deaths | Pneumonia and COVID-19 Deaths | Influenza Deaths | Pneumonia, Influenza, or COVID-19 Deaths | Footnote |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 07/08/2020 | 02/01/2020 | 07/04/2020 | United States | All | Under 1 year | 9.0 | 7174.0 | 67.0 | 2.0 | 14.0 | 88.0 | |
| 1 | 07/08/2020 | 02/01/2020 | 07/04/2020 | United States | All | 1-4 years | 7.0 | 1361.0 | 48.0 | 2.0 | 40.0 | 93.0 | |
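Both rows above are national totals (State `United States`, Sex `All`) rather than state-level breakdowns. Mixing such aggregate rows with the per-state, per-sex rows inflates summary statistics and stacked bar charts. A minimal sketch of a filter, assuming these two labels are the only aggregate markers in `State` and `Sex`; the age-group column may also contain overlapping or all-ages groups, so check `df['Age group'].unique()` before summing across ages. The helper name `exclude_aggregates` is hypothetical.

```python
import pandas as pd


def exclude_aggregates(frame: pd.DataFrame,
                       national_label: str = "United States",
                       all_sex_label: str = "All") -> pd.DataFrame:
    """Drop national-total rows and rows that combine both sexes.

    Assumption: aggregates are marked only by these labels in the
    'State' and 'Sex' columns.
    """
    is_aggregate = (frame["State"] == national_label) | (frame["Sex"] == all_sex_label)
    return frame.loc[~is_aggregate]


# Demo on the two rows shown by df.head(2): both are aggregates.
sample = pd.DataFrame({
    "State": ["United States", "United States"],
    "Sex": ["All", "All"],
    "Age group": ["Under 1 year", "1-4 years"],
    "COVID-19 Deaths": [9.0, 7.0],
})
print(len(exclude_aggregates(sample)))  # → 0
```

In the notebook, `exclude_aggregates(df)` can be passed to the plotting calls in place of `df`.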


## Brief overview of the data so far
- The dataset has 1416 rows and 13 columns.
- Date columns: `Data as of`, `Start week`, and `End Week`.
- Breakdowns: `State`, `Sex`, and `Age group`.
- Death counts:
  - COVID-19
  - Pneumonia
  - Influenza
  - Pneumonia and COVID-19
  - Pneumonia, influenza, or COVID-19
  - Total deaths (all causes)
- Data cleaning needed:
  - Every numeric column has missing values (for example, `COVID-19 Deaths` has only 1141 non-null values out of 1416).
  - The `Footnote` column is not needed for this analysis and will be dropped.
  - `Start week`, `End Week`, and `Data as of` are stored as `object` (string) columns and need to be converted to datetime.

In [12]:
df = df.drop(['Footnote'], axis=1)
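The overview notes that `Data as of`, `Start week`, and `End Week` are still strings. A minimal sketch of the conversion, assuming every value uses the `MM/DD/YYYY` layout seen in `df.head(2)`; `errors="coerce"` turns any value in another layout into `NaT` instead of raising. The helper name `convert_dates` is hypothetical.

```python
import pandas as pd

DATE_COLUMNS = ("Data as of", "Start week", "End Week")


def convert_dates(frame: pd.DataFrame, columns=DATE_COLUMNS) -> pd.DataFrame:
    """Return a copy with the given columns parsed as datetime64 values."""
    out = frame.copy()
    for col in columns:
        # Explicit format: 07/04/2020 is read as July 4, not April 7.
        out[col] = pd.to_datetime(out[col], format="%m/%d/%Y", errors="coerce")
    return out


# Demo on the date values shown by df.head(2).
sample = pd.DataFrame({
    "Data as of": ["07/08/2020", "07/08/2020"],
    "Start week": ["02/01/2020", "02/01/2020"],
    "End Week": ["07/04/2020", "07/04/2020"],
})
converted = convert_dates(sample)
print(converted.dtypes)
print(converted["End Week"].iloc[0])
```

In the notebook this would be applied as `df = convert_dates(df)`.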

## Viz-1
### A bar chart of COVID-19 deaths by state, broken down by sex


In [56]:
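# Stacked bars: one bar per state, segments colored by sex.
# 'Total Deaths' counts deaths from all causes, so plot 'COVID-19 Deaths' instead.
fig = px.bar(df, x='State', y='COVID-19 Deaths', color='Sex', title='COVID-19 deaths by state, grouped by sex')
fig.show()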
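`px.bar` stacks every row that shares an x value, so with the raw DataFrame each state's bar also stacks the `All`-sex rows on top of the per-sex rows and sums across every age-group row, and the `United States` bar dwarfs the individual states. A minimal sketch that builds one value per state and sex first. It assumes `Sex == 'All'` and `State == 'United States'` mark the aggregate rows, and that each state has a single all-ages total row whose label you pass in; the default `'All Ages'` is an assumption, so confirm it with `df['Age group'].unique()`. The helper name `deaths_by_state_sex` is hypothetical.

```python
import pandas as pd


def deaths_by_state_sex(frame: pd.DataFrame,
                        value: str = "COVID-19 Deaths",
                        all_ages_label: str = "All Ages") -> pd.DataFrame:
    """One row per (State, Sex), taken from the all-ages total rows.

    Assumptions: 'United States' and Sex 'All' mark aggregate rows, and
    `all_ages_label` is the 'Age group' label of each state's total.
    """
    mask = (
        (frame["State"] != "United States")
        & (frame["Sex"] != "All")
        & (frame["Age group"] == all_ages_label)
    )
    return (
        frame.loc[mask]
        .groupby(["State", "Sex"], as_index=False)[value]
        .sum(min_count=1)  # keep NaN when every value in a group is missing
    )
```

In the notebook, `px.bar(deaths_by_state_sex(df), x='State', y='COVID-19 Deaths', color='Sex')` would then draw one segment per sex for each state.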

## Viz-2
### A bar chart of COVID-19 deaths, broken down by state and age group

In [60]:
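# color='Age group' splits each bar by age group; category_orders labels must match the data exactly
fig = px.bar(df, x='State', y='COVID-19 Deaths', color='Age group',
             category_orders={'Age group': ['65-74 years', '75-84 years']})
fig.show()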
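`category_orders` only affects a dimension that is actually mapped in the chart (for example via `color='Age group'`), and its values must match the data labels exactly; the labels in `df.head(2)` use lowercase "years" (`1-4 years`). If the intent is to compare only a few age groups, filtering first makes that explicit. A minimal sketch; the helper name `select_age_groups` is hypothetical, and labels such as `'65-74 years'` are assumed from the lowercase pattern in `df.head(2)`.

```python
import pandas as pd


def select_age_groups(frame: pd.DataFrame, groups: list) -> pd.DataFrame:
    """Keep only rows whose 'Age group' is in `groups`, sorted in that order."""
    subset = frame[frame["Age group"].isin(groups)].copy()
    # An ordered categorical makes sort_values follow the order given in `groups`.
    subset["Age group"] = pd.Categorical(subset["Age group"], categories=groups, ordered=True)
    return subset.sort_values(["State", "Age group"])


# Demo on the two rows shown by df.head(2).
sample = pd.DataFrame({
    "State": ["United States", "United States"],
    "Age group": ["Under 1 year", "1-4 years"],
    "COVID-19 Deaths": [9.0, 7.0],
})
print(select_age_groups(sample, ["1-4 years", "Under 1 year"]))
```

In the notebook, the filtered frame can be plotted with `px.bar(select_age_groups(df, ['65-74 years', '75-84 years']), x='State', y='COVID-19 Deaths', color='Age group', category_orders={'Age group': ['65-74 years', '75-84 years']})`.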