# Johns Hopkins Cumulative Deaths Due to Covid-19

### Importing, Cleaning and Selecting Features 

We are looking for the cumulative reported deaths due to Covid-19 by FIPS code as of 10/31/20. The Johns Hopkins Covid-19 GitHub repository can be found at https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series

In [1]:
import pandas as pd
import numpy as np

In [2]:
df_JHU=pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_US.csv')
df_JHU.head()

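| | UID | iso2 | iso3 | code3 | FIPS | Admin2 | Province_State | Country_Region | Lat | Long_ | ... | 10/24/20 | 10/25/20 | 10/26/20 | 10/27/20 | 10/28/20 | 10/29/20 | 10/30/20 | 10/31/20 | 11/1/20 | 11/2/20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 84001001 | US | USA | 840 | 1001.0 | Autauga | Alabama | US | 32.539527 | -86.644082 | ... | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 |
| 1 | 84001003 | US | USA | 840 | 1003.0 | Baldwin | Alabama | US | 30.72775 | -87.722071 | ... | 69 | 69 | 69 | 69 | 69 | 69 | 71 | 71 | 71 | 71 |
| 2 | 84001005 | US | USA | 840 | 1005.0 | Barbour | Alabama | US | 31.868263 | -85.387129 | ... | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 |
| 3 | 84001007 | US | USA | 840 | 1007.0 | Bibb | Alabama | US | 32.996421 | -87.125115 | ... | 14 | 14 | 14 | 15 | 15 | 15 | 15 | 15 | 15 | 15 |
| 4 | 84001009 | US | USA | 840 | 1009.0 | Blount | Alabama | US | 33.982109 | -86.567906 | ... | 25 | 25 | 25 | 25 | 25 | 25 | 25 | 25 | 25 | 25 |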


In [3]:
df_JHU=df_JHU[['FIPS','Admin2','Province_State','10/31/20']]
df_JHU.head()

| | FIPS | Admin2 | Province_State | 10/31/20 |
|---|---|---|---|---|
| 0 | 1001.0 | Autauga | Alabama | 31 |
| 1 | 1003.0 | Baldwin | Alabama | 71 |
| 2 | 1005.0 | Barbour | Alabama | 9 |
| 3 | 1007.0 | Bibb | Alabama | 15 |
| 4 | 1009.0 | Blount | Alabama | 25 |
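
The date columns in this file are labeled without zero padding (the full header in the first output shows `11/1/20`, not `11/01/20`), so a label built with `strftime('%m/%d/%y')` would not match for single-digit months or days. A minimal sketch of a helper that builds the label from a `datetime.date`; the name `jhu_date_column` is hypothetical and not part of the original notebook:

```python
from datetime import date

def jhu_date_column(d: date) -> str:
    """Return the column label for a date in M/D/YY form (no zero padding)."""
    # d.month and d.day are plain ints, so no leading zeros are added;
    # %y gives the two-digit year.
    return f"{d.month}/{d.day}/{d:%y}"

print(jhu_date_column(date(2020, 10, 31)))  # 10/31/20
print(jhu_date_column(date(2020, 11, 1)))   # 11/1/20
```

With such a helper, the selection in cell 3 could be written as `df_JHU[['FIPS', 'Admin2', 'Province_State', jhu_date_column(date(2020, 10, 31))]]`.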


In [4]:
# Check for rows with a missing FIPS code
df_JHU[df_JHU['FIPS'].isnull()]

| | FIPS | Admin2 | Province_State | 10/31/20 |
|---|---|---|---|---|
| 1267 | NaN | Dukes and Nantucket | Massachusetts | 2 |
| 1304 | NaN | Federal Correctional Institution (FCI) | Michigan | 5 |
| 1336 | NaN | Michigan Department of Corrections (MDOC) | Michigan | 75 |
| 1591 | NaN | Kansas City | Missouri | 213 |
| 2954 | NaN | Bear River | Utah | 18 |
| 2959 | NaN | Central Utah | Utah | 10 |
| 2978 | NaN | Southeast Utah | Utah | 5 |
| 2979 | NaN | Southwest Utah | Utah | 57 |
| 2982 | NaN | TriCounty | Utah | 3 |
| 2990 | NaN | Weber-Morgan | Utah | 36 |


In [5]:
df_JHU[df_JHU['FIPS'].isnull()]['10/31/20'].sum()

424

There are some locations without FIPS codes. Their deaths sum to 424. If we drop them, the effect on the totals should be negligible.
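
To put a number on how much is lost by dropping these rows, one option is to compare their deaths with the column total. The sketch below uses a small hand-made frame (three rows copied from the outputs above, not the full dataset), and the helper name `missing_fips_share` is an assumption for illustration:

```python
import numpy as np
import pandas as pd

def missing_fips_share(df, fips_col="FIPS", value_col="10/31/20"):
    """Fraction of value_col that sits in rows where fips_col is null."""
    total = df[value_col].sum()
    if total == 0:
        return 0.0
    missing = df.loc[df[fips_col].isnull(), value_col].sum()
    return missing / total

# Toy data: two counties with a FIPS code and one location without
toy = pd.DataFrame({
    "FIPS": [1001.0, 1003.0, np.nan],
    "Admin2": ["Autauga", "Baldwin", "Dukes and Nantucket"],
    "10/31/20": [31, 71, 2],
})

share = missing_fips_share(toy)
print(f"{share:.1%} of deaths in this toy frame lack a FIPS code")
```

On the real data, the same function would be called on `df_JHU` before the rows without a FIPS code are dropped.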

In [6]:
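# Keep only the FIPS code and the 10/31/20 death count; drop rows without a FIPS code
df_JHU=df_JHU[['FIPS','10/31/20']].dropna().copy()
df_JHU.columns=['FIPS','deaths_10/31/20']
# Convert float FIPS (e.g. 1001.0) to a zero-padded 5-character string (e.g. '01001')
df_JHU['FIPS']=df_JHU['FIPS'].astype(int).astype(str).str.zfill(5)
df_JHU.info()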

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3330 entries, 0 to 3339
Data columns (total 2 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   FIPS             3330 non-null   object
 1   deaths_10/31/20  3330 non-null   int64 
dtypes: int64(1), object(1)
memory usage: 78.0+ KB
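
A note on formatting FIPS codes as text: `str.rstrip('.0')` removes any trailing run of the characters `.` and `0`, not the literal suffix `.0`, so it can shorten codes that end in zero. Once the values are cast to `int`, there is no decimal part left to strip, and `str.zfill(5)` alone is enough. A small sketch with three sample values (chosen for illustration, not drawn from the outputs above):

```python
import pandas as pd

# Float FIPS values as they would be read from a CSV; two of them end in zero
fips = pd.Series([1001.0, 2020.0, 51510.0])
as_text = fips.astype(int).astype(str)

# rstrip('.0') strips every trailing '.' or '0', so it eats real digits
stripped = as_text.str.rstrip('.0').str.zfill(5).tolist()

# Zero-padding the integer string directly keeps all digits
padded = as_text.str.zfill(5).tolist()

print(stripped)  # ['01001', '00202', '05151']
print(padded)    # ['01001', '02020', '51510']
```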


In [7]:
df_JHU=df_JHU.set_index('FIPS')

In [8]:
df_JHU

| FIPS | deaths_10/31/20 |
|---|---|
| 01001 | 31 |
| 01003 | 71 |
| 01005 | 9 |
| 01007 | 15 |
| 01009 | 25 |
| ... | ... |
| 56039 | 1 |
| 56041 | 3 |
| 90056 | 10 |
| 56043 | 7 |
