#  COVID-19 – Data-Based Prediction Tool 

## Ian Scarff (iie728)

## Practicum II Project 2020

### Import Packages

In [1]:
import numpy as np
import pandas as pd

# Covid-19 Data Preprocessing

### About the Data

The data comes from usafacts.org, from the page 
"Coronavirus Locations: COVID-19 Map by County and State."

The 21 cases confirmed on the Grand Princess cruise ship on March 5 and 6 are attributed to the state of California, but not to any counties. The national numbers also include the 45 people with coronavirus repatriated from the Diamond Princess.

USAFacts attempts to match each case with a county, but some cases counted at the state level are not allocated to counties due to lack of information.

Data is updated each day.


NOTES FROM USAFacts:

Note from April 28: On April 14, New York City began a separate count of "probable deaths" of people believed to have died as a result of COVID-19, though weren't tested. On April 28, these deaths were retroactively added to our death counts, assigned to a New York City borough if possible. In the future, USAFacts will include "probable deaths" in the overall tally if a local government chooses to report that information separately.

Note from April 18: Certain states have changed their methodology in reporting deaths due to COVID-19. As a result, we are holding off on reporting death data in a few key states (New York is notable among these states due to the high number of confirmed cases and deaths). USAFacts is committed to providing official numbers confirmed by state or local health agencies, and we will appropriately backfill the death data when we receive more guidance from the CDC and relevant health departments.

Note from April 15: In certain states, probable deaths are listed alongside confirmed deaths. Following the lead of the CDC, we will begin publishing death counts that combine these two totals where applicable; this might result in larger than expected increases in deaths in certain counties.

Note from March 28: The data now includes all counties regardless of confirmed case count. Additionally, New York City data has been allotted to its five boroughs/counties, where possible.



##### There is no missing data.

#### Import Data

To ensure that a copy of the data is always saved in the environment, the data is written to disk every time it is imported.

In [2]:
### Number of confirmed cases by county
!curl https://usafactsstatic.blob.core.windows.net/public/data/covid-19/covid_confirmed_usafacts.csv --output data/cases.csv

### Number of confirmed deaths by county
!curl https://usafactsstatic.blob.core.windows.net/public/data/covid-19/covid_deaths_usafacts.csv --output data/deaths.csv



The county labels in the population dataset were unreliable.

A separate population dataset was created with a naming convention that matches the other data frames.

Now load those datasets.

In [3]:
### Total Cases
cases = pd.read_csv("data/cases.csv")

odd = "Unnamed: " + str(len(cases.columns) - 1)

if (cases.columns[-1] == odd):
    cases = cases.drop(columns = cases.columns[-1])

cases

Unnamed: 0,countyFIPS,County Name,State,stateFIPS,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,...,7/1/20,7/2/20,7/3/20,7/4/20,7/5/20,7/6/20,7/7/20,7/8/20,7/9/20,7/10/20
0,0,Statewide Unallocated,AL,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,1001,Autauga County,AL,1,0,0,0,0,0,0,...,553,561,568,591,615,618,644,651,661,670
2,1003,Baldwin County,AL,1,0,0,0,0,0,0,...,703,751,845,863,881,911,997,1056,1131,1187
3,1005,Barbour County,AL,1,0,0,0,0,0,0,...,326,335,348,350,352,356,360,366,371,381
4,1007,Bibb County,AL,1,0,0,0,0,0,0,...,174,179,189,190,193,197,199,201,211,218
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3190,56037,Sweetwater County,WY,56,0,0,0,0,0,0,...,89,92,100,102,106,113,122,124,126,128
3191,56039,Teton County,WY,56,0,0,0,0,0,0,...,134,136,135,137,140,145,146,149,149,150
3192,56041,Uinta County,WY,56,0,0,0,0,0,0,...,177,180,182,183,184,190,190,192,198,200
3193,56043,Washakie County,WY,56,0,0,0,0,0,0,...,39,39,39,39,39,39,40,42,42,42
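The check above targets a trailing column that pandas names `Unnamed: N`, which typically appears when every line of the CSV ends with a stray comma. A minimal sketch on toy data, assuming the goal is simply to discard such columns; the CSV text and the `drop_unnamed` helper are made up for illustration and are not part of the notebook:

```python
import io
import pandas as pd

# Toy CSV whose lines end with a stray comma, so pandas creates an extra
# column called "Unnamed: 2" (hypothetical data, not the USAFacts file).
toy_csv = io.StringIO("countyFIPS,1/22/20,\n1001,0,\n1003,0,\n")
toy = pd.read_csv(toy_csv)

def drop_unnamed(df):
    """Return df without columns whose name starts with 'Unnamed: '."""
    keep = ~df.columns.str.startswith("Unnamed: ")
    return df.loc[:, keep]

toy_clean = drop_unnamed(toy)
print(list(toy.columns))
print(list(toy_clean.columns))
```

Unlike the position-based check in the notebook, this helper removes every column with that prefix, so it would also drop an unnamed index column if one were present.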


In [4]:
### Total Deaths
deaths = pd.read_csv("data/deaths.csv")

odd = "Unnamed: " + str(len(deaths.columns) - 1)

if (deaths.columns[-1] == odd):
    deaths = deaths.drop(columns = deaths.columns[-1])

deaths

Unnamed: 0,countyFIPS,County Name,State,stateFIPS,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,...,7/1/20,7/2/20,7/3/20,7/4/20,7/5/20,7/6/20,7/7/20,7/8/20,7/9/20,7/10/20
0,0,Statewide Unallocated,AL,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,1001,Autauga County,AL,1,0,0,0,0,0,0,...,12,13,13,13,13,13,13,13,14,15
2,1003,Baldwin County,AL,1,0,0,0,0,0,0,...,10,10,10,10,10,10,10,10,11,12
3,1005,Barbour County,AL,1,0,0,0,0,0,0,...,1,1,2,2,2,2,2,2,2,2
4,1007,Bibb County,AL,1,0,0,0,0,0,0,...,1,1,1,1,1,1,1,1,1,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3190,56037,Sweetwater County,WY,56,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3191,56039,Teton County,WY,56,0,0,0,0,0,0,...,1,1,1,1,1,1,1,1,1,1
3192,56041,Uinta County,WY,56,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3193,56043,Washakie County,WY,56,0,0,0,0,0,0,...,5,5,5,5,5,5,5,5,5,5


In [5]:
### Total Population
population = pd.read_csv("data/population.csv")
population

Unnamed: 0,County Name,Population,countyFIPS
0,Statewide Unallocated,0,0
1,Autauga,55869,1001
2,Baldwin,223234,1003
3,Barbour,24686,1005
4,Bibb,22394,1007
...,...,...,...
3138,Sweetwater,42343,56037
3139,Teton,23464,56039
3140,Uinta,20226,56041
3141,Washakie,7805,56043


### Fixing Errors
In the cases and deaths dataframes, certain observations need to be removed.

1: Wade Hampton Census Area, Alaska. This area no longer exists under that name; it was renamed to Kusilvak Census Area.

2: New York City Unallocated/Probable. This is not a county. Observations for the NYC area are covered by the 5 counties of the metropolitan area.

3: Grand Princess Cruise Ship. This is a cruise ship, not a county, and these cases are attributed to California.

In [6]:
#### County Data

### Remove Wade Hampton Area
cases = cases.drop(list(cases[cases["County Name"] == "Wade Hampton Census Area"].index))

### New York City Unallocated/Probable
cases = cases.drop(list(cases[cases["County Name"] == "New York City Unallocated/Probable"].index))

### Remove Grand Princess Cruise Ship
cases = cases.drop(list(cases[cases["County Name"] == "Grand Princess Cruise Ship"].index))


#### Deaths Data
### Remove Wade Hampton Area
deaths = deaths.drop(list(deaths[deaths["County Name"] == "Wade Hampton Census Area"].index))

### New York City Unallocated/Probable
deaths = deaths.drop(list(deaths[deaths["County Name"] == "New York City Unallocated/Probable"].index))

### Remove Grand Princess Cruise Ship
deaths = deaths.drop(list(deaths[deaths["County Name"] == "Grand Princess Cruise Ship"].index))
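The six `drop` calls above all follow the same pattern, so they could probably be condensed with `isin`. A minimal sketch on a toy frame; the `drop_non_counties` helper and the toy values are illustrative assumptions, while the three labels are the ones listed in the text:

```python
import pandas as pd

# Toy frame standing in for `cases` or `deaths` (values are made up).
toy = pd.DataFrame({
    "County Name": ["Autauga County", "Wade Hampton Census Area",
                    "Grand Princess Cruise Ship", "Bibb County"],
    "1/22/20": [0, 0, 0, 0],
})

# Row labels the notebook removes because they are not current counties.
not_counties = [
    "Wade Hampton Census Area",
    "New York City Unallocated/Probable",
    "Grand Princess Cruise Ship",
]

def drop_non_counties(df, names=not_counties):
    """Keep only rows whose 'County Name' is not in `names`."""
    return df[~df["County Name"].isin(names)]

toy_clean = drop_non_counties(toy)
print(toy_clean["County Name"].tolist())
```

The same helper could then be applied to both `cases` and `deaths`, which keeps the list of excluded labels in one place.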

In [7]:
cases = cases.rename(columns = {"State" : "StateABV"})
cases

Unnamed: 0,countyFIPS,County Name,StateABV,stateFIPS,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,...,7/1/20,7/2/20,7/3/20,7/4/20,7/5/20,7/6/20,7/7/20,7/8/20,7/9/20,7/10/20
0,0,Statewide Unallocated,AL,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,1001,Autauga County,AL,1,0,0,0,0,0,0,...,553,561,568,591,615,618,644,651,661,670
2,1003,Baldwin County,AL,1,0,0,0,0,0,0,...,703,751,845,863,881,911,997,1056,1131,1187
3,1005,Barbour County,AL,1,0,0,0,0,0,0,...,326,335,348,350,352,356,360,366,371,381
4,1007,Bibb County,AL,1,0,0,0,0,0,0,...,174,179,189,190,193,197,199,201,211,218
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3190,56037,Sweetwater County,WY,56,0,0,0,0,0,0,...,89,92,100,102,106,113,122,124,126,128
3191,56039,Teton County,WY,56,0,0,0,0,0,0,...,134,136,135,137,140,145,146,149,149,150
3192,56041,Uinta County,WY,56,0,0,0,0,0,0,...,177,180,182,183,184,190,190,192,198,200
3193,56043,Washakie County,WY,56,0,0,0,0,0,0,...,39,39,39,39,39,39,40,42,42,42


In [8]:
deaths = deaths.rename(columns = {"State" : "StateABV"})
deaths

Unnamed: 0,countyFIPS,County Name,StateABV,stateFIPS,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,...,7/1/20,7/2/20,7/3/20,7/4/20,7/5/20,7/6/20,7/7/20,7/8/20,7/9/20,7/10/20
0,0,Statewide Unallocated,AL,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,1001,Autauga County,AL,1,0,0,0,0,0,0,...,12,13,13,13,13,13,13,13,14,15
2,1003,Baldwin County,AL,1,0,0,0,0,0,0,...,10,10,10,10,10,10,10,10,11,12
3,1005,Barbour County,AL,1,0,0,0,0,0,0,...,1,1,2,2,2,2,2,2,2,2
4,1007,Bibb County,AL,1,0,0,0,0,0,0,...,1,1,1,1,1,1,1,1,1,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3190,56037,Sweetwater County,WY,56,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3191,56039,Teton County,WY,56,0,0,0,0,0,0,...,1,1,1,1,1,1,1,1,1,1
3192,56041,Uinta County,WY,56,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3193,56043,Washakie County,WY,56,0,0,0,0,0,0,...,5,5,5,5,5,5,5,5,5,5


### Prep Data

#### Ensuring Labels

To ensure that county and state labels are the same across dataframes, replace them with the labels in countyFIPS.csv and stateFIPS.csv.

Bring in the FIPS data.

In [9]:
### County FIPS
countyFIPS = pd.read_csv("data/countyFIPS.csv")
countyFIPS

Unnamed: 0,County Name,countyFIPS
0,Statewide Unallocated,0
1,Autauga,1001
2,Baldwin,1003
3,Barbour,1005
4,Bibb,1007
...,...,...
3138,Sweetwater,56037
3139,Teton,56039
3140,Uinta,56041
3141,Washakie,56043


In [10]:
### State FIPS
stateFIPS = pd.read_csv("data/stateFIPS.csv")
stateFIPS

Unnamed: 0,State,stateFIPS
0,Alabama,1
1,Alaska,2
2,Arizona,4
3,Arkansas,5
4,California,6
5,Colorado,8
6,Connecticut,9
7,Delaware,10
8,DC,11
9,Florida,12


##### Fixing Cases Labels

In [11]:
### Drop cases county labels
cases = cases.drop(columns = "County Name")
cases

Unnamed: 0,countyFIPS,StateABV,stateFIPS,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,1/28/20,...,7/1/20,7/2/20,7/3/20,7/4/20,7/5/20,7/6/20,7/7/20,7/8/20,7/9/20,7/10/20
0,0,AL,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,1001,AL,1,0,0,0,0,0,0,0,...,553,561,568,591,615,618,644,651,661,670
2,1003,AL,1,0,0,0,0,0,0,0,...,703,751,845,863,881,911,997,1056,1131,1187
3,1005,AL,1,0,0,0,0,0,0,0,...,326,335,348,350,352,356,360,366,371,381
4,1007,AL,1,0,0,0,0,0,0,0,...,174,179,189,190,193,197,199,201,211,218
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3190,56037,WY,56,0,0,0,0,0,0,0,...,89,92,100,102,106,113,122,124,126,128
3191,56039,WY,56,0,0,0,0,0,0,0,...,134,136,135,137,140,145,146,149,149,150
3192,56041,WY,56,0,0,0,0,0,0,0,...,177,180,182,183,184,190,190,192,198,200
3193,56043,WY,56,0,0,0,0,0,0,0,...,39,39,39,39,39,39,40,42,42,42


In [12]:
### Add County Name from countyFIPS
cases = cases.merge(countyFIPS, how = "left")
cases

Unnamed: 0,countyFIPS,StateABV,stateFIPS,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,1/28/20,...,7/2/20,7/3/20,7/4/20,7/5/20,7/6/20,7/7/20,7/8/20,7/9/20,7/10/20,County Name
0,0,AL,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Statewide Unallocated
1,1001,AL,1,0,0,0,0,0,0,0,...,561,568,591,615,618,644,651,661,670,Autauga
2,1003,AL,1,0,0,0,0,0,0,0,...,751,845,863,881,911,997,1056,1131,1187,Baldwin
3,1005,AL,1,0,0,0,0,0,0,0,...,335,348,350,352,356,360,366,371,381,Barbour
4,1007,AL,1,0,0,0,0,0,0,0,...,179,189,190,193,197,199,201,211,218,Bibb
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3187,56037,WY,56,0,0,0,0,0,0,0,...,92,100,102,106,113,122,124,126,128,Sweetwater
3188,56039,WY,56,0,0,0,0,0,0,0,...,136,135,137,140,145,146,149,149,150,Teton
3189,56041,WY,56,0,0,0,0,0,0,0,...,180,182,183,184,190,190,192,198,200,Uinta
3190,56043,WY,56,0,0,0,0,0,0,0,...,39,39,39,39,39,40,42,42,42,Washakie


In [13]:
### Add State names from stateFIPS
cases = cases.merge(stateFIPS, how = "left")
cases

Unnamed: 0,countyFIPS,StateABV,stateFIPS,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,1/28/20,...,7/3/20,7/4/20,7/5/20,7/6/20,7/7/20,7/8/20,7/9/20,7/10/20,County Name,State
0,0,AL,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,Statewide Unallocated,Alabama
1,1001,AL,1,0,0,0,0,0,0,0,...,568,591,615,618,644,651,661,670,Autauga,Alabama
2,1003,AL,1,0,0,0,0,0,0,0,...,845,863,881,911,997,1056,1131,1187,Baldwin,Alabama
3,1005,AL,1,0,0,0,0,0,0,0,...,348,350,352,356,360,366,371,381,Barbour,Alabama
4,1007,AL,1,0,0,0,0,0,0,0,...,189,190,193,197,199,201,211,218,Bibb,Alabama
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3187,56037,WY,56,0,0,0,0,0,0,0,...,100,102,106,113,122,124,126,128,Sweetwater,Wyoming
3188,56039,WY,56,0,0,0,0,0,0,0,...,135,137,140,145,146,149,149,150,Teton,Wyoming
3189,56041,WY,56,0,0,0,0,0,0,0,...,182,183,184,190,190,192,198,200,Uinta,Wyoming
3190,56043,WY,56,0,0,0,0,0,0,0,...,39,39,39,39,40,42,42,42,Washakie,Wyoming
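The left merges above join on whatever column names the two frames share. A hedged sketch of how such a merge could be made more explicit and checked for unmatched codes, using the `on`, `validate`, and `indicator` parameters of `DataFrame.merge`; the toy frames and the FIPS code 99999 are made up for illustration:

```python
import pandas as pd

# Toy stand-ins for `cases` and `countyFIPS`; 99999 is a made-up code
# with no match in the lookup table.
toy_cases = pd.DataFrame({"countyFIPS": [1001, 1003, 99999],
                          "stateFIPS": [1, 1, 1],
                          "7/10/20": [670, 1187, 3]})
toy_county = pd.DataFrame({"County Name": ["Autauga", "Baldwin"],
                           "countyFIPS": [1001, 1003]})

# validate="many_to_one" raises if the lookup table has duplicate keys;
# indicator=True adds a "_merge" column recording where each row matched.
merged = toy_cases.merge(toy_county, on="countyFIPS", how="left",
                         validate="many_to_one", indicator=True)

# Rows that found no county name in the lookup table.
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched["countyFIPS"].tolist())
```

An empty `unmatched` frame would suggest that every FIPS code in the data has a label in the lookup table.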


##### Fixing Deaths Labels

In [14]:
### Drop deaths county labels
deaths = deaths.drop(columns = "County Name")
deaths

Unnamed: 0,countyFIPS,StateABV,stateFIPS,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,1/28/20,...,7/1/20,7/2/20,7/3/20,7/4/20,7/5/20,7/6/20,7/7/20,7/8/20,7/9/20,7/10/20
0,0,AL,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,1001,AL,1,0,0,0,0,0,0,0,...,12,13,13,13,13,13,13,13,14,15
2,1003,AL,1,0,0,0,0,0,0,0,...,10,10,10,10,10,10,10,10,11,12
3,1005,AL,1,0,0,0,0,0,0,0,...,1,1,2,2,2,2,2,2,2,2
4,1007,AL,1,0,0,0,0,0,0,0,...,1,1,1,1,1,1,1,1,1,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3190,56037,WY,56,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3191,56039,WY,56,0,0,0,0,0,0,0,...,1,1,1,1,1,1,1,1,1,1
3192,56041,WY,56,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3193,56043,WY,56,0,0,0,0,0,0,0,...,5,5,5,5,5,5,5,5,5,5


In [15]:
### Add County Name from countyFIPS
deaths = deaths.merge(countyFIPS, how = "left")
deaths

Unnamed: 0,countyFIPS,StateABV,stateFIPS,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,1/28/20,...,7/2/20,7/3/20,7/4/20,7/5/20,7/6/20,7/7/20,7/8/20,7/9/20,7/10/20,County Name
0,0,AL,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Statewide Unallocated
1,1001,AL,1,0,0,0,0,0,0,0,...,13,13,13,13,13,13,13,14,15,Autauga
2,1003,AL,1,0,0,0,0,0,0,0,...,10,10,10,10,10,10,10,11,12,Baldwin
3,1005,AL,1,0,0,0,0,0,0,0,...,1,2,2,2,2,2,2,2,2,Barbour
4,1007,AL,1,0,0,0,0,0,0,0,...,1,1,1,1,1,1,1,1,1,Bibb
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3187,56037,WY,56,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Sweetwater
3188,56039,WY,56,0,0,0,0,0,0,0,...,1,1,1,1,1,1,1,1,1,Teton
3189,56041,WY,56,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Uinta
3190,56043,WY,56,0,0,0,0,0,0,0,...,5,5,5,5,5,5,5,5,5,Washakie


In [16]:
### Add State names from stateFIPS
deaths = deaths.merge(stateFIPS, how = "left")
deaths

Unnamed: 0,countyFIPS,StateABV,stateFIPS,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,1/28/20,...,7/3/20,7/4/20,7/5/20,7/6/20,7/7/20,7/8/20,7/9/20,7/10/20,County Name,State
0,0,AL,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,Statewide Unallocated,Alabama
1,1001,AL,1,0,0,0,0,0,0,0,...,13,13,13,13,13,13,14,15,Autauga,Alabama
2,1003,AL,1,0,0,0,0,0,0,0,...,10,10,10,10,10,10,11,12,Baldwin,Alabama
3,1005,AL,1,0,0,0,0,0,0,0,...,2,2,2,2,2,2,2,2,Barbour,Alabama
4,1007,AL,1,0,0,0,0,0,0,0,...,1,1,1,1,1,1,1,1,Bibb,Alabama
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3187,56037,WY,56,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,Sweetwater,Wyoming
3188,56039,WY,56,0,0,0,0,0,0,0,...,1,1,1,1,1,1,1,1,Teton,Wyoming
3189,56041,WY,56,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,Uinta,Wyoming
3190,56043,WY,56,0,0,0,0,0,0,0,...,5,5,5,5,5,5,5,5,Washakie,Wyoming


##### Fixing Population Labels

In [17]:
### Drop population county labels
population = population.drop(columns = "County Name")
population

Unnamed: 0,Population,countyFIPS
0,0,0
1,55869,1001
2,223234,1003
3,24686,1005
4,22394,1007
...,...,...
3138,42343,56037
3139,23464,56039
3140,20226,56041
3141,7805,56043


In [18]:
### Add County Name from countyFIPS
population = population.merge(countyFIPS, how = "left")
population

Unnamed: 0,Population,countyFIPS,County Name
0,0,0,Statewide Unallocated
1,55869,1001,Autauga
2,223234,1003,Baldwin
3,24686,1005,Barbour
4,22394,1007,Bibb
...,...,...,...
3138,42343,56037,Sweetwater
3139,23464,56039,Teton
3140,20226,56041,Uinta
3141,7805,56043,Washakie


It turns out that the "Statewide Unallocated" rows hold valid measurements; they just haven't been assigned to a county due to lack of information.

Leave these observations out of the county dataframe, but include them when creating the state dataframe.

#### County Level Data

The cases and deaths data is in wide format, with one column per date, which is awkward for sorting, merging, and aggregation.

Unpivot the data using pd.melt so that each row holds one county on one date.
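As a toy illustration of what `pd.melt` does to this kind of table (the frame below is made up and far smaller than the real data), each date column becomes a row keyed by the identifier columns:

```python
import pandas as pd

# Wide toy table: one row per county, one column per date (made-up values).
wide = pd.DataFrame({
    "countyFIPS": [1001, 1003],
    "1/22/20": [0, 0],
    "1/23/20": [0, 1],
})

# Long table: one row per (county, date) pair. Columns not listed in
# id_vars are unpivoted when value_vars is omitted.
long = pd.melt(wide, id_vars=["countyFIPS"],
               var_name="Date", value_name="Cases")
print(long)
```

The result has `len(wide) * number_of_date_columns` rows, which is why the melted frames in the notebook grow to several hundred thousand rows.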

In [19]:
### Unpivot cases data
cases = pd.melt(cases, id_vars = ['County Name', "State", "StateABV", "countyFIPS", "stateFIPS"],
                 value_vars = cases.columns[3:-2],
                 var_name = "Date", value_name = "Cases")

cases

Unnamed: 0,County Name,State,StateABV,countyFIPS,stateFIPS,Date,Cases
0,Statewide Unallocated,Alabama,AL,0,1,1/22/20,0
1,Autauga,Alabama,AL,1001,1,1/22/20,0
2,Baldwin,Alabama,AL,1003,1,1/22/20,0
3,Barbour,Alabama,AL,1005,1,1/22/20,0
4,Bibb,Alabama,AL,1007,1,1/22/20,0
...,...,...,...,...,...,...,...
545827,Sweetwater,Wyoming,WY,56037,56,7/10/20,128
545828,Teton,Wyoming,WY,56039,56,7/10/20,150
545829,Uinta,Wyoming,WY,56041,56,7/10/20,200
545830,Washakie,Wyoming,WY,56043,56,7/10/20,42


In [20]:
### Unpivot death data
deaths = pd.melt(deaths, id_vars = ['County Name', "State", "StateABV", "countyFIPS", "stateFIPS"],
                 value_vars = list(deaths.columns[3:-2]),
                 var_name = "Date", value_name = "Deaths")

deaths

Unnamed: 0,County Name,State,StateABV,countyFIPS,stateFIPS,Date,Deaths
0,Statewide Unallocated,Alabama,AL,0,1,1/22/20,0
1,Autauga,Alabama,AL,1001,1,1/22/20,0
2,Baldwin,Alabama,AL,1003,1,1/22/20,0
3,Barbour,Alabama,AL,1005,1,1/22/20,0
4,Bibb,Alabama,AL,1007,1,1/22/20,0
...,...,...,...,...,...,...,...
545827,Sweetwater,Wyoming,WY,56037,56,7/10/20,0
545828,Teton,Wyoming,WY,56039,56,7/10/20,1
545829,Uinta,Wyoming,WY,56041,56,7/10/20,0
545830,Washakie,Wyoming,WY,56043,56,7/10/20,5


Combine cases and deaths into one data frame.

In [21]:
### Merge dataframes
cases_deaths = cases.merge(deaths, on = ["State", "StateABV", "County Name", "Date", "countyFIPS", "stateFIPS"])
cases_deaths

Unnamed: 0,County Name,State,StateABV,countyFIPS,stateFIPS,Date,Cases,Deaths
0,Statewide Unallocated,Alabama,AL,0,1,1/22/20,0,0
1,Autauga,Alabama,AL,1001,1,1/22/20,0,0
2,Baldwin,Alabama,AL,1003,1,1/22/20,0,0
3,Barbour,Alabama,AL,1005,1,1/22/20,0,0
4,Bibb,Alabama,AL,1007,1,1/22/20,0,0
...,...,...,...,...,...,...,...,...
545827,Sweetwater,Wyoming,WY,56037,56,7/10/20,128,0
545828,Teton,Wyoming,WY,56039,56,7/10/20,150,1
545829,Uinta,Wyoming,WY,56041,56,7/10/20,200,0
545830,Washakie,Wyoming,WY,56043,56,7/10/20,42,5


Add population to cases_deaths.

In [22]:
### Merge dataframes
cases_deaths = cases_deaths.merge(population, on = ["countyFIPS","County Name"], how = "left")

### Parse dates and sort
cases_deaths["Date"] = pd.to_datetime(cases_deaths["Date"], format = "%m/%d/%y")
cases_deaths = cases_deaths.sort_values(["State","County Name","Date"], ascending = [True, True, True])


### Rename cases and deaths
cases_deaths = cases_deaths.rename(columns = {"Cases" : "Total Cases",
                                              "Deaths" : "Total Deaths"})

cases_deaths = cases_deaths.reset_index().drop(columns = "index")
cases_deaths

Unnamed: 0,County Name,State,StateABV,countyFIPS,stateFIPS,Date,Total Cases,Total Deaths,Population
0,Autauga,Alabama,AL,1001,1,2020-01-22,0,0,55869
1,Autauga,Alabama,AL,1001,1,2020-01-23,0,0,55869
2,Autauga,Alabama,AL,1001,1,2020-01-24,0,0,55869
3,Autauga,Alabama,AL,1001,1,2020-01-25,0,0,55869
4,Autauga,Alabama,AL,1001,1,2020-01-26,0,0,55869
...,...,...,...,...,...,...,...,...,...
545827,Weston,Wyoming,WY,56045,56,2020-07-06,1,0,6927
545828,Weston,Wyoming,WY,56045,56,2020-07-07,1,0,6927
545829,Weston,Wyoming,WY,56045,56,2020-07-08,1,0,6927
545830,Weston,Wyoming,WY,56045,56,2020-07-09,1,0,6927


Change the data types: County Name and State become categories, and the FIPS codes become strings.

In [23]:
cases_deaths.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 545832 entries, 0 to 545831
Data columns (total 9 columns):
 #   Column        Non-Null Count   Dtype         
---  ------        --------------   -----         
 0   County Name   545832 non-null  object        
 1   State         545832 non-null  object        
 2   StateABV      545832 non-null  object        
 3   countyFIPS    545832 non-null  int64         
 4   stateFIPS     545832 non-null  int64         
 5   Date          545832 non-null  datetime64[ns]
 6   Total Cases   545832 non-null  int64         
 7   Total Deaths  545832 non-null  int64         
 8   Population    545832 non-null  int64         
dtypes: datetime64[ns](1), int64(5), object(3)
memory usage: 37.5+ MB


In [24]:
cases_deaths = cases_deaths.astype({"County Name" : "category",
                                    "State" : "category",
                                    "countyFIPS" : "str",
                                    "stateFIPS" : "str"})
cases_deaths.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 545832 entries, 0 to 545831
Data columns (total 9 columns):
 #   Column        Non-Null Count   Dtype         
---  ------        --------------   -----         
 0   County Name   545832 non-null  category      
 1   State         545832 non-null  category      
 2   StateABV      545832 non-null  object        
 3   countyFIPS    545832 non-null  object        
 4   stateFIPS     545832 non-null  object        
 5   Date          545832 non-null  datetime64[ns]
 6   Total Cases   545832 non-null  int64         
 7   Total Deaths  545832 non-null  int64         
 8   Population    545832 non-null  int64         
dtypes: category(2), datetime64[ns](1), int64(3), object(3)
memory usage: 30.8+ MB
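The drop in reported memory between the two `info()` calls is consistent with the `category` conversion, which stores each distinct label once plus a small integer code per row. A minimal sketch of the effect on a made-up series of repeated county names:

```python
import pandas as pd

# A column with few distinct labels repeated many times (toy data).
names = pd.Series(["Autauga", "Baldwin"] * 1000)

# deep=True counts the string contents, not just the references to them.
as_object = names.memory_usage(deep=True)
as_category = names.astype("category").memory_usage(deep=True)

print(as_object, as_category)
```

The saving depends on how often labels repeat; a column of mostly unique strings would gain little from this conversion.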


##### Fixing countyFIPS labels

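County FIPS codes are five digits long, but reading them as integers drops the leading zero. The first seven states (Alabama, Alaska, Arizona, Arkansas, California, Colorado, Connecticut) have single-digit state FIPS codes, so their countyFIPS codes need to start with 0.

Extract the first seven states.

In [25]:
### The first seven states end where DC begins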
firstSix = cases_deaths[:list(cases_deaths["countyFIPS"][cases_deaths["State"] == "DC"].index)[0]]
firstSix

Unnamed: 0,County Name,State,StateABV,countyFIPS,stateFIPS,Date,Total Cases,Total Deaths,Population
0,Autauga,Alabama,AL,1001,1,2020-01-22,0,0,55869
1,Autauga,Alabama,AL,1001,1,2020-01-23,0,0,55869
2,Autauga,Alabama,AL,1001,1,2020-01-24,0,0,55869
3,Autauga,Alabama,AL,1001,1,2020-01-25,0,0,55869
4,Autauga,Alabama,AL,1001,1,2020-01-26,0,0,55869
...,...,...,...,...,...,...,...,...,...
55228,Windham,Connecticut,CT,9015,9,2020-07-06,624,14,116782
55229,Windham,Connecticut,CT,9015,9,2020-07-07,626,14,116782
55230,Windham,Connecticut,CT,9015,9,2020-07-08,626,14,116782
55231,Windham,Connecticut,CT,9015,9,2020-07-09,630,14,116782


Fix FIPS codes.

In [26]:
### Create a new column with the fixed FIPS codes
firstSix.insert(2,"countyFIPS2", '0' + firstSix["countyFIPS"])
firstSix

Unnamed: 0,County Name,State,countyFIPS2,StateABV,countyFIPS,stateFIPS,Date,Total Cases,Total Deaths,Population
0,Autauga,Alabama,01001,AL,1001,1,2020-01-22,0,0,55869
1,Autauga,Alabama,01001,AL,1001,1,2020-01-23,0,0,55869
2,Autauga,Alabama,01001,AL,1001,1,2020-01-24,0,0,55869
3,Autauga,Alabama,01001,AL,1001,1,2020-01-25,0,0,55869
4,Autauga,Alabama,01001,AL,1001,1,2020-01-26,0,0,55869
...,...,...,...,...,...,...,...,...,...,...
55228,Windham,Connecticut,09015,CT,9015,9,2020-07-06,624,14,116782
55229,Windham,Connecticut,09015,CT,9015,9,2020-07-07,626,14,116782
55230,Windham,Connecticut,09015,CT,9015,9,2020-07-08,626,14,116782
55231,Windham,Connecticut,09015,CT,9015,9,2020-07-09,630,14,116782


In [27]:
### Drop the old FIPS codes and rename the new FIPS codes column
firstSix = firstSix.drop(columns = "countyFIPS")
firstSix = firstSix.rename(columns = {"countyFIPS2" : "countyFIPS"})
firstSix

Unnamed: 0,County Name,State,countyFIPS,StateABV,stateFIPS,Date,Total Cases,Total Deaths,Population
0,Autauga,Alabama,01001,AL,1,2020-01-22,0,0,55869
1,Autauga,Alabama,01001,AL,1,2020-01-23,0,0,55869
2,Autauga,Alabama,01001,AL,1,2020-01-24,0,0,55869
3,Autauga,Alabama,01001,AL,1,2020-01-25,0,0,55869
4,Autauga,Alabama,01001,AL,1,2020-01-26,0,0,55869
...,...,...,...,...,...,...,...,...,...
55228,Windham,Connecticut,09015,CT,9,2020-07-06,624,14,116782
55229,Windham,Connecticut,09015,CT,9,2020-07-07,626,14,116782
55230,Windham,Connecticut,09015,CT,9,2020-07-08,626,14,116782
55231,Windham,Connecticut,09015,CT,9,2020-07-09,630,14,116782


Now drop the first seven states in cases_deaths and stack firstSix on top.

In [28]:
firstSixIndex = np.arange(start = 0, stop = list(cases_deaths["countyFIPS"][cases_deaths["State"] == "DC"].index)[0])
cases_deaths = cases_deaths.drop(firstSixIndex)
cases_deaths

Unnamed: 0,County Name,State,StateABV,countyFIPS,stateFIPS,Date,Total Cases,Total Deaths,Population
55233,Washington,DC,DC,11001,11,2020-01-22,0,0,705749
55234,Washington,DC,DC,11001,11,2020-01-23,0,0,705749
55235,Washington,DC,DC,11001,11,2020-01-24,0,0,705749
55236,Washington,DC,DC,11001,11,2020-01-25,0,0,705749
55237,Washington,DC,DC,11001,11,2020-01-26,0,0,705749
...,...,...,...,...,...,...,...,...,...
545827,Weston,Wyoming,WY,56045,56,2020-07-06,1,0,6927
545828,Weston,Wyoming,WY,56045,56,2020-07-07,1,0,6927
545829,Weston,Wyoming,WY,56045,56,2020-07-08,1,0,6927
545830,Weston,Wyoming,WY,56045,56,2020-07-09,1,0,6927


In [29]:
cases_deaths = pd.concat([firstSix,cases_deaths])
cases_deaths

Unnamed: 0,County Name,State,countyFIPS,StateABV,stateFIPS,Date,Total Cases,Total Deaths,Population
0,Autauga,Alabama,01001,AL,1,2020-01-22,0,0,55869
1,Autauga,Alabama,01001,AL,1,2020-01-23,0,0,55869
2,Autauga,Alabama,01001,AL,1,2020-01-24,0,0,55869
3,Autauga,Alabama,01001,AL,1,2020-01-25,0,0,55869
4,Autauga,Alabama,01001,AL,1,2020-01-26,0,0,55869
...,...,...,...,...,...,...,...,...,...
545827,Weston,Wyoming,56045,WY,56,2020-07-06,1,0,6927
545828,Weston,Wyoming,56045,WY,56,2020-07-07,1,0,6927
545829,Weston,Wyoming,56045,WY,56,2020-07-08,1,0,6927
545830,Weston,Wyoming,56045,WY,56,2020-07-09,1,0,6927
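The slice, prefix, and re-stack steps above could likely be replaced by zero-padding the codes directly with `str.zfill`. A minimal sketch on toy codes, assuming every county code should be exactly five characters long:

```python
import pandas as pd

# Toy county FIPS codes as read from CSV (integers, leading zero lost).
toy = pd.DataFrame({"countyFIPS": [1001, 9015, 11001, 56045]})

# Convert to string and left-pad with zeros to five characters.
toy["countyFIPS"] = toy["countyFIPS"].astype(str).str.zfill(5)
print(toy["countyFIPS"].tolist())  # ['01001', '09015', '11001', '56045']
```

One difference to keep in mind: the "Statewide Unallocated" code 0 would become `00000` here, whereas prefixing a single `0` as in the notebook yields `00` for the first seven states and leaves it as `0` elsewhere.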


In [30]:
cases_deaths.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 545832 entries, 0 to 545831
Data columns (total 9 columns):
 #   Column        Non-Null Count   Dtype         
---  ------        --------------   -----         
 0   County Name   545832 non-null  category      
 1   State         545832 non-null  category      
 2   countyFIPS    545832 non-null  object        
 3   StateABV      545832 non-null  object        
 4   stateFIPS     545832 non-null  object        
 5   Date          545832 non-null  datetime64[ns]
 6   Total Cases   545832 non-null  int64         
 7   Total Deaths  545832 non-null  int64         
 8   Population    545832 non-null  int64         
dtypes: category(2), datetime64[ns](1), int64(3), object(3)
memory usage: 35.0+ MB


Now make a new data frame without "Statewide Unallocated."

In [31]:
cases_deaths2 = cases_deaths[cases_deaths["County Name"] != "Statewide Unallocated"]
cases_deaths2 = cases_deaths2.reset_index()
cases_deaths2 = cases_deaths2.drop(columns = "index")
cases_deaths2

Unnamed: 0,County Name,State,countyFIPS,StateABV,stateFIPS,Date,Total Cases,Total Deaths,Population
0,Autauga,Alabama,01001,AL,1,2020-01-22,0,0,55869
1,Autauga,Alabama,01001,AL,1,2020-01-23,0,0,55869
2,Autauga,Alabama,01001,AL,1,2020-01-24,0,0,55869
3,Autauga,Alabama,01001,AL,1,2020-01-25,0,0,55869
4,Autauga,Alabama,01001,AL,1,2020-01-26,0,0,55869
...,...,...,...,...,...,...,...,...,...
537277,Weston,Wyoming,56045,WY,56,2020-07-06,1,0,6927
537278,Weston,Wyoming,56045,WY,56,2020-07-07,1,0,6927
537279,Weston,Wyoming,56045,WY,56,2020-07-08,1,0,6927
537280,Weston,Wyoming,56045,WY,56,2020-07-09,1,0,6927


### State Level Data

Now create a data frame that summarizes the data for each state.

In [32]:
### First for Alabama
### Aggregate data
StateData = cases_deaths[cases_deaths['State'] == "Alabama"].groupby("Date").agg(
        TotalCases = pd.NamedAgg(column = "Total Cases", aggfunc = sum),
        TotalDeaths = pd.NamedAgg(column = "Total Deaths", aggfunc = sum),
        Population = pd.NamedAgg(column = "Population", aggfunc = sum))

### Make a vector of the state and its FIPS
state = np.repeat("Alabama", len(cases_deaths["Date"].unique()))
stateABV = np.repeat("AL", len(cases_deaths["Date"].unique()))
statefips = np.repeat('1', len(cases_deaths["Date"].unique()))

### Grab dates
date = cases_deaths["Date"].unique()

### Insert into State Data
StateData.insert(0, "stateFIPS", statefips)
StateData.insert(0, "StateABV", stateABV)
StateData.insert(0, "State", state)
StateData.insert(0, "Date", date)

### Now the rest
for state, fipsNum, stateABV in zip(cases_deaths["State"].unique()[1:], cases_deaths["stateFIPS"].unique()[1:], 
                                    cases_deaths["StateABV"].unique()[1:]) :
    ### Aggregate data
    myStateData = cases_deaths[cases_deaths['State'] == state].groupby("Date").agg(
        TotalCases = pd.NamedAgg(column = "Total Cases", aggfunc = sum),
        TotalDeaths = pd.NamedAgg(column = "Total Deaths", aggfunc = sum),
        Population = pd.NamedAgg(column = "Population", aggfunc = sum))
    
    ### Make a vector of the state/fips and grab dates
    mystate = np.repeat(state, len(cases_deaths["Date"].unique()))
    mystateABV = np.repeat(stateABV, len(cases_deaths["Date"].unique()))
    mystatefips = np.repeat(fipsNum, len(cases_deaths["Date"].unique()))
    mydate = cases_deaths["Date"].unique()
    
    ### Insert data
    myStateData.insert(0, "stateFIPS", mystatefips)
    myStateData.insert(0, "StateABV", mystateABV)
    myStateData.insert(0, "State", mystate)
    myStateData.insert(0, "Date", mydate)
    
    ### Stack state data
    StateData = pd.concat([StateData, myStateData])

### Reset indices
StateData = StateData.set_index(np.arange(0,len(StateData)))

StateData

Unnamed: 0,Date,State,StateABV,stateFIPS,TotalCases,TotalDeaths,Population
0,2020-01-22,Alabama,AL,1,0,0,4903185
1,2020-01-23,Alabama,AL,1,0,0,4903185
2,2020-01-24,Alabama,AL,1,0,0,4903185
3,2020-01-25,Alabama,AL,1,0,0,4903185
4,2020-01-26,Alabama,AL,1,0,0,4903185
...,...,...,...,...,...,...,...
8716,2020-07-06,Wyoming,WY,56,1675,20,578759
8717,2020-07-07,Wyoming,WY,56,1709,20,578759
8718,2020-07-08,Wyoming,WY,56,1739,21,578759
8719,2020-07-09,Wyoming,WY,56,1767,21,578759
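The per-state loop above can probably be expressed as a single `groupby` over the state identifiers and the date. A hedged sketch on toy data (all values are made up); `observed=True` is included because `State` is a categorical column in the notebook, and without it pandas may emit rows for unobserved category combinations:

```python
import pandas as pd

dates = pd.to_datetime(["2020-01-22", "2020-01-23"])

# Toy county-level frame: two Alabama counties and one Wyoming county.
toy = pd.DataFrame({
    "State": ["Alabama"] * 4 + ["Wyoming"] * 2,
    "StateABV": ["AL"] * 4 + ["WY"] * 2,
    "stateFIPS": ["1"] * 4 + ["56"] * 2,
    "Date": list(dates) * 3,
    "Total Cases": [1, 2, 3, 4, 5, 6],
    "Total Deaths": [0, 0, 0, 1, 0, 0],
    "Population": [100, 100, 200, 200, 50, 50],
})

# One row per state and date, summing over that state's counties.
state_data = (
    toy.groupby(["State", "StateABV", "stateFIPS", "Date"],
                as_index=False, observed=True)
       .agg(TotalCases=("Total Cases", "sum"),
            TotalDeaths=("Total Deaths", "sum"),
            Population=("Population", "sum"))
)
print(state_data)
```

This avoids growing a frame with `pd.concat` inside a loop, and the state labels come straight from the group keys rather than from separately built vectors.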


### USA Level Data

Now create a data set for the USA.

In [33]:
### First for date
### Aggregate data
USAData = StateData[StateData['Date'] == StateData["Date"].unique()[0]].groupby("Date").agg(
        TotalCases = pd.NamedAgg(column = "TotalCases", aggfunc = sum),
        TotalDeaths = pd.NamedAgg(column = "TotalDeaths", aggfunc = sum),
        Population = pd.NamedAgg(column = "Population", aggfunc = sum))

### Insert into USAData
USAData.insert(0, "Date", StateData["Date"].unique()[0])
USAData.insert(0, "Country", "United States")


### For the rest of dates
for day in StateData["Date"].unique()[1:]:
    ### Aggregate data
    myUSAData = StateData[StateData['Date'] == day].groupby("Date").agg(
        TotalCases = pd.NamedAgg(column = "TotalCases", aggfunc = sum),
        TotalDeaths = pd.NamedAgg(column = "TotalDeaths", aggfunc = sum),
        Population = pd.NamedAgg(column = "Population", aggfunc = sum))
        
    ### Insert date into data
    myUSAData.insert(0, "Date", day)
    myUSAData.insert(0, "Country", "United States")
    
    ### Stack daily data
    USAData = pd.concat([USAData, myUSAData])
    
    

### Reset indices
USAData = USAData.set_index(np.arange(0,len(USAData)))

USAData

Unnamed: 0,Country,Date,TotalCases,TotalDeaths,Population
0,United States,2020-01-22,1,0,328239523
1,United States,2020-01-23,1,0,328239523
2,United States,2020-01-24,2,0,328239523
3,United States,2020-01-25,2,0,328239523
4,United States,2020-01-26,5,0,328239523
...,...,...,...,...,...
166,United States,2020-07-06,2917873,129365,328239523
167,United States,2020-07-07,2973816,130284,328239523
168,United States,2020-07-08,3031787,131142,328239523
169,United States,2020-07-09,3094659,132126,328239523


The final Total Cases and Total Deaths numbers are only slightly off, give or take 100, because some data points were removed earlier.
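
The per-date loop above could likely be collapsed into a single grouped aggregation. The following is a minimal sketch on a toy frame, assuming the same column names as `StateData`; the names `toy_states` and `usa_toy` and the sample values are illustrative only and do not come from the notebook.

```python
import pandas as pd

# Toy stand-in for StateData: two states, two dates (values are illustrative).
toy_states = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-22", "2020-01-22", "2020-01-23", "2020-01-23"]),
    "State": ["Alabama", "Wyoming", "Alabama", "Wyoming"],
    "TotalCases": [0, 1, 2, 3],
    "TotalDeaths": [0, 0, 1, 0],
    "Population": [4903185, 578759, 4903185, 578759],
})

# One groupby over all dates replaces the per-date loop and repeated pd.concat.
usa_toy = (
    toy_states
    .groupby("Date", as_index=False)[["TotalCases", "TotalDeaths", "Population"]]
    .sum()
)
usa_toy.insert(0, "Country", "United States")
print(usa_toy)
```

Because `as_index = False` keeps `Date` as a regular column and produces a fresh integer index, no separate index reset is needed in this variant.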

### New Cases & New Deaths

Use multiprocessing to: 
1) Calculate the number of new cases each day.

2) Calculate the number of new deaths each day.
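
As a small worked example of the daily-difference step used in the cells below: `np.diff` with `prepend` set to the first value returns an array of the same length whose first entry is 0. The cumulative counts here are made up for illustration.

```python
import numpy as np

# Hypothetical cumulative case counts for one county over five days.
# The drop from 3 to 2 stands in for a downward revision of the data.
total_cases = np.array([0, 1, 3, 2, 7])

# Prepending the first value makes the first difference 0 and keeps the length.
raw_change = np.diff(total_cases, prepend=total_cases[0])

# The notebook wraps the result in abs(), so a revision of -1 is recorded as 1.
new_cases = abs(raw_change)

print(raw_change)
print(new_cases)
```

Note that taking the absolute value turns downward corrections into positive counts; whether that is desirable depends on how the new-case series is used later.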

In [34]:
### Import Pool from multiprocessing
from multiprocessing import Pool

#### County Data

In [35]:
### Create a parallelizing function
def parallel1(data, func, n_cores = 25):
    ### Split data by state into 25 sections
    splits = np.array_split(data["State"].unique(), 25)
    
    ### Create empty list
    data_split = []
    
    ### Add each split dataframe to the list
    for i in range(25):
        data_split.append(data[data["State"].isin(list(splits[i]))])
    
    ### Run func on each split in parallel, then recombine
    pool = Pool(n_cores)
    data1 = pd.concat(pool.map(func, data_split))
    pool.close()
    pool.join()
    return data1
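
The splitting step in `parallel1` can be examined on its own. This sketch leaves out the `Pool` and only shows how `np.array_split` partitions the state names so that each chunk keeps whole states together; the frame `toy`, the placeholder state labels, and `n_splits = 3` are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Toy frame with five placeholder "states".
toy = pd.DataFrame({
    "State": ["A", "A", "B", "C", "C", "D", "E"],
    "Total Cases": [0, 1, 0, 2, 5, 1, 0],
})

n_splits = 3  # the notebook uses 25

# array_split accepts a count that does not divide the number of states evenly.
state_names = toy["State"].drop_duplicates().to_numpy()
state_groups = np.array_split(state_names, n_splits)

# Each chunk contains every row of its states, so per-state diffs stay valid.
chunks = [toy[toy["State"].isin(group)] for group in state_groups]

for chunk in chunks:
    print(sorted(chunk["State"].unique()))
```

Splitting on state names rather than on row positions matters here: a row-based split could cut one county's time series in two and produce a wrong first difference in the second half.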

In [36]:
### Define function to create new cases data
def newCases1(data):
    changeInCases = []
    ### For each state.
    for state in data["State"].unique():
        ### For each county in the state
        for county in data["County Name"][data["State"] == state].unique():
            ### Calculate diff in case for each day, keep first day
            changeInCases.extend(abs(np.diff(data["Total Cases"][(data["County Name"] == county) &
                                                                         (data["State"] == state)],
                                             prepend = data["Total Cases"][(data["County Name"] == county) &
                                                                         (data["State"] == state)].iloc[0])))
    ### Add to data
    data["New Cases"] = changeInCases

    return data

In [37]:
cases_deaths2 = parallel1(cases_deaths2, newCases1)
cases_deaths2

Unnamed: 0,County Name,State,countyFIPS,StateABV,stateFIPS,Date,Total Cases,Total Deaths,Population,New Cases
0,Autauga,Alabama,01001,AL,1,2020-01-22,0,0,55869,0
1,Autauga,Alabama,01001,AL,1,2020-01-23,0,0,55869,0
2,Autauga,Alabama,01001,AL,1,2020-01-24,0,0,55869,0
3,Autauga,Alabama,01001,AL,1,2020-01-25,0,0,55869,0
4,Autauga,Alabama,01001,AL,1,2020-01-26,0,0,55869,0
...,...,...,...,...,...,...,...,...,...,...
537277,Weston,Wyoming,56045,WY,56,2020-07-06,1,0,6927,1
537278,Weston,Wyoming,56045,WY,56,2020-07-07,1,0,6927,0
537279,Weston,Wyoming,56045,WY,56,2020-07-08,1,0,6927,0
537280,Weston,Wyoming,56045,WY,56,2020-07-09,1,0,6927,0


In [38]:
### Define function to create new deaths data
def newDeaths1(data):
    changeInDeaths = []
    ### For each state.
    for state in data["State"].unique():
        ### For each county in the state
        for county in data["County Name"][data["State"] == state].unique():
            ### Calculate diff in deaths for each day, keep first day
            changeInDeaths.extend(abs(np.diff(data["Total Deaths"][(data["County Name"] == county) &
                                                                           (data["State"] == state)],
                                             prepend = data["Total Deaths"][(data["County Name"] == county) &
                                                                           (data["State"] == state)].iloc[0])))
            
    ### Add to data
    data["New Deaths"] = changeInDeaths
        
    return data

In [39]:
cases_deaths2 = parallel1(cases_deaths2, newDeaths1)
cases_deaths2

Unnamed: 0,County Name,State,countyFIPS,StateABV,stateFIPS,Date,Total Cases,Total Deaths,Population,New Cases,New Deaths
0,Autauga,Alabama,01001,AL,1,2020-01-22,0,0,55869,0,0
1,Autauga,Alabama,01001,AL,1,2020-01-23,0,0,55869,0,0
2,Autauga,Alabama,01001,AL,1,2020-01-24,0,0,55869,0,0
3,Autauga,Alabama,01001,AL,1,2020-01-25,0,0,55869,0,0
4,Autauga,Alabama,01001,AL,1,2020-01-26,0,0,55869,0,0
...,...,...,...,...,...,...,...,...,...,...,...
537277,Weston,Wyoming,56045,WY,56,2020-07-06,1,0,6927,1,0
537278,Weston,Wyoming,56045,WY,56,2020-07-07,1,0,6927,0,0
537279,Weston,Wyoming,56045,WY,56,2020-07-08,1,0,6927,0,0
537280,Weston,Wyoming,56045,WY,56,2020-07-09,1,0,6927,0,0


#### State Data

In [40]:
### Create a parallelizing function
def parallel2(data, func, n_cores = 25):
    ### Split data by state into 25 sections
    splits = np.array_split(data["State"].unique(), 25)
    
    ### Create empty list
    data_split = []
    
    ### Add each split dataframe to the list
    for i in range(25):
        data_split.append(data[data["State"].isin(list(splits[i]))])
    
    pool = Pool(n_cores)
    data1 = pd.concat(pool.map(func, data_split))
    pool.close()
    pool.join()
    return data1

In [41]:
### Define function to create new cases data
def newCases2(data):
    changeInCases = []
    ### For each state.
    for state in data["State"].unique():
        ### Calculate diff in case for each day, keep first day
        changeInCases.extend(abs(np.diff(data["TotalCases"][data["State"] == state],
                                         prepend = data["TotalCases"][data["State"] == state].iloc[0])))
    ### Add to data
    data["New Cases"] = changeInCases

    return data

In [42]:
StateData = parallel2(StateData, newCases2)
StateData

Unnamed: 0,Date,State,StateABV,stateFIPS,TotalCases,TotalDeaths,Population,New Cases
0,2020-01-22,Alabama,AL,1,0,0,4903185,0
1,2020-01-23,Alabama,AL,1,0,0,4903185,0
2,2020-01-24,Alabama,AL,1,0,0,4903185,0
3,2020-01-25,Alabama,AL,1,0,0,4903185,0
4,2020-01-26,Alabama,AL,1,0,0,4903185,0
...,...,...,...,...,...,...,...,...
8716,2020-07-06,Wyoming,WY,56,1675,20,578759,41
8717,2020-07-07,Wyoming,WY,56,1709,20,578759,34
8718,2020-07-08,Wyoming,WY,56,1739,21,578759,30
8719,2020-07-09,Wyoming,WY,56,1767,21,578759,28


In [43]:
### Define function to create new deaths data
def newDeaths2(data):
    changeInDeaths = []
    ### For each state.
    for state in data["State"].unique():
        ### Calculate diff in deaths for each day, keep first day
        changeInDeaths.extend(abs(np.diff(data["TotalDeaths"][data["State"] == state],
                                         prepend = data["TotalDeaths"][data["State"] == state].iloc[0])))
            
    ### Add to data
    data["New Deaths"] = changeInDeaths
        
    return data

In [44]:
StateData = parallel2(StateData, newDeaths2)
StateData

Unnamed: 0,Date,State,StateABV,stateFIPS,TotalCases,TotalDeaths,Population,New Cases,New Deaths
0,2020-01-22,Alabama,AL,1,0,0,4903185,0,0
1,2020-01-23,Alabama,AL,1,0,0,4903185,0,0
2,2020-01-24,Alabama,AL,1,0,0,4903185,0,0
3,2020-01-25,Alabama,AL,1,0,0,4903185,0,0
4,2020-01-26,Alabama,AL,1,0,0,4903185,0,0
...,...,...,...,...,...,...,...,...,...
8716,2020-07-06,Wyoming,WY,56,1675,20,578759,41,0
8717,2020-07-07,Wyoming,WY,56,1709,20,578759,34,0
8718,2020-07-08,Wyoming,WY,56,1739,21,578759,30,1
8719,2020-07-09,Wyoming,WY,56,1767,21,578759,28,0
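
For comparison, the same per-state differences can probably be obtained without multiprocessing by using a grouped `diff`. This is a sketch of an alternative, not the method used in this notebook; it assumes rows are already sorted by date within each state, and the frame `toy` is illustrative.

```python
import pandas as pd

toy = pd.DataFrame({
    "State": ["Alabama"] * 3 + ["Wyoming"] * 3,
    "TotalCases": [0, 2, 5, 1, 1, 4],
})

# diff() within each state leaves NaN on that state's first row; filling with 0
# mirrors the prepend trick, and abs() mirrors the notebook's handling.
toy["New Cases"] = (
    toy.groupby("State")["TotalCases"].diff().fillna(0).abs().astype(int)
)
print(toy)
```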


#### USA Data

In [45]:
### New Cases
USAData["New Cases"] = abs(np.diff(USAData["TotalCases"], prepend = USAData["TotalCases"].iloc[0]))

### New Deaths
USAData["New Deaths"] = abs(np.diff(USAData["TotalDeaths"], prepend = USAData["TotalDeaths"].iloc[0]))

USAData

Unnamed: 0,Country,Date,TotalCases,TotalDeaths,Population,New Cases,New Deaths
0,United States,2020-01-22,1,0,328239523,0,0
1,United States,2020-01-23,1,0,328239523,0,0
2,United States,2020-01-24,2,0,328239523,1,0
3,United States,2020-01-25,2,0,328239523,0,0
4,United States,2020-01-26,5,0,328239523,3,0
...,...,...,...,...,...,...,...
166,United States,2020-07-06,2917873,129365,328239523,53439,379
167,United States,2020-07-07,2973816,130284,328239523,55943,919
168,United States,2020-07-08,3031787,131142,328239523,57971,858
169,United States,2020-07-09,3094659,132126,328239523,62872,984


### Proportions

County data.

In [46]:
### Percent of population that have cases.
cases_deaths2["%Cases"] = np.where(cases_deaths2["Population"] != 0,
                                   round((cases_deaths2["Total Cases"] / cases_deaths2["Population"]) * 100, 3),
                                   0)

### Percent of population that have died.
cases_deaths2["%Deaths"] = np.where(cases_deaths2["Population"] != 0,
                                    round((cases_deaths2["Total Deaths"] / cases_deaths2["Population"]) * 100, 3),
                                    0)

cases_deaths2

Unnamed: 0,County Name,State,countyFIPS,StateABV,stateFIPS,Date,Total Cases,Total Deaths,Population,New Cases,New Deaths,%Cases,%Deaths
0,Autauga,Alabama,01001,AL,1,2020-01-22,0,0,55869,0,0,0.000,0.0
1,Autauga,Alabama,01001,AL,1,2020-01-23,0,0,55869,0,0,0.000,0.0
2,Autauga,Alabama,01001,AL,1,2020-01-24,0,0,55869,0,0,0.000,0.0
3,Autauga,Alabama,01001,AL,1,2020-01-25,0,0,55869,0,0,0.000,0.0
4,Autauga,Alabama,01001,AL,1,2020-01-26,0,0,55869,0,0,0.000,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
537277,Weston,Wyoming,56045,WY,56,2020-07-06,1,0,6927,1,0,0.014,0.0
537278,Weston,Wyoming,56045,WY,56,2020-07-07,1,0,6927,0,0,0.014,0.0
537279,Weston,Wyoming,56045,WY,56,2020-07-08,1,0,6927,0,0,0.014,0.0
537280,Weston,Wyoming,56045,WY,56,2020-07-09,1,0,6927,0,0,0.014,0.0
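
To illustrate the zero-population guard in the cell above: `np.where` evaluates both branches, so the division is still computed for every row, and the guard only decides which result is kept. The frame `toy` below is illustrative, with one row deliberately given a population of 0.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Total Cases": [1, 0, 12],
    "Population": [6927, 0, 55869],  # middle row: no population recorded
})

# The division yields NaN on the zero-population row (0 / 0) ...
pct = (toy["Total Cases"] / toy["Population"] * 100).round(3)

# ... and the guard replaces that value with 0.
toy["%Cases"] = np.where(toy["Population"] != 0, pct, 0)
print(toy)
```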


State data.

In [47]:
### Percent of population that have cases.
StateData["%Cases"] = np.where(StateData["Population"] != 0,
                               round((StateData["TotalCases"] / StateData["Population"]) * 100, 3),
                               0)

### Percent of population that have died.
StateData["%Deaths"] = np.where(StateData["Population"] != 0,
                                round((StateData["TotalDeaths"] / StateData["Population"]) * 100, 3),
                                0)

StateData

Unnamed: 0,Date,State,StateABV,stateFIPS,TotalCases,TotalDeaths,Population,New Cases,New Deaths,%Cases,%Deaths
0,2020-01-22,Alabama,AL,1,0,0,4903185,0,0,0.000,0.000
1,2020-01-23,Alabama,AL,1,0,0,4903185,0,0,0.000,0.000
2,2020-01-24,Alabama,AL,1,0,0,4903185,0,0,0.000,0.000
3,2020-01-25,Alabama,AL,1,0,0,4903185,0,0,0.000,0.000
4,2020-01-26,Alabama,AL,1,0,0,4903185,0,0,0.000,0.000
...,...,...,...,...,...,...,...,...,...,...,...
8716,2020-07-06,Wyoming,WY,56,1675,20,578759,41,0,0.289,0.003
8717,2020-07-07,Wyoming,WY,56,1709,20,578759,34,0,0.295,0.003
8718,2020-07-08,Wyoming,WY,56,1739,21,578759,30,1,0.300,0.004
8719,2020-07-09,Wyoming,WY,56,1767,21,578759,28,0,0.305,0.004


Country data.

In [48]:
### Percent of population that have cases.
USAData["%Cases"] = np.where(USAData["Population"] != 0,
                             round((USAData["TotalCases"] / USAData["Population"]) * 100, 3),
                             0)

### Percent of population that have died.
USAData["%Deaths"] = np.where(USAData["Population"] != 0,
                              round((USAData["TotalDeaths"] / USAData["Population"]) * 100, 3),
                              0)

USAData

Unnamed: 0,Country,Date,TotalCases,TotalDeaths,Population,New Cases,New Deaths,%Cases,%Deaths
0,United States,2020-01-22,1,0,328239523,0,0,0.000,0.000
1,United States,2020-01-23,1,0,328239523,0,0,0.000,0.000
2,United States,2020-01-24,2,0,328239523,1,0,0.000,0.000
3,United States,2020-01-25,2,0,328239523,0,0,0.000,0.000
4,United States,2020-01-26,5,0,328239523,3,0,0.000,0.000
...,...,...,...,...,...,...,...,...,...
166,United States,2020-07-06,2917873,129365,328239523,53439,379,0.889,0.039
167,United States,2020-07-07,2973816,130284,328239523,55943,919,0.906,0.040
168,United States,2020-07-08,3031787,131142,328239523,57971,858,0.924,0.040
169,United States,2020-07-09,3094659,132126,328239523,62872,984,0.943,0.040


### Logarithmic Scales

County data.

In [49]:
cases_deaths2["log(Total Cases)"] = round(np.log(cases_deaths2["Total Cases"]), 3)

cases_deaths2["log(Total Deaths)"] = round(np.log(cases_deaths2["Total Deaths"]), 3)

cases_deaths2["log(New Cases)"] = round(np.log(cases_deaths2["New Cases"]), 3)

cases_deaths2["log(New Deaths)"] = round(np.log(cases_deaths2["New Deaths"]), 3)

cases_deaths2


Unnamed: 0,County Name,State,countyFIPS,StateABV,stateFIPS,Date,Total Cases,Total Deaths,Population,New Cases,New Deaths,%Cases,%Deaths,log(Total Cases),log(Total Deaths),log(New Cases),log(New Deaths)
0,Autauga,Alabama,01001,AL,1,2020-01-22,0,0,55869,0,0,0.000,0.0,-inf,-inf,-inf,-inf
1,Autauga,Alabama,01001,AL,1,2020-01-23,0,0,55869,0,0,0.000,0.0,-inf,-inf,-inf,-inf
2,Autauga,Alabama,01001,AL,1,2020-01-24,0,0,55869,0,0,0.000,0.0,-inf,-inf,-inf,-inf
3,Autauga,Alabama,01001,AL,1,2020-01-25,0,0,55869,0,0,0.000,0.0,-inf,-inf,-inf,-inf
4,Autauga,Alabama,01001,AL,1,2020-01-26,0,0,55869,0,0,0.000,0.0,-inf,-inf,-inf,-inf
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
537277,Weston,Wyoming,56045,WY,56,2020-07-06,1,0,6927,1,0,0.014,0.0,0.0,-inf,0.0,-inf
537278,Weston,Wyoming,56045,WY,56,2020-07-07,1,0,6927,0,0,0.014,0.0,0.0,-inf,-inf,-inf
537279,Weston,Wyoming,56045,WY,56,2020-07-08,1,0,6927,0,0,0.014,0.0,0.0,-inf,-inf,-inf
537280,Weston,Wyoming,56045,WY,56,2020-07-09,1,0,6927,0,0,0.014,0.0,0.0,-inf,-inf,-inf
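
The `-inf` entries arise because the natural log of 0 is negative infinity, for which NumPy also emits a divide-by-zero `RuntimeWarning`. The sketch below reproduces that behavior on a few values and shows two possible treatments. Neither treatment is applied in this notebook; they are assumptions about what a downstream step might want.

```python
import numpy as np
import pandas as pd

counts = pd.Series([0, 1, 5, 1767])

# np.log(0) is -inf; errstate suppresses the divide-by-zero warning here.
with np.errstate(divide="ignore"):
    log_counts = np.log(counts).round(3)

# Option 1: treat zeros as missing values.
masked = log_counts.replace(-np.inf, np.nan)

# Option 2: use log(1 + x), which is defined at zero.
shifted = np.log1p(counts).round(3)

print(log_counts.tolist())
print(masked.tolist())
print(shifted.tolist())
```

The second option changes the scale of every value, so it is not interchangeable with the plain logarithm when the columns are compared or plotted.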


State data.

In [50]:
StateData["log(Total Cases)"] = round(np.log(StateData["TotalCases"]), 3)

StateData["log(Total Deaths)"] = round(np.log(StateData["TotalDeaths"]), 3)

StateData["log(New Cases)"] = round(np.log(StateData["New Cases"]), 3)

StateData["log(New Deaths)"] = round(np.log(StateData["New Deaths"]), 3)

StateData


Unnamed: 0,Date,State,StateABV,stateFIPS,TotalCases,TotalDeaths,Population,New Cases,New Deaths,%Cases,%Deaths,log(Total Cases),log(Total Deaths),log(New Cases),log(New Deaths)
0,2020-01-22,Alabama,AL,1,0,0,4903185,0,0,0.000,0.000,-inf,-inf,-inf,-inf
1,2020-01-23,Alabama,AL,1,0,0,4903185,0,0,0.000,0.000,-inf,-inf,-inf,-inf
2,2020-01-24,Alabama,AL,1,0,0,4903185,0,0,0.000,0.000,-inf,-inf,-inf,-inf
3,2020-01-25,Alabama,AL,1,0,0,4903185,0,0,0.000,0.000,-inf,-inf,-inf,-inf
4,2020-01-26,Alabama,AL,1,0,0,4903185,0,0,0.000,0.000,-inf,-inf,-inf,-inf
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8716,2020-07-06,Wyoming,WY,56,1675,20,578759,41,0,0.289,0.003,7.424,2.996,3.714,-inf
8717,2020-07-07,Wyoming,WY,56,1709,20,578759,34,0,0.295,0.003,7.444,2.996,3.526,-inf
8718,2020-07-08,Wyoming,WY,56,1739,21,578759,30,1,0.300,0.004,7.461,3.045,3.401,0.0
8719,2020-07-09,Wyoming,WY,56,1767,21,578759,28,0,0.305,0.004,7.477,3.045,3.332,-inf


Country data.

In [51]:
USAData["log(Total Cases)"] = round(np.log(USAData["TotalCases"]), 3)

USAData["log(Total Deaths)"] = round(np.log(USAData["TotalDeaths"]), 3)

USAData["log(New Cases)"] = round(np.log(USAData["New Cases"]), 3)

USAData["log(New Deaths)"] = round(np.log(USAData["New Deaths"]), 3)

USAData


Unnamed: 0,Country,Date,TotalCases,TotalDeaths,Population,New Cases,New Deaths,%Cases,%Deaths,log(Total Cases),log(Total Deaths),log(New Cases),log(New Deaths)
0,United States,2020-01-22,1,0,328239523,0,0,0.000,0.000,0.000,-inf,-inf,-inf
1,United States,2020-01-23,1,0,328239523,0,0,0.000,0.000,0.000,-inf,-inf,-inf
2,United States,2020-01-24,2,0,328239523,1,0,0.000,0.000,0.693,-inf,0.000,-inf
3,United States,2020-01-25,2,0,328239523,0,0,0.000,0.000,0.693,-inf,-inf,-inf
4,United States,2020-01-26,5,0,328239523,3,0,0.000,0.000,1.609,-inf,1.099,-inf
...,...,...,...,...,...,...,...,...,...,...,...,...,...
166,United States,2020-07-06,2917873,129365,328239523,53439,379,0.889,0.039,14.886,11.770,10.886,5.938
167,United States,2020-07-07,2973816,130284,328239523,55943,919,0.906,0.040,14.905,11.777,10.932,6.823
168,United States,2020-07-08,3031787,131142,328239523,57971,858,0.924,0.040,14.925,11.784,10.968,6.755
169,United States,2020-07-09,3094659,132126,328239523,62872,984,0.943,0.040,14.945,11.792,11.049,6.892


### Finalize Cases & Deaths Data

Fix column names in State data and USA data.

In [52]:
StateData = StateData.rename(columns = {"TotalCases" : "Total Cases",
                                        "TotalDeaths" : "Total Deaths"})
StateData

Unnamed: 0,Date,State,StateABV,stateFIPS,Total Cases,Total Deaths,Population,New Cases,New Deaths,%Cases,%Deaths,log(Total Cases),log(Total Deaths),log(New Cases),log(New Deaths)
0,2020-01-22,Alabama,AL,1,0,0,4903185,0,0,0.000,0.000,-inf,-inf,-inf,-inf
1,2020-01-23,Alabama,AL,1,0,0,4903185,0,0,0.000,0.000,-inf,-inf,-inf,-inf
2,2020-01-24,Alabama,AL,1,0,0,4903185,0,0,0.000,0.000,-inf,-inf,-inf,-inf
3,2020-01-25,Alabama,AL,1,0,0,4903185,0,0,0.000,0.000,-inf,-inf,-inf,-inf
4,2020-01-26,Alabama,AL,1,0,0,4903185,0,0,0.000,0.000,-inf,-inf,-inf,-inf
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8716,2020-07-06,Wyoming,WY,56,1675,20,578759,41,0,0.289,0.003,7.424,2.996,3.714,-inf
8717,2020-07-07,Wyoming,WY,56,1709,20,578759,34,0,0.295,0.003,7.444,2.996,3.526,-inf
8718,2020-07-08,Wyoming,WY,56,1739,21,578759,30,1,0.300,0.004,7.461,3.045,3.401,0.0
8719,2020-07-09,Wyoming,WY,56,1767,21,578759,28,0,0.305,0.004,7.477,3.045,3.332,-inf


In [53]:
USAData = USAData.rename(columns = {"TotalCases" : "Total Cases",
                                    "TotalDeaths" : "Total Deaths"})
USAData

Unnamed: 0,Country,Date,Total Cases,Total Deaths,Population,New Cases,New Deaths,%Cases,%Deaths,log(Total Cases),log(Total Deaths),log(New Cases),log(New Deaths)
0,United States,2020-01-22,1,0,328239523,0,0,0.000,0.000,0.000,-inf,-inf,-inf
1,United States,2020-01-23,1,0,328239523,0,0,0.000,0.000,0.000,-inf,-inf,-inf
2,United States,2020-01-24,2,0,328239523,1,0,0.000,0.000,0.693,-inf,0.000,-inf
3,United States,2020-01-25,2,0,328239523,0,0,0.000,0.000,0.693,-inf,-inf,-inf
4,United States,2020-01-26,5,0,328239523,3,0,0.000,0.000,1.609,-inf,1.099,-inf
...,...,...,...,...,...,...,...,...,...,...,...,...,...
166,United States,2020-07-06,2917873,129365,328239523,53439,379,0.889,0.039,14.886,11.770,10.886,5.938
167,United States,2020-07-07,2973816,130284,328239523,55943,919,0.906,0.040,14.905,11.777,10.932,6.823
168,United States,2020-07-08,3031787,131142,328239523,57971,858,0.924,0.040,14.925,11.784,10.968,6.755
169,United States,2020-07-09,3094659,132126,328239523,62872,984,0.943,0.040,14.945,11.792,11.049,6.892


Change data types in State and USA data.

In [54]:
StateData.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 8721 entries, 0 to 8720
Data columns (total 15 columns):
 #   Column             Non-Null Count  Dtype         
---  ------             --------------  -----         
 0   Date               8721 non-null   datetime64[ns]
 1   State              8721 non-null   object        
 2   StateABV           8721 non-null   object        
 3   stateFIPS          8721 non-null   object        
 4   Total Cases        8721 non-null   int64         
 5   Total Deaths       8721 non-null   int64         
 6   Population         8721 non-null   int64         
 7   New Cases          8721 non-null   int64         
 8   New Deaths         8721 non-null   int64         
 9   %Cases             8721 non-null   float64       
 10  %Deaths            8721 non-null   float64       
 11  log(Total Cases)   8721 non-null   float64       
 12  log(Total Deaths)  8721 non-null   float64       
 13  log(New Cases)     8721 non-null   float64       
 14  log(New 

In [55]:
StateData = StateData.astype({"State" : "category",
                              "stateFIPS" : "str"})
StateData.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 8721 entries, 0 to 8720
Data columns (total 15 columns):
 #   Column             Non-Null Count  Dtype         
---  ------             --------------  -----         
 0   Date               8721 non-null   datetime64[ns]
 1   State              8721 non-null   category      
 2   StateABV           8721 non-null   object        
 3   stateFIPS          8721 non-null   object        
 4   Total Cases        8721 non-null   int64         
 5   Total Deaths       8721 non-null   int64         
 6   Population         8721 non-null   int64         
 7   New Cases          8721 non-null   int64         
 8   New Deaths         8721 non-null   int64         
 9   %Cases             8721 non-null   float64       
 10  %Deaths            8721 non-null   float64       
 11  log(Total Cases)   8721 non-null   float64       
 12  log(Total Deaths)  8721 non-null   float64       
 13  log(New Cases)     8721 non-null   float64       
 14  log(New 

In [56]:
USAData.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 171 entries, 0 to 170
Data columns (total 13 columns):
 #   Column             Non-Null Count  Dtype         
---  ------             --------------  -----         
 0   Country            171 non-null    object        
 1   Date               171 non-null    datetime64[ns]
 2   Total Cases        171 non-null    int64         
 3   Total Deaths       171 non-null    int64         
 4   Population         171 non-null    int64         
 5   New Cases          171 non-null    int64         
 6   New Deaths         171 non-null    int64         
 7   %Cases             171 non-null    float64       
 8   %Deaths            171 non-null    float64       
 9   log(Total Cases)   171 non-null    float64       
 10  log(Total Deaths)  171 non-null    float64       
 11  log(New Cases)     171 non-null    float64       
 12  log(New Deaths)    171 non-null    float64       
dtypes: datetime64[ns](1), float64(6), int64(5), object(1)
memory usag

In [57]:
USAData = USAData.astype({"Country" : "category"})
USAData.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 171 entries, 0 to 170
Data columns (total 13 columns):
 #   Column             Non-Null Count  Dtype         
---  ------             --------------  -----         
 0   Country            171 non-null    category      
 1   Date               171 non-null    datetime64[ns]
 2   Total Cases        171 non-null    int64         
 3   Total Deaths       171 non-null    int64         
 4   Population         171 non-null    int64         
 5   New Cases          171 non-null    int64         
 6   New Deaths         171 non-null    int64         
 7   %Cases             171 non-null    float64       
 8   %Deaths            171 non-null    float64       
 9   log(Total Cases)   171 non-null    float64       
 10  log(Total Deaths)  171 non-null    float64       
 11  log(New Cases)     171 non-null    float64       
 12  log(New Deaths)    171 non-null    float64       
dtypes: category(1), datetime64[ns](1), float64(6), int64(5)
memory us

Save cases_deaths2 as county data.

In [58]:
CountyData = cases_deaths2
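
One point worth keeping in mind about the assignment above: it binds a second name to the same DataFrame rather than creating a copy, so later changes made through either name affect both. A minimal sketch of the difference, using illustrative names (`cases_toy`, `alias`, `independent`):

```python
import pandas as pd

cases_toy = pd.DataFrame({"Total Cases": [0, 1, 2]})

alias = cases_toy               # same object under a second name
independent = cases_toy.copy()  # separate object with its own data

# Adding a column through the alias also changes cases_toy.
alias["New Cases"] = [0, 1, 1]

print(alias is cases_toy)
print("New Cases" in cases_toy.columns)
print("New Cases" in independent.columns)
```

If the county data should be protected from later edits to `cases_deaths2`, `cases_deaths2.copy()` would be the safer choice.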

# Google Mobility Data Preprocessing

### About the Data

The mobility data for this project comes from Google's publicly available COVID-19 Community Mobility Reports.

The data consists of anonymized, aggregated location data.

The data tracks movement trends over time by geography, across different categories of places such as retail and recreation, groceries and pharmacies, parks, transit stations, workplaces, and residential.

Changes for each day are compared to a baseline value for that day of the week:

- The baseline is the median value, for the corresponding day of the week, during the 5-week period Jan 3–Feb 6, 2020 (pre-pandemic).

- __The datasets show trends over several months with the most recent data representing approximately 2-3 days ago—this is how long it takes to produce the datasets.__
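
To make the baseline definition concrete, here is a rough sketch of how a percent change from a day-of-week median baseline could be computed. Google's actual pipeline is not reproduced here; the visit counts are invented and the variable names (`visits`, `baseline`, `pct_change`) are hypothetical.

```python
import pandas as pd

# Hypothetical daily visit counts: 100 per day through Feb 6, 80 per day after.
dates = pd.date_range("2020-01-03", "2020-02-20", freq="D")
visits = pd.Series(100.0, index=dates)
visits.loc[visits.index > "2020-02-06"] = 80.0

# Baseline: median per day of week over the 5-week window Jan 3 to Feb 6, 2020.
window = visits.loc["2020-01-03":"2020-02-06"]
baseline = window.groupby(window.index.dayofweek).median()

# Percent change of each day relative to the baseline for its weekday.
day_baseline = pd.Series(visits.index.dayofweek, index=visits.index).map(baseline)
pct_change = (visits - day_baseline) / day_baseline * 100

print(pct_change.loc[pd.Timestamp("2020-02-15")])
```

In this toy series every day inside the baseline window shows no change, and every later day shows a drop of 20 percent, which is how a value such as `-20` in the mobility columns should be read.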

<br>

#### Place categories
- Grocery & pharmacy
    - Mobility trends for places like grocery markets, food warehouses, farmers markets, specialty food shops, drug stores, and pharmacies.

- Parks
    - Mobility trends for places like local parks, national parks, public beaches, marinas, dog parks, plazas, and public gardens.

- Transit stations
    - Mobility trends for places like public transport hubs such as subway, bus, and train stations.

- Retail & recreation
    - Mobility trends for places like restaurants, cafes, shopping centers, theme parks, museums, libraries, and movie theaters.

- Residential
    - Mobility trends for places of residence.

- Workplaces
    - Mobility trends for places of work.


<br>

No personally identifiable information, such as an individual’s location, contacts or movement, is made available at any point.

This data will be available for a limited time, as long as public health officials find it useful in their work to stop the spread of COVID-19.

The data can be found here: https://www.google.com/covid19/mobility/

#### Import Data

To ensure that we always have a copy of the data saved in the environment, the data is saved every time it is imported.

In [59]:
### Google Mobility data
!curl https://www.gstatic.com/covid19/mobility/Global_Mobility_Report.csv?cachebust=7d0cb7d254d29111 --output data/mobility.csv

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 42.7M  100 42.7M    0     0  55.0M      0 --:--:-- --:--:-- --:--:-- 54.9M


Load in mobility data.

In [60]:
GoogleMobility = pd.read_csv("data/mobility.csv", dtype = "str")
GoogleMobility

Unnamed: 0,country_region_code,country_region,sub_region_1,sub_region_2,iso_3166_2_code,census_fips_code,date,retail_and_recreation_percent_change_from_baseline,grocery_and_pharmacy_percent_change_from_baseline,parks_percent_change_from_baseline,transit_stations_percent_change_from_baseline,workplaces_percent_change_from_baseline,residential_percent_change_from_baseline
0,AE,United Arab Emirates,,,,,2020-02-15,0,4,5,0,2,1
1,AE,United Arab Emirates,,,,,2020-02-16,1,4,4,1,2,1
2,AE,United Arab Emirates,,,,,2020-02-17,-1,1,5,1,2,1
3,AE,United Arab Emirates,,,,,2020-02-18,-2,1,5,0,2,1
4,AE,United Arab Emirates,,,,,2020-02-19,-2,0,4,-1,2,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...
651001,ZW,Zimbabwe,Midlands Province,,ZW-MI,,2020-07-03,,,,,2,
651002,ZW,Zimbabwe,Midlands Province,,ZW-MI,,2020-07-04,,,,,21,
651003,ZW,Zimbabwe,Midlands Province,,ZW-MI,,2020-07-05,,,,,21,
651004,ZW,Zimbabwe,Midlands Province,,ZW-MI,,2020-07-06,,,,,-3,
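
Reading every column as a string preserves identifiers such as `census_fips_code`, whose leading zeros would be lost under type inference. A small sketch with an in-memory CSV in the shape of the mobility file (two columns only, for brevity):

```python
import io
import pandas as pd

# Two rows in the shape of the mobility file, reduced to two columns.
csv_text = (
    "census_fips_code,workplaces_percent_change_from_baseline\n"
    "01001,-4\n"
    "56045,-24\n"
)

as_text = pd.read_csv(io.StringIO(csv_text), dtype="str")
inferred = pd.read_csv(io.StringIO(csv_text))

print(as_text["census_fips_code"].tolist())   # strings, leading zero kept
print(inferred["census_fips_code"].tolist())  # integers, leading zero lost

# Numeric columns can be converted when needed; unparseable values become NaN.
workplaces = pd.to_numeric(
    as_text["workplaces_percent_change_from_baseline"], errors="coerce"
)
print(workplaces.tolist())
```

The trade-off is that the percent-change columns also arrive as strings and need an explicit conversion before any arithmetic.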


This dataset contains worldwide information. Filter out anything that is not the United States.

In [61]:
### Keep only US
GoogleMobility = GoogleMobility[GoogleMobility["country_region_code"] == "US"]
GoogleMobility

Unnamed: 0,country_region_code,country_region,sub_region_1,sub_region_2,iso_3166_2_code,census_fips_code,date,retail_and_recreation_percent_change_from_baseline,grocery_and_pharmacy_percent_change_from_baseline,parks_percent_change_from_baseline,transit_stations_percent_change_from_baseline,workplaces_percent_change_from_baseline,residential_percent_change_from_baseline
252182,US,United States,,,,,2020-02-15,6,2,15,3,2,-1
252183,US,United States,,,,,2020-02-16,7,1,16,2,0,-1
252184,US,United States,,,,,2020-02-17,6,0,28,-9,-24,5
252185,US,United States,,,,,2020-02-18,0,-1,6,1,0,1
252186,US,United States,,,,,2020-02-19,2,0,8,1,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
635645,US,United States,Wyoming,Weston County,,56045,2020-07-01,,,,,-24,
635646,US,United States,Wyoming,Weston County,,56045,2020-07-02,,,,,-23,
635647,US,United States,Wyoming,Weston County,,56045,2020-07-03,,,,,-38,
635648,US,United States,Wyoming,Weston County,,56045,2020-07-06,,,,,-27,


Conveniently, the data can be separated into country, state, and county levels according to which `sub_region` columns are filled in.

In [62]:
### Mobility data for whole country
GoogleUsaMobility = GoogleMobility[GoogleMobility["sub_region_1"].isnull()]

### Mobility data for states
GoogleStateMobility = GoogleMobility[GoogleMobility["sub_region_1"].notnull() & GoogleMobility["sub_region_2"].isnull()]

### Mobility data for counties
GoogleCountyMobility = GoogleMobility[GoogleMobility["sub_region_2"].notnull()]
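
The three filters rely on the pattern of missing values in `sub_region_1` and `sub_region_2`. A minimal sketch on a toy frame (the frame `toy` and the result names are illustrative) shows that the conditions assign every row to exactly one level:

```python
import pandas as pd

toy = pd.DataFrame({
    "sub_region_1": [None, "Alabama", "Alabama", "Wyoming"],
    "sub_region_2": [None, None, "Autauga County", "Weston County"],
})

# Country rows: no state given.
country = toy[toy["sub_region_1"].isna()]

# State rows: state given, county missing.
states = toy[toy["sub_region_1"].notna() & toy["sub_region_2"].isna()]

# County rows: county given.
counties = toy[toy["sub_region_2"].notna()]

print(len(country), len(states), len(counties))
```

This holds as long as no row has a county without a state; a row like that would be counted at both the country and county levels.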

In [63]:
GoogleUsaMobility

Unnamed: 0,country_region_code,country_region,sub_region_1,sub_region_2,iso_3166_2_code,census_fips_code,date,retail_and_recreation_percent_change_from_baseline,grocery_and_pharmacy_percent_change_from_baseline,parks_percent_change_from_baseline,transit_stations_percent_change_from_baseline,workplaces_percent_change_from_baseline,residential_percent_change_from_baseline
252182,US,United States,,,,,2020-02-15,6,2,15,3,2,-1
252183,US,United States,,,,,2020-02-16,7,1,16,2,0,-1
252184,US,United States,,,,,2020-02-17,6,0,28,-9,-24,5
252185,US,United States,,,,,2020-02-18,0,-1,6,1,0,1
252186,US,United States,,,,,2020-02-19,2,0,8,1,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
252321,US,United States,,,,,2020-07-03,-10,11,86,-30,-57,14
252322,US,United States,,,,,2020-07-04,-35,-2,81,-30,-28,5
252323,US,United States,,,,,2020-07-05,-22,-10,53,-23,-19,3
252324,US,United States,,,,,2020-07-06,-13,-2,48,-30,-41,11


In [64]:
GoogleStateMobility

Unnamed: 0,country_region_code,country_region,sub_region_1,sub_region_2,iso_3166_2_code,census_fips_code,date,retail_and_recreation_percent_change_from_baseline,grocery_and_pharmacy_percent_change_from_baseline,parks_percent_change_from_baseline,transit_stations_percent_change_from_baseline,workplaces_percent_change_from_baseline,residential_percent_change_from_baseline
252326,US,United States,Alabama,,US-AL,,2020-02-15,5,2,39,7,2,-1
252327,US,United States,Alabama,,US-AL,,2020-02-16,0,-2,-7,3,-1,1
252328,US,United States,Alabama,,US-AL,,2020-02-17,3,0,17,7,-17,4
252329,US,United States,Alabama,,US-AL,,2020-02-18,-4,-3,-11,-1,1,2
252330,US,United States,Alabama,,US-AL,,2020-02-19,4,1,6,4,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
632931,US,United States,Wyoming,,US-WY,,2020-07-03,14,51,,62,-50,7
632932,US,United States,Wyoming,,US-WY,,2020-07-04,-11,34,404,39,-22,-2
632933,US,United States,Wyoming,,US-WY,,2020-07-05,12,33,320,42,-13,-3
632934,US,United States,Wyoming,,US-WY,,2020-07-06,24,37,325,40,-28,3


In [65]:
GoogleCountyMobility

Unnamed: 0,country_region_code,country_region,sub_region_1,sub_region_2,iso_3166_2_code,census_fips_code,date,retail_and_recreation_percent_change_from_baseline,grocery_and_pharmacy_percent_change_from_baseline,parks_percent_change_from_baseline,transit_stations_percent_change_from_baseline,workplaces_percent_change_from_baseline,residential_percent_change_from_baseline
252470,US,United States,Alabama,Autauga County,,01001,2020-02-15,5,7,,,-4,
252471,US,United States,Alabama,Autauga County,,01001,2020-02-16,0,1,-23,,-4,
252472,US,United States,Alabama,Autauga County,,01001,2020-02-17,8,0,,,-27,5
252473,US,United States,Alabama,Autauga County,,01001,2020-02-18,-2,0,,,2,0
252474,US,United States,Alabama,Autauga County,,01001,2020-02-19,-2,0,,,2,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
635645,US,United States,Wyoming,Weston County,,56045,2020-07-01,,,,,-24,
635646,US,United States,Wyoming,Weston County,,56045,2020-07-02,,,,,-23,
635647,US,United States,Wyoming,Weston County,,56045,2020-07-03,,,,,-38,
635648,US,United States,Wyoming,Weston County,,56045,2020-07-06,,,,,-27,


We can drop some unnecessary columns from each dataframe level.

In [66]:
### Drop columns from GoogleUsaMobility
GoogleUsaMobility = GoogleUsaMobility.drop(columns = ["country_region_code", "sub_region_1",
                                                      "sub_region_2", "iso_3166_2_code",
                                                      "census_fips_code"])

### Drop columns from GoogleStateMobility
GoogleStateMobility = GoogleStateMobility.drop(columns = ["country_region_code", "country_region",
                                                          "sub_region_2", "iso_3166_2_code",
                                                          "census_fips_code"])

### Drop columns from GoogleCountyMobility
GoogleCountyMobility = GoogleCountyMobility.drop(columns = ["country_region_code", "country_region",
                                                            "sub_region_1", "iso_3166_2_code"])

In [67]:
GoogleUsaMobility

Unnamed: 0,country_region,date,retail_and_recreation_percent_change_from_baseline,grocery_and_pharmacy_percent_change_from_baseline,parks_percent_change_from_baseline,transit_stations_percent_change_from_baseline,workplaces_percent_change_from_baseline,residential_percent_change_from_baseline
252182,United States,2020-02-15,6,2,15,3,2,-1
252183,United States,2020-02-16,7,1,16,2,0,-1
252184,United States,2020-02-17,6,0,28,-9,-24,5
252185,United States,2020-02-18,0,-1,6,1,0,1
252186,United States,2020-02-19,2,0,8,1,1,0
...,...,...,...,...,...,...,...,...
252321,United States,2020-07-03,-10,11,86,-30,-57,14
252322,United States,2020-07-04,-35,-2,81,-30,-28,5
252323,United States,2020-07-05,-22,-10,53,-23,-19,3
252324,United States,2020-07-06,-13,-2,48,-30,-41,11


In [68]:
GoogleStateMobility

Unnamed: 0,sub_region_1,date,retail_and_recreation_percent_change_from_baseline,grocery_and_pharmacy_percent_change_from_baseline,parks_percent_change_from_baseline,transit_stations_percent_change_from_baseline,workplaces_percent_change_from_baseline,residential_percent_change_from_baseline
252326,Alabama,2020-02-15,5,2,39,7,2,-1
252327,Alabama,2020-02-16,0,-2,-7,3,-1,1
252328,Alabama,2020-02-17,3,0,17,7,-17,4
252329,Alabama,2020-02-18,-4,-3,-11,-1,1,2
252330,Alabama,2020-02-19,4,1,6,4,1,0
...,...,...,...,...,...,...,...,...
632931,Wyoming,2020-07-03,14,51,,62,-50,7
632932,Wyoming,2020-07-04,-11,34,404,39,-22,-2
632933,Wyoming,2020-07-05,12,33,320,42,-13,-3
632934,Wyoming,2020-07-06,24,37,325,40,-28,3


In [69]:
GoogleCountyMobility

Unnamed: 0,sub_region_2,census_fips_code,date,retail_and_recreation_percent_change_from_baseline,grocery_and_pharmacy_percent_change_from_baseline,parks_percent_change_from_baseline,transit_stations_percent_change_from_baseline,workplaces_percent_change_from_baseline,residential_percent_change_from_baseline
252470,Autauga County,01001,2020-02-15,5,7,,,-4,
252471,Autauga County,01001,2020-02-16,0,1,-23,,-4,
252472,Autauga County,01001,2020-02-17,8,0,,,-27,5
252473,Autauga County,01001,2020-02-18,-2,0,,,2,0
252474,Autauga County,01001,2020-02-19,-2,0,,,2,0
...,...,...,...,...,...,...,...,...,...
635645,Weston County,56045,2020-07-01,,,,,-24,
635646,Weston County,56045,2020-07-02,,,,,-23,
635647,Weston County,56045,2020-07-03,,,,,-38,
635648,Weston County,56045,2020-07-06,,,,,-27,


Now rename the columns to be more usable and to match the COVID-19 data naming convention.

Also convert the Date column to datetime64.

In [70]:
### Rename usaMobility columns
GoogleUsaMobility = GoogleUsaMobility.rename(columns = {"country_region" : "Country",
                                            "date" : "Date",
                                            "retail_and_recreation_percent_change_from_baseline" : "%Retail/Rec Change",
                                            "grocery_and_pharmacy_percent_change_from_baseline" : "%Grocery/Pharm Change",
                                            "parks_percent_change_from_baseline" : "%Parks Change",
                                            "transit_stations_percent_change_from_baseline" : "%Transit Change",
                                            "workplaces_percent_change_from_baseline" : "%Workplace Change",
                                            "residential_percent_change_from_baseline" : "%Residential Change"})
### Newer pandas releases reject the unit-less "datetime64", so give the unit explicitly
GoogleUsaMobility = GoogleUsaMobility.astype({"Date" : "datetime64[ns]"})


### Rename stateMobility columns
GoogleStateMobility = GoogleStateMobility.rename(columns = {"sub_region_1" : "State",
                                            "date" : "Date",
                                            "retail_and_recreation_percent_change_from_baseline" : "%Retail/Rec Change",
                                            "grocery_and_pharmacy_percent_change_from_baseline" : "%Grocery/Pharm Change",
                                            "parks_percent_change_from_baseline" : "%Parks Change",
                                            "transit_stations_percent_change_from_baseline" : "%Transit Change",
                                            "workplaces_percent_change_from_baseline" : "%Workplace Change",
                                            "residential_percent_change_from_baseline" : "%Residential Change"})
### Newer pandas releases reject the unit-less "datetime64", so give the unit explicitly
GoogleStateMobility = GoogleStateMobility.astype({"Date" : "datetime64[ns]"})


### Rename countyMobility columns
GoogleCountyMobility = GoogleCountyMobility.rename(columns = {"sub_region_2" : "County Name",
                                            "census_fips_code" : "countyFIPS",
                                            "date" : "Date",
                                            "retail_and_recreation_percent_change_from_baseline" : "%Retail/Rec Change",
                                            "grocery_and_pharmacy_percent_change_from_baseline" : "%Grocery/Pharm Change",
                                            "parks_percent_change_from_baseline" : "%Parks Change",
                                            "transit_stations_percent_change_from_baseline" : "%Transit Change",
                                            "workplaces_percent_change_from_baseline" : "%Workplace Change",
                                            "residential_percent_change_from_baseline" : "%Residential Change"})
### Newer pandas releases reject the unit-less "datetime64", so give the unit explicitly
GoogleCountyMobility = GoogleCountyMobility.astype({"Date" : "datetime64[ns]"})


In [71]:
GoogleUsaMobility

Unnamed: 0,Country,Date,%Retail/Rec Change,%Grocery/Pharm Change,%Parks Change,%Transit Change,%Workplace Change,%Residential Change
252182,United States,2020-02-15,6,2,15,3,2,-1
252183,United States,2020-02-16,7,1,16,2,0,-1
252184,United States,2020-02-17,6,0,28,-9,-24,5
252185,United States,2020-02-18,0,-1,6,1,0,1
252186,United States,2020-02-19,2,0,8,1,1,0
...,...,...,...,...,...,...,...,...
252321,United States,2020-07-03,-10,11,86,-30,-57,14
252322,United States,2020-07-04,-35,-2,81,-30,-28,5
252323,United States,2020-07-05,-22,-10,53,-23,-19,3
252324,United States,2020-07-06,-13,-2,48,-30,-41,11


In [72]:
GoogleStateMobility

Unnamed: 0,State,Date,%Retail/Rec Change,%Grocery/Pharm Change,%Parks Change,%Transit Change,%Workplace Change,%Residential Change
252326,Alabama,2020-02-15,5,2,39,7,2,-1
252327,Alabama,2020-02-16,0,-2,-7,3,-1,1
252328,Alabama,2020-02-17,3,0,17,7,-17,4
252329,Alabama,2020-02-18,-4,-3,-11,-1,1,2
252330,Alabama,2020-02-19,4,1,6,4,1,0
...,...,...,...,...,...,...,...,...
632931,Wyoming,2020-07-03,14,51,,62,-50,7
632932,Wyoming,2020-07-04,-11,34,404,39,-22,-2
632933,Wyoming,2020-07-05,12,33,320,42,-13,-3
632934,Wyoming,2020-07-06,24,37,325,40,-28,3


In [73]:
GoogleCountyMobility

Unnamed: 0,County Name,countyFIPS,Date,%Retail/Rec Change,%Grocery/Pharm Change,%Parks Change,%Transit Change,%Workplace Change,%Residential Change
252470,Autauga County,01001,2020-02-15,5,7,,,-4,
252471,Autauga County,01001,2020-02-16,0,1,-23,,-4,
252472,Autauga County,01001,2020-02-17,8,0,,,-27,5
252473,Autauga County,01001,2020-02-18,-2,0,,,2,0
252474,Autauga County,01001,2020-02-19,-2,0,,,2,0
...,...,...,...,...,...,...,...,...,...
635645,Weston County,56045,2020-07-01,,,,,-24,
635646,Weston County,56045,2020-07-02,,,,,-23,
635647,Weston County,56045,2020-07-03,,,,,-38,
635648,Weston County,56045,2020-07-06,,,,,-27,
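
The three rename cells above repeat the same column mapping. As a minimal sketch, assuming the column names shown in the outputs above, the mapping could be defined once and applied to every level. `RENAME_MAP` and `tidy_mobility` are hypothetical names that do not appear in the original notebook, `pd.to_datetime` is swapped in for `astype`, and the one-row frame is toy data.

```python
import pandas as pd

# Hypothetical shared mapping; keys that are missing from a frame are ignored by rename().
RENAME_MAP = {
    "country_region": "Country",
    "sub_region_1": "State",
    "sub_region_2": "County Name",
    "census_fips_code": "countyFIPS",
    "date": "Date",
    "retail_and_recreation_percent_change_from_baseline": "%Retail/Rec Change",
    "grocery_and_pharmacy_percent_change_from_baseline": "%Grocery/Pharm Change",
    "parks_percent_change_from_baseline": "%Parks Change",
    "transit_stations_percent_change_from_baseline": "%Transit Change",
    "workplaces_percent_change_from_baseline": "%Workplace Change",
    "residential_percent_change_from_baseline": "%Residential Change",
}


def tidy_mobility(df):
    """Rename the Google mobility columns and parse Date (hypothetical helper)."""
    out = df.rename(columns=RENAME_MAP)
    out["Date"] = pd.to_datetime(out["Date"])
    return out


# Toy one-row frame standing in for the state-level data.
state_toy = pd.DataFrame({
    "sub_region_1": ["Alabama"],
    "date": ["2020-02-15"],
    "parks_percent_change_from_baseline": [39],
})
state_tidy = tidy_mobility(state_toy)
print(state_tidy.dtypes)
```

Because `rename` skips keys that are not present, the same helper would work for the country, state, and county frames even though each keeps a different identifier column.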


In [74]:
### Re-label District of Columbia as DC
### Use a boolean mask with .loc so the assignment modifies GoogleStateMobility itself
GoogleStateMobility.loc[GoogleStateMobility["State"] == "District of Columbia", "State"] = "DC"
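
Relabeling a value in place is usually written with a boolean mask and `.loc`, because indexing twice (`df["State"][index] = ...`) may write to a temporary copy rather than to the dataframe. The following is a small sketch on toy data, not the notebook's real mobility table; the frame `states` and its values are made up for illustration.

```python
import pandas as pd

# Toy stand-in for the state-level mobility table.
states = pd.DataFrame({
    "State": ["Alabama", "District of Columbia", "Wyoming"],
    "%Parks Change": [39, 12, 404],
})

# Boolean mask selecting the rows to relabel.
is_dc = states["State"] == "District of Columbia"

# A single .loc call selects rows and column together and assigns on the frame itself.
states.loc[is_dc, "State"] = "DC"

print(states["State"].tolist())
```

The same pattern applies to any of the dataframes in this notebook that need District of Columbia shortened to DC.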



# Deaths by Sex & Age (Country, State Level)

This data comes from the Centers for Disease Control and Prevention (CDC) and is provided by the National Center for Health Statistics.

It contains aggregated death data based on Country, State, Sex, and Age Group.

__NOTE__: "Number of deaths reported in this table are the total number of deaths received and coded as of the date of analysis, and do not represent all deaths that occurred in that period. Data during this period are incomplete because of the lag in time between when the death occurred and when the death certificate is completed, submitted to NCHS and processed for reporting purposes. This delay can range from 1 week to 8 weeks or more."

__NOTE__: One or more data cells have counts between 1–9 and have been suppressed in accordance with NCHS confidentiality standards.

Data can be found here: 
* Centers for Disease Control and Prevention. *Provisional COVID-19 Death Counts by Sex, Age, and State*. April 2020. https://data.cdc.gov/NCHS/Provisional-COVID-19-Death-Counts-by-Sex-Age-and-S/9bhg-hcku

In [75]:
### Go grab data
!curl https://data.cdc.gov/api/views/9bhg-hcku/rows.csv?accessType=DOWNLOAD --output data/sexage.csv

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  214k    0  2140    0     0      0      0 --:--:-- --:--:-- --:--:--     0k    0     0   355k      0 --:--:-- --:--:-- --:--:--  355k


In [76]:
### Read in data
DeathsSexAge = pd.read_csv("data/sexage.csv")
DeathsSexAge

Unnamed: 0,Data as of,Start week,End Week,State,Sex,Age group,COVID-19 Deaths,Total Deaths,Pneumonia Deaths,Pneumonia and COVID-19 Deaths,Influenza Deaths,"Pneumonia, Influenza, or COVID-19 Deaths",Footnote
0,07/08/2020,02/01/2020,07/04/2020,United States,All,Under 1 year,9.0,6896.0,64.0,2.0,14.0,85.0,
1,07/08/2020,02/01/2020,07/04/2020,United States,All,1-4 years,6.0,1325.0,46.0,2.0,40.0,90.0,
2,07/08/2020,02/01/2020,07/04/2020,United States,All,5-14 years,14.0,1995.0,66.0,3.0,46.0,123.0,
3,07/08/2020,02/01/2020,07/04/2020,United States,All,15-24 years,142.0,12369.0,247.0,46.0,51.0,390.0,
4,07/08/2020,02/01/2020,07/04/2020,United States,All,25-34 years,770.0,26258.0,937.0,344.0,147.0,1497.0,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
1411,07/08/2020,02/01/2020,07/04/2020,Puerto Rico,Female,75-84 years,0.0,0.0,0.0,0.0,0.0,0.0,
1412,07/08/2020,02/01/2020,07/04/2020,Puerto Rico,Female,85 years and over,0.0,0.0,0.0,0.0,0.0,0.0,
1413,07/08/2020,02/01/2020,07/04/2020,Puerto Rico,Female,All ages,,380.0,40.0,,,44.0,One or more data cells have counts between 1–9...
1414,07/08/2020,02/01/2020,07/04/2020,Puerto Rico,Unknown,All ages,0.0,0.0,0.0,0.0,0.0,0.0,
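
Suppressed cells (counts between 1 and 9) arrive as missing values, as in the Puerto Rico "All ages" row above, so they are not the same as zeros. The sketch below, on a toy frame that mimics three rows of the output, shows one way to flag them before the Footnote column is dropped. The names `toy` and `suppressed` are hypothetical, and treating "missing value plus footnote" as the suppression marker is an assumption about this file.

```python
import numpy as np
import pandas as pd

NOTE = ("One or more data cells have counts between 1–9 and have been "
        "suppressed in accordance with NCHS confidentiality standards.")

# Toy rows modeled on the tail of the table above.
toy = pd.DataFrame({
    "State": ["Puerto Rico", "Puerto Rico", "Puerto Rico"],
    "Sex": ["Female", "Female", "Unknown"],
    "Age group": ["85 years and over", "All ages", "All ages"],
    "COVID-19 Deaths": [0.0, np.nan, 0.0],
    "Footnote": [np.nan, NOTE, np.nan],
})

# Assumption: a missing count that carries a footnote is a suppressed count (1 to 9), not a zero.
suppressed = toy["COVID-19 Deaths"].isna() & toy["Footnote"].notna()
n_suppressed = int(suppressed.sum())

print(n_suppressed)
print(toy.loc[suppressed, ["State", "Sex", "Age group"]])
```

Filling these cells with 0 would understate deaths, so it may be safer to leave them missing or to keep a flag column such as `suppressed` alongside the counts.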


We are only interested in COVID-19 deaths, so drop the other death-count columns and the footnote.

In [77]:
DeathsSexAge = DeathsSexAge.drop(columns = ["Total Deaths",
                                            "Pneumonia Deaths",
                                            "Pneumonia and COVID-19 Deaths",
                                            "Influenza Deaths", 
                                            "Pneumonia, Influenza, or COVID-19 Deaths",
                                            "Footnote"])
DeathsSexAge

Unnamed: 0,Data as of,Start week,End Week,State,Sex,Age group,COVID-19 Deaths
0,07/08/2020,02/01/2020,07/04/2020,United States,All,Under 1 year,9.0
1,07/08/2020,02/01/2020,07/04/2020,United States,All,1-4 years,6.0
2,07/08/2020,02/01/2020,07/04/2020,United States,All,5-14 years,14.0
3,07/08/2020,02/01/2020,07/04/2020,United States,All,15-24 years,142.0
4,07/08/2020,02/01/2020,07/04/2020,United States,All,25-34 years,770.0
...,...,...,...,...,...,...,...
1411,07/08/2020,02/01/2020,07/04/2020,Puerto Rico,Female,75-84 years,0.0
1412,07/08/2020,02/01/2020,07/04/2020,Puerto Rico,Female,85 years and over,0.0
1413,07/08/2020,02/01/2020,07/04/2020,Puerto Rico,Female,All ages,
1414,07/08/2020,02/01/2020,07/04/2020,Puerto Rico,Unknown,All ages,0.0


In [78]:
### Drop Puerto Rico, Puerto Rico Total
PRindex = list(DeathsSexAge["State"][(DeathsSexAge["State"] == "Puerto Rico") | (DeathsSexAge["State"] == "Puerto Rico Total")].index)
DeathsSexAge = DeathsSexAge.drop(index = PRindex)


### Rename DC (use .loc so the assignment modifies DeathsSexAge itself)
DeathsSexAge.loc[DeathsSexAge["State"] == "District of Columbia", "State"] = "DC"

DeathsSexAge["State"].unique()


array(['United States', 'United States Total', 'Alabama', 'Alabama Total',
       'Alaska', 'Alaska Total', 'Arizona', 'Arizona Total', 'Arkansas',
       'Arkansas Total', 'California', 'California Total', 'Colorado',
       'Colorado Total', 'Connecticut', 'Connecticut Total', 'Delaware',
       'Delaware Total', 'DC', 'District of Columbia Total', 'Florida',
       'Florida Total', 'Georgia', 'Georgia Total', 'Hawaii',
       'Hawaii Total', 'Idaho', 'Idaho Total', 'Illinois',
       'Illinois Total', 'Indiana', 'Indiana Total', 'Iowa', 'Iowa Total',
       'Kansas', 'Kansas Total', 'Kentucky', 'Kentucky Total',
       'Louisiana', 'Louisiana Total', 'Maine', 'Maine Total', 'Maryland',
       'Maryland Total', 'Massachusetts', 'Massachusetts Total',
       'Michigan', 'Michigan Total', 'Minnesota', 'Minnesota Total',
       'Mississippi', 'Mississippi Total', 'Missouri', 'Missouri Total',
       'Montana', 'Montana Total', 'Nebraska', 'Nebraska Total', 'Nevada',
       'Nevada Total


# Deaths by Race (Country, State level)

This data comes from the Centers for Disease Control and Prevention (CDC) and is provided by the National Center for Health Statistics.

It contains aggregated death data based on Country, State, and Race.

__NOTE__: "The percent of deaths reported in this table are the total number of deaths received and coded as of the date of analysis and do not represent all deaths that occurred in that period. Data are incomplete because of the lag in time between when the death occurred and when the death certificate is completed, submitted to NCHS and processed for reporting purposes. This delay can range from 1 week to 8 weeks or more, depending on the jurisdiction, age, and cause of death. Provisional counts reported here track approximately 1–2 weeks behind other published data sources on the number of COVID-19 deaths in the U.S. COVID-19 deaths are defined as having confirmed or presumed COVID-19, and are coded to ICD–10 code U07.1."

"Unweighted population percentages are based on the Single-Race Population Estimates from the U.S. Census Bureau, for the year 2018 (available from: https://wonder.cdc.gov/single-race-population.html )."

"Weighted population percentages are computed by multiplying county-level population counts by the count of COVID deaths for each county, summing to the state-level, and then estimating the percent of the population within each racial and ethnic group. These weighted population distributions therefore more accurately reflect the geographic locations where COVID outbreaks are occurring. Jurisdictions are included in this table if more than 100 deaths were received and processed by NCHS as of the data of analysis. 1. Race and Hispanic-origin categories are based on the 1997 Office of Management and Budget (OMB) standards (1,2), allowing for the presentation of data by single race and Hispanic origin. These race and Hispanic-origin groups—non-Hispanic single-race white, non-Hispanic single-race black or African American, non-Hispanic single-race American Indian or Alaska Native (AIAN), and non-Hispanic single-race Asian—differ from the bridged-race categories shown in most reports using mortality data. 2. Includes persons having origins in any of the original peoples of North and South America 3. Includes persons having origins in any of the original peoples of the Far East, Southeast Asia, or the Indian subcontinent. 4. Includes Native Hawaiian and Other Pacific Islander, more than one race, race unknown, and Hispanic origin unknown 5. Excludes New York City."

__NOTE__: One or more data cells have counts between 1–9 and have been suppressed in accordance with NCHS confidentiality standards.
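
The weighted population percentages described in the notes above can be illustrated with a small worked example. This is only a sketch of the stated recipe (multiply county populations by county COVID deaths, sum to the state, convert to percentages); the two counties, their populations, and the group names are invented, and NCHS's actual computation may differ in detail.

```python
import pandas as pd

# Toy state with two counties and two population groups (all numbers invented).
counties = pd.DataFrame({
    "county": ["A", "B"],
    "covid_deaths": [30, 10],
    "white": [800, 400],
    "black": [200, 600],
})
groups = ["white", "black"]

# Unweighted: plain state-level population shares.
unweighted = counties[groups].sum()
unweighted_pct = 100 * unweighted / unweighted.sum()

# Weighted: each county's population counts are multiplied by its COVID death count
# before summing, so counties with more deaths contribute more to the shares.
weighted = counties[groups].mul(counties["covid_deaths"], axis=0).sum()
weighted_pct = 100 * weighted / weighted.sum()

print(unweighted_pct.round(1))  # white 60.0, black 40.0
print(weighted_pct.round(1))    # white 70.0, black 30.0
```

In this toy state the weighted share for the first group rises from 60% to 70% because most deaths occurred in county A, where that group is a larger part of the population.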

Data can be found here:

* Centers for Disease Control and Prevention. *Provisional Death Counts for Coronavirus Disease (COVID-19): Weekly State-Specific Data Updates*. 1 May 2020. <https://data.cdc.gov/NCHS/Provisional-Death-Counts-for-Coronavirus-Disease-C/pj7m-y5uh>

In [79]:
### Go grab data
!curl https://data.cdc.gov/api/views/pj7m-y5uh/rows.csv?accessType=DOWNLOAD --output data/race.csv

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 22755    0 22755    0     0  64279      0 --:--:-- --:--:-- --:--:-- 64279


In [80]:
### Read in Data
race = pd.read_csv("data/race.csv")
race

Unnamed: 0,Data as of,State,Indicator,Non-Hispanic White,Non-Hispanic Black or African American,Non-Hispanic American Indian or Alaska Native,Non-Hispanic Asian,Hispanic or Latino,Other,Footnote
0,07/08/2020,United States,Count of COVID-19 deaths,60862.0,26426.0,888.0,5629.0,19409.0,1527.0,
1,07/08/2020,United States,Distribution of COVID-19 deaths (%),53.0,23.0,0.8,4.9,16.9,1.3,
2,07/08/2020,United States,Unweighted distribution of population (%),60.4,12.5,0.7,5.7,18.3,2.4,
3,07/08/2020,United States,Weighted distribution of population (%),42.1,17.0,0.3,10.6,28.1,1.9,
4,07/08/2020,Alabama,Count of COVID-19 deaths,479.0,460.0,,,21.0,,One or more data cells have counts between 1–9...
...,...,...,...,...,...,...,...,...,...,...
171,07/08/2020,Washington,Weighted distribution of population (%),60.9,6.1,0.7,16.9,10.8,4.5,
172,07/08/2020,Wisconsin,Count of COVID-19 deaths,444.0,175.0,,20.0,81.0,,One or more data cells have counts between 1–9...
173,07/08/2020,Wisconsin,Distribution of COVID-19 deaths (%),61.1,24.1,,2.8,11.1,,One or more data cells have counts between 1–9...
174,07/08/2020,Wisconsin,Unweighted distribution of population (%),81.1,6.4,0.9,3.0,6.9,1.7,


Drop the Footnote column.

In [81]:
race = race.drop(columns = "Footnote")
race

Unnamed: 0,Data as of,State,Indicator,Non-Hispanic White,Non-Hispanic Black or African American,Non-Hispanic American Indian or Alaska Native,Non-Hispanic Asian,Hispanic or Latino,Other
0,07/08/2020,United States,Count of COVID-19 deaths,60862.0,26426.0,888.0,5629.0,19409.0,1527.0
1,07/08/2020,United States,Distribution of COVID-19 deaths (%),53.0,23.0,0.8,4.9,16.9,1.3
2,07/08/2020,United States,Unweighted distribution of population (%),60.4,12.5,0.7,5.7,18.3,2.4
3,07/08/2020,United States,Weighted distribution of population (%),42.1,17.0,0.3,10.6,28.1,1.9
4,07/08/2020,Alabama,Count of COVID-19 deaths,479.0,460.0,,,21.0,
...,...,...,...,...,...,...,...,...,...
171,07/08/2020,Washington,Weighted distribution of population (%),60.9,6.1,0.7,16.9,10.8,4.5
172,07/08/2020,Wisconsin,Count of COVID-19 deaths,444.0,175.0,,20.0,81.0,
173,07/08/2020,Wisconsin,Distribution of COVID-19 deaths (%),61.1,24.1,,2.8,11.1,
174,07/08/2020,Wisconsin,Unweighted distribution of population (%),81.1,6.4,0.9,3.0,6.9,1.7


Remove New York City and rename `New York<sup>5</sup>` to New York.

In [82]:
### Drop NYC.
NYCindex = list(race["State"][race["State"] == "New York City"].index)
race = race.drop(index = NYCindex)

### Rename New York<sup>5</sup> to New York (use .loc to avoid chained assignment).
race.loc[race["State"] == "New York<sup>5</sup>", "State"] = "New York"

### Rename DC
race.loc[race["State"] == "District of Columbia", "State"] = "DC"

race["State"].unique()


array(['United States', 'Alabama', 'Arizona', 'Arkansas', 'California',
       'Colorado', 'Connecticut', 'Delaware', 'DC', 'Florida', 'Georgia',
       'Illinois', 'Indiana', 'Iowa', 'Kansas', 'Kentucky', 'Louisiana',
       'Maine', 'Maryland', 'Massachusetts', 'Michigan', 'Minnesota',
       'Mississippi', 'Missouri', 'Nebraska', 'Nevada', 'New Hampshire',
       'New Jersey', 'New Mexico', 'New York', 'North Carolina', 'Ohio',
       'Oklahoma', 'Oregon', 'Pennsylvania', 'Rhode Island',
       'South Carolina', 'Tennessee', 'Texas', 'Utah', 'Virginia',
       'Washington', 'Wisconsin'], dtype=object)

In [83]:
race

Unnamed: 0,Data as of,State,Indicator,Non-Hispanic White,Non-Hispanic Black or African American,Non-Hispanic American Indian or Alaska Native,Non-Hispanic Asian,Hispanic or Latino,Other
0,07/08/2020,United States,Count of COVID-19 deaths,60862.0,26426.0,888.0,5629.0,19409.0,1527.0
1,07/08/2020,United States,Distribution of COVID-19 deaths (%),53.0,23.0,0.8,4.9,16.9,1.3
2,07/08/2020,United States,Unweighted distribution of population (%),60.4,12.5,0.7,5.7,18.3,2.4
3,07/08/2020,United States,Weighted distribution of population (%),42.1,17.0,0.3,10.6,28.1,1.9
4,07/08/2020,Alabama,Count of COVID-19 deaths,479.0,460.0,,,21.0,
...,...,...,...,...,...,...,...,...,...
171,07/08/2020,Washington,Weighted distribution of population (%),60.9,6.1,0.7,16.9,10.8,4.5
172,07/08/2020,Wisconsin,Count of COVID-19 deaths,444.0,175.0,,20.0,81.0,
173,07/08/2020,Wisconsin,Distribution of COVID-19 deaths (%),61.1,24.1,,2.8,11.1,
174,07/08/2020,Wisconsin,Unweighted distribution of population (%),81.1,6.4,0.9,3.0,6.9,1.7


To get the data into a more usable form, first split it into separate dataframes by Indicator. They are merged back together later.

In [84]:
countDeaths = race[race["Indicator"] == "Count of COVID-19 deaths"]
distDeaths = race[race["Indicator"] == "Distribution of COVID-19 deaths (%)"]
unweightDeaths = race[race["Indicator"] == "Unweighted distribution of population (%)"]
weightDeaths = race[race["Indicator"] == "Weighted distribution of population (%)"]

#### Count of COVID-19 deaths

In [85]:
### Unpivot
countDeaths = pd.melt(countDeaths, id_vars = ["Data as of","State", "Indicator"],
       value_vars = countDeaths.columns[3:9],
       var_name = "Race", value_name = "Count of COVID-19 deaths")
countDeaths

Unnamed: 0,Data as of,State,Indicator,Race,Count of COVID-19 deaths
0,07/08/2020,United States,Count of COVID-19 deaths,Non-Hispanic White,60862.0
1,07/08/2020,Alabama,Count of COVID-19 deaths,Non-Hispanic White,479.0
2,07/08/2020,Arizona,Count of COVID-19 deaths,Non-Hispanic White,650.0
3,07/08/2020,Arkansas,Count of COVID-19 deaths,Non-Hispanic White,147.0
4,07/08/2020,California,Count of COVID-19 deaths,Non-Hispanic White,1723.0
...,...,...,...,...,...
253,07/08/2020,Texas,Count of COVID-19 deaths,Other,10.0
254,07/08/2020,Utah,Count of COVID-19 deaths,Other,
255,07/08/2020,Virginia,Count of COVID-19 deaths,Other,
256,07/08/2020,Washington,Count of COVID-19 deaths,Other,20.0


In [86]:
### Drop Indicator
countDeaths = countDeaths.drop(columns = "Indicator")
countDeaths

Unnamed: 0,Data as of,State,Race,Count of COVID-19 deaths
0,07/08/2020,United States,Non-Hispanic White,60862.0
1,07/08/2020,Alabama,Non-Hispanic White,479.0
2,07/08/2020,Arizona,Non-Hispanic White,650.0
3,07/08/2020,Arkansas,Non-Hispanic White,147.0
4,07/08/2020,California,Non-Hispanic White,1723.0
...,...,...,...,...
253,07/08/2020,Texas,Other,10.0
254,07/08/2020,Utah,Other,
255,07/08/2020,Virginia,Other,
256,07/08/2020,Washington,Other,20.0


#### Distribution of COVID-19 deaths (%)

In [87]:
### Unpivot
distDeaths = pd.melt(distDeaths, id_vars = ["Data as of","State", "Indicator"],
       value_vars = distDeaths.columns[3:9],
       var_name = "Race", value_name = "Distribution of COVID-19 deaths (%)")
distDeaths

Unnamed: 0,Data as of,State,Indicator,Race,Distribution of COVID-19 deaths (%)
0,07/08/2020,United States,Distribution of COVID-19 deaths (%),Non-Hispanic White,53.0
1,07/08/2020,Alabama,Distribution of COVID-19 deaths (%),Non-Hispanic White,49.4
2,07/08/2020,Arizona,Distribution of COVID-19 deaths (%),Non-Hispanic White,46.1
3,07/08/2020,Arkansas,Distribution of COVID-19 deaths (%),Non-Hispanic White,57.0
4,07/08/2020,California,Distribution of COVID-19 deaths (%),Non-Hispanic White,31.7
...,...,...,...,...,...
253,07/08/2020,Texas,Distribution of COVID-19 deaths (%),Other,0.4
254,07/08/2020,Utah,Distribution of COVID-19 deaths (%),Other,
255,07/08/2020,Virginia,Distribution of COVID-19 deaths (%),Other,
256,07/08/2020,Washington,Distribution of COVID-19 deaths (%),Other,1.8


In [88]:
### Drop Indicator
distDeaths = distDeaths.drop(columns = "Indicator")
distDeaths

Unnamed: 0,Data as of,State,Race,Distribution of COVID-19 deaths (%)
0,07/08/2020,United States,Non-Hispanic White,53.0
1,07/08/2020,Alabama,Non-Hispanic White,49.4
2,07/08/2020,Arizona,Non-Hispanic White,46.1
3,07/08/2020,Arkansas,Non-Hispanic White,57.0
4,07/08/2020,California,Non-Hispanic White,31.7
...,...,...,...,...
253,07/08/2020,Texas,Other,0.4
254,07/08/2020,Utah,Other,
255,07/08/2020,Virginia,Other,
256,07/08/2020,Washington,Other,1.8


#### Unweighted distribution of population (%)

In [89]:
### Unpivot
unweightDeaths = pd.melt(unweightDeaths, id_vars = ["Data as of","State", "Indicator"],
       value_vars = unweightDeaths.columns[3:9],
       var_name = "Race", value_name = "Unweighted distribution of population (%)")
unweightDeaths

Unnamed: 0,Data as of,State,Indicator,Race,Unweighted distribution of population (%)
0,07/08/2020,United States,Unweighted distribution of population (%),Non-Hispanic White,60.4
1,07/08/2020,Alabama,Unweighted distribution of population (%),Non-Hispanic White,65.4
2,07/08/2020,Arizona,Unweighted distribution of population (%),Non-Hispanic White,54.4
3,07/08/2020,Arkansas,Unweighted distribution of population (%),Non-Hispanic White,72.2
4,07/08/2020,California,Unweighted distribution of population (%),Non-Hispanic White,36.8
...,...,...,...,...,...
253,07/08/2020,Texas,Unweighted distribution of population (%),Other,1.6
254,07/08/2020,Utah,Unweighted distribution of population (%),Other,3.1
255,07/08/2020,Virginia,Unweighted distribution of population (%),Other,2.7
256,07/08/2020,Washington,Unweighted distribution of population (%),Other,4.8


In [90]:
### Drop Indicator
unweightDeaths = unweightDeaths.drop(columns = "Indicator")
unweightDeaths

Unnamed: 0,Data as of,State,Race,Unweighted distribution of population (%)
0,07/08/2020,United States,Non-Hispanic White,60.4
1,07/08/2020,Alabama,Non-Hispanic White,65.4
2,07/08/2020,Arizona,Non-Hispanic White,54.4
3,07/08/2020,Arkansas,Non-Hispanic White,72.2
4,07/08/2020,California,Non-Hispanic White,36.8
...,...,...,...,...
253,07/08/2020,Texas,Other,1.6
254,07/08/2020,Utah,Other,3.1
255,07/08/2020,Virginia,Other,2.7
256,07/08/2020,Washington,Other,4.8


#### Weighted distribution of population (%)

In [91]:
### Unpivot
weightDeaths = pd.melt(weightDeaths, id_vars = ["Data as of","State", "Indicator"],
       value_vars = weightDeaths.columns[3:9],
       var_name = "Race", value_name = "Weighted distribution of population (%)")
weightDeaths

Unnamed: 0,Data as of,State,Indicator,Race,Weighted distribution of population (%)
0,07/08/2020,United States,Weighted distribution of population (%),Non-Hispanic White,42.1
1,07/08/2020,Alabama,Weighted distribution of population (%),Non-Hispanic White,53.1
2,07/08/2020,Arizona,Weighted distribution of population (%),Non-Hispanic White,54.7
3,07/08/2020,Arkansas,Weighted distribution of population (%),Non-Hispanic White,59.6
4,07/08/2020,California,Weighted distribution of population (%),Non-Hispanic White,28.1
...,...,...,...,...,...
253,07/08/2020,Texas,Weighted distribution of population (%),Other,1.5
254,07/08/2020,Utah,Weighted distribution of population (%),Other,2.3
255,07/08/2020,Virginia,Weighted distribution of population (%),Other,3.2
256,07/08/2020,Washington,Weighted distribution of population (%),Other,4.5


In [92]:
### Drop Indicator
weightDeaths = weightDeaths.drop(columns = "Indicator")
weightDeaths

Unnamed: 0,Data as of,State,Race,Weighted distribution of population (%)
0,07/08/2020,United States,Non-Hispanic White,42.1
1,07/08/2020,Alabama,Non-Hispanic White,53.1
2,07/08/2020,Arizona,Non-Hispanic White,54.7
3,07/08/2020,Arkansas,Non-Hispanic White,59.6
4,07/08/2020,California,Non-Hispanic White,28.1
...,...,...,...,...
253,07/08/2020,Texas,Other,1.5
254,07/08/2020,Utah,Other,2.3
255,07/08/2020,Virginia,Other,3.2
256,07/08/2020,Washington,Other,4.5


Now merge all of them together.

In [93]:
raceNew = countDeaths.merge(distDeaths, how = "inner", on = ["Data as of", "State", "Race"])
raceNew = raceNew.merge(unweightDeaths, how = "inner", on = ["Data as of", "State", "Race"])
raceNew = raceNew.merge(weightDeaths, how = "inner", on = ["Data as of", "State", "Race"])
raceNew

Unnamed: 0,Data as of,State,Race,Count of COVID-19 deaths,Distribution of COVID-19 deaths (%),Unweighted distribution of population (%),Weighted distribution of population (%)
0,07/08/2020,United States,Non-Hispanic White,60862.0,53.0,60.4,42.1
1,07/08/2020,Alabama,Non-Hispanic White,479.0,49.4,65.4,53.1
2,07/08/2020,Arizona,Non-Hispanic White,650.0,46.1,54.4,54.7
3,07/08/2020,Arkansas,Non-Hispanic White,147.0,57.0,72.2,59.6
4,07/08/2020,California,Non-Hispanic White,1723.0,31.7,36.8,28.1
...,...,...,...,...,...,...,...
253,07/08/2020,Texas,Other,10.0,0.4,1.6,1.5
254,07/08/2020,Utah,Other,,,3.1,2.3
255,07/08/2020,Virginia,Other,,,2.7,3.2
256,07/08/2020,Washington,Other,20.0,1.8,4.8,4.5
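
The split, melt, and merge steps above can also be expressed as one melt followed by one pivot. The sketch below uses a toy frame with two states, two indicators, and two race columns (all values invented); `wide`, `long`, and `tidy` are hypothetical names, and passing a list to `index` in `DataFrame.pivot` assumes pandas 1.1 or later.

```python
import pandas as pd

# Toy frame in the same wide layout as `race` (values are invented).
wide = pd.DataFrame({
    "Data as of": ["07/08/2020"] * 4,
    "State": ["Alabama", "Alabama", "Arizona", "Arizona"],
    "Indicator": ["Count of COVID-19 deaths",
                  "Distribution of COVID-19 deaths (%)"] * 2,
    "Non-Hispanic White": [40.0, 80.0, 30.0, 60.0],
    "Hispanic or Latino": [10.0, 20.0, 20.0, 40.0],
})

# Step 1: unpivot every race column into (Race, value) pairs.
long = wide.melt(id_vars=["Data as of", "State", "Indicator"],
                 var_name="Race", value_name="value")

# Step 2: spread the Indicator values back out into one column each.
tidy = long.pivot(index=["Data as of", "State", "Race"],
                  columns="Indicator", values="value").reset_index()
tidy.columns.name = None  # drop the leftover "Indicator" axis label

print(tidy)
```

One difference from the inner merges: a State and Race pair that is missing from one indicator would be kept here with a missing value instead of being dropped, which may or may not be what is wanted.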



# Hospitalization Estimates (Country, State level)

This data comes from the CDC’s National Healthcare Safety Network (NHSN), whose COVID-19 module enables hospitals to report:
* Current inpatient and intensive care unit (ICU) bed occupancy.
* Healthcare worker staffing.
* Personal protective equipment (PPE) supply status and availability.

Reporting is currently available to all U.S. acute care hospitals, critical access hospitals, inpatient rehabilitation facilities, inpatient psychiatric facilities, and long-term acute care hospitals.

__Important__: "Statistical methods were used to generate estimates of patient impact and hospital capacity measures that are representative at the national level. The estimates are based on data submitted by acute care hospitals to the NHSN COVID-19 Module. The statistical methods include weighting (to account for non-response) and multiple imputation (to account for missing data). The estimates (number and percentage) are shown along with 95% confidence intervals that reflect the statistical error that is primarily due to non-response."


__NOTE__: This data was submitted directly to CDC’s National Healthcare Safety Network (NHSN) and does not include data submitted to other entities contracted by or within the federal government.

The data can be found here: https://www.cdc.gov/nhsn/covid19/report-overview.html#anchor_1590010579051

In [94]:
### Go grab data
!curl https://www.cdc.gov/nhsn/pdfs/covid19/covid19-NatEst.csv --output data/hospital.csv

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  576k    0  576k    0     0  2880k      0 --:--:-- --:--:-- --:--:-- 2880k


In [95]:
### Load in data
hospital = pd.read_csv("data/hospital.csv")
hospital

Unnamed: 0,state,statename,collectionDate,InpatBeds_Occ_AnyPat_Est,InpatBeds_Occ_AnyPat_LoCI,InpatBeds_Occ_AnyPat_UpCI,InpatBeds_Occ_AnyPat_Est_Avail,InBedsOccAnyPat__Numbeds_Est,InBedsOccAnyPat__Numbeds_LoCI,InBedsOccAnyPat__Numbeds_UpCI,...,InBedsOccCOVID__Numbeds_LoCI,InBedsOccCOVID__Numbeds_UpCI,ICUBeds_Occ_AnyPat_Est,ICUBeds_Occ_AnyPat_LoCI,ICUBeds_Occ_AnyPat_UpCI,ICUBeds_Occ_AnyPat_Est_Avail,ICUBedsOccAnyPat__N_ICUBeds_Est,ICUBedsOccAnyPat__N_ICUBeds_LoCI,ICUBedsOccAnyPat__N_ICUBeds_UpCI,Notes
0,Two-letter state abbreviation,State name,Day for which estimate is made,"Hospital inpatient bed occupancy, estimate","Hospital inpatient bed occupancy, lower 95% CI","Hospital inpatient bed occupancy, upper 95% CI","Hospital inpatient beds available, estimate","Hospital inpatient bed occupancy, percent esti...","Hospital inpatient bed occupancy, lower 95% CI...","Hospital inpatient bed occupancy, upper 95% CI...",...,Number of patients in an inpatient care locati...,Number of patients in an inpatient care locati...,"ICU bed occupancy, estimate","ICU bed occupancy, lower 95% CI","ICU bed occupancy, upper 95% CI","ICU beds available, estimate","ICU bed occupancy, percent estimate (percent o...","ICU bed occupancy, lower 95% CI (percent of IC...","ICU bed occupancy, upper 95% CI (percent of IC...",This file contains National and State represen...
1,US,United States,01APR2020,416064,380186,451942,350555,54.3,52.5,56.0,...,8.6,11.0,66369,56770,75968,45110,59.5,55.8,63.2,These estimates are based on data retrieved on...
2,US,United States,02APR2020,422892,391381,454403,357231,54.2,52.7,55.7,...,8.7,10.9,69385,60557,78214,45784,60.2,57.1,63.4,Statistical methods were used to generate esti...
3,US,United States,03APR2020,408938,382065,435810,364108,52.9,51.2,54.6,...,8.9,11.1,70580,61067,80092,45788,60.7,57.3,64.0,The estimates are based on data submitted by a...
4,US,United States,04APR2020,398850,374147,423554,375854,51.5,50.0,53.0,...,9.2,11.4,70134,62054,78215,47622,59.6,56.6,62.5,The statistical methods include weighting (to ...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5028,WY,Wyoming,03JUL2020,434,82,785,489,47.0,34.8,59.1,...,0.4,2.6,57,0,123,6,90.3,50.7,100.0,
5029,WY,Wyoming,04JUL2020,450,86,813,475,48.7,35.7,61.6,...,0.0,3.2,54,0,112,9,85.7,47.9,100.0,
5030,WY,Wyoming,05JUL2020,438,69,808,494,47.0,33.5,60.5,...,0.0,3.1,41,0,86,21,65.7,26.4,100.0,
5031,WY,Wyoming,06JUL2020,408,77,740,514,44.3,31.1,57.4,...,0.2,3.8,42,0,91,21,67.7,29.1,100.0,


In [96]:
### Drop the Notes & state columns and the first row (it holds column descriptions, not data).
hospital = hospital.drop(columns = ["state", "Notes"])
hospital = hospital.drop(index = 0)
hospital = hospital.reset_index(drop = True)
hospital

Unnamed: 0,statename,collectionDate,InpatBeds_Occ_AnyPat_Est,InpatBeds_Occ_AnyPat_LoCI,InpatBeds_Occ_AnyPat_UpCI,InpatBeds_Occ_AnyPat_Est_Avail,InBedsOccAnyPat__Numbeds_Est,InBedsOccAnyPat__Numbeds_LoCI,InBedsOccAnyPat__Numbeds_UpCI,InpatBeds_Occ_COVID_Est,...,InBedsOccCOVID__Numbeds_Est,InBedsOccCOVID__Numbeds_LoCI,InBedsOccCOVID__Numbeds_UpCI,ICUBeds_Occ_AnyPat_Est,ICUBeds_Occ_AnyPat_LoCI,ICUBeds_Occ_AnyPat_UpCI,ICUBeds_Occ_AnyPat_Est_Avail,ICUBedsOccAnyPat__N_ICUBeds_Est,ICUBedsOccAnyPat__N_ICUBeds_LoCI,ICUBedsOccAnyPat__N_ICUBeds_UpCI
0,United States,01APR2020,416064,380186,451942,350555,54.3,52.5,56.0,75104,...,9.8,8.6,11.0,66369,56770,75968,45110,59.5,55.8,63.2
1,United States,02APR2020,422892,391381,454403,357231,54.2,52.7,55.7,76546,...,9.8,8.7,10.9,69385,60557,78214,45784,60.2,57.1,63.4
2,United States,03APR2020,408938,382065,435810,364108,52.9,51.2,54.6,77122,...,10.0,8.9,11.1,70580,61067,80092,45788,60.7,57.3,64.0
3,United States,04APR2020,398850,374147,423554,375854,51.5,50.0,53.0,79742,...,10.3,9.2,11.4,70134,62054,78215,47622,59.6,56.6,62.5
4,United States,05APR2020,400937,376016,425858,381724,51.2,49.8,52.6,80287,...,10.3,9.1,11.4,69853,61615,78091,48620,59.0,55.6,62.4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5027,Wyoming,03JUL2020,434,82,785,489,47.0,34.8,59.1,14,...,1.5,0.4,2.6,57,0,123,6,90.3,50.7,100.0
5028,Wyoming,04JUL2020,450,86,813,475,48.7,35.7,61.6,12,...,1.3,0.0,3.2,54,0,112,9,85.7,47.9,100.0
5029,Wyoming,05JUL2020,438,69,808,494,47.0,33.5,60.5,14,...,1.5,0.0,3.1,41,0,86,21,65.7,26.4,100.0
5030,Wyoming,06JUL2020,408,77,740,514,44.3,31.1,57.4,18,...,2.0,0.2,3.8,42,0,91,21,67.7,29.1,100.0
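
Because the first row of the file held text descriptions, pandas reads the affected columns as strings, and dropping that row does not change their dtypes. A minimal sketch of converting the value columns to numbers follows. It uses a small stand-in for the downloaded file rather than the real one, and `errors = "coerce"` (which turns unparseable values into `NaN`) is an assumption about how bad values should be handled.

```python
import io
import pandas as pd

# Stand-in for the downloaded file: a header, a description row, then data.
raw = io.StringIO(
    "statename,collectionDate,InpatBeds_Occ_AnyPat_Est,InpatBeds_Occ_AnyPat_LoCI\n"
    'State name,Day for which estimate is made,'
    '"Hospital inpatient bed occupancy, estimate",'
    '"Hospital inpatient bed occupancy, lower 95% CI"\n'
    "United States,01APR2020,416064,380186\n"
    "Wyoming,06JUL2020,408,77\n"
)
demo = pd.read_csv(raw)
demo = demo.drop(index = 0).reset_index(drop = True)
print(demo.dtypes)  # the value columns are still strings at this point

# Convert everything except the identifier columns to numbers.
value_cols = demo.columns.difference(["statename", "collectionDate"])
demo[value_cols] = demo[value_cols].apply(pd.to_numeric, errors = "coerce")
print(demo.dtypes)
```

Without a step like this, arithmetic or plotting on the estimate columns would operate on strings rather than numbers.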


In [97]:
### Rename columns
hospital = hospital.rename(columns = {'statename' : "State", 
                                      'collectionDate': "Date"})
hospital

Unnamed: 0,State,Date,InpatBeds_Occ_AnyPat_Est,InpatBeds_Occ_AnyPat_LoCI,InpatBeds_Occ_AnyPat_UpCI,InpatBeds_Occ_AnyPat_Est_Avail,InBedsOccAnyPat__Numbeds_Est,InBedsOccAnyPat__Numbeds_LoCI,InBedsOccAnyPat__Numbeds_UpCI,InpatBeds_Occ_COVID_Est,...,InBedsOccCOVID__Numbeds_Est,InBedsOccCOVID__Numbeds_LoCI,InBedsOccCOVID__Numbeds_UpCI,ICUBeds_Occ_AnyPat_Est,ICUBeds_Occ_AnyPat_LoCI,ICUBeds_Occ_AnyPat_UpCI,ICUBeds_Occ_AnyPat_Est_Avail,ICUBedsOccAnyPat__N_ICUBeds_Est,ICUBedsOccAnyPat__N_ICUBeds_LoCI,ICUBedsOccAnyPat__N_ICUBeds_UpCI
0,United States,01APR2020,416064,380186,451942,350555,54.3,52.5,56.0,75104,...,9.8,8.6,11.0,66369,56770,75968,45110,59.5,55.8,63.2
1,United States,02APR2020,422892,391381,454403,357231,54.2,52.7,55.7,76546,...,9.8,8.7,10.9,69385,60557,78214,45784,60.2,57.1,63.4
2,United States,03APR2020,408938,382065,435810,364108,52.9,51.2,54.6,77122,...,10.0,8.9,11.1,70580,61067,80092,45788,60.7,57.3,64.0
3,United States,04APR2020,398850,374147,423554,375854,51.5,50.0,53.0,79742,...,10.3,9.2,11.4,70134,62054,78215,47622,59.6,56.6,62.5
4,United States,05APR2020,400937,376016,425858,381724,51.2,49.8,52.6,80287,...,10.3,9.1,11.4,69853,61615,78091,48620,59.0,55.6,62.4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5027,Wyoming,03JUL2020,434,82,785,489,47.0,34.8,59.1,14,...,1.5,0.4,2.6,57,0,123,6,90.3,50.7,100.0
5028,Wyoming,04JUL2020,450,86,813,475,48.7,35.7,61.6,12,...,1.3,0.0,3.2,54,0,112,9,85.7,47.9,100.0
5029,Wyoming,05JUL2020,438,69,808,494,47.0,33.5,60.5,14,...,1.5,0.0,3.1,41,0,86,21,65.7,26.4,100.0
5030,Wyoming,06JUL2020,408,77,740,514,44.3,31.1,57.4,18,...,2.0,0.2,3.8,42,0,91,21,67.7,29.1,100.0


In [98]:
### Convert Date into datetime (raw values look like 01APR2020)
hospital["Date"] = pd.to_datetime(hospital["Date"], format = "%d%b%Y")
hospital

Unnamed: 0,State,Date,InpatBeds_Occ_AnyPat_Est,InpatBeds_Occ_AnyPat_LoCI,InpatBeds_Occ_AnyPat_UpCI,InpatBeds_Occ_AnyPat_Est_Avail,InBedsOccAnyPat__Numbeds_Est,InBedsOccAnyPat__Numbeds_LoCI,InBedsOccAnyPat__Numbeds_UpCI,InpatBeds_Occ_COVID_Est,...,InBedsOccCOVID__Numbeds_Est,InBedsOccCOVID__Numbeds_LoCI,InBedsOccCOVID__Numbeds_UpCI,ICUBeds_Occ_AnyPat_Est,ICUBeds_Occ_AnyPat_LoCI,ICUBeds_Occ_AnyPat_UpCI,ICUBeds_Occ_AnyPat_Est_Avail,ICUBedsOccAnyPat__N_ICUBeds_Est,ICUBedsOccAnyPat__N_ICUBeds_LoCI,ICUBedsOccAnyPat__N_ICUBeds_UpCI
0,United States,2020-04-01,416064,380186,451942,350555,54.3,52.5,56.0,75104,...,9.8,8.6,11.0,66369,56770,75968,45110,59.5,55.8,63.2
1,United States,2020-04-02,422892,391381,454403,357231,54.2,52.7,55.7,76546,...,9.8,8.7,10.9,69385,60557,78214,45784,60.2,57.1,63.4
2,United States,2020-04-03,408938,382065,435810,364108,52.9,51.2,54.6,77122,...,10.0,8.9,11.1,70580,61067,80092,45788,60.7,57.3,64.0
3,United States,2020-04-04,398850,374147,423554,375854,51.5,50.0,53.0,79742,...,10.3,9.2,11.4,70134,62054,78215,47622,59.6,56.6,62.5
4,United States,2020-04-05,400937,376016,425858,381724,51.2,49.8,52.6,80287,...,10.3,9.1,11.4,69853,61615,78091,48620,59.0,55.6,62.4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5027,Wyoming,2020-07-03,434,82,785,489,47.0,34.8,59.1,14,...,1.5,0.4,2.6,57,0,123,6,90.3,50.7,100.0
5028,Wyoming,2020-07-04,450,86,813,475,48.7,35.7,61.6,12,...,1.3,0.0,3.2,54,0,112,9,85.7,47.9,100.0
5029,Wyoming,2020-07-05,438,69,808,494,47.0,33.5,60.5,14,...,1.5,0.0,3.1,41,0,86,21,65.7,26.4,100.0
5030,Wyoming,2020-07-06,408,77,740,514,44.3,31.1,57.4,18,...,2.0,0.2,3.8,42,0,91,21,67.7,29.1,100.0
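
Each estimate comes with a lower and upper 95% confidence bound (the `_LoCI` and `_UpCI` columns). As a rough illustration of how much the uncertainty differs between the national and state level, the sketch below computes the half-width of the interval, both in beds and relative to the estimate. The small frame copies two rows from the output above; the column names `half_width` and `rel_width` are illustrative and do not appear in the original data.

```python
import pandas as pd

# Two rows copied from the output above (national and Wyoming).
sample = pd.DataFrame({
    "State": ["United States", "Wyoming"],
    "Date": pd.to_datetime(["2020-04-01", "2020-07-06"]),
    "InpatBeds_Occ_AnyPat_Est": [416064, 408],
    "InpatBeds_Occ_AnyPat_LoCI": [380186, 77],
    "InpatBeds_Occ_AnyPat_UpCI": [451942, 740],
})

# Half-width of the 95% interval, in beds and relative to the estimate.
sample["half_width"] = (
    sample["InpatBeds_Occ_AnyPat_UpCI"] - sample["InpatBeds_Occ_AnyPat_LoCI"]
) / 2
sample["rel_width"] = sample["half_width"] / sample["InpatBeds_Occ_AnyPat_Est"]

print(sample[["State", "half_width", "rel_width"]])
```

For these two rows the national half-width is about 9% of the estimate, while the Wyoming half-width is about 81%, so state-level figures for small states should be read with care.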


In [99]:
### Remove Puerto Rico 
hospital = hospital.drop(index = hospital.index[hospital["State"] == "Puerto Rico"])

### Rename DC (use .loc to avoid chained assignment)
hospital.loc[hospital["State"] == "District of Columbia", "State"] = "DC"

hospital["State"].unique()


array(['United States', 'Alaska', 'Alabama', 'Arkansas', 'Arizona',
       'California', 'Colorado', 'Connecticut', 'DC', 'Delaware',
       'Florida', 'Georgia', 'Hawaii', 'Iowa', 'Idaho', 'Illinois',
       'Indiana', 'Kansas', 'Kentucky', 'Louisiana', 'Massachusetts',
       'Maryland', 'Maine', 'Michigan', 'Minnesota', 'Missouri',
       'Mississippi', 'Montana', 'North Carolina', 'North Dakota',
       'Nebraska', 'New Hampshire', 'New Jersey', 'New Mexico', 'Nevada',
       'New York', 'Ohio', 'Oklahoma', 'Oregon', 'Pennsylvania',
       'Rhode Island', 'South Carolina', 'South Dakota', 'Tennessee',
       'Texas', 'Utah', 'Virginia', 'Vermont', 'Washington', 'Wisconsin',
       'West Virginia', 'Wyoming'], dtype=object)

### Final Data

Save datasets to CSVs.

In [108]:
CountyData.to_csv("data/countyData.csv", index = False)
StateData.to_csv("data/stateData.csv", index = False)
USAData.to_csv("data/usaData.csv", index = False)
DeathsSexAge.to_csv("data/demoDeaths.csv", index = False)
raceNew.to_csv("data/raceDeaths.csv", index = False)
hospital.to_csv("data/hospitalData.csv", index = False)
GoogleUsaMobility.to_csv('data/GoogleUsaMobility.csv', index = False)
GoogleStateMobility.to_csv('data/GoogleStateMobility.csv', index = False)
GoogleCountyMobility.to_csv('data/GoogleCountyMobility.csv', index = False)
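
CSV files do not store dtypes, so a datetime column such as `Date` comes back as plain strings unless it is parsed again on load. A minimal sketch of the round trip, using an in-memory buffer and a two-row stand-in for `hospital` (the names `demo`, `plain`, and `parsed` are illustrative):

```python
import io
import pandas as pd

# Two-row stand-in for the cleaned hospital frame.
demo = pd.DataFrame({
    "State": ["United States", "Wyoming"],
    "Date": pd.to_datetime(["2020-04-01", "2020-07-06"]),
    "InpatBeds_Occ_AnyPat_Est": [416064, 408],
})

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
demo.to_csv(buffer, index = False)

buffer.seek(0)
plain = pd.read_csv(buffer)                           # Date is read as strings
buffer.seek(0)
parsed = pd.read_csv(buffer, parse_dates = ["Date"])  # Date is datetime again

print(plain["Date"].dtype, parsed["Date"].dtype)
```

Any later notebook that loads `data/hospitalData.csv` would likely want the `parse_dates` argument for the same reason.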