# COVID-19 – Data-Based Prediction Tool

## Ian Scarff (iie728)

## Practicum II Project 2020

### Import Packages

In [1]:
import numpy as np
import pandas as pd

# Covid-19 Data Preprocessing

### About the Data

The data comes from usafacts.org, on the page "Coronavirus Locations: COVID-19 Map by County and State."

The 21 cases confirmed on the Grand Princess cruise ship on March 5 and 6 are attributed to the state of California, but not to any counties. The national numbers also include the 45 people with coronavirus repatriated from the Diamond Princess.

USAFacts attempts to match each case with a county, but some cases counted at the state level are not allocated to counties due to lack of information.

Data is updated each day.


NOTES FROM USAFacts:

Note from April 28: On April 14, New York City began a separate count of "probable deaths" of people believed to have died as a result of COVID-19, though weren't tested. On April 28, these deaths were retroactively added to our death counts, assigned to a New York City borough if possible. In the future, USAFacts will include "probable deaths" in the overall tally if a local government chooses to report that information separately.

Note from April 18: Certain states have changed their methodology in reporting deaths due to COVID-19. As a result, we are holding off on reporting death data in a few key states (New York is notable among these states due to the high number of confirmed cases and deaths). USAFacts is committed to providing official numbers confirmed by state or local health agencies, and we will appropriately backfill the death data when we receive more guidance from the CDC and relevant health departments.

Note from April 15: In certain states, probable deaths are listed alongside confirmed deaths. Following the lead of the CDC, we will begin publishing death counts that combine these two totals where applicable; this might result in larger than expected increases in deaths in certain counties.

Note from March 28: The data now includes all counties regardless of confirmed case count. Additionally, New York City data has been allotted to its five boroughs/counties, where possible.



##### There is no missing data.

#### Import Data

To ensure that a copy of the data is always saved locally, the data is written to disk every time it is imported.

In [2]:
### Number of confirmed cases by county
!curl https://usafactsstatic.blob.core.windows.net/public/data/covid-19/covid_confirmed_usafacts.csv --output data/cases.csv

### Number of confirmed deaths by county
!curl https://usafactsstatic.blob.core.windows.net/public/data/covid-19/covid_deaths_usafacts.csv --output data/deaths.csv

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1646k  100 1646k    0     0  1807k      0 --:--:-- --:--:-- --:--:-- 1805k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1324k  100 1324k    0     0  1761k      0 --:--:-- --:--:-- --:--:-- 1759k
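The `!curl` cells only run inside IPython/Jupyter and assume a `data/` directory already exists. A minimal pure-Python sketch of the same download step, assuming only the standard library; `download_csv` is a hypothetical helper name, and the URLs are the ones used in the cell above.

```python
import os
import urllib.request

# URLs copied from the curl commands above
CASES_URL = "https://usafactsstatic.blob.core.windows.net/public/data/covid-19/covid_confirmed_usafacts.csv"
DEATHS_URL = "https://usafactsstatic.blob.core.windows.net/public/data/covid-19/covid_deaths_usafacts.csv"


def download_csv(url, path):
    """Download url to path (hypothetical helper), creating the parent directory if needed."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    # urlretrieve overwrites any existing file at path, so each run saves a fresh copy
    urllib.request.urlretrieve(url, path)
    return path


# Usage (requires network access):
# download_csv(CASES_URL, "data/cases.csv")
# download_csv(DEATHS_URL, "data/deaths.csv")
```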


The county labels in the population dataset were unreliable.

A separate population dataset was therefore created, with a naming convention that matches the other data frames.

Now load those datasets.

In [3]:
### Total Cases
cases = pd.read_csv("data/cases.csv")

### Drop the trailing unnamed column, if present
odd = "Unnamed: " + str(len(cases.columns) - 1)

if (cases.columns[-1] == odd):
    cases = cases.drop(columns = cases.columns[-1])

cases

Unnamed: 0,countyFIPS,County Name,State,stateFIPS,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,...,7/15/20,7/16/20,7/17/20,7/18/20,7/19/20,7/20/20,7/21/20,7/22/20,7/23/20,7/24/20
0,0,Statewide Unallocated,AL,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,1001,Autauga County,AL,1,0,0,0,0,0,0,...,756,780,789,827,842,857,865,886,905,921
2,1003,Baldwin County,AL,1,0,0,0,0,0,0,...,1518,1599,1689,1819,1937,2013,2102,2196,2461,2513
3,1005,Barbour County,AL,1,0,0,0,0,0,0,...,441,459,463,483,495,503,514,518,534,539
4,1007,Bibb County,AL,1,0,0,0,0,0,0,...,242,247,255,264,269,279,283,287,289,303
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3190,56037,Sweetwater County,WY,56,0,0,0,0,0,0,...,157,158,162,171,173,181,194,201,206,210
3191,56039,Teton County,WY,56,0,0,0,0,0,0,...,170,181,187,197,208,223,226,234,254,264
3192,56041,Uinta County,WY,56,0,0,0,0,0,0,...,208,217,219,221,221,222,223,224,227,232
3193,56043,Washakie County,WY,56,0,0,0,0,0,0,...,43,43,43,43,44,44,44,45,45,45


In [4]:
### Total Deaths
deaths = pd.read_csv("data/deaths.csv")

### Drop the trailing unnamed column, if present
odd = "Unnamed: " + str(len(deaths.columns) - 1)
if (deaths.columns[-1] == odd):
    deaths = deaths.drop(columns = deaths.columns[-1])

deaths

Unnamed: 0,countyFIPS,County Name,State,stateFIPS,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,...,7/15/20,7/16/20,7/17/20,7/18/20,7/19/20,7/20/20,7/21/20,7/22/20,7/23/20,7/24/20
0,0,Statewide Unallocated,AL,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,1001,Autauga County,AL,1,0,0,0,0,0,0,...,19,20,21,21,21,21,21,21,21,21
2,1003,Baldwin County,AL,1,0,0,0,0,0,0,...,13,14,14,15,15,15,16,16,17,18
3,1005,Barbour County,AL,1,0,0,0,0,0,0,...,3,3,3,3,3,4,4,4,4,4
4,1007,Bibb County,AL,1,0,0,0,0,0,0,...,2,2,2,2,2,2,2,2,2,2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3190,56037,Sweetwater County,WY,56,0,0,0,0,0,0,...,1,2,2,2,2,2,2,2,2,2
3191,56039,Teton County,WY,56,0,0,0,0,0,0,...,1,1,1,1,1,1,1,1,1,1
3192,56041,Uinta County,WY,56,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3193,56043,Washakie County,WY,56,0,0,0,0,0,0,...,5,5,5,5,5,5,5,5,5,5


In [5]:
### Total Population
population = pd.read_csv("data/population.csv")
population

Unnamed: 0,County Name,Population,countyFIPS
0,Statewide Unallocated,0,0
1,Autauga,55869,1001
2,Baldwin,223234,1003
3,Barbour,24686,1005
4,Bibb,22394,1007
...,...,...,...
3138,Sweetwater,42343,56037
3139,Teton,23464,56039
3140,Uinta,20226,56041
3141,Washakie,7805,56043


### Fixing Errors
In the cases and deaths dataframes, certain observations need to be removed.

1: Wade Hampton Census Area, Alaska. This area no longer exists; it was renamed Kusilvak Census Area.

2: New York City Unallocated/Probable. This is not a county. Observations for New York City are covered by its five counties (boroughs).

3: Grand Princess Cruise Ship. This is a cruise ship, not a county, and its cases are attributed to California.

<br>

In addition, for the purposes of the dashboard, drop the Aleutians West Census Area.

In [6]:
#### County Data

### Remove Wade Hampton Area
cases = cases.drop(list(cases[cases["County Name"] == "Wade Hampton Census Area"].index))

### New York City Unallocated/Probable
cases = cases.drop(list(cases[cases["County Name"] == "New York City Unallocated/Probable"].index))

### Remove Grand Princess Cruise Ship
cases = cases.drop(list(cases[cases["County Name"] == "Grand Princess Cruise Ship"].index))

### Remove Aleutians West Census Area
cases = cases.drop(list(cases[cases["County Name"] == "Aleutians West Census Area"].index))


#### Deaths Data
### Remove Wade Hampton Area
deaths = deaths.drop(list(deaths[deaths["County Name"] == "Wade Hampton Census Area"].index))

### New York City Unallocated/Probable
deaths = deaths.drop(list(deaths[deaths["County Name"] == "New York City Unallocated/Probable"].index))

### Remove Grand Princess Cruise Ship
deaths = deaths.drop(list(deaths[deaths["County Name"] == "Grand Princess Cruise Ship"].index))

### Remove Aleutians West Census Area
deaths = deaths.drop(list(deaths[deaths["County Name"] == "Aleutians West Census Area"].index))
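The eight drop statements above repeat one pattern. A minimal sketch that removes all four names from a frame with a single boolean filter using `Series.isin`; `drop_non_counties` and `NON_COUNTY_ROWS` are hypothetical names, and the toy frame stands in for the real cases and deaths data.

```python
import pandas as pd

# Names that are not counties (or are excluded from the dashboard), from the list above
NON_COUNTY_ROWS = (
    "Wade Hampton Census Area",
    "New York City Unallocated/Probable",
    "Grand Princess Cruise Ship",
    "Aleutians West Census Area",
)


def drop_non_counties(df, names=NON_COUNTY_ROWS):
    """Return df without rows whose County Name is in names (hypothetical helper)."""
    return df[~df["County Name"].isin(names)]


# Made-up toy frame standing in for the cases/deaths data
toy = pd.DataFrame({
    "County Name": ["Autauga County", "Grand Princess Cruise Ship", "Kusilvak Census Area"],
    "1/22/20": [0, 0, 0],
})
cleaned = drop_non_counties(toy)
```

Like the `drop` calls, the filter keeps the original index labels of the remaining rows, so the two approaches give the same frame.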

In [7]:
cases = cases.rename(columns = {"State" : "StateABV"})
cases

Unnamed: 0,countyFIPS,County Name,StateABV,stateFIPS,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,...,7/15/20,7/16/20,7/17/20,7/18/20,7/19/20,7/20/20,7/21/20,7/22/20,7/23/20,7/24/20
0,0,Statewide Unallocated,AL,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,1001,Autauga County,AL,1,0,0,0,0,0,0,...,756,780,789,827,842,857,865,886,905,921
2,1003,Baldwin County,AL,1,0,0,0,0,0,0,...,1518,1599,1689,1819,1937,2013,2102,2196,2461,2513
3,1005,Barbour County,AL,1,0,0,0,0,0,0,...,441,459,463,483,495,503,514,518,534,539
4,1007,Bibb County,AL,1,0,0,0,0,0,0,...,242,247,255,264,269,279,283,287,289,303
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3190,56037,Sweetwater County,WY,56,0,0,0,0,0,0,...,157,158,162,171,173,181,194,201,206,210
3191,56039,Teton County,WY,56,0,0,0,0,0,0,...,170,181,187,197,208,223,226,234,254,264
3192,56041,Uinta County,WY,56,0,0,0,0,0,0,...,208,217,219,221,221,222,223,224,227,232
3193,56043,Washakie County,WY,56,0,0,0,0,0,0,...,43,43,43,43,44,44,44,45,45,45


In [8]:
deaths = deaths.rename(columns = {"State" : "StateABV"})
deaths

Unnamed: 0,countyFIPS,County Name,StateABV,stateFIPS,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,...,7/15/20,7/16/20,7/17/20,7/18/20,7/19/20,7/20/20,7/21/20,7/22/20,7/23/20,7/24/20
0,0,Statewide Unallocated,AL,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,1001,Autauga County,AL,1,0,0,0,0,0,0,...,19,20,21,21,21,21,21,21,21,21
2,1003,Baldwin County,AL,1,0,0,0,0,0,0,...,13,14,14,15,15,15,16,16,17,18
3,1005,Barbour County,AL,1,0,0,0,0,0,0,...,3,3,3,3,3,4,4,4,4,4
4,1007,Bibb County,AL,1,0,0,0,0,0,0,...,2,2,2,2,2,2,2,2,2,2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3190,56037,Sweetwater County,WY,56,0,0,0,0,0,0,...,1,2,2,2,2,2,2,2,2,2
3191,56039,Teton County,WY,56,0,0,0,0,0,0,...,1,1,1,1,1,1,1,1,1,1
3192,56041,Uinta County,WY,56,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3193,56043,Washakie County,WY,56,0,0,0,0,0,0,...,5,5,5,5,5,5,5,5,5,5


### Prep Data

#### Ensuring Labels

To ensure that county and state labels are the same across dataframes, replace them with the labels in countyFIPS.csv and stateFIPS.csv, matched on FIPS code.

Load the FIPS data.
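The merges below call `merge(countyFIPS, how = "left")` without `on`, so pandas joins on every column name the two frames share; that works here because, once `County Name` is dropped, `countyFIPS` is the only shared column. A sketch that names the key explicitly and fails loudly on problems; `relabel_counties` is a hypothetical helper and the toy frames are made up.

```python
import pandas as pd


def relabel_counties(df, county_lookup):
    """Replace df's County Name with the lookup's name, matched on countyFIPS (hypothetical helper)."""
    out = df.drop(columns="County Name").merge(
        county_lookup[["countyFIPS", "County Name"]],
        on="countyFIPS",
        how="left",
        validate="many_to_one",  # raises MergeError if a FIPS code repeats in the lookup
    )
    # A FIPS code missing from the lookup would otherwise leave a NaN name behind
    missing = out["County Name"].isna()
    if missing.any():
        raise ValueError(f"FIPS codes without a label: {out.loc[missing, 'countyFIPS'].tolist()}")
    return out


cases_toy = pd.DataFrame({
    "countyFIPS": [1001, 1003],
    "County Name": ["Autauga County", "Baldwin County"],
    "1/22/20": [0, 0],
})
lookup = pd.DataFrame({"County Name": ["Autauga", "Baldwin"], "countyFIPS": [1001, 1003]})
relabeled = relabel_counties(cases_toy, lookup)
```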

In [9]:
### County FIPS
countyFIPS = pd.read_csv("data/countyFIPS.csv")
countyFIPS

Unnamed: 0,County Name,countyFIPS
0,Statewide Unallocated,0
1,Autauga,1001
2,Baldwin,1003
3,Barbour,1005
4,Bibb,1007
...,...,...
3138,Sweetwater,56037
3139,Teton,56039
3140,Uinta,56041
3141,Washakie,56043


In [10]:
### State FIPS
stateFIPS = pd.read_csv("data/stateFIPS.csv")
stateFIPS

Unnamed: 0,State,stateFIPS
0,Alabama,1
1,Alaska,2
2,Arizona,4
3,Arkansas,5
4,California,6
5,Colorado,8
6,Connecticut,9
7,Delaware,10
8,DC,11
9,Florida,12


##### Fixing Cases Labels

In [11]:
### Drop cases county labels
cases = cases.drop(columns = "County Name")
cases

Unnamed: 0,countyFIPS,StateABV,stateFIPS,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,1/28/20,...,7/15/20,7/16/20,7/17/20,7/18/20,7/19/20,7/20/20,7/21/20,7/22/20,7/23/20,7/24/20
0,0,AL,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,1001,AL,1,0,0,0,0,0,0,0,...,756,780,789,827,842,857,865,886,905,921
2,1003,AL,1,0,0,0,0,0,0,0,...,1518,1599,1689,1819,1937,2013,2102,2196,2461,2513
3,1005,AL,1,0,0,0,0,0,0,0,...,441,459,463,483,495,503,514,518,534,539
4,1007,AL,1,0,0,0,0,0,0,0,...,242,247,255,264,269,279,283,287,289,303
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3190,56037,WY,56,0,0,0,0,0,0,0,...,157,158,162,171,173,181,194,201,206,210
3191,56039,WY,56,0,0,0,0,0,0,0,...,170,181,187,197,208,223,226,234,254,264
3192,56041,WY,56,0,0,0,0,0,0,0,...,208,217,219,221,221,222,223,224,227,232
3193,56043,WY,56,0,0,0,0,0,0,0,...,43,43,43,43,44,44,44,45,45,45


In [12]:
### Add County Name from countyFIPS
cases = cases.merge(countyFIPS, how = "left")
cases

Unnamed: 0,countyFIPS,StateABV,stateFIPS,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,1/28/20,...,7/16/20,7/17/20,7/18/20,7/19/20,7/20/20,7/21/20,7/22/20,7/23/20,7/24/20,County Name
0,0,AL,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Statewide Unallocated
1,1001,AL,1,0,0,0,0,0,0,0,...,780,789,827,842,857,865,886,905,921,Autauga
2,1003,AL,1,0,0,0,0,0,0,0,...,1599,1689,1819,1937,2013,2102,2196,2461,2513,Baldwin
3,1005,AL,1,0,0,0,0,0,0,0,...,459,463,483,495,503,514,518,534,539,Barbour
4,1007,AL,1,0,0,0,0,0,0,0,...,247,255,264,269,279,283,287,289,303,Bibb
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3186,56037,WY,56,0,0,0,0,0,0,0,...,158,162,171,173,181,194,201,206,210,Sweetwater
3187,56039,WY,56,0,0,0,0,0,0,0,...,181,187,197,208,223,226,234,254,264,Teton
3188,56041,WY,56,0,0,0,0,0,0,0,...,217,219,221,221,222,223,224,227,232,Uinta
3189,56043,WY,56,0,0,0,0,0,0,0,...,43,43,43,44,44,44,45,45,45,Washakie


In [13]:
### Add State names from stateFIPS
cases = cases.merge(stateFIPS, how = "left")
cases

Unnamed: 0,countyFIPS,StateABV,stateFIPS,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,1/28/20,...,7/17/20,7/18/20,7/19/20,7/20/20,7/21/20,7/22/20,7/23/20,7/24/20,County Name,State
0,0,AL,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,Statewide Unallocated,Alabama
1,1001,AL,1,0,0,0,0,0,0,0,...,789,827,842,857,865,886,905,921,Autauga,Alabama
2,1003,AL,1,0,0,0,0,0,0,0,...,1689,1819,1937,2013,2102,2196,2461,2513,Baldwin,Alabama
3,1005,AL,1,0,0,0,0,0,0,0,...,463,483,495,503,514,518,534,539,Barbour,Alabama
4,1007,AL,1,0,0,0,0,0,0,0,...,255,264,269,279,283,287,289,303,Bibb,Alabama
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3186,56037,WY,56,0,0,0,0,0,0,0,...,162,171,173,181,194,201,206,210,Sweetwater,Wyoming
3187,56039,WY,56,0,0,0,0,0,0,0,...,187,197,208,223,226,234,254,264,Teton,Wyoming
3188,56041,WY,56,0,0,0,0,0,0,0,...,219,221,221,222,223,224,227,232,Uinta,Wyoming
3189,56043,WY,56,0,0,0,0,0,0,0,...,43,43,44,44,44,45,45,45,Washakie,Wyoming


##### Fixing Deaths Labels

In [14]:
### Drop deaths county labels
deaths = deaths.drop(columns = "County Name")
deaths

Unnamed: 0,countyFIPS,StateABV,stateFIPS,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,1/28/20,...,7/15/20,7/16/20,7/17/20,7/18/20,7/19/20,7/20/20,7/21/20,7/22/20,7/23/20,7/24/20
0,0,AL,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,1001,AL,1,0,0,0,0,0,0,0,...,19,20,21,21,21,21,21,21,21,21
2,1003,AL,1,0,0,0,0,0,0,0,...,13,14,14,15,15,15,16,16,17,18
3,1005,AL,1,0,0,0,0,0,0,0,...,3,3,3,3,3,4,4,4,4,4
4,1007,AL,1,0,0,0,0,0,0,0,...,2,2,2,2,2,2,2,2,2,2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3190,56037,WY,56,0,0,0,0,0,0,0,...,1,2,2,2,2,2,2,2,2,2
3191,56039,WY,56,0,0,0,0,0,0,0,...,1,1,1,1,1,1,1,1,1,1
3192,56041,WY,56,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3193,56043,WY,56,0,0,0,0,0,0,0,...,5,5,5,5,5,5,5,5,5,5


In [15]:
### Add County Name from countyFIPS
deaths = deaths.merge(countyFIPS, how = "left")
deaths

Unnamed: 0,countyFIPS,StateABV,stateFIPS,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,1/28/20,...,7/16/20,7/17/20,7/18/20,7/19/20,7/20/20,7/21/20,7/22/20,7/23/20,7/24/20,County Name
0,0,AL,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Statewide Unallocated
1,1001,AL,1,0,0,0,0,0,0,0,...,20,21,21,21,21,21,21,21,21,Autauga
2,1003,AL,1,0,0,0,0,0,0,0,...,14,14,15,15,15,16,16,17,18,Baldwin
3,1005,AL,1,0,0,0,0,0,0,0,...,3,3,3,3,4,4,4,4,4,Barbour
4,1007,AL,1,0,0,0,0,0,0,0,...,2,2,2,2,2,2,2,2,2,Bibb
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3186,56037,WY,56,0,0,0,0,0,0,0,...,2,2,2,2,2,2,2,2,2,Sweetwater
3187,56039,WY,56,0,0,0,0,0,0,0,...,1,1,1,1,1,1,1,1,1,Teton
3188,56041,WY,56,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Uinta
3189,56043,WY,56,0,0,0,0,0,0,0,...,5,5,5,5,5,5,5,5,5,Washakie


In [16]:
### Add State names from stateFIPS
deaths = deaths.merge(stateFIPS, how = "left")
deaths

Unnamed: 0,countyFIPS,StateABV,stateFIPS,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,1/28/20,...,7/17/20,7/18/20,7/19/20,7/20/20,7/21/20,7/22/20,7/23/20,7/24/20,County Name,State
0,0,AL,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,Statewide Unallocated,Alabama
1,1001,AL,1,0,0,0,0,0,0,0,...,21,21,21,21,21,21,21,21,Autauga,Alabama
2,1003,AL,1,0,0,0,0,0,0,0,...,14,15,15,15,16,16,17,18,Baldwin,Alabama
3,1005,AL,1,0,0,0,0,0,0,0,...,3,3,3,4,4,4,4,4,Barbour,Alabama
4,1007,AL,1,0,0,0,0,0,0,0,...,2,2,2,2,2,2,2,2,Bibb,Alabama
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3186,56037,WY,56,0,0,0,0,0,0,0,...,2,2,2,2,2,2,2,2,Sweetwater,Wyoming
3187,56039,WY,56,0,0,0,0,0,0,0,...,1,1,1,1,1,1,1,1,Teton,Wyoming
3188,56041,WY,56,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,Uinta,Wyoming
3189,56043,WY,56,0,0,0,0,0,0,0,...,5,5,5,5,5,5,5,5,Washakie,Wyoming


##### Fixing Population Labels

In [17]:
### Drop population county labels
population = population.drop(columns = "County Name")
population

Unnamed: 0,Population,countyFIPS
0,0,0
1,55869,1001
2,223234,1003
3,24686,1005
4,22394,1007
...,...,...
3138,42343,56037
3139,23464,56039
3140,20226,56041
3141,7805,56043


In [18]:
### Add County Name from countyFIPS
population = population.merge(countyFIPS, how = "left")
population

Unnamed: 0,Population,countyFIPS,County Name
0,0,0,Statewide Unallocated
1,55869,1001,Autauga
2,223234,1003,Baldwin
3,24686,1005,Barbour
4,22394,1007,Bibb
...,...,...,...
3138,42343,56037,Sweetwater
3139,23464,56039,Teton
3140,20226,56041,Uinta
3141,7805,56043,Washakie


It turns out that the “Statewide Unallocated” rows contain valid counts; those cases and deaths just haven’t been assigned to a county due to lack of information.

Leave these observations out of the county dataframe, but include them when creating the state dataframe.
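A small sketch of this rule on made-up numbers: the county-level frame filters out the unallocated rows, while the state-level sum keeps them.

```python
import pandas as pd

# Made-up rows: one unallocated row and two counties in one state
toy = pd.DataFrame({
    "State": ["Alabama", "Alabama", "Alabama"],
    "County Name": ["Statewide Unallocated", "Autauga", "Baldwin"],
    "Total Cases": [5, 10, 20],
})

# County-level view: unallocated rows are not a county, so drop them
county_level = toy[toy["County Name"] != "Statewide Unallocated"]

# State-level view: unallocated cases are real, so keep them in the sum
state_totals = toy.groupby("State")["Total Cases"].sum()
```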

#### County Level Data

The cases and deaths data are in wide format, with one column per date, which is awkward to filter and plot.

Unpivot the data with pd.melt into long format: one row per county per date.
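A minimal `pd.melt` sketch on a made-up two-county frame. It selects the date columns by excluding the identifier columns; this is an alternative (not the notebook's approach) to the positional slice `cases.columns[3:-2]`, which depends on the column order left by the earlier merges.

```python
import pandas as pd

# Wide toy frame: one row per county, one column per date (values made up)
wide = pd.DataFrame({
    "County Name": ["Autauga", "Baldwin"],
    "countyFIPS": [1001, 1003],
    "1/22/20": [0, 1],
    "1/23/20": [2, 3],
})

# Every column that is not an identifier is treated as a date column
id_cols = ["County Name", "countyFIPS"]
long = pd.melt(
    wide,
    id_vars=id_cols,
    value_vars=[c for c in wide.columns if c not in id_cols],
    var_name="Date",
    value_name="Cases",
)
```

`melt` stacks one date column at a time, so both counties for 1/22/20 come first, then both for 1/23/20, which matches the row order in the outputs below.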

In [19]:
### Unpivot cases data
cases = pd.melt(cases, id_vars = ['County Name', "State", "StateABV", "countyFIPS", "stateFIPS"],
                 value_vars = cases.columns[3:-2],
                 var_name = "Date", value_name = "Cases")

cases

Unnamed: 0,County Name,State,StateABV,countyFIPS,stateFIPS,Date,Cases
0,Statewide Unallocated,Alabama,AL,0,1,1/22/20,0
1,Autauga,Alabama,AL,1001,1,1/22/20,0
2,Baldwin,Alabama,AL,1003,1,1/22/20,0
3,Barbour,Alabama,AL,1005,1,1/22/20,0
4,Bibb,Alabama,AL,1007,1,1/22/20,0
...,...,...,...,...,...,...,...
590330,Sweetwater,Wyoming,WY,56037,56,7/24/20,210
590331,Teton,Wyoming,WY,56039,56,7/24/20,264
590332,Uinta,Wyoming,WY,56041,56,7/24/20,232
590333,Washakie,Wyoming,WY,56043,56,7/24/20,45


In [20]:
### Unpivot death data
deaths = pd.melt(deaths, id_vars = ['County Name', "State", "StateABV", "countyFIPS", "stateFIPS"],
                 value_vars = list(deaths.columns[3:-2]),
                 var_name = "Date", value_name = "Deaths")

deaths

Unnamed: 0,County Name,State,StateABV,countyFIPS,stateFIPS,Date,Deaths
0,Statewide Unallocated,Alabama,AL,0,1,1/22/20,0
1,Autauga,Alabama,AL,1001,1,1/22/20,0
2,Baldwin,Alabama,AL,1003,1,1/22/20,0
3,Barbour,Alabama,AL,1005,1,1/22/20,0
4,Bibb,Alabama,AL,1007,1,1/22/20,0
...,...,...,...,...,...,...,...
590330,Sweetwater,Wyoming,WY,56037,56,7/24/20,2
590331,Teton,Wyoming,WY,56039,56,7/24/20,1
590332,Uinta,Wyoming,WY,56041,56,7/24/20,0
590333,Washakie,Wyoming,WY,56043,56,7/24/20,5


Combine cases and deaths into one data frame.
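`merge` defaults to an inner join, so a county/date pair missing from either frame would be dropped silently. A sketch of one possible guard, on made-up frames: `validate="one_to_one"` raises `pandas.errors.MergeError` if a key repeats, and comparing row counts catches dropped pairs.

```python
import pandas as pd

keys = ["countyFIPS", "Date"]
cases_long = pd.DataFrame({"countyFIPS": [1001, 1001], "Date": ["1/22/20", "1/23/20"], "Cases": [0, 2]})
deaths_long = pd.DataFrame({"countyFIPS": [1001, 1001], "Date": ["1/22/20", "1/23/20"], "Deaths": [0, 1]})

# Raises MergeError if any county/date pair appears twice on either side
combined = cases_long.merge(deaths_long, on=keys, how="inner", validate="one_to_one")

# An inner join drops unmatched pairs, so check that nothing was lost
assert len(combined) == len(cases_long) == len(deaths_long)
```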

In [21]:
### Merge dataframes
cases_deaths = cases.merge(deaths, on = ["State", "StateABV", "County Name", "Date", "countyFIPS", "stateFIPS"])
cases_deaths

Unnamed: 0,County Name,State,StateABV,countyFIPS,stateFIPS,Date,Cases,Deaths
0,Statewide Unallocated,Alabama,AL,0,1,1/22/20,0,0
1,Autauga,Alabama,AL,1001,1,1/22/20,0,0
2,Baldwin,Alabama,AL,1003,1,1/22/20,0,0
3,Barbour,Alabama,AL,1005,1,1/22/20,0,0
4,Bibb,Alabama,AL,1007,1,1/22/20,0,0
...,...,...,...,...,...,...,...,...
590330,Sweetwater,Wyoming,WY,56037,56,7/24/20,210,2
590331,Teton,Wyoming,WY,56039,56,7/24/20,264,1
590332,Uinta,Wyoming,WY,56041,56,7/24/20,232,0
590333,Washakie,Wyoming,WY,56043,56,7/24/20,45,5


Add population to cases_deaths.

In [22]:
### Merge dataframes
cases_deaths = cases_deaths.merge(population, on = ["countyFIPS","County Name"], how = "left")

### Convert dates (e.g. "1/22/20") to datetime, then sort
cases_deaths["Date"] = pd.to_datetime(cases_deaths["Date"], format = "%m/%d/%y")
cases_deaths = cases_deaths.sort_values(["State","County Name","Date"])

### Rename cases and deaths
cases_deaths = cases_deaths.rename(columns = {"Cases" : "Total Cases",
                                              "Deaths" : "Total Deaths"})

### Reset the index after sorting
cases_deaths = cases_deaths.reset_index(drop = True)
cases_deaths

Unnamed: 0,County Name,State,StateABV,countyFIPS,stateFIPS,Date,Total Cases,Total Deaths,Population
0,Autauga,Alabama,AL,1001,1,2020-01-22,0,0,55869
1,Autauga,Alabama,AL,1001,1,2020-01-23,0,0,55869
2,Autauga,Alabama,AL,1001,1,2020-01-24,0,0,55869
3,Autauga,Alabama,AL,1001,1,2020-01-25,0,0,55869
4,Autauga,Alabama,AL,1001,1,2020-01-26,0,0,55869
...,...,...,...,...,...,...,...,...,...
590330,Weston,Wyoming,WY,56045,56,2020-07-20,4,0,6927
590331,Weston,Wyoming,WY,56045,56,2020-07-21,4,0,6927
590332,Weston,Wyoming,WY,56045,56,2020-07-22,4,0,6927
590333,Weston,Wyoming,WY,56045,56,2020-07-23,4,0,6927


Convert County Name and State to the categorical data type, and the FIPS codes to strings.
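Categorical columns store each distinct label once plus a small integer code per row, which is why memory usage drops in the output below. A quick sketch on made-up data; `deep=True` counts the string objects themselves, which the `+` in `info()` output indicates were not fully counted.

```python
import pandas as pd

# 30,000 made-up rows drawn from only three distinct state names
states = pd.Series(["Alabama", "Alaska", "Arizona"] * 10_000)

object_bytes = states.memory_usage(deep=True)
category_bytes = states.astype("category").memory_usage(deep=True)
```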

In [23]:
cases_deaths.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 590335 entries, 0 to 590334
Data columns (total 9 columns):
 #   Column        Non-Null Count   Dtype         
---  ------        --------------   -----         
 0   County Name   590335 non-null  object        
 1   State         590335 non-null  object        
 2   StateABV      590335 non-null  object        
 3   countyFIPS    590335 non-null  int64         
 4   stateFIPS     590335 non-null  int64         
 5   Date          590335 non-null  datetime64[ns]
 6   Total Cases   590335 non-null  int64         
 7   Total Deaths  590335 non-null  int64         
 8   Population    590335 non-null  int64         
dtypes: datetime64[ns](1), int64(5), object(3)
memory usage: 40.5+ MB


In [24]:
cases_deaths = cases_deaths.astype({"County Name" : "category",
                                    "State" : "category",
                                    "countyFIPS" : "str",
                                    "stateFIPS" : "str"})
cases_deaths.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 590335 entries, 0 to 590334
Data columns (total 9 columns):
 #   Column        Non-Null Count   Dtype         
---  ------        --------------   -----         
 0   County Name   590335 non-null  category      
 1   State         590335 non-null  category      
 2   StateABV      590335 non-null  object        
 3   countyFIPS    590335 non-null  object        
 4   stateFIPS     590335 non-null  object        
 5   Date          590335 non-null  datetime64[ns]
 6   Total Cases   590335 non-null  int64         
 7   Total Deaths  590335 non-null  int64         
 8   Population    590335 non-null  int64         
dtypes: category(2), datetime64[ns](1), int64(3), object(3)
memory usage: 33.3+ MB


##### Fixing countyFIPS labels

The first seven states (Alabama, Alaska, Arizona, Arkansas, California, Colorado, Connecticut) have single-digit state FIPS codes, so their countyFIPS codes need a leading 0 to reach the standard five digits.

Extract the first seven states.
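A vectorized alternative sketch, assuming every real county FIPS code should be exactly five digits: convert to string and left-pad with `str.zfill(5)`. It does not depend on the sort order or on where DC begins. One difference: a `Statewide Unallocated` code of `0` becomes `00000` here, whereas prefixing a single `0` gives `00`.

```python
import pandas as pd

# FIPS codes as they appear before padding
fips = pd.Series([1001, 9015, 11001, 56045])

# Convert to string, then left-pad with zeros to five characters
fips5 = fips.astype(str).str.zfill(5)
```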

In [25]:
### The first seven states end where DC begins (copy so the slice can be modified safely)
firstSix = cases_deaths[:list(cases_deaths["countyFIPS"][cases_deaths["State"] == "DC"].index)[0]].copy()
firstSix

Unnamed: 0,County Name,State,StateABV,countyFIPS,stateFIPS,Date,Total Cases,Total Deaths,Population
0,Autauga,Alabama,AL,1001,1,2020-01-22,0,0,55869
1,Autauga,Alabama,AL,1001,1,2020-01-23,0,0,55869
2,Autauga,Alabama,AL,1001,1,2020-01-24,0,0,55869
3,Autauga,Alabama,AL,1001,1,2020-01-25,0,0,55869
4,Autauga,Alabama,AL,1001,1,2020-01-26,0,0,55869
...,...,...,...,...,...,...,...,...,...
59565,Windham,Connecticut,CT,9015,9,2020-07-20,652,15,116782
59566,Windham,Connecticut,CT,9015,9,2020-07-21,653,15,116782
59567,Windham,Connecticut,CT,9015,9,2020-07-22,655,15,116782
59568,Windham,Connecticut,CT,9015,9,2020-07-23,655,15,116782


Fix FIPS codes.

In [26]:
### Create a new column with the fixed FIPS codes
firstSix.insert(2,"countyFIPS2", '0' + firstSix["countyFIPS"])
firstSix

Unnamed: 0,County Name,State,countyFIPS2,StateABV,countyFIPS,stateFIPS,Date,Total Cases,Total Deaths,Population
0,Autauga,Alabama,01001,AL,1001,1,2020-01-22,0,0,55869
1,Autauga,Alabama,01001,AL,1001,1,2020-01-23,0,0,55869
2,Autauga,Alabama,01001,AL,1001,1,2020-01-24,0,0,55869
3,Autauga,Alabama,01001,AL,1001,1,2020-01-25,0,0,55869
4,Autauga,Alabama,01001,AL,1001,1,2020-01-26,0,0,55869
...,...,...,...,...,...,...,...,...,...,...
59565,Windham,Connecticut,09015,CT,9015,9,2020-07-20,652,15,116782
59566,Windham,Connecticut,09015,CT,9015,9,2020-07-21,653,15,116782
59567,Windham,Connecticut,09015,CT,9015,9,2020-07-22,655,15,116782
59568,Windham,Connecticut,09015,CT,9015,9,2020-07-23,655,15,116782


In [27]:
### Drop the old FIPS codes and rename the new FIPS codes column
firstSix = firstSix.drop(columns = "countyFIPS")
firstSix = firstSix.rename(columns = {"countyFIPS2" : "countyFIPS"})
firstSix

Unnamed: 0,County Name,State,countyFIPS,StateABV,stateFIPS,Date,Total Cases,Total Deaths,Population
0,Autauga,Alabama,01001,AL,1,2020-01-22,0,0,55869
1,Autauga,Alabama,01001,AL,1,2020-01-23,0,0,55869
2,Autauga,Alabama,01001,AL,1,2020-01-24,0,0,55869
3,Autauga,Alabama,01001,AL,1,2020-01-25,0,0,55869
4,Autauga,Alabama,01001,AL,1,2020-01-26,0,0,55869
...,...,...,...,...,...,...,...,...,...
59565,Windham,Connecticut,09015,CT,9,2020-07-20,652,15,116782
59566,Windham,Connecticut,09015,CT,9,2020-07-21,653,15,116782
59567,Windham,Connecticut,09015,CT,9,2020-07-22,655,15,116782
59568,Windham,Connecticut,09015,CT,9,2020-07-23,655,15,116782


Now drop the first seven states from cases_deaths and stack firstSix on top.

In [28]:
firstSixIndex = np.arange(start = 0, stop = list(cases_deaths["countyFIPS"][cases_deaths["State"] == "DC"].index)[0])
cases_deaths = cases_deaths.drop(firstSixIndex)
cases_deaths

Unnamed: 0,County Name,State,StateABV,countyFIPS,stateFIPS,Date,Total Cases,Total Deaths,Population
59570,Washington,DC,DC,11001,11,2020-01-22,0,0,705749
59571,Washington,DC,DC,11001,11,2020-01-23,0,0,705749
59572,Washington,DC,DC,11001,11,2020-01-24,0,0,705749
59573,Washington,DC,DC,11001,11,2020-01-25,0,0,705749
59574,Washington,DC,DC,11001,11,2020-01-26,0,0,705749
...,...,...,...,...,...,...,...,...,...
590330,Weston,Wyoming,WY,56045,56,2020-07-20,4,0,6927
590331,Weston,Wyoming,WY,56045,56,2020-07-21,4,0,6927
590332,Weston,Wyoming,WY,56045,56,2020-07-22,4,0,6927
590333,Weston,Wyoming,WY,56045,56,2020-07-23,4,0,6927


In [29]:
cases_deaths = pd.concat([firstSix,cases_deaths])
cases_deaths

Unnamed: 0,County Name,State,countyFIPS,StateABV,stateFIPS,Date,Total Cases,Total Deaths,Population
0,Autauga,Alabama,01001,AL,1,2020-01-22,0,0,55869
1,Autauga,Alabama,01001,AL,1,2020-01-23,0,0,55869
2,Autauga,Alabama,01001,AL,1,2020-01-24,0,0,55869
3,Autauga,Alabama,01001,AL,1,2020-01-25,0,0,55869
4,Autauga,Alabama,01001,AL,1,2020-01-26,0,0,55869
...,...,...,...,...,...,...,...,...,...
590330,Weston,Wyoming,56045,WY,56,2020-07-20,4,0,6927
590331,Weston,Wyoming,56045,WY,56,2020-07-21,4,0,6927
590332,Weston,Wyoming,56045,WY,56,2020-07-22,4,0,6927
590333,Weston,Wyoming,56045,WY,56,2020-07-23,4,0,6927


In [30]:
cases_deaths.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 590335 entries, 0 to 590334
Data columns (total 9 columns):
 #   Column        Non-Null Count   Dtype         
---  ------        --------------   -----         
 0   County Name   590335 non-null  category      
 1   State         590335 non-null  category      
 2   countyFIPS    590335 non-null  object        
 3   StateABV      590335 non-null  object        
 4   stateFIPS     590335 non-null  object        
 5   Date          590335 non-null  datetime64[ns]
 6   Total Cases   590335 non-null  int64         
 7   Total Deaths  590335 non-null  int64         
 8   Population    590335 non-null  int64         
dtypes: category(2), datetime64[ns](1), int64(3), object(3)
memory usage: 37.8+ MB


Now make a new data frame without "Statewide Unallocated."

In [31]:
cases_deaths2 = cases_deaths[cases_deaths["County Name"] != "Statewide Unallocated"]
cases_deaths2 = cases_deaths2.reset_index(drop = True)
cases_deaths2

Unnamed: 0,County Name,State,countyFIPS,StateABV,stateFIPS,Date,Total Cases,Total Deaths,Population
0,Autauga,Alabama,01001,AL,1,2020-01-22,0,0,55869
1,Autauga,Alabama,01001,AL,1,2020-01-23,0,0,55869
2,Autauga,Alabama,01001,AL,1,2020-01-24,0,0,55869
3,Autauga,Alabama,01001,AL,1,2020-01-25,0,0,55869
4,Autauga,Alabama,01001,AL,1,2020-01-26,0,0,55869
...,...,...,...,...,...,...,...,...,...
581080,Weston,Wyoming,56045,WY,56,2020-07-20,4,0,6927
581081,Weston,Wyoming,56045,WY,56,2020-07-21,4,0,6927
581082,Weston,Wyoming,56045,WY,56,2020-07-22,4,0,6927
581083,Weston,Wyoming,56045,WY,56,2020-07-23,4,0,6927


### State Level Data

Now create a data frame that summarizes the data for each state.
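The per-state loop below rebuilds the Date and label columns by hand. A sketch of the same aggregation as a single `groupby` over state and date, on made-up rows; `observed=True` matters once `State` is categorical, since otherwise pandas may emit rows for unobserved category combinations. This version keeps the column names `Total Cases` and `Total Deaths`, so no later rename would be needed.

```python
import pandas as pd

# Made-up county rows for two states and two dates
county = pd.DataFrame({
    "State": ["Alabama", "Alabama", "Alabama", "Alabama", "Wyoming", "Wyoming"],
    "StateABV": ["AL", "AL", "AL", "AL", "WY", "WY"],
    "stateFIPS": ["1", "1", "1", "1", "56", "56"],
    "Date": pd.to_datetime(["2020-07-22", "2020-07-22", "2020-07-23",
                            "2020-07-23", "2020-07-22", "2020-07-23"]),
    "Total Cases": [10, 20, 11, 22, 5, 6],
    "Total Deaths": [1, 2, 1, 3, 0, 0],
    "Population": [100, 200, 100, 200, 50, 50],
})

# One groupby over every state and date replaces the per-state loop
value_cols = ["Total Cases", "Total Deaths", "Population"]
state_data = (
    county.groupby(["State", "StateABV", "stateFIPS", "Date"], as_index=False, observed=True)[value_cols]
    .sum()
)
```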

In [32]:
### First for Alabama
### Aggregate data
StateData = cases_deaths[cases_deaths['State'] == "Alabama"].groupby("Date").agg(
        TotalCases = pd.NamedAgg(column = "Total Cases", aggfunc = sum),
        TotalDeaths = pd.NamedAgg(column = "Total Deaths", aggfunc = sum),
        Population = pd.NamedAgg(column = "Population", aggfunc = sum))

### Make a vector of the state and its FIPS
state = np.repeat("Alabama", len(cases_deaths["Date"].unique()))
stateABV = np.repeat("AL", len(cases_deaths["Date"].unique()))
statefips = np.repeat('1', len(cases_deaths["Date"].unique()))

### Grab dates
date = cases_deaths["Date"].unique()

### Insert into State Data
StateData.insert(0, "stateFIPS", statefips)
StateData.insert(0, "StateABV", stateABV)
StateData.insert(0, "State", state)
StateData.insert(0, "Date", date)

### Now the rest
for state, fipsNum, stateABV in zip(cases_deaths["State"].unique()[1:], cases_deaths["stateFIPS"].unique()[1:], 
                                    cases_deaths["StateABV"].unique()[1:]) :
    ### Aggregate data
    myStateData = cases_deaths[cases_deaths['State'] == state].groupby("Date").agg(
        TotalCases = pd.NamedAgg(column = "Total Cases", aggfunc = sum),
        TotalDeaths = pd.NamedAgg(column = "Total Deaths", aggfunc = sum),
        Population = pd.NamedAgg(column = "Population", aggfunc = sum))
    
    ### Make a vector of the state/fips and grab dates
    mystate = np.repeat(state, len(cases_deaths["Date"].unique()))
    mystateABV = np.repeat(stateABV, len(cases_deaths["Date"].unique()))
    mystatefips = np.repeat(fipsNum, len(cases_deaths["Date"].unique()))
    mydate = cases_deaths["Date"].unique()
    
    ### Insert data
    myStateData.insert(0, "stateFIPS", mystatefips)
    myStateData.insert(0, "StateABV", mystateABV)
    myStateData.insert(0, "State", mystate)
    myStateData.insert(0, "Date", mydate)
    
    ### Stack the state data frames
    StateData = pd.concat([StateData, myStateData])

### Reset indices
StateData = StateData.set_index(np.arange(0,len(StateData)))

StateData

Unnamed: 0,Date,State,StateABV,stateFIPS,TotalCases,TotalDeaths,Population
0,2020-01-22,Alabama,AL,1,0,0,4903185
1,2020-01-23,Alabama,AL,1,0,0,4903185
2,2020-01-24,Alabama,AL,1,0,0,4903185
3,2020-01-25,Alabama,AL,1,0,0,4903185
4,2020-01-26,Alabama,AL,1,0,0,4903185
...,...,...,...,...,...,...,...
9430,2020-07-20,Wyoming,WY,56,2187,24,578759
9431,2020-07-21,Wyoming,WY,56,2237,25,578759
9432,2020-07-22,Wyoming,WY,56,2287,25,578759
9433,2020-07-23,Wyoming,WY,56,2347,25,578759


### USA Level Data

Now create a data set for the USA.
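Likewise, the USA totals can be sketched as one `groupby` on `Date` over the state-level frame (made-up rows below), with the country label inserted afterwards to match the column order of the output below.

```python
import pandas as pd

# Made-up state-level rows for two dates
state_data = pd.DataFrame({
    "State": ["Alabama", "Wyoming", "Alabama", "Wyoming"],
    "Date": pd.to_datetime(["2020-07-22", "2020-07-22", "2020-07-23", "2020-07-23"]),
    "TotalCases": [30, 5, 33, 6],
    "TotalDeaths": [3, 0, 4, 0],
    "Population": [300, 50, 300, 50],
})

# Sum all states on each date; as_index=False keeps Date as a column
usa_data = state_data.groupby("Date", as_index=False)[["TotalCases", "TotalDeaths", "Population"]].sum()

# Put the country label first
usa_data.insert(0, "Country", "United States")
```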

In [33]:
### First for date
### Aggregate data
USAData = StateData[StateData['Date'] == StateData["Date"].unique()[0]].groupby("Date").agg(
        TotalCases = pd.NamedAgg(column = "TotalCases", aggfunc = sum),
        TotalDeaths = pd.NamedAgg(column = "TotalDeaths", aggfunc = sum),
        Population = pd.NamedAgg(column = "Population", aggfunc = sum))

### Insert into usaData
USAData.insert(0, "Date", StateData["Date"].unique()[0])
USAData.insert(0, "Country", "United States")


### For the rest of dates
for day in StateData["Date"].unique()[1:]:
    ### Aggregate data
    myUSAData = StateData[StateData['Date'] == day].groupby("Date").agg(
        TotalCases = pd.NamedAgg(column = "TotalCases", aggfunc = sum),
        TotalDeaths = pd.NamedAgg(column = "TotalDeaths", aggfunc = sum),
        Population = pd.NamedAgg(column = "Population", aggfunc = sum))
        
    ### Insert date into data
    myUSAData.insert(0, "Date", day)
    myUSAData.insert(0, "Country", "United States")
    
    ### Stack each day's national totals
    USAData = pd.concat([USAData, myUSAData])
    
    

### Reset indices
USAData = USAData.set_index(np.arange(0,len(USAData)))

USAData

|  | Country | Date | TotalCases | TotalDeaths | Population |
|---|---|---|---|---|---|
| 0 | United States | 2020-01-22 | 1 | 0 | 328233889 |
| 1 | United States | 2020-01-23 | 1 | 0 | 328233889 |
| 2 | United States | 2020-01-24 | 2 | 0 | 328233889 |
| 3 | United States | 2020-01-25 | 2 | 0 | 328233889 |
| 4 | United States | 2020-01-26 | 5 | 0 | 328233889 |
| ... | ... | ... | ... | ... | ... |
| 180 | United States | 2020-07-20 | 3804232 | 139775 | 328233889 |
| 181 | United States | 2020-07-21 | 3870905 | 140866 | 328233889 |
| 182 | United States | 2020-07-22 | 3938265 | 141952 | 328233889 |
| 183 | United States | 2020-07-23 | 4006771 | 143116 | 328233889 |
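The per-date loop above rebuilds the national totals one day at a time and concatenates them. A single `groupby` over `Date` should produce the same totals in one pass. This is a minimal sketch under a few assumptions: `usa_totals` is a hypothetical helper name, and it expects `StateData` to still carry the `TotalCases`, `TotalDeaths`, and `Population` columns (that is, before the rename in In [34]).

```python
import pandas as pd

def usa_totals(state_data):
    """Sum the state rows for each date into one national row (hypothetical helper)."""
    usa = (
        state_data
        .groupby("Date", as_index=False)[["TotalCases", "TotalDeaths", "Population"]]
        .sum()
    )
    # Put Country first so the column order matches USAData
    usa.insert(0, "Country", "United States")
    return usa
```

Usage would be `USAData = usa_totals(StateData)`. Because `as_index=False` keeps `Date` as a regular column with a fresh 0..n-1 index, the separate index reset step is not needed.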


The final Total Cases and Total Deaths numbers are only slightly off from the USAFacts totals (give or take about 100), because some data points were removed earlier in preprocessing.

Rename the TotalCases and TotalDeaths columns in the State and USA data to Total Cases and Total Deaths, matching the county data.

In [34]:
### Rename Columns
StateData = StateData.rename(columns = {"TotalCases" : "Total Cases",
                                        "TotalDeaths" : "Total Deaths"})
USAData = USAData.rename(columns = {"TotalCases" : "Total Cases",
                                        "TotalDeaths" : "Total Deaths"})

### New Cases & New Deaths

Use multiprocessing to: 
1) Calculate the number of new cases each day.

2) Calculate the number of new deaths each day.

In [35]:
### Import Pool from multiprocessing
from multiprocessing import Pool

#### County Data

In [36]:
### Create a parallelizing function
def parallel1(data, func, n_cores = 25):
    ### Split the list of states into n_cores roughly equal groups
    splits = np.array_split(data["State"].unique(), n_cores)
    
    ### Build one dataframe per group of states (copies, so func can add columns safely)
    data_split = [data[data["State"].isin(list(split))].copy() for split in splits]
    
    ### Run func on each chunk in its own process, then stack the results in order
    pool = Pool(n_cores)
    data1 = pd.concat(pool.map(func, data_split))
    pool.close()
    pool.join()
    return data1
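`parallel1` hands each worker a group of whole states, so a county's rows are never split across processes. The sketch below (the helper `chunk_states` and the seven-state list are hypothetical) shows how `np.array_split` sizes those groups, and that asking for more chunks than there are states leaves some chunks empty, so the extra worker processes have nothing to do.

```python
import numpy as np

def chunk_states(states, n_chunks):
    """Split state names into n_chunks groups, as np.array_split does in parallel1."""
    return [list(chunk) for chunk in np.array_split(np.asarray(states), n_chunks)]

states = ["Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado", "Connecticut"]
print([len(c) for c in chunk_states(states, 3)])   # [3, 2, 2]
print([len(c) for c in chunk_states(states, 10)])  # [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
```

`np.array_split` gives the earlier chunks one extra element when the split is uneven, and it keeps the original order, so concatenating the chunks recovers the full state list.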

In [37]:
### Define function to create new cases data
def newCases1(data):
    changeInCases = []
    ### For each state.
    for state in data["State"].unique():
        ### For each county in the state
        for county in data["County Name"][data["State"] == state].unique():
            ### Day-over-day change in cases; prepending the first value keeps one entry per day (first day's change is 0)
            changeInCases.extend(np.diff(data["Total Cases"][(data["County Name"] == county) &
                                                                         (data["State"] == state)],
                                             prepend = data["Total Cases"][(data["County Name"] == county) &
                                                                         (data["State"] == state)].iloc[0]))
    ### Add to data
    data["New Cases"] = changeInCases

    return data
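`newCases1` and `newDeaths1` depend on the rows being ordered exactly as the state and county loops visit them, because the differences are collected into a flat list and assigned back by position. A grouped `diff()` computes the same per-county change while keeping each value aligned with its own row. This is a sketch of that swapped-in technique, not the method used in this notebook. `add_daily_change` is a hypothetical name, and the helper assumes rows are sorted by date within each county.

```python
import pandas as pd

def add_daily_change(data, total_col, new_col, keys=("State", "County Name")):
    """Add the day-over-day change of a cumulative column, computed within each group.

    Hypothetical helper. The first day of each group gets 0, matching
    np.diff(..., prepend=first_value). Assumes rows are date-sorted within each group.
    """
    out = data.copy()
    out[new_col] = (
        out.groupby(list(keys))[total_col]
        .diff()             # NaN on each group's first row
        .fillna(0)
        .astype("int64")
    )
    return out
```

For example, `add_daily_change(cases_deaths2, "Total Cases", "New Cases")` works at the county level, and `keys=("State",)` covers the state data. Because the operation is vectorized, it is usually fast enough that a process pool isn't needed.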

In [38]:
cases_deaths2 = parallel1(cases_deaths2, newCases1)
cases_deaths2

|  | County Name | State | countyFIPS | StateABV | stateFIPS | Date | Total Cases | Total Deaths | Population | New Cases |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Autauga | Alabama | 01001 | AL | 1 | 2020-01-22 | 0 | 0 | 55869 | 0 |
| 1 | Autauga | Alabama | 01001 | AL | 1 | 2020-01-23 | 0 | 0 | 55869 | 0 |
| 2 | Autauga | Alabama | 01001 | AL | 1 | 2020-01-24 | 0 | 0 | 55869 | 0 |
| 3 | Autauga | Alabama | 01001 | AL | 1 | 2020-01-25 | 0 | 0 | 55869 | 0 |
| 4 | Autauga | Alabama | 01001 | AL | 1 | 2020-01-26 | 0 | 0 | 55869 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 581080 | Weston | Wyoming | 56045 | WY | 56 | 2020-07-20 | 4 | 0 | 6927 | 0 |
| 581081 | Weston | Wyoming | 56045 | WY | 56 | 2020-07-21 | 4 | 0 | 6927 | 0 |
| 581082 | Weston | Wyoming | 56045 | WY | 56 | 2020-07-22 | 4 | 0 | 6927 | 0 |
| 581083 | Weston | Wyoming | 56045 | WY | 56 | 2020-07-23 | 4 | 0 | 6927 | 0 |


In [39]:
### Define function to create new deaths data
def newDeaths1(data):
    changeInDeaths = []
    ### For each state.
    for state in data["State"].unique():
        ### For each county in the state
        for county in data["County Name"][data["State"] == state].unique():
            ### Day-over-day change in deaths; prepending the first value keeps one entry per day (first day's change is 0)
            changeInDeaths.extend(np.diff(data["Total Deaths"][(data["County Name"] == county) &
                                                                           (data["State"] == state)],
                                             prepend = data["Total Deaths"][(data["County Name"] == county) &
                                                                           (data["State"] == state)].iloc[0]))
            
    ### Add to data
    data["New Deaths"] = changeInDeaths
        
    return data

In [40]:
cases_deaths2 = parallel1(cases_deaths2, newDeaths1)
cases_deaths2

|  | County Name | State | countyFIPS | StateABV | stateFIPS | Date | Total Cases | Total Deaths | Population | New Cases | New Deaths |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Autauga | Alabama | 01001 | AL | 1 | 2020-01-22 | 0 | 0 | 55869 | 0 | 0 |
| 1 | Autauga | Alabama | 01001 | AL | 1 | 2020-01-23 | 0 | 0 | 55869 | 0 | 0 |
| 2 | Autauga | Alabama | 01001 | AL | 1 | 2020-01-24 | 0 | 0 | 55869 | 0 | 0 |
| 3 | Autauga | Alabama | 01001 | AL | 1 | 2020-01-25 | 0 | 0 | 55869 | 0 | 0 |
| 4 | Autauga | Alabama | 01001 | AL | 1 | 2020-01-26 | 0 | 0 | 55869 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 581080 | Weston | Wyoming | 56045 | WY | 56 | 2020-07-20 | 4 | 0 | 6927 | 0 | 0 |
| 581081 | Weston | Wyoming | 56045 | WY | 56 | 2020-07-21 | 4 | 0 | 6927 | 0 | 0 |
| 581082 | Weston | Wyoming | 56045 | WY | 56 | 2020-07-22 | 4 | 0 | 6927 | 0 | 0 |
| 581083 | Weston | Wyoming | 56045 | WY | 56 | 2020-07-23 | 4 | 0 | 6927 | 0 | 0 |


#### State Data

In [41]:
### Create a parallelizing function
def parallel2(data, func, n_cores = 25):
    ### Split the list of states into n_cores roughly equal groups
    splits = np.array_split(data["State"].unique(), n_cores)
    
    ### Build one dataframe per group of states (copies, so func can add columns safely)
    data_split = [data[data["State"].isin(list(split))].copy() for split in splits]
    
    ### Run func on each chunk in its own process, then stack the results in order
    pool = Pool(n_cores)
    data1 = pd.concat(pool.map(func, data_split))
    pool.close()
    pool.join()
    return data1

In [42]:
### Define function to create new cases data
def newCases2(data):
    changeInCases = []
    ### For each state.
    for state in data["State"].unique():
        ### Day-over-day change in cases; prepending the first value keeps one entry per day (first day's change is 0)
        changeInCases.extend(np.diff(data["Total Cases"][data["State"] == state],
                                         prepend = data["Total Cases"][data["State"] == state].iloc[0]))
    ### Add to data
    data["New Cases"] = changeInCases

    return data

In [43]:
StateData = parallel2(StateData, newCases2)
StateData

|  | Date | State | StateABV | stateFIPS | Total Cases | Total Deaths | Population | New Cases |
|---|---|---|---|---|---|---|---|---|
| 0 | 2020-01-22 | Alabama | AL | 1 | 0 | 0 | 4903185 | 0 |
| 1 | 2020-01-23 | Alabama | AL | 1 | 0 | 0 | 4903185 | 0 |
| 2 | 2020-01-24 | Alabama | AL | 1 | 0 | 0 | 4903185 | 0 |
| 3 | 2020-01-25 | Alabama | AL | 1 | 0 | 0 | 4903185 | 0 |
| 4 | 2020-01-26 | Alabama | AL | 1 | 0 | 0 | 4903185 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9430 | 2020-07-20 | Wyoming | WY | 56 | 2187 | 24 | 578759 | 61 |
| 9431 | 2020-07-21 | Wyoming | WY | 56 | 2237 | 25 | 578759 | 50 |
| 9432 | 2020-07-22 | Wyoming | WY | 56 | 2287 | 25 | 578759 | 50 |
| 9433 | 2020-07-23 | Wyoming | WY | 56 | 2347 | 25 | 578759 | 60 |


In [44]:
### Define function to create new deaths data
def newDeaths2(data):
    changeInDeaths = []
    ### For each state.
    for state in data["State"].unique():
        ### Day-over-day change in deaths; prepending the first value keeps one entry per day (first day's change is 0)
        changeInDeaths.extend(np.diff(data["Total Deaths"][data["State"] == state],
                                         prepend = data["Total Deaths"][data["State"] == state].iloc[0]))
            
    ### Add to data
    data["New Deaths"] = changeInDeaths
        
    return data

In [45]:
StateData = parallel2(StateData, newDeaths2)
StateData

|  | Date | State | StateABV | stateFIPS | Total Cases | Total Deaths | Population | New Cases | New Deaths |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-01-22 | Alabama | AL | 1 | 0 | 0 | 4903185 | 0 | 0 |
| 1 | 2020-01-23 | Alabama | AL | 1 | 0 | 0 | 4903185 | 0 | 0 |
| 2 | 2020-01-24 | Alabama | AL | 1 | 0 | 0 | 4903185 | 0 | 0 |
| 3 | 2020-01-25 | Alabama | AL | 1 | 0 | 0 | 4903185 | 0 | 0 |
| 4 | 2020-01-26 | Alabama | AL | 1 | 0 | 0 | 4903185 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9430 | 2020-07-20 | Wyoming | WY | 56 | 2187 | 24 | 578759 | 61 | 0 |
| 9431 | 2020-07-21 | Wyoming | WY | 56 | 2237 | 25 | 578759 | 50 | 1 |
| 9432 | 2020-07-22 | Wyoming | WY | 56 | 2287 | 25 | 578759 | 50 | 0 |
| 9433 | 2020-07-23 | Wyoming | WY | 56 | 2347 | 25 | 578759 | 60 | 0 |


#### USA Data

In [46]:
### New Cases (abs() counts any drop in the cumulative total as a positive change)
USAData["New Cases"] = abs(np.diff(USAData["Total Cases"], prepend = USAData["Total Cases"].iloc[0]))

### New Deaths (same abs() treatment as New Cases)
USAData["New Deaths"] = abs(np.diff(USAData["Total Deaths"], prepend = USAData["Total Deaths"].iloc[0]))

USAData

|  | Country | Date | Total Cases | Total Deaths | Population | New Cases | New Deaths |
|---|---|---|---|---|---|---|---|
| 0 | United States | 2020-01-22 | 1 | 0 | 328233889 | 0 | 0 |
| 1 | United States | 2020-01-23 | 1 | 0 | 328233889 | 0 | 0 |
| 2 | United States | 2020-01-24 | 2 | 0 | 328233889 | 1 | 0 |
| 3 | United States | 2020-01-25 | 2 | 0 | 328233889 | 0 | 0 |
| 4 | United States | 2020-01-26 | 5 | 0 | 328233889 | 3 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 180 | United States | 2020-07-20 | 3804232 | 139775 | 328233889 | 58785 | 445 |
| 181 | United States | 2020-07-21 | 3870905 | 140866 | 328233889 | 66673 | 1091 |
| 182 | United States | 2020-07-22 | 3938265 | 141952 | 328233889 | 67360 | 1086 |
| 183 | United States | 2020-07-23 | 4006771 | 143116 | 328233889 | 68506 | 1164 |
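Unlike the county and state columns, the USA New Cases and New Deaths are wrapped in `abs()`. If a cumulative total ever drops (for example, after a data correction), the signed difference is negative and `abs()` turns it into a positive count. The toy numbers below are hypothetical and only illustrate the effect; whether the national series actually contains such drops isn't shown here.

```python
import numpy as np

# Hypothetical cumulative totals with a downward revision on the third day
cumulative = np.array([5, 7, 6, 9])

# Signed change, as used for the county and state columns
signed = np.diff(cumulative, prepend=cumulative[0])      # [ 0  2 -1  3]
# Absolute change, as used for the USA columns
absolute = np.abs(signed)                                 # [0 2 1 3]

# Signed changes add back up to the net growth (9 - 5 = 4); absolute ones overcount it
print(signed.sum(), absolute.sum())                       # 4 6
```

Keeping the signed version makes the daily counts sum back to the cumulative total, at the cost of occasional negative values.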


### Proportions

County data.

In [47]:
### Percent of population that have cases.
cases_deaths2["%Cases"] = np.where(cases_deaths2["Population"] != 0,
                                   round((cases_deaths2["Total Cases"] / cases_deaths2["Population"]) * 100, 3),
                                   0)

### Percent of population that have died.
cases_deaths2["%Deaths"] = np.where(cases_deaths2["Population"] != 0,
                                    round((cases_deaths2["Total Deaths"] / cases_deaths2["Population"]) * 100, 3),
                                    0)

cases_deaths2

|  | County Name | State | countyFIPS | StateABV | stateFIPS | Date | Total Cases | Total Deaths | Population | New Cases | New Deaths | %Cases | %Deaths |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Autauga | Alabama | 01001 | AL | 1 | 2020-01-22 | 0 | 0 | 55869 | 0 | 0 | 0.000 | 0.0 |
| 1 | Autauga | Alabama | 01001 | AL | 1 | 2020-01-23 | 0 | 0 | 55869 | 0 | 0 | 0.000 | 0.0 |
| 2 | Autauga | Alabama | 01001 | AL | 1 | 2020-01-24 | 0 | 0 | 55869 | 0 | 0 | 0.000 | 0.0 |
| 3 | Autauga | Alabama | 01001 | AL | 1 | 2020-01-25 | 0 | 0 | 55869 | 0 | 0 | 0.000 | 0.0 |
| 4 | Autauga | Alabama | 01001 | AL | 1 | 2020-01-26 | 0 | 0 | 55869 | 0 | 0 | 0.000 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 581080 | Weston | Wyoming | 56045 | WY | 56 | 2020-07-20 | 4 | 0 | 6927 | 0 | 0 | 0.058 | 0.0 |
| 581081 | Weston | Wyoming | 56045 | WY | 56 | 2020-07-21 | 4 | 0 | 6927 | 0 | 0 | 0.058 | 0.0 |
| 581082 | Weston | Wyoming | 56045 | WY | 56 | 2020-07-22 | 4 | 0 | 6927 | 0 | 0 | 0.058 | 0.0 |
| 581083 | Weston | Wyoming | 56045 | WY | 56 | 2020-07-23 | 4 | 0 | 6927 | 0 | 0 | 0.058 | 0.0 |


State data.

In [48]:
### Percent of population that have cases.
StateData["%Cases"] = np.where(StateData["Population"] != 0,
                               round((StateData["Total Cases"] / StateData["Population"]) * 100, 3),
                               0)

### Percent of population that have died.
StateData["%Deaths"] = np.where(StateData["Population"] != 0,
                                round((StateData["Total Deaths"] / StateData["Population"]) * 100, 3),
                                0)

StateData

|  | Date | State | StateABV | stateFIPS | Total Cases | Total Deaths | Population | New Cases | New Deaths | %Cases | %Deaths |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-01-22 | Alabama | AL | 1 | 0 | 0 | 4903185 | 0 | 0 | 0.000 | 0.000 |
| 1 | 2020-01-23 | Alabama | AL | 1 | 0 | 0 | 4903185 | 0 | 0 | 0.000 | 0.000 |
| 2 | 2020-01-24 | Alabama | AL | 1 | 0 | 0 | 4903185 | 0 | 0 | 0.000 | 0.000 |
| 3 | 2020-01-25 | Alabama | AL | 1 | 0 | 0 | 4903185 | 0 | 0 | 0.000 | 0.000 |
| 4 | 2020-01-26 | Alabama | AL | 1 | 0 | 0 | 4903185 | 0 | 0 | 0.000 | 0.000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9430 | 2020-07-20 | Wyoming | WY | 56 | 2187 | 24 | 578759 | 61 | 0 | 0.378 | 0.004 |
| 9431 | 2020-07-21 | Wyoming | WY | 56 | 2237 | 25 | 578759 | 50 | 1 | 0.387 | 0.004 |
| 9432 | 2020-07-22 | Wyoming | WY | 56 | 2287 | 25 | 578759 | 50 | 0 | 0.395 | 0.004 |
| 9433 | 2020-07-23 | Wyoming | WY | 56 | 2347 | 25 | 578759 | 60 | 0 | 0.406 | 0.004 |


Country data.

In [49]:
### Percent of population that have cases.
USAData["%Cases"] = np.where(USAData["Population"] != 0,
                             round((USAData["Total Cases"] / USAData["Population"]) * 100, 3),
                             0)

### Percent of population that have died.
USAData["%Deaths"] = np.where(USAData["Population"] != 0,
                              round((USAData["Total Deaths"] / USAData["Population"]) * 100, 3),
                              0)

USAData

|  | Country | Date | Total Cases | Total Deaths | Population | New Cases | New Deaths | %Cases | %Deaths |
|---|---|---|---|---|---|---|---|---|---|
| 0 | United States | 2020-01-22 | 1 | 0 | 328233889 | 0 | 0 | 0.000 | 0.000 |
| 1 | United States | 2020-01-23 | 1 | 0 | 328233889 | 0 | 0 | 0.000 | 0.000 |
| 2 | United States | 2020-01-24 | 2 | 0 | 328233889 | 1 | 0 | 0.000 | 0.000 |
| 3 | United States | 2020-01-25 | 2 | 0 | 328233889 | 0 | 0 | 0.000 | 0.000 |
| 4 | United States | 2020-01-26 | 5 | 0 | 328233889 | 3 | 0 | 0.000 | 0.000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 180 | United States | 2020-07-20 | 3804232 | 139775 | 328233889 | 58785 | 445 | 1.159 | 0.043 |
| 181 | United States | 2020-07-21 | 3870905 | 140866 | 328233889 | 66673 | 1091 | 1.179 | 0.043 |
| 182 | United States | 2020-07-22 | 3938265 | 141952 | 328233889 | 67360 | 1086 | 1.200 | 0.043 |
| 183 | United States | 2020-07-23 | 4006771 | 143116 | 328233889 | 68506 | 1164 | 1.221 | 0.044 |
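The `np.where` pattern above still divides every row, including rows with zero population (producing `inf` or `NaN` that `np.where` then discards). A minimal alternative sketch uses `np.divide` with its `where=` argument, so those rows are never divided at all. `percent_of_population` is a hypothetical helper, and the zero-population row in the example is made up.

```python
import numpy as np

def percent_of_population(counts, population, decimals=3):
    """Percent of population; rows with zero population stay 0 instead of being divided."""
    counts = np.asarray(counts, dtype=float)
    population = np.asarray(population, dtype=float)
    out = np.zeros_like(counts)                                    # default for zero-population rows
    np.divide(counts, population, out=out, where=population != 0)  # divide only where safe
    return np.round(out * 100, decimals)

# Weston County (4 cases / 6927), a made-up zero-population row, and Wyoming on 2020-07-23
print(percent_of_population([4, 10, 2347], [6927, 0, 578759]))
```

The first and last values agree with the 0.058 and 0.406 shown in the county and state tables. The helper could replace each `np.where` call, e.g. `cases_deaths2["%Cases"] = percent_of_population(cases_deaths2["Total Cases"], cases_deaths2["Population"])`.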


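### Logarithmic Scales

Take the natural log of each count column. Because log(0) is -inf and the log of a negative number is NaN, any day with zero (or negative) counts shows up as -inf (or NaN) in these columns, and NumPy emits a RuntimeWarning when this happens.

County data.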

In [50]:
cases_deaths2["log(Total Cases)"] = round(np.log(cases_deaths2["Total Cases"]), 3)

cases_deaths2["log(Total Deaths)"] = round(np.log(cases_deaths2["Total Deaths"]), 3)

cases_deaths2["log(New Cases)"] = round(np.log(cases_deaths2["New Cases"]), 3)

cases_deaths2["log(New Deaths)"] = round(np.log(cases_deaths2["New Deaths"]), 3)

cases_deaths2



|  | County Name | State | countyFIPS | StateABV | stateFIPS | Date | Total Cases | Total Deaths | Population | New Cases | New Deaths | %Cases | %Deaths | log(Total Cases) | log(Total Deaths) | log(New Cases) | log(New Deaths) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Autauga | Alabama | 01001 | AL | 1 | 2020-01-22 | 0 | 0 | 55869 | 0 | 0 | 0.000 | 0.0 | -inf | -inf | -inf | -inf |
| 1 | Autauga | Alabama | 01001 | AL | 1 | 2020-01-23 | 0 | 0 | 55869 | 0 | 0 | 0.000 | 0.0 | -inf | -inf | -inf | -inf |
| 2 | Autauga | Alabama | 01001 | AL | 1 | 2020-01-24 | 0 | 0 | 55869 | 0 | 0 | 0.000 | 0.0 | -inf | -inf | -inf | -inf |
| 3 | Autauga | Alabama | 01001 | AL | 1 | 2020-01-25 | 0 | 0 | 55869 | 0 | 0 | 0.000 | 0.0 | -inf | -inf | -inf | -inf |
| 4 | Autauga | Alabama | 01001 | AL | 1 | 2020-01-26 | 0 | 0 | 55869 | 0 | 0 | 0.000 | 0.0 | -inf | -inf | -inf | -inf |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 581080 | Weston | Wyoming | 56045 | WY | 56 | 2020-07-20 | 4 | 0 | 6927 | 0 | 0 | 0.058 | 0.0 | 1.386 | -inf | -inf | -inf |
| 581081 | Weston | Wyoming | 56045 | WY | 56 | 2020-07-21 | 4 | 0 | 6927 | 0 | 0 | 0.058 | 0.0 | 1.386 | -inf | -inf | -inf |
| 581082 | Weston | Wyoming | 56045 | WY | 56 | 2020-07-22 | 4 | 0 | 6927 | 0 | 0 | 0.058 | 0.0 | 1.386 | -inf | -inf | -inf |
| 581083 | Weston | Wyoming | 56045 | WY | 56 | 2020-07-23 | 4 | 0 | 6927 | 0 | 0 | 0.058 | 0.0 | 1.386 | -inf | -inf | -inf |


State data.

In [51]:
StateData["log(Total Cases)"] = round(np.log(StateData["Total Cases"]), 3)

StateData["log(Total Deaths)"] = round(np.log(StateData["Total Deaths"]), 3)

StateData["log(New Cases)"] = round(np.log(StateData["New Cases"]), 3)

StateData["log(New Deaths)"] = round(np.log(StateData["New Deaths"]), 3)

StateData



|  | Date | State | StateABV | stateFIPS | Total Cases | Total Deaths | Population | New Cases | New Deaths | %Cases | %Deaths | log(Total Cases) | log(Total Deaths) | log(New Cases) | log(New Deaths) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-01-22 | Alabama | AL | 1 | 0 | 0 | 4903185 | 0 | 0 | 0.000 | 0.000 | -inf | -inf | -inf | -inf |
| 1 | 2020-01-23 | Alabama | AL | 1 | 0 | 0 | 4903185 | 0 | 0 | 0.000 | 0.000 | -inf | -inf | -inf | -inf |
| 2 | 2020-01-24 | Alabama | AL | 1 | 0 | 0 | 4903185 | 0 | 0 | 0.000 | 0.000 | -inf | -inf | -inf | -inf |
| 3 | 2020-01-25 | Alabama | AL | 1 | 0 | 0 | 4903185 | 0 | 0 | 0.000 | 0.000 | -inf | -inf | -inf | -inf |
| 4 | 2020-01-26 | Alabama | AL | 1 | 0 | 0 | 4903185 | 0 | 0 | 0.000 | 0.000 | -inf | -inf | -inf | -inf |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9430 | 2020-07-20 | Wyoming | WY | 56 | 2187 | 24 | 578759 | 61 | 0 | 0.378 | 0.004 | 7.690 | 3.178 | 4.111 | -inf |
| 9431 | 2020-07-21 | Wyoming | WY | 56 | 2237 | 25 | 578759 | 50 | 1 | 0.387 | 0.004 | 7.713 | 3.219 | 3.912 | 0.0 |
| 9432 | 2020-07-22 | Wyoming | WY | 56 | 2287 | 25 | 578759 | 50 | 0 | 0.395 | 0.004 | 7.735 | 3.219 | 3.912 | -inf |
| 9433 | 2020-07-23 | Wyoming | WY | 56 | 2347 | 25 | 578759 | 60 | 0 | 0.406 | 0.004 | 7.761 | 3.219 | 4.094 | -inf |


Country data.

In [52]:
USAData["log(Total Cases)"] = round(np.log(USAData["Total Cases"]), 3)

USAData["log(Total Deaths)"] = round(np.log(USAData["Total Deaths"]), 3)

USAData["log(New Cases)"] = round(np.log(USAData["New Cases"]), 3)

USAData["log(New Deaths)"] = round(np.log(USAData["New Deaths"]), 3)

USAData



|  | Country | Date | Total Cases | Total Deaths | Population | New Cases | New Deaths | %Cases | %Deaths | log(Total Cases) | log(Total Deaths) | log(New Cases) | log(New Deaths) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | United States | 2020-01-22 | 1 | 0 | 328233889 | 0 | 0 | 0.000 | 0.000 | 0.000 | -inf | -inf | -inf |
| 1 | United States | 2020-01-23 | 1 | 0 | 328233889 | 0 | 0 | 0.000 | 0.000 | 0.000 | -inf | -inf | -inf |
| 2 | United States | 2020-01-24 | 2 | 0 | 328233889 | 1 | 0 | 0.000 | 0.000 | 0.693 | -inf | 0.000 | -inf |
| 3 | United States | 2020-01-25 | 2 | 0 | 328233889 | 0 | 0 | 0.000 | 0.000 | 0.693 | -inf | -inf | -inf |
| 4 | United States | 2020-01-26 | 5 | 0 | 328233889 | 3 | 0 | 0.000 | 0.000 | 1.609 | -inf | 1.099 | -inf |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 180 | United States | 2020-07-20 | 3804232 | 139775 | 328233889 | 58785 | 445 | 1.159 | 0.043 | 15.152 | 11.848 | 10.982 | 6.098 |
| 181 | United States | 2020-07-21 | 3870905 | 140866 | 328233889 | 66673 | 1091 | 1.179 | 0.043 | 15.169 | 11.856 | 11.108 | 6.995 |
| 182 | United States | 2020-07-22 | 3938265 | 141952 | 328233889 | 67360 | 1086 | 1.200 | 0.043 | 15.186 | 11.863 | 11.118 | 6.990 |
| 183 | United States | 2020-07-23 | 4006771 | 143116 | 328233889 | 68506 | 1164 | 1.221 | 0.044 | 15.203 | 11.871 | 11.135 | 7.060 |
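Because log(0) is -inf, the log columns above are dominated by -inf in the early weeks, and the log of a negative daily change is NaN. The StateData.info() output below, where log(New Cases) has 9431 non-null values out of 9435, is consistent with a few negative daily changes in the state data. There are two common alternatives, sketched here as hypothetical helpers rather than as the notebook's approach. The first masks non-positive values to NaN before taking the log. The second uses `np.log1p`, i.e. log(1 + x), which maps 0 to 0.

```python
import numpy as np
import pandas as pd

def safe_log(values, decimals=3):
    """Natural log that returns NaN (instead of -inf) for zero or negative counts."""
    s = pd.Series(values, dtype="float64")
    return np.log(s.where(s > 0)).round(decimals)  # non-positive values become NaN first

def log1p_counts(values, decimals=3):
    """log(1 + x): maps 0 to 0; still undefined for values <= -1."""
    return np.log1p(pd.Series(values, dtype="float64")).round(decimals)

print(safe_log([0, 1, 4, -3]).tolist())    # [nan, 0.0, 1.386, nan]
print(log1p_counts([0, 1, 4]).tolist())    # [0.0, 0.693, 1.609]
```

NaN is skipped by most pandas aggregations and plotting functions, whereas -inf usually has to be filtered out by hand. log1p slightly shifts every value, so it is best suited to plotting rather than to reading off exact growth rates.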


### Finalize Cases & Deaths Data

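Check the data types in the State and USA data, then convert the State and Country name columns to category (and store stateFIPS as a string).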

In [53]:
StateData.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 9435 entries, 0 to 9434
Data columns (total 15 columns):
 #   Column             Non-Null Count  Dtype         
---  ------             --------------  -----         
 0   Date               9435 non-null   datetime64[ns]
 1   State              9435 non-null   object        
 2   StateABV           9435 non-null   object        
 3   stateFIPS          9435 non-null   object        
 4   Total Cases        9435 non-null   int64         
 5   Total Deaths       9435 non-null   int64         
 6   Population         9435 non-null   int64         
 7   New Cases          9435 non-null   int64         
 8   New Deaths         9435 non-null   int64         
 9   %Cases             9435 non-null   float64       
 10  %Deaths            9435 non-null   float64       
 11  log(Total Cases)   9435 non-null   float64       
 12  log(Total Deaths)  9435 non-null   float64       
 13  log(New Cases)     9431 non-null   float64       
 14  log(New 

In [54]:
StateData = StateData.astype({"State" : "category",
                              "stateFIPS" : "str"})
StateData.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 9435 entries, 0 to 9434
Data columns (total 15 columns):
 #   Column             Non-Null Count  Dtype         
---  ------             --------------  -----         
 0   Date               9435 non-null   datetime64[ns]
 1   State              9435 non-null   category      
 2   StateABV           9435 non-null   object        
 3   stateFIPS          9435 non-null   object        
 4   Total Cases        9435 non-null   int64         
 5   Total Deaths       9435 non-null   int64         
 6   Population         9435 non-null   int64         
 7   New Cases          9435 non-null   int64         
 8   New Deaths         9435 non-null   int64         
 9   %Cases             9435 non-null   float64       
 10  %Deaths            9435 non-null   float64       
 11  log(Total Cases)   9435 non-null   float64       
 12  log(Total Deaths)  9435 non-null   float64       
 13  log(New Cases)     9431 non-null   float64       
 14  log(New 

In [55]:
USAData.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 185 entries, 0 to 184
Data columns (total 13 columns):
 #   Column             Non-Null Count  Dtype         
---  ------             --------------  -----         
 0   Country            185 non-null    object        
 1   Date               185 non-null    datetime64[ns]
 2   Total Cases        185 non-null    int64         
 3   Total Deaths       185 non-null    int64         
 4   Population         185 non-null    int64         
 5   New Cases          185 non-null    int64         
 6   New Deaths         185 non-null    int64         
 7   %Cases             185 non-null    float64       
 8   %Deaths            185 non-null    float64       
 9   log(Total Cases)   185 non-null    float64       
 10  log(Total Deaths)  185 non-null    float64       
 11  log(New Cases)     185 non-null    float64       
 12  log(New Deaths)    185 non-null    float64       
dtypes: datetime64[ns](1), float64(6), int64(5), object(1)
memory usag

In [56]:
USAData = USAData.astype({"Country" : "category"})
USAData.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 185 entries, 0 to 184
Data columns (total 13 columns):
 #   Column             Non-Null Count  Dtype         
---  ------             --------------  -----         
 0   Country            185 non-null    category      
 1   Date               185 non-null    datetime64[ns]
 2   Total Cases        185 non-null    int64         
 3   Total Deaths       185 non-null    int64         
 4   Population         185 non-null    int64         
 5   New Cases          185 non-null    int64         
 6   New Deaths         185 non-null    int64         
 7   %Cases             185 non-null    float64       
 8   %Deaths            185 non-null    float64       
 9   log(Total Cases)   185 non-null    float64       
 10  log(Total Deaths)  185 non-null    float64       
 11  log(New Cases)     185 non-null    float64       
 12  log(New Deaths)    185 non-null    float64       
dtypes: category(1), datetime64[ns](1), float64(6), int64(5)
memory us
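Converting a column with only a handful of distinct values, such as State or Country, to `category` stores each distinct string once, plus a small integer code per row. One quick way to see the effect is `memory_usage(deep=True)`. The repeated state names below are made-up sample data.

```python
import pandas as pd

# Made-up example: a few state names repeated many times
states = pd.Series(["Alabama", "Alaska", "Wyoming"] * 3000)

as_object = states.memory_usage(deep=True)                      # one Python string per row
as_category = states.astype("category").memory_usage(deep=True)  # codes + 3 unique strings
print(as_object, as_category)
```

The same check can be run on `StateData["State"]` before and after the `astype` call above.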

Save cases_deaths2 as the county-level data, CountyData.

In [57]:
CountyData = cases_deaths2

# Google Mobility Data Preprocessing

### About the Data

The mobility data for this project comes from Google's publicly available COVID-19 Community Mobility Reports.

The data consists of anonymized, aggregated location data.

The data tracks movement trends over time by geography, across different categories of places such as retail and recreation, groceries and pharmacies, parks, transit stations, workplaces, and residential.

Changes for each day are compared to a baseline value for that day of the week:

- The baseline is the median value, for the corresponding day of the week, during the 5-week period Jan 3–Feb 6, 2020 (pre-pandemic).

- __The datasets show trends over several months, with the most recent data being approximately 2-3 days old; this is how long it takes to produce the datasets.__
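Google does not publish the raw visit counts, so the following only illustrates the baseline comparison described above. It assumes the usual percent-change definition, 100 * (value - baseline) / baseline, where the baseline is the median for the same day of the week over Jan 3 to Feb 6, 2020. The `visits` series and the helper name `percent_change_from_baseline` are hypothetical.

```python
import pandas as pd

def percent_change_from_baseline(visits, start="2020-01-03", end="2020-02-06"):
    """visits: hypothetical daily counts in a pd.Series with a sorted DatetimeIndex."""
    window = visits.loc[start:end]
    # Median of the baseline window for each day of the week (0 = Monday ... 6 = Sunday)
    baseline = window.groupby(window.index.dayofweek).median()
    # Look up the matching weekday baseline for every date
    day_baseline = baseline.reindex(visits.index.dayofweek).to_numpy()
    return (visits - day_baseline) / day_baseline * 100
```

Because each weekday gets its own baseline, a Monday is compared only with the median Monday of the baseline period. This is why weekday and weekend values can be read on the same scale.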

<br>

#### Place categories
- Grocery & pharmacy
    - Mobility trends for places like grocery markets, food warehouses, farmers markets, specialty food shops, drug stores, and pharmacies.

- Parks
    - Mobility trends for places like local parks, national parks, public beaches, marinas, dog parks, plazas, and public gardens.

- Transit stations
    - Mobility trends for places like public transport hubs such as subway, bus, and train stations.

- Retail & recreation
    - Mobility trends for places like restaurants, cafes, shopping centers, theme parks, museums, libraries, and movie theaters.

- Residential
    - Mobility trends for places of residence.

- Workplaces
    - Mobility trends for places of work.


<br>

No personally identifiable information, such as an individual’s location, contacts or movement, is made available at any point.

This data will be available for a limited time, as long as public health officials find it useful in their work to stop the spread of COVID-19.

The data can be found here: https://www.google.com/covid19/mobility/

#### Import Data

To ensure that we always have a copy of the data saved locally, the data is written to disk every time it is imported.

In [58]:
### Google Mobility data
!curl https://www.gstatic.com/covid19/mobility/Global_Mobility_Report.csv?cachebust=7d0cb7d254d29111 --output data/mobility.csv

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 46.8M  100 46.8M    0     0  45.3M      0  0:00:01  0:00:01 --:--:-- 45.3M


Load in mobility data.

In [59]:
GoogleMobility = pd.read_csv("data/mobility.csv", dtype = "str")
GoogleMobility

|  | country_region_code | country_region | sub_region_1 | sub_region_2 | iso_3166_2_code | census_fips_code | date | retail_and_recreation_percent_change_from_baseline | grocery_and_pharmacy_percent_change_from_baseline | parks_percent_change_from_baseline | transit_stations_percent_change_from_baseline | workplaces_percent_change_from_baseline | residential_percent_change_from_baseline |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AE | United Arab Emirates |  |  |  |  | 2020-02-15 | 0 | 4 | 5 | 0 | 2 | 1 |
| 1 | AE | United Arab Emirates |  |  |  |  | 2020-02-16 | 1 | 4 | 4 | 1 | 2 | 1 |
| 2 | AE | United Arab Emirates |  |  |  |  | 2020-02-17 | -1 | 1 | 5 | 1 | 2 | 1 |
| 3 | AE | United Arab Emirates |  |  |  |  | 2020-02-18 | -2 | 1 | 5 | 0 | 2 | 1 |
| 4 | AE | United Arab Emirates |  |  |  |  | 2020-02-19 | -2 | 0 | 4 | -1 | 2 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 713523 | ZW | Zimbabwe | Midlands Province |  | ZW-MI |  | 2020-07-17 |  |  |  |  | -1 |  |
| 713524 | ZW | Zimbabwe | Midlands Province |  | ZW-MI |  | 2020-07-18 |  |  |  |  | 24 |  |
| 713525 | ZW | Zimbabwe | Midlands Province |  | ZW-MI |  | 2020-07-19 |  |  |  |  | 16 |  |
| 713526 | ZW | Zimbabwe | Midlands Province |  | ZW-MI |  | 2020-07-20 |  |  |  |  | -4 |  |


This dataset contains worldwide data. Filter out everything that is not the United States.

In [60]:
### Keep only US
GoogleMobility = GoogleMobility[GoogleMobility["country_region_code"] == "US"]
GoogleMobility

|  | country_region_code | country_region | sub_region_1 | sub_region_2 | iso_3166_2_code | census_fips_code | date | retail_and_recreation_percent_change_from_baseline | grocery_and_pharmacy_percent_change_from_baseline | parks_percent_change_from_baseline | transit_stations_percent_change_from_baseline | workplaces_percent_change_from_baseline | residential_percent_change_from_baseline |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 276618 | US | United States |  |  |  |  | 2020-02-15 | 6 | 2 | 15 | 3 | 2 | -1 |
| 276619 | US | United States |  |  |  |  | 2020-02-16 | 7 | 1 | 16 | 2 | 0 | -1 |
| 276620 | US | United States |  |  |  |  | 2020-02-17 | 6 | 0 | 28 | -9 | -24 | 5 |
| 276621 | US | United States |  |  |  |  | 2020-02-18 | 0 | -1 | 6 | 1 | 0 | 1 |
| 276622 | US | United States |  |  |  |  | 2020-02-19 | 2 | 0 | 8 | 1 | 1 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 696675 | US | United States | Wyoming | Weston County |  | 56045 | 2020-07-15 |  |  |  |  | -27 |  |
| 696676 | US | United States | Wyoming | Weston County |  | 56045 | 2020-07-16 |  |  |  |  | -24 |  |
| 696677 | US | United States | Wyoming | Weston County |  | 56045 | 2020-07-17 |  |  |  |  | -26 |  |
| 696678 | US | United States | Wyoming | Weston County |  | 56045 | 2020-07-20 |  |  |  |  | -29 |  |


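The data can be separated into country, state, and county levels based on which sub-region columns are filled in. Country rows have no sub_region_1, state rows have sub_region_1 but no sub_region_2, and county rows have sub_region_2.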

In [61]:
### Mobility data for whole country (no state or county listed)
GoogleUsaMobility = GoogleMobility[GoogleMobility["sub_region_1"].isnull()]

### Mobility data for states (state listed, no county)
GoogleStateMobility = GoogleMobility[GoogleMobility["sub_region_1"].notnull() & GoogleMobility["sub_region_2"].isnull()]

### Mobility data for counties (county listed)
GoogleCountyMobility = GoogleMobility[GoogleMobility["sub_region_2"].notnull()]
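The three filters above can also be expressed as a single level label per row, which makes it easy to check that every row lands in exactly one level. Below is a sketch with a hypothetical helper `mobility_level`. It uses `np.select`, which picks the first condition that is true for each row.

```python
import numpy as np
import pandas as pd

def mobility_level(df):
    """Label each US mobility row as 'country', 'state', or 'county' (hypothetical helper)."""
    no_state = df["sub_region_1"].isna().to_numpy()
    no_county = df["sub_region_2"].isna().to_numpy()
    # np.select picks the first condition that is True for each row
    labels = np.select([no_state, no_county], ["country", "state"], default="county")
    return pd.Series(labels, index=df.index)
```

For example, `mobility_level(GoogleMobility).value_counts()` gives the number of rows at each level, and those counts should match the lengths of the three dataframes above.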

In [62]:
GoogleUsaMobility

|  | country_region_code | country_region | sub_region_1 | sub_region_2 | iso_3166_2_code | census_fips_code | date | retail_and_recreation_percent_change_from_baseline | grocery_and_pharmacy_percent_change_from_baseline | parks_percent_change_from_baseline | transit_stations_percent_change_from_baseline | workplaces_percent_change_from_baseline | residential_percent_change_from_baseline |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 276618 | US | United States |  |  |  |  | 2020-02-15 | 6 | 2 | 15 | 3 | 2 | -1 |
| 276619 | US | United States |  |  |  |  | 2020-02-16 | 7 | 1 | 16 | 2 | 0 | -1 |
| 276620 | US | United States |  |  |  |  | 2020-02-17 | 6 | 0 | 28 | -9 | -24 | 5 |
| 276621 | US | United States |  |  |  |  | 2020-02-18 | 0 | -1 | 6 | 1 | 0 | 1 |
| 276622 | US | United States |  |  |  |  | 2020-02-19 | 2 | 0 | 8 | 1 | 1 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 276771 | US | United States |  |  |  |  | 2020-07-17 | -17 | -3 | 54 | -29 | -37 | 11 |
| 276772 | US | United States |  |  |  |  | 2020-07-18 | -19 | -1 | 74 | -21 | -13 | 4 |
| 276773 | US | United States |  |  |  |  | 2020-07-19 | -17 | -4 | 60 | -24 | -16 | 3 |
| 276774 | US | United States |  |  |  |  | 2020-07-20 | -14 | -4 | 51 | -32 | -38 | 10 |


In [63]:
GoogleStateMobility

|  | country_region_code | country_region | sub_region_1 | sub_region_2 | iso_3166_2_code | census_fips_code | date | retail_and_recreation_percent_change_from_baseline | grocery_and_pharmacy_percent_change_from_baseline | parks_percent_change_from_baseline | transit_stations_percent_change_from_baseline | workplaces_percent_change_from_baseline | residential_percent_change_from_baseline |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 276776 | US | United States | Alabama |  | US-AL |  | 2020-02-15 | 5 | 2 | 39 | 7 | 2 | -1 |
| 276777 | US | United States | Alabama |  | US-AL |  | 2020-02-16 | 0 | -2 | -7 | 3 | -1 | 1 |
| 276778 | US | United States | Alabama |  | US-AL |  | 2020-02-17 | 3 | 0 | 17 | 7 | -17 | 4 |
| 276779 | US | United States | Alabama |  | US-AL |  | 2020-02-18 | -4 | -3 | -11 | -1 | 1 | 2 |
| 276780 | US | United States | Alabama |  | US-AL |  | 2020-02-19 | 4 | 1 | 6 | 4 | 1 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 693708 | US | United States | Wyoming |  | US-WY |  | 2020-07-17 | 7 | 33 |  | 49 | -26 | 4 |
| 693709 | US | United States | Wyoming |  | US-WY |  | 2020-07-18 | 5 | 32 | 343 | 55 | -9 | -1 |
| 693710 | US | United States | Wyoming |  | US-WY |  | 2020-07-19 | 17 | 36 | 320 | 45 | -10 | -2 |
| 693711 | US | United States | Wyoming |  | US-WY |  | 2020-07-20 | 18 | 30 | 330 | 38 | -25 | 3 |


In [64]:
GoogleCountyMobility

|  | country_region_code | country_region | sub_region_1 | sub_region_2 | iso_3166_2_code | census_fips_code | date | retail_and_recreation_percent_change_from_baseline | grocery_and_pharmacy_percent_change_from_baseline | parks_percent_change_from_baseline | transit_stations_percent_change_from_baseline | workplaces_percent_change_from_baseline | residential_percent_change_from_baseline |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 276934 | US | United States | Alabama | Autauga County |  | 01001 | 2020-02-15 | 5 | 7 |  |  | -4 |  |
| 276935 | US | United States | Alabama | Autauga County |  | 01001 | 2020-02-16 | 0 | 1 | -23 |  | -4 |  |
| 276936 | US | United States | Alabama | Autauga County |  | 01001 | 2020-02-17 | 8 | 0 |  |  | -27 | 5 |
| 276937 | US | United States | Alabama | Autauga County |  | 01001 | 2020-02-18 | -2 | 0 |  |  | 2 | 0 |
| 276938 | US | United States | Alabama | Autauga County |  | 01001 | 2020-02-19 | -2 | 0 |  |  | 2 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 696675 | US | United States | Wyoming | Weston County |  | 56045 | 2020-07-15 |  |  |  |  | -27 |  |
| 696676 | US | United States | Wyoming | Weston County |  | 56045 | 2020-07-16 |  |  |  |  | -24 |  |
| 696677 | US | United States | Wyoming | Weston County |  | 56045 | 2020-07-17 |  |  |  |  | -26 |  |
| 696678 | US | United States | Wyoming | Weston County |  | 56045 | 2020-07-20 |  |  |  |  | -29 |  |


We can drop some unnecessary columns from each level's dataframe.

In [65]:
### Drop columns from usaMobility
GoogleUsaMobility = GoogleUsaMobility.drop(columns = ["country_region_code", "sub_region_1",
                                          "sub_region_2", "iso_3166_2_code",
                                          "census_fips_code"])

### Drop columns from stateMobility
GoogleStateMobility = GoogleStateMobility.drop(columns = ["country_region_code", "country_region", 
                                              "sub_region_2", "iso_3166_2_code", 
                                              "census_fips_code"])

### Drop columns from countyMobility
GoogleCountyMobility = GoogleCountyMobility.drop(columns = ["country_region_code", "country_region",
                                                "sub_region_1", "iso_3166_2_code"])

In [66]:
GoogleUsaMobility

|  | country_region | date | retail_and_recreation_percent_change_from_baseline | grocery_and_pharmacy_percent_change_from_baseline | parks_percent_change_from_baseline | transit_stations_percent_change_from_baseline | workplaces_percent_change_from_baseline | residential_percent_change_from_baseline |
|---|---|---|---|---|---|---|---|---|
| 276618 | United States | 2020-02-15 | 6 | 2 | 15 | 3 | 2 | -1 |
| 276619 | United States | 2020-02-16 | 7 | 1 | 16 | 2 | 0 | -1 |
| 276620 | United States | 2020-02-17 | 6 | 0 | 28 | -9 | -24 | 5 |
| 276621 | United States | 2020-02-18 | 0 | -1 | 6 | 1 | 0 | 1 |
| 276622 | United States | 2020-02-19 | 2 | 0 | 8 | 1 | 1 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 276771 | United States | 2020-07-17 | -17 | -3 | 54 | -29 | -37 | 11 |
| 276772 | United States | 2020-07-18 | -19 | -1 | 74 | -21 | -13 | 4 |
| 276773 | United States | 2020-07-19 | -17 | -4 | 60 | -24 | -16 | 3 |
| 276774 | United States | 2020-07-20 | -14 | -4 | 51 | -32 | -38 | 10 |


In [67]:
GoogleStateMobility

|  | sub_region_1 | date | retail_and_recreation_percent_change_from_baseline | grocery_and_pharmacy_percent_change_from_baseline | parks_percent_change_from_baseline | transit_stations_percent_change_from_baseline | workplaces_percent_change_from_baseline | residential_percent_change_from_baseline |
|---|---|---|---|---|---|---|---|---|
| 276776 | Alabama | 2020-02-15 | 5 | 2 | 39 | 7 | 2 | -1 |
| 276777 | Alabama | 2020-02-16 | 0 | -2 | -7 | 3 | -1 | 1 |
| 276778 | Alabama | 2020-02-17 | 3 | 0 | 17 | 7 | -17 | 4 |
| 276779 | Alabama | 2020-02-18 | -4 | -3 | -11 | -1 | 1 | 2 |
| 276780 | Alabama | 2020-02-19 | 4 | 1 | 6 | 4 | 1 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 693708 | Wyoming | 2020-07-17 | 7 | 33 |  | 49 | -26 | 4 |
| 693709 | Wyoming | 2020-07-18 | 5 | 32 | 343 | 55 | -9 | -1 |
| 693710 | Wyoming | 2020-07-19 | 17 | 36 | 320 | 45 | -10 | -2 |
| 693711 | Wyoming | 2020-07-20 | 18 | 30 | 330 | 38 | -25 | 3 |


In [68]:
GoogleCountyMobility

Unnamed: 0,sub_region_2,census_fips_code,date,retail_and_recreation_percent_change_from_baseline,grocery_and_pharmacy_percent_change_from_baseline,parks_percent_change_from_baseline,transit_stations_percent_change_from_baseline,workplaces_percent_change_from_baseline,residential_percent_change_from_baseline
276934,Autauga County,01001,2020-02-15,5,7,,,-4,
276935,Autauga County,01001,2020-02-16,0,1,-23,,-4,
276936,Autauga County,01001,2020-02-17,8,0,,,-27,5
276937,Autauga County,01001,2020-02-18,-2,0,,,2,0
276938,Autauga County,01001,2020-02-19,-2,0,,,2,0
...,...,...,...,...,...,...,...,...,...
696675,Weston County,56045,2020-07-15,,,,,-27,
696676,Weston County,56045,2020-07-16,,,,,-24,
696677,Weston County,56045,2020-07-17,,,,,-26,
696678,Weston County,56045,2020-07-20,,,,,-29,


Now rename the columns to be more usable and to match the naming convention of the COVID-19 data.

Also convert Date to datetime64.

In [69]:
### Renames shared by all three mobility tables
mobilityRenames = {"date" : "Date",
                   "retail_and_recreation_percent_change_from_baseline" : "%Retail/Rec Change",
                   "grocery_and_pharmacy_percent_change_from_baseline" : "%Grocery/Pharm Change",
                   "parks_percent_change_from_baseline" : "%Parks Change",
                   "transit_stations_percent_change_from_baseline" : "%Transit Change",
                   "workplaces_percent_change_from_baseline" : "%Workplace Change",
                   "residential_percent_change_from_baseline" : "%Residential Change"}

### Rename usaMobility columns
### (pd.to_datetime is used because pandas >= 2.0 rejects a unit-less astype("datetime64"))
GoogleUsaMobility = GoogleUsaMobility.rename(columns = {"country_region" : "Country", **mobilityRenames})
GoogleUsaMobility["Date"] = pd.to_datetime(GoogleUsaMobility["Date"])


### Rename stateMobility columns
GoogleStateMobility = GoogleStateMobility.rename(columns = {"sub_region_1" : "State", **mobilityRenames})
GoogleStateMobility["Date"] = pd.to_datetime(GoogleStateMobility["Date"])


### Rename countyMobility columns
GoogleCountyMobility = GoogleCountyMobility.rename(columns = {"sub_region_2" : "County Name",
                                                              "census_fips_code" : "countyFIPS",
                                                              **mobilityRenames})
GoogleCountyMobility["Date"] = pd.to_datetime(GoogleCountyMobility["Date"])


In [70]:
GoogleUsaMobility

Unnamed: 0,Country,Date,%Retail/Rec Change,%Grocery/Pharm Change,%Parks Change,%Transit Change,%Workplace Change,%Residential Change
276618,United States,2020-02-15,6,2,15,3,2,-1
276619,United States,2020-02-16,7,1,16,2,0,-1
276620,United States,2020-02-17,6,0,28,-9,-24,5
276621,United States,2020-02-18,0,-1,6,1,0,1
276622,United States,2020-02-19,2,0,8,1,1,0
...,...,...,...,...,...,...,...,...
276771,United States,2020-07-17,-17,-3,54,-29,-37,11
276772,United States,2020-07-18,-19,-1,74,-21,-13,4
276773,United States,2020-07-19,-17,-4,60,-24,-16,3
276774,United States,2020-07-20,-14,-4,51,-32,-38,10


In [71]:
GoogleStateMobility

Unnamed: 0,State,Date,%Retail/Rec Change,%Grocery/Pharm Change,%Parks Change,%Transit Change,%Workplace Change,%Residential Change
276776,Alabama,2020-02-15,5,2,39,7,2,-1
276777,Alabama,2020-02-16,0,-2,-7,3,-1,1
276778,Alabama,2020-02-17,3,0,17,7,-17,4
276779,Alabama,2020-02-18,-4,-3,-11,-1,1,2
276780,Alabama,2020-02-19,4,1,6,4,1,0
...,...,...,...,...,...,...,...,...
693708,Wyoming,2020-07-17,7,33,,49,-26,4
693709,Wyoming,2020-07-18,5,32,343,55,-9,-1
693710,Wyoming,2020-07-19,17,36,320,45,-10,-2
693711,Wyoming,2020-07-20,18,30,330,38,-25,3


In [72]:
GoogleCountyMobility

Unnamed: 0,County Name,countyFIPS,Date,%Retail/Rec Change,%Grocery/Pharm Change,%Parks Change,%Transit Change,%Workplace Change,%Residential Change
276934,Autauga County,01001,2020-02-15,5,7,,,-4,
276935,Autauga County,01001,2020-02-16,0,1,-23,,-4,
276936,Autauga County,01001,2020-02-17,8,0,,,-27,5
276937,Autauga County,01001,2020-02-18,-2,0,,,2,0
276938,Autauga County,01001,2020-02-19,-2,0,,,2,0
...,...,...,...,...,...,...,...,...,...
696675,Weston County,56045,2020-07-15,,,,,-27,
696676,Weston County,56045,2020-07-16,,,,,-24,
696677,Weston County,56045,2020-07-17,,,,,-26,
696678,Weston County,56045,2020-07-20,,,,,-29,
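The countyFIPS column above is stored as a zero-padded string (`01001`). Suppose the COVID-19 county table it will later be joined with stores countyFIPS as an integer (`1001`). This is an assumption, so check that table's dtype. In that case a merge on countyFIPS silently finds no matches for any code with a leading zero. The following is a minimal sketch of normalizing both sides to the same 5-character form. `normalize_fips` is a hypothetical helper, not part of pandas.

```python
import pandas as pd


def normalize_fips(fips: pd.Series) -> pd.Series:
    """Return county FIPS codes as 5-character, zero-padded strings.

    Hypothetical helper: accepts ints (1001), floats (1001.0, e.g. after NaNs
    forced a float column) or strings ("01001"). Missing or non-numeric
    values come back as missing.
    """
    as_num = pd.to_numeric(fips, errors="coerce")  # non-numeric -> NaN
    return as_num.map(lambda x: f"{int(x):05d}" if pd.notna(x) else None)


print(normalize_fips(pd.Series([1001, 56045])).tolist())       # ['01001', '56045']
print(normalize_fips(pd.Series(["01001", "56045"])).tolist())  # ['01001', '56045']
```

For example, apply `normalize_fips` to the countyFIPS column of both tables before merging them.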


In [73]:
### Re-label District of Columbia as DC (a single .loc assignment avoids chained indexing
### and the SettingWithCopyWarning it triggers)
GoogleStateMobility.loc[GoogleStateMobility["State"] == "District of Columbia", "State"] = "DC"


# Deaths by Sex & Age (Country, State Level)

This data comes from the Centers for Disease Control and Prevention (CDC) and is provided by the National Center for Health Statistics (NCHS).

It contains COVID-19 death counts aggregated by country, state, sex, and age group.

__NOTE__: "Number of deaths reported in this table are the total number of deaths received and coded as of the date of analysis, and do not represent all deaths that occurred in that period. Data during this period are incomplete because of the lag in time between when the death occurred and when the death certificate is completed, submitted to NCHS and processed for reporting purposes. This delay can range from 1 week to 8 weeks or more."

__NOTE__: One or more data cells have counts between 1–9 and have been suppressed in accordance with NCHS confidentiality standards.
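Suppressed cells are withheld rather than set to zero. A total computed with `sum()`, which skips NaN, therefore undercounts. The following is a minimal sketch that brackets the true total. It assumes that every NaN in a column such as COVID-19 Deaths is a suppressed count between 1 and 9, although a NaN could also mean the value is missing for some other reason. `suppression_bounds` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def suppression_bounds(counts: pd.Series) -> tuple:
    """Return (lower, upper) bounds on the total of `counts`.

    Assumes each NaN is a suppressed count in the range 1-9.
    """
    lower = counts.fillna(1).sum()  # every suppressed cell at its minimum
    upper = counts.fillna(9).sum()  # every suppressed cell at its maximum
    return float(lower), float(upper)


deaths = pd.Series([11.0, np.nan, 16.0, np.nan])
print(deaths.sum())                # 27.0 (NaN skipped)
print(suppression_bounds(deaths))  # (29.0, 45.0)
```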

Data can be found here: 
* Centers for Disease Control and Prevention. *Provisional COVID-19 Death Counts by Sex, Age, and State*. April 2020. <https://data.cdc.gov/NCHS/Provisional-COVID-19-Death-Counts-by-Sex-Age-and-S/9bhg-hcku>

In [74]:
### Go grab data
!curl "https://data.cdc.gov/api/views/9bhg-hcku/rows.csv?accessType=DOWNLOAD" --output data/sexage.csv

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  215k    0  215k    0     0   260k      0 --:--:-- --:--:-- --:--:--  259k


In [75]:
### Read in data
DeathsSexAge = pd.read_csv("data/sexage.csv")
DeathsSexAge

Unnamed: 0,Data as of,Start week,End Week,State,Sex,Age group,COVID-19 Deaths,Total Deaths,Pneumonia Deaths,Pneumonia and COVID-19 Deaths,Influenza Deaths,"Pneumonia, Influenza, or COVID-19 Deaths",Footnote
0,07/22/2020,02/01/2020,07/11/2020,United States,All,Under 1 year,11.0,8072.0,75.0,2.0,14.0,98.0,
1,07/22/2020,02/01/2020,07/11/2020,United States,All,1-4 years,9.0,1547.0,52.0,2.0,41.0,100.0,
2,07/22/2020,02/01/2020,07/11/2020,United States,All,5-14 years,16.0,2379.0,78.0,5.0,49.0,138.0,
3,07/22/2020,02/01/2020,07/11/2020,United States,All,15-24 years,190.0,14810.0,300.0,62.0,51.0,475.0,
4,07/22/2020,02/01/2020,07/11/2020,United States,All,25-34 years,935.0,30885.0,1113.0,416.0,149.0,1768.0,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
1411,07/22/2020,02/01/2020,07/11/2020,Puerto Rico,Female,75-84 years,0.0,0.0,0.0,0.0,0.0,0.0,
1412,07/22/2020,02/01/2020,07/11/2020,Puerto Rico,Female,85 years and over,0.0,0.0,0.0,0.0,0.0,0.0,
1413,07/22/2020,02/01/2020,07/11/2020,Puerto Rico,Female,All ages,,434.0,44.0,,,52.0,One or more data cells have counts between 1–9...
1414,07/22/2020,02/01/2020,07/11/2020,Puerto Rico,Unknown,All ages,0.0,,0.0,0.0,0.0,0.0,One or more data cells have counts between 1–9...


Only the COVID-19 Deaths column is needed, so drop the other death-count columns and the Footnote.

In [76]:
DeathsSexAge = DeathsSexAge.drop(columns = ["Total Deaths",
                                            "Pneumonia Deaths",
                                            "Pneumonia and COVID-19 Deaths",
                                            "Influenza Deaths", 
                                            "Pneumonia, Influenza, or COVID-19 Deaths",
                                            "Footnote"])
DeathsSexAge

Unnamed: 0,Data as of,Start week,End Week,State,Sex,Age group,COVID-19 Deaths
0,07/22/2020,02/01/2020,07/11/2020,United States,All,Under 1 year,11.0
1,07/22/2020,02/01/2020,07/11/2020,United States,All,1-4 years,9.0
2,07/22/2020,02/01/2020,07/11/2020,United States,All,5-14 years,16.0
3,07/22/2020,02/01/2020,07/11/2020,United States,All,15-24 years,190.0
4,07/22/2020,02/01/2020,07/11/2020,United States,All,25-34 years,935.0
...,...,...,...,...,...,...,...
1411,07/22/2020,02/01/2020,07/11/2020,Puerto Rico,Female,75-84 years,0.0
1412,07/22/2020,02/01/2020,07/11/2020,Puerto Rico,Female,85 years and over,0.0
1413,07/22/2020,02/01/2020,07/11/2020,Puerto Rico,Female,All ages,
1414,07/22/2020,02/01/2020,07/11/2020,Puerto Rico,Unknown,All ages,0.0


In [77]:
### Drop Puerto Rico and Puerto Rico Total
PRindex = DeathsSexAge.index[DeathsSexAge["State"].isin(["Puerto Rico", "Puerto Rico Total"])]
DeathsSexAge = DeathsSexAge.drop(index = PRindex)

### Rename DC (a single .loc assignment avoids chained indexing)
DeathsSexAge.loc[DeathsSexAge["State"] == "District of Columbia", "State"] = "DC"

DeathsSexAge["State"].unique()


array(['United States', 'United States Total', 'Alabama', 'Alabama Total',
       'Alaska', 'Alaska Total', 'Arizona', 'Arizona Total', 'Arkansas',
       'Arkansas Total', 'California', 'California Total', 'Colorado',
       'Colorado Total', 'Connecticut', 'Connecticut Total', 'Delaware',
       'Delaware Total', 'DC', 'District of Columbia Total', 'Florida',
       'Florida Total', 'Georgia', 'Georgia Total', 'Hawaii',
       'Hawaii Total', 'Idaho', 'Idaho Total', 'Illinois',
       'Illinois Total', 'Indiana', 'Indiana Total', 'Iowa', 'Iowa Total',
       'Kansas', 'Kansas Total', 'Kentucky', 'Kentucky Total',
       'Louisiana', 'Louisiana Total', 'Maine', 'Maine Total', 'Maryland',
       'Maryland Total', 'Massachusetts', 'Massachusetts Total',
       'Michigan', 'Michigan Total', 'Minnesota', 'Minnesota Total',
       'Mississippi', 'Mississippi Total', 'Missouri', 'Missouri Total',
       'Montana', 'Montana Total', 'Nebraska', 'Nebraska Total', 'Nevada',
       'Nevada Total

# Deaths by Race (Country, State level)

This data comes from the Centers for Disease Control and Prevention (CDC) and is provided by the National Center for Health Statistics (NCHS).

It contains COVID-19 death data aggregated by country, state, and race/Hispanic origin.

__NOTE__: "The percent of deaths reported in this table are the total number of represent all deaths received and coded as of the date of analysis and do not represent all deaths that occurred in that period. Data are incomplete because of the lag in time between when the death occurred and when the death certificate is completed, submitted to NCHS and processed for reporting purposes. This delay can range from 1 week to 8 weeks or more, depending on the jurisdiction, age, and cause of death. Provisional counts reported here track approximately 1–2 weeks behind other published data sources on the number of COVID-19 deaths in the U.S. COVID-19 deaths are defined as having confirmed or presumed COVID-19, and are coded to ICD–10 code U07.1."

"Unweighted population percentages are based on the Single-Race Population Estimates from the U.S. Census Bureau, for the year 2018 (available from: https://wonder.cdc.gov/single-race-population.html )."

"Weighted population percentages are computed by multiplying county-level population counts by the count of COVID deaths for each county, summing to the state-level, and then estimating the percent of the population within each racial and ethnic group. These weighted population distributions therefore more accurately reflect the geographic locations where COVID outbreaks are occurring. Jurisdictions are included in this table if more than 100 deaths were received and processed by NCHS as of the data of analysis."

Table footnotes:

1. "Race and Hispanic-origin categories are based on the 1997 Office of Management and Budget (OMB) standards (1,2), allowing for the presentation of data by single race and Hispanic origin. These race and Hispanic-origin groups—non-Hispanic single-race white, non-Hispanic single-race black or African American, non-Hispanic single-race American Indian or Alaska Native (AIAN), and non-Hispanic single-race Asian—differ from the bridged-race categories shown in most reports using mortality data."
2. "Includes persons having origins in any of the original peoples of North and South America"
3. "Includes persons having origins in any of the original peoples of the Far East, Southeast Asia, or the Indian subcontinent."
4. "Includes Native Hawaiian and Other Pacific Islander, more than one race, race unknown, and Hispanic origin unknown"
5. "Excludes New York City."

__NOTE__: One or more data cells have counts between 1–9 and have been suppressed in accordance with NCHS confidentiality standards.
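The weighted population percentage described above can be sketched as follows. For each group, weight every county's population in that group by the county's COVID-19 death count, sum over the state's counties, and normalize to 100. This is a reading of the CDC description, not CDC's code. The county data and column names below are hypothetical, with made-up numbers.

```python
import pandas as pd


def weighted_distribution(county: pd.DataFrame, groups: list) -> pd.Series:
    """Death-weighted population distribution (%) for one state.

    `county` has one row per county: a 'deaths' column plus one population
    column per group listed in `groups`.
    """
    # Weight each county's group populations by that county's deaths
    weighted = county[groups].mul(county["deaths"], axis=0)
    # Sum to the state level, then express each group as a percentage
    totals = weighted.sum()
    return 100 * totals / totals.sum()


# Hypothetical state with two counties
groups = ["GroupA", "GroupB"]
toy = pd.DataFrame({"deaths": [90, 10],
                    "GroupA": [1000, 9000],
                    "GroupB": [1000, 1000]})

unweighted = 100 * toy[groups].sum() / toy[groups].sum().sum()
print(unweighted.round(1).tolist())                          # [83.3, 16.7]
print(weighted_distribution(toy, groups).round(1).tolist())  # [64.3, 35.7]
```

GroupB's share rises from 16.7% to 35.7% because 90 of the 100 deaths occur in the county where GroupB is half the population. This is the effect the CDC note describes: the weighted distribution tracks where outbreaks are happening.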

Data can be found here:

* Centers for Disease Control and Prevention. *Provisional Death Counts for Coronavirus Disease (COVID-19): Weekly State-Specific Data Updates*. 1 May 2020. <https://data.cdc.gov/NCHS/Provisional-Death-Counts-for-Coronavirus-Disease-C/pj7m-y5uh>

In [78]:
### Go grab data
!curl "https://data.cdc.gov/api/views/pj7m-y5uh/rows.csv?accessType=DOWNLOAD" --output data/race.csv

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 24950    0 24950    0     0  69305      0 --:--:-- --:--:-- --:--:-- 69305


In [79]:
### Read in Data
race = pd.read_csv("data/race.csv")
race

Unnamed: 0,Data as of,State,Indicator,Non-Hispanic White,Non-Hispanic Black or African American,Non-Hispanic American Indian or Alaska Native,Non-Hispanic Asian,Hispanic or Latino,Other,Footnote
0,07/22/2020,United States,Count of COVID-19 deaths,68377.0,29476.0,1143.0,6305.0,23256.0,1693.0,
1,07/22/2020,United States,Distribution of COVID-19 deaths (%),52.5,22.6,0.9,4.8,17.9,1.3,
2,07/22/2020,United States,Unweighted distribution of population (%),60.4,12.5,0.7,5.7,18.3,2.4,
3,07/22/2020,United States,Weighted distribution of population (%),41.8,16.6,0.3,10.5,28.9,1.9,
4,07/22/2020,Alabama,Count of COVID-19 deaths,641.0,577.0,,,35.0,,One or more data cells have counts between 1–9...
...,...,...,...,...,...,...,...,...,...,...
187,07/22/2020,West Virginia,Weighted distribution of population (%),88.6,5.9,0.2,1.3,1.7,2.3,
188,07/22/2020,Wisconsin,Count of COVID-19 deaths,505.0,194.0,,25.0,94.0,,One or more data cells have counts between 1–9...
189,07/22/2020,Wisconsin,Distribution of COVID-19 deaths (%),61.2,23.5,,3.0,11.4,,One or more data cells have counts between 1–9...
190,07/22/2020,Wisconsin,Unweighted distribution of population (%),81.1,6.4,0.9,3.0,6.9,1.7,


Drop Footnote

In [80]:
race = race.drop(columns = "Footnote")
race

Unnamed: 0,Data as of,State,Indicator,Non-Hispanic White,Non-Hispanic Black or African American,Non-Hispanic American Indian or Alaska Native,Non-Hispanic Asian,Hispanic or Latino,Other
0,07/22/2020,United States,Count of COVID-19 deaths,68377.0,29476.0,1143.0,6305.0,23256.0,1693.0
1,07/22/2020,United States,Distribution of COVID-19 deaths (%),52.5,22.6,0.9,4.8,17.9,1.3
2,07/22/2020,United States,Unweighted distribution of population (%),60.4,12.5,0.7,5.7,18.3,2.4
3,07/22/2020,United States,Weighted distribution of population (%),41.8,16.6,0.3,10.5,28.9,1.9
4,07/22/2020,Alabama,Count of COVID-19 deaths,641.0,577.0,,,35.0,
...,...,...,...,...,...,...,...,...,...
187,07/22/2020,West Virginia,Weighted distribution of population (%),88.6,5.9,0.2,1.3,1.7,2.3
188,07/22/2020,Wisconsin,Count of COVID-19 deaths,505.0,194.0,,25.0,94.0,
189,07/22/2020,Wisconsin,Distribution of COVID-19 deaths (%),61.2,23.5,,3.0,11.4,
190,07/22/2020,Wisconsin,Unweighted distribution of population (%),81.1,6.4,0.9,3.0,6.9,1.7


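Remove the separate New York City rows and rename `New York<sup>5</sup>` to `New York`. Per footnote 5, the New York row excludes New York City, so after this step the New York figures cover the rest of the state only.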

In [81]:
### Drop NYC.
NYCindex = race.index[race["State"] == "New York City"]
race = race.drop(index = NYCindex)

### Rename New York<sup>5</sup> to New York (single .loc assignment, no chained indexing).
race.loc[race["State"] == "New York<sup>5</sup>", "State"] = "New York"

### Rename DC
race.loc[race["State"] == "District of Columbia", "State"] = "DC"

race["State"].unique()


array(['United States', 'Alabama', 'Arizona', 'Arkansas', 'California',
       'Colorado', 'Connecticut', 'Delaware', 'DC', 'Florida', 'Georgia',
       'Idaho', 'Illinois', 'Indiana', 'Iowa', 'Kansas', 'Kentucky',
       'Louisiana', 'Maine', 'Maryland', 'Massachusetts', 'Michigan',
       'Minnesota', 'Mississippi', 'Missouri', 'Nebraska', 'Nevada',
       'New Hampshire', 'New Jersey', 'New Mexico', 'New York',
       'North Carolina', 'North Dakota', 'Ohio', 'Oklahoma', 'Oregon',
       'Pennsylvania', 'Rhode Island', 'South Carolina', 'South Dakota',
       'Tennessee', 'Texas', 'Utah', 'Virginia', 'Washington',
       'West Virginia', 'Wisconsin'], dtype=object)
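The same label fixes (DC, New York) recur across the mobility, sex/age, and race tables. The following is a minimal sketch that collects them in one mapping and applies it with `Series.replace`. `STATE_RENAMES` and `harmonize_states` are hypothetical names. With a dict, `replace` only swaps exact whole-value matches, so a label such as "District of Columbia Total" is left alone, just as the `==` comparisons above leave it.

```python
import pandas as pd

# Hypothetical mapping from raw source labels to the labels used in this notebook
STATE_RENAMES = {
    "District of Columbia": "DC",
    "New York<sup>5</sup>": "New York",
}


def harmonize_states(df: pd.DataFrame, col: str = "State") -> pd.DataFrame:
    """Return a copy of df with labels in `col` rewritten via STATE_RENAMES."""
    out = df.copy()
    # With a dict, Series.replace swaps exact whole-value matches only
    out[col] = out[col].replace(STATE_RENAMES)
    return out


demo = pd.DataFrame({"State": ["Alabama", "District of Columbia",
                               "New York<sup>5</sup>", "District of Columbia Total"]})
print(harmonize_states(demo)["State"].tolist())
# ['Alabama', 'DC', 'New York', 'District of Columbia Total']
```

For example, `race = harmonize_states(race)` could stand in for the separate rename steps. Because it returns a copy, it also sidesteps chained-assignment warnings.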

In [82]:
race

Unnamed: 0,Data as of,State,Indicator,Non-Hispanic White,Non-Hispanic Black or African American,Non-Hispanic American Indian or Alaska Native,Non-Hispanic Asian,Hispanic or Latino,Other
0,07/22/2020,United States,Count of COVID-19 deaths,68377.0,29476.0,1143.0,6305.0,23256.0,1693.0
1,07/22/2020,United States,Distribution of COVID-19 deaths (%),52.5,22.6,0.9,4.8,17.9,1.3
2,07/22/2020,United States,Unweighted distribution of population (%),60.4,12.5,0.7,5.7,18.3,2.4
3,07/22/2020,United States,Weighted distribution of population (%),41.8,16.6,0.3,10.5,28.9,1.9
4,07/22/2020,Alabama,Count of COVID-19 deaths,641.0,577.0,,,35.0,
...,...,...,...,...,...,...,...,...,...
187,07/22/2020,West Virginia,Weighted distribution of population (%),88.6,5.9,0.2,1.3,1.7,2.3
188,07/22/2020,Wisconsin,Count of COVID-19 deaths,505.0,194.0,,25.0,94.0,
189,07/22/2020,Wisconsin,Distribution of COVID-19 deaths (%),61.2,23.5,,3.0,11.4,
190,07/22/2020,Wisconsin,Unweighted distribution of population (%),81.1,6.4,0.9,3.0,6.9,1.7


To get the data into a more usable (long) form, first split it into separate DataFrames by Indicator and unpivot each one. They are merged back together afterwards.

In [83]:
countDeaths = race[race["Indicator"] == "Count of COVID-19 deaths"]
distDeaths = race[race["Indicator"] == "Distribution of COVID-19 deaths (%)"]
unweightDeaths = race[race["Indicator"] == "Unweighted distribution of population (%)"]
weightDeaths = race[race["Indicator"] == "Weighted distribution of population (%)"]
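As an alternative to splitting by Indicator, unpivoting four times, and merging, the same long table can be sketched as one `melt` followed by one `pivot`. The miniature table below is hypothetical (made-up states and numbers). Passing a list to `pivot(index=...)` requires pandas 1.1 or later. If a state lacked one Indicator row entirely, `pivot` would leave NaN in that column, whereas the inner merges drop the state.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the race table: two states, two race columns
race_demo = pd.DataFrame({
    "Data as of": ["07/22/2020"] * 4,
    "State": ["State A", "State A", "State B", "State B"],
    "Indicator": ["Count of COVID-19 deaths",
                  "Distribution of COVID-19 deaths (%)"] * 2,
    "Non-Hispanic White": [100.0, 80.0, 50.0, 25.0],
    "Other": [np.nan, np.nan, 10.0, 5.0],
})

# 1) Unpivot: one row per (Data as of, State, Indicator, Race)
long = race_demo.melt(id_vars=["Data as of", "State", "Indicator"],
                      var_name="Race", value_name="value")

# 2) Spread Indicator back out into columns. pivot does no aggregation,
#    so duplicate keys raise an error instead of being silently combined.
tidy = (long.pivot(index=["Data as of", "State", "Race"],
                   columns="Indicator", values="value")
            .reset_index())
tidy.columns.name = None  # drop the leftover "Indicator" label on the columns
print(tidy)
```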

#### Count of COVID-19 deaths

In [84]:
### Unpivot
countDeaths = pd.melt(countDeaths, id_vars = ["Data as of","State", "Indicator"],
       value_vars = countDeaths.columns[3:9],
       var_name = "Race", value_name = "Count of COVID-19 deaths")
countDeaths

Unnamed: 0,Data as of,State,Indicator,Race,Count of COVID-19 deaths
0,07/22/2020,United States,Count of COVID-19 deaths,Non-Hispanic White,68377.0
1,07/22/2020,Alabama,Count of COVID-19 deaths,Non-Hispanic White,641.0
2,07/22/2020,Arizona,Count of COVID-19 deaths,Non-Hispanic White,1101.0
3,07/22/2020,Arkansas,Count of COVID-19 deaths,Non-Hispanic White,206.0
4,07/22/2020,California,Count of COVID-19 deaths,Non-Hispanic White,2175.0
...,...,...,...,...,...
277,07/22/2020,Utah,Count of COVID-19 deaths,Other,12.0
278,07/22/2020,Virginia,Count of COVID-19 deaths,Other,
279,07/22/2020,Washington,Count of COVID-19 deaths,Other,24.0
280,07/22/2020,West Virginia,Count of COVID-19 deaths,Other,0.0


In [85]:
### Drop Indicator
countDeaths = countDeaths.drop(columns = "Indicator")
countDeaths

Unnamed: 0,Data as of,State,Race,Count of COVID-19 deaths
0,07/22/2020,United States,Non-Hispanic White,68377.0
1,07/22/2020,Alabama,Non-Hispanic White,641.0
2,07/22/2020,Arizona,Non-Hispanic White,1101.0
3,07/22/2020,Arkansas,Non-Hispanic White,206.0
4,07/22/2020,California,Non-Hispanic White,2175.0
...,...,...,...,...
277,07/22/2020,Utah,Other,12.0
278,07/22/2020,Virginia,Other,
279,07/22/2020,Washington,Other,24.0
280,07/22/2020,West Virginia,Other,0.0


#### Distribution of COVID-19 deaths (%)

In [86]:
### Unpivot
distDeaths = pd.melt(distDeaths, id_vars = ["Data as of","State", "Indicator"],
       value_vars = distDeaths.columns[3:9],
       var_name = "Race", value_name = "Distribution of COVID-19 deaths (%)")
distDeaths

Unnamed: 0,Data as of,State,Indicator,Race,Distribution of COVID-19 deaths (%)
0,07/22/2020,United States,Distribution of COVID-19 deaths (%),Non-Hispanic White,52.5
1,07/22/2020,Alabama,Distribution of COVID-19 deaths (%),Non-Hispanic White,50.7
2,07/22/2020,Arizona,Distribution of COVID-19 deaths (%),Non-Hispanic White,45.1
3,07/22/2020,Arkansas,Distribution of COVID-19 deaths (%),Non-Hispanic White,56.9
4,07/22/2020,California,Distribution of COVID-19 deaths (%),Non-Hispanic White,30.6
...,...,...,...,...,...
277,07/22/2020,Utah,Distribution of COVID-19 deaths (%),Other,5.5
278,07/22/2020,Virginia,Distribution of COVID-19 deaths (%),Other,
279,07/22/2020,Washington,Distribution of COVID-19 deaths (%),Other,1.9
280,07/22/2020,West Virginia,Distribution of COVID-19 deaths (%),Other,0.0


In [87]:
### Drop Indicator
distDeaths = distDeaths.drop(columns = "Indicator")
distDeaths

Unnamed: 0,Data as of,State,Race,Distribution of COVID-19 deaths (%)
0,07/22/2020,United States,Non-Hispanic White,52.5
1,07/22/2020,Alabama,Non-Hispanic White,50.7
2,07/22/2020,Arizona,Non-Hispanic White,45.1
3,07/22/2020,Arkansas,Non-Hispanic White,56.9
4,07/22/2020,California,Non-Hispanic White,30.6
...,...,...,...,...
277,07/22/2020,Utah,Other,5.5
278,07/22/2020,Virginia,Other,
279,07/22/2020,Washington,Other,1.9
280,07/22/2020,West Virginia,Other,0.0


#### Unweighted distribution of population (%)

In [88]:
### Unpivot
unweightDeaths = pd.melt(unweightDeaths, id_vars = ["Data as of","State", "Indicator"],
       value_vars = unweightDeaths.columns[3:9],
       var_name = "Race", value_name = "Unweighted distribution of population (%)")
unweightDeaths

Unnamed: 0,Data as of,State,Indicator,Race,Unweighted distribution of population (%)
0,07/22/2020,United States,Unweighted distribution of population (%),Non-Hispanic White,60.4
1,07/22/2020,Alabama,Unweighted distribution of population (%),Non-Hispanic White,65.4
2,07/22/2020,Arizona,Unweighted distribution of population (%),Non-Hispanic White,54.4
3,07/22/2020,Arkansas,Unweighted distribution of population (%),Non-Hispanic White,72.2
4,07/22/2020,California,Unweighted distribution of population (%),Non-Hispanic White,36.8
...,...,...,...,...,...
277,07/22/2020,Utah,Unweighted distribution of population (%),Other,3.1
278,07/22/2020,Virginia,Unweighted distribution of population (%),Other,2.7
279,07/22/2020,Washington,Unweighted distribution of population (%),Other,4.8
280,07/22/2020,West Virginia,Unweighted distribution of population (%),Other,1.7


In [89]:
### Drop Indicator
unweightDeaths = unweightDeaths.drop(columns = "Indicator")
unweightDeaths

Unnamed: 0,Data as of,State,Race,Unweighted distribution of population (%)
0,07/22/2020,United States,Non-Hispanic White,60.4
1,07/22/2020,Alabama,Non-Hispanic White,65.4
2,07/22/2020,Arizona,Non-Hispanic White,54.4
3,07/22/2020,Arkansas,Non-Hispanic White,72.2
4,07/22/2020,California,Non-Hispanic White,36.8
...,...,...,...,...
277,07/22/2020,Utah,Other,3.1
278,07/22/2020,Virginia,Other,2.7
279,07/22/2020,Washington,Other,4.8
280,07/22/2020,West Virginia,Other,1.7


#### Weighted distribution of population (%)

In [90]:
### Unpivot
weightDeaths = pd.melt(weightDeaths, id_vars = ["Data as of","State", "Indicator"],
       value_vars = weightDeaths.columns[3:9],
       var_name = "Race", value_name = "Weighted distribution of population (%)")
weightDeaths

Unnamed: 0,Data as of,State,Indicator,Race,Weighted distribution of population (%)
0,07/22/2020,United States,Weighted distribution of population (%),Non-Hispanic White,41.8
1,07/22/2020,Alabama,Weighted distribution of population (%),Non-Hispanic White,53.2
2,07/22/2020,Arizona,Weighted distribution of population (%),Non-Hispanic White,54.7
3,07/22/2020,Arkansas,Weighted distribution of population (%),Non-Hispanic White,60.8
4,07/22/2020,California,Weighted distribution of population (%),Non-Hispanic White,28.2
...,...,...,...,...,...
277,07/22/2020,Utah,Weighted distribution of population (%),Other,2.3
278,07/22/2020,Virginia,Weighted distribution of population (%),Other,3.2
279,07/22/2020,Washington,Weighted distribution of population (%),Other,4.5
280,07/22/2020,West Virginia,Weighted distribution of population (%),Other,2.3


In [91]:
### Drop Indicator
weightDeaths = weightDeaths.drop(columns = "Indicator")
weightDeaths

Unnamed: 0,Data as of,State,Race,Weighted distribution of population (%)
0,07/22/2020,United States,Non-Hispanic White,41.8
1,07/22/2020,Alabama,Non-Hispanic White,53.2
2,07/22/2020,Arizona,Non-Hispanic White,54.7
3,07/22/2020,Arkansas,Non-Hispanic White,60.8
4,07/22/2020,California,Non-Hispanic White,28.2
...,...,...,...,...
277,07/22/2020,Utah,Other,2.3
278,07/22/2020,Virginia,Other,3.2
279,07/22/2020,Washington,Other,4.5
280,07/22/2020,West Virginia,Other,2.3


Now merge the four tables back together on Data as of, State, and Race.

In [92]:
raceNew = countDeaths.merge(distDeaths, how = "inner", on = ["Data as of", "State", "Race"])
raceNew = raceNew.merge(unweightDeaths, how = "inner", on = ["Data as of", "State", "Race"])
raceNew = raceNew.merge(weightDeaths, how = "inner", on = ["Data as of", "State", "Race"])
raceNew

Unnamed: 0,Data as of,State,Race,Count of COVID-19 deaths,Distribution of COVID-19 deaths (%),Unweighted distribution of population (%),Weighted distribution of population (%)
0,07/22/2020,United States,Non-Hispanic White,68377.0,52.5,60.4,41.8
1,07/22/2020,Alabama,Non-Hispanic White,641.0,50.7,65.4,53.2
2,07/22/2020,Arizona,Non-Hispanic White,1101.0,45.1,54.4,54.7
3,07/22/2020,Arkansas,Non-Hispanic White,206.0,56.9,72.2,60.8
4,07/22/2020,California,Non-Hispanic White,2175.0,30.6,36.8,28.2
...,...,...,...,...,...,...,...
277,07/22/2020,Utah,Other,12.0,5.5,3.1,2.3
278,07/22/2020,Virginia,Other,,,2.7,3.2
279,07/22/2020,Washington,Other,24.0,1.9,4.8,4.5
280,07/22/2020,West Virginia,Other,0.0,0.0,1.7,2.3
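The chained merges above can also be written as one fold over a list of frames. The following is a minimal sketch using `functools.reduce`; `merge_on_keys` is a hypothetical helper. It passes `validate="one_to_one"`, so pandas raises an error if a key is duplicated in any frame. It also exposes `how`: the notebook's `"inner"` drops any key missing from one of the frames, while `"outer"` keeps that key with NaN.

```python
from functools import reduce

import pandas as pd


def merge_on_keys(frames, keys, how="inner"):
    """Merge a list of DataFrames pairwise, left to right, on the same keys."""
    return reduce(
        lambda left, right: left.merge(right, how=how, on=keys,
                                       validate="one_to_one"),
        frames,
    )


keys = ["State", "Race"]
a = pd.DataFrame({"State": ["S1", "S2"], "Race": ["R", "R"], "count": [10.0, 20.0]})
b = pd.DataFrame({"State": ["S1", "S2"], "Race": ["R", "R"], "dist": [50.0, 50.0]})
c = pd.DataFrame({"State": ["S1"], "Race": ["R"], "weighted": [40.0]})  # S2 missing

print(len(merge_on_keys([a, b, c], keys)))               # 1
print(len(merge_on_keys([a, b, c], keys, how="outer")))  # 2
```

With the notebook's frames, the call would be `merge_on_keys([countDeaths, distDeaths, unweightDeaths, weightDeaths], ["Data as of", "State", "Race"])`.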


# Hospitalization Estimates (Country, State level)

This data comes from the CDC’s National Healthcare Safety Network (NHSN), which enables hospitals to report:
* Current inpatient and intensive care unit (ICU) bed occupancy.
* Healthcare worker staffing.
* Personal protective equipment (PPE) supply status and availability.

Reporting is currently available to all U.S. acute care hospitals, critical access hospitals, inpatient rehabilitation facilities, inpatient psychiatric facilities, and long-term acute care hospitals.

__Important__: "Statistical methods were used to generate estimates of patient impact and hospital capacity measures that are representative at the national level. The estimates are based on data submitted by acute care hospitals to the NHSN COVID-19 Module. The statistical methods include weighting (to account for non-response) and multiple imputation (to account for missing data). The estimates (number and percentage) are shown along with 95% confidence intervals that reflect the statistical error that is primarily due to non-response."
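The 95% confidence intervals can be turned into approximate standard errors, which helps when comparing states. The following is a minimal sketch that assumes a symmetric, normal-theory interval (estimate ± 1.96 × SE). The CDC note does not state this, but the first US row of `InpatBeds_Occ_AnyPat_*` below is exactly symmetric around its estimate. The approximation breaks down where a bound is clipped, such as Wyoming's ICU rows with a lower bound of 0 and a percent upper bound of 100.0. `approx_se` is a hypothetical helper.

```python
Z_95 = 1.96  # two-sided 95% quantile of the standard normal


def approx_se(lo, up, z=Z_95):
    """Approximate standard error from a CI: half-width / z.

    Works on plain numbers or on numeric pandas Series.
    """
    return (up - lo) / (2 * z)


# First US row (01APR2020): InpatBeds_Occ_AnyPat_Est / _LoCI / _UpCI
est, lo, up = 416064, 380186, 451942
print(up - est, est - lo)        # 35878 35878 (symmetric)
print(round(approx_se(lo, up)))  # 18305
```

In the loaded table the CI columns are read as text because of the description row. After that row is dropped, convert them first, for example `approx_se(pd.to_numeric(hospital["InpatBeds_Occ_AnyPat_LoCI"]), pd.to_numeric(hospital["InpatBeds_Occ_AnyPat_UpCI"]))`.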


__NOTE__: This data was submitted directly to CDC’s National Healthcare Safety Network (NHSN) and does not include data submitted to other entities contracted by or within the federal government.

The data can be found here: https://www.cdc.gov/nhsn/covid19/report-overview.html#anchor_1590010579051

In [93]:
### Go grab data
!curl https://www.cdc.gov/nhsn/pdfs/covid19/covid19-NatEst.csv --output data/hospital.csv

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  576k    0  576k    0     0  2571k      0 --:--:-- --:--:-- --:--:-- 2560k


In [94]:
### Load in data
hospital = pd.read_csv("data/hospital.csv")
hospital

Unnamed: 0,state,statename,collectionDate,InpatBeds_Occ_AnyPat_Est,InpatBeds_Occ_AnyPat_LoCI,InpatBeds_Occ_AnyPat_UpCI,InpatBeds_Occ_AnyPat_Est_Avail,InBedsOccAnyPat__Numbeds_Est,InBedsOccAnyPat__Numbeds_LoCI,InBedsOccAnyPat__Numbeds_UpCI,...,InBedsOccCOVID__Numbeds_LoCI,InBedsOccCOVID__Numbeds_UpCI,ICUBeds_Occ_AnyPat_Est,ICUBeds_Occ_AnyPat_LoCI,ICUBeds_Occ_AnyPat_UpCI,ICUBeds_Occ_AnyPat_Est_Avail,ICUBedsOccAnyPat__N_ICUBeds_Est,ICUBedsOccAnyPat__N_ICUBeds_LoCI,ICUBedsOccAnyPat__N_ICUBeds_UpCI,Notes
0,Two-letter state abbreviation,State name,Day for which estimate is made,"Hospital inpatient bed occupancy, estimate","Hospital inpatient bed occupancy, lower 95% CI","Hospital inpatient bed occupancy, upper 95% CI","Hospital inpatient beds available, estimate","Hospital inpatient bed occupancy, percent esti...","Hospital inpatient bed occupancy, lower 95% CI...","Hospital inpatient bed occupancy, upper 95% CI...",...,Number of patients in an inpatient care locati...,Number of patients in an inpatient care locati...,"ICU bed occupancy, estimate","ICU bed occupancy, lower 95% CI","ICU bed occupancy, upper 95% CI","ICU beds available, estimate","ICU bed occupancy, percent estimate (percent o...","ICU bed occupancy, lower 95% CI (percent of IC...","ICU bed occupancy, upper 95% CI (percent of IC...",This file contains National and State represen...
1,US,United States,01APR2020,416064,380186,451942,350555,54.3,52.5,56.0,...,8.6,11.0,66369,56770,75968,45110,59.5,55.8,63.2,These estimates are based on data retrieved on...
2,US,United States,02APR2020,422892,391381,454403,357231,54.2,52.7,55.7,...,8.7,10.9,69385,60557,78214,45784,60.2,57.1,63.4,Statistical methods were used to generate esti...
3,US,United States,03APR2020,408938,382065,435810,364108,52.9,51.2,54.6,...,8.9,11.1,70580,61067,80092,45788,60.7,57.3,64.0,The estimates are based on data submitted by a...
4,US,United States,04APR2020,398850,374147,423554,375854,51.5,50.0,53.0,...,9.2,11.4,70134,62054,78215,47622,59.6,56.6,62.5,The statistical methods include weighting (to ...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5028,WY,Wyoming,03JUL2020,434,82,785,489,47.0,34.8,59.1,...,0.4,2.6,57,0,123,6,90.3,50.7,100.0,
5029,WY,Wyoming,04JUL2020,450,86,813,475,48.7,35.7,61.6,...,0.0,3.2,54,0,112,9,85.7,47.9,100.0,
5030,WY,Wyoming,05JUL2020,438,69,808,494,47.0,33.5,60.5,...,0.0,3.1,41,0,86,21,65.7,26.4,100.0,
5031,WY,Wyoming,06JUL2020,408,77,740,514,44.3,31.1,57.4,...,0.2,3.8,42,0,91,21,67.7,29.1,100.0,


In [95]:
### Drop the Notes and state columns, and the first row (it holds column descriptions, not data).
hospital = hospital.drop(columns = ["state", "Notes"])
hospital = hospital.drop(index = 0)
hospital = hospital.reset_index(drop = True)
hospital

Unnamed: 0,statename,collectionDate,InpatBeds_Occ_AnyPat_Est,InpatBeds_Occ_AnyPat_LoCI,InpatBeds_Occ_AnyPat_UpCI,InpatBeds_Occ_AnyPat_Est_Avail,InBedsOccAnyPat__Numbeds_Est,InBedsOccAnyPat__Numbeds_LoCI,InBedsOccAnyPat__Numbeds_UpCI,InpatBeds_Occ_COVID_Est,...,InBedsOccCOVID__Numbeds_Est,InBedsOccCOVID__Numbeds_LoCI,InBedsOccCOVID__Numbeds_UpCI,ICUBeds_Occ_AnyPat_Est,ICUBeds_Occ_AnyPat_LoCI,ICUBeds_Occ_AnyPat_UpCI,ICUBeds_Occ_AnyPat_Est_Avail,ICUBedsOccAnyPat__N_ICUBeds_Est,ICUBedsOccAnyPat__N_ICUBeds_LoCI,ICUBedsOccAnyPat__N_ICUBeds_UpCI
0,United States,01APR2020,416064,380186,451942,350555,54.3,52.5,56.0,75104,...,9.8,8.6,11.0,66369,56770,75968,45110,59.5,55.8,63.2
1,United States,02APR2020,422892,391381,454403,357231,54.2,52.7,55.7,76546,...,9.8,8.7,10.9,69385,60557,78214,45784,60.2,57.1,63.4
2,United States,03APR2020,408938,382065,435810,364108,52.9,51.2,54.6,77122,...,10.0,8.9,11.1,70580,61067,80092,45788,60.7,57.3,64.0
3,United States,04APR2020,398850,374147,423554,375854,51.5,50.0,53.0,79742,...,10.3,9.2,11.4,70134,62054,78215,47622,59.6,56.6,62.5
4,United States,05APR2020,400937,376016,425858,381724,51.2,49.8,52.6,80287,...,10.3,9.1,11.4,69853,61615,78091,48620,59.0,55.6,62.4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5027,Wyoming,03JUL2020,434,82,785,489,47.0,34.8,59.1,14,...,1.5,0.4,2.6,57,0,123,6,90.3,50.7,100.0
5028,Wyoming,04JUL2020,450,86,813,475,48.7,35.7,61.6,12,...,1.3,0.0,3.2,54,0,112,9,85.7,47.9,100.0
5029,Wyoming,05JUL2020,438,69,808,494,47.0,33.5,60.5,14,...,1.5,0.0,3.1,41,0,86,21,65.7,26.4,100.0
5030,Wyoming,06JUL2020,408,77,740,514,44.3,31.1,57.4,18,...,2.0,0.2,3.8,42,0,91,21,67.7,29.1,100.0
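The first data row of `hospital.csv` holds text descriptions, so pandas reads every column as text, and dropping that row afterwards does not change the dtypes. Unless they are converted later in the notebook, the numeric columns of `hospital` are therefore still strings. Below is a minimal sketch of one alternative on a made-up four-column stand-in for the file: skip the description line with `skiprows` so dtypes are inferred normally, and read it separately as a column-to-description lookup. The toy CSV text and the names `mixed`, `hospital_demo`, and `data_dictionary` are illustrative only.

```python
import io

import pandas as pd

# Made-up stand-in for data/hospital.csv: line 1 (after the header) holds
# column descriptions rather than data.
CSV_TEXT = (
    "state,statename,collectionDate,ICUBeds_Occ_AnyPat_Est\n"
    "Two-letter state abbreviation,State name,Day for which estimate is made,"
    "\"ICU bed occupancy, estimate\"\n"
    "US,United States,01APR2020,66369\n"
    "WY,Wyoming,06JUL2020,42\n"
)

# Reading the description row as data makes every column text; dropping the
# row afterwards leaves the numbers stored as strings.
mixed = pd.read_csv(io.StringIO(CSV_TEXT)).drop(index=0)

# Skipping file line 1 lets pandas infer numeric dtypes for the real data.
hospital_demo = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=[1])

# Read only the description row and keep it as {column: description}.
data_dictionary = pd.read_csv(io.StringIO(CSV_TEXT), nrows=1).iloc[0].to_dict()

print(mixed["ICUBeds_Occ_AnyPat_Est"].dtype)
print(hospital_demo["ICUBeds_Occ_AnyPat_Est"].dtype)
print(data_dictionary["ICUBeds_Occ_AnyPat_Est"])
```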


In [96]:
### Rename columns
hospital = hospital.rename(columns = {'statename' : "State", 
                                      'collectionDate': "Date"})
hospital

Unnamed: 0,State,Date,InpatBeds_Occ_AnyPat_Est,InpatBeds_Occ_AnyPat_LoCI,InpatBeds_Occ_AnyPat_UpCI,InpatBeds_Occ_AnyPat_Est_Avail,InBedsOccAnyPat__Numbeds_Est,InBedsOccAnyPat__Numbeds_LoCI,InBedsOccAnyPat__Numbeds_UpCI,InpatBeds_Occ_COVID_Est,...,InBedsOccCOVID__Numbeds_Est,InBedsOccCOVID__Numbeds_LoCI,InBedsOccCOVID__Numbeds_UpCI,ICUBeds_Occ_AnyPat_Est,ICUBeds_Occ_AnyPat_LoCI,ICUBeds_Occ_AnyPat_UpCI,ICUBeds_Occ_AnyPat_Est_Avail,ICUBedsOccAnyPat__N_ICUBeds_Est,ICUBedsOccAnyPat__N_ICUBeds_LoCI,ICUBedsOccAnyPat__N_ICUBeds_UpCI
0,United States,01APR2020,416064,380186,451942,350555,54.3,52.5,56.0,75104,...,9.8,8.6,11.0,66369,56770,75968,45110,59.5,55.8,63.2
1,United States,02APR2020,422892,391381,454403,357231,54.2,52.7,55.7,76546,...,9.8,8.7,10.9,69385,60557,78214,45784,60.2,57.1,63.4
2,United States,03APR2020,408938,382065,435810,364108,52.9,51.2,54.6,77122,...,10.0,8.9,11.1,70580,61067,80092,45788,60.7,57.3,64.0
3,United States,04APR2020,398850,374147,423554,375854,51.5,50.0,53.0,79742,...,10.3,9.2,11.4,70134,62054,78215,47622,59.6,56.6,62.5
4,United States,05APR2020,400937,376016,425858,381724,51.2,49.8,52.6,80287,...,10.3,9.1,11.4,69853,61615,78091,48620,59.0,55.6,62.4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5027,Wyoming,03JUL2020,434,82,785,489,47.0,34.8,59.1,14,...,1.5,0.4,2.6,57,0,123,6,90.3,50.7,100.0
5028,Wyoming,04JUL2020,450,86,813,475,48.7,35.7,61.6,12,...,1.3,0.0,3.2,54,0,112,9,85.7,47.9,100.0
5029,Wyoming,05JUL2020,438,69,808,494,47.0,33.5,60.5,14,...,1.5,0.0,3.1,41,0,86,21,65.7,26.4,100.0
5030,Wyoming,06JUL2020,408,77,740,514,44.3,31.1,57.4,18,...,2.0,0.2,3.8,42,0,91,21,67.7,29.1,100.0


In [97]:
### Convert Date into datetime (dates look like "01APR2020")
hospital["Date"] = pd.to_datetime(hospital["Date"], format = "%d%b%Y")
hospital

Unnamed: 0,State,Date,InpatBeds_Occ_AnyPat_Est,InpatBeds_Occ_AnyPat_LoCI,InpatBeds_Occ_AnyPat_UpCI,InpatBeds_Occ_AnyPat_Est_Avail,InBedsOccAnyPat__Numbeds_Est,InBedsOccAnyPat__Numbeds_LoCI,InBedsOccAnyPat__Numbeds_UpCI,InpatBeds_Occ_COVID_Est,...,InBedsOccCOVID__Numbeds_Est,InBedsOccCOVID__Numbeds_LoCI,InBedsOccCOVID__Numbeds_UpCI,ICUBeds_Occ_AnyPat_Est,ICUBeds_Occ_AnyPat_LoCI,ICUBeds_Occ_AnyPat_UpCI,ICUBeds_Occ_AnyPat_Est_Avail,ICUBedsOccAnyPat__N_ICUBeds_Est,ICUBedsOccAnyPat__N_ICUBeds_LoCI,ICUBedsOccAnyPat__N_ICUBeds_UpCI
0,United States,2020-04-01,416064,380186,451942,350555,54.3,52.5,56.0,75104,...,9.8,8.6,11.0,66369,56770,75968,45110,59.5,55.8,63.2
1,United States,2020-04-02,422892,391381,454403,357231,54.2,52.7,55.7,76546,...,9.8,8.7,10.9,69385,60557,78214,45784,60.2,57.1,63.4
2,United States,2020-04-03,408938,382065,435810,364108,52.9,51.2,54.6,77122,...,10.0,8.9,11.1,70580,61067,80092,45788,60.7,57.3,64.0
3,United States,2020-04-04,398850,374147,423554,375854,51.5,50.0,53.0,79742,...,10.3,9.2,11.4,70134,62054,78215,47622,59.6,56.6,62.5
4,United States,2020-04-05,400937,376016,425858,381724,51.2,49.8,52.6,80287,...,10.3,9.1,11.4,69853,61615,78091,48620,59.0,55.6,62.4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5027,Wyoming,2020-07-03,434,82,785,489,47.0,34.8,59.1,14,...,1.5,0.4,2.6,57,0,123,6,90.3,50.7,100.0
5028,Wyoming,2020-07-04,450,86,813,475,48.7,35.7,61.6,12,...,1.3,0.0,3.2,54,0,112,9,85.7,47.9,100.0
5029,Wyoming,2020-07-05,438,69,808,494,47.0,33.5,60.5,14,...,1.5,0.0,3.1,41,0,86,21,65.7,26.4,100.0
5030,Wyoming,2020-07-06,408,77,740,514,44.3,31.1,57.4,18,...,2.0,0.2,3.8,42,0,91,21,67.7,29.1,100.0
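The hospital dates use a day, month-abbreviation, year layout such as `01APR2020`. A small sketch of parsing them with an explicit `pd.to_datetime` format, which avoids relying on `astype("datetime64")` (newer pandas releases may reject the unit-less `datetime64` dtype there). The sample strings come from the table above; the sketch assumes an English locale for the month abbreviations.

```python
import pandas as pd

# Sample values in the hospital.csv layout: day, month abbreviation, year.
raw_dates = pd.Series(["01APR2020", "05JUL2020", "06JUL2020"])

# An explicit format avoids per-value guessing; %b matches the month
# abbreviation case-insensitively, so upper-case "APR" parses.
parsed = pd.to_datetime(raw_dates, format="%d%b%Y")

print(parsed.dt.strftime("%Y-%m-%d").tolist())
```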


In [98]:
### Remove Puerto Rico
PRindex = hospital.index[hospital["State"] == "Puerto Rico"]
hospital = hospital.drop(index = PRindex)

### Rename DC (.loc changes hospital itself rather than a temporary copy)
hospital.loc[hospital["State"] == "District of Columbia", "State"] = "DC"

hospital["State"].unique()

array(['United States', 'Alaska', 'Alabama', 'Arkansas', 'Arizona',
       'California', 'Colorado', 'Connecticut', 'DC', 'Delaware',
       'Florida', 'Georgia', 'Hawaii', 'Iowa', 'Idaho', 'Illinois',
       'Indiana', 'Kansas', 'Kentucky', 'Louisiana', 'Massachusetts',
       'Maryland', 'Maine', 'Michigan', 'Minnesota', 'Missouri',
       'Mississippi', 'Montana', 'North Carolina', 'North Dakota',
       'Nebraska', 'New Hampshire', 'New Jersey', 'New Mexico', 'Nevada',
       'New York', 'Ohio', 'Oklahoma', 'Oregon', 'Pennsylvania',
       'Rhode Island', 'South Carolina', 'South Dakota', 'Tennessee',
       'Texas', 'Utah', 'Virginia', 'Vermont', 'Washington', 'Wisconsin',
       'West Virginia', 'Wyoming'], dtype=object)
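The same territory removal and `DC` renaming are repeated for the hospital, state, and county tables. One way to keep them consistent is a single helper. `clean_states` below is a hypothetical function (not part of the original notebook), sketched on toy data; it works on an explicit copy and uses `isin`/`replace`, so no chained assignment is involved.

```python
import pandas as pd

# Territories dropped in this notebook's state- and county-level cleaning cells.
TERRITORIES = ["Guam", "Puerto Rico", "Virgin Islands", "Northern Mariana Islands"]


def clean_states(df, state_col="State"):
    """Hypothetical helper: drop U.S. territories and shorten
    'District of Columbia' to 'DC'. Returns a new DataFrame."""
    out = df.loc[~df[state_col].isin(TERRITORIES)].copy()
    out[state_col] = out[state_col].replace("District of Columbia", "DC")
    return out


demo = pd.DataFrame({
    "State": ["Alabama", "District of Columbia", "Puerto Rico", "Guam"],
    "Total Cases": [6, 3, 5, 2],
})
cleaned = clean_states(demo)
print(cleaned["State"].tolist())
```

Usage would look like `hospital = clean_states(hospital)`; the input frame is left unchanged.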


# New York Times Covid-19 Data

The New York Times (NYT) is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time.

Since late January, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.

NYT counts include both laboratory-confirmed and probable cases, using criteria developed by the states and the federal government. Not all geographies report probable cases, and some report confirmed and probable cases as a single total.

The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020.

## Notable Geographic Exceptions
- New York
    - All cases for the five boroughs of New York City (New York, Kings, Queens, Bronx and Richmond counties) are assigned to a single area called New York City. The number of deaths jumps sharply on April 6 because the death counts switched from New York City data to New York State data.
    
- Kansas City, Missouri
    - Four counties (Cass, Clay, Jackson and Platte) overlap the municipality of Kansas City, Missouri. The cases and deaths that NYT shows for these four counties are only for the portions exclusive of Kansas City. Cases and deaths for Kansas City are reported as their own line.
    
- Joplin, Missouri
    - Starting June 25, cases and deaths for Joplin are reported separately from Jasper and Newton counties. The cases and deaths reported for those counties are only for the portions exclusive of Joplin. Joplin cases and deaths previously appeared in the counts for those counties or as Unknown.
    
- Alameda County, California.
    - Counts for Alameda County include cases and deaths from Berkeley and the Grand Princess cruise ship.

- Douglas County, Nebraska.
    - Counts for Douglas County include cases brought to the state from the Diamond Princess cruise ship.
    
See https://github.com/nytimes/covid-19-data for more about these geographic exceptions and about the data in general.
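Several of the exceptions above are geographies rather than counties, so they cannot be matched to a county FIPS code. The sketch below assumes that, in `us-counties.csv`, rows such as New York City, Kansas City, and Unknown carry an empty `fips` field, and lists them for special handling. The rows are made up for illustration.

```python
import io

import pandas as pd

# Made-up extract in the shape of us-counties.csv; the special geographies
# are assumed to have an empty fips field.
CSV_TEXT = (
    "date,county,state,fips,cases,deaths\n"
    "2020-07-24,Autauga,Alabama,01001,10,1\n"
    "2020-07-24,New York City,New York,,100,10\n"
    "2020-07-24,Kansas City,Missouri,,50,2\n"
    "2020-07-24,Unknown,Missouri,,4,0\n"
)
counties = pd.read_csv(io.StringIO(CSV_TEXT), dtype=str)

# Rows without a FIPS code cannot be joined to county population tables,
# so collect them for separate handling (splitting, summing, or dropping).
no_fips = counties[counties["fips"].isna()]
print(no_fips[["state", "county"]].values.tolist())
```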



### Data Processing

#### National Level

In [99]:
### Go grab data
!curl https://raw.githubusercontent.com/nytimes/covid-19-data/master/us.csv --output data/NYTusa.csv



In [100]:
### Load in data
NYTusa = pd.read_csv('data/NYTusa.csv')
NYTusa

Unnamed: 0,date,cases,deaths
0,2020-01-21,1,0
1,2020-01-22,1,0
2,2020-01-23,1,0
3,2020-01-24,2,0
4,2020-01-25,3,0
...,...,...,...
182,2020-07-21,3910398,142031
183,2020-07-22,3980128,143167
184,2020-07-23,4050126,144283
185,2020-07-24,4123651,145430


In [101]:
### Set date as datetime and rename columns
NYTusa["date"] = pd.to_datetime(NYTusa["date"])

NYTusa = NYTusa.rename(columns = {'date' : "Date",
                                  'cases' : 'Total Cases',
                                  'deaths' : 'Total Deaths'})

NYTusa

Unnamed: 0,Date,Total Cases,Total Deaths
0,2020-01-21,1,0
1,2020-01-22,1,0
2,2020-01-23,1,0
3,2020-01-24,2,0
4,2020-01-25,3,0
...,...,...,...
182,2020-07-21,3910398,142031
183,2020-07-22,3980128,143167
184,2020-07-23,4050126,144283
185,2020-07-24,4123651,145430


In [102]:
### New Cases
NYTusa["New Cases"] = abs(np.diff(NYTusa["Total Cases"], prepend = NYTusa["Total Cases"].iloc[0]))

### New Deaths
NYTusa["New Deaths"] = abs(np.diff(NYTusa["Total Deaths"], prepend = NYTusa["Total Deaths"].iloc[0]))

NYTusa

Unnamed: 0,Date,Total Cases,Total Deaths,New Cases,New Deaths
0,2020-01-21,1,0,0,0
1,2020-01-22,1,0,0,0
2,2020-01-23,1,0,0,0
3,2020-01-24,2,0,1,0
4,2020-01-25,3,0,1,0
...,...,...,...,...,...
182,2020-07-21,3910398,142031,65274,1127
183,2020-07-22,3980128,143167,69730,1136
184,2020-07-23,4050126,144283,69998,1116
185,2020-07-24,4123651,145430,73525,1147
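The cell above wraps the day-over-day difference in `abs()`. When a cumulative total is revised downward, the difference is negative, and `abs()` reports it as that many new cases instead. A small sketch on made-up numbers comparing `abs()` with `np.clip`, a different choice (not what the notebook does) that floors corrections at zero. Keeping the negative values so that revisions stay visible is a third option.

```python
import numpy as np
import pandas as pd

# Made-up cumulative totals with a downward revision on day 2.
totals = pd.Series([10, 15, 12, 20])

# Same differencing as above: prepend the first value so day 0 gets 0.
diffs = np.diff(totals, prepend=totals.iloc[0])

as_abs = np.abs(diffs)             # the -3 correction becomes +3 "new" cases
as_clip = np.clip(diffs, 0, None)  # corrections are floored at zero instead

print(diffs.tolist(), as_abs.tolist(), as_clip.tolist())
```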


In [103]:
### Add population.
NYTusa["Population"] = np.repeat(USAData["Population"][0], len(NYTusa))
NYTusa

Unnamed: 0,Date,Total Cases,Total Deaths,New Cases,New Deaths,Population
0,2020-01-21,1,0,0,0,328233889
1,2020-01-22,1,0,0,0,328233889
2,2020-01-23,1,0,0,0,328233889
3,2020-01-24,2,0,1,0,328233889
4,2020-01-25,3,0,1,0,328233889
...,...,...,...,...,...,...
182,2020-07-21,3910398,142031,65274,1127,328233889
183,2020-07-22,3980128,143167,69730,1136,328233889
184,2020-07-23,4050126,144283,69998,1116,328233889
185,2020-07-24,4123651,145430,73525,1147,328233889


In [104]:
### Percent of population that have cases.
NYTusa["%Cases"] = np.where(NYTusa["Population"] != 0,
                             round((NYTusa["Total Cases"] / NYTusa["Population"]) * 100, 3),
                             0)

### Percent of population that have died.
NYTusa["%Deaths"] = np.where(NYTusa["Population"] != 0,
                              round((NYTusa["Total Deaths"] / NYTusa["Population"]) * 100, 3),
                              0)

### Logarithmic Scales
NYTusa["log(Total Cases)"] = round(np.log(NYTusa["Total Cases"]), 3)
NYTusa["log(Total Deaths)"] = round(np.log(NYTusa["Total Deaths"]), 3)
NYTusa["log(New Cases)"] = round(np.log(NYTusa["New Cases"]), 3)
NYTusa["log(New Deaths)"] = round(np.log(NYTusa["New Deaths"]), 3)

NYTusa['Country'] = np.repeat('United States', len(NYTusa))
NYTusa = NYTusa[list(USAData.columns)]

NYTusa



Unnamed: 0,Country,Date,Total Cases,Total Deaths,Population,New Cases,New Deaths,%Cases,%Deaths,log(Total Cases),log(Total Deaths),log(New Cases),log(New Deaths)
0,United States,2020-01-21,1,0,328233889,0,0,0.000,0.000,0.000,-inf,-inf,-inf
1,United States,2020-01-22,1,0,328233889,0,0,0.000,0.000,0.000,-inf,-inf,-inf
2,United States,2020-01-23,1,0,328233889,0,0,0.000,0.000,0.000,-inf,-inf,-inf
3,United States,2020-01-24,2,0,328233889,1,0,0.000,0.000,0.693,-inf,0.000,-inf
4,United States,2020-01-25,3,0,328233889,1,0,0.000,0.000,1.099,-inf,0.000,-inf
...,...,...,...,...,...,...,...,...,...,...,...,...,...
182,United States,2020-07-21,3910398,142031,328233889,65274,1127,1.191,0.043,15.179,11.864,11.086,7.027
183,United States,2020-07-22,3980128,143167,328233889,69730,1136,1.213,0.044,15.197,11.872,11.152,7.035
184,United States,2020-07-23,4050126,144283,328233889,69998,1116,1.234,0.044,15.214,11.880,11.156,7.018
185,United States,2020-07-24,4123651,145430,328233889,73525,1147,1.256,0.044,15.232,11.887,11.205,7.045
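The `-inf` entries come from taking `np.log` of zero counts, which NumPy evaluates to negative infinity with a divide-by-zero warning. A sketch of two common alternatives on made-up counts: mask zeros as missing before taking the log, or use `np.log1p` (the log of 1 + x), which is finite at zero but is a different transform from the columns above.

```python
import numpy as np
import pandas as pd

counts = pd.Series([0, 1, 2, 65274])

# Plain natural log: log(0) is -inf; errstate silences the divide-by-zero warning.
with np.errstate(divide="ignore"):
    raw_log = np.log(counts)

# Option 1: keep the natural log but treat zero counts as missing (NaN).
log_nan = np.log(counts.where(counts > 0))

# Option 2: log(1 + x) is 0 at zero, at the cost of shifting small values.
log1p = np.log1p(counts)

print(raw_log.round(3).tolist())
print(log_nan.round(3).tolist())
print(log1p.round(3).tolist())
```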


#### State Level

In [105]:
### Go grab data
!curl https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv --output data/NYTstate.csv



In [106]:
### Load in data
NYTstate = pd.read_csv('data/NYTstate.csv')
NYTstate

Unnamed: 0,date,state,fips,cases,deaths
0,2020-01-21,Washington,53,1,0
1,2020-01-22,Washington,53,1,0
2,2020-01-23,Washington,53,1,0
3,2020-01-24,Illinois,17,1,0
4,2020-01-24,Washington,53,1,0
...,...,...,...,...,...
7984,2020-07-25,Virginia,51,83609,2075
7985,2020-07-25,Washington,53,53884,1592
7986,2020-07-25,West Virginia,54,5821,103
7987,2020-07-25,Wisconsin,55,51735,900


In [107]:
### Set date to datetime and fips to string
NYTstate["date"] = pd.to_datetime(NYTstate["date"])
NYTstate = NYTstate.astype({"fips" : "str"})

### Rename columns
NYTstate = NYTstate.rename(columns = {'date' : "Date",
                                      'state' : 'State',
                                      'fips' : 'stateFIPS',
                                      'cases' : 'Total Cases',
                                      'deaths' : 'Total Deaths'})

### Sort data
NYTstate = NYTstate.sort_values(['State', 'Date'], ascending = [True, True])

NYTstate

Unnamed: 0,Date,State,stateFIPS,Total Cases,Total Deaths
586,2020-03-13,Alabama,1,6,0
637,2020-03-14,Alabama,1,12,0
689,2020-03-15,Alabama,1,23,0
742,2020-03-16,Alabama,1,29,0
795,2020-03-17,Alabama,1,39,0
...,...,...,...,...,...
7768,2020-07-21,Wyoming,56,2238,25
7823,2020-07-22,Wyoming,56,2288,25
7878,2020-07-23,Wyoming,56,2347,25
7933,2020-07-24,Wyoming,56,2405,25


In [108]:
### Rename District of Columbia to DC (.loc changes NYTstate itself, not a temporary copy)
NYTstate.loc[NYTstate["State"] == "District of Columbia", "State"] = "DC"

### Remove Guam, Puerto Rico, Virgin Islands, Northern Mariana Islands
territories = ["Guam", "Puerto Rico", "Virgin Islands", "Northern Mariana Islands"]
NYTstate = NYTstate.drop(index = NYTstate.index[NYTstate["State"].isin(territories)])

NYTstate["State"].unique()

array(['Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California',
       'Colorado', 'Connecticut', 'Delaware', 'DC', 'Florida', 'Georgia',
       'Hawaii', 'Idaho', 'Illinois', 'Indiana', 'Iowa', 'Kansas',
       'Kentucky', 'Louisiana', 'Maine', 'Maryland', 'Massachusetts',
       'Michigan', 'Minnesota', 'Mississippi', 'Missouri', 'Montana',
       'Nebraska', 'Nevada', 'New Hampshire', 'New Jersey', 'New Mexico',
       'New York', 'North Carolina', 'North Dakota', 'Ohio', 'Oklahoma',
       'Oregon', 'Pennsylvania', 'Rhode Island', 'South Carolina',
       'South Dakota', 'Tennessee', 'Texas', 'Utah', 'Vermont',
       'Virginia', 'Washington', 'West Virginia', 'Wisconsin', 'Wyoming'],
      dtype=object)

In [109]:
NYTstate = parallel2(NYTstate, newCases2)
NYTstate = parallel2(NYTstate, newDeaths2)
NYTstate

Unnamed: 0,Date,State,stateFIPS,Total Cases,Total Deaths,New Cases,New Deaths
586,2020-03-13,Alabama,1,6,0,0,0
637,2020-03-14,Alabama,1,12,0,6,0
689,2020-03-15,Alabama,1,23,0,11,0
742,2020-03-16,Alabama,1,29,0,6,0
795,2020-03-17,Alabama,1,39,0,10,0
...,...,...,...,...,...,...,...
7768,2020-07-21,Wyoming,56,2238,25,51,1
7823,2020-07-22,Wyoming,56,2288,25,50,0
7878,2020-07-23,Wyoming,56,2347,25,59,0
7933,2020-07-24,Wyoming,56,2405,25,58,0
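`parallel2`, `newCases2`, and `newDeaths2` are defined earlier in the notebook and are not shown in this section. As a point of comparison only (not necessarily what those functions do), per-state daily counts can also be computed with a vectorized `groupby().diff()`. The sketch borrows totals from the tables above with illustrative dates, and sets the first day of each state to 0, matching the convention visible in the table.

```python
import pandas as pd

# Toy cumulative totals, already sorted by State and Date (dates illustrative).
df = pd.DataFrame({
    "State": ["Alabama"] * 3 + ["Wyoming"] * 3,
    "Date": pd.to_datetime(["2020-03-13", "2020-03-14", "2020-03-15"] * 2),
    "Total Cases": [6, 12, 23, 2238, 2288, 2347],
})

# Difference within each state only, so Wyoming's first row is not compared
# with Alabama's last row; the first day of each state becomes 0.
df["New Cases"] = (
    df.groupby("State")["Total Cases"].diff()
      .fillna(0)
      .astype("int64")
)
print(df["New Cases"].tolist())
```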


In [110]:
### Grab state population and state abbreviations
NYTstate = NYTstate.merge(StateData[['stateFIPS', 'Population', 'StateABV', 'Date']], on = ['stateFIPS', 'Date'], how = 'left')
NYTstate

Unnamed: 0,Date,State,stateFIPS,Total Cases,Total Deaths,New Cases,New Deaths,Population,StateABV
0,2020-03-13,Alabama,1,6,0,0,0,4903185.0,AL
1,2020-03-14,Alabama,1,12,0,6,0,4903185.0,AL
2,2020-03-15,Alabama,1,23,0,11,0,4903185.0,AL
3,2020-03-16,Alabama,1,29,0,6,0,4903185.0,AL
4,2020-03-17,Alabama,1,39,0,10,0,4903185.0,AL
...,...,...,...,...,...,...,...,...,...
7462,2020-07-21,Wyoming,56,2238,25,51,1,578759.0,WY
7463,2020-07-22,Wyoming,56,2288,25,50,0,578759.0,WY
7464,2020-07-23,Wyoming,56,2347,25,59,0,578759.0,WY
7465,2020-07-24,Wyoming,56,2405,25,58,0,578759.0,WY
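A left merge keeps every case row even when no population row matches; those rows get `NaN`, which also turns an integer population column into floats (one possible reason for values such as `4903185.0`, depending on how `StateData` stores population). A sketch on toy data using `indicator=True` to list rows that failed to match; the FIPS `72` row is an illustrative unmatched key.

```python
import pandas as pd

cases = pd.DataFrame({
    "stateFIPS": ["1", "56", "72"],
    "Date": pd.to_datetime(["2020-07-24"] * 3),
    "Total Cases": [10, 20, 30],
})
pops = pd.DataFrame({
    "stateFIPS": ["1", "56"],
    "Date": pd.to_datetime(["2020-07-24"] * 2),
    "Population": [4903185, 578759],
})

# indicator=True adds a _merge column recording whether each row found a match.
merged = cases.merge(pops, on=["stateFIPS", "Date"], how="left", indicator=True)
unmatched = merged[merged["_merge"] == "left_only"]

print(unmatched["stateFIPS"].tolist())
print(merged["Population"].dtype)  # NaN for the unmatched row forces float
```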


In [111]:
### Percent of population that have cases.
NYTstate["%Cases"] = np.where(NYTstate["Population"] != 0,
                             round((NYTstate["Total Cases"] / NYTstate["Population"]) * 100, 3),
                             0)

### Percent of population that have died.
NYTstate["%Deaths"] = np.where(NYTstate["Population"] != 0,
                              round((NYTstate["Total Deaths"] / NYTstate["Population"]) * 100, 3),
                              0)

### Logarithmic Scales
NYTstate["log(Total Cases)"] = round(np.log(NYTstate["Total Cases"]), 3)
NYTstate["log(Total Deaths)"] = round(np.log(NYTstate["Total Deaths"]), 3)
NYTstate["log(New Cases)"] = round(np.log(NYTstate["New Cases"]), 3)
NYTstate["log(New Deaths)"] = round(np.log(NYTstate["New Deaths"]), 3)

NYTstate = NYTstate[list(StateData.columns)]

NYTstate



Unnamed: 0,Date,State,StateABV,stateFIPS,Total Cases,Total Deaths,Population,New Cases,New Deaths,%Cases,%Deaths,log(Total Cases),log(Total Deaths),log(New Cases),log(New Deaths)
0,2020-03-13,Alabama,AL,1,6,0,4903185.0,0,0,0.000,0.000,1.792,-inf,-inf,-inf
1,2020-03-14,Alabama,AL,1,12,0,4903185.0,6,0,0.000,0.000,2.485,-inf,1.792,-inf
2,2020-03-15,Alabama,AL,1,23,0,4903185.0,11,0,0.000,0.000,3.135,-inf,2.398,-inf
3,2020-03-16,Alabama,AL,1,29,0,4903185.0,6,0,0.001,0.000,3.367,-inf,1.792,-inf
4,2020-03-17,Alabama,AL,1,39,0,4903185.0,10,0,0.001,0.000,3.664,-inf,2.303,-inf
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7462,2020-07-21,Wyoming,WY,56,2238,25,578759.0,51,1,0.387,0.004,7.713,3.219,3.932,0.0
7463,2020-07-22,Wyoming,WY,56,2288,25,578759.0,50,0,0.395,0.004,7.735,3.219,3.912,-inf
7464,2020-07-23,Wyoming,WY,56,2347,25,578759.0,59,0,0.406,0.004,7.761,3.219,4.078,-inf
7465,2020-07-24,Wyoming,WY,56,2405,25,578759.0,58,0,0.416,0.004,7.785,3.219,4.060,-inf
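As a worked check of the percentage and log columns, take the last Wyoming row (2020-07-24) in the table above. `np.log` is the natural logarithm, and every value is rounded to 3 decimals:

$$
\%\text{Cases} = 100 \times \frac{\text{Total Cases}}{\text{Population}} = 100 \times \frac{2405}{578759} \approx 0.416,
\qquad
\%\text{Deaths} = 100 \times \frac{25}{578759} \approx 0.004
$$

$$
\log(\text{Total Cases}) = \ln 2405 \approx 7.785,
\qquad
\log(\text{Total Deaths}) = \ln 25 \approx 3.219
$$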


#### County Level

In [112]:
### Go grab data
!curl https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv --output data/NYTcounty.csv



In [113]:
### Load in data
NYTcounty = pd.read_csv('data/NYTcounty.csv', dtype = str)
NYTcounty

Unnamed: 0,date,county,state,fips,cases,deaths
0,2020-01-21,Snohomish,Washington,53061,1,0
1,2020-01-22,Snohomish,Washington,53061,1,0
2,2020-01-23,Snohomish,Washington,53061,1,0
3,2020-01-24,Cook,Illinois,17031,1,0
4,2020-01-24,Snohomish,Washington,53061,1,0
...,...,...,...,...,...,...
369904,2020-07-25,Sweetwater,Wyoming,56037,214,2
369905,2020-07-25,Teton,Wyoming,56039,275,1
369906,2020-07-25,Uinta,Wyoming,56041,233,0
369907,2020-07-25,Washakie,Wyoming,56043,44,5
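Reading `us-counties.csv` with `dtype = str` keeps county FIPS codes such as `01001` intact. With default parsing the column would become numeric and lose its leading zero, which could break joins on `countyFIPS`. A sketch on a one-row excerpt of the table above, also showing `dtype={"fips": str}`, which keeps only the FIPS column as text so the counts stay numeric.

```python
import io

import pandas as pd

CSV_TEXT = "date,county,state,fips,cases,deaths\n2020-03-24,Autauga,Alabama,01001,1,0\n"

# Default parsing treats fips as a number and drops the leading zero.
default = pd.read_csv(io.StringIO(CSV_TEXT))

# dtype=str (as in the cell above) keeps every column, including fips, as text.
as_text = pd.read_csv(io.StringIO(CSV_TEXT), dtype=str)

# A per-column dtype keeps fips as text while cases/deaths stay integers.
fips_only = pd.read_csv(io.StringIO(CSV_TEXT), dtype={"fips": str})

print(default["fips"].iloc[0], as_text["fips"].iloc[0], fips_only["fips"].iloc[0])
```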


In [114]:
### Change data types, rename columns, and sort data
NYTcounty["date"] = pd.to_datetime(NYTcounty["date"])
NYTcounty = NYTcounty.astype({"cases" : "int64",
                              "deaths" : "int64"})

NYTcounty = NYTcounty.rename(columns = {'date' : 'Date',
                                        'county' : 'County Name',
                                        'state' : 'State',
                                        'fips' : 'countyFIPS',
                                        'cases' : "Total Cases",
                                        'deaths' : 'Total Deaths'})

NYTcounty = NYTcounty.sort_values(by = ["State", 'County Name', 'Date'], ascending = [True, True, True])


NYTcounty

Unnamed: 0,Date,County Name,State,countyFIPS,Total Cases,Total Deaths
9480,2020-03-24,Autauga,Alabama,01001,1,0
10835,2020-03-25,Autauga,Alabama,01001,4,0
12367,2020-03-26,Autauga,Alabama,01001,6,0
14025,2020-03-27,Autauga,Alabama,01001,6,0
15803,2020-03-28,Autauga,Alabama,01001,6,0
...,...,...,...,...,...,...
357077,2020-07-21,Weston,Wyoming,56045,4,0
360283,2020-07-22,Weston,Wyoming,56045,4,0
363491,2020-07-23,Weston,Wyoming,56045,4,0
366698,2020-07-24,Weston,Wyoming,56045,4,0


In [115]:
### Rename District of Columbia to DC (.loc changes NYTcounty itself, not a temporary copy)
NYTcounty.loc[NYTcounty["State"] == "District of Columbia", "State"] = "DC"

### Remove Aleutians West Census Area
NYTcounty = NYTcounty.drop(index = NYTcounty.index[NYTcounty["County Name"] == "Aleutians West Census Area"])

### Remove Guam, Puerto Rico, Virgin Islands, and Northern Mariana Islands
territories = ["Guam", "Puerto Rico", "Virgin Islands", "Northern Mariana Islands"]
NYTcounty = NYTcounty.drop(index = NYTcounty.index[NYTcounty["State"].isin(territories)])

NYTcounty["State"].unique()

array(['Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California',
       'Colorado', 'Connecticut', 'Delaware', 'DC', 'Florida', 'Georgia',
       'Hawaii', 'Idaho', 'Illinois', 'Indiana', 'Iowa', 'Kansas',
       'Kentucky', 'Louisiana', 'Maine', 'Maryland', 'Massachusetts',
       'Michigan', 'Minnesota', 'Mississippi', 'Missouri', 'Montana',
       'Nebraska', 'Nevada', 'New Hampshire', 'New Jersey', 'New Mexico',
       'New York', 'North Carolina', 'North Dakota', 'Ohio', 'Oklahoma',
       'Oregon', 'Pennsylvania', 'Rhode Island', 'South Carolina',
       'South Dakota', 'Tennessee', 'Texas', 'Utah', 'Vermont',
       'Virginia', 'Washington', 'West Virginia', 'Wisconsin', 'Wyoming'],
      dtype=object)

### Fixing County Labels

#### New York City

One of the "counties" listed for New York is "New York City." This is not an actual county: it is the sum of the five counties (boroughs) that make up New York City.

To handle this, the "New York City" rows are copied once for each of New York, Kings, Queens, Bronx and Richmond counties, and each copy's case and death totals are divided by 5, splitting the counts evenly across the five counties.

The original "New York City" rows are kept for other parts of the dashboard.

__Note__: An even split does not match reality, since the boroughs differ in population and case counts, but it is good enough for the purposes of this dashboard.
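The five cells below repeat the same steps for each county. A compact sketch of the same even split written as a loop; `split_nyc_evenly` and `NYC_BOROUGHS` are hypothetical names, and the FIPS codes are the ones used in those cells. The demo row uses the 2020-07-24 New York City totals implied by the tables below, where each county shows one fifth of them.

```python
import pandas as pd

# County name -> county FIPS code, as used in the cells below.
NYC_BOROUGHS = {"New York": "36061", "Kings": "36047", "Queens": "36081",
                "Bronx": "36005", "Richmond": "36085"}


def split_nyc_evenly(df):
    """Hypothetical helper: one copy of the 'New York City' rows per borough,
    with Total Cases and Total Deaths divided evenly among the boroughs."""
    nyc = df[df["County Name"] == "New York City"]
    pieces = []
    for name, fips in NYC_BOROUGHS.items():
        part = nyc.copy()
        part["County Name"] = name
        part["countyFIPS"] = fips
        for col in ("Total Cases", "Total Deaths"):
            part[col] = part[col] / len(NYC_BOROUGHS)
        pieces.append(part)
    return pd.concat(pieces)


demo = pd.DataFrame({
    "Date": pd.to_datetime(["2020-07-24"]),
    "County Name": ["New York City"], "State": ["New York"],
    "countyFIPS": [None], "Total Cases": [227882], "Total Deaths": [22936],
})
boroughs = split_nyc_evenly(demo)
print(boroughs[["County Name", "Total Cases", "Total Deaths"]].values.tolist())
```

With this helper, the new rows would be added with `pd.concat([NYTcounty, split_nyc_evenly(NYTcounty)])`.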

In [116]:
### Make copies for the 5 counties of NYC
NewYorkcounty = NYTcounty[NYTcounty["County Name"] == 'New York City'].copy()
Kingscounty = NYTcounty[NYTcounty["County Name"] == 'New York City'].copy()
Queenscounty = NYTcounty[NYTcounty["County Name"] == 'New York City'].copy()
Bronxcounty = NYTcounty[NYTcounty["County Name"] == 'New York City'].copy()
Richmondcounty = NYTcounty[NYTcounty["County Name"] == 'New York City'].copy()

##### New York County

In [117]:
### Change county name, countyFIPS, and divide Total Cases/Total Deaths by 5
NewYorkcounty['County Name'] = "New York"
NewYorkcounty['countyFIPS'] = '36061'
NewYorkcounty['Total Cases'] = NewYorkcounty['Total Cases'] / 5
NewYorkcounty['Total Deaths'] = NewYorkcounty['Total Deaths'] / 5
NewYorkcounty

Unnamed: 0,Date,County Name,State,countyFIPS,Total Cases,Total Deaths
416,2020-03-01,New York,New York,36061,0.2,0.0
448,2020-03-02,New York,New York,36061,0.2,0.0
482,2020-03-03,New York,New York,36061,0.4,0.0
518,2020-03-04,New York,New York,36061,0.4,0.0
565,2020-03-05,New York,New York,36061,0.8,0.0
...,...,...,...,...,...,...
355717,2020-07-21,New York,New York,36061,45355.8,4579.0
358922,2020-07-22,New York,New York,36061,45426.0,4579.8
362129,2020-07-23,New York,New York,36061,45503.4,4586.8
365336,2020-07-24,New York,New York,36061,45576.4,4587.2


##### Kings County

In [118]:
Kingscounty['County Name'] = "Kings"
Kingscounty['countyFIPS'] = '36047'
Kingscounty['Total Cases'] = Kingscounty['Total Cases'] / 5
Kingscounty['Total Deaths'] = Kingscounty['Total Deaths'] / 5
Kingscounty

Unnamed: 0,Date,County Name,State,countyFIPS,Total Cases,Total Deaths
416,2020-03-01,Kings,New York,36047,0.2,0.0
448,2020-03-02,Kings,New York,36047,0.2,0.0
482,2020-03-03,Kings,New York,36047,0.4,0.0
518,2020-03-04,Kings,New York,36047,0.4,0.0
565,2020-03-05,Kings,New York,36047,0.8,0.0
...,...,...,...,...,...,...
355717,2020-07-21,Kings,New York,36047,45355.8,4579.0
358922,2020-07-22,Kings,New York,36047,45426.0,4579.8
362129,2020-07-23,Kings,New York,36047,45503.4,4586.8
365336,2020-07-24,Kings,New York,36047,45576.4,4587.2


##### Queens County

In [119]:
Queenscounty['County Name'] = "Queens"
Queenscounty['countyFIPS'] = '36081'
Queenscounty['Total Cases'] = Queenscounty['Total Cases'] / 5
Queenscounty['Total Deaths'] = Queenscounty['Total Deaths'] / 5
Queenscounty

Unnamed: 0,Date,County Name,State,countyFIPS,Total Cases,Total Deaths
416,2020-03-01,Queens,New York,36081,0.2,0.0
448,2020-03-02,Queens,New York,36081,0.2,0.0
482,2020-03-03,Queens,New York,36081,0.4,0.0
518,2020-03-04,Queens,New York,36081,0.4,0.0
565,2020-03-05,Queens,New York,36081,0.8,0.0
...,...,...,...,...,...,...
355717,2020-07-21,Queens,New York,36081,45355.8,4579.0
358922,2020-07-22,Queens,New York,36081,45426.0,4579.8
362129,2020-07-23,Queens,New York,36081,45503.4,4586.8
365336,2020-07-24,Queens,New York,36081,45576.4,4587.2


##### Bronx County

In [120]:
Bronxcounty['County Name'] = "Bronx"
Bronxcounty['countyFIPS'] = '36005'
Bronxcounty['Total Cases'] = Bronxcounty['Total Cases'] / 5
Bronxcounty['Total Deaths'] = Bronxcounty['Total Deaths'] / 5
Bronxcounty

Unnamed: 0,Date,County Name,State,countyFIPS,Total Cases,Total Deaths
416,2020-03-01,Bronx,New York,36005,0.2,0.0
448,2020-03-02,Bronx,New York,36005,0.2,0.0
482,2020-03-03,Bronx,New York,36005,0.4,0.0
518,2020-03-04,Bronx,New York,36005,0.4,0.0
565,2020-03-05,Bronx,New York,36005,0.8,0.0
...,...,...,...,...,...,...
355717,2020-07-21,Bronx,New York,36005,45355.8,4579.0
358922,2020-07-22,Bronx,New York,36005,45426.0,4579.8
362129,2020-07-23,Bronx,New York,36005,45503.4,4586.8
365336,2020-07-24,Bronx,New York,36005,45576.4,4587.2


##### Richmond County

In [121]:
Richmondcounty['County Name'] = "Richmond"
Richmondcounty['countyFIPS'] = '36085'
Richmondcounty['Total Cases'] = Richmondcounty['Total Cases'] / 5
Richmondcounty['Total Deaths'] = Richmondcounty['Total Deaths'] / 5
Richmondcounty

Unnamed: 0,Date,County Name,State,countyFIPS,Total Cases,Total Deaths
416,2020-03-01,Richmond,New York,36085,0.2,0.0
448,2020-03-02,Richmond,New York,36085,0.2,0.0
482,2020-03-03,Richmond,New York,36085,0.4,0.0
518,2020-03-04,Richmond,New York,36085,0.4,0.0
565,2020-03-05,Richmond,New York,36085,0.8,0.0
...,...,...,...,...,...,...
355717,2020-07-21,Richmond,New York,36085,45355.8,4579.0
358922,2020-07-22,Richmond,New York,36085,45426.0,4579.8
362129,2020-07-23,Richmond,New York,36085,45503.4,4586.8
365336,2020-07-24,Richmond,New York,36085,45576.4,4587.2


In [122]:
### Now add those counties to the data frame.
NYTcounty = pd.concat([NYTcounty, NewYorkcounty, Kingscounty, Queenscounty, Bronxcounty, Richmondcounty])
NYTcounty = NYTcounty.sort_values(by = ["State", 'County Name', 'Date'], ascending = [True, True, True])
NYTcounty['County Name'][NYTcounty['State'] == 'New York'].unique()

array(['Albany', 'Allegany', 'Bronx', 'Broome', 'Cattaraugus', 'Cayuga',
       'Chautauqua', 'Chemung', 'Chenango', 'Clinton', 'Columbia',
       'Cortland', 'Delaware', 'Dutchess', 'Erie', 'Essex', 'Franklin',
       'Fulton', 'Genesee', 'Greene', 'Hamilton', 'Herkimer', 'Jefferson',
       'Kings', 'Lewis', 'Livingston', 'Madison', 'Monroe', 'Montgomery',
       'Nassau', 'New York', 'New York City', 'Niagara', 'Oneida',
       'Onondaga', 'Ontario', 'Orange', 'Orleans', 'Oswego', 'Otsego',
       'Putnam', 'Queens', 'Rensselaer', 'Richmond', 'Rockland',
       'Saratoga', 'Schenectady', 'Schoharie', 'Schuyler', 'Seneca',
       'St. Lawrence', 'Steuben', 'Suffolk', 'Sullivan', 'Tioga',
       'Tompkins', 'Ulster', 'Unknown', 'Warren', 'Washington', 'Wayne',
       'Westchester', 'Wyoming', 'Yates'], dtype=object)
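After the concatenation, a quick sanity check is that, for each date, the five borough rows add back up to the `New York City` row. `split_adds_up` is a hypothetical helper sketched on toy data. It filters on `State == "New York"` because Kings and Richmond are also county names in other states, and it compares with a tolerance because dividing by 5 produces floating-point values.

```python
import pandas as pd

NYC_PARTS = ["New York", "Kings", "Queens", "Bronx", "Richmond"]


def split_adds_up(df, column="Total Cases", tol=1e-6):
    """Hypothetical check: per date, the borough rows should sum to the
    'New York City' row (New York State rows only)."""
    ny = df[df["State"] == "New York"]
    whole = ny[ny["County Name"] == "New York City"].set_index("Date")[column]
    parts = ny[ny["County Name"].isin(NYC_PARTS)].groupby("Date")[column].sum()
    gap = (whole - parts).abs()
    # A date present on only one side gives NaN, which fails the comparison.
    return bool((gap < tol).all())


demo = pd.DataFrame({
    "Date": ["2020-07-24"] * 7,
    "County Name": ["New York City"] + NYC_PARTS + ["Kings"],
    "State": ["New York"] * 6 + ["California"],
    "Total Cases": [227882] + [227882 / 5] * 5 + [1000],
})
print(split_adds_up(demo))
```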

In [123]:
### Calculate New Cases and New Deaths
NYTcounty = parallel1(NYTcounty, newCases1)
NYTcounty = parallel1(NYTcounty, newDeaths1)
NYTcounty

Unnamed: 0,Date,County Name,State,countyFIPS,Total Cases,Total Deaths,New Cases,New Deaths
9480,2020-03-24,Autauga,Alabama,01001,1.0,0.0,0.0,0.0
10835,2020-03-25,Autauga,Alabama,01001,4.0,0.0,3.0,0.0
12367,2020-03-26,Autauga,Alabama,01001,6.0,0.0,2.0,0.0
14025,2020-03-27,Autauga,Alabama,01001,6.0,0.0,0.0,0.0
15803,2020-03-28,Autauga,Alabama,01001,6.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...
357077,2020-07-21,Weston,Wyoming,56045,4.0,0.0,0.0,0.0
360283,2020-07-22,Weston,Wyoming,56045,4.0,0.0,0.0,0.0
363491,2020-07-23,Weston,Wyoming,56045,4.0,0.0,0.0,0.0
366698,2020-07-24,Weston,Wyoming,56045,4.0,0.0,0.0,0.0
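The helpers `parallel1`, `newCases1` and `newDeaths1` are defined earlier in the notebook and are not shown in this section. The sketch below is an independent illustration, not a reproduction of those helpers: it derives daily counts from cumulative totals with `groupby().diff()`, using hypothetical values, and fills each county's first day with 0 to match the convention visible in the output above.

```python
import pandas as pd

# Hypothetical cumulative counts for two counties (illustrative values only).
df = pd.DataFrame({
    "countyFIPS": ["01001", "01001", "01001", "56045", "56045"],
    "Date": pd.to_datetime(["2020-03-24", "2020-03-25", "2020-03-26",
                            "2020-03-24", "2020-03-25"]),
    "Total Cases": [1.0, 4.0, 6.0, 2.0, 2.0],
    "Total Deaths": [0.0, 0.0, 1.0, 0.0, 0.0],
})

# Each county's rows must be in date order before differencing.
df = df.sort_values(["countyFIPS", "Date"])

# Daily change = today's cumulative total minus the previous day's, per county.
# A county's first row has no predecessor, so diff() gives NaN; fill it with 0.
df["New Cases"] = df.groupby("countyFIPS")["Total Cases"].diff().fillna(0)
df["New Deaths"] = df.groupby("countyFIPS")["Total Deaths"].diff().fillna(0)

print(df)
```

With real data, this difference can be negative when a county revises its cumulative total downward.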


In [124]:
### Grab populations and state abbreviations
NYTcounty = NYTcounty.merge(CountyData[['countyFIPS', 'Population', 'StateABV', 'Date', 'stateFIPS']], on = ['countyFIPS','Date'], how = 'left')

### For New York City, sum the populations of the 5 NYC counties (boroughs).
NewYorkpop = NYTcounty['Population'][(NYTcounty['State'] == 'New York') & (NYTcounty['County Name'] == 'New York')].iloc[0]
Kingspop = NYTcounty['Population'][(NYTcounty['State'] == 'New York') & (NYTcounty['County Name'] == 'Kings')].iloc[0]
Queenspop = NYTcounty['Population'][(NYTcounty['State'] == 'New York') & (NYTcounty['County Name'] == 'Queens')].iloc[0]
Bronxpop = NYTcounty['Population'][(NYTcounty['State'] == 'New York') & (NYTcounty['County Name'] == 'Bronx')].iloc[0]
Richmondpop = NYTcounty['Population'][(NYTcounty['State'] == 'New York') & (NYTcounty['County Name'] == 'Richmond')].iloc[0]

### Assign with .loc so the value is written to NYTcounty itself, not to a copy of a slice.
NYTcounty.loc[NYTcounty['County Name'] == 'New York City', 'Population'] = NewYorkpop + Kingspop + Queenspop + Bronxpop + Richmondpop


### For Kansas City, sum the populations of the 4 KC counties.
### Note: this overstates the city's own population, since Kansas City covers only part of these counties.
Casspop = NYTcounty['Population'][(NYTcounty['State'] == 'Missouri') & (NYTcounty['County Name'] == 'Cass')].iloc[0]
Claypop = NYTcounty['Population'][(NYTcounty['State'] == 'Missouri') & (NYTcounty['County Name'] == 'Clay')].iloc[0]
Jacksonpop = NYTcounty['Population'][(NYTcounty['State'] == 'Missouri') & (NYTcounty['County Name'] == 'Jackson')].iloc[0]
Plattepop = NYTcounty['Population'][(NYTcounty['State'] == 'Missouri') & (NYTcounty['County Name'] == 'Platte')].iloc[0]

NYTcounty.loc[NYTcounty['County Name'] == 'Kansas City', 'Population'] = Casspop + Claypop + Jacksonpop + Plattepop

NYTcounty



Unnamed: 0,Date,County Name,State,countyFIPS,Total Cases,Total Deaths,New Cases,New Deaths,Population,StateABV,stateFIPS
0,2020-03-24,Autauga,Alabama,01001,1.0,0.0,0.0,0.0,55869.0,AL,1
1,2020-03-25,Autauga,Alabama,01001,4.0,0.0,3.0,0.0,55869.0,AL,1
2,2020-03-26,Autauga,Alabama,01001,6.0,0.0,2.0,0.0,55869.0,AL,1
3,2020-03-27,Autauga,Alabama,01001,6.0,0.0,0.0,0.0,55869.0,AL,1
4,2020-03-28,Autauga,Alabama,01001,6.0,0.0,0.0,0.0,55869.0,AL,1
...,...,...,...,...,...,...,...,...,...,...,...
363363,2020-07-21,Weston,Wyoming,56045,4.0,0.0,0.0,0.0,6927.0,WY,56
363364,2020-07-22,Weston,Wyoming,56045,4.0,0.0,0.0,0.0,6927.0,WY,56
363365,2020-07-23,Weston,Wyoming,56045,4.0,0.0,0.0,0.0,6927.0,WY,56
363366,2020-07-24,Weston,Wyoming,56045,4.0,0.0,0.0,0.0,6927.0,WY,56
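The cell above attaches county populations with a left merge and then gives the aggregate rows (New York City, Kansas City) the summed population of their member counties. The sketch below shows the same pattern on hypothetical data: the FIPS codes, the county names, and the aggregate row "Metro" are made up. It takes each member county's first population value (as the notebook's `.iloc[0]` does) so repeated dates are not double-counted, and it writes the result with a single `.loc` call, which avoids the SettingWithCopyWarning that chained assignment such as `df['col'][mask] = value` can raise.

```python
import pandas as pd

# Hypothetical case rows: two member counties plus one aggregate row, over two dates.
cases = pd.DataFrame({
    "countyFIPS": ["A", "B", "X"] * 2,
    "Date": ["2020-07-21"] * 3 + ["2020-07-22"] * 3,
    "County Name": ["Alpha", "Beta", "Metro"] * 2,
})
# Hypothetical population table; the aggregate "Metro" row has no entry.
county_info = pd.DataFrame({
    "countyFIPS": ["A", "B"] * 2,
    "Date": ["2020-07-21"] * 2 + ["2020-07-22"] * 2,
    "Population": [100.0, 250.0] * 2,
})

# Left merge keeps every case row; unmatched rows get NaN in Population.
merged = cases.merge(county_info, on=["countyFIPS", "Date"], how="left")

# One population value per member county (first row), then sum.
members = ["Alpha", "Beta"]
metro_pop = (merged.loc[merged["County Name"].isin(members)]
             .groupby("County Name")["Population"].first().sum())

# Write into `merged` directly with a single .loc call.
merged.loc[merged["County Name"] == "Metro", "Population"] = metro_pop
print(merged)
```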


In [125]:
### Percent of population with confirmed cases (0 where the population is 0).
NYTcounty["%Cases"] = np.where(NYTcounty["Population"] != 0,
                             round((NYTcounty["Total Cases"] / NYTcounty["Population"]) * 100, 3),
                             0)

### Percent of population that has died (0 where the population is 0).
NYTcounty["%Deaths"] = np.where(NYTcounty["Population"] != 0,
                              round((NYTcounty["Total Deaths"] / NYTcounty["Population"]) * 100, 3),
                              0)

### Logarithmic scales. A count of 0 gives log(0) = -inf, which is why -inf appears
### in the output below; a negative count (a downward revision) would give NaN.
NYTcounty["log(Total Cases)"] = round(np.log(NYTcounty["Total Cases"]), 3)
NYTcounty["log(Total Deaths)"] = round(np.log(NYTcounty["Total Deaths"]), 3)
NYTcounty["log(New Cases)"] = round(np.log(NYTcounty["New Cases"]), 3)
NYTcounty["log(New Deaths)"] = round(np.log(NYTcounty["New Deaths"]), 3)

### Reorder the columns to match CountyData.
NYTcounty = NYTcounty[list(CountyData.columns)]

NYTcounty



Unnamed: 0,County Name,State,countyFIPS,StateABV,stateFIPS,Date,Total Cases,Total Deaths,Population,New Cases,New Deaths,%Cases,%Deaths,log(Total Cases),log(Total Deaths),log(New Cases),log(New Deaths)
0,Autauga,Alabama,01001,AL,1,2020-03-24,1.0,0.0,55869.0,0.0,0.0,0.002,0.0,0.000,-inf,-inf,-inf
1,Autauga,Alabama,01001,AL,1,2020-03-25,4.0,0.0,55869.0,3.0,0.0,0.007,0.0,1.386,-inf,1.099,-inf
2,Autauga,Alabama,01001,AL,1,2020-03-26,6.0,0.0,55869.0,2.0,0.0,0.011,0.0,1.792,-inf,0.693,-inf
3,Autauga,Alabama,01001,AL,1,2020-03-27,6.0,0.0,55869.0,0.0,0.0,0.011,0.0,1.792,-inf,-inf,-inf
4,Autauga,Alabama,01001,AL,1,2020-03-28,6.0,0.0,55869.0,0.0,0.0,0.011,0.0,1.792,-inf,-inf,-inf
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
363363,Weston,Wyoming,56045,WY,56,2020-07-21,4.0,0.0,6927.0,0.0,0.0,0.058,0.0,1.386,-inf,-inf,-inf
363364,Weston,Wyoming,56045,WY,56,2020-07-22,4.0,0.0,6927.0,0.0,0.0,0.058,0.0,1.386,-inf,-inf,-inf
363365,Weston,Wyoming,56045,WY,56,2020-07-23,4.0,0.0,6927.0,0.0,0.0,0.058,0.0,1.386,-inf,-inf,-inf
363366,Weston,Wyoming,56045,WY,56,2020-07-24,4.0,0.0,6927.0,0.0,0.0,0.058,0.0,1.386,-inf,-inf,-inf
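The cell above keeps `-inf` wherever a count is 0. If those values are unwanted downstream (for example in plots or model fitting), one option, which is not what the notebook does, is to mask non-positive counts to NaN before taking the log. The sketch below uses hypothetical values and also shows the zero-population guard used for `%Cases`.

```python
import numpy as np
import pandas as pd

# Hypothetical counts and populations (made-up values for illustration).
df = pd.DataFrame({
    "Total Cases": [0.0, 4.0, 6.0],
    "New Cases": [0.0, -1.0, 2.0],
    "Population": [5000.0, 0.0, 6000.0],
})

# Percent of population with confirmed cases, 0 where population is 0.
# pandas division by zero yields inf/NaN without raising; np.where discards it.
pct = (df["Total Cases"] / df["Population"] * 100).round(3)
df["%Cases"] = np.where(df["Population"] != 0, pct, 0)

# Mask non-positive values to NaN first, so zeros and negative revisions become
# NaN silently instead of -inf (zeros) or NaN with a warning (negatives).
for col in ["Total Cases", "New Cases"]:
    df[f"log({col})"] = np.log(df[col].where(df[col] > 0)).round(3)

print(df)
```

NaN is skipped by most pandas aggregations and plotting routines, whereas `-inf` usually has to be filtered out by hand.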


### Final Data

Save the datasets as CSV files in the data folder.

In [126]:
CountyData.to_csv("data/countyData.csv", index = False)
StateData.to_csv("data/stateData.csv", index = False)
USAData.to_csv("data/usaData.csv", index = False)
DeathsSexAge.to_csv("data/demoDeaths.csv", index = False)
raceNew.to_csv("data/raceDeaths.csv", index = False)
hospital.to_csv("data/hospitalData.csv", index = False)
GoogleUsaMobility.to_csv('data/GoogleUsaMobility.csv', index = False)
GoogleStateMobility.to_csv('data/GoogleStateMobility.csv', index = False)
GoogleCountyMobility.to_csv('data/GoogleCountyMobility.csv', index = False)
NYTusa.to_csv('data/NYTusa.csv', index = False)
NYTstate.to_csv('data/NYTstate.csv', index = False)
NYTcounty.to_csv('data/NYTcounty.csv', index = False)
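The same saves can be written as a loop over a dictionary of file names and frames, creating the output folder first. This is a sketch with hypothetical frames and a temporary folder in place of "data/". It also shows a caveat when reading the files back, assuming `countyFIPS` is stored as text, as the leading zeros in the outputs above suggest: CSV keeps no column types, so the codes should be read back as strings to keep their leading zeros.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-ins for the notebook's frames (values are made up).
datasets = {
    "countyData.csv": pd.DataFrame({"countyFIPS": ["01001", "56045"], "Total Cases": [1.0, 4.0]}),
    "NYTusa.csv": pd.DataFrame({"Date": ["2020-07-24"], "Total Cases": [10.0]}),
}

# A temporary folder keeps the sketch self-contained; the notebook writes to "data/".
out_dir = os.path.join(tempfile.mkdtemp(), "data")
os.makedirs(out_dir, exist_ok=True)  # to_csv does not create missing folders

for filename, frame in datasets.items():
    frame.to_csv(os.path.join(out_dir, filename), index=False)

# Read back: by default the FIPS codes are parsed as integers.
path = os.path.join(out_dir, "countyData.csv")
as_default = pd.read_csv(path)
as_string = pd.read_csv(path, dtype={"countyFIPS": str})
print(as_default["countyFIPS"].tolist())  # leading zero lost
print(as_string["countyFIPS"].tolist())   # leading zero kept
```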