# COVID-19 – Data-Based Prediction Tool

## Ian Scarff (iie728)

## Practicum II Project 2020

### Import Packages

In [1]:
import numpy as np
import pandas as pd
%matplotlib inline

# Data Preprocessing

### About the Data

The data come from usafacts.org, on the page "Coronavirus Locations: COVID-19 Map by County and State."

The 21 cases confirmed on the Grand Princess cruise ship on March 5 and 6 are attributed to the state of California, but not to any counties. The national numbers also include the 45 people with coronavirus repatriated from the Diamond Princess.

USAFacts attempts to match each case with a county, but some cases counted at the state level are not allocated to counties due to lack of information.

The data are updated daily.


NOTES FROM USAFacts:

Note from April 28: On April 14, New York City began a separate count of "probable deaths" of people believed to have died as a result of COVID-19, though weren't tested. On April 28, these deaths were retroactively added to our death counts, assigned to a New York City borough if possible. In the future, USAFacts will include "probable deaths" in the overall tally if a local government chooses to report that information separately.

Note from April 18: Certain states have changed their methodology in reporting deaths due to COVID-19. As a result, we are holding off on reporting death data in a few key states (New York is notable among these states due to the high number of confirmed cases and deaths). USAFacts is committed to providing official numbers confirmed by state or local health agencies, and we will appropriately backfill the death data when we receive more guidance from the CDC and relevant health departments.

Note from April 15: In certain states, probable deaths are listed alongside confirmed deaths. Following the lead of the CDC, we will begin publishing death counts that combine these two totals where applicable; this might result in larger than expected increases in deaths in certain counties.

Note from March 28: The data now includes all counties regardless of confirmed case count. Additionally, New York City data has been allotted to its five boroughs/counties, where possible.



##### There is no missing data.
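One way to back up this claim once the data are loaded is to count missing values per column with `DataFrame.isna()`. The sketch below uses a tiny made-up frame in place of the real cases table, and `count_missing` is a hypothetical helper name, not part of the notebook.

```python
import pandas as pd

def count_missing(df: pd.DataFrame) -> pd.Series:
    """Return the number of missing values for each column that has any."""
    missing = df.isna().sum()
    return missing[missing > 0]

# Toy stand-in for the wide cases table (not real USAFacts data)
toy = pd.DataFrame({
    "countyFIPS": [1001, 1003],
    "County Name": ["Autauga County", "Baldwin County"],
    "1/22/20": [0, 0],
    "1/23/20": [0, None],  # one deliberately missing value
})

print(count_missing(toy))
```

On the real data, an empty result from `count_missing(cases)` or `count_missing(deaths)` would confirm that there are no gaps.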

#### Import Data

To ensure that a copy of the data is always available locally, the raw files are saved to the data/ folder each time they are downloaded.

In [2]:
### Number of confirmed cases by county
!curl https://usafactsstatic.blob.core.windows.net/public/data/covid-19/covid_confirmed_usafacts.csv --output data/cases.csv

### Number of confirmed deaths by county
!curl https://usafactsstatic.blob.core.windows.net/public/data/covid-19/covid_deaths_usafacts.csv --output data/deaths.csv

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1317k  100 1317k    0     0  1492k      0 --:--:-- --:--:-- --:--:-- 1490k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1107k  100 1107k    0     0  1593k      0 --:--:-- --:--:-- --:--:-- 1590k


The county labels in the population dataset were unreliable, so I created a separate population dataset whose naming convention matches the other data frames.

Now load those datasets.

In [3]:
### Total Cases
cases = pd.read_csv("data/cases.csv")
cases

Unnamed: 0,countyFIPS,County Name,State,stateFIPS,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,...,6/15/20,6/16/20,6/17/20,6/18/20,6/19/20,6/20/20,6/21/20,6/22/20,6/23/20,6/24/20
0,0,Statewide Unallocated,AL,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,1001,Autauga County,AL,1,0,0,0,0,0,0,...,368,373,375,400,411,431,434,442,453,469
2,1003,Baldwin County,AL,1,0,0,0,0,0,0,...,383,389,392,401,413,420,430,437,450,464
3,1005,Barbour County,AL,1,0,0,0,0,0,0,...,238,245,251,263,266,272,272,277,280,288
4,1007,Bibb County,AL,1,0,0,0,0,0,0,...,111,116,118,121,126,126,127,129,135,141
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3190,56037,Sweetwater County,WY,56,0,0,0,0,0,0,...,40,40,43,44,48,49,53,56,58,65
3191,56039,Teton County,WY,56,0,0,0,0,0,0,...,105,105,107,108,109,109,110,111,113,113
3192,56041,Uinta County,WY,56,0,0,0,0,0,0,...,88,91,104,116,128,130,138,148,152,157
3193,56043,Washakie County,WY,56,0,0,0,0,0,0,...,39,39,39,39,39,39,39,39,39,39


In [4]:
### Total Deaths
deaths = pd.read_csv("data/deaths.csv")
deaths

Unnamed: 0,countyFIPS,County Name,State,stateFIPS,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,...,6/15/20,6/16/20,6/17/20,6/18/20,6/19/20,6/20/20,6/21/20,6/22/20,6/23/20,6/24/20
0,0,Statewide Unallocated,AL,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,1001,Autauga County,AL,1,0,0,0,0,0,0,...,6,7,7,8,8,9,9,9,9,11
2,1003,Baldwin County,AL,1,0,0,0,0,0,0,...,9,9,9,9,9,9,9,9,9,9
3,1005,Barbour County,AL,1,0,0,0,0,0,0,...,1,1,1,1,1,1,1,1,1,1
4,1007,Bibb County,AL,1,0,0,0,0,0,0,...,1,1,1,1,1,1,1,1,1,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3190,56037,Sweetwater County,WY,56,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3191,56039,Teton County,WY,56,0,0,0,0,0,0,...,1,1,1,1,1,1,1,1,1,1
3192,56041,Uinta County,WY,56,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3193,56043,Washakie County,WY,56,0,0,0,0,0,0,...,3,3,3,3,5,5,5,5,5,5


In [5]:
### Total Population
population = pd.read_csv("data/population.csv")
population

Unnamed: 0,County Name,Population,countyFIPS
0,Statewide Unallocated,0,0
1,Autauga,55869,1001
2,Baldwin,223234,1003
3,Barbour,24686,1005
4,Bibb,22394,1007
...,...,...,...
3138,Sweetwater,42343,56037
3139,Teton,23464,56039
3140,Uinta,20226,56041
3141,Washakie,7805,56043


### Fixing Errors
In the cases and deaths data frames, certain observations need to be removed.

1: Wade Hampton Census Area, Alaska. This area no longer exists; it was renamed Kusilvak Census Area.

2: New York City Unallocated/Probable. This is not a county. Observations for New York City are covered by its five boroughs, each of which is a county.

3: Grand Princess Cruise Ship. This is a cruise ship, not a county, and its cases are attributed to California.
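The three removals can also be expressed as a single filter with `Series.isin`, keeping every row whose county name is not in a list. This is a minimal sketch on a toy frame rather than the real data; `NON_COUNTY_ROWS` and `drop_non_counties` are hypothetical names.

```python
import pandas as pd

# Rows that are not real counties and should be removed
NON_COUNTY_ROWS = [
    "Wade Hampton Census Area",
    "New York City Unallocated/Probable",
    "Grand Princess Cruise Ship",
]

def drop_non_counties(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only rows whose 'County Name' is not in NON_COUNTY_ROWS."""
    return df[~df["County Name"].isin(NON_COUNTY_ROWS)]

# Toy rows, not real USAFacts data
toy = pd.DataFrame({
    "County Name": ["Autauga County", "Grand Princess Cruise Ship",
                    "Wade Hampton Census Area", "Kings County"],
    "State": ["AL", "CA", "AK", "NY"],
})
print(drop_non_counties(toy))
```

Like `drop`, this keeps the original index labels, so `cases = drop_non_counties(cases)` should keep the same rows as the three `drop` calls in the next cell.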

In [6]:
#### County Data

### Remove Wade Hampton Area
cases = cases.drop(list(cases[cases["County Name"] == "Wade Hampton Census Area"].index))

### Remove New York City Unallocated/Probable
cases = cases.drop(list(cases[cases["County Name"] == "New York City Unallocated/Probable"].index))

### Remove Grand Princess Cruise Ship
cases = cases.drop(list(cases[cases["County Name"] == "Grand Princess Cruise Ship"].index))


#### Deaths Data
### Remove Wade Hampton Area
deaths = deaths.drop(list(deaths[deaths["County Name"] == "Wade Hampton Census Area"].index))

### Remove New York City Unallocated/Probable
deaths = deaths.drop(list(deaths[deaths["County Name"] == "New York City Unallocated/Probable"].index))

### Remove Grand Princess Cruise Ship
deaths = deaths.drop(list(deaths[deaths["County Name"] == "Grand Princess Cruise Ship"].index))

In [7]:
cases

Unnamed: 0,countyFIPS,County Name,State,stateFIPS,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,...,6/15/20,6/16/20,6/17/20,6/18/20,6/19/20,6/20/20,6/21/20,6/22/20,6/23/20,6/24/20
0,0,Statewide Unallocated,AL,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,1001,Autauga County,AL,1,0,0,0,0,0,0,...,368,373,375,400,411,431,434,442,453,469
2,1003,Baldwin County,AL,1,0,0,0,0,0,0,...,383,389,392,401,413,420,430,437,450,464
3,1005,Barbour County,AL,1,0,0,0,0,0,0,...,238,245,251,263,266,272,272,277,280,288
4,1007,Bibb County,AL,1,0,0,0,0,0,0,...,111,116,118,121,126,126,127,129,135,141
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3190,56037,Sweetwater County,WY,56,0,0,0,0,0,0,...,40,40,43,44,48,49,53,56,58,65
3191,56039,Teton County,WY,56,0,0,0,0,0,0,...,105,105,107,108,109,109,110,111,113,113
3192,56041,Uinta County,WY,56,0,0,0,0,0,0,...,88,91,104,116,128,130,138,148,152,157
3193,56043,Washakie County,WY,56,0,0,0,0,0,0,...,39,39,39,39,39,39,39,39,39,39


In [8]:
deaths

Unnamed: 0,countyFIPS,County Name,State,stateFIPS,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,...,6/15/20,6/16/20,6/17/20,6/18/20,6/19/20,6/20/20,6/21/20,6/22/20,6/23/20,6/24/20
0,0,Statewide Unallocated,AL,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,1001,Autauga County,AL,1,0,0,0,0,0,0,...,6,7,7,8,8,9,9,9,9,11
2,1003,Baldwin County,AL,1,0,0,0,0,0,0,...,9,9,9,9,9,9,9,9,9,9
3,1005,Barbour County,AL,1,0,0,0,0,0,0,...,1,1,1,1,1,1,1,1,1,1
4,1007,Bibb County,AL,1,0,0,0,0,0,0,...,1,1,1,1,1,1,1,1,1,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3190,56037,Sweetwater County,WY,56,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3191,56039,Teton County,WY,56,0,0,0,0,0,0,...,1,1,1,1,1,1,1,1,1,1
3192,56041,Uinta County,WY,56,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3193,56043,Washakie County,WY,56,0,0,0,0,0,0,...,3,3,3,3,5,5,5,5,5,5


### Prep Data

#### Ensuring Labels

To ensure that county and state labels are the same across data frames, replace them with the labels in the FIPS lookup files (countyFIPS.csv and stateFIPS.csv), matching rows on their FIPS codes.

Load the FIPS data.
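The relabeling amounts to dropping the original name column and merging the lookup table back in on the FIPS code. The sketch below does this on toy frames and, as an addition not used in the notebook's cells, passes `on=` and `validate="many_to_one"` to `DataFrame.merge`, so pandas raises a `MergeError` if a FIPS code appears more than once in the lookup table. The cells below instead let pandas merge on the shared column automatically. `relabel_counties` is a hypothetical helper.

```python
import pandas as pd

def relabel_counties(df: pd.DataFrame, lookup: pd.DataFrame) -> pd.DataFrame:
    """Replace 'County Name' in df with the name from lookup, matched on countyFIPS."""
    return (df.drop(columns="County Name")
              .merge(lookup, on="countyFIPS", how="left", validate="many_to_one"))

# Toy data: county names spelled differently from the lookup table
cases_toy = pd.DataFrame({"countyFIPS": [1001, 1003],
                          "County Name": ["Autauga County", "Baldwin Co."],
                          "1/22/20": [0, 0]})
lookup_toy = pd.DataFrame({"County Name": ["Autauga", "Baldwin"],
                           "countyFIPS": [1001, 1003]})

print(relabel_counties(cases_toy, lookup_toy))
```

As in the notebook's output, the new `County Name` column ends up last, after the date columns.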

In [9]:
### County FIPS
countyFIPS = pd.read_csv("data/countyFIPS.csv")
countyFIPS

Unnamed: 0,County Name,countyFIPS
0,Statewide Unallocated,0
1,Autauga,1001
2,Baldwin,1003
3,Barbour,1005
4,Bibb,1007
...,...,...
3138,Sweetwater,56037
3139,Teton,56039
3140,Uinta,56041
3141,Washakie,56043


In [10]:
### State FIPS
stateFIPS = pd.read_csv("data/stateFIPS.csv")
stateFIPS

Unnamed: 0,State,stateFIPS
0,Alabama,1
1,Alaska,2
2,Arizona,4
3,Arkansas,5
4,California,6
5,Colorado,8
6,Connecticut,9
7,Delaware,10
8,DC,11
9,Florida,12


##### Fixing Cases Labels

In [11]:
### Drop cases county labels
cases = cases.drop(columns = "County Name")
cases

Unnamed: 0,countyFIPS,State,stateFIPS,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,1/28/20,...,6/15/20,6/16/20,6/17/20,6/18/20,6/19/20,6/20/20,6/21/20,6/22/20,6/23/20,6/24/20
0,0,AL,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,1001,AL,1,0,0,0,0,0,0,0,...,368,373,375,400,411,431,434,442,453,469
2,1003,AL,1,0,0,0,0,0,0,0,...,383,389,392,401,413,420,430,437,450,464
3,1005,AL,1,0,0,0,0,0,0,0,...,238,245,251,263,266,272,272,277,280,288
4,1007,AL,1,0,0,0,0,0,0,0,...,111,116,118,121,126,126,127,129,135,141
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3190,56037,WY,56,0,0,0,0,0,0,0,...,40,40,43,44,48,49,53,56,58,65
3191,56039,WY,56,0,0,0,0,0,0,0,...,105,105,107,108,109,109,110,111,113,113
3192,56041,WY,56,0,0,0,0,0,0,0,...,88,91,104,116,128,130,138,148,152,157
3193,56043,WY,56,0,0,0,0,0,0,0,...,39,39,39,39,39,39,39,39,39,39


In [12]:
### Add County Name from countyFIPS
cases = cases.merge(countyFIPS, how = "left")
cases

Unnamed: 0,countyFIPS,State,stateFIPS,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,1/28/20,...,6/16/20,6/17/20,6/18/20,6/19/20,6/20/20,6/21/20,6/22/20,6/23/20,6/24/20,County Name
0,0,AL,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Statewide Unallocated
1,1001,AL,1,0,0,0,0,0,0,0,...,373,375,400,411,431,434,442,453,469,Autauga
2,1003,AL,1,0,0,0,0,0,0,0,...,389,392,401,413,420,430,437,450,464,Baldwin
3,1005,AL,1,0,0,0,0,0,0,0,...,245,251,263,266,272,272,277,280,288,Barbour
4,1007,AL,1,0,0,0,0,0,0,0,...,116,118,121,126,126,127,129,135,141,Bibb
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3187,56037,WY,56,0,0,0,0,0,0,0,...,40,43,44,48,49,53,56,58,65,Sweetwater
3188,56039,WY,56,0,0,0,0,0,0,0,...,105,107,108,109,109,110,111,113,113,Teton
3189,56041,WY,56,0,0,0,0,0,0,0,...,91,104,116,128,130,138,148,152,157,Uinta
3190,56043,WY,56,0,0,0,0,0,0,0,...,39,39,39,39,39,39,39,39,39,Washakie


In [13]:
### Drop cases state labels
cases = cases.drop(columns = "State")
cases

Unnamed: 0,countyFIPS,stateFIPS,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,1/28/20,1/29/20,...,6/16/20,6/17/20,6/18/20,6/19/20,6/20/20,6/21/20,6/22/20,6/23/20,6/24/20,County Name
0,0,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Statewide Unallocated
1,1001,1,0,0,0,0,0,0,0,0,...,373,375,400,411,431,434,442,453,469,Autauga
2,1003,1,0,0,0,0,0,0,0,0,...,389,392,401,413,420,430,437,450,464,Baldwin
3,1005,1,0,0,0,0,0,0,0,0,...,245,251,263,266,272,272,277,280,288,Barbour
4,1007,1,0,0,0,0,0,0,0,0,...,116,118,121,126,126,127,129,135,141,Bibb
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3187,56037,56,0,0,0,0,0,0,0,0,...,40,43,44,48,49,53,56,58,65,Sweetwater
3188,56039,56,0,0,0,0,0,0,0,0,...,105,107,108,109,109,110,111,113,113,Teton
3189,56041,56,0,0,0,0,0,0,0,0,...,91,104,116,128,130,138,148,152,157,Uinta
3190,56043,56,0,0,0,0,0,0,0,0,...,39,39,39,39,39,39,39,39,39,Washakie


In [14]:
### Add State names from stateFIPS
cases = cases.merge(stateFIPS, how = "left")
cases

Unnamed: 0,countyFIPS,stateFIPS,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,1/28/20,1/29/20,...,6/17/20,6/18/20,6/19/20,6/20/20,6/21/20,6/22/20,6/23/20,6/24/20,County Name,State
0,0,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,Statewide Unallocated,Alabama
1,1001,1,0,0,0,0,0,0,0,0,...,375,400,411,431,434,442,453,469,Autauga,Alabama
2,1003,1,0,0,0,0,0,0,0,0,...,392,401,413,420,430,437,450,464,Baldwin,Alabama
3,1005,1,0,0,0,0,0,0,0,0,...,251,263,266,272,272,277,280,288,Barbour,Alabama
4,1007,1,0,0,0,0,0,0,0,0,...,118,121,126,126,127,129,135,141,Bibb,Alabama
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3187,56037,56,0,0,0,0,0,0,0,0,...,43,44,48,49,53,56,58,65,Sweetwater,Wyoming
3188,56039,56,0,0,0,0,0,0,0,0,...,107,108,109,109,110,111,113,113,Teton,Wyoming
3189,56041,56,0,0,0,0,0,0,0,0,...,104,116,128,130,138,148,152,157,Uinta,Wyoming
3190,56043,56,0,0,0,0,0,0,0,0,...,39,39,39,39,39,39,39,39,Washakie,Wyoming


##### Fixing Deaths Labels

In [15]:
### Drop deaths county labels
deaths = deaths.drop(columns = "County Name")
deaths

Unnamed: 0,countyFIPS,State,stateFIPS,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,1/28/20,...,6/15/20,6/16/20,6/17/20,6/18/20,6/19/20,6/20/20,6/21/20,6/22/20,6/23/20,6/24/20
0,0,AL,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,1001,AL,1,0,0,0,0,0,0,0,...,6,7,7,8,8,9,9,9,9,11
2,1003,AL,1,0,0,0,0,0,0,0,...,9,9,9,9,9,9,9,9,9,9
3,1005,AL,1,0,0,0,0,0,0,0,...,1,1,1,1,1,1,1,1,1,1
4,1007,AL,1,0,0,0,0,0,0,0,...,1,1,1,1,1,1,1,1,1,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3190,56037,WY,56,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3191,56039,WY,56,0,0,0,0,0,0,0,...,1,1,1,1,1,1,1,1,1,1
3192,56041,WY,56,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3193,56043,WY,56,0,0,0,0,0,0,0,...,3,3,3,3,5,5,5,5,5,5


In [16]:
### Add County Name from countyFIPS
deaths = deaths.merge(countyFIPS, how = "left")
deaths

Unnamed: 0,countyFIPS,State,stateFIPS,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,1/28/20,...,6/16/20,6/17/20,6/18/20,6/19/20,6/20/20,6/21/20,6/22/20,6/23/20,6/24/20,County Name
0,0,AL,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Statewide Unallocated
1,1001,AL,1,0,0,0,0,0,0,0,...,7,7,8,8,9,9,9,9,11,Autauga
2,1003,AL,1,0,0,0,0,0,0,0,...,9,9,9,9,9,9,9,9,9,Baldwin
3,1005,AL,1,0,0,0,0,0,0,0,...,1,1,1,1,1,1,1,1,1,Barbour
4,1007,AL,1,0,0,0,0,0,0,0,...,1,1,1,1,1,1,1,1,1,Bibb
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3187,56037,WY,56,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Sweetwater
3188,56039,WY,56,0,0,0,0,0,0,0,...,1,1,1,1,1,1,1,1,1,Teton
3189,56041,WY,56,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Uinta
3190,56043,WY,56,0,0,0,0,0,0,0,...,3,3,3,5,5,5,5,5,5,Washakie


In [17]:
### Drop deaths state labels
deaths = deaths.drop(columns = "State")
deaths

Unnamed: 0,countyFIPS,stateFIPS,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,1/28/20,1/29/20,...,6/16/20,6/17/20,6/18/20,6/19/20,6/20/20,6/21/20,6/22/20,6/23/20,6/24/20,County Name
0,0,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Statewide Unallocated
1,1001,1,0,0,0,0,0,0,0,0,...,7,7,8,8,9,9,9,9,11,Autauga
2,1003,1,0,0,0,0,0,0,0,0,...,9,9,9,9,9,9,9,9,9,Baldwin
3,1005,1,0,0,0,0,0,0,0,0,...,1,1,1,1,1,1,1,1,1,Barbour
4,1007,1,0,0,0,0,0,0,0,0,...,1,1,1,1,1,1,1,1,1,Bibb
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3187,56037,56,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Sweetwater
3188,56039,56,0,0,0,0,0,0,0,0,...,1,1,1,1,1,1,1,1,1,Teton
3189,56041,56,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Uinta
3190,56043,56,0,0,0,0,0,0,0,0,...,3,3,3,5,5,5,5,5,5,Washakie


In [18]:
### Add State names from stateFIPS
deaths = deaths.merge(stateFIPS, how = "left")
deaths

Unnamed: 0,countyFIPS,stateFIPS,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,1/28/20,1/29/20,...,6/17/20,6/18/20,6/19/20,6/20/20,6/21/20,6/22/20,6/23/20,6/24/20,County Name,State
0,0,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,Statewide Unallocated,Alabama
1,1001,1,0,0,0,0,0,0,0,0,...,7,8,8,9,9,9,9,11,Autauga,Alabama
2,1003,1,0,0,0,0,0,0,0,0,...,9,9,9,9,9,9,9,9,Baldwin,Alabama
3,1005,1,0,0,0,0,0,0,0,0,...,1,1,1,1,1,1,1,1,Barbour,Alabama
4,1007,1,0,0,0,0,0,0,0,0,...,1,1,1,1,1,1,1,1,Bibb,Alabama
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3187,56037,56,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,Sweetwater,Wyoming
3188,56039,56,0,0,0,0,0,0,0,0,...,1,1,1,1,1,1,1,1,Teton,Wyoming
3189,56041,56,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,Uinta,Wyoming
3190,56043,56,0,0,0,0,0,0,0,0,...,3,3,5,5,5,5,5,5,Washakie,Wyoming


##### Fixing Population Labels

In [19]:
### Drop population county labels (this dataset has no state column)
population = population.drop(columns = "County Name")
population

Unnamed: 0,Population,countyFIPS
0,0,0
1,55869,1001
2,223234,1003
3,24686,1005
4,22394,1007
...,...,...
3138,42343,56037
3139,23464,56039
3140,20226,56041
3141,7805,56043


In [20]:
### Add County Name from countyFIPS
population = population.merge(countyFIPS, how = "left")
population

Unnamed: 0,Population,countyFIPS,County Name
0,0,0,Statewide Unallocated
1,55869,1001,Autauga
2,223234,1003,Baldwin
3,24686,1005,Barbour
4,22394,1007,Bibb
...,...,...,...
3138,42343,56037,Sweetwater
3139,23464,56039,Teton
3140,20226,56041,Uinta
3141,7805,56043,Washakie


It turns out that the “Statewide Unallocated” rows are valid counts; they just haven’t been assigned to a county due to lack of information.

Leave these observations out of the county data frame, but include them when creating the state data frame.

#### County Level Data

The cases and deaths data are in wide format, with one column per date, which is awkward to work with.

Unpivot them into long format (one row per county per date) using pd.melt.
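To see what `pd.melt` does, here is a toy example with two counties and two date columns (values are made up). The long table has one row per county per date column, which is why the melted frames below have hundreds of thousands of rows. Melt stacks one date column at a time, so every county appears for 1/22/20 before any row for 1/23/20.

```python
import pandas as pd

# Toy wide table: one row per county, one column per date
wide = pd.DataFrame({
    "County Name": ["Autauga", "Baldwin"],
    "State": ["Alabama", "Alabama"],
    "1/22/20": [0, 0],
    "1/23/20": [1, 2],
})

# Long table: one row per (county, date) pair
long = pd.melt(wide, id_vars=["County Name", "State"],
               value_vars=["1/22/20", "1/23/20"],
               var_name="Date", value_name="Cases")
print(long)
```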

In [21]:
### Unpivot cases data
cases = pd.melt(cases, id_vars = ['County Name', "State", "countyFIPS", "stateFIPS"],
                 value_vars = cases.columns[2:-2],
                 var_name = "Date", value_name = "Cases")

cases

Unnamed: 0,County Name,State,countyFIPS,stateFIPS,Date,Cases
0,Statewide Unallocated,Alabama,0,1,1/22/20,0
1,Autauga,Alabama,1001,1,1/22/20,0
2,Baldwin,Alabama,1003,1,1/22/20,0
3,Barbour,Alabama,1005,1,1/22/20,0
4,Bibb,Alabama,1007,1,1/22/20,0
...,...,...,...,...,...,...
494755,Sweetwater,Wyoming,56037,56,6/24/20,65
494756,Teton,Wyoming,56039,56,6/24/20,113
494757,Uinta,Wyoming,56041,56,6/24/20,157
494758,Washakie,Wyoming,56043,56,6/24/20,39


In [22]:
### Unpivot death data
deaths = pd.melt(deaths, id_vars = ['County Name', "State", "countyFIPS", "stateFIPS"],
                 value_vars = list(deaths.columns[2:-2]),
                 var_name = "Date", value_name = "Deaths")

deaths

Unnamed: 0,County Name,State,countyFIPS,stateFIPS,Date,Deaths
0,Statewide Unallocated,Alabama,0,1,1/22/20,0
1,Autauga,Alabama,1001,1,1/22/20,0
2,Baldwin,Alabama,1003,1,1/22/20,0
3,Barbour,Alabama,1005,1,1/22/20,0
4,Bibb,Alabama,1007,1,1/22/20,0
...,...,...,...,...,...,...
494755,Sweetwater,Wyoming,56037,56,6/24/20,0
494756,Teton,Wyoming,56039,56,6/24/20,1
494757,Uinta,Wyoming,56041,56,6/24/20,0
494758,Washakie,Wyoming,56043,56,6/24/20,5


Combine cases and deaths into one data frame.

In [23]:
### Merge dataframes
cases_deaths = cases.merge(deaths, on = ["State","County Name", "Date", "countyFIPS", "stateFIPS"])
cases_deaths

Unnamed: 0,County Name,State,countyFIPS,stateFIPS,Date,Cases,Deaths
0,Statewide Unallocated,Alabama,0,1,1/22/20,0,0
1,Autauga,Alabama,1001,1,1/22/20,0,0
2,Baldwin,Alabama,1003,1,1/22/20,0,0
3,Barbour,Alabama,1005,1,1/22/20,0,0
4,Bibb,Alabama,1007,1,1/22/20,0,0
...,...,...,...,...,...,...,...
494755,Sweetwater,Wyoming,56037,56,6/24/20,65,0
494756,Teton,Wyoming,56039,56,6/24/20,113,1
494757,Uinta,Wyoming,56041,56,6/24/20,157,0
494758,Washakie,Wyoming,56043,56,6/24/20,39,5


Add population to cases_deaths.

In [24]:
### Merge dataframes
cases_deaths = cases_deaths.merge(population, on = ["countyFIPS","County Name"], how = "left")

### Convert the "M/D/YY" date strings to datetimes, then sort
### (newer pandas versions reject astype("datetime64") without a unit)
cases_deaths["Date"] = pd.to_datetime(cases_deaths["Date"], format = "%m/%d/%y")
cases_deaths = cases_deaths.sort_values(["State","County Name","Date"], ascending = [True, True, True])


### Rename cases and deaths
cases_deaths = cases_deaths.rename(columns = {"Cases" : "Total Cases",
                                              "Deaths" : "Total Deaths"})

### Reset the index after sorting
cases_deaths = cases_deaths.reset_index(drop = True)
cases_deaths

Unnamed: 0,County Name,State,countyFIPS,stateFIPS,Date,Total Cases,Total Deaths,Population
0,Autauga,Alabama,1001,1,2020-01-22,0,0,55869
1,Autauga,Alabama,1001,1,2020-01-23,0,0,55869
2,Autauga,Alabama,1001,1,2020-01-24,0,0,55869
3,Autauga,Alabama,1001,1,2020-01-25,0,0,55869
4,Autauga,Alabama,1001,1,2020-01-26,0,0,55869
...,...,...,...,...,...,...,...,...
494755,Weston,Wyoming,56045,56,2020-06-20,1,0,6927
494756,Weston,Wyoming,56045,56,2020-06-21,1,0,6927
494757,Weston,Wyoming,56045,56,2020-06-22,1,0,6927
494758,Weston,Wyoming,56045,56,2020-06-23,1,0,6927


Calculate the number of new cases and deaths each day as the absolute day-over-day change in each county's cumulative totals.
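A vectorized alternative to the per-county loops below is `groupby(...).diff()`, which takes the day-over-day difference within each county. This is a sketch, not the notebook's method: it assumes rows can be sorted by date within each county, and it applies `.abs()` only to mirror `abs(np.diff(...))` in the loops (dropping `.abs()` would keep downward revisions as negative values). `add_daily_change` is a hypothetical helper; the toy totals are taken from the Autauga and Teton rows shown above.

```python
import pandas as pd

def add_daily_change(df: pd.DataFrame, total_col: str, new_col: str) -> pd.DataFrame:
    """Add new_col = absolute day-over-day change in total_col within each county."""
    out = df.sort_values(["State", "County Name", "Date"]).copy()
    change = out.groupby(["State", "County Name"])[total_col].diff()
    # The first day of each county has no previous value, so its change is set to 0
    out[new_col] = change.fillna(0).abs().astype("int64")
    return out

toy = pd.DataFrame({
    "State": ["Alabama"] * 3 + ["Wyoming"] * 3,
    "County Name": ["Autauga"] * 3 + ["Teton"] * 3,
    "Date": pd.to_datetime(["2020-06-22", "2020-06-23", "2020-06-24"] * 2),
    "Total Cases": [442, 453, 469, 111, 113, 113],
})
print(add_daily_change(toy, "Total Cases", "New Cases"))
```

Grouping on both State and County Name keeps counties that share a name in different states apart.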

In [25]:
changeInCases = []

In [26]:
### For each state.
for state in cases_deaths["State"].unique():
    ### For each county in the state
    for county in cases_deaths["County Name"][cases_deaths["State"] == state].unique():
        changeInCases.append(0) ### Add first date diff which is 0.
        ### Add diff in case for each following day
        changeInCases.extend(abs(np.diff(cases_deaths["Total Cases"][(cases_deaths["County Name"] == county) &
                                                                     (cases_deaths["State"] == state)])))
        

In [27]:
### Add to data
cases_deaths["New Cases"] = changeInCases
cases_deaths

Unnamed: 0,County Name,State,countyFIPS,stateFIPS,Date,Total Cases,Total Deaths,Population,New Cases
0,Autauga,Alabama,1001,1,2020-01-22,0,0,55869,0
1,Autauga,Alabama,1001,1,2020-01-23,0,0,55869,0
2,Autauga,Alabama,1001,1,2020-01-24,0,0,55869,0
3,Autauga,Alabama,1001,1,2020-01-25,0,0,55869,0
4,Autauga,Alabama,1001,1,2020-01-26,0,0,55869,0
...,...,...,...,...,...,...,...,...,...
494755,Weston,Wyoming,56045,56,2020-06-20,1,0,6927,0
494756,Weston,Wyoming,56045,56,2020-06-21,1,0,6927,0
494757,Weston,Wyoming,56045,56,2020-06-22,1,0,6927,0
494758,Weston,Wyoming,56045,56,2020-06-23,1,0,6927,0


In [28]:
changeInDeaths = []

In [29]:
### For each state.
for state in cases_deaths["State"].unique():
    ### For each county in the state
    for county in cases_deaths["County Name"][cases_deaths["State"] == state].unique():
        changeInDeaths.append(0) ### Add first date diff which is 0.
        ### Add diff in deaths for each following day
        changeInDeaths.extend(abs(np.diff(cases_deaths["Total Deaths"][(cases_deaths["County Name"] == county) &
                                                                       (cases_deaths["State"] == state)])))
        

In [30]:
### Add to data
cases_deaths["New Deaths"] = changeInDeaths
cases_deaths

Unnamed: 0,County Name,State,countyFIPS,stateFIPS,Date,Total Cases,Total Deaths,Population,New Cases,New Deaths
0,Autauga,Alabama,1001,1,2020-01-22,0,0,55869,0,0
1,Autauga,Alabama,1001,1,2020-01-23,0,0,55869,0,0
2,Autauga,Alabama,1001,1,2020-01-24,0,0,55869,0,0
3,Autauga,Alabama,1001,1,2020-01-25,0,0,55869,0,0
4,Autauga,Alabama,1001,1,2020-01-26,0,0,55869,0,0
...,...,...,...,...,...,...,...,...,...,...
494755,Weston,Wyoming,56045,56,2020-06-20,1,0,6927,0,0
494756,Weston,Wyoming,56045,56,2020-06-21,1,0,6927,0,0
494757,Weston,Wyoming,56045,56,2020-06-22,1,0,6927,0,0
494758,Weston,Wyoming,56045,56,2020-06-23,1,0,6927,0,0


Change the data types: County Name and State become categories, and the FIPS codes become strings.
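Converting repeated strings to `category` stores each distinct label once plus a small integer code per row, which is why it saves memory on a long table like this one. A small sketch on synthetic data (three names repeated 10,000 times, chosen arbitrarily) compares the two dtypes. Note that the `+` in the `info()` output means object columns are only counted shallowly there; `memory_usage(deep=True)` includes the string contents.

```python
import pandas as pd

# Synthetic column: a few county names repeated many times, as in the long-format table
names = pd.Series(["Autauga", "Baldwin", "Barbour"] * 10_000)

as_object = names.memory_usage(deep=True)
as_category = names.astype("category").memory_usage(deep=True)

print(f"object:   {as_object:,} bytes")
print(f"category: {as_category:,} bytes")
```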

In [31]:
cases_deaths.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 494760 entries, 0 to 494759
Data columns (total 10 columns):
 #   Column        Non-Null Count   Dtype         
---  ------        --------------   -----         
 0   County Name   494760 non-null  object        
 1   State         494760 non-null  object        
 2   countyFIPS    494760 non-null  int64         
 3   stateFIPS     494760 non-null  int64         
 4   Date          494760 non-null  datetime64[ns]
 5   Total Cases   494760 non-null  int64         
 6   Total Deaths  494760 non-null  int64         
 7   Population    494760 non-null  int64         
 8   New Cases     494760 non-null  int64         
 9   New Deaths    494760 non-null  int64         
dtypes: datetime64[ns](1), int64(7), object(2)
memory usage: 37.7+ MB


In [32]:
cases_deaths = cases_deaths.astype({"County Name" : "category",
                                    "State" : "category",
                                    "countyFIPS" : "str",
                                    "stateFIPS" : "str"})
cases_deaths.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 494760 entries, 0 to 494759
Data columns (total 10 columns):
 #   Column        Non-Null Count   Dtype         
---  ------        --------------   -----         
 0   County Name   494760 non-null  category      
 1   State         494760 non-null  category      
 2   countyFIPS    494760 non-null  object        
 3   stateFIPS     494760 non-null  object        
 4   Date          494760 non-null  datetime64[ns]
 5   Total Cases   494760 non-null  int64         
 6   Total Deaths  494760 non-null  int64         
 7   Population    494760 non-null  int64         
 8   New Cases     494760 non-null  int64         
 9   New Deaths    494760 non-null  int64         
dtypes: category(2), datetime64[ns](1), int64(5), object(2)
memory usage: 31.7+ MB


Now make a new data frame without "Statewide Unallocated."

In [33]:
cases_deaths2 = cases_deaths[cases_deaths["County Name"] != "Statewide Unallocated"]
cases_deaths2 = cases_deaths2.reset_index()
cases_deaths2 = cases_deaths2.drop(columns = "index")
cases_deaths2

Unnamed: 0,County Name,State,countyFIPS,stateFIPS,Date,Total Cases,Total Deaths,Population,New Cases,New Deaths
0,Autauga,Alabama,1001,1,2020-01-22,0,0,55869,0,0
1,Autauga,Alabama,1001,1,2020-01-23,0,0,55869,0,0
2,Autauga,Alabama,1001,1,2020-01-24,0,0,55869,0,0
3,Autauga,Alabama,1001,1,2020-01-25,0,0,55869,0,0
4,Autauga,Alabama,1001,1,2020-01-26,0,0,55869,0,0
...,...,...,...,...,...,...,...,...,...,...
487005,Weston,Wyoming,56045,56,2020-06-20,1,0,6927,0,0
487006,Weston,Wyoming,56045,56,2020-06-21,1,0,6927,0,0
487007,Weston,Wyoming,56045,56,2020-06-22,1,0,6927,0,0
487008,Weston,Wyoming,56045,56,2020-06-23,1,0,6927,0,0


The first seven states (Alabama, Alaska, Arizona, Arkansas, California, Colorado, and Connecticut, with state FIPS codes 1 through 9) have four-digit countyFIPS codes that need a leading 0 to become standard five-digit codes.

In [34]:
### Pad countyFIPS to five digits with a leading zero.
### Only the first seven states have four-digit codes; all other codes already have five digits and are unchanged.
### A vectorized assignment avoids the chained-assignment (SettingWithCopyWarning) problem of updating one cell at a time.
cases_deaths2["countyFIPS"] = cases_deaths2["countyFIPS"].str.zfill(5)

cases_deaths2


Unnamed: 0,County Name,State,countyFIPS,stateFIPS,Date,Total Cases,Total Deaths,Population,New Cases,New Deaths
0,Autauga,Alabama,01001,1,2020-01-22,0,0,55869,0,0
1,Autauga,Alabama,01001,1,2020-01-23,0,0,55869,0,0
2,Autauga,Alabama,01001,1,2020-01-24,0,0,55869,0,0
3,Autauga,Alabama,01001,1,2020-01-25,0,0,55869,0,0
4,Autauga,Alabama,01001,1,2020-01-26,0,0,55869,0,0
...,...,...,...,...,...,...,...,...,...,...
487005,Weston,Wyoming,56045,56,2020-06-20,1,0,6927,0,0
487006,Weston,Wyoming,56045,56,2020-06-21,1,0,6927,0,0
487007,Weston,Wyoming,56045,56,2020-06-22,1,0,6927,0,0
487008,Weston,Wyoming,56045,56,2020-06-23,1,0,6927,0,0


In [35]:
cases_deaths2.loc[48031,:]

County Name               Hartford 
State                   Connecticut
countyFIPS                    09003
stateFIPS                         9
Date            2020-06-06 00:00:00
Total Cases                   10747
Total Deaths                   1279
Population                   891720
New Cases                        70
New Deaths                        2
Name: 48031, dtype: object

### State Level Data

Now create a data frame that summarizes the data for each state.
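The per-state loop in the next cell can also be written as a single `groupby` over state and date. The sketch below is an alternative, not the notebook's code: it uses toy Wyoming rows (values from the tables above, plus a `Statewide Unallocated` row, which is kept in the state totals as described earlier), string aggregation names, and a hypothetical helper `summarize_states`. Because State is a categorical column in the notebook, `observed=True` keeps groupby from emitting rows for category combinations that never occur.

```python
import pandas as pd

def summarize_states(df: pd.DataFrame) -> pd.DataFrame:
    """Sum county-level rows into one row per state per date."""
    return (df.groupby(["State", "stateFIPS", "Date"], as_index=False, observed=True)
              .agg(TotalCases=("Total Cases", "sum"),
                   TotalDeaths=("Total Deaths", "sum"),
                   Population=("Population", "sum")))

toy = pd.DataFrame({
    "State": ["Wyoming"] * 3,
    "stateFIPS": ["56"] * 3,
    "County Name": ["Statewide Unallocated", "Teton", "Uinta"],
    "Date": pd.to_datetime(["2020-06-24"] * 3),
    "Total Cases": [0, 113, 157],
    "Total Deaths": [0, 1, 0],
    "Population": [0, 23464, 20226],
})
print(summarize_states(toy))
```

NewCases and NewDeaths would be added the same way, as further `name=(column, "sum")` pairs.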

In [36]:
### First for Alabama
### Aggregate data ("sum" strings avoid the FutureWarning newer pandas raises for the builtin sum)
StateData = cases_deaths[cases_deaths['State'] == "Alabama"].groupby("Date").agg(
        TotalCases = pd.NamedAgg(column = "Total Cases", aggfunc = "sum"),
        TotalDeaths = pd.NamedAgg(column = "Total Deaths", aggfunc = "sum"),
        Population = pd.NamedAgg(column = "Population", aggfunc = "sum"),
        NewCases = pd.NamedAgg(column = "New Cases", aggfunc = "sum"),
        NewDeaths = pd.NamedAgg(column = "New Deaths", aggfunc = "sum"))

### Make a vector of the state and its FIPS
state = np.repeat("Alabama", len(cases_deaths["Date"].unique()))
statefips = np.repeat('1', len(cases_deaths["Date"].unique()))

### Grab dates
date = cases_deaths["Date"].unique()

### Insert into State Data
StateData.insert(0, "stateFIPS", statefips)
StateData.insert(0, "State", state)
StateData.insert(0, "Date", date)

### Now the rest (state names and FIPS codes come from the same sorted frame, so they line up)
for state, fipsNum in zip(cases_deaths["State"].unique()[1:], cases_deaths["stateFIPS"].unique()[1:]) :
    ### Aggregate data
    myStateData = cases_deaths[cases_deaths['State'] == state].groupby("Date").agg(
        TotalCases = pd.NamedAgg(column = "Total Cases", aggfunc = "sum"),
        TotalDeaths = pd.NamedAgg(column = "Total Deaths", aggfunc = "sum"),
        Population = pd.NamedAgg(column = "Population", aggfunc = "sum"),
        NewCases = pd.NamedAgg(column = "New Cases", aggfunc = "sum"),
        NewDeaths = pd.NamedAgg(column = "New Deaths", aggfunc = "sum"))
    
    ### Make a vector of the state/fips and grab dates
    mystate = np.repeat(state, len(cases_deaths["Date"].unique()))
    mystatefips = np.repeat(fipsNum, len(cases_deaths["Date"].unique()))
    mydate = cases_deaths["Date"].unique()
    
    ### Insert data
    myStateData.insert(0, "stateFIPS", mystatefips)
    myStateData.insert(0, "State", mystate)
    myStateData.insert(0, "Date", mydate)
    
    ### Stack the per-state frames
    StateData = pd.concat([StateData, myStateData])

### Reset indices
StateData = StateData.set_index(np.arange(0,len(StateData)))

In [37]:
StateData

Unnamed: 0,Date,State,stateFIPS,TotalCases,TotalDeaths,Population,NewCases,NewDeaths
0,2020-01-22,Alabama,1,0,0,4903185,0,0
1,2020-01-23,Alabama,1,0,0,4903185,0,0
2,2020-01-24,Alabama,1,0,0,4903185,0,0
3,2020-01-25,Alabama,1,0,0,4903185,0,0
4,2020-01-26,Alabama,1,0,0,4903185,0,0
...,...,...,...,...,...,...,...,...
7900,2020-06-20,Wyoming,56,1179,20,578759,6,0
7901,2020-06-21,Wyoming,56,1196,20,578759,17,0
7902,2020-06-22,Wyoming,56,1230,20,578759,34,0
7903,2020-06-23,Wyoming,56,1254,20,578759,24,0


### USA Level Data

Now create a data set for the USA.

In [38]:
### First for date
### Aggregate data
USAData = StateData[StateData['Date'] == StateData["Date"].unique()[0]].groupby("Date").agg(
        TotalCases = pd.NamedAgg(column = "TotalCases", aggfunc = sum),
        TotalDeaths = pd.NamedAgg(column = "TotalDeaths", aggfunc = sum),
        Population = pd.NamedAgg(column = "Population", aggfunc = sum),
        NewCases = pd.NamedAgg(column = "NewCases", aggfunc = sum),
        NewDeaths = pd.NamedAgg(column = "NewDeaths", aggfunc = sum))

### Insert into usaData
USAData.insert(0, "Date", StateData["Date"].unique()[0])
USAData.insert(0, "Country", "United States")


### Aggregate each remaining date
for day in StateData["Date"].unique()[1:]:
    ### Aggregate data
    myUSAData = StateData[StateData['Date'] == day].groupby("Date").agg(
        TotalCases = pd.NamedAgg(column = "TotalCases", aggfunc = sum),
        TotalDeaths = pd.NamedAgg(column = "TotalDeaths", aggfunc = sum),
        Population = pd.NamedAgg(column = "Population", aggfunc = sum),
        NewCases = pd.NamedAgg(column = "NewCases", aggfunc = sum),
        NewDeaths = pd.NamedAgg(column = "NewDeaths", aggfunc = sum))
        
    ### Insert date into data
    myUSAData.insert(0, "Date", day)
    myUSAData.insert(0, "Country", "United States")
    
    ### Stack the daily USA rows
    USAData = pd.concat([USAData, myUSAData])

### Reset indices
USAData = USAData.set_index(np.arange(0,len(USAData)))

USAData

|  | Country | Date | TotalCases | TotalDeaths | Population | NewCases | NewDeaths |
|---|---|---|---|---|---|---|---|
| 0 | United States | 2020-01-22 | 1 | 0 | 328239523 | 0 | 0 |
| 1 | United States | 2020-01-23 | 1 | 0 | 328239523 | 0 | 0 |
| 2 | United States | 2020-01-24 | 2 | 0 | 328239523 | 1 | 0 |
| 3 | United States | 2020-01-25 | 2 | 0 | 328239523 | 0 | 0 |
| 4 | United States | 2020-01-26 | 5 | 0 | 328239523 | 3 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 150 | United States | 2020-06-20 | 2239815 | 118844 | 328239523 | 50206 | 2321 |
| 151 | United States | 2020-06-21 | 2266750 | 119114 | 328239523 | 45913 | 2023 |
| 152 | United States | 2020-06-22 | 2296198 | 119471 | 328239523 | 48800 | 2120 |
| 153 | United States | 2020-06-23 | 2331209 | 120304 | 328239523 | 54108 | 2598 |


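The final Total Cases and Total Deaths numbers are only slightly off from the national totals, give or take 50. This is possibly because some cases (for example, the 45 Diamond Princess repatriates) are counted in the national figures but not assigned to any state.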
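Likewise, the first-date block and the loop over the remaining dates can probably be collapsed into a single groupby on Date. This is a minimal sketch on a small hypothetical state-level frame that assumes the same column names as StateData; the numbers are made up.

```python
import pandas as pd

# Hypothetical state-level rows standing in for StateData (toy numbers)
states = pd.DataFrame({
    "Date": pd.to_datetime(["2020-06-22", "2020-06-22",
                            "2020-06-23", "2020-06-23"]),
    "State": ["Alabama", "Wyoming"] * 2,
    "TotalCases": [15, 7, 18, 9],
    "TotalDeaths": [1, 0, 2, 0],
    "Population": [150, 70, 150, 70],
    "NewCases": [3, 1, 3, 2],
    "NewDeaths": [0, 0, 1, 0],
})

value_cols = ["TotalCases", "TotalDeaths", "Population", "NewCases", "NewDeaths"]

# Sum every value column per date, then put the Country label in front
usa_data = states.groupby("Date", as_index=False)[value_cols].sum()
usa_data.insert(0, "Country", "United States")
print(usa_data)
```

The resulting columns (Country, Date, TotalCases, ...) line up with USAData, so the later proportion and renaming cells should apply unchanged.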

In [39]:
USAData.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 155 entries, 0 to 154
Data columns (total 7 columns):
 #   Column       Non-Null Count  Dtype         
---  ------       --------------  -----         
 0   Country      155 non-null    object        
 1   Date         155 non-null    datetime64[ns]
 2   TotalCases   155 non-null    int64         
 3   TotalDeaths  155 non-null    int64         
 4   Population   155 non-null    int64         
 5   NewCases     155 non-null    int64         
 6   NewDeaths    155 non-null    int64         
dtypes: datetime64[ns](1), int64(5), object(1)
memory usage: 9.7+ KB
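A quick consistency check can confirm that the state and USA frames form a complete panel: every date should appear once in the USA frame and once per state in the state frame. With the real data that means 155 dates × 51 state-level entries = 7,905 rows, which matches the StateData row count. The helper name check_panel is hypothetical, and the sketch runs on toy frames.

```python
import pandas as pd

def check_panel(state_df: pd.DataFrame, usa_df: pd.DataFrame) -> None:
    """Raise AssertionError if the frames are not a complete date x state panel."""
    n_dates = usa_df["Date"].nunique()
    n_states = state_df["State"].nunique()
    # Each date appears exactly once in the national frame
    assert len(usa_df) == n_dates, "duplicate dates in USA data"
    # Every (date, state) pair is present in the state frame
    assert len(state_df) == n_dates * n_states, "state panel is incomplete"
    # ...and none is repeated
    assert not state_df.duplicated(["Date", "State"]).any(), "duplicate state rows"

# Toy frames (hypothetical) with two dates and two states
dates = pd.to_datetime(["2020-06-22", "2020-06-23"])
toy_states = pd.DataFrame({"Date": list(dates) * 2,
                           "State": ["Alabama", "Alabama", "Wyoming", "Wyoming"]})
toy_usa = pd.DataFrame({"Date": dates})
check_panel(toy_states, toy_usa)  # passes silently
```

In the notebook this would be called as check_panel(StateData, USAData).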


### Proportions

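Add columns giving the percentage of the population with cases ("%Cases") and with deaths ("%Deaths"), set to 0 wherever the population is 0. Start with the county data.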

In [40]:
### Percent of population that have cases.
cases_deaths2["%Cases"] = np.where(cases_deaths2["Population"] != 0,
                                  (cases_deaths2["Total Cases"] / cases_deaths2["Population"]) * 100,
                                  0)

### Percent of population that have died.
cases_deaths2["%Deaths"] = np.where(cases_deaths2["Population"] != 0,
                                   (cases_deaths2["Total Deaths"] / cases_deaths2["Population"]) * 100,
                                   0)

cases_deaths2

|  | County Name | State | countyFIPS | stateFIPS | Date | Total Cases | Total Deaths | Population | New Cases | New Deaths | %Cases | %Deaths |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Autauga | Alabama | 01001 | 1 | 2020-01-22 | 0 | 0 | 55869 | 0 | 0 | 0.000000 | 0.0 |
| 1 | Autauga | Alabama | 01001 | 1 | 2020-01-23 | 0 | 0 | 55869 | 0 | 0 | 0.000000 | 0.0 |
| 2 | Autauga | Alabama | 01001 | 1 | 2020-01-24 | 0 | 0 | 55869 | 0 | 0 | 0.000000 | 0.0 |
| 3 | Autauga | Alabama | 01001 | 1 | 2020-01-25 | 0 | 0 | 55869 | 0 | 0 | 0.000000 | 0.0 |
| 4 | Autauga | Alabama | 01001 | 1 | 2020-01-26 | 0 | 0 | 55869 | 0 | 0 | 0.000000 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 487005 | Weston | Wyoming | 56045 | 56 | 2020-06-20 | 1 | 0 | 6927 | 0 | 0 | 0.014436 | 0.0 |
| 487006 | Weston | Wyoming | 56045 | 56 | 2020-06-21 | 1 | 0 | 6927 | 0 | 0 | 0.014436 | 0.0 |
| 487007 | Weston | Wyoming | 56045 | 56 | 2020-06-22 | 1 | 0 | 6927 | 0 | 0 | 0.014436 | 0.0 |
| 487008 | Weston | Wyoming | 56045 | 56 | 2020-06-23 | 1 | 0 | 6927 | 0 | 0 | 0.014436 | 0.0 |


State data.

In [41]:
### Percent of population that have cases.
StateData["%Cases"] = np.where(StateData["Population"] != 0,
                                  (StateData["TotalCases"] / StateData["Population"]) * 100,
                                  0)

### Percent of population that have died.
StateData["%Deaths"] = np.where(StateData["Population"] != 0,
                                   (StateData["TotalDeaths"] / StateData["Population"]) * 100,
                                   0)

StateData

|  | Date | State | stateFIPS | TotalCases | TotalDeaths | Population | NewCases | NewDeaths | %Cases | %Deaths |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-01-22 | Alabama | 1 | 0 | 0 | 4903185 | 0 | 0 | 0.000000 | 0.000000 |
| 1 | 2020-01-23 | Alabama | 1 | 0 | 0 | 4903185 | 0 | 0 | 0.000000 | 0.000000 |
| 2 | 2020-01-24 | Alabama | 1 | 0 | 0 | 4903185 | 0 | 0 | 0.000000 | 0.000000 |
| 3 | 2020-01-25 | Alabama | 1 | 0 | 0 | 4903185 | 0 | 0 | 0.000000 | 0.000000 |
| 4 | 2020-01-26 | Alabama | 1 | 0 | 0 | 4903185 | 0 | 0 | 0.000000 | 0.000000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 7900 | 2020-06-20 | Wyoming | 56 | 1179 | 20 | 578759 | 6 | 0 | 0.203712 | 0.003456 |
| 7901 | 2020-06-21 | Wyoming | 56 | 1196 | 20 | 578759 | 17 | 0 | 0.206649 | 0.003456 |
| 7902 | 2020-06-22 | Wyoming | 56 | 1230 | 20 | 578759 | 34 | 0 | 0.212524 | 0.003456 |
| 7903 | 2020-06-23 | Wyoming | 56 | 1254 | 20 | 578759 | 24 | 0 | 0.216670 | 0.003456 |
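As a check on the proportion formula (total divided by population, times 100, and 0 when the population is 0), the last Wyoming row shown (2020-06-23) works out as follows, agreeing with the 0.216670 and 0.003456 in the output:

```latex
\%\text{Cases} = \frac{\text{Total Cases}}{\text{Population}} \times 100
               = \frac{1254}{578759} \times 100 \approx 0.216670,
\qquad
\%\text{Deaths} = \frac{\text{Total Deaths}}{\text{Population}} \times 100
                = \frac{20}{578759} \times 100 \approx 0.003456
```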


Country data.

In [42]:
### Percent of population that have cases.
USAData["%Cases"] = np.where(USAData["Population"] != 0,
                                  (USAData["TotalCases"] / USAData["Population"]) * 100,
                                  0)

### Percent of population that have died.
USAData["%Deaths"] = np.where(USAData["Population"] != 0,
                                   (USAData["TotalDeaths"] / USAData["Population"]) * 100,
                                   0)

USAData

|  | Country | Date | TotalCases | TotalDeaths | Population | NewCases | NewDeaths | %Cases | %Deaths |
|---|---|---|---|---|---|---|---|---|---|
| 0 | United States | 2020-01-22 | 1 | 0 | 328239523 | 0 | 0 | 3.046556e-07 | 0.000000 |
| 1 | United States | 2020-01-23 | 1 | 0 | 328239523 | 0 | 0 | 3.046556e-07 | 0.000000 |
| 2 | United States | 2020-01-24 | 2 | 0 | 328239523 | 1 | 0 | 6.093111e-07 | 0.000000 |
| 3 | United States | 2020-01-25 | 2 | 0 | 328239523 | 0 | 0 | 6.093111e-07 | 0.000000 |
| 4 | United States | 2020-01-26 | 5 | 0 | 328239523 | 3 | 0 | 1.523278e-06 | 0.000000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 150 | United States | 2020-06-20 | 2239815 | 118844 | 328239523 | 50206 | 2321 | 6.823721e-01 | 0.036206 |
| 151 | United States | 2020-06-21 | 2266750 | 119114 | 328239523 | 45913 | 2023 | 6.905780e-01 | 0.036289 |
| 152 | United States | 2020-06-22 | 2296198 | 119471 | 328239523 | 48800 | 2120 | 6.995495e-01 | 0.036398 |
| 153 | United States | 2020-06-23 | 2331209 | 120304 | 328239523 | 54108 | 2598 | 7.102158e-01 | 0.036651 |
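The same pair of np.where expressions is repeated for the county, state and USA frames. A small helper could remove the repetition; add_percentages below is a hypothetical sketch, not part of the original notebook. It uses np.divide with a where mask, so rows with zero population are never divided at all. np.where, by contrast, evaluates the division for every row before selecting, which can raise a divide-by-zero RuntimeWarning even though the inf/NaN values are then discarded.

```python
import numpy as np
import pandas as pd

def add_percentages(df: pd.DataFrame, cases_col: str, deaths_col: str,
                    pop_col: str = "Population") -> pd.DataFrame:
    """Return a copy of df with %Cases and %Deaths columns (0 where population is 0)."""
    out = df.copy()
    pop = out[pop_col].to_numpy(dtype=float)
    for src, dest in [(cases_col, "%Cases"), (deaths_col, "%Deaths")]:
        num = out[src].to_numpy(dtype=float)
        pct = np.zeros_like(num)
        # Divide only where population is non-zero; other rows keep the 0 default
        np.divide(num * 100, pop, out=pct, where=pop != 0)
        out[dest] = pct
    return out

# Hypothetical usage with the notebook's two naming conventions:
#   add_percentages(cases_deaths2, "Total Cases", "Total Deaths")
#   add_percentages(StateData, "TotalCases", "TotalDeaths")
toy = pd.DataFrame({"TotalCases": [1254, 5], "TotalDeaths": [20, 0],
                    "Population": [578759, 0]})
print(add_percentages(toy, "TotalCases", "TotalDeaths"))
```

Returning a copy rather than modifying the frame in place is a design choice here; it keeps the input frame unchanged, at the cost of extra memory for the large county table.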


### Finalize Data

Rename the columns in the State and USA data to match the county data's naming (for example, "TotalCases" becomes "Total Cases").

In [43]:
StateData = StateData.rename(columns = {"TotalCases" : "Total Cases",
                                        "TotalDeaths" : "Total Deaths",
                                        "NewCases" : "New Cases",
                                        "NewDeaths" : "New Deaths"})
StateData

|  | Date | State | stateFIPS | Total Cases | Total Deaths | Population | New Cases | New Deaths | %Cases | %Deaths |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-01-22 | Alabama | 1 | 0 | 0 | 4903185 | 0 | 0 | 0.000000 | 0.000000 |
| 1 | 2020-01-23 | Alabama | 1 | 0 | 0 | 4903185 | 0 | 0 | 0.000000 | 0.000000 |
| 2 | 2020-01-24 | Alabama | 1 | 0 | 0 | 4903185 | 0 | 0 | 0.000000 | 0.000000 |
| 3 | 2020-01-25 | Alabama | 1 | 0 | 0 | 4903185 | 0 | 0 | 0.000000 | 0.000000 |
| 4 | 2020-01-26 | Alabama | 1 | 0 | 0 | 4903185 | 0 | 0 | 0.000000 | 0.000000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 7900 | 2020-06-20 | Wyoming | 56 | 1179 | 20 | 578759 | 6 | 0 | 0.203712 | 0.003456 |
| 7901 | 2020-06-21 | Wyoming | 56 | 1196 | 20 | 578759 | 17 | 0 | 0.206649 | 0.003456 |
| 7902 | 2020-06-22 | Wyoming | 56 | 1230 | 20 | 578759 | 34 | 0 | 0.212524 | 0.003456 |
| 7903 | 2020-06-23 | Wyoming | 56 | 1254 | 20 | 578759 | 24 | 0 | 0.216670 | 0.003456 |


In [44]:
USAData = USAData.rename(columns = {"TotalCases" : "Total Cases",
                                    "TotalDeaths" : "Total Deaths",
                                    "NewCases" : "New Cases",
                                    "NewDeaths" : "New Deaths"})
USAData

|  | Country | Date | Total Cases | Total Deaths | Population | New Cases | New Deaths | %Cases | %Deaths |
|---|---|---|---|---|---|---|---|---|---|
| 0 | United States | 2020-01-22 | 1 | 0 | 328239523 | 0 | 0 | 3.046556e-07 | 0.000000 |
| 1 | United States | 2020-01-23 | 1 | 0 | 328239523 | 0 | 0 | 3.046556e-07 | 0.000000 |
| 2 | United States | 2020-01-24 | 2 | 0 | 328239523 | 1 | 0 | 6.093111e-07 | 0.000000 |
| 3 | United States | 2020-01-25 | 2 | 0 | 328239523 | 0 | 0 | 6.093111e-07 | 0.000000 |
| 4 | United States | 2020-01-26 | 5 | 0 | 328239523 | 3 | 0 | 1.523278e-06 | 0.000000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 150 | United States | 2020-06-20 | 2239815 | 118844 | 328239523 | 50206 | 2321 | 6.823721e-01 | 0.036206 |
| 151 | United States | 2020-06-21 | 2266750 | 119114 | 328239523 | 45913 | 2023 | 6.905780e-01 | 0.036289 |
| 152 | United States | 2020-06-22 | 2296198 | 119471 | 328239523 | 48800 | 2120 | 6.995495e-01 | 0.036398 |
| 153 | United States | 2020-06-23 | 2331209 | 120304 | 328239523 | 54108 | 2598 | 7.102158e-01 | 0.036651 |
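The rename step is needed because keyword arguments to agg cannot contain spaces. The pandas documentation on named aggregation notes that output names which are not valid Python identifiers can be supplied by unpacking a dictionary, which would let the aggregation produce "Total Cases" directly and skip the rename. A minimal sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical state-level rows using the notebook's CamelCase names
states = pd.DataFrame({
    "Date": pd.to_datetime(["2020-06-23", "2020-06-23"]),
    "TotalCases": [18, 9],
    "TotalDeaths": [2, 0],
    "NewCases": [3, 2],
    "NewDeaths": [1, 0],
})

# Output names containing spaces are passed by unpacking a dict of
# (source column, aggregation) pairs, so no rename is needed afterwards
usa = states.groupby("Date", as_index=False).agg(**{
    "Total Cases": ("TotalCases", "sum"),
    "Total Deaths": ("TotalDeaths", "sum"),
    "New Cases": ("NewCases", "sum"),
    "New Deaths": ("NewDeaths", "sum"),
})
print(usa)
```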


Check the data types in the State and USA data, then convert State and Country to categorical columns and stateFIPS to strings.

In [45]:
StateData.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 7905 entries, 0 to 7904
Data columns (total 10 columns):
 #   Column        Non-Null Count  Dtype         
---  ------        --------------  -----         
 0   Date          7905 non-null   datetime64[ns]
 1   State         7905 non-null   object        
 2   stateFIPS     7905 non-null   object        
 3   Total Cases   7905 non-null   int64         
 4   Total Deaths  7905 non-null   int64         
 5   Population    7905 non-null   int64         
 6   New Cases     7905 non-null   int64         
 7   New Deaths    7905 non-null   int64         
 8   %Cases        7905 non-null   float64       
 9   %Deaths       7905 non-null   float64       
dtypes: datetime64[ns](1), float64(2), int64(5), object(2)
memory usage: 679.3+ KB


In [46]:
StateData = StateData.astype({"State" : "category",
                              "stateFIPS" : "str"})
StateData.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 7905 entries, 0 to 7904
Data columns (total 10 columns):
 #   Column        Non-Null Count  Dtype         
---  ------        --------------  -----         
 0   Date          7905 non-null   datetime64[ns]
 1   State         7905 non-null   category      
 2   stateFIPS     7905 non-null   object        
 3   Total Cases   7905 non-null   int64         
 4   Total Deaths  7905 non-null   int64         
 5   Population    7905 non-null   int64         
 6   New Cases     7905 non-null   int64         
 7   New Deaths    7905 non-null   int64         
 8   %Cases        7905 non-null   float64       
 9   %Deaths       7905 non-null   float64       
dtypes: category(1), datetime64[ns](1), float64(2), int64(5), object(1)
memory usage: 628.2+ KB


In [47]:
USAData.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 155 entries, 0 to 154
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype         
---  ------        --------------  -----         
 0   Country       155 non-null    object        
 1   Date          155 non-null    datetime64[ns]
 2   Total Cases   155 non-null    int64         
 3   Total Deaths  155 non-null    int64         
 4   Population    155 non-null    int64         
 5   New Cases     155 non-null    int64         
 6   New Deaths    155 non-null    int64         
 7   %Cases        155 non-null    float64       
 8   %Deaths       155 non-null    float64       
dtypes: datetime64[ns](1), float64(2), int64(5), object(1)
memory usage: 12.1+ KB


In [48]:
USAData = USAData.astype({"Country" : "category"})
USAData.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 155 entries, 0 to 154
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype         
---  ------        --------------  -----         
 0   Country       155 non-null    category      
 1   Date          155 non-null    datetime64[ns]
 2   Total Cases   155 non-null    int64         
 3   Total Deaths  155 non-null    int64         
 4   Population    155 non-null    int64         
 5   New Cases     155 non-null    int64         
 6   New Deaths    155 non-null    int64         
 7   %Cases        155 non-null    float64       
 8   %Deaths       155 non-null    float64       
dtypes: category(1), datetime64[ns](1), float64(2), int64(5)
memory usage: 11.1 KB
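The drop in reported memory comes from the category type storing each repeated label only once, plus a small integer code per row. By default, info() does not look inside object columns (hence the "+" in "679.3+ KB"), so the actual saving is likely larger than the figures above suggest. This toy sketch compares deep memory usage for a hypothetical column of 51 labels repeated over 155 dates:

```python
import pandas as pd

# Hypothetical column: 51 state labels, each repeated for 155 dates
names = [f"State {i}" for i in range(51)] * 155
as_object = pd.Series(names, dtype="object")
as_category = as_object.astype("category")

# deep=True includes the memory of the Python string objects themselves
obj_bytes = as_object.memory_usage(deep=True)
cat_bytes = as_category.memory_usage(deep=True)
print(f"object:   {obj_bytes:,} bytes")
print(f"category: {cat_bytes:,} bytes")
```

For the real frames, StateData.info(memory_usage="deep") would report the deep figure directly.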


### Final Data

Assign the final datasets to new names.

In [49]:
countyData = cases_deaths2
stateData = StateData
usaData = USAData
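These assignments create new names for the same DataFrame objects rather than copies, so an in-place change made through countyData would also show up in cases_deaths2. If independent copies are wanted, DataFrame.copy() can be used instead. A toy illustration with a hypothetical frame:

```python
import pandas as pd

original = pd.DataFrame({"Total Cases": [1, 2]})

alias = original          # same object, like countyData = cases_deaths2
copied = original.copy()  # independent (deep) copy of the data

alias.loc[0, "Total Cases"] = 99
print(original.loc[0, "Total Cases"])  # 99: the change is visible through both names
print(copied.loc[0, "Total Cases"])    # 1: the copy is unaffected
```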