Steps:
1. Finalise data sets (be brutal; identify roots and stems; address missing values, e.g. replace missing values with the column mean)
2. Model linear regression statistics (feature importances; chicken feed/auto)
3. Prediction: random forest
4. Data visualisation (pairplots)

In [1]:
import pandas as pd

### COVID-19 Cases by County (USAFacts/CDC)

For most states, USAFacts directly collects the daily county-level cumulative totals of positive cases and deaths from a table, dashboard, or PDF on the state public health website. This data is compiled either through scraping or manual entry. The underlying data is available for download below the US county map and has helped government agencies like the Centers for Disease Control and Prevention in their nationwide efforts.

REFERENCES:
1. https://usafacts.org/visualizations/coronavirus-covid-19-spread-map/

In [2]:
covid_cases = pd.read_csv("data/covid_confirmed_usafacts_200803.csv")

In [3]:
covid_cases.head()

Unnamed: 0,countyFIPS,County Name,State,stateFIPS,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,...,7/23/20,7/24/20,7/25/20,7/26/20,7/27/20,7/28/20,7/29/20,7/30/20,7/31/20,8/1/20
0,0,Statewide Unallocated,AL,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,1001,Autauga County,AL,1,0,0,0,0,0,0,...,905,921,932,942,965,974,974,1002,1015,1030
2,1003,Baldwin County,AL,1,0,0,0,0,0,0,...,2461,2513,2662,2708,2770,2835,2835,3028,3101,3142
3,1005,Barbour County,AL,1,0,0,0,0,0,0,...,534,539,552,562,569,575,575,585,598,602
4,1007,Bibb County,AL,1,0,0,0,0,0,0,...,289,303,318,324,334,337,338,352,363,368


In [4]:
covid_cases_dropped = covid_cases.drop(columns=['8/1/20'])

In [5]:
covid_cases_dropped_only = covid_cases_dropped.iloc[:,-192:]

In [6]:
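# Caution: the date columns are cumulative, so summing them re-counts earlier cases; the last date column alone is the running total.
covid_cases_total = covid_cases_dropped['Total Cases'] = covid_cases_dropped.iloc[:, -192:].sum(axis=1)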
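
USAFacts describes the date columns as cumulative totals, so the most recent date column already holds the running count, while a row-wise sum adds every day's running total together. The sketch below contrasts the two on a toy frame; the three date columns and their values are copied from the `head()` output above, and the names `toy`, `summed`, and `latest` are illustrative only.

```python
import pandas as pd

# Toy frame: three cumulative date columns taken from the head() output above.
toy = pd.DataFrame(
    {
        "County Name": ["Autauga County", "Bibb County"],
        "7/29/20": [974, 338],
        "7/30/20": [1002, 352],
        "7/31/20": [1015, 363],
    }
)
date_cols = ["7/29/20", "7/30/20", "7/31/20"]

# Summing cumulative columns counts earlier cases repeatedly.
summed = toy[date_cols].sum(axis=1)

# The last date column is the cumulative total as of that day.
latest = toy[date_cols[-1]]

print(summed.tolist())
print(latest.tolist())
```

Whether the sum or the latest value is the intended feature is a modelling decision; the sketch only shows that the two differ.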

In [7]:
covid_cases_filter = covid_cases_dropped.loc[:,["countyFIPS", "County Name", "State", "stateFIPS", "Total Cases"]]
covid_cases_filter["countyFIPS"] = covid_cases_filter["countyFIPS"].astype(str)
print(covid_cases_filter.dtypes)

countyFIPS     object
County Name    object
State          object
stateFIPS       int64
Total Cases     int64
dtype: object


In [8]:
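# Caution: county codes are three digits, so keeping only the last two characters maps e.g. 001 and 101 to the same "01".
covid_cases_filter['countyFIPS_2d'] = covid_cases_filter['countyFIPS'].str[-2:]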
covid_cases_filter = covid_cases_filter.loc[:,["stateFIPS", "countyFIPS_2d", "County Name", "State", "Total Cases"]]
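
A combined county FIPS code has five digits: two for the state and three for the county. Because the codes are read as integers, leading zeros are lost, and slicing a fixed number of characters can then misalign the two parts. The following is a minimal sketch of one way to split the code safely; `split_fips` and the column name `countyFIPS_3d` are hypothetical and not part of the notebook.

```python
import pandas as pd


def split_fips(fips: pd.Series) -> pd.DataFrame:
    """Split a combined county FIPS code into state and county parts.

    Hypothetical helper: pads to five characters first, so integer-typed
    codes that lost a leading zero (1001 for Alabama) are restored.
    """
    padded = fips.astype(str).str.zfill(5)
    return pd.DataFrame(
        {
            "stateFIPS": padded.str[:2].astype(int),
            "countyFIPS_3d": padded.str[2:].astype(int),
        },
        index=fips.index,
    )


codes = pd.Series([1001, 1101, 56043])
parts = split_fips(codes)
print(parts)
```

Keeping all three county digits means that codes such as 001 and 101 within the same state stay distinct when the frame is later used as a merge key.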

In [53]:
covid_cases_clean = covid_cases_filter.copy()

In [54]:
covid_cases_clean = covid_cases_clean.loc[covid_cases_clean['County Name'] != "Statewide Unallocated"]
covid_cases_clean

Unnamed: 0,stateFIPS,countyFIPS_2d,County Name,State,Total Cases
1,1,01,Autauga County,AL,39746
2,1,03,Baldwin County,AL,76970
3,1,05,Barbour County,AL,24625
4,1,07,Bibb County,AL,13636
5,1,09,Blount County,AL,19311
...,...,...,...,...,...
3190,56,37,Sweetwater County,WY,7361
3191,56,39,Teton County,WY,13823
3192,56,41,Uinta County,WY,9737
3193,56,43,Washakie County,WY,3104


### COVID-19 Deaths by County (USAFacts/CDC)

For most states, USAFacts directly collects the daily county-level cumulative totals of positive cases and deaths from a table, dashboard, or PDF on the state public health website. This data is compiled either through scraping or manual entry. The underlying data is available for download below the US county map and has helped government agencies like the Centers for Disease Control and Prevention in their nationwide efforts.

REFERENCES:
1. https://usafacts.org/visualizations/coronavirus-covid-19-spread-map/

In [10]:
covid_deaths = pd.read_csv("data/covid_deaths_usafacts_200803.csv")

In [11]:
covid_deaths_dropped = covid_deaths.drop(columns=['8/1/20'])

In [12]:
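# Caution: the date columns are cumulative, so summing them re-counts earlier deaths; the last date column alone is the running total.
covid_deaths_total = covid_deaths_dropped['Total Deaths'] = covid_deaths_dropped.iloc[:, -192:].sum(axis=1)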

In [14]:
covid_deaths_filter = covid_deaths_dropped.loc[:,["countyFIPS", "County Name", "State", "stateFIPS", "Total Deaths"]]
covid_deaths_filter["countyFIPS"] = covid_deaths_filter["countyFIPS"].astype(str)
print(covid_deaths_filter.dtypes)

countyFIPS      object
County Name     object
State           object
stateFIPS        int64
Total Deaths     int64
dtype: object


In [15]:
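# Caution: county codes are three digits, so keeping only the last two characters maps e.g. 001 and 101 to the same "01".
covid_deaths_filter['countyFIPS_2d'] = covid_deaths_filter['countyFIPS'].str[-2:]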
covid_deaths_filter = covid_deaths_filter.loc[:,["stateFIPS", "countyFIPS_2d", "County Name", "State", "Total Deaths"]]
covid_deaths_filter

Unnamed: 0,stateFIPS,countyFIPS_2d,County Name,State,Total Deaths
0,1,0,Statewide Unallocated,AL,0
1,1,01,Autauga County,AL,909
2,1,03,Baldwin County,AL,958
3,1,05,Barbour County,AL,155
4,1,07,Bibb County,AL,103
...,...,...,...,...,...
3190,56,37,Sweetwater County,WY,34
3191,56,39,Teton County,WY,101
3192,56,41,Uinta County,WY,0
3193,56,43,Washakie County,WY,291


In [52]:
covid_deaths_clean = covid_deaths_filter.copy()
covid_deaths_clean = covid_deaths_clean.loc[covid_deaths_clean['County Name'] != "Statewide Unallocated"]
covid_deaths_clean

Unnamed: 0,stateFIPS,countyFIPS_2d,County Name,State,Total Deaths
1,1,01,Autauga County,AL,909
2,1,03,Baldwin County,AL,958
3,1,05,Barbour County,AL,155
4,1,07,Bibb County,AL,103
5,1,09,Blount County,AL,82
...,...,...,...,...,...
3190,56,37,Sweetwater County,WY,34
3191,56,39,Teton County,WY,101
3192,56,41,Uinta County,WY,0
3193,56,43,Washakie County,WY,291


### Per capita incidence of poverty by U.S. county (U.S. Census)

The poverty universe is made up of persons for whom the Census Bureau can determine poverty status (either "in poverty" or "not in poverty").

REFERENCES:
1. SAIPE Model Input Data: https://www.census.gov/data/datasets/time-series/demo/saipe/model-tables.html

In [17]:
poverty = pd.read_csv("data/allpovu.csv")
poverty_all_ages = poverty.loc[:,["State FIPS code", "County FIPS code", "Name", "State Postal Code", "Poverty Universe, All Ages"]]
poverty_all_ages.rename(columns={'State FIPS code': 'stateFIPS', 'County FIPS code': 'countyFIPS_2d'}, inplace=True)
poverty_all_ages

Unnamed: 0,stateFIPS,countyFIPS_2d,Name,State Postal Code,"Poverty Universe, All Ages"
0,0,0,United States,US,319184033.0
1,1,0,Alabama,AL,4763811.0
2,1,1,Autauga County,AL,55073.0
3,1,3,Baldwin County,AL,215255.0
4,1,5,Barbour County,AL,21979.0
...,...,...,...,...,...
3196,56,37,Sweetwater County,WY,42205.0
3197,56,39,Teton County,WY,22888.0
3198,56,41,Uinta County,WY,20135.0
3199,56,43,Washakie County,WY,7735.0


In [47]:
poverty_all_ages.rename(columns={'Name': 'County Name', 'State Postal Code': 'State'}, inplace=True)
poverty_clean = poverty_all_ages.copy()
poverty_clean["countyFIPS_2d"] = poverty_clean["countyFIPS_2d"].astype(int)
poverty_clean["stateFIPS"] = poverty_clean["stateFIPS"].astype(int)
poverty_clean.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3201 entries, 0 to 3200
Data columns (total 5 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   stateFIPS                   3201 non-null   int64  
 1   countyFIPS_2d               3201 non-null   int64  
 2   County Name                 3201 non-null   object 
 3   State                       3201 non-null   object 
 4   Poverty Universe, All Ages  3193 non-null   float64
dtypes: float64(1), int64(2), object(2)
memory usage: 125.2+ KB
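
The `info()` output above reports 3193 non-null values out of 3201 entries for "Poverty Universe, All Ages", so eight rows have no value. Step 1 of the plan mentions replacing missing values with the mean; the sketch below shows that operation on a made-up three-row frame. The frame `toy` and its values are illustrative, and mean imputation is only one of several possible choices.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame(
    {"Poverty Universe, All Ages": [55073.0, np.nan, 21979.0]}
)
col = "Poverty Universe, All Ages"

# Count missing entries before filling.
n_missing = int(toy[col].isna().sum())

# Replace each missing value with the mean of the observed values.
filled = toy[col].fillna(toy[col].mean())
print(n_missing, filled.tolist())
```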


In [48]:
poverty_clean = poverty_clean.loc[poverty_clean['countyFIPS_2d'] != 0]
poverty_clean

Unnamed: 0,stateFIPS,countyFIPS_2d,County Name,State,"Poverty Universe, All Ages"
2,1,1,Autauga County,AL,55073.0
3,1,3,Baldwin County,AL,215255.0
4,1,5,Barbour County,AL,21979.0
5,1,7,Bibb County,AL,20212.0
6,1,9,Blount County,AL,57238.0
...,...,...,...,...,...
3196,56,37,Sweetwater County,WY,42205.0
3197,56,39,Teton County,WY,22888.0
3198,56,41,Uinta County,WY,20135.0
3199,56,43,Washakie County,WY,7735.0


### County Population by Racial/Ethnic Characteristics 2010-2019 (U.S. Census Bureau)

METHODOLOGY FOR THE UNITED STATES POPULATION ESTIMATES: VINTAGE 2019
Nation, States, Counties, and Puerto Rico – April 1, 2010 to July 1, 2019

Each year, the United States Census Bureau produces and publishes estimates of the population for the
nation, states, counties, state/county equivalents, and Puerto Rico. We estimate the resident population for
each year since the most recent decennial census by using measures of population change. The resident
population includes all people currently residing in the United States.

With each annual release of population estimates, the Population Estimates Program revises and updates the
entire time series of estimates from April 1, 2010 to July 1 of the current year, which we refer to as the
vintage year. We use the term “vintage” to denote an entire time series created with a consistent population
starting point and methodology. The release of a new vintage of estimates supersedes any previous series
and incorporates the most up-to-date input data and methodological improvements.

REFERENCES:
1. Annual County Resident Population Estimates by Age, Sex, Race, and Hispanic Origin: April 1, 2010 to July 1, 2019 (https://www.census.gov/data/tables/time-series/demo/popest/2010s-counties-detail.html)
2. File Layout: https://www2.census.gov/programs-surveys/popest/technical-documentation/file-layouts/2010-2019/cc-est2019-alldata.pdf

In [57]:
race = pd.read_csv("data/cc-est2019-alldata.csv", encoding = "ISO-8859-1")

In [61]:
# race.columns.tolist()

# SELECTION - Z Value
# sum columns by race and gender 
# e.g. race["WA_MALE_TOTAL"] = race.loc[:, ["WA_MALE", "WAC_MALE"]].sum(axis=1)

# WA_MALE
# WAC_MALE

# WA_FEMALE
# WAC_FEMALE

# BA_MALE
# BAC_MALE

# BA_FEMALE
# BAC_FEMALE

# IA_MALE
# IAC_MALE

# IA_FEMALE
# IAC_FEMALE

# AA_MALE
# AAC_MALE 

# AA_FEMALE
# AAC_FEMALE

# NA_MALE
# NAC_MALE 

# NA_FEMALE
# NAC_FEMALE

# TOM_MALE
# TOM_FEMALE

# Caution: per the Census file layout, the *AC_* columns ("alone or in combination")
# already include the matching *A_* ("alone") counts, so these sums count single-race
# residents twice (WA_MALE_TOTAL + WA_FEMALE_TOTAL exceeds TOT_POP in the output below).
for race_code in ["WA", "BA", "IA", "AA", "NA"]:
    for sex in ["MALE", "FEMALE"]:
        race[f"{race_code}_{sex}_TOTAL"] = race.loc[:, [f"{race_code}_{sex}", f"{race_code}C_{sex}"]].sum(axis=1)

In [62]:
race["YEAR"] = race["YEAR"].astype(int)
race

Unnamed: 0,SUMLEV,STATE,COUNTY,STNAME,CTYNAME,YEAR,AGEGRP,TOT_POP,TOT_MALE,TOT_FEMALE,...,WA_MALE_TOTAL,WA_FEMALE_TOTAL,BA_MALE_TOTAL,BA_FEMALE_TOTAL,IA_MALE_TOTAL,IA_FEMALE_TOTAL,AA_MALE_TOTAL,AA_FEMALE_TOTAL,NA_MALE_TOTAL,NA_FEMALE_TOTAL
0,50,1,1,Alabama,Autauga County,1,0,54571,26569,28002,...,42928,44393,9263,10436,396,453,500,693,71,55
1,50,1,1,Alabama,Autauga County,1,1,3579,1866,1713,...,2890,2684,767,679,28,21,47,43,4,1
2,50,1,1,Alabama,Autauga County,1,2,3991,2001,1990,...,3091,3109,824,777,41,27,49,63,4,7
3,50,1,1,Alabama,Autauga County,1,3,4290,2171,2119,...,3352,3301,884,842,44,39,55,55,8,6
4,50,1,1,Alabama,Autauga County,1,4,4290,2213,2077,...,3292,3209,1027,868,35,27,64,45,10,7
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
716371,50,56,45,Wyoming,Weston County,12,14,499,280,219,...,514,409,1,2,7,2,38,25,0,0
716372,50,56,45,Wyoming,Weston County,12,15,352,180,172,...,349,339,0,1,4,2,7,2,0,0
716373,50,56,45,Wyoming,Weston County,12,16,229,107,122,...,212,240,0,0,2,4,0,0,0,0
716374,50,56,45,Wyoming,Weston County,12,17,198,82,116,...,161,230,0,0,2,2,1,0,0,0


In [64]:
# YEAR: 12 = 7/1/2019 & AGEGRP: 0 = Total

race_12 = race.loc[(race['YEAR'] == 12) & (race['AGEGRP'] == 0)]
race_12.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3142 entries, 209 to 716357
Data columns (total 90 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   SUMLEV           3142 non-null   int64 
 1   STATE            3142 non-null   int64 
 2   COUNTY           3142 non-null   int64 
 3   STNAME           3142 non-null   object
 4   CTYNAME          3142 non-null   object
 5   YEAR             3142 non-null   int64 
 6   AGEGRP           3142 non-null   int64 
 7   TOT_POP          3142 non-null   int64 
 8   TOT_MALE         3142 non-null   int64 
 9   TOT_FEMALE       3142 non-null   int64 
 10  WA_MALE          3142 non-null   int64 
 11  WA_FEMALE        3142 non-null   int64 
 12  BA_MALE          3142 non-null   int64 
 13  BA_FEMALE        3142 non-null   int64 
 14  IA_MALE          3142 non-null   int64 
 15  IA_FEMALE        3142 non-null   int64 
 16  AA_MALE          3142 non-null   int64 
 17  AA_FEMALE        3142 non-nul

In [65]:
race_12.loc[:,["STATE", "COUNTY", "STNAME", "CTYNAME", "WA_MALE_TOTAL", "WA_FEMALE_TOTAL", "BA_MALE_TOTAL", "BA_FEMALE_TOTAL", "IA_MALE_TOTAL", "IA_FEMALE_TOTAL", "AA_MALE_TOTAL", "AA_FEMALE_TOTAL", "NA_MALE_TOTAL", "NA_FEMALE_TOTAL"]]

Unnamed: 0,STATE,COUNTY,STNAME,CTYNAME,WA_MALE_TOTAL,WA_FEMALE_TOTAL,BA_MALE_TOTAL,BA_FEMALE_TOTAL,IA_MALE_TOTAL,IA_FEMALE_TOTAL,AA_MALE_TOTAL,AA_FEMALE_TOTAL,NA_MALE_TOTAL,NA_FEMALE_TOTAL
209,1,1,Alabama,Autauga County,42250,43920,10751,12270,395,446,727,879,87,75
437,1,3,Alabama,Baldwin County,191540,202761,19832,21115,2721,2624,2337,3394,254,267
665,1,5,Alabama,Barbour County,12906,11608,12743,11280,285,182,127,141,72,41
893,1,7,Alabama,Bibb County,17635,16976,5951,3719,159,151,72,73,50,16
1121,1,9,Alabama,Blount County,54866,56713,1174,1080,592,598,232,265,102,62
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
715445,56,37,Wyoming,Sweetwater County,41325,38927,828,640,881,801,516,619,97,99
715673,56,39,Wyoming,Teton County,23328,21591,248,181,310,272,358,573,63,43
715901,56,41,Wyoming,Uinta County,19698,19209,199,186,391,402,123,180,54,44
716129,56,43,Wyoming,Washakie County,7602,7321,80,57,169,198,66,102,13,13


In [66]:
race_12

Unnamed: 0,SUMLEV,STATE,COUNTY,STNAME,CTYNAME,YEAR,AGEGRP,TOT_POP,TOT_MALE,TOT_FEMALE,...,WA_MALE_TOTAL,WA_FEMALE_TOTAL,BA_MALE_TOTAL,BA_FEMALE_TOTAL,IA_MALE_TOTAL,IA_FEMALE_TOTAL,AA_MALE_TOTAL,AA_FEMALE_TOTAL,NA_MALE_TOTAL,NA_FEMALE_TOTAL
209,50,1,1,Alabama,Autauga County,12,0,55869,27092,28777,...,42250,43920,10751,12270,395,446,727,879,87,75
437,50,1,3,Alabama,Baldwin County,12,0,223234,108247,114987,...,191540,202761,19832,21115,2721,2624,2337,3394,254,267
665,50,1,5,Alabama,Barbour County,12,0,24686,13064,11622,...,12906,11608,12743,11280,285,182,127,141,72,41
893,50,1,7,Alabama,Bibb County,12,0,22394,11929,10465,...,17635,16976,5951,3719,159,151,72,73,50,16
1121,50,1,9,Alabama,Blount County,12,0,57826,28472,29354,...,54866,56713,1174,1080,592,598,232,265,102,62
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
715445,50,56,37,Wyoming,Sweetwater County,12,0,42343,21808,20535,...,41325,38927,828,640,881,801,516,619,97,99
715673,50,56,39,Wyoming,Teton County,12,0,23464,12142,11322,...,23328,21591,248,181,310,272,358,573,63,43
715901,50,56,41,Wyoming,Uinta County,12,0,20226,10224,10002,...,19698,19209,199,186,391,402,123,180,54,44
716129,50,56,43,Wyoming,Washakie County,12,0,7805,3963,3842,...,7602,7321,80,57,169,198,66,102,13,13


In [71]:
# Reassign instead of renaming in place: race_12 was filtered out of `race`,
# and in-place edits on such a frame trigger pandas' SettingWithCopyWarning.
race_12 = race_12.rename(columns={'CTYNAME': 'County Name', 'STATE': 'stateFIPS', 'COUNTY': 'countyFIPS_2d'})
race_12

Unnamed: 0,SUMLEV,stateFIPS,countyFIPS_2d,STNAME,County Name,YEAR,AGEGRP,TOT_POP,TOT_MALE,TOT_FEMALE,...,WA_MALE_TOTAL,WA_FEMALE_TOTAL,BA_MALE_TOTAL,BA_FEMALE_TOTAL,IA_MALE_TOTAL,IA_FEMALE_TOTAL,AA_MALE_TOTAL,AA_FEMALE_TOTAL,NA_MALE_TOTAL,NA_FEMALE_TOTAL
209,50,1,1,Alabama,Autauga County,12,0,55869,27092,28777,...,42250,43920,10751,12270,395,446,727,879,87,75
437,50,1,3,Alabama,Baldwin County,12,0,223234,108247,114987,...,191540,202761,19832,21115,2721,2624,2337,3394,254,267
665,50,1,5,Alabama,Barbour County,12,0,24686,13064,11622,...,12906,11608,12743,11280,285,182,127,141,72,41
893,50,1,7,Alabama,Bibb County,12,0,22394,11929,10465,...,17635,16976,5951,3719,159,151,72,73,50,16
1121,50,1,9,Alabama,Blount County,12,0,57826,28472,29354,...,54866,56713,1174,1080,592,598,232,265,102,62
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
715445,50,56,37,Wyoming,Sweetwater County,12,0,42343,21808,20535,...,41325,38927,828,640,881,801,516,619,97,99
715673,50,56,39,Wyoming,Teton County,12,0,23464,12142,11322,...,23328,21591,248,181,310,272,358,573,63,43
715901,50,56,41,Wyoming,Uinta County,12,0,20226,10224,10002,...,19698,19209,199,186,391,402,123,180,54,44
716129,50,56,43,Wyoming,Washakie County,12,0,7805,3963,3842,...,7602,7321,80,57,169,198,66,102,13,13
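
Calling `rename(..., inplace=True)` on a frame produced by filtering another frame can trigger pandas' `SettingWithCopyWarning`. A minimal sketch of one way to avoid it, assuming a toy frame with the same key columns as the Census file (the frame `toy` and its rows are made up): take an explicit copy after filtering and pass all renames in a single mapping.

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "STATE": [1, 1, 56],
        "COUNTY": [1, 1, 43],
        "CTYNAME": ["Autauga County", "Autauga County", "Washakie County"],
        "YEAR": [12, 1, 12],
        "AGEGRP": [0, 0, 0],
    }
)

# Filter, then take an explicit copy so later edits do not touch a view of `toy`.
subset = toy.loc[(toy["YEAR"] == 12) & (toy["AGEGRP"] == 0)].copy()

# One rename call with a single mapping, returning a new frame.
subset = subset.rename(
    columns={"STATE": "stateFIPS", "COUNTY": "countyFIPS_2d", "CTYNAME": "County Name"}
)
print(subset.columns.tolist())
```

The original frame keeps its Census column names, which helps if other cells still refer to `STATE` or `COUNTY`.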


### Incidence of Pre-existing Conditions & Coverage of Flu Vaccine

People of any age with the following conditions are at increased risk of severe illness from COVID-19 (according to the CDC, 17 July 2020):

PolicyMap worked with journalists at the New York Times to create this index assessing a county’s relative risk of its population developing severe COVID-19 symptoms. The index represents the relative risk for a high proportion of residents in each county to develop serious health complications from COVID-19 because of underlying health conditions identified by the CDC as contributing to a person’s risk of developing severe symptoms from the virus. These conditions include COPD, heart disease, high blood pressure, diabetes, and obesity.

Estimates of COPD, heart disease, high blood pressure, diabetes, and obesity prevalence at the tract and ZCTA level are from PolicyMap’s Health Outcome Estimates. Estimates of diabetes and obesity prevalence at the county level are from the CDC’s U.S. Diabetes Surveillance System.

Normalized scores were then converted to percentiles and z scores for easier interpretation. Percentiles rank counties from the lowest score to the highest on a scale of 0 to 100, where a score of 50 represents the median value. A county’s z score shows how many standard deviations above or below the average a county’s risk level falls. A score of 0.6, for example, would mean that the county has a higher risk than average, but is still within one standard deviation of the average and is therefore not unusually high. Risk categories from very low to very high are assigned based on z scores.
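
To make the z score and percentile description concrete, here is a small sketch on made-up normalized scores. PolicyMap's exact formulas are not given in the text above, so the population standard deviation (`ddof=0`) and the rank-based percentile used here are assumptions.

```python
import pandas as pd

# Made-up normalized scores for five hypothetical counties.
scores = pd.Series([0.90, 0.95, 1.00, 1.05, 1.10])

# z score: distance from the mean in standard deviations (population std assumed).
z = (scores - scores.mean()) / scores.std(ddof=0)

# Percentile rank on a 0-100 scale (rank-based; one of several possible definitions).
pct = scores.rank(pct=True) * 100

print(z.round(2).tolist())
print(pct.round(1).tolist())
```

A county at the mean gets a z score near 0, and the highest-scoring county gets the 100th percentile under this ranking.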

Constrained features to the following (according to CDC advisory 28 July, 2020):
- Serious heart conditions, such as heart failure, coronary artery disease, or cardiomyopathies (CVDINFR4, CVDCRHD4)
- Cancer (CHCOCNCR)
- Chronic kidney disease (CHCKDNY)
- COPD (CHCCOPD1)
- Obesity (BMI > 30) (_BMI5CAT value 4; not available at county level)
- Sickle cell disease (not available)
- Solid organ transplantation 
- Type 2 diabetes mellitus (proxy; taking insulin: INSULIN)


Proxy Prevention Coverage
- Adult flu shot/spray past 12 mos (FLUSHOT6)


REFERENCES:
1. Covid 19 People with Certain Medical Conditions https://www.cdc.gov/coronavirus/2019-ncov/need-extra-precautions/people-with-medical-conditions.html?CDC_AA_refVal=https%3A%2F%2Fwww.cdc.gov%2Fcoronavirus%2F2019-ncov%2Fneed-extra-precautions%2Fgroups-at-higher-risk.html
2. Centers for Disease Control and Prevention (CDC). Behavioral Risk Factor Surveillance System Survey Data. Atlanta, Georgia: U.S. Department of Health and Human Services, Centers for Disease Control and Prevention, 2017.: https://www.cdc.gov/brfss/smart/smart_2017.html
3. Evidence used to update the list of underlying medical conditions that increase a person’s risk of severe illness from COVID-19: https://www.cdc.gov/coronavirus/2019-ncov/need-extra-precautions/evidence-table.html
4. PolicyMap Severe COVID-19 Health Risk Index: https://www.policymap.com/download-covid19-data/

In [72]:
# CDC SMART Data
# preexisting = pd.read_sas("data/llcp2018_2.xpt")
# preexisting.to_csv('data/llcp2018.csv')
# preexisting = pd.read_csv("data/MMSA2017.csv")
# preexisting["_MMSA"] = preexisting["_MMSA"].astype(str)
# print(preexisting.dtypes)
# preexisting['countyFIPS_2d'] = preexisting['_MMSA'].str[2:4]
# preexisting['stateFIPS_2d'] = preexisting['_MMSA'].str[0:2]

In [88]:
preexisting = pd.read_csv("data/COVID_Risk_Index_Data.csv")

In [89]:
preexisting

Unnamed: 0,geo_boundary_type_id,geo_boundary_identifier,geo_boundary_definition_id,time_frame,index_raw,index_normalized,index_zscore,index_percentile,index_category
0,4,1001,54,2020,39300,0.98,0.36,65.42,Above Average
1,4,1003,54,2020,145554,0.99,0.43,68.39,Above Average
2,4,1005,54,2020,24233,1.18,1.85,97.09,High
3,4,1007,54,2020,18562,1.06,0.95,83.36,Above Average
4,4,1009,54,2020,45082,1.05,0.89,81.75,Above Average
...,...,...,...,...,...,...,...,...,...
3229,4,72151,54,2020,-9999,-9999.00,-9999.00,-9999.00,
3230,4,72153,54,2020,-9999,-9999.00,-9999.00,-9999.00,
3231,4,78010,54,2020,-9999,-9999.00,-9999.00,-9999.00,
3232,4,78020,54,2020,-9999,-9999.00,-9999.00,-9999.00,
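
The last rows above carry -9999 in every index column, which looks like a missing-data sentinel; the source does not document it, so that reading is an assumption. Under that assumption, a minimal sketch of converting the sentinel to `NaN` before any statistics are computed (the frame `toy` is made up from two rows of the output above):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame(
    {
        "geo_boundary_identifier": [1001, 72151],
        "index_zscore": [0.36, -9999.0],
        "index_percentile": [65.42, -9999.0],
    }
)

index_cols = ["index_zscore", "index_percentile"]

# Treat -9999 as "no data" (an assumption) so it cannot skew means or models.
toy[index_cols] = toy[index_cols].replace(-9999, np.nan)
print(toy)
```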


In [97]:
preexisting["geo_boundary_identifier"] = preexisting["geo_boundary_identifier"].astype(str)
print(preexisting.dtypes)

geo_boundary_type_id            int64
geo_boundary_identifier        object
geo_boundary_definition_id      int64
time_frame                      int64
index_raw                       int64
index_normalized              float64
index_zscore                  float64
index_percentile              float64
index_category                 object
countyFIPS_2d                  object
stateFIPS_2d                   object
dtype: object


In [98]:
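# Caution: the identifiers were read as integers, so Alabama's 01001 is stored as "1001" and these slices yield state "10"; padding with .str.zfill(5) first would keep the state and county parts aligned.
preexisting['countyFIPS_2d'] = preexisting['geo_boundary_identifier'].str[2:]
preexisting['stateFIPS_2d'] = preexisting['geo_boundary_identifier'].str[0:2]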

In [99]:
preexisting

Unnamed: 0,geo_boundary_type_id,geo_boundary_identifier,geo_boundary_definition_id,time_frame,index_raw,index_normalized,index_zscore,index_percentile,index_category,countyFIPS_2d,stateFIPS_2d
0,4,1001,54,2020,39300,0.98,0.36,65.42,Above Average,01,10
1,4,1003,54,2020,145554,0.99,0.43,68.39,Above Average,03,10
2,4,1005,54,2020,24233,1.18,1.85,97.09,High,05,10
3,4,1007,54,2020,18562,1.06,0.95,83.36,Above Average,07,10
4,4,1009,54,2020,45082,1.05,0.89,81.75,Above Average,09,10
...,...,...,...,...,...,...,...,...,...,...,...
3229,4,72151,54,2020,-9999,-9999.00,-9999.00,-9999.00,,151,72
3230,4,72153,54,2020,-9999,-9999.00,-9999.00,-9999.00,,153,72
3231,4,78010,54,2020,-9999,-9999.00,-9999.00,-9999.00,,010,78
3232,4,78020,54,2020,-9999,-9999.00,-9999.00,-9999.00,,020,78


In [92]:
preexisting_clean = preexisting.loc[:,["stateFIPS_2d", "countyFIPS_2d", "index_percentile"]].copy()
preexisting_clean.rename(columns={'stateFIPS_2d': 'stateFIPS'}, inplace=True)

In [93]:
preexisting_clean["countyFIPS_2d"] = preexisting_clean["countyFIPS_2d"].astype(int)
preexisting_clean["stateFIPS"] = preexisting_clean["stateFIPS"].astype(int)
preexisting_clean.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3234 entries, 0 to 3233
Data columns (total 3 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   stateFIPS         3234 non-null   int64  
 1   countyFIPS_2d     3234 non-null   int64  
 2   index_percentile  3234 non-null   float64
dtypes: float64(1), int64(2)
memory usage: 75.9 KB


In [94]:
preexisting_clean.rename(columns={'index_percentile': 'Risk Index'}, inplace=True)

# Keep the 50 states and DC (state FIPS codes up to 56); drop territories such as Puerto Rico (72).
preexisting_clean = preexisting_clean.loc[preexisting_clean['stateFIPS'] < 57]

In [95]:
preexisting_clean

Unnamed: 0,stateFIPS,countyFIPS_2d,Risk Index
0,10,1,65.42
1,10,3,68.39
2,10,5,97.09
3,10,7,83.36
4,10,9,81.75
...,...,...,...
3138,56,37,10.42
3139,56,39,2.94
3140,56,41,27.13
3141,56,43,32.76


### Flu Coverage (CDC Wonder)? 

In [110]:
covid_deaths_clean

Unnamed: 0,stateFIPS,countyFIPS_2d,County Name,State,Total Deaths
1,1,01,Autauga County,AL,909
2,1,03,Baldwin County,AL,958
3,1,05,Barbour County,AL,155
4,1,07,Bibb County,AL,103
5,1,09,Blount County,AL,82
...,...,...,...,...,...
3190,56,37,Sweetwater County,WY,34
3191,56,39,Teton County,WY,101
3192,56,41,Uinta County,WY,0
3193,56,43,Washakie County,WY,291


## Merging DataFrames

In [111]:
merged_cases_death = covid_cases_clean.merge(covid_deaths_clean, on=["stateFIPS", "countyFIPS_2d", "County Name", "State"], how='inner', validate="1:1")
merged_cases_death["countyFIPS_2d"] = merged_cases_death["countyFIPS_2d"].astype(int)

In [112]:
merged_cases_death

Unnamed: 0,stateFIPS,countyFIPS_2d,County Name,State,Total Cases,Total Deaths
0,1,1,Autauga County,AL,39746,909
1,1,3,Baldwin County,AL,76970,958
2,1,5,Barbour County,AL,24625,155
3,1,7,Bibb County,AL,13636,103
4,1,9,Blount County,AL,19311,82
...,...,...,...,...,...,...
3125,56,37,Sweetwater County,WY,7361,34
3126,56,39,Teton County,WY,13823,101
3127,56,41,Uinta County,WY,9737,0
3128,56,43,Washakie County,WY,3104,291


In [107]:
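# Note: validate="m:m" performs no uniqueness check; "1:1" would raise if either side repeats a key.
merged_cases_death_pov = merged_cases_death.merge(poverty_clean, on=["stateFIPS", "countyFIPS_2d", "County Name", "State"], how='inner', validate="m:m")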

In [108]:
merged_cases_death_pov

Unnamed: 0,stateFIPS,countyFIPS_2d,County Name,State,Total Cases,Total Deaths,"Poverty Universe, All Ages"
0,1,1,Autauga County,AL,39746,909,55073.0
1,1,3,Baldwin County,AL,76970,958,215255.0
2,1,5,Barbour County,AL,24625,155,21979.0
3,1,7,Bibb County,AL,13636,103,20212.0
4,1,9,Blount County,AL,19311,82,57238.0
...,...,...,...,...,...,...,...
1897,56,37,Sweetwater County,WY,7361,34,42205.0
1898,56,39,Teton County,WY,13823,101,22888.0
1899,56,41,Uinta County,WY,9737,0,20135.0
1900,56,43,Washakie County,WY,3104,291,7735.0
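
The frame above ends at index 1900, while the cases/deaths frame it was built from ended at index 3128, so the inner merge appears to have dropped over a thousand counties. One way to see which rows fail to match is an outer merge with `indicator=True`. The sketch below uses two made-up frames; the column names `countyFIPS` and `Poverty Universe` and all values are illustrative.

```python
import pandas as pd

left = pd.DataFrame(
    {"stateFIPS": [1, 1, 1], "countyFIPS": [1, 3, 101], "Total Cases": [10, 20, 30]}
)
right = pd.DataFrame(
    {"stateFIPS": [1, 1], "countyFIPS": [1, 3], "Poverty Universe": [55073.0, 215255.0]}
)

# An outer merge keeps unmatched rows; `_merge` records where each row came from.
check = left.merge(right, on=["stateFIPS", "countyFIPS"], how="outer",
                   validate="1:1", indicator=True)

# Rows present on one side only are the ones an inner merge would silently drop.
unmatched = check.loc[check["_merge"] != "both"]
print(unmatched)
```

Inspecting `unmatched` before switching to `how='inner'` shows whether the losses come from key formatting, name mismatches, or genuinely absent counties.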


In [102]:
merged_cases_death_pov_race = merged_cases_death_pov.merge(race_12, on=["stateFIPS", "countyFIPS_2d", "County Name"], how='inner', validate="m:m")

In [103]:
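# Caution: this merges merged_cases_death_pov, not merged_cases_death_pov_race, so the race columns are absent from the result below.
merged_cases_death_pov_race_risk = merged_cases_death_pov.merge(preexisting_clean, on=["stateFIPS", "countyFIPS_2d"], how='inner', validate="1:m")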

In [104]:
merged_cases_death_pov_race_risk

Unnamed: 0,stateFIPS,countyFIPS_2d,County Name,State,Total Cases,Total Deaths,"Poverty Universe, All Ages",Risk Index
0,10,1,Kent County,DE,153306,7157,173363.0,65.42
1,10,1,Kent County,DE,153306,7157,173363.0,56.02
2,10,3,New Castle County,DE,433338,18549,541853.0,68.39
3,10,3,New Castle County,DE,433338,18549,541853.0,21.74
4,10,5,Sussex County,DE,428818,14068,225563.0,97.09
...,...,...,...,...,...,...,...,...
1728,56,37,Sweetwater County,WY,7361,34,42205.0,10.42
1729,56,39,Teton County,WY,13823,101,22888.0,2.94
1730,56,41,Uinta County,WY,9737,0,20135.0,27.13
1731,56,43,Washakie County,WY,3104,291,7735.0,32.76
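
The first rows above list Kent County twice with different Risk Index values, which suggests that the join keys repeat in the risk table; the cause is an assumption here, not something the output proves. A minimal sketch of checking for repeated keys before merging, using made-up frames (`risk`, `counties`) with values taken from the output above:

```python
import pandas as pd

risk = pd.DataFrame(
    {"stateFIPS": [10, 10, 56], "countyFIPS_2d": [1, 1, 43],
     "Risk Index": [65.42, 56.02, 32.76]}
)
keys = ["stateFIPS", "countyFIPS_2d"]

# keep=False marks every row whose key appears more than once.
dupes = risk.loc[risk.duplicated(subset=keys, keep=False)]
print(dupes)

counties = pd.DataFrame({"stateFIPS": [10, 56], "countyFIPS_2d": [1, 43]})
try:
    counties.merge(risk, on=keys, how="inner", validate="1:1")
    merge_ok = True
except pd.errors.MergeError:
    # validate="1:1" refuses to merge when keys repeat on either side.
    merge_ok = False
print(merge_ok)
```

With `validate="1:m"`, as used in the cell above, repeated keys on the right side are accepted, so each county can be duplicated in the result.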


In [None]:
merged_cases_death_pov_race