# Project 1
### Epidemiological Study: US Vaccination Campaign (November 2020 - March 2021)
---
### Project Description/Outline
Determine the effectiveness of the US vaccination campaign in attending the population that is most affected by covid-19 disease. This is achieved by revising the total number of doses applied, vaccination coverage, population demographics such as gender, race, socioeconomic status, and education, versus epidemiologic variables: incidence, prevalence, hospitalization, UCI, death numbers.

In [2]:
## Dependencies
import pandas as pd
import requests
import time
import datetime
import matplotlib.pyplot as plt

# Import API key
from app_tokens import cdc_token

### Covid-19 Vaccination Data

In [6]:
## Import data from CSV
hesitancy_df = pd.read_csv('data/Vaccine_Hesitancy_Covid19.csv')#, encoding='latin-1')
hesitancy_df.head()

Unnamed: 0,FIPS Code,County Name,State,Estimated hesitant,Estimated strongly hesitant,Social Vulnerability Index (SVI),SVI Category,Ability to handle a COVID-19 outbreak (CVAC),CVAC Category,Percent adults fully vaccinated against COVID-19,Percent Hispanic,Percent non-Hispanic American Indian/Alaska Native,Percent non-Hispanic Asian,Percent non-Hispanic Black,Percent non-Hispanic Native Hawaiian/Pacific Islander,Percent non-Hispanic White,Geographical Point,State Code,County Boundary,State Boundary
0,1123,"Tallapoosa County, Alabama",ALABAMA,0.23,0.12,0.89,Very High Vulnerability,0.64,High Vulnerability,0.161,0.0242,0.0022,0.0036,0.2697,0.0,0.6887,POINT (-86.844516 32.756889),AL,"MULTIPOLYGON (((-85.841259 33.104456, -85.8409...","MULTIPOLYGON (((-88.139988 34.581703, -88.1352..."
1,1121,"Talladega County, Alabama",ALABAMA,0.23,0.11,0.87,Very High Vulnerability,0.84,Very High Vulnerability,0.133,0.0229,0.0043,0.0061,0.3237,0.0003,0.6263,POINT (-86.844516 32.756889),AL,"MULTIPOLYGON (((-86.303069 33.46316, -86.30306...","MULTIPOLYGON (((-88.139988 34.581703, -88.1352..."
2,1131,"Wilcox County, Alabama",ALABAMA,0.23,0.11,0.93,Very High Vulnerability,0.94,Very High Vulnerability,0.228,0.0053,0.0009,0.0003,0.6938,0.0,0.2684,POINT (-86.844516 32.756889),AL,"MULTIPOLYGON (((-87.52534299999999 32.132773, ...","MULTIPOLYGON (((-88.139988 34.581703, -88.1352..."
3,1129,"Washington County, Alabama",ALABAMA,0.23,0.11,0.73,High Vulnerability,0.82,Very High Vulnerability,0.192,0.0146,0.0731,0.0025,0.2354,0.0,0.6495,POINT (-86.844516 32.756889),AL,"MULTIPOLYGON (((-88.45317899999999 31.505388, ...","MULTIPOLYGON (((-88.139988 34.581703, -88.1352..."
4,1133,"Winston County, Alabama",ALABAMA,0.22,0.11,0.7,High Vulnerability,0.8,High Vulnerability,0.085,0.0315,0.0034,0.0016,0.0073,0.0005,0.937,POINT (-86.844516 32.756889),AL,"MULTIPOLYGON (((-87.63656399999999 34.120908, ...","MULTIPOLYGON (((-88.139988 34.581703, -88.1352..."


### Pendiente
- Retrieve lat,lng as columns by fips
- "POINT (-86.844516 32.756889)	"
- separate


In [135]:
columns = [
    'FIPS Code',
    'Geographical Point',
    'Social Vulnerability Index (SVI)',
    'SVI Category',
    'Percent adults fully vaccinated against COVID-19',
    'Percent Hispanic',
    'Percent non-Hispanic American Indian/Alaska Native',
    'Percent non-Hispanic Asian',
    'Percent non-Hispanic Black',
    'Percent non-Hispanic Native Hawaiian/Pacific Islander',
    'Percent non-Hispanic White'
]

vaccination_df = hesitancy_df[columns].sort_values('FIPS Code')
vaccination_df.reset_index(inplace=True, drop=True)

vaccination_df['Percent non-Hispanic Other'] = 1 - vaccination_df.iloc[:,4:10].sum(axis=1)

vaccination_df.set_index('FIPS Code', drop=True, inplace=True)
vaccination_df.head()

# del hesitancy_df

Unnamed: 0_level_0,Geographical Point,Social Vulnerability Index (SVI),SVI Category,Percent adults fully vaccinated against COVID-19,Percent Hispanic,Percent non-Hispanic American Indian/Alaska Native,Percent non-Hispanic Asian,Percent non-Hispanic Black,Percent non-Hispanic Native Hawaiian/Pacific Islander,Percent non-Hispanic White,Percent non-Hispanic Other
FIPS Code,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
1001,POINT (-86.844516 32.756889),0.44,Moderate Vulnerability,0.114,0.0283,0.0025,0.0103,0.19,0.0001,0.746,0.6548
1003,POINT (-86.844516 32.756889),0.22,Low Vulnerability,0.176,0.0456,0.0065,0.0092,0.0917,0.0,0.8307,0.671
1005,POINT (-86.844516 32.756889),1.0,Very High Vulnerability,0.128,0.0436,0.0029,0.0048,0.4744,0.0,0.4581,0.3463
1007,POINT (-86.844516 32.756889),0.6,High Vulnerability,0.115,0.0257,0.0013,0.0012,0.2214,0.0,0.7453,0.6354
1009,POINT (-86.844516 32.756889),0.42,Moderate Vulnerability,0.095,0.0926,0.0007,0.0037,0.0153,0.0004,0.8689,0.7923


### US Census Reference (2019)

In [8]:
census_df = pd.read_csv(('data/US_Census2019_totals.csv'), encoding='latin-1')
census_df.head()

Unnamed: 0,SUMLEV,REGION,DIVISION,STATE,COUNTY,STNAME,CTYNAME,CENSUS2010POP,ESTIMATESBASE2010,POPESTIMATE2010,...,RDOMESTICMIG2019,RNETMIG2011,RNETMIG2012,RNETMIG2013,RNETMIG2014,RNETMIG2015,RNETMIG2016,RNETMIG2017,RNETMIG2018,RNETMIG2019
0,40,3,6,1,0,Alabama,Alabama,4779736,4780125,4785437,...,1.917501,0.578434,1.186314,1.522549,0.563489,0.626357,0.745172,1.090366,1.773786,2.483744
1,50,3,6,1,1,Alabama,Autauga County,54571,54597,54773,...,4.84731,6.018182,-6.226119,-3.902226,1.970443,-1.712875,4.777171,0.849656,0.540916,4.560062
2,50,3,6,1,3,Alabama,Baldwin County,182265,182265,183112,...,24.017829,16.64187,17.488579,22.751474,20.184334,17.725964,21.279291,22.398256,24.727215,24.380567
3,50,3,6,1,5,Alabama,Barbour County,27457,27455,27327,...,-5.690302,0.292676,-6.897817,-8.132185,-5.140431,-15.724575,-18.238016,-24.998528,-8.754922,-5.165664
4,50,3,6,1,7,Alabama,Bibb County,22915,22915,22870,...,1.385134,-4.998356,-3.787545,-5.797999,1.331144,1.329817,-0.708717,-3.234669,-6.857092,1.831952


In [9]:
census_2019 = census_df[['STATE', 'COUNTY', 'STNAME', 'CTYNAME', 'POPESTIMATE2019']]
census_2019.head()

Unnamed: 0,STATE,COUNTY,STNAME,CTYNAME,POPESTIMATE2019
0,1,0,Alabama,Alabama,4903185
1,1,1,Alabama,Autauga County,55869
2,1,3,Alabama,Baldwin County,223234
3,1,5,Alabama,Barbour County,24686
4,1,7,Alabama,Bibb County,22394


In [10]:
state_2019 = census_2019.loc[census_2019['COUNTY']==0]
state_2019.reset_index(inplace=True, drop=True)
# state_2019['POPESTIMATE2019'].sum()
state_2019

Unnamed: 0,STATE,COUNTY,STNAME,CTYNAME,POPESTIMATE2019
0,1,0,Alabama,Alabama,4903185
1,2,0,Alaska,Alaska,731545
2,4,0,Arizona,Arizona,7278717
3,5,0,Arkansas,Arkansas,3017804
4,6,0,California,California,39512223
5,8,0,Colorado,Colorado,5758736
6,9,0,Connecticut,Connecticut,3565287
7,10,0,Delaware,Delaware,973764
8,11,0,District of Columbia,District of Columbia,705749
9,12,0,Florida,Florida,21477737


In [101]:
county_2019 = census_2019.drop(census_2019.index[census_2019["COUNTY"]==0])
county_2019

Unnamed: 0,STATE,COUNTY,STNAME,CTYNAME,POPESTIMATE2019
1,1,1,Alabama,Autauga County,55869
2,1,3,Alabama,Baldwin County,223234
3,1,5,Alabama,Barbour County,24686
4,1,7,Alabama,Bibb County,22394
5,1,9,Alabama,Blount County,57826
...,...,...,...,...,...
3188,56,37,Wyoming,Sweetwater County,42343
3189,56,39,Wyoming,Teton County,23464
3190,56,41,Wyoming,Uinta County,20226
3191,56,43,Wyoming,Washakie County,7805


In [102]:
fips = []
for index, row in  county_2019.iterrows():
    fips.append(f'{row["STATE"]:>02}{row["COUNTY"]:>03}')
    
county_2019.insert(0, 'FIPS Code', fips)
county_2019.drop(labels=['STATE', 'COUNTY'], axis=1, inplace=True)

In [103]:
county_2019

Unnamed: 0,FIPS Code,STNAME,CTYNAME,POPESTIMATE2019
1,01001,Alabama,Autauga County,55869
2,01003,Alabama,Baldwin County,223234
3,01005,Alabama,Barbour County,24686
4,01007,Alabama,Bibb County,22394
5,01009,Alabama,Blount County,57826
...,...,...,...,...
3188,56037,Wyoming,Sweetwater County,42343
3189,56039,Wyoming,Teton County,23464
3190,56041,Wyoming,Uinta County,20226
3191,56043,Wyoming,Washakie County,7805


In [14]:
county_2019.shape

(3142, 4)

### Pendiente
- Agregar Male/Female
- Agregar AgeGroup
- Mover Ethnicity (desde Vaccination) en número / % ???


### Nomenclatura AGEGRP de US Census a age_group


In [21]:
import numpy as np

all_data = pd.read_csv("data/cc-est2019-alldata.csv")#, encoding='latin-1')
all_data.head()

Unnamed: 0,SUMLEV,STATE,COUNTY,STNAME,CTYNAME,YEAR,AGEGRP,TOT_POP,TOT_MALE,TOT_FEMALE,...,HWAC_MALE,HWAC_FEMALE,HBAC_MALE,HBAC_FEMALE,HIAC_MALE,HIAC_FEMALE,HAAC_MALE,HAAC_FEMALE,HNAC_MALE,HNAC_FEMALE
0,50,1,1,Alabama,Autauga County,1,0,54571,26569,28002,...,607,538,57,48,26,32,9,11,19,10
1,50,1,1,Alabama,Autauga County,1,1,3579,1866,1713,...,77,56,9,5,4,1,0,0,2,1
2,50,1,1,Alabama,Autauga County,1,2,3991,2001,1990,...,64,66,2,3,2,7,2,3,2,0
3,50,1,1,Alabama,Autauga County,1,3,4290,2171,2119,...,51,57,13,7,5,5,2,1,1,1
4,50,1,1,Alabama,Autauga County,1,4,4290,2213,2077,...,48,44,7,5,0,2,2,1,3,1


In [22]:
age_conditions = [
    (all_data['AGEGRP'] == 0),
    (all_data['AGEGRP'] >= 1) & (all_data['AGEGRP'] <= 4),
    (all_data['AGEGRP'] >= 5) & (all_data['AGEGRP'] <= 10),
    (all_data['AGEGRP'] >= 11) & (all_data['AGEGRP'] <= 13),
    (all_data['AGEGRP'] >= 14) & (all_data['AGEGRP'] <= 18) 
]

age_values = ['0', '0 - 17 years', '18 - 49 years', '50 - 64 years', '65 + years']

In [23]:
all_data['Age_group'] = np.select(age_conditions, age_values)
all_data.head()

Unnamed: 0,SUMLEV,STATE,COUNTY,STNAME,CTYNAME,YEAR,AGEGRP,TOT_POP,TOT_MALE,TOT_FEMALE,...,HWAC_FEMALE,HBAC_MALE,HBAC_FEMALE,HIAC_MALE,HIAC_FEMALE,HAAC_MALE,HAAC_FEMALE,HNAC_MALE,HNAC_FEMALE,Age_group
0,50,1,1,Alabama,Autauga County,1,0,54571,26569,28002,...,538,57,48,26,32,9,11,19,10,0
1,50,1,1,Alabama,Autauga County,1,1,3579,1866,1713,...,56,9,5,4,1,0,0,2,1,0 - 17 years
2,50,1,1,Alabama,Autauga County,1,2,3991,2001,1990,...,66,2,3,2,7,2,3,2,0,0 - 17 years
3,50,1,1,Alabama,Autauga County,1,3,4290,2171,2119,...,57,13,7,5,5,2,1,1,1,0 - 17 years
4,50,1,1,Alabama,Autauga County,1,4,4290,2213,2077,...,44,7,5,0,2,2,1,3,1,0 - 17 years


In [24]:
all_data_tot = all_data.loc[all_data['AGEGRP']==0]


In [25]:
all_data_tot = all_data_tot.loc[all_data['YEAR']==12]
all_data_tot.reset_index(inplace=True, drop=True)

In [26]:
fips_all_data = []
for index, row in  all_data_tot.iterrows():
    fips_all_data.append(f'{row["STATE"]:>02}{row["COUNTY"]:>03}')
    
all_data_tot.insert(0, 'FIPS Code', fips_all_data)
all_data_tot.drop(labels=['STATE', 'COUNTY'], axis=1, inplace=True)

In [27]:
all_data_tot

Unnamed: 0,FIPS Code,SUMLEV,STNAME,CTYNAME,YEAR,AGEGRP,TOT_POP,TOT_MALE,TOT_FEMALE,WA_MALE,...,HWAC_FEMALE,HBAC_MALE,HBAC_FEMALE,HIAC_MALE,HIAC_FEMALE,HAAC_MALE,HAAC_FEMALE,HNAC_MALE,HNAC_FEMALE,Age_group
0,01001,50,Alabama,Autauga County,12,0,55869,27092,28777,20878,...,687,89,93,40,27,15,19,16,11,0
1,01003,50,Alabama,Baldwin County,12,0,223234,108247,114987,94810,...,4646,268,281,264,197,69,65,55,35,0
2,01005,50,Alabama,Barbour County,12,0,24686,13064,11622,6389,...,408,63,50,61,26,1,0,14,8,0
3,01007,50,Alabama,Bibb County,12,0,22394,11929,10465,8766,...,253,32,19,6,15,5,1,17,3,0
4,01009,50,Alabama,Blount County,12,0,57826,28472,29354,27258,...,2516,76,58,67,66,18,21,34,21,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3137,56037,50,Wyoming,Sweetwater County,12,0,42343,21808,20535,20446,...,2997,83,73,212,196,33,36,16,9,0
3138,56039,50,Wyoming,Teton County,12,0,23464,12142,11322,11567,...,1578,25,23,105,81,16,15,12,7,0
3139,56041,50,Wyoming,Uinta County,12,0,20226,10224,10002,9753,...,840,17,23,82,111,3,12,8,2,0
3140,56043,50,Wyoming,Washakie County,12,0,7805,3963,3842,3759,...,489,7,9,54,59,7,8,4,2,0


In [106]:
county_2019.reset_index(inplace=True, drop=True)
county_2019.set_index('FIPS Code',drop=True, inplace=True)

In [107]:
county_2019

Unnamed: 0_level_0,STNAME,CTYNAME,POPESTIMATE2019
FIPS Code,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
01001,Alabama,Autauga County,55869
01003,Alabama,Baldwin County,223234
01005,Alabama,Barbour County,24686
01007,Alabama,Bibb County,22394
01009,Alabama,Blount County,57826
...,...,...,...
56037,Wyoming,Sweetwater County,42343
56039,Wyoming,Teton County,23464
56041,Wyoming,Uinta County,20226
56043,Wyoming,Washakie County,7805


In [29]:
gender_tot_sub = all_data_tot[['FIPS Code', 'TOT_MALE','TOT_FEMALE']]
gender_tot_sub

Unnamed: 0,FIPS Code,TOT_MALE,TOT_FEMALE
0,01001,27092,28777
1,01003,108247,114987
2,01005,13064,11622
3,01007,11929,10465
4,01009,28472,29354
...,...,...,...
3137,56037,21808,20535
3138,56039,12142,11322
3139,56041,10224,10002
3140,56043,3963,3842


In [82]:
census_2019_sex = gender_tot_sub.set_index('FIPS Code', drop=True)

In [30]:
# county_2019 = county_2019.merge(gender_tot_sub, on='FIPS Code', how='outer')
# county_2019

Unnamed: 0,FIPS Code,STNAME,CTYNAME,POPESTIMATE2019,TOT_MALE,TOT_FEMALE
0,01001,Alabama,Autauga County,55869,27092,28777
1,01003,Alabama,Baldwin County,223234,108247,114987
2,01005,Alabama,Barbour County,24686,13064,11622
3,01007,Alabama,Bibb County,22394,11929,10465
4,01009,Alabama,Blount County,57826,28472,29354
...,...,...,...,...,...,...
3137,56037,Wyoming,Sweetwater County,42343,21808,20535
3138,56039,Wyoming,Teton County,23464,12142,11322
3139,56041,Wyoming,Uinta County,20226,10224,10002
3140,56043,Wyoming,Washakie County,7805,3963,3842


###### County_2019 dataframe with age group totals

In [41]:
all_data_age = all_data[['STATE', 'COUNTY', 'YEAR', 'TOT_POP', 'Age_group']]
all_data_age

Unnamed: 0,STATE,COUNTY,YEAR,TOT_POP,Age_group
0,1,1,1,54571,0
1,1,1,1,3579,0 - 17 years
2,1,1,1,3991,0 - 17 years
3,1,1,1,4290,0 - 17 years
4,1,1,1,4290,0 - 17 years
...,...,...,...,...,...
716371,56,45,12,499,65 + years
716372,56,45,12,352,65 + years
716373,56,45,12,229,65 + years
716374,56,45,12,198,65 + years


In [42]:
all_data_age = all_data_age.drop(all_data_age.index[all_data_age["Age_group"]=='0'])

In [43]:
all_data_age = all_data_age.loc[all_data_age['YEAR']==12]
all_data_age

Unnamed: 0,STATE,COUNTY,YEAR,TOT_POP,Age_group
210,1,1,12,3277,0 - 17 years
211,1,1,12,3465,0 - 17 years
212,1,1,12,3851,0 - 17 years
213,1,1,12,3659,0 - 17 years
214,1,1,12,3178,18 - 49 years
...,...,...,...,...,...
716371,56,45,12,499,65 + years
716372,56,45,12,352,65 + years
716373,56,45,12,229,65 + years
716374,56,45,12,198,65 + years


In [57]:
fips_age = []
for index, row in  all_data_age.iterrows():
    fips_age.append(f'{row["STATE"]:>02}{row["COUNTY"]:>03}')
    
all_data_age.insert(0, 'FIPS Code', fips_age)
all_data_age.drop(labels=['STATE', 'COUNTY'], axis=1, inplace=True)

In [58]:
all_data_age

Unnamed: 0,FIPS Code,YEAR,TOT_POP,Age_group
210,01001,12,3277,0 - 17 years
211,01001,12,3465,0 - 17 years
212,01001,12,3851,0 - 17 years
213,01001,12,3659,0 - 17 years
214,01001,12,3178,18 - 49 years
...,...,...,...,...
716371,56045,12,499,65 + years
716372,56045,12,352,65 + years
716373,56045,12,229,65 + years
716374,56045,12,198,65 + years


In [83]:
age_group = pd.DataFrame(all_data_age.groupby(['FIPS Code','Age_group'])['TOT_POP'].sum())

In [84]:
census_2019_age = age_group.unstack()

In [100]:
county_2019

Unnamed: 0,STNAME,CTYNAME,POPESTIMATE2019,TOT_MALE,TOT_FEMALE
0,Alabama,Autauga County,55869,27092,28777
1,Alabama,Baldwin County,223234,108247,114987
2,Alabama,Barbour County,24686,13064,11622
3,Alabama,Bibb County,22394,11929,10465
4,Alabama,Blount County,57826,28472,29354
...,...,...,...,...,...
3137,Wyoming,Sweetwater County,42343,21808,20535
3138,Wyoming,Teton County,23464,12142,11322
3139,Wyoming,Uinta County,20226,10224,10002
3140,Wyoming,Washakie County,7805,3963,3842


In [244]:
census_2019_ethnicity = vaccination_df[[
    'Percent Hispanic',
    'Percent non-Hispanic American Indian/Alaska Native',
    'Percent non-Hispanic Asian',
    'Percent non-Hispanic Black',
    'Percent non-Hispanic Native Hawaiian/Pacific Islander',
    'Percent non-Hispanic White',
    'Percent non-Hispanic Other'
]]
census_2019_ethnicity.reset_index(inplace=True)
type(f"{census_2019_ethnicity['FIPS Code'][0]}")

str

In [257]:
fips_ethnicity = []
for index, row in  census_2019_ethnicity.iterrows():
    row_string = f"{row['FIPS Code']:.0f}"
    fips_ethnicity.append(f"{row_string:>05}")

In [259]:
fips_ethnicity[0]

'01001'

In [260]:
census_2019_ethnicity.drop('FIPS Code', axis=1)

Unnamed: 0,Percent Hispanic,Percent non-Hispanic American Indian/Alaska Native,Percent non-Hispanic Asian,Percent non-Hispanic Black,Percent non-Hispanic Native Hawaiian/Pacific Islander,Percent non-Hispanic White,Percent non-Hispanic Other
0,0.0283,0.0025,0.0103,0.1900,0.0001,0.7460,0.6548
1,0.0456,0.0065,0.0092,0.0917,0.0000,0.8307,0.6710
2,0.0436,0.0029,0.0048,0.4744,0.0000,0.4581,0.3463
3,0.0257,0.0013,0.0012,0.2214,0.0000,0.7453,0.6354
4,0.0926,0.0007,0.0037,0.0153,0.0004,0.8689,0.7923
...,...,...,...,...,...,...,...
3137,0.1588,0.0102,0.0074,0.0112,0.0003,0.7956,0.5841
3138,0.1503,0.0033,0.0125,0.0124,0.0012,0.8134,0.5153
3139,0.0913,0.0065,0.0016,0.0011,0.0000,0.8752,0.6935
3140,0.1423,0.0052,0.0000,0.0004,0.0000,0.8190,0.5881


In [262]:
census_2019_ethnicity

Unnamed: 0,FIPS Code,Percent Hispanic,Percent non-Hispanic American Indian/Alaska Native,Percent non-Hispanic Asian,Percent non-Hispanic Black,Percent non-Hispanic Native Hawaiian/Pacific Islander,Percent non-Hispanic White,Percent non-Hispanic Other
0,01001,0.0283,0.0025,0.0103,0.1900,0.0001,0.7460,0.6548
1,01003,0.0456,0.0065,0.0092,0.0917,0.0000,0.8307,0.6710
2,01005,0.0436,0.0029,0.0048,0.4744,0.0000,0.4581,0.3463
3,01007,0.0257,0.0013,0.0012,0.2214,0.0000,0.7453,0.6354
4,01009,0.0926,0.0007,0.0037,0.0153,0.0004,0.8689,0.7923
...,...,...,...,...,...,...,...,...
3137,56037,0.1588,0.0102,0.0074,0.0112,0.0003,0.7956,0.5841
3138,56039,0.1503,0.0033,0.0125,0.0124,0.0012,0.8134,0.5153
3139,56041,0.0913,0.0065,0.0016,0.0011,0.0000,0.8752,0.6935
3140,56043,0.1423,0.0052,0.0000,0.0004,0.0000,0.8190,0.5881


In [184]:
census_2019_combined = county_2019
census_2019_combined = census_2019_combined.merge(census_2019_sex, how='inner', left_index=True, right_index=True)
census_2019_combined = census_2019_combined.merge(census_2019_age['TOT_POP'], how='inner', left_index=True, right_index=True)

In [182]:
census_2019_combined = census_2019_combined.merge(census_2019_ethnicity, how='inner', left_index=True, right_index=True)

In [185]:
census_2019_combined

Unnamed: 0_level_0,STNAME,CTYNAME,POPESTIMATE2019,TOT_MALE,TOT_FEMALE,0 - 17 years,18 - 49 years,50 - 64 years,65 + years
FIPS Code,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
01001,Alabama,Autauga County,55869,27092,28777,14252,21652,11041,8924
01003,Alabama,Baldwin County,223234,108247,114987,52268,77402,46734,46830
01005,Alabama,Barbour County,24686,13064,11622,5595,9477,4753,4861
01007,Alabama,Bibb County,22394,11929,10465,4992,9233,4436,3733
01009,Alabama,Blount County,57826,28472,29354,14522,21002,11488,10814
...,...,...,...,...,...,...,...,...,...
56037,Wyoming,Sweetwater County,42343,21808,20535,12049,16959,7846,5489
56039,Wyoming,Teton County,23464,12142,11322,4586,10694,4467,3717
56041,Wyoming,Uinta County,20226,10224,10002,6215,7229,3757,3025
56043,Wyoming,Washakie County,7805,3963,3842,1960,2506,1609,1730


In [137]:
geo_points = vaccination_df['Geographical Point']

FIPS Code
1001     POINT (-86.844516 32.756889)
1003     POINT (-86.844516 32.756889)
1005     POINT (-86.844516 32.756889)
1007     POINT (-86.844516 32.756889)
1009     POINT (-86.844516 32.756889)
                     ...             
56037    POINT (-107.55145 42.999627)
56039    POINT (-107.55145 42.999627)
56041    POINT (-107.55145 42.999627)
56043    POINT (-107.55145 42.999627)
56045    POINT (-107.55145 42.999627)
Name: Geographical Point, Length: 3142, dtype: object

In [163]:
geo_points[1001]

'POINT (-86.844516 32.756889)'

In [152]:
test = geo_points[1001].split(' ')
lat=test[2]
lng=test[1]
print(lat,lng)

32.756889) (-86.844516


In [153]:
lng[1:len(lng)]

'-86.844516'

In [169]:
# points = [geo_points[1001]]
lat = []
lng = []

for point in geo_points:
     (a, b, c) = point.split(' ')
     lat.append(c[0:len(c)-1])
     lng.append(b[1:len(b)])

In [171]:
len(lat)

3142

In [172]:
len(lng)

3142

In [176]:
census_2019_combined['Latitude'] = lat
census_2019_combined['Longitude'] = lng

In [177]:
census_2019_combined

Unnamed: 0_level_0,STNAME,CTYNAME,POPESTIMATE2019,TOT_MALE,TOT_FEMALE,0 - 17 years,18 - 49 years,50 - 64 years,65 + years,Latitude,Longitude
FIPS Code,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
01001,Alabama,Autauga County,55869,27092,28777,14252,21652,11041,8924,32.756889,-86.844516
01003,Alabama,Baldwin County,223234,108247,114987,52268,77402,46734,46830,32.756889,-86.844516
01005,Alabama,Barbour County,24686,13064,11622,5595,9477,4753,4861,32.756889,-86.844516
01007,Alabama,Bibb County,22394,11929,10465,4992,9233,4436,3733,32.756889,-86.844516
01009,Alabama,Blount County,57826,28472,29354,14522,21002,11488,10814,32.756889,-86.844516
...,...,...,...,...,...,...,...,...,...,...,...
56037,Wyoming,Sweetwater County,42343,21808,20535,12049,16959,7846,5489,42.999627,-107.55145
56039,Wyoming,Teton County,23464,12142,11322,4586,10694,4467,3717,42.999627,-107.55145
56041,Wyoming,Uinta County,20226,10224,10002,6215,7229,3757,3025,42.999627,-107.55145
56043,Wyoming,Washakie County,7805,3963,3842,1960,2506,1609,1730,42.999627,-107.55145


In [178]:
census_2019_combined.to_csv('analysis_data/census_2019.csv')

### Covid-19 Case Surveillance


In [38]:
months = ['2020-01', '2020-02', '2020-03', '2020-04', '2020-05',
          '2020-06', '2020-07', '2020-08', '2020-09', '2020-10',
          '2020-11', '2020-12', '2021-01', '2021-02', '2021-03']

# fields = 'case_month, county_fips_code, current_status, sex, age_group, race, ethnicity, hosp_yn, icu_yn, death_yn'
fields = 'case_month, county_fips_code, hosp_yn, icu_yn, death_yn'
fips.append('NA')

patients_df = pd.DataFrame(index=fips)
hospitalized_df = pd.DataFrame(index=fips)
icu_df = pd.DataFrame(index=fips)
death_df = pd.DataFrame(index=fips)

patients_df.index.rename('FIPS Code', inplace=True)
hospitalized_df.index.rename('FIPS Code', inplace=True)
icu_df.index.rename('FIPS Code', inplace=True)
death_df.index.rename('FIPS Code', inplace=True)

In [None]:
query_url = "https://data.cdc.gov/resource/n8mc-b4w4.json?"
params = {
    '$$app_token': cdc_token,
    '$limit': 25000000,
    '$offset': 0,
    '$select': fields
}

In [None]:
months = ['2020-11']

## Print Log Header
print("Beginning Data Retrieval")
print("------------------------------")

## Retrieve Loop
for month in months:
    
    ## Print Log Status
    print(f"Processing Month: {month} [{datetime.datetime.now().strftime('%H:%M:%S')}]")
    
    ## Set month query
    params['case_month'] = month
    ## Retrieve month data & Store in DataFrame
    response_month = requests.get(query_url, params=params).json()
    response_df = pd.DataFrame(response_month)

    patients_df[month] = response_df.groupby('county_fips_code')['case_month'].count()
    hospitalized_df[month] = response_df.loc[response_df['hosp_yn'] == 'Yes'].groupby('county_fips_code')['hosp_yn'].count()
    icu_df[month] = response_df.loc[response_df['icu_yn'] == 'Yes'].groupby('county_fips_code')['icu_yn'].count()
    death_df[month] = response_df.loc[response_df['death_yn'] == 'Yes'].groupby('county_fips_code')['death_yn'].count()

    if month != months[-1]:
        print("Sleeping...")
        time.sleep(60*30)

## Print Log Footer
print("------------------------------")        
print("Data Retrieval Complete")
print("------------------------------")

In [None]:
patients_df

In [None]:
patients_df.fillna(0, inplace=True)
hospitalized_df.fillna(0, inplace=True)
icu_df.fillna(0, inplace=True)
death_df.fillna(0, inplace=True)

patients_df.to_csv('clean_data/patients.csv')
hospitalized_df.to_csv('clean_data/hospitalized.csv')
icu_df.to_csv('clean_data/icu.csv')
death_df.to_csv('clean_data/death.csv')

In [None]:
patients_df.sum()

In [None]:
hospitalized_df.sum()

In [None]:
icu_df.sum()

In [None]:
death_df.sum()

### Pendientes

1. Un DF con 18 meses (columnas) por: Deidentified Patients, Hosp_YN, ICU_YN, Death_YN
2. Al final, agregar suma de todos los meses
3. Un DF con todos los totales [patients, hosp, ]
4. Un DF con todos los totales [patients, hosp, ] (por c/100k hab)
    (total*pop/100,000)
5. Agregar Columna  Vacunados en 4 y 5


### Gráficas

...Peores fips por definir

1. Vacunados vs. tiempo (total/estado) + Pacientes vs. tiempo
2. Línea de tiempo (peores fips) vacunados vs afectados
3. Caracterizar (sexo, edad, etnicidad) a la media de los peores fips vs media mejores fips
4. Peores/Mejores fips: Avance vacunación (stacked bars)
5. Scatter (pacientes, vacunados) por fips
    a. Pearson + LinRegress
6. Heatmap (vacunación vs. afectados)
7. Regresiones por sexo, edad, grupo étnico
8. Barras agrupadas por grupo étnico





### Respaldo Mariana

In [None]:
import pandas as pd
import sqlalchemy
from sodapy import Socrata

socrata_domain = 'data.cdc.gov'
socrata_dataset_identifier = "n8mc-b4w4"

# Unauthenticated client only works with public data sets. Note 'None'
# in place of application token, and no username or password:
client = Socrata(socrata_domain, None)

#Get metadata
metadata = client.get_metadata(socrata_dataset_identifier)
[x['name'] for x in metadata['columns']]

In [None]:
results = client.get(socrata_dataset_identifier,limit = 24441351,
                      #where = "current_status"=="Laboratory-confirmed case",
                     select="county_fips_code,case_month,current_status,sex,age_group,race,ethnicity,hosp_yn,icu_yn,death_yn"
                    )
                    
tryout_df = pd.DataFrame.from_records(results)

In [None]:
tryout_df.head()

In [None]:
tryout_df['current_status'].value_counts()

In [None]:
url = https://data.cdc.gov/resource/n8mc-b4w4.json?case_month=2020-12&current_status=Laboratory-confirmed case