# 141B Final Project: Vaccine Hesitancy in the United States
## by Antonio Pelayo, Gianni Spiga, and Sharon Vien

In this notebook, we explore CDC data on vaccine hesitancy among adults of different races in the United States. We hope to find patterns or trends that could help prepare for more efficient nationwide and statewide vaccine rollouts in the future.

In [7]:
import numpy as np
import pandas as pd
import requests

### Extracting the Data

In [10]:
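# Request county-level hesitancy data from the CDC's Socrata (SODA) API.
# Socrata returns 1,000 rows by default, so $limit=4000 is needed to get all 3,142 counties in one call.
hesitancy_endpoint = 'https://data.cdc.gov/resource/q9mh-h2tw.json?$limit=4000'

r = requests.get(hesitancy_endpoint, timeout=30)
r.raise_for_status()  # stop on HTTP errors instead of building a DataFrame from an error payload
hesitancy_df = pd.DataFrame(r.json())
hesitancy_df.head()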
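A single request with `$limit=4000` works here because the dataset has fewer rows than the limit. If it ever grew past that, the usual Socrata approach would be to page through it with the `$limit` and `$offset` query parameters. The sketch below shows that pattern. `fetch_all_records` and its `fetch_page` argument are hypothetical helpers written for illustration. Ordering by the `:id` system field to keep pages stable is an assumption about this endpoint, not something the notebook relies on.

```python
from typing import Callable, Dict, List, Union

import pandas as pd

Params = Dict[str, Union[int, str]]


def fetch_all_records(fetch_page: Callable[[Params], List[dict]],
                      page_size: int = 1000) -> pd.DataFrame:
    """Collect every row of a Socrata dataset by paging with $limit/$offset.

    fetch_page is a hypothetical hook: any callable that sends the given
    query parameters to the endpoint and returns the decoded JSON list.
    """
    records: List[dict] = []
    offset = 0
    while True:
        # Assumption: ordering by the :id system field keeps rows from
        # shifting between pages while we read them.
        page = fetch_page({"$limit": page_size, "$offset": offset, "$order": ":id"})
        records.extend(page)
        if len(page) < page_size:  # a short or empty page means no rows remain
            break
        offset += page_size
    return pd.DataFrame(records)


# Possible usage with requests (not executed here):
# base = 'https://data.cdc.gov/resource/q9mh-h2tw.json'
# df = fetch_all_records(lambda p: requests.get(base, params=p, timeout=30).json())
```

Passing the fetch function in, instead of calling `requests` directly, keeps the paging logic testable without network access.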

| | fips_code | county_name | state | estimated_hesitant | estimated_hesitant_or_unsure | estimated_strongly_hesitant | social_vulnerability_index | svi_category | ability_to_handle_a_covid | cvac_category | ... | percent_non_hispanic_asian | percent_non_hispanic_black | percent_non_hispanic_native | percent_non_hispanic_white | geographical_point | state_code | county_boundary | state_boundary | :@computed_region_hjsp_umg2 | :@computed_region_skr5_azej |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1123 | Tallapoosa County, Alabama | ALABAMA | 0.1806 | 0.24 | 0.1383 | 0.89 | Very High Vulnerability | 0.64 | High Concern | ... | 0.0036 | 0.2697 | 0.0 | 0.6887 | {'type': 'Point', 'coordinates': [-86.844516, ... | AL | {'type': 'MultiPolygon', 'coordinates': [[[[-8... | {'type': 'MultiPolygon', 'coordinates': [[[[-8... | 29 | 94 |
| 1 | 1121 | Talladega County, Alabama | ALABAMA | 0.1783 | 0.235 | 0.1368 | 0.87 | Very High Vulnerability | 0.84 | Very High Concern | ... | 0.0061 | 0.3237 | 0.0003 | 0.6263 | {'type': 'Point', 'coordinates': [-86.844516, ... | AL | {'type': 'MultiPolygon', 'coordinates': [[[[-8... | {'type': 'MultiPolygon', 'coordinates': [[[[-8... | 29 | 94 |
| 2 | 1131 | Wilcox County, Alabama | ALABAMA | 0.1735 | 0.2357 | 0.1337 | 0.93 | Very High Vulnerability | 0.94 | Very High Concern | ... | 0.0003 | 0.6938 | 0.0 | 0.2684 | {'type': 'Point', 'coordinates': [-86.844516, ... | AL | {'type': 'MultiPolygon', 'coordinates': [[[[-8... | {'type': 'MultiPolygon', 'coordinates': [[[[-8... | 29 | 94 |
| 3 | 1129 | Washington County, Alabama | ALABAMA | 0.1735 | 0.2357 | 0.1337 | 0.73 | High Vulnerability | 0.82 | Very High Concern | ... | 0.0025 | 0.2354 | 0.0 | 0.6495 | {'type': 'Point', 'coordinates': [-86.844516, ... | AL | {'type': 'MultiPolygon', 'coordinates': [[[[-8... | {'type': 'MultiPolygon', 'coordinates': [[[[-8... | 29 | 94 |
| 4 | 1133 | Winston County, Alabama | ALABAMA | 0.1805 | 0.2313 | 0.1379 | 0.7 | High Vulnerability | 0.8 | High Concern | ... | 0.0016 | 0.0073 | 0.0005 | 0.937 | {'type': 'Point', 'coordinates': [-86.844516, ... | AL | {'type': 'MultiPolygon', 'coordinates': [[[[-8... | {'type': 'MultiPolygon', 'coordinates': [[[[-8... | 29 | 94 |


### Addressing and Understanding our Variables

In [11]:
hesitancy_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3142 entries, 0 to 3141
Data columns (total 23 columns):
 #   Column                         Non-Null Count  Dtype 
---  ------                         --------------  ----- 
 0   fips_code                      3142 non-null   object
 1   county_name                    3142 non-null   object
 2   state                          3142 non-null   object
 3   estimated_hesitant             3142 non-null   object
 4   estimated_hesitant_or_unsure   3142 non-null   object
 5   estimated_strongly_hesitant    3142 non-null   object
 6   social_vulnerability_index     3141 non-null   object
 7   svi_category                   3141 non-null   object
 8   ability_to_handle_a_covid      3142 non-null   object
 9   cvac_category                  3142 non-null   object
 10  percent_adults_fully           2864 non-null   object
 11  percent_hispanic               3142 non-null   object
 12  percent_non_hispanic_american  3142 non-null   object
 13  per
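The `info()` output shows that every column arrives with dtype `object`. The JSON API returns the numbers as strings, so they must be converted before any arithmetic or plotting. The output also shows some gaps: `percent_adults_fully` has only 2864 non-null values out of 3142. Below is a minimal sketch of the conversion with `pd.to_numeric`. The `NUMERIC_COLUMNS` list and the `coerce_numeric` helper are our own illustrative choices, not part of the dataset. `fips_code` is deliberately left as a string because it is an identifier.

```python
import pandas as pd

# Columns that hold numbers in the sample rows (our own choice; adjust as needed).
NUMERIC_COLUMNS = [
    "estimated_hesitant",
    "estimated_hesitant_or_unsure",
    "estimated_strongly_hesitant",
    "social_vulnerability_index",
    "ability_to_handle_a_covid",
    "percent_adults_fully",
    "percent_hispanic",
    "percent_non_hispanic_american",
    "percent_non_hispanic_asian",
    "percent_non_hispanic_black",
    "percent_non_hispanic_native",
    "percent_non_hispanic_white",
]


def coerce_numeric(df: pd.DataFrame, columns=NUMERIC_COLUMNS) -> pd.DataFrame:
    """Return a copy of df with the listed string columns parsed as numbers.

    errors="coerce" turns unparseable strings into NaN, and values that were
    already missing stay missing. Columns not in the list are left untouched.
    """
    out = df.copy()
    for col in columns:
        if col in out.columns:  # skip columns this particular pull lacks
            out[col] = pd.to_numeric(out[col], errors="coerce")
    return out
```

After `hesitancy_df = coerce_numeric(hesitancy_df)`, calling `hesitancy_df.isna().sum()` gives a quick per-column count of the missing values that `info()` hints at.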

#### Variable Definitions
1. fips_code = Federal Information Processing Standards (FIPS) code that uniquely identifies each county  
2. county_name = name of county  
3. state = name of state  
4. estimated_hesitant = percent of population that indicated they would "probably not" or "definitely not" receive a COVID-19 vaccine when available  
5. estimated_hesitant_or_unsure = percent of population that indicated they would "probably not" or "definitely not" receive a COVID-19 vaccine when available, or that they were "unsure"  
6. estimated_strongly_hesitant = percent of population that indicated they would "definitely not" receive a COVID-19 vaccine when available  
7. social_vulnerability_index = Social Vulnerability Index (SVI), the extent to which a community is socially vulnerable to disaster. The factors considered include economic data as well as data on education, family characteristics, housing, language ability, ethnicity, and vehicle access.   
8. svi_category = SVI grouped into categories (low, moderate, high, very high vulnerability)   
9. ability_to_handle_a_covid = COVID-19 Vaccine Coverage Index (CVAC) score from 0 to 1 describing how well a county can handle a vaccine rollout; higher scores mean more concern (in the sample rows, 0.64 is "High Concern" and 0.84 is "Very High Concern")  
10. cvac_category = CVAC score grouped into levels of concern for vaccine rollout (e.g. High Concern, Very High Concern)   
11. percent_adults_fully = percent of adults fully vaccinated  
12. percent_hispanic = percent of the county population that is Hispanic   
13. percent_non_hispanic_american = percent of the county population that is non-Hispanic American Indian/Alaska Native    
14. percent_non_hispanic_asian = percent of the county population that is non-Hispanic Asian  
15. percent_non_hispanic_black = percent of the county population that is non-Hispanic Black  
16. percent_non_hispanic_native = percent of the county population that is non-Hispanic Native Hawaiian/Pacific Islander  
17. percent_non_hispanic_white = percent of the county population that is non-Hispanic White  
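18. geographical_point = GeoJSON point for the row (in the sample rows every Alabama county has the same coordinates, so this appears to mark the state rather than the county)  
19. state_code = two-letter state abbreviation (e.g. AL)  
20. county_boundary = GeoJSON MultiPolygon coordinates of the county boundary  
21. state_boundary = GeoJSON MultiPolygon coordinates of the state boundary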
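The three hesitancy measures build on one another. "Definitely not" counts toward all three, "probably not" toward the first two, and "unsure" only toward `estimated_hesitant_or_unsure`. Their values should therefore nest as strongly hesitant ≤ hesitant ≤ hesitant or unsure. The sample rows follow this pattern: for Tallapoosa County, 0.1383 ≤ 0.1806 ≤ 0.24. These values are proportions between 0 and 1 rather than percents. Because the figures are model estimates, the differences are only approximate.

The sketch below checks the nesting and splits out the separate shares. The helper `add_hesitancy_breakdown` and the new columns `probably_not_only` and `unsure_only` are hypothetical names we introduce here, not part of the CDC data.

```python
import pandas as pd


def add_hesitancy_breakdown(df: pd.DataFrame, tol: float = 1e-9) -> pd.DataFrame:
    """Check that the hesitancy columns nest, then derive the separate shares."""
    # Parse here too, so the function works on the raw string columns.
    strongly = pd.to_numeric(df["estimated_strongly_hesitant"])
    hesitant = pd.to_numeric(df["estimated_hesitant"])
    hesitant_or_unsure = pd.to_numeric(df["estimated_hesitant_or_unsure"])

    # "definitely not" is contained in "probably/definitely not",
    # which is contained in "probably/definitely not or unsure".
    nested = (strongly <= hesitant + tol) & (hesitant <= hesitant_or_unsure + tol)
    if not nested.all():
        bad = df.loc[~nested, "county_name"].tolist()
        raise ValueError(f"hesitancy columns are not nested for: {bad}")

    out = df.copy()
    out["probably_not_only"] = hesitant - strongly        # "probably not" share
    out["unsure_only"] = hesitant_or_unsure - hesitant    # "unsure" share
    return out
```

For Tallapoosa County this gives roughly 0.0423 answering "probably not" and 0.0594 answering "unsure".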

### Visualizing the Data