# 6. US vaccine tracker

In this tutorial, we will use what we have learned in the previous 5 lessons to create an interactive map that tracks, for each state, the percentage of inhabitants who have been vaccinated against COVID-19.

We'll use two datasets.

- The first dataset has the total number of inhabitants of each state, along with latitude and longitude data for each state's capital city.  This dataset is pulled from the 2019 US Census, and you can check the source [here](https://www.kaggle.com/peretzcohen/2019-census-us-population-data-by-state).
- The second dataset contains a recent estimate for the total number of people that have been vaccinated in each state.  This [vaccine dataset](https://github.com/owid/covid-19-data/blob/master/public/data/vaccinations/us_state_vaccinations.csv) is drawn from [Our World In Data](https://ourworldindata.org/), who update their vaccine datasets from the CDC quite regularly.  Every time you run this notebook, you'll use the most recent version of their data.

In the following code cells, we load and preprocess the data.  As output, you'll see a preview of the pandas DataFrame that we'll use to make the tracker, along with the total percent of the population that has been vaccinated in the US.

In [1]:
import pandas as pd
from datetime import date, timedelta
import folium
from folium import Marker
from folium.plugins import MarkerCluster
import math
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
root_path="/home/pliu/data_set/kaggle/geospatial/L06"
population_file_path=f"{root_path}/2019_Census_US_Population_Data_By_State_Lat_Long.csv"
# Population Data
populationData = pd.read_csv(population_file_path)


In [3]:
populationData.head()

| | STATE | POPESTIMATE2019 | lat | long |
|---|---|---|---|---|
| 0 | Alabama | 4903185 | 32.377716 | -86.300568 |
| 1 | Alaska | 731545 | 58.301598 | -134.420212 |
| 2 | Arizona | 7278717 | 33.448143 | -112.096962 |
| 3 | Arkansas | 3017804 | 34.746613 | -92.288986 |
| 4 | California | 39512223 | 38.576668 | -121.493629 |


In [7]:
# Pick the snapshot date used for filtering: 500 days before today,
# formatted as YYYY-MM-DD so it can be compared with the dataset's date column
freshDate = (date.today() - timedelta(days=500)).strftime("%Y-%m-%d")

print(freshDate)


2021-06-17
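Subtracting a fixed number of days ties the notebook to a single snapshot. If you would rather follow the data as it updates, one option is to take the latest date that actually appears in the file. The sketch below assumes the `date` column holds ISO-formatted `YYYY-MM-DD` strings (as the filter in this notebook implies) and uses a small toy DataFrame with made-up counts in place of the real CSV; `toy`, `latest_date`, and `latest` are names introduced here for illustration only.

```python
import pandas as pd

# Toy stand-in for us_state_vaccinations.csv; the counts are made up.
toy = pd.DataFrame({
    "date": ["2021-06-16", "2021-06-17", "2021-06-16", "2021-06-17"],
    "location": ["Alabama", "Alabama", "Alaska", "Alaska"],
    "people_vaccinated": [100.0, 110.0, 50.0, 55.0],
})

# ISO dates sort correctly as plain strings, so max() is the latest day on file.
latest_date = toy["date"].max()

# Keep only the rows for that day and the two columns the tracker needs.
latest = toy.loc[toy["date"] == latest_date, ["location", "people_vaccinated"]]

print(latest_date)
print(latest)
```

Some locations may not report on the latest date, so it can be worth checking `people_vaccinated` for missing values before merging.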


In [8]:

# Vaccination data for the selected date
vaccinationData = pd.read_csv(f'{root_path}/us_state_vaccinations.csv')
vaccinationByLocation = vaccinationData.loc[vaccinationData["date"] == freshDate, ["location", "people_vaccinated"]]

vaccinationByLocation.head()

| | location | people_vaccinated |
|---|---|---|
| 156 | Alabama | 1898131.0 |
| 788 | Alaska | 348126.0 |
| 1420 | American Samoa | 24684.0 |
| 2052 | Arizona | 3518666.0 |
| 2684 | Arkansas | 1234029.0 |


In [9]:
# Vaccination and population data
vaccinationAndPopulationByLocation = pd.merge(populationData, vaccinationByLocation, left_on='STATE',right_on='location').drop(columns="location")

# Calculate the share vaccinated by state (a fraction between 0 and 1, despite the column name)
vaccinationAndPopulationByLocation["percent_vaccinated"] = vaccinationAndPopulationByLocation["people_vaccinated"] / vaccinationAndPopulationByLocation["POPESTIMATE2019"]

vaccinationAndPopulationByLocation

| | STATE | POPESTIMATE2019 | lat | long | people_vaccinated | percent_vaccinated |
|---|---|---|---|---|---|---|
| 0 | Alabama | 4903185 | 32.377716 | -86.300568 | 1898131.0 | 0.387122 |
| 1 | Alaska | 731545 | 58.301598 | -134.420212 | 348126.0 | 0.475878 |
| 2 | Arizona | 7278717 | 33.448143 | -112.096962 | 3518666.0 | 0.483418 |
| 3 | Arkansas | 3017804 | 34.746613 | -92.288986 | 1234029.0 | 0.408916 |
| 4 | California | 39512223 | 38.576668 | -121.493629 | 23516676.0 | 0.595175 |
| 5 | Colorado | 5758736 | 39.739227 | -104.984856 | 3259030.0 | 0.565928 |
| 6 | Connecticut | 3565287 | 41.764046 | -72.682198 | 2339636.0 | 0.656227 |
| 7 | Delaware | 973764 | 39.157307 | -75.519722 | 553340.0 | 0.568249 |
| 8 | District of Columbia | 705749 | 38.89511 | -77.03637 | 419800.0 | 0.594829 |
| 9 | Florida | 21477737 | 30.438118 | -84.281296 | 11164738.0 | 0.519828 |
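`pd.merge` performs an inner join by default, so any state whose name is spelled differently in the two files silently drops out of the result. Here is a minimal sketch of how you might check for that, using `how="outer"` together with `indicator=True`. The two small DataFrames, their numbers, and the mismatched label `New York State` are hypothetical, chosen only to show what an unmatched row looks like.

```python
import pandas as pd

# Hypothetical miniature versions of the two datasets (illustrative values).
population = pd.DataFrame({
    "STATE": ["Alabama", "New York"],
    "POPESTIMATE2019": [1000, 2000],
})
vaccinations = pd.DataFrame({
    "location": ["Alabama", "New York State"],
    "people_vaccinated": [400.0, 900.0],
})

# An outer join keeps every row from both sides; the _merge column
# records whether a row matched ("both") or came from one side only.
checked = pd.merge(population, vaccinations,
                   left_on="STATE", right_on="location",
                   how="outer", indicator=True)

# Rows that would have been dropped by the default inner join.
unmatched = checked[checked["_merge"] != "both"]
print(unmatched[["STATE", "location", "_merge"]])
```

If `unmatched` is not empty, renaming the offending labels before the real merge keeps those states on the map.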


In [10]:
# Calculate the total percent vaccinated in the US
percentageTotal = vaccinationAndPopulationByLocation["people_vaccinated"].sum() / vaccinationAndPopulationByLocation["POPESTIMATE2019"].sum()
print('Percentage Vaccinated in the US: {}%'.format(round(percentageTotal*100, 2)))

Percentage Vaccinated in the US: 50.34%
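Note that this total divides summed counts by summed population, so it is a population-weighted figure and will generally differ from a plain average of the per-state values. As a small worked check, the sketch below uses only the first three rows of the merged table above (Alabama, Alaska, Arizona); the variable names are introduced here for illustration.

```python
# (people_vaccinated, POPESTIMATE2019) for Alabama, Alaska, Arizona,
# copied from the merged table above.
rows = [
    (1898131.0, 4903185),
    (348126.0, 731545),
    (3518666.0, 7278717),
]

# Population-weighted share: total vaccinated divided by total population.
weighted = sum(v for v, _ in rows) / sum(p for _, p in rows)

# Unweighted share: simple mean of the three per-state fractions.
unweighted = sum(v / p for v, p in rows) / len(rows)

print(round(weighted * 100, 2), round(unweighted * 100, 2))
```

The weighted figure is pulled toward the larger states, which is what you want when reporting a share of the whole population.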


In [11]:
# Create the map
v_map = folium.Map(location=[42.32,-71.0589], tiles='cartodbpositron', zoom_start=4)

# Add points to the map
mc = MarkerCluster()
for idx, row in vaccinationAndPopulationByLocation.iterrows():
    if not math.isnan(row['long']) and not math.isnan(row['lat']):
        mc.add_child(Marker(location=[row['lat'], row['long']],
                            tooltip=str(round(row['percent_vaccinated']*100, 2))+"%"))
v_map.add_child(mc)

# Display the map
v_map
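The tooltips above show only the number, so you have to know the map well to tell which state a marker belongs to. One possible tweak is to build the label from both the state name and the share vaccinated. The helper `make_tooltip` below is a hypothetical name introduced for this sketch, not part of folium.

```python
def make_tooltip(state: str, fraction: float) -> str:
    """Hypothetical helper: label text combining a state name and its share vaccinated."""
    # The ".2%" format multiplies by 100 and keeps two decimal places.
    return f"{state}: {fraction:.2%}"


print(make_tooltip("Alabama", 0.387122))
```

You could then pass `tooltip=make_tooltip(row['STATE'], row['percent_vaccinated'])` when creating each `Marker`.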

# Your turn

Here are some ideas for how you might improve on the work here:
- In Kaggle's [Geospatial Analysis course](https://www.kaggle.com/learn/geospatial-analysis), you learn how to use folium to create many different types of interactive maps.  How might you use this data to instead create a choropleth map?
- If you would like to work with more data sources:
  - The Centers for Disease Control and Prevention (CDC) in the US releases daily vaccine data and has a vaccination progress tracker on its [COVID Data Tracker site](https://covid.cdc.gov/covid-data-tracker/#vaccinations).
  - NBC News also has a well-made [vaccine tracker](https://www.nbcnews.com/health/health-news/map-covid-19-vaccination-tracker-across-u-s-n1252085).