# Introduction

These can be troubling times, but the daily news about vaccines gives us hope on the horizon. Multiple companies are releasing vaccines and getting them approved, so we may soon see a path forward.

Using the tools Kaggle provides, I'll show you how to create an interactive map that tracks, for each state, the percentage of inhabitants who have been vaccinated against COVID-19.

To get started, if you haven't already, make your own copy of this notebook by clicking on the **[Copy and Edit]** button in the top right corner. 

This notebook is an example of a project that you can create based on what you learn in Kaggle's [Geospatial Analysis course](https://www.kaggle.com/learn/geospatial-analysis).

# US Vaccine Tracker

We'll use two datasets.  

- The first dataset has the total number of inhabitants of each state, along with latitude and longitude data for each state's capital city.  This dataset is pulled from the 2019 US Census, and I've uploaded it [here](https://www.kaggle.com/peretzcohen/2019-census-us-population-data-by-state).
- The second dataset contains a recent estimate for the total number of people that have been vaccinated in each state.  This [vaccine dataset](https://github.com/owid/covid-19-data/blob/master/public/data/vaccinations/us_state_vaccinations.csv) is drawn from [Our World In Data](https://ourworldindata.org/), who update their vaccine datasets from the CDC quite regularly.  Every time you run this notebook, you'll use the most recent version of their data.

In the next code cells, we load and preprocess the data.  As output, you'll see a preview of the pandas DataFrame that we'll use to make the tracker, followed by the total percentage of the US population that has been vaccinated.  Here, "vaccinated" follows the `people_vaccinated` column, which counts people who have received at least one dose.

In [1]:
# Imports
import pandas as pd
from datetime import date
import folium
from folium import Marker
from folium.plugins import MarkerCluster
import math
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
# Population Data
populationData = pd.read_csv('/kaggle/input/2019-census-us-population-data-by-state/2019_Census_US_Population_Data_By_State_Lat_Long.csv')

# Preview
print(f'Rows: {populationData.shape[0]}, Columns: {populationData.shape[1]}')
populationData.head()

Rows: 51, Columns: 4


Unnamed: 0,STATE,POPESTIMATE2019,lat,long
0,Alabama,4903185,32.377716,-86.300568
1,Alaska,731545,58.301598,-134.420212
2,Arizona,7278717,33.448143,-112.096962
3,Arkansas,3017804,34.746613,-92.288986
4,California,39512223,38.576668,-121.493629


In [3]:
# Vaccination data: full history for every location (filtered to the most recent date below)
vaccinationData = pd.read_csv('https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/vaccinations/us_state_vaccinations.csv')

# Preview
print(f'Rows: {vaccinationData.shape[0]}, Columns: {vaccinationData.shape[1]}')
vaccinationData.head()

Rows: 45220, Columns: 16


Unnamed: 0,date,location,total_vaccinations,total_distributed,people_vaccinated,people_fully_vaccinated_per_hundred,total_vaccinations_per_hundred,people_fully_vaccinated,people_vaccinated_per_hundred,distributed_per_hundred,daily_vaccinations_raw,daily_vaccinations,daily_vaccinations_per_million,share_doses_used,total_boosters,total_boosters_per_hundred
0,2021-01-12,Alabama,78134.0,377025.0,70861.0,0.15,1.59,7270.0,1.45,7.69,,,,0.207,,
1,2021-01-13,Alabama,84040.0,378975.0,74792.0,0.19,1.71,9245.0,1.53,7.73,5906.0,5906.0,1205.0,0.222,,
2,2021-01-14,Alabama,92300.0,435350.0,80480.0,,1.88,,1.64,8.88,8260.0,7083.0,1445.0,0.212,,
3,2021-01-15,Alabama,100567.0,444650.0,86956.0,0.28,2.05,13488.0,1.77,9.07,8267.0,7478.0,1525.0,0.226,,
4,2021-01-16,Alabama,,,,,,,,,,7498.0,1529.0,,,


In [4]:
# Info
vaccinationData.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45220 entries, 0 to 45219
Data columns (total 16 columns):
 #   Column                               Non-Null Count  Dtype  
---  ------                               --------------  -----  
 0   date                                 45220 non-null  object 
 1   location                             45220 non-null  object 
 2   total_vaccinations                   31349 non-null  float64
 3   total_distributed                    31089 non-null  float64
 4   people_vaccinated                    31085 non-null  float64
 5   people_fully_vaccinated_per_hundred  29531 non-null  float64
 6   total_vaccinations_per_hundred       29630 non-null  float64
 7   people_fully_vaccinated              31005 non-null  float64
 8   people_vaccinated_per_hundred        29606 non-null  float64
 9   distributed_per_hundred              29610 non-null  float64
 10  daily_vaccinations_raw               28441 non-null  float64
 11  daily_vaccinations          

In [5]:
# Get the most recent date for filtering
freshDate = vaccinationData.date.max()
freshDate

'2022-12-14'

In [6]:
# Filter by the most recent date
vaccinationByLocation = vaccinationData.loc[vaccinationData.date == freshDate, ["location", "people_vaccinated"]]
vaccinationByLocation

Unnamed: 0,location,people_vaccinated
701,Alabama,3175322.0
1403,Alaska,531514.0
2105,American Samoa,46120.0
2807,Arizona,5594106.0
3509,Arkansas,2097958.0
...,...,...
42411,Virginia,7711567.0
43113,Washington,6452206.0
43815,West Virginia,1204857.0
44517,Wisconsin,4353385.0
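
One caveat with filtering on a single global date: the preview of the raw data shows rows where `people_vaccinated` is missing (Alabama on 2021-01-16, for example), so a location with no report on the latest date would come through as NaN. As a minimal sketch of an alternative, the code below keeps each location's most recent non-missing value instead. The toy frame stands in for the real download, its values are made up, and `latest_per_location` is a hypothetical helper name.

```python
import pandas as pd

# Toy stand-in for vaccinationData; values are made up for illustration.
toy = pd.DataFrame({
    "date": ["2022-12-13", "2022-12-14", "2022-12-13", "2022-12-14"],
    "location": ["Alabama", "Alabama", "Alaska", "Alaska"],
    "people_vaccinated": [100.0, 110.0, 50.0, None],
})

def latest_per_location(df):
    """Return each location's most recent row with a non-missing people_vaccinated."""
    reported = df.dropna(subset=["people_vaccinated"]).copy()
    reported["date"] = pd.to_datetime(reported["date"])
    # idxmax gives, per location, the index label of the row with the latest date
    newest = reported.groupby("location")["date"].idxmax()
    columns = ["location", "date", "people_vaccinated"]
    return reported.loc[newest, columns].reset_index(drop=True)

latest = latest_per_location(toy)
print(latest)
```

In this toy example Alaska falls back to its 2022-12-13 figure because the 2022-12-14 value is missing. The trade-off is that the resulting rows may no longer share a single date.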


In [7]:
# Join vaccination and population data on state name (inner join: names without a match are dropped)
vaccinationAndPopulationByLocation = pd.merge(populationData, vaccinationByLocation, left_on='STATE', right_on='location').drop(columns="location")

# Share of each state's population that is vaccinated, stored as a fraction (0-1) despite the column name
vaccinationAndPopulationByLocation["percent_vaccinated"] = vaccinationAndPopulationByLocation["people_vaccinated"] / vaccinationAndPopulationByLocation["POPESTIMATE2019"]

# Preview
vaccinationAndPopulationByLocation.sample(5, random_state=42)

Unnamed: 0,STATE,POPESTIMATE2019,lat,long,people_vaccinated,percent_vaccinated
13,Illinois,12671821,39.798363,-89.654961,9991032.0,0.788445
39,South Carolina,5148714,34.000343,-81.033211,3637370.0,0.706462
30,New Jersey,8882190,40.220596,-74.769913,8358862.0,0.941081
45,Virginia,8535519,37.538857,-77.43364,7711567.0,0.903468
17,Kentucky,4467673,38.186722,-84.875374,3061282.0,0.685207
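
The merge keeps only rows whose names match exactly in both datasets. In the sample above, South Carolina sits at index 39 and Virginia at 45, one position lower than in an alphabetical list of the 50 states plus the District of Columbia, which suggests that one state between New Jersey and South Carolina failed to match. My assumption, which you should verify against the current file, is that the vaccine dataset labels New York as "New York State". The sketch below uses made-up numbers to show one way to detect such mismatches with `indicator=True` and to rename a label before merging.

```python
import pandas as pd

# Toy stand-ins for populationData and vaccinationByLocation (made-up numbers).
population = pd.DataFrame({
    "STATE": ["New Jersey", "New York", "Ohio"],
    "POPESTIMATE2019": [900, 1900, 1100],
})
vaccinated = pd.DataFrame({
    "location": ["New Jersey", "New York State", "Ohio", "Guam"],
    "people_vaccinated": [800.0, 1500.0, 700.0, 10.0],
})

# A left merge with indicator=True keeps every state and flags those without a match.
check = population.merge(vaccinated, how="left", left_on="STATE",
                         right_on="location", indicator=True)
unmatched = check.loc[check["_merge"] == "left_only", "STATE"].tolist()
print("States with no vaccination row:", unmatched)

# Assumed fix: rename the mismatched label, then merge as before.
vaccinated_fixed = vaccinated.replace({"location": {"New York State": "New York"}})
merged = population.merge(vaccinated_fixed, left_on="STATE", right_on="location")
print(len(merged), "of", len(population), "states matched")
```

Running the same check on the real DataFrames would list any state missing from the tracker before it affects the national total.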


In [8]:
print("Date run:", date.today())

# Calculate the total percent vaccinated in the US
percentageTotal = vaccinationAndPopulationByLocation["people_vaccinated"].sum() / vaccinationAndPopulationByLocation["POPESTIMATE2019"].sum()
print('Percentage Vaccinated in the US: {}% for {}'.format(round(percentageTotal*100, 2), freshDate))

Date run: 2022-12-19
Percentage Vaccinated in the US: 79.28% for 2022-12-14
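
The national figure divides the summed vaccinations by the summed population, which weights each state by its size. A short sketch with two made-up states shows why simply averaging the per-state column would give a different answer.

```python
import pandas as pd

# Two made-up states of very different size.
toy = pd.DataFrame({
    "STATE": ["Big", "Small"],
    "POPESTIMATE2019": [9_000_000, 1_000_000],
    "people_vaccinated": [4_500_000.0, 900_000.0],
})
toy["percent_vaccinated"] = toy["people_vaccinated"] / toy["POPESTIMATE2019"]

# Population-weighted share: total vaccinated over total population.
weighted = toy["people_vaccinated"].sum() / toy["POPESTIMATE2019"].sum()
# Unweighted mean of the state shares treats both states as equally large.
unweighted = toy["percent_vaccinated"].mean()

print(f"weighted: {weighted:.2%}, unweighted: {unweighted:.2%}")
```

Here the small, highly vaccinated state pulls the unweighted mean well above the share of people actually vaccinated.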


The next code cell uses the data to create a tracker, with one marker for each state.  Nearby markers are grouped into clusters, so zoom in or click a cluster to expand it, then hover over a marker to see the percentage of that state's population that has been vaccinated.

In [9]:
# Create the map
v_map = folium.Map(location=[42.32,-71.0589], tiles='cartodbpositron', zoom_start=4) 

# Add points to the map
mc = MarkerCluster()
for idx, row in vaccinationAndPopulationByLocation.iterrows(): 
    if not math.isnan(row['long']) and not math.isnan(row['lat']):
        mc.add_child(Marker(location=[row['lat'], row['long']],
                            tooltip=str(round(row['percent_vaccinated']*100, 2))+"%"))
v_map.add_child(mc)

# Display the map
v_map
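
The markers show only a number, so it can be hard to tell which state a marker belongs to. As a small sketch, a label helper could include the state name as well. `marker_label` is a hypothetical name, not part of folium. The string it returns could be passed as the `tooltip` argument of `Marker` (shown on hover) or as `popup` (shown on click).

```python
def marker_label(state, fraction):
    """Build a label such as 'Illinois: 78.84%' from a state name and a 0-1 fraction."""
    return f"{state}: {fraction:.2%}"

# Values taken from the sample rows shown earlier in the notebook.
print(marker_label("Illinois", 0.788445))
print(marker_label("Virginia", 0.903468))
```

Inside the loop, this would replace the string built from `round(...)`, for example `tooltip=marker_label(row['STATE'], row['percent_vaccinated'])`.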

# Your turn

Here are some ideas for how you might improve on the work here:
- In Kaggle's [Geospatial Analysis course](https://www.kaggle.com/learn/geospatial-analysis), you learn how to use folium to create many different types of interactive maps.  How might you use this data to instead create a choropleth map?
- If you would like to work with more data sources:
  - The Centers for Disease Control and Prevention (CDC) in the US releases daily vaccine data and has a vaccination progress tracker on its [COVID Data Tracker site](https://covid.cdc.gov/covid-data-tracker/#vaccinations).
  - NBC News also has a well-made [vaccine tracker](https://www.nbcnews.com/health/health-news/map-covid-19-vaccination-tracker-across-u-s-n1252085).
  
Once you have created your own extension of this work, let us know about it in the comments!