# Global Covid-19 Vaccination Data Exploration
#### Kendall Dyke
___
## Geospatial Data Visualization

### Choropleth Maps [Visualization Technique]

In [None]:
from IPython.display import Image

I used geospatial data visualization, specifically a choropleth map, to explore global COVID-19 vaccination progress. A choropleth map represents patterns by coloring regions within geographic boundaries. In my example, I use country boundaries to explore how many total vaccinations have been given per 100 people in each country. Green countries have the most vaccinations per 100 people and red countries have the fewest.

A choropleth map can be used to show univariate data over a geographic region. To show higher-dimensional data, some interactivity needs to be added to the map. Additional information can be added via tooltips or via interactive elements that let the user change which variable is represented.

This is not the most effective visualization for showing change over time. Other visualizations such as bar charts and line charts are better suited to that. However, if you want to show a snapshot in time, a choropleth map is great because the viewer can see patterns in which regions have more or fewer vaccinations.


##### Example Choropleth Map

In [None]:
Image("../input/github-repository/download.png")

___
### Folium [Visualization Package]

[Folium Documentation](https://python-visualization.github.io/folium/)

Folium is an open-source Python library that specializes in geospatial data visualization. It is easy to install in your Python environment using either pip install or conda install.

In general, it is a declarative package. Folium has many built-in classes (e.g., Choropleth(), used below) that let the user describe which features should appear on the map without having to tell Folium exactly how to plot the Polygon objects. That said, Folium itself is somewhat limited in how much a map can be customized. However, it integrates with other tools like ColorBrewer (for color map creation) and Vega/Vega-Lite (for additional annotations on the plot). This allows for more customization beyond Folium's own limitations.

Folium has a built-in class specifically for choropleth maps. This class makes it very easy to specify which values should be included on the map without much extra data manipulation. I also considered Matplotlib, Bokeh, and Plotly. I decided on Folium because it seemed to have the most straightforward process for creating choropleth maps. Further, I have used Matplotlib, Bokeh, and Plotly in the past, so I wanted to branch my learning into a new area.

___
### Creating my own Choropleth Map [Demonstration]

GeoPandas provides geospatial data structures and works seamlessly with regular pandas DataFrames. We need it to ingest the country boundary objects (Polygons and MultiPolygons).

[GeoPandas Documentation](https://geopandas.org/)

In [None]:
#uncomment if the geopandas package needs to be installed in your environment

#!pip install geopandas

In [None]:
import geopandas
import pandas as pd
import numpy as np
import json
from shapely.geometry import Polygon
from shapely.geometry import MultiPolygon
from datetime import datetime
import folium

#### Data Pre-Processing

<b>Data Source #1 -- COVID-19 Vaccination Data

I used a [Kaggle dataset](https://www.kaggle.com/gpreda/covid-world-vaccination-progress) to gather COVID-19 immunization data from many countries. This dataset is created and managed by a Romanian data scientist named [Gabriel Preda](https://www.kaggle.com/gpreda). It is updated about once a day and gathers each country's data from various government websites.

In [None]:
# Data Source: https://www.kaggle.com/gpreda/covid-world-vaccination-progress

#load csv data
vaccinations = pd.read_csv("../input/covid-world-vaccination-progress/country_vaccinations.csv")



#drop rows that have no vaccination data for that day
vaccinations.dropna(subset=['total_vaccinations',
       'people_vaccinated', 'people_fully_vaccinated',
       'daily_vaccinations_raw', 'daily_vaccinations',
       'total_vaccinations_per_hundred', 'people_vaccinated_per_hundred',
       'people_fully_vaccinated_per_hundred', 'daily_vaccinations_per_million',
       'vaccines'], how="all", inplace=True)

<b> Raw Sample of COVID-19 Vaccination Dataset

In [None]:
vaccinations.head(3)



The vaccination dataset has a row for each date in which the dataset was updated. We are only interested in the most recent data for each country.

In [None]:
# filter on most recent date for each country
vaccinations = vaccinations.join(vaccinations.groupby('country').date.max().rename('MaxDate'), on="country")
vaccinations_recent = vaccinations[vaccinations["date"]==vaccinations["MaxDate"]]
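As a cross-check on the filter above, here is a minimal sketch of the same "latest row per country" logic written with `sort_values` and `drop_duplicates`. The rows are made up for illustration and are not taken from the Kaggle dataset. It assumes the dates are ISO-formatted strings (YYYY-MM-DD), which sort chronologically as plain text.

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B", "B"],
    "date": ["2021-03-01", "2021-03-05", "2021-02-27", "2021-03-02", "2021-03-01"],
    "total_vaccinations_per_hundred": [1.0, 2.5, 0.4, 0.9, 0.7],
})

# Sort by date, then keep only the last (most recent) row for each country
latest = (
    toy.sort_values("date")
       .drop_duplicates(subset="country", keep="last")
       .set_index("country")
)
```

Either approach should leave exactly one row per country; this variant simply avoids the helper `MaxDate` column.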

A few of the ISO codes have a prefix "OWID_". We need to remove that prefix in order to properly match up each country with the geographic dataset.

In [None]:
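# remove the "OWID_" prefix from the ISO codes that have it
vaccinations_recent = vaccinations_recent.copy() # work on a copy to avoid a SettingWithCopyWarning
vaccinations_recent["iso_code"] = vaccinations_recent["iso_code"].str.replace("OWID_", "", regex=False)
vaccinations_recent = vaccinations_recent.set_index("iso_code") # set index to ISO code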
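If you want to be sure that only a leading "OWID_" is stripped, and not the same text elsewhere in a code, one option is an anchored regular expression. The sketch below uses a small made-up sample of codes, not the full dataset.

```python
import pandas as pd

# Illustrative sample of ISO-style codes, some with the "OWID_" prefix
codes = pd.Series(["USA", "OWID_KOS", "OWID_ENG", "GBR"])

# "^" anchors the pattern to the start of the string, so only a prefix is removed
cleaned = codes.str.replace(r"^OWID_", "", regex=True)
```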

In [None]:
vaccinations_recent.head(3)

Some of the columns have many null values. It's best to focus on columns with few or no null values.

In [None]:
vaccinations_recent.isnull().sum()

<b>Data Source #2 -- Geographic Data (Country Polygons)
    
The second data source comes from a [Folium GitHub repository](https://github.com/python-visualization/folium/blob/master/examples/data/world-countries.json). This dataset contains country names along with Shapely Polygon/MultiPolygon objects to aid in plotting the country shapes.

In [None]:
# Data Source: https://github.com/python-visualization/folium/blob/master/examples/data/world-countries.json

world2 = geopandas.read_file("../input/github-repository/world_countries.json") #read dataset
world2 = world2.set_index('id') #set index to ISO code to match vaccinations_recent

Raw geographic data source sample

In [None]:
world2.head(3)

We don't want to plot shapes on countries where we have no vaccination data. To eliminate these countries, we do an inner merge with the COVID-19 vaccination dataset.

In [None]:
# merge the two datasets together so that they only include countries where we have vaccination data
world2 = world2.merge(vaccinations_recent, how = "inner", left_index=True, right_index=True)
vaccinations_recent = vaccinations_recent.reset_index()
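An inner merge silently drops every ISO code that appears in only one of the two datasets. To see which codes those are, a minimal sketch is an outer merge with `indicator=True`. The two small frames below are invented stand-ins for `world2` and `vaccinations_recent`, each indexed by ISO code.

```python
import pandas as pd

# Toy stand-ins, invented for illustration; both are indexed by ISO code
shapes = pd.DataFrame({"name": ["Canada", "France", "Chad"]},
                      index=["CAN", "FRA", "TCD"])
vax = pd.DataFrame({"total_vaccinations_per_hundred": [10.2, 12.8, 30.1]},
                   index=["CAN", "FRA", "GIB"])

# An outer merge with indicator=True adds a "_merge" column saying where each row was found
check = shapes.merge(vax, how="outer", left_index=True, right_index=True, indicator=True)

# Shapes with no vaccination rows (these would stay unshaded on the map)
no_vax_data = sorted(check[check["_merge"] == "left_only"].index.tolist())
# Vaccination rows with no matching shape (these cannot be drawn at all)
no_shape = sorted(check[check["_merge"] == "right_only"].index.tolist())
```

Checking the `right_only` codes is a quick way to catch mismatched identifiers before plotting.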

#### Choropleth Map with Folium

The first step is to initialize a Folium map. We center it on the global map (location=[0, 0]) and set the initial zoom so that the whole map is visible.

In [None]:
m = folium.Map(location=[0, 0], zoom_start=2)

<b> Add plot Title

Add a title to the map using an HTML object.

In [None]:
# https://stackoverflow.com/questions/61928013/adding-a-title-or-text-to-a-folium-map
loc = 'Global COVID-19 Vaccination Data -- Total Vaccinations per 100 People (as of March 21, 2021)'
title_html = '''
             <h3 align="center" style="font-size:16px"><b>{}</b></h3>
             '''.format(loc)   

m.get_root().html.add_child(folium.Element(title_html));

<b> Preview

Initialized Map in Folium (no annotation yet)

In [None]:
m

<b> Adding Choropleth Map

Use the Folium Choropleth() class to plot Choropleth shapes over each country with vaccination data.

<b> Total Vaccinations per 100 People
    
I chose to plot total vaccinations per 100 people because it puts all countries on a more consistent scale. (Countries with bigger populations won't be heavily weighted vs smaller countries.)
    
    
<b> Binning Color Scale
    
The color scale needs custom bins because the distribution of vaccinations per 100 people is heavily right-skewed: there are many more low values than high values. The default scaling method would leave most of the map red. Changing how the color scale is binned lets the viewer see more granularity among the many countries with lower vaccination counts to date.
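To illustrate the effect of skew on binning, here is a minimal sketch that counts how many values land in each bin under equal-width bins versus hand-picked edges. The values are made up to mimic a right-skewed distribution and are not taken from the dataset. The equal-width bins are used only as a simple point of comparison and are not necessarily how Folium chooses its default bins.

```python
import pandas as pd

# Made-up, right-skewed values standing in for total_vaccinations_per_hundred
values = pd.Series([0.1, 0.3, 0.8, 1.2, 2.0, 3.5, 4.1, 6.0, 9.5, 14.0, 20.0, 50.0, 110.0])

# Five equal-width bins across the data range: most values pile into the first bin
equal_width = pd.cut(values, bins=5, include_lowest=True).value_counts().sort_index()

# Hand-picked edges that are narrower where the data is dense
custom_edges = [0, 5, 15, 30, 60, values.max()]
custom = pd.cut(values, bins=custom_edges, include_lowest=True).value_counts().sort_index()

equal_width_counts = equal_width.tolist()
custom_counts = custom.tolist()
```

With the custom edges, the low-value countries are spread over several color classes instead of sharing a single one.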

In [None]:
choropleth = folium.Choropleth(
    geo_data=world2, #geographic data
    name="choropleth",
    data=vaccinations_recent, # vaccination data
    columns=["iso_code", "total_vaccinations_per_hundred"],
    key_on="feature.id",
    fill_color = "RdYlGn", #specify red, yellow, green color scheme
    fill_opacity=0.8,
    line_opacity=.1,
    legend_name="Total Vaccinations per 100 People",# legend title (below legend)
    bins = [0, 5, 15, 30, 60, vaccinations_recent["total_vaccinations_per_hundred"].max()] # bin color scale
).add_to(m)


<b> Tooltip

Add tooltip with country name, total vaccination count per 100 people, and available vaccines.

In [None]:
# https://github.com/python-visualization/folium/issues/1074

#add tooltip
choropleth.geojson.add_child(
    folium.features.GeoJsonTooltip(['country','total_vaccinations_per_hundred', 'vaccines'], 
                                   aliases = ["Country", "Vaccinations per 100 people", "Available Vaccines"])
);

#### Final Plot: View or Save HTML

In [None]:
#Uncomment the following line to save to HTML

# m.save('index.html')

#show plot
m

### Results & Conclusions

#### Results
The countries that are not included in the vaccination dataset are gray. It appears that most of the gray countries are under-developed. The drawback of this dataset is that we don't know whether these countries have no vaccines or simply have not published their vaccination numbers in a publicly available online form.

It's also important to note why some countries (like Israel) have over 100 vaccinations per 100 people. On the surface, this may seem like a data issue; however, it happens because many of the vaccines require two doses. With a two-dose vaccine, a country needs 200 doses per 100 people to fully vaccinate everyone.
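The arithmetic behind values above 100 can be sketched with a hypothetical country. The population and vaccination figures below are invented purely to show the calculation.

```python
def doses_per_hundred(total_doses: float, population: float) -> float:
    """Doses administered per 100 residents."""
    return 100 * total_doses / population

# Hypothetical country: 1,000,000 residents, 600,000 of whom received both doses
population = 1_000_000
fully_vaccinated = 600_000
doses_per_person = 2

# 1,200,000 doses for 1,000,000 residents gives a rate above 100
rate = doses_per_hundred(fully_vaccinated * doses_per_person, population)
# Upper bound if every resident received both doses
ceiling = doses_per_hundred(population * doses_per_person, population)
```

Here `rate` comes out to 120 doses per 100 people even though only 60% of residents are vaccinated, and `ceiling` is 200.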


#### Conclusions
Folium is great for building choropleth maps. It makes it very easy to overlay the shapes onto an existing map. However, if you want to customize further, you will need expertise in other Python packages like Altair, or even some HTML.