<a id="4"></a><h1 style='background:#555413; border:3; color:white'><center> Equity in COVID-19 Vaccination: ideas, code and worked examples </center></h1>

<center><img 
src="https://images.unsplash.com/photo-1612277795511-39caabca8185?ixid=MXwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHw%3D&ixlib=rb-1.2.1&auto=format&fit=crop&w=1350&q=80" width="700" height="700"></img></center>

<br>

<a id="4"></a><h1 style='background:#aba926; border:0; color:black'><center> Introduction </center></h1>

COVID-19 has posed a significant threat to health and well-being worldwide. Nearly 105 million cases have been diagnosed, resulting in 2.3 million deaths around the world. While COVID-19 infections are still surging, the world is waiting for the rollout of wide-scale vaccination programs. Among high-income countries, Israel (61.7%), the UK (16.2%) and the USA (10.5%) have expanded their vaccination programs rapidly. However, countries such as India (0.4%) and Nepal (0.4%) are still waiting for wide-scale vaccination. Further information on COVID-19 vaccination is available from [Our World in Data](https://ourworldindata.org/covid-vaccinations).

In this blog, I will summarize the latest information on COVID-19 vaccination and analyse important facets of the global vaccination program, including:
      1. How equitable has the vaccination program been thus far?
      2. What challenges lie ahead?
I will also use **web scraping** from [Worldometer](https://www.worldometers.info/), a popular website for COVID-19 statistics, to extend the information on vaccine coverage. 

A massive thanks to [@josephassaker](https://www.kaggle.com/josephassaker) and [@pawanbhandarkar](https://www.kaggle.com/pawanbhandarkar/covid-19-eda-man-vs-disease) for the data and inspiration for carrying out these analyses, with further insights from [@andreshg](https://www.kaggle.com/andreshg), [@soumyadipghorai](https://www.kaggle.com/soumyadipghorai), [@umerkk12](https://www.kaggle.com/umerkk12) and [@taha07](https://www.kaggle.com/taha07). 🙏🏽

When you have finished reading, please leave your comments and suggestions. Your **upvote** will motivate me to keep updating this blog and to bring you more content.

**Let's get started.** 

<a id="4"></a><h1 style='background:#aba926; border:0; color:black'><center> Dataset information </center></h1>

* **Country** ☞ the country for which the vaccination information is provided;
* **Country ISO Code** ☞ ISO code for the country;
* **Date** ☞ date of the data entry; for some dates we have only the daily vaccinations, for others only the (cumulative) total;
* **Total number of vaccinations** ☞ the absolute number of total immunizations in the country;
* **Total number of people vaccinated** ☞ a person, depending on the immunization scheme, will receive one or more (typically 2) vaccine doses; at a given moment, the number of vaccinations might be larger than the number of people;
* **Total number of people fully vaccinated** ☞ the number of people who received the entire set of immunizations according to the immunization scheme (typically 2); at a given moment, there might be a certain number of people who received one dose and another (smaller) number of people who received all doses in the scheme;
* **Daily vaccinations (raw)** ☞ for a given data entry, the number of vaccinations for that date/country;
* **Daily vaccinations** ☞ for a given data entry, the number of vaccinations for that date/country;
* **Total vaccinations per hundred** ☞ ratio (in percent) between the number of vaccinations and the total population up to that date in the country;
* **Total number of people vaccinated per hundred** ☞ ratio (in percent) between the population immunized and the total population up to that date in the country;
* **Total number of people fully vaccinated per hundred** ☞ ratio (in percent) between the population fully immunized and the total population up to that date in the country;
* **Number of vaccinations per day** ☞ number of daily vaccinations for that day and country;
* **Daily vaccinations per million** ☞ ratio (in ppm) between the number of vaccinations and the total population for the current date in the country;
* **Vaccines used in the country** ☞ total number of vaccines used in the country (up to date);
* **Source name** ☞ source of the information (national authority, international organization, local organization etc.);
* **Source website** ☞ website of the source of information.

***

See the original post "[COVID-19 Vaccination Progress](https://www.kaggle.com/gpreda/covid-19-vaccination-progress/comments)" for further information on this dataset and the project itself.

***


<a id="4"></a><h1 style='background:#aba926; border:0; color:black'><center> Data sources </center></h1>

I will combine information on total population size (2020) and GDP per capita (2017) for each country with the vaccination information in the [original dataset](../input/covid-world-vaccination-progress/country_vaccinations.csv) listed above, by web scraping the Worldometer website:

1. [GDP per capita (2017)](https://www.worldometers.info/gdp/gdp-per-capita/) ☞ The 2017 figures are acceptable here, as they are unlikely to have changed much by 2021.  
    1.1 GDP per capita (PPP)
    1.2 GDP per capita (nominal)
    1.3 GDP per capita (nominal) vs world average (i.e. 17,100 USD)
2. [Population (2020)](https://www.worldometers.info/coronavirus/) ☞ These data are updated yearly.

In [None]:
import math
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt
import seaborn as sns 
import plotly.graph_objs as go
import plotly.figure_factory as ff
from plotly import tools
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import plotly.express as px

import plotly.offline as pyo
from datetime import datetime

init_notebook_mode(connected=True)
import warnings
warnings.filterwarnings("ignore")
pyo.init_notebook_mode()

<a id="4"></a><h1 style='background:#aba926; border:0; color:black'><center> Web scraping from the Worldometer website (new data) 🙌 </center></h1>

In [None]:
# Web scraping packages
# If beautifulsoup4 is not installed yet, the line below installs it
# (the leading '!' runs a shell command from a notebook cell)

!pip install beautifulsoup4

from bs4 import BeautifulSoup
import requests

#-- COVID-19 statistics
website='https://www.worldometers.info/coronavirus/' # url for the site 
website_url=requests.get(website).text
soup = BeautifulSoup(website_url,'html.parser')

my_table1 = soup.find('tbody')

table_data1 = []
for row in my_table1.findAll('tr'):
    row_data = []
    for cell in row.findAll('td'):
        row_data.append(cell.text)
    if(len(row_data) > 0):
        data_item1 = {"country": row_data[1],
                     "TotalCases": row_data[2],
                     "NewCases": row_data[3],
                     "TotalDeaths": row_data[4],
                     "NewDeaths": row_data[5],
                     "TotalRecovered": row_data[6],
                     "ActiveCases": row_data[8],
                     "CriticalCases": row_data[9],
                     "Totcase1M": row_data[10],
                     "Totdeath1M": row_data[11],
                     "TotalTests": row_data[12],
                     "Tottest1M": row_data[13],
                     "Population": row_data[14],
        }
        
        table_data1.append(data_item1)

# Build the DataFrame once, after the loop has collected every row
df = pd.DataFrame(table_data1)

# Drop the first 8 rows: they hold continent-level totals, which are not used here
df = df.tail(-8)

In [None]:
df.head()
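
The loop above picks cells by position (`row_data[8]`, `row_data[9]`, ...), so the columns would shift silently if the site added or reordered a column. A minimal sketch of a more defensive alternative is to pair each cell with its column header first and then select by name. The header strings, the `WANTED` mapping and the helper `row_to_record` below are hypothetical placeholders, not the site's actual column names; in the notebook the headers would have to be read from the table's header row.

```python
# Hypothetical header texts -> column names used in this notebook
WANTED = {
    "Country": "country",
    "Total Cases": "TotalCases",
    "Population": "Population",
}

def row_to_record(headers, cells, wanted=WANTED):
    """Map one table row to a dict keyed by our own column names.

    headers: list of header strings, in page order
    cells:   list of cell strings for one row, in the same order
    Columns that are missing from the page come back as None.
    """
    by_header = dict(zip((h.strip() for h in headers), (c.strip() for c in cells)))
    return {new_name: by_header.get(old_name) for old_name, new_name in wanted.items()}

# Toy row (made-up values) with extra columns that the record does not need
headers = ["#", "Country", "Total Cases", "New Cases", "Population"]
cells = ["1", "Exampleland", "1,000", "+10", "5,000,000"]
record = row_to_record(headers, cells)
print(record)
```

With this approach an unexpected extra column is simply ignored, and a missing one shows up as `None` rather than as a value from the wrong column.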

In [None]:
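# Remove the leading '+' from the selected columns
# (regex=False so that '+' is treated as a literal character, not a regex quantifier)
df['NewCases'] = df['NewCases'].str.replace('+', '', regex=False)
df['NewDeaths'] = df['NewDeaths'].str.replace('+', '', regex=False)
df['ActiveCases'] = df['ActiveCases'].str.replace('+', '', regex=False)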

In [None]:
df.head(100)
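
Even with the `+` removed, the scraped values are still text, and many contain thousands separators. As a small illustration of the cleaning that has to happen before any arithmetic, here is a sketch of a plain-Python parser; `parse_number` is a hypothetical helper, and the sample strings are made up.

```python
import math

def parse_number(text):
    """Turn a scraped cell such as '1,234,567', '+1,234' or '$58,000' into a float.

    Empty cells and placeholders such as 'N/A' become NaN.
    """
    cleaned = text.strip().replace(",", "").replace("+", "").replace("$", "")
    try:
        return float(cleaned)
    except ValueError:
        return math.nan

print(parse_number("1,234,567"))
print(parse_number("+1,234"))
print(parse_number("N/A"))
```

For a whole DataFrame column, the same idea can be expressed with `Series.str.replace` followed by `pd.to_numeric(..., errors="coerce")`.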

In [None]:
#-- GDP per capita (2017)
website='https://www.worldometers.info/gdp/gdp-per-capita/' # url for the site 
website_url=requests.get(website).text
soup = BeautifulSoup(website_url,'html.parser')

my_table2 = soup.find('tbody')

table_data2 = []
for row in my_table2.findAll('tr'):
    row_data = []
    for cell in row.findAll('td'):
        row_data.append(cell.text)
    if(len(row_data) > 0):
        data_item2 = {"country": row_data[1],
                     "GDP_ppp": row_data[2],
                     "GDP_nominal": row_data[3],
                     "vsWorld_ppp": row_data[4]
        }
        table_data2.append(data_item2)

In [None]:
df2 = pd.DataFrame(table_data2)

# Remove the '$' and '%' symbols from the selected columns
# (regex=False so that '$' is treated as a literal character, not a regex anchor)
df2['GDP_ppp'] = df2['GDP_ppp'].str.replace('$', '', regex=False)
df2['GDP_nominal'] = df2['GDP_nominal'].str.replace('$', '', regex=False)
df2['vsWorld_ppp'] = df2['vsWorld_ppp'].str.replace('%', '', regex=False)

# Harmonise the names of a few countries across the data sources
# (the extra no-argument .replace() call was redundant and has been dropped)
df2.country = df2.country.replace({
    "Czechia": "Czech Republic", 
    "United States": "USA", 
    "United Kingdom": "UK", 
    "Isle of Man": "Isle Of Man"
})

# Also remove these three nations, as they are already part of the UK
df2 = df2[~df2.country.isin(['England', 'Scotland', 'Wales'])]

df2.head(200)

<a id="4"></a><h1 style='background:#aba926; border:0; color:black'><center> Using the vaccination data from Kaggle </center></h1>

In [None]:
vacc_df = pd.read_csv("../input/covid-world-vaccination-progress/country_vaccinations.csv")
vacc_df.head()

# Harmonise the names of a few countries across the data sources
# (the extra no-argument .replace() call was redundant and has been dropped)
vacc_df.country = vacc_df.country.replace({
    "Czechia": "Czech Republic", 
    "United States": "USA", 
    "United Kingdom": "UK", 
    "Isle of Man": "Isle Of Man"
})

# Also remove these three nations, as they are already part of the UK
vacc_df = vacc_df[~vacc_df.country.isin(['England', 'Scotland', 'Wales'])]

In [None]:
print(vacc_df.country.unique().tolist())
print(df.country.unique().tolist())
print(df2.country.unique().tolist())
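
Comparing three long printed lists by eye is error-prone. A small sketch of a more direct check is to take set differences, which list exactly the names that would fail to match in a join. `unmatched_names` is a hypothetical helper and the lists below are toy examples, not the actual contents of the three tables.

```python
def unmatched_names(reference, other):
    """Return the names present in `reference` but absent from `other`, sorted."""
    return sorted(set(reference) - set(other))

# Toy lists (illustrative only)
vaccination_names = ["USA", "UK", "Czech Republic", "Nepal"]
scraped_names = ["USA", "UK", "Czechia", "Nepal", "India"]

print(unmatched_names(vaccination_names, scraped_names))
print(unmatched_names(scraped_names, vaccination_names))
```

In the notebook this could be called as `unmatched_names(vacc_df.country.unique(), df.country.unique())`, and the result used to extend the renaming dictionary.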

In [None]:

list(vacc_df.columns)
#vacc_df.head(200)

In [None]:
# Aggregate one column per country, keeping the maximum value observed
def aggregate(df: pd.DataFrame, agg_col: str) -> pd.DataFrame:
    
    data = df.groupby("country")[agg_col].max()
    data = pd.DataFrame(data)
    
    return data
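
What the per-country maximum means depends on the kind of column. The toy example below (made-up numbers, same long format as the vaccination file) is a sketch of the difference: for a cumulative column the maximum is the latest total, while for a daily column it is the busiest single day.

```python
import pandas as pd

# Toy data (made-up numbers) in the same long format as the vaccination file
toy = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B"],
    "date": ["2021-01-01", "2021-01-02", "2021-01-03", "2021-01-01", "2021-01-02"],
    "total_vaccinations": [100, 250, 400, 50, 80],   # cumulative
    "daily_vaccinations": [100, 150, 150, 50, 30],   # per day
})

# Maximum of a cumulative column = latest total
latest_total = toy.groupby("country")["total_vaccinations"].max()
# Maximum of a daily column = busiest single day, not the most recent day
peak_daily = toy.groupby("country")["daily_vaccinations"].max()

print(latest_total.to_dict())
print(peak_daily.to_dict())
```

This is worth keeping in mind when reading any chart built on the aggregated `daily_vaccinations` column: it reflects a peak rather than a current rate.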

In [None]:
#--merging: i) Vaccination data, ii) COVID-19 data, and iii) GDP data

# variables included in summarization
cols_to_summarize = ['people_vaccinated', 
                     'people_vaccinated_per_hundred', 
                     'people_fully_vaccinated', 
                     'people_fully_vaccinated_per_hundred', 
                     'total_vaccinations_per_hundred', 
                     'total_vaccinations', 'daily_vaccinations']

summary = df.set_index("country")
vaccines = vacc_df[['country', 'vaccines']].drop_duplicates().set_index('country')
summary = summary.join(vaccines)

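# Accumulate the joins in summary1 so that every aggregated column is kept
# (re-joining onto `summary` each time would keep only the last column)
summary1 = summary
for col in cols_to_summarize:
    summary1 = summary1.join(aggregate(vacc_df, col))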
    
#--GDP per capita (2017)

GDP = df2[['country', 'GDP_nominal']].drop_duplicates().set_index('country')
summary1 = summary1.join(GDP)

In [None]:
summary1.head(200)
#print(type('TotalDeaths	')) 
#list(summary1.columns)

In [None]:
# The scraped columns are strings such as "1,234,567", so convert them to real numbers.
# (LabelEncoder only assigns integer category codes; it does not recover the numeric values.)
numeric_cols = ['TotalDeaths', 'Population', 'TotalCases', 'TotalTests',
                'GDP_nominal', 'daily_vaccinations']
for col in numeric_cols:
    # Strip thousands separators, currency/plus signs and stray whitespace
    cleaned = summary1[col].astype(str).str.replace(r'[,$+\s]', '', regex=True)
    # Anything that is still not a number (e.g. 'N/A' or an empty cell) becomes NaN
    summary1[col] = pd.to_numeric(cleaned, errors='coerce')


In [None]:
summary1['COVIDdeaths'] = summary1.TotalDeaths / summary1.Population * 10000
summary1['tested_positive'] = summary1.TotalCases / summary1.TotalTests * 10000
summary1['GDPvacc'] = summary1.daily_vaccinations / summary1.GDP_nominal * 10000
summary1['GDPtest'] = summary1.TotalTests / summary1.GDP_nominal * 10000
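
Each derived column above is a ratio scaled to "per 10,000". As a sketch of the arithmetic on a single pair of values, with a guard for a zero or missing denominator, consider the hypothetical helper `rate_per_10k` below; the numbers are toy values.

```python
def rate_per_10k(numerator, denominator):
    """Return numerator / denominator * 10,000, or None if the denominator is zero or missing."""
    if not denominator:
        return None
    return numerator / denominator * 10_000

# Toy values: 50 deaths in a population of 1,000,000
print(rate_per_10k(50, 1_000_000))
# A zero denominator yields None rather than a division error
print(rate_per_10k(50, 0))
```

The ratios are only meaningful when both columns hold real numeric values in the units their names suggest.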

In [None]:
summary1.head(200)
#list(summary1.columns)

<a id="4"></a><h1 style='background:#aba926; border:0; color:black'><center> Data analysis and visualization</center></h1>

<span style="color:green;"> This blog uses a Python environment (on Kaggle) for exploratory data analysis and data visualization. Data preparation and analysis are done with the Pandas and NumPy libraries, and visualization with Matplotlib, Seaborn and Plotly. Please check the documentation of the respective libraries for further information. 👏 </span>

<div class="alert alert-block alert-info"> 📌 Please note that the data in this blog may not be up to date, so consult the original data sources for day-to-day progress in vaccination.</div>

<br>

😎 Resources: How to style your markdown: [follow this amazing blog](https://www.kaggle.com/shubhamksingh/create-beautiful-notebooks-formatting-tutorial)

***

In [None]:
#helper function
def get_multi_line_title(title:str, subtitle:str):
    return f"{title}<br><sub>{subtitle}</sub>"

def visualize_column(data: pd.DataFrame, xcolumn: str, ycolumn:str, title:str, colors:str, ylabel="Count", n=None):
    hovertemplate ='<br><b>%{x}</b>'+f'<br><b>{ylabel}: </b>'+'%{y}<br><extra></extra>'    
    data = data.sort_values(ycolumn, ascending=False).dropna(subset=[ycolumn])        
    
    if n is not None: 
        data = data.iloc[:n]
    else:
        n = ""
    fig = go.Figure(go.Bar(
                    hoverinfo='skip',
                     x=data[xcolumn], 
                     y=data[ycolumn], 
                     hovertemplate = hovertemplate,
                     marker=dict(
                         color = data[ycolumn],
                         colorscale=colors,
                        ),
                    ),
                )
    
    fig.update_layout(
        title=title,
        xaxis_title=f"Top {n} {xcolumn.title()}",
        yaxis_title=ylabel,
        plot_bgcolor='rgba(0,0,0,0)',
        hovermode="x"
    )
    
    fig.show()  

<a id="4"></a><h1 style='background:#aba926; border:0; color:black'><center> COVID-19, and vaccination statistics by country</center></h1>

In [None]:
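# Select several columns with a list (double brackets); tuple-style selection on a groupby is deprecated
vaccine = vacc_df.groupby(["vaccines"])[['total_vaccinations','total_vaccinations_per_hundred',
                                        'daily_vaccinations','daily_vaccinations_per_million']].max().reset_index()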
vaccine.columns = ["Vaccines", "Total vaccinations", "Percent", "Daily vaccinations", 
                           "Daily vaccinations per million"]
def draw_trace_bar_vaccine(data, feature, title, xlab, ylab,color='Blue'):
    data = data.sort_values(feature, ascending=False)
    trace = go.Bar(
            x = data['Vaccines'],
            y = data[feature],
            marker=dict(color=color),
            text=data['Vaccines']
        )
    data = [trace]

    layout = dict(title = title,
              xaxis = dict(title = xlab, showticklabels=True, tickangle=45, 
                           zeroline=True, zerolinewidth=1, zerolinecolor='grey',
                           showline=True, linewidth=2, linecolor='black', mirror=True,
                          tickfont=dict(
                            size=10,
                            color='black'),), 
              yaxis = dict(title = ylab, gridcolor='lightgrey', zeroline=True, zerolinewidth=1, zerolinecolor='grey',
                          showline=True, linewidth=2, linecolor='black', mirror=True),
              plot_bgcolor = 'rgba(0, 0, 0, 0)', paper_bgcolor = 'rgba(0, 0, 0, 0)',
              hovermode = 'closest'
             )
    fig = dict(data = data, layout = layout)
    iplot(fig, filename='draw_trace')
    
draw_trace_bar_vaccine(vaccine, 'Total vaccinations', 'Total per vaccine scheme', 'Vaccine', 'Vaccination total', "darkmagenta" )


In [None]:
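# Select several columns with a list (double brackets); tuple-style selection on a groupby is deprecated
country = vacc_df.groupby(["country"])[['total_vaccinations','total_vaccinations_per_hundred',
                                        'daily_vaccinations','daily_vaccinations_per_million']].max().reset_index()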
country.columns = ["country", "Total vaccinations", "Percent", "Daily vaccinations", 
                           "Daily vaccinations per million"]
def draw_trace_bar_country(data, feature, title, xlab, ylab,color='Blue'):
    data = data.sort_values(feature, ascending=False)
    trace = go.Bar(
            x = data['country'],
            y = data[feature],
            marker=dict(color=color),
            text=data['country']
        )
    data = [trace]

    layout = dict(title = title,
              xaxis = dict(title = xlab, showticklabels=True, tickangle=45, 
                           zeroline=True, zerolinewidth=1, zerolinecolor='grey',
                           showline=True, linewidth=2, linecolor='black', mirror=True,
                          tickfont=dict(
                            size=10,
                            color='black'),), 
              yaxis = dict(title = ylab, gridcolor='lightgrey', zeroline=True, zerolinewidth=1, zerolinecolor='grey',
                          showline=True, linewidth=2, linecolor='black', mirror=True),
              plot_bgcolor = 'rgba(0, 0, 0, 0)', paper_bgcolor = 'rgba(0, 0, 0, 0)',
              hovermode = 'closest'
             )
    fig = dict(data = data, layout = layout)
    iplot(fig, filename='draw_trace')
    
draw_trace_bar_country(country, 'Total vaccinations', 'Total per country scheme', 'country', 'Vaccination total', "darkmagenta" )


<a id="4"></a><h1 style='background:#aba926; border:0; color:black'><center> How the vaccination has progressed </center></h1>

In [None]:
#Daily vaccination trend/day
#step 1
vacc_df.to_csv('dataframe.csv', index=False)
df_f = vacc_df.pivot_table(values='daily_vaccinations', index=['date'], columns = 'iso_code')
df_f[:20]

In [None]:
vacc_df['iso_code'].value_counts()[:10]

In [None]:
# Running totals for a handful of countries.
# Note: total_vaccinations is already cumulative, so cumsum() here gives a running sum of running totals.
iso_codes = ['USA', 'GBR', 'CAN', 'CHN', 'ISR', 'RUS', 'MEX']
frames = [
    pd.DataFrame(vacc_df[vacc_df['iso_code'] == code]['total_vaccinations'].cumsum()).reset_index()
    for code in iso_codes
]

# DataFrame.append is deprecated; pd.concat stacks all the frames in one call
final = pd.concat(frames)

In [None]:
final.head(200)

In [None]:
vacc_df = vacc_df.reset_index()
final['TOTAL VACC'] = final['total_vaccinations']*1

In [None]:
df_5 = vacc_df.merge(final, left_on='index', right_on='index', how='inner')

In [None]:
df_5.info()

In [None]:
fig = px.scatter(df_5, x="daily_vaccinations", y="TOTAL VACC", animation_frame="date", animation_group="iso_code",
           hover_name="iso_code", text='iso_code',range_x=[0,1500000], range_y=[0,175000000])

fig.update_traces(marker=dict(size=40,  color='DarkSlateGrey'))

fig.show()

In [None]:
list(summary1.columns)

<a id="4"></a><h1 style='background:#aba926; border:0; color:black'><center> Daily vaccination by country - Geographic map </center></h1>

In [None]:
#Daily vaccinations around the globe (choropleth map)
fig = go.Choropleth(locations = vacc_df["country"],locationmode = 'country names',z = vacc_df['daily_vaccinations'],
                                         text= vacc_df['country'],
                    colorbar = dict(title= "Daily vaccinations"),reversescale =True,colorscale = 'viridis')
data = [fig]

layout = go.Layout(title = 'Daily Vaccinations according to each Country')
fig = dict(data = data,layout = layout)
iplot(fig)
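
The map above receives every row of `vacc_df`, so each country appears once per date and it is not obvious which value ends up coloured. A minimal sketch of one way to make this explicit is to reduce the data to the most recent non-missing row per country before plotting. The frame below is toy data with made-up numbers.

```python
import pandas as pd

# Toy data (made-up numbers): several dates per country, one missing value
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B", "B"],
    "date": ["2021-01-01", "2021-01-02", "2021-01-01", "2021-01-02", "2021-01-03"],
    "daily_vaccinations": [10.0, 20.0, 5.0, None, 7.0],
})

latest = (
    toy.dropna(subset=["daily_vaccinations"])   # ignore days without a figure
       .sort_values("date")                     # ISO dates sort correctly as text
       .groupby("country")
       .tail(1)                                 # keep the last row of each country
       .sort_values("country")
       .reset_index(drop=True)
)
print(latest)
```

The resulting `latest` frame has one row per country and could be passed to the choropleth in place of the full table.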

In [None]:
summary1.head()

<a id="4"></a><h1 style='background:#aba926; border:0; color:black'><center> Burden of COVID-19 across countries </center></h1>

# a. COVID-19 cases per 10,000 tests, by country

In [None]:
# follow the same steps as above
#Total confirmed per 10,000 tested around the globe (GIS)
#summary1['percentage_vaccinated'] = summary1.TotalDeaths / summary1.Population * 100
#summary1['tested_positive'] = summary1.TotalCases / summary1.TotalTests * 10000
#summary1['GDPvacc'] = summary1.daily_vaccinations / summary1.GDP_nominal * 10000
#summary1['GDPtest'] = summary1.TotalTests / summary1.GDP_nominal * 10000

df3=summary1.reset_index('country')

fig = go.Choropleth(locations = df3["country"],locationmode = 'country names',z = df3['tested_positive'],
                                         text= df3['country'],colorbar = dict(title= "tested_positive") )
data = [fig]

layout = go.Layout(title = 'COVID-19 cases per 10,000 tests, by country')
fig = dict(data = data,layout = layout)
iplot(fig)

# b. Total vaccinated per 10,000 GDP per capita

In [None]:
# follow the same steps as above
#Total vaccinated per 10,000 USD GDP (nominal) around the globe (GIS)
fig = go.Choropleth(locations = df3["country"],locationmode = 'country names',z = df3['GDPvacc'],
                                         text= df3['country'],colorbar = dict(title= "GDPvacc") )
data = [fig]

layout = go.Layout(title = 'Total vaccinated per 10,000 GDP per capita')
fig = dict(data = data,layout = layout)
iplot(fig)

# c. Total tested per 10,000 GDP per capita

In [None]:
# follow the same steps as above
#Total tested per 10,000 USD GDP (nominal) around the globe (GIS)
fig = go.Choropleth(locations = df3["country"],locationmode = 'country names',z = df3['GDPtest'],
                                         text= df3['country'],colorbar = dict(title= "GDPtest") )
data = [fig]

layout = go.Layout(title = 'Total tested per 10,000 GDP per capita')
fig = dict(data = data,layout = layout)
iplot(fig)

In [None]:
list(df3.columns)
#df3.head(300)

In [None]:
def plot_custom_scatter(df3, x, y, size, color, hover_name, title):
    fig = px.scatter(df3, x=x, y=y, size=size, color=color,
               hover_name=hover_name, size_max=80, title = title)
    fig.update_layout({'legend_orientation':'h'})
    fig.update_layout(legend=dict(yanchor="top", y=-0.2))
    fig.update_layout({'legend_title':'Vaccine scheme'})
    fig.update_layout({'plot_bgcolor': 'rgba(0, 0, 0, 0)','paper_bgcolor': 'rgba(0, 0, 0, 0)'})
    fig.update_xaxes(showline=True, linewidth=2, linecolor='black', mirror=True)
    fig.update_yaxes(showline=True, linewidth=2, linecolor='black', mirror=True)
    fig.update_xaxes(zeroline=True, zerolinewidth=1, zerolinecolor='grey')
    fig.update_yaxes(zeroline=True, zerolinewidth=1, zerolinecolor='grey')
    fig.update_xaxes(showgrid=True, gridwidth=1, gridcolor='lightgrey')
    fig.update_yaxes(showgrid=True, gridwidth=1, gridcolor='lightgrey')
    fig.show() 

In [None]:
df3.head()

# d. Daily vaccination vs GDP per capita, grouped per country and vaccines

In [None]:
# follow the same steps as above
# 2-D Graph: Daily vaccination vs GDP per capita 
plot1=pd.DataFrame(df3,columns=['country','GDP_nominal','daily_vaccinations','Population','vaccines'])
plot2=plot1.dropna()
plot_custom_scatter(plot2, x="GDP_nominal", y="daily_vaccinations", size="Population", color="vaccines",
           hover_name="country", title = "Daily vaccination vs GDP per capita, grouped per country and vaccines")

# e. Total tests vs GDP per capita, grouped per country and vaccines

In [None]:
# 2-D Graph: Total tests vs GDP per capita
plot1=pd.DataFrame(df3,columns=['country','GDP_nominal','TotalTests','Population','vaccines'])
plot2=plot1.dropna()

plot_custom_scatter(plot2, x="GDP_nominal", y="TotalTests", size="Population", color="vaccines",
           hover_name="country", title = "Total tests vs GDP per capita, grouped per country and vaccines")


# f. Total cases vs total deaths, grouped per country and GDP per capita

In [None]:
# 2-D Graph: Total cases vs deaths
plot1=pd.DataFrame(df3,columns=['country','GDP_nominal','TotalCases','Population','TotalDeaths'])
plot2=plot1.dropna()

plot_custom_scatter(plot2, x="TotalCases", y="TotalDeaths", size="Population", color="GDP_nominal",
           hover_name="country", title = "Total cases vs total deaths, grouped per country and GDP per capita")

# g. Total tests vs population, grouped per country and GDP per capita

In [None]:
# 2-D Graph: Total tests vs population 
plot1=pd.DataFrame(df3,columns=['country','GDP_nominal','TotalTests','Population','vaccines'])
plot2=plot1.dropna()

plot_custom_scatter(plot2, x="TotalTests", y="Population", size="GDP_nominal", color="vaccines",
           hover_name="country", title = "Total tests vs population, grouped per country and GDP per capita")

<a id="4"></a><h1 style='background:#aba926; border:0; color:black'><center> Vaccination progression: are there equity gaps? </center></h1>

In [None]:
#The daily vaccination drive around the globe
dates = sorted(vacc_df.date.unique().tolist())
countries = vacc_df.country.unique().tolist()

# Build a complete (country, date) grid so that every country has a row for every date;
# combinations that are absent from the data come back as NaN
full_index = pd.MultiIndex.from_product([countries, dates], names=['country', 'date'])
short = (vacc_df.groupby(['country', 'date'])['daily_vaccinations'].first()
                .reindex(full_index)
                .reset_index())

# A country with no figure on the earliest date starts at 0
first_day = short['date'] == dates[0]
short.loc[first_day, 'daily_vaccinations'] = short.loc[first_day, 'daily_vaccinations'].fillna(0)

# Carry the last reported figure forward within each country.
# daily_vaccinations is a daily count, not a cumulative total, so this is only
# an approximation for the days without a report.
short = short.sort_values(['country', 'date'])
short['daily_vaccinations'] = short.groupby('country')['daily_vaccinations'].ffill()

# scale the number by log to make the color transitions smoother
vaccines = short.sort_values('date')
vaccines['log_scale'] = vaccines['daily_vaccinations'].apply(lambda x : math.log2(x+1))

fig =px.choropleth(vaccines, locations="country", 
                    locationmode='country names',
                    color="log_scale", 
                    hover_name="country", 
                    hover_data=['log_scale', "daily_vaccinations"],
                    animation_frame="date",
                    color_continuous_scale="blues",
                   )

title = get_multi_line_title("Vaccination Progress", "Daily Vaccination Around the Globe")
fig.update_layout(coloraxis={"cmax":25,"cmin":0})
fig.update_layout(title=title, title_x=0.5, coloraxis_showscale=False)

fig.show()
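
The colour scale above uses `log2(x + 1)` rather than the raw count. The sketch below shows why: the `+ 1` keeps a count of zero at zero (the logarithm of 0 is undefined), and very large counts are compressed into a small range. The helper names `log_scale` and `from_log_scale` are hypothetical.

```python
import math

def log_scale(daily_vaccinations):
    """log2(x + 1): maps 0 to 0 and compresses large counts."""
    return math.log2(daily_vaccinations + 1)

def from_log_scale(value):
    """Inverse transform: the raw count that corresponds to a colour value."""
    return 2 ** value - 1

for count in (0, 1, 1023, 1_000_000):
    print(count, round(log_scale(count), 2))

# The colour axis above is capped at cmax=25
print(from_log_scale(25))
```

A cap of 25 on this scale therefore corresponds to roughly 33.5 million vaccinations per day, comfortably above any single country's daily figure in the hover data.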

<a id="4"></a><h1 style='background:#aba926; border:0; color:black'><center> Bonus material 🙌 </center></h1>

# a. Total vaccination trend for selected high-, middle- and low-income countries

In [None]:
#-Trend in Daily vaccination per 10,000 population by country - (trend line for each country)
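def plot_trend(dataframe, feature, title, countries):
    plt.style.use('fast')
    plt.figure(figsize=(20,25))
    
    # One small panel per country on an 8 x 4 grid (up to 32 panels);
    # the loop variable no longer shadows the function argument
    for i, country in enumerate(countries):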
        plt.subplot(8,4,i+1)
        data = dataframe[dataframe['country'] == country]
        sns.lineplot(x=data['date'] ,y=data[feature],label = feature)
        plt.xlabel('')
        plt.tick_params(axis='x',which='both',top=False,bottom=False,labelbottom=False)
        plt.title(country)
        
    plt.suptitle(title,y=1.05)
    plt.tight_layout()
    plt.show()
    
country = ['Argentina', 'Brazil', 'Austria', 'Belgium', 'Brazil','Canada','China','Denmark', 'Finland', 'France',
       'Germany','India','Ireland', 'Israel', 'Italy', 'Kuwait','India', 'Nepal','Mexico', 'Netherlands','Norway', 'Poland','Russia',
        'Saudi Arabia', 'Singapore','Spain', 'Sweden', 'Switzerland', 'Turkey',
        'United Arab Emirates', 'UK', 'USA']
plot_trend(vacc_df,'total_vaccinations','Trend of total vaccination',country)

#also try using plot_till_date
#example code: plot_till_date('people_fully_vaccinated', 'people_vaccinated','People vaccinated vs Fully vaccinated till date', '#c4eb28', '#35eb28')

In [None]:
list(df3.columns)

# b. COVID-19 infection, vaccination by GDP per capita

In [None]:
#-Heat Map (Total cases by GDP per capita)
# pandas, seaborn and matplotlib were already imported at the top of the notebook

df4 = df3[['country','GDP_nominal', 'TotalCases']]
heatmap2_data = pd.pivot_table(df4,values='TotalCases', index=['country'], columns='GDP_nominal')
plt.figure(figsize=(8, 12))
sns.heatmap(heatmap2_data, cmap="RdBu")

In [None]:
#-Heat Map (Total tests by GDP per capita)
df4 = df3[['country','GDP_nominal', 'TotalTests']]
heatmap2_data = pd.pivot_table(df4,values='TotalTests', index=['country'], columns='GDP_nominal')
plt.figure(figsize=(8, 12))
sns.heatmap(heatmap2_data, cmap="RdBu")
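
Because almost every country has its own GDP value, pivoting on the raw `GDP_nominal` column yields one column per country and a mostly empty grid. A sketch of an alternative is to bin GDP per capita into a few income groups first and then aggregate. The cut-offs and labels below are hypothetical values chosen only for illustration, and the data are made up.

```python
import pandas as pd

# Toy data (made-up numbers)
toy = pd.DataFrame({
    "country": ["A", "B", "C", "D", "E", "F"],
    "GDP_nominal": [800, 3000, 9000, 20000, 45000, 60000],
    "TotalCases": [1000, 5000, 20000, 40000, 90000, 70000],
})

# Hypothetical cut-offs in USD, for illustration only
bins = [0, 1000, 4000, 12000, float("inf")]
labels = ["low", "lower-middle", "upper-middle", "high"]
toy["income_group"] = pd.cut(toy["GDP_nominal"], bins=bins, labels=labels)

# Mean number of cases per income group
by_group = toy.groupby("income_group", observed=False)["TotalCases"].mean()
print(by_group)
```

A table like `by_group.to_frame()` has only a handful of rows, which makes a heat map or bar chart far easier to read than the country-by-GDP grid.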

<a id="4"></a><h1 style='background:#aba926; border:0; color:black'><center> Conclusions </center></h1>

COVID-19 has threatened the health systems and economic integrity of countries worldwide, and developing countries are significantly affected. On the positive side, COVID-19 vaccines have been rolled out in many countries. However, many marginalized and vulnerable populations in both high- and low-income countries are still waiting for their first dose. If left unchecked, these populations risk being left out of the vaccine rollout. Therefore, my aim here is to critically analyze the vaccine coverage data and identify gaps to inform policy discussion and advocacy, with a focus on low- and middle-income countries. 

Overall, the findings show that vaccine administration is primarily focused on high-income countries. Some vaccine companies (e.g., Pfizer and BioNTech) are making massive progress in vaccine expansion, though within a limited geography. As the health systems in these countries return to full capacity, the rollout of vaccination programs should become more effective in the coming days.

Thank you for reading the notebook. This is my first notebook on Kaggle, and I am excited about continuing to write in the future. Your **UPVOTE** will greatly help me stay motivated and bring further resources on this topic. I will try to keep the notebook updated (the **web scraping framework** will make updating easy) and bring in more exciting visualizations as we move along. See you around!


# Blogger
I recently completed my Ph.D. studies (*Thesis submitted*) focusing on epidemiology and clinical biostatistics at the University of Queensland, Australia. My aim on Kaggle is to combine data visualization and storytelling to convey powerful messages in global health. 

Besides that, I am interested in cardiovascular epidemiology and in the development, use, and interpretation of statistical & ML tools for observational studies and RCTs. I am also interested in statistical programming & visualization using SAS, Stata, R & Python (https://bit.ly/37n0kQG). From 2015-16, I worked as a commissioner in The Lancet YCEMP & as a freelance writer until 2018 (https://bit.ly/37qKVBF).


# My socials:
🈺[Linkedin](https://www.linkedin.com/in/shivarajmishra/)
<br>
🈺[YouTube](https://www.youtube.com/watch?v=WROuFKmYPVQ&t=13s)
<br>
🈺[Researchgate](https://www.researchgate.net/profile/Shiva_Mishra2)
<br>
☢️[Facebook](https://www.facebook.com/shivarajmishra)

**Work completed: 7.02.2021**
