# **Covid World Vaccination Progress Analysis**

In this notebook, we use the country vaccinations dataset and the Countries of the World dataset to explore how countries are progressing in vaccinating their populations toward herd immunity.

![f](https://i.guim.co.uk/img/media/f1a723c4431b974f8b6ad4f7a34e0ada9e169b51/0_0_2560_1536/master/2560.jpg?width=445&quality=45&auto=format&fit=max&dpr=2&s=06480e53863e03fef6b71f8ce9022853)
When will a coronavirus vaccine be ready? Illustration: James Melaugh/The Observer

## In this notebook, we use Plotly to visualize the data.

In [None]:
import pandas as pd 

from itertools import chain
from urllib.parse import urlparse

import plotly.express as px
import plotly.graph_objects as go

## Importing the World Vaccination Dataset

In [None]:
vaccine_dataset_path = '../input/covid-world-vaccination-progress/country_vaccinations.csv'
vaccine_df = pd.read_csv(vaccine_dataset_path)

## Exploring the Data

In [None]:
vaccine_df.head()

In [None]:
# exploring the columns

vaccine_df.columns

* **Country:** *the country for which the vaccination information is provided;*

* **Country ISO Code:** *ISO code for the country;*

* **Date:** *date of the data entry;*

* **Total number of vaccinations:** *absolute number of vaccine doses administered in the country up to that date;*

* **Total number of people vaccinated:** *number of people who have received at least one dose; depending on the immunization scheme, a person receives one or more (typically 2) doses, so this number can be smaller than the total number of vaccinations;*

* **Total number of people fully vaccinated:** *number of people who have received the entire set of doses required by the immunization scheme;*

* **Daily vaccinations (raw):** *for a given data entry, the daily change in the total number of vaccinations for that date and country, as reported;*

* **Daily vaccinations:** *the same daily count, smoothed over several days to even out irregular reporting;*

* **Total vaccinations per hundred:** *number of doses administered per 100 people in the total population, up to that date (this can exceed 100 when multi-dose vaccines are used);*

* **Total number of people vaccinated per hundred:** *number of people with at least one dose per 100 people in the total population, up to that date;*

* **Total number of people fully vaccinated per hundred:** *number of fully vaccinated people per 100 people in the total population, up to that date;*

* **Number of vaccinations per day:** *number of daily vaccinations for that day and country;*

* **Daily vaccinations per million:** *daily vaccinations per 1,000,000 people in the total population, for that date and country;*

* **Vaccines used in the country:** *comma-separated list of the vaccines used in the country up to that date;*

* **Source name:** *source of the information (national authority, international organization, local organization, etc.);*

* **Source website:** *website of the source of information.*
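
The per-hundred and per-million columns normalize the raw counts by population. A minimal sketch of that arithmetic, using a hypothetical population and hypothetical counts (not values from the dataset), also shows why total vaccinations per hundred can exceed 100: it counts doses, not people.

```python
def per_hundred(count: float, population: float) -> float:
    """Count expressed per 100 people of the total population."""
    return count * 100 / population


def per_million(count: float, population: float) -> float:
    """Count expressed per 1,000,000 people of the total population."""
    return count * 1_000_000 / population


# Hypothetical country: 1,000,000 inhabitants, 1,500,000 doses given,
# 900,000 people with at least one dose, 600,000 fully vaccinated,
# and 20,000 doses given on the latest day.
population = 1_000_000
total_doses = 1_500_000
people_vaccinated = 900_000
people_fully_vaccinated = 600_000
daily_doses = 20_000

print(per_hundred(total_doses, population))              # 150.0 doses per 100 people
print(per_hundred(people_vaccinated, population))        # 90.0
print(per_hundred(people_fully_vaccinated, population))  # 60.0
print(per_million(daily_doses, population))              # 20000.0
```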

In [None]:
# checking the column types and the null values in each column

vaccine_df.info()

In [None]:
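# converting the date column from strings to datetime values
# (astype('datetime64') without a unit is rejected by recent pandas versions)

vaccine_df['date'] = pd.to_datetime(vaccine_df['date'])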

In [None]:
# replacing the null values in the numeric columns with 0

numeric_columns = [
    'total_vaccinations',
    'people_vaccinated',
    'people_fully_vaccinated',
    'daily_vaccinations_raw',
    'daily_vaccinations',
    'total_vaccinations_per_hundred',
    'people_vaccinated_per_hundred',
    'people_fully_vaccinated_per_hundred',
    'daily_vaccinations_per_million',
]
vaccine_df[numeric_columns] = vaccine_df[numeric_columns].fillna(0)
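
Filling missing cumulative counts with 0 can create artificial drops in a country's series: a day without a report would show 0 total vaccinations between two positive values. A common alternative for cumulative columns (not for daily ones) is to carry the last reported value forward within each country. The sketch below shows that idea in plain Python on hypothetical rows; in pandas, the analogous operation would be a per-country `groupby(...).ffill()`.

```python
def forward_fill_by_country(rows):
    """Carry the last known cumulative value forward within each country.

    rows: (country, date, value) tuples sorted by country and date;
    value is None where the source did not report. Gaps before a
    country's first report are filled with 0.0.
    """
    last_seen = {}
    filled = []
    for country, date, value in rows:
        if value is None:
            value = last_seen.get(country, 0.0)
        last_seen[country] = value
        filled.append((country, date, value))
    return filled


# Hypothetical rows, not taken from the dataset
rows = [
    ("A", "2021-01-01", 100.0),
    ("A", "2021-01-02", None),
    ("A", "2021-01-03", 250.0),
    ("B", "2021-01-01", None),
    ("B", "2021-01-02", 40.0),
]
for row in forward_fill_by_country(rows):
    print(row)
```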

In [None]:
# getting the number of countries that have started vaccinating

vaccine_df['country'].nunique()

In [None]:
# getting the time-frame in the dataset

print('The earliest date in the dataset is: ', vaccine_df['date'].min())
print('The latest date in the dataset is: ', vaccine_df['date'].max())

In [None]:
# checking the reporting sources in the dataset

#set(vaccine_df['source_name'].unique())
print('Number of sources:', len(vaccine_df['source_name'].unique()))


In [None]:
# getting the type of vaccines 

countries_vaccine = list(vaccine_df['vaccines'].unique())
splitted_list = [vacc for item in countries_vaccine for vacc in item.split(', ')]
vaccines = set(splitted_list)

print('Number of vaccines used:', len(vaccines))
vaccines

In [None]:
vaccine_df.describe()

# **Visualizing the Data**

## **Visualizing the Reporting Sources**

In [None]:
# chart data (value_counts already sorts from most to least frequent)
sources = vaccine_df['source_name'].value_counts()

# chart colors
colors = ['lightslategray',] * sources.shape[0]

# Bar Chart
fig = go.Figure(
    data=[go.Bar(
    x = sources.index,
    y = sources.values,
    marker_color=colors
    
)])


fig.update_layout(
    width=1600,
    height=800,
    plot_bgcolor='white',
    yaxis_title='Count',
    title={
        'text': 'Number of Reported Sources',
        'xanchor': 'center',
        'x':0.5,
        'yanchor': 'top'
})


fig.show()
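
`value_counts()` returns counts sorted from most to least frequent by default, so sorting the column beforehand has no effect on the result. The same tally in plain Python is `collections.Counter.most_common()`; a small sketch with hypothetical source names:

```python
from collections import Counter

# Hypothetical source_name values, one per row of the dataset
source_names = [
    "Ministry of Health",
    "Government of the United Kingdom",
    "Ministry of Health",
    "Public Health Agency",
    "Ministry of Health",
    "Government of the United Kingdom",
]

# most_common() orders (value, count) pairs by descending count,
# like pandas' value_counts()
sources = Counter(source_names).most_common()
print(sources)
```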



### Since **Ministry of Health** is a vague source name, we explore the source URLs instead:

In [None]:
# chart data: the host part of each source URL
vaccine_df['source_domain'] = [urlparse(x).netloc for x in vaccine_df['source_website'].values]
sources = vaccine_df['source_domain'].value_counts()

# data color: highlighting the top 3 websites
colors = ['lightslategray',] * sources.shape[0]
colors[0:3] = ['crimson'] * 3

# Bar Chart
fig = go.Figure(
    data=[go.Bar(
    y = sources.values,
    x = sources.index,
    marker_color=colors
)])

fig.update_layout(
    width=1600, 
    height=800,
    plot_bgcolor='white',
    yaxis_title='Count',
    title={
        'text': 'Number of Reported Websites',
        'xanchor': 'center',
        'x':0.5,
        'yanchor': 'top'
})

fig.show()
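
`urlparse(...).netloc` returns the host portion of a URL, so all pages under the same website collapse into one bar. Hosts that differ only by a `www.` prefix are still counted separately. The sketch below adds that optional normalization (the chart above does not apply it); the URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def source_domain(url: str) -> str:
    """Return the host of a URL, lower-cased and without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


# Hypothetical source_website values
urls = [
    "https://www.example.gov/vaccines/daily",
    "https://example.gov/vaccines/weekly",
    "https://data.example.org/covid",
]
domains = Counter(source_domain(u) for u in urls)
print(domains.most_common())
```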

### **Conclusion:** *The Government of the UK is the most frequently reported source.*

## **Visualizing the most used vaccines**

In [None]:
# chart data: total doses administered for each vaccine combination
vaccines_df = (
    vaccine_df.groupby('vaccines')['daily_vaccinations']
    .sum()
    .sort_values(ascending=False)
    .reset_index()
)

# adding breaks to data labels in order to create readable visualization
vaccines_df['vaccines'] = vaccines_df['vaccines'].str.replace(', ', ',<br>')

# data color
colors = ['lightgray',] * vaccines_df.shape[0]
colors[0:3] = ['crimson'] * 3

# Treemap Chart
fig = go.Figure(
    go.Treemap(
    labels=vaccines_df['vaccines'].values, 
    parents  = ['']* vaccines_df.shape[0],
    values=vaccines_df['daily_vaccinations'].values,
    marker_colors = colors
))

fig.update_layout(
    width=1400,
    height=600,
    plot_bgcolor='white',
    font_color="white",
    font_size=24,
    title={
        'text': 'Doses Administered by Vaccine Combination',
        'xanchor': 'center',
        'x':0.5,
        'font_size': 15,
        'font_color':'black',
        'yanchor': 'top'
})

fig.show()

### **Conclusion:** *Moderna and Pfizer/BioNTech are the most used vaccines.*

## Exploring vaccines used in each country

In [None]:
# calculating the vaccines used in each country:
# de-duplicating (country, vaccines) rows, then splitting each combination into single vaccines

country_vaccine_pairs = vaccine_df[['country', 'vaccines']].drop_duplicates()
all_vaccines = country_vaccine_pairs['vaccines'].str.split(', ')

country_vaccines_df = pd.DataFrame({
    'country' : country_vaccine_pairs['country'].values.repeat(all_vaccines.str.len()),
    'vaccines' : list(chain.from_iterable(all_vaccines.tolist()))
}).drop_duplicates()

country_vaccines_df
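
The cell above turns each `(country, vaccines)` row, where `vaccines` is a comma-separated string, into one row per individual vaccine and then removes duplicate pairs. The same reshaping in plain Python, on hypothetical rows (in pandas, `Series.str.split` followed by `DataFrame.explode` is another way to express it):

```python
def country_vaccine_pairs(rows):
    """Expand (country, 'A, B') rows into unique (country, vaccine) pairs,
    keeping the order in which each pair first appears."""
    seen = set()
    pairs = []
    for country, vaccines in rows:
        for vaccine in vaccines.split(", "):
            if (country, vaccine) not in seen:
                seen.add((country, vaccine))
                pairs.append((country, vaccine))
    return pairs


# Hypothetical rows: one row per reporting date, so combinations repeat
rows = [
    ("Country A", "Pfizer/BioNTech"),
    ("Country A", "Pfizer/BioNTech, Moderna"),
    ("Country B", "Oxford/AstraZeneca, Pfizer/BioNTech"),
    ("Country B", "Oxford/AstraZeneca, Pfizer/BioNTech"),
]
print(country_vaccine_pairs(rows))
```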

In [None]:
# assigning one node index to each country and to each vaccine:
# countries take indices 0..n_countries-1 and vaccines follow them
countries = country_vaccines_df['country'].unique()
vaccines = country_vaccines_df['vaccines'].unique()
country_index = {country: i for i, country in enumerate(countries)}
vaccine_index = {vaccine: len(countries) + i for i, vaccine in enumerate(vaccines)}

In [None]:
# chart: one link of value 1 from each country to each vaccine it uses

fig = go.Figure(
    data=[go.Sankey(
    node = dict(
    pad = 15,
    thickness = 20,
    line = dict(color = 'red', width = 0.25),
    label = list(countries) + list(vaccines),
    color = "crimson"),
    link = dict(
    source = country_vaccines_df['country'].map(country_index).tolist(),
    target = country_vaccines_df['vaccines'].map(vaccine_index).tolist(),
    value = [1] * country_vaccines_df.shape[0],

  ))])


fig.update_layout(
    width=1200,
    height=4000,
    font_color="white",
    font_size=14,
    title={
    'text': 'Type of Vaccine Used in Each Country',
        'xanchor': 'center',
        'x':0.5,
        'font_size': 18,
        'font_color':'black',
        'yanchor': 'top'}

)

fig.show()

## Visualizing the Percentage of Vaccinated Population in Countries

In [None]:
# chart data
data = vaccine_df.loc[vaccine_df['people_vaccinated_per_hundred'] > 5].groupby(['country'])['people_vaccinated_per_hundred'].max().sort_values(ascending=False)

# chart color: highlighting the top 3 countries
colors = ['lightslategray',] * data.shape[0]
colors[0:3] = ['crimson'] * 3

# Bar Chart
fig = go.Figure(
    data=[go.Bar(
    y = data.values,
    x = data.index,
    marker_color=colors
    
)])

fig.update_layout(
    width=1600, 
    height=600,
    plot_bgcolor='white',
    yaxis_title="%",
    title={
        'text': "% People Vaccinated in Countries",
        'xanchor': 'center',
        'x':0.5,
        'font_size': 18,
        'font_color':'black',
        'yanchor': 'top'}
)

fig.show()
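
The highlighting in these bar charts builds one color per bar and then overwrites the first few entries with a slice assignment. In Python, a slice assignment replaces the slice with however many items are on the right-hand side, so a length mismatch silently changes the list length instead of raising an error. A small demonstration:

```python
n_bars = 5

# Matching lengths: the first 3 bars are highlighted, length unchanged
colors = ["lightslategray"] * n_bars
colors[0:3] = ["crimson"] * 3
print(len(colors), colors[:4])
# 5 ['crimson', 'crimson', 'crimson', 'lightslategray']

# Mismatched lengths: a 2-item slice replaced by 3 items grows the list
mismatched = ["lightslategray"] * n_bars
mismatched[0:2] = ["crimson"] * 3
print(len(mismatched))
# 6
```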


### **Conclusion:** *Gibraltar has vaccinated 74% of its population.*

## Visualizing the Number of Vaccinations in Countries

In [None]:
# chart data
data = vaccine_df.loc[vaccine_df['total_vaccinations'] > 100000].groupby(['country'])['total_vaccinations'].max().sort_values(ascending=False)

# chart color: highlighting the top 3 countries
colors = ['lightslategray',] * data.shape[0]
colors[0:3] = ['crimson'] * 3

# Bar Chart
fig = go.Figure(
    data=[go.Bar(
    y = data.values,
    x = data.index,
    marker_color=colors
)])

fig.update_layout(
    width=1400, 
    height=600,
    plot_bgcolor='white',
    yaxis_title='Number of Vaccinations',
    title={
        'text': 'Total Number of Vaccinations in Countries',
        'xanchor': 'center',
        'x':0.5,
        'font_size': 18,
        'font_color':'black',
        'yanchor': 'top'
})

fig.show()

## Which vaccine is used in more countries?


In [None]:
vaccines_count_df = country_vaccines_df.groupby(['vaccines']).count().reset_index().sort_values('country', ascending=False)
vaccines_count_df
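
Grouping the `(country, vaccine)` pairs by vaccine and counting counts how many countries report each vaccine, because every pair appears only once after de-duplication. The equivalent count in plain Python, on hypothetical pairs:

```python
from collections import Counter

# Hypothetical (country, vaccine) pairs
pairs = [
    ("Country A", "Pfizer/BioNTech"),
    ("Country A", "Moderna"),
    ("Country B", "Pfizer/BioNTech"),
    ("Country B", "Oxford/AstraZeneca"),
    ("Country C", "Oxford/AstraZeneca"),
    ("Country C", "Pfizer/BioNTech"),
]

# Each (country, vaccine) pair is unique after set(), so counting vaccines
# gives the number of countries using each one
countries_per_vaccine = Counter(vaccine for _, vaccine in set(pairs))
print(countries_per_vaccine.most_common())
```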

In [None]:
vaccines_count_df['vaccines'] = vaccines_count_df['vaccines'].str.replace('&', '<br> &')

# chart color
colors = ['lightgray',] * vaccines_count_df.shape[0]
colors[0:3] = ['crimson'] * 3

# Treemap chart
fig = go.Figure(
    go.Treemap(
    labels=vaccines_count_df['vaccines'].values, 
    parents=['']* vaccines_count_df.shape[0],
    values=vaccines_count_df['country'].values,
    marker_colors=colors
))

fig.update_layout(
    width=1600,
    height=600,
    font_color="white",
    font_size=16,
    title={
        'text': 'Comparing Vaccines by Number of Countries Using Them',
        'xanchor': 'center',
        'x':0.5,
        'font_size': 18,
        'font_color':'black',
        'yanchor': 'top'
})

fig.show()

### **Conclusion:** *Most countries have used the Oxford/AstraZeneca and Pfizer/BioNTech vaccines.*

## Visualizing the Vaccination Start Date in Countries

In [None]:
# first and last reporting date for each country
country_date_df = (
    vaccine_df.groupby('country')['date']
    .agg(['min', 'max'])
    .rename(columns={'min': 'start date', 'max': 'end date'})
    .reset_index()
    .sort_values('start date', ascending=True)
)

In [None]:
# dividing countries into three groups based on when they started vaccinating (equal-width date intervals)

country_date_df['status'] = pd.cut(country_date_df['start date'], bins=3, labels=['Early Starters', 'Middle', 'Late Starters'])
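
`pd.cut` with `bins=3` splits the range between the earliest and latest start date into three intervals of equal length, not into three groups of equal size. A plain-Python sketch of that binning rule with right-closed intervals, using hypothetical start dates:

```python
from datetime import date

LABELS = ["Early Starters", "Middle", "Late Starters"]


def equal_width_bins(start_dates, labels=LABELS):
    """Assign each date to one of len(labels) equal-width intervals.

    Intervals are right-closed, (lower, upper], except that the
    earliest date is placed in the first interval.
    """
    ordinals = [d.toordinal() for d in start_dates]
    lo, hi = min(ordinals), max(ordinals)
    width = (hi - lo) / len(labels)
    result = []
    for x in ordinals:
        for i, label in enumerate(labels):
            if x <= lo + width * (i + 1):
                result.append(label)
                break
        else:
            # guards against floating-point round-off at the upper end
            result.append(labels[-1])
    return result


starts = [date(2020, 12, 1), date(2020, 12, 31), date(2021, 1, 30)]
print(equal_width_bins(starts))
# ['Early Starters', 'Middle', 'Late Starters']
```

With very uneven start dates, one group can hold most countries; `pd.qcut` would instead form groups of roughly equal size.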

In [None]:
# Timeline Chart

fig = px.timeline(
    country_date_df,
    x_start='start date',
    x_end='end date',
    y='country',
    color='status')


fig.update_layout(
    width=1000,
    height=800,

    title={
        'text': 'Vaccination Timeline for Countries',
        'xanchor': 'center',
        'x':0.5,
        'font_size': 18,
        'font_color':'black',
        'yanchor': 'top'
})

fig.update_yaxes(autorange="reversed") 
fig.show()

# **Importing the Countries Population Dataset**

In [None]:
countries_pop_dataset_path = '../input/countries-of-the-world/countries of the world.csv'
countries_pop = pd.read_csv(countries_pop_dataset_path)

In [None]:
countries_pop.columns

In [None]:
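# keeping only the columns needed for the analysis
# (.copy() avoids a SettingWithCopyWarning when the frame is modified later)

countries_pop = countries_pop[['Country', 'Region', 'Population']].copy()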

In [None]:
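# removing trailing spaces from the country names so they can be compared with the vaccination dataset
countries_pop['Country'] = countries_pop['Country'].str.rstrip(' ')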

In [None]:
# getting the countries that have not started vaccinating
# (the set difference also counts countries whose names are spelled differently in the two datasets)

print(len(set(countries_pop['Country']) - set(vaccine_df['country'])), 'Countries have not started with the vaccination.')
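
Because a set difference on raw names also counts countries that appear in both datasets under different spellings, the number printed above is an upper bound on the countries that have not started. One way to tighten it is to map known spelling variants to a single name before comparing. The alias table and country lists below are purely illustrative (hypothetical entries, not a verified mapping between the two datasets):

```python
# Hypothetical aliases: population-dataset spelling -> vaccination-dataset spelling
ALIASES = {
    "Korea, South": "South Korea",
    "Bahamas, The": "Bahamas",
}


def normalize(name: str) -> str:
    """Strip surrounding whitespace and apply the alias table."""
    name = name.strip()
    return ALIASES.get(name, name)


# Hypothetical country lists
population_countries = ["Korea, South ", "Bahamas, The", "Chad", "France"]
vaccination_countries = ["South Korea", "Bahamas", "France"]

raw_missing = {c.strip() for c in population_countries} - set(vaccination_countries)
normalized_missing = {normalize(c) for c in population_countries} - {normalize(c) for c in vaccination_countries}
print(sorted(raw_missing))         # ['Bahamas, The', 'Chad', 'Korea, South']
print(sorted(normalized_missing))  # ['Chad']
```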

## Exploring the Countries That Have Not Started Vaccinating Yet

In [None]:
# countries in the population dataset that do not appear in the vaccination dataset
not_started_df = countries_pop[~countries_pop['Country'].isin(vaccine_df['country'])].dropna()

# Table
fig = go.Figure(
    data=[go.Table(
    header=dict(values=['Country', 'Region', 'Population'],
    line_color='darkslategray',
    fill_color='crimson',
    font_color='white',
    font_size=16,
    align='left'),
    cells=dict(values=[not_started_df['Country'].values,      # 1st column
                not_started_df['Region'].values,       # 2nd column
                not_started_df['Population'].values,   # 3rd column
            ], 
           line_color='darkslategray',
           fill_color='lightgray',
           font_color='black',
           align='left'))
])

fig.update_layout(width=800, height=1000,
        title={
            'text': 'Countries That Have Not Started Vaccinating',
            'xanchor': 'center',
            'x':0.5,
            'font_size': 18,
            'font_color':'black',
            'yanchor': 'top'
})

fig.show()

## Joining the Two Datasets

In [None]:
countries_pop.shape

In [None]:
vaccine_df.shape

In [None]:
# left-joining each vaccination row with its country's region and population

vaccine_pop_df = vaccine_df.join(countries_pop.set_index('Country'), rsuffix='_other', on='country')

In [None]:
vaccine_pop_df.shape

In [None]:
# chart data: each country's latest cumulative number of vaccinated people (the maximum over dates)
data = vaccine_pop_df.groupby(['country', 'Region'])['people_vaccinated'].max().reset_index()

# summing the per-country numbers within each region
region_vaccinated = data.groupby('Region')['people_vaccinated'].sum().reset_index()
region_vaccinated

In [None]:
# counting each country's population once (the joined frame has one row per country per date)
region_population = vaccine_pop_df.drop_duplicates('country').groupby('Region')['Population'].sum().reset_index()

In [None]:
new_df = region_vaccinated.join(region_population.set_index('Region'), rsuffix='_other', on='Region')
new_df['%vaccination'] = new_df['people_vaccinated'] / new_df['Population'] * 100
new_df = new_df.sort_values('%vaccination', ascending=False)
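
Because the joined frame has one row per country per date, a country's population repeats on every row, so summing `Population` over all rows would count each country many times. A plain-Python sketch of the intended calculation on hypothetical rows: take each country's latest (maximum) cumulative `people_vaccinated`, count each country's population once, and then aggregate by region.

```python
from collections import defaultdict

# Hypothetical joined rows: (country, region, population, people_vaccinated);
# each country appears once per reporting date, as in the joined frame
rows = [
    ("A", "Region 1", 1_000, 100),
    ("A", "Region 1", 1_000, 300),
    ("B", "Region 1", 3_000, 600),
    ("C", "Region 2", 2_000, 0),
    ("C", "Region 2", 2_000, 500),
]

# Latest cumulative count per country; (region, population) kept once per country
latest_vaccinated = {}
country_info = {}
for country, region, population, vaccinated in rows:
    latest_vaccinated[country] = max(vaccinated, latest_vaccinated.get(country, 0))
    country_info[country] = (region, population)

region_vaccinated = defaultdict(int)
region_population = defaultdict(int)
for country, (region, population) in country_info.items():
    region_vaccinated[region] += latest_vaccinated[country]
    region_population[region] += population

pct_vaccinated = {
    region: region_vaccinated[region] * 100 / region_population[region]
    for region in region_population
}
print(pct_vaccinated)  # {'Region 1': 22.5, 'Region 2': 25.0}

# For contrast: summing population over every row counts country A twice
naive_population_region_1 = sum(p for _, r, p, _ in rows if r == "Region 1")
print(region_vaccinated["Region 1"] * 100 / naive_population_region_1)  # 18.0
```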

## Visualizing Percentage of People Vaccinated in Regions

In [None]:
# chart color
colors = ['lightgray',] * new_df.shape[0]
#colors[0:3] = ['crimson'] * 3

# Treemap Chart
fig = go.Figure(
    go.Treemap(
    labels=new_df['Region'], 
    parents=['']* new_df.shape[0],
    values=new_df['%vaccination'],
    marker_colors=colors)) 


fig.update_layout(
    width=1400,
    height=1000,
    font_color="white",
    font_size=16,
    title={
        'text': 'Comparing Vaccination in Regions',
        'xanchor': 'center',
        'x':0.5,
        'font_size': 18,
        'font_color':'black',
        'yanchor': 'top'
})

fig.show()

## Visualizing Top Countries' Vaccination Progress Over Time

In [None]:
# getting the 6 countries with the highest daily vaccinations
countries_with_highest_daily_vaccine = list(vaccine_df.groupby(['country'])['daily_vaccinations'].max().sort_values(ascending=False).head(6).index)
countries_with_highest_daily_vaccine

In [None]:
# Line Chart

fig = px.line(vaccine_df[vaccine_df['country'].isin(countries_with_highest_daily_vaccine)], 
              x='date', y='daily_vaccinations', color='country', title='Daily Vaccination over Time')
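
The `daily_vaccinations` column plotted here is a smoothed series, while `daily_vaccinations_raw` holds the daily change as reported. As an illustration only, assuming a trailing 7-day mean (the upstream smoothing rules, for example how reporting gaps are handled, may differ), a moving average evens out weekend dips like this:

```python
def trailing_mean(values, window=7):
    """Mean of each value and up to window-1 preceding values.

    The first window-1 outputs average over fewer points, since no
    earlier data is available.
    """
    means = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        means.append(sum(chunk) / len(chunk))
    return means


# Hypothetical raw daily doses with a weekend dip
raw = [700, 700, 700, 700, 700, 0, 0, 700, 700]
smoothed = trailing_mean(raw)
print(smoothed)
```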
fig.show()