# COVID-19 Vaccination Progress Around The World


## Introduction


The data contains the following information:

1. Country - this is the country for which the vaccination information is provided;
2. Country ISO Code - ISO code for the country;
3. Date - date for the data entry; for some of the dates we have only the daily vaccinations, for others, only the (cumulative) total;
4. Total number of vaccinations - this is the absolute number of total immunizations in the country;
5. Total number of people vaccinated - a person, depending on the immunization scheme, will receive one or more (typically 2) doses; at a certain moment, the number of vaccinations might be larger than the number of people;
6. Total number of people fully vaccinated - this is the number of people that received the entire set of immunizations according to the immunization scheme (typically 2 doses); at a certain moment in time, there might be a certain number of people that received one dose and another (smaller) number of people that received all doses in the scheme;
7. Daily vaccinations (raw) - for a certain data entry, the number of vaccinations for that date/country;
8. Daily vaccinations - for a certain data entry, the number of vaccinations for that date/country;
9. Total vaccinations per hundred - ratio (in percent) between the number of vaccinations and the total population up to the date in the country;
10. Total number of people vaccinated per hundred - ratio (in percent) between the population immunized and the total population up to the date in the country;
11. Total number of people fully vaccinated per hundred - ratio (in percent) between the population fully immunized and the total population up to the date in the country;
12. Number of vaccinations per day - number of daily vaccinations for that day and country;
13. Daily vaccinations per million - ratio (in ppm) between the number of vaccinations and the total population for the current date in the country;
14. Vaccines used in the country - the vaccines used in the country (up to date);
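
The per-hundred and per-million fields in the list above are plain ratios against the population. A minimal sketch of that arithmetic, using made-up numbers (the population figure and the helper names `per_hundred` and `per_million` are illustrative assumptions, not part of the dataset):

```python
def per_hundred(count: float, population: float) -> float:
    """Express a count per 100 inhabitants."""
    return 100.0 * count / population


def per_million(count: float, population: float) -> float:
    """Express a count per 1,000,000 inhabitants."""
    return 1_000_000.0 * count / population


# Hypothetical country: 10 million inhabitants, 2.5 million doses given so far,
# 50,000 of them on the latest day.
population = 10_000_000
print(per_hundred(2_500_000, population))  # 25.0
print(per_million(50_000, population))     # 5000.0
```

Because one person can receive more than one dose, total vaccinations per hundred can exceed 100 even when part of the population is unvaccinated.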


Data Source - Kaggle;

Source website - https://www.kaggle.com/gpreda/covid-world-vaccination-progress/

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)


In [None]:
# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))


In [None]:
!pip install plotly --upgrade --quiet

# DATA PREPARATION AND CLEANING

In [None]:
!pip install jovian opendatasets --upgrade --quiet

In [None]:
'''dataset_url = 'https://www.kaggle.com/gpreda/covid-world-vaccination-progress/' 

import opendatasets as od
od.download(dataset_url)'''

In [None]:
df=pd.read_csv("../input/covid-world-vaccination-progress/country_vaccinations.csv")

In [None]:
df.head()

In [None]:
df.info()

In [None]:
df.describe()

In [None]:
df.columns

In [None]:
#df.date = pd.to_datetime(df.date,infer_datetime_format=True,format='%Y-%b-%d')
# Drop rows that have no ISO code, then treat every remaining missing value as 0.
df = df.dropna(subset=['iso_code'])
df = df.fillna(0)
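
The `date` column is read from the CSV as plain strings. A minimal sketch of parsing it into real timestamps, assuming the dates are ISO-formatted (`YYYY-MM-DD`); the toy frame below stands in for `df` and its values are made up:

```python
import pandas as pd

# Toy stand-in for the vaccination data (values are made up).
toy = pd.DataFrame({
    "iso_code": ["IND", "IND", "IND"],
    "date": ["2021-02-01", "2021-01-31", "2021-02-02"],
    "daily_vaccinations": [200.0, 100.0, 300.0],
})

# Parse the strings with an explicit format; entries that do not match become NaT.
toy["date"] = pd.to_datetime(toy["date"], format="%Y-%m-%d", errors="coerce")

# Sorting on a true datetime column keeps each time series in chronological order.
toy = toy.sort_values("date").reset_index(drop=True)
print(toy)
```

With a datetime column, the plotting libraries can space the x-axis by actual time instead of treating every date as a separate category.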

In [None]:
df.drop(["source_name","source_website","people_fully_vaccinated","daily_vaccinations_raw","people_fully_vaccinated_per_hundred","daily_vaccinations_per_million","people_vaccinated_per_hundred"],axis=1, inplace=True)


In [None]:
#df.drop(df.index[df['people_vaccinated'] == 0], inplace = True)

In [None]:
df

# EXPLORATORY DATA ANALYSIS AND VISUALIZATION


## RAW VISUALIZATION OF 5 COUNTRIES

We start by importing the Python packages used for visualization, mostly seaborn. The data file has already been read above; later sections aggregate it on a few fields (country, iso_code and vaccines, that is, the vaccination scheme used in a certain country).
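
The aggregation on country, iso_code and vaccines mentioned above can be sketched as follows. This is an illustration on a toy frame with made-up totals, not the exact aggregation used later in the notebook:

```python
import pandas as pd

# Toy stand-in for the vaccination data (totals are made up).
toy = pd.DataFrame({
    "country": ["India", "India", "Israel", "Israel"],
    "iso_code": ["IND", "IND", "ISR", "ISR"],
    "vaccines": ["Covaxin, Oxford/AstraZeneca"] * 2 + ["Moderna, Pfizer/BioNTech"] * 2,
    "total_vaccinations": [1000.0, 2500.0, 400.0, 900.0],
})

# total_vaccinations is cumulative, so the maximum per group is the latest total.
summary = (
    toy.groupby(["country", "iso_code", "vaccines"], as_index=False)["total_vaccinations"]
       .max()
)
print(summary)
```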

In [None]:
import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt
import plotly.express as px
%matplotlib inline

sns.set_style('darkgrid')
matplotlib.rcParams['font.size'] = 14
matplotlib.rcParams['figure.figsize'] = (9, 6)
matplotlib.rcParams['figure.facecolor'] = '#00000000'
plt.rc('font', size=12)

## 1. INDIA

In [None]:
df_India = df[df["iso_code"] == 'IND'].copy()
df_India.drop(['people_vaccinated'], axis = 1, inplace = True)
df_India

In [None]:
plt.figure(figsize=(20,7))

sns.lineplot(data=df_India,x="date",y="daily_vaccinations",marker='d',markersize= 12, color = 'k')

plt.title("India's daily vaccinations trend")
plt.xticks(rotation=45)
plt.show();

In [None]:
plt.figure(figsize=(20,7))
sns.barplot(data=df_India, y="total_vaccinations",x="date",hue = 'vaccines')

plt.title("India's total vaccinations trend")
plt.xticks(rotation=45);

## 2. CHINA

In [None]:
df_China = df[df["iso_code"] == 'CHN'].copy()

In [None]:
df_China.drop(['people_vaccinated'], axis = 1, inplace = True)
df_China

In [None]:
plt.figure(figsize=(20,7))
sns.lineplot(data=df_China,x="date",y="daily_vaccinations",marker='o',markersize =12);

plt.xticks(rotation=90);
plt.title("China's daily vaccinations");

## 3. UNITED KINGDOM

In [None]:
df_UK= df[df["iso_code"] == 'GBR'].copy()
df_UK

In [None]:
plt.figure(figsize=(20,7))

sns.lineplot(data=df_UK,x="date",y="daily_vaccinations",marker='X',markersize =12, color = 'm');

plt.xticks(rotation=90);
plt.title("UK's daily vaccinations");

## 4. UNITED STATES of AMERICA

In [None]:
df_USA = df[df["iso_code"] == 'USA'].copy()
df_USA

In [None]:
plt.figure(figsize=(20,7))

df_USA.drop(df_USA.index[df_USA['people_vaccinated'] == 0], inplace = True)

sns.barplot(data=df_USA,x="date",y="people_vaccinated", hue = 'vaccines')
plt.title("USA's total number of people vaccinated")

plt.xticks(rotation=90);

plt.show();

In [None]:
plt.figure(figsize=(20,7))
sns.lineplot(data=df_USA,x="date",y="daily_vaccinations",marker='s', markersize = 12);

plt.xticks(rotation=90);
plt.title("USA's daily vaccinations trend");

## 5. RUSSIA

In [None]:
df_Russia = df[df["iso_code"] == 'RUS'].copy()
df_Russia

In [None]:
plt.figure(figsize=(20,7))
sns.lineplot(data=df_Russia,x="date",y="daily_vaccinations",marker='p', markersize = 12);

plt.xticks(rotation=90)
plt.title("Russia's daily vaccinations");
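
The five country sections above repeat the same filter-and-plot steps. One way to factor them out is a small helper; `plot_daily_vaccinations` is a hypothetical name, and the sketch uses plain matplotlib on a toy frame with made-up values rather than the notebook's seaborn calls:

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_daily_vaccinations(data: pd.DataFrame, iso_code: str, label: str):
    """Line plot of daily vaccinations for one country (hypothetical helper)."""
    country = data[data["iso_code"] == iso_code].copy()
    # Parse and sort the dates so the line is drawn in chronological order.
    country["date"] = pd.to_datetime(country["date"])
    country = country.sort_values("date")

    fig, ax = plt.subplots(figsize=(20, 7))
    ax.plot(country["date"], country["daily_vaccinations"], marker="o")
    ax.set_title(f"{label}'s daily vaccinations")
    ax.set_xlabel("date")
    ax.set_ylabel("daily_vaccinations")
    ax.tick_params(axis="x", labelrotation=90)
    return ax


# Toy stand-in for the vaccination data (values are made up).
toy = pd.DataFrame({
    "iso_code": ["RUS", "RUS", "GBR"],
    "date": ["2021-01-02", "2021-01-01", "2021-01-01"],
    "daily_vaccinations": [20.0, 10.0, 5.0],
})
ax = plot_daily_vaccinations(toy, "RUS", "Russia")
```

Keeping the title inside the helper ties it to the plotted column, which avoids titles drifting out of sync with the data.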

# ASKING AND ANSWERING QUESTIONS

# **1. Which country is vaccinating its population the fastest?**

In [None]:
cols = ['country', 'total_vaccinations', 'iso_code', 'vaccines','total_vaccinations_per_hundred']
vacc_amount = df[cols].groupby('country').max().sort_values('total_vaccinations', ascending=False).dropna(subset=['total_vaccinations'])
vacc_amount = vacc_amount.iloc[:10]

vacc_amount = vacc_amount.sort_values('total_vaccinations_per_hundred', ascending=False)


plt.figure(figsize=(9, 12))
# Pass x and y as keywords; recent seaborn versions no longer accept them positionally.
sns.barplot(x=vacc_amount.total_vaccinations_per_hundred, y=vacc_amount.index, color = 'r')

plt.title('Total vaccinations per hundred people (top 10 countries by total vaccinations)')
#plt.xticks(rotation = 90)    #not needed
plt.xlabel('Total vaccinations per hundred people')
plt.ylabel('Countries')
plt.show();
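
The ranking above happens in two steps: the ten countries with the largest absolute totals are selected first, and only those are then ordered by vaccinations per hundred. A minimal sketch on a toy frame (country names and values are made up) shows the consequence: a small country with a high rate can fall out at the first step.

```python
import pandas as pd

# Toy stand-in: country B has the highest rate but the smallest absolute total.
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B", "C", "C"],
    "total_vaccinations": [100.0, 300.0, 50.0, 80.0, 500.0, 900.0],
    "total_vaccinations_per_hundred": [1.0, 3.0, 20.0, 32.0, 0.5, 0.9],
})

# Both columns are cumulative, so the maximum per country is the latest value.
latest = toy.groupby("country")[["total_vaccinations", "total_vaccinations_per_hundred"]].max()

# Step 1: keep the top countries by absolute volume (top 2 here, top 10 in the notebook).
top_by_volume = latest.nlargest(2, "total_vaccinations")

# Step 2: order the survivors by rate.
ranked = top_by_volume.sort_values("total_vaccinations_per_hundred", ascending=False)
print(ranked)
```

To rank every country by rate alone, the first step could be skipped and `nlargest` applied directly to `total_vaccinations_per_hundred`.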

# **2. Which country has administered the most vaccinations?**

In [None]:
cols = ['country', 'total_vaccinations', 'iso_code', 'vaccines']
vacc_amount = df[cols].groupby('country').max().sort_values('total_vaccinations', ascending=False).dropna(subset=['total_vaccinations'])
vacc_amount = vacc_amount.iloc[:10]

plt.figure(figsize=(16, 7))
plt.bar(vacc_amount.index, vacc_amount.total_vaccinations, color = 'c')

plt.title('Top 10 countries by total vaccinations')
plt.xticks(rotation = 90)
plt.ylabel('Total vaccinations (doses administered)')
plt.xlabel('Countries')
plt.show();

# **3. Which categories of vaccines are offered?**

In [None]:
plt.figure(figsize=(16,7))
grp = ['country', 'total_vaccinations', 'iso_code', 'vaccines']
vacc_no = df[grp].groupby('vaccines').max().sort_values('total_vaccinations', ascending=False).dropna(subset=['total_vaccinations'])


# Pass x and y as keywords; recent seaborn versions no longer accept them positionally.
sns.barplot(x=vacc_no.index, y=vacc_no.total_vaccinations, color ='m')

plt.title('Various categories of COVID-19 vaccines offered')
plt.xticks(rotation = 90)
plt.ylabel('Total vaccinations (maximum over countries using the scheme)')
plt.xlabel('Vaccines')
plt.show();
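
The chart above groups by the full `vaccines` string, so each combination of vaccines counts as its own category. A minimal sketch of splitting the combinations into individual vaccines, assuming the names are separated by a comma and a space; the toy frame reuses the schemes named in the conclusions below:

```python
import pandas as pd

# Toy stand-in: one row per country with its vaccination scheme.
toy = pd.DataFrame({
    "country": ["India", "United States", "Israel", "United Kingdom"],
    "vaccines": [
        "Covaxin, Oxford/AstraZeneca",
        "Moderna, Pfizer/BioNTech",
        "Moderna, Pfizer/BioNTech",
        "Oxford/AstraZeneca, Pfizer/BioNTech",
    ],
})

# Split each scheme into a list of names, then give every name its own row.
per_vaccine = (
    toy.assign(vaccine=toy["vaccines"].str.split(", "))
       .explode("vaccine")
)

# Count how many distinct countries use each individual vaccine.
countries_per_vaccine = (
    per_vaccine.groupby("vaccine")["country"].nunique().sort_values(ascending=False)
)
print(countries_per_vaccine)
```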

# **4. Which vaccines are used by various countries?**

In [None]:
# `vaccines` is a categorical column, so no continuous colour scale is passed.
fig = px.choropleth(df, locations="iso_code",
                    color="vaccines",
                    hover_name="country", # column to add to hover information
                   title= "Vaccines used by different countries")
fig.update_layout(showlegend=False)
fig.show()

# Inferences and Conclusion

From the above analysis and visualizations, we can conclude that:

1. The vaccination rate (total vaccinations per hundred people) is highest in Israel.
Conjecture: This is because of its small size (in terms of both area and population), a relatively young population, relatively warm weather in December 2020, a centralized national system of government, and well-developed infrastructure for implementing prompt responses to large-scale national emergencies.

2. The United States has administered the most vaccinations, around 60M, followed by China and the United Kingdom.
Conjecture: Since these are developed countries, the vaccine is more easily accessible to the public.

3. Moderna and Pfizer/BioNTech are the most widely used vaccines worldwide, possibly because few serious side effects have been reported to date. India uses Covaxin and Covishield to vaccinate its citizens.

4. Different countries use different vaccines: India uses Covaxin and Oxford/AstraZeneca; the USA uses Moderna and Pfizer/BioNTech; Israel uses Moderna and Pfizer/BioNTech; the UK uses Oxford/AstraZeneca and Pfizer/BioNTech.

From these inferences and conjectures, it can be concluded that people from all parts of the world are educating themselves and willingly taking the vaccines under their governments' free vaccination programs. Also, these vaccines have so far proved effective against COVID-19. If the rate of people taking the vaccine continues to grow, all countries could vaccinate their people before the end of this year.

# References and Future Work

As future work, this dataset could be combined with COVID-19 mortality data to compare the number of deaths before and after people were vaccinated, as a check on the effectiveness of each vaccine in each country.
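
A minimal sketch of how such a comparison could start, assuming a separate mortality table keyed by `iso_code` and `date`. The `deaths` frame, its `new_deaths` column and all values below are hypothetical; the vaccination dataset itself has no death counts.

```python
import pandas as pd

# Toy slice of the vaccination data (values are made up).
vaccinations = pd.DataFrame({
    "iso_code": ["ISR", "ISR"],
    "date": ["2021-01-01", "2021-02-01"],
    "total_vaccinations_per_hundred": [11.0, 56.0],
})

# Hypothetical mortality table; the column names are assumptions.
deaths = pd.DataFrame({
    "iso_code": ["ISR", "ISR"],
    "date": ["2021-01-01", "2021-02-01"],
    "new_deaths": [30, 20],
})

# Align the two tables on country and day; unmatched rows are dropped.
merged = vaccinations.merge(deaths, on=["iso_code", "date"], how="inner")
print(merged)
```

Any trend seen in such a merged table would only be a correlation; lockdowns, seasonality and testing rates would also need to be accounted for.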

Resources :

1) Dataset : https://www.kaggle.com/gpreda/covid-world-vaccination-progress

2) Jovian Course : https://www.zerotopandas.com

3) DateTime library documentation : https://docs.python.org/3/library/datetime.html

4) Matplotlib documentation : https://matplotlib.org/3.1.1/contents.html

5) Tutorialspoint : https://www.tutorialspoint.com/matplotlib/matplotlib_bar_plot.htm

6) Seaborn documentation : https://seaborn.pydata.org/introduction.html

7) Pandas documentation : https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.items.html

8) pie charts in matplotlib (w3schools) : https://www.w3schools.com/python/matplotlib_pie_charts.asp

### **The link to my medium article for detailed explanation: https://towardsdatascience.com/covid-19-vaccination-progress-analysis-around-the-world-736d7e57f198**

**Peace.**