# Worldwide Vaccination EDA for COVID-19

This project is about "Worldwide Vaccination Exploratory Data Analysis for COVID-19 using Python". 

![COVID-19 Vaccination image](https://images.news18.com/ibnlive/uploads/2021/01/1609996706_co-win-app.jpg?impolicy=website&width=510&height=356)

This notebook analyses COVID-19 world vaccination progress using attributes such as `country`, `total_vaccinations`, `people_vaccinated`, `daily_vaccinations`, `total_vaccinations_per_hundred`, `people_vaccinated_per_hundred`, `people_fully_vaccinated_per_hundred`, `vaccines` and more.

Libraries Used:

* pandas
* numpy
* matplotlib
* seaborn
* plotly
* scikit-learn

## Data Preparation and Cleaning





>
> - Load the dataset into a data frame using Pandas
> - Explore the number of rows & columns, ranges of values etc.
> - Handle missing, incorrect and invalid data

In [None]:
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import plotly.graph_objects as go
import plotly.express as px
import plotly.offline as py
py.init_notebook_mode(connected=True)
import warnings
warnings.filterwarnings("ignore")

In [None]:
vaccinations_df = pd.read_csv('../input/covid-world-vaccination-progress/country_vaccinations.csv')

In [None]:
vaccinations_df

In [None]:
vaccinations_df.info()

In [None]:
vaccinations_df.columns

In [None]:
vaccinations_df.shape

In [None]:
vaccinations_df.describe()

In [None]:
vaccinations_df.isnull().sum()

In [None]:
vaccinations_df.fillna(value=0, inplace=True)
date = vaccinations_df.date.str.split('-', expand=True)
date
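Filling every missing value with 0 is simple, but it also pulls column minimums down to 0 and makes cumulative columns such as `total_vaccinations` appear to drop on days without a report. A minimal sketch of a per-column alternative follows. The sample rows are invented for illustration, and treating a missing daily count as zero is an assumption, not something the dataset documents.

```python
import pandas as pd

# Invented sample rows; only the column names mirror the dataset.
sample = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B"],
    "date": ["2021-01-01", "2021-01-02", "2021-01-03", "2021-01-01", "2021-01-02"],
    "total_vaccinations": [100.0, None, 300.0, None, 50.0],
    "daily_vaccinations": [None, 100.0, 100.0, None, 50.0],
})

# Cumulative totals: carry the last reported value forward within each country,
# then treat anything before a country's first report as zero.
sample["total_vaccinations"] = (
    sample.groupby("country")["total_vaccinations"].ffill().fillna(0)
)

# Daily counts: a missing value is treated as "no doses reported" (an assumption).
sample["daily_vaccinations"] = sample["daily_vaccinations"].fillna(0)

print(sample)
```

With this approach the cumulative column never decreases within a country, which keeps later `max()` and line-plot results easier to interpret.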

In [None]:
vaccinations_df['year'] = date[0]
vaccinations_df['month'] = date[1]
vaccinations_df['day'] = date[2]

vaccinations_df.year = pd.to_numeric(vaccinations_df.year)
vaccinations_df.month = pd.to_numeric(vaccinations_df.month)
vaccinations_df.day = pd.to_numeric(vaccinations_df.day)

vaccinations_df.date = pd.to_datetime(vaccinations_df.date)

vaccinations_df.head()
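Once the `date` column is parsed with `pd.to_datetime`, the same year, month and day columns can also be read from the `.dt` accessor, which avoids splitting strings and converting them to numbers. A short sketch on invented sample dates:

```python
import pandas as pd

# Invented sample dates in the same YYYY-MM-DD format as the dataset.
sample = pd.DataFrame({"date": ["2021-01-15", "2021-02-03"]})

# Parse once, then derive the calendar parts from the datetime column.
sample["date"] = pd.to_datetime(sample["date"])
sample["year"] = sample["date"].dt.year
sample["month"] = sample["date"].dt.month
sample["day"] = sample["date"].dt.day

print(sample)
```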

In [None]:
vaccinations_df.info()

## Exploratory Data Analysis and Visualizations!




Let's begin by importing `matplotlib.pyplot` and `seaborn`.

In [None]:
import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

sns.set_style('darkgrid')
matplotlib.rcParams['font.size'] = 14
matplotlib.rcParams['figure.figsize'] = (9, 5)
matplotlib.rcParams['figure.facecolor'] = '#00000000'

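Explore the mean, min and max of the numeric columns.

In [None]:
# numeric_only=True skips text columns, which raise a TypeError in pandas 2.0 and later
vaccinations_df.mean(numeric_only=True)

In [None]:
vaccinations_df.min(numeric_only=True)

In [None]:
vaccinations_df.max(numeric_only=True)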
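The three statistics can also be collected into a single table, which makes them easier to compare side by side. The sketch below uses invented sample rows; `select_dtypes` keeps only the numeric columns before aggregating.

```python
import pandas as pd

# Invented sample rows; only the column names mirror the dataset.
sample = pd.DataFrame({
    "country": ["A", "B", "C"],
    "daily_vaccinations": [10.0, 20.0, 60.0],
    "total_vaccinations": [100.0, 400.0, 700.0],
})

# Keep numeric columns only, then compute all three statistics in one call.
summary = sample.select_dtypes(include="number").agg(["mean", "min", "max"])
print(summary)
```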

Explore the country column.

In [None]:
vaccinations_df.country.value_counts()

In [None]:
vaccinations_df.country

In [None]:
vaccinations_df.country.nunique()

Explore the min and max of fully vaccinated people.

In [None]:
vaccinations_df.people_fully_vaccinated.min()

In [None]:
vaccinations_df.people_fully_vaccinated.max()

Explore the min and max date.

In [None]:
vaccinations_df.date.min()

In [None]:
vaccinations_df.date.max()

Explore how the number of daily vaccinations changes over time.

In [None]:
plt.figure(figsize=(16,8))
sns.lineplot(x=vaccinations_df.date, y=vaccinations_df.daily_vaccinations)
plt.title('Daily vaccinations over time')
plt.show()
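Because each date appears once per country, `sns.lineplot` draws the mean across countries for each date (with a confidence band) rather than a worldwide total. If a worldwide total is wanted, the rows can be summed per date first. A sketch on invented sample rows; if the data also contains rows for sub-national regions alongside their parent country, those would be counted twice and should be filtered out beforehand.

```python
import pandas as pd

# Invented sample rows: two countries reporting on two dates.
sample = pd.DataFrame({
    "country": ["A", "B", "A", "B"],
    "date": pd.to_datetime(["2021-01-01", "2021-01-01", "2021-01-02", "2021-01-02"]),
    "daily_vaccinations": [100, 50, 200, 150],
})

# Sum over all countries for each date to get one value per day.
worldwide_daily = sample.groupby("date")["daily_vaccinations"].sum()
print(worldwide_daily)
```

The resulting series can be passed to `sns.lineplot` or plotted directly with `worldwide_daily.plot()`.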

Explore from which date vaccination picked up speed in the five countries with the most total vaccinations.

In [None]:
countries = vaccinations_df.groupby('country')['total_vaccinations'].max().sort_values(ascending=False)[:5].index

# DataFrame.append was removed in pandas 2.0; select the top countries' rows with a boolean mask instead
top_countries = vaccinations_df[vaccinations_df['country'].isin(countries)]

In [None]:
plt.figure(figsize=(15,8))
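# seaborn 0.12+ requires x and y as keyword arguments; errorbar=None replaces the deprecated ci argument
# (on older seaborn versions, use ci=None instead)
sns.lineplot(x=top_countries['date'],
             y=top_countries['daily_vaccinations_per_million'],
             hue=top_countries['country'], errorbar=None)
plt.title('Vaccination procedure progress');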
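The line chart gives a visual impression of when vaccination picked up speed. One way to turn that into a date is to find the first day on which each country exceeded a chosen rate. The sketch below uses invented sample rows, and the `THRESHOLD` value is a hypothetical cut-off, not one taken from the analysis.

```python
import pandas as pd

# Invented sample rows for two countries over three days.
sample = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"] * 2),
    "daily_vaccinations_per_million": [200, 1200, 3000, 100, 400, 1500],
})

THRESHOLD = 1000  # hypothetical cut-off for "rapid" vaccination

# Keep the days at or above the threshold, then take each country's earliest one.
above = sample[sample["daily_vaccinations_per_million"] >= THRESHOLD]
first_rapid_date = above.groupby("country")["date"].min()
print(first_rapid_date)
```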


## Drawing Insights and Conclusions



In [None]:
fully_vaccinated = vaccinations_df.groupby("country")["people_fully_vaccinated"].max().sort_values(ascending= False).head(25)

In [None]:
fully_vaccinated.reset_index()

In [None]:
plt.figure(figsize=(16,10))
ax = sns.barplot(x=fully_vaccinated, y=fully_vaccinated.index)
plt.xlabel("Fully Vaccinated")
plt.ylabel("Country");
plt.title('Countries with the largest number of fully vaccinated people');

for patch in ax.patches:
    width = patch.get_width()
    height = patch.get_height()
    x = patch.get_x()
    y = patch.get_y()
    
    plt.text(width + x, height + y, '{:.1f} '.format(width))

## From the above data we can conclude that China has the largest number of fully vaccinated citizens, followed by the USA and India.

In [None]:
daily_vaccinations_per_million = vaccinations_df.groupby("country")["daily_vaccinations_per_million"].max().sort_values(ascending= False).head(15)

In [None]:
daily_vaccinations_per_million.reset_index()

In [None]:
plt.figure(figsize=(12,8))
ax = sns.barplot(x=daily_vaccinations_per_million, y=daily_vaccinations_per_million.index )
plt.xlabel("daily vaccinations per million")
plt.ylabel("Country")
plt.title("Daily COVID-19 vaccine doses administered per million people");

for patch in ax.patches:
    width = patch.get_width()
    height = patch.get_height()
    x = patch.get_x()
    y = patch.get_y()
    
    plt.text(width + x, height + y, '{:.1f} '.format(width))

### The chart above shows each country's peak daily COVID-19 vaccine doses administered per million people.
### The top 3 countries are as follows:
* Bhutan
* Falkland Islands
* Cook Islands
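The loop over `ax.patches` that labels each bar is repeated for every bar chart in this notebook. Matplotlib 3.4 and later offers `Axes.bar_label`, which does the same job in one call. A sketch with plain matplotlib and invented values:

```python
import matplotlib.pyplot as plt

# Invented values for illustration.
countries = ["A", "B"]
values = [3.0, 1.5]

fig, ax = plt.subplots(figsize=(6, 3))
bars = ax.barh(countries, values)

# bar_label places one text label at the end of each bar in the container.
labels = ax.bar_label(bars, fmt="%.1f", padding=3)
ax.set_xlabel("value")
```

With a seaborn bar plot, the same call can usually be applied to each container in `ax.containers`.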

In [None]:
india_df = vaccinations_df[vaccinations_df['country'] == 'India']
india_df

In [None]:
india_df.info()

In [None]:
india_df.daily_vaccinations_raw.sum()

In [None]:
plt.figure(figsize=(16,16))
sns.lineplot(x=india_df.date, y=india_df.daily_vaccinations_raw)
plt.xlabel("Date")
plt.ylabel("Daily_Vaccination")
plt.title('Daily Vaccination count in India');

# The above illustration displays the daily vaccination count in India.

In [None]:
total_vaccinated_in = india_df.people_fully_vaccinated.max()/1000000

In [None]:
print("Total fully vaccinated people in India: {0:.2f}M".format(total_vaccinated_in))

## The number of fully vaccinated citizens in India is 65.53M as per the current data, although the total is increasing daily.

In [None]:
population_country=vaccinations_df.groupby('country')['total_vaccinations_per_hundred'].max().sort_values(ascending=False).head(15)

In [None]:
population_country.reset_index()

In [None]:
plt.figure(figsize= (15, 8))
ax = sns.barplot(x=population_country, y=population_country.index)
plt.title('Total vaccinations per hundred people')
plt.xlabel('Total vaccinations per hundred people')
plt.ylabel('Country')

for patch in ax.patches:
    width = patch.get_width()
    height = patch.get_height()
    x = patch.get_x()
    y = patch.get_y()
    
    # Doses per hundred people can exceed 100, so no percent sign is shown
    plt.text(width + x, height + y, '{:.1f}'.format(width))

## Gibraltar ranks first in total vaccinations per hundred people, followed by the UAE and Malta. Because this metric counts doses, it is not the share of the population that is fully vaccinated.
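`total_vaccinations_per_hundred` counts doses, so a country where most people have received two doses can exceed 100. To rank by the share of people who are fully vaccinated, the `people_fully_vaccinated_per_hundred` column listed in the introduction may be the better fit. A sketch on invented sample rows, chosen so that the two rankings differ:

```python
import pandas as pd

# Invented sample rows; only the column names mirror the dataset.
sample = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "total_vaccinations_per_hundred": [90.0, 150.0, 80.0, 120.0],
    "people_fully_vaccinated_per_hundred": [40.0, 60.0, 45.0, 65.0],
})

# Latest (maximum) value per country for each metric, highest first.
by_doses = (
    sample.groupby("country")["total_vaccinations_per_hundred"]
    .max()
    .sort_values(ascending=False)
)
by_fully_vaccinated = (
    sample.groupby("country")["people_fully_vaccinated_per_hundred"]
    .max()
    .sort_values(ascending=False)
)

print(by_doses)
print(by_fully_vaccinated)
```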

In [None]:
df = pd.read_csv('../input/covid-world-vaccination-progress/country_vaccinations.csv')
df[df['iso_code'].isnull()]
df = df.drop('daily_vaccinations_raw', axis=1)

In [None]:
df['date'] = pd.to_datetime(df['date'])
df = df.sort_values('date', ascending=True)
df['date'] = df['date'].dt.strftime('%Y-%m-%d')

In [None]:
uniques = df['date'].unique()

In [None]:
df['date'] = pd.to_datetime(df['date'])
df = df.sort_values('date', ascending=True)

df['date'] = df['date'].dt.strftime('%m-%d-%Y')

In [None]:
df_copy = df.copy()

In [None]:
fig = px.choropleth(
    df_copy,                         
    locations="iso_code",         
    color="total_vaccinations",              
    hover_name="country",           
    animation_frame="date",
    color_continuous_scale= 'viridis',
    projection="natural earth",      
    range_color=[0,5000000],
    title='<span style="font-size:36px; font-family:Times New Roman">Number of vaccinations per country</span>',
)      
fig.show() 

# The above Choropleth shows **'Total Number of Vaccinations per Country'** on a **'Daily'** basis

In [None]:
fig = px.choropleth(
    df_copy,                   
    locations="iso_code",         
    color="daily_vaccinations",    
    hover_name="country",          
    animation_frame="date",
    color_continuous_scale= 'viridis',
    projection="natural earth",   
    range_color=[0,5000000],
    title='<span style="font-size:36px; font-family:Times New Roman">Daily vaccinations per country</span>',
) 
fig.show()

# The above Choropleth shows **'Number of Vaccinations administered per Country'** on a **'Daily'** basis

In [None]:
dff = df.copy()
dff = dff.dropna(subset=['vaccines'])
dff = dff.groupby(['iso_code', 'vaccines']).max()
dff = dff.reset_index()
dff['vaccine_split'] = dff['vaccines'].apply(lambda x: [w.strip() for w in x.split(',')])

In [None]:
from sklearn.preprocessing import MultiLabelBinarizer
one_hot = MultiLabelBinarizer()
data = one_hot.fit_transform(dff['vaccine_split'])
vac_names = one_hot.classes_
vac_countries=dff['country']

final_vac_df = pd.DataFrame(data=data, columns=vac_names, index=vac_countries)
final_vac_df = final_vac_df.reset_index()

In [None]:
df_country = final_vac_df[vac_names].sum(axis=0).sort_values()
# Grey for the four least-used vaccines, blue for the rest
n_vaccines = len(df_country)
colors = ['gray'] * min(4, n_vaccines) + ['blue'] * max(n_vaccines - 4, 0)
fig = go.Figure(go.Bar(
                x = df_country.values,
                y = df_country.index,
                orientation = 'h'))
fig.update_traces(
                marker_color = colors,
                marker_line_color = 'black',
                marker_line_width = 1.5,
                opacity = 0.6)
fig.update_layout(
    title='Top vaccines distributed')

# The above bar graph shows how many countries use each vaccine, sorted by count; Oxford/AstraZeneca is used in the most countries.
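The per-vaccine counts can also be produced with pandas alone, without the scikit-learn one-hot step, by splitting the comma-separated `vaccines` string and counting the pieces. A sketch on invented sample rows:

```python
import pandas as pd

# Invented sample rows; the vaccine strings follow the dataset's comma-separated format.
sample = pd.DataFrame({
    "country": ["A", "A", "B", "C"],
    "vaccines": [
        "Oxford/AstraZeneca, Pfizer/BioNTech",
        "Oxford/AstraZeneca, Pfizer/BioNTech",
        "Oxford/AstraZeneca",
        "Sputnik V, Oxford/AstraZeneca",
    ],
})

# One row per country and vaccine combination, so repeated dates are not counted twice.
per_country = sample.drop_duplicates(subset=["country", "vaccines"])

# Split the string, put each vaccine on its own row, trim spaces, and count.
vaccine_counts = (
    per_country["vaccines"]
    .str.split(",")
    .explode()
    .str.strip()
    .value_counts()
)
print(vaccine_counts)
```

Each count is the number of countries whose `vaccines` entry includes that vaccine, which is the same quantity the bar chart displays.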

# Conclusion

## This notebook performed exploratory data analysis on the COVID-19 vaccination dataset. In the future, the project can be extended with predictive analysis and time series forecasting.