---

<h1 style="text-align: center;font-size: 40px;">Vaccination Progress of Covid19</h1>

---

<center><img src="https://www.jhsph.edu/sebin/z/z/SARS-CoV-2-vaccine-820x440.jpg" width="500" height="400"></center>

---

#### The data contains the following information:

- **Country** - the country for which the vaccination information is provided;
- **Country ISO Code** - ISO code for the country;
- **Date** - date of the data entry; for some dates we have only the daily vaccinations, for others only the (cumulative) total;
- **Total number of vaccinations** - the absolute number of immunizations administered in the country;
- **Total number of people vaccinated** - depending on the immunization scheme, a person receives one or more (typically 2) doses; at any given moment, the number of vaccinations may therefore be larger than the number of people vaccinated;
- **Total number of people fully vaccinated** - the number of people who received the entire set of doses required by the immunization scheme (typically 2); at any given moment, some people will have received one dose and a smaller number will have received all doses in the scheme;
- **Daily vaccinations (raw)** - for a given data entry, the number of vaccinations for that date/country;
- **Daily vaccinations** - for a given data entry, the number of vaccinations for that date/country;
- **Total vaccinations per hundred** - ratio (in percent) between the number of vaccinations and the total population, up to that date, in the country;
- **Total number of people vaccinated per hundred** - ratio (in percent) between the population immunized and the total population, up to that date, in the country;
- **Total number of people fully vaccinated per hundred** - ratio (in percent) between the population fully immunized and the total population, up to that date, in the country;
- **Number of vaccinations per day** - number of daily vaccinations for that day and country;
- **Daily vaccinations per million** - ratio (in ppm) between the number of vaccinations and the total population, for the current date, in the country;
- **Vaccines used in the country** - the vaccines used in the country (up to date), given as a comma-separated list of names;
- **Source name** - source of the information (national authority, international organization, local organization etc.);
- **Source website** - website of the source of information.
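The "per hundred" and "per million" columns are plain ratios against the population. The sketch below shows the arithmetic on made-up numbers; the helper names `per_hundred` and `per_million` and the population figure are illustrative assumptions, not part of the dataset.

```python
def per_hundred(count: float, population: float) -> float:
    """Return `count` expressed per 100 inhabitants."""
    return count * 100 / population


def per_million(count: float, population: float) -> float:
    """Return `count` expressed per 1,000,000 inhabitants."""
    return count * 1_000_000 / population


# Hypothetical country: 10 million inhabitants, 1.5 million doses so far,
# 50,000 of them administered on the current date.
population = 10_000_000
total_per_hundred = per_hundred(1_500_000, population)
daily_per_million = per_million(50_000, population)
print(total_per_hundred, daily_per_million)
```

Because one person can receive more than one dose, the total vaccinations per hundred can exceed 100, while the people vaccinated per hundred cannot.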

In [None]:
!pip install pywaffle

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import plotly.graph_objects as go
import plotly.figure_factory as ff
from plotly.colors import n_colors
from plotly.subplots import make_subplots
from wordcloud import WordCloud, ImageColorGenerator
from pywaffle import Waffle

# Enable offline Plotly rendering inside the notebook.
init_notebook_mode(connected=True)
import warnings
warnings.filterwarnings("ignore")

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
df = pd.read_csv("/kaggle/input/covid-world-vaccination-progress/country_vaccinations.csv")
df.head()

In [None]:
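# One row per country: take the maximum of each metric over time.
# The columns are selected with a list, since tuple-style selection fails in recent pandas.
metrics = ['total_vaccinations', 'people_vaccinated', 'people_fully_vaccinated',
           'daily_vaccinations', 'total_vaccinations_per_hundred', 'people_vaccinated_per_hundred',
           'people_fully_vaccinated_per_hundred', 'daily_vaccinations_per_million']
new_df = df.groupby(['country', 'iso_code', 'vaccines'])[metrics].max().reset_index()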
new_df.head()
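Taking the maximum per country works as a summary because the cumulative columns only grow, so their maximum is the latest reported value. For the daily columns the maximum is the peak day rather than the most recent one. A minimal sketch on made-up rows (the frame `toy` is hypothetical):

```python
import pandas as pd

# Made-up rows: cumulative totals grow over time and some entries are missing.
toy = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B"],
    "date": ["2021-01-01", "2021-01-02", "2021-01-03", "2021-01-01", "2021-01-02"],
    "total_vaccinations": [100.0, None, 250.0, 40.0, None],
    "daily_vaccinations": [None, 80.0, 70.0, None, 30.0],
})

# max() skips missing values, so each country keeps its largest reported value per column.
summary = (
    toy.groupby("country")[["total_vaccinations", "daily_vaccinations"]]
    .max()
    .reset_index()
)
print(summary)
```

For country A the summary keeps the last cumulative total but the peak daily figure, which is not the figure from the latest date.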

> #### What is the proportion of the top 10 vaccines in the fight against Covid19?

In [None]:
new_df['vaccines'].nunique()

In [None]:
top10 = new_df['vaccines'].value_counts().nlargest(10)
top10

In [None]:
# Share (in percent) of countries using each of the 10 most common vaccine combinations.
data = dict(new_df['vaccines'].value_counts(normalize = True).nlargest(10)*100)

vaccine = ['Oxford/AstraZeneca', 'Moderna, Oxford/AstraZeneca, Pfizer/BioNTech',
       'Oxford/AstraZeneca, Pfizer/BioNTech',
       'Johnson&Johnson, Moderna, Oxford/AstraZeneca, Pfizer/BioNTech',
       'Pfizer/BioNTech', 'Sputnik V', 'Oxford/AstraZeneca, Sinopharm/Beijing',
       'Sinopharm/Beijing', 'Moderna, Pfizer/BioNTech',
       'Oxford/AstraZeneca, Pfizer/BioNTech, Sinovac'] 
            

# Waffle chart: each block stands for a share of the countries in the top 10 combinations.
fig = plt.figure(
    FigureClass=Waffle,
    rows=7,
    columns=12,
    values=data,
    title={'label': 'Proportion of Vaccines', 'loc': 'center', 'fontsize': 15},
    colors=("#FF0000", "#FF7F00", "#FFD400", "#FFFF00", "#BFFF00",
            "#6AFF00", "#00EAFF", "#0095FF", "#0040FF", "#AA00FF"),
    labels=[f"{k} ({v:.2f}%)" for k, v in data.items()],
    legend={'loc': 'lower left', 'bbox_to_anchor': (0, -0.3), 'ncol': 3, 'framealpha': 0},
    figsize=(12, 9)
)
plt.show()

Observation:
- Oxford/AstraZeneca on its own accounts for 29.08% of the countries in the data.
- 10.20% of the countries use a combination that includes Moderna and Oxford/AstraZeneca.
- Oxford/AstraZeneca is the most widely used vaccine.

- Earlier, **Pfizer/BioNTech** was the most widely used vaccine; it is now in **5th** place. **Oxford/AstraZeneca** was previously outside the top 3 and is now in **1st place**. This suggests that **Oxford/AstraZeneca** has become the **preferred** choice, although the number of countries using a vaccine does not by itself show how well it works.
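The percentages above are shares of countries, not shares of doses: they come from normalised value counts over one row per country. A minimal sketch on made-up vaccine schemes (the names `X` and `Y` are placeholders):

```python
import pandas as pd

# Made-up vaccine schemes for six hypothetical countries.
schemes = pd.Series(["X", "X", "X", "X, Y", "X, Y", "Y"], name="vaccines")

# normalize=True turns counts into fractions of all countries; * 100 gives percent.
share = schemes.value_counts(normalize=True).nlargest(2) * 100
print(share.round(2).to_dict())
```

Note that a combination such as `X, Y` is counted as its own category, separately from `X` alone.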

> #### What is the number of total vaccinations, people vaccinated, and daily vaccinations by country?

In [None]:
data = new_df[['country','total_vaccinations']].nlargest(25,'total_vaccinations')
fig = px.bar(data, x = 'country',y = 'total_vaccinations',title="Number of total vaccinations according to countries",)
fig.show()

In [None]:
data = new_df[['country','people_vaccinated']].nlargest(25,'people_vaccinated')
fig = px.bar(data, x = 'country',y = 'people_vaccinated',title="Number of people vaccinated according to countries",)
fig.show()

In [None]:
data = new_df[['country','daily_vaccinations']].nlargest(25,'daily_vaccinations')
fig = px.bar(data, x = 'country',y = 'daily_vaccinations',title="Number of daily vaccinations according to countries",)
fig.show()

In [None]:
data = new_df[['country','people_vaccinated_per_hundred']].nlargest(30,'people_vaccinated_per_hundred')
fig = px.bar(data, x = 'country',y = 'people_vaccinated_per_hundred',title="Highest Number of people vaccinated per hundred according to Countries",)
fig.show()

> #### Which vaccine is used by which Country?

Available values for the `projection` argument: 'equirectangular', 'mercator', 'orthographic', 'natural earth', 'kavrayskiy7', 'miller', 'robinson', 'eckert4', 'azimuthal equal area', 'azimuthal equidistant', 'conic equal area', 'conic conformal', 'conic equidistant', 'gnomonic', 'stereographic', 'mollweide', 'hammer', 'transverse mercator', 'albers usa', 'winkel tripel', 'aitoff', 'sinusoidal'.

In [None]:
fig = px.choropleth(new_df,locations = 'country',locationmode = 'country names',color = 'vaccines',
                   title = 'Vaccines used by each Country',hover_data= ['total_vaccinations'],
                   )#projection = "mercator"
fig.show()

In [None]:
vacc = new_df["vaccines"].unique()
for i in vacc:
    c = list(new_df[new_df["vaccines"] == i]['country'])
    print(f"Vaccine: {i}\nUsed countries: {c}")
    print('-'*70)

> #### Which Vaccine is Used the most?

In [None]:
vaccine = new_df["vaccines"].value_counts().reset_index()
vaccine.columns = ['Vaccines','Number of Country']
vaccine

In [None]:
fig = px.bar(vaccine.nlargest(20,"Number of Country"),x='Vaccines',y='Number of Country',
             title = 'Number of countries using each vaccine',height = 800)
fig.show()

- Oxford/AstraZeneca is used by the highest number of countries: 57.
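The counts above treat every combination of vaccines as one category. To count how many countries use each individual vaccine, the combinations can be split first. This is a sketch on made-up data, assuming the names in the `vaccines` column are separated by a comma and a space:

```python
import pandas as pd

# Made-up data in the same shape as the `vaccines` column: one string per country.
toy = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "vaccines": ["X", "X, Y", "Y, Z", "X, Y, Z"],
})

# Split each combination on ", " and give every individual vaccine its own row.
individual = toy.assign(vaccine=toy["vaccines"].str.split(", ")).explode("vaccine")

# Count the distinct countries that use each individual vaccine.
countries_per_vaccine = individual.groupby("vaccine")["country"].nunique()
print(countries_per_vaccine.sort_values(ascending=False))
```

On the real data the same steps could be applied to `new_df`, which would give a ranking of individual vaccines rather than of combinations.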

> #### Total Vaccinations per country grouped by Vaccines:

In [None]:
fig = px.treemap(new_df,names = 'country',values = 'total_vaccinations',path = ['vaccines','country'],
                 title="Total Vaccinations per country grouped by Vaccines",
                 color_discrete_sequence =px.colors.qualitative.Set1)
fig.show()

In [None]:
fig = go.Choropleth(locations = new_df["country"],locationmode = 'country names',z = new_df['total_vaccinations'],
                                         text= new_df['country'],colorbar = dict(title= "Total Vaccinations") )
data = [fig]

layout = go.Layout(title = 'Total Vaccinations per Country')
fig = dict(data = data,layout = layout)
iplot(fig)

> #### Number of People Vaccinated per country grouped by Vaccines:

In [None]:
fig = px.treemap(new_df,names = 'country',values = 'people_vaccinated',path = ['vaccines','country'],
                 title="People Vaccinated per country grouped by Vaccines",
                 color_discrete_sequence =px.colors.qualitative.Bold)
fig.show()

In [None]:
fig = go.Choropleth(locations = new_df["country"],locationmode = 'country names',z = new_df['people_vaccinated'],
                                         text= new_df['country'],colorbar = dict(title= "People Vaccinated") )
data = [fig]

layout = go.Layout(title = 'People Vaccinated per Country')
fig = dict(data = data,layout = layout)
iplot(fig)

> #### Daily Vaccinations per Country:

In [None]:
fig = go.Choropleth(locations = new_df["country"],locationmode = 'country names',z = new_df['daily_vaccinations'],
                                         text= new_df['country'],colorbar = dict(title= "Daily Vaccinations") )
data = [fig]

layout = go.Layout(title = 'Daily Vaccinations per Country')
fig = dict(data = data,layout = layout)
iplot(fig)

> #### Total Vaccinations per hundred according to each Country:

In [None]:
fig = go.Choropleth(locations = new_df["country"],locationmode = 'country names',z = new_df['total_vaccinations_per_hundred'],
                                         text= new_df['country'],
                    colorbar = dict(title= "Total Vaccinations per hundred"),reversescale =True,colorscale = 'viridis')
data = [fig]

layout = go.Layout(title = 'Total Vaccinations per hundred according to each Country')
fig = dict(data = data,layout = layout)
iplot(fig)

In [None]:
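# Drop rows with missing totals (NaN marker sizes raise an error in plotly) and colour the
# points by vaccine combination so that the chart matches its title.
fig = px.scatter(new_df.dropna(subset=['total_vaccinations', 'total_vaccinations_per_hundred']),
                 x = 'total_vaccinations',y='total_vaccinations_per_hundred',size='total_vaccinations',
                 color = 'vaccines',hover_name = 'country',size_max = 50,
                 title="Total vs Total vaccinations per hundred grouped by Vaccines",
                 color_discrete_sequence = px.colors.qualitative.Bold)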
fig.show()

Observation:

- Although the USA and China have administered the highest absolute numbers of vaccinations, these totals are small relative to their populations.
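The gap between absolute and per-capita rankings can be illustrated with a small sketch. The three countries and all figures below are invented for illustration and are not taken from the dataset.

```python
# Made-up figures: (country, total vaccinations, population).
rows = [
    ("Large", 30_000_000, 300_000_000),
    ("Medium", 9_000_000, 60_000_000),
    ("Small", 4_000_000, 8_000_000),
]


def rate_per_hundred(row):
    """Vaccinations per 100 inhabitants for one (name, total, population) row."""
    _, total, population = row
    return total * 100 / population


# Rank once by absolute totals and once by vaccinations per hundred.
by_total = [name for name, _, _ in sorted(rows, key=lambda r: r[1], reverse=True)]
by_rate = [name for name, _, _ in sorted(rows, key=rate_per_hundred, reverse=True)]
print(by_total)
print(by_rate)
```

In this toy example the order is exactly reversed: the country with the most doses has the lowest rate per hundred.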

> #### What is the trend of total vaccinations, daily vaccinations, and people vaccinated per hundred by country?

In [None]:
def plot_trend(dataframe,feature,title,countries):
    """Draw one line chart of `feature` over time per country, on an 8x4 grid."""
    plt.style.use('ggplot')
    plt.figure(figsize=(20,25))
    
    for i,country in enumerate(countries):
        plt.subplot(8,4,i+1)
        data = dataframe[dataframe['country'] == country]
        sns.lineplot(x=data['date'] ,y=data[feature],label = feature)
        plt.xlabel('')
        plt.tick_params(axis='x',which='both',top=False,bottom=False,labelbottom=False)
        plt.title(country)
        
    plt.suptitle(title,y=1.05)
    plt.tight_layout()
    plt.show()

In [None]:
country = ['Argentina', 'Austria', 'Belgium', 'Brazil','Canada','China','Czechia', 'Denmark', 'England','Finland', 'France',
       'Germany','India','Ireland', 'Israel', 'Italy', 'Kuwait','Mexico', 'Netherlands','Norway', 'Poland','Russia',
        'Saudi Arabia', 'Scotland','Singapore','Spain', 'Sweden', 'Switzerland', 'Turkey',
        'United Arab Emirates', 'United Kingdom', 'United States']
plot_trend(df,'total_vaccinations','Trend of total vaccination',country)
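Since some dates carry only daily figures, the cumulative series has gaps, and the `date` column is read as text. One possible preparation step, shown on made-up rows, is to parse the dates, sort them, and carry the last known total forward within each country. The column name `total_filled` is an assumption for this sketch.

```python
import pandas as pd

# Made-up rows, deliberately unsorted and with missing cumulative totals.
toy = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "date": ["2021-01-03", "2021-01-01", "2021-01-02",
             "2021-01-01", "2021-01-02", "2021-01-03"],
    "total_vaccinations": [250.0, 100.0, None, None, 40.0, None],
})

# Parse the dates and order the rows chronologically within each country.
toy["date"] = pd.to_datetime(toy["date"])
toy = toy.sort_values(["country", "date"]).reset_index(drop=True)

# Forward-fill per country so a gap never borrows a value from another country.
toy["total_filled"] = toy.groupby("country")["total_vaccinations"].ffill()
print(toy)
```

Entries before a country's first reported total stay missing, which avoids inventing values.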

In [None]:
def plot_trend2(dataframe,feature,title,countries):
    """Draw one line chart of `feature` over time per country, on a 5x5 grid with legends."""
    plt.style.use('ggplot')
    plt.figure(figsize=(20,18))
    
    for i,country in enumerate(countries):
        plt.subplot(5,5,i+1)
        data = dataframe[dataframe['country'] == country]
        sns.lineplot(x=data['date'] ,y=data[feature],label = feature)
        plt.xlabel('')
        plt.tick_params(axis='x',which='both',top=False,bottom=False,labelbottom=False)
        plt.legend(loc = 'upper left')
        plt.title(country)
        
    plt.suptitle(title,y=1.05)
    plt.tight_layout()
    plt.show()

In [None]:
country = ['Argentina', 'Austria', 'Belgium', 'Brazil','Canada','China','Czechia', 'Denmark', 'England','Finland', 'France',
       'Germany','Ireland', 'Israel', 'Italy','Mexico','Norway', 'Poland','Scotland','Singapore','Spain', 'Sweden','United Arab Emirates', 'United Kingdom', 
        'United States']
plot_trend2(df,'people_vaccinated_per_hundred','Trend of people vaccinated per hundred',country)
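Both trend functions hard-code their subplot grid (8x4 for 32 countries, 5x5 for 25). A small helper could derive the grid from the number of countries instead. `grid_shape` is a hypothetical name used only in this sketch.

```python
import math
from typing import Tuple


def grid_shape(n_plots: int, n_cols: int = 4) -> Tuple[int, int]:
    """Return (rows, columns) of the smallest grid with `n_cols` columns that fits `n_plots`."""
    if n_plots < 1 or n_cols < 1:
        raise ValueError("n_plots and n_cols must be positive")
    return math.ceil(n_plots / n_cols), n_cols


print(grid_shape(32, 4))
print(grid_shape(25, 5))
print(grid_shape(26, 5))
```

The returned pair could be passed to `plt.subplot(rows, cols, i + 1)` so that adding or removing a country does not require editing the grid by hand.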