# COVID-19 World Vaccination Progress

### Task Details

**Questions to answer:**

1. Which vaccines are used, and in which countries?
1. Which country has vaccinated the most people?
1. Which country has vaccinated the largest percentage of its population?

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
!pip install chart-studio

In [None]:
# importing all the libraries to be used
from matplotlib import pyplot as plt
import seaborn as sns
import chart_studio.plotly as py
import cufflinks as cf
import plotly.graph_objs as go
import plotly.express as px
%matplotlib inline

from plotly.offline import download_plotlyjs, plot, init_notebook_mode, iplot
init_notebook_mode(connected=True)  # initiate notebook for offline plotting
cf.go_offline()

**Importing the data file (CSV)**

In [None]:
covid_df = pd.read_csv("/kaggle/input/covid-world-vaccination-progress/country_vaccinations.csv")
covid_df.head(10)

**Shape of DataFrame**

In [None]:
covid_df.shape

**Data cleaning: finding missing (null) values and handling them**

In [None]:
covid_df.isnull().sum()
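
Before deciding what to drop, it can help to see missing values as a share of rows rather than as raw counts. A minimal sketch on a small made-up DataFrame (the column names mirror the dataset; the values are hypothetical):

```python
import numpy as np
import pandas as pd

# Toy stand-in for covid_df (hypothetical values, not the real dataset)
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "total_vaccinations": [100.0, np.nan, 50.0, 80.0],
    "people_vaccinated": [90.0, np.nan, np.nan, 70.0],
})

# Percentage of missing values in each column
null_pct = toy.isnull().mean().mul(100).round(1)
print(null_pct)
```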

**All the metrics we need are based on total_vaccinations and people_vaccinated, so let's drop the rows where both of these are null**

In [None]:
covid_df.dropna(subset=["total_vaccinations", "people_vaccinated"], how="all", inplace=True)
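
`how="all"` drops a row only when *both* listed columns are null; a row with just one of them missing is kept, so the re-check that follows can still report some nulls in these columns. The toy comparison below, with made-up values, contrasts it with `how="any"`:

```python
import numpy as np
import pandas as pd

# Hypothetical rows: each combination of present/missing values
toy = pd.DataFrame({
    "total_vaccinations": [100.0, np.nan, np.nan, 80.0],
    "people_vaccinated": [90.0, 60.0, np.nan, np.nan],
})
subset = ["total_vaccinations", "people_vaccinated"]

# how="all": drop a row only if every column in `subset` is null
kept_all = toy.dropna(subset=subset, how="all")
# how="any": drop a row if at least one column in `subset` is null
kept_any = toy.dropna(subset=subset, how="any")

print(len(kept_all), len(kept_any))  # 3 1
```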

**Validate: re-check the null counts after dropping**

In [None]:
covid_df.isnull().sum()

**Task1: What vaccines are used and in which countries?**

**The columns we need are country, iso_code and vaccines**

In [None]:
vaccinesbycountry_df = covid_df[['country', 'iso_code','vaccines']]
vaccinesbycountry_df.head()

**Grouping the data by country**

In [None]:
vaccinesbycountry_grouped = vaccinesbycountry_df.groupby("country").max()
vaccinesbycountry_grouped
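
`groupby(...).max()` on a text column such as `vaccines` returns the alphabetically largest string, not the most recent or most complete one. If the vaccine list for a country can change between rows (an assumption; it may well be constant in this dataset), taking the last row after sorting by `date` is a safer choice. Toy data with hypothetical values:

```python
import pandas as pd

# Hypothetical rows: country A adds Moderna to its vaccine list later on
toy = pd.DataFrame({
    "country": ["A", "A"],
    "date": pd.to_datetime(["2021-01-01", "2021-02-01"]),
    "vaccines": ["Pfizer/BioNTech", "Moderna, Pfizer/BioNTech"],
})

# max() compares strings alphabetically ("P" > "M"), so it returns the older, shorter list
by_max = toy.groupby("country")["vaccines"].max()

# Sorting by date and taking the last row per country returns the most recent list
by_latest = toy.sort_values("date").groupby("country")["vaccines"].last()

print(by_max["A"])     # Pfizer/BioNTech
print(by_latest["A"])  # Moderna, Pfizer/BioNTech
```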

**Let's plot this on a world map. Hover over and zoom in on a country to see which vaccines it uses**

In [None]:
fig = px.choropleth(vaccinesbycountry_grouped, locations="iso_code", projection="natural earth",
                    color=vaccinesbycountry_grouped.index, hover_name="vaccines")

fig.update_layout(title="Vaccines used by each Country")

iplot(fig)
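
To answer the first question directly, the comma-separated `vaccines` string can be split so that each vaccine maps to the countries using it. This sketch assumes vaccine names are separated by `", "` and uses a small hypothetical table in place of `vaccinesbycountry_grouped`:

```python
import pandas as pd

# Hypothetical per-country table shaped like vaccinesbycountry_grouped
grouped = pd.DataFrame(
    {"vaccines": ["Pfizer/BioNTech, Sinovac", "Pfizer/BioNTech", "Sputnik V"]},
    index=pd.Index(["Country A", "Country B", "Country C"], name="country"),
)

# One row per (country, vaccine) pair; assumes ", " as the separator
pairs = (
    grouped["vaccines"]
    .str.split(", ")
    .explode()
    .rename("vaccine")
    .reset_index()
)

# Sorted list of countries using each vaccine
countries_per_vaccine = pairs.groupby("vaccine")["country"].apply(sorted)
print(countries_per_vaccine)
```

`pairs["vaccine"].value_counts()` would give the number of countries per vaccine from the same table.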

**Task 2: Which country has vaccinated the most people?**

**We need to find which country has administered the most vaccinations in total, regardless of its population size.
So we work with the total_vaccinations column.**
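
Because `total_vaccinations` is a running (cumulative) count, each country's total is its latest value, not the sum of its daily rows. Summing also favours countries that report more often. A sketch with made-up numbers:

```python
import pandas as pd

# Hypothetical rows: A reports daily, B reports once; values are cumulative
toy = pd.DataFrame({
    "country": ["A", "A", "A", "B"],
    "date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03", "2021-01-03"]),
    "total_vaccinations": [100, 250, 400, 500],
})

# Summing a cumulative column counts the same doses again on every reporting day
summed = toy.groupby("country")["total_vaccinations"].sum()

# The latest cumulative value (here also the maximum) is the actual total so far
latest = toy.groupby("country")["total_vaccinations"].max()

print(summed.to_dict())  # {'A': 750, 'B': 500}
print(latest.to_dict())  # {'A': 400, 'B': 500}
```

Here summing ranks A first only because it has three rows; the latest values rank B first.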

In [None]:
#list of all the available countries
countries = covid_df.country.unique()
print(countries)
print(len(countries))

In [None]:
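# latest cumulative total_vaccinations per country
# (the column is a running count, so summing the daily rows would overcount)
total_vaccinations = covid_df.groupby(by="country")[["total_vaccinations"]].max()
total_vaccinations.head()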

In [None]:
# sort countries by total vaccinations, highest first
total_vaccinations.sort_values(by="total_vaccinations", ascending=False, inplace=True)
total_vaccinations.head()

**Countries with the highest number of total vaccinations**

In [None]:
# top 10 countries, re-sorted ascending so the largest bar is drawn at the top
total_vacc_sorted = total_vaccinations.head(10).sort_values(by="total_vaccinations")
trace = go.Bar(x=total_vacc_sorted["total_vaccinations"], y=total_vacc_sorted.index,
               orientation="h",
               marker=dict(
                   opacity=0.8,
                   color=np.arange(len(total_vacc_sorted))  # one colour value per bar
               ))

fig = go.Figure(data=[trace])
fig.update_layout(title="Top 10 Countries with Maximum Vaccinations")
fig.update_xaxes(title="Total vaccinations")
fig.update_yaxes(title="Country")

iplot(fig)
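
The top-10 slice is re-sorted ascending before plotting because, by default, Plotly draws the first category of a categorical y-axis at the bottom, so ascending order puts the largest bar on top. `nlargest` is a shorthand for the sort-then-slice step; the totals below are hypothetical:

```python
import pandas as pd

# Hypothetical per-country totals
totals = pd.DataFrame(
    {"total_vaccinations": [50, 400, 120, 300, 10]},
    index=["A", "B", "C", "D", "E"],
)

# Top 3, largest first: same result as sort_values(ascending=False).head(3)
top3 = totals.nlargest(3, "total_vaccinations")

# Ascending order for a horizontal bar chart, so the largest bar ends up on top
top3_for_barh = top3.sort_values("total_vaccinations")
print(list(top3_for_barh.index))  # ['C', 'D', 'B']
```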

**The bar plot shows that the USA, the UK and China are among the countries with the highest number of vaccine doses administered.
That does not mean a high share of their people have been vaccinated, because these countries also have large populations.**

This is examined below in Task 3.
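
Normalising by population is what makes countries of different sizes comparable: a per-hundred figure is total doses divided by population, times 100. The numbers below are hypothetical and only illustrate the arithmetic. The metric counts doses, not people, so with two-dose vaccines it can exceed 100.

```python
import pandas as pd

# Hypothetical figures, only to illustrate the normalisation
toy = pd.DataFrame(
    {"total_vaccinations": [30_000_000, 4_000_000],
     "population": [330_000_000, 9_000_000]},
    index=["Large country", "Small country"],
)

# Doses administered per 100 inhabitants
toy["per_hundred"] = (toy["total_vaccinations"] / toy["population"] * 100).round(2)
print(toy["per_hundred"])  # Large country 9.09, Small country 44.44
```

The smaller country has far fewer doses in absolute terms but a much higher coverage per hundred people.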

**Task 3: Which country has vaccinated the largest percentage of its population?**

In [None]:
people_vacc_df = covid_df[['country', 'total_vaccinations_per_hundred']]
people_vacc_df.head()

In [None]:
# latest (maximum) total_vaccinations_per_hundred for each country; the column is
# cumulative, so a mean over all dates would mix in the low values from early in each campaign
# raw total_vaccinations is not a good measure here: the US and UK have large totals but also large populations
people_vacc_grouped = people_vacc_df.groupby("country").max()
people_vacc_grouped.head()
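
Since `total_vaccinations_per_hundred` is also cumulative, averaging it over all dates mixes in the low values from the start of a campaign, which penalises countries with longer reporting histories. The latest (maximum) value reflects current coverage. A hypothetical series:

```python
import pandas as pd

# Hypothetical cumulative per-hundred values: "Early" has a longer history
toy = pd.DataFrame({
    "country": ["Early", "Early", "Early", "Late"],
    "total_vaccinations_per_hundred": [1.0, 5.0, 30.0, 20.0],
})

# Mean over dates: Early averages to 12.0 and falls behind Late
by_mean = toy.groupby("country")["total_vaccinations_per_hundred"].mean()

# Latest (maximum) value: Early has actually reached 30 per hundred
by_latest = toy.groupby("country")["total_vaccinations_per_hundred"].max()

print(by_mean.idxmax(), by_latest.idxmax())  # Late Early
```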

**Sorting the data by total_vaccinations_per_hundred, from highest to lowest**

In [None]:
people_vacc_grouped.sort_values(by="total_vaccinations_per_hundred", ascending=False, inplace=True)
people_vacc_grouped.head()

**We can observe that countries such as Israel and the UAE have a higher number of total vaccinations per hundred people than the USA, the UK and China**

In [None]:
# top 25 countries by total vaccinations per hundred people
top25 = people_vacc_grouped.head(25)  # keeps country labels and values aligned
trace = go.Bar(x=top25.index, y=top25["total_vaccinations_per_hundred"],
               marker=dict(
                   color=np.arange(len(top25))  # one colour value per bar
               ))

fig = go.Figure(data=[trace])
fig.update_layout(title="Top 25 countries by total vaccinations per hundred people")
fig.update_xaxes(title="Country")
fig.update_yaxes(title="Vaccinations per hundred people")

iplot(fig)
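
For a top-N chart, `x` and `y` need to come from the same slice: slicing only the index while passing the full value column produces arrays of different lengths. `head(n)` keeps labels and values together. Toy data with hypothetical values:

```python
import pandas as pd

# Hypothetical ranking, already sorted from highest to lowest
ranked = pd.DataFrame(
    {"total_vaccinations_per_hundred": [40.0, 30.0, 20.0, 10.0]},
    index=["A", "B", "C", "D"],
)

# Slicing only the index leaves the value column at full length
mismatched_x = ranked.index[:2]
mismatched_y = ranked["total_vaccinations_per_hundred"]

# head(n) slices labels and values together
top2 = ranked.head(2)
x, y = top2.index, top2["total_vaccinations_per_hundred"]

print(len(mismatched_x), len(mismatched_y))  # 2 4
print(list(x), list(y))  # ['A', 'B'] [40.0, 30.0]
```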