# COVID19 Analysis on Vaccination in the World

### Exploratory Data Analysis (EDA) on COVID19 Vaccination information with country level details

In [None]:
from IPython.display import Image

Image("../input/covid19-vaccine-image/Covid19_Vaccine.png")

We will try to answer some key questions as part of this EDA.
The intent is to see how much insight EDA can provide and which questions it can answer along the way.

There are 2 key parts:
* Part A: We use regular Python libraries to analyse the data, answer a few questions, and build visualizations
* Part B: We use the "Pandas_Profiling" library and see how it automates much of this analysis with far less effort

Let's get started.

# Part A:

## 1. Import required libraries

In [None]:
import pandas as pd
import pandas_profiling as pp
import numpy as np
import os
import time
import seaborn as sns
import matplotlib.pyplot as plt
import scipy as sp
import re
import warnings

warnings.filterwarnings("ignore")

#from IPython.display import Image

## 2. Datasource information 

In [None]:
df = pd.read_csv("../input/covid-world-vaccination-progress/country_vaccinations.csv")

df.head()

In [None]:
df.describe()

In [None]:
df.info()

In [None]:
df.shape

In [None]:
df.dtypes

In [None]:
df.isnull().sum()

## 3. Data Preparation

In [None]:
# Convert date column datatype to "date"
df["date"] = pd.to_datetime(df.date)

In [None]:
# Store the last reported cumulative total for each country.
# tail(1) keeps only each country's final row, so every other row gets NaN in this column.
df["total_vaccinations_count"] = df.groupby("country").total_vaccinations.tail(1)
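The `tail(1)` approach depends on the final row of each country holding a value. As a minimal sketch on hypothetical toy data (the frame `toy` and the names `via_tail` and `latest_total` are illustrative, not part of the dataset), the following compares it with taking the per-country maximum, which assumes `total_vaccinations` is a cumulative figure that never decreases:

```python
import pandas as pd

# Hypothetical toy data standing in for country_vaccinations.csv
toy = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B"],
    "date": pd.to_datetime(
        ["2021-01-01", "2021-01-02", "2021-01-03", "2021-01-01", "2021-01-02"]
    ),
    "total_vaccinations": [10.0, 25.0, None, 5.0, 8.0],
})

# tail(1) keeps only the last row per country, so a missing final value stays missing
via_tail = toy.groupby("country").total_vaccinations.tail(1)

# max() skips NaN and returns the highest cumulative total reported per country
# (assumption: the column is cumulative, so the maximum is the latest known total)
latest_total = toy.groupby("country")["total_vaccinations"].max()

print(via_tail)
print(latest_total)
```

With real data the two approaches agree whenever the last row of each country is populated; they differ only where the most recent report is missing.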

Let's now try to answer a few questions.
## Question1: What are the top 15 countries based on total vaccinations administered?

In [None]:
top_country_by_vaccine = df.groupby("country")["total_vaccinations_count"].mean().sort_values(ascending=False)

top_country_by_vaccine.head(15)

In [None]:
# Visualization
sns.set_style("darkgrid")
plt.figure(figsize=(9,9))
top15 = top_country_by_vaccine.head(15)
# Pass x and y as keywords; newer seaborn releases expect keyword arguments here
ax = sns.barplot(x=top15.values, y=top15.index)
ax.set_xlabel("Total Vaccination Count")
ax.set_ylabel("Country")
plt.show()

## Question2: What are the top 15 countries based on full doses of vaccinations administered?

In [None]:
# Top countries by number of fully vaccinated people (last reported value per country)
df["full_vaccination_count"] = df.groupby("country").people_fully_vaccinated.tail(1)

top_country_by_fullvaccinedose = df.groupby("country")["full_vaccination_count"].mean().sort_values(ascending = False).head(15)

top_country_by_fullvaccinedose

In [None]:
# Visualization of the top countries by number of fully vaccinated people

plt.style.use("ggplot")
plt.figure(figsize=(9,9))
# Pass x and y as keywords; newer seaborn releases expect keyword arguments here
ax = sns.barplot(x=top_country_by_fullvaccinedose.values, y=top_country_by_fullvaccinedose.index)
ax.set_xlabel("Fully Vaccinated Count")
ax.set_ylabel("Country")
plt.show()

China does not appear in this list of fully vaccinated counts, even though it is the top country by total vaccinations. Based on the data available, this suggests that second doses in China have not kept pace with first doses, although it may also be that the fully vaccinated figure is simply not reported for China in this dataset.
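One way to tell whether a country is absent from this ranking because of missing data rather than low uptake is to count how many non-missing values it reports in each column. The sketch below uses a hypothetical toy frame (`toy`, `reported`, and `never_reported` are illustrative names) and is not a statement about what the real dataset contains:

```python
import pandas as pd

# Hypothetical toy data: country X reports totals but never reports full vaccinations
toy = pd.DataFrame({
    "country": ["X", "X", "Y", "Y"],
    "total_vaccinations": [100.0, 250.0, 40.0, 90.0],
    "people_fully_vaccinated": [None, None, 10.0, 30.0],
})

# count() tallies non-missing values per country for each column
reported = toy.groupby("country")[
    ["total_vaccinations", "people_fully_vaccinated"]
].count()

# Countries that report totals but have no fully vaccinated figures at all
mask = (reported["total_vaccinations"] > 0) & (reported["people_fully_vaccinated"] == 0)
never_reported = reported[mask].index.tolist()

print(never_reported)
```

Running the same check on `df` would show whether China falls into this group.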

## Question3: What are the most common vaccines being administered?

In [None]:
# Most frequently listed vaccines.
# value_counts() counts dataset rows per value (already sorted descending), not doses.
most_vaccines = df.vaccines.value_counts().head(15)

most_vaccines

Oxford/AstraZeneca appears to be the most widely listed vaccine across the world, measured by the number of records in the dataset.
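Counting rows favours countries with more daily records. A hedged alternative is to count how many countries use each vaccine. The sketch below assumes the `vaccines` column holds comma-separated names (for example `"Oxford/AstraZeneca, Pfizer/BioNTech"`); the toy frame and the name `countries_per_vaccine` are hypothetical:

```python
import pandas as pd

# Hypothetical toy data: country A has many rows, which inflates a plain row count
toy = pd.DataFrame({
    "country": ["A", "A", "A", "B", "C", "D"],
    "vaccines": [
        "Pfizer/BioNTech",
        "Pfizer/BioNTech",
        "Pfizer/BioNTech",
        "Oxford/AstraZeneca, Pfizer/BioNTech",
        "Oxford/AstraZeneca",
        "Oxford/AstraZeneca",
    ],
})

# Keep one row per country and vaccine combination
per_country = toy.drop_duplicates(subset=["country", "vaccines"])

# Split combined entries into one row per individual vaccine
exploded = per_country.assign(
    vaccine=per_country["vaccines"].str.split(", ")
).explode("vaccine")

# Number of distinct countries using each vaccine
countries_per_vaccine = (
    exploded.groupby("vaccine")["country"].nunique().sort_values(ascending=False)
)

print(countries_per_vaccine)
```

In this toy data the row count ranks Pfizer/BioNTech first, while the per-country count ranks Oxford/AstraZeneca first, which shows how the choice of measure can change the answer.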

In [None]:
plt.figure(figsize=(9,9))
sns.countplot(y = "vaccines",data = df)
plt.ylabel("Vaccines")
plt.xlabel("Count")
plt.show()

## Question4: What is the daily trend of total vaccinations across the world since vaccine administration began?

In [None]:
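# Daily vaccinations summed across all countries
total_daily_vaccinations = df.groupby("date").daily_vaccinations.sum()
plt.style.use("ggplot")
plt.figure(figsize=(16,5))
# Pass x and y as keywords; newer seaborn releases expect keyword arguments here
sns.lineplot(x=total_daily_vaccinations.index, y=total_daily_vaccinations.values)
plt.xlabel("Date")
plt.ylabel("Daily Vaccinations")
plt.show()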

#### We can observe a downward trend in daily vaccinations over the last few days.
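A drop in the final days of a summed series can reflect countries that have not yet reported rather than a real slowdown; that explanation is an assumption here, not something the data confirms. A minimal sketch of two common checks, a 7-day rolling mean and trimming the most recent days, on a hypothetical series:

```python
import pandas as pd

# Hypothetical daily totals with a sharp drop in the last two days
dates = pd.date_range("2021-03-01", periods=10, freq="D")
daily = pd.Series(
    [100, 120, 110, 130, 125, 140, 135, 150, 60, 20],
    index=dates,
    dtype="float64",
)

# 7-day rolling mean smooths out day-to-day noise (the first 6 values are NaN)
smoothed = daily.rolling(window=7).mean()

# Drop the most recent days, which may still be incomplete (assumed cutoff of 2 days)
trimmed = daily.iloc[:-2]

print(smoothed)
print(trimmed)
```

Plotting `smoothed` or `trimmed` instead of the raw series would show whether the downward trend persists.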

## Question5: What is the trend for the top 5 vaccinated countries in terms of "Total Vaccinations"?

In [None]:
top_country_by_vaccine.head(5)

In [None]:
# Define a dataframe for the top 5 countries by "Total Vaccinations" administered
top5_countries = ["United States", "China", "India", "United Kingdom", "Brazil"]
top5_country_by_vaccine = df.loc[df.country.isin(top5_countries)]

# Comparison
plt.figure(figsize= (18,6))
sns.lineplot(x = "date", y = "total_vaccinations" ,data = top5_country_by_vaccine,hue= "country")
plt.xlabel("Date")
plt.ylabel("Total Vaccinations Administered")
plt.title("Total Vaccination Comparison")
plt.show()
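Hardcoding country names is prone to typos and goes stale as the ranking changes. As a sketch, the top countries can instead be derived from the data; the toy frame and the names `n`, `top_n`, and `top_n_rows` below are hypothetical:

```python
import pandas as pd

# Hypothetical toy data with cumulative totals per country
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B", "C", "C", "D", "D"],
    "total_vaccinations": [1.0, 9.0, 2.0, 7.0, 1.0, 3.0, 4.0, 5.0],
})

n = 2  # use 5 for the real analysis

# Highest cumulative total per country, then the n largest of those
top_n = toy.groupby("country")["total_vaccinations"].max().nlargest(n).index

# Keep only the rows belonging to those countries
top_n_rows = toy[toy["country"].isin(top_n)]

print(list(top_n))
```

The filtered frame can then be passed to `sns.lineplot` with `hue="country"` in the same way as above.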


# Part B:

#### Now we will use Pandas_Profiling and see what insights it generates.

In [None]:
pp.ProfileReport(df)

### The "Pandas_Profiling" report provides some quick insights.

#### It summarizes the number of variables, number of observations, missing values, percentage of duplicate rows, variable types, correlations between variables (Pearson and Spearman coefficients, among others), whether any variable has high cardinality, and so on.
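For comparison, several of these summary items can be reproduced by hand with plain pandas. This is a rough sketch on hypothetical toy data, not a description of how the profiling library computes its report:

```python
import pandas as pd

# Hypothetical toy data with one missing value and one duplicated row
toy = pd.DataFrame({
    "daily_vaccinations": [10.0, 20.0, None, 40.0, 40.0],
    "total_vaccinations": [10.0, 30.0, 30.0, 70.0, 70.0],
    "country": ["A", "A", "A", "B", "B"],
})

# Share of missing values per column
missing_share = toy.isnull().mean()

# Number of fully duplicated rows
duplicate_rows = int(toy.duplicated().sum())

# Correlations between numeric columns only
numeric = toy.select_dtypes("number")
pearson = numeric.corr(method="pearson")
spearman = numeric.corr(method="spearman")

# Number of distinct values per column, a simple proxy for cardinality
cardinality = toy.nunique()

print(missing_share, duplicate_rows, cardinality, sep="\n")
```

Writing these out makes clear how much repetitive work the profiling report saves.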