## Brazil vs India - Vaccination progress

Hi! I made this notebook to practice data visualization. If you find it helpful, please upvote! :)

## Imports

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')
sns.set_style('white')

## Read Data

In [None]:
data = pd.read_csv('../input/covid-world-vaccination-progress/country_vaccinations.csv')
data.head()

In [None]:
data.info()

### Color palettes (national flag colors)

In [None]:
# Flag colors: India (saffron, green, navy) and Brazil (green, yellow, blue)
india_palette = ['#ff9933', '#138808', '#000080']
brazil_palette = ['#009c3b', '#ffdf00', '#002776']

## Exploratory Data Analysis


In [None]:
data['date'] = pd.to_datetime(data['date'])
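
As an alternative, pandas can parse the date column while reading the file, through the `parse_dates` argument of `pd.read_csv`. The sketch below uses a tiny in-memory CSV with made-up rows (standing in for `country_vaccinations.csv`) so it runs on its own.

```python
import io

import pandas as pd

# Tiny made-up CSV standing in for country_vaccinations.csv (values are illustrative only)
csv_text = """country,date,total_vaccinations
Brazil,2021-01-17,100
Brazil,2021-01-18,250
India,2021-01-16,1000
"""

# parse_dates converts the listed columns to a datetime dtype while reading
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=['date'])
print(sample.dtypes)
```

This avoids a separate conversion step; the explicit `pd.to_datetime` call above is equally valid.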

Since we are only going to compare Brazil and India, let's create a separate DataFrame for each of them.

In [None]:
# Independent copies per country, so later changes never touch `data`
df_br = data[data['country'] == 'Brazil'].copy()
df_in = data[data['country'] == 'India'].copy()
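
If more countries are added later, one option is to filter once with `Series.isin` and then split by country with `groupby`. A minimal sketch on a toy frame with made-up values; the names `toy` and `frames` are hypothetical and chosen so they do not overwrite the notebook's `data`, `df_br` or `df_in`.

```python
import pandas as pd

# Toy frame standing in for the full dataset (made-up values)
toy = pd.DataFrame({
    'country': ['Brazil', 'India', 'Chile', 'Brazil', 'India'],
    'total_vaccinations': [10, 50, 7, 30, 90],
})

countries = ['Brazil', 'India']
subset = toy[toy['country'].isin(countries)]

# One independent DataFrame per country, keyed by country name
frames = {name: group.copy() for name, group in subset.groupby('country')}
print(frames['Brazil']['total_vaccinations'].tolist())  # [10, 30]
```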

In [None]:
df_br.head()

In [None]:
df_in.head()

In [None]:
df_br.isnull().sum()

In [None]:
df_in.isnull().sum()

Both subsets have missing values. Before starting the analysis, I'll handle some of them.

I'm going to drop the rows where `total_vaccinations_per_hundred` is null.

In [None]:
df_br = df_br[df_br['total_vaccinations_per_hundred'].notna()]
df_in = df_in[df_in['total_vaccinations_per_hundred'].notna()]
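
Dropping rows is the approach used here. Because `total_vaccinations_per_hundred` is cumulative, another common choice (not the one used in this notebook) is to carry the last reported value forward with `ffill`. The sketch compares both on a toy series with made-up values.

```python
import numpy as np
import pandas as pd

# Toy cumulative series with reporting gaps (made-up values)
toy = pd.DataFrame({
    'date': pd.date_range('2021-02-01', periods=5, freq='D'),
    'total_vaccinations_per_hundred': [1.0, np.nan, 2.5, np.nan, 4.0],
})

# Option used in this notebook: keep only rows with a reported value
dropped = toy[toy['total_vaccinations_per_hundred'].notna()]

# Alternative: carry the last reported cumulative value forward
filled = toy.assign(
    total_vaccinations_per_hundred=toy['total_vaccinations_per_hundred'].ffill()
)

print(len(dropped))  # 3
print(filled['total_vaccinations_per_hundred'].tolist())  # [1.0, 1.0, 2.5, 2.5, 4.0]
```

Forward filling keeps one row per day, which can help when aligning dates across countries, but it assumes the cumulative total did not change during each gap.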


As the table below shows, both Brazil and India use the **Oxford/AstraZeneca** vaccine. Brazil also uses **Sinovac**, and India also uses **Covaxin**.

In [None]:
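# One row per country; join the distinct listings in case the vaccine list changed over time
pd.DataFrame({'Vaccine': ['; '.join(df_br['vaccines'].unique()),
                          '; '.join(df_in['vaccines'].unique())]},
             index=['Brazil', 'India'])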
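
The `vaccines` column holds a comma-separated list, so a country's entry can change as new vaccines are added. To list individual vaccines per country, one option is to split the strings and `explode` them. A sketch on made-up entries in that format (the real strings may differ):

```python
import pandas as pd

# Made-up `vaccines` entries in the dataset's comma-separated format
toy = pd.DataFrame({
    'country': ['Brazil', 'Brazil', 'India'],
    'vaccines': ['Oxford/AstraZeneca, Sinovac',
                 'Oxford/AstraZeneca, Sinovac',
                 'Covaxin, Oxford/AstraZeneca'],
})

per_vaccine = (
    toy.assign(vaccine=toy['vaccines'].str.split(', '))  # list of names per row
       .explode('vaccine')                               # one row per (country, vaccine)
       .drop_duplicates(['country', 'vaccine'])
)

# Sorted list of distinct vaccines per country
result = {c: sorted(g['vaccine']) for c, g in per_vaccine.groupby('country')}
print(result)
```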

If we plot total vaccinations over time, India clearly has a higher absolute number of doses. However, India's population is more than six times larger than Brazil's, so absolute totals alone are not a fair comparison.
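
One way to quantify the size gap without an external population table is to back it out of the data, since doses per hundred equal total doses divided by population, times 100. The helper `implied_population` below is hypothetical (not part of pandas), and the toy rows are made up; if the per-hundred column is rounded in the source, the estimate is only approximate.

```python
import pandas as pd


def implied_population(df: pd.DataFrame) -> float:
    """Estimate population as total_vaccinations / total_vaccinations_per_hundred * 100.

    Uses the most recent row where both columns are present.
    """
    rows = df.dropna(subset=['total_vaccinations', 'total_vaccinations_per_hundred'])
    last = rows.sort_values('date').iloc[-1]
    return last['total_vaccinations'] / last['total_vaccinations_per_hundred'] * 100


# Made-up rows for two hypothetical countries
toy_a = pd.DataFrame({'date': pd.to_datetime(['2021-03-01', '2021-03-02']),
                      'total_vaccinations': [1_000_000.0, 2_000_000.0],
                      'total_vaccinations_per_hundred': [0.5, 1.0]})
toy_b = pd.DataFrame({'date': pd.to_datetime(['2021-03-02']),
                      'total_vaccinations': [13_000_000.0],
                      'total_vaccinations_per_hundred': [1.0]})

ratio = implied_population(toy_b) / implied_population(toy_a)
print(ratio)  # 6.5
```

In this notebook it could be called as `implied_population(df_in) / implied_population(df_br)`.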

In [None]:
fig, ax = plt.subplots(1,1, figsize=(12,6))

g1 = sns.lineplot(x=df_in['date'],y=df_in['total_vaccinations'],color=india_palette[0])
g1.fill_between(df_in['date'], 0, df_in['total_vaccinations'], color=india_palette[0], alpha=0.9)

g2 = sns.lineplot(x=df_br['date'],y=df_br['total_vaccinations'],color=brazil_palette[0])
g2.fill_between(df_br['date'], 0, df_br['total_vaccinations'], color=brazil_palette[0], alpha=0.9)


for i in ['top', 'left', 'right','bottom']:
    ax.spines[i].set_visible(False)
    
fig.text(0.1, 0.82, 'Total vaccination doses administered in Brazil and India over time', 
       fontsize=14, fontweight='bold', fontfamily='serif',color='black')
fig.text(0.18, 0.33, 'Brazil', 
       fontsize=14, fontweight='bold', fontfamily='serif',color=brazil_palette[0])
fig.text(0.24, 0.33, 'vs', 
       fontsize=14, fontweight='bold', fontfamily='serif',color='black')
fig.text(0.265, 0.33, 'India', 
       fontsize=14, fontweight='bold', fontfamily='serif',color=india_palette[0])

ax.yaxis.tick_right()
ax.tick_params(length=0)
plt.xlabel('')
plt.ylabel('')
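
The two series above are plotted from separate frames, and the countries do not necessarily report on the same dates. To compare them day by day, one option is to pivot the combined data into one column per country and keep only dates where both reported. A sketch on made-up values:

```python
import pandas as pd

# Long-format toy data, one row per (country, date); values are made up
toy = pd.DataFrame({
    'country': ['Brazil', 'Brazil', 'India', 'India', 'India'],
    'date': pd.to_datetime(['2021-03-01', '2021-03-02',
                            '2021-03-01', '2021-03-02', '2021-03-03']),
    'total_vaccinations': [100, 150, 600, 900, 1200],
})

# One column per country, indexed by date; dates a country did not report become NaN
wide = toy.pivot(index='date', columns='country', values='total_vaccinations')

# India-to-Brazil ratio only on dates where both countries reported
ratio = (wide['India'] / wide['Brazil']).dropna()
print(ratio.tolist())  # [6.0, 6.0]
```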

Looking at the next chart of daily vaccinations, can we conclude that India is doing a better job than Brazil?

In [None]:
fig, ax = plt.subplots(1,1, figsize=(12,6))

g1 = sns.lineplot(x=df_in['date'],y=df_in['daily_vaccinations'],color=india_palette[0])
g1.fill_between(df_in['date'], 0, df_in['daily_vaccinations'], color=india_palette[0], alpha=0.9)

g2 = sns.lineplot(x=df_br['date'],y=df_br['daily_vaccinations'],color=brazil_palette[0])
g2.fill_between(df_br['date'], 0, df_br['daily_vaccinations'], color=brazil_palette[0], alpha=0.9)

fig.text(0.1, 0.82, 'Daily vaccination doses in Brazil and India', 
       fontsize=14, fontweight='bold', fontfamily='serif',color='black')
fig.text(0.18, 0.33, 'Brazil', 
       fontsize=14, fontweight='bold', fontfamily='serif',color=brazil_palette[0])
fig.text(0.24, 0.33, 'vs', 
       fontsize=14, fontweight='bold', fontfamily='serif',color='black')
fig.text(0.265, 0.33, 'India', 
       fontsize=14, fontweight='bold', fontfamily='serif',color=india_palette[0])

for i in ['top', 'left', 'right','bottom']:
    ax.spines[i].set_visible(False)

ax.yaxis.tick_right()
ax.tick_params(length=0)
plt.xlabel('')
plt.ylabel('')
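
Daily counts are noisy because of reporting patterns. The dataset appears to follow Our World in Data's conventions, where `daily_vaccinations` is described as a 7-day smoothed version of `daily_vaccinations_raw`; that is an assumption worth checking against the dataset description. A trailing rolling mean of that kind can be sketched as follows (made-up values; `min_periods=1` is a choice for this sketch, not necessarily what the source uses):

```python
import pandas as pd

# Made-up raw daily doses with a weekend dip
raw = pd.Series(
    [70, 70, 70, 70, 70, 35, 35, 70],
    index=pd.date_range('2021-04-01', periods=8, freq='D'),
)

# 7-day trailing mean; min_periods=1 also yields values for the first six days
smoothed = raw.rolling(window=7, min_periods=1).mean()
print(smoothed.iloc[-1])  # 60.0
```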


What happens if we look at total vaccinations per hundred people instead? The lines switch places: even though India has administered more doses in total, Brazil has administered more doses relative to its population.
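
The per-hundred metric divides cumulative doses by population and multiplies by 100, so a larger country can lead in absolute doses and still trail per person. A toy illustration; the countries `A` and `B` and all figures are hypothetical:

```python
import pandas as pd

# Hypothetical countries: A has more doses, but also a much larger population
toy = pd.DataFrame(
    {'total_vaccinations': [300, 100], 'population': [3000, 500]},
    index=['A', 'B'],
)

# Doses per 100 people (can exceed 100 once second doses accumulate)
toy['total_vaccinations_per_hundred'] = toy['total_vaccinations'] / toy['population'] * 100

print(toy['total_vaccinations'].idxmax())              # A
print(toy['total_vaccinations_per_hundred'].idxmax())  # B
```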

In [None]:
fig, ax = plt.subplots(1,1, figsize=(12,6))


g1 = sns.lineplot(x=df_br['date'],y=df_br['total_vaccinations_per_hundred'],color=brazil_palette[0])
g1.fill_between(df_br['date'], 0, df_br['total_vaccinations_per_hundred'], color=brazil_palette[0], alpha=0.9)

g2 = sns.lineplot(x=df_in['date'],y=df_in['total_vaccinations_per_hundred'],color=india_palette[0])
g2.fill_between(df_in['date'], 0, df_in['total_vaccinations_per_hundred'], color=india_palette[0], alpha=0.9)

fig.text(0.1, 0.82, 'Total vaccination doses per 100 people in Brazil and India', 
       fontsize=14, fontweight='bold', fontfamily='serif',color='black')

for i in ['top', 'left', 'right','bottom']:
    ax.spines[i].set_visible(False)

fig.text(0.18, 0.33, 'Brazil', 
       fontsize=14, fontweight='bold', fontfamily='serif',color=brazil_palette[0])
fig.text(0.24, 0.33, 'vs', 
       fontsize=14, fontweight='bold', fontfamily='serif',color='black')
fig.text(0.265, 0.33, 'India', 
       fontsize=14, fontweight='bold', fontfamily='serif',color=india_palette[0])

ax.yaxis.tick_right()
ax.tick_params(length=0)
plt.xlabel('')
plt.ylabel('')

The chart below compares the highest observed percentage of people who received at least one dose (`people_vaccinated_per_hundred`) in each country.

In [None]:
# Highest observed share of people with at least one dose
y = [df_br['people_vaccinated_per_hundred'].max(),
     df_in['people_vaccinated_per_hundred'].max()]
x = ['Brazil', 'India']

# seaborn >= 0.12 only accepts x and y as keyword arguments
g = sns.barplot(x=x, y=y, palette=[brazil_palette[0], india_palette[0]])

# Title in axes coordinates, so its position does not depend on the data range
g.text(0, 1.05, 'Max percentage of people vaccinated in Brazil and India',
       transform=g.transAxes, fontsize=14, fontweight='bold',
       fontfamily='serif', color='black')

for i in range(2):
    # Large percentage label centered on each bar
    g.annotate(f'{round(y[i])}%',
               xy=(i, y[i] / 2),
               ha='center', va='center', fontsize=40, fontweight='bold',
               fontfamily='serif', color='white')
    # Country name a fixed distance (in points) below the percentage
    g.annotate(x[i],
               xy=(i, y[i] / 2), xytext=(0, -30), textcoords='offset points',
               ha='center', va='center', fontsize=12, fontweight='bold',
               fontfamily='serif', color='white')

for i in ['top', 'left', 'right', 'bottom']:
    g.spines[i].set_visible(False)

g.set(yticklabels=[], xticklabels=[])
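
The two maxima can also be computed in one step with `groupby`. For a cumulative column the maximum normally equals the latest reported value, but a downward revision in the source would make them differ, so `last()` is shown alongside. A sketch on made-up values:

```python
import pandas as pd

# Made-up cumulative shares, with a small downward revision for India on the last day
toy = pd.DataFrame({
    'country': ['Brazil', 'Brazil', 'India', 'India', 'India'],
    'date': pd.to_datetime(['2021-05-01', '2021-05-02',
                            '2021-05-01', '2021-05-02', '2021-05-03']),
    'people_vaccinated_per_hundred': [10.0, 12.5, 8.0, 9.5, 9.4],
})

col = 'people_vaccinated_per_hundred'
ordered = toy.sort_values('date')
max_share = ordered.groupby('country')[col].max()
last_share = ordered.groupby('country')[col].last()  # last non-null value per country

print(max_share.to_dict())   # {'Brazil': 12.5, 'India': 9.5}
print(last_share.to_dict())  # {'Brazil': 12.5, 'India': 9.4}
```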

The next chart is similar to the previous one, but shows the highest observed percentage of people **fully** vaccinated (`people_fully_vaccinated_per_hundred`).

In [None]:
# Highest observed share of people fully vaccinated
y = [df_br['people_fully_vaccinated_per_hundred'].max(),
     df_in['people_fully_vaccinated_per_hundred'].max()]
x = ['Brazil', 'India']

# seaborn >= 0.12 only accepts x and y as keyword arguments
g = sns.barplot(x=x, y=y, palette=[brazil_palette[0], india_palette[0]])

# Title in axes coordinates, so its position does not depend on the data range
g.text(0, 1.05, 'Max percentage of people fully vaccinated in Brazil and India',
       transform=g.transAxes, fontsize=14, fontweight='bold',
       fontfamily='serif', color='black')

for i in range(2):
    # Large percentage label centered on each bar
    g.annotate(f'{round(y[i])}%',
               xy=(i, y[i] / 2),
               ha='center', va='center', fontsize=40, fontweight='bold',
               fontfamily='serif', color='white')
    # Country name a fixed distance (in points) below the percentage
    g.annotate(x[i],
               xy=(i, y[i] / 2), xytext=(0, -30), textcoords='offset points',
               ha='center', va='center', fontsize=12, fontweight='bold',
               fontfamily='serif', color='white')

for i in ['top', 'left', 'right', 'bottom']:
    g.spines[i].set_visible(False)

g.set(yticklabels=[], xticklabels=[])
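
The gap between the two bar charts is the share of people who have received at least one dose but are not yet fully vaccinated. Because both columns are expressed per hundred people, it is a simple difference, ideally taken on values from the same date. A sketch with made-up numbers for two hypothetical countries:

```python
import pandas as pd

# Made-up latest values for hypothetical countries, both expressed per 100 people
toy = pd.DataFrame(
    {'people_vaccinated_per_hundred': [20.0, 10.0],
     'people_fully_vaccinated_per_hundred': [5.0, 2.0]},
    index=['country_a', 'country_b'],
)

# Share with at least one dose but not yet fully vaccinated
toy['partially_vaccinated_per_hundred'] = (
    toy['people_vaccinated_per_hundred'] - toy['people_fully_vaccinated_per_hundred']
)
print(toy['partially_vaccinated_per_hundred'].to_dict())  # {'country_a': 15.0, 'country_b': 8.0}
```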

These visualizations show that although India leads in absolute numbers, Brazil has achieved higher vaccination coverage relative to its population over the period covered by the data.

