# Covid-19 vaccination analysis by Martina Raabe

<img src="https://media.giphy.com/media/eTnTNQZkGoLPIXLWGu/giphy.gif">

[Source](https://giphy.com/search/corona-vaccination)

<h1 style='background: black; border:1; color: white'><center>Acknowledgement</center></h1>

This Notebook would not have been possible without the dataset provided by [@Gabriel Preda](https://www.kaggle.com/gpreda). The analysis is based on the [Covid-19 world vaccination progress](https://www.kaggle.com/gpreda/covid-world-vaccination-progress) dataset.

<h1 style='background: black; border:1; color: white'> <center>Introduction</center></h1>

This is a basic exploratory data analysis of the Covid-19 vaccination process. Following the approval of several vaccines at the end of 2020 and the beginning of 2021, countries started their vaccination campaigns.

The dataset contains the vaccination progress and status for 80 different countries. Hopefully this drive will bring success and save millions of lives around the world.


If you like this project, don't forget to **upvote**. Thanks!





In [None]:
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt
import seaborn as sns
# set style and color codes in one call; a separate sns.set() would reset the style to 'darkgrid'
sns.set(style='dark', color_codes=True)
from wordcloud import WordCloud, ImageColorGenerator
from PIL import Image

[](https://media.giphy.com/media/2DZ26eg4k3vXVRue0G/giphy.gif)

<h1 style='background: black; border:1; color: white'><center>Importing and getting to know the data</center></h1>

In [None]:
#import dataset into a DataFrame

file_path = '../input/covid-world-vaccination-progress/country_vaccinations.csv'
data = pd.read_csv(file_path) #parse_dates=['date']
data.head()

In [None]:
data.describe()

In [None]:
data.info()

In [None]:
# dropping not needed columns source name and source website

data = data.drop(['source_name', 'source_website'], axis=1)
data.head()

In [None]:
# countries in the data set

countries = data['country'].unique()

print('There are', len(countries), 'countries in the dataset.', '\n')
print ('The following countries can be found in the dataset:','\n','\n', countries)

In [None]:
# get the latest reporting date

latest = list(data['date'].unique())
latest.sort(reverse=True)
latest = latest[0]
print('The latest date in the dataset is {}'.format (latest))


#create a dataframe from the latest date: data_latest_date

data_latest_date = data[data['date'] == latest]
data_latest_date.head()
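
Filtering on the single latest date keeps only the countries that reported on that day. As a minimal sketch on made-up toy data (the values and the names `toy`, `on_latest_date` and `latest_per_country` are hypothetical, not part of the original notebook), the dates can be parsed with `pd.to_datetime` and the most recent row can be selected per country instead:

```python
import pandas as pd

# Toy data with made-up values (hypothetical, not taken from the dataset)
toy = pd.DataFrame({
    'country': ['A', 'B', 'A', 'B'],
    'date': ['2021-02-01', '2021-02-01', '2021-02-03', '2021-02-02'],
    'total_vaccinations': [100, 50, 300, 80],
})

# Parse the strings so that comparisons are chronological by construction
toy['date'] = pd.to_datetime(toy['date'])

# Rows on the globally latest date: only country A reported on that day
latest_global = toy['date'].max()
on_latest_date = toy[toy['date'] == latest_global]

# Latest available row per country: country B is kept as well
latest_per_country = toy.loc[toy.groupby('country')['date'].idxmax()]

print(on_latest_date['country'].tolist())
print(latest_per_country['country'].tolist())
```

Which of the two views is more useful depends on the question: the first is a strict snapshot of one day, the second compares each country at its own most recent report.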

<h1 style='background: black; border:1; color: white'><center>Which country has vaccinated the most people</center></h1>

In [None]:
# number of total vaccinations per country

total_vaccinations = data.groupby('country')['total_vaccinations'].max().sort_values(ascending=False)
df_total_vaccinations = pd.DataFrame(total_vaccinations).reset_index()
df_total_vaccinations_15 = df_total_vaccinations.iloc[:15, :]
df_total_vaccinations_15 

In [None]:
size_label=12
size_ticks = 10
size_title= 20

plt.figure(figsize=(12,10))
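# pass x and y as keyword arguments; recent seaborn versions no longer accept them positionally
sns.barplot(x='total_vaccinations', y='country', data=df_total_vaccinations_15, orient='h', palette="rocket")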

plt.title('Top 15 countries that have given most vaccinations on {}'.format(latest), size=size_title)
# horizontal bars: the x-axis shows the counts, the y-axis the countries
plt.xlabel('Number of vaccinations', size=size_label)
plt.ylabel('Country', size=size_label)
plt.xticks(size=size_ticks)
plt.yticks(size=size_ticks)

plt.show()

In absolute numbers, the countries that have given the most vaccinations are:
1. US
2. China
3. UK / England

Germany ranks only 9th.
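
A ranking like this can also be read off programmatically rather than from the chart. The following is only a sketch on made-up toy data; the helper `country_rank` and the values are hypothetical and not part of the original notebook:

```python
import pandas as pd

# Toy data with made-up values (hypothetical, not taken from the dataset)
toy = pd.DataFrame({
    'country': ['A', 'A', 'B', 'B', 'C'],
    'total_vaccinations': [10.0, 30.0, 50.0, None, 20.0],
})

def country_rank(df, country, column='total_vaccinations'):
    """Return the 1-based rank of `country` by the maximum of `column`."""
    # Highest cumulative value per country; countries without any value are dropped
    totals = df.groupby('country')[column].max().dropna()
    # Rank 1 is the largest total; ties share the smaller rank
    ranks = totals.rank(ascending=False, method='min').astype(int)
    return int(ranks[country])

print(country_rank(toy, 'A'))
```

Applied to the notebook's `data`, a call such as `country_rank(data, 'Germany')` would be expected to return the position discussed above, assuming the country name is spelled as in the dataset.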

In [None]:
# creating the data for next two plots
vaccinations_per_hundred = data.groupby('country')['people_vaccinated_per_hundred'].max().sort_values(ascending=False)
vaccinations_per_hundred = pd.DataFrame(vaccinations_per_hundred).reset_index()
vaccinations_per_hundred_15 = vaccinations_per_hundred.iloc[:15,:]


full_vaccinations_per_hundred = data.groupby('country')['people_fully_vaccinated_per_hundred'].max().sort_values(ascending=False)
full_vaccinations_per_hundred = pd.DataFrame(full_vaccinations_per_hundred).reset_index()
full_vaccinations_per_hundred_15 = full_vaccinations_per_hundred.iloc[:15,:]

In [None]:
# number of vaccinations in percent of the total population
# column: people_fully_vaccinated_per_hundred vs people_vaccinated_per_hundred
# make a subplot with these two columns per country (or best 15 countries) or make both columns in one figure
size = 18
fig, ax = plt.subplots(1,2, figsize=(24,6))
# people_vaccinated_per_hundred counts people with at least one dose
fig.suptitle('Comparison between people fully vaccinated and people with at least 1 vaccine shot', size = size)
ax[0].set_title('Top 15 countries people vaccinated with at least 1 shot', size = size)
ax[1].set_title('Top 15 countries people fully vaccinated', size = size)

sns.barplot(ax=ax[0],x='people_vaccinated_per_hundred',y='country', data=vaccinations_per_hundred_15, orient='h',palette="rocket")
sns.barplot(ax=ax[1],x='people_fully_vaccinated_per_hundred',y='country', data=full_vaccinations_per_hundred_15, palette="rocket" ,orient='h')


In [None]:
# Filter dataset for selected countries
selected_countries = ['Israel', 'Germany', 'United States', 'United Arab Emirates']
data_countries = data[data['country'].isin(selected_countries)].sort_values(by='date', ascending=True)


plt.figure(1, figsize=(20,10))
sns.lineplot(x='date',y='daily_vaccinations', hue='country', data = data_countries)
plt.xticks(rotation=70)
plt.title('Timeline of vaccinations for selected countries', size=20)
plt.legend()
plt.show()

<h1 style='background: black; border:1; color: white'><center>Vaccines used in the countries</center></h1>

In [None]:
country_vaccine = pd.DataFrame(data.groupby('vaccines')['country'].unique()).reset_index().sort_values(by='vaccines', ascending=True)
country_vaccine.head()


In [None]:
# Splitting the vaccine column and creating one list with all mentioned vaccines

vaccines_split = data['vaccines'].str.split(',')

# Create a list of all vaccines
all_vaccines = []
for item in vaccines_split:
    for vaccine in item:
        all_vaccines.append(vaccine.strip())
        
# Replacing Sinopharm/Beijing and Sinopharm/Wuhan with Sinopharm
all_vaccines = [vaccine.replace('Sinopharm/Beijing', 'Sinopharm').replace('Sinopharm/Wuhan', 'Sinopharm') for vaccine in all_vaccines]


# Joining the list where all items are separated by whitespace
all_vaccines_joined = ' '.join(all_vaccines)
all_vaccines_joined[:300]
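
The nested loop above can also be written with pandas string methods. This is a sketch of an alternative, not the method used in the notebook; the toy frame and the name `vaccines_long` are made up for illustration:

```python
import pandas as pd

# Toy data with made-up rows (hypothetical, not taken from the dataset)
toy = pd.DataFrame({
    'country': ['A', 'A', 'B'],
    'vaccines': [
        'Moderna, Pfizer/BioNTech',
        'Moderna, Pfizer/BioNTech',
        'Sinopharm/Beijing, Sinopharm/Wuhan',
    ],
})

vaccines_long = (
    toy['vaccines']
    .str.split(',')   # one list of vaccine names per row
    .explode()        # one vaccine name per row
    .str.strip()      # remove the whitespace left over from the split
    .replace({'Sinopharm/Beijing': 'Sinopharm', 'Sinopharm/Wuhan': 'Sinopharm'})
)

print(vaccines_long.value_counts().to_dict())
```

As in the loop version, every row of the dataset contributes its vaccines, so a country with many reporting days is counted many times.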

In [None]:
# Plotting a WordCloud with all used vaccines

wordCloud = WordCloud(
    background_color='white',
    max_font_size = 50).generate(all_vaccines_joined)

plt.figure(figsize=(15,8))
plt.axis('off')
plt.imshow(wordCloud,  interpolation="bilinear")
plt.show()

In [None]:
# Create count for vaccines in the list 'all_vaccines' to calculate percentage

from collections import Counter 

d = Counter(all_vaccines) 

# Print all occurrences in descending order
for key, value in sorted(d.items(), key=lambda x: x[1], reverse=True):
    print('{} occurred {} times in the list'.format(key, value))

In [None]:
# Create a DataFrame out of the Counter object

vaccine_count = pd.DataFrame.from_dict(d, orient='index').reset_index()
vaccine_count.rename(columns={'index':'vaccine', 0:'count_vaccine'}, inplace=True)
vaccine_count.head()

In [None]:
# Create a pie chart to display the most mentioned vaccines
cmap = plt.get_cmap('Spectral')
# colormaps expect values in [0, 1]; sample one color per vaccine
colors = [cmap(i) for i in np.linspace(0, 1, len(vaccine_count))]

plt.figure(1, figsize=(20,10))
plt.pie(vaccine_count['count_vaccine'], labels=vaccine_count['vaccine'], autopct='%1.1f%%', colors=colors)
plt.show()

The Pfizer/BioNTech vaccine is the most frequently mentioned, accounting for 56% of all vaccine mentions in the dataset. Moderna (16%) and AstraZeneca (11%) are in second and third place.
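
The percentages in the pie chart are shares of mentions, that is, each vaccine's count divided by the total number of entries in the list. A minimal sketch of that arithmetic, using a made-up list and a hypothetical helper `vaccine_shares` (neither is part of the original notebook):

```python
from collections import Counter

def vaccine_shares(mentions):
    """Return each vaccine's share of all mentions in percent, most frequent first."""
    counts = Counter(mentions)
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.most_common()}

# Made-up example list (hypothetical, not taken from the dataset)
example = ['Pfizer/BioNTech', 'Moderna', 'Pfizer/BioNTech', 'Sinopharm']
print(vaccine_shares(example))
```

Passing the notebook's `all_vaccines` list to such a helper should give the same kind of percentages as the `autopct` labels, up to rounding.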