# EDA project for COVID19 vaccination progress

The dataset was downloaded from Kaggle: https://www.kaggle.com/gpreda/covid-world-vaccination-progress

Content

1. The 'country vaccination' dataset contains the following information:

Country - this is the country for which the vaccination information is provided;

Country ISO Code - ISO code for the country;

Date - date for the data entry; for some of the dates we have only the daily vaccinations, for others, only the (cumulative) total;

Total number of vaccinations - this is the absolute number of total immunizations in the country;

Total number of people vaccinated - a person, depending on the immunization scheme, will receive one or more (typically 2) vaccine doses; at a certain moment, the number of vaccinations might be larger than the number of people;

Total number of people fully vaccinated - this is the number of people that received the entire set of immunization according to the immunization scheme (typically 2); at a certain moment in time, there might be a certain number of people that received one vaccine and another number (smaller) of people that received all vaccines in the scheme;

Daily vaccinations (raw) - for a certain data entry, the number of vaccinations for that date/country;

Daily vaccinations - for a certain data entry, the number of vaccinations for that date/country;

Total vaccinations per hundred - ratio (in percent) between vaccination number and total population up to the date in the country;

Total number of people vaccinated per hundred - ratio (in percent) between population immunized and total population up to the date in the country;

Total number of people fully vaccinated per hundred - ratio (in percent) between population fully immunized and total population up to the date in the country;

Number of vaccinations per day - number of daily vaccinations for that day and country;

Daily vaccinations per million - ratio (in ppm) between vaccination number and total population for the current date in the country;

Vaccines used in the country - total number of vaccines used in the country (up to date);

Source name - source of the information (national authority, international organization, local organization etc.);

Source website - website of the source of information;
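The per-hundred and per-million columns described above are simple ratios against the population. A minimal sketch of that arithmetic, using made-up numbers; the helper names `per_hundred` and `per_million` are illustrative and not part of the dataset:

```python
def per_hundred(count, population):
    """Count per 100 inhabitants."""
    return 100 * count / population


def per_million(count, population):
    """Count per 1,000,000 inhabitants."""
    return 1_000_000 * count / population


# Made-up numbers for illustration only
population = 2_000_000
total_vaccinations = 500_000
daily_vaccinations = 10_000

total_per_hundred = per_hundred(total_vaccinations, population)   # 25.0
daily_per_million = per_million(daily_vaccinations, population)   # 5000.0
```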


2. The 'country vaccinations by manufacturer' dataset contains the following information:

Location - country;

Date - date;

Vaccine - vaccine type;

Total number of vaccinations - total number of vaccinations / current time and vaccine type.


Import libraries and plot style

In [1]:
%pip install plotly

Note: you may need to restart the kernel to use updated packages.


In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
%matplotlib inline

Import dataset and perform descriptive analysis

In [3]:
country=pd.read_csv('/Users/apple/Documents/Data science/EDA project/country_vaccinations.csv')
manu=pd.read_csv('/Users/apple/Documents/Data science/EDA project/country_vaccinations_by_manufacturer.csv')


FileNotFoundError: [Errno 2] No such file or directory: '/Users/apple/Documents/Data science/EDA project/country_vaccinations.csv'
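The error above comes from a hard-coded absolute path that only exists on one machine. One possible way around it is to read from a configurable folder and fail early with a readable message. This is only a sketch: `load_csv` and `DATA_DIR` are hypothetical names, and the folder must be set to wherever the Kaggle CSVs were actually saved.

```python
from pathlib import Path

import pandas as pd


def load_csv(data_dir, filename):
    """Read a CSV from data_dir, raising a clear error if it is missing."""
    path = Path(data_dir).expanduser() / filename
    if not path.is_file():
        raise FileNotFoundError(f"{path} not found; place the Kaggle CSV there first")
    return pd.read_csv(path)


# DATA_DIR is a hypothetical location; point it at the folder holding the CSVs
DATA_DIR = Path("data")
# country = load_csv(DATA_DIR, "country_vaccinations.csv")
# manu = load_csv(DATA_DIR, "country_vaccinations_by_manufacturer.csv")
```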

In [None]:
country.head(20)

In [None]:
manu.head()

In [None]:
country.shape

In [None]:
manu.shape

In [None]:
country.describe()

In [None]:
manu.describe()

In [None]:
country.isnull().sum()

In [None]:
# Fill NaN with 0 in every column
country.fillna(0,inplace=True)

The null values can be filled with 0 because measurements are only taken every couple of days: cells with NaN correspond to days when no measurement was taken.
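Filling with 0 does not affect the per-country maxima used later, but a cumulative column then drops to 0 on days without a report. As an alternative, the last reported value could be carried forward within each country. The sketch below uses a toy frame, not the real data:

```python
import numpy as np
import pandas as pd

# Toy data for illustration; NaN marks a day without a report
toy = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B"],
    "date": ["2021-03-01", "2021-03-02", "2021-03-03", "2021-03-01", "2021-03-02"],
    "total_vaccinations": [100.0, np.nan, 250.0, np.nan, 40.0],
})

# Approach used in this notebook: every gap becomes 0
zero_filled = toy["total_vaccinations"].fillna(0)

# Alternative: carry the last cumulative value forward within each country,
# and use 0 only before the first report
carried = (
    toy.sort_values(["country", "date"])
       .groupby("country")["total_vaccinations"]
       .ffill()
       .fillna(0)
)
```

With forward filling, country A keeps 100 on the day without a report instead of falling to 0.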

In [None]:
manu.isnull().sum()

In [None]:
country.columns

In [None]:
country.drop(['daily_vaccinations_raw', 'source_name', 'source_website'],axis=1, inplace=True)

In [None]:
# Keep the cumulative columns and take each country's maximum (latest) value.
# A regular variable is used because `country.new = ...` only sets an attribute on the DataFrame.
country_new = country[['country', 'total_vaccinations', 'people_vaccinated', 'people_fully_vaccinated']]
country_new = country_new.groupby('country').max()
country_new.reset_index()

In [None]:
country_new.shape

# New dataset with Asian countries only

In [None]:
asia_countries = ['Afghanistan','Armenia','Azerbaijan','Bahrain','Bangladesh','Bhutan','Brunei','Cambodia','China'
                  ,'Cyprus','Egypt','Hong Kong','Georgia','India','Indonesia','Iran','Iraq','Israel','Japan'
                  ,'Jordan','Kazakhstan','Kuwait','Kyrgyzstan','Laos','Lebanon','Malaysia','Maldives','Mongolia'
                  ,'Myanmar','Nepal','Oman','Pakistan','Palestine','Philippines','Qatar','Russia'
                  ,'Saudi Arabia','Singapore','South Korea','Sri Lanka','Syria','Taiwan','Tajikistan','Thailand','Timor','Turkey'
                  ,'Turkmenistan','United Arab Emirates','Uzbekistan','Vietnam','Yemen']
# isin() keeps only names present in the dataset; .loc with a list raises KeyError on any missing name
Asia = country_new[country_new.index.isin(asia_countries)]
Asia.reset_index()

In [None]:
Asia_vac = Asia.groupby('country').max().sort_values('total_vaccinations', ascending=False)
Asia_vac = Asia_vac.iloc[:10]
Asia_vac
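The same "latest cumulative value per country, then top N" pattern recurs in the rankings further down. A compact sketch on toy data, using `nlargest` in place of sorting and slicing:

```python
import pandas as pd

# Toy data for illustration: several cumulative readings per country
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B", "C"],
    "total_vaccinations": [10, 30, 5, 50, 20],
})

# The maximum of a cumulative column is the latest total for each country
latest = toy.groupby("country")["total_vaccinations"].max()

# Top 2 countries by latest total, already sorted in descending order
top2 = latest.nlargest(2)
```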

In [None]:
plt.figure(figsize=(18, 6))
plt.bar(Asia_vac.index, Asia_vac.total_vaccinations)

plt.xticks(rotation = 90)
plt.ylabel('Total Vaccinations')
plt.xlabel('Country (top 10 in Asia)')
plt.show()

# Vaccination record in Hong Kong

In [None]:
country_HK = country[country['iso_code'] == 'HKG'].copy()
country_HK 

In [None]:
#Plot total vaccinations as a function of date
plt.figure(figsize=(18,6))
sns.lineplot(data=country_HK, x="date", y="total_vaccinations")
plt.title("Total vaccinations in Hong Kong")
plt.xticks(rotation=45)
plt.show()

Total vaccinations in Hong Kong have increased steadily since the first records in the dataset.
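That observation can be checked numerically rather than read off the plot. The sketch below uses a toy frame and assumes, as in this notebook, that days without a report were filled with 0. It parses the dates, drops the placeholder zeros, and tests whether the cumulative total ever decreases:

```python
import pandas as pd

# Toy data for illustration; 0.0 marks a day without a report
toy_hk = pd.DataFrame({
    "date": ["2021-03-03", "2021-03-01", "2021-03-02", "2021-03-04"],
    "total_vaccinations": [300.0, 100.0, 0.0, 450.0],
})

# Parse the date strings so that sorting and plotting follow the calendar
toy_hk["date"] = pd.to_datetime(toy_hk["date"])

# Keep reported days only, ordered by date
reported = (
    toy_hk[toy_hk["total_vaccinations"] > 0]
    .sort_values("date")
    .set_index("date")["total_vaccinations"]
)

# True when the cumulative total never goes down
never_decreases = reported.is_monotonic_increasing
```

Plotting `reported` instead of the zero-filled column would also avoid artificial dips to zero in the line chart.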

In [None]:
#Plot daily vaccinations as a function of date
plt.figure(figsize=(18,6))
sns.lineplot(data=country_HK, x="date", y="daily_vaccinations")

plt.xticks(rotation=90)
plt.title("Daily vaccinations in Hong Kong")

HK experienced its 3rd wave of COVID-19 from July to August, recording the highest single-day count of confirmed cases (149) since the outbreak on July 30th. Because the mean age of confirmed cases was highest in the 3rd wave, the death rate of this wave was also the highest. The spike in confirmed cases and in the death rate might have motivated people to get vaccinated.

# Choropleth Map of total vaccination

In [None]:
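# One row per country: the maximum of the cumulative column is the latest total
country_total = country.groupby('iso_code', as_index=False)['total_vaccinations'].max()
fig = px.choropleth(country_total, locations="iso_code",
                    color="total_vaccinations",
                    color_continuous_scale=px.colors.sequential.Electric,
                    title="Total vaccinations")

fig.update_layout(margin={"r":0,"t":40,"l":0,"b":0})  # Keep a small top margin so the title stays visible
fig.show()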

# Most popular vaccine in the world

In [None]:
# For each vaccine in the country_vaccinations_by_manufacturer dataset, take the column-wise maximum of location and total_vaccinations
# Selecting several columns from a groupby needs a list (double brackets)
vac = manu.groupby("vaccine")[["location", "total_vaccinations"]].max().sort_values('total_vaccinations', ascending=False)
vac

In [None]:
vac_count = manu.groupby("vaccine")["total_vaccinations"].max()
vac_counts = vac_count.sort_values(ascending = False)
vac_counts
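Note that `max()` per vaccine returns the largest cumulative total reported by any single location, not a worldwide total. If `total_vaccinations` is cumulative per location and vaccine (an assumption based on the column description above), a worldwide figure could be approximated by taking the latest value per location and summing across locations. A sketch on toy data:

```python
import pandas as pd

# Toy data for illustration: cumulative totals per location, date and vaccine
toy_manu = pd.DataFrame({
    "location": ["X", "X", "Y", "Y", "X", "Y"],
    "date": ["2021-03-01", "2021-03-02", "2021-03-01", "2021-03-02",
             "2021-03-02", "2021-03-02"],
    "vaccine": ["V1", "V1", "V1", "V1", "V2", "V2"],
    "total_vaccinations": [100, 150, 80, 120, 200, 30],
})

# Approach used above: largest total reported by a single location
largest_single = toy_manu.groupby("vaccine")["total_vaccinations"].max()

# Alternative: latest cumulative value per (vaccine, location), summed over locations
latest_per_location = toy_manu.groupby(["vaccine", "location"])["total_vaccinations"].max()
worldwide = latest_per_location.groupby(level="vaccine").sum().sort_values(ascending=False)
```

In this toy example the two approaches rank the vaccines differently: V2 has the largest single-location total (200), while V1 has the larger summed total (270 against 230).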

In [None]:
# Recent seaborn versions require x and y as keyword arguments
sns.barplot(x=vac_counts.index, y=vac_counts.values)
plt.xticks(rotation=75)

# Countries with the highest number of total vaccinations

In [None]:
vaccinated_country=country.groupby('country')[['total_vaccinations']].max()
most_vaccinated_country=vaccinated_country.sort_values('total_vaccinations', ascending=False).head(10)
most_vaccinated_country

In [None]:
plt.plot(most_vaccinated_country, 'p-c')
plt.xticks(rotation=75)
plt.title('Top 10 countries by total vaccinations');

# Top 10 countries with the highest vaccination ratio

Total vaccinations per hundred - ratio (in percent) between vaccination number and total population up to the date in the country

In [None]:
vaccinated_prop_country=country.groupby('country')[['total_vaccinations_per_hundred']].max()
vaccinated_prop_country=vaccinated_prop_country.sort_values('total_vaccinations_per_hundred', ascending=False).head(10)
vaccinated_prop_country

The number can be higher than 100 because some people receive more than 1 dose.
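A small worked example of how the ratio can exceed 100, using made-up numbers and assuming a strict two-dose scheme with no boosters:

```python
# Made-up numbers, assuming a strict two-dose scheme with no boosters
population = 1_000
people_vaccinated = 800         # received at least one dose
people_fully_vaccinated = 600   # received both doses

# Each vaccinated person accounts for one dose,
# each fully vaccinated person for a second one
total_vaccinations = people_vaccinated + people_fully_vaccinated

total_per_hundred = 100 * total_vaccinations / population   # 140.0
```

Here 80% of the population has received at least one dose, yet the total vaccinations per hundred is 140.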

In [None]:
plt.plot(vaccinated_prop_country, 'p-c')
plt.xticks(rotation=75)
plt.title('Top 10 countries by total vaccinations per hundred');

# Countries with the highest proportion of people fully vaccinated
Full immunization refers to receiving the entire set of vaccinations, typically 2 doses. According to the country_vaccinations dataset, Gibraltar ranked first and Israel fifth.

In [None]:
most_people_fully_immunized=country.groupby('country')[['people_fully_vaccinated_per_hundred']].max()
most_people_fully_immunized_country=most_people_fully_immunized.sort_values('people_fully_vaccinated_per_hundred', ascending=False).head(10)
most_people_fully_immunized_country

In [None]:
plt.plot(most_people_fully_immunized_country, 'p-m')
plt.xticks(rotation=75)
plt.title('Top 10 countries by people fully vaccinated per hundred');

From this dataset, which records vaccination data at the country level, we were able to extract a lot of useful information. China, India, and the United States are the top 3 countries by total number of vaccinations administered. These high totals can be attributed to the countries' large populations, as none of them is among the countries with the highest vaccination ratio. Gibraltar (a British Overseas Territory), Cuba, and Chile ranked highest by the ratio (in percent) between the number of vaccinations and the total population. Gibraltar, Pitcairn Island (another British Overseas Territory), and the United Arab Emirates were the 3 countries with the highest proportion of people fully vaccinated (typically 2 doses). However, with the new Omicron variants and calls for people to get a 3rd jab, the ranking is expected to change. As for the vaccines, Pfizer/BioNTech, Moderna, and Oxford/AstraZeneca are the top 3 vaccines administered around the world, with Sinovac a close 4th. A new dataset containing only the vaccination records of Asian countries was extracted for a possible ML project.