![alt text](purple.png "Vaccination Progress")

> # `Covid-19 World Vaccination Progress`

![alt text](covid-19.png "Vaccination Progress")


![alt text](purple.png "Vaccination Progress")

> # How are we doing? An Introduction

So... I officially started learning Python in November 2020, the corona-plagued year. I probably wouldn't have if the pandemic hadn't happened. It has been a little over a year and we are all happy about the vaccination. All we are asking for is a little normalcy, please. Basically, that's the inspiration behind this little exploratory data analysis.

At the end of January 2021, I am interested in answers to questions such as:

- How many countries have started vaccination?
- How are they distributed across continents? (I'm Ghanaian, so I'm particularly keen to know how Africa is doing.)
- Which vaccines are used across continents, i.e. are they evenly distributed?


So yeahhhhh! Let's go! 

I found the dataset on [Kaggle](https://www.kaggle.com/gpreda/covid-world-vaccination-progress). Feel free to analyse it as well.




![alt text](split.png "Vaccination Progress")

### The data contains the following information:

> **`country`** - this is the country for which the vaccination information is provided.


> **`iso_code`** - ISO code for the country.


> **`date`** - date for the data entry; for some of the dates we have only the daily vaccinations, for others, only the cumulative total.

> **`total_vaccinations`** - this is the absolute number of total immunizations in the country;

> **`people_vaccinated`** - a person, depending on the immunization scheme, will receive one or more (typically 2) vaccine doses; at a given moment, the number of vaccinations might be larger than the number of people;

> **`people_fully_vaccinated`** - this is the number of people that received the entire set of immunization according to the immunization scheme (typically 2); at a certain moment in time, there might be a certain number of people that received one vaccine and another number (smaller) of people that received all vaccines in the scheme;

> **`daily_vaccinations_raw`** - for a certain data entry, the number of vaccinations for that date/country;

> **`daily_vaccinations`** - for a certain data entry, the number of vaccinations for that date/country;

> **`total_vaccinations_per_hundred`** - ratio (in percent) between vaccination number and total population up to the date in the country;

> **`people_vaccinated_per_hundred`** - ratio (in percent) between population immunized and total population up to the date in the country;


> **`people_fully_vaccinated_per_hundred`** - ratio (in percent) between population fully immunized and total population up to the date in the country;

> **`daily_vaccinations_per_million`** - ratio (in ppm) between vaccination number and total population for the current date in the country;


> **`vaccines`**  - vaccines used in the country (up to date);


> **`source_name`** - source of the information (national authority, international organization, local organization etc.);

> **`source_website`** - website of the source of information;
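
The `per_hundred` and `per_million` columns are plain ratios against a country's population. Here is a small worked sketch of that arithmetic. The population and vaccination figures are made up for illustration (population is not a column in this dataset), and the variable names simply mirror the column names above.

```python
# Hypothetical figures for illustration only; population is NOT part of the dataset
population = 5_000_000
total_vaccinations = 125_000
daily_vaccinations = 4_000

# Ratio in percent: doses administered per 100 inhabitants
total_vaccinations_per_hundred = total_vaccinations / population * 100

# Ratio in parts per million: daily doses per 1,000,000 inhabitants
daily_vaccinations_per_million = daily_vaccinations / population * 1_000_000

print(round(total_vaccinations_per_hundred, 2))
print(round(daily_vaccinations_per_million))
```
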

![alt text](split.png "Vaccination Progress")

## Import libraries

In [2]:
import pandas as pd
import numpy as np 


## Let us take a quick look at the dataset

In [3]:
covid_vaccine = pd.read_csv("Dataset/country_vaccinations.csv")  # forward slash avoids the invalid "\c" escape and works on every OS

In [4]:
covid_vaccine

Unnamed: 0,country,iso_code,date,total_vaccinations,people_vaccinated,people_fully_vaccinated,daily_vaccinations_raw,daily_vaccinations,total_vaccinations_per_hundred,people_vaccinated_per_hundred,people_fully_vaccinated_per_hundred,daily_vaccinations_per_million,vaccines,source_name,source_website
0,Algeria,DZA,29/01/2021,0.0,,,,,0.00,,,,Sputnik V,Ministry of Health,https://www.aps.dz/regions/116777-blida-covid-...
1,Algeria,DZA,30/01/2021,30.0,,,30.0,30.0,0.00,,,1.0,Sputnik V,Ministry of Health,https://www.aps.dz/regions/116777-blida-covid-...
2,Argentina,ARG,29/12/2020,700.0,,,,,0.00,,,,Sputnik V,Ministry of Health,http://datos.salud.gob.ar/dataset/vacunas-cont...
3,Argentina,ARG,30/12/2020,,,,,15656.0,,,,346.0,Sputnik V,Ministry of Health,http://datos.salud.gob.ar/dataset/vacunas-cont...
4,Argentina,ARG,31/12/2020,32013.0,,,,15656.0,0.07,,,346.0,Sputnik V,Ministry of Health,http://datos.salud.gob.ar/dataset/vacunas-cont...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1929,Wales,,27/01/2021,336745.0,336071.0,674.0,23801.0,20845.0,10.68,10.66,0.02,6611.0,"Oxford/AstraZeneca, Pfizer/BioNTech",Government of the United Kingdom,https://coronavirus.data.gov.uk/details/health...
1930,Wales,,28/01/2021,362970.0,362253.0,717.0,26225.0,21463.0,11.51,11.49,0.02,6807.0,"Oxford/AstraZeneca, Pfizer/BioNTech",Government of the United Kingdom,https://coronavirus.data.gov.uk/details/health...
1931,Wales,,29/01/2021,378950.0,378200.0,750.0,15980.0,19705.0,12.02,12.00,0.02,6250.0,"Oxford/AstraZeneca, Pfizer/BioNTech",Government of the United Kingdom,https://coronavirus.data.gov.uk/details/health...
1932,Wales,,30/01/2021,404249.0,403463.0,786.0,25299.0,19885.0,12.82,12.80,0.02,6307.0,"Oxford/AstraZeneca, Pfizer/BioNTech",Government of the United Kingdom,https://coronavirus.data.gov.uk/details/health...


### Let us drop all duplicates and only keep entries for the latest date for each country in the data. 

> We are interested in the number of countries that are in the dataset, i.e. countries that have started vaccination.

In [57]:
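# Keep only the last row per country (assumes rows are in date order); copy so later edits don't touch a view
covid_vaccine_latest = covid_vaccine.drop_duplicates('country', keep='last').copy()
covid_vaccine_latest['country'] = covid_vaccine_latest['country'].replace('Czechia', 'Czech Republic')  # rename Czechia to Czech Republic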
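
`drop_duplicates(..., keep='last')` keeps whichever row comes last in the file, so it only returns the latest entry if each country's rows are already in date order. Below is a minimal sketch that makes the ordering explicit. The small frame is a stand-in with rows deliberately shuffled (values borrowed from the tables in this notebook), and I'm assuming dates are day/month/year strings as shown above.

```python
import pandas as pd

# Stand-in rows, deliberately out of date order
sample = pd.DataFrame({
    "country": ["Algeria", "Argentina", "Algeria", "Argentina"],
    "date": ["30/01/2021", "29/12/2020", "29/01/2021", "01/02/2021"],
    "total_vaccinations": [30.0, 700.0, 0.0, 375851.0],
})

# Parse the day/month/year strings so sorting is chronological, not alphabetical
sample["date"] = pd.to_datetime(sample["date"], format="%d/%m/%Y")

# Sort by country and date, then keep the last (most recent) row per country
latest = (sample
          .sort_values(["country", "date"])
          .drop_duplicates("country", keep="last")
          .reset_index(drop=True))
print(latest)
```
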

### Let us take a look at the new table

In [6]:
covid_vaccine_latest

Unnamed: 0,country,iso_code,date,total_vaccinations,people_vaccinated,people_fully_vaccinated,daily_vaccinations_raw,daily_vaccinations,total_vaccinations_per_hundred,people_vaccinated_per_hundred,people_fully_vaccinated_per_hundred,daily_vaccinations_per_million,vaccines,source_name,source_website
1,Algeria,DZA,30/01/2021,30.0,,,30.0,30.0,0.00,,,1.0,Sputnik V,Ministry of Health,https://www.aps.dz/regions/116777-blida-covid-...
36,Argentina,ARG,01/02/2021,375851.0,281577.0,94274.0,,11924.0,0.83,0.62,0.21,264.0,Sputnik V,Ministry of Health,http://datos.salud.gob.ar/dataset/vacunas-cont...
59,Austria,AUT,01/02/2021,202345.0,186479.0,15866.0,1847.0,8714.0,2.25,2.07,0.18,968.0,Pfizer/BioNTech,Ministry of Health,https://www.data.gv.at/katalog/dataset/589132b...
99,Bahrain,BHR,31/01/2021,171568.0,171568.0,,1135.0,1841.0,10.08,10.08,,1082.0,"Pfizer/BioNTech, Sinopharm",Ministry of Health,https://twitter.com/MOH_Bahrain/status/1355984...
134,Belgium,BEL,31/01/2021,296961.0,279195.0,17766.0,368.0,9268.0,2.56,2.41,0.15,800.0,"Moderna, Pfizer/BioNTech",Sciensano,https://datastudio.google.com/embed/u/0/report...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1769,Turkey,TUR,01/02/2021,2136299.0,,,150062.0,119078.0,2.53,,,1412.0,Sinovac,COVID-19 Vaccine Information Platform,https://covid19asi.saglik.gov.tr/
1797,United Arab Emirates,ARE,01/02/2021,3440777.0,3190777.0,250000.0,106615.0,124241.0,34.79,32.26,2.53,12562.0,"Pfizer/BioNTech, Sinopharm",National Emergency Crisis and Disaster Managem...,http://covid19.ncema.gov.ae/en
1840,United Kingdom,GBR,31/01/2021,9790576.0,9296367.0,494209.0,322194.0,392361.0,14.42,13.69,0.73,5780.0,"Oxford/AstraZeneca, Pfizer/BioNTech",Government of the United Kingdom,https://coronavirus.data.gov.uk/details/health...
1883,United States,USA,31/01/2021,31123299.0,25201143.0,5657142.0,1545397.0,1324949.0,9.40,7.61,1.71,4003.0,"Moderna, Pfizer/BioNTech",Centers for Disease Control and Prevention,https://covid.cdc.gov/covid-data-tracker/#vacc...


# Q: How many countries have started vaccination?

In [7]:
print('There are a total of {} countries who have started vaccination according to this data'.format(len(covid_vaccine_latest)))

There are a total of 67 countries who have started vaccination according to this data


# A: These 67 countries have vaccinated at least one citizen.
>Algeria, Argentina, Austria,
 Bahrain,
 Belgium,
 Bermuda,
 Brazil,
 Bulgaria,
 Canada,
 Chile,
 China,
 Costa Rica,
 Croatia,
 Cyprus,
 Czech Republic,
 Denmark,
 Ecuador,
 England,
 Estonia,
 Finland,
 France,
 Germany,
 Gibraltar,
 Greece,
 Hungary,
 Iceland,
 India,
 Indonesia,
 Ireland,
 Isle of Man,
 Israel,
 Italy,
 Kuwait,
 Latvia,
 Lithuania,
 Luxembourg,
 Malta,
 Mexico,
 Morocco,
 Myanmar,
 Netherlands,
 Northern Cyprus,
 Northern Ireland,
 Norway,
 Oman,
 Palau,
 Panama,
 Poland,
 Portugal,
 Romania,
 Russia,
 Saudi Arabia,
 Scotland,
 Serbia,
 Seychelles,
 Singapore,
 Slovakia,
 Slovenia,
 Spain,
 Sri Lanka,
 Sweden,
 Switzerland,
 Turkey,
 United Arab Emirates,
 United Kingdom,
 United States and Wales.

![alt text](split.png "Vaccination Progress")

## How are these countries distributed across continents?

### To help us answer this, we will need to bring in extra data. 

I obtained a table of countries in the world and their continents from [here](https://www.whatarethe7continents.com/how-many-countries-in-the-world/)

In [11]:
countries = pd.read_html('https://www.whatarethe7continents.com/how-many-countries-in-the-world/')[0]
countries

Unnamed: 0,Countires of the World List,Capital City,Population,Continent
0,,,,
1,Countries of Africa,,,
2,Algeria,Algiers,3.966720e+07,Africa
3,Angola,Luanda,2.532676e+07,Africa
4,Benin,Porto-Novo,1.078236e+07,Africa
...,...,...,...,...
220,Tuvalu,Funafuti,1.064700e+04,Oceania
221,Vanuatu,Port-Vila,2.433040e+05,Oceania
222,Totals: 11,,2.291020e+06,
223,,,,To Top ↑


> ### Let's do some cleaning, shall we?

In [12]:
# Data cleaning & transformation

countries = countries.iloc[2:-3]  # drop the first 2 and the last 3 rows
countries = countries[['Countires of the World List', 'Continent']]  # select columns of interest (header is misspelled in the source table)
countries.columns = ['country', 'continent']  # rename columns
countries = countries.dropna()
countries = countries.set_index('country')
countries = countries.drop(index=['Totals: 54', 'Totals: 48', 'Totals: 3', 'Totals: 51', 'Totals: 23', 'Totals: 12'])  # remove the 'Totals' rows
countries = countries.reset_index()
countries['country'] = countries['country'].replace(r'\W$', '', regex=True)  # remove trailing '^' & '*' from country names
countries['country'] = countries['country'].replace(r' of America$', '', regex=True)  # to enable merge
countries['country'] = countries['country'].replace(r' \(Burma$', '', regex=True)  # 'Myanmar (Burma' -> 'Myanmar'
countries = countries.drop_duplicates('country', keep='first')
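
To see what the three regex replacements do, and why their order matters, here is a small sketch on a handful of made-up names. The sample strings are my own guesses at the kinds of markers and long forms in the source table, not values copied from it.

```python
import pandas as pd

# Hypothetical raw names with a footnote marker, a long form and a parenthesised alias
names = pd.Series(["Egypt*", "United States of America", "Myanmar (Burma)", "Ghana"])

cleaned = (names
           .replace(r"\W$", "", regex=True)           # strip ONE trailing non-word character
           .replace(r" of America$", "", regex=True)  # "United States of America" -> "United States"
           .replace(r" \(Burma$", "", regex=True))    # "Myanmar (Burma" -> "Myanmar"
print(cleaned.tolist())
```

The first pattern turns `Myanmar (Burma)` into `Myanmar (Burma`, which is why the last pattern has no closing parenthesis.
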

> ### Let's take a look at the table now

In [19]:
countries


Unnamed: 0,country,continent
0,Algeria,Africa
1,Angola,Africa
2,Benin,Africa
3,Botswana,Africa
4,Burkina Faso,Africa
...,...,...
197,Samoa,Oceania
198,Solomon Islands,Oceania
199,Tonga,Oceania
200,Tuvalu,Oceania


In [20]:
assert len(countries) == 195

There are **195 countries** in the world, which equals our number of rows, so everything seems alright after cleaning.

> ### We need to add each country's continent to the dataset

In [22]:
covid_with_continent = pd.merge(covid_vaccine_latest, countries, on='country', how='outer') 
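
An outer merge also pulls in every country that has no vaccination data yet. One possible alternative, sketched here on two tiny stand-in frames (`vaccinated` and `lookup` are hypothetical names, not the notebook's variables), is a left merge with `indicator=True`, which keeps only the vaccinated rows and flags the names that found no continent.

```python
import pandas as pd

# Tiny stand-ins for covid_vaccine_latest and countries
vaccinated = pd.DataFrame({
    "country": ["Algeria", "Wales"],
    "vaccines": ["Sputnik V", "Oxford/AstraZeneca, Pfizer/BioNTech"],
})
lookup = pd.DataFrame({
    "country": ["Algeria", "Angola"],
    "continent": ["Africa", "Africa"],
})

# how='left' keeps every vaccinated row; indicator=True adds a '_merge' column
merged = pd.merge(vaccinated, lookup, on="country", how="left", indicator=True)

# Rows present only in the left frame have no match in the lookup table
unmatched = merged.loc[merged["_merge"] == "left_only", "country"].tolist()
print(unmatched)
```
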

>### Let's take a look at the updated dataset 

In [23]:
covid_with_continent

Unnamed: 0,country,iso_code,date,total_vaccinations,people_vaccinated,people_fully_vaccinated,daily_vaccinations_raw,daily_vaccinations,total_vaccinations_per_hundred,people_vaccinated_per_hundred,people_fully_vaccinated_per_hundred,daily_vaccinations_per_million,vaccines,source_name,source_website,continent
0,Algeria,DZA,30/01/2021,30.0,,,30.0,30.0,0.00,,,1.0,Sputnik V,Ministry of Health,https://www.aps.dz/regions/116777-blida-covid-...,Africa
1,Argentina,ARG,01/02/2021,375851.0,281577.0,94274.0,,11924.0,0.83,0.62,0.21,264.0,Sputnik V,Ministry of Health,http://datos.salud.gob.ar/dataset/vacunas-cont...,South America
2,Austria,AUT,01/02/2021,202345.0,186479.0,15866.0,1847.0,8714.0,2.25,2.07,0.18,968.0,Pfizer/BioNTech,Ministry of Health,https://www.data.gv.at/katalog/dataset/589132b...,Europe
3,Bahrain,BHR,31/01/2021,171568.0,171568.0,,1135.0,1841.0,10.08,10.08,,1082.0,"Pfizer/BioNTech, Sinopharm",Ministry of Health,https://twitter.com/MOH_Bahrain/status/1355984...,Asia
4,Belgium,BEL,31/01/2021,296961.0,279195.0,17766.0,368.0,9268.0,2.56,2.41,0.15,800.0,"Moderna, Pfizer/BioNTech",Sciensano,https://datastudio.google.com/embed/u/0/report...,Europe
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
198,Samoa,,,,,,,,,,,,,,,Oceania
199,Solomon Islands,,,,,,,,,,,,,,,Oceania
200,Tonga,,,,,,,,,,,,,,,Oceania
201,Tuvalu,,,,,,,,,,,,,,,Oceania


In [28]:
covid_with_continent = covid_with_continent.dropna(subset=['vaccines']).copy()  # remove all countries not in the covid_vaccine_latest dataset
covid_with_continent['continent'] = covid_with_continent['continent'].fillna('Europe')  # fill GB nations with Europe
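
The blanket `fillna('Europe')` assigns Europe to every name that did not match the continent table. If finer control is wanted, one option is an explicit mapping for the unmatched names. This is only a sketch: the frame and the `manual_continent` dictionary are hypothetical, and the continent chosen for each territory is my own assumption rather than something the notebook does.

```python
import pandas as pd

# Stand-in frame: two names that a continent lookup would typically miss
frame = pd.DataFrame({
    "country": ["Algeria", "Scotland", "Bermuda"],
    "continent": ["Africa", None, None],
})

# Hypothetical manual mapping for names missing from the lookup table
manual_continent = {"Scotland": "Europe", "Bermuda": "North America"}

# Fill only the missing continents, using the mapping keyed on country name
frame["continent"] = frame["continent"].fillna(frame["country"].map(manual_continent))
print(frame)
```
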

In [29]:
covid_with_continent

Unnamed: 0,country,iso_code,date,total_vaccinations,people_vaccinated,people_fully_vaccinated,daily_vaccinations_raw,daily_vaccinations,total_vaccinations_per_hundred,people_vaccinated_per_hundred,people_fully_vaccinated_per_hundred,daily_vaccinations_per_million,vaccines,source_name,source_website,continent
0,Algeria,DZA,30/01/2021,30.0,,,30.0,30.0,0.00,,,1.0,Sputnik V,Ministry of Health,https://www.aps.dz/regions/116777-blida-covid-...,Africa
1,Argentina,ARG,01/02/2021,375851.0,281577.0,94274.0,,11924.0,0.83,0.62,0.21,264.0,Sputnik V,Ministry of Health,http://datos.salud.gob.ar/dataset/vacunas-cont...,South America
2,Austria,AUT,01/02/2021,202345.0,186479.0,15866.0,1847.0,8714.0,2.25,2.07,0.18,968.0,Pfizer/BioNTech,Ministry of Health,https://www.data.gv.at/katalog/dataset/589132b...,Europe
3,Bahrain,BHR,31/01/2021,171568.0,171568.0,,1135.0,1841.0,10.08,10.08,,1082.0,"Pfizer/BioNTech, Sinopharm",Ministry of Health,https://twitter.com/MOH_Bahrain/status/1355984...,Asia
4,Belgium,BEL,31/01/2021,296961.0,279195.0,17766.0,368.0,9268.0,2.56,2.41,0.15,800.0,"Moderna, Pfizer/BioNTech",Sciensano,https://datastudio.google.com/embed/u/0/report...,Europe
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
62,Turkey,TUR,01/02/2021,2136299.0,,,150062.0,119078.0,2.53,,,1412.0,Sinovac,COVID-19 Vaccine Information Platform,https://covid19asi.saglik.gov.tr/,Asia/Europe
63,United Arab Emirates,ARE,01/02/2021,3440777.0,3190777.0,250000.0,106615.0,124241.0,34.79,32.26,2.53,12562.0,"Pfizer/BioNTech, Sinopharm",National Emergency Crisis and Disaster Managem...,http://covid19.ncema.gov.ae/en,Asia
64,United Kingdom,GBR,31/01/2021,9790576.0,9296367.0,494209.0,322194.0,392361.0,14.42,13.69,0.73,5780.0,"Oxford/AstraZeneca, Pfizer/BioNTech",Government of the United Kingdom,https://coronavirus.data.gov.uk/details/health...,Europe
65,United States,USA,31/01/2021,31123299.0,25201143.0,5657142.0,1545397.0,1324949.0,9.40,7.61,1.71,4003.0,"Moderna, Pfizer/BioNTech",Centers for Disease Control and Prevention,https://covid.cdc.gov/covid-data-tracker/#vacc...,North America


In [39]:
assert len(covid_vaccine_latest) == len(covid_with_continent)
assert covid_with_continent['continent'].notna().all()

### Same number of countries and no null values for continent, so we are good to go!

![alt text](split.png "Vaccination Progress")

# Q. What is the distribution of countries that have started vaccination, grouped by continent?

![alt text](firstchart.png "Vaccination Progress")

> ### Europe has the most countries that have started vaccination, with a total of `39`, followed by Asia with `12`, then North America with `5`, followed by South America, Africa, Asia/Europe and then Oceania.

####  ** Asia/Europe refers to countries that have parts in both Asia and Europe, in this case Turkey, Russia & Cyprus.
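
The plotting code is not part of this export, so here is a rough sketch of how the counts behind a chart like this could be computed. The frame is a small illustrative subset; in the notebook the same call would presumably run on `covid_with_continent`.

```python
import pandas as pd

# Illustrative subset of country/continent pairs
subset = pd.DataFrame({
    "country": ["Algeria", "Austria", "Belgium", "Bahrain", "Turkey"],
    "continent": ["Africa", "Europe", "Europe", "Asia", "Asia/Europe"],
})

# Number of countries per continent, largest first
per_continent = subset["continent"].value_counts()
print(per_continent)
```

If matplotlib is installed, `per_continent.plot(kind="bar")` would turn these counts into a bar chart.
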


### Let us look at the percentage of countries who have started compared to the total number of countries in a continent. 

![alt text](second.png "Vaccination Progress")

>### About 89% of countries in Europe have started vaccination, nearly twice the proportion of the next continent, North America (45%). The worst-performing continents are Africa and Oceania, where fewer than 10% of countries had started vaccination as of February 1, 2021.

#### ** This is the percentage of countries in the continent, not the percentage of the human population.
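
The percentage is simply the number of countries that have started divided by the number of countries in the continent. A minimal sketch with made-up counts (the continent names and numbers are placeholders, not the notebook's figures):

```python
import pandas as pd

# Hypothetical counts for illustration only
started = pd.Series({"Continent A": 8, "Continent B": 3})
total = pd.Series({"Continent A": 10, "Continent B": 30})

# Share of countries that have started, in percent, highest first
pct_started = (started / total * 100).round(1).sort_values(ascending=False)
print(pct_started)
```

With the notebook's frames, `started` could come from `covid_with_continent['continent'].value_counts()` and `total` from `countries['continent'].value_counts()`, assuming the continent labels match in both.
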

![alt text](split.png "Vaccination Progress")

## Africa Under Focus

In [43]:
print('These countries are {}, {} and {}'.format(*list(covid_with_continent[covid_with_continent['continent']== 'Africa']['country'])))

These countries are Algeria, Morocco and Seychelles



>### Algeria, Morocco and Seychelles are the only countries in Africa to have started vaccination as of February 1, 2021.

Seychelles was the first African country to begin the vaccination process; its first data was recorded on January 9, 2021.
Morocco was second, with its first data recorded on January 28, 2021. Algeria was last, starting a day after Morocco.
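
The first recorded date per country can be read off the full dataset with a group-by. A small sketch, using a stand-in frame with the Morocco and Algeria start dates mentioned above plus one later row each (the later rows are only there to give `min()` something to choose from):

```python
import pandas as pd

# Stand-in rows; dates follow the dataset's day/month/year format
rows = pd.DataFrame({
    "country": ["Morocco", "Algeria", "Morocco", "Algeria"],
    "date": ["29/01/2021", "30/01/2021", "28/01/2021", "29/01/2021"],
})
rows["date"] = pd.to_datetime(rows["date"], format="%d/%m/%Y")

# Earliest recorded date per country, sorted chronologically
first_record = rows.groupby("country")["date"].min().sort_values()
print(first_record)
```
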

![alt text](split.png "Vaccination Progress")

## What is the most popular vaccine?

** Based on the number of countries in which a vaccine is used

In [48]:
world_vaccine = [] #list for all vaccines
vaccines = covid_vaccine_latest['vaccines'].str.split(', ')  # some countries use multiple vaccines
for vaccine in vaccines:
    for each in vaccine:
        world_vaccine.append(each)
        
vaccine_uniq = set(world_vaccine) #unique vaccines

for vacc in vaccine_uniq:
    print(vacc, ':', world_vaccine.count(vacc))

CNBG : 1
Moderna : 14
Sinovac : 5
Sinopharm : 5
Pfizer/BioNTech : 54
Covaxin : 1
Oxford/AstraZeneca : 11
Sputnik V : 4
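
The same tally can be written more compactly with `collections.Counter`, which also makes it easy to print the vaccines from most to least used. This is just an alternative sketch on a three-row stand-in column, not the code that produced the output above.

```python
from collections import Counter

import pandas as pd

# Stand-in 'vaccines' column with comma-separated entries
vaccines_col = pd.Series([
    "Sputnik V",
    "Pfizer/BioNTech, Sinopharm",
    "Moderna, Pfizer/BioNTech",
])

# Flatten the per-country lists and count each vaccine in one pass
counts = Counter(v for entry in vaccines_col.str.split(", ") for v in entry)

for name, n in counts.most_common():
    print(name, ":", n)
```
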


> ### Pfizer/BioNTech ahead of the pack

![alt text](third.png "Vaccination Progress")

>### Pfizer/BioNTech is by far the most popular of the vaccines: a total of 54 countries have used it, followed by Moderna with 14. However, these counts are not mutually exclusive, as some countries use two or three vaccines.

>#### CNBG is only used in China. 
>#### Covaxin is only used in India.

# How about by continent?

In [51]:
only_vaccine = covid_with_continent[['country', 'continent', 'vaccines']].copy()  # copy so the column can be reassigned safely
only_vaccine

Unnamed: 0,country,continent,vaccines
0,Algeria,Africa,Sputnik V
1,Argentina,South America,Sputnik V
2,Austria,Europe,Pfizer/BioNTech
3,Bahrain,Asia,"Pfizer/BioNTech, Sinopharm"
4,Belgium,Europe,"Moderna, Pfizer/BioNTech"
...,...,...,...
62,Turkey,Asia/Europe,Sinovac
63,United Arab Emirates,Asia,"Pfizer/BioNTech, Sinopharm"
64,United Kingdom,Europe,"Oxford/AstraZeneca, Pfizer/BioNTech"
65,United States,North America,"Moderna, Pfizer/BioNTech"


In [58]:
only_vaccine['vaccines'] = only_vaccine['vaccines'].str.split(', ')

In [53]:
only_vaccine = (only_vaccine
 .set_index(['country', 'continent'])['vaccines']
 .apply(pd.Series)
 .stack()
 .reset_index()
 .drop('level_2', axis=1)
 .rename(columns={0:'vaccines'}))

only_vaccine

Unnamed: 0,country,continent,vaccines
0,Algeria,Africa,Sputnik V
1,Argentina,South America,Sputnik V
2,Austria,Europe,Pfizer/BioNTech
3,Bahrain,Asia,Pfizer/BioNTech
4,Bahrain,Asia,Sinopharm
...,...,...,...
90,United Kingdom,Europe,Pfizer/BioNTech
91,United States,North America,Moderna
92,United States,North America,Pfizer/BioNTech
93,Wales,Europe,Oxford/AstraZeneca
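
The split, `apply(pd.Series)` and `stack` steps produce one row per country and vaccine. Recent pandas versions (0.25 and later) offer `DataFrame.explode`, which gets to the same shape in fewer steps. A sketch on a two-row stand-in frame:

```python
import pandas as pd

# Stand-in for the country / continent / vaccines frame
wide = pd.DataFrame({
    "country": ["Algeria", "Bahrain"],
    "continent": ["Africa", "Asia"],
    "vaccines": ["Sputnik V", "Pfizer/BioNTech, Sinopharm"],
})

# Split the comma-separated string into a list, then emit one row per list element
long_form = (wide
             .assign(vaccines=wide["vaccines"].str.split(", "))
             .explode("vaccines")
             .reset_index(drop=True))
print(long_form)
```
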


![alt text](fourth.png "Vaccination Progress")

> #### Asia has used all approved vaccines but one (Sputnik V).
> #### North America has only used Moderna & Pfizer/BioNTech.
> #### Pfizer/BioNTech has never been used in Africa or Oceania. Moderna has not been used in Africa yet either.
> #### CNBG & Covaxin have only been used in Asia.
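
Statements like these can be checked with a continent-by-vaccine table once the data is in one-row-per-vaccine form. A minimal sketch using `pd.crosstab` on a small made-up frame (the rows are illustrative and do not reproduce the full dataset):

```python
import pandas as pd

# Illustrative one-row-per-vaccine data
long_form = pd.DataFrame({
    "continent": ["Africa", "Asia", "Asia", "Europe", "Europe"],
    "vaccines": ["Sputnik V", "Pfizer/BioNTech", "Sinopharm", "Pfizer/BioNTech", "Moderna"],
})

# Rows are continents, columns are vaccines, cells count the countries using them
usage = pd.crosstab(long_form["continent"], long_form["vaccines"])
print(usage)
```

A zero in a cell means no country on that continent uses that vaccine in the data.
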



# The Big Question: Moderna or Pfizer/BioNTech?

![alt text](fifth.png "Vaccination Progress")

>### At the moment, Pfizer/BioNTech is the preferred choice for most continents. The only exception is Oceania, which has only used Moderna. One continent is missing: neither vaccine has been used by any African country.

This may be due to the earlier release of the Pfizer/BioNTech vaccine. However, further analysis is needed to explain the high usage of Pfizer/BioNTech so far.