# Summary



* Immunisation rates are still unknown for many Eastern European countries
* Most Scandinavian countries do not yet vaccinate against Hepatitis-B
* The Netherlands, Sweden and France made big progress with their Hepatitis-B immunisation
* Poland's immunisation rates are decreasing at an alarming rate
* Poland's neighbours, the Czech Republic and the Slovak Republic, also show an alarming decrease in immunisation
* The Big-6 countries perform worse than the European average in all categories (Measles, DTP and Hepatitis-B)
* The overall best performing country is Portugal with an overall immunisation score of 98%
* Spain is the best performing Big-6 country with an overall immunisation score of 96%

# Introduction

The outbreak of the Covid-19 virus has taught us how a major outbreak can disrupt our society. However, Covid-19 is just one of many viruses that could disrupt our lives. Every day that someone is sick at home, they cannot contribute to society. This is manageable on a small scale, but it becomes a problem during a big outbreak. It is therefore important not only to try to prevent new viruses from emerging, but also to prevent the spread of existing viruses. One way to prevent the spread is to lock everything down and stay at home, but that disrupts our society in a major way. A way that does not disrupt our daily lives is to vaccinate against viruses and bacteria wherever that is possible.

We live in a time where the whole world is connected. As a result, a pandemic is no longer a local problem, but a global problem. Although immunization remains a national responsibility, it is important to keep an eye not only on our own vaccination scores, but also on those of other countries. A virus does not discriminate or stop at the border: as soon as we stop vaccinating, the virus in question may spread again.

This notebook visualizes how immunization rates of European countries developed during the period from 2010 to 2017. The data originates from the OECD and consists of the immunisation rates for the childhood vaccinations against Measles, DTP (Diphtheria, Tetanus, Pertussis) and Hepatitis-B.

Germany, the United Kingdom, France, Italy, Spain and Poland together account for most of the European population. In this notebook these countries are referred to as the 'Big-6'.

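The Big-6 list is reused in several filters later in this notebook. As a minimal sketch, it could be defined once and applied through a small helper, assuming a dataframe with a `country` column as used throughout. The names `BIG_6` and `select_big6` are hypothetical and do not appear in the cells below, and the sample rows are placeholders rather than OECD data.

```python
import pandas as pd

# Hypothetical constant: the six countries this notebook calls the 'Big-6'
BIG_6 = ['Germany', 'United Kingdom', 'France', 'Italy', 'Spain', 'Poland']


def select_big6(frame: pd.DataFrame) -> pd.DataFrame:
    """Return only the rows whose 'country' is one of the Big-6."""
    return frame.loc[frame['country'].isin(BIG_6)]


# Placeholder rows for illustration only (not OECD data)
sample = pd.DataFrame({'country': ['Germany', 'Portugal', 'Poland'],
                       'immunisation': [1.0, 2.0, 3.0]})
big6_sample = select_big6(sample)
```

Keeping the list in one place means a change to the Big-6 definition only has to be made once.
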
# **Chapter 1 - Mapping the data**

In [None]:
import numpy as np 
import pandas as pd
import plotly.express as px
pd.plotting.register_matplotlib_converters()
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

In [None]:
# Reading global health care csv
# Quickview of columns names and the rows
dfx = pd.read_csv('/kaggle/input/uncover/UNCOVER/oecd/health-care-utilization.csv')
dfx.head()

# Explaining the most important columns

# 'variable': the variable we are looking at, in this case immunisation against Hepatitis-B
# 'measure': explanation of what we are measuring, in this case the percentage of children immunised
# 'value': the actual percentage value of children immunised
# 'country': the country name
# 'cou': the country code
# 'year': year of measurement
# 'flag': important additional information, like a difference in methodology or estimated value
# 'flag_codes': Abbreviation code which represents a specific flag, 'Estimated value' has the flag_code 'E'

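The `flag_codes` column can be used to separate measured values from estimates. The sketch below assumes, as described in the comments above, that the flag code 'E' marks an estimated value. The rows are made-up placeholders and `measured_only` is a hypothetical name.

```python
import numpy as np
import pandas as pd

# Placeholder rows, not OECD data
toy = pd.DataFrame({
    'country': ['A', 'A', 'B'],
    'year': [2016, 2017, 2017],
    'value': [95.0, 96.0, 90.0],
    'flag_codes': [np.nan, 'E', np.nan],
})

# Count how often each flag occurs, including rows without a flag
flag_counts = toy['flag_codes'].value_counts(dropna=False)

# Keep only the rows that are not flagged as estimated
measured_only = toy.loc[toy['flag_codes'] != 'E']
```

Rows without a flag compare as "not equal to 'E'" and are therefore kept.
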
In [None]:
# Mapping the full size of the dataframe (rows and columns)
# The dataframe consists of 158,095 rows and 11 columns
dfx.shape

In [None]:
# Zooming in into all the different types of variables in the dataframe
dfx.variable.unique()

In [None]:
# Locating only the rows with 'Immunisation' as a variable
dfx = dfx.loc[dfx.variable.isin(['Immunisation: Hepatitis B', 
                                   'Immunisation: Influenza',
                                   'Immunisation: Diphtheria, Tetanus, Pertussis',
                                   'Immunisation: Measles'])]

In [None]:
# Mapping the full size of the new dataframe with only 'immunisation' variables (rows and columns)
# The new dataframe has 1329 rows and 11 columns
dfx.shape

In [None]:
# Checking the dataframe for missing entries
# The columns 'flag_codes' and 'flags' have fewer non-null entries than there are rows, which is expected because they only hold additional information
# No other entries are missing
dfx.info()

In [None]:
# Quick view of all the countries with Immunisation data
dfx.country.unique()

In [None]:
# Quick view of what years the Immunisation data represents
dfx.year.unique()

In [None]:
# Mapping how the number of rows is distributed over the years
# 2018 seems to be incomplete compared to the other years
dfx.groupby('year').year.count()

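Spotting an incomplete year by eye works for a handful of years. A small sketch of doing the same check programmatically is shown here on placeholder data. The rule used (fewer than half the median number of rows) is an arbitrary assumption for illustration, not something taken from the OECD documentation.

```python
import pandas as pd

# Placeholder data: 2018 has far fewer rows than the other years
toy = pd.DataFrame({'year': [2016] * 4 + [2017] * 4 + [2018] * 1})

rows_per_year = toy.groupby('year').size()

# Assumed rule: a year is incomplete if it has fewer than half the median row count
threshold = 0.5 * rows_per_year.median()
incomplete_years = rows_per_year[rows_per_year < threshold].index.tolist()
```
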
# **Chapter 2 - Cleaning & Adjusting the data**

In [None]:
dfx.head()

In [None]:
# removing year 2018 as the data of this year is incomplete
dfx = dfx[dfx.year != 2018]

In [None]:
# Renaming columns 'cou' to 'country_code' and 'value' to 'immunisation'
dfx = dfx.rename(columns={'cou':'country_code', 'value': 'immunisation'})

In [None]:
# Adding a 'continent' column to the dataframe by merging with the gapminder dataset
# (plotly.express is already imported as px in the first cell)
dfy = px.data.gapminder()
df = pd.merge(dfx, dfy, how = 'left', left_on='country_code', right_on='iso_alpha', suffixes=('_x', '_y'))
df = df[['var', 'variable', 'unit', 'measure', 'country_x', 'country_code', 'continent', 'year_x', 'immunisation', 'flag_codes', 'flags']]
df = df.drop_duplicates()
df = df.rename(columns={'country_x': 'country', 'year_x': 'year'})

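The gapminder table holds one row per country per year, so a left merge on the country code repeats every immunisation row several times, which is why `drop_duplicates()` is needed afterwards. A possible alternative, sketched here on placeholder tables, is to reduce the lookup to one row per country code before merging. The frames `lookup_long` and `rates` are hypothetical stand-ins for `dfy` and `dfx`.

```python
import pandas as pd

# Placeholder stand-in for the gapminder table: several rows per country code
lookup_long = pd.DataFrame({
    'iso_alpha': ['FRA', 'FRA', 'ESP', 'ESP'],
    'continent': ['Europe', 'Europe', 'Europe', 'Europe'],
    'year': [2002, 2007, 2002, 2007],
})
# Placeholder immunisation rows (values are not OECD data)
rates = pd.DataFrame({
    'country_code': ['FRA', 'ESP', 'EST'],
    'immunisation': [1.0, 2.0, 3.0],
})

# One row per country code, so the merge cannot multiply rows
lookup = lookup_long[['iso_alpha', 'continent']].drop_duplicates()

# validate raises an error if the right-hand keys are not unique
merged = rates.merge(lookup, how='left', left_on='country_code',
                     right_on='iso_alpha', validate='many_to_one')
```

Country codes that are absent from the lookup (here `EST`) still come out with a missing continent, just as in the cell above.
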
In [None]:
# Checking for missing values as a result of the merge
# There seem to be some missing values in the 'continent' column.
df.info()

In [None]:
# Checking for which countries values are missing in the continent column
dfc = df[pd.isnull(df.continent)]
dfc.country.unique()

In [None]:
# Manually adding continents to the missing countries, as well as adjusting Turkey's continent to Asia
df.loc[df['country'] == 'Luxembourg', 'continent'] = 'Europe'
df.loc[df['country'] == 'Estonia', 'continent'] = 'Europe'
df.loc[df['country'] == 'Russia', 'continent'] = 'Asia'
df.loc[df['country'] == 'Latvia', 'continent'] = 'Europe'
df.loc[df['country'] == 'Lithuania', 'continent'] = 'Europe'
df.loc[df['country'] == 'Turkey', 'continent'] = 'Asia'

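The same manual assignments could also be collected in a single mapping, which keeps the corrections in one place. This is only a sketch on a placeholder dataframe. The name `continent_overrides` is hypothetical, while the country-to-continent pairs are the ones used in the cell above.

```python
import pandas as pd

# Same assignments as in the cell above, collected in one mapping
continent_overrides = {
    'Luxembourg': 'Europe', 'Estonia': 'Europe', 'Latvia': 'Europe',
    'Lithuania': 'Europe', 'Russia': 'Asia', 'Turkey': 'Asia',
}

# Placeholder rows: one missing continent, one to be adjusted, one untouched
toy = pd.DataFrame({'country': ['Estonia', 'Turkey', 'France'],
                    'continent': [None, 'Europe', 'Europe']})

# Overrides win where a country is listed; all other rows keep their value
toy['continent'] = toy['country'].map(continent_overrides).fillna(toy['continent'])
```
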
In [None]:
# Filtering the countries where continent is Europe
df= df.loc[df.continent == 'Europe']

# **Chapter 3 - Analysing the data**

In [None]:
df.head()

In [None]:
# Number of European countries available for analysis
print("No. of Countries available for analysis :", df['country'].nunique())

In [None]:
# Number of European countries with specific immunisation data
print("No. of Countries available for analysis (Measles):", df.country[(df.variable == 'Immunisation: Measles')].nunique())
print("No. of Countries available for analysis (Diphtheria, Tetanus, Pertussis):", df.country[(df.variable == 'Immunisation: Diphtheria, Tetanus, Pertussis')].nunique())
print("No. of Countries available for analysis (Hepatitis-B):", df.country[(df.variable == 'Immunisation: Hepatitis B')].nunique())
print("No. of Countries available for analysis (Influenza):", df.country[(df.variable == 'Immunisation: Influenza')].nunique())

In [None]:
# European countries with immunisation data
plt.rcParams['figure.figsize'] = (18, 8)
plt.style.use('fivethirtyeight')

dfq = px.data.gapminder()
fig = px.choropleth(df,
                    locations="country_code", 
                    hover_name="country", 
                    )
fig.show()

In [None]:
# European countries with unknown immunisation data
plt.rcParams['figure.figsize'] = (18, 8)
plt.style.use('fivethirtyeight')

dfz = pd.read_csv('/kaggle/input/country-mapping-iso-continent-region/continents2.csv')
dfz = dfz.loc[dfz.region == 'Europe']
dfz = dfz.rename(columns={'name': 'country', 'alpha-3': 'country_code'}) 
dfz = dfz.loc[dfz.country != 'Russia']
common = dfz.merge(df, on=['country_code'])
dfg = dfz[(~dfz.country_code.isin(common.country_code))]

dfq = px.data.gapminder()
fig = px.choropleth(dfg,
                    locations="country_code",
                    hover_name="country")

fig.show()

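The cell above finds the countries without immunisation data by merging and then excluding the matched codes. The same "anti-join" can be written with the `indicator` option of `merge`, sketched here on placeholder country codes. The frames `all_countries` and `with_data` are hypothetical stand-ins for `dfz` and `df`.

```python
import pandas as pd

# Placeholder codes: 'BBB' has no immunisation rows
all_countries = pd.DataFrame({'country_code': ['AAA', 'BBB', 'CCC']})
with_data = pd.DataFrame({'country_code': ['AAA', 'AAA', 'CCC']})

# Left merge against the unique codes; '_merge' records where each row was found
flagged = all_countries.merge(with_data[['country_code']].drop_duplicates(),
                              on='country_code', how='left', indicator=True)

# Rows found only on the left side are the countries without data
without_data = flagged.loc[flagged['_merge'] == 'left_only', ['country_code']]
```
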
In [None]:
# European population
dfd = pd.read_csv('/kaggle/input/population-by-country-2020/population_by_country_2020.csv')
dfd = dfd[['Country (or dependency)', 'Population (2020)']]
dfd = dfd.rename(columns={'Country (or dependency)': 'country', 'Population (2020)': 'population'})
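# Assigning the result back instead of using inplace=True on a selected column, which may not modify dfd in recent pandas versions
dfd['country'] = dfd['country'].replace({'Czech Republic (Czechia)': 'Czech Republic', 'Bosnia and Herzegovina': 'Bosnia And Herzegovina', 'North Macedonia': 'Macedonia'})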
dfp = pd.merge(dfz, dfd, how = 'left', on='country', suffixes=('', '_r'))
dfk = dfp.population.sum()

# Population of European countries with immunisation data
dfl = pd.merge(dfp, df , how='inner', on='country_code', suffixes=('', '_r'))
dfe = dfl[['country', 'population']].groupby('country').population.agg('mean')
dfe = dfe.sum()

# Immunisation data represents part of european population
part_of_population = ((dfe / dfk) *100).round(1)
dfe_millions = (dfe / 1000000).round(1)
print("The countries with immunisation data cover", dfe_millions, "million European citizens, which represents", part_of_population, "percent of the European population")

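The coverage figure is a simple ratio: the population of the countries with data divided by the total European population. A minimal sketch of that arithmetic follows, using placeholder populations rather than the real 2020 figures. Each country must be counted once, which is why the cell above first collapses the merged rows to one value per country.

```python
import pandas as pd

# Placeholder populations, not real figures
population = pd.Series({'A': 60_000_000, 'B': 30_000_000, 'C': 10_000_000})
countries_with_data = ['A', 'B']

# Population covered by the countries that have immunisation data
covered = population.loc[countries_with_data].sum()

# Share of the total population, in percent
share = round(covered / population.sum() * 100, 1)
```
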

In [None]:
# Europe's biggest countries (with immunisation data) by population
plt.style.use('seaborn-dark')
plt.figure(figsize=(25, 9))

dfl = dfl[['country', 'year', 'population']]
dfl = dfl.loc[dfl.groupby('country').year.idxmax()]
xyc = dfl[['country', 'population']].set_index('country')
xyc = xyc.sort_values(by = 'population', ascending = False)
xyc = (xyc / 1000000).round(1)

sns.barplot(x=xyc.index, y=xyc['population'], palette = 'winter')
plt.title("European Countries with Immunisation data, sorted by Population", fontsize = 30)
plt.xlabel("Name of Country")
plt.xticks(rotation = 90)
plt.ylabel("Population (in millions)")
plt.show()

# **Chapter 4 - Measles Immunisation Europe**

# **4.1 Measles Immunisation Today**

In [None]:
# Creating a new dataframe with only Measles Immunisation data
europe_mea = df.loc[df.variable == 'Immunisation: Measles']

# Most recent Measles immunisation data per country in Europe
europe_mea_recent = europe_mea.loc[europe_mea.groupby('country').year.idxmax()]

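The "most recent" selection relies on `groupby(...).idxmax()`, which returns the index label of the row with the latest year for each country. A small sketch on placeholder data shows what that does. Note that the latest available year can differ per country, so the resulting rows do not necessarily all refer to the same year.

```python
import pandas as pd

# Placeholder rows: country B has no 2017 value
toy = pd.DataFrame({
    'country': ['A', 'A', 'B', 'B'],
    'year': [2016, 2017, 2015, 2016],
    'immunisation': [90.0, 91.0, 80.0, 82.0],
})

# idxmax returns, per country, the index label of the row with the latest year
latest_labels = toy.groupby('country')['year'].idxmax()

# Selecting those labels keeps exactly one row per country
most_recent = toy.loc[latest_labels]
```
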
In [None]:
# Top 5 countries with the highest immunisation rate for Measles
europe_mea_recent[['variable', 'measure', 'country', 'year', 'immunisation']].set_index('country').sort_values(by='immunisation', ascending=False).head(5).style.background_gradient(cmap = 'Wistia', subset= 'immunisation')

In [None]:
# Top 5 countries with the lowest immunisation rate for Measles
europe_mea_recent[['variable', 'measure', 'country', 'year', 'immunisation']].set_index('country').sort_values(by='immunisation', ascending=True).head(5).style.background_gradient(cmap = 'Wistia', subset= 'immunisation')

# **4.2 Measles Immunisation Trend**

In [None]:
# Measles immunisation trend per country in Europe
plt.rcParams['figure.figsize'] = (18, 8)
plt.style.use('fivethirtyeight')

dfq = px.data.gapminder()
fig = px.choropleth(europe_mea,
                    locations="country_code", 
                    color="immunisation", 
                    hover_name="country",
                    animation_frame="year", 
                    range_color=[86,100],
                    )
fig.show()


In [None]:
# European Measles Immunisation Trend per year
europe_mea_mean = europe_mea[['year','immunisation']].groupby('year').immunisation.agg(['mean', 'min', 'max']).round(1).rename(columns={'mean': 'european_measles_mean'})
europe_mea_mean

In [None]:
# Measles Immunisation growth per country (difference between start_year and end_year)
europe_mea_year_max = europe_mea.loc[europe_mea.groupby('country').year.idxmax()]
europe_mea_year_min = europe_mea.loc[europe_mea.groupby('country').year.idxmin()]
europe_mea_value_difference = pd.merge(europe_mea_year_min, europe_mea_year_max, how = 'left', on='country_code', suffixes=('', '_r'))
europe_mea_value_difference = europe_mea_value_difference[['var', 'variable', 'unit', 'measure', 'country', 'country_code', 'continent', 'year', 'immunisation', 'flag_codes', 'flags', 'year_r', 'immunisation_r', 'flag_codes_r', 'flags_r']]
europe_mea_value_difference = europe_mea_value_difference.rename(columns={'year': 'start_year', 'immunisation': 'start_immunisation', 'year_r': 'end_year', 'immunisation_r': 'end_immunisation'})
europe_mea_value_difference['immunisation_growth'] = europe_mea_value_difference['end_immunisation'] - europe_mea_value_difference['start_immunisation']
europe_mea_value_difference = europe_mea_value_difference[['variable', 'measure', 'country', 'immunisation_growth']]

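The growth calculation above is repeated for each vaccine. One way it could be wrapped in a reusable helper is sketched below on placeholder data. The function name `growth_per_country` is hypothetical, and it uses `first()` and `last()` after sorting instead of the merge used in the cell above, which assumes there are no missing immunisation values.

```python
import pandas as pd


def growth_per_country(frame: pd.DataFrame) -> pd.Series:
    """Last minus first immunisation value per country, in percentage points."""
    ordered = frame.sort_values(['country', 'year'])
    grouped = ordered.groupby('country')['immunisation']
    # first()/last() follow the sorted order, so they are the earliest and latest year
    return (grouped.last() - grouped.first()).rename('immunisation_growth')


# Placeholder rows, deliberately unsorted
toy = pd.DataFrame({
    'country': ['A', 'A', 'A', 'B', 'B'],
    'year': [2017, 2010, 2014, 2010, 2017],
    'immunisation': [95.0, 90.0, 99.0, 97.0, 94.0],
})
growth = growth_per_country(toy)
```

Because the result is a difference between two percentages, it is expressed in percentage points.
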

In [None]:
# Top 5 European countries with highest Measles Immunisation gain (between 2010 and 2017)
europe_mea_value_difference.set_index('country').sort_values(by='immunisation_growth', ascending=False).head(5).style.background_gradient(cmap = 'Wistia')

In [None]:
# Top 5 European countries with highest Measles Immunisation loss (between 2010 and 2017)
europe_mea_value_difference.set_index('country').sort_values(by='immunisation_growth', ascending=True).head(5).style.background_gradient(cmap = 'Wistia')

# **4.3 Measles immunisation trend for the 6 biggest European countries in population**

In [None]:
# Europe's most populous countries (with immunisation data)
europe6_mea = europe_mea.loc[europe_mea.country.isin(['Germany', 'France', 'United Kingdom', 'Italy', 'Spain', 'Poland'])]

# Big-6 Measles Immunisation Trend per year
europe6_mea_mean = europe6_mea[['year','immunisation']].groupby('year').immunisation.agg(['mean', 'min', 'max']).round(1).rename(columns={'mean': 'big-6_measles_mean'})
europe6_mea_mean

In [None]:
# The Big-6 mean in comparison to the European mean
plt.style.use('seaborn-dark')
plt.figure(figsize=(20,8))

europe_compare_mea_mean = pd.merge(europe6_mea_mean, europe_mea_mean, how = 'left', on = 'year')
europe_compare_mea_mean1 = europe_compare_mea_mean[['big-6_measles_mean', 'european_measles_mean']]

sns.lineplot(data=europe_compare_mea_mean1)
plt.title('Big-6 mean VS European mean', fontsize = 20)
plt.xlabel('year')
plt.ylabel('Immunisation (in %)')
plt.show()

In [None]:
# Big-6 Most recent Measles Immunisation data
europe6_mea_recent = europe6_mea.loc[europe6_mea.groupby('country').year.idxmax()]
europe6_mea_recent[['variable', 'measure', 'country', 'immunisation']].set_index('country').sort_values(by='immunisation', ascending=False).head(6).style.background_gradient(cmap = 'Wistia')

In [None]:
# Big-6 Measles Immunisation growth (between 2010 and 2017)
europe6_mea_value_difference = europe_mea_value_difference.loc[europe_mea_value_difference.country.isin(['Germany', 'France', 'United Kingdom', 'Italy', 'Spain', 'Poland'])]
europe6_mea_value_difference = europe6_mea_value_difference[['variable', 'measure', 'country', 'immunisation_growth']]
europe6_mea_value_difference.set_index('country').sort_values(by='immunisation_growth', ascending=False).head(6).style.background_gradient(cmap = 'Wistia')

In [None]:
# Big-6 Measles Immunisation trend per country
plt.style.use('seaborn-dark')
plt.figure(figsize=(20,8))

europe6_mea1= europe6_mea[['country', 'year', 'immunisation']].set_index('year')
europe6_mea2 = europe6_mea1.pivot_table('immunisation', ['year'], 'country')

sns.lineplot(data=europe6_mea2)
plt.title('Big-6 Measles Immunisation Trend', fontsize = 20)
plt.xlabel('year')
plt.ylabel('Immunisation (in %)')
plt.show()

In [None]:
# Stability of the Big-6 Measles Immunisation rates (per country)
europe6_mea_std = europe6_mea.loc[europe6_mea.country.isin(['Germany', 'France', 'United Kingdom', 'Italy', 'Spain', 'Poland'])]
europe6_mea_std.groupby('country').immunisation.agg(['std']).round(1).rename(columns={'std': 'standard_deviation'}).sort_values(by='standard_deviation', ascending=True).head(6).style.background_gradient(cmap = 'Wistia')

# **4.4 Conclusion Measles Immunisation in Europe**

* The European average Measles immunisation rate is 95.2% in 2017
* Between 2010 and 2017 Measles immunisation grew by 1.2 percentage points


* In 2017 the best countries were Hungary (99%), Luxembourg (99%) and Portugal (98%)
* In 2017 the worst country was France (90%), followed by Italy (92%) and the United Kingdom (92%)


* The countries with the biggest growth are Austria (+16 percentage points) and Denmark (+14 percentage points)
* The biggest decline is in Poland (-4 points), followed by the Slovak Republic (-3 points) and the Netherlands (-3 points)


* The Big-6 have an average Measles immunisation rate of 93.7%, lower than the European average of 95.2%
* Germany (97%) and Spain (97%) score above the European average
* Poland (94%), Italy (92%), the United Kingdom (92%) and France (90%) all score below the European average


* Between 2010 and 2017 Big-6 Measles immunisation grew by 0.7 percentage points, also less than the European average
* The United Kingdom (+3 points) and Spain (+2 points) showed the biggest growth
* Poland (-4 points) is the only Big-6 country with a decline in Measles immunisation

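The Big-6 average quoted above can be checked by hand from the six 2017 values listed in this conclusion. The sketch below does that arithmetic. It assumes an unweighted mean, in which every country counts equally regardless of population, which is what the `groupby('year')` mean in the cells above computes. The variable names are hypothetical.

```python
# 2017 Measles rates for the Big-6, as listed in the conclusion above
big6_measles_2017 = {
    'Germany': 97, 'Spain': 97, 'Poland': 94,
    'Italy': 92, 'United Kingdom': 92, 'France': 90,
}
european_mean_2017 = 95.2

# Unweighted mean: every country counts equally
big6_mean = round(sum(big6_measles_2017.values()) / len(big6_measles_2017), 1)

# Gap to the European average, in percentage points
gap = round(european_mean_2017 - big6_mean, 1)

# Big-6 countries scoring above the European average
above_average = [c for c, v in big6_measles_2017.items() if v > european_mean_2017]
```

With these inputs the mean comes out at 93.7, a gap of 1.5 percentage points to the European average, with only Germany and Spain above it.
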
# **Chapter 5 - Diphtheria, Tetanus, Pertussis (DTP) Immunisation Europe**

# **5.1 DTP Immunisation Today**

In [None]:
# Creating a new dataframe with only DTP Immunisation data
europe_dtp = df.loc[df.variable == 'Immunisation: Diphtheria, Tetanus, Pertussis']

# Most recent DTP immunisation data per country in Europe
europe_dtp_recent = europe_dtp.loc[europe_dtp.groupby('country').year.idxmax()]

In [None]:
# Top 5 countries with the highest immunisation rate for DTP
europe_dtp_recent[['variable', 'measure', 'country', 'year', 'immunisation']].set_index('country').sort_values(by='immunisation', ascending=False).head(5).style.background_gradient(cmap = 'Wistia', subset= 'immunisation')

In [None]:
# Top 5 countries with the lowest immunisation rate for DTP
europe_dtp_recent[['variable', 'measure', 'country', 'year', 'immunisation']].set_index('country').sort_values(by='immunisation', ascending=True).head(5).style.background_gradient(cmap = 'Wistia', subset= 'immunisation')

# **5.2 DTP Immunisation Trend**

In [None]:
# DTP Immunisation trend per country in Europe
plt.rcParams['figure.figsize'] = (18, 8)
plt.style.use('fivethirtyeight')

dfq = px.data.gapminder()
fig = px.choropleth(europe_dtp,
                    locations="country_code", 
                    color="immunisation", 
                    hover_name="country",
                    animation_frame="year", 
                    range_color=[86,100],
                    )
fig.show()


In [None]:
# European DTP Immunisation Trend per year
europe_dtp_mean = europe_dtp[['year','immunisation']].groupby('year').immunisation.agg(['mean', 'min', 'max']).round(1).rename(columns={'mean': 'european_dtp_mean'})
europe_dtp_mean

In [None]:
# DTP Immunisation growth per country (difference between start_year and end_year)
europe_dtp_year_max = europe_dtp.loc[europe_dtp.groupby('country').year.idxmax()]
europe_dtp_year_min = europe_dtp.loc[europe_dtp.groupby('country').year.idxmin()]
europe_dtp_value_difference = pd.merge(europe_dtp_year_min, europe_dtp_year_max, how = 'left', on='country_code', suffixes=('', '_r'))
europe_dtp_value_difference = europe_dtp_value_difference[['var', 'variable', 'unit', 'measure', 'country', 'country_code', 'continent', 'year', 'immunisation', 'flag_codes', 'flags', 'year_r', 'immunisation_r', 'flag_codes_r', 'flags_r']]
europe_dtp_value_difference = europe_dtp_value_difference.rename(columns={'year': 'start_year', 'immunisation': 'start_immunisation', 'year_r': 'end_year', 'immunisation_r': 'end_immunisation'})
europe_dtp_value_difference['immunisation_growth'] = europe_dtp_value_difference['end_immunisation'] - europe_dtp_value_difference['start_immunisation']
europe_dtp_value_difference = europe_dtp_value_difference[['variable', 'measure', 'country', 'immunisation_growth']]

In [None]:
# Top 5 European countries with highest DTP Immunisation gain (between 2010 and 2017)
europe_dtp_value_difference.set_index('country').sort_values(by='immunisation_growth', ascending=False).head(5).style.background_gradient(cmap = 'Wistia')

In [None]:
# Top 10 European countries with highest DTP Immunisation loss (between 2010 and 2017)
europe_dtp_value_difference.set_index('country').sort_values(by='immunisation_growth', ascending=True).head(10).style.background_gradient(cmap = 'Wistia')

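When several countries share the same loss, a fixed `head(n)` cuts through the tie at an arbitrary point. A possible alternative, sketched on placeholder values, is `nsmallest` with `keep='all'`, which returns every row tied with the last selected value.

```python
import pandas as pd

# Placeholder growth values with a three-way tie at -3
toy = pd.DataFrame({
    'country': ['A', 'B', 'C', 'D', 'E'],
    'immunisation_growth': [-7.0, -3.0, -3.0, -3.0, 1.0],
})

# head(2) after sorting keeps only one of the tied countries
cut_off = toy.sort_values('immunisation_growth').head(2)

# keep='all' returns all rows tied with the second-smallest value
with_ties = toy.nsmallest(2, 'immunisation_growth', keep='all')
```
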
# **5.3 DTP immunisation trend for the 6 biggest European countries in population**

In [None]:
# Europe's most populous countries (with immunisation data)
europe6_dtp = europe_dtp.loc[europe_dtp.country.isin(['Germany', 'France', 'United Kingdom', 'Italy', 'Spain', 'Poland'])]

# Big-6 DTP Immunisation Trend per year
europe6_dtp_mean = europe6_dtp[['year','immunisation']].groupby('year').immunisation.agg(['mean', 'min', 'max']).round(1).rename(columns={'mean': 'big-6_dtp_mean'})
europe6_dtp_mean

In [None]:
# The Big-6 mean in comparison to the European mean
plt.style.use('seaborn-dark')
plt.figure(figsize=(20,8))

europe_compare_dtp_mean = pd.merge(europe6_dtp_mean, europe_dtp_mean, on = 'year', how = 'left')
europe_compare_dtp_mean1 = europe_compare_dtp_mean[['big-6_dtp_mean', 'european_dtp_mean']]

sns.lineplot(data=europe_compare_dtp_mean1)
plt.title('Big-6 mean VS European mean', fontsize = 20)
plt.xlabel('year')
plt.ylabel('Immunisation (in %)')
plt.show()

In [None]:
# Big-6 Most recent DTP Immunisation data
europe6_dtp_recent = europe6_dtp.loc[europe6_dtp.groupby('country').year.idxmax()]
europe6_dtp_recent[['variable', 'measure', 'country', 'immunisation']].set_index('country').sort_values(by='immunisation', ascending=False).head(6).style.background_gradient(cmap = 'Wistia')

In [None]:
# Big-6 DTP Immunisation growth (between 2010 and 2017)
europe6_dtp_value_difference = europe_dtp_value_difference.loc[europe_dtp_value_difference.country.isin(['Germany', 'France', 'United Kingdom', 'Italy', 'Spain', 'Poland'])]
europe6_dtp_value_difference = europe6_dtp_value_difference[['variable', 'measure', 'country', 'immunisation_growth']]
europe6_dtp_value_difference.set_index('country').sort_values(by='immunisation_growth', ascending=False).head(6).style.background_gradient(cmap = 'Wistia')

In [None]:
# Big-6 DTP Immunisation trend per country
plt.style.use('seaborn-dark')
plt.figure(figsize=(20,8))

europe6_dtp1= europe6_dtp[['country', 'year', 'immunisation']].set_index('year')
europe6_dtp2 = europe6_dtp1.pivot_table('immunisation', ['year'], 'country')

sns.lineplot(data=europe6_dtp2)
plt.title('Big-6 DTP Immunisation Trend', fontsize = 20)
plt.xlabel('year')
plt.ylabel('Immunisation (in %)')
plt.show()

In [None]:
# Stability of the Big-6 DTP Immunisation rates (per country)
europe6_dtp_std = europe6_dtp.loc[europe6_dtp.country.isin(['Germany', 'France', 'United Kingdom', 'Italy', 'Spain', 'Poland'])]
europe6_dtp_std.groupby('country').immunisation.agg(['std']).round(1).rename(columns={'std': 'standard_deviation'}).sort_values(by='standard_deviation', ascending=True).head(6).style.background_gradient(cmap = 'Wistia')


# **5.4 Conclusion DTP Immunisation in Europe**

* The European average DTP immunisation rate is 95.8% in 2017
* Between 2010 and 2017 DTP immunisation decreased by 0.2 percentage points


* In 2017 the best countries were Hungary, Luxembourg, Greece and Finland, all with 99%
* In 2017 the worst countries were Iceland (89%) and Austria (90%)


* The countries with the biggest growth are Denmark (+8 points), Latvia (+6 points) and Austria (+4 points)
* The biggest decline is in Poland (-7 points), but many more countries show a big decline
* The Czech Republic, the Slovak Republic, France, Poland and the Netherlands all decreased by 3 points


* The Big-6 have an average DTP immunisation rate of 95.5%, slightly lower than the European average of 95.8%
* Spain (98%), France (96%) and Poland (96%) score above the European average
* Germany (95%), Italy (94%) and the United Kingdom (94%) score below the European average


* Between 2010 and 2017 Big-6 DTP immunisation decreased by 1.3 percentage points, a considerably larger decline than the European average
* Spain (+1 point) was the only Big-6 country with growth
* Poland (-3 points), France (-3 points), Italy (-2 points) and Germany (-1 point) all show a decline in DTP immunisation


# **Chapter 6 - Hepatitis-B Immunisation Europe**

# **6.1 Hepatitis-B Immunisation Today**

In [None]:
# Creating a new dataframe with only Hepatitis-B Immunisation data
europe_hep = df.loc[df.variable == 'Immunisation: Hepatitis B']

# Most recent Hepatitis-B immunisation data per country in Europe
europe_hep_recent = europe_hep.loc[europe_hep.groupby('country').year.idxmax()]

In [None]:
# Top 5 countries with the highest immunisation rate for Hepatitis-B
europe_hep_recent[['variable', 'measure', 'country', 'year', 'immunisation']].set_index('country').sort_values(by='immunisation', ascending=False).head(5).style.background_gradient(cmap = 'Wistia', subset= 'immunisation')

In [None]:
# Top 5 countries with the lowest immunisation rate for Hepatitis-B
europe_hep_recent[['variable', 'measure', 'country', 'year', 'immunisation']].set_index('country').sort_values(by='immunisation', ascending=True).head(5).style.background_gradient(cmap = 'Wistia', subset= 'immunisation')

# **6.2 Hepatitis-B Immunisation Trend**

In [None]:
# Hepatitis-B Immunisation trend per country in Europe
plt.rcParams['figure.figsize'] = (18, 8)
plt.style.use('fivethirtyeight')

dfq = px.data.gapminder()
fig = px.choropleth(europe_hep,
                    locations="country_code", 
                    color="immunisation", 
                    hover_name="country",
                    animation_frame="year", 
                    range_color=[86,100],
                    )
fig.show()

In [None]:
# European Hepatitis-B Immunisation Trend per year
europe_hep_mean = europe_hep[['year','immunisation']].groupby('year').immunisation.agg(['mean', 'min', 'max']).round(1).rename(columns={'mean': 'european_hepatitis_mean'})
europe_hep_mean

In [None]:
# Hepatitis-B Immunisation growth per country (difference between start_year and end_year)
europe_hep_year_max = europe_hep.loc[europe_hep.groupby('country').year.idxmax()]
europe_hep_year_min = europe_hep.loc[europe_hep.groupby('country').year.idxmin()]
europe_hep_value_difference = pd.merge(europe_hep_year_min, europe_hep_year_max, how = 'left', on='country_code', suffixes=('', '_r'))
europe_hep_value_difference = europe_hep_value_difference[['var', 'variable', 'unit', 'measure', 'country', 'country_code', 'continent', 'year', 'immunisation', 'flag_codes', 'flags', 'year_r', 'immunisation_r', 'flag_codes_r', 'flags_r']]
europe_hep_value_difference = europe_hep_value_difference.rename(columns={'year': 'start_year', 'immunisation': 'start_immunisation', 'year_r': 'end_year', 'immunisation_r': 'end_immunisation'})
europe_hep_value_difference['immunisation_growth'] = europe_hep_value_difference['end_immunisation'] - europe_hep_value_difference['start_immunisation']
europe_hep_value_difference = europe_hep_value_difference[['variable', 'measure', 'country', 'immunisation_growth']]

In [None]:
# Top 5 European countries with highest Hepatitis-B Immunisation gain (between 2010 and 2017)
europe_hep_value_difference.set_index('country').sort_values(by='immunisation_growth', ascending=False).head(5).style.background_gradient(cmap = 'Wistia')

In [None]:
# Top 5 European countries with highest Hepatitis-B Immunisation loss (between 2010 and 2017)
europe_hep_value_difference.set_index('country').sort_values(by='immunisation_growth', ascending=True).head(5).style.background_gradient(cmap = 'Wistia')

# **6.3 Hepatitis-B immunisation trend for the 6 biggest European countries in population**

In [None]:
# Europe's most populous countries (with immunisation data)
europe6_hep = europe_hep.loc[europe_hep.country.isin(['Germany', 'France', 'United Kingdom', 'Italy', 'Spain', 'Poland'])]

# Big-6 Hepatitis-B Immunisation Trend per year
europe6_hep_mean = europe6_hep[['year','immunisation']].groupby('year').immunisation.agg(['mean', 'min', 'max']).round(1).rename(columns={'mean': 'big-6_hepatitis_mean'})
europe6_hep_mean

In [None]:
# The Big-6 mean in comparison to the European mean
plt.style.use('seaborn-dark')
plt.figure(figsize=(20,8))

europe_compare_hep_mean = pd.merge(europe6_hep_mean, europe_hep_mean, on = 'year', how = 'left')
europe_compare_hep_mean1 = europe_compare_hep_mean[['big-6_hepatitis_mean', 'european_hepatitis_mean']]

sns.lineplot(data=europe_compare_hep_mean1)
plt.title('Big-6 mean VS European mean', fontsize = 20)
plt.xlabel('year')
plt.ylabel('Immunisation (in %)')
plt.show()

In [None]:
# Big-6 Most recent Hepatitis-B Immunisation data
europe6_hep_recent = europe6_hep.loc[europe6_hep.groupby('country').year.idxmax()]
europe6_hep_recent[['variable', 'measure', 'country', 'immunisation']].set_index('country').sort_values(by='immunisation', ascending=False).head(6).style.background_gradient(cmap = 'Wistia')

In [None]:
# Big-6 Hepatitis-B Immunisation growth (between 2010 and 2017)
europe6_hep_value_difference = europe_hep_value_difference.loc[europe_hep_value_difference.country.isin(['Germany', 'France', 'United Kingdom', 'Italy', 'Spain', 'Poland'])]
europe6_hep_value_difference = europe6_hep_value_difference[['variable', 'measure', 'country', 'immunisation_growth']]
europe6_hep_value_difference.set_index('country').sort_values(by='immunisation_growth', ascending=False).head(6).style.background_gradient(cmap = 'Wistia')

In [None]:
# Big-6 Hepatitis-B Immunisation trend per country
plt.style.use('seaborn-dark')
plt.figure(figsize=(20,8))

europe6_hep1= europe6_hep[['country', 'year', 'immunisation']].set_index('year')
europe6_hep2 = europe6_hep1.pivot_table('immunisation', ['year'], 'country')

sns.lineplot(data=europe6_hep2)
plt.title('Big-6 Hepatitis-B Immunisation Trend', fontsize = 20)
plt.xlabel('year')
plt.ylabel('Immunisation (in %)')
plt.show()

In [None]:
# Stability of the Big-6 Hepatitis-B Immunisation rates (per country)
europe6_hep_std = europe6_hep.loc[europe6_hep.country.isin(['Germany', 'France', 'United Kingdom', 'Italy', 'Spain', 'Poland'])]
europe6_hep_std.groupby('country').immunisation.agg(['std']).round(1).rename(columns={'std': 'standard_deviation'}).sort_values(by='standard_deviation', ascending=True).head(6).style.background_gradient(cmap = 'Wistia')


# **6.4 Conclusion Hepatitis-B Immunisation in Europe**


* Iceland, Sweden, Denmark, Finland, Norway and the United Kingdom do not have any Hepatitis-B data available
* The European average Hepatitis-B immunisation rate is 92.5% in 2017
* Between 2010 and 2017 Hepatitis-B immunisation grew by 7.3 percentage points


* In 2017 the best countries were Portugal (98%), Latvia (98%) and Belgium (97%)
* In 2017 the worst country was Switzerland (17%), followed by Sweden (87%) and Germany (87%)


* The country with by far the biggest growth is the Netherlands (+73 points)
* Other countries with big growth are Sweden (+49 points), France (+26 points) and Latvia (+7 points)
* The biggest declines are in Poland (-5 points) and the Czech Republic (-5 points), followed by Spain (-4 points)


* The Big-6 have an average Hepatitis-B immunisation rate of 91.4%, lower than the European average of 92.5%
* The United Kingdom does not have any data about Hepatitis-B
* Italy (94%), Poland (93%) and Spain (93%) score above the European average
* France (90%) and Germany (87%) score below the European average


* Between 2010 and 2017 Big-6 Hepatitis-B immunisation grew by 2.4 percentage points, also less than the European average
* France (+26 points) was the only Big-6 country with growth
* Poland (-5 points), Spain (-4 points), Germany (-3 points) and Italy (-2 points) all show a big decline in Hepatitis-B immunisation

# **Chapter 7 - Overall Immunisation Europe**

In [None]:
# Merging europe_mea, europe_dtp and europe_hep into one single dataframe
# The key columns are identical on both sides, so they are passed once via 'on'
merge_keys = ['country', 'country_code', 'year', 'continent', 'unit', 'measure']
result_a = pd.merge(europe_mea, europe_dtp, how='outer', on=merge_keys, suffixes=('_l', '_r'))
result_b = pd.merge(result_a, europe_hep, how='outer', on=merge_keys, suffixes=('_l', '_r'))
result_b = result_b[['country', 'country_code', 'continent', 'unit', 'measure', 'year', 'variable_l', 'immunisation_l', 'flag_codes_l', 'variable_r', 'immunisation_r', 'flag_codes_r', 'variable', 'immunisation', 'flag_codes']]
result_b = result_b.rename(columns={'variable_l': 'variable_mea', 'variable_r': 'variable_dtp', 'variable': 'variable_hep', 'immunisation_l': 'immunisation_mea', 'immunisation_r': 'immunisation_dtp', 'immunisation': 'immunisation_hep', 'flag_codes_l': 'flag_codes_mea', 'flag_codes_r': 'flag_codes_dtp', 'flag_codes': 'flag_codes_hep'})
result_b = result_b.sort_values(by=['country', 'year'], ascending=True)
europe_overall = result_b.loc[result_b.continent == 'Europe']

# quickview of the europe_overall dataframe
europe_overall.head()
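The chained merges above could also be written as a single fold over a list of frames, renaming the value column before merging so that no suffixes are needed. This is only a sketch on made-up toy data: `mea`, `dtp`, `hep` and `keys` are hypothetical stand-ins for the notebook's dataframes and key columns.

```python
from functools import reduce

import pandas as pd

keys = ['country', 'year']

# Toy stand-ins for europe_mea, europe_dtp and europe_hep (values are made up)
mea = pd.DataFrame({'country': ['A', 'B'], 'year': [2017, 2017], 'immunisation': [95, 90]})
dtp = pd.DataFrame({'country': ['A', 'B'], 'year': [2017, 2017], 'immunisation': [97, 92]})
hep = pd.DataFrame({'country': ['A'], 'year': [2017], 'immunisation': [93]})

# Give each value column its final name before merging
frames = [
    df.rename(columns={'immunisation': f'immunisation_{name}'})
    for name, df in [('mea', mea), ('dtp', dtp), ('hep', hep)]
]

# Outer-merge the frames one after another on the shared key columns
merged = reduce(lambda left, right: pd.merge(left, right, how='outer', on=keys), frames)
print(merged)
```

Country `B` has no Hepatitis-B row, so the outer merge keeps it with a missing `immunisation_hep` value, which is the same situation the following cells deal with.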


In [None]:
# Countries available for analysis
print("No. of European Countries available for analysis :", europe_overall.country.nunique())

In [None]:
# Countries available for each individual variable
print("No. of Countries available for analysis (Measles):", europe_overall.country[pd.notnull(europe_overall.immunisation_mea)].nunique())
print("No. of Countries available for analysis (Diphtheria, Tetanus, Pertussis):", europe_overall.country[pd.notnull(europe_overall.immunisation_dtp)].nunique())
print("No. of Countries available for analysis (Hepatitis-B):", europe_overall.country[pd.notnull(europe_overall.immunisation_hep)].nunique())


In [None]:
# Countries with missing Hepatitis-B immunisation data
europe_overall.country[pd.isnull(europe_overall.immunisation_hep)].unique()

In [None]:
# Number of years with missing Hepatitis-B data per country
europe_overall[pd.isnull(europe_overall.immunisation_hep)].country.value_counts()

In [None]:
# Filling in all missing Hepatitis-B values with 0
values = {'variable_hep': 'Immunisation: Hepatitis B', 'immunisation_hep': 0}
europe_overall = europe_overall.fillna(value=values)
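Filling the gaps with 0 has a large effect on the overall mean computed later, because `DataFrame.mean(axis=1)` would otherwise skip missing values. The sketch below shows the difference on a single made-up row; the values are illustrative and not taken from the OECD data.

```python
import pandas as pd

# Toy row for a country without Hepatitis-B data (values are made up)
row = pd.DataFrame({
    'immunisation_mea': [90.0],
    'immunisation_dtp': [96.0],
    'immunisation_hep': [float('nan')],
})

# mean(axis=1) skips NaN by default, so the missing vaccine is ignored
mean_skipping_missing = float(row.mean(axis=1).round(1).iloc[0])

# After filling with 0 the missing vaccine counts as 0% coverage
mean_with_zero = float(row.fillna({'immunisation_hep': 0}).mean(axis=1).round(1).iloc[0])

print(mean_skipping_missing)  # 93.0
print(mean_with_zero)         # 62.0
```

Which variant is appropriate depends on the question: filling with 0 treats "no data" as "no vaccination", while skipping the gap compares countries only on the vaccines they report.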

In [None]:
# Replacing the Hepatitis-B value of Slovenia 2010 (row index 184) with 92
# Slovenia only misses one year of data; reusing the 2011 value is a closer approximation than 0
europe_overall.loc[184, 'immunisation_hep'] = 92
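The hard-coded row label 184 only works as long as the merge and sort produce exactly the same index. A more robust option might be to select the row by its content. The sketch below uses a made-up toy frame and assumes `year` is stored as an integer.

```python
import pandas as pd

# Toy frame with an arbitrary index (values are made up, apart from the 92 used in the notebook)
df = pd.DataFrame(
    {
        'country': ['Slovenia', 'Slovenia', 'Spain'],
        'year': [2010, 2011, 2010],
        'immunisation_hep': [0.0, 92.0, 97.0],
    },
    index=[184, 185, 190],
)

# Select the row by country and year instead of by its index label
mask = (df['country'] == 'Slovenia') & (df['year'] == 2010)
df.loc[mask, 'immunisation_hep'] = 92

print(df.loc[mask])
```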

In [None]:
# Adding a new column 'immunisation_overall' (the mean of all the three child immunisations)
europe_overall['immunisation_overall'] = europe_overall[['immunisation_mea', 'immunisation_dtp', 'immunisation_hep']].mean(axis=1).round(1)

In [None]:
# Checking for missing values
# All seems good
europe_overall.info()

# **7.1 Overall Immunisation Today**

In [None]:
# Quickview of the new dataframe including the new 'immunisation_overall' column
europe_overall.head()

In [None]:
# Most recent immunisation data per country in Europe
europe_overall_recent = europe_overall.loc[europe_overall.groupby('country').year.idxmax()]

In [None]:
# Overall top 10 European countries when it comes to all three child immunisations
europe_overall_recent[['country', 'measure', 'immunisation_overall']].set_index('country').sort_values(by='immunisation_overall', ascending=False).head(10).style.background_gradient(cmap = 'Wistia')

In [None]:
# Overall worst 10 european countries when it comes to all three child immunisations
europe_overall_recent[['country', 'measure', 'immunisation_overall']].set_index('country').sort_values(by='immunisation_overall', ascending=True).head(10).style.background_gradient(cmap = 'Wistia')

# **7.2 Overall Immunisation Trend**

In [None]:
# Overall Immunisation trend per country in Europe
plt.rcParams['figure.figsize'] = (18, 8)
plt.style.use('fivethirtyeight')

fig = px.choropleth(europe_overall,
                    locations="country_code", 
                    color="immunisation_overall", 
                    hover_name="country",
                    animation_frame="year", 
                    range_color=[86,100],
                    )

fig.show()

In [None]:
# European Overall Immunisation Trend per year
europe_overall_mean = europe_overall[['year', 'immunisation_overall']].groupby('year').immunisation_overall.agg(['mean', 'min', 'max']).round(1).rename(columns={'mean': 'european_overall_mean'})
europe_overall_mean

In [None]:
# European Immunisation per year (the contribution of the three child immunisations)
europe_overall_mean1 = europe_overall[['year', 'immunisation_mea', 'immunisation_dtp', 'immunisation_hep', 'immunisation_overall']].groupby('year').agg('mean').round(1).rename(columns={'immunisation_mea': 'european_mea_mean', 'immunisation_dtp': 'european_dtp_mean', 'immunisation_hep': 'european_hep_mean', 'immunisation_overall': 'european_overall_mean'})
europe_overall_mean1.style.background_gradient(cmap="Wistia")

In [None]:
# Overall Immunisation growth per country (difference between start_year and end_year)
europe_overall_year_max = europe_overall.loc[europe_overall.groupby('country').year.idxmax()]
europe_overall_year_min = europe_overall.loc[europe_overall.groupby('country').year.idxmin()]
europe_overall_value_difference = pd.merge(europe_overall_year_min, europe_overall_year_max, on = 'country_code', how = 'outer', suffixes=('', '_r'))
europe_overall_value_difference = europe_overall_value_difference.rename(columns={'year': 'start_year', 'immunisation_overall': 'start_immunisation', 'year_r': 'end_year', 'immunisation_overall_r': 'end_immunisation'})
europe_overall_value_difference['immunisation_growth'] = europe_overall_value_difference['end_immunisation'] - europe_overall_value_difference['start_immunisation']
europe_overall_value_difference = europe_overall_value_difference[['country', 'measure', 'immunisation_growth']]
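The growth computed above is the difference, in percentage points, between a country's last and first available year. The same idea can be sketched more compactly with a sorted `groupby`. The toy frame and its values are made up, and note that `first` and `last` pick the first and last non-missing value per group.

```python
import pandas as pd

# Toy data: two countries with different numbers of observed years (values are made up)
toy = pd.DataFrame({
    'country': ['A', 'A', 'A', 'B', 'B'],
    'year': [2010, 2013, 2017, 2010, 2017],
    'immunisation_overall': [90.0, 92.0, 95.0, 96.0, 93.0],
})

# Sort by year so that 'first' and 'last' refer to the earliest and latest year per country
per_country = (
    toy.sort_values('year')
       .groupby('country')['immunisation_overall']
       .agg(['first', 'last'])
)

# Growth is the last value minus the first value
per_country['immunisation_growth'] = per_country['last'] - per_country['first']
print(per_country)
```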

In [None]:
# Top 10 European countries with highest Overall Immunisation gain (between 2010 and 2017)
europe_overall_value_difference.set_index('country').sort_values(by='immunisation_growth', ascending=False).head(10).style.background_gradient(cmap = 'Wistia')

In [None]:
# Top 10 European countries with the lowest Overall Immunisation gain (between 2010 and 2017)
europe_overall_value_difference.set_index('country').sort_values(by='immunisation_growth', ascending=True).head(10).style.background_gradient(cmap = 'Wistia')

# **7.3 Overall immunisation trend for the 6 most populous European countries**

In [None]:
# Europe's 6 most populous countries (with immunisation data)
europe6_overall = europe_overall.loc[europe_overall.country.isin(['Germany', 'France', 'United Kingdom', 'Italy', 'Spain', 'Poland'])]
europe6_overall

In [None]:
# Big-6 Overall Immunisation Trend per year
europe6_overall_mean = europe6_overall[['year','immunisation_overall']].groupby('year').immunisation_overall.agg(['mean', 'min', 'max']).round(1).rename(columns={'mean': 'big-6_overall_mean'})
europe6_overall_mean

In [None]:
# Big-6 Overall Immunisation per year (the contribution of the three child immunisations)
europe6_overall_mean = europe6_overall[['year', 'immunisation_mea', 'immunisation_dtp', 'immunisation_hep', 'immunisation_overall']].groupby('year').agg('mean').round(1).rename(columns={'immunisation_mea': 'big-6_mea_mean', 'immunisation_dtp': 'big-6_dtp_mean', 'immunisation_hep': 'big-6_hep_mean', 'immunisation_overall': 'big-6_overall_mean'})
europe6_overall_mean.style.background_gradient(cmap="Wistia")

In [None]:
# The Big-6 mean in comparison to the European mean
# Note: matplotlib 3.6+ renamed this style to 'seaborn-v0_8-dark'; the old name was removed in 3.8
plt.style.use('seaborn-dark')
plt.figure(figsize=(20,8))

europe_compare_overall_mean = pd.merge(europe6_overall_mean, europe_overall_mean, on = 'year', how = 'left')
europe_compare_overall_mean1 = europe_compare_overall_mean[['big-6_overall_mean', 'european_overall_mean']]

sns.lineplot(data=europe_compare_overall_mean1)
plt.title('Big-6 mean VS European mean', fontsize = 20)
plt.xlabel('year')
plt.ylabel('Immunisation (in %)')
plt.show()

In [None]:
# Big-6 Most recent Overall Immunisation data
europe6_overall_recent = europe6_overall.loc[europe6_overall.groupby('country').year.idxmax()]
europe6_overall_recent[['country', 'measure', 'immunisation_overall']].set_index('country').sort_values(by='immunisation_overall', ascending=False).head(6).style.background_gradient(cmap = 'Wistia')

In [None]:
# Big-6 Overall Immunisation growth (between 2010 and 2017)
europe6_overall_value_difference = europe_overall_value_difference.loc[europe_overall_value_difference.country.isin(['Germany', 'France', 'United Kingdom', 'Italy', 'Spain', 'Poland'])]
europe6_overall_value_difference = europe6_overall_value_difference[['country', 'measure', 'immunisation_growth']]
europe6_overall_value_difference.set_index('country').sort_values(by='immunisation_growth', ascending=False).head(6).style.background_gradient(cmap = 'Wistia')

In [None]:
# Big-6 Overall Immunisation trend per country
# Note: matplotlib 3.6+ renamed this style to 'seaborn-v0_8-dark'; the old name was removed in 3.8
plt.style.use('seaborn-dark')
plt.figure(figsize=(20,8))

europe6_overall1= europe6_overall[['country', 'year', 'immunisation_overall']].set_index('year')
europe6_overall2 = europe6_overall1.pivot_table('immunisation_overall', ['year'], 'country')

sns.lineplot(data=europe6_overall2)
plt.title('Big-6 Overall Immunisation Trend', fontsize = 20)
plt.xlabel('year')
plt.ylabel('Immunisation (in %)')
plt.show()

In [None]:
# Stability of the Big-6 Overall Immunisation rates (per country)
europe6_overall_std = europe6_overall.loc[europe6_overall.country.isin(['Germany', 'France', 'United Kingdom', 'Italy', 'Spain', 'Poland'])]
europe6_overall_std.groupby('country').immunisation_overall.agg(['std']).round(1).rename(columns={'std': 'standard_deviation'}).sort_values(by='standard_deviation', ascending=True).head(6).style.background_gradient(cmap = 'Wistia')

# **7.4 Conclusion Overall Immunisation in Europe**


* The European overall immunisation average was 86.2% in 2017
* Between 2010 and 2017 the overall immunisation grew by 2.0%


* In 2017 the overall best country was Portugal (98.0%)
* Latvia, Greece and Luxembourg share second place with 97.3% overall immunisation
* In 2017 the worst country was Iceland (60.3%), followed by the United Kingdom (62.0%), Norway (64.0%) and Switzerland (64.0%)
* These countries are not necessarily the worst at immunisation; they do not seem to vaccinate against Hepatitis-B (missing values were counted as 0), which results in a lower overall score


* The countries with the biggest overall growth are the Netherlands with 22.3% growth and Sweden with 16%
* These extreme overall growths are mainly the result of huge growth in Hepatitis-B immunisation in these countries
* The biggest overall decline is in Poland (-4.0%), followed by the Czech Republic (-3.0%) and the Slovak Republic (-3.0%)


* The big-6 has an overall immunisation of 88.4%, surprisingly higher than the European average of 86.2%
* The big-6 scores lower on all three immunisations; however, countries which do not vaccinate against Hepatitis-B pull the European overall average down, which favours the big-6
* Within the big-6, Spain has the best overall score, with a good overall immunisation of 96.0%
* Within the big-6, the United Kingdom comes last (62.0%), mainly as a result of not vaccinating against Hepatitis-B
* The overall immunisation results for the other countries are: Poland (94.3%), Italy (93.3%), Germany (93.0%), France (92.0%)


* Between 2010 and 2017 the big-6 overall immunisation grew by 0.4%
* France (+8.0%) and the United Kingdom (+1.0%) are the only big-6 countries with overall growth
* Poland (-4.0%) has a huge overall loss in immunisation, the biggest in Europe
* Spain (-0.3%) had the smallest overall loss in immunisation, followed by Germany (-1.0%) and Italy (-1.0%)
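To illustrate how much the missing Hepatitis-B data weighs on the lowest overall scores, the 2017 figures above can be converted back into an implied average over Measles and DTP only. This is a rough sketch that assumes Hepatitis-B was counted as 0 for these three countries, so the overall score equals (Measles + DTP + 0) / 3. The inputs are already rounded, so the results are approximate.

```python
# Overall 2017 scores copied from the conclusion above, for countries
# listed earlier as having no Hepatitis-B data
overall_2017 = {'Iceland': 60.3, 'United Kingdom': 62.0, 'Norway': 64.0}

# With Hepatitis-B counted as 0: overall = (mea + dtp + 0) / 3,
# so the mean of the two reported vaccines is overall * 3 / 2
implied_two_vaccine_mean = {
    country: round(score * 3 / 2) for country, score in overall_2017.items()
}

print(implied_two_vaccine_mean)
# {'Iceland': 90, 'United Kingdom': 93, 'Norway': 96}
```

On the vaccines they do report, these countries would score in the 90s, which supports the remark that they are not necessarily the worst performers.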

# **Sources & Credits**

This is my first notebook on Kaggle and my first try with Python pandas. Many thanks to Kaggle for their helpful courses and to Roshan Sharma for his inspiring notebook "WHO Suicide Analysis". I also want to thank Andrada Olteanu and Tanu N Prabhu for the use of their great datasets. Even though it was a lot of trial and error, I enjoyed making this notebook. As I am very new to Python pandas, tips for improvement are always welcome. - Have fun -

Datasets
* "Country Mapping - ISO, Continent, Region" by Andrada Olteanu
* "Population by Country - 2020" by Tanu N Prabhu
* "health-care-utilization" originating from the OECD

Kaggle sources
* Kaggle courses
   * Python
   * Pandas
   * Data Visualization
* Kaggle notebook "WHO Suicide Analysis" by Roshan Sharma

Other helpful sources
* pandas - Python Data Analysis Library (pandas.pydata.org)
* Geeksforgeeks.org
* Stackoverflow.com