# Childhood Vaccination Rates

Group: Léa Grandchamp, Brigit Marxe, Enea Solca

Questions

- How have coverage rates for the main childhood vaccinations developed over the dataset's observation period, and in which selected countries or world regions are progress or setbacks particularly pronounced?
- Are there systematic differences in coverage levels between the individual childhood vaccinations, and have these differences widened or narrowed over time?
- Can a relationship between vaccination coverage and child mortality be established?

Sources:
- https://ourworldindata.org/grapher/global-vaccination-coverage
- https://ourworldindata.org/child-mortality

## 1. Obtain and load the data

In [1]:
# Load the libraries: pandas, seaborn and matplotlib are the most important ones for these tasks

import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt

In [2]:
# Column names for the vaccination data; skiprows=1 skips the file's own header row
vaccination_columns = [
    "Entity",
    "Code",
    "Year",
    "Share of one-year-olds who have had three doses of the hepatitis B vaccine",
    "Share of one-year-olds vaccinated against Haemophilus influenzae type b",
    "Share of one-year-olds who have had the one dose of the inactivated polio vaccine",
    "Share of one-year-olds who have had one dose of the measles vaccine",
    "Share of one-year-olds who have had the third dose of the pneumococcal conjugate vaccine",
    "Share of one-year-olds who have had three doses of the polio vaccine",
    "Share of one-year-olds vaccinated against rubella",
    "Share of one-year-olds vaccinated against rotavirus",
    "Share of one-year-olds who have had three doses of the diphtheria, tetanus and pertussis vaccine",
]
df1 = pd.read_csv("global-vaccination-coverage.csv", skiprows=1, names=vaccination_columns)

df1.head(10000)

Unnamed: 0,Entity,Code,Year,Share of one-year-olds who have had three doses of the hepatitis B vaccine,Share of one-year-olds vaccinated against Haemophilus influenzae type b,Share of one-year-olds who have had the one dose of the inactivated polio vaccine,Share of one-year-olds who have had one dose of the measles vaccine,Share of one-year-olds who have had the third dose of the pneumococcal conjugate vaccine,Share of one-year-olds who have had three doses of the polio vaccine,Share of one-year-olds vaccinated against rubella,Share of one-year-olds vaccinated against rotavirus,"Share of one-year-olds who have had three doses of the diphtheria, tetanus and pertussis vaccine"
0,Afghanistan,AFG,2007,63.0,,,55.0,,63.0,,,63.0
1,Afghanistan,AFG,2008,64.0,,,59.0,,64.0,,,64.0
2,Afghanistan,AFG,2009,63.0,63.0,,60.0,,63.0,,,63.0
3,Afghanistan,AFG,2010,66.0,66.0,,62.0,,66.0,,,66.0
4,Afghanistan,AFG,2011,68.0,68.0,,64.0,,68.0,,,68.0
...,...,...,...,...,...,...,...,...,...,...,...,...
9198,Zimbabwe,ZWE,1990,,,,87.0,,89.0,,,88.0
9199,Zimbabwe,ZWE,1991,,,,87.0,,88.0,,,87.0
9200,Zimbabwe,ZWE,1992,,,,86.0,,86.0,,,86.0
9201,Zimbabwe,ZWE,1993,,,,86.0,,85.0,,,85.0


In [3]:
df2 = pd.read_csv("child-mortality.csv", skiprows=1, names=["Entity","Code","Year","Child mortality rate"])
df2.head(100000)

Unnamed: 0,Entity,Code,Year,Child mortality rate
0,Afghanistan,AFG,1957,37.13
1,Afghanistan,AFG,1958,36.52
2,Afghanistan,AFG,1959,35.95
3,Afghanistan,AFG,1960,35.32
4,Afghanistan,AFG,1961,34.76
...,...,...,...,...
16830,Zimbabwe,ZWE,2019,5.11
16831,Zimbabwe,ZWE,2020,5.01
16832,Zimbabwe,ZWE,2021,4.76
16833,Zimbabwe,ZWE,2022,4.60


## 2. Prepare the data
- The raw data are already in long format.
- For the analysis, the data are pivoted: each row represents an entity (country or region) and each column a year, in ascending order.
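
To make the reshaping step concrete, here is a minimal sketch of a long-to-wide pivot on a small made-up table. The column name `coverage` and all values are invented for illustration; only the `Entity` and `Year` columns correspond to the real data.

```python
import pandas as pd

# Toy long-format table: one row per (Entity, Year) pair. Values are invented.
long_df = pd.DataFrame({
    "Entity": ["A", "A", "B", "B"],
    "Year": [2000, 2001, 2000, 2001],
    "coverage": [50.0, 55.0, 80.0, 82.0],
})

# Wide format: one row per entity, one column per year.
wide_df = long_df.pivot(index="Entity", columns="Year", values="coverage")

# Transposing gives the other orientation: one row per year, one column per entity.
by_year = wide_df.T

print(wide_df)
```

Which orientation is more convenient depends on the plot: for line charts over time, the transposed table (years as rows) can usually be passed to `DataFrame.plot` directly.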

In [4]:
df1_pivot = df1.pivot(index='Entity', columns='Year', values='Share of one-year-olds who have had three doses of the hepatitis B vaccine')
df1_pivot.head(100000)





Entity,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,...,2015,2016,2017,2018,2019,2020,2021,2022,2023,2024
Afghanistan,,,,,,,,,,,...,64.00000,66.0000,64.00000,68.00000,65.00000,61.000000,55.00000,58.00000,60.00000,59.0
Africa,,,,,,,,,,,...,73.06001,75.6542,75.96257,77.12731,78.16292,75.550385,74.67232,74.55958,75.89457,
Africa (WHO),,,,,,0.0,0.0,0.0,0.0,0.0,...,70.00000,73.0000,74.00000,75.00000,76.00000,74.000000,73.00000,72.00000,75.00000,76.0
Albania,,,,,,,,,,,...,99.00000,99.0000,99.00000,99.00000,99.00000,98.000000,98.00000,97.00000,97.00000,97.0
Algeria,,,,,,,,,,,...,95.00000,91.0000,91.00000,91.00000,88.00000,84.000000,81.00000,77.00000,94.00000,92.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Western Pacific (WHO),,,,,,0.0,0.0,0.0,0.0,1.0,...,92.00000,93.0000,93.00000,91.00000,95.00000,95.000000,92.00000,94.00000,90.00000,91.0
World,,,,,,0.0,0.0,0.0,0.0,0.0,...,83.00000,84.0000,84.00000,84.00000,86.00000,83.000000,81.00000,84.00000,84.00000,84.0
Yemen,,,,,,,,,,,...,63.00000,63.0000,59.00000,54.00000,60.00000,57.000000,56.00000,58.00000,46.00000,42.0
Zambia,,,,,,,,,,,...,90.00000,95.0000,94.00000,90.00000,88.00000,84.000000,91.00000,92.00000,90.00000,91.0


In [5]:
df3_pivot = df1.pivot(index='Entity', columns='Year', values='Share of one-year-olds vaccinated against Haemophilus influenzae type b')
df3_pivot.head(1000000)


Entity,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,...,2015,2016,2017,2018,2019,2020,2021,2022,2023,2024
Afghanistan,,,,,,,,,,,...,64.00000,66.00000,64.00000,68.00000,65.00000,61.0000,55.00000,58.00000,60.00000,59.0
Africa,,,,,,,,,,,...,73.06435,75.66934,75.96544,77.12595,78.16157,75.5128,74.67649,74.55958,75.85002,
Africa (WHO),,,,,,,,,,,...,70.00000,73.00000,74.00000,75.00000,76.00000,74.0000,73.00000,72.00000,75.00000,76.0
Albania,,,,,,,,,,,...,99.00000,99.00000,99.00000,99.00000,99.00000,98.0000,98.00000,97.00000,97.00000,97.0
Algeria,,,,,,,,,,,...,95.00000,91.00000,91.00000,91.00000,88.00000,84.0000,81.00000,77.00000,92.00000,92.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Western Pacific (WHO),,,,,,,,,,,...,26.00000,25.00000,25.00000,26.00000,27.00000,30.0000,30.00000,34.00000,33.00000,34.0
World,,,,,,,,,,,...,63.00000,70.00000,71.00000,73.00000,74.00000,72.0000,72.00000,77.00000,77.00000,78.0
Yemen,,,,,,,,,,,...,63.00000,63.00000,59.00000,54.00000,60.00000,57.0000,56.00000,58.00000,46.00000,42.0
Zambia,,,,,,,,,,,...,90.00000,95.00000,94.00000,90.00000,88.00000,84.0000,91.00000,92.00000,90.00000,91.0


In [6]:
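# Abandoned attempt: `'a' and 'b'` evaluates to the last string only, so this would pivot just the Hib column.
# To pivot several value columns at once, pass a list of column names to `values`.
#df4_pivot = df1.pivot(index='Entity', columns='Year', values='Share of one-year-olds who have had three doses of the hepatitis B vaccine' and 'Share of one-year-olds vaccinated against Haemophilus influenzae type b')

#df4_pivot.head(100)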

# Combine vaccination coverage and child mortality per entity and year
merge_df = pd.merge(df1, df2, on=['Entity', 'Year'])

# Shorter column names for the analysis
rename_df = merge_df.rename(columns={
    'Share of one-year-olds who have had three doses of the hepatitis B vaccine': 'Hepatitis B Vaccine Coverage',
    'Share of one-year-olds vaccinated against Haemophilus influenzae type b': 'Hib Vaccine Coverage',
    'Share of one-year-olds who have had the one dose of the inactivated polio vaccine': 'Inactivated Polio Vaccine Coverage',
    'Share of one-year-olds who have had one dose of the measles vaccine': 'Measles Vaccine Coverage',
    'Share of one-year-olds who have had the third dose of the pneumococcal conjugate vaccine': 'Pneumococcal Conjugate Vaccine Coverage',
    'Share of one-year-olds who have had three doses of the polio vaccine': 'Polio Vaccine Coverage',
    'Share of one-year-olds vaccinated against rubella': 'Rubella Vaccine Coverage',
    'Share of one-year-olds vaccinated against rotavirus': 'Rotavirus Vaccine Coverage',
    'Share of one-year-olds who have had three doses of the diphtheria, tetanus and pertussis vaccine': 'Di te per Vaccine Coverage',
    'Child mortality rate': 'Child Mortality Rate',
})
rename_df





Unnamed: 0,Entity,Code_x,Year,Hepatitis B Vaccine Coverage,Hib Vaccine Coverage,Inactivated Polio Vaccine Coverage,Measles Vaccine Coverage,Pneumococcal Conjugate Vaccine Coverage,Polio Vaccine Coverage,Rubella Vaccine Coverage,Rotavirus Vaccine Coverage,Di te per Vaccine Coverage,Code_y,Child Mortality Rate
0,Afghanistan,AFG,2007,63.0,,,55.0,,63.0,,,63.0,AFG,10.07
1,Afghanistan,AFG,2008,64.0,,,59.0,,64.0,,,64.0,AFG,9.63
2,Afghanistan,AFG,2009,63.0,63.0,,60.0,,63.0,,,63.0,AFG,9.22
3,Afghanistan,AFG,2010,66.0,66.0,,62.0,,66.0,,,66.0,AFG,8.83
4,Afghanistan,AFG,2011,68.0,68.0,,64.0,,68.0,,,68.0,AFG,8.46
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8407,Zimbabwe,ZWE,1990,,,,87.0,,89.0,,,88.0,ZWE,7.97
8408,Zimbabwe,ZWE,1991,,,,87.0,,88.0,,,87.0,ZWE,8.33
8409,Zimbabwe,ZWE,1992,,,,86.0,,86.0,,,86.0,ZWE,8.73
8410,Zimbabwe,ZWE,1993,,,,86.0,,85.0,,,85.0,ZWE,9.16
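
The merged table carries two code columns, `Code_x` and `Code_y`, because both inputs contain `Code` but the merge keys are only `Entity` and `Year`. One possible way to avoid the duplicate is sketched below on invented data: drop `Code` from one side before merging, and let `validate` check that each entity-year pair occurs only once. The frame names `vacc` and `mort` and all values are assumptions for illustration.

```python
import pandas as pd

# Invented stand-ins for the vaccination and child mortality tables.
vacc = pd.DataFrame({
    "Entity": ["A", "A", "B"],
    "Code": ["AAA", "AAA", "BBB"],
    "Year": [2000, 2001, 2000],
    "Measles Vaccine Coverage": [50.0, 55.0, 80.0],
})
mort = pd.DataFrame({
    "Entity": ["A", "A", "B", "B"],
    "Code": ["AAA", "AAA", "BBB", "BBB"],
    "Year": [2000, 2001, 2000, 1999],
    "Child Mortality Rate": [9.0, 8.5, 2.0, 2.1],
})

# Drop the redundant Code column on one side so the result has a single Code column.
merged = pd.merge(
    vacc,
    mort.drop(columns="Code"),
    on=["Entity", "Year"],
    how="inner",            # keep only entity-years present in both tables
    validate="one_to_one",  # raise if an (Entity, Year) pair is duplicated
)
print(merged)
```

With the default inner join, entity-years that appear in only one of the two tables are dropped, which is why a merged table can have fewer rows than either input.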


In [8]:
df2_pivot = df2.pivot(index='Entity', columns='Year', values='Child mortality rate')
df2_pivot.head(100000)

Entity,1751,1752,1753,1754,1755,1756,1757,1758,1759,1760,...,2014,2015,2016,2017,2018,2019,2020,2021,2022,2023
Afghanistan,,,,,,,,,,,...,7.51,7.24,7.00,6.76,6.54,6.33,6.13,5.93,5.74,5.55
Africa,,,,,,,,,,,...,7.68,7.49,7.31,7.17,6.83,6.66,6.43,6.27,6.22,5.91
Albania,,,,,,,,,,,...,0.99,0.96,0.94,0.93,0.93,0.94,0.94,0.95,0.94,0.94
Algeria,,,,,,,,,,,...,2.54,2.49,2.45,2.40,2.37,2.33,2.29,2.26,2.23,2.20
Andorra,,,,,,,,,,,...,0.37,0.35,0.34,0.32,0.31,0.30,0.29,0.28,0.27,0.26
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Vietnam,,,,,,,,,,,...,2.22,2.20,2.18,2.15,2.14,2.12,2.10,2.07,2.04,2.00
World,,,,,,,,,,,...,4.49,4.37,4.24,4.13,4.02,3.95,3.87,3.82,3.82,3.67
Yemen,,,,,,,,,,,...,4.84,4.90,4.77,4.66,4.70,4.59,4.32,4.24,4.08,3.93
Zambia,,,,,,,,,,,...,6.38,6.15,5.90,5.61,5.52,5.41,5.24,4.92,4.67,4.47


In [9]:
# Abandoned attempt: the chained `and` evaluates to its last string, so only 'Di te per Vaccine Coverage' would be pivoted.
#df5_pivot = rename_df.pivot(index = 'Entity', columns='Year', values='Child Mortality Rate' and 'Hepatitis B Vaccine Coverage' and 'Hib Vaccine Coverage' and 'Inactivated Polio Vaccine Coverage' and 'Polio Vaccine Coverage' and 'Measles Vaccine Coverage' and 'Pneumococcal Conjugate Vaccine Coverage' and 'Rubella Vaccine Coverage' and 'Rotavirus Vaccine Coverage' and 'Di te per Vaccine Coverage')
#df5_pivot.head(100000)


## 3. Statistics of the raw data (analysis in long format)

The following can be said about the individual columns:

- The *Entity* column contains 261 distinct areas
- The *Code* column contains 239 distinct codes
- Data are available from 1543 to 2021. From 1950 onward the data are complete
- Life expectancy ranges from 12 to 86.5 years
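
Column-level figures like the ones listed above can be recomputed directly from a loaded frame. The sketch below uses a small invented table named `toy` in place of the real data; the dictionary keys are illustrative names, not part of the dataset.

```python
import pandas as pd

# Stand-in for a long-format table; all values are invented.
toy = pd.DataFrame({
    "Entity": ["A", "A", "B", "World"],
    "Code": ["AAA", "AAA", "BBB", None],
    "Year": [2000, 2001, 2000, 2000],
    "Child Mortality Rate": [9.0, 8.5, 2.0, 5.0],
})

summary = {
    "n_entities": toy["Entity"].nunique(),
    "n_codes": toy["Code"].nunique(),  # nunique() ignores missing codes
    "first_year": int(toy["Year"].min()),
    "last_year": int(toy["Year"].max()),
    "value_min": float(toy["Child Mortality Rate"].min()),
    "value_max": float(toy["Child Mortality Rate"].max()),
}
print(summary)
```

Applied to `df1`, `df2` or `rename_df`, the same calls give the figures for the vaccination and child mortality data. If aggregate rows carry no code, the code count will come out lower than the entity count.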

### 3.1 Analysis by country

### 3.2 Analysis of codes

### 3.3 Analysis by year

In [10]:
from scipy.stats import pearsonr

# Vaccine columns to compare against child mortality
coverage_columns = [
    'Hepatitis B Vaccine Coverage',
    'Hib Vaccine Coverage',
    'Inactivated Polio Vaccine Coverage',
    'Measles Vaccine Coverage',
    'Pneumococcal Conjugate Vaccine Coverage',
    'Polio Vaccine Coverage',
]

for column in coverage_columns:
    # pearsonr needs complete pairs, so drop rows where either value is missing
    pairs = rename_df[[column, 'Child Mortality Rate']].dropna()
    correlation, p_value = pearsonr(pairs[column], pairs['Child Mortality Rate'])
    print(f'{column}: Pearson correlation coefficient: {correlation}, P-value: {p_value}')
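
Each vaccine column has a different number of missing values, so each correlation with child mortality rests on a different number of entity-years. The sketch below shows a hypothetical helper, `correlation_table`, that reports that count next to the Pearson coefficient and p-value. The helper name, the toy column names and all values are invented for illustration.

```python
import pandas as pd
from scipy.stats import pearsonr


def correlation_table(df, columns, target):
    """Pearson r, p-value and number of complete pairs for each column vs. target."""
    rows = []
    for column in columns:
        pairs = df[[column, target]].dropna()  # keep rows where both values exist
        r, p = pearsonr(pairs[column], pairs[target])
        rows.append({"column": column, "r": r, "p_value": p, "n": len(pairs)})
    return pd.DataFrame(rows).set_index("column")


# Invented values: coverage_a falls exactly linearly with the target.
toy = pd.DataFrame({
    "coverage_a": [10.0, 20.0, 30.0, 40.0, None],
    "coverage_b": [10.0, None, 35.0, 20.0, 50.0],
    "mortality": [8.0, 6.0, 4.0, 2.0, 1.0],
})
result = correlation_table(toy, ["coverage_a", "coverage_b"], "mortality")
print(result)
```

On the real data the call would be `correlation_table(rename_df, coverage_columns, 'Child Mortality Rate')`, with `coverage_columns` being whichever vaccine columns are of interest. A correlation pooled over all countries and years also reflects the general trend over time, so it should be read as an association rather than as evidence of a causal effect.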

### 3.4 Analysis of vaccination coverage

## 4. Analysis
### Approach
### Observations
### Interpretation