# Gapminder Datasets

## Life Expectancy
The average number of years a newborn child would live if current mortality patterns were to stay the same.
[Source](http://gapm.io/ilex)

## CO2 Emissions
Carbon dioxide emissions from the burning of fossil fuels (metric tonnes of CO2 per person). [Source](https://cdiac.ess-dive.lbl.gov/), [Source](http://gapm.io/ilex)

## Gross domestic income per person
Gross domestic product per person adjusted for differences in purchasing power (in international dollars, fixed 2011 prices, PPP based on 2011 ICP). [Source](http://gapm.io/dgdppc)

## Children per woman
Total fertility rate. The number of children that would be born to each woman at prevailing age-specific fertility rates. [Source](http://gapm.io/dtfr)

<!-- ## Nuclear power generation
The amount of electricity produced by nuclear power plants in a given year, counted in tons of oil equivalent (toe). [Source](https://data.worldbank.org/indicator/EG.ELC.NUCL.KH) -->

## Number of child deaths
The number of children dying before age 5. [Source](https://www.who.int/healthinfo/global_burden_disease/en/)

## Population total
Total population. [Source](http://gapm.io/dpop)

## Long term unemployment rate (Dismissed)
Percentage of the total population registered as long-term unemployed during the given year. [Source](https://www.ilo.org/ilostat/)
Dismissed: the description says that the values are in percent. Some countries have a long-term unemployment rate below 1% (Venezuela, Uruguay), while other, so-called developed countries (Austria, Germany) are above 1%. That seems improbable.

## Unemployment rates
- Age 15+: Percentage of the total population, age group above 15, that was registered as unemployed during the given year. [Source](https://www.ilo.org/ilostat/)
- Age 15-24: Percentage of the total population, age group 15-24, that was registered as unemployed during the given year. [Source](https://www.ilo.org/ilostat/)
- Age 25-54: Percentage of the total population, age group 25-54, that was registered as unemployed during the given year. [Source](https://www.ilo.org/ilostat/)
- Age 55-64: Percentage of the total population, age group 55-64, that was registered as unemployed during the given year. [Source](https://www.ilo.org/ilostat/)
- Age 65+: Percentage of the total population, age group above 65, that was registered as unemployed during the given year. [Source](https://www.ilo.org/ilostat/)

In [120]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

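# Each Gapminder CSV is in wide format: one row per country, one column per year.
# For every indicator, keep the country column and the 2012 column (selected by
# position, so the offset differs per file), then give it a descriptive name.
dfGapminder = pd.read_csv(r"data/gapminder_life_expectancy_years.csv",sep=",",decimal=".")
dfLifeExpectancyYears = dfGapminder.iloc[:,[0,-7]]
dfLifeExpectancyYears = dfLifeExpectancyYears.rename(columns={'2012': 'life_expectancy_[years]'})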
dfGapminder = pd.read_csv(r"data/gapminder_co2_emissions_tonnes_per_person.csv",sep=",",decimal=".")
dfCo2Emissions = dfGapminder.iloc[:,[0,-3]]
dfCo2Emissions = dfCo2Emissions.rename(columns={'2012': 'co2_emissions_year_[tons]'})
dfGapminder = pd.read_csv(r"data/gapminder_income_per_person_gdppercapita_ppp_inflation_adjusted.csv",sep=",",decimal=".")
dfIncome = dfGapminder.iloc[:,[0,-29]]
dfIncome = dfIncome.rename(columns={'2012': 'income_per_person'})
dfGapminder = pd.read_csv(r"data/gapminder_children_per_woman_total_fertility.csv",sep=",",decimal=".")
dfFertility = dfGapminder.iloc[:,[0,-7]]
dfFertility = dfFertility.rename(columns={'2012': 'fertility_children_per_woman'})
# dfGapminder = pd.read_csv(r"data/gapminder_nuclear_power_generation_total.csv",sep=",",decimal=".")
# dfNuclearPower = dfGapminder.iloc[:,[0,-7]]
# dfNuclearPower
dfGapminder = pd.read_csv(r"data/gapminder_number_of_child_deaths.csv",sep=",",decimal=".")
dfChildDeaths = dfGapminder.iloc[:,[0,-4]]
dfChildDeaths = dfChildDeaths.rename(columns={'2012': 'child_deaths'})
dfGapminder = pd.read_csv(r"data/gapminder_population_total.csv",sep=",",decimal=".")
dfPopulation = dfGapminder.iloc[:,[0,-89]]
dfPopulation = dfPopulation.rename(columns={'2012': 'population_total'})
# dfGapminder = pd.read_csv(r"data/gapminder_long_term_unemployment_rate_percent.csv",sep=",",decimal=".")
# dfLongTermUnemployment = dfGapminder.iloc[:,[0,-6]]
# dfLongTermUnemployment = dfLongTermUnemployment.rename(columns={'2012': 'long_term_unemployment_rate_[percent]'})
# dfGapminder

dfGapminder = pd.read_csv(r"data/gapminder_aged_15plus_unemployment_rate_percent.csv",sep=",",decimal=".")
# Fill gaps along each row: carry the last known year forward, then back-fill leading gaps
filled = dfGapminder.iloc[:,1:].ffill(axis=1).bfill(axis=1)
filled['country'] = dfGapminder.iloc[:,0]
dfUnemployment15plus = filled.iloc[:,[-1, -7]]
dfUnemployment15plus = dfUnemployment15plus.rename(columns={'2012': 'unemployment_15plus'})

dfGapminder = pd.read_csv(r"data/gapminder_aged_15_24_unemployment_rate_percent.csv",sep=",",decimal=".")
filled = dfGapminder.iloc[:,1:].ffill(axis=1).bfill(axis=1)
filled['country'] = dfGapminder.iloc[:,0]
dfUnemployment15_24 = filled.iloc[:,[-1, -7]]
dfUnemployment15_24 = dfUnemployment15_24.rename(columns={'2012': 'unemployment_15_24'})

dfGapminder = pd.read_csv(r"data/gapminder_aged_25_54_unemployment_rate_percent.csv",sep=",",decimal=".")
filled = dfGapminder.iloc[:,1:].ffill(axis=1).bfill(axis=1)
filled['country'] = dfGapminder.iloc[:,0]
dfUnemployment25_54 = filled.iloc[:,[-1, -7]]
dfUnemployment25_54 = dfUnemployment25_54.rename(columns={'2012': 'unemployment_25_54'})

dfGapminder = pd.read_csv(r"data/gapminder_aged_55_64_unemployment_rate_percent.csv",sep=",",decimal=".")
filled = dfGapminder.iloc[:,1:].ffill(axis=1).bfill(axis=1)
filled['country'] = dfGapminder.iloc[:,0]
dfUnemployment55_64 = filled.iloc[:,[-1, -7]]
dfUnemployment55_64 = dfUnemployment55_64.rename(columns={'2012': 'unemployment_55_64'})

dfGapminder = pd.read_csv(r"data/gapminder_aged_65plus_unemployment_rate_percent.csv",sep=",",decimal=".")
filled = dfGapminder.iloc[:,1:].ffill(axis=1).bfill(axis=1)
filled['country'] = dfGapminder.iloc[:,0]
dfUnemployment65plus = filled.iloc[:,[-1, -7]]
dfUnemployment65plus = dfUnemployment65plus.rename(columns={'2012': 'unemployment_65plus'})
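
The loading cell above picks the 2012 column by position (`-7`, `-3`, `-29`, ...), and the offset differs per file. A minimal sketch of an alternative that selects the year by its column label instead. The helper name `load_indicator`, its parameters, and the toy CSV are hypothetical and not part of the original notebook:

```python
import io

import pandas as pd


def load_indicator(source, year, name, fill_gaps=False):
    """Return a (country, <name>) frame for one year of a wide Gapminder-style CSV.

    Hypothetical helper: assumes the first column holds the country name and
    the remaining columns are labelled by year.
    """
    wide = pd.read_csv(source)
    values = wide.set_index(wide.columns[0])
    if fill_gaps:
        # Row-wise: carry the last known year forward, then the next known year back.
        values = values.ffill(axis=1).bfill(axis=1)
    column = str(year)
    return values[[column]].rename(columns={column: name}).reset_index()


# Toy data, invented for illustration only.
toy_csv = "country,2011,2012,2013\nA,1.0,,3.0\nB,,,5.0\nC,7.0,8.0,9.0\n"

raw_2012 = load_indicator(io.StringIO(toy_csv), 2012, "unemployment_15plus")
filled_2012 = load_indicator(io.StringIO(toy_csv), 2012, "unemployment_15plus", fill_gaps=True)
print(raw_2012)
print(filled_2012)
```

Selecting by label raises a `KeyError` when the requested year is absent, whereas a wrong positional offset followed by `rename` passes silently and leaves the column under its original name.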


In [121]:
print('dfLifeExpectancyYears', dfLifeExpectancyYears.shape)
print('dfCo2Emissions', dfCo2Emissions.shape)
print('dfIncome', dfIncome.shape)
print('dfFertility', dfFertility.shape)
print('dfChildDeaths', dfChildDeaths.shape)
print('dfPopulation', dfPopulation.shape)
# print('dfLongTermUnemployment', dfLongTermUnemployment.shape)
print('dfUnemployment15plus', dfUnemployment15plus.shape)
print('dfUnemployment15_24', dfUnemployment15_24.shape)
print('dfUnemployment25_54', dfUnemployment25_54.shape)
print('dfUnemployment55_64', dfUnemployment55_64.shape)
print('dfUnemployment65plus', dfUnemployment65plus.shape)

dfLifeExpectancyYears (187, 2)
dfCo2Emissions (192, 2)
dfIncome (193, 2)
dfFertility (184, 2)
dfChildDeaths (193, 2)
dfPopulation (195, 2)
dfUnemployment15plus (186, 2)
dfUnemployment15_24 (167, 2)
dfUnemployment25_54 (167, 2)
dfUnemployment55_64 (165, 2)
dfUnemployment65plus (151, 2)
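
The row counts differ between indicators (151 to 195 countries), so some countries will have gaps once the frames are combined. A small sketch for listing which countries an indicator lacks relative to a reference frame. The function `missing_countries` and the toy frames are hypothetical:

```python
import pandas as pd


def missing_countries(reference, other, key="country"):
    """Return the sorted countries present in `reference` but absent from `other`."""
    return sorted(set(reference[key]) - set(other[key]))


# Toy frames, invented for illustration only.
population = pd.DataFrame({"country": ["A", "B", "C"], "population_total": [1, 2, 3]})
fertility = pd.DataFrame({"country": ["A", "C"], "fertility_children_per_woman": [2.1, 1.7]})

print(missing_countries(population, fertility))
```

With the notebook's frames, a call such as `missing_countries(dfPopulation, dfUnemployment65plus)` would list the countries that end up without a value for that indicator.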


In [122]:
dfmerged = dfPopulation.merge(dfChildDeaths, how='left', on='country')
dfmerged = dfmerged.merge(dfIncome, how='left', on='country')
dfmerged = dfmerged.merge(dfCo2Emissions, how='left', on='country')
dfmerged = dfmerged.merge(dfLifeExpectancyYears, how='left', on='country')
dfmerged = dfmerged.merge(dfFertility, how='left', on='country')
dfmerged = dfmerged.merge(dfUnemployment15plus, how='left', on='country')
dfmerged = dfmerged.merge(dfUnemployment15_24, how='left', on='country')
dfmerged = dfmerged.merge(dfUnemployment25_54, how='left', on='country')
dfmerged = dfmerged.merge(dfUnemployment55_64, how='left', on='country')
dfmerged = dfmerged.merge(dfUnemployment65plus, how='left', on='country')
# dfmerged = dfmerged.merge(dfLongTermUnemployment, how='left', left_on='country', right_on='country') 
# display(dfmerged)
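
The chain of left merges above can also be written as a fold over a list of frames. A minimal sketch, assuming every frame has one row per country. The helper `merge_on_country` and the toy frames are hypothetical; `validate="one_to_one"` is an optional `merge` argument that raises if a country appears more than once on either side:

```python
from functools import reduce

import pandas as pd


def merge_on_country(base, others, key="country"):
    """Left-merge every frame in `others` onto `base`, one after another."""
    return reduce(
        lambda left, right: left.merge(right, how="left", on=key, validate="one_to_one"),
        others,
        base,
    )


# Toy frames, invented for illustration only.
population = pd.DataFrame({"country": ["A", "B", "C"], "population_total": [10, 20, 30]})
income = pd.DataFrame({"country": ["A", "B"], "income_per_person": [1000.0, 2000.0]})
fertility = pd.DataFrame({"country": ["C", "A"], "fertility_children_per_woman": [2.5, 1.5]})

merged = merge_on_country(population, [income, fertility])
print(merged)
```

Because every merge is a left merge onto the population frame, the result keeps one row per country in that frame and shows `NaN` wherever an indicator has no entry.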


In [123]:
pd.set_option('display.max_columns', None)  
pd.set_option('display.max_rows', None)
print(dfmerged.shape)
dfmerged

(195, 12)


| | country | population_total | child_deaths | income_per_person | co2_emissions_year_[tons] | life_expectancy_[years] | fertility_children_per_woman | unemployment_15plus | unemployment_15_24 | unemployment_25_54 | unemployment_55_64 | unemployment_65plus |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | 30700000 | 106000.0 | 1840.0 | 0.35 | 57.2 | 5.38 | 1.69 | 2.93 | 1.09 | 1.91 | 0.958 |
| 1 | Albania | 2920000 | 567.0 | 10400.0 | 1.68 | 77.0 | 1.69 | 13.4 | 29.3 | 12.2 | 7.43 | 3.21 |
| 2 | Algeria | 37600000 | 25000.0 | 13200.0 | 3.46 | 76.8 | 2.94 | 11.0 | 27.5 | 8.09 | 2.03 | NaN |
| 3 | Andorra | 82400 | 2.0 | 41900.0 | 5.92 | 82.6 | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | Angola | 25100000 | 175000.0 | 6000.0 | 1.33 | 61.7 | 6.0 | 16.8 | 34.8 | 8.71 | 5.75 | 3.15 |
| 5 | Antigua and Barbuda | 96800 | 13.0 | 19100.0 | 5.42 | 77.0 | 2.1 | 6.0 | NaN | NaN | NaN | NaN |
| 6 | Argentina | 42100000 | 10200.0 | 19200.0 | 4.57 | 76.1 | 2.35 | 7.22 | 18.3 | 5.57 | 4.17 | 2.86 |
| 7 | Armenia | 2880000 | 723.0 | 7510.0 | 1.98 | 74.3 | 1.73 | 17.5 | 37.0 | 16.2 | 11.8 | 6.81 |
| 8 | Australia | 22800000 | 1320.0 | 42600.0 | 17.0 | 82.3 | 1.9 | 5.22 | 11.7 | 4.05 | 3.45 | 1.62 |
| 9 | Austria | 8520000 | 315.0 | 44400.0 | 7.31 | 80.9 | 1.46 | 4.87 | 9.4 | 4.33 | 3.44 | 0.401 |


In [124]:
dfmerged.shape
dfmerged.dropna().shape

(195, 12)

In [127]:
# Keep only the countries that have a value for every indicator
dfGapminder = dfmerged.dropna()
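
Because `dropna()` discards every row with at least one missing value, it can help to see which columns are responsible before dropping. A sketch on a toy frame; the values are invented and only the column names are borrowed from the merged table:

```python
import numpy as np
import pandas as pd

# Toy frame, invented for illustration only.
toy = pd.DataFrame(
    {
        "country": ["A", "B", "C", "D"],
        "fertility_children_per_woman": [5.4, np.nan, 1.7, 2.1],
        "unemployment_65plus": [0.9, np.nan, np.nan, 3.2],
    }
)

# Number of missing values in each column
missing_per_column = toy.isna().sum()

# Rows that have a value in every column
complete_rows = toy.dropna()

print(missing_per_column)
print(toy.shape, "->", complete_rows.shape)
```

Applied to `dfmerged`, the same two calls would show how many countries each indicator costs and how many rows remain for the analysis.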