# Project: Wealth and Health - Are economic powerful countries healthier?

## Table of Contents
<ul>
<li><a href="#intro">Introduction</a></li>
<li><a href="#wrangling">Data Wrangling</a></li>
<li><a href="#eda">Exploratory Data Analysis</a></li>
<li><a href="#conclusions">Conclusions</a></li>
</ul>

<a id='intro'></a>
## Introduction

Does the economic wealth of a country increase the health of its population? As we know, the economic strength and the income per capita is not evenly distributed over the world. Regions in first world countries hosting globally operating enterprises are capable to provide higher economic growth and, thus, a higher gross domestic product (GDP) as in third world problems. Additionally, health crises in less developed states are reported due to the lack of medical treatment and sanitation infrastructure. However, there are also risk factors in industry nations causing health problems and diseases.
This project will focus on the impact of economic power indicated by the GDP and the income on health related issues. Therefor, several datasets given by [Gapminder](https://www.gapminder.org/data/) are explored and analyzed to answer fundamental questions within the scope of economy and health. The main questions are as follows:
- What is the influence of economic power on the health of the population?
    - Does a higher income per capita increase the life expectancy in different countries in the world?
    - What is the influence of wealth and child mortality?
    - Does a higher GDP result in a higher spending on medical treatment and a more developed sanitation infrastructure?
- Which diseases are more common in first and third world countries?
    - Does a higher living standard as a result of wealth decrease the occurrence of diseases in general?
    - Are there any exceptions? 
    - Which risk factors occur more frequently in different regions of the world? Are there any dependencies on income or GDP?

In general, it is hypothesised that economic power has an essential positive affect on the health of the people resulting in a higher life expectancy and less occurrences of diseases.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
% matplotlib inline

def CreateDataFrameFromExcel(path,sheet,col_name_replace,regions):
    df = pd.read_excel(path,sheet_name=sheet,header=0,)
    print('Original Shape: ', df.shape)
    df.rename(columns={col_name_replace:'country'},inplace=True)
    df.set_index('country',inplace=True)
    n = df.size
    delete_rows = []
    for i, row in df.iterrows():
        if i not in regions:
            delete_rows.append(i)
    df.drop(delete_rows, inplace=True)
    print('New shape: ', df.shape)
    return df

<a id='wrangling'></a>
## Data Wrangling

> **Tip**: In this section of the report, you will load in the data, check for cleanliness, and then trim and clean your dataset for analysis. Make sure that you document your steps carefully and justify your cleaning decisions.

[Gapminder](https://www.gapminder.org/data/) provides data regarding indicators in the field of economy, population, infrastructure and many more for different countries in the world. The data is given in .xlsx-files. Each Excel file contains information on one indicator over time (yearly) for every country. Additionally, the Excel-file holds further information on the data in further sheets. Thus, we have to apply the pandas' *read_excel()*-function instead of the *read_csv()*-function to import the data.

To analyze various indicators of different research areas, we have to combine the data imported from Excel files into different dataframes. Therefor, we have to be aware of the right year the data is given, since each indicator has different up-to-dateness.To handle this problem, we consider the independent variables *income* and *GDP* for different years and countries in each dataframe. The health related dependent variables are only considered for the most recent year the data was recorded and are stored in one big dataframe containing indicators (columns) and countries (rows). Additionally, the time information of these indicators will also be provided by this dataframe in a further row. Another dataframe is created to map each country to its specific region.

To easily load the filtered and wrangled data of the indicators and countries, each dataframe is saved as a .csv-file.

### General Properties

In [6]:
# Load geographical data to assign the given countries to the region they belong to. 
# We regard four geographical regions: europe, asia, americas and africa.

regions = pd.read_excel('excel/Data Geographies - v1 - by Gapminder.xlsx',sheet_name='List of countries',header=0,index_col='name')
regions = regions['four_regions']
# Check weather the dataframe contains any NaN or NULL values and save as .csv-file
print('Number of NaN values: ', str(regions.isnull().sum()))
print('Shape: ', regions.shape)
regions.to_csv('regions.csv')
regions.head()

Number of NaN values:  0
Shape:  (197,)


name
Afghanistan      asia
Albania        europe
Algeria        africa
Andorra        europe
Angola         africa
Name: four_regions, dtype: object

In [7]:
# Load independent variable total GPD of each country over time
total_GDP = CreateDataFrameFromExcel('excel/indicator GDP at market prices, constant 2000 US$.xlsx','Data','GDP (constant 2000 US$)',regions)
total_GDP.head()

Original Shape:  (270, 53)
New shape:  (196, 52)


Unnamed: 0_level_0,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,...,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011
country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Afghanistan,,,,,,,,,,,...,,,,,,,,,,
Albania,,,,,,,,,,,...,4059112000.0,4290481000.0,4543619000.0,4793518000.0,5033194000.0,5330153000.0,5740575000.0,5930013000.0,6137564000.0,6321691000.0
Algeria,13828150000.0,11946770000.0,9595044000.0,12887460000.0,13640010000.0,14486640000.0,13790560000.0,15094170000.0,16723780000.0,18134140000.0,...,58856690000.0,62917800000.0,66189520000.0,69565190000.0,70956490000.0,73085190000.0,74839230000.0,76635370000.0,79164340000.0,81143450000.0
Andorra,,,,,,,,,,,...,1341509000.0,1432120000.0,1524990000.0,1615237000.0,1724911000.0,1749544000.0,1812015000.0,,,
Angola,,,,,,,,,,,...,10780450000.0,11137100000.0,12382540000.0,14643780000.0,17680170000.0,21674670000.0,24669480000.0,25264730000.0,26125660000.0,27013940000.0


In [8]:
# There are a lot of NaN values within the dataframe total_GDP
# We want to eliminate the NaNs by replacing them with an linear interpolation of the provided data elements.
# Rows which do not have any data will be deleted.
# Save final Dataframe as a csv-file
total_GDP.interpolate(method='linear',axis=0,inplace=True)
total_GDP.dropna(axis=0,inplace=True)
print('New shape: ', total_GDP.shape)
total_GDP.to_csv('total_GDP.csv')
print('Number of NaN: ', total_GDP.isnull().sum().sum())

New shape:  (194, 52)
Number of NaN:  0


In [9]:
# Load independent variable income of each country over time
income = CreateDataFrameFromExcel('excel/indicator gapminder gdp_per_capita_ppp.xlsx','Data','GDP per capita',regions)
income.head()

Original Shape:  (262, 217)
New shape:  (196, 216)


Unnamed: 0_level_0,1800,1801,1802,1803,1804,1805,1806,1807,1808,1809,...,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015
country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Afghanistan,603.0,603.0,603.0,603.0,603.0,603.0,603.0,603.0,603.0,603.0,...,1173.0,1298.0,1311.0,1548.0,1637.0,1695.0,1893.0,1884.0,1877.0,1925.0
Albania,667.0,667.0,668.0,668.0,668.0,668.0,668.0,668.0,668.0,668.0,...,7476.0,7977.0,8644.0,8994.0,9374.0,9640.0,9811.0,9961.0,10160.0,10620.0
Algeria,716.0,716.0,717.0,718.0,719.0,720.0,721.0,722.0,723.0,724.0,...,12088.0,12289.0,12314.0,12285.0,12494.0,12606.0,12779.0,12893.0,13179.0,13434.0
Andorra,1197.0,1199.0,1201.0,1204.0,1206.0,1208.0,1210.0,1212.0,1215.0,1217.0,...,42738.0,43442.0,41426.0,41735.0,38982.0,41958.0,41926.0,43735.0,44929.0,46577.0
Angola,618.0,620.0,623.0,626.0,628.0,631.0,634.0,637.0,640.0,642.0,...,5445.0,6453.0,7103.0,7039.0,7047.0,7094.0,7230.0,7488.0,7546.0,7615.0


In [10]:
# There are some NaN values within the dataframe income
# We want to eliminate the NaNs by replacing them with an linear interpolation of the provided data elements.
# Save final Dataframe as a csv-file
income.interpolate(method='linear',axis=0,inplace=True)
income.to_csv('income.csv')
print('Number of NaN: ', income.isnull().sum().sum())

Number of NaN:  0


In [11]:
# In the next step all indicators and dependend are imported as a dataframe.
# We only want to consider the most recent values (or maximum 2011, since this is the recent data given for total_GDP) 
# for each indicator and store them into one dataframe together with the other indicators.
# The year the data was recorded will also be saved within this dataframe.
# 1. Life expectancy
life_expectancy = CreateDataFrameFromExcel('excel/indicator life_expectancy_at_birth.xlsx','Data','Life expectancy',regions)
life_expectancy.head()

Original Shape:  (260, 218)
New shape:  (196, 217)


Unnamed: 0_level_0,1800,1801,1802,1803,1804,1805,1806,1807,1808,1809,...,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016
country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Afghanistan,28.21,28.2,28.19,28.18,28.17,28.16,28.15,28.14,28.13,28.12,...,52.4,52.8,53.3,53.6,54.0,54.4,54.8,54.9,53.8,52.72
Albania,35.4,35.4,35.4,35.4,35.4,35.4,35.4,35.4,35.4,35.4,...,76.6,76.8,77.0,77.2,77.4,77.5,77.7,77.9,78.0,78.1
Algeria,28.82,28.82,28.82,28.82,28.82,28.82,28.82,28.82,28.82,28.82,...,75.3,75.5,75.7,76.0,76.1,76.2,76.3,76.3,76.4,76.5
Andorra,,,,,,,,,,,...,84.5,84.6,84.6,84.7,84.7,84.7,84.8,84.8,84.8,84.8
Angola,26.98,26.98,26.98,26.98,26.98,26.98,26.98,26.98,26.98,26.98,...,56.2,56.7,57.1,57.6,58.1,58.5,58.8,59.2,59.6,60.0


In [12]:
# 2. Child mortality
child_mortality = CreateDataFrameFromExcel('excel/indicator gapminder under5mortality.xlsx','Data','Under five mortality',regions)
child_mortality.head()

Original Shape:  (275, 217)
New shape:  (196, 216)


Unnamed: 0_level_0,1800,1801,1802,1803,1804,1805,1806,1807,1808,1809,...,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015
country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Afghanistan,468.58,468.58,468.58,468.58,468.58,468.58,469.98,469.98,469.98,469.98,...,116.3,113.2,110.4,107.6,105.0,102.3,99.5,96.7,93.9,91.1
Albania,375.2,375.2,375.2,375.2,375.2,375.2,375.2,375.2,375.2,375.2,...,19.5,18.7,17.9,17.3,16.6,16.0,15.5,14.9,14.4,14.0
Algeria,460.21,460.21,460.21,460.21,460.21,460.21,460.21,460.21,460.21,460.21,...,32.1,30.7,29.5,28.4,27.4,26.6,25.8,25.2,24.6,24.0
Andorra,,,,,,,,,,,...,3.7,3.6,3.5,3.4,3.3,3.2,3.1,3.0,2.9,2.8
Angola,485.68,485.68,485.68,485.68,485.68,485.68,485.68,485.68,485.68,485.68,...,200.5,196.4,192.0,187.3,182.5,177.3,172.2,167.1,162.2,156.9


In [13]:
# 3. Health Spending
health_spending = CreateDataFrameFromExcel('excel/indicator health spending per person (US $).xlsx','Data','Per capita total expenditure on health at average exchange rate (US$)',regions)
health_spending.head()

Original Shape:  (262, 17)
New shape:  (196, 16)


Unnamed: 0_level_0,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010
country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1
Afghanistan,,,,,,,,14.818293,18.312764,20.665594,21.859666,23.820132,28.808767,31.809727,33.710308,37.666786
Albania,27.910805,43.045818,36.135184,47.102142,65.024024,75.236623,79.862222,90.264318,113.005324,160.909881,177.633315,191.779729,232.180439,275.14252,259.582585,240.824785
Algeria,62.055538,61.769883,66.893742,65.983195,62.52147,62.607389,67.814013,69.924657,79.623436,88.985323,96.149135,109.845806,140.850971,185.848234,180.544271,178.245066
Andorra,1392.178253,1506.72015,1460.073541,1857.622689,1424.631803,1330.411302,1293.948961,1485.790128,1891.384128,2193.360689,2355.66024,2631.377069,3011.773219,3391.470029,3364.327079,3099.413225
Angola,15.568388,11.344381,13.516758,9.101512,8.825158,15.792295,21.425847,18.149774,23.889953,25.861014,36.409794,64.123932,85.293081,148.878817,201.257016,123.201096


In [14]:
# 4. Sanitation
sanitation = CreateDataFrameFromExcel('excel/Indicator_Improved sanitation total percent.xlsx','Data','Proportion of the population using improved sanitation facilities, total',regions)
sanitation.head()

Original Shape:  (275, 22)
New shape:  (196, 21)


Unnamed: 0_level_0,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,...,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010
country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Afghanistan,,29.0,29.0,29.0,29.0,29.0,29.0,30.0,30.0,31.0,...,32.0,33.0,34.0,34.0,35.0,35.0,37.0,37.0,37.0,37.0
Albania,76.0,76.0,77.0,77.0,77.0,78.0,79.0,80.0,81.0,82.0,...,85.0,86.0,87.0,88.0,90.0,91.0,91.0,93.0,94.0,94.0
Algeria,88.0,89.0,89.0,89.0,90.0,90.0,90.0,91.0,91.0,92.0,...,92.0,93.0,93.0,93.0,94.0,94.0,94.0,95.0,95.0,95.0
Andorra,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,...,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0
Angola,29.0,29.0,30.0,31.0,33.0,34.0,36.0,38.0,39.0,41.0,...,44.0,46.0,47.0,49.0,51.0,52.0,54.0,55.0,57.0,58.0


In [15]:
# 5. Breast Cancer Deaths, female
breast_cancer_female = CreateDataFrameFromExcel('excel/indicator breast female mortality.xlsx','Data','Breast Female Mortality',regions)
breast_cancer_female.head()

Original Shape:  (176, 54)
New shape:  (163, 53)


Unnamed: 0_level_0,1950,1951,1952,1953,1954,1955,1956,1957,1958,1959,...,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002
country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Afghanistan,,,,,,,,,,,...,,,,,,,,,,11.7
Albania,,,,,,,,,,,...,4.75,4.77,5.95,6.55,5.71,6.41,8.92,7.87,6.68,6.54
Algeria,,,,,,,,,,,...,,,,,,,,,,16.7
Angola,,,,,,,,,,,...,,,,,,,,,,17.1
Argentina,,,,,,,,,,,...,21.23,20.9,21.12,21.84,21.0,20.71,20.21,20.25,20.2,19.42


In [16]:
# 6. Cervical Cancer Deaths, female
cervical_cancer_female = CreateDataFrameFromExcel('excel/indicator cervix female mortality.xlsx','Data','Cervix Female Mortality',regions)
cervical_cancer_female.head()

Original Shape:  (176, 54)
New shape:  (163, 53)


Unnamed: 0_level_0,1950,1951,1952,1953,1954,1955,1956,1957,1958,1959,...,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002
country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Afghanistan,,,,,,,,,,,...,,,,,,,,,,3.6
Albania,,,,,,,,,,,...,0.33,0.65,0.5,1.31,1.17,0.69,0.51,1.26,0.79,1.19
Algeria,,,,,,,,,,,...,,,,,,,,,,12.7
Angola,,,,,,,,,,,...,,,,,,,,,,23.2
Argentina,,,,,,,,,,,...,4.68,4.23,4.39,4.72,4.79,4.47,4.51,4.32,4.48,4.45


In [17]:
# 7. Colon and Rectum Cancer Deaths, female
colon_cancer_female = CreateDataFrameFromExcel('excel/indicator colon and rectum female mortality.xlsx','Data','Colon & Rectum Female Mortality',regions)
colon_cancer_female.head()

Original Shape:  (176, 49)
New shape:  (163, 48)


Unnamed: 0_level_0,1955,1956,1957,1958,1959,1960,1961,1962,1963,1964,...,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002
country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Afghanistan,,,,,,,,,,,...,,,,,,,,,,2.8
Albania,,,,,,,,,,,...,1.95,1.29,1.97,1.71,2.32,3.03,2.49,2.23,2.33,2.84
Algeria,,,,,,,,,,,...,,,,,,,,,,4.6
Angola,,,,,,,,,,,...,,,,,,,,,,2.8
Argentina,,,,,,,,,,,...,9.01,9.49,9.0,9.29,8.8,8.97,9.06,9.02,8.85,8.8


In [18]:
# 8. Liver Cancer Deaths, female
liver_cancer_female = CreateDataFrameFromExcel('excel/indicator liver female mortality.xlsx','Data','Liver Female Mortality',regions)
liver_cancer_female.head()

Original Shape:  (176, 49)
New shape:  (163, 48)


Unnamed: 0_level_0,1955,1956,1957,1958,1959,1960,1961,1962,1963,1964,...,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002
country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Afghanistan,,,,,,,,,,,...,,,,,,,,,,2.3
Albania,,,,,,,,,,,...,,,,,,,,,,4.0
Algeria,,,,,,,,,,,...,,,,,,,,,,1.0
Angola,,,,,,,,,,,...,,,,,,,,,,3.6
Argentina,,,,,,,,,,,...,3.34,3.31,3.16,3.01,3.28,2.97,2.88,3.09,2.72,2.84


In [19]:
# 9. Lung Cancer Deaths, female
lung_cancer_female = CreateDataFrameFromExcel('excel/indicator lung female mortality.xlsx','Data','Lung Female Mortality',regions)
lung_cancer_female.head()

Original Shape:  (176, 54)
New shape:  (163, 53)


Unnamed: 0_level_0,1950,1951,1952,1953,1954,1955,1956,1957,1958,1959,...,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002
country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Afghanistan,,,,,,,,,,,...,,,,,,,,,,2.7
Albania,,,,,,,,,,,...,4.79,5.09,5.79,7.8,7.62,7.84,7.38,6.4,5.88,8.1
Algeria,,,,,,,,,,,...,,,,,,,,,,2.0
Angola,,,,,,,,,,,...,,,,,,,,,,1.3
Argentina,,,,,,,,,,,...,6.38,6.42,6.29,6.46,6.5,7.09,7.05,6.73,6.95,7.08


In [20]:
# 10. Stomach Cancer Deaths, female
stomach_cancer_female = CreateDataFrameFromExcel('excel/indicator stomach female mortality.xlsx','Data','Stomach Female Mortality',regions)
stomach_cancer_female.head()

Original Shape:  (176, 54)
New shape:  (163, 53)


Unnamed: 0_level_0,1950,1951,1952,1953,1954,1955,1956,1957,1958,1959,...,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002
country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Afghanistan,,,,,,,,,,,...,,,,,,,,,,8.3
Albania,,,,,,,,,,,...,6.07,4.01,5.69,8.02,6.41,7.35,7.62,7.24,6.82,6.65
Algeria,,,,,,,,,,,...,,,,,,,,,,3.0
Angola,,,,,,,,,,,...,,,,,,,,,,9.1
Argentina,,,,,,,,,,,...,4.32,4.45,4.17,4.27,4.16,3.98,3.95,3.65,3.63,3.45


In [21]:
# 11. Colon and Rectum Cancer Deaths, male
colon_cancer_male = CreateDataFrameFromExcel('excel/indicator colon and rectum male mortality.xlsx','Data','Colon & Rectum Male Mortality',regions)
colon_cancer_male.head()

Original Shape:  (176, 49)
New shape:  (163, 48)


Unnamed: 0_level_0,1955,1956,1957,1958,1959,1960,1961,1962,1963,1964,...,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002
country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Afghanistan,,,,,,,,,,,...,,,,,,,,,,3.3
Albania,,,,,,,,,,,...,2.63,2.42,3.81,3.66,2.54,2.88,3.27,2.6,2.88,3.27
Algeria,,,,,,,,,,,...,,,,,,,,,,5.1
Angola,,,,,,,,,,,...,,,,,,,,,,4.0
Argentina,,,,,,,,,,,...,13.54,13.96,14.0,14.27,14.49,14.21,14.62,14.7,14.86,13.76


In [22]:
# 12. Liver Cancer Deaths, male
liver_cancer_male = CreateDataFrameFromExcel('excel/indicator liver male mortality.xlsx','Data','Liver Male Mortality',regions)
liver_cancer_male.head()

Original Shape:  (176, 49)
New shape:  (163, 48)


Unnamed: 0_level_0,1955,1956,1957,1958,1959,1960,1961,1962,1963,1964,...,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002
country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Afghanistan,,,,,,,,,,,...,,,,,,,,,,3.5
Albania,,,,,,,,,,,...,,,,,,,,,,6.6
Algeria,,,,,,,,,,,...,,,,,,,,,,0.8
Angola,,,,,,,,,,,...,,,,,,,,,,5.2
Argentina,,,,,,,,,,,...,5.25,4.91,4.78,4.99,5.32,4.85,4.92,4.9,4.76,4.58


In [23]:
# 13. Lung Cancer Deaths, male
lung_cancer_male = CreateDataFrameFromExcel('excel/indicator lung male mortality.xlsx','Data','Lung Male Mortality',regions)
lung_cancer_male.head()

Original Shape:  (176, 54)
New shape:  (163, 53)


Unnamed: 0_level_0,1950,1951,1952,1953,1954,1955,1956,1957,1958,1959,...,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002
country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Afghanistan,,,,,,,,,,,...,,,,,,,,,,11.3
Albania,,,,,,,,,,,...,25.43,21.83,33.68,35.57,35.98,34.84,34.67,31.48,27.91,29.44
Algeria,,,,,,,,,,,...,,,,,,,,,,16.4
Angola,,,,,,,,,,,...,,,,,,,,,,7.0
Argentina,,,,,,,,,,,...,37.77,37.88,36.8,36.97,36.0,35.99,36.12,34.74,32.92,31.87


In [24]:
# 14. Prostate Cancer Deaths, male
prostate_cancer_male = CreateDataFrameFromExcel('excel/indicator prostate male mortality.xlsx','Data','Prostate Male Mortality',regions)
prostate_cancer_male.head()

Original Shape:  (176, 54)
New shape:  (163, 53)


Unnamed: 0_level_0,1950,1951,1952,1953,1954,1955,1956,1957,1958,1959,...,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002
country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Afghanistan,,,,,,,,,,,...,,,,,,,,,,2.8
Albania,,,,,,,,,,,...,6.86,7.12,5.81,8.21,6.73,7.91,10.07,10.17,7.18,8.39
Algeria,,,,,,,,,,,...,,,,,,,,,,4.8
Angola,,,,,,,,,,,...,,,,,,,,,,11.1
Argentina,,,,,,,,,,,...,14.23,14.47,15.49,16.33,16.28,17.33,16.49,16.23,15.91,15.61


In [25]:
# 15. Stomach Cancer Deaths, male
stomach_cancer_male = CreateDataFrameFromExcel('excel/indicator stomach male mortality.xlsx','Data','Stomach Male Mortality',regions)
stomach_cancer_male.head()

Original Shape:  (176, 54)
New shape:  (163, 53)


Unnamed: 0_level_0,1950,1951,1952,1953,1954,1955,1956,1957,1958,1959,...,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002
country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Afghanistan,,,,,,,,,,,...,,,,,,,,,,15.8
Albania,,,,,,,,,,,...,11.12,10.32,14.79,16.13,15.48,15.99,15.76,15.84,12.08,13.86
Algeria,,,,,,,,,,,...,,,,,,,,,,5.6
Angola,,,,,,,,,,,...,,,,,,,,,,13.3
Argentina,,,,,,,,,,,...,10.66,10.76,10.81,10.86,10.2,10.03,10.2,9.85,9.3,9.15


In [26]:
# 16. Alcohol Cosumption
alc_consumption = CreateDataFrameFromExcel('excel/indicator alcohol consumption  20100830.xlsx','Data','Alcohol Consumption',regions)
alc_consumption.head()

Original Shape:  (189, 25)
New shape:  (173, 24)


Unnamed: 0_level_0,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,...,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008
country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Afghanistan,,,,,,,,,,,...,,,,,,,0.02,,,0.03
Albania,,,,,,,,,,,...,,,,,,,6.68,,,7.29
Algeria,,,,,,,,,,,...,,,,,,,0.96,,,0.69
Andorra,,,,,,,,,,,...,,,,,,,15.48,,,10.17
Angola,,,,,,,,,,,...,,,,,,,5.4,,,5.57


In [27]:
# 17. BMI, female
BMI_female = CreateDataFrameFromExcel('excel/Indicator_BMI female ASM.xlsx','Data','Country',regions)
BMI_female.head()

Original Shape:  (199, 30)
New shape:  (179, 29)


Unnamed: 0_level_0,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,...,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008
country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Afghanistan,20.44348,20.47765,20.52292,20.56493,20.60867,20.64796,20.68983,20.70902,20.71512,20.71421,...,20.61717,20.6185,20.61353,20.65274,20.70828,20.76927,20.83858,20.91021,20.9906,21.07402
Albania,25.17427,25.19088,25.20032,25.21906,25.22359,25.21257,25.20939,25.18918,25.16965,25.1537,...,25.06254,25.12797,25.20332,25.27082,25.33198,25.39804,25.46525,25.53328,25.59394,25.65726
Algeria,23.67764,23.80702,23.92626,24.03604,24.1363,24.24213,24.33652,24.42523,24.50904,24.59436,...,25.40139,25.49389,25.59477,25.69948,25.81168,25.93081,26.03886,26.15054,26.26096,26.36841
Andorra,25.67324,25.69018,25.69922,25.70089,25.70584,25.70877,25.71239,25.72437,25.74523,25.77649,...,26.07432,26.10622,26.14707,26.19542,26.23892,26.28851,26.32247,26.36846,26.40095,26.43196
Angola,20.06763,20.12766,20.19464,20.26439,20.3411,20.42624,20.51389,20.60929,20.70945,20.80873,...,21.76054,21.91293,22.07646,22.26093,22.44571,22.63536,22.83412,23.04406,23.2633,23.48431


In [28]:
# 18. BMI, male
BMI_male = CreateDataFrameFromExcel('excel/Indicator_BMI male ASM.xlsx','Data','Country',regions)
BMI_male.head()

Original Shape:  (199, 30)
New shape:  (179, 29)


Unnamed: 0_level_0,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,...,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008
country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Afghanistan,21.48678,21.46552,21.45145,21.43822,21.42734,21.41222,21.40132,21.37679,21.34018,21.29845,...,20.75469,20.69521,20.62643,20.59848,20.58706,20.57759,20.58084,20.58749,20.60246,20.62058
Albania,25.22533,25.23981,25.25636,25.27176,25.27901,25.28669,25.29451,25.30217,25.3045,25.31944,...,25.46555,25.55835,25.66701,25.77167,25.87274,25.98136,26.08939,26.20867,26.32753,26.44657
Algeria,22.25703,22.34745,22.43647,22.52105,22.60633,22.69501,22.76979,22.84096,22.90644,22.97931,...,23.69486,23.77659,23.86256,23.95294,24.05243,24.15957,24.27001,24.3827,24.48846,24.5962
Andorra,25.66652,25.70868,25.74681,25.7825,25.81874,25.85236,25.89089,25.93414,25.98477,26.0445,...,26.75078,26.83179,26.92373,27.02525,27.12481,27.23107,27.32827,27.43588,27.53363,27.63048
Angola,20.94876,20.94371,20.93754,20.93187,20.93569,20.94857,20.9603,20.98025,21.01375,21.05269,...,21.31954,21.3748,21.43664,21.51765,21.59924,21.69218,21.80564,21.93881,22.08962,22.25083


In [29]:
# 19. Cholesterol, female
fat_female = CreateDataFrameFromExcel('excel/Indicator_TC female ASM.xlsx','Data','TC female (mmol/L), age standardized mean',regions)
fat_female.head()

Original Shape:  (199, 30)
New shape:  (179, 29)


Unnamed: 0_level_0,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,...,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008
country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Afghanistan,4.644476,4.637118,4.63077,4.625318,4.619873,4.613289,4.607698,4.59808,4.583443,4.567768,...,4.356148,4.333604,4.309802,4.29517,4.283724,4.271746,4.262364,4.253473,4.246427,4.239035
Albania,5.039529,5.03661,5.033352,5.028138,5.022292,5.018456,5.012306,5.008334,5.002822,4.998255,...,4.916494,4.917402,4.919429,4.918646,4.915379,4.90998,4.902183,4.895111,4.888237,4.881235
Algeria,4.976215,4.975257,4.974508,4.976556,4.976963,4.977419,4.974378,4.971097,4.963965,4.957217,...,4.873433,4.866559,4.85758,4.848951,4.841351,4.836602,4.831645,4.826501,4.821301,4.815735
Andorra,6.132187,6.101291,6.069412,6.038793,6.008447,5.977998,5.948506,5.918512,5.890226,5.864313,...,5.59235,5.568902,5.548213,5.52906,5.513778,5.499101,5.484156,5.474519,5.465222,5.456065
Angola,4.789354,4.769557,4.75133,4.73355,4.719231,4.707431,4.693793,4.683528,4.674458,4.665655,...,4.511637,4.50508,4.499903,4.499115,4.498331,4.498226,4.501854,4.508352,4.517577,4.528061


In [30]:
# 20. Cholesterol, male
fat_male = CreateDataFrameFromExcel('excel/Indicator_TC male ASM.xlsx','Data','TC male (mmol/L), age standardized mean',regions)
fat_male.head()

Original Shape:  (199, 30)
New shape:  (179, 29)


Unnamed: 0_level_0,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,...,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008
country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Afghanistan,4.582847,4.575943,4.570482,4.566239,4.561473,4.555119,4.550215,4.54051,4.524835,4.507305,...,4.251635,4.221958,4.190959,4.171098,4.155508,4.139688,4.127613,4.115781,4.106099,4.095997
Albania,5.006371,5.001727,4.995893,4.988104,4.98158,4.977619,4.971681,4.968798,4.964878,4.961949,...,4.895632,4.89892,4.903275,4.905689,4.906776,4.90487,4.900919,4.897351,4.894249,4.890784
Algeria,4.925933,4.919521,4.914258,4.913837,4.911134,4.907856,4.901825,4.895566,4.884584,4.872994,...,4.743488,4.734174,4.722526,4.711596,4.701297,4.694086,4.686355,4.678111,4.671032,4.663696
Andorra,6.178972,6.14385,6.108996,6.077504,6.047321,6.018569,5.991545,5.964697,5.941315,5.921647,...,5.691342,5.671508,5.653838,5.636606,5.623507,5.609258,5.594193,5.582063,5.570358,5.557206
Angola,4.524107,4.509693,4.496962,4.484571,4.475721,4.469586,4.461452,4.457127,4.45422,4.451282,...,4.325931,4.322815,4.321519,4.325346,4.330437,4.336073,4.346258,4.359311,4.375734,4.39371


In [31]:
# 21. Smoking
smoking = CreateDataFrameFromExcel('excel/indicator_prevalence of current tobacco use among adults (%) both sexes.xlsx','Data','Prevalence of current tobacco use among adults (>=15 years) (%) both sexes',regions)
smoking.head()

Original Shape:  (192, 3)
New shape:  (179, 2)


Unnamed: 0_level_0,2002,2005
country,Unnamed: 1_level_1,Unnamed: 2_level_1
Algeria,15.2,15.2
Angola,,
Benin,,
Botswana,,
Burkina Faso,16.6,16.6


In [32]:
# Extract the most recent column of each indicator (respectively 2011) and create a dataframe for health indicators, cancer deaths and risk factors.
life_expectancy = life_expectancy.loc[:,2011]
child_mortality = child_mortality.loc[:,2011]
health_spending = health_spending.loc[:,2010]
sanitation = sanitation.loc[:,2010]
breast_cancer_female = breast_cancer_female.loc[:,2002]
cervical_cancer_female = cervical_cancer_female.loc[:,2002]
colon_cancer_female = colon_cancer_female.loc[:,2002]
liver_cancer_female = liver_cancer_female.loc[:,2002]
lung_cancer_female = lung_cancer_female.loc[:,2002]
stomach_cancer_female = stomach_cancer_female.loc[:,2002]
colon_cancer_male = colon_cancer_male.loc[:,2002]
liver_cancer_male = liver_cancer_male.loc[:,2002]
lung_cancer_male = lung_cancer_male.loc[:,2002]
prostate_cancer_male = prostate_cancer_male.loc[:,2002]
stomach_cancer_male = stomach_cancer_male.loc[:,2002]
alc_consumption = alc_consumption.loc[:,2008]
BMI_female = BMI_female.loc[:,2008]
BMI_male = BMI_male.loc[:,2008]
fat_female = fat_female.loc[:,2008]
fat_male = fat_male.loc[:,2008]
smoking = smoking.loc[:,2005]

In [33]:
sanitation

country
Afghanistan                37.0
Albania                    94.0
Algeria                    95.0
Andorra                   100.0
Angola                     58.0
Antigua and Barbuda         NaN
Argentina                   NaN
Armenia                    90.0
Australia                 100.0
Austria                   100.0
Azerbaijan                 82.0
Bahamas                   100.0
Bahrain                     NaN
Bangladesh                 56.0
Barbados                  100.0
Belarus                    93.0
Belgium                   100.0
Belize                     90.0
Benin                      13.0
Bhutan                     44.0
Bolivia                    27.0
Bosnia and Herzegovina     95.0
Botswana                   62.0
Brazil                     79.0
Brunei                      NaN
Bulgaria                  100.0
Burkina Faso               17.0
Burundi                    46.0
Cambodia                   31.0
Cameroon                   49.0
                          ...  


> **Tip**: You should _not_ perform too many operations in each cell. Create cells freely to explore your data. One option that you can take with this project is to do a lot of explorations in an initial notebook. These don't have to be organized, but make sure you use enough comments to understand the purpose of each code cell. Then, after you're done with your analysis, create a duplicate notebook where you will trim the excess and organize your steps so that you have a flowing, cohesive report.

> **Tip**: Make sure that you keep your reader informed on the steps that you are taking in your investigation. Follow every code cell, or every set of related code cells, with a markdown cell to describe to the reader what was found in the preceding cell(s). Try to make it so that the reader can then understand what they will be seeing in the following cell(s).

### Data Cleaning (Replace this with more specific notes!)

In [None]:
# After discussing the structure of the data and any problems that need to be
#   cleaned, perform those cleaning steps in the second part of this section.


<a id='eda'></a>
## Exploratory Data Analysis

> **Tip**: Now that you've trimmed and cleaned your data, you're ready to move on to exploration. Compute statistics and create visualizations with the goal of addressing the research questions that you posed in the Introduction section. It is recommended that you be systematic with your approach. Look at one variable at a time, and then follow it up by looking at relationships between variables.

### Research Question 1 (Replace this header name!)

In [None]:
# Use this, and more code cells, to explore your data. Don't forget to add
#   Markdown cells to document your observations and findings.


### Research Question 2  (Replace this header name!)

In [None]:
# Continue to explore the data to address your additional research
#   questions. Add more headers as needed if you have more questions to
#   investigate.


<a id='conclusions'></a>
## Conclusions

> **Tip**: Finally, summarize your findings and the results that have been performed. Make sure that you are clear with regards to the limitations of your exploration. If you haven't done any statistical tests, do not imply any statistical conclusions. And make sure you avoid implying causation from correlation!

> **Tip**: Once you are satisfied with your work, you should save a copy of the report in HTML or PDF form via the **File** > **Download as** submenu. Before exporting your report, check over it to make sure that the flow of the report is complete. You should probably remove all of the "Tip" quotes like this one so that the presentation is as tidy as possible. Congratulations!