#### Introduction
This notebook investigates the relationship between a country's GDP and the life expectancy of its citizens.

Here are a few questions that this project will seek to answer:

- Has life expectancy increased over time in the six nations?
- Has GDP increased over time in the six nations?
- Is there a correlation between GDP and life expectancy of a country?
- What is the average life expectancy in these nations?
- What is the distribution of that life expectancy?

Data Sources:
- GDP: World Bank national accounts data and OECD National Accounts data files.
- Life expectancy: World Health Organization.

#### Importing Libraries

Before starting the analysis, the Python libraries used to load, manipulate, and plot the data need to be imported.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
#Loading the csv file into a dataframe called gdp_data
gdp_data = pd.read_csv('/Users/elorm/Documents/Repos/Datasets/all_data.csv')

#Printing the head (first 5 rows) of the data
print(gdp_data.head())

  Country  Year  Life expectancy at birth (years)           GDP
0   Chile  2000                              77.3  7.786093e+10
1   Chile  2001                              77.3  7.097992e+10
2   Chile  2002                              77.8  6.973681e+10
3   Chile  2003                              77.9  7.564346e+10
4   Chile  2004                              78.0  9.921039e+10


In [3]:
#Printing the tail (last 5 rows) of the data
print(gdp_data.tail())

     Country  Year  Life expectancy at birth (years)           GDP
91  Zimbabwe  2011                              54.9  1.209845e+10
92  Zimbabwe  2012                              56.6  1.424249e+10
93  Zimbabwe  2013                              58.0  1.545177e+10
94  Zimbabwe  2014                              59.2  1.589105e+10
95  Zimbabwe  2015                              60.7  1.630467e+10


In [4]:
#Getting information on the dataset
gdp_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 96 entries, 0 to 95
Data columns (total 4 columns):
 #   Column                            Non-Null Count  Dtype  
---  ------                            --------------  -----  
 0   Country                           96 non-null     object 
 1   Year                              96 non-null     int64  
 2   Life expectancy at birth (years)  96 non-null     float64
 3   GDP                               96 non-null     float64
dtypes: float64(2), int64(1), object(1)
memory usage: 3.1+ KB
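
The `info()` summary reports 96 non-null entries in every column. A more explicit check can count missing values and duplicated rows directly. The sketch below runs on a few rows copied from the head and tail output above; `sample` is a hypothetical stand-in for `gdp_data`, so the numbers it prints describe only those rows.

```python
import pandas as pd

# A few rows copied from the head/tail output above.
# `sample` is a hypothetical stand-in for the full gdp_data DataFrame.
sample = pd.DataFrame({
    'Country': ['Chile', 'Chile', 'Zimbabwe', 'Zimbabwe'],
    'Year': [2000, 2001, 2014, 2015],
    'Life expectancy at birth (years)': [77.3, 77.3, 59.2, 60.7],
    'GDP': [7.786093e+10, 7.097992e+10, 1.589105e+10, 1.630467e+10],
})

# Number of missing values in each column
missing_per_column = sample.isna().sum()

# Number of rows that exactly repeat an earlier row
duplicate_rows = int(sample.duplicated().sum())

print(missing_per_column)
print('Duplicate rows:', duplicate_rows)
```

Running the same two calls on `gdp_data` would confirm what `info()` suggests for the full dataset.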


In [5]:
#How many different countries are in the dataset?
num_countries = gdp_data['Country'].nunique()

#What are the countries?
countries = gdp_data['Country'].unique()

print('There are ' + str(num_countries) + ' countries in the dataset:')
print(countries)

There are 6 countries in the dataset:
['Chile' 'China' 'Germany' 'Mexico' 'United States of America' 'Zimbabwe']
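
Beyond listing the country names, it can help to confirm how many yearly observations each country contributes and which years they span. This is a minimal sketch that uses only the rows shown in the head and tail output, so it covers two of the six countries; `sample` and `coverage` are illustrative names, not part of the original notebook.

```python
import pandas as pd

# Country/Year pairs copied from the head and tail output above.
# `sample` is a hypothetical stand-in for gdp_data.
sample = pd.DataFrame({
    'Country': ['Chile'] * 5 + ['Zimbabwe'] * 5,
    'Year': [2000, 2001, 2002, 2003, 2004,
             2011, 2012, 2013, 2014, 2015],
})

# First year, last year, and number of observations per country
coverage = sample.groupby('Country')['Year'].agg(['min', 'max', 'count'])
print(coverage)
```

Applied to `gdp_data`, the same `groupby` would show whether all six countries cover the same range of years.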


#### Data Cleaning

Although the data is fairly clean, the column name 'Life expectancy at birth (years)' is long and needs to be renamed to something shorter. The GDP values are displayed in scientific notation and run into the billions. To make them easier to read, the values will be expressed in billions and rounded to 2 decimal places, and the column name will be updated to show the unit.

In [6]:
#Renaming the 'Life expectancy at birth (years)' column to LifeExp and GDP to GDP (Billions)

gdp_data.rename(columns = {'Life expectancy at birth (years)' : 'LifeExp', 'GDP' : 'GDP (Billions)'}, inplace = True)
gdp_data.head()

  Country  Year  LifeExp  GDP (Billions)
0   Chile  2000     77.3    7.786093e+10
1   Chile  2001     77.3    7.097992e+10
2   Chile  2002     77.8    6.973681e+10
3   Chile  2003     77.9    7.564346e+10
4   Chile  2004     78.0    9.921039e+10


In [7]:
#Recoding the values in GDP (Billions) for easy reading
#One billion is 1,000,000,000 (1e9), so dividing by 1e9 expresses the raw values in billions
gdp_data['GDP (Billions)'] = gdp_data.apply(lambda row: round(row['GDP (Billions)'] / 1_000_000_000, 2), axis = 1)
gdp_data.head()

  Country  Year  LifeExp  GDP (Billions)
0   Chile  2000     77.3           77.86
1   Chile  2001     77.3           70.98
2   Chile  2002     77.8           69.74
3   Chile  2003     77.9           75.64
4   Chile  2004     78.0           99.21
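
The same scaling can also be written as a vectorized column operation instead of a row-wise `apply`, which is the more common pandas idiom. The sketch below assumes the raw GDP figures are in single currency units, so dividing by 1e9 (one billion) expresses them in billions. It uses the five Chile rows shown earlier, and `sample` is a hypothetical stand-in for `gdp_data`.

```python
import pandas as pd

# Raw GDP values for Chile, copied from the output above.
# `sample` is a hypothetical stand-in for gdp_data.
sample = pd.DataFrame({
    'Country': ['Chile'] * 5,
    'Year': [2000, 2001, 2002, 2003, 2004],
    'GDP': [77860930000.0, 70979920000.0, 69736810000.0,
            75643460000.0, 99210390000.0],
})

# Column arithmetic is applied element-wise, so no lambda is needed:
# divide by one billion, then round to 2 decimal places.
sample['GDP (Billions)'] = (sample['GDP'] / 1e9).round(2)

print(sample[['Country', 'Year', 'GDP (Billions)']])
```

Keeping the raw `GDP` column alongside the scaled one, as done here, also makes it easy to double-check the conversion.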
