# Part 2: Life Expectancy vs World Happiness Level

## Task: Load in Data
For this section of the project, we compare life expectancy on each of the six inhabited continents with world happiness levels to see whether the two are correlated. Let's begin by loading the CSV files.

In [None]:
# install the libraries used in this notebook (geopandas is imported below, so install it too)
!pip install -q pandas plotly geopandas folium mapclassify pycountry-convert
%matplotlib inline

import geopandas as gpd
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import pycountry_convert as pc

# read csvs for world happiness and life expectancy
world_happiness = pd.read_csv("world_happiness.csv")
life_expectancy = pd.read_csv("life_expectancy.csv")



Great! Now let's display the world happiness and life expectancy datasets to see what we need to fix.

In [45]:
# drop the Rank column as it is not needed for analysis
world_happiness = world_happiness.drop(columns='Rank')

# show dataframe
world_happiness.head()

|   | Country | Year | Index |
|---|---|---|---|
| 0 | Afghanistan | 2013 | 4.04 |
| 1 | Afghanistan | 2015 | 3.575 |
| 2 | Afghanistan | 2016 | 3.36 |
| 3 | Afghanistan | 2017 | 3.794 |
| 4 | Afghanistan | 2018 | 3.632 |


In [46]:
life_expectancy.head()

|   | Entity | Code | Year | Period life expectancy at birth - Sex: all - Age: 0 |
|---|---|---|---|---|
| 0 | Afghanistan | AFG | 1950 | 27.7275 |
| 1 | Afghanistan | AFG | 1951 | 27.9634 |
| 2 | Afghanistan | AFG | 1952 | 28.4456 |
| 3 | Afghanistan | AFG | 1953 | 28.9304 |
| 4 | Afghanistan | AFG | 1954 | 29.2258 |


In [47]:
# filter the life expectancy data for only years 2013 and beyond
filtered_life_expectancy = life_expectancy.loc[life_expectancy['Year']>=2013]

# drop code column as it is not needed
filtered_life_expectancy = filtered_life_expectancy.drop(columns='Code')
filtered_life_expectancy.head()

|   | Entity | Year | Period life expectancy at birth - Sex: all - Age: 0 |
|---|---|---|---|
| 63 | Afghanistan | 2013 | 62.4167 |
| 64 | Afghanistan | 2014 | 62.5451 |
| 65 | Afghanistan | 2015 | 62.6587 |
| 66 | Afghanistan | 2016 | 63.1361 |
| 67 | Afghanistan | 2017 | 63.016 |
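Hard-coding 2013 works for these files, but the cutoff could also be derived from the happiness data so the filter stays correct if the CSVs are updated. This is a minimal sketch, assuming only the `Year` column shown above exists in both frames; `filter_to_happiness_years` is a hypothetical helper and the tiny demo frames are made-up placeholders, not values from the datasets.

```python
import pandas as pd

def filter_to_happiness_years(life_df, happiness_df):
    """Hypothetical helper: keep only life-expectancy rows whose Year lies
    within the range of years covered by the happiness data (inclusive)."""
    first_year = happiness_df['Year'].min()
    last_year = happiness_df['Year'].max()
    mask = life_df['Year'].between(first_year, last_year)  # inclusive on both ends
    return life_df.loc[mask]

# made-up placeholder rows, for illustration only
happiness_demo = pd.DataFrame({'Country': ['A', 'A'], 'Year': [2013, 2015], 'Index': [4.0, 3.5]})
life_demo = pd.DataFrame({'Entity': ['A'] * 4,
                          'Year': [2012, 2013, 2015, 2016],
                          'Life': [60.0, 61.0, 62.0, 63.0]})

filtered_demo = filter_to_happiness_years(life_demo, happiness_demo)
print(filtered_demo['Year'].tolist())  # [2013, 2015]
```

Because the merge that follows is an inner join, years outside the overlap would be dropped anyway; filtering first mainly keeps the intermediate frame small and makes the year window explicit.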


In [48]:
# inner-join both datasets on country name and year (rows without a match in both are dropped)
merged_data = pd.merge(world_happiness, filtered_life_expectancy, left_on=['Country', 'Year'], right_on=['Entity', 'Year'])

# drop the Entity column since it duplicates Country
merged_data = merged_data.drop(columns='Entity')
merged_data.head()

|   | Country | Year | Index | Period life expectancy at birth - Sex: all - Age: 0 |
|---|---|---|---|---|
| 0 | Afghanistan | 2013 | 4.04 | 62.4167 |
| 1 | Afghanistan | 2015 | 3.575 | 62.6587 |
| 2 | Afghanistan | 2016 | 3.36 | 63.1361 |
| 3 | Afghanistan | 2017 | 3.794 | 63.016 |
| 4 | Afghanistan | 2018 | 3.632 | 63.081 |
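`pd.merge` defaults to an inner join, so any country spelled differently in the two files is silently dropped. One way to check for this is an outer merge with `indicator=True`, which adds a `_merge` column recording whether each key was found in the left frame, the right frame, or both. The sketch below uses a hypothetical helper `unmatched_countries` and a hypothetical spelling mismatch ("Ivory Coast" vs "Cote d'Ivoire"); it is not a claim about how these particular CSVs spell any country.

```python
import pandas as pd

def unmatched_countries(happiness_df, life_df):
    """Hypothetical helper: return (names only in happiness_df['Country'],
    names only in life_df['Entity']) using an outer merge with an indicator."""
    left = happiness_df[['Country']].drop_duplicates()
    right = life_df[['Entity']].drop_duplicates()
    check = pd.merge(left, right, left_on='Country', right_on='Entity',
                     how='outer', indicator=True)
    only_happiness = check.loc[check['_merge'] == 'left_only', 'Country'].unique()
    only_life = check.loc[check['_merge'] == 'right_only', 'Entity'].unique()
    return sorted(only_happiness), sorted(only_life)

# made-up placeholder rows with a hypothetical spelling mismatch
happiness_demo = pd.DataFrame({'Country': ['Afghanistan', 'Ivory Coast'], 'Year': [2015, 2015]})
life_demo = pd.DataFrame({'Entity': ['Afghanistan', "Cote d'Ivoire"], 'Year': [2015, 2015]})

only_h, only_l = unmatched_countries(happiness_demo, life_demo)
print(only_h)  # ['Ivory Coast']
print(only_l)
```

Names that show up in either list could then be renamed (for example with `Series.replace`) before the real merge.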


Next, convert each country to its continent so the data can be aggregated by continent. This code was written with help from this Stack Overflow thread: https://stackoverflow.com/questions/55910004/get-continent-name-from-country-using-pycountry and ChatGPT.

Prompts given to ChatGPT: "How do I account for the Invalid Kosovo error" and "How do I get rid of any unconverted values"

In [49]:
# map a country name to its continent name; return None if it cannot be converted
def country_to_continent(country):
    # pycountry_convert has no continent for Kosovo, so handle it explicitly
    if country == "Kosovo":
        return "Europe"
    try:
        country_code = pc.country_name_to_country_alpha2(country)
        continent_code = pc.country_alpha2_to_continent_code(country_code)
        return pc.convert_continent_code_to_continent_name(continent_code)
    except KeyError:
        # pycountry_convert raises KeyError for names or codes it does not recognise
        return None

# apply to the merged dataset, then drop countries that could not be converted (None)
merged_data['Continent'] = merged_data['Country'].apply(country_to_continent)
merged_data = merged_data.dropna(subset=['Continent'])
merged_data.head()

|   | Country | Year | Index | Period life expectancy at birth - Sex: all - Age: 0 | Continent |
|---|---|---|---|---|---|
| 0 | Afghanistan | 2013 | 4.04 | 62.4167 | Asia |
| 1 | Afghanistan | 2015 | 3.575 | 62.6587 | Asia |
| 2 | Afghanistan | 2016 | 3.36 | 63.1361 | Asia |
| 3 | Afghanistan | 2017 | 3.794 | 63.016 | Asia |
| 4 | Afghanistan | 2018 | 3.632 | 63.081 | Asia |
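Dropping countries that cannot be mapped to a continent is easy to miss, so it can help to list them before they disappear. The sketch below assumes unconverted names come back as `None`; `split_by_conversion` and `to_continent` are hypothetical, and `to_continent` is a small dictionary lookup standing in for `country_to_continent` so the example runs without `pycountry_convert`.

```python
import pandas as pd

def split_by_conversion(df, convert, column='Country'):
    """Hypothetical helper: add a Continent column using `convert`, then return
    (rows that converted, sorted list of names that `convert` mapped to None)."""
    out = df.copy()
    out['Continent'] = out[column].apply(convert)
    missing = sorted(out.loc[out['Continent'].isna(), column].unique())
    return out.dropna(subset=['Continent']), missing

# hypothetical stand-in for country_to_continent, for illustration only
DEMO_LOOKUP = {'Afghanistan': 'Asia', 'Germany': 'Europe'}

def to_continent(country):
    return DEMO_LOOKUP.get(country)  # None for names not in the lookup

demo = pd.DataFrame({'Country': ['Afghanistan', 'Germany', 'Atlantis'],
                     'Year': [2015, 2015, 2015]})
converted, missing = split_by_conversion(demo, to_continent)
print(missing)  # ['Atlantis']
print(converted['Country'].tolist())  # ['Afghanistan', 'Germany']
```

In the notebook, passing `country_to_continent` as `convert` would report every country that was removed by the conversion step.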


In [53]:
# assertion tests
assert country_to_continent("Germany") == "Europe"
assert country_to_continent("United States") == "North America"
assert country_to_continent("Kosovo") == "Europe"
assert country_to_continent("Brazil") == "South America"
assert country_to_continent("Australia") == "Oceania"
assert country_to_continent("Afghanistan") == "Asia"

In [54]:
# average life expectancy across all countries in each continent, per year
merged_data = merged_data.drop(columns='Country')
avg_life_exp = merged_data.groupby(['Continent', 'Year'])['Period life expectancy at birth - Sex: all - Age: 0'].mean()
avg_life_exp.head()

Continent  Year
Africa     2013    61.581927
           2015    62.482523
           2016    63.233467
           2017    63.035141
           2018    63.360751
Name: Period life expectancy at birth - Sex: all - Age: 0, dtype: float64

In [None]:
# average the happiness index across all countries in each continent, per year
avg_happiness = merged_data.groupby(['Continent', 'Year'])['Index'].mean()
avg_happiness.head()

In [None]:
# merge two averaged datasets on continent and year
new_merged_data = pd.merge(avg_life_exp, avg_happiness, on=['Continent', 'Year'])

# reset index for proper formatting
new_merged_data = new_merged_data.reset_index()
#new_merged_data.to_csv('happiness_vs_life_expectancy.csv')
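The two group-by averages and the merge could also be collapsed into one step by selecting both columns inside a single `groupby`. This is a sketch, assuming the column names match those used above; `continent_year_means` is a hypothetical helper and the demo rows are made up.

```python
import pandas as pd

LIFE_COL = 'Period life expectancy at birth - Sex: all - Age: 0'

def continent_year_means(df):
    """Hypothetical helper: mean life expectancy and mean happiness index
    for every (Continent, Year) pair, returned as a flat DataFrame."""
    return (
        df.groupby(['Continent', 'Year'])[[LIFE_COL, 'Index']]
        .mean()
        .reset_index()
    )

# made-up placeholder rows, for illustration only
demo = pd.DataFrame({
    'Continent': ['Asia', 'Asia', 'Europe'],
    'Year': [2015, 2015, 2015],
    LIFE_COL: [60.0, 70.0, 80.0],
    'Index': [4.0, 5.0, 7.0],
})
result = continent_year_means(demo)
print(result)
```

Since both of the original averages are computed from the same `merged_data` rows, this should produce the same numbers as the two-step version, without needing a merge.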


In [None]:
# plot life expectancy over time for each continent with Plotly Express
fig_life = px.line(new_merged_data,
                   x='Year',
                   y='Period life expectancy at birth - Sex: all - Age: 0',
                   color='Continent',
                   title='Life Expectancy Trends for Each Continent from 2013 - 2021')

# save the interactive plot as an HTML file (open it in a browser to view it)
fig_life.write_html('life_plot.html')

In [None]:
# plot the happiness index over time for each continent
fig_happiness = px.line(new_merged_data,
                        x='Year',
                        y='Index',
                        color='Continent',
                        title='World Happiness Index Trends for Each Continent from 2013 - 2021')

# save the interactive plot as an HTML file
fig_happiness.write_html('life_plot2.html')

In [None]:
# statistical analysis: Pearson correlation coefficient for each continent

# africa
# keep only rows where Continent is Africa
africa_life = new_merged_data[new_merged_data['Continent'] == 'Africa']

# drop the non-numeric Continent column and the Year column so corr() compares
# only life expectancy and the happiness index
africa_corr = africa_life.drop(columns=['Continent', 'Year'])
africa_corr.corr(method='pearson')
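The Pearson coefficient that `corr(method='pearson')` reports is the sum of the products of each variable's deviations from its mean, divided by the square root of the product of the summed squared deviations. The sketch below computes it directly with NumPy and checks it against pandas on made-up numbers; `pearson_r` is a hypothetical helper meant only to show what the value in the matrix means, not to replace `corr`.

```python
import numpy as np
import pandas as pd

def pearson_r(x, y):
    """Hypothetical helper: Pearson r = sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2)),
    where dx, dy are deviations from each variable's mean."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# made-up yearly averages for one continent, for illustration only
demo = pd.DataFrame({
    'life': [61.0, 62.0, 63.0, 63.5],
    'happiness': [4.2, 4.3, 4.1, 4.4],
})
manual = pearson_r(demo['life'], demo['happiness'])
from_pandas = demo.corr(method='pearson').loc['life', 'happiness']
print(np.isclose(manual, from_pandas))  # True
```

Each continent contributes only one averaged point per year, so the coefficient rests on a small number of points and can change noticeably if a single year shifts.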

In [None]:
# asia
asia_life = new_merged_data[new_merged_data['Continent']=='Asia']
asia_corr = asia_life.drop(columns=['Continent','Year'])
asia_corr.corr(method='pearson')

In [None]:
# europe
europe_life = new_merged_data[new_merged_data['Continent']=='Europe']
europe_corr = europe_life.drop(columns=['Continent','Year'])
europe_corr.corr(method='pearson')

In [None]:
# north america
north_life = new_merged_data[new_merged_data['Continent']=='North America']
north_corr = north_life.drop(columns=['Continent','Year'])
north_corr.corr(method='pearson')

In [None]:
# oceania
oceania_life = new_merged_data[new_merged_data['Continent']=='Oceania']
oceania_corr = oceania_life.drop(columns=['Continent','Year'])
oceania_corr.corr(method='pearson')

In [None]:
# south america
south_life = new_merged_data[new_merged_data['Continent']=='South America']
south_corr = south_life.drop(columns=['Continent','Year'])
south_corr.corr(method='pearson')
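The six cells above repeat the same filter, drop, and `corr` steps. A loop over `groupby` groups could compute the coefficient for every continent at once. This is a minimal sketch, assuming `new_merged_data` has the `Continent`, `Index`, and life-expectancy columns shown earlier; `correlation_by_continent` is a hypothetical helper and the demo frame is made up.

```python
import pandas as pd

LIFE_COL = 'Period life expectancy at birth - Sex: all - Age: 0'

def correlation_by_continent(df):
    """Hypothetical helper: Pearson r between life expectancy and the
    happiness index, computed separately for each continent."""
    results = {}
    for continent, group in df.groupby('Continent'):
        results[continent] = group[LIFE_COL].corr(group['Index'], method='pearson')
    return pd.Series(results, name='pearson_r')

# made-up placeholder rows, for illustration only
demo = pd.DataFrame({
    'Continent': ['Africa'] * 3 + ['Europe'] * 3,
    'Year': [2015, 2016, 2017] * 2,
    LIFE_COL: [62.0, 63.0, 64.0, 80.0, 81.0, 82.0],
    'Index': [4.0, 4.5, 5.0, 7.0, 6.5, 6.0],
})
print(correlation_by_continent(demo))
```

Calling `correlation_by_continent(new_merged_data)` would then return one coefficient per continent, which is easier to compare side by side than six separate 2x2 matrices.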

In [None]:
# plot one continent more closely to visually inspect the trend behind its Pearson coefficient
# this is the life expectancy trend for Africa
fig_africa_life = px.line(africa_life,
                          x='Year',
                          y='Period life expectancy at birth - Sex: all - Age: 0',
                          color='Continent',
                          title='Life Expectancy Trends for Africa from 2013 - 2021')

fig_africa_life.write_html('africa_life_plot.html')

In [None]:
# plot one continent more closely to visually inspect the trend behind its Pearson coefficient
# this is the world happiness index trend for Africa
fig_africa_happiness = px.line(africa_life,
                               x='Year',
                               y='Index',
                               color='Continent',
                               title='World Happiness Index Trends for Africa from 2013 - 2021')

fig_africa_happiness.write_html('africa_life_plot2.html')
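Life expectancy (in years) and the happiness index (values around 3 to 5 in the rows shown earlier) sit on very different scales, which is why the two Africa trends are plotted separately. An alternative, sketched here as an assumption rather than the approach used above, is to standardize each series to z-scores so both can share one axis. Pearson's r is unchanged by this kind of rescaling, so the overlaid lines show the same co-movement that `corr` summarizes. `standardized_long_form` is a hypothetical helper and the demo values are made up.

```python
import pandas as pd

LIFE_COL = 'Period life expectancy at birth - Sex: all - Age: 0'

def standardized_long_form(df, columns):
    """Hypothetical helper: convert each column to z-scores (mean 0, sample std 1)
    and reshape to long form with columns Year, Measure, z_score."""
    z = df[['Year']].copy()
    for col in columns:
        z[col] = (df[col] - df[col].mean()) / df[col].std()
    return z.melt(id_vars='Year', var_name='Measure', value_name='z_score')

# made-up yearly averages for one continent, for illustration only
demo = pd.DataFrame({
    'Year': [2015, 2016, 2017],
    LIFE_COL: [62.0, 63.0, 64.0],
    'Index': [4.0, 4.6, 5.0],
})
long_df = standardized_long_form(demo, [LIFE_COL, 'Index'])
print(long_df)
```

Passing the result to `px.line(long_df, x='Year', y='z_score', color='Measure')` would draw both standardized trends on a single chart.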