The World Happiness Report polls over 100,000 people in over 130 countries annually to determine happiness rankings. The main life evaluation question asks respondents to think of a ladder, with the best possible life for them being a 10 and the worst possible life being a 0. They are then asked to rate their own current lives on that 0 to 10 scale (Ladder Score). The columns following the happiness score are variables based on observed data for six factors: economic production, social support, life expectancy, freedom, absence of corruption, and generosity. The report uses these variables to help explain the happiness ranking variation across countries.  

The provided datasets include the World Happiness Report data for 2023 and 2024.

Your task is to use the provided datasets to answer the following questions:

• Which year had the highest average happiness score?

• For each year, which is the highest-scoring country with a gross domestic product (GDP) under 1?

• Which five countries had the highest average generosity factor over the 2-year period?

Bonus task: Find the top 10 countries with the highest average Healthy life expectancy score over both years, considering ONLY countries whose names begin with the letters C through M (inclusive).

### 1. Importing libraries and data

In [1]:
import pandas as pd
import numpy as np
import os

path = r'G:\My Drive\Python Challenges\Alteryx Challenges in Python\444'

# Build the full file paths from the data folder above
df2023 = pd.read_csv(os.path.join(path, '2023 WH Report Data.csv'), index_col=False)
df2024 = pd.read_csv(os.path.join(path, '2024 WH Report Data.csv'), index_col=False)

### 2. Data Cleaning & Exploration

In [2]:
df2023.head()

Unnamed: 0,Overall rank,Country or region,Ladder Score,GDP per capita,Social support,Healthy life expectancy,Freedom to make life choices,Generosity,Perceptions of corruption,FileName
0,1,Finland,7.804,1.888,1.585,0.535,0.772,0.126,0.535,2023
1,2,Denmark,7.586,1.949,1.548,0.537,0.734,0.208,0.525,2023
2,3,Iceland,7.53,1.926,1.62,0.559,0.738,0.25,0.187,2023
3,4,Israel,7.473,1.833,1.521,0.577,0.569,0.124,0.158,2023
4,5,Netherlands,7.403,1.942,1.488,0.545,0.672,0.251,0.394,2023


In [3]:
df2024.head()

Unnamed: 0,Overall rank,Country or region,Ladder Score,GDP per capita,Social support,Healthy life expectancy,Freedom to make life choices,Generosity,Perceptions of corruption,FileName
0,1,Finland,7.741,1.844,1.572,0.695,0.859,0.142,0.546,2024
1,2,Denmark,7.583,1.908,1.52,0.699,0.823,0.204,0.548,2024
2,3,Iceland,7.525,1.881,1.617,0.718,0.819,0.258,0.182,2024
3,4,Sweden,7.344,1.878,1.501,0.724,0.838,0.221,0.524,2024
4,5,Israel,7.341,1.803,1.513,0.74,0.641,0.153,0.193,2024


In [4]:
# Appending into one dataset for ease
df = pd.concat([df2023, df2024], ignore_index=True, sort=False)

In [5]:
df.head()

Unnamed: 0,Overall rank,Country or region,Ladder Score,GDP per capita,Social support,Healthy life expectancy,Freedom to make life choices,Generosity,Perceptions of corruption,FileName
0,1,Finland,7.804,1.888,1.585,0.535,0.772,0.126,0.535,2023
1,2,Denmark,7.586,1.949,1.548,0.537,0.734,0.208,0.525,2023
2,3,Iceland,7.53,1.926,1.62,0.559,0.738,0.25,0.187,2023
3,4,Israel,7.473,1.833,1.521,0.577,0.569,0.124,0.158,2023
4,5,Netherlands,7.403,1.942,1.488,0.545,0.672,0.251,0.394,2023


In [6]:
df.tail()
# yup it's all in there

Unnamed: 0,Overall rank,Country or region,Ladder Score,GDP per capita,Social support,Healthy life expectancy,Freedom to make life choices,Generosity,Perceptions of corruption,FileName
275,139,Congo (Kinshasa),3.295,0.534,0.665,0.262,0.473,0.189,0.072,2024
276,140,Sierra Leone,3.245,0.654,0.566,0.253,0.469,0.181,0.053,2024
277,141,Lesotho,3.186,0.771,0.851,0.0,0.523,0.082,0.085,2024
278,142,Lebanon,2.707,1.377,0.577,0.556,0.173,0.068,0.029,2024
279,143,Afghanistan,1.721,0.628,0.0,0.242,0.0,0.091,0.088,2024


In [7]:
# renaming FileName to Year
df.rename(columns={'FileName' : 'Year'}, inplace=True) # inplace=True modifies df directly, so there's no need to reassign with 'df = ...'
df.head()

Unnamed: 0,Overall rank,Country or region,Ladder Score,GDP per capita,Social support,Healthy life expectancy,Freedom to make life choices,Generosity,Perceptions of corruption,Year
0,1,Finland,7.804,1.888,1.585,0.535,0.772,0.126,0.535,2023
1,2,Denmark,7.586,1.949,1.548,0.537,0.734,0.208,0.525,2023
2,3,Iceland,7.53,1.926,1.62,0.559,0.738,0.25,0.187,2023
3,4,Israel,7.473,1.833,1.521,0.577,0.569,0.124,0.158,2023
4,5,Netherlands,7.403,1.942,1.488,0.545,0.672,0.251,0.394,2023


### 3. Solutions

#### Which year had the highest average happiness score?

In [8]:
# Grouping by year, take average of ladder score, sort for fanciness
df.groupby('Year', as_index=False).agg(avg_score=('Ladder Score', 'mean')).sort_values(by='avg_score', ascending=False)

Unnamed: 0,Year,avg_score
0,2023,5.539796
1,2024,5.52758
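If you only want the answer rather than the sorted table, `Series.idxmax` returns the index label (here, the year) of the largest mean. The sketch below runs on a tiny hand-made frame using a few Ladder Scores copied from the `head()` outputs above, not on the full dataset.

```python
import pandas as pd

# Tiny illustrative excerpt: a few Ladder Scores copied from the head() outputs above
sample = pd.DataFrame({
    'Year': [2023, 2023, 2023, 2024, 2024, 2024],
    'Ladder Score': [7.804, 7.586, 7.53, 7.741, 7.583, 7.525],
})

# Mean Ladder Score per year, indexed by Year
yearly_mean = sample.groupby('Year')['Ladder Score'].mean()

# idxmax returns the index label (the year) of the largest mean
best_year = yearly_mean.idxmax()
print(best_year, round(yearly_mean[best_year], 3))
```

On the full `df`, `df.groupby('Year')['Ladder Score'].mean().idxmax()` should point to 2023, in line with the table above (5.539796 vs 5.52758).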


#### For each year, which is the highest-scoring country with a gross domestic product (GDP) under 1?

In [9]:
# Creating subset of df for rows with GDP < 1
low_gdp = df.loc[df['GDP per capita'] < 1]
low_gdp.head()

Unnamed: 0,Overall rank,Country or region,Ladder Score,GDP per capita,Social support,Healthy life expectancy,Freedom to make life choices,Generosity,Perceptions of corruption,Year
77,78,Nepal,5.36,0.979,1.027,0.281,0.567,0.215,0.104,2023
79,80,Tajikistan,5.33,0.972,1.248,0.291,0.599,0.104,0.292,2023
85,86,Congo (Brazzaville),5.267,0.921,0.665,0.145,0.464,0.134,0.136,2023
87,88,Venezuela,5.211,0.0,1.257,0.341,0.369,0.205,0.084,2023
90,91,Guinea,5.072,0.844,0.776,0.072,0.369,0.204,0.102,2023


In [10]:
low_gdp_top_scores = low_gdp.groupby('Year', as_index=False).agg(max_score=('Ladder Score', 'max'))
low_gdp_top_scores

Unnamed: 0,Year,max_score
0,2023,5.36
1,2024,5.607


In [11]:
# Getting answer by joining by Year and score, showing only necessary columns
low_gdp.merge(low_gdp_top_scores, how='inner', left_on=['Year', 'Ladder Score'], right_on=['Year', 'max_score'])[['Country or region', 'Ladder Score', 'Year', 'max_score']]

Unnamed: 0,Country or region,Ladder Score,Year,max_score
0,Nepal,5.36,2023,5.36
1,Venezuela,5.607,2024,5.607
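The aggregate-then-merge approach can also be collapsed into one step with `groupby(...).idxmax()`, which returns the row label of each year's highest score; `.loc` then pulls those rows. This is a sketch on a hypothetical mini version of `low_gdp` (rows assumed to be already filtered to GDP per capita under 1), with Ladder Scores taken from the outputs above.

```python
import pandas as pd

# Hypothetical mini version of low_gdp: rows assumed already filtered to GDP per capita < 1,
# with Ladder Scores copied from the outputs above
low_gdp_sample = pd.DataFrame({
    'Country or region': ['Nepal', 'Tajikistan', 'Guinea', 'Venezuela', 'Sierra Leone', 'Afghanistan'],
    'Ladder Score': [5.36, 5.33, 5.072, 5.607, 3.245, 1.721],
    'Year': [2023, 2023, 2023, 2024, 2024, 2024],
})

# For each year, idxmax gives the row label of the highest Ladder Score
top_idx = low_gdp_sample.groupby('Year')['Ladder Score'].idxmax()

# .loc pulls those full rows in one step, with no separate merge
winners = low_gdp_sample.loc[top_idx, ['Year', 'Country or region', 'Ladder Score']]
print(winners)
```

One difference worth knowing: `idxmax` keeps only the first row when two countries tie for the top score in a year, whereas the merge above keeps every tied country.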


#### Which five countries had the highest average generosity factor over the 2-year period?

In [12]:
df.head()

Unnamed: 0,Overall rank,Country or region,Ladder Score,GDP per capita,Social support,Healthy life expectancy,Freedom to make life choices,Generosity,Perceptions of corruption,Year
0,1,Finland,7.804,1.888,1.585,0.535,0.772,0.126,0.535,2023
1,2,Denmark,7.586,1.949,1.548,0.537,0.734,0.208,0.525,2023
2,3,Iceland,7.53,1.926,1.62,0.559,0.738,0.25,0.187,2023
3,4,Israel,7.473,1.833,1.521,0.577,0.569,0.124,0.158,2023
4,5,Netherlands,7.403,1.942,1.488,0.545,0.672,0.251,0.394,2023


In [13]:
# Grouping by country, take avg generosity, sort descending and limit results to 5
df.groupby('Country or region', as_index=False).agg(avg_gen_factor=('Generosity', 'mean')).sort_values(by='avg_gen_factor', ascending=False).head(5)

Unnamed: 0,Country or region,avg_gen_factor
55,Indonesia,0.4105
91,Myanmar,0.4005
43,Gambia,0.328
127,Thailand,0.287
66,Kenya,0.2865
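`Series.nlargest(n)` combines the descending sort and the `head(n)` into one call. The sketch below uses a tiny excerpt of Generosity values for three countries from the 2023 and 2024 `head()` outputs, so it asks for the top 2 instead of the top 5.

```python
import pandas as pd

# Tiny excerpt: Generosity values copied from the 2023 and 2024 head() outputs
sample = pd.DataFrame({
    'Country or region': ['Finland', 'Denmark', 'Iceland', 'Finland', 'Denmark', 'Iceland'],
    'Generosity': [0.126, 0.208, 0.25, 0.142, 0.204, 0.258],
    'Year': [2023, 2023, 2023, 2024, 2024, 2024],
})

# Mean Generosity per country, then the n largest values, sorted descending
top_generous = sample.groupby('Country or region')['Generosity'].mean().nlargest(2)
print(top_generous)
```

On the full data the equivalent would be `df.groupby('Country or region')['Generosity'].mean().nlargest(5)`.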


#### Bonus: Which 10 countries beginning with the letters C through M (inclusive) had the highest average Healthy life expectancy score?

In [14]:
df['Country letter'] = df['Country or region'].str[0:1]

In [15]:
df.head()

Unnamed: 0,Overall rank,Country or region,Ladder Score,GDP per capita,Social support,Healthy life expectancy,Freedom to make life choices,Generosity,Perceptions of corruption,Year,Country letter
0,1,Finland,7.804,1.888,1.585,0.535,0.772,0.126,0.535,2023,F
1,2,Denmark,7.586,1.949,1.548,0.537,0.734,0.208,0.525,2023,D
2,3,Iceland,7.53,1.926,1.62,0.559,0.738,0.25,0.187,2023,I
3,4,Israel,7.473,1.833,1.521,0.577,0.569,0.124,0.158,2023,I
4,5,Netherlands,7.403,1.942,1.488,0.545,0.672,0.251,0.394,2023,N


In [16]:
df.dtypes

Overall rank                      int64
Country or region                object
Ladder Score                    float64
GDP per capita                  float64
Social support                  float64
Healthy life expectancy         float64
Freedom to make life choices    float64
Generosity                      float64
Perceptions of corruption       float64
Year                              int64
Country letter                   object
dtype: object

In [17]:
df['Country letter'] = df['Country letter'].astype(str)

In [19]:
df['ascii'] = df['Country letter'].apply(ord)

In [20]:
df.head()

Unnamed: 0,Overall rank,Country or region,Ladder Score,GDP per capita,Social support,Healthy life expectancy,Freedom to make life choices,Generosity,Perceptions of corruption,Year,Country letter,ascii
0,1,Finland,7.804,1.888,1.585,0.535,0.772,0.126,0.535,2023,F,70
1,2,Denmark,7.586,1.949,1.548,0.537,0.734,0.208,0.525,2023,D,68
2,3,Iceland,7.53,1.926,1.62,0.559,0.738,0.25,0.187,2023,I,73
3,4,Israel,7.473,1.833,1.521,0.577,0.569,0.124,0.158,2023,I,73
4,5,Netherlands,7.403,1.942,1.488,0.545,0.672,0.251,0.394,2023,N,78


In [26]:
print(ord('C'))
print(ord('M'))

67
77


In [27]:
# Keep rows whose first letter falls between C (67) and M (77), inclusive
bonus_df = df.loc[(df['ascii'] >= ord('C')) & (df['ascii'] <= ord('M'))]
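Because Python compares single-character strings in the same order as their character codes, the same filter can be written without the `ord()`/ASCII column, using `Series.between` on the first letter (it is inclusive on both ends by default). This is a sketch on a tiny excerpt of country names from the outputs above, and it assumes every name starts with an uppercase letter, as in the rows shown.

```python
import pandas as pd

# Tiny excerpt of country names from the outputs above
sample = pd.DataFrame({'Country or region': [
    'Finland', 'Denmark', 'Iceland', 'Netherlands', 'Luxembourg', 'Afghanistan', 'Malta',
]})

# First character of each name (assumes names start with an uppercase letter)
first_letter = sample['Country or region'].str[0]

# between() compares strings directly and includes both 'C' and 'M'
in_range = first_letter.between('C', 'M')
print(sample.loc[in_range, 'Country or region'].tolist())
```

Comparing the first letter rather than the whole name matters here: `'Malta' <= 'M'` is `False`, so calling `between('C', 'M')` on the full names would silently drop every country starting with M.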

In [28]:
bonus_df.head()

Unnamed: 0,Overall rank,Country or region,Ladder Score,GDP per capita,Social support,Healthy life expectancy,Freedom to make life choices,Generosity,Perceptions of corruption,Year,Country letter,ascii
0,1,Finland,7.804,1.888,1.585,0.535,0.772,0.126,0.535,2023,F,70
1,2,Denmark,7.586,1.949,1.548,0.537,0.734,0.208,0.525,2023,D,68
2,3,Iceland,7.53,1.926,1.62,0.559,0.738,0.25,0.187,2023,I,73
3,4,Israel,7.473,1.833,1.521,0.577,0.569,0.124,0.158,2023,I,73
8,9,Luxembourg,7.228,2.2,1.357,0.549,0.71,0.149,0.418,2023,L,76


In [29]:
bonus_df.groupby('Country or region', as_index=False).agg(avg_health_score=('Healthy life expectancy', 'mean')).sort_values(by='avg_health_score', ascending=False).head(10)

Unnamed: 0,Country or region,avg_health_score
33,Hong Kong S.A.R. of China,0.7795
45,Japan,0.7035
12,Cyprus,0.662
50,Kuwait,0.661
41,Israel,0.6585
23,France,0.6465
42,Italy,0.6395
35,Iceland,0.6385
59,Luxembourg,0.6285
64,Malta,0.627


In [31]:
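# Share of the 280 rows belonging to each country:
# 0.007143 = 2/280 (both years), 0.003571 = 1/280 (one year only)
df['Country or region'].value_counts(normalize=True)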

Country or region
Finland       0.007143
Nigeria       0.007143
Laos          0.007143
Georgia       0.007143
Guinea        0.007143
                ...   
Libya         0.003571
Azerbaijan    0.003571
Yemen         0.003571
Eswatini      0.003571
Lesotho       0.003571
Name: proportion, Length: 143, dtype: float64

In [33]:
df.loc[df['Country or region']=='Kuwait']

Unnamed: 0,Overall rank,Country or region,Ladder Score,GDP per capita,Social support,Healthy life expectancy,Freedom to make life choices,Generosity,Perceptions of corruption,Year,Country letter,ascii
149,13,Kuwait,6.951,1.845,1.364,0.661,0.827,0.2,0.172,2024,K,75


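Okay, so some countries only have one year's worth of data (Kuwait, for example, appears only in 2024). For those countries the "average over both years" is really a single year's value, so strictly speaking they don't belong in the two-year averages. But I've left them in anyway.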
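If you did want to exclude them, one option is to count the distinct years per country with `groupby(...).transform('nunique')` and keep only countries with both. This is a sketch on a tiny excerpt (Israel's and Kuwait's Healthy life expectancy rows from the outputs above), not the full dataset.

```python
import pandas as pd

# Tiny excerpt: Israel appears in both years, Kuwait only in 2024 (values from the outputs above)
sample = pd.DataFrame({
    'Country or region': ['Israel', 'Israel', 'Kuwait'],
    'Healthy life expectancy': [0.577, 0.74, 0.661],
    'Year': [2023, 2024, 2024],
})

# Number of distinct years each country covers, broadcast back onto every row
years_per_country = sample.groupby('Country or region')['Year'].transform('nunique')

# Keep only countries observed in both 2023 and 2024, then average
both_years = sample[years_per_country == 2]
avg_health = both_years.groupby('Country or region')['Healthy life expectancy'].mean()
print(avg_health)
```

Israel's average here (0.6585) matches the bonus table. Applying the same mask to `df` before the generosity and bonus groupbys would drop single-year countries such as Kuwait from those rankings.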