## Alteryx Challenge 444
Link: https://community.alteryx.com/t5/Weekly-Challenges/Challenge-444-World-Happiness-Report/td-p/1323864

Your task is to use the provided datasets to answer the following questions:

- Which year had the highest average happiness score?
- For each year, which is the highest-scoring country with a gross domestic product (GDP) under 1?
- Which five countries had the highest average generosity factor over the 2-year period?

### Import the datasets

In [2]:
import pandas as pd

In [3]:
df_2023 = pd.read_csv("2023 WH Report Data.csv")
df_2024 = pd.read_csv("2024 WH Report Data.csv")

In [4]:
df_2023.head()

Unnamed: 0,Overall rank,Country or region,Ladder Score,GDP per capita,Social support,Healthy life expectancy,Freedom to make life choices,Generosity,Perceptions of corruption,FileName
0,1,Finland,7.804,1.888,1.585,0.535,0.772,0.126,0.535,2023
1,2,Denmark,7.586,1.949,1.548,0.537,0.734,0.208,0.525,2023
2,3,Iceland,7.53,1.926,1.62,0.559,0.738,0.25,0.187,2023
3,4,Israel,7.473,1.833,1.521,0.577,0.569,0.124,0.158,2023
4,5,Netherlands,7.403,1.942,1.488,0.545,0.672,0.251,0.394,2023


In [5]:
df_union = pd.concat([df_2023,df_2024])

In [6]:
df_union.info()

<class 'pandas.core.frame.DataFrame'>
Index: 280 entries, 0 to 142
Data columns (total 10 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   Overall rank                  280 non-null    int64  
 1   Country or region             280 non-null    object 
 2   Ladder Score                  280 non-null    float64
 3   GDP per capita                277 non-null    float64
 4   Social support                277 non-null    float64
 5   Healthy life expectancy       276 non-null    float64
 6   Freedom to make life choices  277 non-null    float64
 7   Generosity                    277 non-null    float64
 8   Perceptions of corruption     277 non-null    float64
 9   FileName                      280 non-null    int64  
dtypes: float64(7), int64(2), object(1)
memory usage: 24.1+ KB
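
The `Index: 280 entries, 0 to 142` line suggests that `pd.concat` kept each file's original row labels, so the same label can appear once per year. The sketch below uses made-up rows (not the challenge data) to show what that means for label-based lookups, with `ignore_index=True` as one possible way to get unique labels. The outputs in this notebook were produced without it.

```python
import pandas as pd

# Made-up rows standing in for the two yearly files (not the real data).
a = pd.DataFrame({"Country or region": ["A", "B"], "FileName": [2023, 2023]})
b = pd.DataFrame({"Country or region": ["C", "D"], "FileName": [2024, 2024]})

kept = pd.concat([a, b])                      # row labels 0, 1, 0, 1
fresh = pd.concat([a, b], ignore_index=True)  # row labels 0, 1, 2, 3

print(kept.index.is_unique)   # labels repeat across the two inputs
print(len(kept.loc[[0]]))     # a lookup by label 0 matches one row per input
print(fresh.index.is_unique)  # labels are renumbered
```

Renumbering changes the row labels shown in later outputs, so it is a trade-off rather than a required fix.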


### First question

Which year had the highest average happiness score?

In [29]:
# Group by year (FileName) and compute the mean happiness score (Ladder Score)
avg_happiness_score = df_union.groupby(['FileName']).agg({'Ladder Score':'mean'})

In [32]:
# Sort the yearly means in descending order and keep the top row
ans_one = avg_happiness_score.sort_values(by='Ladder Score',ascending=False).head(1)

In [33]:
ans_one

FileName,Ladder Score
2023,5.539796
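
The sort-then-`head(1)` step can also be written with `idxmax`, which returns the group label directly. This is a minimal sketch on made-up scores, not a replacement for the cells above.

```python
import pandas as pd

# Made-up scores, only to illustrate the pattern.
df = pd.DataFrame({
    "FileName": [2023, 2023, 2024, 2024],
    "Ladder Score": [6.0, 5.0, 5.0, 5.5],
})

yearly_mean = df.groupby("FileName")["Ladder Score"].mean()
best_year = yearly_mean.idxmax()   # year label with the highest mean
best_score = yearly_mean.max()     # the mean itself
print(best_year, best_score)
```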


### Second question
For each year, which is the highest-scoring country with a gross domestic product (GDP) under 1?

In [34]:
# Keep only the columns needed for this question
df_p3 = df_union[['FileName','Country or region','Ladder Score','GDP per capita']]

In [35]:
# Filter the data to keep only rows with GDP per capita < 1
df_p3 = df_p3[df_p3['GDP per capita']<1]

In [36]:
# For each year, select the row with the highest happiness score
df_highest = df_p3.loc[df_p3.groupby('FileName')['Ladder Score'].idxmax()]

In [13]:
df_highest

Unnamed: 0,FileName,Country or region,Ladder Score,GDP per capita
77,2023,Nepal,5.36,0.979
78,2024,Venezuela,5.607,0.0
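
Two details of this step may be worth checking. First, the 2024 result has a `GDP per capita` of exactly 0.0. The notebook does not say whether that is a measured value or a placeholder, but either way it passes the `< 1` filter. Second, `idxmax` returns row labels, and `df_union` carries repeated labels from the concat, so `.loc` could in principle return more than one row per year. The sketch below, on made-up data, avoids label lookups; the helper `top_low_gdp` and its `exclude_zero` flag are hypothetical, and treating zero as a placeholder is an assumption.

```python
import pandas as pd

# Made-up rows with deliberately repeated index labels, as after pd.concat.
df = pd.DataFrame(
    {
        "FileName": [2023, 2023, 2024, 2024, 2024],
        "Country or region": ["A", "B", "C", "D", "E"],
        "Ladder Score": [5.0, 4.0, 6.0, 5.5, 7.0],
        "GDP per capita": [0.9, 0.5, 0.0, 0.8, 1.5],
    },
    index=[0, 1, 0, 1, 2],
)

def top_low_gdp(frame, exclude_zero=False):
    """Highest Ladder Score per year among rows with GDP per capita < 1."""
    low = frame[frame["GDP per capita"] < 1]
    if exclude_zero:
        # Assumption: a GDP of exactly 0 is a placeholder, not a measurement.
        low = low[low["GDP per capita"] > 0]
    # Sort by score, then keep the first row per year; no label lookups.
    ranked = low.sort_values("Ladder Score", ascending=False)
    return ranked.drop_duplicates("FileName").sort_values("FileName")

print(top_low_gdp(df))
print(top_low_gdp(df, exclude_zero=True))
```

Comparing the two calls shows whether the answer for a given year depends on how zero values are treated.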


### Third question

Which five countries had the highest average generosity factor over the 2-year period?

In [37]:
# Pivot Generosity so that each year becomes a column, with one row per country
df_pivot = pd.pivot_table(df_union,index='Country or region',values='Generosity',columns='FileName')

In [15]:
df_pivot

Country or region,2023,2024
Afghanistan,0.093,0.091
Albania,0.133,0.138
Algeria,0.073,0.091
Argentina,0.088,0.087
Armenia,0.053,0.051
...,...,...
Venezuela,0.205,0.192
Vietnam,0.134,0.094
Yemen,,0.080
Zambia,0.189,0.168


In [38]:
# Average generosity across both years (NaN if a country is missing from either year)
df_pivot['average_generosity'] = (df_pivot[2023]+df_pivot[2024])/2

In [39]:
# Sort by average generosity in descending order
df_pivot.sort_values(by='average_generosity', ascending=False)

Country or region,2023,2024,average_generosity
Indonesia,0.422,0.399,0.4105
Myanmar,0.400,0.401,0.4005
Gambia,0.332,0.324,0.3280
Thailand,0.291,0.283,0.2870
Kenya,0.291,0.282,0.2865
...,...,...,...
Lesotho,,0.082,
Libya,,0.111,
State of Palestine,0.065,,
Tajikistan,0.104,,


In [40]:
# Keep only the top 5 countries
df_p3 = df_pivot.sort_values(by='average_generosity', ascending=False).head(5)

In [41]:
df_p3

Country or region,2023,2024,average_generosity
Indonesia,0.422,0.399,0.4105
Myanmar,0.4,0.401,0.4005
Gambia,0.332,0.324,0.328
Thailand,0.291,0.283,0.287
Kenya,0.291,0.282,0.2865
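
Because the average is computed as `(2023 + 2024) / 2`, a country missing from one of the files gets `NaN` and sorts to the bottom, so only countries present in both years can reach the top five. The sketch below, on made-up values, contrasts that with a row-wise `mean`, which skips missing values by default. Which behaviour fits "over the 2-year period" is a judgment call.

```python
import pandas as pd

# Made-up generosity values; country "Z" has no 2023 entry.
df = pd.DataFrame({
    "Country or region": ["X", "X", "Y", "Y", "Z"],
    "FileName": [2023, 2024, 2023, 2024, 2024],
    "Generosity": [0.10, 0.20, 0.30, 0.40, 0.90],
})

pivot = pd.pivot_table(df, index="Country or region",
                       values="Generosity", columns="FileName")

strict = (pivot[2023] + pivot[2024]) / 2     # NaN if either year is missing
lenient = pivot[[2023, 2024]].mean(axis=1)   # averages whatever years exist

print(strict.sort_values(ascending=False))
print(lenient.sort_values(ascending=False))
```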


### Bonus Challenge

Compare the average Healthy life expectancy scores of countries whose names begin with the letters C through M (inclusive).

In [42]:
# Extract the first character of the country name
df_union['first_char'] = df_union['Country or region'].str[0]

In [43]:
# Convert the first character of the country name to its Unicode code point
df_union['ord_first_char'] = df_union['first_char'].apply(ord)

In [44]:
# Keep only the countries whose names start with C through M
df_filter = df_union[(df_union['ord_first_char']>=ord('C')) & (df_union['ord_first_char']<=ord('M'))]

In [45]:
df_filter

Unnamed: 0,Overall rank,Country or region,Ladder Score,GDP per capita,Social support,Healthy life expectancy,Freedom to make life choices,Generosity,Perceptions of corruption,FileName,first_char,ord_first_char
0,1,Finland,7.804,1.888,1.585,0.535,0.772,0.126,0.535,2023,F,70
1,2,Denmark,7.586,1.949,1.548,0.537,0.734,0.208,0.525,2023,D,68
2,3,Iceland,7.530,1.926,1.620,0.559,0.738,0.250,0.187,2023,I,73
3,4,Israel,7.473,1.833,1.521,0.577,0.569,0.124,0.158,2023,I,73
8,9,Luxembourg,7.228,2.200,1.357,0.549,0.710,0.149,0.418,2023,L,76
...,...,...,...,...,...,...,...,...,...,...,...,...
134,135,Eswatini,3.502,1.255,0.925,0.176,0.284,0.059,0.116,2024,E,69
135,136,Malawi,3.421,0.617,0.410,0.349,0.571,0.135,0.136,2024,M,77
138,139,Congo (Kinshasa),3.295,0.534,0.665,0.262,0.473,0.189,0.072,2024,C,67
140,141,Lesotho,3.186,0.771,0.851,0.000,0.523,0.082,0.085,2024,L,76
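
The two helper columns are not strictly required: string Series can be compared directly, so the first letter can be tested with `between`. This is a minimal sketch on made-up names, assuming every name starts with an uppercase ASCII letter.

```python
import pandas as pd

# Made-up names, only to illustrate the filter.
names = pd.Series(["Belgium", "Canada", "Mexico", "Norway", "Chile"])

first = names.str[0]
mask = first.between("C", "M")   # inclusive on both ends by default
print(names[mask].tolist())
```

Applied to the notebook's data, the same mask would be built from `df_union['Country or region'].str[0]`.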


In [46]:
# As in the third question, pivot Healthy life expectancy so that each year becomes a column, with one row per country
df_pivot_bonus = pd.pivot_table(df_filter,index='Country or region',values='Healthy life expectancy',columns='FileName')

In [47]:
# Find the average score of both years
df_pivot_bonus['avg_sco'] = (df_pivot_bonus[2023]+df_pivot_bonus[2024])/2

In [52]:
# Sort by average score in descending order and keep the top 10
df_top_10 = df_pivot_bonus.sort_values(by='avg_sco',ascending=False).head(10)

In [53]:
df_top_10

Country or region,2023,2024,avg_sco
Hong Kong S.A.R. of China,0.702,0.857,0.7795
Japan,0.622,0.785,0.7035
Cyprus,0.58,0.744,0.662
Israel,0.577,0.74,0.6585
France,0.566,0.727,0.6465
Italy,0.559,0.72,0.6395
Iceland,0.559,0.718,0.6385
Luxembourg,0.549,0.708,0.6285
Malta,0.547,0.707,0.627
Canada,0.541,0.701,0.621
