# Student Name: Godfred Asamoah
# Student ID: @03054184
# Project Title: World Happiness Index Analysis

## Introduction to the Dataset
The World Happiness Report is an annual survey-based ranking of happiness across countries. It scores each country on factors that contribute to well-being, including economic output (GDP per capita), social support, healthy life expectancy, freedom, generosity, and perceptions of government corruption.

## Dataset Source
The dataset is sourced from Kaggle: [World Happiness Report](https://www.kaggle.com/datasets/unsdsn/world-happiness)

## Background Context
Understanding the factors that contribute to happiness can inform policy decisions and promote societal well-being. This dataset provides insights into the determinants of happiness across different nations, enabling comparative analyses and fostering a deeper understanding of global happiness trends.

## Data Wrangling and Descriptive Statistics

### Loading the Dataset

In [11]:
import pandas as pd

# Load the dataset
file_path = '2015 (1).csv'
df = pd.read_csv(file_path)
df.head()

Unnamed: 0,Country,Region,Happiness Rank,Happiness Score,Standard Error,Economy (GDP per Capita),Family,Health (Life Expectancy),Freedom,Trust (Government Corruption),Generosity,Dystopia Residual
0,Switzerland,Western Europe,1,7.587,0.03411,1.39651,1.34951,0.94143,0.66557,0.41978,0.29678,2.51738
1,Iceland,Western Europe,2,7.561,0.04884,1.30232,1.40223,0.94784,0.62877,0.14145,0.4363,2.70201
2,Denmark,Western Europe,3,7.527,0.03328,1.32548,1.36058,0.87464,0.64938,0.48357,0.34139,2.49204
3,Norway,Western Europe,4,7.522,0.0388,1.459,1.33095,0.88521,0.66973,0.36503,0.34699,2.46531
4,Canada,North America,5,7.427,0.03553,1.32629,1.32261,0.90563,0.63297,0.32957,0.45811,2.45176


## Section I: Dataset Overview and Structure

In [22]:
df.shape

(158, 12)

In [24]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 158 entries, 0 to 157
Data columns (total 12 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   Country                        158 non-null    object 
 1   Region                         158 non-null    object 
 2   Happiness Rank                 158 non-null    int64  
 3   Happiness Score                158 non-null    float64
 4   Standard Error                 158 non-null    float64
 5   Economy (GDP per Capita)       158 non-null    float64
 6   Family                         158 non-null    float64
 7   Health (Life Expectancy)       158 non-null    float64
 8   Freedom                        158 non-null    float64
 9   Trust (Government Corruption)  158 non-null    float64
 10  Generosity                     158 non-null    float64
 11  Dystopia Residual              158 non-null    float64
dtypes: float64(9), int64(1), object(2)
memory usage: 1

In [26]:
df.head()

Unnamed: 0,Country,Region,Happiness Rank,Happiness Score,Standard Error,Economy (GDP per Capita),Family,Health (Life Expectancy),Freedom,Trust (Government Corruption),Generosity,Dystopia Residual
0,Switzerland,Western Europe,1,7.587,0.03411,1.39651,1.34951,0.94143,0.66557,0.41978,0.29678,2.51738
1,Iceland,Western Europe,2,7.561,0.04884,1.30232,1.40223,0.94784,0.62877,0.14145,0.4363,2.70201
2,Denmark,Western Europe,3,7.527,0.03328,1.32548,1.36058,0.87464,0.64938,0.48357,0.34139,2.49204
3,Norway,Western Europe,4,7.522,0.0388,1.459,1.33095,0.88521,0.66973,0.36503,0.34699,2.46531
4,Canada,North America,5,7.427,0.03553,1.32629,1.32261,0.90563,0.63297,0.32957,0.45811,2.45176


In [28]:
df.tail()

Unnamed: 0,Country,Region,Happiness Rank,Happiness Score,Standard Error,Economy (GDP per Capita),Family,Health (Life Expectancy),Freedom,Trust (Government Corruption),Generosity,Dystopia Residual
153,Rwanda,Sub-Saharan Africa,154,3.465,0.03464,0.22208,0.7737,0.42864,0.59201,0.55191,0.22628,0.67042
154,Benin,Sub-Saharan Africa,155,3.34,0.03656,0.28665,0.35386,0.3191,0.4845,0.0801,0.1826,1.63328
155,Syria,Middle East and Northern Africa,156,3.006,0.05015,0.6632,0.47489,0.72193,0.15684,0.18906,0.47179,0.32858
156,Burundi,Sub-Saharan Africa,157,2.905,0.08658,0.0153,0.41587,0.22396,0.1185,0.10062,0.19727,1.83302
157,Togo,Sub-Saharan Africa,158,2.839,0.06727,0.20868,0.13995,0.28443,0.36453,0.10731,0.16681,1.56726


The dataset contains two **categorical** columns (Country, Region) and ten **numerical** columns (one integer rank and nine floats, e.g., GDP per capita, Health). There are no datetime values.
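
The split between categorical and numerical columns can also be checked programmatically. The snippet below is a minimal sketch on a three-row sample copied from the output above; the name `sample` is illustrative, and treating every non-numeric, non-datetime column as categorical is an assumption that happens to hold for this dataset.

```python
import pandas as pd

# Three rows and four columns copied from the df.head() output above.
sample = pd.DataFrame({
    "Country": ["Switzerland", "Iceland", "Denmark"],
    "Region": ["Western Europe", "Western Europe", "Western Europe"],
    "Happiness Rank": [1, 2, 3],
    "Happiness Score": [7.587, 7.561, 7.527],
})

# Numeric columns (ints and floats).
numerical_cols = sample.select_dtypes(include="number").columns.tolist()
# Datetime columns (expected to be empty here).
datetime_cols = sample.select_dtypes(include="datetime").columns.tolist()
# Assumption: anything that is neither numeric nor datetime is categorical.
categorical_cols = sample.select_dtypes(exclude=["number", "datetime"]).columns.tolist()

print("Categorical:", categorical_cols)
print("Numerical:", numerical_cols)
print("Datetime:", datetime_cols)
```

Running the same three `select_dtypes` calls on the full `df` should list the two text columns, the ten numeric columns, and no datetime columns.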

In [31]:
df.isnull().sum()

Country                          0
Region                           0
Happiness Rank                   0
Happiness Score                  0
Standard Error                   0
Economy (GDP per Capita)         0
Family                           0
Health (Life Expectancy)         0
Freedom                          0
Trust (Government Corruption)    0
Generosity                       0
Dystopia Residual                0
dtype: int64

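The output shows no missing values in any column, so no imputation or row removal is needed for this dataset. Had any been present, they would be handled by imputing with the column mean or by removing rows with many nulls, depending on their count and importance.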
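
For reference, the mean-imputation and row-removal strategy described above could be sketched as follows. The helper `handle_missing`, its `max_null_fraction` threshold of 0.5, and the `demo` frame are all hypothetical and not part of the original notebook; the 2015 data itself needs none of this.

```python
import numpy as np
import pandas as pd

def handle_missing(frame: pd.DataFrame, max_null_fraction: float = 0.5) -> pd.DataFrame:
    """Drop rows that are mostly null, then mean-impute remaining numeric gaps."""
    # Fraction of missing values per row; keep rows at or below the threshold.
    keep = frame.isnull().mean(axis=1) <= max_null_fraction
    result = frame.loc[keep].copy()
    # Fill the remaining numeric gaps with each column's mean.
    numeric_cols = result.select_dtypes(include="number").columns
    result[numeric_cols] = result[numeric_cols].fillna(result[numeric_cols].mean())
    return result

# Made-up demo data with gaps (not taken from the dataset).
demo = pd.DataFrame({
    "Country": ["A", "B", "C", None],
    "Happiness Score": [7.0, np.nan, 5.0, np.nan],
    "Freedom": [0.6, 0.4, np.nan, np.nan],
})
cleaned_demo = handle_missing(demo)
print(cleaned_demo)
```

In this demo the fully empty fourth row is dropped, and the two remaining gaps are filled with the mean of the surviving values in their columns.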

## Section II: Data Cleaning & Indexing

In [35]:
df.duplicated().sum()

0

In [37]:
df = df.drop_duplicates()

### Adding the 'Development Level' Column
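To categorize countries by GDP per capita, we add a new categorical column named 'Development Level'. Each country is assigned to a quartile of the 'Economy (GDP per Capita)' column: Low, Low Middle, Middle, or High.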

In [65]:
import numpy as np
import pandas as pd

# Quartile cut points (25th, 50th, and 75th percentiles) of GDP per capita
percentiles = np.percentile(df['Economy (GDP per Capita)'], [25, 50, 75])

def classify_GDP(gdp):
    if pd.isnull(gdp):
        return 'Unknown'
    elif gdp <= percentiles[0]:
        return 'Low'
    elif gdp <= percentiles[1]:
        return 'Low Middle'
    elif gdp <= percentiles[2]:
        return 'Middle'
    else:
        return 'High'

# Copy the DataFrame and add new categorical column
df_cleaned = df.copy()
df_cleaned['Development Level'] = df_cleaned['Economy (GDP per Capita)'].apply(classify_GDP)

# Show a preview of the updated DataFrame
print(df_cleaned[['Country', 'Economy (GDP per Capita)', 'Development Level']].head())

# Save the cleaned DataFrame to a CSV file
df_cleaned.to_csv('cleaned_data.csv', index=False)

       Country  Economy (GDP per Capita) Development Level
0  Switzerland                   1.39651              High
1      Iceland                   1.30232              High
2      Denmark                   1.32548              High
3       Norway                   1.45900              High
4       Canada                   1.32629              High
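
As an alternative to the hand-written classifier, the same quartile binning can be sketched with `pd.qcut`. This is an illustration rather than the method used in the notebook. The ten GDP values are the head and tail rows shown earlier, so the cut points here come from that small sample and not from the full 158-country column.

```python
import pandas as pd

# GDP per capita values from the df.head() and df.tail() outputs shown earlier.
gdp = pd.Series(
    [1.39651, 1.30232, 1.32548, 1.45900, 1.32629,
     0.22208, 0.28665, 0.66320, 0.01530, 0.20868],
    name="Economy (GDP per Capita)",
)

labels = ["Low", "Low Middle", "Middle", "High"]

# q=4 splits at the 25th, 50th, and 75th percentiles; bins are right-inclusive,
# which matches the `<=` comparisons in classify_GDP.
level = pd.qcut(gdp, q=4, labels=labels)

print(level.value_counts().reindex(labels))
```

One design point in favor of `pd.qcut` is that it returns an ordered categorical, so sorting and grouping follow the Low to High order instead of alphabetical order.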


## Section III: Descriptive Statistics

In [71]:
# Descriptive statistics for numerical columns
df_cleaned.describe()

Unnamed: 0,Happiness Rank,Happiness Score,Standard Error,Economy (GDP per Capita),Family,Health (Life Expectancy),Freedom,Trust (Government Corruption),Generosity,Dystopia Residual
count,158.0,158.0,158.0,158.0,158.0,158.0,158.0,158.0,158.0,158.0
mean,79.493671,5.375734,0.047885,0.846137,0.991046,0.630259,0.428615,0.143422,0.237296,2.098977
std,45.754363,1.14501,0.017146,0.403121,0.272369,0.247078,0.150693,0.120034,0.126685,0.55355
min,1.0,2.839,0.01848,0.0,0.0,0.0,0.0,0.0,0.0,0.32858
25%,40.25,4.526,0.037268,0.545808,0.856823,0.439185,0.32833,0.061675,0.150553,1.75941
50%,79.5,5.2325,0.04394,0.910245,1.02951,0.696705,0.435515,0.10722,0.21613,2.095415
75%,118.75,6.24375,0.0523,1.158448,1.214405,0.811013,0.549092,0.180255,0.309883,2.462415
max,158.0,7.587,0.13693,1.69042,1.40223,1.02525,0.66973,0.55191,0.79588,3.60214


In [43]:
# Value counts for the new 'Development Level' column
df_cleaned['Development Level'].value_counts()

Development Level
High          40
Low           40
Middle        39
Low Middle    39
Name: count, dtype: int64

In [75]:
# Filter example 1: Countries with a GDP per capita score above 1.4
df_cleaned[df_cleaned['Economy (GDP per Capita)'] > 1.4]

Unnamed: 0,Country,Region,Happiness Rank,Happiness Score,Standard Error,Economy (GDP per Capita),Family,Health (Life Expectancy),Freedom,Trust (Government Corruption),Generosity,Dystopia Residual,Development Level
3,Norway,Western Europe,4,7.522,0.0388,1.459,1.33095,0.88521,0.66973,0.36503,0.34699,2.46531,High
16,Luxembourg,Western Europe,17,6.946,0.03499,1.56391,1.21963,0.91894,0.61583,0.37798,0.28034,1.96961,High
19,United Arab Emirates,Middle East and Northern Africa,20,6.901,0.03729,1.42727,1.12575,0.80925,0.64157,0.38583,0.26428,2.24743,High
23,Singapore,Southeastern Asia,24,6.798,0.0378,1.52186,1.02,1.02525,0.54252,0.4921,0.31105,1.88501,High
27,Qatar,Middle East and Northern Africa,28,6.611,0.06257,1.69042,1.0786,0.79733,0.6404,0.52208,0.32573,1.55674,High
38,Kuwait,Middle East and Northern Africa,39,6.295,0.04456,1.55422,1.16594,0.72492,0.55499,0.25609,0.16228,1.87634,High


In [73]:
# Filter example 2: Countries in the Western Europe region
df_cleaned[df_cleaned['Region'] == 'Western Europe']

Unnamed: 0,Country,Region,Happiness Rank,Happiness Score,Standard Error,Economy (GDP per Capita),Family,Health (Life Expectancy),Freedom,Trust (Government Corruption),Generosity,Dystopia Residual,Development Level
0,Switzerland,Western Europe,1,7.587,0.03411,1.39651,1.34951,0.94143,0.66557,0.41978,0.29678,2.51738,High
1,Iceland,Western Europe,2,7.561,0.04884,1.30232,1.40223,0.94784,0.62877,0.14145,0.4363,2.70201,High
2,Denmark,Western Europe,3,7.527,0.03328,1.32548,1.36058,0.87464,0.64938,0.48357,0.34139,2.49204,High
3,Norway,Western Europe,4,7.522,0.0388,1.459,1.33095,0.88521,0.66973,0.36503,0.34699,2.46531,High
5,Finland,Western Europe,6,7.406,0.0314,1.29025,1.31826,0.88911,0.64169,0.41372,0.23351,2.61955,High
6,Netherlands,Western Europe,7,7.378,0.02799,1.32944,1.28017,0.89284,0.61576,0.31814,0.4761,2.4657,High
7,Sweden,Western Europe,8,7.364,0.03157,1.33171,1.28907,0.91087,0.6598,0.43844,0.36262,2.37119,High
12,Austria,Western Europe,13,7.2,0.03751,1.33723,1.29704,0.89042,0.62433,0.18676,0.33088,2.5332,High
16,Luxembourg,Western Europe,17,6.946,0.03499,1.56391,1.21963,0.91894,0.61583,0.37798,0.28034,1.96961,High
17,Ireland,Western Europe,18,6.94,0.03676,1.33596,1.36948,0.89533,0.61777,0.28703,0.45901,1.9757,High


In [77]:
df_cleaned['Region'].value_counts()

Region
Sub-Saharan Africa                 40
Central and Eastern Europe         29
Latin America and Caribbean        22
Western Europe                     21
Middle East and Northern Africa    20
Southeastern Asia                   9
Southern Asia                       7
Eastern Asia                        6
North America                       2
Australia and New Zealand           2
Name: count, dtype: int64

In [59]:
# Group by Region and compute average Happiness Score
df_cleaned.groupby('Region')['Happiness Score'].mean().sort_values(ascending=False)

Region
Australia and New Zealand          7.285000
North America                      7.273000
Western Europe                     6.689619
Latin America and Caribbean        6.144682
Eastern Asia                       5.626167
Middle East and Northern Africa    5.406900
Central and Eastern Europe         5.332931
Southeastern Asia                  5.317444
Southern Asia                      4.580857
Sub-Saharan Africa                 4.202800
Name: Happiness Score, dtype: float64
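
Regional means are easier to interpret next to the number of countries behind them, since some regions contain only two countries. The sketch below reports both in one table using named aggregation. It runs on the ten head and tail rows shown earlier, so its numbers differ from the full-dataset output above; the column names `n_countries` and `mean_score` are illustrative.

```python
import pandas as pd

# Head and tail rows from the outputs shown earlier (Country, Region, Happiness Score).
rows = [
    ("Switzerland", "Western Europe", 7.587),
    ("Iceland", "Western Europe", 7.561),
    ("Denmark", "Western Europe", 7.527),
    ("Norway", "Western Europe", 7.522),
    ("Canada", "North America", 7.427),
    ("Rwanda", "Sub-Saharan Africa", 3.465),
    ("Benin", "Sub-Saharan Africa", 3.340),
    ("Syria", "Middle East and Northern Africa", 3.006),
    ("Burundi", "Sub-Saharan Africa", 2.905),
    ("Togo", "Sub-Saharan Africa", 2.839),
]
sample = pd.DataFrame(rows, columns=["Country", "Region", "Happiness Score"])

# Group size and mean per region, sorted from happiest to least happy.
summary = (
    sample.groupby("Region")["Happiness Score"]
    .agg(n_countries="count", mean_score="mean")
    .sort_values("mean_score", ascending=False)
)
print(summary)
```

Applied to `df_cleaned`, the same `.agg(...)` call would give each regional average together with its group size.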

In [79]:
# Group by development level and compute average Happiness Score
df_cleaned.groupby('Development Level')['Happiness Score'].mean().sort_values(ascending=False)

Development Level
High          6.652675
Middle        5.623256
Low Middle    5.047308
Low           4.177675
Name: Happiness Score, dtype: float64

## Section IV: Observations & Next Steps

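- Australia and New Zealand (mean score 7.29) and North America (7.27) have the highest regional averages, though each region contains only two countries. Western Europe ranks third (6.69 across 21 countries), and Sub-Saharan Africa has the lowest average (4.20).
- Mean happiness score rises steadily across the GDP-based development levels, from 4.18 (Low) to 5.05 (Low Middle), 5.62 (Middle), and 6.65 (High), which suggests a positive association between GDP per capita and perceived well-being.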

**Further questions for Part 2:**
1. Is there a visible correlation between GDP and Happiness Score?
2. How do regions compare in terms of health and freedom indicators using visuals?
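
As a first numeric look at question 1, a Pearson correlation between GDP per capita and Happiness Score can be computed before any plotting. The sketch below uses only the ten head and tail rows shown earlier. Because those rows are the top and bottom of the ranking, the coefficient it prints will overstate the relationship compared with the full dataset.

```python
import pandas as pd

# Head and tail rows from the outputs shown earlier (top five and bottom five countries).
sample = pd.DataFrame({
    "Economy (GDP per Capita)": [1.39651, 1.30232, 1.32548, 1.45900, 1.32629,
                                 0.22208, 0.28665, 0.66320, 0.01530, 0.20868],
    "Happiness Score": [7.587, 7.561, 7.527, 7.522, 7.427,
                        3.465, 3.340, 3.006, 2.905, 2.839],
})

# Series.corr uses the Pearson method by default.
r = sample["Economy (GDP per Capita)"].corr(sample["Happiness Score"])
print(f"Pearson r on the 10-row sample: {r:.3f}")
```

For Part 2, the same call on `df_cleaned` would give the full-dataset coefficient to report alongside a scatter plot.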

## Final Summary
This notebook provides a structured first pass over the 2015 World Happiness dataset. We loaded the data, confirmed that it contains no missing values or duplicate rows, classified countries into GDP quartiles, and calculated descriptive statistics. Average happiness scores differ clearly by GDP quartile and by geographic region, and these patterns will guide the visual exploration in Part 2.