
# Final Project  
**Inflation and National Happiness (2015–2023)**  

**Name:** Escarlet Gabriel Vicente  
**Course:** DATA 602 – M.S. Data Science  
**Instructor:** Professor Schettini  



## Abstract
This project examines the relationship between headline consumer price inflation and national happiness across countries using World Happiness Index indicators and inflation data from 2015–2023. Happiness serves as an outcome measure of societal well-being, while inflation captures macroeconomic instability through rising costs and uncertainty. The analysis follows a standard data science workflow that includes data preparation, exploratory analysis, and descriptive statistical assessment supported by visualizations and summary measures. The results document cross-country patterns in happiness and inflation, including distributional differences, grouped comparisons, and an overall negative association between inflation and happiness. Taken together, the findings offer an applied, finance-oriented perspective on how macroeconomic conditions may relate to well-being in a measurable and interpretable way using publicly available data.

## Introduction

**Research Question:** Is higher inflation associated with lower happiness scores across countries?

Happiness and well-being are increasingly used as broad indicators of societal progress. Economic conditions influence daily life through employment opportunities, purchasing power, and overall stability. Inflation is especially important because sustained increases in prices can reduce real income, increase uncertainty, and complicate financial planning for households and businesses. Using country-level data from 2015 to 2023, this analysis examines whether higher headline consumer price inflation is associated with lower happiness scores, while also reporting related economic context such as GDP per capita. The question is relevant to business and finance because macroeconomic stability influences consumer sentiment, market conditions, and policy environments that shape both economic performance and quality of life.

### Data Source
The dataset used in this analysis was obtained from Kaggle and combines World Happiness Index data with inflation metrics from publicly available sources, including the World Happiness Report, Gallup World Poll, and the World Bank.

Dataset source: https://www.kaggle.com/… (World Happiness Index and Inflation Dataset)

## Data Preparation

### Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

### Load Data


In [None]:
# Path assumes the CSV has been uploaded to the Colab session's /content folder
df = pd.read_csv("/content/WHI_Inflation.csv")
df.head()


### Data Wrangling

In [None]:
df.columns = [c.strip() for c in df.columns]
df.info()

In [None]:
COL_COUNTRY = "Country"
COL_YEAR    = "Year"
COL_HAPPY   = "Score"
COL_INFL    = "Headline Consumer Price Inflation"
COL_GDP     = "GDP per Capita"  # optional: used only if the column exists

# Confirm that the required columns are present before cleaning
for col in [COL_COUNTRY, COL_YEAR, COL_HAPPY, COL_INFL]:
    if col not in df.columns:
        print(f"Missing: {col}. Update your column mapping above.")
    else:
        print(f"Found: {col}")

In [None]:
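# Coerce key columns to numbers; unparseable entries become NaN
df[COL_HAPPY] = pd.to_numeric(df[COL_HAPPY], errors="coerce")
df[COL_INFL]  = pd.to_numeric(df[COL_INFL], errors="coerce")
df[COL_YEAR]  = pd.to_numeric(df[COL_YEAR], errors="coerce")

# GDP per capita is optional context, so convert it only if present
if COL_GDP in df.columns:
    df[COL_GDP] = pd.to_numeric(df[COL_GDP], errors="coerce")

# Keep only rows with complete core variables (GDP is not required)
df_clean = df.dropna(subset=[COL_COUNTRY, COL_YEAR, COL_HAPPY, COL_INFL]).copy()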

After stripping stray whitespace from the column names, the key variables were mapped and converted to numeric types. Observations with missing values in the core variables (country, year, happiness score, and inflation) were removed, resulting in a clean analytical sample of 1,200 country–year observations.
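To make the effect of `errors="coerce"` and `dropna(subset=...)` concrete, the sketch below applies the same two steps to a few hypothetical rows (invented for illustration, not drawn from the Kaggle file). Any string that cannot be parsed as a number becomes `NaN`, and every row with a gap in a core column is then discarded.

```python
import pandas as pd

# Hypothetical toy rows (not from the dataset) mimicking the raw CSV columns
raw = pd.DataFrame({
    "Country": ["A", "B", "C", None],
    "Year": ["2019", "2020", "2021", "2022"],
    "Score": ["5.9", "n/a", "6.1", "4.8"],
    "Headline Consumer Price Inflation": ["2.1", "3.4", None, "7.0"],
})

core = ["Country", "Year", "Score", "Headline Consumer Price Inflation"]
numeric = ["Year", "Score", "Headline Consumer Price Inflation"]

# errors="coerce" turns unparseable strings such as "n/a" into NaN instead of raising
for col in numeric:
    raw[col] = pd.to_numeric(raw[col], errors="coerce")

# Any row with a missing value in a core column is dropped
clean = raw.dropna(subset=core).copy()
print(len(raw), "->", len(clean))  # 4 -> 1
```

Row B is lost to the unparseable score, row C to the missing inflation value, and row D to the missing country name.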

## EDA (Exploratory Data Analysis)

### Summary Statistics

In [None]:
summary_cols = [COL_HAPPY, COL_INFL] + ([COL_GDP] if COL_GDP in df_clean.columns else [])
df_clean[summary_cols].describe()

The cleaned dataset contains **1,200 country–year observations**.
The average happiness score across countries is **5.48**, with most values ranging between approximately **4.60 and 6.33**, indicating moderate variation in reported well-being.

Headline consumer price inflation has a **mean of 7.40 percent** but displays substantial dispersion, with a standard deviation of **25.17 percentage points**. While many observations cluster at low inflation values, the maximum exceeds **500 percent**, reflecting episodes of extreme inflation in certain countries and years. GDP per capita averages **1.03** (index-scaled) and shows relatively less dispersion than inflation.

These statistics indicate that inflation is highly skewed and volatile across countries, whereas happiness scores are more tightly distributed.
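The gap between a skewed variable's mean and median can be illustrated with a small hypothetical series of inflation rates (invented values, not the project data) containing one hyperinflation episode. A sketch using pandas' built-in `mean`, `median`, and `skew` methods:

```python
import pandas as pd

# Hypothetical inflation values (percent): mostly low, plus one hyperinflation episode
infl = pd.Series([1.5, 2.0, 2.4, 3.1, 3.8, 4.5, 5.2, 520.0])

print(infl.mean())    # pulled far upward by the single extreme value
print(infl.median())  # barely affected by it
print(infl.skew())    # large positive value signals a long right tail
```

Comparing `df_clean[COL_INFL].median()` with the reported mean in the same way would indicate how strongly the extreme episodes pull the average upward.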

### Missing Values

In [None]:
df.isna().sum().sort_values(ascending=False).head(15)

Several inflation-related variables (such as core CPI and producer price inflation) contain missing values. However, the variables used in this analysis (**country, year, happiness score, headline inflation, and GDP per capita**) are largely complete. After removing observations with missing values in the four core variables (country, year, happiness score, and headline inflation), the analytical sample remains large enough for exploratory analysis.
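One way to make "largely complete" quantitative is to report the share of missing values per column and the fraction of rows retained after the core-variable filter. The sketch below does this on a hypothetical frame; the column `Core Consumer Price Inflation` is an assumed stand-in for the auxiliary inflation measures mentioned above, and the exact column names in the Kaggle file may differ.

```python
import numpy as np
import pandas as pd

# Hypothetical raw frame with gaps in core columns and in an auxiliary column
raw = pd.DataFrame({
    "Score": [5.1, np.nan, 6.3, 4.9, 5.7],
    "Headline Consumer Price Inflation": [2.0, 3.1, np.nan, 8.4, 1.2],
    "Core Consumer Price Inflation": [np.nan, np.nan, 2.2, np.nan, 1.0],  # assumed name
})
core = ["Score", "Headline Consumer Price Inflation"]

# Share of missing values per column (fraction between 0 and 1), largest first
missing_share = raw.isna().mean().sort_values(ascending=False)

# Rows retained when dropping only on the core columns
kept = raw.dropna(subset=core)
retention = len(kept) / len(raw)

print(missing_share)
print(f"Retained {len(kept)} of {len(raw)} rows ({retention:.0%})")
```

Filtering only on the core columns keeps rows whose auxiliary measures are missing, which is why the heavily incomplete auxiliary column does not shrink the sample further.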

### Distributions

In [None]:
plt.figure()
df_clean[COL_HAPPY].hist(bins=30)
plt.title("Distribution of Happiness Scores")
plt.xlabel("Happiness Score")
plt.ylabel("Count")
plt.show()

plt.figure()
df_clean[COL_INFL].hist(bins=30)
plt.title("Distribution of Inflation Rates")
plt.xlabel("Headline CPI Inflation (%)")
plt.ylabel("Count")
plt.show()

The distribution of happiness scores appears approximately bell-shaped, centered around values between **5 and 6**, suggesting that most countries report moderate happiness levels.

In contrast, the distribution of inflation rates is **highly right-skewed**, with a large concentration of observations at low inflation levels and a small number of extreme outliers. This pattern is consistent with real-world macroeconomic data, where most countries experience moderate inflation while a few face severe inflationary episodes.
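Because a handful of extreme values compress most observations into a single histogram bar, one common option (not used in the original analysis) is to view inflation on a signed logarithmic scale, $\operatorname{sign}(x)\log(1+|x|)$, which preserves order and sign (so deflation stays negative) while shrinking large magnitudes. A minimal numpy sketch on hypothetical values; `signed_log1p` is a helper name introduced here for illustration:

```python
import numpy as np

def signed_log1p(x):
    """Hypothetical helper: sign(x) * log(1 + |x|), which keeps order and sign."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.log1p(np.abs(x))

# Hypothetical inflation rates in percent: one deflation year and one extreme episode
infl = np.array([-1.2, 0.5, 1.8, 2.5, 3.0, 4.2, 6.5, 12.0, 45.0, 550.0])
transformed = signed_log1p(infl)

# Compare the spread of the raw and transformed values
print(infl.max() - infl.min())
print(transformed.max() - transformed.min())
```

The transformed values could then be passed to `plt.hist` in place of the raw inflation column.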

### Relationship plot: Inflation vs Happiness

In [None]:
plt.figure()
sns.scatterplot(data=df_clean, x=COL_INFL, y=COL_HAPPY, alpha=0.6)
plt.title("Inflation vs Happiness Score (All Countries, All Years)")
plt.xlabel("Headline CPI Inflation (%)")
plt.ylabel("Happiness Score")
plt.show()

### Inflation and Happiness with Trend Line

In [None]:
plt.figure()
sns.regplot(data=df_clean, x=COL_INFL, y=COL_HAPPY, scatter_kws={"alpha": 0.4})
plt.title("Inflation vs Happiness Score with Trend Line")
plt.xlabel("Headline CPI Inflation (%)")
plt.ylabel("Happiness Score")
plt.show()

The scatter plot shows substantial dispersion across countries, but an overall downward pattern is visible. Countries experiencing very high inflation tend to report lower happiness scores, while countries with low inflation exhibit a wider range of happiness outcomes. The fitted trend line reinforces this pattern, indicating a negative association between inflation and happiness, even though the relationship is not perfectly linear.
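Ordinary least squares, which `sns.regplot` uses for its default trend line, is sensitive to extreme values, so a quick robustness check is to refit the line with and without the hyperinflation observations. The sketch below uses `np.polyfit` on hypothetical points; the 100 percent cutoff is an arbitrary illustrative threshold, not one taken from the project.

```python
import numpy as np

# Hypothetical country-year points: happiness drifts down with inflation,
# plus two extreme-inflation observations with low happiness
infl  = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 300.0, 550.0])
happy = np.array([6.2, 5.8, 6.0, 5.5, 5.7, 5.4, 4.1, 4.3])

# np.polyfit(x, y, 1) returns [slope, intercept] of the least-squares line
slope_all, _ = np.polyfit(infl, happy, 1)

# Refit after excluding observations above an illustrative 100 percent cutoff
mask = infl <= 100
slope_trim, _ = np.polyfit(infl[mask], happy[mask], 1)

print(f"slope with extremes:    {slope_all:.4f}")
print(f"slope without extremes: {slope_trim:.4f}")
```

On these invented points the slope per percentage point is much steeper once the two extreme observations are excluded, which shows how a single fitted line can blend very different inflation regimes.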

### Happiness by Inflation Quartile

In [None]:
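df_q = df_clean.copy()
# Split observations into four roughly equal-sized groups by inflation rate
df_q["infl_quartile"] = pd.qcut(df_q[COL_INFL], 4, labels=["Q1 (Lowest)", "Q2", "Q3", "Q4 (Highest)"])

# observed=True restricts output to quartiles that occur (and avoids a pandas FutureWarning)
quartile_summary = df_q.groupby("infl_quartile", observed=True)[COL_HAPPY].agg(["count", "mean", "median"]).reset_index()
quartile_summary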

When country–year observations are grouped into inflation quartiles, average happiness scores decline monotonically as inflation increases. Observations in the lowest inflation quartile show the highest average happiness, while those in the highest inflation quartile show the lowest. This grouped comparison corroborates the negative relationship seen in the scatter plots and indicates that higher-inflation environments are associated with lower average well-being.
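Whether the quartile means decline monotonically can be checked programmatically rather than by eye. The sketch below repeats the `pd.qcut` grouping on hypothetical data and tests the ordered means with pandas' `is_monotonic_decreasing` property, which also returns `True` when adjacent values are equal:

```python
import pandas as pd

# Hypothetical country-year observations (not the project data)
toy = pd.DataFrame({
    "infl":  [0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 5.0, 6.0, 9.0, 15.0, 40.0, 300.0],
    "happy": [6.8, 6.5, 6.6, 6.1, 6.0, 5.9, 5.6, 5.4, 5.3, 4.9, 4.7, 4.2],
})
toy["q"] = pd.qcut(toy["infl"], 4, labels=["Q1 (Lowest)", "Q2", "Q3", "Q4 (Highest)"])

# Mean happiness per quartile, in category order Q1 -> Q4
means = toy.groupby("q", observed=True)["happy"].mean()

# True if each quartile mean is no higher than the one before it
print(means.is_monotonic_decreasing)
```

Applied to `quartile_summary["mean"]`, the same property would confirm or refute the monotonic pattern described above.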

In [None]:
plt.figure()
sns.barplot(data=quartile_summary, x="infl_quartile", y="mean")
plt.title("Average Happiness by Inflation Quartile")
plt.xlabel("Inflation Quartile")
plt.ylabel("Average Happiness Score")
plt.show()

## Data Analysis

### Correlation (overall)

In [None]:
corr = df_clean[[COL_HAPPY, COL_INFL]].corr().iloc[0,1]
corr

The Pearson correlation coefficient between happiness scores and headline inflation is approximately $r = -0.20$. This indicates a weak negative relationship: countries with higher inflation tend to report lower happiness levels. Squaring the coefficient gives $r^2 \approx 0.04$, so inflation alone accounts for only about 4 percent of the variation in happiness scores, consistent with the view that inflation by itself does not determine national well-being.
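Pearson's $r$ measures linear association and is sensitive to outliers, which matters given the extreme inflation values noted earlier. A rank-based alternative such as Spearman's $\rho$ (not part of the original analysis) is a common companion check. The sketch below compares both on hypothetical data in which happiness falls steadily as inflation rises but one inflation value is extreme:

```python
import pandas as pd

# Hypothetical data: perfectly monotone decline, with one extreme inflation value
toy = pd.DataFrame({
    "infl":  [1.0, 2.0, 3.0, 4.0, 5.0, 8.0, 20.0, 500.0],
    "happy": [6.5, 6.3, 6.2, 6.0, 5.9, 5.7, 5.5, 5.4],
})

# Pearson: linear association, sensitive to the extreme value
pearson = toy.corr(method="pearson").loc["happy", "infl"]
# Spearman: correlation of the ranks, so only the ordering matters
spearman = toy.corr(method="spearman").loc["happy", "infl"]

print(f"Pearson r    = {pearson:.3f}  (r^2 = {pearson ** 2:.3f})")
print(f"Spearman rho = {spearman:.3f}")
```

Here the relationship is perfectly monotone, so Spearman's $\rho$ equals $-1$, while Pearson's $r$ is noticeably weaker because the single extreme value dominates the linear fit. Running `df_clean[[COL_HAPPY, COL_INFL]].corr(method="spearman")` would give the rank-based counterpart for the project data.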

## Conclusion
This project analyzed the relationship between headline consumer price inflation and national happiness across countries from 2015 to 2023. Exploratory data analysis, visualizations, grouped comparisons, and correlation analysis consistently point to a negative association between inflation and happiness.

Because this analysis is observational and descriptive, it cannot establish causality; the association could also reflect other factors that differ across countries. Even so, the evidence suggests that countries experiencing higher inflation tend to report lower happiness on average. From a business and finance perspective, this finding is meaningful because macroeconomic stability influences consumer confidence, economic decision-making, and broader social outcomes.

Future extensions of this analysis could incorporate additional economic controls or examine regional heterogeneity. Nevertheless, the results presented here offer a clear and interpretable overview of how inflation relates to well-being using publicly available global data.
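As a hedged starting point for the "additional economic controls" extension, happiness could be regressed on inflation and GDP per capita jointly by ordinary least squares. The sketch below uses `np.linalg.lstsq` on hypothetical, noise-free data generated from known coefficients (chosen arbitrarily for illustration), so the fit should recover them exactly; with the real data, the arrays would come from `df_clean` after dropping rows with missing GDP.

```python
import numpy as np

# Hypothetical, noise-free data built so that
# happiness = 4.0 - 0.01 * inflation + 1.5 * gdp (illustrative coefficients only)
infl = np.array([1.0, 3.0, 5.0, 10.0, 25.0, 60.0])
gdp = np.array([1.4, 0.9, 1.2, 0.6, 0.8, 0.3])
happy = 4.0 - 0.01 * infl + 1.5 * gdp

# Design matrix: intercept column, inflation, GDP per capita
X = np.column_stack([np.ones_like(infl), infl, gdp])

# Ordinary least squares: coefficients minimising ||X @ beta - happy||^2
beta, *_ = np.linalg.lstsq(X, happy, rcond=None)
intercept, b_infl, b_gdp = beta
print(round(intercept, 3), round(b_infl, 3), round(b_gdp, 3))  # 4.0 -0.01 1.5
```

The inflation coefficient in such a model describes the association with happiness while holding GDP per capita fixed, which is one way to check whether the simple correlation partly reflects differences in national income.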