# DSAI Project (Happiness Predictor)

Dataset from Kaggle: **"World Happiness Report"** by the Sustainable Development Solutions Network

Source 1: https://www.kaggle.com/datasets/unsdsn/world-happiness/data 

Source 2: https://worldhappiness.report/ (updated reports from Source 1)

---

## Essential Libraries
Let us begin by importing the essential Python libraries.

> NumPy : Library for Numeric Computations in Python  
> Pandas : Library for Data Acquisition and Preparation  
> Matplotlib : Low-level library for Data Visualization  
> Seaborn : Higher-level library for Data Visualization

In [None]:
# Basic Libraries
import numpy as np
import pandas as pd
import seaborn as sb
import glob as gb
import matplotlib.pyplot as plt
sb.set()  # apply seaborn's default plot theme

## Combining all 3 years of data (2017 to 2019) into 1 data sheet

In [None]:
data2017 = pd.read_csv('2017.csv')
data2018 = pd.read_csv('2018.csv')
data2019 = pd.read_csv('2019.csv')
combine = pd.concat([data2017,data2018,data2019])
combine

Unnamed: 0,Country,Happiness score,Explained by: GDP per capita,Explained by: Social support,Explained by: Healthy life expectancy,Explained by: Freedom to make life choices,Explained by: Generosity,Explained by: Perceptions of corruption,Dystopia + residual
0,Afghanistan,3.794,0.401,0.582,0.181,0.106,0.312,0.061,2.151
1,Albania,4.644,0.996,0.804,0.731,0.381,0.201,0.040,1.490
2,Algeria,5.872,1.092,1.146,0.618,0.233,0.069,0.146,2.568
3,Argentina,6.599,1.185,1.440,0.695,0.495,0.109,0.060,2.614
4,Armenia,4.376,0.901,1.007,0.638,0.198,0.083,0.027,1.521
...,...,...,...,...,...,...,...,...,...
143,Rwanda,3.334,0.359,0.711,0.614,0.555,0.217,0.411,0.467
144,Tanzania,3.231,0.476,0.885,0.499,0.417,0.276,0.147,0.531
145,Afghanistan,3.203,0.350,0.517,0.361,0.000,0.158,0.025,1.793
146,Central African Republic,3.083,0.026,0.000,0.105,0.225,0.235,0.035,2.456
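
Because the three yearly files are stacked directly, the combined table no longer records which year a row came from, and the row index repeats. A minimal sketch of one way to keep that information is shown below. The `Year` column, the country names and the scores are hypothetical placeholders, not values from the dataset.

```python
import pandas as pd

# Hypothetical stand-ins for the yearly CSV files (placeholder values only)
frames = {
    2017: pd.DataFrame({"Country": ["A", "B"], "Happiness score": [5.0, 6.0]}),
    2018: pd.DataFrame({"Country": ["A", "B"], "Happiness score": [5.5, 6.5]}),
}

# Tag each frame with its year, then stack them with a fresh 0..n-1 index
tagged = [df.assign(Year=year) for year, df in frames.items()]
combined = pd.concat(tagged, ignore_index=True)
print(combined)
```

With `ignore_index=True` every row gets a unique index label, and the assumed `Year` column makes it possible to trace an averaged value back to its yearly inputs.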


---

The description of the dataset, as available on the World Happiness Report website, is as follows.

#### Note:
The values of the 7 variables used to calculate the Happiness score do not reflect their actual measured values. They are estimates, based on Gallup's Cantril ladder model, of how much each variable contributes to life evaluation (the Happiness score).

> **Country** : Name of each country (runs from 1 to 152)  

> **Happiness Score** : The total score of each country, approximately the sum of the seven component columns described below.

> **Explained by: GDP per Capita** : GDP per capita is in terms of Purchasing Power Parity (PPP) adjusted to constant 2011 international dollars, taken from the World Development Indicators (WDI) released by the World Bank on November 14, 2018.

> **Explained by: Social support** : Social support is the national average of the binary responses (either 0 or 1) to the Gallup World Poll (GWP) question “If you were in trouble, do you have relatives or friends you can count on to help you whenever you need them, or not?” 

> **Explained by: Healthy life expectancy** : The time series of healthy life expectancy at birth are constructed based on data from the World Health Organization (WHO) Global Health Observatory data repository.

> **Explained by: Freedom to make life choices** :  Freedom to make life choices is the national average of binary responses to the GWP question “Are you satisfied or dissatisfied with your freedom to choose what you do with your life?”   

> **Explained by: Generosity** : Generosity is the residual of regressing the national average of GWP responses to the question “Have you donated money to a charity in the past month?” on GDP per capita. 
 
> **Explained by: Perceptions of corruption** :  Perceptions of corruption are the average of binary answers to two GWP questions: “Is corruption widespread throughout the government or not?” and “Is corruption widespread within businesses or not?” Where data for government corruption are missing, the perception of business corruption is used as the overall corruption-perception measure.   

> **Dystopia + residual** : Dystopia is an imaginary country that has the world’s least-happy people. The purpose in establishing Dystopia is to have a benchmark against which all countries can be favorably compared (no country performs more poorly than Dystopia) in terms of each of the six key variables, thus allowing each sub-bar to be of positive (or zero, in six instances) width.
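
The first two rows of the combined table shown earlier suggest that the Happiness score is, up to rounding, the sum of the six "Explained by" columns and "Dystopia + residual". The sketch below checks this on those two rows only; the `Component sum` and `Gap` column names are our own additions for illustration.

```python
import pandas as pd

# First two rows of the combined table, copied from the output shown earlier
rows = pd.DataFrame({
    "Country": ["Afghanistan", "Albania"],
    "Happiness score": [3.794, 4.644],
    "Explained by: GDP per capita": [0.401, 0.996],
    "Explained by: Social support": [0.582, 0.804],
    "Explained by: Healthy life expectancy": [0.181, 0.731],
    "Explained by: Freedom to make life choices": [0.106, 0.381],
    "Explained by: Generosity": [0.312, 0.201],
    "Explained by: Perceptions of corruption": [0.061, 0.040],
    "Dystopia + residual": [2.151, 1.490],
})

# Every column except the name and the total is treated as a component
component_cols = [c for c in rows.columns if c not in ("Country", "Happiness score")]
rows["Component sum"] = rows[component_cols].sum(axis=1)
rows["Gap"] = (rows["Happiness score"] - rows["Component sum"]).abs()
print(rows[["Country", "Happiness score", "Component sum", "Gap"]])
```

A small non-zero gap is expected because each published column is rounded to three decimals.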

#### After merging the 3 dataframes into 1, we took the mean of each variable across the 3 years for each unique country. Averaging over a timeframe rather than relying on just 1 year reduces the influence of outliers.

In [None]:
average_combine = combine.groupby('Country').mean().round(3).reset_index()
average_combine

Unnamed: 0,Country,Happiness score,Explained by: GDP per capita,Explained by: Social support,Explained by: Healthy life expectancy,Explained by: Freedom to make life choices,Explained by: Generosity,Explained by: Perceptions of corruption,Dystopia + residual
0,Afghanistan,3.543,0.361,0.545,0.266,0.064,0.220,0.041,2.047
1,Albania,4.650,0.953,0.823,0.798,0.394,0.176,0.033,1.471
2,Algeria,5.459,1.024,1.153,0.697,0.132,0.066,0.132,2.256
3,Argentina,6.358,1.117,1.447,0.773,0.512,0.079,0.055,2.376
4,Armenia,4.419,0.856,1.017,0.706,0.247,0.085,0.040,1.467
...,...,...,...,...,...,...,...,...,...
143,Venezuela,4.921,1.028,1.442,0.693,0.147,0.062,0.054,1.494
144,Vietnam,5.117,0.748,1.329,0.735,0.577,0.186,0.080,1.461
145,Yemen,3.443,0.440,1.057,0.372,0.212,0.098,0.066,1.197
146,Zambia,4.333,0.592,1.036,0.326,0.465,0.239,0.082,1.591
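
Not every country necessarily appears in all three yearly files, in which case its mean is taken over fewer years. A minimal sketch of the same per-country averaging, extended with a count of how many rows contributed to each mean, is shown below. The toy data and the `Years present` column are hypothetical, and `numeric_only=True` is an addition that is not in the original cell.

```python
import pandas as pd

# Toy stand-in for the combined table: country A has 3 years, B only 2
combined = pd.DataFrame({
    "Country": ["A", "A", "A", "B", "B"],
    "Happiness score": [5.0, 6.0, 7.0, 4.0, 5.0],
})

# Same pattern as above: mean per country, rounded to 3 decimals
averaged = (
    combined.groupby("Country")
    .mean(numeric_only=True)
    .round(3)
    .reset_index()
)

# Number of rows (years) that went into each country's mean
years_present = (
    combined.groupby("Country").size().rename("Years present").reset_index()
)
summary = averaged.merge(years_present, on="Country")
print(summary)
```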


In [None]:
average_combine.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 148 entries, 0 to 147
Data columns (total 9 columns):
 #   Column                                      Non-Null Count  Dtype  
---  ------                                      --------------  -----  
 0   Country                                     148 non-null    object 
 1   Happiness score                             148 non-null    float64
 2   Explained by: GDP per capita                148 non-null    float64
 3   Explained by: Social support                148 non-null    float64
 4   Explained by: Healthy life expectancy       148 non-null    float64
 5   Explained by: Freedom to make life choices  148 non-null    float64
 6   Explained by: Generosity                    148 non-null    float64
 7   Explained by: Perceptions of corruption     148 non-null    float64
 8   Dystopia + residual                         148 non-null    float64
dtypes: float64(8), object(1)
memory usage: 10.5+ KB


In [None]:
average_combine.describe()

Unnamed: 0,Happiness score,Explained by: GDP per capita,Explained by: Social support,Explained by: Healthy life expectancy,Explained by: Freedom to make life choices,Explained by: Generosity,Explained by: Perceptions of corruption,Dystopia + residual
count,148.0,148.0,148.0,148.0,148.0,148.0,148.0,148.0
mean,5.393277,0.928838,1.208953,0.631797,0.422824,0.204365,0.116122,1.880236
std,1.121055,0.403418,0.297658,0.236071,0.147041,0.109942,0.097449,0.499656
min,2.953,0.008,0.0,0.045,0.027,0.0,0.002,0.284
25%,4.48775,0.61525,1.063,0.4525,0.32625,0.12475,0.05175,1.60325
50%,5.4385,1.007,1.2705,0.6915,0.4415,0.195,0.083,1.898
75%,6.159,1.25075,1.44525,0.7905,0.5285,0.269,0.13975,2.231
max,7.623,1.735,1.626,1.033,0.671,0.667,0.458,2.914


## Reading the HDI Value datasheet

In [None]:
HDI = pd.read_csv('HDI_table.csv')
HDI

Unnamed: 0,Country,HDI Value
0,Afghanistan,0.462
1,Albania,0.789
2,Algeria,0.745
3,Argentina,0.849
4,Armenia,0.786
...,...,...
143,Venezuela,0.699
144,Vietnam,0.726
145,Yemen,0.424
146,Zambia,0.569


## Merging the HDI Value column into our existing data sheet and writing the result to a final file

In [None]:
merged_df = pd.merge(average_combine, HDI, on='Country', how='inner')

# Write the merged DataFrame to finalData.csv
merged_df.to_csv('finalData.csv')
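
An inner merge silently drops any country whose name is spelled differently in the two tables. The sketch below shows one way to list such countries with the `indicator` option of `pd.merge`. The two small frames are hypothetical stand-ins for `average_combine` and `HDI`.

```python
import pandas as pd

# Hypothetical stand-ins: country C lacks an HDI value, D lacks a happiness score
happiness = pd.DataFrame({"Country": ["A", "B", "C"], "Happiness score": [5.0, 6.0, 7.0]})
hdi = pd.DataFrame({"Country": ["A", "B", "D"], "HDI Value": [0.5, 0.6, 0.7]})

# An outer merge with indicator=True adds a "_merge" column that records
# whether each row was found in the left table, the right table, or both
check = pd.merge(happiness, hdi, on="Country", how="outer", indicator=True)
unmatched = check[check["_merge"] != "both"]
print(unmatched[["Country", "_merge"]])
```

Rows marked `left_only` or `right_only` are the ones an inner merge would discard, so they are worth inspecting for naming mismatches before the final file is written.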

## Reading the final datasheet with the 'HDI Value' column added

In [None]:
Updatereport = pd.read_csv('finalData.csv')
Updatereport

Unnamed: 0.1,Unnamed: 0,Country,Happiness score,Explained by: GDP per capita,Explained by: Social support,Explained by: Healthy life expectancy,Explained by: Freedom to make life choices,Explained by: Generosity,Explained by: Perceptions of corruption,Dystopia + residual,HDI Value
0,0,Afghanistan,3.543,0.361,0.545,0.266,0.064,0.220,0.041,2.047,0.462
1,1,Albania,4.650,0.953,0.823,0.798,0.394,0.176,0.033,1.471,0.789
2,2,Algeria,5.459,1.024,1.153,0.697,0.132,0.066,0.132,2.256,0.745
3,3,Argentina,6.358,1.117,1.447,0.773,0.512,0.079,0.055,2.376,0.849
4,4,Armenia,4.419,0.856,1.017,0.706,0.247,0.085,0.040,1.467,0.786
...,...,...,...,...,...,...,...,...,...,...,...
143,143,Venezuela,4.921,1.028,1.442,0.693,0.147,0.062,0.054,1.494,0.699
144,144,Vietnam,5.117,0.748,1.329,0.735,0.577,0.186,0.080,1.461,0.726
145,145,Yemen,3.443,0.440,1.057,0.372,0.212,0.098,0.066,1.197,0.424
146,146,Zambia,4.333,0.592,1.036,0.326,0.465,0.239,0.082,1.591,0.569
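
The extra `Unnamed: 0` column in the table above comes from the row index that `to_csv` writes by default. A minimal sketch of the effect, and of how `index=False` avoids it, is shown below. It uses an in-memory buffer and a hypothetical two-row frame instead of `finalData.csv`.

```python
import io
import pandas as pd

# Hypothetical two-row frame standing in for merged_df
df = pd.DataFrame({"Country": ["A", "B"], "HDI Value": [0.5, 0.6]})

# Default behaviour: the index is written as an unnamed first column
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
cols_with_index = pd.read_csv(with_index).columns.tolist()

# With index=False only the real columns are written
without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = pd.read_csv(without_index).columns.tolist()

print(cols_with_index)
print(cols_without_index)
```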

