# World Happiness Project
The goal of this project is to compare the "World Happiness Report" datasets available on Kaggle with other factors such as poverty rate, education, and religion around the world. By analyzing the data, I hope to find out whether there are any significant correlations between these datasets.

The World Happiness Reports on Kaggle offer a rating system for happiness based on factors such as GDP per capita, life expectancy, social support, and freedom to make life choices. The general aim of these reports is to estimate the well-being of people across the world from all backgrounds.

More information on these datasets can be found at: https://www.kaggle.com/datasets/mathurinache/world-happiness-report?select=2022.csv

## Introduction

My initial interest was sparked by the happiness dataset itself, as I was curious about which factors were used to measure "happiness". Beyond this initial curiosity, I began to think of other factors that could correlate with the happiness rating, such as religion, education, and poverty/wealth. My goal is to analyze these factors and show whether, and how strongly, they correlate with the happiness datasets.

## 1. Loading Data

In [44]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt  # the plotting API lives in the pyplot submodule
from pandas import DataFrame
import random

In [45]:
happy21_df = pd.read_csv("C:\\Users\\pwurt\\documents\\DA_pathway\\M3\\World_Happy\\World_Happiness_Report\\.venv\\happiness_report_22\\2021.csv")

happy21_df.head()

Unnamed: 0,Country name,Regional indicator,Ladder score,Standard error of ladder score,upperwhisker,lowerwhisker,Logged GDP per capita,Social support,Healthy life expectancy,Freedom to make life choices,Generosity,Perceptions of corruption,Ladder score in Dystopia,Explained by: Log GDP per capita,Explained by: Social support,Explained by: Healthy life expectancy,Explained by: Freedom to make life choices,Explained by: Generosity,Explained by: Perceptions of corruption,Dystopia + residual
0,Finland,Western Europe,7.842,0.032,7.904,7.78,10.775,0.954,72.0,0.949,-0.098,0.186,2.43,1.446,1.106,0.741,0.691,0.124,0.481,3.253
1,Denmark,Western Europe,7.62,0.035,7.687,7.552,10.933,0.954,72.7,0.946,0.03,0.179,2.43,1.502,1.108,0.763,0.686,0.208,0.485,2.868
2,Switzerland,Western Europe,7.571,0.036,7.643,7.5,11.117,0.942,74.4,0.919,0.025,0.292,2.43,1.566,1.079,0.816,0.653,0.204,0.413,2.839
3,Iceland,Western Europe,7.554,0.059,7.67,7.438,10.878,0.983,73.0,0.955,0.16,0.673,2.43,1.482,1.172,0.772,0.698,0.293,0.17,2.967
4,Netherlands,Western Europe,7.464,0.027,7.518,7.41,10.932,0.942,72.4,0.913,0.175,0.338,2.43,1.501,1.079,0.753,0.647,0.302,0.384,2.798
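
The absolute Windows path in the cell above only resolves on one machine. A minimal sketch of a more portable loader follows, assuming the CSV files sit in a folder next to the notebook. Using `happiness_report_22` as a relative folder and the helper name `load_report` are hypothetical choices, not part of the original notebook.

```python
from pathlib import Path

import pandas as pd

# Hypothetical location: a data folder relative to the notebook's working directory
DATA_DIR = Path("happiness_report_22")


def load_report(year: int, data_dir: Path = DATA_DIR) -> pd.DataFrame:
    """Read '<year>.csv' from data_dir and return it as a DataFrame."""
    csv_path = data_dir / f"{year}.csv"
    if not csv_path.exists():
        # Fail early with the full path so a wrong folder is easy to spot
        raise FileNotFoundError(f"Expected a report file at {csv_path.resolve()}")
    return pd.read_csv(csv_path)
```

With such a helper, the two loading cells could read `load_report(2021)` and `load_report(2022)` instead of repeating the full path.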


* Cleaning Data

In [46]:
happy21_df.columns = happy21_df.columns.str.title()

happy21_df.head()

Unnamed: 0,Country Name,Regional Indicator,Ladder Score,Standard Error Of Ladder Score,Upperwhisker,Lowerwhisker,Logged Gdp Per Capita,Social Support,Healthy Life Expectancy,Freedom To Make Life Choices,Generosity,Perceptions Of Corruption,Ladder Score In Dystopia,Explained By: Log Gdp Per Capita,Explained By: Social Support,Explained By: Healthy Life Expectancy,Explained By: Freedom To Make Life Choices,Explained By: Generosity,Explained By: Perceptions Of Corruption,Dystopia + Residual
0,Finland,Western Europe,7.842,0.032,7.904,7.78,10.775,0.954,72.0,0.949,-0.098,0.186,2.43,1.446,1.106,0.741,0.691,0.124,0.481,3.253
1,Denmark,Western Europe,7.62,0.035,7.687,7.552,10.933,0.954,72.7,0.946,0.03,0.179,2.43,1.502,1.108,0.763,0.686,0.208,0.485,2.868
2,Switzerland,Western Europe,7.571,0.036,7.643,7.5,11.117,0.942,74.4,0.919,0.025,0.292,2.43,1.566,1.079,0.816,0.653,0.204,0.413,2.839
3,Iceland,Western Europe,7.554,0.059,7.67,7.438,10.878,0.983,73.0,0.955,0.16,0.673,2.43,1.482,1.172,0.772,0.698,0.293,0.17,2.967
4,Netherlands,Western Europe,7.464,0.027,7.518,7.41,10.932,0.942,72.4,0.913,0.175,0.338,2.43,1.501,1.079,0.753,0.647,0.302,0.384,2.798


In [47]:
print(happy21_df.isnull().any().any())

False


In [48]:
happy21_df = happy21_df.rename(columns={'Country Name': 'Country', 'Regional Indicator': 'Region', 'Ladder Score': 'Happiness Score 21'})

In [49]:
happy21_df.columns

Index(['Country', 'Region', 'Happiness Score 21',
       'Standard Error Of Ladder Score', 'Upperwhisker', 'Lowerwhisker',
       'Logged Gdp Per Capita', 'Social Support', 'Healthy Life Expectancy',
       'Freedom To Make Life Choices', 'Generosity',
       'Perceptions Of Corruption', 'Ladder Score In Dystopia',
       'Explained By: Log Gdp Per Capita', 'Explained By: Social Support',
       'Explained By: Healthy Life Expectancy',
       'Explained By: Freedom To Make Life Choices',
       'Explained By: Generosity', 'Explained By: Perceptions Of Corruption',
       'Dystopia + Residual'],
      dtype='object')

In [50]:
cols_to_drop21 = ['Standard Error Of Ladder Score', 'Upperwhisker', 'Lowerwhisker', 'Logged Gdp Per Capita', 'Generosity', 'Perceptions Of Corruption', 'Ladder Score In Dystopia', 'Explained By: Log Gdp Per Capita', 'Explained By: Social Support', 'Explained By: Healthy Life Expectancy', 'Explained By: Freedom To Make Life Choices', 'Explained By: Generosity', 'Explained By: Perceptions Of Corruption', 'Dystopia + Residual']

happy21_df = happy21_df.drop(columns=cols_to_drop21)

In [51]:
happy21_df.columns

Index(['Country', 'Region', 'Happiness Score 21', 'Social Support',
       'Healthy Life Expectancy', 'Freedom To Make Life Choices'],
      dtype='object')

In [52]:
happy21_df.tail()

Unnamed: 0,Country,Region,Happiness Score 21,Social Support,Healthy Life Expectancy,Freedom To Make Life Choices
144,Lesotho,Sub-Saharan Africa,3.512,0.787,48.7,0.715
145,Botswana,Sub-Saharan Africa,3.467,0.784,59.269,0.824
146,Rwanda,Sub-Saharan Africa,3.415,0.552,61.4,0.897
147,Zimbabwe,Sub-Saharan Africa,3.145,0.75,56.201,0.677
148,Afghanistan,South Asia,2.523,0.463,52.493,0.382


In [53]:
region_country = happy21_df.groupby('Region')['Country'].count().sort_values(ascending=True)

region_country

Region
North America and ANZ                  4
East Asia                              6
South Asia                             7
Southeast Asia                         9
Commonwealth of Independent States    12
Central and Eastern Europe            17
Middle East and North Africa          17
Latin America and Caribbean           20
Western Europe                        21
Sub-Saharan Africa                    36
Name: Country, dtype: int64
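
The `groupby(...).count()` call above counts the non-null `Country` entries per region. As a rough sketch, the same tally can be obtained with `value_counts`; the three-row frame below is a toy stand-in for `happy21_df`, not the real data.

```python
import pandas as pd

# Toy stand-in for happy21_df with only the two columns needed here
toy_df = pd.DataFrame({
    "Country": ["Finland", "Denmark", "Afghanistan"],
    "Region": ["Western Europe", "Western Europe", "South Asia"],
})

# Count rows per region, smallest group first (mirrors sort_values(ascending=True))
region_counts = toy_df["Region"].value_counts(ascending=True)
print(region_counts)
```

Note that `value_counts` counts rows per region rather than non-null `Country` values, so the two approaches agree only when `Country` has no missing entries, which the null check above confirms for this dataset.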

In [54]:
happy21_df.to_csv("happy21.csv", index=False)

* Follow the link below for the Tableau bar graph "2021 Region Happiness Country Count":

https://public.tableau.com/views/2021RegionHappiness/2021RegionHappinessCountryCount?:language=en-US&publish=yes&:sid=&:redirect=auth&:display_count=n&:origin=viz_share_link

## 1a. Loading a Second Dataset: "Happiness Rating for 2022"

In [55]:
happy_df22 = pd.read_csv("C:\\Users\\pwurt\\documents\\DA_pathway\\m3\\World_Happy\\World_Happiness_Report\\.venv\\happiness_report_22\\2022.csv")

happy_df22.head()

Unnamed: 0,RANK,Country,Happiness score,Whisker-high,Whisker-low,Dystopia (1.83) + residual,Explained by: GDP per capita,Explained by: Social support,Explained by: Healthy life expectancy,Explained by: Freedom to make life choices,Explained by: Generosity,Explained by: Perceptions of corruption
0,1,Finland,7821,7886,7756,2518,1892,1258,775,736,109,534
1,2,Denmark,7636,7710,7563,2226,1953,1243,777,719,188,532
2,3,Iceland,7557,7651,7464,2320,1936,1320,803,718,270,191
3,4,Switzerland,7512,7586,7437,2153,2026,1226,822,677,147,461
4,5,Netherlands,7415,7471,7359,2137,1945,1206,787,651,271,419


In [56]:
happy_df22.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 147 entries, 0 to 146
Data columns (total 12 columns):
 #   Column                                      Non-Null Count  Dtype 
---  ------                                      --------------  ----- 
 0   RANK                                        147 non-null    int64 
 1   Country                                     147 non-null    object
 2   Happiness score                             146 non-null    object
 3   Whisker-high                                146 non-null    object
 4   Whisker-low                                 146 non-null    object
 5   Dystopia (1.83) + residual                  146 non-null    object
 6   Explained by: GDP per capita                146 non-null    object
 7   Explained by: Social support                146 non-null    object
 8   Explained by: Healthy life expectancy       146 non-null    object
 9   Explained by: Freedom to make life choices  146 non-null    object
 10  Explained by: Generosity  

In [57]:
happy_df22.tail()

Unnamed: 0,RANK,Country,Happiness score,Whisker-high,Whisker-low,Dystopia (1.83) + residual,Explained by: GDP per capita,Explained by: Social support,Explained by: Healthy life expectancy,Explained by: Freedom to make life choices,Explained by: Generosity,Explained by: Perceptions of corruption
142,143,Rwanda*,3268.0,3462.0,3074.0,536.0,785.0,133.0,462.0,621.0,187.0,544.0
143,144,Zimbabwe,2995.0,3110.0,2880.0,548.0,947.0,690.0,270.0,329.0,106.0,105.0
144,145,Lebanon,2955.0,3049.0,2862.0,216.0,1392.0,498.0,631.0,103.0,82.0,34.0
145,146,Afghanistan,2404.0,2469.0,2339.0,1263.0,758.0,0.0,289.0,0.0,89.0,5.0
146,147,xx,,,,,,,,,,
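
The `info()` and `tail()` outputs above show that the 2022 scores load as `object` columns and that the last row (`xx`) is an empty placeholder. The cleaning cells further down swap commas for points in the score columns, which suggests the raw file uses decimal commas. A minimal sketch of handling this at load time with the `decimal` parameter of `pd.read_csv` follows. The inline CSV text is a toy sample built on the assumption that the raw numbers are quoted with decimal commas; it is not the real file.

```python
import io

import pandas as pd

# Toy sample (assumed layout): quoted numbers with decimal commas,
# an asterisk on one country name, and an empty placeholder row at the end
raw_csv = io.StringIO(
    'RANK,Country,Happiness score,Explained by: Social support\n'
    '1,Finland,"7,821","1,258"\n'
    '143,Rwanda*,"3,268","0,133"\n'
    '147,xx,,\n'
)

# decimal="," makes the parser treat the comma as the decimal separator
sample_df = pd.read_csv(raw_csv, decimal=",")

# Drop the placeholder row, then strip the literal asterisk from country names
sample_df = sample_df.dropna().copy()
sample_df["Country"] = sample_df["Country"].str.replace("*", "", regex=False)

print(sample_df.dtypes)
print(sample_df)
```

If the real file behaves like this sample, the score columns would arrive as floats and the later comma-to-point replacements would not be needed.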


* Cleaning Data

In [58]:
happy_df22.columns = (happy_df22.columns
                      .str.strip()
                      .str.replace('-', ' ', regex=False)
                      .str.replace('*', '', regex=False)  # literal asterisk, not a regex quantifier
)
happy_df22.head()

Unnamed: 0,RANK,Country,Happiness score,Whisker high,Whisker low,Dystopia (1.83) + residual,Explained by: GDP per capita,Explained by: Social support,Explained by: Healthy life expectancy,Explained by: Freedom to make life choices,Explained by: Generosity,Explained by: Perceptions of corruption
0,1,Finland,7821,7886,7756,2518,1892,1258,775,736,109,534
1,2,Denmark,7636,7710,7563,2226,1953,1243,777,719,188,532
2,3,Iceland,7557,7651,7464,2320,1936,1320,803,718,270,191
3,4,Switzerland,7512,7586,7437,2153,2026,1226,822,677,147,461
4,5,Netherlands,7415,7471,7359,2137,1945,1206,787,651,271,419


In [59]:
cols_to_drop = ["Dystopia (1.83) + residual", "Whisker high", "Whisker low", "Explained by: GDP per capita", "Explained by: Generosity", "Explained by: Perceptions of corruption"]

happy_df22 = happy_df22.drop(columns = cols_to_drop)

happy_df22.head()

Unnamed: 0,RANK,Country,Happiness score,Explained by: Social support,Explained by: Healthy life expectancy,Explained by: Freedom to make life choices
0,1,Finland,7821,1258,775,736
1,2,Denmark,7636,1243,777,719
2,3,Iceland,7557,1320,803,718
3,4,Switzerland,7512,1226,822,677
4,5,Netherlands,7415,1206,787,651


In [60]:
print(happy_df22.isnull().any().any())

True


In [61]:
happy_df22.isnull().sum()

RANK                                          0
Country                                       0
Happiness score                               1
Explained by: Social support                  1
Explained by: Healthy life expectancy         1
Explained by: Freedom to make life choices    1
dtype: int64

In [62]:
happy22_clean = happy_df22.dropna()

print(happy22_clean.isnull().any().any())

False


In [63]:
happy22_clean.columns

Index(['RANK', 'Country', 'Happiness score', 'Explained by: Social support',
       'Explained by: Healthy life expectancy',
       'Explained by: Freedom to make life choices'],
      dtype='object')

In [64]:
# Work on an explicit copy so the assignment below does not trigger SettingWithCopyWarning
happy22_clean = happy22_clean.copy()
happy22_clean['Happiness score'] = happy22_clean['Happiness score'].str.replace(',', '.', regex=False)

happy22_clean.head()

Unnamed: 0,RANK,Country,Happiness score,Explained by: Social support,Explained by: Healthy life expectancy,Explained by: Freedom to make life choices
0,1,Finland,7.821,1258,775,736
1,2,Denmark,7.636,1243,777,719
2,3,Iceland,7.557,1320,803,718
3,4,Switzerland,7.512,1226,822,677
4,5,Netherlands,7.415,1206,787,651


In [23]:
# assign() returns a new DataFrame, which avoids SettingWithCopyWarning
happy22_clean = happy22_clean.assign(Country=happy22_clean['Country'].str.replace('*', '', regex=False))

happy22_clean.tail(15)

Unnamed: 0,RANK,Country,Happiness score,Explained by: Social support,Explained by: Healthy life expectancy,Explained by: Freedom to make life choices
131,132,Yemen,4.197,1043,384,330
132,133,Mauritania,4.153,865,450,304
133,134,Jordan,4.152,724,675,476
134,135,Togo,4.112,322,360,292
135,136,India,3.777,376,471,647
136,137,Zambia,3.76,577,306,525
137,138,Malawi,3.75,279,388,477
138,139,Tanzania,3.702,597,425,578
139,140,Sierra Leone,3.574,416,273,387
140,141,Lesotho,3.512,848,0,419


In [24]:
# Swap the decimal comma for a point in the remaining score columns.
# The GDP, generosity, and corruption columns were dropped in an earlier cell,
# so they are not listed here. The values stay strings after this step.
edit_cols = {
    'Explained by: Social support': {',': '.'},
    'Explained by: Healthy life expectancy': {',': '.'},
    'Explained by: Freedom to make life choices': {',': '.'}
}

happy22_clean = happy22_clean.replace(edit_cols, regex=True)

happy22_clean.tail()

Unnamed: 0,RANK,Country,Happiness score,Explained by: Social support,Explained by: Healthy life expectancy,Explained by: Freedom to make life choices
141,142,Botswana,3.471,0.815,0.28,0.571
142,143,Rwanda,3.268,0.133,0.462,0.621
143,144,Zimbabwe,2.995,0.69,0.27,0.329
144,145,Lebanon,2.955,0.498,0.631,0.103
145,146,Afghanistan,2.404,0.0,0.289,0.0


In [25]:
happy22_clean["Happiness score"] = happy22_clean["Happiness score"].astype(float)

In [26]:
happy22_clean.dtypes

RANK                                            int64
Country                                        object
Happiness score                               float64
Explained by: Social support                   object
Explained by: Healthy life expectancy          object
Explained by: Freedom to make life choices     object
dtype: object
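
The listing above shows that only `Happiness score` was converted to `float64`; the three `Explained by:` columns are still `object` (strings), which would get in the way of any later arithmetic or correlation on them. A minimal sketch of converting them with `pd.to_numeric` follows, using a two-row toy frame in place of `happy22_clean`.

```python
import pandas as pd

# Toy stand-in for happy22_clean: scores stored as strings
toy_scores = pd.DataFrame({
    "Country": ["Botswana", "Rwanda"],
    "Explained by: Social support": ["0.815", "0.133"],
    "Explained by: Healthy life expectancy": ["0.28", "0.462"],
})

# Select every 'Explained by:' column and convert it to a numeric dtype;
# errors="coerce" turns any unparseable entry into NaN instead of raising
score_cols = [c for c in toy_scores.columns if c.startswith("Explained by:")]
toy_scores[score_cols] = toy_scores[score_cols].apply(pd.to_numeric, errors="coerce")

print(toy_scores.dtypes)
```

Checking `isnull().sum()` after the conversion would reveal whether `errors="coerce"` silently turned any malformed value into `NaN`.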

## 2. Combining both datasets: Happiness Rating for 2021 and 2022

In [27]:
happy_merged = pd.merge(happy21_df, happy22_clean, how="outer", on=["Country"])

In [28]:
happy_merged.iloc[0:20]

Unnamed: 0,Country,Region,Happiness Score 21,Social Support,Healthy Life Expectancy,Freedom To Make Life Choices,RANK,Happiness score,Explained by: Social support,Explained by: Healthy life expectancy,Explained by: Freedom to make life choices
0,Afghanistan,South Asia,2.523,0.463,52.493,0.382,146.0,2.404,0.0,0.289,0.0
1,Albania,Central and Eastern Europe,5.117,0.697,68.999,0.785,90.0,5.199,0.646,0.719,0.511
2,Algeria,Middle East and North Africa,4.887,0.802,66.005,0.48,96.0,5.122,0.97,0.643,0.146
3,Argentina,Latin America and Caribbean,5.929,0.898,69.0,0.828,57.0,5.967,1.102,0.662,0.555
4,Armenia,Commonwealth of Independent States,5.283,0.799,67.055,0.825,82.0,5.399,0.82,0.668,0.558
5,Australia,North America and ANZ,7.183,0.94,73.9,0.914,12.0,7.162,1.203,0.772,0.676
6,Austria,Western Europe,7.268,0.934,73.3,0.908,11.0,7.163,1.165,0.774,0.623
7,Azerbaijan,Commonwealth of Independent States,5.171,0.836,65.656,0.814,92.0,5.173,1.093,0.56,0.601
8,Bahrain,Middle East and North Africa,6.647,0.862,69.495,0.925,21.0,6.647,1.029,0.625,0.693
9,Bangladesh,South Asia,5.025,0.693,64.8,0.877,94.0,5.155,0.614,0.581,0.622
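
An outer merge keeps countries that appear in only one of the two years, which is where the missing values found later in this section come from. One way to see which rows failed to match is the `indicator=True` option of `pd.merge`. The sketch below uses toy frames; "Country A" and "Country B" and their scores are made-up placeholders, not rows from the real data.

```python
import pandas as pd

# Toy 2021 and 2022 frames; the last row of each has no partner in the other
toy_21 = pd.DataFrame({
    "Country": ["Finland", "Denmark", "Country A"],
    "Happiness Score 21": [7.842, 7.62, 5.0],
})
toy_22 = pd.DataFrame({
    "Country": ["Finland", "Denmark", "Country B"],
    "Happiness score": [7.821, 7.636, 4.0],
})

# indicator=True adds a '_merge' column: 'both', 'left_only', or 'right_only'
merge_check = pd.merge(toy_21, toy_22, how="outer", on="Country", indicator=True)

# Rows present in only one year
unmatched = merge_check[merge_check["_merge"] != "both"]
print(unmatched[["Country", "_merge"]])
```

On the real data, a listing like this may also surface countries that are spelled differently in the two files and therefore failed to match.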


* Cleaning Data

In [29]:
happy_merged.index = range(1, len(happy_merged)+1)

happy_merged.head()

Unnamed: 0,Country,Region,Happiness Score 21,Social Support,Healthy Life Expectancy,Freedom To Make Life Choices,RANK,Happiness score,Explained by: Social support,Explained by: Healthy life expectancy,Explained by: Freedom to make life choices
1,Afghanistan,South Asia,2.523,0.463,52.493,0.382,146.0,2.404,0.0,0.289,0.0
2,Albania,Central and Eastern Europe,5.117,0.697,68.999,0.785,90.0,5.199,0.646,0.719,0.511
3,Algeria,Middle East and North Africa,4.887,0.802,66.005,0.48,96.0,5.122,0.97,0.643,0.146
4,Argentina,Latin America and Caribbean,5.929,0.898,69.0,0.828,57.0,5.967,1.102,0.662,0.555
5,Armenia,Commonwealth of Independent States,5.283,0.799,67.055,0.825,82.0,5.399,0.82,0.668,0.558


In [30]:
print(happy_merged.isnull().any().any())

True


In [31]:
happy_merged.shape

(152, 11)

In [32]:
happy_merged.isnull().sum()

Country                                       0
Region                                        3
Happiness Score 21                            3
Social Support                                3
Healthy Life Expectancy                       3
Freedom To Make Life Choices                  3
RANK                                          6
Happiness score                               6
Explained by: Social support                  6
Explained by: Healthy life expectancy         6
Explained by: Freedom to make life choices    6
dtype: int64

In [33]:
happy_merged.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 152 entries, 1 to 152
Data columns (total 11 columns):
 #   Column                                      Non-Null Count  Dtype  
---  ------                                      --------------  -----  
 0   Country                                     152 non-null    object 
 1   Region                                      149 non-null    object 
 2   Happiness Score 21                          149 non-null    float64
 3   Social Support                              149 non-null    float64
 4   Healthy Life Expectancy                     149 non-null    float64
 5   Freedom To Make Life Choices                149 non-null    float64
 6   RANK                                        146 non-null    float64
 7   Happiness score                             146 non-null    float64
 8   Explained by: Social support                146 non-null    object 
 9   Explained by: Healthy life expectancy       146 non-null    object 
 10  Explained by: 

In [34]:
happy_merged = happy_merged.drop(columns="RANK")

happy_merged.columns

Index(['Country', 'Region', 'Happiness Score 21', 'Social Support',
       'Healthy Life Expectancy', 'Freedom To Make Life Choices',
       'Happiness score', 'Explained by: Social support',
       'Explained by: Healthy life expectancy',
       'Explained by: Freedom to make life choices'],
      dtype='object')

In [35]:
print(happy_merged[["Happiness Score 21", "Happiness score"]].dtypes)

Happiness Score 21    float64
Happiness score       float64
dtype: object


In [36]:
happy_merged['Happiness Rating Avg 21-22'] = happy_merged[['Happiness Score 21','Happiness score']].mean(axis=1)

happy_merged.tail(10)

Unnamed: 0,Country,Region,Happiness Score 21,Social Support,Healthy Life Expectancy,Freedom To Make Life Choices,Happiness score,Explained by: Social support,Explained by: Healthy life expectancy,Explained by: Freedom to make life choices,Happiness Rating Avg 21-22
143,United Arab Emirates,Middle East and North Africa,6.561,0.844,67.333,0.932,6.576,0.98,0.633,0.702,6.5685
144,United Kingdom,Western Europe,7.064,0.934,72.5,0.859,6.943,1.143,0.75,0.597,7.0035
145,United States,North America and ANZ,6.951,0.92,68.2,0.837,6.977,1.182,0.628,0.574,6.964
146,Uruguay,Latin America and Caribbean,6.431,0.925,69.1,0.896,6.474,1.18,0.672,0.665,6.4525
147,Uzbekistan,Commonwealth of Independent States,6.179,0.918,65.255,0.97,6.063,1.092,0.6,0.716,6.121
148,Venezuela,Latin America and Caribbean,4.892,0.861,66.7,0.615,4.925,0.968,0.578,0.283,4.9085
149,Vietnam,Southeast Asia,5.411,0.85,68.034,0.94,5.485,0.932,0.611,0.707,5.448
150,Yemen,Middle East and North Africa,3.658,0.832,57.122,0.602,4.197,1.043,0.384,0.33,3.9275
151,Zambia,Sub-Saharan Africa,4.073,0.708,55.809,0.782,3.76,0.577,0.306,0.525,3.9165
152,Zimbabwe,Sub-Saharan Africa,3.145,0.75,56.201,0.677,2.995,0.69,0.27,0.329,3.07
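
One detail of the averaging step: `mean(axis=1)` skips missing values by default, so a country with a score for only one year gets that single score as its "average". A small sketch of the difference follows. The Zimbabwe scores come from the table above, while "Country A" is a made-up row with a missing 2022 score.

```python
import numpy as np
import pandas as pd

toy_merged = pd.DataFrame({
    "Country": ["Zimbabwe", "Country A"],
    "Happiness Score 21": [3.145, 5.0],
    "Happiness score": [2.995, np.nan],  # Country A has no 2022 score
})
score_cols = ["Happiness Score 21", "Happiness score"]

# Default behaviour: NaN is ignored, so Country A's average equals its 2021 score
toy_merged["avg_default"] = toy_merged[score_cols].mean(axis=1)

# skipna=False: the average is NaN whenever either year is missing
toy_merged["avg_strict"] = toy_merged[score_cols].mean(axis=1, skipna=False)

print(toy_merged)
```

In this notebook the rows with a missing year are removed by the later `dropna()` call, so the exported averages are all based on two years.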


In [37]:
print(happy_merged.isnull().any().any())

True


In [38]:
happy_merged_clean = happy_merged.dropna()

print(happy_merged_clean.isnull().any().any())

False


In [39]:
happy_merged_clean.shape

(143, 11)

In [40]:
happy_merged_clean.columns

Index(['Country', 'Region', 'Happiness Score 21', 'Social Support',
       'Healthy Life Expectancy', 'Freedom To Make Life Choices',
       'Happiness score', 'Explained by: Social support',
       'Explained by: Healthy life expectancy',
       'Explained by: Freedom to make life choices',
       'Happiness Rating Avg 21-22'],
      dtype='object')

In [41]:
happy_merged_clean.to_csv("happymerged.csv", index=False)

### Tableau Visuals for Happiness Ratings datasets 2021-2022

* Global Map Viewing Happiness Rating Avg. 21-22

https://public.tableau.com/views/GlobalHappinessRatingAvg_21-22MapView/GlobalHappinessRatingAvg_21-22?:language=en-US&publish=yes&:sid=&:redirect=auth&:display_count=n&:origin=viz_share_link

* Region/Country Happiness Avg. 21-22 Full Viewing

https://public.tableau.com/views/GlobalHappinessRatingAvg_21-22MapView/RegionCountryHappinessRatingAvg_21-22?:language=en-US&publish=yes&:sid=&:redirect=auth&:display_count=n&:origin=viz_share_link