In [1]:
import pandas as pd
import plotly.graph_objs as go
import plotly.express as px
import matplotlib.pyplot as plt
import numpy as np

20-06-2023

Information Visualization: group project Draft 

Group: B4

| Student name | Student number |
| --- | --- | 
| Evan Lont | 14729210 | 
| Joep Haanen | 14657368 |
| Lotte te Kulve | 14648911 | 
| Robin Kuipers | 14273810 |

## Introduction

In the last few years, a lot has happened in the world. From the end of 2019 to the first half of 2022, the world went through a global pandemic. During the pandemic, inflation rose to levels not seen in almost 40 years (OECD Economic Outlook, 2023). Additionally, at the beginning of 2022, a war broke out between Russia and Ukraine. All of these events could have had a significant influence on world happiness scores. Strangely enough, however, the world happiness scores barely changed during these years (World Happiness Report Data Dashboard | The World Happiness Report, n.d.).

Because the score itself changed so little, the analysis will focus on the correlation between economic factors and the underlying factors that make up the world happiness score.

//// **A paragraph about why we chose to focus on inflation should go here** !!!

The "World Happiness Report" dataset and relevant economic indicators such as GDP per capita, inflation rates, and the consumer price index (CPI) will be used to investigate the relationship between subjective well-being and economic stability. Through data analysis, the aim is to determine whether countries with stronger economic indicators tend to exhibit higher happiness scores. //**perhaps good to check whether this is really still the case** // This study aims to contribute to understanding how economic factors influence levels of happiness at both the individual and the societal level.

#### Perspectives
1. Consumer Price Index (CPI) and Happiness: We want to examine the impact of
CPI on happiness. We will argue that a higher CPI, reflecting increased prices of
goods and services, could potentially reduce individuals' satisfaction with their
standard of living, impacting overall happiness levels.

2. Inflation Rates and Happiness: We want to investigate the relationship between
inflation rates and happiness. We will argue that high inflation rates may lead to
increased uncertainty, economic instability, and decreased purchasing power,
potentially negatively affecting happiness levels in a country.

3. GDP per Capita and Happiness: We will explore the correlation between GDP per
capita and happiness scores. We will argue that countries with higher GDP per
capita may have better economic opportunities, access to resources, and quality of
life, which could positively impact happiness levels.

## Datasets and preprocessing
For the first dataset, the World Happiness Report dataset from the Sustainable Development Solutions Network, based on Gallup World Poll data, was chosen. For the second dataset, an OECD inflation dataset covering at least ten years up to 2022 was identified as meeting our requirements. After analyzing the two datasets, it became clear that both needed some filtering. The inflation dataset also offers potential for interesting visualizations, because it includes inflation trends before, during, and to some extent after the COVID-19 pandemic.

### Dataset 1: World happiness report
**Source:** https://worldhappiness.report/ed/2020/#appendices-and-data

**Number of records:** `20`

**Number of variables:** `10`

**Description:** Two datasets from the World Happiness Report (WHR), for the years 2020 and 2022, were used in the analysis. The WHR is an annual publication by the Sustainable Development Solutions Network and relies on data collected by the Gallup World Poll. The report is written by a group of independent experts, each with expertise in one of the variables that the WHR measures. It covers these variables for more than 150 countries worldwide, of which ten were chosen for this analysis. The primary objective of the yearly report is to reflect a worldwide demand for more attention to happiness and to encourage governments to adopt better policies.

| Variable | Datatype | Measurement scale |
| --- | --- | --- |
| Country name | Categorical | Nominal |
| Regional indicator | Categorical | Nominal |
| Happiness score | Continuous | Interval |
| upperwhisker | Continuous | Interval |
| lowerwhisker | Continuous | Interval |
| Logged GDP per capita | Continuous | Ratio |
| Healthy life expectancy | Continuous | Interval |
| Generosity | Continuous | Interval |
| Perceptions of corruption | Continuous | Interval |
| Explained by: Log GDP per capita | Continuous | Ratio |
| Explained by: Healthy life expectancy | Continuous | Ratio |
| Explained by: Freedom to make life choices | Continuous | Ratio |
| Explained by: Generosity | Continuous | Ratio |
| Explained by: Social support | Continuous | Ratio |
| Explained by: Perceptions of corruption | Continuous | Ratio |
| Dystopia + residual | Continuous | Interval |


#### Preprocessing
Find the data cleaning code in the ....///// (to be incorporated)

For each variable we asked ourselves the following questions:

- What are the variables in the data?
- Do we need all the data points and variables?
- Are there data that are out of scope?
- Are there privacy or ethical issues in the data?
- Is it practical to process the variable that we want?

To prevent the dataset from becoming too large, the project focuses on data for the years 2020 and 2022, because some of the dataset's values varied considerably between these two years. Another reason for selecting only two years is that we want to find out how much the data can differ within such a short timeframe. The analysis uses the variables for our ten chosen countries to examine the relationship between the happiness score and several economic factors. These variables include ones found in the WHR, such as GDP per capita and generosity, as well as external variables such as yearly inflation.

Based on the requirements for the data, the following actions were taken:

- The removal of specific columns from the world happiness dataset, including:
    - Regional indicator
    - Upperwhisker
    - Lowerwhisker
- Rearranging the columns to facilitate clear identification of the country and year under consideration.
- Selecting and retaining only the countries necessary for our analysis, while removing the rest. The final selection includes: 'Switzerland', 'Netherlands', 'New Zealand', 'Canada', 'Saudi Arabia', 'Chile', 'Portugal', 'China', 'South Africa', 'India'. We chose these countries because they are located in different regions and differ considerably in economic well-being.
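The column removal and country selection described above can be sketched in pandas as follows. This is a minimal illustration, not the notebook's actual cleaning code: the three-row `raw` frame and its values are hypothetical, and only the column names from the variable table above are assumed.

```python
import pandas as pd

# Hypothetical stand-in for the raw WHR export (values are illustrative only)
raw = pd.DataFrame({
    "Country name": ["Switzerland", "Japan", "India"],
    "Regional indicator": ["Western Europe", "East Asia", "South Asia"],
    "Happiness score": [7.6, 5.9, 3.6],
    "upperwhisker": [7.7, 6.0, 3.7],
    "lowerwhisker": [7.5, 5.8, 3.5],
})

selected = ["Switzerland", "Netherlands", "New Zealand", "Canada", "Saudi Arabia",
            "Chile", "Portugal", "China", "South Africa", "India"]

# Drop the columns that are out of scope for the analysis
clean = raw.drop(columns=["Regional indicator", "upperwhisker", "lowerwhisker"])

# Keep only the ten selected countries and renumber the rows
clean = clean[clean["Country name"].isin(selected)].reset_index(drop=True)
print(clean)
```

Japan is dropped because it is not in the final selection, even though the notebook's name-to-abbreviation mapping further down still lists it.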

In [2]:
happiness_2020 = pd.read_csv('happiness_2020-def.csv')
happiness_2020.head(n=5)

Unnamed: 0.1,Unnamed: 0,Country name,Happiness score,Dystopia + residual,Explained by: Log GDP per capita,Explained by: Social support,Explained by: Healthy life expectancy,Explained by: Freedom to make life choices,Explained by: Generosity,Explained by: Perceptions of corruption
0,2,CHE,7.5599,2.350267,1.390774,1.472403,1.040533,0.628954,0.269056,0.407946
1,5,NLD,7.4489,2.352117,1.338946,1.463646,0.975675,0.613626,0.336318,0.36857
2,7,NZL,7.2996,2.128108,1.242318,1.487218,1.008138,0.64679,0.325726,0.461268
3,10,CAN,7.2321,2.195269,1.301648,1.435392,1.022502,0.644028,0.281529,0.351702
4,26,SAU,6.4065,2.203119,1.334329,1.30995,0.759818,0.548477,0.087441,0.163322


In [3]:
happiness_2022 = pd.read_csv('happiness_2022-def.csv')
happiness_2022.head(n=10)

Unnamed: 0.1,Unnamed: 0,Country,Happiness score,Dystopia (1.83) + residual,Explained by: GDP per capita,Explained by: Social support,Explained by: Healthy life expectancy,Explained by: Freedom to make life choices,Explained by: Generosity,Explained by: Perceptions of corruption
0,3,CHE,7.512,2.153,2.026,1.226,0.822,0.677,0.147,0.461
1,4,NLD,7.415,2.137,1.945,1.206,0.787,0.651,0.271,0.419
2,9,NZL,7.2,1.954,1.852,1.235,0.752,0.68,0.245,0.483
3,14,CAN,7.025,1.924,1.886,1.188,0.783,0.659,0.217,0.368
4,24,SAU,6.523,2.075,1.87,1.092,0.577,0.651,0.078,0.18
5,43,CHL,6.172,2.04,1.651,1.08,0.748,0.46,0.124,0.069
6,55,PRT,6.016,1.691,1.76,1.078,0.777,0.655,0.016,0.039
7,71,CHN,5.585,1.516,1.508,0.958,0.705,0.656,0.099,0.142
8,90,ZAF,5.194,1.742,1.425,1.088,0.361,0.442,0.089,0.046
9,135,IND,3.777,0.795,1.167,0.376,0.471,0.647,0.198,0.123



### Dataset 2: Inflation (CPI)

**Source:** https://data.oecd.org/price/inflation-cpi.htm

**Number of records:** `490`

**Number of variables:** `8`

**Description:** The "Inflation (CPI)" dataset from the OECD contains information on consumer price index (CPI) and inflation rates across various countries. It provides a comprehensive view of the changes in price levels for goods and services over time, allowing for the analysis and comparison of inflation rates among different economies. The dataset includes indicators such as headline inflation, core inflation, and various sub-components of CPI. It serves as a valuable resource for understanding and monitoring inflation trends at a global level.

| Variable | Datatype | Measurement scale |
| --- | --- | --- |
| Location | Categorical | Nominal |
| Indicator | Categorical | Nominal |
| Subject | Categorical | Nominal |
| Measure | Categorical | Nominal |
| Frequency | Categorical | Nominal |
| Time | Continuous | Interval |
| Value | Continuous | Interval |
| Flag code | Categorical | Nominal |


#### Preprocessing

- Country names were changed to abbreviations.

- Both datasets contain information per country, but the inflation dataset uses abbreviations as values while the happiness dataset uses full country names. To compare specific countries across the datasets, the values had to be aligned to either abbreviations or full country names; we chose abbreviations for consistency.
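One pitfall when converting names to abbreviations: `Series.map` with a dictionary returns `NaN` for every value that is not a key, so applying it to a column that already contains abbreviations (as the `happiness_2020` preview above does) wipes those values out. `Series.replace` leaves unmatched values untouched. The sketch below shows the difference on a small hypothetical column.

```python
import pandas as pd

country_mapping = {"Switzerland": "CHE", "Netherlands": "NLD", "India": "IND"}

# Hypothetical column in which one name was already converted to an abbreviation
names = pd.Series(["Switzerland", "NLD", "India"])

mapped = names.map(country_mapping)        # unmatched "NLD" becomes NaN
replaced = names.replace(country_mapping)  # unmatched "NLD" is kept as-is

print(mapped.tolist())
print(replaced.tolist())
```

Because `replace` is idempotent here, running the conversion cell twice does no harm.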
    

In [4]:
inflation = pd.read_csv('inflation.csv')
inflation.head(n=5)

Unnamed: 0,LOCATION,INDICATOR,SUBJECT,MEASURE,FREQUENCY,TIME,Value,Flag Codes
0,AUS,CPI,FOOD,AGRWTH,A,2018,0.670376,
1,AUS,CPI,FOOD,AGRWTH,A,2019,4.482894,
2,AUS,CPI,FOOD,AGRWTH,A,2020,9.320118,
3,AUS,CPI,FOOD,AGRWTH,A,2021,7.909739,
4,AUS,CPI,FOOD,AGRWTH,A,2022,8.1667,


In [5]:
inflation = pd.read_csv('inflation.csv')
happiness_2020 = pd.read_csv('happiness_2020-def.csv')
happiness_2022 = pd.read_csv('happiness_2022-def.csv')
inflation.drop('Flag Codes', axis=1, inplace=True)
inflation.drop('FREQUENCY', axis=1, inplace=True)

In [6]:
# Specify the desired column order (the 2022 file uses different column names)
column_order_2020 = ['Country name', 'Happiness score', 'Dystopia + residual', 'Explained by: Log GDP per capita', 'Explained by: Social support', 'Explained by: Healthy life expectancy', 'Explained by: Freedom to make life choices', 'Explained by: Generosity', 'Explained by: Perceptions of corruption']
column_order_2022 = ['Country', 'Happiness score', 'Dystopia (1.83) + residual', 'Explained by: GDP per capita', 'Explained by: Social support', 'Explained by: Healthy life expectancy', 'Explained by: Freedom to make life choices', 'Explained by: Generosity', 'Explained by: Perceptions of corruption']

# Reorder the columns
happiness_2020 = happiness_2020[column_order_2020]
happiness_2022 = happiness_2022[column_order_2022]

In [19]:
# list all unique country names
unique_countries = pd.unique(happiness_2020['Country name'])

# list all unique abbreviations
unique_abbr = pd.unique(inflation['LOCATION'])

# map all unique country names in a dictionary with abbreviations as values
country_mapping = {
    "Switzerland": "CHE",
    "Netherlands": "NLD",
    "New Zealand": "NZL",
    "Canada": "CAN",
    "Saudi Arabia": "SAU",
    "Chile": "CHL",
    "Japan": "JPN",
    "Portugal": "PRT",
    "China": "CHN",
    "South Africa": "ZAF",
    "India": "IND"
}

# map full country names to abbreviations; replace() leaves values that are
# already abbreviations unchanged, whereas map() would turn them into NaN
happiness_2020['Country name'] = happiness_2020['Country name'].replace(country_mapping)
happiness_2020.head()

# export to csv
#happiness_2020.to_csv('happiness_2020.csv', index=False)


In [27]:
inflation2020 = inflation[inflation['TIME'] == 2020]
inflation2022 = inflation[inflation['TIME'] == 2022]

## Perspective 1: Inflation has a minimal impact on happiness
While inflation is an important economic indicator, its influence on happiness might be overshadowed by other factors. This perspective suggests that although economic stability is crucial, it may not be the sole determinant of happiness. To test this perspective, three visualisations were created.

The first visualisation illustrates the change in the consumer price index between 2020 and 2022 per country. Each line represents one country, and for every country the index is higher in 2022 than in 2020. The values are relative to 2015: the index for 2015 is set to 100, so a value of 130 means that the price level in that year was 30% higher than in 2015 (not that the annual inflation rate was 30%).
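Because the values are an index with 2015 = 100, percentage changes follow directly from ratios of index values. As a worked example, the sketch below uses India's index values printed further down in this notebook (128.1744 for 2020 and 142.3749 for 2022). `pct_change` is a hypothetical helper, and the average annual rate is taken as the geometric mean over the two years.

```python
def pct_change(old, new):
    """Percentage change between two index values (hypothetical helper)."""
    return (new / old - 1) * 100

# CPI index values for India (2015 = 100)
ind_2020, ind_2022 = 128.1744, 142.3749

# Price level in 2020 relative to the 2015 base of 100
print(f"2020 vs 2015: {pct_change(100, ind_2020):.1f}%")

# Cumulative price increase between 2020 and 2022
print(f"2022 vs 2020: {pct_change(ind_2020, ind_2022):.1f}%")

# Average annual inflation over those two years (geometric mean)
annual = ((ind_2022 / ind_2020) ** 0.5 - 1) * 100
print(f"Average annual inflation 2020-2022: {annual:.1f}%")
```

The geometric mean is used rather than halving the two-year change because price increases compound from year to year.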

In [43]:
# Define the colors
colors = ['rgb(102,194,165)', 'rgb(252,141,98)', 'rgb(141,160,203)']

# Create the layout
layout = go.Layout(
    xaxis=go.layout.XAxis(
        type='category',  # The x-axis type is categorical
        tickvals=['2020', '2022'],  # Set custom tick values
        ticktext=['2020', '2022'],  # Set custom tick labels
    ),
#     yaxis=go.layout.YAxis(
#         tickformat="%",  # Format the y-axis labels as percentage
#     ),
    width=600,
    height=600
)

# Define the data
data = []
for country in inflation2020['LOCATION'].unique():
    # Extract the data for each country
    country_data_2020 = inflation2020[inflation2020['LOCATION'] == country]
    country_data_2022 = inflation2022[inflation2022['LOCATION'] == country]
    
    # Create a trace for each country
    trace = go.Scatter(
        x=['2020', '2022'],
        y=[country_data_2020['Value'].iloc[0], country_data_2022['Value'].iloc[0]],
        mode='lines+markers',
        name=country,
#         marker=dict(color=colors)  # Randomly assign a color from the predefined colors
    )
    
    data.append(trace)

# Create the figure with data and layout
fig = go.Figure(data=data, layout=layout)

# Update the layout and labels
fig.update_layout(
    title="Consumer price index by country (2015 = 100)",
    xaxis_title="Year",
    yaxis_title="CPI (2015 = 100)",
)

# Display the graph
fig.show()

This visualisation shows that, for every country, the price index was higher in 2022 than in 2020. Next, we look at the world happiness scores in 2020 and 2022.

The second visualisation shows the happiness score per country in 2020 and in 2022. Each country has two bars: orange for 2020 and blue for 2022.

In [44]:

# Define the colors (ChatGPT)
colors = ['rgb(102,194,165)', 'rgb(252,141,98)', 'rgb(141,160,203)']

# create the layout
layout = go.Layout(
    xaxis=go.layout.XAxis(
        type='category'  # the x-axis type is categorical
    ),
#     yaxis = go.layout.YAxis(
#         tickformat = ',.0%', # show as percentage
#     ),
    height=400
)

year2020 = go.Bar(
    x=happiness_2020['Country name'],
    y=happiness_2020['Happiness score'],  # by year 2020
    name='2020',
    marker=dict(color=colors[1])  # orange (ChatGPT)
)
year2022 = go.Bar(
    x=happiness_2022['Country'],
    y=happiness_2022['Happiness score'],
    name='2022',
    marker=dict(color=colors[2])  # blue (ChatGPT)
)

# create the figure
data = [year2020, year2022]
fig = go.Figure(data=data, layout=layout)

# labels
fig.update_layout(
    title="World happiness rate per country in 2020 vs 2022",
    xaxis_title="Country",
    yaxis_title="Happiness Rate")

fig.show()


As the visualisation above shows, the happiness score per country changed little between 2020 and 2022. The aim of this perspective is therefore to explore the underlying factors that make up the happiness score and to assess whether their distribution differed between the two years. The third visualisation was made for this purpose.

The third visualisation illustrates the distribution of the underlying factors that make up the happiness score in each year. The mean of every factor column across the selected countries was calculated to create an average distribution per year. This makes it possible to analyse how the distribution of the happiness factors changed as inflation rose. The dropdown can be used to switch between the two years.
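The pie chart sizes each slice by the mean of a factor column, and a pie chart displays each value as a percentage of their total. The sketch below reproduces that calculation for 2022, using the seven factor columns copied from the `happiness_2022` preview above; it is an illustration of the arithmetic, not part of the Dash app.

```python
import pandas as pd

# 2022 factor columns for the ten countries (CHE ... IND), copied from the preview above
factors_2022 = pd.DataFrame({
    "Dystopia (1.83) + residual": [2.153, 2.137, 1.954, 1.924, 2.075, 2.04, 1.691, 1.516, 1.742, 0.795],
    "Explained by: GDP per capita": [2.026, 1.945, 1.852, 1.886, 1.87, 1.651, 1.76, 1.508, 1.425, 1.167],
    "Explained by: Social support": [1.226, 1.206, 1.235, 1.188, 1.092, 1.08, 1.078, 0.958, 1.088, 0.376],
    "Explained by: Healthy life expectancy": [0.822, 0.787, 0.752, 0.783, 0.577, 0.748, 0.777, 0.705, 0.361, 0.471],
    "Explained by: Freedom to make life choices": [0.677, 0.651, 0.68, 0.659, 0.651, 0.46, 0.655, 0.656, 0.442, 0.647],
    "Explained by: Generosity": [0.147, 0.271, 0.245, 0.217, 0.078, 0.124, 0.016, 0.099, 0.089, 0.198],
    "Explained by: Perceptions of corruption": [0.461, 0.419, 0.483, 0.368, 0.18, 0.069, 0.039, 0.142, 0.046, 0.123],
})

means = factors_2022.mean(axis=0)   # one mean per factor, as in update_pie_chart
shares = means / means.sum() * 100  # percentage of the total, as shown in the pie
print(shares.round(1).sort_values(ascending=False))
```

Because the seven components add up to each country's happiness score (for example 7.512 for Switzerland), the shares can be read as the average composition of the score.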

In [45]:
from dash import Dash, dcc, html, Input, Output

# Load the datasets
df1 = pd.read_csv('happiness_2020-def.csv')
df2 = pd.read_csv('happiness_2022-def.csv')

# Initialize the Dash app
app = Dash(__name__)

# Define the layout of the app
app.layout = html.Div([
    dcc.Dropdown(
        id='dataset-dropdown',
        options=[
            {'label': 'Happiness 2020', 'value': 'df1'},
            {'label': 'Happiness 2022', 'value': 'df2'}
        ],
        value='df1',
    ),
    html.H2(id='chart-title'),
    dcc.Graph(id='pie-chart')
])

# Update the pie chart and its title whenever the dropdown value changes
@app.callback(
    [Output('pie-chart', 'figure'),
     Output('chart-title', 'children')],
    [Input('dataset-dropdown', 'value')]
)
def update_pie_chart(dataset):
    # Determine the selected dataframe based on the dropdown value
    if dataset == 'df1':
        df = df1
        dataset_name = 'Happiness 2020'
    else:
        df = df2
        dataset_name = 'Happiness 2022'

    # Mean of the last seven columns (Dystopia + residual and the six "Explained by" factors)
    mean_values = df.iloc[:, -7:].mean(axis=0)
    labels = mean_values.index
    values = mean_values.values

    # Create the pie chart figure
    fig = px.pie(values=values, names=labels, hole=0.5)

    # Set the chart title
    title = f"Distribution of each happiness factor - {dataset_name}"

    return fig, title

# Run the app (app.run replaces the deprecated app.run_server in recent Dash versions)
if __name__ == '__main__':
    app.run(debug=True)

The visualisation above shows that the share of almost every happiness factor decreased slightly, while the share of GDP per capita increased by 9%. This shift may explain why the overall world happiness score barely changed.

## Perspective 2

**Overall happiness rates will decrease as inflation rises, and social and health factors will play a bigger role in the happiness rates.** 
Economic well-being and happiness are positively correlated. By examining the relationship between inflation and happiness scores, we can observe that countries experiencing lower inflation rates tend to have higher happiness scores. This suggests that maintaining low inflation can contribute to the overall well-being and happiness of a population.


First, we divide all selected countries into three categories based on their inflation index value (low, medium and high) using the pandas `cut` function. With `bins=3`, `cut` splits the range of the values into three equal-width bins, which shows how the countries are distributed over low, medium and high inflation. We also look at the distribution of the happiness scores.
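Note that equal-width bins can hold very different numbers of countries. `pd.qcut(..., q=3)` is the equal-frequency alternative, which splits at quantiles so each bin holds roughly the same number of observations. A minimal comparison on hypothetical index values:

```python
import pandas as pd

# Hypothetical CPI index values for six countries (2015 = 100)
values = pd.Series([101, 103, 105, 107, 120, 140])
labels = ["Low", "Medium", "High"]

equal_width = pd.cut(values, bins=3, labels=labels)  # three intervals of equal width
equal_count = pd.qcut(values, q=3, labels=labels)    # three intervals of equal frequency

print(equal_width.value_counts(sort=False))
print(equal_count.value_counts(sort=False))
```

With `cut`, the single outlier at 140 stretches the range so that four of the six values land in "Low"; `qcut` puts two values in each category. Which one fits depends on whether the categories should reflect absolute inflation levels (`cut`) or relative rank (`qcut`).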

In [33]:
original_data20 = px.histogram(happiness_2020, x='Happiness score', title='Distribution of happiness scores in 2020')
original_data22 = px.histogram(happiness_2022, x='Happiness score', title='Distribution of happiness scores in 2022')
original_data20.show()
original_data22.show()

In [46]:

inflation_tot = pd.read_csv('inflation_tot.csv')
inflation2020 = inflation_tot[inflation_tot['TIME'] == 2020]
inflation2022 = inflation_tot[inflation_tot['TIME'] == 2022]

In [47]:
inflation_original20 = px.histogram(inflation2020, x='Value', title='Distribution of inflation rates in 2020')
inflation_original22 = px.histogram(inflation2022, x='Value', title='Distribution of inflation rates in 2022')

# Work on explicit copies so that adding the 'cut' column does not raise a SettingWithCopyWarning
inflation2020 = inflation2020.copy()
inflation2022 = inflation2022.copy()

# Cut the 2020 values into three equal-width bins
inflation2020['cut'] = pd.cut(inflation2020['Value'], bins=3, right=True, labels=['Low', 'Medium', 'High'])
fig_cut20 = px.histogram(inflation2020, x="cut", title='Distribution of inflation in 2020')

# Cut the 2022 values into three equal-width bins
inflation2022['cut'] = pd.cut(inflation2022['Value'], bins=3, right=True, labels=['Low', 'Medium', 'High'])
fig_cut22 = px.histogram(inflation2022, x="cut", title='Distribution of inflation in 2022')

inflation_original20.show()
inflation_original22.show()
fig_cut20.show()
fig_cut22.show()

In [36]:
print(inflation2020[inflation2020['cut']== 'Low'])
print(inflation2020[inflation2020['cut']== 'High'])

    Unnamed: 0 LOCATION INDICATOR SUBJECT  MEASURE  TIME     Value  cut
0       146211      CAN       CPI     TOT  IDX2015  2020  108.2104  Low
2       149430      NLD       CPI     TOT  IDX2015  2020  107.5100  Low
4       149731      NZL       CPI     TOT  IDX2015  2020  107.6488  Low
6       150321      PRT       CPI     TOT  IDX2015  2020  103.3332  Low
8       151167      CHE       CPI     TOT  IDX2015  2020  100.6647  Low
16      153215      SAU       CPI     TOT  IDX2015  2020  105.0286  Low
    Unnamed: 0 LOCATION INDICATOR SUBJECT  MEASURE  TIME     Value   cut
14      152622      IND       CPI     TOT  IDX2015  2020  128.1744  High
18      153475      ZAF       CPI     TOT  IDX2015  2020  125.9030  High


### Low and high inflation in 2020
Countries that fell into the low-inflation category in 2020 were Canada, the Netherlands, New Zealand, Portugal, Switzerland and Saudi Arabia. Countries that fell into the high-inflation category in 2020 were India and South Africa.

Let's take a look at these categories in 2022:

In [37]:
print(inflation2022[inflation2022['cut']== 'Low'])
print(inflation2022[inflation2022['cut']== 'High'])

    Unnamed: 0 LOCATION INDICATOR SUBJECT  MEASURE  TIME     Value  cut
7       150323      PRT       CPI     TOT  IDX2015  2022  112.8373  Low
9       151169      CHE       CPI     TOT  IDX2015  2022  104.1208  Low
13      152194      CHN       CPI     TOT  IDX2015  2022  114.7902  Low
17      153217      SAU       CPI     TOT  IDX2015  2022  110.9241  Low
    Unnamed: 0 LOCATION INDICATOR SUBJECT  MEASURE  TIME     Value   cut
11      152110      CHL       CPI     TOT  IDX2015  2022  133.9722  High
15      152624      IND       CPI     TOT  IDX2015  2022  142.3749  High
19      153477      ZAF       CPI     TOT  IDX2015  2022  140.9812  High


In 2022, the countries in the low-inflation category were Portugal, Switzerland, China and Saudi Arabia. The countries in the high-inflation category were Chile, India and South Africa.

Before drawing any conclusions, we have to consider the bin ranges created with the `cut` function. In 2020, the bins run from about 100.6 to 109.8 (low), 109.8 to 119.0 (medium) and 119.0 to 128.2 (high). In 2022, they run from about 104.1 to 116.9 (low), 116.9 to 129.6 (medium) and 129.6 to 142.4 (high). The bins for 2022 are wider because the inflation values are more spread out.
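The exact bin edges can be read off with `retbins=True`. With equal-width bins the edges depend only on the minimum and maximum, so the low- and high-category values printed above are enough to reproduce them; the medium-category countries lie in between and do not change the edges.

```python
import pandas as pd

# Index values (2015 = 100) printed above for the 'Low' and 'High' categories
cpi_2020 = pd.Series({"CAN": 108.2104, "NLD": 107.5100, "NZL": 107.6488, "PRT": 103.3332,
                      "CHE": 100.6647, "SAU": 105.0286, "IND": 128.1744, "ZAF": 125.9030})
cpi_2022 = pd.Series({"PRT": 112.8373, "CHE": 104.1208, "CHN": 114.7902, "SAU": 110.9241,
                      "CHL": 133.9722, "IND": 142.3749, "ZAF": 140.9812})

for year, values in [(2020, cpi_2020), (2022, cpi_2022)]:
    cats, edges = pd.cut(values, bins=3, labels=["Low", "Medium", "High"], retbins=True)
    # With right=True (the default), pandas lowers the first edge by 0.1% of the
    # range so that the minimum value falls inside the first bin
    print(year, edges.round(2))
```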

The countries that moved from the 'low' category to a higher category are Canada, the Netherlands and New Zealand. No country moved from the 'high' category to a lower category, but Chile moved into it in 2022. China moved in the opposite direction, from 'medium' to 'low'.
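These category changes can also be derived programmatically by comparing the labels of the two years. The sketch below uses the memberships printed above; countries that appear in neither the 'Low' nor the 'High' printout are assumed to be 'Medium' (Chile and China in 2020; Canada, the Netherlands and New Zealand in 2022). The helper `category` is hypothetical.

```python
import pandas as pd

order = ["Low", "Medium", "High"]
countries = ["CAN", "NLD", "NZL", "PRT", "CHE", "SAU", "CHL", "CHN", "IND", "ZAF"]

low_2020, high_2020 = {"CAN", "NLD", "NZL", "PRT", "CHE", "SAU"}, {"IND", "ZAF"}
low_2022, high_2022 = {"PRT", "CHE", "CHN", "SAU"}, {"CHL", "IND", "ZAF"}

def category(country, low, high):
    """Return the inflation category; anything not Low or High is Medium."""
    if country in low:
        return "Low"
    if country in high:
        return "High"
    return "Medium"

cats = pd.DataFrame({
    2020: [category(c, low_2020, high_2020) for c in countries],
    2022: [category(c, low_2022, high_2022) for c in countries],
}, index=countries)

# Compare each country's position in the ordered list of categories
rank = {name: i for i, name in enumerate(order)}
up_mask = cats[2022].map(rank) > cats[2020].map(rank)
down_mask = cats[2022].map(rank) < cats[2020].map(rank)
moved_up = cats[up_mask].index.tolist()
moved_down = cats[down_mask].index.tolist()
print("Moved up:", moved_up)
print("Moved down:", moved_down)
```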

Let's further analyze these countries in comparison with their happiness scores.

In [48]:

happiness_2020 = pd.read_csv('happiness_2020-def.csv')
happiness_2022 = pd.read_csv('happiness_2022-def.csv')


happiness_2020 = happiness_2020.drop(['Unnamed: 0'], axis=1)
happiness_2022 = happiness_2022.drop(['Unnamed: 0'], axis=1)


# save countries in df
low2020 = inflation2020[inflation2020['cut']== 'Low']
high2020 = inflation2020[inflation2020['cut']== 'High']

low2022 = inflation2022[inflation2022['cut']== 'Low']
high2022 = inflation2022[inflation2022['cut']== 'High']

infhap20 = pd.concat([happiness_2020.set_index('Country name'), inflation2020.set_index('LOCATION')], axis = 1)
infhap22 = pd.concat([happiness_2022.set_index('Country'), inflation2022.set_index('LOCATION')], axis = 1)

infhap20 = infhap20.filter(items=['Happiness score', 'Dystopia + residual',
       'Explained by: Log GDP per capita', 'Explained by: Social support',
       'Explained by: Healthy life expectancy',
       'Explained by: Freedom to make life choices',
       'Explained by: Generosity', 'Explained by: Perceptions of corruption', 'Value',
       'cut', 'TIME'])
infhap22 = infhap22.filter(items=['Happiness score', 'Dystopia (1.83) + residual',
       'Explained by: GDP per capita', 'Explained by: Social support',
       'Explained by: Healthy life expectancy',
       'Explained by: Freedom to make life choices',
       'Explained by: Generosity', 'Explained by: Perceptions of corruption', 'Value', 'cut','TIME'])

df = pd.concat([infhap20,infhap22])

In [49]:

fig = px.scatter(df, x="Value", y="Happiness score", color=df.index, facet_col="TIME", facet_row="cut", title='Correlation between inflation and happiness scores in 2020 and 2022')
fig.show()


In 2020, there is a strong positive correlation between inflation and happiness among the 'low' inflation countries. This is counterintuitive to us: within this category, countries with relatively higher inflation also seem to have higher happiness scores. The countries in the 'medium' inflation category show the same pattern, while countries in the 'high' inflation category have relatively lower happiness scores.

In 2022, in the 'low' inflation category, the opposite can be seen: there is a strong negative correlation between inflation and happiness. The countries within this category with a relatively higher inflation value have relatively lower happiness scores. The same applies to the 'high' inflation category. Only the 'medium' category shows the opposite.

The fact that inflation, in most cases, does not immediately affect a country's happiness score indicates that, besides inflation, other factors (possibly non-economic ones) contributed to happiness and outweighed the effects of inflation.

Overall, however, countries (points) in the 'low' inflation category are clustered between happiness scores of 5 and 8, while the happiness scores of countries in the 'high' inflation category are clustered between 3 and 5 (2020) and between 3 and 6.5 (2022). This suggests a negative correlation between inflation and happiness: countries with lower inflation tend to have higher happiness scores.
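The clustering can also be checked numerically by summarising the happiness score per inflation category. A minimal sketch for 2022, with happiness scores copied from the `happiness_2022` preview and categories from the printouts above; Canada, the Netherlands and New Zealand are assumed to be 'Medium' because they appear in neither the 'Low' nor the 'High' printout.

```python
import pandas as pd

df22 = pd.DataFrame({
    "country": ["CHE", "NLD", "NZL", "CAN", "SAU", "CHL", "PRT", "CHN", "ZAF", "IND"],
    "happiness": [7.512, 7.415, 7.2, 7.025, 6.523, 6.172, 6.016, 5.585, 5.194, 3.777],
    "cut": ["Low", "Medium", "Medium", "Medium", "Low", "High", "Low", "Low", "High", "High"],
})
# An ordered categorical keeps the groups in Low -> Medium -> High order
df22["cut"] = pd.Categorical(df22["cut"], categories=["Low", "Medium", "High"], ordered=True)

summary = df22.groupby("cut", observed=True)["happiness"].agg(["mean", "min", "max", "count"])
print(summary.round(2))
```

In this sketch the 'Low' group's mean is clearly above the 'High' group's, but the 'Medium' group has the highest mean of all, so the relation between category and happiness is not strictly monotonic.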

## Calculating the correlation coefficient between inflation and all happiness factors
The happiness score is made up of seven components: six explained factors plus Dystopia + residual. We want to see what kind of correlation exists between inflation and each of these components.

In [40]:
infhap20.corr(method='pearson', min_periods=1, numeric_only=True).style.background_gradient(cmap="Blues")


All-NaN slice encountered


All-NaN slice encountered



Unnamed: 0,Happiness score,Dystopia + residual,Explained by: Log GDP per capita,Explained by: Social support,Explained by: Healthy life expectancy,Explained by: Freedom to make life choices,Explained by: Generosity,Explained by: Perceptions of corruption,Value,TIME
Happiness score,1.0,0.928188,0.953589,0.9104,0.818651,0.415717,0.499488,0.766688,-0.82064,
Dystopia + residual,0.928188,1.0,0.900718,0.91913,0.643008,0.061505,0.313296,0.543494,-0.710782,
Explained by: Log GDP per capita,0.953589,0.900718,1.0,0.860186,0.778085,0.431339,0.320622,0.660738,-0.904726,
Explained by: Social support,0.9104,0.91913,0.860186,1.0,0.668954,0.171148,0.243946,0.548585,-0.745143,
Explained by: Healthy life expectancy,0.818651,0.643008,0.778085,0.668954,1.0,0.60664,0.346538,0.599054,-0.86704,
Explained by: Freedom to make life choices,0.415717,0.061505,0.431339,0.171148,0.60664,1.0,0.488757,0.692397,-0.536636,
Explained by: Generosity,0.499488,0.313296,0.320622,0.243946,0.346538,0.488757,1.0,0.849454,-0.080253,
Explained by: Perceptions of corruption,0.766688,0.543494,0.660738,0.548585,0.599054,0.692397,0.849454,1.0,-0.49606,
Value,-0.82064,-0.710782,-0.904726,-0.745143,-0.86704,-0.536636,-0.080253,-0.49606,1.0,
TIME,,,,,,,,,,


We calculated the Pearson correlation coefficient for the combined 'inflation' and 'happiness' dataframes for both 2020 and 2022. 'Value' is the column that holds the inflation index value for each country. In 2020, there is a negative correlation between the inflation value and the happiness score, and between inflation and most of its factors. Between inflation and the part of happiness explained by generosity, there is practically no correlation (r = -0.08). The 'TIME' row and column are empty because TIME is constant within each year, so its correlation is undefined.
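For reference, the Pearson coefficient is the covariance of two variables divided by the product of their standard deviations. The sketch below computes it by hand and with `Series.corr` for the seven 2022 countries whose index values are printed above. Canada, the Netherlands and New Zealand are left out because their values are not shown, so the result differs from the ten-country coefficient in the matrices.

```python
import numpy as np
import pandas as pd

# 2022 values copied from the outputs above (CAN, NLD, NZL omitted: CPI not printed)
happiness = pd.Series({"CHE": 7.512, "SAU": 6.523, "PRT": 6.016, "CHN": 5.585,
                       "CHL": 6.172, "ZAF": 5.194, "IND": 3.777})
cpi = pd.Series({"CHE": 104.1208, "SAU": 110.9241, "PRT": 112.8373, "CHN": 114.7902,
                 "CHL": 133.9722, "ZAF": 140.9812, "IND": 142.3749})

# Pair the values by country before computing anything
x, y = cpi.align(happiness)

# r = sum((x - mean_x)(y - mean_y)) / sqrt(sum((x - mean_x)^2) * sum((y - mean_y)^2))
dx, dy = x - x.mean(), y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# pandas computes the same coefficient (it aligns on the index automatically)
r_pandas = cpi.corr(happiness, method="pearson")

print(round(r_manual, 3), round(r_pandas, 3))
```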

In [41]:
infhap22.corr(method='pearson', min_periods=1, numeric_only=True).style.background_gradient(cmap="Blues")

Unnamed: 0,Happiness score,Dystopia (1.83) + residual,Explained by: GDP per capita,Explained by: Social support,Explained by: Healthy life expectancy,Explained by: Freedom to make life choices,Explained by: Generosity,Explained by: Perceptions of corruption,Value,TIME
Happiness score,1.0,0.904041,0.973648,0.894421,0.753984,0.327524,0.316277,0.753218,-0.6957,
Dystopia (1.83) + residual,0.904041,1.0,0.881686,0.938524,0.537265,-0.048906,0.069443,0.471792,-0.540433,
Explained by: GDP per capita,0.973648,0.881686,1.0,0.851496,0.746482,0.39769,0.189256,0.683745,-0.778939,
Explained by: Social support,0.894421,0.938524,0.851496,1.0,0.553329,0.003057,0.041099,0.481593,-0.565912,
Explained by: Healthy life expectancy,0.753984,0.537265,0.746482,0.553329,1.0,0.491561,0.235221,0.556855,-0.685671,
Explained by: Freedom to make life choices,0.327524,-0.048906,0.39769,0.003057,0.491561,1.0,0.296481,0.585441,-0.670363,
Explained by: Generosity,0.316277,0.069443,0.189256,0.041099,0.235221,0.296481,1.0,0.750396,0.159214,
Explained by: Perceptions of corruption,0.753218,0.471792,0.683745,0.481593,0.556855,0.585441,0.750396,1.0,-0.467474,
Value,-0.6957,-0.540433,-0.778939,-0.565912,-0.685671,-0.670363,0.159214,-0.467474,1.0,
TIME,,,,,,,,,,


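Between 2020 and 2022, there are no drastic changes in the correlation coefficients between inflation and the happiness factors. Most negative correlations become somewhat weaker in 2022 (for example, from -0.82 to -0.70 for the happiness score and from -0.90 to -0.78 for GDP per capita), whereas the correlation with freedom to make life choices becomes stronger (from -0.54 to -0.67). Because each coefficient is based on only ten countries, these differences should be interpreted with caution.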
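The change per factor can be tabulated directly from the 'Value' rows of the two matrices above. A short sketch with the coefficients copied from the outputs and rounded to three decimals; factor names are shortened for readability and Dystopia + residual is left out.

```python
import pandas as pd

# 'Value' (inflation) row of the 2020 and 2022 correlation matrices above
r = pd.DataFrame({
    "2020": [-0.821, -0.905, -0.745, -0.867, -0.537, -0.080, -0.496],
    "2022": [-0.696, -0.779, -0.566, -0.686, -0.670, 0.159, -0.467],
}, index=["Happiness score", "GDP per capita", "Social support",
          "Healthy life expectancy", "Freedom", "Generosity", "Corruption"])

# Positive 'weaker_by' means the correlation moved closer to zero in 2022
r["weaker_by"] = r["2020"].abs() - r["2022"].abs()
print(r.sort_values("weaker_by"))
```

Comparing absolute values separates "weaker" from "stronger" regardless of sign; generosity is a special case, since its weak correlation also flips from negative to positive.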

## Reflection

Working on this project was an overall positive experience. At the start of the course there was some confusion about the groups: we were not sure whether we could form this group because we were in different sub-groups. Fortunately, we were allowed to form a group together. We have learned from past projects that a strong group is the key to a successful result, which is why we chose this group. From beginning to end there was strong communication, and we could rely on each other for valuable feedback.

We began this project by deciding on a topic. This went fairly easily, and we were content with the topic of inflation and its correlation with happiness. After this, the two perspectives of our project were set. We then divided the tasks based on the required results and got to work. The tasks were evenly divided and we were able to help each other when necessary. There was some confusion around the use of GitHub, which unfortunately meant that we could not hand in the draft version correctly and lost some points. We quickly learned from our mistakes and focused on the next task. We did a peer review in the next lesson, which was incredibly helpful for us. It gave us the opportunity to reflect on our own graphics and receive feedback on them from outside our group. We took this feedback very seriously and started modifying our graphs to better fit the desired result. The peer review also gave us the opportunity to look at another group's graphics and use them as inspiration for our own project. The next week we made the final changes to our graphics. In some cases we could not figure out the solution by ourselves, and for this we used generative AI (ChatGPT) to help us complete the graphics. When the graphics were finished, we set out to answer our perspectives using the data we acquired from the graphics.

We can all agree that the teamwork in our group was splendid, and we are more than satisfied with the results. Whenever there was trouble, we quickly came to each other's aid, which was possible thanks to the strong communication in our group. There were few disagreements about the project, and when there were, they were quickly resolved.

The only problem we had was the absence of TAs in some of our lessons, which meant we could not receive any feedback. We believe this held us back from improving our project further. Overall, working on this project was a more than satisfactory experience.

## Work distribution

| Who? | Tasks |
| --- | --- |
| Evan | Visualizations, GitHub setup |
| Joep | Visualizations |
| Lotte | Data preprocessing, documentation, visualizations |
| Robin | Data preprocessing, documentation, visualizations, GitHub Pages |

## References
OECD Economic Outlook. (2023). OECD iLibrary. https://www.oecd-ilibrary.org/economics/oecd-economic-outlook_16097408

Orac, R. (2022, January 5). The Fastest Way to Visualize Correlation in Python - Towards Data Science. Medium. https://towardsdatascience.com/the-fastest-way-to-visualize-correlation-in-python-ce10ed533346

World Happiness Report Data Dashboard | The World Happiness Report. (n.d.). https://worldhappiness.report/data/