In [1]:
# pip install dash

In [11]:
import pandas as pd
import plotly.graph_objs as go
import plotly.express as px
import matplotlib.pyplot as plt
import numpy as np

# Correlation between Happiness and Economic Factors

01-07-2023

Information Visualization: data story final 

Group: B4

| Student name | student number | 
| --- | --- | 
| Evan Lont | 14729210 | 
| Joep Haanen | 14657368 |
| Lotte te Kulve | 14648911 | 
| Robin Kuipers | 14273810 |

## Introduction

Over the last few years, a lot has happened in the world. From the end of 2019 to the first half of 2022, the world went through a global pandemic. During and after the pandemic, inflation rates rose to record-breaking levels. Inflation had not been this high in almost 40 years (OECD Economic Outlook, 2023). Additionally, at the beginning of 2022, a war between Russia and Ukraine broke out. All of these events could have a significant influence on the world happiness rate.

The analysis will focus on the correlation between the world happiness rate and economic factors.

We have decided to focus on inflation as the economic factor. This is mainly due to our own experience with inflation and that of the people around us. In the past few years, we have heard a lot about the problems around inflation and the potential risks of an ever-increasing inflation rate. This has been broadcast on the news and shown in newspapers, but it is most obvious in our own economic environment. We have noticed ourselves that all our expenses have gone up. Groceries have become more expensive, restaurants have become more expensive, and even basic needs like a haircut have seen an enormous increase in cost over the past years. Inflation has been an important topic of conversation that we all deal with. This is why we have set our focus on this topic and its correlation with the happiness of people around the world.

The "World Happiness Report" dataset and relevant economic indicators such as GDP per capita, inflation rates, and consumer price index (CPI) will be used to investigate the relationship between subjective well-being and economic stability. Through data analysis, the aim is to determine whether countries with higher economic indicators tend to exhibit higher happiness scores. This study aims to contribute to understanding how economic factors influence levels of happiness at both individual and societal levels.


## Datasets and preprocessing
For the first dataset, we chose the World Happiness Report dataset from the Sustainable Development Solutions Network, which is powered by Gallup World Poll data. For the second dataset, we selected an inflation dataset from OECD data that covers at least ten years up until 2022, which meets our requirements. Upon analyzing the two datasets, it became clear that they needed some filtering. Additionally, the inflation dataset offers the potential for intriguing visualizations because it includes inflation trends before, during, and to some extent after the COVID-19 pandemic.

### Dataset 1: World happiness report
**Source:** https://worldhappiness.report/ed/2020/#appendices-and-data

**Number of records:** `20`

**Number of variables:** `10`

**Description:** As part of our data analysis, we used two datasets from the World Happiness Report (WHR), for the years 2020 and 2022. The WHR is an annual publication by the Sustainable Development Solutions Network and relies on data collected by the Gallup World Poll. The report is written by a group of independent experts, each with expertise in different variables that the WHR measures. It covers these variables for more than 150 countries worldwide, of which we have chosen to analyze ten specific countries. The primary objective of the yearly report is to reflect a worldwide demand for more attention to happiness by inspiring governments to adopt better policies.
During our analysis we will work with the variables of our ten chosen countries in order to make findings about the relationship between the happiness score and several economic factors. These variables include ones found inside the WHR, such as GDP per capita and generosity, but also external variables such as the yearly inflation.

| Variable | Datatype | Measurement scale |
| --- | --- | --- |
| country name | Categorical | Nominal |
| Regional indicator | Categorical | Nominal |
| Happiness score | Continuous | Interval |
| upperwhisker | Continuous | Interval |
| lowerwhisker | Continuous | Interval |
| Logged GDP per capita | Continuous | Ratio |
| Healthy life expectancy | Continuous | Interval |
| Generosity | Continuous | Interval |
| Perceptions of corruption | Continuous | Interval |
| Explained by: Log GDP per capita | Continuous | Ratio |
| Explained by: Healthy life expectancy | Continuous | Ratio |
| Explained by: Freedom to make life choices | Continuous | Ratio |
| Explained by: Generosity | Continuous | Ratio |
| Explained by: Social support | Continuous | Ratio |
| Explained by: Perceptions of corruption | Continuous | Ratio |
| Dystopia + residual | Continuous | Interval |


#### Preprocessing
For detailed preprocessing, visit: [happiness data preprocessing](https://rxbinashley.github.io/infovisb4/docs/data%20cleaning.html)

For each variable we asked ourselves the following questions:

- What are the variables in the data?
- Do we need all the data points and variables?
- Are there data that are out of scope?
- Are there privacy or ethical issues in the data?
- Is it practical to process the variable that we want?

To prevent the dataset from becoming too large, the project focuses on the data for the years 2020 and 2022, because some of the values varied a lot between these years. Another reason for selecting only two years is that we want to find out how much the data can differ in such a short timeframe.

Based on the requirements for the data, the following actions were taken:

- The removal of specific columns from the world happiness dataset, including:
    - Regional indicator
    - Upperwhisker
    - Lowerwhisker
- Rearranging the columns to facilitate clear identification of the country and year under consideration.
- Selecting and retaining only the countries necessary for our analysis, while removing the rest. The final selection includes: 'Switzerland', 'Netherlands', 'New Zealand', 'Canada','Saudi Arabia', 'Chile', 'Portugal', 'China', 'South Africa', 'India'. We chose these countries because they're located in different regions and their economic wellbeing differs a lot.
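The three preprocessing actions above can be sketched in a few lines of pandas. This is a minimal illustration, not the code from our cleaning notebook: the small `raw` table and its values are made up for the example, and the `Year` column is an assumed way of making the year explicit.

```python
import pandas as pd

# Tiny stand-in for the raw World Happiness Report table (illustrative values only)
raw = pd.DataFrame({
    "Country name": ["Switzerland", "Finland", "India"],
    "Regional indicator": ["Western Europe", "Western Europe", "South Asia"],
    "Happiness score": [7.56, 7.81, 3.57],
    "upperwhisker": [7.63, 7.87, 3.63],
    "lowerwhisker": [7.49, 7.75, 3.51],
})

# Subset of the ten selected countries, kept short for the example
selected = ["Switzerland", "India"]

cleaned = (
    raw.drop(columns=["Regional indicator", "upperwhisker", "lowerwhisker"])  # remove unused columns
    .loc[lambda d: d["Country name"].isin(selected)]                          # keep selected countries
    .assign(Year=2020)                                                        # assumed helper column
    .loc[:, ["Country name", "Year", "Happiness score"]]                      # country and year first
    .reset_index(drop=True)
)

print(cleaned)
```

Chaining the steps keeps the raw table untouched, which makes it easy to rerun the same cleaning for another year.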

In [12]:
happiness_2020 = pd.read_csv('happiness_2020.csv')
happiness_2020.head(n=5)

Unnamed: 0.1,Unnamed: 0,Country name,Happiness score,Dystopia + residual,Explained by: Log GDP per capita,Explained by: Social support,Explained by: Healthy life expectancy,Explained by: Freedom to make life choices,Explained by: Generosity,Explained by: Perceptions of corruption
0,2,CHE,7.5599,2.350267,1.390774,1.472403,1.040533,0.628954,0.269056,0.407946
1,5,NLD,7.4489,2.352117,1.338946,1.463646,0.975675,0.613626,0.336318,0.36857
2,7,NZL,7.2996,2.128108,1.242318,1.487218,1.008138,0.64679,0.325726,0.461268
3,10,CAN,7.2321,2.195269,1.301648,1.435392,1.022502,0.644028,0.281529,0.351702
4,26,SAU,6.4065,2.203119,1.334329,1.30995,0.759818,0.548477,0.087441,0.163322


In [13]:
happiness_2022 = pd.read_csv('happiness_2022.csv')
happiness_2022.head(n=5)

Unnamed: 0.1,Unnamed: 0,Country,Happiness score,Dystopia (1.83) + residual,Explained by: GDP per capita,Explained by: Social support,Explained by: Healthy life expectancy,Explained by: Freedom to make life choices,Explained by: Generosity,Explained by: Perceptions of corruption
0,3,CHE,7.512,2.153,2.026,1.226,0.822,0.677,0.147,0.461
1,4,NLD,7.415,2.137,1.945,1.206,0.787,0.651,0.271,0.419
2,9,NZL,7.2,1.954,1.852,1.235,0.752,0.68,0.245,0.483
3,14,CAN,7.025,1.924,1.886,1.188,0.783,0.659,0.217,0.368
4,24,SAU,6.523,2.075,1.87,1.092,0.577,0.651,0.078,0.18



### Dataset 2: Inflation (CPI)

**Source:** https://data.oecd.org/price/inflation-cpi.htm

**Number of records:** `490`

**Number of variables:** `8`

**Description:** The "Inflation (CPI)" dataset from the OECD contains information on consumer price index (CPI) and inflation rates across various countries. It provides a comprehensive view of the changes in price levels for goods and services over time, allowing for the analysis and comparison of inflation rates among different economies. The dataset includes indicators such as headline inflation, core inflation, and various sub-components of CPI. It serves as a valuable resource for understanding and monitoring inflation trends at a global level.

| Variable | Datatype | Measurement scale |
| --- | --- | --- |
| Location | Categorical | Nominal |
| Indicator | Categorical | Nominal |
| Subject | Categorical | Nominal |
| Measure | Categorical | Nominal |
| Frequency | Categorical | Nominal |
| Time | Continuous | Interval |
| Value | Continuous | Interval |
| Flag codes | Categorical | Nominal |


#### Preprocessing

For detailed preprocessing, visit: [inflation data preprocessing](https://rxbinashley.github.io/infovisb4/docs/inflation_cleaning.html)

- Country names were changed to abbreviations.

- Both datasets contained information per country, but the inflation dataset used abbreviations as values while the happiness dataset used full country names. To facilitate data comparison for specific countries, we needed to align the values either to abbreviations or full country names. We decided to use abbreviations for consistency.
    

In [14]:
inflation = pd.read_csv('inflation.csv')
# inflation.drop('Flag Codes', axis=1, inplace=True)
# inflation.drop('FREQUENCY', axis=1, inplace=True)
inflation2020 = inflation[inflation['TIME'] == 2020]
inflation2022 = inflation[inflation['TIME'] == 2022]
inflation.head(n=5)

Unnamed: 0.1,Unnamed: 0,LOCATION,INDICATOR,SUBJECT,MEASURE,TIME,Value
0,146211,CAN,CPI,TOT,IDX2015,2020,108.2104
1,146213,CAN,CPI,TOT,IDX2015,2022,119.4957
2,149430,NLD,CPI,TOT,IDX2015,2020,107.51
3,149432,NLD,CPI,TOT,IDX2015,2022,121.4267
4,149731,NZL,CPI,TOT,IDX2015,2020,107.6488


In [15]:
# This code was used in the data cleaning process, more data cleaning code in datacleaning.ipynb

# # list all unique country names
# unique_countries = pd.unique(happiness_2020['Country name'])

# # list all unique abbreviations
# unique_abbr = pd.unique(inflation['LOCATION'])

# # map all unique country names in a dictionary with abbreviations as values
# country_mapping = {
#     "Switzerland": "CHE",
#     "Netherlands": "NLD",
#     "New Zealand": "NZL",
#     "Canada": "CAN",
#     "Saudi Arabia": "SAU",
#     "Chile": "CHL",
#     "Japan": "JPN",
#     "Portugal": "PRT",
#     "China": "CHN",
#     "South Africa": "ZAF",
#     "India": "IND"
# }

# # map the dictionary to the values of 'country name' in the happiness dataset
# happiness_2020['full Country name'] = happiness_2020['Country name'].map(country_mapping)
# happiness_2020.head()

# # export to csv
# #happiness_2020.to_csv('happiness_2020.csv', index=False)

## Perspective 1: Inflation has a minimal impact on happiness.
>**While inflation is an important economic indicator, its influence on happiness might be overshadowed by other factors. This perspective suggests that while economic stability is crucial, it may not be the sole determinant of happiness. To see if this perspective is valid, three visualisations have been created.**

The first visualisation illustrates the increase in inflation between 2020 and 2022 for each selected country. Each line in the graph represents one country. The visualisation shows that inflation increased in every country between 2020 and 2022. The graph also shows how high price levels are compared with 2015: the values are a consumer price index in which 2015 is set to 100, so a value of 130 means that prices were 30% higher in that year than in 2015.
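To make the index arithmetic concrete, here is a small sketch that converts index values into percentage changes. It uses the Canadian and Dutch values from the inflation table displayed earlier; the column names `vs_2015_pct` and `change_2020_2022_pct` are chosen only for this example.

```python
import pandas as pd

# CPI index values (2015 = 100) copied from the inflation table shown earlier
cpi = pd.DataFrame({
    "LOCATION": ["CAN", "CAN", "NLD", "NLD"],
    "TIME": [2020, 2022, 2020, 2022],
    "Value": [108.2104, 119.4957, 107.51, 121.4267],
})

# One row per country, one column per year
wide = cpi.pivot(index="LOCATION", columns="TIME", values="Value")

# Price level in 2022 relative to the 2015 base year, in percent
wide["vs_2015_pct"] = wide[2022] - 100

# Price increase between 2020 and 2022, in percent
wide["change_2020_2022_pct"] = (wide[2022] / wide[2020] - 1) * 100

print(wide.round(2))
```

For these two countries, prices rose by roughly 10% (Canada) and 13% (the Netherlands) between 2020 and 2022.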

In [16]:

colors = ['rgb(102,194,165)', 'rgb(252,141,98)', 'rgb(141,160,203)']


layout = go.Layout(
    xaxis=go.layout.XAxis(
        type='category',  # The x-axis type is categorical
        tickvals=['2020', '2022'],  # Set custom tick values
        ticktext=['2020', '2022'],  # Set custom tick labels
    ),

    width=600,
    height=600
)

data = []
for country in inflation2020['LOCATION'].unique():
    # Extract the data for each country
    country_data_2020 = inflation2020[inflation2020['LOCATION'] == country]
    country_data_2022 = inflation2022[inflation2022['LOCATION'] == country]
    
    # Create a trace for each country
    trace = go.Scatter(
        x=['2020', '2022'],
        y=[country_data_2020['Value'].iloc[0], country_data_2022['Value'].iloc[0]],
        mode='lines+markers',
        name=country,
#         
    )
    
    data.append(trace)


fig = go.Figure(data=data, layout=layout)


fig.update_layout(
    title="Inflation Rates by Country with the year 2015 as inflation rate 100",
    xaxis_title="Year",
    yaxis_title="Inflation Rate",
)


fig.show()

> _Figure 1. The graph above shows the increase in inflation between 2020 and 2022 for each selected country. Each line represents one country. The visualisation shows that inflation increased in every country between 2020 and 2022._
>
>

From this visualisation we can conclude that inflation was higher in 2022 than in 2020 for every chosen country. With that said, let's look at the world happiness rates in 2020 and 2022.

The second visualisation represents the happiness rate per country in 2020 and in 2022. For every country two bars have been plotted to represent the happiness rate in the two years. The orange bars represent the year 2020 and the blue represent the year 2022.

In [17]:

colors = ['rgb(102,194,165)', 'rgb(252,141,98)', 'rgb(141,160,203)']

layout = go.Layout(
    xaxis=go.layout.XAxis(
        type='category'  # the x-axis type is categorical
    ),

    height=400
)

year2020 = go.Bar(
    x=happiness_2020['Country name'],
    y=happiness_2020['Happiness score'], # by year 2020
    name='2020',
    marker=dict(color=colors[1]) 
)
year2022 = go.Bar(
    x=happiness_2022['Country'],
    y=happiness_2022['Happiness score'],
    name='2022',
    marker=dict(color=colors[2]) 
)

data = [year2020, year2022]
fig = go.Figure(data=data, layout=layout)

# labels
fig.update_layout(
    title="World happiness rate per country in 2020 vs 2022",
    xaxis_title="Country",
    yaxis_title="Happiness Rate")
    
fig.show()

> _Figure 2: The grouped bar chart above represents the happiness rate for the year 2020 and the year 2022 among selected countries._


As shown in the visualisation above, the happiness rate per country in 2022 did not change much compared to 2020. Because of this, this perspective explores the underlying factors that contribute to the happiness rate and assesses whether their distribution varied between the two years. The third visualisation has been made for this purpose.

The third visualisation illustrates the distribution of the underlying factors that make up the happiness score per year. The mean of every factor column was calculated to create an average distribution per year. This visualisation can be used to analyse how the distribution of the happiness factors changes when inflation rises. The dropdown can be used to switch between the two years.
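The shares that a pie chart displays can also be computed directly, which makes the two years easier to compare as numbers. The sketch below uses only the first two rows of `happiness_2020.csv` as displayed earlier, so its output is not the project's result; `factor_shares` is a hypothetical helper name introduced for this example.

```python
import pandas as pd

# First two rows of happiness_2020.csv as displayed earlier (factor columns only)
sample = pd.DataFrame({
    "Country name": ["CHE", "NLD"],
    "Dystopia + residual": [2.350267, 2.352117],
    "Explained by: Log GDP per capita": [1.390774, 1.338946],
    "Explained by: Social support": [1.472403, 1.463646],
    "Explained by: Healthy life expectancy": [1.040533, 0.975675],
    "Explained by: Freedom to make life choices": [0.628954, 0.613626],
    "Explained by: Generosity": [0.269056, 0.336318],
    "Explained by: Perceptions of corruption": [0.407946, 0.36857],
})


def factor_shares(df: pd.DataFrame) -> pd.Series:
    """Return each factor's share of the summed mean factor values, in percent."""
    means = df.select_dtypes("number").mean()
    return means / means.sum() * 100


shares = factor_shares(sample)
print(shares.round(1))
```

Calling the same helper on the 2020 and 2022 tables and subtracting the results would give the change in share per factor, provided the factor columns are first renamed so that they match between the two years.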

In [19]:
import dash
from dash import dcc
from dash import html

df1 = pd.read_csv('happiness_2020.csv')
df2 = pd.read_csv('happiness_2022.csv')

# Initialize the Dash app
app = dash.Dash(__name__)


app.layout = html.Div([
    dcc.Dropdown(
        id='dataset-dropdown',
        options=[
            {'label': 'Happiness 2020', 'value': 'df1'},
            {'label': 'Happiness 2022', 'value': 'df2'}
        ],
        value='df1',
    ),
    html.H2(id='chart-title'),
    dcc.Graph(id='pie-chart')
])


# Redraw the pie chart and its title whenever the dropdown value changes
@app.callback(
    [dash.dependencies.Output('pie-chart', 'figure'),
     dash.dependencies.Output('chart-title', 'children')],
    [dash.dependencies.Input('dataset-dropdown', 'value')]
)
def update_pie_chart(dataset):
    
    if dataset == 'df1':
        df = df1
        dataset_name = 'Happiness 2020'
    else:
        df = df2
        dataset_name = 'Happiness 2022'
    
    
    # Average the last seven columns (the residual plus the six "Explained by" factors)
    mean_values = df.iloc[:, -7:].mean(axis=0)
    labels = mean_values.index
    values = mean_values.values
    
    
    fig = px.pie(values=values, names=labels, hole=0.5)
    
    
    title = f"Distribution of each happiness factor - {dataset_name}"
    
    return fig, title

# Run the app
if __name__ == '__main__':
    app.run_server(port=8051, debug=True)

> _Figure 3: This interactive pie chart shows the distribution of the mean values of each happiness factor. The happiness score is made up of seven factors, which can be seen in the legend. The dropdown lets you switch between 2020 and 2022._
>
>

From the visualisation above we can conclude that the share of almost every factor in the world happiness rate decreased slightly, while the share of GDP per capita increased by 9%. This may explain why the world happiness rate did not change much.

## Perspective 2

>**Overall happiness rates will decrease when the inflation gets higher and the social and health factors will play a bigger role in the happiness rates.** 

Economic well-being and happiness are positively correlated. By examining the relationship between inflation and happiness scores, we can observe that countries experiencing lower inflation rates tend to have higher happiness scores. This suggests that maintaining low inflation can contribute to the overall well-being and happiness of a population.

### Happiness distribution

First, it is necessary to visualize all happiness scores in a histogram that counts how often each happiness score occurs. This makes it easier to see the difference between the happiness scores in 2020 and in 2022.

In [None]:
original_data20 = px.histogram(happiness_2020, x='Happiness score', title='Distribution of happiness scores in 2020')
original_data20.show()

> _Figure 4: Distribution of happiness scores in 2020. The x-axis represents happiness scores and the y-axis counts how many countries have a certain score._
>
>

In [None]:
original_data22 = px.histogram(happiness_2022, x='Happiness score', title='Distribution of happiness scores in 2022')
original_data22.show()

> _Figure 5: The same distribution as Figure 4, but now it represents the values for 2022_
>
>

### Inflation distribution

To examine the distribution of inflation in 2020 and 2022, we divide all selected countries into three categories based on their inflation rate (low, medium and high) using the `pd.cut` function. With `pd.cut`, we specify three equal-width bins over the range of inflation values to see the distribution of low, medium and high inflation.
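As a minimal sketch of how `pd.cut` assigns these categories, the example below bins six made-up index values for hypothetical countries A to F (not the project's data). Passing `retbins=True` also returns the bin edges, which is useful for checking which value range each label covers.

```python
import pandas as pd

# Illustrative CPI index values for hypothetical countries (not the project's data)
values = pd.Series(
    [100.0, 104.0, 108.0, 115.0, 121.0, 130.0],
    index=["A", "B", "C", "D", "E", "F"],
)

# bins=3 splits the range [min, max] into three intervals of equal width;
# retbins=True additionally returns the computed bin edges
categories, edges = pd.cut(
    values,
    bins=3,
    right=True,
    labels=["Low", "Medium", "High"],
    retbins=True,
)

print(edges)
print(categories.value_counts(sort=False))
```

Note that the bins are equal in width, not in the number of countries they contain. `pd.qcut` is the variant that aims for equally filled bins.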

In [None]:
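inflation_tot = pd.read_csv('inflation_tot.csv')
# .copy() creates independent DataFrames, so adding the 'cut' column later does not raise a SettingWithCopyWarning
inflation2020 = inflation_tot[inflation_tot['TIME'] == 2020].copy()
inflation2022 = inflation_tot[inflation_tot['TIME'] == 2022].copy()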

In [None]:
inflation_original20 = px.histogram(inflation2020, x='Value', title='Distribution of inflation rates in 2020')
inflation_original22 = px.histogram(inflation2022, x='Value', title='Distribution of inflation rates in 2022')

# Cut
inflation2020['cut'] = pd.cut(inflation2020['Value'], bins=3, right=True, labels=['Low', 'Medium', 'High'])
fig_cut20 = px.histogram(inflation2020, x="cut", title='Distribution of inflation 2020')

# Cut
inflation2022['cut'] = pd.cut(inflation2022['Value'], bins=3, right=True, labels=['Low', 'Medium', 'High'])
fig_cut22 = px.histogram(inflation2022, x="cut", title='Distribution of inflation in 2022')

inflation_original20.show()
inflation_original22.show()

> _Figure 6: The two histograms above show the distribution of inflation rates: the x-axis shows the inflation value and the y-axis counts how many countries fall in each range._

Looking at the difference between the general distributions of 2020 and 2022 in the first two graphs, we can see that 6 countries had an inflation value between 100 and 109 in 2020. In 2022, 5 countries had risen to the value range of 110-119.99. Are those the same countries? To answer this question, let's visualise our bins.

In [None]:
fig_cut20.show()
fig_cut22.show()

> _Figure 7: The histograms above visualise the bins that are created with the .cut function. The categories for the cut represent inflation and can be 'low','medium','high'. The count on the y-axis represents how many countries fall into each category._
>

Noticeably, the medium and high categories are equal in size in 2022, and the distribution over all categories is much more even. In 2020, medium and high were also equal in size for these countries.

Below we print the bins to see which countries fall into the low and high categories, on the expectation that the differences between low and high will be more noticeable than those between low and medium or between medium and high. This way we are able to see how each country moves from high to low and vice versa.

In [None]:
print(inflation2020[inflation2020['cut']== 'Low'])
print(inflation2020[inflation2020['cut']== 'High'])

## Low and high inflation in 2020 and 2022
Countries that fell into the category of low inflation in 2020 were Canada, the Netherlands, New Zealand, Portugal, Switzerland and Saudi Arabia. Countries that fell into the category of high inflation in 2020 were India and South Africa.

Let's take a look at these categories in 2022:

In [None]:
print(inflation2022[inflation2022['cut']== 'Low'])
print(inflation2022[inflation2022['cut']== 'High'])

In 2022, the countries that fell into the category of low inflation were: Portugal, Switzerland, China and Saudi Arabia. The countries that fell into the category of high inflation were: Chile, India and South Africa.

Before we can make any statements, we have to consider the bin ranges that were created with the `pd.cut` function. In 2020, the bin ranges are 100-110 (low), 110-120 (medium) and 120-130 (high). In 2022, the bin ranges are 100-115 (low), 115-130 (medium) and 130-145 (high). The range for 2022 is larger, because the values for inflation are more varied.

Comparing the two lists, the countries that moved from the 'low' category to a higher category are Canada, the Netherlands and New Zealand. No countries moved from the 'high' inflation category to a lower category, but Chile moved into this category in 2022.
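Instead of comparing printed tables by eye, the category changes can be listed programmatically. The sketch below uses made-up categories for hypothetical countries `AAA` to `DDD`; with the real data, the two series would be the `cut` columns of the 2020 and 2022 tables, indexed by country code.

```python
import pandas as pd

# Illustrative categories for hypothetical countries (not the project's results)
cat_2020 = pd.Series(
    {"AAA": "Low", "BBB": "Low", "CCC": "Medium", "DDD": "High"}, name="cut_2020"
)
cat_2022 = pd.Series(
    {"AAA": "Low", "BBB": "Medium", "CCC": "High", "DDD": "High"}, name="cut_2022"
)

# Put both years side by side, aligned on the country code
moves = pd.concat([cat_2020, cat_2022], axis=1)

# Keep only the countries whose category changed
moved = moves[moves["cut_2020"] != moves["cut_2022"]]
print(moved)

# Transition table: rows are 2020 categories, columns are 2022 categories
print(pd.crosstab(moves["cut_2020"], moves["cut_2022"]))
```

Because the bin ranges differ per year, a change of category here means a change relative to the other countries in that year, not necessarily a change in absolute inflation.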

Let's further analyze these countries in comparison to their happiness scores.

In [None]:
worldhappiness_2020 = happiness_2020.drop(['Unnamed: 0'], axis=1)
worldhappiness_2022 = happiness_2022.drop(['Unnamed: 0'], axis=1)


# save countries in df
low2020 = inflation2020[inflation2020['cut']== 'Low']
high2020 = inflation2020[inflation2020['cut']== 'High']

low2022 = inflation2022[inflation2022['cut']== 'Low']
high2022 = inflation2022[inflation2022['cut']== 'High']

infhap20 = pd.concat([worldhappiness_2020.set_index('Country name'), inflation2020.set_index('LOCATION')], axis = 1)
infhap22 = pd.concat([worldhappiness_2022.set_index('Country'), inflation2022.set_index('LOCATION')], axis = 1)

infhap20 = infhap20.filter(items=['Happiness score', 'Dystopia + residual',
       'Explained by: Log GDP per capita', 'Explained by: Social support',
       'Explained by: Healthy life expectancy',
       'Explained by: Freedom to make life choices',
       'Explained by: Generosity', 'Explained by: Perceptions of corruption', 'Value',
       'cut', 'TIME'])
infhap22 = infhap22.filter(items=['Happiness score', 'Dystopia (1.83) + residual',
       'Explained by: GDP per capita', 'Explained by: Social support',
       'Explained by: Healthy life expectancy',
       'Explained by: Freedom to make life choices',
       'Explained by: Generosity', 'Explained by: Perceptions of corruption', 'Value', 'cut','TIME'])

df = pd.concat([infhap20,infhap22])

In [None]:

fig = px.scatter(df, x="Value", y="Happiness score", color=df.index, facet_col="TIME", facet_row="cut", title='Correlation between inflation and happiness scores in 2020 and 2022')
fig.show()

> _Figure 8: A scatterplot matrix that maps the correlation between inflation (x-axis) and happiness (y-axis) based on each category that a country's inflation rate falls in._
>
>


In 2020, there is a strong positive correlation between inflation and happiness score for the countries in the 'low' inflation category. This is counterintuitive to us: it seems that countries within this category with relatively higher inflation also have a higher happiness score. The countries in the 'medium' inflation category show the same correlation, while countries in the 'high' inflation category have relatively lower happiness scores.

In 2022, in the 'low' inflation category, the opposite can be seen: there is a strong negative correlation between inflation and happiness. The countries within this category with a relatively higher inflation value have relatively lower happiness scores. The same applies to the 'high' inflation category. Only the 'medium' category shows the opposite.

The fact that inflation, in most cases, does not immediately affect the happiness score of a country indicates that, besides inflation, other factors (possibly non-economic ones) contributed to happiness and outweighed the effects of inflation.

Overall, however, the countries (points) in the 'low' inflation category are clustered between happiness scores of 5 and 8, while the happiness scores of countries in the 'high' inflation category are clustered between 3 and 5 (2020) and between 3 and 6.5 (2022). This suggests a negative relationship between inflation and happiness across the categories: countries with higher inflation tend to have lower happiness scores.

## Calculating the correlation coefficient between inflation and all happiness factors
The happiness score is made up of seven independent factors. We want to see what kind of correlation exists between inflation and all of these happiness factors.

In [None]:
infhap20.corr(method='pearson', min_periods=1, numeric_only=True).style.background_gradient(cmap="Blues")

We calculated the Pearson correlation coefficient for the combined 'inflation' and 'happiness' dataframes for both 2020 and 2022. 'Value' is the column that holds the inflation value for each country. As you can see, in 2020 there is a negative correlation between the inflation value and the happiness score (and most of its factors). Between inflation and the happiness explained by generosity, there is no correlation at all.
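To show what a single cell of such a correlation matrix contains, the sketch below computes Pearson's r by hand and compares it with pandas. Inflation values are not displayed for all five countries, so the example pairs the happiness score with the GDP factor column from the five rows of `happiness_2020.csv` shown earlier. With only five rows, the resulting number is an illustration and not a finding.

```python
import numpy as np
import pandas as pd

# The five rows of happiness_2020.csv displayed earlier
sample = pd.DataFrame(
    {
        "Happiness score": [7.5599, 7.4489, 7.2996, 7.2321, 6.4065],
        "Explained by: Log GDP per capita": [1.390774, 1.338946, 1.242318, 1.301648, 1.334329],
    },
    index=["CHE", "NLD", "NZL", "CAN", "SAU"],
)

x = sample["Explained by: Log GDP per capita"]
y = sample["Happiness score"]

# Pearson's r: the co-variation of x and y divided by the product of their spreads
dx, dy = x - x.mean(), y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# pandas computes the same quantity for one pair of columns
r_pandas = x.corr(y, method="pearson")

print(round(r_manual, 3), round(r_pandas, 3))
```

`DataFrame.corr` repeats this calculation for every pair of numeric columns, which is what the coloured tables show.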

In [None]:
infhap22.corr(method='pearson', min_periods=1, numeric_only=True).style.background_gradient(cmap="Blues")

Between 2020 and 2022, there are no drastic changes in the correlation coefficients between inflation and the happiness factors. Each correlation shifts slightly in the positive direction in 2022, but not by much. This would mean that happiness is related to inflation, but that high inflation does not affect happiness as negatively as predicted, relative to the correlation coefficients in 2020. Other factors must have affected happiness.

## GDP per capita and happiness
We will explore the correlation between GDP per capita and happiness score, since GDP per capita is the factor most closely related to economic wellbeing.

We will argue that countries with higher GDP per capita may have better economic opportunities, access to resources, and quality of life, which could positively impact happiness levels.

In [None]:
df_gdp = pd.read_csv('happiness_2020.csv')

selected_countries = ['CHE', 'NLD', 'NZL', 'CAN', 'SAU', 'CHL', 'PRT', 'CHN', 'ZAF', 'IND']
df_filtered = df_gdp[df_gdp['Country name'].isin(selected_countries)]

# Create map
fig = px.choropleth(df_filtered,
                    locations='Country name',
                    locationmode='ISO-3',
                    color='Explained by: Log GDP per capita',
                    color_continuous_scale='blues',
                    title='GDP by Country')

# Update layout
fig.update_layout(
    geo=dict(
        showframe=False,
        showcoastlines=False,
        projection_type='equirectangular',
        scope='world',
    ),
    coloraxis_colorbar=dict(
        title='GDP',
        thickness=20,
        len=0.5,
        xanchor='right',
        yanchor='middle'
    )
)

fig.show()

> _Figure 9: The map above visualizes the relative GDP per country in 2020 for the 10 countries. The year 2020 was picked as the same graph in 2022 looked roughly the same, so an extra visualization wouldn't give new information._
>
>

#### Findings:
- Switzerland, the Netherlands & Saudi Arabia have the highest GDP

- India, South Africa & China have the lowest GDP


Now, let's compare this map to a map that visualizes inflation rates:

In [None]:
df_inflation = pd.read_csv('inflation.csv')

# Filter the inflation data 
selected_countries = ['CHE', 'NLD', 'NZL', 'CAN', 'SAU', 'CHL', 'PRT', 'CHN', 'ZAF', 'IND']
df_filtered_inflation = df_inflation[df_inflation['LOCATION'].isin(selected_countries)]

# Create map for inflation
fig = px.choropleth(df_filtered_inflation,
                    locations='LOCATION',
                    locationmode='ISO-3',
                    color='Value',
                    color_continuous_scale='Reds',
                    title='Inflation by Country')

# Update layout
fig.update_layout(
    geo=dict(
        showframe=False,
        showcoastlines=False,
        projection_type='equirectangular',
        scope='world'
    ),
    coloraxis_colorbar=dict(
        title='Inflation',
        thickness=20,
        len=0.5,
        xanchor='right',
        yanchor='middle'
    )
)

fig.show()

> _Figure 10: The map above visualizes the relative inflation per country._

#### Findings:
- India, South Africa and Chile have the highest inflation

- Canada, Portugal and Switzerland have the lowest inflation



## Reflection

Working on this project was an overall positive experience. There was some confusion at the start of the course with regard to the groups, where we were not sure if it was possible to form this group due to our different sub-groups. Fortunately, we were allowed to form a group together. We have learned from past projects that a strong group is the key to a successful result and this is why we decided on this group. From beginning to end there was strong communication and we could rely on each other for valuable feedback.

We began this project by deciding on a topic. This happened fairly easily and we were content with the topic of inflation and its correlation to happiness. After this the two perspectives of our project were set. We then divided the tasks based on the required results, and got to work. The tasks were evenly divided and we were able to help each other if necessary. There was some confusion around the use of GitHub, which unfortunately led to us not being able to hand in the draft version correctly, thus losing some points. We quickly learned from our mistakes and went on to focus on the next task. We did a peer review in the next lesson, which was incredibly helpful for us. This gave us the opportunity to reflect on our own graphics and receive feedback on them from outside our group. We took this feedback very seriously and started modifying our graphs to better fit the desired result. The peer review also gave us the opportunity to look at another group's graphics and use them as inspiration for our own project. The next week we made the final changes to our graphics. In some cases we could not figure out the solution by ourselves, and for this we used generative AI (ChatGPT) to help us complete the graphics. When the graphics were finished, we set out to answer our perspectives using the data we acquired from the graphics.

We can all agree that the teamwork in our group was splendid and we are more than satisfied with the results. Whenever there was trouble, we quickly came to each other's help, which was possible due to the strong communication in our group. There were few disagreements about the project, and if there ever were, they were quickly resolved.

The only problem we did have was the absence of TAs in some of our lessons, which led to us not being able to receive any feedback. We believe this held us back from improving our project further. Overall, working on this project was a more than satisfactory experience.

## Work distribution

We distributed the jobs to be done as in the table below:

| Who? | Tasks |
| --- | --- |
| Evan | Visualizations, GitHub setup |
| Joep | Visualizations |
| Lotte | Data preprocessing, documentation, visualizations |
| Robin | Data preprocessing, documentation, visualizations, GitHub Pages |

## References
- OECD Economic Outlook. (2023). OECD iLibrary. https://www.oecd-ilibrary.org/economics/oecd-economic-outlook_16097408
- World Happiness Report Data Dashboard | The World Happiness Report. (n.d.). https://worldhappiness.report/data/
- Orac, R. (2022, January 5). The Fastest Way to Visualize Correlation in Python - Towards Data Science. Medium. https://towardsdatascience.com/the-fastest-way-to-visualize-correlation-in-python-ce10ed533346
- GeeksforGeeks. (2022). How to use pandas cut and qcut. GeeksforGeeks. https://www.geeksforgeeks.org/how-to-use-pandas-cut-and-qcut/
    