In [1]:
pip install dash


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip available: [0m[31;49m22.3.1[0m[39;49m -> [0m[32;49m23.1.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip3.11 install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


In [2]:
from IPython.display import HTML, Javascript, display

def initialize():
    display(HTML(
        '''
            <script>
                code_show = false;
                function restart_run_all(){
                    IPython.notebook.kernel.restart();
                    setTimeout(function(){
                        IPython.notebook.execute_all_cells();
                    }, 2000)
                }
                function code_toggle() {
                    if (code_show) {
                        $('div.input').hide(200);
                    } else {
                        $('div.input').show(200);
                    }
                    code_show = !code_show
                }
            </script>
            <button onclick="code_toggle()">Hide or show the code</button>
            <button onclick="restart_run_all()">'Restart and Run all'</button>
        '''
    ))
initialize()

In [15]:
import pandas as pd
import plotly.graph_objs as go
import plotly.express as px
import matplotlib.pyplot as plt
import numpy as np

import dash
from dash import dcc
from dash import html

20-06-2023

# Correlation between Happiness and Economic Factors 
Information Visualization: group project Draft 

Group: B4

| Student name | student number | 
| --- | --- | 
| Evan Lont | 14729210 | 
| Joep Haanen | 14657368 |
| Lotte te Kulve | 14648911 | 
| Robin Kuipers | 14273810 |

## Introduction

In the last few years, a lot has happened in the world. From the end of 2019 to the first half of 2022, the world went through a global pandemic. During and after the pandemic, inflation rates rose to their highest levels in almost 40 years (OECD Economic Outlook, 2023). Additionally, at the beginning of 2022, a war between Russia and Ukraine broke out. All of these events could have a significant influence on the world happiness rate.

The analysis will focus on the correlation between the world happiness rate and economic factors, such as inflation. This choice stems from our personal experiences and observations in relation to inflation, as well as the prevalent discussions surrounding it. In recent years, we have witnessed numerous discussions in the media and newspapers regarding the challenges associated with inflation and the potential risks of a continuously rising inflation rate. These impacts have also been evident in our own economic environment. We have personally observed increases in the prices of everyday items such as groceries, dining out, and even basic services like haircuts. Inflation has become a significant topic of conversation that affects us all. Therefore, we have directed our attention to exploring the correlation between inflation and the happiness of people worldwide.

The "World Happiness Report" dataset and relevant economic indicators such as GDP per capita, inflation rates, and consumer price index (CPI) will be used to investigate the relationship between subjective well-being and economic stability. Through data analysis, the aim is to determine whether countries with higher economic indicators tend to exhibit higher happiness scores. This study aims to contribute to understanding how economic factors influence levels of happiness at both individual and societal levels.

### Perspectives
1. Inflation has a limited impact on happiness. 

While inflation is an important economic indicator, its influence on happiness might be overshadowed by other factors. This perspective suggests that while economic stability is crucial, it may not be the sole determinant of happiness.
    

2. Economic well-being and happiness are positively correlated. 

By examining the relationship between inflation and happiness scores, we can observe that countries experiencing lower inflation rates tend to have higher happiness scores. This suggests that maintaining low inflation can contribute to the overall well-being and happiness of a population.

## Datasets and preprocessing
For the first dataset, the World Happiness Report Dataset from the Sustainable Development Solutions Network, powered by the Gallup World Poll data, has been chosen. For the second dataset, an inflation dataset from OECD data that covers at least ten years up until 2022 was identified as meeting our requirements. Upon analyzing the two datasets, it became clear that they needed some filtering. Additionally, the inflation dataset offers the potential for intriguing visualizations because it includes inflation trends before, during, and to some extent after the COVID-19 pandemic.

### Dataset 1: World happiness report
**Source:** https://worldhappiness.report/ed/2020/#appendices-and-data & https://worldhappiness.report/ed/2022/#appendices-and-data 

**Number of records:** `20`

**Number of variables:** `12`

**Description:** As part of the data analysis, two datasets from the World Happiness Report (WHR) were used, one for 2020 and one for 2022. The WHR is an annual publication of the Sustainable Development Solutions Network and relies on data collected by the Gallup World Poll. The report is written by a group of independent experts, each with expertise in different variables that the WHR measures. It covers these variables for more than 150 countries worldwide, of which ten specific countries were chosen for analysis. The primary objective of the yearly report is to reflect a worldwide demand for more attention to happiness by inspiring governments to adopt better policies.

| Variable | Datatype | Measurement scale |
| --- | --- | --- |
| Country name | Categorical | Nominal |
| Regional indicator | Categorical | Nominal |
| Happiness score | Continuous | Interval |
| upperwhisker | Continuous | Interval |
| lowerwhisker | Continuous | Interval |
| Explained by: Log GDP per capita | Continuous | Ratio |
| Explained by: Healthy life expectancy | Continuous | Ratio |
| Explained by: Freedom to make life choices | Continuous | Ratio |
| Explained by: Generosity | Continuous | Ratio |
| Explained by: Social support | Continuous | Ratio |
| Explained by: Perceptions of corruption | Continuous | Ratio |
| Dystopia + residual | Continuous | Interval |

#### Preprocessing
**Find the data cleaning code in the ..../////** (TODO: incorporate)

For each variable we asked ourselves the following questions:

- What are the variables in the data?
- Do we need all the data points and variables?
- Are there data that are out of scope?
- Are there privacy or ethical issues in the data?
- Is it practical to process the variable that we want?

To keep the dataset manageable, the project focuses on the data for the years 2020 and 2022, because some of the values varied considerably between these years. Another reason for selecting only two years is that we want to find out how much the data can differ within such a short timeframe.
The analysis will use the variables of our ten chosen countries to draw conclusions about the relationship between the happiness score and several economic factors. These variables include ones found in the WHR, such as GDP per capita and generosity, but also external variables such as yearly inflation.

Based on the requirements for the data, the following actions were taken:
- The removal of specific columns from the world happiness dataset, including:
    - Regional indicator
    - Upperwhisker
    - Lowerwhisker
- Rearranging the columns to facilitate clear identification of the country and year under consideration.
- Selecting and retaining only the countries necessary for our analysis, while removing the rest. **TODO: explain why 10 countries!** The final selection includes:
    'Switzerland', 'Netherlands', 'New Zealand', 'Canada','Saudi Arabia', 'Chile', 'Portugal', 'China', 'South Africa', 'India'
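
The cleaning steps listed above could be sketched roughly as follows. This is a minimal illustration and not the project's actual cleaning code: the function name `clean_happiness`, the `Year` column, and the made-up country "Atlantis" with its score are assumptions introduced only for the example. The two real scores are taken from the 2022 table in this notebook.

```python
import pandas as pd

# The ten countries named in the list above.
SELECTED_COUNTRIES = [
    "Switzerland", "Netherlands", "New Zealand", "Canada", "Saudi Arabia",
    "Chile", "Portugal", "China", "South Africa", "India",
]
# Columns that the analysis does not need.
DROPPED_COLUMNS = ["Regional indicator", "upperwhisker", "lowerwhisker"]


def clean_happiness(raw: pd.DataFrame, year: int,
                    country_col: str = "Country name") -> pd.DataFrame:
    """Drop unused columns, keep the selected countries, put country and year first."""
    # errors="ignore" skips columns that a given edition of the report lacks.
    df = raw.drop(columns=DROPPED_COLUMNS, errors="ignore")
    df = df[df[country_col].isin(SELECTED_COUNTRIES)].copy()
    df["Year"] = year  # hypothetical helper column, not present in the source files
    leading = [country_col, "Year"]
    rest = [col for col in df.columns if col not in leading]
    return df[leading + rest].reset_index(drop=True)


# Small illustrative input; "Atlantis" and its values are made up.
raw = pd.DataFrame({
    "Country": ["Atlantis", "Switzerland", "India"],
    "Regional indicator": ["Nowhere", "Western Europe", "South Asia"],
    "Happiness score": [5.0, 7.512, 3.777],
    "upperwhisker": [5.1, 7.6, 3.9],  # placeholder values
    "lowerwhisker": [4.9, 7.4, 3.6],  # placeholder values
})
cleaned = clean_happiness(raw, year=2022, country_col="Country")
print(cleaned)
```

Passing the country column name as a parameter is a deliberate choice here, because the 2020 file uses `Country name` while the 2022 file uses `Country`.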


In [79]:
happiness_2020 = pd.read_csv('happiness_2020.csv')
happiness_2020.head(n=5)

Unnamed: 0.1,Unnamed: 0,Country name,Happiness score,Dystopia + residual,Explained by: Log GDP per capita,Explained by: Social support,Explained by: Healthy life expectancy,Explained by: Freedom to make life choices,Explained by: Generosity,Explained by: Perceptions of corruption
0,2,CHE,7.5599,2.350267,1.390774,1.472403,1.040533,0.628954,0.269056,0.407946
1,5,NLD,7.4489,2.352117,1.338946,1.463646,0.975675,0.613626,0.336318,0.36857
2,7,NZL,7.2996,2.128108,1.242318,1.487218,1.008138,0.64679,0.325726,0.461268
3,10,CAN,7.2321,2.195269,1.301648,1.435392,1.022502,0.644028,0.281529,0.351702
4,26,SAU,6.4065,2.203119,1.334329,1.30995,0.759818,0.548477,0.087441,0.163322


In [81]:
happiness_2022 = pd.read_csv('happiness_2022.csv')
happiness_2022.head(n=10)

Unnamed: 0.1,Unnamed: 0,Country,Happiness score,Dystopia (1.83) + residual,Explained by: GDP per capita,Explained by: Social support,Explained by: Healthy life expectancy,Explained by: Freedom to make life choices,Explained by: Generosity,Explained by: Perceptions of corruption
0,3,CHE,7.512,2.153,2.026,1.226,0.822,0.677,0.147,0.461
1,4,NLD,7.415,2.137,1.945,1.206,0.787,0.651,0.271,0.419
2,9,NZL,7.2,1.954,1.852,1.235,0.752,0.68,0.245,0.483
3,14,CAN,7.025,1.924,1.886,1.188,0.783,0.659,0.217,0.368
4,24,SAU,6.523,2.075,1.87,1.092,0.577,0.651,0.078,0.18
5,43,CHL,6.172,2.04,1.651,1.08,0.748,0.46,0.124,0.069
6,55,PRT,6.016,1.691,1.76,1.078,0.777,0.655,0.016,0.039
7,71,CHN,5.585,1.516,1.508,0.958,0.705,0.656,0.099,0.142
8,90,ZAF,5.194,1.742,1.425,1.088,0.361,0.442,0.089,0.046
9,135,IND,3.777,0.795,1.167,0.376,0.471,0.647,0.198,0.123


### Dataset 2: Inflation (CPI)

**Source:** https://data.oecd.org/price/inflation-cpi.htm

**Number of records:** `490`

**Number of variables:** `8`

**Description:** The "Inflation (CPI)" dataset from the OECD contains information on consumer price index (CPI) and inflation rates across various countries. It provides a comprehensive view of the changes in price levels for goods and services over time, allowing for the analysis and comparison of inflation rates among different economies. The dataset includes indicators such as headline inflation, core inflation, and various sub-components of CPI. It serves as a valuable resource for understanding and monitoring inflation trends at a global level.

**TODO: does something still need to be done with the text below? It was in the explanation of the happiness dataset.**

This dataset is from the Organization for Economic Cooperation and Development (OECD) and focuses on inflation, specifically the Consumer Price Index (CPI). The Consumer Price Index is a widely used measure of inflation, representing the average price change over time for a basket of goods and services commonly consumed by households.

The dataset includes CPI values for various countries and regions over time. It provides insight into how prices of goods and services have changed, serving as an important indicator of economic stability and the purchasing power of consumers.

By analyzing this dataset, we can gain a deeper understanding of inflation trends across different economies and regions. We could also compare the inflation rates of different countries to assess their economic performance, identify periods of high or low inflation, and study the potential impacts on various sectors such as investment, wages, and consumer spending.

| Variable | Datatype | Measurement scale |
| --- | --- | --- |
| Location | Categorical | Nominal |
| Regional indicator | Categorical | Nominal |
| Subject | Categorical | Nominal |
| Measure | Categorical | Nominal |
| Frequency | Continuous | Interval |
| Time | Continuous | Interval |
| Value | Continuous | Interval |
| Flag code | Categorical | Nominal |

#### Preprocessing

   **Find the data cleaning code in the ..../////** (TODO: incorporate)
- Country names were changed to abbreviations.
    
    Both datasets contained information per country, but the inflation dataset used abbreviations as values while the happiness dataset used full country names. To facilitate data comparison for specific countries, we needed to align the values either to abbreviations or full country names. We decided to use abbreviations for consistency. 
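
A minimal sketch of this alignment step is shown below. It assumes that the codes are the ISO alpha-3 codes that appear in the tables of this notebook (CHE, NLD, and so on); the helper name `to_country_codes` is hypothetical and not part of the original cleaning code.

```python
import pandas as pd

# Codes for the ten selected countries, as they appear in the cleaned data files.
COUNTRY_CODES = {
    "Switzerland": "CHE", "Netherlands": "NLD", "New Zealand": "NZL",
    "Canada": "CAN", "Saudi Arabia": "SAU", "Chile": "CHL",
    "Portugal": "PRT", "China": "CHN", "South Africa": "ZAF", "India": "IND",
}


def to_country_codes(names: pd.Series) -> pd.Series:
    """Map full country names to codes and fail loudly on unknown names."""
    codes = names.map(COUNTRY_CODES)
    missing = names[codes.isna()].unique()
    if len(missing) > 0:
        raise KeyError(f"No country code known for: {list(missing)}")
    return codes


names = pd.Series(["Canada", "Netherlands", "South Africa"])
print(to_country_codes(names).tolist())  # ['CAN', 'NLD', 'ZAF']
```

Raising an error for unmapped names, instead of silently leaving missing values, makes it harder for a country to drop out of a later comparison unnoticed.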
    

In [32]:
inflation = pd.read_csv('inflation.csv')
inflation.head(n=5)

Unnamed: 0.1,Unnamed: 0,LOCATION,INDICATOR,SUBJECT,MEASURE,TIME,Value
0,146211,CAN,CPI,TOT,IDX2015,2020,108.2104
1,146213,CAN,CPI,TOT,IDX2015,2022,119.4957
2,149430,NLD,CPI,TOT,IDX2015,2020,107.51
3,149432,NLD,CPI,TOT,IDX2015,2022,121.4267
4,149731,NZL,CPI,TOT,IDX2015,2020,107.6488


The inflation data will be split into two datasets, one per year, to make it easier to compare the two years.

In [33]:
inflation2020 = inflation[inflation['TIME'] == 2020]
inflation2022 = inflation[inflation['TIME'] == 2022]

## Perspective 1: Inflation has a minimal impact on happiness. 
While inflation is an important economic indicator, its influence on happiness might be overshadowed by other factors. This perspective suggests that while economic stability is crucial, it may not be the sole determinant of happiness. To see if this perspective is valid, three visualisations have been created.

The first visualisation illustrates the increase in the consumer price index (CPI) between 2020 and 2022 for each selected country. Each line in the graph represents one country. The visualisation shows that the index rose between 2020 and 2022 for every country. The graph also shows how high price levels are compared with 2015: the year 2015 is assigned the value 100, so an index value of 130 means that prices were 30% higher in that year than in 2015.
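
To make the reading of the index concrete, the short sketch below converts index values into percentage changes. The helper `index_to_pct_change` is a hypothetical name introduced for illustration; the two index values are those for Canada in the inflation table shown earlier.

```python
def index_to_pct_change(index_start: float, index_end: float) -> float:
    """Percentage change in the price level between two index values."""
    return (index_end / index_start - 1.0) * 100.0


# CPI for Canada with 2015 = 100, taken from the inflation table above.
cpi_2020, cpi_2022 = 108.2104, 119.4957

since_2015 = index_to_pct_change(100.0, cpi_2022)        # roughly 19.5
from_2020_to_2022 = index_to_pct_change(cpi_2020, cpi_2022)  # roughly 10.4

print(f"Prices in 2022 vs 2015: +{since_2015:.1f}%")
print(f"Prices in 2022 vs 2020: +{from_2020_to_2022:.1f}%")
```

Note that the change between 2020 and 2022 is a ratio of the two index values, not the difference between them: the index rose by about 11.3 points, which corresponds to a price increase of about 10.4%.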

In [20]:
# Define the colors
colors = ['rgb(102,194,165)', 'rgb(252,141,98)', 'rgb(141,160,203)']

# Create the layout
layout = go.Layout(
    xaxis=go.layout.XAxis(
        type='category',  # The x-axis type is categorical
        tickvals=['2020', '2022'],  # Set custom tick values
        ticktext=['2020', '2022'],  # Set custom tick labels
    ),
#     yaxis=go.layout.YAxis(
#         tickformat="%",  # Format the y-axis labels as percentage
#     ),
    width=600,
    height=600
)

# Define the data
data = []
for country in inflation2020['LOCATION'].unique():
    # Extract the data for each country
    country_data_2020 = inflation2020[inflation2020['LOCATION'] == country]
    country_data_2022 = inflation2022[inflation2022['LOCATION'] == country]
    
    # Create a trace for each country
    trace = go.Scatter(
        x=['2020', '2022'],
        y=[country_data_2020['Value'].iloc[0], country_data_2022['Value'].iloc[0]],
        mode='lines+markers',
        name=country,
#         marker=dict(color=colors)  # Randomly assign a color from the predefined colors
    )
    
    data.append(trace)

# Create the figure with data and layout
fig = go.Figure(data=data, layout=layout)

# Update the layout and labels
fig.update_layout(
    title="Consumer price index by country (2015 = 100)",
    xaxis_title="Year",
    yaxis_title="CPI (2015 = 100)",
)

# Display the graph
fig.show()

From this visualisation it can be concluded that consumer prices rose between 2020 and 2022 in every chosen country. With that said, let's look at the world happiness rates in 2020 and 2022.

The second visualisation represents the happiness rate per country in 2020 and in 2022. For every country two bars have been plotted to represent the happiness rate in the two years. The orange bars represent the year 2020 and the blue represent the year 2022. 

In [83]:
# Define the colors (ChatGPT)
colors = ['rgb(102,194,165)', 'rgb(252,141,98)', 'rgb(141,160,203)']

# Create the layout
layout = go.Layout(
    xaxis=go.layout.XAxis(
        type='category' # the x-axis type is categorical
    ),
#     yaxis = go.layout.YAxis(
#         tickformat = ',.0%', # show as percentage
#     ),
    height=400
)

year2020 = go.Bar(
    x=happiness_2020['Country name'],
    y=happiness_2020['Happiness score'], # by year 2020
    name='2020',
    marker=dict(color=colors[1]) #ChatGPT 
)
year2022 = go.Bar(
    x=happiness_2022['Country'],
    y=happiness_2022['Happiness score'],
    name='2022',
    marker=dict(color=colors[2]) #ChatGPT
)

# Create the figure
# data = [happy2020, year2020, happy2022, year2022]
data = [year2020, year2022]
fig = go.Figure(data=data, layout=layout)

# labels
fig.update_layout(
    title="World happiness rate per country in 2020 vs 2022",
    xaxis_title="Country",
    yaxis_title="Happiness Rate")
    
fig.show()

As shown in the visualisation above, the happiness rate per country in 2022 did not significantly change compared to the happiness rate in 2020. Because of this, the aim of this perspective is to explore the underlying factors contributing to the happiness rate and assess whether their distribution varied between the two years. The third visualisation has been made for this purpose.
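
The size of the change can also be checked numerically. The sketch below uses only the five countries that appear in both tables shown earlier, so it is an illustration and not the full ten-country comparison; the column names `score_2020`, `score_2022`, `change`, and `change_pct` are introduced here for the example.

```python
import pandas as pd

# Happiness scores copied from the two tables above (first five countries).
scores = pd.DataFrame({
    "country": ["CHE", "NLD", "NZL", "CAN", "SAU"],
    "score_2020": [7.5599, 7.4489, 7.2996, 7.2321, 6.4065],
    "score_2022": [7.512, 7.415, 7.2, 7.025, 6.523],
})

# Absolute and relative change per country.
scores["change"] = scores["score_2022"] - scores["score_2020"]
scores["change_pct"] = scores["change"] / scores["score_2020"] * 100

print(scores.round(3))
```

For these five countries the largest shift is Canada's drop of about 0.21 points, which is small compared with scores of around 7. This is a descriptive comparison only; it is not a statistical significance test.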

The third visualisation illustrates the distribution of the underlying factors that make up the happiness score per year. The mean of every factor column was calculated to create an average distribution per year. This visualisation makes it possible to analyse how the distribution of the happiness factors changes as inflation rises. The dropdown can be used to switch between the two years.

In [75]:
# Load the datasets
df1 = pd.read_csv('happiness_2020.csv')
df2 = pd.read_csv('happiness_2022.csv')

# Initialize the Dash app
app = dash.Dash(__name__)

# Define the layout of the app
app.layout = html.Div([
    dcc.Dropdown(
        id='dataset-dropdown',
        options=[
            {'label': 'Happiness 2020', 'value': 'df1'},
            {'label': 'Happiness 2022', 'value': 'df2'}
        ],
        value='df1',
    ),
    html.H2(id='chart-title'),
    dcc.Graph(id='pie-chart')
])

# Define the callback function to update the pie chart
@app.callback(
    [dash.dependencies.Output('pie-chart', 'figure'),
     dash.dependencies.Output('chart-title', 'children')],
    [dash.dependencies.Input('dataset-dropdown', 'value')]
)
def update_pie_chart(dataset):
    # Determine the selected dataframe based on the dropdown value
    if dataset == 'df1':
        df = df1
        dataset_name = 'Happiness 2020'
    else:
        df = df2
        dataset_name = 'Happiness 2022'
    
    # Calculate the mean of the columns for the selected dataframe
    mean_values = df.iloc[:, -7:].mean(axis=0)
    labels = mean_values.index
    values = mean_values.values
    
    # Create the pie chart figure
    fig = px.pie(values=values, names=labels, hole=0.5)
    
    # Set the chart title
    title = f"Distribution of each happiness factor - {dataset_name}"
    
    return fig, title

# Run the app
if __name__ == '__main__':
    app.run_server(debug=True)


From the visualisation above it can be concluded that the share of almost every factor in the world happiness rate decreased slightly, while the share of GDP per capita increased by about 9%. Because these shifts offset each other, the world happiness rate did not change significantly.
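
The shift in shares can be computed directly instead of being read from the chart. The sketch below does this for Switzerland only, using the values from the tables shown earlier; the notebook's chart uses the mean over all ten countries, so the numbers will differ somewhat. The short factor names and the helper `factor_shares` are assumptions made for the example. Note also that the GDP column is named "Log GDP per capita" in the 2020 file and "GDP per capita" in the 2022 file, so treating the two as directly comparable is an assumption.

```python
import pandas as pd

# Short labels for the seven components of the happiness score (hypothetical names).
FACTORS = ["residual", "gdp", "social", "health", "freedom", "generosity", "corruption"]


def factor_shares(values: pd.Series) -> pd.Series:
    """Return each factor's share of the total score, in percent."""
    return values / values.sum() * 100


# Switzerland (CHE), copied from the 2020 and 2022 tables above.
che_2020 = pd.Series(
    [2.350267, 1.390774, 1.472403, 1.040533, 0.628954, 0.269056, 0.407946],
    index=FACTORS,
)
che_2022 = pd.Series(
    [2.153, 2.026, 1.226, 0.822, 0.677, 0.147, 0.461],
    index=FACTORS,
)

# Difference between the two years, in percentage points.
shift = factor_shares(che_2022) - factor_shares(che_2020)
print(shift.round(1))
```

For Switzerland the GDP share rises by roughly 8.6 percentage points. Because the shares of each year add up to 100, the shifts always sum to zero: a larger share for one factor necessarily means smaller shares elsewhere.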

## Perspective 2

## Conclusion


## Reflection

Working on this project was an overall positive experience. There was some confusion at the start of the course with regard to the groups, where we were not sure if it was possible to form this group due to our different sub-groups. Fortunately, we were allowed to form a group together. We have learned from past projects that a strong group is the key to a successful result, and this is why we decided on this group. From beginning to end there was strong communication and we could rely on each other for valuable feedback.

We began this project by deciding on a topic. This happened fairly easily and we were content with the topic of inflation and its correlation to happiness. After this, the two perspectives of our project were set. We then divided the tasks based on the required results and got to work. The tasks were evenly divided and we were able to help each other if necessary. There was some confusion around the use of GitHub, which unfortunately led to us not being able to hand in the draft version correctly, thus losing some points. We quickly learned from our mistakes and moved on to the next task. We did a peer review in the next lesson, which was incredibly helpful for us. This gave us the opportunity to reflect on our own graphics and receive feedback on them from outside our group. We took this feedback very seriously and started modifying our graphs to better fit the desired result. The peer review also gave us the opportunity to look at another group's graphics and use them as inspiration for our own project. The next week we made the final changes to our graphics. In some cases we could not figure out the solution by ourselves, and for this we used generative AI (ChatGPT) to help us complete the graphics. When the graphics were finished, we set out to answer our perspectives using the data we acquired from the graphics.

We can all agree that the teamwork in our group was splendid and we are more than satisfied with the results. Whenever there was trouble, we quickly came to each other's help, which was possible due to the strong communication in our group. There were few disagreements about the project, and when they did occur, they were quickly resolved.

The only problem we did have was the absence of TAs in some of our lessons, which meant we were not able to receive any feedback. We believe this held us back from improving our project further. Overall, working on this project was a more than satisfactory experience.

## Work distribution

| Who? | Role | Tasks |
| --- | --- | --- |
| Evan |  | Visualizations, GitHub setup |
| Joep |  | Visualizations, reflection |
| Lotte |  | Data preprocessing, visualizations, documentation |
| Robin |  | Data preprocessing, visualizations, documentation |

## References
- OECD Economic Outlook. (2023). OECD iLibrary. https://www.oecd-ilibrary.org/economics/oecd-economic-outlook_16097408
- World Happiness Report Data Dashboard | The World Happiness Report. (n.d.). https://worldhappiness.report/data/
        
   **TODO: add a reference to ChatGPT!**