# Data Story Draft
* Hugo Krijgsman (14667851)
* Ingmar Hartman (15206149)
* Julius de Groot (14362104)


## Introduction
The relationship between freedom and happiness is a crucial area of investigation, especially for informing the development of effective public policies. In Western society, there is a prevalent belief that increased freedom directly correlates with higher levels of happiness. This belief drives significant efforts to enhance personal and political freedoms. However, the reality is more complex. There are countries where people face greater limitations in their freedoms but still report high levels of happiness.

This report uses two key datasets to explore the relationship between freedom and happiness. The first dataset, from the Human Freedom Index, measures various aspects of freedom, including personal, civil, and economic dimensions. The second dataset, from the World Happiness Report, provides a happiness score for each country together with indicators such as GDP per capita, social support, and healthy life expectancy that help explain that score.

By comparing these datasets, we aim to uncover correlations and patterns that reveal how different dimensions of freedom influence overall happiness across countries.

### Perspectives
**The more freedom citizens of a country experience, the happier they are.**

This perspective suggests that increased personal freedoms, such as freedom of speech, movement, and choice, correlate directly with higher levels of happiness among citizens. It posits that when individuals have greater autonomy and fewer restrictions, their overall well-being and satisfaction with life improve.
* When people have more control over their lives, their overall well-being increases.
* Societies with more personal freedom provide more opportunities for personal growth.
* Countries with more economic freedom have higher standards of living.

**There is no correlation between the freedom and the happiness that citizens of a country experience.**

This perspective argues that the level of personal freedom in a country does not necessarily affect the happiness of its citizens. It suggests that other factors, such as economic stability, cultural values, and social support, may play a more significant role in determining overall happiness.
* Some societies prioritize the community over personal autonomy, yet report high levels of happiness.
* Citizens might adjust their expectations and find joy in everyday life, family, and social relationships.
* Some authoritarian regimes provide economic benefits and social services, which can lead to high life satisfaction.

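### Dependencies

The notebook uses `pandas` for data handling, `plotly` for the figures, and `pycountry` to look up ISO country codes. It reads two CSV files, `happiness.csv` and `freedom.csv`, from the working directory.
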
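Before running the notebook, it can help to check that these packages are installed. The following is a minimal sketch using only the standard library; the helper `missing_packages` and the `REQUIRED` tuple are our own illustrative names, not part of any library.

```python
import importlib.util

# Top-level modules imported by this notebook
REQUIRED = ("pandas", "plotly", "pycountry")


def missing_packages(names=REQUIRED):
    """Return the names of top-level modules that cannot be found on the current path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Install these packages before running the notebook:", ", ".join(missing))
```

`find_spec` only locates a module without importing it, so the check stays cheap even for large packages such as `plotly`.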

### Imports

```python
import pandas as pd
import plotly.io as pio
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.express as px
import pycountry

# Render plotly figures inline in the notebook
from plotly.offline import init_notebook_mode
init_notebook_mode(connected=True)
```

## Datasets


### World Happiness Report
The World Happiness Report dataset on Kaggle contains data from the annual World Happiness Report, offering insights into the happiness levels of countries worldwide. The dataset includes information on several key indicators such as GDP per capita, social support, healthy life expectancy, freedom to make life choices, generosity, and perceptions of corruption. This comprehensive dataset allows for detailed analysis and comparison of the well-being and happiness of different nations, aiding researchers and analysts in understanding the factors that contribute to a country's overall happiness.

The script processes this dataset as follows:

1. The dataset is loaded as a pandas DataFrame from a CSV file named "happiness.csv".
2. The values of the "Country or region" column are copied into a new column, "Country", for consistency with the other dataset.
3. The original "Country or region" column and the "Overall rank" column are then removed from the DataFrame.
4. The "Score" column is renamed to "Happiness Score" for clarity.
5. Finally, the first few rows of the modified DataFrame are displayed to inspect the data.

These steps ensure the dataset is ready for analysis by normalizing column names and removing unnecessary columns.

```python
happiness_data = pd.read_csv("happiness.csv")
happiness_data["Country"] = happiness_data["Country or region"]
happiness_data = happiness_data.drop(columns=["Country or region", "Overall rank"])
happiness_data = happiness_data.rename(columns={"Score": "Happiness Score"})

happiness_data.head()
```

| | Happiness Score | GDP per capita | Social support | Healthy life expectancy | Freedom to make life choices | Generosity | Perceptions of corruption | Country |
|---|---|---|---|---|---|---|---|---|
| 0 | 7.769 | 1.34 | 1.587 | 0.986 | 0.596 | 0.153 | 0.393 | Finland |
| 1 | 7.6 | 1.383 | 1.573 | 0.996 | 0.592 | 0.252 | 0.41 | Denmark |
| 2 | 7.554 | 1.488 | 1.582 | 1.028 | 0.603 | 0.271 | 0.341 | Norway |
| 3 | 7.494 | 1.38 | 1.624 | 1.026 | 0.591 | 0.354 | 0.118 | Iceland |
| 4 | 7.488 | 1.396 | 1.522 | 0.999 | 0.557 | 0.322 | 0.298 | Netherlands |
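
Because the country names are copied into a new column rather than renamed, "Country" ends up as the last column of this output. A minimal sketch on a two-row toy frame makes this visible. It uses the Finland and Denmark scores shown above; the rank values and the names `raw`, `happiness`, and `alt` are illustrative only. It also shows a one-step `rename` alternative (our suggestion, not what the notebook does), which keeps the column in its original position.

```python
import pandas as pd

# Toy frame with the raw column names used in happiness.csv
raw = pd.DataFrame({
    "Overall rank": [1, 2],
    "Country or region": ["Finland", "Denmark"],
    "Score": [7.769, 7.600],
})

# The notebook's approach: copy the names into a new column (appended at the end),
# then drop the original name column and the rank
happiness = raw.copy()
happiness["Country"] = happiness["Country or region"]
happiness = happiness.drop(columns=["Country or region", "Overall rank"])
happiness = happiness.rename(columns={"Score": "Happiness Score"})

# Alternative: rename in place, which keeps "Country" where the old column was
alt = raw.drop(columns="Overall rank").rename(
    columns={"Country or region": "Country", "Score": "Happiness Score"}
)
```

Both frames hold the same data; only the column order differs.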


### Human Freedom Index

The Human Freedom Index dataset on Kaggle includes comprehensive data on the state of human freedom globally. This dataset combines indicators of personal and economic freedom, covering a wide range of topics such as rule of law, security and safety, movement, religion, association, assembly, civil society, expression and information, identity and relationships, and economic freedoms like regulation and freedom to trade internationally. This extensive dataset facilitates in-depth analysis of the factors that influence human freedom in various countries.

The script processes this dataset as follows:

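1. The dataset is loaded as a pandas DataFrame from a CSV file with `low_memory=False`, so that pandas infers each column's type from the whole file at once rather than chunk by chunk, which avoids mixed-type warnings. A list `columns_to_drop` is created to keep track of columns that will be removed.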

2. The column "countries" is renamed to "Country" for consistency, and "region" is added to the list of columns to be removed. The values in the "year" column are converted to numeric values, and the dataset is filtered to retain only data for 2019. The "year" column is then also added to the list of columns to be removed.

3. The DataFrame index is reset, and the new "index" column is added to the list of columns to be removed. All columns in `columns_to_drop` are then removed.

4. Finally, the first few rows of the modified DataFrame are displayed to inspect the data. These steps ensure a dataset ready for analysis by removing redundant columns, filtering for the year 2019, and normalizing column names.

```python
freedom_data = pd.read_csv("freedom.csv", low_memory=False)
columns_to_drop = []

# Normalize country column
freedom_data = freedom_data.rename(columns={"countries": "Country"})
columns_to_drop.extend(["region"])

# Filter out years other than 2019
freedom_data["year"] = pd.to_numeric(freedom_data["year"], errors='coerce')
freedom_data = freedom_data[freedom_data["year"] == 2019]
columns_to_drop.append("year")

# Reset and Drop
freedom_data = freedom_data.reset_index()
columns_to_drop.append("index")
freedom_data = freedom_data.drop(columns=columns_to_drop)

freedom_data.head()
```

| | Country | hf_score | hf_rank | hf_quartile | pf_rol_procedural | pf_rol_civil | pf_rol_criminal | pf_rol_vdem | pf_rol | pf_ss_homicide | ... | ef_regulation_business_adm | ef_regulation_business_burden | ef_regulation_business_start | ef_regulation_business_impartial | ef_regulation_business_licensing | ef_regulation_business_compliance | ef_regulation_business | ef_regulation | ef_score | ef_rank |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Albania | 8.07 | 42.0 | 2.0 | 5.903741 | 4.725831 | 4.047825 | 7.375907 | 4.892466 | 9.343023 | ... | 5.651538 | 6.666667 | 9.742477 | 6.2425 | 5.62194 | 7.17525 | 6.850062 | 7.700885 | 7.79 | 31.0 |
| 1 | Algeria | 5.08 | 155.0 | 4.0 | 4.913311 | 5.503872 | 4.254187 | 5.345021 | 4.890457 | 9.613372 | ... | 4.215154 | 2.222222 | 9.305002 | 2.5775 | 8.771111 | 7.029528 | 5.686753 | 5.840164 | 4.86 | 159.0 |
| 2 | Angola | 5.96 | 127.0 | 4.0 | 2.773262 | 4.352009 | 3.47895 | 5.2643 | 3.53474 | 8.590305 | ... | 2.937894 | 2.444444 | 8.730805 | 4.7025 | 7.916416 | 6.782923 | 5.58583 | 5.974672 | 5.55 | 153.0 |
| 3 | Argentina | 7.33 | 75.0 | 2.0 | 6.824288 | 5.679943 | 4.218635 | 6.570627 | 5.574289 | 8.505814 | ... | 2.714233 | 5.777778 | 9.579288 | 6.53 | 5.726521 | 6.508295 | 6.139352 | 5.994265 | 5.44 | 154.0 |
| 4 | Armenia | 8.32 | 34.0 | 1.0 | | | | 7.287006 | 7.287006 | 9.281977 | ... | 5.170406 | 5.555556 | 9.86353 | 6.9575 | 9.302574 | 7.040738 | 7.315051 | 7.819774 | 7.98 | 17.0 |
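
The year filter and the index handling can be tried on a few made-up rows (all country names, regions, and scores below are placeholders, not values from the dataset). The sketch also shows a slightly shorter variant, `reset_index(drop=True)`, which is our suggestion rather than what the notebook does: it discards the old index instead of storing it in an "index" column.

```python
import pandas as pd

# Made-up rows mimicking freedom.csv, with "year" read as text
raw = pd.DataFrame({
    "year": ["2019", "2018", "2019", "n/a"],
    "countries": ["Country A", "Country A", "Country B", "Country C"],
    "region": ["Region 1", "Region 1", "Region 2", "Region 3"],
    "hf_score": [7.0, 6.5, 5.0, 4.0],
})

freedom = raw.rename(columns={"countries": "Country"})

# errors="coerce" turns values that cannot be parsed ("n/a") into NaN;
# NaN never compares equal to 2019, so those rows are filtered out as well
freedom["year"] = pd.to_numeric(freedom["year"], errors="coerce")
freedom = freedom[freedom["year"] == 2019]

# drop=True discards the old index, so no "index" column has to be removed later
freedom = freedom.reset_index(drop=True).drop(columns=["region", "year"])
```

After these steps only the two 2019 rows remain, renumbered from 0.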


### Merging Both Datasets

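To merge the datasets, we first removed irrelevant information, keeping only the data from 2019. Because the two datasets do not always spell country names the same way, we then looked up the ISO 3166-1 alpha-3 code of each country (a unique three-letter code for every internationally recognised country) with `pycountry` and used that code as the merge key. The freedom data is left-joined with the happiness data on this code, so every country in the Human Freedom Index keeps its row, and countries without a matching happiness entry get empty happiness columns (Angola in the output below, for example). Some country names could not be matched automatically, so we mapped them by hand to a name that `pycountry` recognises (the `country_name_mapping` dictionary below) before looking up the codes.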

```python
# Map country names that differ between the datasets (or that pycountry
# cannot resolve) to a standardized name
country_name_mapping = {
    "Bahamas, The": "Bahamas",
    "Congo, Rep.": "Congo",
    "Cote d'Ivoire": "Ivory Coast",
    "Egypt, Arab Rep.": "Egypt",
    "Gambia, The": "Gambia",
    "Iran, Islamic Rep.": "Iran",
    "Korea, Rep.": "South Korea",
    "Kyrgyz Republic": "Kyrgyzstan",
    "Lao PDR": "Laos",
    "Russia": "Russian Federation",
    "Slovak Republic": "Slovakia",
    "Swaziland": "Eswatini",
    "Syria": "Syrian Arab Republic",
    "Trinidad & Tobago": "Trinidad and Tobago",
    "Venezuela, RB": "Venezuela",
    "Yemen, Rep.": "Yemen"
}


def getIsoCode(country_name):
    """Return the ISO 3166-1 alpha-3 code for a country name, or None if no match is found."""
    try:
        return pycountry.countries.search_fuzzy(country_name)[0].alpha_3
    except LookupError:
        return None


freedom_data['Country'] = freedom_data['Country'].replace(country_name_mapping)
happiness_data['Country'] = happiness_data['Country'].replace(country_name_mapping)

happiness_data["iso"] = happiness_data['Country'].apply(getIsoCode)
freedom_data["iso"] = freedom_data['Country'].apply(getIsoCode)

# pandas matches missing merge keys with each other, so drop happiness rows
# whose ISO code could not be resolved before joining
happiness_data = happiness_data.dropna(subset=["iso"])

# Keep every country from the freedom data; unmatched ones get NaN happiness values
data = pd.merge(freedom_data, happiness_data, on='iso', how='left')
data = data.rename(columns={
    'hf_score': 'Human Freedom Score',
    'ef_score': 'Economic Freedom Score',
    'pf_score': 'Personal Freedom Score',
    'Country_x': "Country"
})

data = data.drop(columns='Country_y')

data.head()
```

| | Country | Human Freedom Score | hf_rank | hf_quartile | pf_rol_procedural | pf_rol_civil | pf_rol_criminal | pf_rol_vdem | pf_rol | pf_ss_homicide | ... | Economic Freedom Score | ef_rank | iso | Happiness Score | GDP per capita | Social support | Healthy life expectancy | Freedom to make life choices | Generosity | Perceptions of corruption |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Albania | 8.07 | 42.0 | 2.0 | 5.903741 | 4.725831 | 4.047825 | 7.375907 | 4.892466 | 9.343023 | ... | 7.79 | 31.0 | ALB | 4.719 | 0.947 | 0.848 | 0.874 | 0.383 | 0.178 | 0.027 |
| 1 | Algeria | 5.08 | 155.0 | 4.0 | 4.913311 | 5.503872 | 4.254187 | 5.345021 | 4.890457 | 9.613372 | ... | 4.86 | 159.0 | DZA | 5.211 | 1.002 | 1.16 | 0.785 | 0.086 | 0.073 | 0.114 |
| 2 | Angola | 5.96 | 127.0 | 4.0 | 2.773262 | 4.352009 | 3.47895 | 5.2643 | 3.53474 | 8.590305 | ... | 5.55 | 153.0 | AGO | | | | | | | |
| 3 | Argentina | 7.33 | 75.0 | 2.0 | 6.824288 | 5.679943 | 4.218635 | 6.570627 | 5.574289 | 8.505814 | ... | 5.44 | 154.0 | ARG | 6.086 | 1.092 | 1.432 | 0.881 | 0.471 | 0.066 | 0.05 |
| 4 | Armenia | 8.32 | 34.0 | 1.0 | | | | 7.287006 | 7.287006 | 9.281977 | ... | 7.98 | 17.0 | ARM | 4.559 | 0.85 | 1.055 | 0.815 | 0.283 | 0.095 | 0.064 |
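
A left join on ISO codes is only as reliable as the codes themselves. The sketch below, on made-up rows, assumes a failed lookup is stored as an empty string: equal keys always match, so every unresolved freedom row picks up the score of an unresolved happiness row. The pandas `merge` documentation notes that missing keys (`NaN`/`None`) are also matched against each other, so such rows are best dropped from the right-hand side either way. `validate` and `indicator` are standard `pd.merge` parameters; the frame names and countries "X", "Y", "Z" are illustrative.

```python
import pandas as pd

# Made-up data; "" marks a country whose ISO code could not be resolved
freedom = pd.DataFrame({"Country": ["Albania", "Country X", "Country Y"],
                        "iso": ["ALB", "", ""]})
happiness = pd.DataFrame({"Country": ["Albania", "Country Z"],
                          "iso": ["ALB", ""],
                          "Happiness Score": [4.719, 5.0]})

# Naive left merge: the empty keys match each other, so X and Y inherit Z's score
naive = pd.merge(freedom, happiness, on="iso", how="left")

# Safer: drop unresolved keys on the right, require unique right-hand keys,
# and record which side each row came from
resolved = happiness[happiness["iso"] != ""]
safe = pd.merge(freedom, resolved, on="iso", how="left",
                validate="many_to_one", indicator=True)

# Countries from the freedom data that found no happiness entry
unmatched = safe.loc[safe["_merge"] == "left_only", "Country_x"].tolist()
```

Printing `unmatched` for the real data is a quick way to see which names still need an entry in `country_name_mapping`.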


```python
fig = go.Figure()

# Add freedom scores bar
fig.add_trace(go.Bar(
    x=data['Country'],
    y=data['Human Freedom Score'],
    name='Human Freedom Score',
    marker_color='blue'
))

# Add happiness scores bar
fig.add_trace(go.Bar(
    x=data['Country'],
    y=data['Happiness Score'],
    name='Happiness Score',
    marker_color='green'
))

# Customize the layout
fig.update_layout(
    title='Freedom and Happiness Scores Across Countries',
    xaxis=dict(title='Country'),
    yaxis=dict(title='Score'),
    barmode='group',  # Grouped bars
    legend=dict(title='Scores'),
    template='plotly'
)

# Show the plot
fig.show()
```

*This is a draft; it is more about the idea than the look of the graph. We may put the countries on the y-axis instead, or split the chart into multiple graphs.*
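
Both options are easier with the scores in long format, one row per country and score type, which is the shape plotly express expects for grouped bars. A minimal sketch, using a three-country stand-in for `data` with values from the merged table; `long_scores`, `chunk_size`, and `chunks` are hypothetical names, and the plotting call is left as a comment.

```python
import pandas as pd

# Small stand-in for the merged `data` frame (values from the merged table)
data = pd.DataFrame({
    "Country": ["Albania", "Algeria", "Argentina"],
    "Human Freedom Score": [8.07, 5.08, 7.33],
    "Happiness Score": [4.719, 5.211, 6.086],
})

# Long format: one row per (country, score type) pair, sorted by happiness
long_scores = (
    data.sort_values("Happiness Score")
    .melt(id_vars="Country",
          value_vars=["Human Freedom Score", "Happiness Score"],
          var_name="Score type", value_name="Score")
)

# Splitting the countries into chunks gives one smaller chart per chunk
chunk_size = 2  # hypothetical; a larger value would suit the full dataset
chunks = [data.iloc[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

# With countries on the y-axis, a chart could then be drawn with, for example:
# px.bar(long_scores, x="Score", y="Country", color="Score type",
#        barmode="group", orientation="h")
```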

```python
# Scatter plots of each freedom score against the happiness score
freedom_scores = [
    ("Human Freedom Score", "red"),
    ("Economic Freedom Score", "green"),
    ("Personal Freedom Score", "blue"),
]

fig = make_subplots(rows=1, cols=3, subplot_titles=[name for name, _ in freedom_scores])

for col, (name, colour) in enumerate(freedom_scores, start=1):
    fig.add_trace(
        go.Scatter(
            x=data[name], y=data["Happiness Score"],
            mode='markers',
            marker=dict(size=4, color=colour, opacity=0.7),
            name=name,
        ),
        row=1, col=col,
    )

# The y variable is the same in every panel, so label only the first y-axis
fig.update_yaxes(title_text="Happiness Score", row=1, col=1)
fig.update_layout(height=400, width=1200, title_text="Freedom Scores Against Happiness")

# Show the plot
fig.show()
```
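
The scatter plots show each relationship visually; a number per panel makes them easier to compare. A minimal sketch with a hypothetical helper, `summarize_relationship`, that computes Pearson's r and a least-squares line for one freedom score against happiness. The toy data is made up so that the result is easy to check: it lies exactly on y = 0.5x + 1.5, with one missing value.

```python
import numpy as np
import pandas as pd


def summarize_relationship(df, x_col, y_col="Happiness Score"):
    """Return Pearson's r and a least-squares line (slope, intercept) for x_col against y_col."""
    # Countries missing either score are left out of both calculations
    pair = df[[x_col, y_col]].dropna()
    r = pair[x_col].corr(pair[y_col])  # Pearson correlation (pandas default)
    # np.polyfit with deg=1 returns the coefficients as [slope, intercept]
    slope, intercept = np.polyfit(pair[x_col], pair[y_col], deg=1)
    return r, slope, intercept


# Made-up data on the line y = 0.5 * x + 1.5, with one missing happiness value
toy = pd.DataFrame({
    "Human Freedom Score": [5.0, 6.0, 7.0, 8.0, 9.0],
    "Happiness Score": [4.0, 4.5, np.nan, 5.5, 6.0],
})
r, slope, intercept = summarize_relationship(toy, "Human Freedom Score")
```

On the real `data`, the slope and intercept could also be drawn as a trend line in each subplot, for instance as an extra `go.Scatter` trace with `mode="lines"`.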

```python
# Choropleth maps of the Human Freedom Score and the Happiness Score.
# `locations` holds ISO 3166-1 alpha-3 codes, plotly's default location mode.
fig = px.choropleth(data, locations="iso",
                    title='Freedom around the world',
                    color="Human Freedom Score",
                    hover_name="Country",
                    hover_data=['iso'],
                    color_continuous_scale=px.colors.sequential.Redor_r)

fig_2 = px.choropleth(data, locations="iso",
                      title='Happiness',
                      color="Happiness Score",
                      hover_name="Country",
                      hover_data=['iso'],
                      color_continuous_scale=px.colors.sequential.Redor_r)

captions = [
    (fig,
     'Global Distribution of Human Freedom Scores: A map showing the overall level of human freedom<br>'
     '(personal and economic) in each country, with darker red for lower scores and lighter shades for higher scores.'),
    (fig_2,
     'Global Distribution of Happiness Scores: A map showing the World Happiness Report score<br>'
     'of each country, with darker red for lower scores and lighter shades for higher scores.'),
]

for figure, caption in captions:
    figure.update_layout(width=1000, height=500)  # Adjust the size as needed
    figure.add_annotation(x=0.5, y=-0.2, xref='paper', yref='paper',
                          showarrow=False, text=caption, font=dict(size=10),
                          align='center', width=700)
    figure.show()
```



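```python
columns_to_use = ["Human Freedom Score", "Economic Freedom Score", "Personal Freedom Score",
                  "Happiness Score", "Healthy life expectancy", "Perceptions of corruption"]
# Pearson correlation; pandas excludes missing values pairwise
corr_matrix = data[columns_to_use].corr()

# Fix the colour range to the full span of r, from -1 to 1
fig_corr_matrix = px.imshow(corr_matrix, zmin=-1, zmax=1, color_continuous_scale="RdBu",
                            title='Correlation Matrix of Freedom, Happiness, and Health Indicators')
fig_corr_matrix.update_layout(width=1000, height=500)  # Adjust the size as needed
fig_corr_matrix.add_annotation(
    x=0.5, y=-0.2, xref='paper', yref='paper', showarrow=False,
    text=('This heatmap illustrates the relationships between measures of freedom (human, economic, and personal),<br>'
          'happiness scores, healthy life expectancy, and perceptions of corruption, with correlation coefficients<br>'
          'ranging from -1 (perfect negative correlation) through 0 (no correlation) to 1 (perfect positive correlation).'),
    font=dict(size=10), align='center', width=700,
)
fig_corr_matrix.show()
```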
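
The matrix uses the Pearson coefficient, pandas' default, which measures linear association. If happiness rises with freedom in a consistent direction but not along a straight line, a rank-based coefficient such as Spearman's can describe the relationship better. A minimal comparison on made-up numbers; the column names `Freedom` and `Happiness` are placeholders. `DataFrame.corr` accepts `method="spearman"` directly.

```python
import pandas as pd

# Made-up data: happiness always rises with freedom, but not linearly
toy = pd.DataFrame({
    "Freedom": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Happiness": [1.0, 4.0, 9.0, 16.0, 25.0],
})

# Both coefficients range from -1 to 1; missing values are excluded pairwise
pearson = toy.corr(method="pearson").loc["Freedom", "Happiness"]
spearman = toy.corr(method="spearman").loc["Freedom", "Happiness"]
print(f"Pearson r = {pearson:.3f}, Spearman rho = {spearman:.3f}")
```

Spearman's rho is exactly 1 here because the ranks agree perfectly, while Pearson's r stays below 1 because the points do not lie on a line. Computing `data[columns_to_use].corr(method="spearman")` alongside the Pearson matrix would show whether the same holds for the real scores.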