## World Happiness Report 2021 - EDA

In this notebook I focus mainly on the .csv file that contains the historical metric data for each country from the years before 2021.

In [29]:
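import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import geopandas as gpd
import folium
# IntSlider, Dropdown and interactive are used unqualified in the widget cells below
from ipywidgets import IntSlider, Dropdown, interactive
from IPython.display import display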

In [30]:
data = pd.read_csv("world-happiness-report.csv")

In [31]:
data.head()

|   | Country name | year | Life Ladder | Log GDP per capita | Social support | Healthy life expectancy at birth | Freedom to make life choices | Generosity | Perceptions of corruption | Positive affect | Negative affect |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | 2008 | 3.724 | 7.37 | 0.451 | 50.8 | 0.718 | 0.168 | 0.882 | 0.518 | 0.258 |
| 1 | Afghanistan | 2009 | 4.402 | 7.54 | 0.552 | 51.2 | 0.679 | 0.19 | 0.85 | 0.584 | 0.237 |
| 2 | Afghanistan | 2010 | 4.758 | 7.647 | 0.539 | 51.6 | 0.6 | 0.121 | 0.707 | 0.618 | 0.275 |
| 3 | Afghanistan | 2011 | 3.832 | 7.62 | 0.521 | 51.92 | 0.496 | 0.162 | 0.731 | 0.611 | 0.267 |
| 4 | Afghanistan | 2012 | 3.783 | 7.705 | 0.521 | 52.24 | 0.531 | 0.236 | 0.776 | 0.71 | 0.268 |


In [32]:
data.isna().sum()

Country name                          0
year                                  0
Life Ladder                           0
Log GDP per capita                   36
Social support                       13
Healthy life expectancy at birth     55
Freedom to make life choices         32
Generosity                           89
Perceptions of corruption           110
Positive affect                      22
Negative affect                      16
dtype: int64
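The raw counts are easier to compare across columns when expressed as a share of all rows. A minimal sketch on a small made-up frame (the values are invented for illustration, not taken from the report); on the real data the same call would be `data.isna().mean()`:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the happiness data; all values are made up
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "year": [2019, 2020, 2019, 2020],
    "Generosity": [0.1, np.nan, np.nan, 0.2],
    "Social support": [0.8, 0.9, np.nan, 0.7],
})

# isna() marks missing cells as True; mean() turns that into a per-column fraction
missing_share = toy.isna().mean().sort_values(ascending=False)
print((missing_share * 100).round(1))  # percentage missing per column
```

Here `Generosity` is missing in 2 of 4 rows (50%) and `Social support` in 1 of 4 (25%).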

In [33]:
data = data.rename(columns={"Country name": "country"})

Reading in the country geometries (the Natural Earth 1:50m Admin 0 countries shapefile) for plotting.

In [34]:
countries = gpd.read_file('ne_50m_admin_0_countries/ne_50m_admin_0_countries.shp')

In [35]:
countries.head()

|   | featurecla | scalerank | LABELRANK | SOVEREIGNT | SOV_A3 | ADM0_DIF | LEVEL | TYPE | TLC | ADMIN | ... | FCLASS_TR | FCLASS_ID | FCLASS_PL | FCLASS_GR | FCLASS_IT | FCLASS_NL | FCLASS_SE | FCLASS_BD | FCLASS_UA | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Admin-0 country | 1 | 3 | Zimbabwe | ZWE | 0 | 2 | Sovereign country | 1 | Zimbabwe | ... |  |  |  |  |  |  |  |  |  | POLYGON ((31.28789 -22.40205, 31.19727 -22.344... |
| 1 | Admin-0 country | 1 | 3 | Zambia | ZMB | 0 | 2 | Sovereign country | 1 | Zambia | ... |  |  |  |  |  |  |  |  |  | POLYGON ((30.39609 -15.64307, 30.25068 -15.643... |
| 2 | Admin-0 country | 1 | 3 | Yemen | YEM | 0 | 2 | Sovereign country | 1 | Yemen | ... |  |  |  |  |  |  |  |  |  | MULTIPOLYGON (((53.08564 16.64839, 52.58145 16... |
| 3 | Admin-0 country | 3 | 2 | Vietnam | VNM | 0 | 2 | Sovereign country | 1 | Vietnam | ... |  |  |  |  |  |  |  |  |  | MULTIPOLYGON (((104.06396 10.39082, 104.08301 ... |
| 4 | Admin-0 country | 5 | 3 | Venezuela | VEN | 0 | 2 | Sovereign country | 1 | Venezuela | ... |  |  |  |  |  |  |  |  |  | MULTIPOLYGON (((-60.82119 9.13838, -60.94141 9... |


### Creating an interactive map

Making the country names match between the two datasets.

In [36]:
countries = countries.sort_values(by="SOVEREIGNT")

data = data.sort_values(by="country")

In [37]:
data_countries = data['country'].unique()

In [38]:
print(f"Geometries exist for {len(countries['SOVEREIGNT'])} countries (areas).")

print(f"Metric data exists for {len(data_countries)} countries.")

Geometries exist for 242 countries (areas).
Metric data exists for 166 countries.
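`len(countries['SOVEREIGNT'])` counts shapefile rows, not distinct sovereign states. In the Natural Earth Admin 0 layer one sovereign state can own several map units (for example overseas territories), and each of those rows repeats the same `SOVEREIGNT` value, which is why the count above is labelled "(areas)". A sketch on illustrative rows showing the difference between `len` and `nunique`:

```python
import pandas as pd

# Illustrative rows mimicking the shapefile attributes: one sovereign state
# (SOVEREIGNT) can appear on several rows, one per map unit (ADMIN)
geo = pd.DataFrame({
    "SOVEREIGNT": ["Denmark", "Denmark", "Denmark", "Zambia"],
    "ADMIN": ["Denmark", "Greenland", "Faroe Islands", "Zambia"],
})

print(len(geo["SOVEREIGNT"]))       # 4: counts rows (map units)
print(geo["SOVEREIGNT"].nunique())  # 2: counts distinct sovereign states
```

On the real shapefile, `countries['SOVEREIGNT'].nunique()` would give the number of distinct sovereign states to compare against the 166 countries in the metric data.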


Comparing the two sets of names to find the mismatches, which are then fixed by hand.

In [39]:
metric_countries = set(data['country'].unique())

geo_countries = set(countries['SOVEREIGNT'].unique())

# Find mismatches in both directions
mismatch_metric_to_geo = metric_countries - geo_countries
mismatch_geo_to_metric = geo_countries - metric_countries

print("Countries in metric data not in geo data:", mismatch_metric_to_geo)
print()
print("Countries in geo data not in metric data:", mismatch_geo_to_metric)

Countries in metric data not in geo data: {'Somaliland region', 'United States', 'Czech Republic', 'Palestinian Territories', 'Serbia', 'Swaziland', 'Tanzania', 'Taiwan Province of China', 'Congo (Kinshasa)', 'Hong Kong S.A.R. of China', 'North Cyprus', 'Congo (Brazzaville)'}

Countries in geo data not in metric data: {'North Korea', 'San Marino', 'Czechia', 'Samoa', 'Cabo Verde', 'Taiwan', 'Saint Lucia', 'Monaco', 'Seychelles', 'eSwatini', 'Republic of the Congo', 'Tonga', 'Democratic Republic of the Congo', 'Guinea-Bissau', 'Papua New Guinea', 'Brunei', 'Equatorial Guinea', 'Saint Vincent and the Grenadines', 'Northern Cyprus', 'Kashmir', 'Grenada', 'Solomon Islands', 'The Bahamas', 'Andorra', 'Nauru', 'Tuvalu', 'Fiji', 'Antigua and Barbuda', 'Federated States of Micronesia', 'Saint Kitts and Nevis', 'United States of America', 'Vanuatu', 'Somaliland', 'Liechtenstein', 'São Tomé and Principe', 'Vatican', 'United Republic of Tanzania', 'Dominica', 'Eritrea', 'Kiribati', 'Barbados', 'R
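The renames in the next cell were chosen by hand. As an alternative (not what this notebook does), the standard-library function `difflib.get_close_matches` can suggest candidates. It misses names that differ a lot, so its suggestions still need review. A sketch using a few names from the lists above:

```python
from difflib import get_close_matches

# A few names taken from the geo-side mismatch list above
geo_names = ["Northern Cyprus", "United Republic of Tanzania", "Czechia", "Somaliland"]

# Metric-side names that need a geo-side counterpart
for name in ["North Cyprus", "Somaliland region", "Tanzania"]:
    # n=1 keeps only the best candidate; cutoff=0.6 is difflib's default similarity threshold
    print(name, "->", get_close_matches(name, geo_names, n=1, cutoff=0.6))
```

This finds `Northern Cyprus` and `Somaliland`, but returns an empty list for `Tanzania`, because `United Republic of Tanzania` is too long to score above the cutoff. That is one reason a manual mapping is still useful.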

In [40]:
rename_dict = {
    'North Cyprus': 'Northern Cyprus',
    'Tanzania': 'United Republic of Tanzania',
    'United States': 'United States of America',
    'Czech Republic': 'Czechia',
    'Somaliland region': 'Somaliland',
    'Taiwan Province of China': 'Taiwan',
    'Swaziland': 'eSwatini',
    'Congo (Kinshasa)': 'Democratic Republic of the Congo',
    'Congo (Brazzaville)': 'Republic of the Congo',
    'Serbia': 'Republic of Serbia'
}

In [41]:
data['country'] = data['country'].replace(rename_dict)
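A quick check is to rerun the set difference after the rename. Two metric-side names from the list above, Palestinian Territories and Hong Kong S.A.R. of China, have no entry in `rename_dict`, so they stay unmatched and will not be drawn on the map. A self-contained sketch of the check on a hand-picked subset of the names (not the full datasets):

```python
import pandas as pd

# Subset of the rename mapping above
rename_dict = {
    "North Cyprus": "Northern Cyprus",
    "Czech Republic": "Czechia",
}

# Hand-picked names from both sides
metric_names = pd.Series(["North Cyprus", "Czech Republic", "Palestinian Territories", "Zambia"])
geo_names = {"Northern Cyprus", "Czechia", "Zambia", "Monaco"}

# Series.replace with a dict swaps exact matches and leaves other values unchanged
renamed = metric_names.replace(rename_dict)
still_missing = set(renamed) - geo_names
print(still_missing)  # {'Palestinian Territories'}
```

In the notebook the equivalent check would be `set(data['country'].unique()) - set(countries['SOVEREIGNT'].unique())`.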

Defining the interactive widgets and the map-drawing function.

In [48]:
#Defining the interactive widgets
years = data['year'].unique()
metrics = [col for col in data.columns if col not in ['year', 'country']]

year_slider = IntSlider(
    value=max(years),
    min=min(years),
    max=max(years),
    step=1,
    description='Year:',
    continuous_update=False
)

metric_dropdown = Dropdown(
    options=metrics,
    description='Metric:'
)

In [52]:
def create_map(year, metric):

    # Left-join the selected year's metric onto the geometries;
    # countries without data for that year get NaN
    merged = countries.merge(data[data['year'] == year][['country', metric]],
                             left_on='SOVEREIGNT', right_on='country', how='left')

    m = folium.Map(location=[0, 0], zoom_start=2)

    # Colour the countries by the selected metric
    folium.Choropleth(
        geo_data=merged.__geo_interface__,
        name='choropleth',
        data=merged,
        columns=['SOVEREIGNT', metric],
        key_on='feature.properties.SOVEREIGNT',
        fill_color='YlOrRd',
        fill_opacity=0.7,
        line_opacity=0.2,
        legend_name=metric
    ).add_to(m)

    # Transparent GeoJson layer on top that shows the country name and value on hover
    folium.GeoJson(
        data=merged.__geo_interface__,
        style_function=lambda x: {'color': 'transparent', 'fillColor': 'transparent', 'weight': 0},
        tooltip=folium.GeoJsonTooltip(fields=['SOVEREIGNT', metric])
    ).add_to(m)

    folium.LayerControl().add_to(m)

    display(m)
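The first step of `create_map` filters a single year and left-joins it onto the geometries. Every geometry row is kept, and countries without data for that year get `NaN` for the metric. A sketch of just that step, with a plain pandas DataFrame standing in for the GeoDataFrame (geometry omitted) and made-up values; the helper name `merge_year` is hypothetical:

```python
import pandas as pd

# Stand-in for the countries GeoDataFrame (geometry column omitted)
geo = pd.DataFrame({"SOVEREIGNT": ["Zambia", "Zimbabwe", "Monaco"]})

# Stand-in for the metric data; values are made up
toy_data = pd.DataFrame({
    "country": ["Zambia", "Zimbabwe", "Zambia"],
    "year": [2019, 2019, 2020],
    "Life Ladder": [4.0, 3.5, 4.2],
})

def merge_year(geo, data, year, metric):
    # Hypothetical helper: keep one year and one metric, then left-join on the country name
    year_slice = data[data["year"] == year][["country", metric]]
    return geo.merge(year_slice, left_on="SOVEREIGNT", right_on="country", how="left")

merged = merge_year(geo, toy_data, 2020, "Life Ladder")
print(merged)
```

For 2020 only Zambia has a value, so Zimbabwe and Monaco keep their rows with `NaN`. With `how='inner'` they would disappear from the map entirely.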

In [53]:
interactive_map = interactive(create_map, year=year_slider, metric=metric_dropdown)

display(interactive_map)
