In [312]:
import datetime as dt
import pandas as pd
import numpy as np

import plotly.io as pio
import plotly.express as px
import plotly.graph_objects as go

# read your Mapbox credentials from creds.py
from creds import mapbox_access_token

px.set_mapbox_access_token(mapbox_access_token)

In [313]:
df = pd.read_excel("DataPanelWHR2021C2.xls")
df.rename(columns={"Healthy life expectancy at birth": "Healthy life expectancy (Age)"}, inplace=True)

## Figure 1: Countries per Year Visualization ##

Start by organizing the data by year using sort_values.

In [120]:
df_by_year = df.sort_values("year")

Create a bar chart using Plotly Express that displays the number of countries with data collected each year.

In [31]:
fig = px.bar(df_by_year.year.value_counts(), title="World Happiness Report: Countries with Data Collected per Year", 
             labels={'index':'year','value':'# of Countries with Data'}, hover_data={'variable':False})
fig.show()

This visualization shows that certain years cover fewer countries, most notably 2005 and 2020. To keep the sample of countries relatively consistent, it makes sense to use the years 2011 to 2019, which also matches the World Happiness Report's official publications, starting in 2012 (for the year 2011).

In [32]:
pio.write_html(fig, "Countries_Per_Year.html", full_html=False) #export this figure as an html file for the web page
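
The choice of years can also be checked numerically rather than read off the chart. The following is a minimal sketch on a tiny made-up frame; the toy rows and the `min_countries` threshold are assumptions for illustration, not values taken from the WHR data.

```python
import pandas as pd

# Toy stand-in for the WHR panel: one row per (country, year). Values are made up.
toy = pd.DataFrame({
    "Country name": ["A", "B", "C", "A", "B", "A"],
    "year":         [2011, 2011, 2011, 2012, 2012, 2020],
})

# Number of countries with data in each year, in chronological order.
countries_per_year = toy["year"].value_counts().sort_index()

# Hypothetical threshold: keep only years with at least this many countries.
min_countries = 2
well_covered_years = countries_per_year[countries_per_year >= min_countries].index.tolist()

print(countries_per_year.to_dict())
print(well_covered_years)
```

On the real data, the same two lines applied to `df["year"]` would list the years that clear whatever threshold you pick.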

## Figure 2: Global Choropleths ##

This section creates 19 choropleths in total: one for each column with a slider for years (9 total), and one for each year with a dropdown to select a column (10 total). Ideally these 19 would be combined into a single choropleth with both a slider for years and a dropdown for columns, but the way trace visibility is handled as a list of booleans does not seem to allow it. The slider and the dropdown each apply their own visibility mask, so moving the slider sets the visible trace from the slider's mask without taking the dropdown's current selection into account. This is unfortunate, as one graph would be much nicer to look at and interact with than 19, but at the moment I have not been able to resolve this issue.

The first step toward choropleth visualizations is adding the three-letter country codes that the choropleth uses to locate each country (e.g. AFG for Afghanistan), since the World Happiness Report data does not include them. The following csv does include the proper codes.

In [47]:
df_country_codes = pd.read_csv("2014_world_gdp_with_codes.csv")

We will merge our WHR data with the country codes and drop the other columns added in the merge.

In [48]:
df_with_codes = df.merge(df_country_codes, how="inner", left_on="Country name", right_on="COUNTRY")
df_with_codes.drop(["GDP (BILLIONS)","COUNTRY"], axis=1, inplace=True)
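
Because the merge above is an inner merge on the country name, any country spelled differently in the two files is silently dropped. The sketch below shows one way to list such countries using the `indicator` option of `merge`; the two toy frames and the spellings in them are illustrative assumptions, not rows from the actual files.

```python
import pandas as pd

# Toy stand-ins; names and codes here are illustrative only.
whr = pd.DataFrame({"Country name": ["Albania", "Congo (Kinshasa)", "Zambia"]})
codes = pd.DataFrame({
    "COUNTRY": ["Albania", "Congo, Democratic Republic of the", "Zambia"],
    "CODE": ["ALB", "COD", "ZMB"],
})

# A left merge with indicator=True keeps every WHR row and records
# in the "_merge" column whether a matching code was found.
checked = whr.merge(codes, how="left", left_on="Country name",
                    right_on="COUNTRY", indicator=True)

# Rows marked "left_only" are the ones an inner merge would drop.
unmatched = checked.loc[checked["_merge"] == "left_only", "Country name"].tolist()
print(unmatched)
```

Renaming the unmatched countries in one of the two frames before merging would keep them on the map.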

This function simplifies our code for the actual graph as both the sliders and dropdowns use essentially the same dictionary to determine the label, visible traces, and annotations that are displayed.

In [49]:
def slider_dict(label, visibles, annotation1, annotation2):
    return {'method': 'update', 'label':label,'args':[{'visible': visibles}, {'annotations': [annotation1, annotation2]}]}
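
As a quick illustration of what this helper returns, the sketch below calls it with placeholder inputs. The helper is repeated so the example runs on its own, and the annotation dictionaries are stand-ins rather than the full annotations built later.

```python
def slider_dict(label, visibles, annotation1, annotation2):
    # Same helper as above, repeated so this example is self-contained.
    return {'method': 'update', 'label': label,
            'args': [{'visible': visibles}, {'annotations': [annotation1, annotation2]}]}

# Illustrative inputs: three traces, of which only the second should be shown.
step = slider_dict(2012, [False, True, False],
                   {'text': 'highest (placeholder)'}, {'text': 'lowest (placeholder)'})

print(step['args'][0])   # trace-level update: which traces are visible
print(step['args'][1])   # layout-level update: which annotations are drawn
```

With the `update` method, the first entry of `args` is applied to the traces and the second to the layout, which is why visibility and annotations sit in separate dictionaries.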

This next segment initializes the lists to be used for both the slider and dropdown graphs.
- FIELDS is a 10x10 boolean array with True on the diagonal; row i is the visibility mask that shows only trace i. Both the slider and the dropdown use it (the dropdown drops the last row and column, as there are only 9 columns to select).
- COLORSCALES is a list of Plotly colorscales accepted by the choropleth, one per column. The ones in this list are my personal favorites, and my reason for using several is to give the viewer visual feedback when they switch to a different column. The last two entries are the same to show the connection between positive and negative affect.
- GRAPH_COLUMNS selects the columns from the WHR data worth graphing, i.e. everything except the country name, year, and country code.

In [314]:
fields = np.zeros(100, dtype=bool)
fields = fields.reshape(10,10)
n=0
for i in range(10):
    fields[i][n]=True
    n+=1
    
colorscales = ['aggrnyl', 'agsunset', 'bluered', 'viridis', 'rdpu', 'pinkyl', 'plasma', 'plotly3', 'plotly3']

graph_columns = list(df_with_codes.columns)
graph_columns.remove("Country name")
graph_columns.remove("year")
graph_columns.remove("CODE")
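
The visibility mask built by the loop above is a boolean identity matrix. The following sketch, which only assumes NumPy, checks that against `np.eye` and shows what a single row looks like once the last entry is dropped for the 9-trace dropdown case.

```python
import numpy as np

# Build the mask the same way as the loop above.
fields_loop = np.zeros(100, dtype=bool).reshape(10, 10)
for i in range(10):
    fields_loop[i][i] = True

# Equivalent one-liner: a 10x10 boolean identity matrix.
fields_eye = np.eye(10, dtype=bool)

same = bool(np.array_equal(fields_loop, fields_eye))
print(same)

# Row i switches on trace i only; dropping the last entry gives a 9-trace mask.
print(fields_eye[2][:-1].tolist())
```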

This next segment uses a for-loop to create a distinct graph for each column with a slider for the years. The steps it goes through are as follows:
1. Create a figure and make an empty list that will be filled with the dictionaries for sliders
1. Set the starting year to 2011
1. Begin a while loop that goes through each of the years from 2011 to 2020
    * Make a new dataframe for the specific year (ex. 2011)
    * Add a choropleth for the current year based on the column from the for-loop and set it to the corresponding color
    * Make that choropleth no longer visible
    * Find the highest and lowest scores for that year and the corresponding countries
    * Create annotations that show which country had the highest score and which had the lowest score
    * Use the earlier defined function "slider_dict" to add a dictionary to the list of sliders
    * Increment the year
1. Once a trace has been made for each year, properly format the sliders list to be plotly acceptable
1. Update the layout of the figure to include all the sliders and a title based on the current column
1. Set the first trace to be visible and change the title to make it easier for exporting to html
1. Export the graph and repeat the entire process with the next column

In [79]:
for graph in graph_columns:

    fig = go.Figure()
    sliders = []

    year = 2011

    while (year < 2021):
        df_year = df_with_codes[df_with_codes["year"]==year]  # own name, so the full WHR dataframe `df` is not overwritten
        
        fig.add_trace(go.Choropleth(locations=df_year["CODE"], z=df_year[graph], text = df_year["Country name"], autocolorscale=False,
             colorscale=colorscales[graph_columns.index(graph)], marker_line_color='white', marker_line_width=0.5, colorbar_title=f"{graph}"))

        fig.data[year%2011].visible=False
        
        max_of_graph = df_year[df_year[graph]==df_year[graph].max()]
        highest_score = (max_of_graph["Country name"].values[0], round(max_of_graph[graph].values[0],3))
        min_of_graph = df_year[df_year[graph]==df_year[graph].min()]
        lowest_score = (min_of_graph["Country name"].values[0], round(min_of_graph[graph].values[0],3))
        max_annotation = {'x':1.1, 'y':1.2, 'xref':'paper', 'yref':'paper', 'text':"{0} has the highest score of ~ {1}".format(highest_score[0], highest_score[1]), 
                           'font':{'size':10, 'color':'white'}, 'showarrow':False, 'borderwidth':9, 'borderpad':4, 'bgcolor':'grey'}
        min_annotation = {'x':1.1, 'y':1.1, 'xref':'paper', 'yref':'paper', 'text':"{0} has the lowest score of ~ {1}".format(lowest_score[0], lowest_score[1]), 
                           'font':{'size':10, 'color':'white'}, 'showarrow':False, 'borderwidth':9, 'borderpad':4, 'bgcolor':'grey'}
        
        sliders.append(slider_dict(year, fields[year%2011], max_annotation, min_annotation))
        
        year = year+1
        
    
    sliders = [{'steps':sliders}]
    
    fig.update_layout({'sliders': sliders}, title_text=f'World Happiness Report: {graph}', geo=dict(showframe=False, showcoastlines=False, projection_type="equirectangular"),
                     autosize=False, height=600, width=950)

    fig.data[0].visible=True
    
    title = graph.replace(" ", "_")
    #pio.write_html(fig, f"WHR-{title}.html", full_html=False)
    #fig.show()
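
The highest and lowest lookups inside the loop above filter the dataframe by its own maximum and minimum. A possible alternative, sketched here on made-up data, uses `idxmax` and `idxmin`, which return the row label of the extreme value and skip missing values by default. The frame and column values are assumptions for illustration.

```python
import pandas as pd

# Toy data for one year; values are made up for illustration.
df_year = pd.DataFrame({
    "Country name": ["A", "B", "C"],
    "Life Ladder": [5.1, 7.4, None],   # the missing value is skipped by idxmax/idxmin
})

column = "Life Ladder"
top = df_year.loc[df_year[column].idxmax()]      # row with the largest value
bottom = df_year.loc[df_year[column].idxmin()]   # row with the smallest value

highest_score = (top["Country name"], round(top[column], 3))
lowest_score = (bottom["Country name"], round(bottom[column], 3))
print(highest_score, lowest_score)
```

Like the filtering version, this returns the first country on a tie, and neither form handles a column that is entirely missing for a given year.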
    

This next segment creates the graphs for each year with a dropdown to choose a column. It is very similar to the last segment, except that for every year it adds a trace for each column instead of the other way around. Additionally, "buttons_list" serves the same function as "sliders", and dropdown menus are formatted slightly differently from sliders in the layout.


In [274]:
year = 2011

while (year<2021):
    
    df_year = df_with_codes[df_with_codes['year']==year]  # own name, so the full WHR dataframe `df` is not overwritten

    fig = go.Figure()
    n=0
    buttons_list = []

    for graph in graph_columns:
            fig.add_trace(go.Choropleth(locations=df_year["CODE"], z=df_year[graph], text = df_year["Country name"], autocolorscale=False,
                    colorscale=colorscales[n], marker_line_color='white', marker_line_width=0.5, colorbar_title=f"{graph}"))   
            label = "{0}: {1}".format(year, graph_columns[n])
            
            max_of_graph = df_year[df_year[graph]==df_year[graph].max()]
            highest_score = (max_of_graph["Country name"].values[0], round(max_of_graph[graph].values[0],3))
            min_of_graph = df_year[df_year[graph]==df_year[graph].min()]
            lowest_score = (min_of_graph["Country name"].values[0], round(min_of_graph[graph].values[0],3))
            max_annotation = {'x':1.1, 'y':1.2, 'xref':'paper', 'yref':'paper', 'text':"{0} has the highest score of ~ {1}".format(highest_score[0], highest_score[1]), 
                           'font':{'size':10, 'color':'white'}, 'showarrow':False, 'borderwidth':9, 'borderpad':4, 'bgcolor':'grey'}
            min_annotation = {'x':1.1, 'y':1.1, 'xref':'paper', 'yref':'paper', 'text':"{0} has the lowest score of ~ {1}".format(lowest_score[0], lowest_score[1]), 
                           'font':{'size':10, 'color':'white'}, 'showarrow':False, 'borderwidth':9, 'borderpad':4, 'bgcolor':'grey'}
            
            buttons_list.append(slider_dict(label, fields[n][:-1], max_annotation, min_annotation))
            fig.data[n].visible=False
            n+=1


    fig.update_layout(updatemenus=[dict(active=0, buttons=buttons_list, x=0, xanchor="left", y=1.05,yanchor="top")],
                    title="World Happiness Report: {0}".format(year), geo=dict(showframe=False, showcoastlines=False, projection_type="equirectangular"),
                    autosize=False, height=600, width=950)

    fig.data[0].visible=True
    
    #pio.write_html(fig, f"WHR-{year}.html", full_html=False)
    #fig.show()
    
    year+=1
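
The formatting difference between sliders and dropdowns comes down to where the list of step dictionaries is placed in the layout. The sketch below builds both layouts as plain Python dictionaries from the same placeholder steps; it does not import Plotly, and the three-trace setup is an assumption for illustration.

```python
# Three placeholder steps; each has the same shape as the output of slider_dict.
steps = [
    {'method': 'update', 'label': str(i),
     'args': [{'visible': [j == i for j in range(3)]}, {'annotations': []}]}
    for i in range(3)
]

# Slider form: the steps go under the 'steps' key of a slider.
slider_layout = {'sliders': [{'steps': steps}]}

# Dropdown form: the same steps go under the 'buttons' key of an update menu.
dropdown_layout = {'updatemenus': [{'active': 0, 'buttons': steps}]}

print(slider_layout['sliders'][0]['steps'][1]['args'][0])
print(dropdown_layout['updatemenus'][0]['buttons'][1]['args'][0])
```

Either dictionary could then be passed to `fig.update_layout`, as the two loops above do with `sliders` and `updatemenus`.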


## Figure 3: Improvement Choropleth ##

To find the total improvement of each country, we will merge the 2011 and 2019 data so that all the information we need is in one place. We select 2019 instead of 2020 because significantly fewer countries have data for 2020.

In [340]:
df_2011 = df_with_codes[df_with_codes["year"] == 2011]
df_2019 = df_with_codes[df_with_codes["year"] == 2019]

Here we merge the two years and drop both the NaN values and the columns we will no longer need. Dropping the NaN values ensures that each country has data for both 2011 and 2019. We go from ~136 countries with information to 112 countries as a result of this.

In [341]:
df_change = df_2011.merge(df_2019, how="outer", left_on="Country name", right_on="Country name")
df_change.dropna(inplace=True)
# Shown for display only: without inplace=True or reassignment, df_change keeps these columns.
df_change.drop(columns=['year_x', 'year_y', 'CODE_x'])

Unnamed: 0,Country name,Life Ladder_x,Log GDP per capita_x,Social support_x,Healthy life expectancy (Age)_x,Freedom to make life choices_x,Generosity_x,Perceptions of corruption_x,Positive affect_x,Negative affect_x,Life Ladder_y,Log GDP per capita_y,Social support_y,Healthy life expectancy (Age)_y,Freedom to make life choices_y,Generosity_y,Perceptions of corruption_y,Positive affect_y,Negative affect_y,CODE_y
0,Afghanistan,3.831719,7.619532,0.521104,51.919998,0.495901,0.162427,0.731109,0.611387,0.267175,2.375092,7.697248,0.419973,52.400002,0.393656,-0.108459,0.923849,0.351387,0.502474,AFG
1,Albania,5.867422,9.331056,0.759434,66.680000,0.487496,-0.204594,0.877003,0.627659,0.256577,4.995318,9.544080,0.686365,69.000000,0.777351,-0.099263,0.914284,0.681080,0.273827,ALB
2,Algeria,5.317194,9.296691,0.810234,64.660004,0.529561,-0.180654,0.637982,0.550203,0.254897,4.744627,9.336946,0.803259,66.099998,0.385083,0.005087,0.740609,0.584944,0.215198,DZA
4,Argentina,6.775805,10.112445,0.889073,67.480003,0.815802,-0.173993,0.754646,0.840048,0.231855,6.085561,10.000340,0.896371,69.000000,0.817053,-0.210719,0.830460,0.825965,0.319055,ARG
5,Armenia,4.260491,9.182483,0.705108,65.360001,0.464525,-0.225479,0.874601,0.474549,0.459074,5.488087,9.521770,0.781604,67.199997,0.844324,-0.172369,0.583473,0.598238,0.430463,ARM
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
131,Uruguay,6.554047,9.829510,0.891282,68.139999,0.851442,-0.080015,0.556286,0.805346,0.252250,6.600337,9.978644,0.933471,69.099998,0.902679,-0.095303,0.599400,0.888966,0.221730,URY
132,Uzbekistan,5.738744,8.493077,0.924071,63.400002,0.934133,0.042083,0.521862,0.787154,0.122773,6.154049,8.853480,0.915276,65.400002,0.970295,0.304298,0.511197,0.844809,0.219746,UZB
134,Vietnam,5.767344,8.585228,0.897655,66.660004,0.818404,0.104993,0.742162,0.531590,0.192669,5.467451,8.992331,0.847592,68.099998,0.952469,-0.125531,0.787889,0.751160,0.185610,VNM
136,Zambia,4.999114,8.071311,0.864023,50.840000,0.662850,0.002801,0.882150,0.833214,0.204070,3.306797,8.154642,0.637894,55.799999,0.811040,0.077462,0.831956,0.743407,0.394385,ZMB
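
The outer merge followed by `dropna` keeps only countries present in both years. On data without other missing values, an inner merge gives the same set of countries in one step. The sketch below compares the two on made-up frames; note that on the real data `dropna` also removes countries with a missing value in any column, which an inner merge alone would not do.

```python
import pandas as pd

# Toy frames; country B is missing in the later year and C in the earlier one.
early = pd.DataFrame({"Country name": ["A", "B"], "Life Ladder": [5.0, 6.0]})
late = pd.DataFrame({"Country name": ["A", "C"], "Life Ladder": [5.5, 4.0]})

# Approach used above: keep everything, then drop rows with any missing value.
outer = early.merge(late, how="outer", on="Country name").dropna()

# Alternative: an inner merge keeps only countries present in both frames.
inner = early.merge(late, how="inner", on="Country name")

print(outer["Country name"].tolist(), inner["Country name"].tolist())
print(list(inner.columns))   # shared columns receive the _x and _y suffixes
```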


This next segment adds new columns to the dataframe holding the total change for each country across all 9 factors, calculated by subtracting the 2011 value from the corresponding 2019 value.
Sidenote: The commented line at the bottom removes the data used to make these calculations from the dataset, leaving just the changes. For the following graph to work, it must be called somewhere; however, if you also wish to view the percent change, you should wait to call it until you have created those columns.

In [342]:
df_change["Life Ladder"] = df_change["Life Ladder_y"] - df_change["Life Ladder_x"]
df_change["Log GDP per capita"] = df_change["Log GDP per capita_y"] - df_change["Log GDP per capita_x"]
df_change["Social Support"] = df_change["Social support_y"] - df_change["Social support_x"]
df_change["Life Expectancy"] = df_change["Healthy life expectancy (Age)_y"] - df_change["Healthy life expectancy (Age)_x"]
df_change["Life Choices"] = df_change["Freedom to make life choices_y"] - df_change["Freedom to make life choices_x"]
df_change["Generosity"] = df_change["Generosity_y"] - df_change["Generosity_x"]
df_change["Perceptions of corruption"] = df_change["Perceptions of corruption_y"] - df_change["Perceptions of corruption_x"]
df_change["Positive Affect"] = df_change["Positive affect_y"] - df_change["Positive affect_x"]
df_change["Negative Affect"] = df_change["Negative affect_y"] - df_change["Negative affect_x"]
#df_change.drop(columns = df_change.columns[1:22], inplace=True)
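
The nine subtractions above follow one pattern, so they could also be generated from a mapping of source column to change-column name. This is a minimal sketch on a made-up frame with two factors; the `change_names` mapping is a hypothetical helper, not part of the notebook.

```python
import pandas as pd

# Toy merged frame with the _x (2011) and _y (2019) suffixes; values are made up.
df_toy = pd.DataFrame({
    "Country name": ["A", "B"],
    "Life Ladder_x": [5.0, 6.0], "Life Ladder_y": [5.5, 5.0],
    "Social support_x": [0.8, 0.9], "Social support_y": [0.9, 0.6],
})

# Hypothetical mapping from each source column to the name of its change column.
change_names = {"Life Ladder": "Life Ladder", "Social support": "Social Support"}

for source, target in change_names.items():
    # 2019 value minus 2011 value, as in the cell above.
    df_toy[target] = df_toy[f"{source}_y"] - df_toy[f"{source}_x"]

print(df_toy[["Country name", "Life Ladder", "Social Support"]].round(3))
```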



This next segment is the for-loop that creates a trace for each of the unique factors, essentially the same as for the second choropleth above except not making a new set for every year. Some minor changes are changing the color scale so it is easier to see which have increased and which have decreased, and also centering the colorbar at 0 using zmid.

In [334]:
fig = go.Figure()
n=0
columns = df_change.columns[2:11]
buttons_list = []

for column in (columns):
    fig.add_trace(go.Choropleth(locations=df_change["CODE_y"], z=df_change[column], autocolorscale=False, zmid=0, text = df_change["Country name"],
                    colorscale="rdylgn", marker_line_color='white', marker_line_width=1, colorbar_title=f"{column} Change"))
    
    label = f"{column}"
    
    max_of_graph = df_change[df_change[column]==df_change[column].max()]
    highest_score = (max_of_graph["Country name"].values[0], round(max_of_graph[column].values[0],3))
    min_of_graph = df_change[df_change[column]==df_change[column].min()]
    lowest_score = (min_of_graph["Country name"].values[0], round(min_of_graph[column].values[0],3))
    max_annotation = {'x':1.1, 'y':1.2, 'xref':'paper', 'yref':'paper', 'text':"{0} has the highest improvement of ~ {1}".format(highest_score[0], highest_score[1]), 
                           'font':{'size':10, 'color':'white'}, 'showarrow':False, 'borderwidth':9, 'borderpad':4, 'bgcolor':'grey'}
    min_annotation = {'x':1.1, 'y':1.1, 'xref':'paper', 'yref':'paper', 'text':"{0} has the lowest improvement of ~ {1}".format(lowest_score[0], lowest_score[1]), 
                           'font':{'size':10, 'color':'white'}, 'showarrow':False, 'borderwidth':9, 'borderpad':4, 'bgcolor':'grey'}
            
    buttons_list.append(slider_dict(label, fields[n][:-1], max_annotation, min_annotation))
    fig.data[n].visible=False
    n+=1
            
fig.update_layout(updatemenus=[dict(active=0, buttons=buttons_list, x=0, xanchor="left", y=1.05,yanchor="top")], title="World Happiness Report: 2011-2019 Total Change", geo=dict(showframe=False, showcoastlines=False, projection_type="equirectangular"),
                    autosize=False, height=600, width=950)


fig.data[0].visible=True

pio.write_html(fig, f"WHR-total_change.html", full_html=False)
fig.show()


This next segment adds the percent change for each country and all categories, calculated as (2019 value - 2011 value) / (2011 value) * 100.

Sidenote: As mentioned previously, this only works if df_change.drop(etc.) was not called when the previous columns were created. However, it must be called here for both graphs to work (if you want to view both of them).

In [344]:
df_change["Life Ladder %"] = ((df_change["Life Ladder"])/df_change["Life Ladder_x"])*100
df_change["Log GDP per capita %"] = ((df_change["Log GDP per capita"])/df_change["Log GDP per capita_x"])*100
df_change["Social Support %"] = ((df_change["Social Support"])/df_change["Social support_x"])*100
df_change["Life Expectancy %"] = ((df_change["Life Expectancy"])/df_change["Healthy life expectancy (Age)_x"])*100
df_change["Life Choices %"] = ((df_change["Life Choices"])/df_change["Freedom to make life choices_x"])*100
df_change["Generosity %"] = ((df_change["Generosity"])/df_change["Generosity_x"])*100
df_change["Perceptions of corruption %"] = ((df_change["Perceptions of corruption"])/df_change["Perceptions of corruption_x"])*100
df_change["Positive Affect %"] = ((df_change["Positive Affect"])/df_change["Positive affect_x"])*100
df_change["Negative Affect %"] = ((df_change["Negative Affect"])/df_change["Negative affect_x"])*100
df_change.drop(columns = df_change.columns[1:22], inplace=True)
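
Both graphs pick their columns by position, which is why the order of the drop call matters. Selecting by name avoids that dependency. The sketch below assumes the source columns have already been dropped and uses a made-up one-row frame; the `id_columns` list is an assumption for illustration.

```python
import pandas as pd

# Toy frame after the drop: identifiers, change columns, then percent columns.
df_toy = pd.DataFrame({
    "Country name": ["A"], "CODE_y": ["AAA"],
    "Life Ladder": [0.5], "Generosity": [-0.1],
    "Life Ladder %": [10.0], "Generosity %": [-50.0],
})

id_columns = ["Country name", "CODE_y"]

# Percent columns are the ones whose name ends in " %".
percent_columns = [c for c in df_toy.columns if c.endswith(" %")]

# Change columns are whatever is left after removing identifiers and percents.
change_columns = [c for c in df_toy.columns
                  if c not in id_columns and c not in percent_columns]

print(change_columns, percent_columns)
```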

The only difference between this graph and the previous one is that columns now starts at the 12th column instead of the 3rd and runs through the rest.

In [345]:
fig = go.Figure()
n=0
columns = df_change.columns[11:]
buttons_list = []

for column in (columns):
    fig.add_trace(go.Choropleth(locations=df_change["CODE_y"], z=df_change[column], autocolorscale=False, zmid=0,
                    colorscale="rdylgn", marker_line_color='white', marker_line_width=1, colorbar_title=f"{column} Change"))
    
    label = f"{column}"
    
    max_of_graph = df_change[df_change[column]==df_change[column].max()]
    highest_score = (max_of_graph["Country name"].values[0], round(max_of_graph[column].values[0],3))
    min_of_graph = df_change[df_change[column]==df_change[column].min()]
    lowest_score = (min_of_graph["Country name"].values[0], round(min_of_graph[column].values[0],3))
    max_annotation = {'x':1.1, 'y':1.2, 'xref':'paper', 'yref':'paper', 'text':"{0} has the highest improvement of ~ {1}".format(highest_score[0], highest_score[1]), 
                           'font':{'size':10, 'color':'white'}, 'showarrow':False, 'borderwidth':9, 'borderpad':4, 'bgcolor':'grey'}
    min_annotation = {'x':1.1, 'y':1.1, 'xref':'paper', 'yref':'paper', 'text':"{0} has the lowest improvement of ~ {1}".format(lowest_score[0], lowest_score[1]), 
                           'font':{'size':10, 'color':'white'}, 'showarrow':False, 'borderwidth':9, 'borderpad':4, 'bgcolor':'grey'}
            
    buttons_list.append(slider_dict(label, fields[n][:-1], max_annotation, min_annotation))
    fig.data[n].visible=False
    n+=1
            
fig.update_layout(updatemenus=[dict(active=0, buttons=buttons_list, x=0, xanchor="left", y=1.05,yanchor="top")], title="World Happiness Report: 2011-2019 Percent Change", geo=dict(showframe=False, showcoastlines=False, projection_type="equirectangular"),
                    autosize=False, height=600, width=950)

fig.data[0].visible=True

pio.write_html(fig, f"WHR-percent_change.html", full_html=False)
fig.show()

This visualization turns out to be fairly interesting. While much of it is what you would expect, the Generosity visualization is absurd, with a greatest increase of about 8000% and a greatest decrease of about -2300%. Looking more closely at these two countries, I found two things that I hadn't noticed previously. First, the Generosity values for both are surprisingly noisy, changing by orders of magnitude with no consistent increase or decrease. Second, I had not expected the Generosity column to be negative or, in the case of Panama, to swing wildly back and forth between positive and negative.

In [320]:
df[df["Country name"] == "Latvia"]

Unnamed: 0,Country name,year,Life Ladder,Log GDP per capita,Social support,Healthy life expectancy (Age),Freedom to make life choices,Generosity,Perceptions of corruption,Positive affect,Negative affect
954,Latvia,2006,4.709502,10.032048,0.884499,63.16,0.640807,-0.229206,0.937049,0.654296,0.234135
955,Latvia,2007,4.666972,10.135619,0.835509,63.52,0.700174,-0.166734,0.923953,0.672521,0.246863
956,Latvia,2008,5.145375,10.112092,0.855418,63.880001,0.630111,-0.203171,0.926328,0.638644,0.214901
957,Latvia,2009,4.668911,9.975006,0.806939,64.239998,0.437065,-0.180326,0.94209,0.525005,0.242197
958,Latvia,2011,4.966812,10.029216,0.836042,64.860001,0.564464,-0.002395,0.934256,0.563278,0.221713
959,Latvia,2012,5.125025,10.08213,0.851195,65.120003,0.563812,-0.037763,0.894979,0.560013,0.232225
960,Latvia,2013,5.06977,10.115854,0.834023,65.379997,0.630508,-0.072923,0.836554,0.642102,0.227449
961,Latvia,2014,5.729115,10.144242,0.881256,65.639999,0.670653,-0.043007,0.803688,0.652273,0.225979
962,Latvia,2015,5.880598,10.184513,0.879372,65.900002,0.656393,-0.077392,0.8084,0.60838,0.228137
963,Latvia,2016,5.940446,10.211235,0.917074,66.199997,0.685299,-0.156372,0.86764,0.653751,0.231384


In [321]:
df[df["Country name"] == "Panama"]

Unnamed: 0,Country name,year,Life Ladder,Log GDP per capita,Social support,Healthy life expectancy (Age),Freedom to make life choices,Generosity,Perceptions of corruption,Positive affect,Negative affect
1339,Panama,2006,6.127988,9.763903,0.95098,67.900002,0.882047,-0.047107,0.911756,0.845192,0.232063
1340,Panama,2007,6.89414,9.858971,0.937078,68.0,0.640219,0.083109,0.915287,0.819987,0.149341
1341,Panama,2008,6.930903,9.935025,0.922481,68.099998,0.707385,0.059698,0.880651,0.819301,0.150143
1342,Panama,2009,7.03374,9.929617,0.905029,68.199997,0.721394,0.014429,0.889424,0.883028,0.1442
1343,Panama,2010,7.321467,9.968682,0.927533,68.300003,0.754524,-0.008531,0.879826,0.887585,0.146369
1344,Panama,2011,7.248081,10.058502,0.876284,68.5,0.829013,0.008965,0.839684,0.885293,0.179641
1345,Panama,2012,6.859836,10.134644,0.897391,68.699997,0.783183,-0.001814,0.795797,0.868587,0.206641
1346,Panama,2013,6.86648,10.184355,0.89572,68.900002,0.811338,0.018312,0.814465,0.868715,0.225746
1347,Panama,2014,6.631171,10.21675,0.873474,69.099998,0.893915,0.002134,0.846594,0.80769,0.253816
1348,Panama,2015,6.60555,10.255424,0.882615,69.300003,0.846669,-0.006958,0.809943,0.800634,0.263826


In [361]:
px.box(df, y=df.Generosity)

In [362]:
px.box(df_change, y=df_change["Generosity %"])
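
The extreme Generosity percentages discussed above follow directly from the percent-change formula when the 2011 baseline is close to zero. The sketch below applies the same formula to Latvia's 2011 Generosity value from the table above; the 2019 value is hypothetical and chosen only to show the effect, and `percent_change` is a helper introduced for this example.

```python
def percent_change(old, new):
    # (new - old) / old * 100, the same formula used for the "%" columns.
    return (new - old) / old * 100

# An ordinary baseline gives an ordinary percentage.
print(percent_change(0.5, 0.75))        # 50.0

# Latvia's 2011 Generosity from the table above is about -0.002395.
# The 2019 value below is hypothetical, chosen only to show the effect.
latvia_2011 = -0.002395
hypothetical_2019 = -0.1
print(round(percent_change(latvia_2011, hypothetical_2019)))
```

A modest absolute change becomes a change of thousands of percent here, and because the baseline is negative, a decrease shows up as a positive percentage. For a column that crosses zero, the absolute change is likely the more meaningful measure.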