In [28]:
import datetime as dt
import pandas as pd
import numpy as np

import plotly.io as pio
import plotly.express as px
import plotly.graph_objects as go

# Read your Mapbox credentials from creds.py
from creds import mapbox_access_token

px.set_mapbox_access_token(mapbox_access_token)

In [46]:
df = pd.read_excel("DataPanelWHR2021C2.xls")
df.rename(columns={"Healthy life expectancy at birth": "Healthy life expectancy (Age)"}, inplace=True)

## Figure 1: Countries per Year Visualization ##

Start by organizing the data by year using sort_values.

In [30]:
df_by_year = df.sort_values("year")

Create a bar chart using plotly express that displays the number of countries with data collected each year.

In [31]:
fig = px.bar(df_by_year.year.value_counts(), title="World Happiness Report: Countries with Data Collected per Year", 
             labels={'index':'year','value':'# of Countries with Data'}, hover_data={'variable':False})
fig.show()
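
The count behind this chart can also be computed as an explicit two-column table, so the axis labels do not depend on how the installed pandas version names the output of `value_counts` (the `'index'` and `'value'` label keys above rely on that naming). A minimal sketch on a made-up stand-in frame; `sample` and the `countries` column name are illustrative and not part of the original notebook:

```python
import pandas as pd

# Tiny stand-in for the WHR panel (rows are illustrative only).
sample = pd.DataFrame({
    "Country name": ["Albania", "Argentina", "Albania", "Argentina", "Austria"],
    "year": [2019, 2019, 2020, 2020, 2020],
})

# One row per year with the number of distinct countries reporting that year.
counts = (
    sample.groupby("year")["Country name"]
    .nunique()
    .rename("countries")
    .reset_index()
)
print(counts)
```

A frame like this can be passed to `px.bar(counts, x="year", y="countries")`, which takes the axis names straight from the columns.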

This visualization shows that some years cover fewer countries, most notably 2005 and 2020. To keep the sample of countries relatively consistent, it makes sense to use the years 2011 to 2019, in line with the World Happiness Report's official publications, which began in 2012 (covering the year 2011).
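
Restricting the data to a range of years can be sketched as follows. The frame is a made-up stand-in, and `first_year` and `last_year` are names introduced here; the bounds are the ones named in the paragraph above and can be adjusted:

```python
import pandas as pd

# Made-up stand-in rows; only the column names match the WHR data.
sample = pd.DataFrame({
    "Country name": ["A", "A", "A", "B"],
    "year": [2005, 2011, 2019, 2020],
})

first_year, last_year = 2011, 2019  # bounds taken from the paragraph above

# Series.between is inclusive on both ends by default.
in_range = sample[sample["year"].between(first_year, last_year)]
print(in_range["year"].tolist())
```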

In [32]:
pio.write_html(fig, "Countries_Per_Year.html", full_html=False) #export this figure as an html file for the web page

## Figure 2: Global Choropleths ##

This will create 19 choropleths in total for all the countries included: one for each column with a slider for years (9 total), and one for each year with a dropdown to select a column (10 total). Ideally these 19 would be combined into a single choropleth with both a slider for years and a dropdown for columns; however, because visibility is a plain boolean per trace, this does not seem possible. The slider and the dropdown each keep their own record of which trace should be visible, so moving the slider applies the slider's visibility mask without taking the dropdown's mask into account. This is unfortunate, as one graph would be much nicer to look at and interact with than 19, but at the moment I have not found a way to resolve this issue.
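
To make the limitation concrete: in a combined figure every (column, year) pair would need its own trace, and each slider step or dropdown button carries a complete, fixed visibility list. A minimal sketch of that bookkeeping in plain Python; the helper `one_hot_mask` and the column-by-column trace ordering are assumptions made for illustration, not part of the notebook:

```python
def one_hot_mask(col_idx, year_idx, n_cols=9, n_years=10):
    """Visibility list for a figure holding n_cols * n_years traces,
    ordered column by column, with exactly one trace switched on."""
    mask = [False] * (n_cols * n_years)
    mask[col_idx * n_years + year_idx] = True
    return mask

# A slider step is built once, so its mask is frozen for one column (here column 0).
step_for_third_year = one_hot_mask(col_idx=0, year_idx=2)

# A dropdown button for column 4 is likewise frozen for one year (here the first).
button_for_col_4 = one_hot_mask(col_idx=4, year_idx=0)

# Choosing column 4 and then moving the slider applies the slider's list,
# which points back at a column 0 trace.
print(step_for_third_year.index(True), button_for_col_4.index(True))
```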

The first step toward choropleth visualizations is adding a three-letter ISO country code for each country to our data, as the World Happiness Report does not include one. The following csv does include the proper codes.

In [47]:
df_country_codes = pd.read_csv("2014_world_gdp_with_codes.csv")

We will merge our WHR data with the country codes and drop the other columns added in the merge.

In [48]:
df_with_codes = df.merge(df_country_codes, how="inner", left_on="Country name", right_on="COUNTRY")
df_with_codes.drop(["GDP (BILLIONS)","COUNTRY"], axis=1, inplace=True)
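
Because the merge is an inner join on the country name, any country spelled differently in the two files is dropped without warning. One way to check for that, sketched here on made-up frames (the rows are illustrative; only the column names come from the notebook), is a left merge with `indicator=True`:

```python
import pandas as pd

# Illustrative rows only; "Atlantis" stands in for a name with no matching code.
whr = pd.DataFrame({"Country name": ["Albania", "Austria", "Atlantis"],
                    "Life Ladder": [5.4, 7.2, 6.0]})
codes = pd.DataFrame({"COUNTRY": ["Albania", "Austria"],
                      "CODE": ["ALB", "AUT"]})

# how="left" keeps every WHR row; the _merge column records whether a code was found.
checked = whr.merge(codes, how="left", left_on="Country name",
                    right_on="COUNTRY", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "Country name"].unique()
print(list(unmatched))
```

Names that show up in `unmatched` could then be renamed in one of the two frames before the inner merge.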

This function simplifies our code for the actual graph as both the sliders and dropdowns use essentially the same dictionary to determine the label, visible traces, and annotations that are displayed.

In [49]:
def slider_dict(label, visibles, annotation1, annotation2):
    return {'method': 'update', 'label':label,'args':[{'visible': visibles}, {'annotations': [annotation1, annotation2]}]}
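
As an illustration of what the helper returns, the sketch below repeats the function and calls it with placeholder annotations (`top` and `bottom` are made-up stand-ins; the real annotations also carry position and styling keys):

```python
def slider_dict(label, visibles, annotation1, annotation2):
    # With method "update", args is [trace updates, layout updates].
    return {
        "method": "update",
        "label": label,
        "args": [{"visible": visibles}, {"annotations": [annotation1, annotation2]}],
    }

# Placeholder annotations for the example.
top = {"text": "highest", "showarrow": False}
bottom = {"text": "lowest", "showarrow": False}

step = slider_dict(2011, [True, False, False], top, bottom)
print(step["args"][0])  # trace-level update: which traces are visible
print(step["args"][1])  # layout-level update: which annotations are shown
```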

This next segment initializes the lists to be used for both the slider and dropdown graphs.
- "fields" is a 10x10 boolean array that is True only on the diagonal. It serves as the visibility mask for both slider and dropdown: row i makes only trace i visible (the dropdown leaves out the last row and column, as there are only 9 columns to be selected by dropdown)
- "colorscales" is a list of plotly colorscale names that the choropleth accepts. The ones in this list are my personal favorites, and my reason for including multiple colors is to provide visual feedback to the viewer so that they can easily see when they are looking at graphs for different columns. The last two entries are the same to show the connection between positive and negative affect.
- "graph_columns" selects the columns from the WHR data that have information worth graphing, i.e. dropping the country name, year, and country code and leaving all the rest.

In [50]:
# 10x10 boolean identity matrix: row i is True only at position i
fields = np.eye(10, dtype=bool)
    
colorscales = ['aggrnyl', 'agsunset', 'bluered', 'viridis', 'rdpu', 'pinkyl', 'plasma', 'plotly3', 'plotly3']

graph_columns = list(df_with_codes.columns)
graph_columns.remove("Country name")
graph_columns.remove("year")
graph_columns.remove("CODE")
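
As a quick illustration of how the mask is used: row `i` switches on only trace `i`, and the dropdown trims each row to nine entries. The sketch also shows an equivalent way to pick the graphable columns with a list comprehension; the column list is abbreviated, and `columns` and `excluded` are names introduced here for illustration:

```python
import numpy as np

# Boolean identity matrix: row i is True only at position i.
fields = np.eye(10, dtype=bool)

slider_mask = fields[3]         # 10 entries, one per year trace
dropdown_mask = fields[3][:-1]  # 9 entries, one per column trace

# Keep every column except the identifiers (abbreviated column list).
columns = ["Country name", "year", "Life Ladder", "Generosity", "CODE"]
excluded = {"Country name", "year", "CODE"}
graph_columns = [c for c in columns if c not in excluded]

print(slider_mask.sum(), dropdown_mask.size, graph_columns)
```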

This next segment uses a for-loop to create a distinct graph for each column with a slider for the years. The steps it goes through are as follows:
1. Create a figure and make an empty list that will be filled with the dictionaries for sliders
1. Set the starting year to 2011
1. Begin a while loop that goes through each of the years from 2011 to 2020
    * Make a new dataframe for the specific year (ex. 2011)
    * Add a choropleth for the current year based on the column from the for-loop and set it to the corresponding color
    * Make that choropleth no longer visible
    * Find the highest and lowest scores for that year and the corresponding countries
    * Create annotations that show which country had the highest score and which had the lowest score
    * Use the earlier defined function "slider_dict" to add a dictionary to the list of sliders
    * Increment the year
1. Once a trace has been made for each year, properly format the sliders list to be plotly acceptable
1. Update the layout of the figure to include all the sliders and a title based on the current column
1. Set the first trace to be visible and change the title to make it easier for exporting to html
1. Export the graph and repeat the entire process with the next column
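
The step that finds the highest and lowest scores can be sketched on its own. The helper `extremes` below is a hypothetical alternative that uses `idxmax` and `idxmin` and returns `None` when a column has no data; the sample rows are the five 2020 rows shown in the notebook's own `df.head()` output, so the result is the extreme among those five only:

```python
import pandas as pd

def extremes(frame, column, name_col="Country name"):
    """Return ((country, max), (country, min)) for one column, or None if it is all NaN."""
    scores = frame[column].dropna()
    if scores.empty:
        return None
    hi, lo = scores.idxmax(), scores.idxmin()
    return (
        (frame.loc[hi, name_col], round(float(scores.loc[hi]), 3)),
        (frame.loc[lo, name_col], round(float(scores.loc[lo]), 3)),
    )

sample = pd.DataFrame({
    "Country name": ["Albania", "Argentina", "Australia", "Austria", "Bahrain"],
    "Life Ladder": [5.36491, 5.900567, 7.137368, 7.213489, 6.173176],
    "Perceptions of corruption": [0.891359, 0.81578, 0.491095, 0.46383, None],
})

highest, lowest = extremes(sample, "Life Ladder")
print(highest, lowest)
```

The missing value for Bahrain in the corruption column is skipped rather than treated as a score.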

In [77]:
for graph in graph_columns:

    fig = go.Figure()
    sliders = []

    year = 2011

    while (year < 2021):
        df = df_with_codes[df_with_codes["year"]==year]
        
        fig.add_trace(go.Choropleth(locations=df["CODE"], z=df[graph], text = df["Country name"], autocolorscale=False,
             colorscale=colorscales[graph_columns.index(graph)], marker_line_color='white', marker_line_width=0.5, colorbar_title=f"{graph}"))

        fig.data[year - 2011].visible = False  # hide each trace as it is added
        
        max_of_graph = df[df[graph]==df[graph].max()]
        highest_score = (max_of_graph["Country name"].values[0], round(max_of_graph[graph].values[0],3))
        min_of_graph = df[df[graph]==df[graph].min()]
        lowest_score = (min_of_graph["Country name"].values[0], round(min_of_graph[graph].values[0],3))
        max_annotation = {'x':1.1, 'y':1.2, 'xref':'paper', 'yref':'paper', 'text':"{0} has the highest score of ~ {1}".format(highest_score[0], highest_score[1]), 
                           'font':{'size':10, 'color':'white'}, 'showarrow':False, 'borderwidth':9, 'borderpad':4, 'bgcolor':'grey'}
        min_annotation = {'x':1.1, 'y':1.1, 'xref':'paper', 'yref':'paper', 'text':"{0} has the lowest score of ~ {1}".format(lowest_score[0], lowest_score[1]), 
                           'font':{'size':10, 'color':'white'}, 'showarrow':False, 'borderwidth':9, 'borderpad':4, 'bgcolor':'grey'}
        
        sliders.append(slider_dict(year, fields[year - 2011], max_annotation, min_annotation))
        
        year = year+1
        
    
    sliders = [{'steps':sliders}]
    
    fig.update_layout({'sliders': sliders}, title_text=f'World Happiness Report: {graph}', geo=dict(showframe=False, showcoastlines=False, projection_type="equirectangular"),
                     autosize=False, height=600, width=950)

    fig.data[0].visible=True
    
    title = graph.replace(" ", "_")
    #pio.write_html(fig, f"WHR-{title}.html", full_html=False)
    fig.show()
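
The two annotation dictionaries in the loop above differ only in their text and vertical position, so they could be produced by one small function. A sketch, where `score_annotation` is a hypothetical helper and the country and score arguments are placeholder values:

```python
def score_annotation(country, score, kind, y):
    """Build one paper-anchored annotation; kind is 'highest' or 'lowest'."""
    return {
        "x": 1.1, "y": y, "xref": "paper", "yref": "paper",
        "text": f"{country} has the {kind} score of ~ {score}",
        "font": {"size": 10, "color": "white"},
        "showarrow": False, "borderwidth": 9, "borderpad": 4, "bgcolor": "grey",
    }

# Placeholder inputs; in the loop these would come from the per-year maximum and minimum.
max_annotation = score_annotation("Austria", 7.213, "highest", y=1.2)
min_annotation = score_annotation("Albania", 5.365, "lowest", y=1.1)
print(max_annotation["text"])
```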
    

This next segment creates the graphs for each year with a dropdown to choose a column. It is very similar to the last segment, except that for every year it adds a trace for each column instead of the other way around. Additionally, "buttons_list" plays the same role as "sliders", and dropdown menus are formatted slightly differently from sliders.
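
The formatting difference comes down to where the list of entries sits in the layout. A minimal sketch in plain dictionaries, using two placeholder entries in the shape that "slider_dict" produces (the names `steps`, `slider_layout`, and `dropdown_layout` are introduced here for illustration):

```python
# Two placeholder entries with empty layout updates.
steps = [
    {"method": "update", "label": "2011", "args": [{"visible": [True, False]}, {}]},
    {"method": "update", "label": "2012", "args": [{"visible": [False, True]}, {}]},
]

# A slider wraps its entries under the key "steps" ...
slider_layout = {"sliders": [{"steps": steps}]}

# ... while a dropdown wraps the same entries under "buttons" inside "updatemenus".
dropdown_layout = {"updatemenus": [{"active": 0, "buttons": steps}]}

print(list(slider_layout["sliders"][0]), list(dropdown_layout["updatemenus"][0]))
```

Either dictionary has the shape that `fig.update_layout` accepts as its first argument.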


In [75]:
year = 2011

while (year<2021):
    
    df = df_with_codes[df_with_codes['year']==year]

    fig = go.Figure()
    n=0
    buttons_list = []

    for graph in graph_columns:
            fig.add_trace(go.Choropleth(locations=df["CODE"], z=df[graph], text = df["Country name"], autocolorscale=False,
                    colorscale=colorscales[n], marker_line_color='white', marker_line_width=0.5, colorbar_title=f"{graph}"))   
            label = "{0}: {1}".format(year, graph_columns[n])
            
            max_of_graph = df[df[graph]==df[graph].max()]
            highest_score = (max_of_graph["Country name"].values[0], round(max_of_graph[graph].values[0],3))
            min_of_graph = df[df[graph]==df[graph].min()]
            lowest_score = (min_of_graph["Country name"].values[0], round(min_of_graph[graph].values[0],3))
            max_annotation = {'x':1.1, 'y':1.2, 'xref':'paper', 'yref':'paper', 'text':"{0} has the highest score of ~ {1}".format(highest_score[0], highest_score[1]), 
                           'font':{'size':10, 'color':'white'}, 'showarrow':False, 'borderwidth':9, 'borderpad':4, 'bgcolor':'grey'}
            min_annotation = {'x':1.1, 'y':1.1, 'xref':'paper', 'yref':'paper', 'text':"{0} has the lowest score of ~ {1}".format(lowest_score[0], lowest_score[1]), 
                           'font':{'size':10, 'color':'white'}, 'showarrow':False, 'borderwidth':9, 'borderpad':4, 'bgcolor':'grey'}
            
            buttons_list.append(slider_dict(label, fields[n][:-1], max_annotation, min_annotation))
            fig.data[n].visible=False
            n+=1

    fig.update_layout(updatemenus=[dict(active=0, buttons=buttons_list, x=0, xanchor="left", y=1.05,yanchor="top")], 
                    title="World Happiness Report: {0}".format(year), geo=dict(showframe=False, showcoastlines=False, projection_type="equirectangular"),
                    autosize=False, height=600, width=950)

    fig.data[0].visible=True
    
    pio.write_html(fig, f"WHR-{year}.html", full_html=False)
    #fig.show()
    
    year+=1


In [61]:
df.head()

| | Country name | year | Life Ladder | Log GDP per capita | Social support | Healthy life expectancy (Age) | Freedom to make life choices | Generosity | Perceptions of corruption | Positive affect | Negative affect | CODE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 24 | Albania | 2020 | 5.36491 | 9.497252 | 0.710115 | 69.300003 | 0.753671 | 0.006968 | 0.891359 | 0.678661 | 0.265066 | ALB |
| 51 | Argentina | 2020 | 5.900567 | 9.85045 | 0.897104 | 69.199997 | 0.823392 | -0.122354 | 0.81578 | 0.763524 | 0.342497 | ARG |
| 79 | Australia | 2020 | 7.137368 | 10.759864 | 0.936517 | 74.199997 | 0.905283 | 0.21003 | 0.491095 | 0.769182 | 0.205078 | AUS |
| 92 | Austria | 2020 | 7.213489 | 10.851118 | 0.924831 | 73.599998 | 0.91191 | 0.011032 | 0.46383 | 0.769317 | 0.2065 | AUT |
| 117 | Bahrain | 2020 | 6.173176 | 10.619904 | 0.847745 | 69.699997 | 0.945233 | 0.132441 | NaN | 0.789795 | 0.296835 | BHR |
