In [20]:
# load the dataset

import pandas as pd
import numpy as np

# Data 550 Project 2
###  Sophia Bulcock, Ryan Koenig
### 09-02-2021

## Dataset Description

This dataset is composed of six CSV files. Each file represents a "Happiness" report for one year, from 2015 to 2020. The information comes from the Gallup World Poll (GWP). For each year, every country receives scores based on the survey results, and a ranking is generated from these scores.  
When looking at the individual columns of interest for this data set we have:

- Country: The name of the country the scores apply to.
- Year: The year the report was generated.
- Happiness Rank: How that country's score compares to the others for that particular year.
- Happiness Score: The national average of responses to the Cantril ladder question in the GWP.

How each of the following columns is generated is a bit of a black box, but they are based on how much each factor is thought to contribute to the happiness score. Adding them all together should give you the happiness score.

- Economy (GDP per Capita): How much the country's economy is expected to contribute to the happiness of its populace.
- Family: How much the country's family structure is expected to contribute to the happiness of its populace.
- Health (Life Expectancy): How much the health of the country's individuals is expected to contribute to the happiness of its populace.
- Freedom: How much the country's perceived freedoms are expected to contribute to the happiness of its populace.
- Trust (Government Corruption): How much the perceived trust in the country's government is expected to contribute to the happiness of its populace.
- Generosity: How much the country's perceived generosity is expected to contribute to the happiness of its populace.
- Dystopia + Residual: Dystopia is a comparison to a theoretical worst country so that there is a baseline. It is combined with the residuals, or unexplained components, to create a positive value.
- Social Support: How much the social support available in the country is expected to contribute to the happiness of its populace.

For more information see: https://www.kaggle.com/mathurinache/world-happiness-report
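
The claim that the factor columns add up to the happiness score can be checked directly. Below is a minimal sketch, assuming the 2015-style column names are the full set of terms; `components_match_score` is a hypothetical helper and the two rows are made-up illustrative values, not survey data.

```python
import pandas as pd

# Factor columns as named in the 2015 file (assumed to be the full set of terms).
COMPONENTS = [
    'Economy (GDP per Capita)', 'Family', 'Health (Life Expectancy)',
    'Freedom', 'Trust (Government Corruption)', 'Generosity',
    'Dystopia Residual',
]

def components_match_score(df, tol=1e-3):
    """Return a boolean Series: True where the factors sum to the score."""
    total = df[COMPONENTS].sum(axis=1)
    return (total - df['Happiness Score']).abs() <= tol

# Illustrative rows only (not real survey values).
toy = pd.DataFrame({
    'Country': ['A', 'B'],
    'Happiness Score': [5.5, 6.0],
    'Economy (GDP per Capita)': [1.0, 1.0],
    'Family': [1.0, 1.0],
    'Health (Life Expectancy)': [0.5, 0.5],
    'Freedom': [0.5, 0.5],
    'Trust (Government Corruption)': [0.25, 0.25],
    'Generosity': [0.25, 0.25],
    'Dystopia Residual': [2.0, 2.0],
})
check = components_match_score(toy)
print(check.tolist())
```

Rows where the check fails would point to a missing factor column or a differently scaled column for that year.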




## Exploring the Data Set

Let's take a look at the columns of each CSV data set.

In [21]:
csv2015= pd.read_csv('2015.csv')
list2015=list(csv2015.columns.values) 
print(list2015)

['Country', 'Region', 'Happiness Rank', 'Happiness Score', 'Standard Error', 'Economy (GDP per Capita)', 'Family', 'Health (Life Expectancy)', 'Freedom', 'Trust (Government Corruption)', 'Generosity', 'Dystopia Residual']


In [22]:
csv2016=pd.read_csv('2016.csv')
list2016=list(csv2016.columns.values) 
print(list2016)

['Country', 'Region', 'Happiness Rank', 'Happiness Score', 'Lower Confidence Interval', 'Upper Confidence Interval', 'Economy (GDP per Capita)', 'Family', 'Health (Life Expectancy)', 'Freedom', 'Trust (Government Corruption)', 'Generosity', 'Dystopia Residual']


In [23]:
csv2017=pd.read_csv('2017.csv')
list2017=list(csv2017.columns.values) 
print(list2017)

['Country', 'Happiness.Rank', 'Happiness.Score', 'Whisker.high', 'Whisker.low', 'Economy..GDP.per.Capita.', 'Family', 'Health..Life.Expectancy.', 'Freedom', 'Generosity', 'Trust..Government.Corruption.', 'Dystopia.Residual']


In [24]:
csv2018=pd.read_csv('2018.csv')
list2018=list(csv2018.columns.values) 
print(list2018)

['Overall rank', 'Country or region', 'Score', 'GDP per capita', 'Social support', 'Healthy life expectancy', 'Freedom to make life choices', 'Generosity', 'Perceptions of corruption']


In [25]:
csv2019=pd.read_csv('2019.csv')
list2019=list(csv2019.columns.values) 
print(list2019)

['Overall rank', 'Country or region', 'Score', 'GDP per capita', 'Social support', 'Healthy life expectancy', 'Freedom to make life choices', 'Generosity', 'Perceptions of corruption']


In [26]:
csv2020=pd.read_csv('2020.csv')
list2020=list(csv2020.columns.values) 
print(list2020)

['Country name', 'Regional indicator', 'Ladder score', 'Standard error of ladder score', 'upperwhisker', 'lowerwhisker', 'Logged GDP per capita', 'Social support', 'Healthy life expectancy', 'Freedom to make life choices', 'Generosity', 'Perceptions of corruption', 'Ladder score in Dystopia', 'Explained by: Log GDP per capita', 'Explained by: Social support', 'Explained by: Healthy life expectancy', 'Explained by: Freedom to make life choices', 'Explained by: Generosity', 'Explained by: Perceptions of corruption', 'Dystopia + residual']
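
Comparing these lists by eye is error-prone, so one option is to compute the overlap. The sketch below uses the 2015 and 2016 column lists printed above; `shared_columns` is a hypothetical helper, not part of the original notebook.

```python
def shared_columns(column_lists):
    """Return the column names present in every list, in first-list order."""
    common = set(column_lists[0]).intersection(*column_lists[1:])
    return [c for c in column_lists[0] if c in common]

# Column lists copied from the printed output for 2015 and 2016.
cols_2015 = ['Country', 'Region', 'Happiness Rank', 'Happiness Score',
             'Standard Error', 'Economy (GDP per Capita)', 'Family',
             'Health (Life Expectancy)', 'Freedom',
             'Trust (Government Corruption)', 'Generosity', 'Dystopia Residual']
cols_2016 = ['Country', 'Region', 'Happiness Rank', 'Happiness Score',
             'Lower Confidence Interval', 'Upper Confidence Interval',
             'Economy (GDP per Capita)', 'Family', 'Health (Life Expectancy)',
             'Freedom', 'Trust (Government Corruption)', 'Generosity',
             'Dystopia Residual']

shared = shared_columns([cols_2015, cols_2016])
print(shared)
```

Passing all six lists at once would show how few names survive unchanged across every year, which is why the renaming step in the wrangling section is needed.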


Now let's take a look at the last 5 rows of each data set.

In [27]:
# exploring last 5 rows of each of the data sets

print(csv2015.tail())
print(csv2016.tail())
print(csv2017.tail())
print(csv2018.tail())
print(csv2019.tail())
print(csv2020.tail())

     Country                           Region  Happiness Rank  \
153   Rwanda               Sub-Saharan Africa             154   
154    Benin               Sub-Saharan Africa             155   
155    Syria  Middle East and Northern Africa             156   
156  Burundi               Sub-Saharan Africa             157   
157     Togo               Sub-Saharan Africa             158   

     Happiness Score  Standard Error  Economy (GDP per Capita)   Family  \
153            3.465         0.03464                   0.22208  0.77370   
154            3.340         0.03656                   0.28665  0.35386   
155            3.006         0.05015                   0.66320  0.47489   
156            2.905         0.08658                   0.01530  0.41587   
157            2.839         0.06727                   0.20868  0.13995   

     Health (Life Expectancy)  Freedom  Trust (Government Corruption)  \
153                   0.42864  0.59201                        0.55191   
154         

Number of rows in each csv data set.

In [28]:
# count number of rows in each data frame

print(csv2015.shape[0])
print(csv2016.shape[0])
print(csv2017.shape[0])
print(csv2018.shape[0])
print(csv2019.shape[0])
print(csv2020.shape[0])


158
157
155
156
156
153


Let's verify that all the number columns are numeric.

In [29]:
# exploring data frame types

print(csv2015.dtypes)
print(csv2016.dtypes)
print(csv2017.dtypes)
print(csv2018.dtypes)
print(csv2019.dtypes)
print(csv2020.dtypes)


Country                           object
Region                            object
Happiness Rank                     int64
Happiness Score                  float64
Standard Error                   float64
Economy (GDP per Capita)         float64
Family                           float64
Health (Life Expectancy)         float64
Freedom                          float64
Trust (Government Corruption)    float64
Generosity                       float64
Dystopia Residual                float64
dtype: object
Country                           object
Region                            object
Happiness Rank                     int64
Happiness Score                  float64
Lower Confidence Interval        float64
Upper Confidence Interval        float64
Economy (GDP per Capita)         float64
Family                           float64
Health (Life Expectancy)         float64
Freedom                          float64
Trust (Government Corruption)    float64
Generosity                       float64
Dy

## Initial Observations

1. This data set contains 6 separate CSV files for the happiness reports from 2015-2020.
2. The CSV files do not all contain the same columns. Some have additional columns, and others are missing columns that other years have. This will be taken care of in the wrangling section, where we filter and combine them into 1 large data frame with a 'Year' column added.
3. Some of the original CSV files have different column names for the same information. This will also be taken care of in the wrangling section.
4. Each of the 6 individual data frames has approximately the same number of observations/rows.

## Research questions

1. What are the top 5 'happiest' countries' scores over the last 6 years, and what elements (columns) contribute the most to this score? How does the top contributing element compare to that of the 'unhappiest' countries?

2. Which countries have had the largest increase and decrease in ranking over the last 6 years?


## Wrangling

In [30]:
# wrangling into 1 usable data frame

# drop columns that are not needed or not found in all data frames
csv2015=csv2015.drop(['Region','Standard Error'], axis=1)

csv2016=csv2016.drop(['Region'], axis=1)

csv2016=csv2016.drop(['Lower Confidence Interval','Upper Confidence Interval'], axis=1)

csv2017=csv2017.drop(['Whisker.high', 'Whisker.low'],axis=1)

# note: 'Social support' is dropped for 2018 only; it remains in 2019 and 2020
csv2018=csv2018.drop(['Social support'], axis=1)

csv2020=csv2020.drop(['Regional indicator','Standard error of ladder score',
                     'upperwhisker','lowerwhisker','Ladder score in Dystopia',
                     'Explained by: Log GDP per capita','Explained by: Social support',
                     'Explained by: Healthy life expectancy','Explained by: Freedom to make life choices',
                     'Explained by: Generosity','Explained by: Perceptions of corruption'], axis=1)



In [31]:
#renaming columns so all data frames share column names
csv2017=csv2017.rename(columns={'Happiness.Rank': 'Happiness Rank', 
                                   'Happiness.Score': 'Happiness Score',
                                   'Economy..GDP.per.Capita.':'Economy (GDP per Capita)',
                                   'Health..Life.Expectancy.':'Health (Life Expectancy)',
                                   'Trust..Government.Corruption.':'Trust (Government Corruption)',
                                   'Dystopia.Residual':'Dystopia Residual'})

csv2018=csv2018.rename(columns={'Overall rank': 'Happiness Rank','Country or region':'Country',
                                'Score':'Happiness Score','GDP per capita':'Economy (GDP per Capita)',
                                'Healthy life expectancy':'Health (Life Expectancy)',
                                'Freedom to make life choices':'Freedom',
                                'Perceptions of corruption':'Trust (Government Corruption)'})

csv2019=csv2019.rename(columns={'Overall rank': 'Happiness Rank','Country or region':'Country',
                                'Score':'Happiness Score','GDP per capita':'Economy (GDP per Capita)',
                                'Healthy life expectancy':'Health (Life Expectancy)',
                                'Freedom to make life choices':'Freedom',
                                'Perceptions of corruption':'Trust (Government Corruption)'})

csv2020=csv2020.rename(columns={'Country name':'Country', 'Ladder score':'Happiness Score',
                              'Logged GDP per capita':'Economy (GDP per Capita)',
                              'Healthy life expectancy':'Health (Life Expectancy)',
                               'Freedom to make life choices':'Freedom',
                               'Perceptions of corruption':'Trust (Government Corruption)',
                               'Dystopia + residual':'Dystopia Residual'
                              })                               
                                

In [32]:
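# add happiness rank to csv2020 (the 2020 file has no rank column)
# note: this assumes the rows are already ordered from happiest to least happy;
# np.arange starts at 0, so 2020 ranks run 0..152 while other years start at 1

csv2020['Happiness Rank']=np.arange(len(csv2020))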
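
An alternative that does not depend on row order is to derive the rank from the score itself. This is a minimal sketch, assuming a higher score means a better (lower) rank; `add_rank` is a hypothetical helper and the toy values are illustrative only.

```python
import pandas as pd

def add_rank(df, score_col='Happiness Score', rank_col='Happiness Rank'):
    """Return a copy of df with a 1-based rank, where rank 1 is the highest score."""
    out = df.copy()
    out[rank_col] = out[score_col].rank(ascending=False, method='min').astype(int)
    return out

# Illustrative values only (not real survey data).
toy = pd.DataFrame({'Country': ['A', 'B', 'C'],
                    'Happiness Score': [5.0, 7.5, 6.1]})
ranked = add_rank(toy)
print(ranked)
```

With `method='min'`, tied scores share the better rank, and the ranks start at 1 like the rank columns in the earlier files.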

In [33]:
# make into 1 data set


csv2015['Year']=2015
csv2016['Year']=2016
csv2017['Year']=2017
csv2018['Year']=2018
csv2019['Year']=2019
csv2020['Year']=2020

frames = [csv2015,csv2016,csv2017,csv2018,csv2019,csv2020]

happiness_df = pd.concat(frames)

Summary description of the data frame's numerical columns

In [34]:
happiness_df.describe(include=None)

| | Happiness Rank | Happiness Score | Economy (GDP per Capita) | Family | Health (Life Expectancy) | Freedom | Trust (Government Corruption) | Generosity | Dystopia Residual | Year | Social support |
|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 935.0 | 935.0 | 935.0 | 470.0 | 935.0 | 935.0 | 934.0 | 935.0 | 623.0 | 935.0 | 309.0 |
| mean | 78.256684 | 5.394436 | 2.287264 | 0.990347 | 11.057834 | 0.472008 | 0.224981 | 0.180425 | 2.063148 | 2017.485561 | 1.01071 |
| std | 45.028594 | 1.124935 | 3.161344 | 0.318707 | 23.799414 | 0.201962 | 0.254946 | 0.153977 | 0.567172 | 1.70826 | 0.304093 |
| min | 0.0 | 2.5669 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -0.300907 | 0.257241 | 2015.0 | 0.0 |
| 25% | 39.0 | 4.54 | 0.695145 | 0.793 | 0.508 | 0.337772 | 0.061039 | 0.098152 | 1.703152 | 2016.0 | 0.806092 |
| 50% | 78.0 | 5.3535 | 1.07 | 1.025665 | 0.70806 | 0.46582 | 0.1108 | 0.183 | 2.081786 | 2017.0 | 0.921125 |
| 75% | 117.0 | 6.1985 | 1.395705 | 1.228745 | 0.89235 | 0.585785 | 0.2853 | 0.262 | 2.431136 | 2019.0 | 1.274 |
| max | 158.0 | 7.8087 | 11.450681 | 1.610574 | 76.804581 | 0.974998 | 0.935585 | 0.838075 | 3.83772 | 2020.0 | 1.624 |
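
The differing counts in this summary (for example 470 for Family and 309 for Social support) come from columns that exist only in some years. Here is a sketch of how to see that per year, using two toy frames with illustrative values; the column names follow the notebook, everything else is hypothetical.

```python
import pandas as pd

# Two toy "years" with different column sets (illustrative values only).
a = pd.DataFrame({'Country': ['A', 'B'], 'Family': [1.0, 0.9], 'Year': 2015})
b = pd.DataFrame({'Country': ['A', 'B'], 'Social support': [1.2, 1.1], 'Year': 2019})

combined = pd.concat([a, b], ignore_index=True)

# count() ignores NaN, so each cell is the number of non-missing values
# for that column in that year.
coverage = combined.groupby('Year').count()
print(coverage)
```

Running `happiness_df.groupby('Year').count()` on the combined frame would show in the same way which columns are empty for which years.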


In [35]:
# For analysis 1 - top 5 happiest countries for each year

happiest_c_2015 = csv2015.sort_values(by=['Happiness Rank']).head(5)
happiest_c_2016 = csv2016.sort_values(by=['Happiness Rank']).head(5)
happiest_c_2017 = csv2017.sort_values(by=['Happiness Rank']).head(5)
happiest_c_2018 = csv2018.sort_values(by=['Happiness Rank']).head(5)
happiest_c_2019 = csv2019.sort_values(by=['Happiness Rank']).head(5)

# the 2020 file has no rank column; this uses the rank added during wrangling
happiest_c_2020 = csv2020.sort_values(by=['Happiness Rank']).head(5)

# combine into 1 data frame

happiest = [happiest_c_2015,happiest_c_2016,happiest_c_2017,happiest_c_2018,happiest_c_2019, happiest_c_2020]
happest_c_df = pd.concat(happiest)
# round the scores to 2 decimals so they can be used as chart labels
rouded_happiest_df = happest_c_df.round({'Happiness Score': 2})

In [36]:
# For analysis 2 - keep the 3 countries that appear in the top 5 every year

top_countries = ['Denmark', 'Iceland', 'Norway']
happiest_c_2015_2 = happiness_df[happiness_df['Country'].isin(top_countries)]
# restrict to 2015-2019 (both ends inclusive)
happiest_c_2015_2_2 = happiest_c_2015_2[happiest_c_2015_2['Year'].between(2015, 2019)]
# display only: the converted column is not assigned back
happiest_c_2015_2_2['Year'].astype(int)


1    2015
2    2015
3    2015
0    2016
2    2016
3    2016
0    2017
1    2017
2    2017
1    2018
2    2018
3    2018
1    2019
2    2019
3    2019
Name: Year, dtype: int32

In [37]:
# for analysis 3

# bottom 3 countries by happiness rank for each year

unhappiest_c_2015 = csv2015.sort_values(by=['Happiness Rank']).tail(3)
unhappiest_c_2016 = csv2016.sort_values(by=['Happiness Rank']).tail(3)
unhappiest_c_2017 = csv2017.sort_values(by=['Happiness Rank']).tail(3)
unhappiest_c_2018 = csv2018.sort_values(by=['Happiness Rank']).tail(3)
unhappiest_c_2019 = csv2019.sort_values(by=['Happiness Rank']).tail(3)

# the 2020 file has no rank column; this uses the rank added during wrangling
unhappiest_c_2020 = csv2020.sort_values(by=['Happiness Rank']).tail(3)

# combine into 1 data frame

unhappiest = [unhappiest_c_2015,unhappiest_c_2016,unhappiest_c_2017,unhappiest_c_2018,unhappiest_c_2019, unhappiest_c_2020]
unhappiest_c_df = pd.concat(unhappiest)

In [38]:

# keep the bottom-ranked countries used for the comparison, 2015-2019 only

bottom_countries = ['Central African Republic', 'Togo', 'Burundi', 'South Sudan']
unhappiest_c_2015_2 = unhappiest_c_df[unhappiest_c_df['Country'].isin(bottom_countries)]
unhappiest_c_2015_2 = unhappiest_c_2015_2[unhappiest_c_2015_2['Year'].between(2015, 2019)]
# display only: the converted column is not assigned back
unhappiest_c_2015_2['Year'].astype(int)



156    2015
157    2015
154    2016
156    2016
153    2017
154    2017
153    2018
154    2018
155    2018
154    2019
155    2019
Name: Year, dtype: int32

In [39]:
# Join with happiest for comparison

frames3=[happiest_c_2015_2_2,unhappiest_c_2015_2]
result = pd.concat(frames3)




In [76]:
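import requests

def get_code_num(countryname):
    """Look up a country's numeric code by name; return None on failure."""
    url_='https://restcountries.eu/rest/v2/name/'

    # search for South Korea under the name 'Korea'
    if countryname=='South Korea':
        countryname='Korea'

    # str.replace returns a new string, so the result must be assigned
    countryname=countryname.replace(" ","%20")
    url_=url_+countryname

    try:
        res = requests.get(url_)
        response = int(res.json()[0]['numericCode'])
    except Exception:
        # network errors, unknown names and missing codes all map to None
        response = None

    return response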



In [77]:
# Careful running this block: it makes one HTTP request per row and takes a long time.
happiness_df['CountryCode']=happiness_df.Country.map(get_code_num)
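
Since each country appears in several years, most of these requests are repeats. One way to cut the run time is to look up each distinct name once and reuse the result. This is a sketch under that assumption; `map_unique` is a hypothetical helper, and `fake_lookup` stands in for `get_code_num` so the example runs without network access.

```python
import pandas as pd

def map_unique(series, lookup):
    """Apply `lookup` once per distinct value, then broadcast to all rows."""
    codes = {name: lookup(name) for name in series.dropna().unique()}
    return series.map(codes)

# Stand-in for get_code_num that records how often it is called.
calls = []
def fake_lookup(name):
    calls.append(name)
    return len(name)

countries = pd.Series(['Togo', 'Benin', 'Togo', 'Benin', 'Togo'])
mapped = map_unique(countries, fake_lookup)
print(mapped.tolist(), len(calls))
```

In the notebook, the equivalent call would be `happiness_df['CountryCode'] = map_unique(happiness_df['Country'], get_code_num)`.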

In [153]:
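# keep only rows where a country code was found, for 2020
map_df = happiness_df[~happiness_df['CountryCode'].isna()]
map_df = map_df[map_df['Year']==2020]
# recode 581 (U.S. Minor Outlying Islands) to 840 (United States) so the U.S. matches the map ids
map_df.loc[map_df.CountryCode==581, 'CountryCode']=840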



In [70]:
# ten countries with the highest variance in rank across years
happiness_df.groupby("Country")["Happiness Rank"].var().sort_values(ascending=False).head(10)


# the four countries compared in the rank-change chart
Venezuela = happiness_df[happiness_df["Country"].isin(
    ["Venezuela", "Benin", "Ivory Coast", "Algeria"])]
Venezuela = Venezuela[Venezuela["Year"]!=2020]
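
Variance shows how much a rank moved but not in which direction. Here is a sketch of one way to get a signed change between two years, assuming a long-format frame with `Country`, `Year` and `Happiness Rank` columns; `rank_change` is a hypothetical helper and the toy values are illustrative only.

```python
import pandas as pd

def rank_change(df, first_year, last_year):
    """Rank in last_year minus rank in first_year (negative = moved up)."""
    wide = df.pivot_table(index='Country', columns='Year', values='Happiness Rank')
    change = wide[last_year] - wide[first_year]
    # countries missing in either year are dropped
    return change.dropna().sort_values()

# Illustrative values only (not real survey data).
toy = pd.DataFrame({
    'Country': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Year': [2015, 2019, 2015, 2019, 2015, 2019],
    'Happiness Rank': [100, 80, 50, 90, 10, 10],
})
change = rank_change(toy, 2015, 2019)
print(change)
```

The first entries of the sorted result are the largest rises and the last entries are the largest falls.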

In [71]:
Gloom_df = happiness_df[happiness_df['Year']!=2020]

## Data analysis & visualizations

In [79]:
import altair as alt

# top 5 happiest countries by year




chart=(alt.Chart(rouded_happiest_df).encode(
     x=alt.X('Happiness Score', scale=alt.Scale(domain=(7.3, 7.83))),
     y=alt.Y('Country', sort='-x', axis=alt.Axis(title=None)))).properties(
    
    height=200, width=200)


bars = chart.mark_bar().encode(
    color=alt.Color('Country',
                    legend=None,
                    scale=alt.Scale(scheme="pastel2"))
)




text = chart.mark_text(
    align='left',
    baseline='middle',
    dx=3  , color='black'
).encode(
    text='Happiness Score:Q'
)

(bars + text).facet(
    facet=alt.Facet('Year:N', title=None), columns=2).resolve_axis(
    x='independent',
    y='independent',
).resolve_scale(
    
    y='independent',
).properties(
     title={
          "text":"Top 5 Happiest Countries by Year",
          }
).configure_title(fontSize=20,
                  offset=5,
                  orient='top',
                  anchor='middle')

Interesting! From 2015 to 2020, the countries appearing in the top 5 of the self-reported happiness rankings were Canada, Denmark, Finland, Iceland, the Netherlands, Norway and Switzerland! Denmark, Iceland and Norway appear in the top 5 rankings for every year. Let's investigate what contributes the most to their happiness score, and how these elements change over time.
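
Alongside the chart, the largest contributor can also be read off numerically. This is a minimal sketch, assuming the factor columns shared by all years; `top_factor` is a hypothetical helper and the two rows are illustrative values, not survey data.

```python
import pandas as pd

# Factor columns available in every year after wrangling.
FACTORS = ['Economy (GDP per Capita)', 'Health (Life Expectancy)', 'Freedom',
           'Trust (Government Corruption)', 'Generosity']

def top_factor(df):
    """Return, for each row, the name of the factor with the largest value."""
    return df[FACTORS].idxmax(axis=1)

# Illustrative rows only.
toy = pd.DataFrame({
    'Country': ['A', 'B'],
    'Economy (GDP per Capita)': [1.3, 0.2],
    'Health (Life Expectancy)': [0.9, 0.4],
    'Freedom': [0.6, 0.3],
    'Trust (Government Corruption)': [0.4, 0.1],
    'Generosity': [0.3, 0.2],
})
top = top_factor(toy)
print(top.tolist())
```

Applied to `happiest_c_2015_2_2`, this would give one factor name per country and year.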

In [80]:
alt.Chart(happiest_c_2015_2_2).mark_line().transform_fold(
    fold=['Health (Life Expectancy)', 'Freedom', 'Economy (GDP per Capita)','Trust (Government Corruption)','Generosity'], 
    as_=['variable', 'value']
).encode(
    x=alt.X('Year:O'),y=alt.Y('max(value):Q',axis=alt.Axis(title="Amount of Happiness Score")),
    color='variable:N'
).properties(
    width=400,
    height=300).facet(
    facet=alt.Facet('Country:N', title=None), columns=1).resolve_axis(
    x='independent',
    y='independent',
).properties(
     title={
          "text":"Top Contributors to Happiest Countries' Scores",
          }
).configure_title(fontSize=20,
                  offset=5,
                  orient='top',
                  anchor='middle')



In [81]:
alt.Chart(result).mark_bar().encode(
    x=alt.X('Country',sort='-y'),
    y=alt.Y("Economy (GDP per Capita)",scale=alt.Scale(domain=(0, 1.6))),
    
    color=alt.Color('Country', legend=None)
    ).facet(
    facet='Year').resolve_axis(
    x='independent',
    y='independent',
).configure_header(labelFontSize=16).properties(
    title='GDP Comparison of Unhappiest Ranked Countries to Happiest Ranked Countries'
)


This chart shows the drastic difference between how much GDP contributes to a country's happiness score for the happiest and least happy countries. Remember, a country's GDP score is calculated from the country's self-ranked happiness score, and is how much the researchers think that the GDP column contributes to the happiness score. This visualization makes sense if we look back at the data wrangling section and see the low overall happiness scores for the least happy countries seen here. Since the unhappy countries' overall happiness scores are lower, it makes sense that the GDP contribution to those scores would also be much lower.

In [154]:

#https://runestone.academy/runestone/books/published/httlads/WorldFacts/screenscrape_cids.html

from vega_datasets import data
countries = alt.topo_feature(data.world_110m.url, 'countries')

base = alt.Chart(countries).mark_geoshape(
).encode(tooltip='Country:N',
         color=alt.Color('Happiness Score:Q', scale=alt.Scale(scheme="plasma"))
).transform_lookup( lookup='id',
    from_=alt.LookupData(map_df, 'CountryCode', ['Happiness Score', 'Country']
)).properties(
    width=750,
    height=450
).project('equirectangular')

base





In [28]:
Vpoints = alt.Chart(Venezuela).mark_line().encode(
    x=alt.X('Year:Q', 
     scale=alt.Scale(domain=[2015,2019]),
     axis=alt.Axis(format=".4")),
    y=alt.Y(alt.repeat(),
     type='quantitative',
     scale=alt.Scale()),
     color='Country',
     shape=alt.Shape('Country', legend=None)
).repeat(repeat=["Happiness Rank",
                "Happiness Score",
                 "Economy (GDP per Capita)",
                 "Health (Life Expectancy)",
                 "Freedom",
                 "Trust (Government Corruption)",
                 "Generosity"],
     columns=2
).properties(
     title={
          "text":"Countries with the Highest Variance in Rank",
          "subtitle":"With Factors Contributing to Score"}
).configure_title(fontSize=20,
                  offset=5,
                  orient='top',
                  anchor='middle')

Vpoints

Here we have taken the two countries that had the largest rise in rankings and the two countries that dropped the most from 2015 to 2019, and compared how each of their happiness score factors changed over that period. We were hoping to find a clear indicator of why countries rise and fall. For the two countries that rose, Benin and Ivory Coast, the economy score rose slightly, health took a slight dip before rising overall, and freedom and generosity fell. Benin gained some trust in its government after a dip, while Ivory Coast's fell. For the two countries that fell the most in the rankings, Algeria and Venezuela, Venezuela had a slight dip in economy while Algeria's rose; health rose for both, freedom and trust in the government fell, and generosity stayed more or less the same. There doesn't seem to be any clear indicator of why countries rise and fall through the rankings. However, it was interesting to note that freedom and trust in the government seemed to trend downwards overall in these years. Let's take a closer look at that on a broader scale.

In [31]:
Gloom = alt.Chart(
    Gloom_df
).transform_density('Trust (Government Corruption)', 
    as_=['Trust (Government Corruption)','density'], 
    groupby=['Year']
).mark_area(
    orient='horizontal'
).encode(
    color=alt.Color('Year:Q',
                    legend=None,
                    scale=alt.Scale(scheme="pastel2")), 
    y='Trust (Government Corruption):Q', 
    x=alt.X('density:Q', 
            stack='center', 
            impute=None, 
            title=None,
            axis=alt.Axis(labels=False, 
                         values=[0], 
                         grid=False, 
                         ticks=True)
            ),
    column=alt.Column(
        'Year:Q', 
        header=alt.Header(
            titleOrient='bottom', 
            labelOrient='bottom',
            labelPadding=0)
    )
).properties(
    width=100,
    title="Trust in Government Globally"
)


Doom = alt.Chart(
    Gloom_df
).transform_density('Freedom', 
    as_=['Freedom','density'], 
    groupby=['Year']
).mark_area(
    orient='horizontal'
).encode(
    color=alt.Color('Year:Q',
                    legend=None,
                    scale=alt.Scale(scheme="pastel2")), 
    y='Freedom:Q', 
    x=alt.X('density:Q', 
            stack='center', 
            impute=None, 
            title=None,
            axis=alt.Axis(labels=False, 
                         values=[0], 
                         grid=False, 
                         ticks=True)
            ),
    column=alt.Column(
        'Year:Q', 
        header=alt.Header(
            titleOrient='bottom', 
            labelOrient='bottom',
            labelPadding=0)
    )
).properties(
    width=100,
    title="Global Freedom"
)

Boom = alt.hconcat(Doom,Gloom).configure_facet(
    spacing=0
).configure_view(
    stroke=None
).configure_title(fontSize=20,
                  offset=5,
                  orient='top',
                  anchor='middle')
Boom

Here we have density plots for all the countries, showing how freedom and trust are trending through the years. Freedom oscillates slightly, reaching its highest level across the countries in 2018 and lowering in 2019, but seems to be pretty stable. Trust in the government is on a downward trend globally: the bottom of the density plot fills out and the top shrinks every year.
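
A trend read from density shapes can be cross-checked with a single number per year. This is a sketch of that check, assuming the same `Year` and factor column names; `yearly_median` is a hypothetical helper and the toy values are illustrative only.

```python
import pandas as pd

def yearly_median(df, column):
    """Median of `column` for each year, indexed by year."""
    return df.groupby('Year')[column].median()

# Illustrative values only (not real survey data).
toy = pd.DataFrame({
    'Year': [2015, 2015, 2015, 2016, 2016, 2016],
    'Trust (Government Corruption)': [0.10, 0.20, 0.30, 0.05, 0.15, 0.25],
})
medians = yearly_median(toy, 'Trust (Government Corruption)')
print(medians)
```

Calling `yearly_median(Gloom_df, 'Trust (Government Corruption)')` would show whether the median actually falls from year to year.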

## Summary



This analysis has taken us through a couple of different aspects of this data set. We found that the countries that show up in the top 5 'happiest' countries from 2015-2020 are Canada, Denmark, Finland, Iceland, the Netherlands, Norway and Switzerland. Denmark, Iceland and Norway appear every year in the top 5. When we investigated which column/element (among those that each year contained) contributes the most to the happiness score, it was found to be a country's GDP. GDP makes up more of the happiness score than the columns Health, Freedom, Trust and Generosity. After wrangling the data frame and plotting the GDP score for the happiest and unhappiest countries (Central African Republic, Togo, Burundi and South Sudan), we can see that the unhappiest countries' GDP scores are much lower than those of the happiest countries, but this is to be expected since their overall happiness scores were lower. We took a look at the four countries with the highest variances in their happiness ranks over time and could not find a connection between the factors and their dramatic changes. Although perceived freedom is remaining fairly stable worldwide, the countries' trust-in-government scores show that government trust is trending downwards globally.




## Conclusions

Since this data set was based on the countries' own surveys to determine happiness scores, it is a bit difficult to draw many deep and meaningful conclusions from this analysis. How the resulting columns are generated is a black box, but they are thought to reflect how much each element impacts the happiness score. This data set was also not complete, and many columns were missing from certain years, making analysis a bit difficult. This is why the 'Dystopia Residual' column was not used in Figure 2, as it was missing for multiple years. This is also why the year 2020 was not included in the latter half of the analysis, as it scored a couple of columns differently than in previous years.

We can, however, conclude the following. The countries that rank themselves as the most happy are mainly Scandinavian and Northern European countries. These countries' GDP scores make up a large portion of their happiness scores, and both the happiness scores and the associated GDP scores are lower for the unhappiest ranked countries like Central African Republic, Togo, Burundi and South Sudan. We can also conclude that there is a global trend of declining perceived trust in governments.





## Future Research Questions

1. We would be interested in seeing the real and historical GDP data for the 'happiest' and 'unhappiest' countries that were analyzed in this study. We would want to know whether the top 5 ranked happiest countries and the unhappiest countries for each year were in fact the countries with the highest and lowest calculated GDP for that year. We would need the GDP data for each of those years to compare.

2. It would also be interesting to do some predictive modelling to estimate where countries would rank in the next couple of years based on the previous five years. This would take modelling that is beyond the scope of this project.