In [1]:
# load the dataset

import pandas as pd
import numpy as np

# Data 550 Project 2
###  Sophia Bulcock, Ryan Koenig
### 09-02-2021

## Dataset Description

This dataset is composed of six different csv files. Each file represents a "Happiness" report for one year, running from 2015 to 2020. The information provided is from the Gallup World Poll (GWP). For each year, scores are given to each country based on the survey results, and a ranking is generated from these scores.
The individual columns of interest in this data set are:

- Country: The name of the country the scores apply to.
- Year: The year the report was generated.
- Happiness Rank: How that country's score compares to the others for that particular year.
- Happiness Score: The national average of responses to the GWP's Cantril ladder question.

The following columns are a bit of a black box as to how they are generated, but each is based on how much that factor is thought to impact the happiness score. Adding them all together should give you the happiness score.

- Economy (GDP per Capita): How much the country's economy is expected to contribute to the happiness of its populace.
- Family: How much the country's family structure is expected to contribute to the happiness of its populace.
- Health (Life Expectancy): How much the health of the country's individuals is expected to contribute to the happiness of its populace.
- Freedom: How much the country's perceived freedoms are expected to contribute to the happiness of its populace.
- Trust (Government Corruption): How much the country's perceived trust in the government is expected to contribute to the happiness of its populace.
- Generosity: How much the country's perceived generosity is expected to contribute to the happiness of its populace.
- Dystopia + Residual: Dystopia is a comparison to a theoretical worst country so that there is a baseline. It is combined with the residuals, or unexplained components, to create a positive value.
- Social Support: How much the social support available in the country is expected to contribute to the happiness of its populace.
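
The claim that the component columns add up to the happiness score can be checked directly. The sketch below uses two 2015 rows copied from the output shown later in this notebook; the tolerance of 0.001 is an assumption chosen to absorb rounding in the published values, and the names `rows` and `component_cols` are only illustrative.

```python
import numpy as np
import pandas as pd

# Two 2015 rows copied from the combined data frame printed later in this notebook.
rows = pd.DataFrame({
    'Country': ['Switzerland', 'Iceland'],
    'Happiness Score': [7.587, 7.561],
    'Economy (GDP per Capita)': [1.39651, 1.30232],
    'Family': [1.34951, 1.40223],
    'Health (Life Expectancy)': [0.94143, 0.94784],
    'Freedom': [0.66557, 0.62877],
    'Trust (Government Corruption)': [0.41978, 0.14145],
    'Generosity': [0.29678, 0.43630],
    'Dystopia Residual': [2.51738, 2.70201],
})

# Every column except the country name and the score itself is a component.
component_cols = [c for c in rows.columns if c not in ('Country', 'Happiness Score')]
rows['Component Sum'] = rows[component_cols].sum(axis=1)

# Assumed tolerance: 0.001, to allow for rounding in the source files.
rows['Matches Score'] = np.isclose(rows['Component Sum'], rows['Happiness Score'], atol=1e-3)
print(rows[['Country', 'Happiness Score', 'Component Sum', 'Matches Score']])
```

The same check could be run on a full year's data frame; rows where it fails would point to missing components or a different column definition for that year.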

For more information see: https://www.kaggle.com/mathurinache/world-happiness-report




## Exploring the Data Set

Let's take a look at the columns of each csv data set.

In [237]:
csv2015= pd.read_csv('2015.csv')
list2015=list(csv2015.columns.values) 
print(list2015)

['Country', 'Region', 'Happiness Rank', 'Happiness Score', 'Standard Error', 'Economy (GDP per Capita)', 'Family', 'Health (Life Expectancy)', 'Freedom', 'Trust (Government Corruption)', 'Generosity', 'Dystopia Residual']


In [238]:
csv2016=pd.read_csv('2016.csv')
list2016=list(csv2016.columns.values) 
print(list2016)

['Country', 'Region', 'Happiness Rank', 'Happiness Score', 'Lower Confidence Interval', 'Upper Confidence Interval', 'Economy (GDP per Capita)', 'Family', 'Health (Life Expectancy)', 'Freedom', 'Trust (Government Corruption)', 'Generosity', 'Dystopia Residual']


In [239]:
csv2017=pd.read_csv('2017.csv')
list2017=list(csv2017.columns.values) 
print(list2017)

['Country', 'Happiness.Rank', 'Happiness.Score', 'Whisker.high', 'Whisker.low', 'Economy..GDP.per.Capita.', 'Family', 'Health..Life.Expectancy.', 'Freedom', 'Generosity', 'Trust..Government.Corruption.', 'Dystopia.Residual']


In [240]:
csv2018=pd.read_csv('2018.csv')
list2018=list(csv2018.columns.values) 
print(list2018)

['Overall rank', 'Country or region', 'Score', 'GDP per capita', 'Social support', 'Healthy life expectancy', 'Freedom to make life choices', 'Generosity', 'Perceptions of corruption']


In [241]:
csv2019=pd.read_csv('2019.csv')
list2019=list(csv2019.columns.values) 
print(list2019)

['Overall rank', 'Country or region', 'Score', 'GDP per capita', 'Social support', 'Healthy life expectancy', 'Freedom to make life choices', 'Generosity', 'Perceptions of corruption']


In [242]:
csv2020=pd.read_csv('2020.csv')
list2020=list(csv2020.columns.values) 
print(list2020)

['Country name', 'Regional indicator', 'Ladder score', 'Standard error of ladder score', 'upperwhisker', 'lowerwhisker', 'Logged GDP per capita', 'Social support', 'Healthy life expectancy', 'Freedom to make life choices', 'Generosity', 'Perceptions of corruption', 'Ladder score in Dystopia', 'Explained by: Log GDP per capita', 'Explained by: Social support', 'Explained by: Healthy life expectancy', 'Explained by: Freedom to make life choices', 'Explained by: Generosity', 'Explained by: Perceptions of corruption', 'Dystopia + residual']
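
Rather than comparing the printed lists by eye, the column differences can be computed with set operations. This is a minimal sketch using the 2015 and 2016 column lists printed above; the dictionary `columns_by_year` is an illustrative name, and in the notebook it could be filled from each data frame's `.columns`.

```python
# Column lists copied from the 2015 and 2016 outputs above.
cols_2015 = ['Country', 'Region', 'Happiness Rank', 'Happiness Score', 'Standard Error',
             'Economy (GDP per Capita)', 'Family', 'Health (Life Expectancy)', 'Freedom',
             'Trust (Government Corruption)', 'Generosity', 'Dystopia Residual']
cols_2016 = ['Country', 'Region', 'Happiness Rank', 'Happiness Score',
             'Lower Confidence Interval', 'Upper Confidence Interval',
             'Economy (GDP per Capita)', 'Family', 'Health (Life Expectancy)', 'Freedom',
             'Trust (Government Corruption)', 'Generosity', 'Dystopia Residual']

columns_by_year = {2015: cols_2015, 2016: cols_2016}

# Columns present in every year.
shared = set.intersection(*(set(cols) for cols in columns_by_year.values()))

# Columns that only some years have.
for year, cols in columns_by_year.items():
    print(year, 'only:', sorted(set(cols) - shared))
```

Note that this compares names literally, so columns that hold the same information under different spellings (as in 2017 to 2020) would still show up as differences.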


Now let's take a look at the last 5 rows of each data set.

In [243]:
# exploring last 5 rows of each of the data sets

print(csv2015.tail())
print(csv2016.tail())
print(csv2017.tail())
print(csv2018.tail())
print(csv2019.tail())
print(csv2020.tail())

     Country                           Region  Happiness Rank  \
153   Rwanda               Sub-Saharan Africa             154   
154    Benin               Sub-Saharan Africa             155   
155    Syria  Middle East and Northern Africa             156   
156  Burundi               Sub-Saharan Africa             157   
157     Togo               Sub-Saharan Africa             158   

     Happiness Score  Standard Error  Economy (GDP per Capita)   Family  \
153            3.465         0.03464                   0.22208  0.77370   
154            3.340         0.03656                   0.28665  0.35386   
155            3.006         0.05015                   0.66320  0.47489   
156            2.905         0.08658                   0.01530  0.41587   
157            2.839         0.06727                   0.20868  0.13995   

     Health (Life Expectancy)  Freedom  Trust (Government Corruption)  \
153                   0.42864  0.59201                        0.55191   
154         

Number of rows in each csv data set.

In [244]:
# count number of rows in each data frame

print(csv2015.shape[0])
print(csv2016.shape[0])
print(csv2017.shape[0])
print(csv2018.shape[0])
print(csv2019.shape[0])
print(csv2020.shape[0])


158
157
155
156
156
153


Let's verify that all the number columns are numeric.

In [245]:
# exploring data frame types

print(csv2015.dtypes)
print(csv2016.dtypes)
print(csv2017.dtypes)
print(csv2018.dtypes)
print(csv2019.dtypes)
print(csv2020.dtypes)


Country                           object
Region                            object
Happiness Rank                     int64
Happiness Score                  float64
Standard Error                   float64
Economy (GDP per Capita)         float64
Family                           float64
Health (Life Expectancy)         float64
Freedom                          float64
Trust (Government Corruption)    float64
Generosity                       float64
Dystopia Residual                float64
dtype: object
Country                           object
Region                            object
Happiness Rank                     int64
Happiness Score                  float64
Lower Confidence Interval        float64
Upper Confidence Interval        float64
Economy (GDP per Capita)         float64
Family                           float64
Health (Life Expectancy)         float64
Freedom                          float64
Trust (Government Corruption)    float64
Generosity                       float64
Dy
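
Instead of reading through the printed dtypes, the non-numeric columns can be listed programmatically. The sketch below builds a small frame from two 2015 rows shown earlier; `sample` and `non_numeric` are illustrative names, and the same call should work on each yearly data frame.

```python
import pandas as pd

# Two rows taken from the 2015 tail output shown earlier.
sample = pd.DataFrame({
    'Country': ['Rwanda', 'Benin'],
    'Region': ['Sub-Saharan Africa', 'Sub-Saharan Africa'],
    'Happiness Rank': [154, 155],
    'Happiness Score': [3.465, 3.340],
})

# Columns whose dtype is not numeric; only text columns are expected here.
non_numeric = sample.select_dtypes(exclude='number').columns.tolist()
print(non_numeric)
```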

## Initial Observations

1. This data set contains 6 separate CSV files for the happiness reports from 2015-2020.
2. The CSV files do not all contain the same columns. Some have additional columns, and others are missing columns that other years have. This will be taken care of in the wrangling section, where we filter and combine them into 1 large data frame with a 'Year' column added.
3. Some of the original csv files use different column names for the same information. This will also be taken care of in the wrangling section.
4. Each of the 6 individual data frames has approximately the same number of observations/rows (between 153 and 158).

## Research questions

1. What are the top 5 'happiest' countries' scores over the last 6 years, and which elements (columns) contribute the most to these scores? How does the top contributing element compare to that of the 'unhappiest' countries?

2. Which countries have had the largest increase and decrease in ranking over the last 6 years?


## Wrangling

In [246]:
# wrangling into 1 usable data frame

# drop columns that are not needed or not found in all data frames
csv2015=csv2015.drop(['Region','Standard Error'], axis=1)

csv2016=csv2016.drop(['Region'], axis=1)

csv2016=csv2016.drop(['Lower Confidence Interval','Upper Confidence Interval'], axis=1)

csv2017=csv2017.drop(['Whisker.high', 'Whisker.low'],axis=1)

csv2018=csv2018.drop(['Social support'], axis=1)

csv2020=csv2020.drop(['Regional indicator','Standard error of ladder score',
                     'upperwhisker','lowerwhisker','Ladder score in Dystopia',
                     'Explained by: Log GDP per capita','Explained by: Social support',
                     'Explained by: Healthy life expectancy','Explained by: Freedom to make life choices',
                     'Explained by: Generosity','Explained by: Perceptions of corruption'], axis=1)



In [247]:
#renaming columns so all data frames share column names
csv2017=csv2017.rename(columns={'Happiness.Rank': 'Happiness Rank', 
                                   'Happiness.Score': 'Happiness Score',
                                   'Economy..GDP.per.Capita.':'Economy (GDP per Capita)',
                                   'Health..Life.Expectancy.':'Health (Life Expectancy)',
                                   'Trust..Government.Corruption.':'Trust (Government Corruption)',
                                   'Dystopia.Residual':'Dystopia Residual'})

csv2018=csv2018.rename(columns={'Overall rank': 'Happiness Rank','Country or region':'Country',
                                'Score':'Happiness Score','GDP per capita':'Economy (GDP per Capita)',
                                'Healthy life expectancy':'Health (Life Expectancy)',
                                'Freedom to make life choices':'Freedom',
                                'Perceptions of corruption':'Trust (Government Corruption)'})

csv2019=csv2019.rename(columns={'Overall rank': 'Happiness Rank','Country or region':'Country',
                                'Score':'Happiness Score','GDP per capita':'Economy (GDP per Capita)',
                                'Healthy life expectancy':'Health (Life Expectancy)',
                                'Freedom to make life choices':'Freedom',
                                'Perceptions of corruption':'Trust (Government Corruption)'})

csv2020=csv2020.rename(columns={'Country name':'Country', 'Ladder score':'Happiness Score',
                              'Logged GDP per capita':'Economy (GDP per Capita)',
                              'Healthy life expectancy':'Health (Life Expectancy)',
                               'Freedom to make life choices':'Freedom',
                               'Perceptions of corruption':'Trust (Government Corruption)',
                               'Dystopia + residual':'Dystopia Residual'
                              })                               
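
The per-year `rename` calls above could also be driven by one shared alias table. The following is a sketch of that idea, not the method used in this notebook: `COLUMN_ALIASES` and `harmonize` are hypothetical names, the alias table is deliberately incomplete, and the two tiny frames reuse Norway's 2017 and 2018 values from the outputs in this notebook.

```python
import pandas as pd

# Hypothetical alias table: maps each year's spelling to one shared column name.
COLUMN_ALIASES = {
    'Country or region': 'Country',
    'Overall rank': 'Happiness Rank',
    'Happiness.Rank': 'Happiness Rank',
    'Score': 'Happiness Score',
    'Happiness.Score': 'Happiness Score',
}

def harmonize(df, year):
    """Rename known aliases and tag the rows with their report year."""
    out = df.rename(columns=COLUMN_ALIASES)  # rename returns a new frame
    out['Year'] = year
    return out

# Tiny stand-ins for the 2017 and 2018 files, using their original column names.
df_2017 = pd.DataFrame({'Country': ['Norway'], 'Happiness.Rank': [1], 'Happiness.Score': [7.537]})
df_2018 = pd.DataFrame({'Country or region': ['Norway'], 'Overall rank': [2], 'Score': [7.594]})

combined = pd.concat([harmonize(df_2017, 2017), harmonize(df_2018, 2018)], ignore_index=True)
print(combined)
```

Keeping the aliases in one place makes it easier to see which source columns are being treated as equivalent across years.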
                                

In [248]:
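# add happiness rank to csv2020 (the 2020 file has no rank column)
# note: np.arange starts at 0, so these ranks run 0-152 while the other years start at 1;
# this also assumes the rows are already sorted from highest to lowest score

csv2020['Happiness Rank']=np.arange(len(csv2020))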
csv2020

Unnamed: 0,Country,Happiness Score,Economy (GDP per Capita),Social support,Health (Life Expectancy),Freedom,Generosity,Trust (Government Corruption),Dystopia Residual,Happiness Rank
0,Finland,7.8087,10.639267,0.954330,71.900825,0.949172,-0.059482,0.195445,2.762835,0
1,Denmark,7.6456,10.774001,0.955991,72.402504,0.951444,0.066202,0.168489,2.432741,1
2,Switzerland,7.5599,10.979933,0.942847,74.102448,0.921337,0.105911,0.303728,2.350267,2
3,Iceland,7.5045,10.772559,0.974670,73.000000,0.948892,0.246944,0.711710,2.460688,3
4,Norway,7.4880,11.087804,0.952487,73.200783,0.955750,0.134533,0.263218,2.168266,4
...,...,...,...,...,...,...,...,...,...,...
148,Central African Republic,3.4759,6.625160,0.319460,45.200001,0.640881,0.082410,0.891807,2.860198,148
149,Rwanda,3.3123,7.600104,0.540835,61.098846,0.900589,0.055484,0.183541,0.548445,149
150,Zimbabwe,3.2992,7.865712,0.763093,55.617260,0.711458,-0.072064,0.810237,0.841031,150
151,South Sudan,2.8166,7.425360,0.553707,51.000000,0.451314,0.016519,0.763417,1.378751,151
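
The rank above is taken from row order and starts at 0, whereas the other years start at 1. One alternative, sketched here under the assumption that rank should simply follow the score, is to derive a 1-based rank with `Series.rank`. The frame `sample_2020` holds three scores copied from the output above, deliberately shuffled to show that row order does not matter.

```python
import pandas as pd

# Three 2020 scores from the output above, in shuffled order.
sample_2020 = pd.DataFrame({
    'Country': ['Denmark', 'Switzerland', 'Finland'],
    'Happiness Score': [7.6456, 7.5599, 7.8087],
})

# Highest score gets rank 1; method='min' gives tied scores the same (lowest) rank.
sample_2020['Happiness Rank'] = (
    sample_2020['Happiness Score'].rank(ascending=False, method='min').astype(int)
)
print(sample_2020.sort_values('Happiness Rank'))
```

Switching to this in the notebook would shift every 2020 rank by one, so the outputs and summary statistics shown here would change accordingly.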


In [250]:
# make into 1 data set


csv2015['Year']=2015
csv2016['Year']=2016
csv2017['Year']=2017
csv2018['Year']=2018
csv2019['Year']=2019
csv2020['Year']=2020

frames = [csv2015,csv2016,csv2017,csv2018,csv2019,csv2020]

happiness_df = pd.concat(frames)
happiness_df

Unnamed: 0,Country,Happiness Rank,Happiness Score,Economy (GDP per Capita),Family,Health (Life Expectancy),Freedom,Trust (Government Corruption),Generosity,Dystopia Residual,Year,Social support
0,Switzerland,1,7.5870,1.396510,1.34951,0.941430,0.665570,0.419780,0.296780,2.517380,2015,
1,Iceland,2,7.5610,1.302320,1.40223,0.947840,0.628770,0.141450,0.436300,2.702010,2015,
2,Denmark,3,7.5270,1.325480,1.36058,0.874640,0.649380,0.483570,0.341390,2.492040,2015,
3,Norway,4,7.5220,1.459000,1.33095,0.885210,0.669730,0.365030,0.346990,2.465310,2015,
4,Canada,5,7.4270,1.326290,1.32261,0.905630,0.632970,0.329570,0.458110,2.451760,2015,
...,...,...,...,...,...,...,...,...,...,...,...,...
148,Central African Republic,148,3.4759,6.625160,,45.200001,0.640881,0.891807,0.082410,2.860198,2020,0.319460
149,Rwanda,149,3.3123,7.600104,,61.098846,0.900589,0.183541,0.055484,0.548445,2020,0.540835
150,Zimbabwe,150,3.2992,7.865712,,55.617260,0.711458,0.810237,-0.072064,0.841031,2020,0.763093
151,South Sudan,151,2.8166,7.425360,,51.000000,0.451314,0.763417,0.016519,1.378751,2020,0.553707


Summary description of the data frame's numerical columns

In [251]:
happiness_df.describe(include=None)

Unnamed: 0,Happiness Rank,Happiness Score,Economy (GDP per Capita),Family,Health (Life Expectancy),Freedom,Trust (Government Corruption),Generosity,Dystopia Residual,Year,Social support
count,935.0,935.0,935.0,470.0,935.0,935.0,934.0,935.0,623.0,935.0,309.0
mean,78.256684,5.394436,2.287264,0.990347,11.057834,0.472008,0.224981,0.180425,2.063148,2017.485561,1.01071
std,45.028594,1.124935,3.161344,0.318707,23.799414,0.201962,0.254946,0.153977,0.567172,1.70826,0.304093
min,0.0,2.5669,0.0,0.0,0.0,0.0,0.0,-0.300907,0.257241,2015.0,0.0
25%,39.0,4.54,0.695145,0.793,0.508,0.337772,0.061039,0.098152,1.703152,2016.0,0.806092
50%,78.0,5.3535,1.07,1.025665,0.70806,0.46582,0.1108,0.183,2.081786,2017.0,0.921125
75%,117.0,6.1985,1.395705,1.228745,0.89235,0.585785,0.2853,0.262,2.431136,2019.0,1.274
max,158.0,7.8087,11.450681,1.610574,76.804581,0.974998,0.935585,0.838075,3.83772,2020.0,1.624
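
The maxima of about 11.45 for Economy and 76.8 for Health in the summary above are far from their 75% quantiles, which suggests that at least one year is on a different scale. A per-year summary makes this visible. The sketch below uses four rows copied from the outputs in this notebook; `sample` and `per_year_max` are illustrative names.

```python
import pandas as pd

# Two 2015 rows and two 2020 rows copied from the outputs in this notebook.
sample = pd.DataFrame({
    'Country': ['Switzerland', 'Iceland', 'Finland', 'Denmark'],
    'Year': [2015, 2015, 2020, 2020],
    'Economy (GDP per Capita)': [1.39651, 1.30232, 10.639267, 10.774001],
    'Health (Life Expectancy)': [0.94143, 0.94784, 71.900825, 72.402504],
})

# Largest value per year for the two columns that look off in the overall summary.
per_year_max = sample.groupby('Year')[
    ['Economy (GDP per Capita)', 'Health (Life Expectancy)']
].max()
print(per_year_max)
```

In the 2020 file these two columns were renamed from 'Logged GDP per capita' and 'Healthy life expectancy', so they may hold raw measurements rather than contributions to the score. The 'Explained by: ...' columns dropped earlier may be the closer match to the other years; that is an assumption worth checking against the data source.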


In [272]:
# For analysis 1: take the top 5 happiest countries for each year

happiest_c_2015 = csv2015.sort_values(by=['Happiness Rank']).head(5)
happiest_c_2016 = csv2016.sort_values(by=['Happiness Rank']).head(5)
happiest_c_2017 = csv2017.sort_values(by=['Happiness Rank']).head(5)
happiest_c_2018 = csv2018.sort_values(by=['Happiness Rank']).head(5)
happiest_c_2019 = csv2019.sort_values(by=['Happiness Rank']).head(5)

# the 2020 ranks were added above from row order (the 2020 file has no rank column)
happiest_c_2020 = csv2020.sort_values(by=['Happiness Rank']).head(5)

# combine into 1 data frame and round the score for the chart labels

happiest = [happiest_c_2015,happiest_c_2016,happiest_c_2017,happiest_c_2018,happiest_c_2019, happiest_c_2020]
happest_c_df = pd.concat(happiest)
rouded_happiest_df = happest_c_df.round({'Happiness Score': 2})
rouded_happiest_df

Unnamed: 0,Country,Happiness Rank,Happiness Score,Economy (GDP per Capita),Family,Health (Life Expectancy),Freedom,Trust (Government Corruption),Generosity,Dystopia Residual,Year,Social support
0,Switzerland,1,7.59,1.39651,1.34951,0.94143,0.66557,0.41978,0.29678,2.51738,2015,
1,Iceland,2,7.56,1.30232,1.40223,0.94784,0.62877,0.14145,0.4363,2.70201,2015,
2,Denmark,3,7.53,1.32548,1.36058,0.87464,0.64938,0.48357,0.34139,2.49204,2015,
3,Norway,4,7.52,1.459,1.33095,0.88521,0.66973,0.36503,0.34699,2.46531,2015,
4,Canada,5,7.43,1.32629,1.32261,0.90563,0.63297,0.32957,0.45811,2.45176,2015,
0,Denmark,1,7.53,1.44178,1.16374,0.79504,0.57941,0.44453,0.36171,2.73939,2016,
1,Switzerland,2,7.51,1.52733,1.14524,0.86303,0.58557,0.41203,0.28083,2.69463,2016,
2,Iceland,3,7.5,1.42666,1.18326,0.86733,0.56624,0.14975,0.47678,2.83137,2016,
3,Norway,4,7.5,1.57744,1.1269,0.79579,0.59609,0.35776,0.37895,2.66465,2016,
4,Finland,5,7.41,1.40598,1.13464,0.81091,0.57104,0.41004,0.25492,2.82596,2016,
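
The same top-N selection can be done in one step on a combined frame with `groupby(...).head(...)`. This is a sketch rather than the approach used above; `sample` reuses the top three 2015 and 2016 rows from the output, and `top_n` is set to 2 only to keep the example short.

```python
import pandas as pd

# Top three rows for 2015 and 2016, copied from the output above.
sample = pd.DataFrame({
    'Country': ['Switzerland', 'Iceland', 'Denmark', 'Denmark', 'Switzerland', 'Iceland'],
    'Happiness Rank': [1, 2, 3, 1, 2, 3],
    'Year': [2015, 2015, 2015, 2016, 2016, 2016],
})

top_n = 2
# Sort within each year by rank, then keep the first top_n rows of every year.
top_by_year = (
    sample.sort_values(['Year', 'Happiness Rank'])
          .groupby('Year')
          .head(top_n)
          .reset_index(drop=True)
)
print(top_by_year)
```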


In [274]:
# For analysis 2 - keep the 3 happiest countries that appear in the top 5 each year


top3_countries = ['Denmark', 'Iceland', 'Norway']
happiest_c_2015_2 = happiness_df[happiness_df['Country'].isin(top3_countries)]
# restrict to 2015-2019 (2020 is left out of this analysis)
happiest_c_2015_2_2 = happiest_c_2015_2[happiest_c_2015_2['Year'].between(2015, 2019)]
happiest_c_2015_2_2


Unnamed: 0,Country,Happiness Rank,Happiness Score,Economy (GDP per Capita),Family,Health (Life Expectancy),Freedom,Trust (Government Corruption),Generosity,Dystopia Residual,Year,Social support
1,Iceland,2,7.561,1.30232,1.40223,0.94784,0.62877,0.14145,0.4363,2.70201,2015,
2,Denmark,3,7.527,1.32548,1.36058,0.87464,0.64938,0.48357,0.34139,2.49204,2015,
3,Norway,4,7.522,1.459,1.33095,0.88521,0.66973,0.36503,0.34699,2.46531,2015,
0,Denmark,1,7.526,1.44178,1.16374,0.79504,0.57941,0.44453,0.36171,2.73939,2016,
2,Iceland,3,7.501,1.42666,1.18326,0.86733,0.56624,0.14975,0.47678,2.83137,2016,
3,Norway,4,7.498,1.57744,1.1269,0.79579,0.59609,0.35776,0.37895,2.66465,2016,
0,Norway,1,7.537,1.616463,1.533524,0.796667,0.635423,0.315964,0.362012,2.277027,2017,
1,Denmark,2,7.522,1.482383,1.551122,0.792566,0.626007,0.40077,0.35528,2.313707,2017,
2,Iceland,3,7.504,1.480633,1.610574,0.833552,0.627163,0.153527,0.47554,2.322715,2017,
1,Norway,2,7.594,1.456,,0.861,0.686,0.34,0.286,,2018,


In [275]:
# for analysis 3

# take the bottom 3 (unhappiest) countries for each year

unhappiest_c_2015 = csv2015.sort_values(by=['Happiness Rank']).tail(3)
unhappiest_c_2016 = csv2016.sort_values(by=['Happiness Rank']).tail(3)
unhappiest_c_2017 = csv2017.sort_values(by=['Happiness Rank']).tail(3)
unhappiest_c_2018 = csv2018.sort_values(by=['Happiness Rank']).tail(3)
unhappiest_c_2019 = csv2019.sort_values(by=['Happiness Rank']).tail(3)

# the 2020 ranks were added above from row order (the 2020 file has no rank column)
unhappiest_c_2020 = csv2020.sort_values(by=['Happiness Rank']).tail(3)

# combine into 1 data frame

unhappiest = [unhappiest_c_2015,unhappiest_c_2016,unhappiest_c_2017,unhappiest_c_2018,unhappiest_c_2019, unhappiest_c_2020]
unhappiest_c_df = pd.concat(unhappiest)
unhappiest_c_df

Unnamed: 0,Country,Happiness Rank,Happiness Score,Economy (GDP per Capita),Family,Health (Life Expectancy),Freedom,Trust (Government Corruption),Generosity,Dystopia Residual,Year,Social support
155,Syria,156,3.006,0.6632,0.47489,0.72193,0.15684,0.18906,0.47179,0.32858,2015,
156,Burundi,157,2.905,0.0153,0.41587,0.22396,0.1185,0.10062,0.19727,1.83302,2015,
157,Togo,158,2.839,0.20868,0.13995,0.28443,0.36453,0.10731,0.16681,1.56726,2015,
154,Togo,155,3.303,0.28123,0.0,0.24811,0.34678,0.11587,0.17517,2.1354,2016,
155,Syria,156,3.069,0.74719,0.14866,0.62994,0.06912,0.17233,0.48397,0.81789,2016,
156,Burundi,157,2.905,0.06831,0.23442,0.15747,0.0432,0.09419,0.2029,2.10404,2016,
152,Tanzania,153,3.349,0.511136,1.04199,0.364509,0.390018,0.066035,0.354256,0.62113,2017,
153,Burundi,154,2.905,0.091623,0.629794,0.151611,0.059901,0.084148,0.204435,1.683024,2017,
154,Central African Republic,155,2.693,0.0,0.0,0.018773,0.270842,0.056565,0.280876,2.066005,2017,
153,South Sudan,154,3.254,0.337,,0.177,0.112,0.106,0.224,,2018,


In [282]:

# keep only the countries that recur at the bottom of the rankings

bottom_countries = ['Central African Republic', 'Togo', 'Burundi', 'South Sudan']
unhappiest_c_2015_2 = unhappiest_c_df[unhappiest_c_df['Country'].isin(bottom_countries)]
# restrict to 2015-2019
unhappiest_c_2015_2 = unhappiest_c_2015_2[unhappiest_c_2015_2['Year'].between(2015, 2019)]
unhappiest_c_2015_2['Year'].astype(int)



156    2015
157    2015
154    2016
156    2016
153    2017
154    2017
153    2018
154    2018
155    2018
154    2019
155    2019
Name: Year, dtype: int64

In [277]:
# Join with happiest for comparison

frames3=[happiest_c_2015_2_2,unhappiest_c_2015_2]
result = pd.concat(frames3)




## Data analysis & visualizations

In [278]:
import altair as alt

# top 5 happiest countries by year




chart=(alt.Chart(rouded_happiest_df).mark_bar().encode(
     alt.X('Happiness Score', scale=alt.Scale(domain=(7.3, 7.83))),
     alt.Y('Country'), color=alt.Color('Country', legend=None))).properties(
    
    height=200, width=200)

text = chart.mark_text(
    align='left',
    baseline='middle',
    dx=3  # Nudges text to right so it doesn't appear on top of the bar
).encode(
    text='Happiness Score:Q'
)

(chart + text).facet(
    facet='Year:N', columns=2).resolve_axis(
    x='independent',
    y='independent',
).configure_header(labelFontSize=25).properties(
    title='Top 5 Happiest Countries by Year '
)

Interesting! From 2015 to 2020, the happiest self-ranked countries were Canada, Denmark, Finland, Iceland, the Netherlands, Norway and Switzerland! Denmark, Iceland and Norway appear in the top 5 rankings for every year. Let's investigate what contributes the most to their happiness scores, and how these elements change over time.
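
Which countries appear in the top 5 every year can also be computed rather than read off the chart. The sketch below counts the distinct years in which each country appears and keeps those present in all of them. It uses only the 2015 and 2016 top-5 lists shown in the wrangling output, so it is an illustration of the technique, not a re-run of the full six-year result; `top5` and `always_top5` are illustrative names.

```python
import pandas as pd

# Top-5 countries for 2015 and 2016, copied from the wrangling output.
top5 = pd.DataFrame({
    'Country': ['Switzerland', 'Iceland', 'Denmark', 'Norway', 'Canada',
                'Denmark', 'Switzerland', 'Iceland', 'Norway', 'Finland'],
    'Year': [2015] * 5 + [2016] * 5,
})

n_years = top5['Year'].nunique()

# Number of distinct years in which each country made the top 5.
appearances = top5.groupby('Country')['Year'].nunique()

# Countries that made the top 5 in every year covered by the frame.
always_top5 = sorted(appearances[appearances == n_years].index)
print(always_top5)
```

With only two years included, Switzerland also qualifies; running the same steps on all six years would narrow the list further.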

In [279]:
alt.Chart(happiest_c_2015_2_2).mark_line().transform_fold(
    fold=['Health (Life Expectancy)', 'Freedom', 'Economy (GDP per Capita)','Trust (Government Corruption)','Generosity'], 
    as_=['variable', 'value']
).encode(
    x='Year:O',y=alt.Y('max(value):Q',axis=alt.Axis(title="Amount of Happiness Score")),
    color='variable:N'
).properties(
    width=400,
    height=300).facet(
    facet='Country', columns=1).resolve_axis(
    x='independent',
    y='independent',
).configure_header(labelFontSize=25).properties(
    title="Top Contributors to Happiest Countries' Scores"
)



As the three charts above show, a country's Economy (GDP per capita) makes up the largest share of its happiness score among the plotted factors. Next come Health (life expectancy), Freedom, Generosity and lastly Trust. All three of these happy countries follow this pattern, except Norway, where Trust and Generosity swap places over the years. Reading the contributor weights off the charts, for Norway, Iceland and Denmark, Economy (GDP per capita) accounts for between 1.3 and 1.6 units of the happiness score, Health (life expectancy) for between 0.8 and 1, Freedom for between 0.6 and 0.7, and Trust and Generosity for between 0.1 and 0.5. Since GDP appears to be the biggest contributor to a country's happiness score, let's investigate it for the most 'unhappy' countries.
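
The ordering of the contributors can be confirmed numerically by averaging each factor and sorting. This sketch uses the 2015 rows for Iceland, Denmark and Norway from the wrangling output and the same five factors plotted in the chart; `top3_2015` and `factor_means` are illustrative names, and a single year is used only to keep the example small.

```python
import pandas as pd

# 2015 rows for the three countries, copied from the wrangling output.
top3_2015 = pd.DataFrame({
    'Country': ['Iceland', 'Denmark', 'Norway'],
    'Economy (GDP per Capita)': [1.30232, 1.32548, 1.45900],
    'Health (Life Expectancy)': [0.94784, 0.87464, 0.88521],
    'Freedom': [0.62877, 0.64938, 0.66973],
    'Trust (Government Corruption)': [0.14145, 0.48357, 0.36503],
    'Generosity': [0.43630, 0.34139, 0.34699],
})

# Average each factor across the three countries, largest first.
factor_means = top3_2015.drop(columns='Country').mean().sort_values(ascending=False)
print(factor_means.round(3))
```

Columns that are missing for some years, such as Family and Dystopia Residual, are left out here just as they are in the chart.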

In [283]:
alt.Chart(result).mark_bar().encode(
    x=alt.X('Country',sort='y'),
    y=alt.Y("Economy (GDP per Capita)",scale=alt.Scale(domain=(0, 1.6))),
    
    color=alt.Color('Country', legend=None)
    ).facet(
    facet='Year').resolve_axis(
    x='independent',
    y='independent',
).configure_header(labelFontSize=16).properties(
    title='GDP Comparison of Unhappiest Ranked Countries to Happiest Ranked Countries'
)


This chart shows the stark contrast between how much GDP contributes to a country's happiness score for the happiest and least happy countries. Remember, a country's GDP score is derived from its self-reported happiness score and represents how much the researchers think GDP contributes to that score. This visualization makes sense if we look back at the wrangling section and see the low overall happiness scores of the least happy countries shown here. Since the unhappy countries' overall happiness scores are lower, it makes sense that the GDP contribution to those scores would also be much lower.
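
The gap shown in the chart can be summarized with a group mean. The sketch below labels 2015 rows from the wrangling output as 'happiest' or 'unhappiest' and compares the average GDP contribution; the `Group` column and the names `gdp_2015`, `group_means` and `ratio` are introduced here for illustration only.

```python
import pandas as pd

# 2015 GDP contributions copied from the wrangling output.
gdp_2015 = pd.DataFrame({
    'Country': ['Iceland', 'Denmark', 'Norway', 'Burundi', 'Togo'],
    'Group': ['happiest'] * 3 + ['unhappiest'] * 2,
    'Economy (GDP per Capita)': [1.30232, 1.32548, 1.45900, 0.01530, 0.20868],
})

# Mean GDP contribution per group, and how many times larger the happiest group's is.
group_means = gdp_2015.groupby('Group')['Economy (GDP per Capita)'].mean()
ratio = group_means['happiest'] / group_means['unhappiest']
print(group_means.round(3))
print(round(ratio, 1))
```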

In [23]:
# the 10 countries whose rank varied the most across the years
happiness_df.groupby("Country")["Happiness Rank"].var().sort_values(ascending=False).head(10)


# select four countries to compare and drop 2020
# (the variable is named Venezuela but holds all four countries)
Venezuela = happiness_df[happiness_df["Country"].isin(["Venezuela", "Benin", "Ivory Coast", "Algeria"])]
Venezuela = Venezuela[Venezuela["Year"]!=2020]
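
Variance shows how much a rank moved but not in which direction. Since the second research question asks about increases and decreases, one option is to compare each country's rank in the first and last year. The sketch below is one way to do that, using 2015 and 2016 ranks copied from the outputs in this notebook; `ranks`, `wide` and `rank_change` are illustrative names.

```python
import pandas as pd

# 2015 and 2016 ranks copied from the outputs in this notebook.
ranks = pd.DataFrame({
    'Country': ['Togo', 'Burundi', 'Denmark', 'Iceland'] * 2,
    'Year': [2015] * 4 + [2016] * 4,
    'Happiness Rank': [158, 157, 3, 2, 155, 157, 1, 3],
})

# One row per country, one column per year.
wide = ranks.pivot(index='Country', columns='Year', values='Happiness Rank')

# Negative change means the country moved up the ranking (a smaller rank number).
first_year, last_year = wide.columns.min(), wide.columns.max()
rank_change = (wide[last_year] - wide[first_year]).sort_values()
print(rank_change)
```

Countries missing from either the first or the last year would come out as NaN and would need to be handled separately.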

In [24]:
Vpoints = alt.Chart(Venezuela).mark_line().encode(
    x=alt.X('Year:Q', 
     scale=alt.Scale(domain=[2015,2019])),
    y=alt.Y(alt.repeat(),
     type='quantitative',
     scale=alt.Scale()),
     color='Country',
     shape=alt.Shape('Country', legend=None)
).repeat(repeat=["Happiness Rank",
                "Happiness Score",
                 "Economy (GDP per Capita)",
                 "Health (Life Expectancy)",
                 "Freedom",
                 "Trust (Government Corruption)",
                 "Generosity"],
     columns=2
).properties(title="Countries Scores vs. Factors")

Vpoints

## Summary



This analysis has taken us through a couple of different aspects of this data set. We have seen that the countries that show up in the top 5 'happiest' countries from 2015-2020 are Canada, Denmark, Finland, Iceland, the Netherlands, Norway and Switzerland. Denmark, Iceland and Norway appear in the top 5 every year. When we investigated which column/element (among those present in every year) contributes the most to the happiness score, it was found to be a country's GDP. GDP makes up more of the happiness score than the columns Health, Freedom, Trust and Generosity. After wrangling the data frame and plotting the GDP score for the happiest and unhappiest countries (Central African Republic, Togo, Burundi and South Sudan), we can see that the unhappiest countries' GDP scores are much lower than those of the happiest countries, but this is to be expected since their overall happiness scores were lower.




## Conclusions

Since this data set is based on the countries' own surveys to determine the happiness score, it is a bit difficult to draw many deep and meaningful conclusions from this analysis. The resulting columns are a black box as to how they are generated, but they are thought to reflect how much each element impacts the happiness score. This data set was also not complete, and many columns were missing from certain years, making analysis a bit difficult. This is why the 'dystopia residual' column was not used in Figure 2, as it was missing for multiple years.

We can, however, conclude the following. The countries that rank themselves as the most happy are mainly Scandinavian and Northern European countries. These countries' GDP scores make up a large portion of their happiness scores, and both the happiness scores and the associated GDP scores are lower for the unhappiest ranked countries like Central African Republic, Togo, Burundi and South Sudan.





## Research Questions

1. I would be interested in seeing the real, historical GDP data for the 'happiest' and 'unhappiest' countries analyzed in this study. I would want to know whether the top 5 happiest and unhappiest countries for each year were in fact the countries with the highest and lowest calculated GDP, respectively, for that year. We would need the GDP data for each of those years to compare.