**Context**

The World Happiness Report is a landmark survey of the state of global happiness. The first report was published in 2012, the second in 2013, the third in 2015, and the fourth in the 2016 Update. The World Happiness Report 2017, which ranks 155 countries by their happiness levels, was released at the United Nations at an event celebrating International Day of Happiness on March 20th. The report continues to gain global recognition as governments, organizations and civil society increasingly use happiness indicators to inform their policy-making decisions. Leading experts across fields – economics, psychology, survey analysis, national statistics, health, public policy and more – describe how measurements of well-being can be used effectively to assess the progress of nations. The reports review the state of happiness in the world today and show how the new science of happiness explains personal and national variations in happiness.

**Content**

The happiness scores and rankings use data from the Gallup World Poll. The scores are based on answers to the main life evaluation question asked in the poll. This question, known as the Cantril ladder, asks respondents to think of a ladder with the best possible life for them being a 10 and the worst possible life being a 0 and to rate their own current lives on that scale. The scores are from nationally representative samples for the years 2013-2016 and use the Gallup weights to make the estimates representative. The columns following the happiness score estimate the extent to which each of six factors – economic production, social support, life expectancy, freedom, absence of corruption, and generosity – contribute to making life evaluations higher in each country than they are in Dystopia, a hypothetical country that has values equal to the world’s lowest national averages for each of the six factors. They have no impact on the total score reported for each country, but they do explain why some countries rank higher than others.

**Inspiration**

What countries or regions rank the highest in overall happiness and each of the six factors contributing to happiness? How did country ranks or scores change between the 2015 and 2016 as well as the 2016 and 2017 reports? Did any country experience a significant increase or decrease in happiness?

*What is Dystopia?*

Dystopia is an imaginary country that has the world’s least-happy people. The purpose in establishing Dystopia is to have a benchmark against which all countries can be favorably compared (no country performs more poorly than Dystopia) in terms of each of the six key variables, thus allowing each sub-bar to be of positive width. The lowest scores observed for the six key variables, therefore, characterize Dystopia. Since life would be very unpleasant in a country with the world’s lowest incomes, lowest life expectancy, lowest generosity, most corruption, least freedom and least social support, it is referred to as “Dystopia,” in contrast to Utopia.

*What are the residuals?*

The residuals, or unexplained components, differ for each country, reflecting the extent to which the six variables either over- or under-explain average 2014-2016 life evaluations. These residuals have an average value of approximately zero over the whole set of countries. Figure 2.2 shows the average residual for each country when the equation in Table 2.1 is applied to average 2014- 2016 data for the six variables in that country. We combine these residuals with the estimate for life evaluations in Dystopia so that the combined bar will always have positive values. As can be seen in Figure 2.2, although some life evaluation residuals are quite large, occasionally exceeding one point on the scale from 0 to 10, they are always much smaller than the calculated value in Dystopia, where the average life is rated at 1.85 on the 0 to 10 scale.

*What do the columns following the Happiness Score (such as Family and Generosity) describe?*

The columns GDP per Capita, Family, Life Expectancy, Freedom, Generosity, and Trust (Government Corruption) describe the extent to which each factor contributes to the happiness evaluation in each country.
The Dystopia Residual column is the Dystopia happiness score (1.85) plus the residual, that is, the unexplained component for each country described in the previous answer.

If you add all these factors up, you get the happiness score, so it may be unreliable to use them as predictors in a model of the Happiness Score.
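As a rough illustration of that additive relationship, the sketch below rebuilds a score from its parts. The numbers are made up for illustration and are not taken from any row of the report; only the 1.85 Dystopia value and the 2015 column names come from the text above.

```python
# Hypothetical contributions for one country (illustrative values only).
contributions = {
    "Economy (GDP per Capita)": 1.30,
    "Family": 1.25,
    "Health (Life Expectancy)": 0.90,
    "Freedom": 0.60,
    "Trust (Government Corruption)": 0.30,
    "Generosity": 0.35,
}

DYSTOPIA_SCORE = 1.85  # life evaluation in Dystopia, as quoted above
residual = 0.45        # hypothetical unexplained component for this country

# The "Dystopia Residual" column bundles the two together.
dystopia_residual = DYSTOPIA_SCORE + residual

# Summing every column reproduces the happiness score, which is why
# modelling the score from these same columns is circular.
happiness_score = sum(contributions.values()) + dystopia_residual
print(round(happiness_score, 2))
```

Because the score is, by construction, the sum of these columns, a model fitted on them would only recover that identity.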

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Any results you write to the current directory are saved as output.

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
df3=pd.read_csv("/kaggle/input/world-happiness/2017.csv")
df5=pd.read_csv('/kaggle/input/world-happiness/2019.csv')
df2=pd.read_csv('/kaggle/input/world-happiness/2016.csv')
df1=pd.read_csv('/kaggle/input/world-happiness/2015.csv')
df4=pd.read_csv('/kaggle/input/world-happiness/2018.csv')

Part 1: I will start with the earliest dataset. As I come from Europe, I want to check whether people in Northern Europe are really the "happiest". I will filter the data and present the scores on a geographic map.  
I will also look into "Freedom" in this first dataset, as it seems to be an important factor. Even though regression isn't appropriate here, I will use a heatmap to show that columns with higher values have a stronger correlation with the Happiness Score.

In [None]:
df1.head(10)

In [None]:
df1.columns

In [None]:
df1.info()

In [None]:
df1['Country'].nunique()

In [None]:
df1['Region'].unique()

In [None]:
w_europe=df1[df1['Region']=='Western Europe']

In [None]:
w_europe.info()

In [None]:
w_europe.plot(x='Country', y=['Happiness Score','Economy (GDP per Capita)'], kind="bar",figsize=(10,8))
plt.xticks(rotation='vertical')

In [None]:
from plotly.offline import download_plotlyjs,init_notebook_mode,plot,iplot
init_notebook_mode(connected=True)
import plotly.graph_objs as go

In [None]:
data=dict(type='choropleth',
          locations=w_europe['Country'],
          locationmode='country names',
          colorscale="Jet",
          text=w_europe['Country'],
          z=w_europe['Happiness Score'],
          colorbar={'title':'Happiness score'})

layout=dict(title='2015 Western Europe Happiness Score',geo={'scope':'europe'})
choromap=go.Figure(data=[data],layout=layout)
iplot(choromap)

As the saying goes, "the further south, the sadder": it seems that the happiest countries in Western Europe are the northern ones and the least happy are the southern ones.

I will now look at the Happiness Score and the Freedom score for the entire world, presented on maps for the year 2015.

In [None]:
data=dict(type='choropleth',
          locations=df1['Country'],
          locationmode='country names',
          colorscale="blues",
          text=df1['Country'],
          z=df1['Happiness Score'],
          colorbar={'title':'Happiness score'})

layout=dict(title='2015 World Happiness Score',geo=dict(showframe=True,projection={'type':'natural earth'}))
choromap=go.Figure(data=[data],layout=layout)
iplot(choromap)


data=dict(type='choropleth',
          locations=df1['Country'],
          locationmode='country names',
          colorscale="blues",
          text=df1['Country'],
          z=df1['Freedom'],
          colorbar={'title':'Freedom score'})

layout=dict(title='2015 World Freedom Score',geo=dict(showframe=True,projection={'type':'natural earth'}))
choromap=go.Figure(data=[data],layout=layout)
iplot(choromap)

I cannot compare the Happiness and Freedom scores directly, as they are calculated on different scales. I could scale them, but I can also compare the results visually. That is why I used the same palette for both maps: countries with a high Happiness Score tend to have a high Freedom score and vice versa. This suggests that the Freedom score is correlated with the Happiness Score. Let's check the heatmap.

In [None]:
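plt.figure(figsize=(12,8))
# numeric_only=True skips text columns such as Country and Region (needed on pandas >= 2.0)
sns.heatmap(df1.corr(numeric_only=True),cmap='coolwarm',annot=True)
# Widen the y-limits by half a cell so the top and bottom rows are not cut off
a,b=plt.ylim()
a+=0.5
b-=0.5
plt.ylim(a,b)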

This doesn't tell us a lot. As the scores increase across the columns from right to left, it is to be expected that columns further to the left will have a higher correlation with the Happiness Score. Also, the Happiness Score is the sum of all these scores, so the correlation coefficient is not a very good method here.

Part 2: I will look into the latest data, for the year 2019. I will present it on a map and compare it with the first one.




In [None]:
df5.info()

In [None]:
df5['Country or region'].unique()

In [None]:
df5=df5.replace('Congo (Brazzaville)','Republic of the Congo')

In [None]:
df5=df5.replace('Congo (Kinshasa)','Democratic Republic of the Congo')

In [None]:
data=dict(type='choropleth',
          locations=df1['Country'],
          locationmode='country names',
          colorscale="Jet",
          text=df1['Country'],
          z=df1['Happiness Score'],
          colorbar={'title':'Happiness score'})

layout=dict(title='2015 World Happiness Score',geo=dict(showframe=True,projection={'type':'natural earth'}))
choromap=go.Figure(data=[data],layout=layout)
iplot(choromap)

data=dict(type='choropleth',
          locations=df5['Country or region'],
          locationmode='country names',
          colorscale="Jet",
          text=df5['Country or region'],
          z=df5['Score'],
          colorbar={'title':'Happiness score'})

layout=dict(title='2019 World Happiness Score',geo=dict(showframe=True,projection={'type':'natural earth'}))
choromap=go.Figure(data=[data],layout=layout)
iplot(choromap)



It is obvious from the maps above that scores in some countries decreased or increased in 2019 (Russia, South and Central America, Africa...). However, it is hard to see from the maps by how much the scores changed compared to 2015.
I will make a new dataset with only two columns: the country name and the % difference in Happiness Score (compared to 2015). To do that, I need to make sure the countries have the same names, so it is easier to use the ISO-3 standard in this case.

In [None]:
import pycountry

input_countries = df1['Country'].unique()

countries = {}
for country in pycountry.countries:
    countries[country.name] = country.alpha_3

df1_codes = [countries.get(country, 'Unknown code') for country in input_countries]

print(df1_codes)
    

In [None]:
df1['Country'].unique()

Let's replace Unknown codes!

In [None]:
df1['Country code']=['CHE', 'ISL', 'DNK', 'NOR', 'CAN', 'FIN', 'NLD', 'SWE', 'NZL', 'AUS', 'ISR', 'CRI',
           'AUT', 'MEX', 'USA', 'BRA', 'LUX', 'IRL', 'BEL', 'ARE', 'GBR', 'OMN', 'VEN', 'SGP',
           'PAN', 'DEU', 'CHL', 'QAT', 'FRA', 'ARG', 'CZE', 'URY', 'COL', 'THA', 'SAU', 'ESP',
           'MLT', 'TWN', 'KWT', 'SUR', 'TTO', 'SLV', 'GTM', 'UZB', 'SVK', 'JPN', 'KOR','ECU',
           'BHR', 'ITA', 'BOL', 'MDA', 'PRY', 'KAZ', 'SVN', 'LTU', 'NIC', 'PER', 'BLR', 'POL', 
           'MYS', 'HRV', 'LBY', 'RUS', 'JAM', 'NaN','CYP', 'DZA', 'KOS', 'TKM', 'MUS', 'HKG', 'EST',
           'IDN', 'VNM', 'TUR', 'KGZ', 'NGA', 'BTN', 'AZE', 'PAK', 'JOR', 'MNE', 'CHN', 'ZMB', 
           'ROU', 'SRB', 'PRT', 'LVA', 'PHL', 'SOM', 'MAR', 'MKD', 'MOZ', 'ALB', 'BIH', 'LSO',
           'DOM', 'LAO', 'MNG', 'SWZ', 'GRC', 'LBN', 'HUN', 'HND', 'TJK', 'TUN', 'PSE', 'BGD',
           'IRN', 'UKR', 'IRQ', 'ZAF', 'GHA', 'ZWE', 'LBR', 'IND', 'SDN', 'HTI', 'COD', 'NPL',
           'ETH', 'SLE', 'MRT', 'KEN', 'DJI', 'ARM', 'BWA', 'MMR', 'GEO', 'MWI', 'LKA', 'CMR',
           'BGR', 'EGY', 'YEM', 'AGO', 'MLI', 'COG', 'COM', 'UGA', 'SEN', 'GAB', 'NER', 'KHM',
           'TZA', 'MDG', 'CAF', 'TCD', 'GIN', 'CIV', 'BFA', 'AFG', 'RWA', 'BEN', 
           'SYR', 'BDI', 'TGO']
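A positional list like the one above depends on the row order never changing. As a hedged alternative, the sketch below resolves codes by name and uses a small override table for names that differ from the ISO short names. `iso_by_name`, `overrides` and `to_iso3` are hypothetical names, and only a few entries are shown.

```python
# Stand-in for the name -> ISO-3 dictionary built from pycountry above;
# only a handful of entries are included for illustration.
iso_by_name = {
    "Switzerland": "CHE",
    "Iceland": "ISL",
    "Russian Federation": "RUS",
}

# Report names that do not match the ISO short names directly.
overrides = {"Russia": "RUS", "South Korea": "KOR"}

def to_iso3(name):
    """Return the ISO-3 code for a report country name, or None if unknown."""
    return overrides.get(name) or iso_by_name.get(name)

names = ["Switzerland", "Russia", "North Cyprus"]
codes = [to_iso3(n) for n in names]
print(codes)
```

With a dataframe, the same function could be applied with `df1['Country'].map(to_iso3)`, so each code stays attached to its country even if rows are reordered or dropped.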

In [None]:
df1=df1[['Country','Country code', 'Region', 'Happiness Rank', 'Happiness Score',
       'Standard Error', 'Economy (GDP per Capita)', 'Family',
       'Health (Life Expectancy)', 'Freedom', 'Trust (Government Corruption)',
       'Generosity', 'Dystopia Residual']]

In [None]:
df1.head()

In [None]:
df5['Country or region'].unique()

In [None]:
df1['Country'].nunique()

In order to investigate the percentage change in Happiness Score in 2019 compared to 2015, I need to check the lists of countries and rename similar ones. Also, for the sake of comparison, I will omit countries that are not in both tables.

In [None]:
n=0
for country in df1['Country'].unique() :
    if country not in df5['Country or region'].unique():
        n+=1
        print( country,False)
print(n)

In [None]:
n=0
for country in df5['Country or region'].unique() :
    if country not in df1['Country'].unique():
        n+=1
        print( country,False)
print(n)
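The two loops above can also be written as set differences. This is a minimal sketch on made-up name lists; in the notebook the inputs would be `df1['Country']` and `df5['Country or region']`.

```python
# Illustrative name lists standing in for the two country columns.
names_2015 = ["Oman", "Macedonia", "Finland", "Somaliland region"]
names_2019 = ["North Macedonia", "Finland", "Gambia", "Somalia"]

# Names that appear in one year only, sorted for stable display.
only_2015 = sorted(set(names_2015) - set(names_2019))
only_2019 = sorted(set(names_2019) - set(names_2015))

print(only_2015, len(only_2015))
print(only_2019, len(only_2019))
```

Set membership also avoids recomputing `.unique()` on every loop iteration.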

From here I see that some countries have slightly different names. I will rename them so that both dataframes match and omit those that are not in both dataframes.

In [None]:
#dropping countries that do not belong to both datasets
df1_1=df1.drop(df1[df1['Country']=='Oman'].index)

In [None]:
df1_1=df1_1.drop(df1_1[df1_1['Country']=='Suriname'].index)

In [None]:
df1_1=df1_1.drop(df1_1[df1_1['Country']=='Djibouti'].index)

In [None]:
df1_1=df1_1.drop(df1_1[df1_1['Country']=='Angola'].index)

In [None]:
len(df1_1)
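The four separate drops above can be collapsed into one filter. The sketch below assumes a toy frame with made-up scores; `toy`, `to_drop` and `kept` are hypothetical names.

```python
import pandas as pd

# Toy frame standing in for df1; the scores are invented.
toy = pd.DataFrame({
    "Country": ["Oman", "Finland", "Suriname", "Norway"],
    "Happiness Score": [6.9, 7.4, 6.3, 7.5],
})

# Countries to leave out because they are missing from the other year.
to_drop = ["Oman", "Suriname", "Djibouti", "Angola"]

# Keep the rows whose country is NOT in the list; names absent from the
# frame (Djibouti and Angola here) are simply ignored.
kept = toy[~toy["Country"].isin(to_drop)]
print(kept["Country"].tolist())
```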

In [None]:
df1_1['Country code']=['CHE', 'ISL', 'DNK', 'NOR', 'CAN', 'FIN', 'NLD', 'SWE', 'NZL', 'AUS', 'ISR', 'CRI',
           'AUT', 'MEX', 'USA', 'BRA', 'LUX', 'IRL', 'BEL', 'ARE', 'GBR', 'VEN', 'SGP',
           'PAN', 'DEU', 'CHL', 'QAT', 'FRA', 'ARG', 'CZE', 'URY', 'COL', 'THA', 'SAU', 'ESP',
           'MLT', 'TWN', 'KWT', 'TTO', 'SLV', 'GTM', 'UZB', 'SVK', 'JPN', 'KOR','ECU',
           'BHR', 'ITA', 'BOL', 'MDA', 'PRY', 'KAZ', 'SVN', 'LTU', 'NIC', 'PER', 'BLR', 'POL', 
           'MYS', 'HRV', 'LBY', 'RUS', 'JAM', 'NaN','CYP', 'DZA', 'KOS', 'TKM', 'MUS', 'HKG', 'EST',
           'IDN', 'VNM', 'TUR', 'KGZ', 'NGA', 'BTN', 'AZE', 'PAK', 'JOR', 'MNE', 'CHN', 'ZMB', 
           'ROU', 'SRB', 'PRT', 'LVA', 'PHL', 'SOM', 'MAR', 'MKD', 'MOZ', 'ALB', 'BIH', 'LSO',
           'DOM', 'LAO', 'MNG', 'SWZ', 'GRC', 'LBN', 'HUN', 'HND', 'TJK', 'TUN', 'PSE', 'BGD',
           'IRN', 'UKR', 'IRQ', 'ZAF', 'GHA', 'ZWE', 'LBR', 'IND', 'SDN', 'HTI', 'COD', 'NPL',
           'ETH', 'SLE', 'MRT', 'KEN', 'ARM', 'BWA', 'MMR', 'GEO', 'MWI', 'LKA', 'CMR',
           'BGR', 'EGY', 'YEM', 'MLI', 'COG', 'COM', 'UGA', 'SEN', 'GAB', 'NER', 'KHM',
           'TZA', 'MDG', 'CAF', 'TCD', 'GIN', 'CIV', 'BFA', 'AFG', 'RWA', 'BEN', 
           'SYR', 'BDI', 'TGO']

In [None]:
df5_1=df5.drop(df5[df5['Country or region']=='Namibia'].index)

In [None]:
df5_1=df5_1.drop(df5_1[df5_1['Country or region']=='Gambia'].index)

In [None]:
df1_1['Country'].nunique()

In [None]:
df5_1['Country or region'].nunique()

In [None]:
n=0
for country in df1_1['Country'].unique() :
    if country not in df5_1['Country or region'].unique():
        n+=1
        print( country,False)
print(n)

In [None]:
n=0
for country in df5_1['Country or region'].unique() :
    if country not in df1_1['Country'].unique():
        n+=1
        print( country,False)
print(n)

In [None]:
#renaming similar country names
df5_1=df5_1.replace('Trinidad & Tobago','Trinidad and Tobago')

In [None]:
df5_1=df5_1.replace(['Northern Cyprus','North Macedonia'],['North Cyprus','Macedonia'])

In [None]:
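# NOTE: South Sudan and Sudan are different countries; this rename pairs the 2019 South Sudan row with the 2015 Sudan row
df5_1=df5_1.replace('South Sudan','Sudan')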

In [None]:
df1_1=df1_1.replace(['Congo (Brazzaville)','Somaliland region','Congo (Kinshasa)'],
                 ['Republic of the Congo','Somalia','Democratic Republic of the Congo'])
                

In [None]:
n=0
for country in df1_1['Country'].unique() :
    if country not in df5_1['Country or region'].unique():
        n+=1
        print( country,False)
print(n)

In [None]:
n=0
for country in df5_1['Country or region'].unique() :
    if country not in df1_1['Country'].unique():
        n+=1
        print( country,False)
print(n)

Great, now we have the same country names. I need to extract the country names, codes and Happiness Scores for 2015 and 2019 into a new dataframe, but first I need to sort the countries in both dataframes.


In [None]:
diff=df1_1.sort_values('Country')[['Country','Country code', 'Happiness Score']]

In [None]:
diff.head()

In [None]:
diff.info()

In [None]:
sorted2=df5_1.sort_values('Country or region')

In [None]:
df5_1.head(50)

In [None]:
sorted2.head(30)

In [None]:
sorted2.info()

In [None]:
diff['Country'].unique()==sorted2['Country or region'].unique()

In [None]:
l=diff['Country'].unique()

In [None]:
l

In [None]:
l2=sorted2['Country or region'].unique()

In [None]:
l2

In [None]:
l==l2

This confirms that the order of countries in both dataframes is the same.
Now I can simply make a new dataframe and add the Happiness Score from 2019.

I just need to reset the index in both dataframes.

In [None]:
diff=diff.reset_index(drop=True)

In [None]:
diff.head(10)

In [None]:
sorted2=sorted2.reset_index(drop=True)

In [None]:
sorted2.head(30)

In [None]:
diff["Happiness score 2019"]=sorted2['Score']

In [None]:
sorted2['Score']

In [None]:
diff.info()

In [None]:
diff=diff.rename(columns={'Happiness Score':'Happiness score 2015'})

In [None]:
diff['% difference']=round(((diff['Happiness score 2019']-diff['Happiness score 2015'])/
                     diff['Happiness score 2015']*100),2)

In [None]:
diff.head(5)
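The sort, reset-index and assign steps above rely on both tables having exactly the same rows in the same order. As a hedged alternative, a merge on the country name pairs rows explicitly. The frames below are toy stand-ins with made-up scores, and `y2015`, `y2019` and `merged` are hypothetical names.

```python
import pandas as pd

# Toy stand-ins for the 2015 and 2019 tables (scores are invented).
y2015 = pd.DataFrame({
    "Country": ["Finland", "Norway", "Togo"],
    "Happiness Score": [7.0, 7.5, 3.0],
})
y2019 = pd.DataFrame({
    "Country or region": ["Togo", "Finland", "Gambia"],
    "Score": [4.5, 7.7, 4.5],
})

# An inner merge keeps only countries present in both years and pairs rows
# by name, so row order and index values do not matter.
merged = y2015.merge(y2019, left_on="Country", right_on="Country or region",
                     how="inner")
merged = merged.rename(columns={"Happiness Score": "Happiness score 2015",
                                "Score": "Happiness score 2019"})

# Percentage change relative to the 2015 score, rounded to two decimals.
change = merged["Happiness score 2019"] - merged["Happiness score 2015"]
merged["% difference"] = (change / merged["Happiness score 2015"] * 100).round(2)
print(merged[["Country", "% difference"]])
```

Countries missing from either year (Norway and Gambia in this toy data) drop out automatically, which replaces the manual drops.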

There is too much data to present on one plot, so I will divide the data into two subsets: one with a positive percentage difference and one with a negative percentage difference.

In [None]:
diff_pos=diff[diff['% difference']>=0][['Country','Country code','% difference']].sort_values(by='% difference',
                                                                               axis=0)

In [None]:
diff_pos.tail(5)

In [None]:
diff_pos.apply(len)

In [None]:
diff_neg=diff[diff['% difference']<0][['Country','Country code','% difference']].sort_values(by='% difference',
                                                                               axis=0)

In [None]:
diff_neg.head()

In [None]:
diff_neg.apply(len)

In [None]:
diff_pos.plot(x='Country',y='% difference', kind='bar',figsize=(20,10))

In [None]:
print(f"Percentage of countries whose Happiness score has increased by more than 20% is: {len(diff_pos[diff_pos['% difference']>20])/len(diff_pos)*100}")

In [None]:
diff_neg.plot(x='Country',y='% difference', kind='bar',figsize=(20,10))

In [None]:
diff_neg.head(20)

In [None]:
print(f"Percentage of countries whose Happiness score has decreased by more than 20% is: {len(diff_neg[diff_neg['% difference']<-20])/len(diff_neg)*100}")

I will present those on a world map too.

In [None]:
data=dict(type='choropleth',
          locations=diff_pos['Country code'],
          colorscale="reds",
          text=diff_pos['Country'],
          z=diff_pos['% difference'],
          colorbar={'title':'% difference legend '})

layout=dict(title='% increase in Happiness Score 2015 to 2019',geo=dict(showframe=True,projection={'type':'natural earth'}))
choromap=go.Figure(data=[data],layout=layout)
iplot(choromap)

data=dict(type='choropleth',
          locations=diff_neg['Country code'],
          colorscale="greens_r",
          text=diff_neg['Country'],
          z=diff_neg['% difference'],
          colorbar={'title':'% difference legend '})

layout=dict(title='% decrease in Happiness Score 2015 to 2019',geo=dict(showframe=True,projection={'type':'natural earth'}))
choromap=go.Figure(data=[data],layout=layout)
iplot(choromap)

* Looking at the data, more countries saw an increase in Happiness Score in 2019, but judging by area and population, it seems that more people saw a decrease. It also seems that certain continents are "happier" than others. At this point I decided to divide the countries into continents and look at the change in Happiness Score over the years.

**PART 3** :  
I searched for a converter from ISO-3 codes to continents and couldn't find one... There is probably a much better way, but the only thing I could do was to create a list of ISO-3 codes per continent... The long way, I know :)

In [None]:
df1.columns


In [None]:
african=['DZA',
         'AGO',
'SHN',
'BEN',
'BWA',
'BFA',
'BDI',
'CMR',
'CPV',
'CAF',
'TCD',
'COM',
'COG',
'COD',
'DJI',
'EGY',
'GNQ',
'ERI',
'SWZ',
'ETH',
'GAB',
'GMB',
'GHA',
'GIN',
'GNB',
'CIV',
'KEN',
'LSO',
'LBR',
'LBY',
'MDG',
'MWI',
'MLI',
'MRT',
'MUS',
'MYT',
'MAR',
'MOZ',
'NAM',
'NER',
'NGA',
'STP',
'REU',
'RWA',
'STP',
'SEN',
'SYC',
'SLE',
'SOM',
'ZAF',
'SSD',
'SHN',
'SDN',
'SWZ',
'TZA',
'TGO',
'TUN',
'UGA',
'COD',
'ZMB',
'TZA',
'ZWE']


In [None]:
len(african)

In [None]:
european=['AND',
'AUT',
 'ALB',         
'BLR',
'BEL',
'BIH',
'BGR',
'HRV',
'CYP',
'CZE',
'DNK',
'EST',
'FRO',
'FIN',
'FRA',
'DEU',
'GIB',
'GRC',
'HUN',
'ISL',
'IRL',
'IMN',
'ITA',
'XKX',
 'KOS',         
'LVA',
'LIE',
'LTU',
'LUX',
'MKD',
'MLT',
'MDA',
'MCO',
'MNE',
'NLD',
'NOR',
'POL',
'PRT',
'ROU',
'RUS',
'SMR',
'SRB',
'SVK',
'SVN',
'ESP',
'SWE',
'CHE',
'UKR',
'GBR',
'VAT']


In [None]:
len(european)

In [None]:
n_american=['AIA',
'ATG',
'ABW',
'BHS',
'BRB',
'BLZ',
'BMU',
'BES',
'VGB',
'CAN',
'CYM',
'CRI',
'CUB',
'CUW',
'DMA',
'DOM',
'SLV',
'GRL',
'GRD',
'GLP',
'GTM',
'HTI',
'HND',
'JAM',
'MTQ',
'MEX',
'SPM',
'MSR',
'ANT',
'KNA',
'NIC',
'PAN',
'PRI',
'BES',
'SXM',
'KNA',
'LCA',
'SPM',
'VCT',
'TTO',
'TCA',
'USA',
'VIR'
]

In [None]:
len(n_american)

In [None]:
s_american=['ARG',
'BOL',
'BRA',
'CHL',
'COL',
'ECU',
'FLK',
'GUF',
'GUY',
'PRY',
'PER',
'SUR',
'URY',
'VEN'
]

In [None]:
len(s_american)

In [None]:
asian=['AFG',
'ARM',
'AZE',
'BHR',
'BGD',
'BTN',
'BRN',
'KHM',
'CHN',
'CXR',
'CCK',
'IOT',
'GEO',
'HKG',
'IND',
'IDN',
'IRN',
'IRQ',
'ISR',
'JPN',
'JOR',
'KAZ',
'KWT',
'KGZ',
'LAO',
'LBN',
'MAC',
'MYS',
'MDV',
'MNG',
'MMR',
'NPL',
'PRK',
'OMN',
'PAK',
'PSE',
'PHL',
'QAT',
'SAU',
'SGP',
'KOR',
'LKA',
'SYR',
'TWN',
'TJK',
'THA',
'TUR',
'TKM',
'ARE',
'UZB',
'VNM',
'YEM'
]

In [None]:
len(asian)

In [None]:
australian=['ASM',
'AUS',
'NZL',
'COK',
'TLS',
'FSM',
'FJI',
'PYF',
'GUM',
'KIR',
'MNP',
'MHL',
'UMI',
'NRU',
'NCL',
'NZL',
'NIU',
'NFK',
'PLW',
'PNG',
'MNP',
'WSM',
'SLB',
'TKL',
'TON',
'TUV',
'VUT',
'UMI',
'WLF'
]

In [None]:
len(australian)

In [None]:
def continent(code):
    if code in african:
        return 'AF'
    elif code in european:
        return 'EU'
    elif code in n_american:
        return 'NA'
    elif code in s_american:
        return 'SA'
    elif code in australian:
        return 'OC'
    elif code in asian:
        return 'AS'
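The chain of `if`/`elif` membership tests can also be expressed as one dictionary lookup. This is a minimal sketch using three shortened sample lists; `sample_lists`, `continent_of` and `continent_lookup` are hypothetical names, and the full lists defined above would take the place of the samples.

```python
# Shortened sample lists; the full per-continent lists are defined above.
sample_lists = {
    "AF": ["DZA", "AGO", "BEN"],
    "EU": ["AUT", "ALB", "BEL"],
    "AS": ["AFG", "ARM", "AZE"],
}

# Invert the lists once into a single code -> continent dictionary.
continent_of = {code: label
                for label, codes in sample_lists.items()
                for code in codes}

def continent_lookup(code):
    """Return the continent label for an ISO-3 code, or None if unlisted."""
    return continent_of.get(code)

print(continent_lookup("ALB"), continent_lookup("NaN"))
```

Like the function above, unlisted codes (such as the 'NaN' placeholder) come back as `None`, so the later `isnull()` checks would still work.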

In [None]:
df1['Continent']=df1['Country code'].apply(continent)

In [None]:
df1.head()

In [None]:
df1=df1[['Country', 'Country code','Continent', 'Region', 'Happiness Rank',
       'Happiness Score', 'Standard Error', 'Economy (GDP per Capita)',
       'Family', 'Health (Life Expectancy)', 'Freedom',
       'Trust (Government Corruption)', 'Generosity', 'Dystopia Residual']]

In [None]:
df1.info()

In [None]:
df1[df1['Continent'].isnull()]

I will drop this row, as North Cyprus is recognized as a country only by Turkey and therefore doesn't have an ISO code.

In [None]:
df1.drop( df1[ df1['Country'] == 'North Cyprus' ].index , inplace=True)

In [None]:
len(df1)

I need to add ISO-3 codes and the corresponding continents to the countries in df2, df3 and df4.

In [None]:
input_countries = df2['Country'].unique()

countries = {}
for country in pycountry.countries:
    countries[country.name] = country.alpha_3

df2_codes = [countries.get(country, 'Unknown code') for country in input_countries]

print(df2_codes)
    

In [None]:
df2['Country'].unique()

In [None]:
df2['Country code']=['DNK', 'CHE', 'ISL', 'NOR', 'FIN', 'CAN', 'NLD', 'NZL', 'AUS', 'SWE', 'ISR', 
                     'AUT', 'USA', 'CRI', 'PRI', 'DEU', 'BRA', 'BEL', 'IRL', 'LUX', 'MEX', 'SGP',
                     'GBR', 'CHL', 'PAN', 'ARG', 'CZE', 'ARE', 'URY', 'MLT', 'COL', 'FRA', 'THA', 
                     'SAU', 'TWN', 'QAT', 'ESP', 'DZA', 'GTM', 'SUR', 'KWT', 'BHR', 'TTO', 'VEN',
                     'SVK', 'SLV', 'MYS', 'NIC', 'UZB', 'ITA', 'ECU', 'BLZ', 'JPN', 'KAZ', 'MDA',
                     'RUS', 'POL', 'KOR', 'BOL', 'LTU', 'BLR', 'NaN', 'SVN', 'PER', 'TKM', 'MUS', 
                     'LBY', 'LVA', 'CYP', 'PRY', 'ROU', 'EST', 'JAM', 'HRV', 'HKG', 'SOM', 'KOS', 
                     'TUR', 'IDN', 'JOR', 'AZE', 'PHL', 'CHN', 'BTN', 'KGZ', 'SRB', 'BIH', 'MNE',
                     'DOM', 'MAR', 'HUN', 'PAK', 'LBN', 'PRT', 'MKD', 'VNM', 'SOM', 'TUN', 'GRC',
                     'TJK', 'MNG', 'LAO', 'NGA', 'HND', 'IRN', 'ZMB', 'NPL', 'PSE', 'ALB', 'BGD',
                     'SLE', 'IRQ', 'NAM', 'CMR', 'ETH', 'ZAF', 'LKA', 'IND', 'MMR', 'EGY', 'ARM', 
                     'KEN', 'UKR', 'GHA', 'COD', 'GEO', 'COG', 'SEN', 'BGR', 'MRT', 'ZWE', 'MWI',
                     'SDN', 'GAB', 'MLI', 'HTI', 'BWA', 'COM', 'CIV', 'KHM', 'AGO', 'NER', 'SSD', 
                     'TCD', 'BFA', 'UGA', 'YEM', 'MDG', 'TZA', 'LBR', 'GIN', 'RWA', 'BEN', 'AFG', 
                     'TGO', 'SYR', 'BDI']

In [None]:
df2['Continent']=df2['Country code'].apply(continent)

In [None]:
df2=df2[['Country', 'Country code', 'Continent','Region', 'Happiness Rank', 'Happiness Score',
       'Lower Confidence Interval', 'Upper Confidence Interval',
       'Economy (GDP per Capita)', 'Family', 'Health (Life Expectancy)',
       'Freedom', 'Trust (Government Corruption)', 'Generosity',
       'Dystopia Residual']]

In [None]:
df2.head()

In [None]:
df2[df2['Continent'].isnull()]

In [None]:
df2.drop( df2[ df2['Country'] == 'North Cyprus' ].index , inplace=True)

In [None]:
df2.info()

In [None]:
df2[df2['Continent']=='None']

In [None]:
input_countries = df3['Country'].unique()

countries = {}
for country in pycountry.countries:
    countries[country.name] = country.alpha_3

df3_codes = [countries.get(country, 'Unknown code') for country in input_countries]

print(df3_codes)
    

In [None]:
df3['Country'].unique()

In [None]:
df3['Country code']=['NOR', 'DNK', 'ISL', 'CHE', 'FIN', 'NLD', 'CAN', 'NZL', 'SWE', 'AUS', 'ISR',
                     'CRI', 'AUT', 'USA', 'IRL', 'DEU', 'BEL', 'LUX', 'GBR', 'CHL', 'ARE', 'BRA',
                     'CZE', 'ARG', 'MEX', 'SGP', 'MLT', 'URY', 'GTM', 'PAN', 'FRA', 'THA', 'TWN',
                     'ESP', 'QAT', 'COL', 'SAU', 'TTO', 'KWT', 'SVK', 'BHR', 'MYS', 'NIC', 'ECU', 
                     'SLV', 'POL', 'UZB', 'ITA', 'RUS', 'BLZ', 'JPN', 'LTU', 'DZA', 'LVA', 'KOR',
                     'MDA', 'ROU', 'BOL', 'TKM', 'KAZ', 'NaN', 'SVN', 'PER', 'MUS', 'CYP', 'EST',
                     'BLR', 'LBY', 'TUR', 'PRY', 'HKG', 'PHL', 'SRB', 'JOR', 'HUN', 'JAM', 'HRV',
                     'KOS', 'CHN', 'PAK', 'IDN', 'VEN', 'MNE', 'MAR', 'AZE', 'DOM', 'GRC', 'LBN', 
                     'PRT', 'BIH', 'HND', 'MKD', 'SOM', 'VNM', 'NGA', 'TJK', 'BTN', 'KGZ', 'NPL', 
                     'MNG', 'ZAF', 'TUN', 'PSE', 'EGY', 'BGR', 'SLE', 'CMR', 'IRN', 'ALB', 'BGD', 
                     'NAM', 'KEN', 'MOZ', 'MMR', 'SEN', 'ZMB', 'IRQ', 'GAB', 'ETH', 'LKA', 'ARM',
                     'IND', 'MRT', 'COG', 'GEO', 'COD', 'MLI', 'CIV', 'KHM', 'SDN', 'GHA', 'UKR',
                     'UGA', 'BFA', 'NER', 'MWI', 'TCD', 'ZWE', 'LSO', 'AGO', 'AFG', 'BWA', 'BEN',
                     'MDG', 'HTI', 'YEM', 'SSD', 'LBR', 'GIN', 'TGO', 'RWA', 'SYR', 'TZA', 'BDI',
                     'CAF']

In [None]:
df3['Continent']=df3['Country code'].apply(continent)

In [None]:
df3=df3[['Country','Country code','Continent','Happiness.Rank', 'Happiness.Score', 'Whisker.high',
       'Whisker.low', 'Economy..GDP.per.Capita.', 'Family','Health..Life.Expectancy.', 'Freedom',
        'Generosity','Trust..Government.Corruption.', 'Dystopia.Residual']]

In [None]:
df3.head()

In [None]:
df3[df3['Continent'].isnull()]

In [None]:
df3.drop( df3[ df3['Country'] == 'North Cyprus' ].index , inplace=True)

In [None]:
df3.info()

In [None]:
input_countries = df4['Country or region'].unique()

countries = {}
for country in pycountry.countries:
    countries[country.name] = country.alpha_3

df4_codes = [countries.get(country, 'Unknown code') for country in input_countries]

print(df4_codes)
    

In [None]:
df4['Country or region'].unique()

In [None]:
df4['Country code']=['FIN', 'NOR', 'DNK', 'ISL', 'CHE', 'NLD', 'CAN', 'NZL', 'SWE', 'AUS', 'GBR',
                     'AUT', 'CRI', 'IRL', 'DEU', 'BEL', 'LUX', 'USA', 'ISR', 'ARE', 'CZE','MLT',
                     'FRA', 'MEX', 'CHL', 'TWN', 'PAN', 'BRA', 'ARG', 'GTM', 'URY', 'QAT', 'SAU',
                     'SGP', 'MYS', 'ESP', 'COL', 'TTO', 'SVK', 'SLV', 'NIC', 'POL', 'BHR', 'UZB', 
                     'KWT', 'THA', 'ITA', 'ECU', 'BLZ', 'LTU', 'SVN', 'ROU', 'LVA', 'JPN', 'MUS',
                     'JAM', 'KOR', 'NaN', 'RUS', 'KAZ', 'CYP', 'BOL', 'EST', 'PRY', 'PER', 'KOS',
                     'MDA', 'TKM', 'HUN', 'LBY', 'PHL', 'HND', 'BLR', 'TUR', 'PAK', 'HKG', 'PRT',
                     'SRB', 'GRC', 'LBN', 'MNE', 'HRV', 'DOM', 'DZA', 'MAR', 'CHN', 'AZE', 'TJK',
                     'MKD', 'JOR', 'NGA', 'KGZ', 'BIH', 'MNG', 'VNM', 'IDN', 'BTN', 'SOM', 'CMR', 
                     'BGR', 'NPL', 'VEN', 'GAB', 'PSE', 'ZAF', 'IRN', 'CIV', 'GHA', 'SEN', 'LAO',
                     'TUN', 'ALB', 'SLE', 'COG', 'BGD', 'LKA', 'IRQ', 'MLI', 'NAM', 'KHM', 'BFA',
                     'EGY', 'MOZ', 'KEN', 'ZMB', 'MRT', 'ETH', 'GEO', 'ARM', 'MMR', 'TCD', 'COD',
                     'IND', 'NER', 'UGA', 'BEN', 'SDN', 'UKR', 'TGO', 'GIN', 'LSO', 'AGO', 'MDG', 
                     'ZWE', 'AFG', 'BWA', 'MWI', 'HTI', 'LBR', 'SYR', 'RWA', 'YEM', 'TZA', 'SSD',
                     'CAF', 'BDI']

In [None]:
df4['Continent']=df4['Country code'].apply(continent)

In [None]:
df4=df4[['Country or region','Country code', 'Continent','Overall rank','Score', 'GDP per capita',
       'Social support', 'Healthy life expectancy',
       'Freedom to make life choices', 'Generosity',
       'Perceptions of corruption']]

In [None]:
df4[df4['Continent'].isnull()]

In [None]:
df4.drop( df4[ df4['Country or region'] == 'Northern Cyprus' ].index , inplace=True)

In [None]:
df4.info()

In [None]:
input_countries = df5['Country or region'].unique()

countries = {}
for country in pycountry.countries:
    countries[country.name] = country.alpha_3

df5_codes = [countries.get(country, 'Unknown code') for country in input_countries]

print(df5_codes)
    

In [None]:
df5['Country or region'].unique()

In [None]:
df5['Country code']=['FIN', 'DNK', 'NOR', 'ISL', 'NLD', 'CHE', 'SWE', 'NZL', 'CAN', 'AUT', 'AUS',
                     'CRI', 'ISR', 'LUX', 'GBR', 'IRL', 'DEU', 'BEL', 'USA', 'CZE', 'ARE', 'MLT',
                     'MEX', 'FRA', 'TWN', 'CHL', 'GTM', 'SAU', 'QAT', 'ESP', 'PAN', 'BRA', 'URY',
                     'SGP', 'SLV', 'ITA', 'BHR', 'SVK', 'TTO', 'POL', 'UZB', 'LTU', 'COL', 'SVN',
                     'NIC', 'KOS', 'ARG', 'ROU', 'CYP', 'ECU', 'KWT', 'THA', 'LVA', 'KOR', 'EST',
                     'JAM', 'MUS', 'JPN', 'HND', 'KAZ', 'BOL', 'HUN', 'PRY', 'NaN', 'PER', 'PRT', 
                     'PAK', 'RUS', 'PHL', 'SRB', 'MDA', 'LBY', 'MNE', 'TJK', 'HRV', 'HKG', 'DOM',
                     'BIH', 'TUR', 'MYS', 'BLR', 'GRC', 'MNG', 'MKD', 'NGA', 'KGZ', 'TKM', 'DZA',
                     'MAR', 'AZE', 'LBN', 'IDN', 'CHN', 'VNM', 'BTN', 'CMR', 'BGR', 'GHA', 'CIV', 
                     'NPL', 'JOR', 'BEN', 'COG', 'GAB', 'LAO', 'ZAF', 'ALB', 'VEN', 'KHM', 'PSE', 
                     'SEN', 'SOM', 'NAM', 'NER', 'BFA', 'ARM', 'IRN', 'GIN', 'GEO', 'GMB', 'KEN',
                     'MRT', 'MOZ', 'TUN', 'BGD', 'IRQ', 'COD', 'MLI', 'SLE', 'LKA', 'MMR', 'TCD',
                     'UKR', 'ETH', 'SWZ', 'UGA', 'EGY', 'ZMB', 'TGO', 'IND', 'LBR', 'COM', 'MDG', 
                     'LSO', 'BDI', 'ZWE', 'HTI', 'BWA', 'SYR', 'MWI', 'YEM', 'RWA', 'TZA', 'AFG',
                     'CAF', 'SSD']

In [None]:
df5['Continent']=df5['Country code'].apply(continent)

In [None]:
df5=df5[['Country or region','Country code', 'Continent','Overall rank','Score', 'GDP per capita',
       'Social support', 'Healthy life expectancy',
       'Freedom to make life choices', 'Generosity',
       'Perceptions of corruption']]

In [None]:
df5[df5['Continent'].isnull()]

In [None]:
df5.drop( df5[ df5['Country or region'] == 'Northern Cyprus' ].index , inplace=True)

I will try to visually represent the data across years, grouped by continent. To do that, and to be able to show plots next to one another, I will create a new dataset. It will contain 'Country code', 'Continent' and 'Happiness Score' from each dataframe, plus an additional column for the year. So I will create a 'Year' column for each dataset and then concatenate the datasets.
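The same idea can be sketched in a few lines. This is only a minimal illustration on toy frames: the dictionary `frames`, the country codes and the scores are made up and stand in for the real yearly dataframes. `assign` adds the 'Year' column without needing to know the number of rows.

```python
import pandas as pd

# Toy stand-ins for the yearly dataframes; codes and scores are made up for illustration.
frames = {
    "2015": pd.DataFrame({"Country code": ["AAA", "BBB"],
                          "Continent": ["Europe", "Africa"],
                          "Happiness Score": [7.0, 3.0]}),
    "2016": pd.DataFrame({"Country code": ["AAA", "BBB"],
                          "Continent": ["Europe", "Africa"],
                          "Happiness Score": [7.1, 3.1]}),
}

# Tag every frame with its year, then stack the frames row-wise.
tagged = [df.assign(Year=year) for year, df in frames.items()]
combined = pd.concat(tagged, axis=0, ignore_index=True)
print(combined)
```

Passing `ignore_index=True` gives the combined frame a fresh 0..n-1 index, which is convenient later when rows are looked up by label.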

In [None]:
#making sure columns have same names
df3=df3.rename(columns={'Happiness.Score':'Happiness Score'})

In [None]:
df4=df4.rename(columns={'Score':'Happiness Score'})

In [None]:
#creating first dataframe for 2015 (.copy() so that adding 'Year' later does not write to a view of df1)
score_15=df1[['Country code','Continent','Happiness Score']].copy()

In [None]:
len(score_15)

In [None]:
l=['2015']*len(score_15)

In [None]:
score_15['Year']=l

In [None]:
score_15

In [None]:
score_16=df2[['Country code','Continent','Happiness Score']].copy()

In [None]:
len(score_16)

In [None]:
l=['2016']*len(score_16)

In [None]:
score_16['Year']=l

In [None]:
score_16

In [None]:
score_16.info()

In [None]:
score_17=df3[['Country code','Continent','Happiness Score']].copy()

In [None]:
len(score_17)

In [None]:
l=['2017']*len(score_17)

In [None]:
score_17['Year']=l

In [None]:
score_17

In [None]:
score_18=df4[['Country code','Continent','Happiness Score']].copy()

In [None]:
len(score_18)

In [None]:
l=['2018']*len(score_18)

In [None]:
score_18['Year']=l

In [None]:
score_18.info()

In [None]:
df5=df5.rename(columns={'Score':'Happiness Score'})

In [None]:
score_19=df5[['Country code','Continent','Happiness Score']].copy()

In [None]:
len(score_19)

In [None]:
l=['2019']*len(score_19)

In [None]:
score_19['Year']=l

In [None]:
score_19

In [None]:
data=[score_15,score_16,score_17,score_18,score_19]

In [None]:
cont_score=pd.concat(data,axis=0)

In [None]:
cont_score.info()

In [None]:
score_15.groupby('Continent')['Happiness Score'].mean()

OK, now that the dataset for continents is ready, I will use a barplot to check the averages and then a boxplot to check the spread.

In [None]:
plt.figure(figsize=(12,8))
sns.barplot(x='Continent',y='Happiness Score',hue='Year',data=cont_score,errwidth=0)

From the plot above we can see that Europe has seen an overall, steady increase in the Happiness index over the years. On the other hand, the Happiness index in South America has been steadily decreasing.

Oceania has the most stable index and by far the highest Happiness index in the world. Africa reached its highest score in 2019, but at the same time it has the lowest Happiness index in the world.

North America's Happiness index dropped to its lowest point in 2017.

Asia has the second lowest score in the world, and its Happiness index is similar across the years.


It would be interesting to compare the values pooled across all years too.

In [None]:
sorted(round(cont_score.groupby('Continent')['Happiness Score'].mean(),2))

In [None]:
def display_figures(ax):
    # per-continent means from the groupby above, hard-coded in the order the bars are drawn
    l=[ 6.13,6.11,7.29, 5.26, 6.13, 4.29]
    i=0
    for p in ax.patches:
        h=p.get_height()
        if (h>0):
            value=l[i]
            # write the label centered just above the bar
            ax.text(p.get_x()+p.get_width()/2,h+0.08, value, ha='center')
            i=i+1

plt.figure(figsize=(12,8))
ax=sns.barplot(x='Continent',y='Happiness Score',data=cont_score,errwidth=0)
display_figures(ax)


From this barplot we can see how mean values can be deceiving. While Oceania is without doubt in the lead (obvious from both charts), South America has the third overall score. If we look at the previous chart, we can see that South America is the only continent where the Happiness index is consistently decreasing, and therefore this average Happiness index has to be taken with a grain of salt.

For the other continents this bar chart shows much the same picture as the previous one.
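To make the point about deceiving means concrete, here is a small sketch on toy data (the continents "A" and "B" and all scores are made up). Both groups have the same overall mean, yet one is falling and the other rising, which only shows up once the means are split by year.

```python
import pandas as pd

# Toy long-format data shaped like cont_score; the scores are illustrative only.
toy = pd.DataFrame({
    "Continent": ["A", "A", "A", "B", "B", "B"],
    "Year": ["2015", "2016", "2017"] * 2,
    "Happiness Score": [6.5, 6.0, 5.5, 5.9, 6.0, 6.1],
})

# Mean score per continent and year, one column per year.
by_year = toy.pivot_table(index="Continent", columns="Year",
                          values="Happiness Score", aggfunc="mean")

# The overall mean hides the direction of travel; last year minus first year exposes it.
by_year["Overall"] = toy.groupby("Continent")["Happiness Score"].mean()
by_year["Change"] = by_year["2017"] - by_year["2015"]
print(by_year.round(2))
```

Applied to the real data, the same table would need the first and last year columns ('2015' and '2019') in place of the toy ones.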


Let's look at the median and spread. I will use boxplots.

In [None]:
plt.figure(figsize=(12,8))
sns.boxplot( hue='Year', y='Happiness Score', x='Continent',data=cont_score)

From here we can see that Oceania clearly has the smallest spread and the highest values. I will confirm that by calculating some statistics soon.

Asia has the largest overall spread, followed by Europe. That tells us only that the Happiness index on these continents is highly variable. In Asia the middle 50% of the data is almost constant across years and has less spread than the middle 50% in Europe. The bottom and top 25% in Asia contain some extreme low and high values, while in Europe the max/min values are not as extreme relative to the middle 50%.

If we recall the first geographical plot of the Happiness index in Europe for 2015, we could see that relatively wide spread on the map too. Northern countries reach the maximum values of the Happiness index, while southern countries reach the minimum.

The second smallest spread is in South America, which tells us that the Happiness index has not only been steadily decreasing, but has been decreasing uniformly across the continent.

Africa has a relatively large spread too, but the spread is very similar across years. The middle 50% is clustered around the median, and the top and bottom 25% contain some very low and very high values for Africa.

In [None]:
stat=cont_score.groupby('Continent').describe()

In [None]:
stat[('Happiness Score','range')]=stat[('Happiness Score','max')]-stat[('Happiness Score','min')]

In [None]:
# the interquartile range is the 75th percentile minus the 25th percentile
stat[('Happiness Score','IQR')]=stat[('Happiness Score','75%')]-stat[('Happiness Score','25%')]

In [None]:
stat

This supports the analysis above. The highest standard deviations are indeed for Europe, North America and Asia. The lowest is for Oceania, followed by South America.

Asia has the largest range and Oceania the smallest, followed by South America. All of this supports the statements above.

Interestingly, the highest values are found in Europe and North America.
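For reference, the three spread measures used here can also be computed directly, without going through `describe()`. The sketch below uses toy data (groups "A" and "B" and their scores are made up); the interquartile range is the 75th percentile minus the 25th percentile.

```python
import pandas as pd

# Toy data: group A is spread out, group B is tightly clustered. Values are illustrative.
toy = pd.DataFrame({
    "Continent": ["A"] * 5 + ["B"] * 5,
    "Happiness Score": [1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 4.5, 5.0, 5.5, 6.0],
})

grouped = toy.groupby("Continent")["Happiness Score"]

# Standard deviation, full range, and interquartile range per group.
spread = pd.DataFrame({
    "std": grouped.std(),
    "range": grouped.max() - grouped.min(),
    "IQR": grouped.quantile(0.75) - grouped.quantile(0.25),
})
print(spread)
```

The range reacts to single extreme countries, while the IQR only describes the middle 50%, so looking at both mirrors what the boxplots show.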

In [None]:
cont_score

As we can see from the data below, the max Happiness index every year belongs to a country from Northern Europe. For the last two years it has been Finland.

On the other end, the min Happiness index every year belongs to a country from Africa.

So, even though Oceania is the "happiest continent" overall, it doesn't hold the highest Happiness Score.

In [None]:
score_15[score_15['Happiness Score']==score_15['Happiness Score'].max()]

In [None]:
score_15[score_15['Happiness Score']==score_15['Happiness Score'].min()]

In [None]:
score_16[score_16['Happiness Score']==score_16['Happiness Score'].max()]

In [None]:
score_16[score_16['Happiness Score']==score_16['Happiness Score'].min()]

In [None]:
score_17[score_17['Happiness Score']==score_17['Happiness Score'].max()]

In [None]:
score_17[score_17['Happiness Score']==score_17['Happiness Score'].min()]

In [None]:
score_18[score_18['Happiness Score']==score_18['Happiness Score'].max()]

In [None]:
score_18[score_18['Happiness Score']==score_18['Happiness Score'].min()]

In [None]:
score_19[score_19['Happiness Score']==score_19['Happiness Score'].max()]

In [None]:
score_19[score_19['Happiness Score']==score_19['Happiness Score'].min()]
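The ten lookups above could also be done in one pass over a combined frame. This is a minimal sketch on toy data (country codes and scores are made up), assuming the frame has a unique index, which is why a frame built with `pd.concat` would first need `reset_index(drop=True)` or `ignore_index=True`.

```python
import pandas as pd

# Toy frame shaped like cont_score, with a unique 0..n-1 index; values are illustrative.
toy = pd.DataFrame({
    "Country code": ["AAA", "BBB", "CCC", "AAA", "BBB", "CCC"],
    "Continent": ["Europe", "Africa", "Asia"] * 2,
    "Happiness Score": [7.0, 3.0, 5.0, 6.8, 3.2, 7.1],
    "Year": ["2018"] * 3 + ["2019"] * 3,
})

by_year = toy.groupby("Year")["Happiness Score"]

# idxmax/idxmin return the row label of the extreme score within each year.
happiest = toy.loc[by_year.idxmax()]
least_happy = toy.loc[by_year.idxmin()]

print(happiest)
print(least_happy)
```

Unlike the equality filter used above, `idxmax` and `idxmin` return only the first row when several countries tie for the extreme value.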