# World Happiness Report Project

1 Project Description
The World Happiness Report is a landmark survey of the state of global happiness. The first report was published in 2012, the second in 2013, the third in 2015, and the fourth in the 2016 Update. The World Happiness Report 2017, which ranks 155 countries by their happiness levels, was released at the United Nations at an event celebrating the International Day of Happiness on March 20th. The report continues to gain global recognition as governments, organizations and civil society increasingly use happiness indicators to inform their policy-making decisions. Leading experts across fields – economics, psychology, survey analysis, national statistics, health, public policy and more – describe how measurements of well-being can be used effectively to assess the progress of nations. The reports review the state of happiness in the world today and show how the new science of happiness explains personal and national variations in happiness.

What is Dystopia?
Dystopia is an imaginary country that has the world’s least-happy people. The purpose in establishing Dystopia is to have a benchmark against which all countries can be favorably compared (no country performs more poorly than Dystopia) in terms of each of the six key variables, thus allowing each sub-bar to be of positive width. The lowest scores observed for the six key variables, therefore, characterize Dystopia. Since life would be very unpleasant in a country with the world’s lowest incomes, lowest life expectancy, lowest generosity, most corruption, least freedom and least social support, it is referred to as “Dystopia,” in contrast to Utopia.
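
The idea of a benchmark built from the lowest observed values can be sketched in a few lines. This is only an illustration: the three countries, their values and the column names (`log_gdp`, `social_support`, `life_expectancy`) are made up, and only three of the six variables are shown.

```python
import pandas as pd

# Hypothetical raw values for three made-up countries (not report data).
raw = pd.DataFrame(
    {
        "log_gdp": [10.8, 9.1, 7.2],
        "social_support": [0.95, 0.80, 0.55],
        "life_expectancy": [72.0, 65.0, 52.0],
    },
    index=["A", "B", "C"],
)

# Dystopia takes the lowest observed value of every variable.
dystopia = raw.min()

# Each country's gap to Dystopia is therefore never negative,
# which is what lets every sub-bar have a positive (or zero) width.
gap = raw - dystopia
print(gap)
```

Because the benchmark is the column-wise minimum, no entry of `gap` can be below zero.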

What are the residuals?
The residuals, or unexplained components, differ for each country, reflecting the extent to which the six variables either over- or under-explain average life evaluations. These residuals have an average value of approximately zero over the whole set of countries. 
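
A small sketch can show why such residuals average out to roughly zero. It assumes an ordinary least-squares fit with an intercept on synthetic data; the report's actual model and data are not reproduced here, and the coefficients below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 50 "countries" and 6 explanatory variables (not report data).
X = rng.normal(size=(50, 6))
true_weights = np.array([0.8, 0.6, 0.5, 0.4, 0.2, 0.3])  # arbitrary values
y = 5.0 + X @ true_weights + rng.normal(scale=0.5, size=50)

# Add an intercept column and fit ordinary least squares.
X1 = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)

# Residual = observed value minus the part explained by the six variables.
residuals = y - X1 @ coef
print(abs(residuals.mean()) < 1e-8)
```

With an intercept in the model, the least-squares residuals sum to zero up to floating-point error, while individual residuals can still be clearly positive or negative.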

What do the columns succeeding the Happiness Score (like Family, Generosity, etc.) describe?
The columns that follow it (GDP per Capita, Family, Life Expectancy, Freedom, Generosity, Trust Government Corruption) describe the extent to which each factor contributes to the happiness evaluation of each country.
The Dystopia Residual is the Dystopia happiness score (1.85) plus the residual, or unexplained value, for each country.
The Dystopia Residual is already provided in the dataset.
Adding all of these factors together gives the happiness score, so a model that uses all of them as predictors may be unreliable: it mostly reconstructs the score instead of predicting it.
The task is to predict the happiness score from the other factors in the dataset.
Dataset link:
https://github.com/FlipRoboTechnologies/ML-Datasets/blob/main/World%20Happiness/happiness_score_dataset.csv
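
The additive relationship described above can be checked with a short sketch. The two rows are invented for illustration, and the short column names (`Happ.Score`, `GDP`, `Dystop.Res`, and so on) are placeholders that may not match the headers in the linked CSV.

```python
import numpy as np
import pandas as pd

# Illustrative values for two made-up countries (not report data).
toy = pd.DataFrame(
    {
        "Country": ["A", "B"],
        "Happ.Score": [7.2, 4.0],
        "GDP": [1.5, 0.5],
        "Family": [1.4, 0.7],
        "Life.Expect": [0.8, 0.3],
        "Freedom": [0.6, 0.4],
        "Generosity": [0.3, 0.2],
        "Trust.to.Gov": [0.3, 0.1],
        "Dystop.Res": [2.3, 1.8],
    }
)

parts = ["GDP", "Family", "Life.Expect", "Freedom",
         "Generosity", "Trust.to.Gov", "Dystop.Res"]

# Sum the six factor contributions plus the Dystopia Residual per row.
rebuilt = toy[parts].sum(axis=1)

# Compare with the reported score, allowing for rounding in the data.
matches = np.isclose(rebuilt, toy["Happ.Score"], atol=1e-3)
print(pd.DataFrame({"Country": toy["Country"], "rebuilt": rebuilt, "matches": matches}))
```

If this check passes on the real data, a model trained on all seven columns would simply recover the sum, which is why the Dystopia Residual is a questionable predictor.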


In [None]:
# Load the necessary libraries
import random

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

# pandas.tools.plotting has been removed from pandas; scatter_matrix lives in pandas.plotting
from pandas.plotting import scatter_matrix

#!pip install plotly

import plotly
# No plotly credentials are set here: the maps below are drawn with plotly.offline

In [None]:
# Import the data and inspect the variables
df5 = pd.read_csv("C:/Users/merye/Anaconda3/Datasets/World_Happiness/2015.csv", sep=',')
df6 = pd.read_csv("C:/Users/merye/Anaconda3/Datasets/World_Happiness/2016.csv", sep=',')
df7 = pd.read_csv("C:/Users/merye/Anaconda3/Datasets/World_Happiness/2017.csv", sep=',')
#frames = [df5, df6, df7]
#df = pd.concat(frames)
df7.head()  # Show the first 5 rows of the 2017 data

In [None]:
#Check last 5 values of the data 
df7.tail()


In [None]:
#Checking the data types for each column 
df7.info()

In [None]:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 155 entries, 0 to 154
Data columns (total 12 columns):
Country                          155 non-null object
Happiness.Rank                   155 non-null int64
Happiness.Score                  155 non-null float64
Whisker.high                     155 non-null float64
Whisker.low                      155 non-null float64
Economy..GDP.per.Capita.         155 non-null float64
Family                           155 non-null float64
Health..Life.Expectancy.         155 non-null float64
Freedom                          155 non-null float64
Generosity                       155 non-null float64
Trust..Government.Corruption.    155 non-null float64
Dystopia.Residual                155 non-null float64
dtypes: float64(10), int64(1), object(1)
memory usage: 14.6+ KB
Most of the columns are floats; only Country is an object (string) column and Happiness.Rank is an integer.

In [None]:
# Some column names are long, so shorten them
df=df7.rename(columns = {'Happiness.Rank':'Happ.Rank', 'Happiness.Score':'Happ.Score', 'Economy..GDP.per.Capita.':'GDP', 
                     'Health..Life.Expectancy.':'Life.Expect','Trust..Government.Corruption.':'Trust.to.Gov', 'Dystopia.Residual':'Dystop.Res'})
df.head()

In [None]:
#Checking if we have any NA value
df.isnull().values.any()

In [None]:
False
The 2017 data has no null values.

In [None]:
# Summary statistics for all numeric variables: count, mean, std, min, max and quartiles
df.describe()

Part II. Visualization

In [None]:
#Let's check all columns by pair plot 
import seaborn as sns; sns.set(style="ticks", color_codes=True)
g = sns.pairplot(df, kind="reg")

In [None]:
Scatterplots show possible associations between two variables. The goal here is to see whether each pair of variables has a positive or a negative relationship: an upward-sloping regression line indicates a positive relationship, and a downward-sloping line a negative one. The plots above show that some relationships are stronger than others. To quantify the strength of a linear (straight-line) relationship, we will use correlation analysis.
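
As a minimal sketch of what a correlation coefficient measures, the Pearson coefficient can be computed by hand and compared with NumPy's result. The `gdp` and `score` values are invented for illustration, and `pearson_r` is a hypothetical helper name.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation: co-variation divided by the product of the spreads."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xd = x - x.mean()  # deviations from the mean of x
    yd = y - y.mean()  # deviations from the mean of y
    return float((xd * yd).sum() / np.sqrt((xd ** 2).sum() * (yd ** 2).sum()))

# Made-up values with an obvious upward trend (not report data).
gdp = [0.3, 0.7, 1.0, 1.3, 1.6]
score = [3.5, 4.6, 5.4, 6.3, 7.4]

r = pearson_r(gdp, score)
print(round(r, 3), round(float(np.corrcoef(gdp, score)[0, 1]), 3))
```

Values near +1 or -1 indicate a strong linear relationship, and values near 0 a weak one.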

In [None]:
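# Check the correlation between each pair of numeric columns
# (Country is a text column, so it is excluded before computing correlations)
corrmat = df.select_dtypes(include='number').corr()
sns.heatmap(corrmat, vmax=.8, square=True)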

In [None]:
<matplotlib.axes._subplots.AxesSubplot at 0x185e80b88d0>

The two graphs above show the correlation between each pair of variables; we will concentrate on the highly correlated ones. Happiness Rank and Happiness Score are negatively correlated: as the Happiness Score increases, the rank number decreases (1 is the top rank, 155 is the last). Therefore, we will analyze how Happiness Score relates to GDP, Life Expectancy, Freedom and Trust in Government (corruption), as these values are highly correlated with each other. GDP is the main factor, affecting others such as Family, Life Expectancy and Freedom.
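
The negative sign between rank and score follows directly from how a ranking is built. A short sketch with made-up scores (not report data) illustrates it:

```python
import pandas as pd

# Made-up happiness scores, highest first (not report data).
scores = pd.Series([7.5, 6.0, 5.2, 4.1, 3.0], name="Happ.Score")

# Rank 1 goes to the highest score, so ranks grow as scores fall.
ranks = scores.rank(ascending=False).astype(int)

# The correlation between score and rank is therefore negative.
r = scores.corr(ranks)
print(ranks.tolist(), r < 0)
```

This is why the rank column adds no information beyond the score itself when analyzing the other factors.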

In [None]:
#let's check highly correlated columns separately 

%config InlineBackend.figure_format = 'retina'

plt.figure(figsize=(16, 10))
for i, key in enumerate(['GDP', 'Family', 'Life.Expect', 'Freedom', 'Trust.to.Gov']):
    plt.subplot(2, 3, i+1)
    plt.xlabel(key)
    plt.scatter(df[key], df['Happ.Score'], alpha=0.5)

In [None]:
#I wanted to see happiness score distribution on the world map for 2017 results
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot #loading necessary libraries for mapping
init_notebook_mode(connected=True)

data = dict(type = 'choropleth', #As we have only country names in data, we can use country names to see the happiness
           locations = df['Country'],
           locationmode = 'country names',
           z = df['Happ.Score'], 
           text = df['Country'],
           colorbar = {'title':'Happiness'})
layout = dict(title = 'Global Happiness 2017', 
             geo = dict(showframe = False, 
                       projection = {'type': 'mercator'}))  # recent plotly versions expect lowercase projection names
choromap3 = go.Figure(data = [data], layout=layout)
iplot(choromap3) 

In [None]:
#Let's check head for 2015 
df5.head()

In [None]:
#Last 5 countries in the list 
df5.tail()

In [None]:
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
init_notebook_mode(connected=True)

data = dict(type = 'choropleth', 
           locations = df5['Country'],
           locationmode = 'country names',
           z = df5['Happiness Score'], 
           text = df5['Country'],
           colorbar = {'title':'Happiness'})
layout = dict(title = 'Global Happiness 2015', 
             geo = dict(showframe = False, 
                       projection = {'type': 'mercator'}))  # recent plotly versions expect lowercase projection names
choromap3 = go.Figure(data = [data], layout=layout)
iplot(choromap3)

In [None]:
# 10 happiest countries in 2017
dfl = df.set_index('Country')['Happ.Score'].nlargest(10)  # top 10 by score, independent of row order
dfl

In [None]:
# 10 happiest countries in 2015
dfh = df5.set_index('Country')['Happiness Score'].nlargest(10)  # top 10 by score, independent of row order
dfh
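
To compare the two years side by side, the lists can be merged on the country name. The sketch below uses made-up countries and scores; only the column names `Happiness Score` (2015) and `Happ.Score` (renamed 2017 frame) come from the notebook, and `Change` is a hypothetical column added for illustration.

```python
import pandas as pd

# Made-up top lists for two years (not report data).
top_2015 = pd.DataFrame({"Country": ["A", "B", "C"],
                         "Happiness Score": [7.6, 7.5, 7.4]})
top_2017 = pd.DataFrame({"Country": ["B", "A", "D"],
                         "Happ.Score": [7.6, 7.5, 7.3]})

# An outer merge keeps countries that appear in only one of the lists.
merged = top_2015.merge(top_2017, on="Country", how="outer")

# Change in score from 2015 to 2017; NaN where a country is missing in one year.
merged["Change"] = merged["Happ.Score"] - merged["Happiness Score"]
print(merged)
```

Countries that entered or left the top list show up with a missing score in one of the two columns.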

Part III. Conclusion
The analysis above shows that no single factor can explain people's happiness. Factors such as GDP, family, income inequality, the degree of peace and corruption all play an important role. This suggests that when we analyze happiness, we should consider all of the factors together.

We know very well that money does not buy happiness by itself, but it supports the other factors that make people happier, such as a healthier life, a trustworthy government, the freedom to make life choices, freedom from corruption, lower income inequality and greater peace. GDP acts like a catalyst that affects most of the other factors.

Happiness isn't just about money, although it's part of it.

"As demonstrated by many countries, this report gives evidence that happiness is a result of creating strong social foundations. It's time to build social trust and healthy lives, not guns or walls. Let's hold our leaders to this fact."

The future is exciting for developed countries: they are working on new technologies such as artificial intelligence, electric cars and the Internet of Things, and most of them are preparing for climate change in the coming decades. Developed countries are investing in clean energy, agricultural science and removing pollutants from the air. They will surely be the countries that best survive climate change. We can easily conclude that developed countries will keep their happiness and living standards in the future.

On the other hand, poor and unhappy countries will get worse day after day: they have limited resources, large and poorly educated populations, and wars, so their future looks very dark. The effects of climate change will appear very quickly in the near future, faster than expected. Afterwards, the world will have two types of countries, those with very low standards and those with very high standards, with no middle level. If this continues, developed countries will be affected indirectly. We will see more refugees arriving in developed countries and more wars over limited resources such as oil, clean water and food. Population is increasing uncontrollably, especially in Asian countries such as India, China and Indonesia. Air pollution is another major problem for the world, and these crowded countries have the most polluted air, above the limits.

To sum up, I cannot paint an optimistic picture given all these results and conditions. Developed countries will keep their status and will be happier than the rest of the world. Unhappy countries' scores will not increase under these circumstances; their scores will even decrease every year. Happiness has several factors; GDP is a powerful factor but not the only one. I wish we could have a better world in the future, but I am afraid this will not happen.