# **WORLD HAPPINESS REPORT**

**The World Happiness Report is an annual publication of the United Nations Sustainable Development Solutions Network. It contains articles and rankings of national happiness based on respondents' ratings of their own lives. Nationally representative samples of respondents are asked to think of a ladder (the happiness score), with the best possible life for them being a 10 and the worst possible life being a 0. They are then asked to rate their own current lives on that 0 to 10 scale. The report correlates the results with various life factors.**

![many%20smile.jpg](attachment:many%20smile.jpg)

# About the Notebook
1. We will first look at the top 10 and bottom 10 countries, and the regions they belong to, for each year from 2015 to 2020.
2. Look at the regions to which the top 20 and bottom 20 countries belong.
3. Look at the relation between the happiness score (dependent variable) and all the other features (independent variables).
4. Build a linear regression model to predict the happiness score using the 2020 report.
5. Merge all the reports to check the trend over the years.

# ***Importing Libraries***

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# ***Importing all the datasets***

In [None]:
df15=pd.read_csv('../input/world-happiness-report/2015.csv')
df16=pd.read_csv('../input/world-happiness-report/2016.csv')
df17=pd.read_csv('../input/world-happiness-report/2017.csv')
df18=pd.read_csv('../input/world-happiness-report/2018.csv')
df19=pd.read_csv('../input/world-happiness-report/2019.csv')
df20=pd.read_csv('../input/world-happiness-report/2020.csv')


# ***Top 10 countries based on Happiness Score from 2015 to 2020***

In [None]:
fig,ax=plt.subplots(3,2,figsize=(20,25))
# Each CSV is already sorted by rank, so head(10) holds the ten happiest countries;
# pass that slice as `data` and refer to the columns by name
sns.barplot(x='Country',y='Happiness Score',data=df15.head(10),ax=ax[0,0])
ax[0,0].set_title('Top 10 countries based on Happiness Score 2015',fontweight="bold")
ax[0,0].tick_params(axis='x', labelrotation=45)
sns.barplot(x='Country',y='Happiness Score',data=df16.head(10),palette='husl',ax=ax[0,1])
ax[0,1].set_title('Top 10 countries based on Happiness Score 2016',fontweight="bold")
ax[0,1].tick_params(axis='x', labelrotation=45)
sns.barplot(x='Country',y='Happiness.Score',data=df17.head(10),palette='spring',ax=ax[1,0])
ax[1,0].set_title('Top 10 countries based on Happiness Score 2017',fontweight="bold")
ax[1,0].tick_params(axis='x', labelrotation=45)
sns.barplot(x='Country or region',y='Score',data=df18.head(10),palette='autumn',ax=ax[1,1])
ax[1,1].set_title('Top 10 countries based on Happiness Score 2018',fontweight="bold")
ax[1,1].tick_params(axis='x', labelrotation=45)
sns.barplot(x='Country or region',y='Score',data=df19.head(10),ax=ax[2,0])
ax[2,0].set_title('Top 10 countries based on Happiness Score 2019',fontweight="bold")
ax[2,0].tick_params(axis='x', labelrotation=45)
sns.barplot(x='Country name',y='Ladder score',data=df20.head(10),palette='Paired',ax=ax[2,1])
ax[2,1].set_title('Top 10 countries based on Ladder Score 2020',fontweight="bold")
ax[2,1].tick_params(axis='x', labelrotation=45)

plt.show()
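
The cell above relies on each CSV already being sorted by rank, so that `head(10)` and `tail(10)` return the happiest and unhappiest countries. A minimal sketch of an order-independent alternative uses `DataFrame.nlargest` and `DataFrame.nsmallest`; the small frame below is made-up illustration data, not values from the report.

```python
import pandas as pd

# Hypothetical scores for illustration only; deliberately NOT sorted by rank
scores = pd.DataFrame({
    'Country': ['A', 'B', 'C', 'D', 'E'],
    'Happiness Score': [5.1, 7.3, 4.2, 6.8, 3.9],
})

# nlargest/nsmallest select rows by value, so the file's row order does not matter
top = scores.nlargest(2, 'Happiness Score')
bottom = scores.nsmallest(2, 'Happiness Score')
print(top['Country'].tolist())     # ['B', 'D']
print(bottom['Country'].tolist())  # ['E', 'C']
```

In the notebook this would be, for example, `df18.nlargest(10, 'Score')`, using the score column name of each year's file.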

# ***Bottom 10 countries based on happiness score from 2015 to 2020***

In [None]:
fig,ax=plt.subplots(3,2,figsize=(20,25))
# Each CSV is sorted by rank, so tail(10) holds the ten least happy countries;
# pass that slice as `data` and refer to the columns by name
sns.barplot(x='Country',y='Happiness Score',data=df15.tail(10),ax=ax[0,0])
ax[0,0].set_title('Bottom 10 countries based on Happiness Score 2015',fontweight="bold")
ax[0,0].tick_params(axis='x', labelrotation=45)
sns.barplot(x='Country',y='Happiness Score',data=df16.tail(10),palette='husl',ax=ax[0,1])
ax[0,1].set_title('Bottom 10 countries based on Happiness Score 2016',fontweight="bold")
ax[0,1].tick_params(axis='x', labelrotation=45)
sns.barplot(x='Country',y='Happiness.Score',data=df17.tail(10),palette='spring',ax=ax[1,0])
ax[1,0].set_title('Bottom 10 countries based on Happiness Score 2017',fontweight="bold")
ax[1,0].tick_params(axis='x', labelrotation=45)
sns.barplot(x='Country or region',y='Score',data=df18.tail(10),palette='autumn',ax=ax[1,1])
ax[1,1].set_title('Bottom 10 countries based on Happiness Score 2018',fontweight="bold")
ax[1,1].tick_params(axis='x', labelrotation=45)
sns.barplot(x='Country or region',y='Score',data=df19.tail(10),ax=ax[2,0])
ax[2,0].set_title('Bottom 10 countries based on Happiness Score 2019',fontweight="bold")
ax[2,0].tick_params(axis='x', labelrotation=45)
sns.barplot(x='Country name',y='Ladder score',data=df20.tail(10),palette='Paired',ax=ax[2,1])
ax[2,1].set_title('Bottom 10 countries based on Ladder Score 2020',fontweight="bold")
ax[2,1].tick_params(axis='x', labelrotation=45)

plt.show()

# Further analysis using the 2020 happiness report

# ***Region to which most of the happy countries belong***

In [None]:

df20.head(20).groupby('Regional indicator').agg({'Country name':'count'}).sort_values(by='Country name',ascending=False)
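
Counting countries per region with `groupby(...).agg({'Country name': 'count'})` gives the same numbers as `Series.value_counts()` on the region column, as long as no country names are missing (`count` skips missing values; `value_counts` drops missing regions by default). A small sketch with made-up rows standing in for `df20.head(20)`:

```python
import pandas as pd

# Made-up rows for illustration; not taken from the report
sample = pd.DataFrame({
    'Country name': ['W1', 'W2', 'W3', 'N1', 'N2', 'A1'],
    'Regional indicator': ['Western Europe', 'Western Europe', 'Western Europe',
                           'North America and ANZ', 'North America and ANZ',
                           'East Asia'],
})

# The notebook's approach: group by region and count country names
via_groupby = (sample.groupby('Regional indicator')
                     .agg({'Country name': 'count'})
                     .sort_values(by='Country name', ascending=False)['Country name'])

# Shorter equivalent: value_counts already sorts by count, descending
via_value_counts = sample['Regional indicator'].value_counts()
print(via_value_counts.to_dict())
```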

# ***Visualizing the top Regions***

In [None]:
# Pass figsize to DataFrame.plot directly; a separate plt.figure() call would leave an extra empty figure
df20.head(20).groupby('Regional indicator').agg({'Country name':'count'}).sort_values(by='Country name',ascending=False).plot(kind='bar',color='g',figsize=(10,5))
plt.show()

# ***Region to which most of the Unhappy countries belong***

In [None]:
 
df20.tail(20).groupby('Regional indicator').agg({'Country name':'count'}).sort_values(by='Country name',ascending=False)

# ***Visualizing the Regions with low happiness scores***

In [None]:
# Pass figsize to DataFrame.plot directly; a separate plt.figure() call would leave an extra empty figure
df20.tail(20).groupby('Regional indicator').agg({'Country name':'count'}).sort_values(by='Country name',ascending=False).plot(kind='bar',color='r',figsize=(10,5))
plt.show()
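
The two region counts above can also be put side by side in one table, which makes it easy to see regions that appear among the happiest countries but never among the least happy (or the reverse). A sketch with a made-up stand-in for `df20`, already sorted by ladder score; `n = 3` replaces the notebook's 20 only to keep the example small.

```python
import pandas as pd

# Made-up stand-in for df20, sorted happiest first
ranked = pd.DataFrame({
    'Country name': list('ABCDEFGH'),
    'Regional indicator': ['Western Europe', 'Western Europe', 'North America and ANZ',
                           'Western Europe', 'South Asia', 'Sub-Saharan Africa',
                           'Sub-Saharan Africa', 'South Asia'],
})

n = 3  # the notebook uses 20
side_by_side = pd.concat(
    {'top': ranked.head(n)['Regional indicator'].value_counts(),
     'bottom': ranked.tail(n)['Regional indicator'].value_counts()},
    axis=1,
).fillna(0).astype(int)  # a region missing from one group gets a count of 0
print(side_by_side)
```

`side_by_side.plot(kind='bar', figsize=(10, 5))` would then draw both groups in a single chart.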

# *Bivariate analysis between the target and the features*

In [None]:
cols=['Explained by: Log GDP per capita', 'Explained by: Social support',
       'Explained by: Healthy life expectancy',
       'Explained by: Freedom to make life choices',
       'Explained by: Generosity', 'Explained by: Perceptions of corruption',
       'Dystopia + residual']

In [None]:
for a in cols:
    plt.figure(figsize=(10,5))
    sns.regplot(x=a,y='Ladder score',data=df20,color='r')
    plt.show()

**Observation:**
* ***The graphs show that the ladder score is linearly related to GDP, social support and healthy life expectancy. It has almost no linear relation with generosity and a mild linear relation with perceptions of corruption and Dystopia + residual.***
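
`sns.regplot` draws an ordinary least-squares line through each scatter plot. A minimal numpy sketch of the same straight-line fit, plus Pearson's r to put a number on how tight the linear relation is. The arrays are made up and lie exactly on y = 2x + 1, so the result is easy to check.

```python
import numpy as np

# Made-up feature/target values lying exactly on y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

slope, intercept = np.polyfit(x, y, deg=1)  # least-squares line, the kind regplot draws
r = np.corrcoef(x, y)[0, 1]                 # Pearson correlation coefficient
print(f'slope={slope:.3f}, intercept={intercept:.3f}, r={r:.3f}')
```

On the notebook data, `x` and `y` would be a feature column and `df20['Ladder score']`.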

# *Correlation Between features and Target*

In [None]:
corr=df20[['Explained by: Log GDP per capita', 'Explained by: Social support',
       'Explained by: Healthy life expectancy',
       'Explained by: Freedom to make life choices',
       'Explained by: Generosity', 'Explained by: Perceptions of corruption',
       'Dystopia + residual','Ladder score']].corr()
plt.figure(figsize=(10,7))
sns.heatmap(corr,annot=True)
plt.show()

**Observation:**
* ***The ladder score is correlated with GDP, healthy life expectancy and social support, and is not correlated with generosity.***
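
Instead of reading the heatmap cell by cell, the target's column of the correlation matrix can be pulled out and ranked by absolute value. A sketch on a made-up frame where `gdp` moves with the target and `generosity` barely does; on the notebook data the same chain would start from `corr['Ladder score']`.

```python
import pandas as pd

# Made-up columns for illustration
frame = pd.DataFrame({
    'gdp':          [0.5, 0.8, 1.0, 1.3, 1.5],
    'generosity':   [0.2, 0.1, 0.3, 0.1, 0.2],
    'Ladder score': [4.0, 5.0, 5.5, 6.5, 7.0],
})

# Correlation of every feature with the target, strongest (in absolute value) first
target_corr = (frame.corr()['Ladder score']
                    .drop('Ladder score')
                    .sort_values(key=abs, ascending=False))
print(target_corr)
```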

# ***Linear Regression Model using Statsmodels and Sklearn***

In [None]:
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

In [None]:
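y=df20['Ladder score']
# Note: 'Ladder score in Dystopia' holds the same benchmark value for every country in the 2020 report,
# so it behaves like an intercept term rather than an informative feature
X=df20[['Logged GDP per capita', 'Social support', 'Healthy life expectancy',
       'Freedom to make life choices', 'Generosity',
       'Perceptions of corruption', 'Ladder score in Dystopia']]
# Hold out 20% of the countries for testing; random_state makes the split reproducible
X_train,X_test,Y_train,Y_test= train_test_split(X,y,test_size=0.2,random_state=1)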

In [None]:
x_const=sm.add_constant(X)
model=sm.OLS(y,x_const).fit()
model.summary()

**Observation:**
* ***Generosity does not contribute significantly to the prediction of the happiness score, since its p-value is greater than the 0.05 threshold. This matches what we observed in the scatter plot of happiness score against generosity.***
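
The p-values in `model.summary()` are derived from t-statistics: each coefficient divided by its standard error. A numpy-only sketch of that computation on synthetic data (one informative feature, one pure-noise feature); it approximates the corresponding columns of the statsmodels summary rather than reproducing its full output. With many observations, |t| above roughly 2 corresponds to a p-value below 0.05.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
signal = rng.normal(size=n)         # informative feature
noise_feature = rng.normal(size=n)  # unrelated to the target
y = 2.0 + 3.0 * signal + rng.normal(scale=0.5, size=n)

# Design matrix with an intercept column, like sm.add_constant
X = np.column_stack([np.ones(n), signal, noise_feature])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS coefficients

residuals = y - X @ beta
dof = n - X.shape[1]                                  # residual degrees of freedom
sigma2 = residuals @ residuals / dof                  # residual variance estimate
std_err = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t_stats = beta / std_err
for name, t in zip(['const', 'signal', 'noise_feature'], t_stats):
    print(f'{name:>13}: t = {t:7.2f}')
```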

***Model building using Sklearn***

In [None]:
lr=LinearRegression()
lr.fit(X_train,Y_train)
print(f'R^2 score for train: {lr.score(X_train, Y_train)}')
print(f'R^2 score for test: {lr.score(X_test, Y_test)}')
y_pred_test=lr.predict(X_test)
y_pred_train=lr.predict(X_train)
print(f'mean squared error train: {mean_squared_error(Y_train,y_pred_train)}')
print(f'mean squared error test: {mean_squared_error(Y_test,y_pred_test)}')     
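
`LinearRegression.score` returns the coefficient of determination R², and `mean_squared_error` averages the squared residuals. A small sketch of both formulas in plain numpy, to show what the printed numbers mean; the four values are toy numbers.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of the squared residuals."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean((y_true - y_pred) ** 2)

def r2(y_true, y_pred):
    """1 - SS_res / SS_tot: share of the target's variance explained by the predictions."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_res / ss_tot

y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
print(mse(y_true, y_pred))            # 0.375
print(round(r2(y_true, y_pred), 4))   # 0.9486
```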

*Rebuilding the model without the Generosity feature and checking its performance*

In [None]:
y=df20['Ladder score']
X=df20[['Logged GDP per capita', 'Social support', 'Healthy life expectancy',
       'Freedom to make life choices','Perceptions of corruption', 'Ladder score in Dystopia']]
X_train,X_test,Y_train,Y_test= train_test_split(X,y,test_size=0.2,random_state=1)

In [None]:
x_const=sm.add_constant(X)
model=sm.OLS(y,x_const).fit()
model.summary()

In [None]:
lr=LinearRegression()
lr.fit(X_train,Y_train)
print(f'R^2 score for train: {lr.score(X_train, Y_train)}')
print(f'R^2 score for test: {lr.score(X_test, Y_test)}')
y_pred_test=lr.predict(X_test)
y_pred_train=lr.predict(X_train)
print(f'mean squared error train: {mean_squared_error(Y_train,y_pred_train)}')
print(f'mean squared error test: {mean_squared_error(Y_test,y_pred_test)}')     

**Observation:**
* ***The model has a low mean squared error on both the train and the test sets; other algorithms might still perform better.***
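
Dropping Generosity changes the number of features, and the in-sample R² of an OLS model can never go up when a feature is removed, so plain R² slightly favours the larger model. Adjusted R² adds a penalty per feature; a small sketch, where `n` is the number of rows and `p` the number of features. The numbers below are hypothetical, not the notebook's results.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a model with p features fitted on n rows (requires n > p + 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical: the same R^2 reached with 7 features vs. 6 features on 120 rows
print(round(adjusted_r2(0.80, n=120, p=7), 4))
print(round(adjusted_r2(0.80, n=120, p=6), 4))
```

With equal R², the model with fewer features gets the higher adjusted score.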

# *Merging all the datasets to check the trend*

In [None]:
df15['year']=2015
d15=df15[['Country', 'Happiness Score',
        'Economy (GDP per Capita)', 'Family',
       'Health (Life Expectancy)', 'Freedom', 'Trust (Government Corruption)',
       'Generosity','year']]
df16['year']=2016
d16=df16[['Country', 'Happiness Score',
       'Economy (GDP per Capita)', 'Family', 'Health (Life Expectancy)',
       'Freedom', 'Trust (Government Corruption)', 'Generosity','year']]
df17['year']=2017
d17=df17[['Country', 'Happiness.Score', 'Economy..GDP.per.Capita.', 'Family',
       'Health..Life.Expectancy.', 'Freedom',
       'Trust..Government.Corruption.','Generosity','year']]
df18['year']=2018
d18=df18[['Country or region', 'Score', 'GDP per capita',
       'Social support', 'Healthy life expectancy',
       'Freedom to make life choices','Perceptions of corruption', 'Generosity','year']]
df19['year']=2019
d19=df19[[ 'Country or region', 'Score', 'GDP per capita',
       'Social support', 'Healthy life expectancy',
       'Freedom to make life choices','Perceptions of corruption', 'Generosity','year']]
df20['year']=2020
d20=df20[['Country name', 'Ladder score','Explained by: Log GDP per capita', 'Explained by: Social support',
       'Explained by: Healthy life expectancy',
       'Explained by: Freedom to make life choices','Explained by: Perceptions of corruption',
       'Explained by: Generosity','year' ]]
d20.head()

In [None]:
## Changing the names of the variables to make them uniform across datasets
## (positional: the d17-d20 columns were selected in the same order as d15; d16 already uses the 2015 names)
d17=d17.rename(columns=dict(zip(d17.columns,d15.columns)))
d18=d18.rename(columns=dict(zip(d18.columns,d15.columns)))
d19=d19.rename(columns=dict(zip(d19.columns,d15.columns)))
d20=d20.rename(columns=dict(zip(d20.columns,d15.columns)))
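
The renaming above is positional: it relies on each year's columns having been selected in the same order as the 2015 columns, and a different order would silently mislabel data. A sketch of a more defensive, name-based alternative with an explicit mapping, followed by a schema check before concatenating. The mapping covers only the 2017 names that differ from 2015, and the one-row frames are made up.

```python
import pandas as pd

# Explicit 2017 -> 2015 column mapping (names as used in the cells above)
rename_2017 = {
    'Happiness.Score': 'Happiness Score',
    'Economy..GDP.per.Capita.': 'Economy (GDP per Capita)',
    'Health..Life.Expectancy.': 'Health (Life Expectancy)',
    'Trust..Government.Corruption.': 'Trust (Government Corruption)',
}

# Made-up one-row frames; the 2017 columns are deliberately in a different order
d15_demo = pd.DataFrame({'Country': ['X'], 'Happiness Score': [5.0],
                         'Economy (GDP per Capita)': [1.0],
                         'Health (Life Expectancy)': [0.7],
                         'Trust (Government Corruption)': [0.2], 'year': [2015]})
d17_demo = pd.DataFrame({'Country': ['X'], 'Economy..GDP.per.Capita.': [1.1],
                         'Happiness.Score': [5.2],
                         'Trust..Government.Corruption.': [0.25],
                         'Health..Life.Expectancy.': [0.72], 'year': [2017]})

d17_demo = d17_demo.rename(columns=rename_2017)
# Name-based renaming is unaffected by column order; verify the schemas match before concatenating
assert set(d17_demo.columns) == set(d15_demo.columns)
combined = pd.concat([d15_demo, d17_demo[d15_demo.columns]], axis=0, ignore_index=True)
print(combined)
```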

In [None]:
df=pd.concat([d15,d16,d17,d18,d19,d20],axis=0)
df.head()

In [None]:
df.tail()

***Let's consider a few countries to check the trend over the years and keep the graph neat***

In [None]:
countries=['India','United States','United Kingdom','Russia','China','Canada','Germany','France','Switzerland', 'Iceland', 'Denmark', 'Norway', 'Finland',
       'Netherlands','Japan', 'South Korea','Italy','Singapore','Greece','Iran','Spain','Mexico','Egypt','Ukraine', 'Iraq', 'South Africa']
df1=df[df['Country'].isin(countries)]
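
Because some country names change between reports (see the note at the end of the notebook), a selected country can silently drop out of some years. A quick sketch of a check: count the distinct years per selected country and list any country that is missing from some years or never found at all. The merged frame below is made up.

```python
import pandas as pd

# Made-up merged frame: 'Country B' is missing in 2016
merged = pd.DataFrame({
    'Country': ['Country A', 'Country A', 'Country B'],
    'year': [2015, 2016, 2015],
})
selected = ['Country A', 'Country B', 'Country C']

years_per_country = (merged[merged['Country'].isin(selected)]
                     .groupby('Country')['year'].nunique())
expected_years = merged['year'].nunique()
incomplete = years_per_country[years_per_country < expected_years].index.tolist()
not_found = sorted(set(selected) - set(merged['Country']))
print('missing some years:', incomplete)  # ['Country B']
print('never found:', not_found)          # ['Country C']
```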


# ***Trend of happiness score over time***

In [None]:
 
import plotly.express as px
fig=px.line(df1,x='year',y='Happiness Score',color='Country',template="plotly_dark")
fig.show()

**Observation:**
* The happiness score of Finland increases steadily over the years, and Finland has had the highest score since 2018.
* The score of India decreases steadily and is the lowest among the countries in this view.
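
Statements such as "increases steadily" can be checked numerically by pivoting to one row per country and one column per year. A sketch with made-up scores; in the notebook the same calls would run on `df1`.

```python
import pandas as pd

# Made-up long-format scores
long_scores = pd.DataFrame({
    'Country': ['P'] * 3 + ['Q'] * 3,
    'year': [2015, 2016, 2017] * 2,
    'Happiness Score': [7.0, 7.2, 7.5, 4.6, 4.4, 4.5],
})

# One row per country, one column per year
wide = long_scores.pivot(index='Country', columns='year', values='Happiness Score')
trend = pd.DataFrame({
    # is_monotonic_* treat equal neighbours as monotonic (non-decreasing / non-increasing)
    'increasing': wide.apply(lambda row: row.is_monotonic_increasing, axis=1),
    'decreasing': wide.apply(lambda row: row.is_monotonic_decreasing, axis=1),
    'change': wide[wide.columns[-1]] - wide[wide.columns[0]],  # last year minus first year
})
print(trend)
```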


*Let's try fewer countries for a clearer view*

In [None]:
countries=['India','United States','United Kingdom','Russia','China','Canada','Germany','France','Switzerland','United Arab Emirates','Japan','Italy'] 
df2=df[df['Country'].isin(countries)]


In [None]:
fig=px.line(df2,x='year',y='Happiness Score',color='Country',template="plotly_dark",title='Year vs. Happiness Score')
fig.show()

**Observation:**
* The happiness score of India decreases over time, whereas for the other countries the trend is constant or increasing.
* Switzerland has the highest score over the years.

# Trend of Economy over the years 

In [None]:
fig=px.line(df2,x='year',y='Economy (GDP per Capita)',color='Country',template="plotly_dark",title='Year vs. Economy (GDP per Capita) ')
fig.show()

**Observation:**
* There is a huge spike in the GDP score of the UAE in 2018, and the UAE has the highest GDP score across the years.
* India has the lowest GDP score.
* Apart from the UAE, all the countries follow a similar pattern.
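
A jump like the one described for the UAE in 2018 can be flagged by computing year-over-year relative changes. A sketch on made-up values for one country; `gdp / gdp.shift(1) - 1` gives the same result as `pct_change()` here, and the 25% threshold is an arbitrary illustration, not a standard cut-off.

```python
import pandas as pd

# Made-up GDP contribution values for one country, indexed by year
gdp = pd.Series([1.40, 1.42, 2.10, 1.45], index=[2016, 2017, 2018, 2019],
                name='Economy (GDP per Capita)')

yoy = gdp / gdp.shift(1) - 1      # relative change vs. the previous year (NaN for the first year)
jumps = yoy[yoy.abs() > 0.25]     # hypothetical threshold for a "spike"
print(jumps.round(3))
```

Both the jump into 2018 and the drop back in 2019 are flagged, which is the typical signature of a one-year outlier.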

# Trend of Health(Life Expectancy) over the years

In [None]:
fig=px.line(df2,x='year',y='Health (Life Expectancy)',color='Country',template="plotly_dark",title='Year vs. Health (Life Expectancy)')
fig.show()

**Observation:**
* Japan has the highest life expectancy.
* India has the lowest life expectancy.
* All the countries saw a slight spike in life expectancy in 2019.

# Trend of Family/Social support scores over the years

In [None]:
fig=px.line(df2,x='year',y='Family',color='Country',template="plotly_dark",title='Year vs. Family')
fig.show()

**Observation:**
* India has the lowest Family score but shows an increasing trend.
* The UAE saw a sharp dip in 2018. All the other countries follow a similar pattern with high scores.

# Trend of Trust (Government Corruption) scores over the years

In [None]:
fig=px.line(df2,x='year',y='Trust (Government Corruption)',color='Country',template="plotly_dark",title='Year vs. Trust (Government Corruption)')
fig.show()

**Observation:**
* The Trust (Government Corruption) score is highest in Switzerland.
* The lowest scores are in countries like Russia and Italy.
* There is a huge dip in the UK's score in 2018.
* China's score increases over time, and all the other countries have fairly stable scores over the years.

# Trend of Freedom scores over the years

In [None]:
fig=px.line(df2,x='year',y='Freedom',color='Country',template="plotly_dark",title='Year vs. Freedom')
fig.show()

**Observation:**
* Italy has the lowest freedom score.
* Switzerland has the highest.
* There is a sharp decrease in the freedom score of the UAE in 2018.

# Trend of Generosity score over the years

In [None]:
fig=px.line(df2,x='year',y='Generosity',color='Country',template="plotly_dark",title='Year vs. Generosity')
fig.show()

**Observation:**
* The UK has the highest generosity score.
* The bottom three in generosity score are Japan, Russia and China.

# A Few Key Observations from the Analysis

1. ***Finland had the highest happiness score three years in a row, from 2018 to 2020.***
2. ***Most of the countries with high happiness scores belong to the Western Europe region.***
3. ***Most of the countries with low happiness scores belong to the Sub-Saharan Africa region.***
4. ***The happiness score is most strongly related to GDP, healthy life expectancy and social support/family, and least related to generosity.***

**Note**:
The names of a few countries change across the reports over the years. For example, Taiwan and Hong Kong appear in some reports as "Taiwan Province of China" and "Hong Kong S.A.R., China", and North Cyprus appears as "Northern Cyprus" in some reports.
To visualize every country across all years, first harmonize these names across all the reports.
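
A minimal sketch of that harmonization, applied to the merged frame before filtering or plotting: map every variant spelling to one canonical name with `Series.replace`. The spellings in the mapping are assumptions to be checked against the country column of each CSV, and the small frame is made up.

```python
import pandas as pd

# Assumed variant spellings; verify them against each report's country column before use
NAME_FIXES = {
    'Taiwan Province of China': 'Taiwan',
    'Hong Kong S.A.R., China': 'Hong Kong',
    'Hong Kong S.A.R. of China': 'Hong Kong',
    'Northern Cyprus': 'North Cyprus',
}

# Made-up merged frame mixing old and new spellings
merged = pd.DataFrame({'Country': ['Taiwan', 'Taiwan Province of China',
                                   'Northern Cyprus', 'Finland'],
                       'year': [2015, 2020, 2018, 2020]})
merged['Country'] = merged['Country'].replace(NAME_FIXES)  # exact-match replacement only
print(sorted(merged['Country'].unique()))
```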

 ***Please do upvote if you like my work***