**1. About Dataset**

On March 20th, the world celebrates the International Day of Happiness. On this day in 2017, the UN also released the World Happiness Report, a ranking of which countries in the world could be considered "happy". The 2017 report covers 155 countries across every continent, giving a picture of which countries may be the happiest. The ranking is widely followed around the globe because it may reflect the quality of a country's policy-making, and experts have noted that these scores may be a good indicator of a country's progress.


In [None]:
import pandas as pd # data processing
import chart_studio.plotly as py #for data visualization
import plotly.graph_objs as go #for data visualization
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
init_notebook_mode(connected=True)
import seaborn as sns #for data visualization
import matplotlib.pyplot as plt 
plt.rcParams['figure.figsize'] = (20, 10)

In [None]:
df_2015 = pd.read_csv('../input/world-happiness/2015.csv')
df_2016 = pd.read_csv('../input/world-happiness/2016.csv')
df_2017 = pd.read_csv('../input/world-happiness/2017.csv')

**2. Preparing and Describing the Data**

In [None]:
df_2015.describe()

In [None]:
df_2015.columns

The data comes as three separate files, one per year, which suits anyone who wants to analyse each year on its own. However, I decided it would be interesting to look at the data from a holistic point of view. So after importing the data, I removed the columns I felt were unnecessary for this analysis, renamed the remaining columns to a common set of names, and used `pd.concat` to combine all three data frames so that the overall happiness rank across the three years can be observed. After that, to study individual regions, I use each year's data separately.
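
A minimal sketch of this combine-then-analyse approach, using two tiny made-up frames (the values and the shortened column names are illustrative assumptions, not the real CSV contents). Renaming through a dict keyed on each file's own header does not depend on column order, and `pd.concat` aligns columns by name, filling any column a year lacks with `NaN`.

```python
import pandas as pd

# Toy stand-ins for two yearly files; values are invented for illustration.
raw_a = pd.DataFrame({
    "Country": ["A", "B"],
    "Region": ["R1", "R2"],
    "Happiness Score": [7.5, 5.0],
})
raw_b = pd.DataFrame({
    "Country": ["A", "B"],
    "Happiness.Score": [7.4, 5.2],  # this year has no Region column
})

# Map each year's header onto one shared schema, then tag the year.
year_a = raw_a.rename(columns={"Happiness Score": "Happiness_Score"}).assign(Year=2015)
year_b = raw_b.rename(columns={"Happiness.Score": "Happiness_Score"}).assign(Year=2017)

# concat aligns on column names; Region is NaN for the year that lacks it.
combined = pd.concat([year_a, year_b], ignore_index=True, sort=True)
print(combined)
```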

In [None]:
df_2015.columns = ['Country', 'Region', 'Happiness_Rank', 'Happiness_Score',
       'Standard Error', 'Economy', 'Family',
       'Health', 'Freedom', 'Trust',
       'Generosity', 'Dystopia_Residual']
new_df_2015 = df_2015.drop(['Standard Error'], axis=1)

In [None]:
new_df_2015.head()

In [None]:
drop_2016 = ['Lower Confidence Interval','Upper Confidence Interval' ]
new_df_2016 = df_2016.drop(drop_2016, axis=1)
new_df_2016.columns = ['Country', 'Region','Happiness_Rank', 'Happiness_Score','Economy', 'Family',
       'Health', 'Freedom', 'Trust',
       'Generosity', 'Dystopia_Residual']

In [None]:
new_df_2016.head()

In [None]:
columns_2017 = ['Whisker.high', 'Whisker.low']
new_df_2017 = df_2017.drop(columns_2017, axis=1)
# In 2017.csv the Generosity column comes before Trust, so the positional names follow that order
new_df_2017.columns = ['Country', 'Happiness_Rank', 'Happiness_Score', 'Economy', 'Family',
       'Health', 'Freedom', 'Generosity', 'Trust', 'Dystopia_Residual']

In [None]:
new_df_2017.head()

In [None]:
new_df_2015['Year']=2015
new_df_2016['Year']=2016
new_df_2017['Year']=2017
frames = [new_df_2015, new_df_2016, new_df_2017]
happiness = pd.concat(frames,sort=True)
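
The 2017 file has no `Region` column, so after concatenation every 2017 row carries `NaN` there, and region-based plots such as the box plot further down leave 2017 out. One possible workaround, assuming a country keeps the same region across years, is to build a country-to-region lookup from the rows that have a region and map it onto the missing ones. A sketch on toy data (the rows, including the fictional "Atlantis", are invented):

```python
import pandas as pd

# Toy combined frame: 2017 rows are missing Region, as in the real data.
happiness = pd.DataFrame({
    "Country": ["Norway", "Chad", "Norway", "Chad", "Atlantis"],
    "Region": ["Western Europe", "Sub-Saharan Africa", None, None, None],
    "Year": [2016, 2016, 2017, 2017, 2017],
})

# Country -> Region lookup built from rows that do have a region.
# Assumes each country keeps one region across years.
region_lookup = (
    happiness.dropna(subset=["Region"])
    .drop_duplicates(subset="Country")
    .set_index("Country")["Region"]
)

# Fill only the missing entries; countries never seen with a region stay NaN.
happiness["Region"] = happiness["Region"].fillna(happiness["Country"].map(region_lookup))
print(happiness)
```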

In [None]:
happiness.head()

**3. Visualization**

This visual gives a more appealing view of where each country is placed in the World Happiness ranking. How to read the map: the darker (purple to blue) countries have the best positions in the report (i.e. are the “happiest”), while the lighter countries rank lower. We can see that countries in Europe and the Americas generally rank higher than those in Asia and Africa.
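
The first map colours by `Happiness_Rank`, where 1 is the happiest country, so a smaller number means a happier country; that is why the happiest countries fall at the dark end of the unreversed Viridis scale. A toy sketch of the relationship, assuming the published rank is simply the descending order of the score (the scores are invented):

```python
import pandas as pd

# Invented scores for four countries.
scores = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Happiness_Score": [5.1, 7.5, 3.9, 6.2],
})

# Rank 1 = highest score; method="min" gives tied scores the same rank.
scores["Happiness_Rank"] = (
    scores["Happiness_Score"].rank(ascending=False, method="min").astype(int)
)
print(scores.sort_values("Happiness_Rank"))
```

Because `happiness` stacks all three years, most countries appear up to three times in `locations`; filtering first, e.g. `happiness[happiness['Year'] == 2016]`, may make each map easier to interpret.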

In [None]:
data1 = dict(type = 'choropleth', 
           locations = happiness['Country'],
           locationmode = 'country names',
           z = happiness['Happiness_Rank'], 
           text = happiness['Country'],
           colorscale = 'Viridis', reversescale = False)
layout = dict(title = 'Happiness Rank Across the World', 
             geo = dict(showframe = False, 
                       projection = {'type': 'mercator'}))
choromap6 = go.Figure(data = [data1], layout=layout)
iplot(choromap6)

In [None]:
data2 = dict(type = 'choropleth', 
           locations = happiness['Country'],
           locationmode = 'country names',
           z = happiness['Happiness_Score'], 
           text = happiness['Country'],
           colorbar = {'title':'Happiness'})
layout = dict(title = 'Happiness Score Across the World', 
             geo = dict(showframe = False, 
                       projection = {'type': 'mercator'}))
choromap3 = go.Figure(data = [data2], layout=layout)
iplot(choromap3)

In [None]:
f,ax = plt.subplots(figsize =(20,10))
sns.boxplot(x="Year" , y="Happiness_Score", hue="Region",data=happiness,palette="PRGn",ax=ax)
plt.show()

The heatmaps for 2015 and 2016 show that Happiness Score is highly correlated with Economy, Family and Health.
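
The same claim can be checked numerically by pulling the `Happiness_Score` column out of the correlation matrix and sorting it. The frame below is invented purely to show the mechanics; on the real data you would pass `new_df_2015` or `new_df_2016`. `numeric_only=True` (available from pandas 1.5) skips text columns such as `Country` and `Region`.

```python
import pandas as pd

# Invented values; columns mirror the notebook's renamed schema.
toy = pd.DataFrame({
    "Country": ["A", "B", "C", "D", "E"],
    "Happiness_Score": [3.0, 4.0, 5.0, 6.0, 7.0],
    "Economy": [0.2, 0.5, 0.7, 1.0, 1.3],
    "Generosity": [0.3, 0.1, 0.4, 0.2, 0.3],
})

# Pearson correlation of every numeric column with the score, strongest first.
score_corr = (
    toy.corr(numeric_only=True)["Happiness_Score"]
    .drop("Happiness_Score")
    .sort_values(ascending=False)
)
print(score_corr)
```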

In [None]:
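# numeric_only=True skips the text columns (Country, Region); pandas >= 2.0 raises without it.
# Figure size is already set globally via plt.rcParams at the top of the notebook.
sns.heatmap(new_df_2015.corr(numeric_only=True), cmap='Blues', annot=True)
plt.show()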

In [None]:
# numeric_only=True skips the text columns (Country, Region); pandas >= 2.0 raises without it
sns.heatmap(new_df_2016.corr(numeric_only=True), cmap='Blues', annot=True)
plt.show()

In [None]:
# Median of each factor per region; select multiple columns with a list (a bare tuple raises in pandas 2.x)
data4 = new_df_2015.groupby('Region')[['Happiness_Score', 'Economy', 'Family', 'Health']].median()
data4

The scatterplots of the regional medians of Happiness Score against Economy, Family and Health show a roughly linear relationship, which is consistent with the high correlations above.
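
Because `data4` has only one row per region, the scatterplots below are based on a handful of points. A quick numeric companion, sketched on invented country rows, is to compute the correlation of the region medians directly:

```python
import pandas as pd

# Invented country-level rows for three regions.
countries = pd.DataFrame({
    "Region": ["R1", "R1", "R2", "R2", "R3", "R3"],
    "Happiness_Score": [7.2, 7.0, 5.5, 5.1, 4.0, 4.4],
    "Economy": [1.4, 1.2, 0.9, 1.0, 0.4, 0.5],
})

# One row per region: the median of each selected column (list, not tuple).
region_medians = countries.groupby("Region")[["Happiness_Score", "Economy"]].median()
print(region_medians)

# Pearson correlation across the region medians only.
print(region_medians["Happiness_Score"].corr(region_medians["Economy"]))
```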

In [None]:
# seaborn >= 0.12 requires x= and y= as keyword arguments
sns.scatterplot(x=data4['Happiness_Score'], y=data4['Economy'], hue=data4.index, legend='brief', s=200)
plt.show()

In [None]:
sns.scatterplot(x=data4['Happiness_Score'], y=data4['Family'], hue=data4.index, legend='brief', s=200)
plt.show()

In [None]:
sns.scatterplot(x=data4['Happiness_Score'], y=data4['Health'], hue=data4.index, legend='brief', s=200)
plt.show()

Next, we look at the correlations between the different factors within the regions with high happiness scores.

In [None]:
# High-happiness regions in 2016
df = new_df_2016[new_df_2016['Region'].isin(['Western Europe', 'North America'])]
# numeric_only=True skips the text columns (Country, Region)
sns.heatmap(df.corr(numeric_only=True), cmap='Blues', annot=True)
plt.show()

The point to note in the heatmap for the high-happiness regions is that Happiness Score correlates only weakly with Health (life expectancy), while it correlates strongly with Economy, Family, Freedom and Trust. To see why, we plot Happiness Score against Economy and against Health for all 2016 countries: the relationship with Economy is roughly linear throughout, whereas the relationship with Health is linear at smaller values but scattered at larger values. Since Western Europe and North America lie in that upper range, a low correlation with Health within these regions is to be expected.

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(20, 10))
# All 2016 countries, to show where each relationship is linear and where it scatters
sns.scatterplot(x=new_df_2016['Happiness_Score'], y=new_df_2016['Economy'], ax=axes[0])
sns.scatterplot(x=new_df_2016['Happiness_Score'], y=new_df_2016['Health'], ax=axes[1])
plt.show()
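
The "linear at low values, scattered at high values" pattern also explains why the correlation drops once only the happiest regions are kept: restricting the data to a narrow band where the relationship has levelled off leaves little shared variation for Pearson's r to pick up. The numbers below are invented solely to illustrate this effect and are not taken from the report.

```python
import pandas as pd

# Invented data: the score rises with health, then levels off near 7.
toy = pd.DataFrame({
    "Health": [0.2, 0.4, 0.6, 0.8, 0.85, 0.9, 0.95, 1.0],
    "Happiness_Score": [3.0, 4.0, 5.0, 7.0, 6.8, 7.2, 6.9, 7.1],
})

# Pearson r over the full range vs. only the high-health rows.
r_all = toy["Health"].corr(toy["Happiness_Score"])
high = toy[toy["Health"] >= 0.8]
r_high = high["Health"].corr(high["Happiness_Score"])

print(f"all rows: r = {r_all:.2f}")
print(f"high-health rows: r = {r_high:.2f}")
```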

**Now we will study regions with low Happiness Scores.**

For the low-happiness regions, Happiness Score is highly correlated with Economy, Family and Health.

In [None]:
# Low-happiness regions in 2016; the dataset spells it 'Sub-Saharan Africa' (with a hyphen),
# so 'Sub Saharan Africa' would silently match no rows
df = new_df_2016[new_df_2016['Region'].isin(['Eastern Asia', 'Sub-Saharan Africa', 'Southern Asia'])]
sns.heatmap(df.corr(numeric_only=True), cmap='Blues', annot=True)
plt.show()

In [None]:
fig, axes = plt.subplots(1, 3, figsize=(20, 10))
# One panel per factor: factor on x, Happiness Score on y
for ax, factor in zip(axes, ['Economy', 'Family', 'Health']):
    sns.scatterplot(x=df[factor], y=df['Happiness_Score'], ax=ax)
plt.show()
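
The three region groups are selected with near-identical code, and a misspelt name (for example 'Sub Saharan Africa' instead of the dataset's 'Sub-Saharan Africa') silently matches no rows. A small helper can do the selection with `isin` and fail loudly on unknown names. `select_regions` is a hypothetical helper written for this sketch, and the toy frame is invented.

```python
import pandas as pd

def select_regions(frame: pd.DataFrame, regions: list[str]) -> pd.DataFrame:
    """Hypothetical helper: rows whose Region is in `regions`.

    Raises ValueError if any requested region never appears in `frame`,
    so a typo cannot silently produce an empty selection.
    """
    missing = set(regions) - set(frame["Region"].unique())
    if missing:
        raise ValueError(f"Unknown region(s): {sorted(missing)}")
    return frame[frame["Region"].isin(regions)]

# Invented rows using the dataset's spelling of region names.
toy = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "Region": ["Sub-Saharan Africa", "Southern Asia", "Western Europe"],
    "Happiness_Score": [4.0, 4.6, 7.3],
})

low = select_regions(toy, ["Sub-Saharan Africa", "Southern Asia"])
print(low)

try:
    select_regions(toy, ["Sub Saharan Africa"])
except ValueError as err:
    print(err)  # Unknown region(s): ['Sub Saharan Africa']
```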

**Analysis of Moderate Happiness Score Regions**

A similar correlation pattern is observed here.

In [None]:
# The dataset spells it 'Latin America and Caribbean' (double b);
# the misspelling 'Caribean' would silently match no rows
df = new_df_2016[new_df_2016['Region'].isin(['Australia and New Zealand',
                                             'Middle East and Northern Africa',
                                             'Latin America and Caribbean'])]
sns.heatmap(df.corr(numeric_only=True), cmap='Blues', annot=True)
plt.show()
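
To compare the groups more directly than by eye across three heatmaps, one option is to put each group's correlations with `Happiness_Score` into a single table, one column per group. The group labels and values below are invented for illustration; on the real data each group would be the selected `df` for that set of regions.

```python
import pandas as pd

# Invented rows for two region groups.
toy = pd.DataFrame({
    "Group": ["high"] * 4 + ["low"] * 4,
    "Happiness_Score": [7.0, 7.2, 7.4, 7.6, 3.5, 4.0, 4.5, 5.0],
    "Economy": [1.20, 1.30, 1.35, 1.50, 0.20, 0.40, 0.50, 0.80],
    "Health": [0.90, 0.80, 0.95, 0.85, 0.20, 0.30, 0.45, 0.50],
})

# Pearson correlation of each factor with the score, computed within each group.
factors = ["Economy", "Health"]
comparison = pd.DataFrame({
    name: grp[factors].corrwith(grp["Happiness_Score"])
    for name, grp in toy.groupby("Group")
})
print(comparison.round(2))
```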
