#  **Happiness analysis (2016)**

My goal is to analyse which factors have the most impact on the happiness level.

In [None]:
# importing essential libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
# creating the dataframe
df = pd.read_csv('../input/world-happiness-report/2016.csv')


First, let's see what kind of data we have and how many rows there are.

In [None]:
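# number of rows and columns (df.count without parentheses only returns the method object)
print(df.shape)
# first 20 rows; only the last expression of a cell is displayed
df.head(20)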

Now let's have a look at some descriptive statistics

In [None]:
df.describe()

We are interested in which countries are the happiest and which are the least happy.

In [None]:
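plt.figure(figsize=(20,10))
# assumes the rows are sorted by rank, so the first 10 rows are the happiest countries
countries = df['Country'].head(10)
hScores = df['Happiness Score'].head(10)
plt.title('Bar chart of the 10 happiest countries with their scores')
plt.bar(countries, hScores, color=(0.76, 0.1, 0.1, 0.6))
plt.ylim(6.8, 7.55)  # truncated y-axis: differences between bars look larger than they are
plt.show()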

In [None]:
plt.figure(figsize=(20,10))
plt.title('Bar chart of the 10 least happy countries with their scores')
plt.bar(df['Country'].tail(10), df['Happiness Score'].tail(10),color=(0.3, 0.4, 0.6, 0.6))
plt.ylim(2.6,3.8)
plt.show()


Then let's have a look at the correlation matrix and heatmap:

In [None]:
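# keep only numeric columns; recent pandas versions raise an error on text columns such as Country
df.select_dtypes(include='number').corr()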

In [None]:
plt.figure(figsize=(12,12))
plt.title('Heatmap')
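# correlations are computed on numeric columns only (Country and Region are text)
sns.heatmap(df.select_dtypes(include='number').corr(), annot=True, cmap=sns.diverging_palette(20, 220, n=200))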

From here we can see that the most interesting variables to analyse are **freedom, health, family and economy**. They are the most strongly correlated with the happiness score. **Trust in the government** also has a positive correlation with the happiness score.
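To make the reading of the heatmap more concrete, the correlations with the happiness score can be listed from strongest to weakest. This is only a minimal sketch: the helper name `rank_correlations` and the tiny `toy` frame are made up for illustration, and in the notebook you would pass `df` instead.

```python
import pandas as pd

def rank_correlations(frame, target="Happiness Score"):
    """Pearson correlation of every numeric column with `target`, strongest first."""
    numeric = frame.select_dtypes(include="number")
    corr = numeric.corr()[target].drop(target)
    return corr.sort_values(ascending=False)

# Made-up toy values, only to show the call; use `df` in the notebook.
toy = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Happiness Score": [7.0, 6.0, 5.0, 4.0],
    "Freedom": [0.6, 0.5, 0.4, 0.3],
    "Generosity": [0.2, 0.4, 0.1, 0.3],
})
print(rank_correlations(toy))
```

Keep in mind that a high correlation shows that two variables move together; on its own it does not show which one drives the other.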

In [None]:
fig, axs = plt.subplots(2, 2,figsize=(12,12))
axs[0, 0].scatter(df['Economy (GDP per Capita)'], df['Happiness Score'], color=(0.2, 0.4, 0.6, 0.6))
axs[0, 0].set_title('Happiness score and GDP')
axs[0, 1].scatter(df['Family'], df['Happiness Score'], color=(0.2, 0.4, 0.6, 0.6))
axs[0, 1].set_title('Happiness score and Family')
axs[1, 0].scatter(df['Freedom'], df['Happiness Score'], color=(0.2, 0.4, 0.6, 0.6))
axs[1, 0].set_title('Happiness score and Freedom')
axs[1, 1].scatter(df['Health (Life Expectancy)'], df['Happiness Score'], color=(0.2, 0.4, 0.6, 0.6))
axs[1, 1].set_title('Happiness score and Health (Life Expectancy)')

fig.tight_layout()

I will compare the 10 happiest countries with all the others, and likewise the 10 least happy countries with all the others.
For each comparison, I'll analyse our four selected variables.


In [None]:
# names of the 10 happiest and 10 least happy countries (rows are assumed to be sorted by rank)
happiest = df['Country'].head(10)
unhappiest = df['Country'].tail(10)  # iloc[146:157] would select 11 rows, not 10
# flag columns: 1 if the country belongs to the group, 0 otherwise
df['happiest'] = df['Country'].isin(happiest).astype(int)
df['leastHappiest'] = df['Country'].isin(unhappiest).astype(int)
fig = plt.figure(figsize=(15,10))
fig.subplots_adjust(hspace=0.4, wspace=0.4)
ax = fig.add_subplot(2, 4, 1)
sns.barplot(x="leastHappiest", y="Health (Life Expectancy)", data=df)
ax = fig.add_subplot(2, 4, 2)
sns.barplot(x="leastHappiest", y="Freedom", data=df)
ax = fig.add_subplot(2, 4, 3)
sns.barplot(x="leastHappiest", y="Economy (GDP per Capita)", data=df)
ax = fig.add_subplot(2, 4, 4)
sns.barplot(x="leastHappiest", y="Family", data=df)
ax = fig.add_subplot(2, 4, 5)
sns.barplot(x="happiest", y="Health (Life Expectancy)", data=df)
ax = fig.add_subplot(2, 4, 6)
sns.barplot(x="happiest", y="Freedom", data=df)
ax = fig.add_subplot(2, 4, 7)
sns.barplot(x="happiest", y="Economy (GDP per Capita)", data=df)
ax = fig.add_subplot(2, 4, 8)
sns.barplot(x="happiest", y="Family", data=df)


As we see here, the means of all four variables are considerably higher for the 10 happiest countries than for the other countries. Similarly, for the least happy group, those means are much lower.
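The same comparison can also be read as numbers rather than bar heights. The sketch below is one possible way to do it, not part of the original analysis: `group_means` is a hypothetical helper and the `toy` frame holds made-up values; in the notebook you would call it on `df` with the four selected columns.

```python
import pandas as pd

def group_means(frame, columns, n=10, score="Happiness Score"):
    """Mean of `columns` for the top n, the bottom n and the remaining rows by `score`."""
    ranked = frame.sort_values(score, ascending=False).reset_index(drop=True)
    # label every row, then overwrite the labels of the first and last n rows
    group = pd.Series("others", index=ranked.index)
    group.iloc[:n] = "happiest"
    group.iloc[-n:] = "least happy"
    return ranked.groupby(group)[columns].mean()

# Made-up toy values, only to show the call; use `df` and n=10 in the notebook.
toy = pd.DataFrame({
    "Happiness Score": [7.0, 6.5, 5.0, 4.5, 3.0, 2.5],
    "Family": [1.2, 1.0, 0.8, 0.6, 0.3, 0.1],
})
print(group_means(toy, ["Family"], n=2))
```

Using three exclusive groups avoids mixing the least happy countries into the "others" baseline of the happiest group, and vice versa.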

Comparing the last 10 and the first 10 countries directly by GDP, we can see a large gap in economy between the two groups.

In [None]:
fig = plt.figure(figsize=(25,20))
fig.subplots_adjust(hspace=0.4, wspace=0.4)
ax = fig.add_subplot(2, 2, 1)
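# 10 least happy countries (df.iloc[146:157] would select 11 rows)
g = sns.barplot(x="Country", y="Economy (GDP per Capita)", data=df.tail(10))
g.set(ylim=(0, 1.6))
ax = fig.add_subplot(2, 2, 2)
# 10 happiest countries, drawn on the same scale so the two panels are comparable
p = sns.barplot(x="Country", y="Economy (GDP per Capita)", data=df.head(10))
p.set(ylim=(0, 1.6))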

Well, now it is clear that the economy is a leading factor that affects people's happiness level.

Is it possible to say the same about trust in the government?

In [None]:
fig = plt.figure(figsize=(25,20))
fig.subplots_adjust(hspace=0.4, wspace=0.4)
ax = fig.add_subplot(2, 2, 1)
# 10 least happy countries (df.iloc[146:157] would select 11 rows)
g = sns.barplot(x="Country", y="Trust (Government Corruption)", data=df.tail(10))
g.set(ylim=(0, 0.6))
ax = fig.add_subplot(2, 2, 2)
# 10 happiest countries, same scale
p = sns.barplot(x="Country", y="Trust (Government Corruption)", data=df.head(10))
p.set(ylim=(0, 0.6))

Here the difference is also visible, but it's not as strong as in the previous plots. Moreover, the situation for Rwanda is different: trust in the government is high, but it is still one of the least happy countries.

Let's glance at the regions. Which are the happiest?

In [None]:
plt.figure(figsize=(30,12))
sns.barplot(x="Region", y="Happiness Score", data=df,palette="rocket").set_title("Happiness by regions")




Surprisingly, the regions North America and Australia and New Zealand come out on top. What about boxplots?

In [None]:
plt.figure(figsize=(25,20))
ax = sns.boxplot(x="Region", y="Happiness Score", data=df)
ax.set(ylim=(2.7, 7.6))

Interestingly, the spread for North America is relatively high compared to Australia and New Zealand, but their medians (the line inside each box) are pretty close. For Middle East and Northern Africa and Southeastern Asia, the medians are also very close but the spread is different. That means that happiness scores vary less between the countries of Southeastern Asia.
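The centre and spread that the boxplot shows can also be tabulated per region. This is a small sketch under stated assumptions: `region_summary` is a hypothetical helper and the `toy` frame contains made-up scores for two invented regions; in the notebook you would call `region_summary(df)`.

```python
import pandas as pd

def region_summary(frame, score="Happiness Score", region="Region"):
    """Mean, median, standard deviation and number of countries per region."""
    summary = frame.groupby(region)[score].agg(["mean", "median", "std", "count"])
    return summary.sort_values("mean", ascending=False)

# Made-up toy values: regions X and Y share the same centre but differ in spread.
toy = pd.DataFrame({
    "Region": ["X", "X", "X", "Y", "Y", "Y"],
    "Happiness Score": [5.0, 6.0, 7.0, 5.9, 6.0, 6.1],
})
print(region_summary(toy))
```

The `count` column is worth checking alongside `std`, because a region with only a couple of countries gives a very rough estimate of spread.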

Finally, we can make several conclusions:
* the happiest countries have a higher GDP per capita than less happy countries
* the happiest countries have a higher Family value than less happy countries
* the happiest countries have a higher life expectancy than less happy countries
* the happiest countries have more freedom in life choices than less happy countries
* trust in the government affects the level of happiness, but not as strongly as the previous factors
* the happiest regions are Western Europe, North America, and Australia and New Zealand
* the unhappiest region is Sub-Saharan Africa