# WORLD HAPPINESS REPORT
## Introduction 

The World Happiness Report is a publication of the United Nations Sustainable Development Solutions Network. It contains articles and rankings of national happiness, based on respondent ratings of their own lives, which the report also correlates with various quality-of-life factors. The report primarily uses data from the Gallup World Poll. Each annual report is available for public download on the World Happiness Report website.

The rankings of national happiness are based on a Cantril ladder survey. Nationally representative samples of respondents are asked to think of a ladder, with the best possible life for them being a 10, and the worst possible life being a 0. They are then asked to rate their own current lives on that 0 to 10 scale.


## Libraries

In [None]:
#libraries
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns
import pycountry

# Import the Dataset
The data for the years 2015 to 2019 is imported below; most of the analysis that follows uses the 2019 data. The features considered for calculating the happiness score are:

* **Country** : Name of the country.
* **Region** : Region the country belongs to.
* **Happiness Rank** : Rank of the country based on the Happiness Score.
* **Happiness Score** : The average of the ratings respondents give their own lives on the 0 to 10 ladder scale.
* **Economy** : GDP per capita
* **Family** : social support
* **Health** : healthy life expectancy
* **Freedom** : freedom to make life choices
* **Trust** : perceptions of corruption
* **Generosity** : perceptions of generosity

In [None]:
#data import 
data_2015 = pd.read_csv("../input/world-happiness/2015.csv")
data_2016 = pd.read_csv("../input/world-happiness/2016.csv")
data_2017 = pd.read_csv("../input/world-happiness/2017.csv")
data_2018 = pd.read_csv("../input/world-happiness/2018.csv")
data_2019 = pd.read_csv("../input/world-happiness/2019.csv")
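
The score column is spelled differently across the yearly files (`Happiness Score`, `Happiness.Score`, `Score`), so comparing years is easier once the names are aligned. The following is a minimal sketch of one way to do that. The toy frames, the `RENAME` mapping and the `harmonize` helper are hypothetical and stand in for the real CSV files; the `Country` column name in the toy frames is an assumption.

```python
import pandas as pd

# Made-up stand-ins for the yearly files. Only the score column spellings
# and 'Country or region' are taken from this notebook.
toy_2016 = pd.DataFrame({"Country": ["A", "B"], "Happiness Score": [7.1, 5.2]})
toy_2017 = pd.DataFrame({"Country": ["A", "B"], "Happiness.Score": [7.0, 5.4]})
toy_2019 = pd.DataFrame({"Country or region": ["A", "B"], "Score": [7.2, 5.3]})

# Hypothetical mapping from each year's spelling to one shared name.
RENAME = {
    "Happiness Score": "Score",
    "Happiness.Score": "Score",
    "Country or region": "Country",
}

def harmonize(frame, year):
    """Return a copy with shared column names plus a Year column."""
    out = frame.rename(columns=RENAME)[["Country", "Score"]].copy()
    out["Year"] = year
    return out

# Stack the harmonized frames into one long table.
combined = pd.concat(
    [harmonize(toy_2016, 2016), harmonize(toy_2017, 2017), harmonize(toy_2019, 2019)],
    ignore_index=True,
)
print(combined)
```

The same helper could be applied to `data_2015` through `data_2019` after checking their actual column names with `.columns`.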

# Exploratory Data Analysis
**Shape of the dataframe**

In [None]:
#shape of the dataframe
data_2019.shape

**Are there any missing values in the dataset?**

In [None]:
data_2019.info()
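
`info()` answers the question indirectly through its non-null counts. A more direct count of missing values per column can be sketched as follows; the toy frame is made up for illustration and only mimics the shape of the real data.

```python
import numpy as np
import pandas as pd

# Made-up frame with one missing score, for illustration only.
toy = pd.DataFrame({
    "Country or region": ["A", "B", "C"],
    "Score": [7.2, np.nan, 5.3],
    "Generosity": [0.15, 0.20, 0.10],
})

# isnull() marks missing cells as True; summing counts them per column.
missing_per_column = toy.isnull().sum()
print(missing_per_column)
```

On the real data, the equivalent call would be `data_2019.isnull().sum()`.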

**Finding correlations between features with a correlation matrix and visualizing them with a heatmap**

Let's look at how the Happiness Score relates to each of the other variables in the dataset.

In [None]:
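#keep numeric columns only; 'Country or region' is text and cannot be correlated
corr_2019 = data_2019.select_dtypes(include = 'number').corr()
corr_2019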

We have calculated the correlation matrix for our dataset. Next, we will use a heatmap to visualize the correlations between the features.

In [None]:
#heatmap
sns.heatmap(corr_2019, annot = True)
plt.show()

It appears that Economy, Family and Health have a strong positive correlation with the Happiness Score. Freedom and Trust have a positive but moderate correlation with the Happiness Score.
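
To read the correlations with the score directly rather than off the heatmap, the `Score` column of the correlation matrix can be sorted. This is a small sketch on made-up numbers, not the real 2019 values.

```python
import pandas as pd

# Made-up values for illustration; not taken from the 2019 file.
toy = pd.DataFrame({
    "Country or region": ["A", "B", "C", "D", "E"],
    "Score": [7.5, 6.8, 6.0, 5.1, 4.2],
    "GDP per capita": [1.40, 1.30, 1.10, 0.90, 0.50],
    "Generosity": [0.10, 0.30, 0.15, 0.25, 0.20],
})

# Correlate numeric columns only, then keep each feature's correlation with Score.
numeric = toy.select_dtypes(include="number")
score_corr = numeric.corr()["Score"].drop("Score").sort_values(ascending=False)
print(score_corr)
```

Applied to `data_2019`, the same three lines would list the features from most to least positively correlated with `Score`.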

Similarly, let us look at the pairwise comparison of our features in the dataset. 

In [None]:
#pairwise comparison
sns.pairplot(data_2019)
plt.show()

From the scatterplots shown above, we can say the following:
* Economy, Family and Health show a roughly linear relationship with Score.
* The points for Freedom, Generosity and Trust are widely scattered and do not display any particular pattern.
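
One way to put a number on "roughly linear" is to fit a straight line and look at the share of variance it explains. The sketch below uses `np.polyfit` on made-up values; the arrays are assumptions for illustration, not data from the report.

```python
import numpy as np

# Made-up feature and score values, for illustration only.
gdp = np.array([0.5, 0.9, 1.1, 1.3, 1.4])
score = np.array([4.2, 5.1, 6.0, 6.8, 7.5])

# Least-squares straight line: score is approximated by slope * gdp + intercept.
slope, intercept = np.polyfit(gdp, score, deg=1)
predicted = slope * gdp + intercept

# R squared: fraction of the variance in score explained by the line.
ss_res = np.sum((score - predicted) ** 2)
ss_tot = np.sum((score - score.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={r_squared:.2f}")
```

An R squared close to 1 would support the linear reading of a scatterplot, while a value near 0 would match the "no particular pattern" case.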

# Data Visualization
**Let's look at which countries are at the top of the 2019 Happiness Report**

In [None]:
sns.barplot(x='Score',y='Country or region',data=data_2019.nlargest(10,'Score'),palette="Blues")
plt.show()

**Let's see if the above top 10 countries have the highest ranking in each feature category**

In [None]:
#subplots to show the top 10 countries in each category
features = ['GDP per capita', 'Social support', 'Healthy life expectancy',
            'Freedom to make life choices', 'Generosity', 'Perceptions of corruption']
fig, axes = plt.subplots(3, 2, constrained_layout = True, figsize = (12,8))
fig.suptitle('Top 10 countries in each category - 2019')
#axes.flat walks the grid row by row, so each feature gets one panel
for ax, feature in zip(axes.flat, features):
    sns.barplot(ax = ax, x = feature, y = 'Country or region', data = data_2019.nlargest(10, feature), palette = "YlOrBr")
plt.show()

As we can see from the barplots above, Finland, which has the highest Happiness Score, is not necessarily at the top in each category. It does not even appear in the top 10 of a few categories.

A similar pattern is observed for other countries too.
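
The same check can be done without reading the bars by eye. The sketch below tests whether the highest-scoring country appears in the top `n` of each feature. The toy frame and the `top_n_membership` helper are hypothetical and only illustrate the idea.

```python
import pandas as pd

# Made-up data for illustration; not the real 2019 values.
toy = pd.DataFrame({
    "Country or region": ["A", "B", "C", "D"],
    "Score": [7.5, 7.0, 6.0, 5.0],
    "GDP per capita": [1.2, 1.5, 1.4, 0.8],
    "Generosity": [0.30, 0.10, 0.20, 0.25],
})

def top_n_membership(frame, country, features, n=10):
    """Map each feature to True if `country` is among its n largest values."""
    result = {}
    for feature in features:
        top = frame.nlargest(n, feature)["Country or region"]
        result[feature] = country in top.values
    return result

# Country with the highest Score, then its membership in each top-2 list.
happiest = toy.nlargest(1, "Score")["Country or region"].iloc[0]
membership = top_n_membership(toy, happiest, ["GDP per capita", "Generosity"], n=2)
print(happiest, membership)
```

With `data_2019` and `n=10`, the returned dictionary would show in which categories the top-ranked country is missing from the top 10.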

**It's time to look at how Happiness Score looks around the world in 2019.**

In [None]:
import plotly.graph_objs as go
from plotly.offline import iplot

data = dict(type = 'choropleth',
  
            # location: COUNTRIES
            locations = data_2019['Country or region'],
              
            # countries around the world
            locationmode = 'country names',
              
            # colorscale can be added as per requirement
            colorscale = 'Viridis',
            text = data_2019['Country or region'],
            z = data_2019['Score'],
            colorbar = {'title': 'Happiness Score'})

layout = dict(title = 'Happiness Score Around the World(2019)', 
              geo = dict(showframe = True, projection = {'type': 'mercator'}))

# passing data dictionary as a list 
choromap = go.Figure(data = [data], layout = layout)
  
# plotting graph
iplot(choromap)

The visualizations below show how GDP per capita, healthy life expectancy and social support are distributed within each happiness category, ranging from Low to High.

In [None]:
#dividing the 2019 happiness scores into four quartile-based categories
category = ['Low', 'Mid-Low', 'Mid-High', 'High']
data_2019['Category'] = pd.qcut(data_2019['Score'], len(category), labels = category)
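
To illustrate what `pd.qcut` does here, this small sketch applies the same call to eight made-up scores. Each label ends up with an equal share of the rows, because the bin edges are placed at the quartiles of the data rather than at fixed score values.

```python
import pandas as pd

category = ['Low', 'Mid-Low', 'Mid-High', 'High']

# Made-up scores for illustration only.
toy_scores = pd.Series([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)

# Four quantile-based bins, labelled from lowest to highest.
toy_category = pd.qcut(toy_scores, len(category), labels=category)
print(toy_category.value_counts().sort_index())
```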

In [None]:
#GDP Vs Category
sns.displot(data_2019, x = 'GDP per capita', hue = "Category", kind = "kde")
plt.title('GDP vs Score Category')

In [None]:
#Life Expectancy Vs Category
sns.displot(data_2019, x = 'Healthy life expectancy', hue = "Category", kind = "kde")
plt.title('Life Expectancy vs Score Category')

In [None]:
#Family Vs Category
sns.displot(data_2019, x = 'Social support', hue = "Category", kind = "kde")
plt.title('Family vs Score Category')
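
A numeric companion to the density plots is the mean of a factor within each category. The sketch below assumes the same quartile labelling as above and uses made-up values; it is an illustration, not a result from the report.

```python
import pandas as pd

# Made-up data for illustration only.
toy = pd.DataFrame({
    "Score": [4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5],
    "GDP per capita": [0.4, 0.6, 0.8, 0.9, 1.0, 1.2, 1.3, 1.5],
})

category = ['Low', 'Mid-Low', 'Mid-High', 'High']
toy["Category"] = pd.qcut(toy["Score"], len(category), labels=category)

# Mean of the factor within each happiness category, in label order.
category_means = toy.groupby("Category", observed=True)["GDP per capita"].mean()
print(category_means)
```

If a factor moves with happiness, its category means should rise steadily from Low to High.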

## Is there a significant change in the Happiness Score from 2015 to 2019?

In [None]:
#Happiness score distribution over the years
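plt.figure(figsize=(10,5))
#one density curve per year; labels feed the legend so the colors can be told apart
sns.kdeplot(data_2015['Happiness Score'],color='red',label='2015')
sns.kdeplot(data_2016['Happiness Score'],color='blue',label='2016')
sns.kdeplot(data_2017['Happiness.Score'],color='limegreen',label='2017')
sns.kdeplot(data_2018['Score'],color='orange',label='2018')
sns.kdeplot(data_2019['Score'],color='pink',label='2019')
plt.xlabel('Happiness Score')
plt.legend(title='Year')
plt.title('Happiness Score over the Years',size=20)
plt.show()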
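
Overlapping density curves can be hard to compare by eye, so a per-year table of summary statistics may help answer the question. This is a minimal sketch on made-up scores; the `toy_scores` dictionary is hypothetical and stands in for the score column of each yearly frame.

```python
import pandas as pd

# Made-up score samples per year, for illustration only.
toy_scores = {
    2015: pd.Series([4.0, 5.0, 6.0, 7.0]),
    2019: pd.Series([4.2, 5.2, 6.2, 7.2]),
}

# One row per year with a few summary statistics from describe().
summary = pd.DataFrame(
    {year: scores.describe() for year, scores in toy_scores.items()}
).T[["mean", "std", "min", "max"]]
print(summary)
```

With the real frames, the dictionary values would be the yearly score columns, for example `data_2015['Happiness Score']` and `data_2019['Score']`.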

