# World Happiness Report Project
## A project to carry out exploratory data analysis (EDA) on the World Happiness Report 2019, investigating correlations to find why certain countries are happier than others.
## The numbers for each variable represent that variable's contribution to the final happiness score of each country.

In [None]:
import numpy as np
import pandas as pd 
import seaborn as sns
import matplotlib.pyplot as plt

In [None]:
df = pd.read_csv('../input/world-happiness/2019.csv')
df.head()

# Finding maximums and minimums for each variable

In [None]:
df.head(1)

In [None]:
df.tail(1)

* The world's happiest country is Finland, with a happiness score of 7.769. 
* The world's least happy country is South Sudan, with a happiness score of 2.853.
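
Reading the first and last rows only works because the file is already sorted by rank. A minimal sketch that does not depend on row order is shown here. It uses a small toy table in place of `2019.csv`: the two scores for Finland and South Sudan are the ones quoted above, the middle row is invented, and the column name `Country or region` is assumed to match the dataset.

```python
import pandas as pd

# Toy stand-in for the real dataset; "Exampleland" is a made-up row.
toy = pd.DataFrame({
    "Country or region": ["Exampleland", "Finland", "South Sudan"],
    "Score": [5.000, 7.769, 2.853],
})

# idxmax/idxmin return the index label of the extreme value,
# so the result is the same whatever order the rows are in.
happiest = toy.loc[toy["Score"].idxmax(), "Country or region"]
least_happy = toy.loc[toy["Score"].idxmin(), "Country or region"]
print(happiest, least_happy)
```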

In [None]:
GDP = df["GDP per capita"] #Finding the maximum GDP per capita
GDP.idxmax()

In [None]:
max_gdp = df.loc[GDP.idxmax()] # Look up the row directly instead of hard-coding its index
max_gdp

In [None]:
GDP.idxmin() #Finding the minimum GDP per capita

In [None]:
min_gdp = df.loc[GDP.idxmin()] # Look up the row directly instead of hard-coding its index
min_gdp

In [None]:
social = df["Social support"] #Finding maximum social support
social.idxmax()

In [None]:
max_social = df.loc[social.idxmax()]
max_social

In [None]:
social.idxmin() #Finding minimum social support

In [None]:
min_social = df.loc[social.idxmin()]
min_social

In [None]:
healthy = df["Healthy life expectancy"] #Max life expectancy
healthy.idxmax()

In [None]:
max_healthy = df.loc[healthy.idxmax()]
max_healthy

In [None]:
healthy.idxmin() #Min life expectancy

In [None]:
min_healthy = df.loc[healthy.idxmin()]
min_healthy

In [None]:
freedom = df["Freedom to make life choices"] #Max freedom
freedom.idxmax()

In [None]:
max_freedom = df.loc[freedom.idxmax()]
max_freedom

In [None]:
#Min freedom
freedom.idxmin()

In [None]:
min_freedom = df.loc[freedom.idxmin()]
min_freedom

In [None]:
gen = df["Generosity"] #Max generosity
gen.idxmax()

In [None]:
max_gen = df.loc[gen.idxmax()]
max_gen

In [None]:
gen.idxmin()

In [None]:
min_gen = df.loc[gen.idxmin()]
min_gen

In [None]:
cor = df["Perceptions of corruption"] #Max corruption
cor.idxmax()

In [None]:
max_cor = df.loc[cor.idxmax()]
max_cor

In [None]:
cor.idxmin() #Min corruption

In [None]:
min_cor = df.loc[cor.idxmin()]
min_cor

## Findings:
* The country with the highest GDP per capita is Qatar, whilst the country with the lowest is Somalia.
* The country with the most social support is Iceland, and the country with the least is the Central African Republic.
* The country with the highest healthy life expectancy is Singapore, and the country with the lowest is Swaziland.
* The country with the most freedom is Uzbekistan, and the country with the least freedom is Afghanistan.
* The country with the highest generosity is Myanmar, and the country with the lowest is Greece.
* Finally, the country with the highest value for perceptions of corruption is Singapore, and the country with the lowest is Moldova. Since each value is a contribution to the happiness score, a higher value here should correspond to less perceived corruption, not more.
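
The same lookups can be done for every column at once. The sketch below is illustrative only: the three countries and all values are made up, and `Country or region` is assumed to be the name of the country column in the dataset.

```python
import pandas as pd

# Hypothetical values for illustration only; not taken from the report.
toy = pd.DataFrame({
    "Country or region": ["A", "B", "C"],
    "GDP per capita": [1.2, 0.3, 0.9],
    "Social support": [1.5, 0.8, 1.6],
    "Generosity": [0.1, 0.4, 0.2],
}).set_index("Country or region")

# On a DataFrame, idxmax/idxmin work column by column. With the country
# as the index they return country names rather than row numbers.
extremes = pd.DataFrame({"highest": toy.idxmax(), "lowest": toy.idxmin()})
print(extremes)
```

With the real data, the same two calls on the numeric columns would produce one row per variable, which avoids repeating a pair of cells for each column.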

# Finding Correlation

In [None]:
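# Correlate numeric columns only; recent pandas versions raise an error on text columns
cor = df.select_dtypes(include="number").corr()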
cor

In [None]:
sns.set(font_scale=1.4)
fig, ax = plt.subplots(figsize=(20,10)) # Create the figure and axes together
sns.heatmap(cor, ax=ax) # Draw on that axes rather than creating a second one
ax.set_title('Correlation Heat Map for World Happiness', fontsize=40, y=1.05)

In [None]:
pos = cor[cor > 0.75] #Displaying strong positive correlations
pos

We can see that there are strong positive correlations between Score & GDP per capita, Score & Social Support, Score & Healthy life expectancy, GDP per capita & Social support, and GDP per capita & Healthy life expectancy. 
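
Reading strong pairs off a masked matrix by eye is error-prone, because every pair appears twice and the diagonal is always 1. A minimal sketch of one way to list each strong pair once is given here, using made-up columns `a`, `b` and `c` in place of the real variables:

```python
import numpy as np
import pandas as pd

# Hypothetical data: b tracks a closely, c is largely unrelated to both.
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [1.1, 1.9, 3.2, 3.9, 5.1],
    "c": [2.0, -1.0, 0.5, -0.5, 1.0],
})
corr = toy.corr()

# Keep only the cells above the diagonal (k=1) so each pair appears once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Flatten to a Series indexed by (row, column) and drop the masked NaN cells.
pairs = upper.stack().dropna()
strong = pairs[pairs > 0.75].sort_values(ascending=False)
print(strong)
```

The 0.75 threshold is the one used above; applying the same steps to `cor` should list the five pairs named in the text.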

# Visualising Correlations

In [None]:
plt.figure( figsize=(30,10))
plt.scatter(df['Score'], df['GDP per capita'], color='purple')
plt.title('Score VS GDP Per Capita', fontsize=40, y=1.05)
plt.xlabel('Score', fontsize=14)
plt.ylabel('GDP Per Capita', fontsize=14)


The above scatter graph shows that a country with a higher GDP per capita is generally a happier place to live, although there are exceptions. From the calculations earlier, Qatar has the highest GDP per capita but is not close to being the happiest country. This suggests that a strong economy alone is not enough to guarantee happiness.
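
One way to put a number on such an exception is to look up the happiness rank of the country with the highest GDP per capita. The sketch below uses four made-up countries and values, so the printed rank says nothing about Qatar itself. The column name `Country or region` is an assumption about the dataset.

```python
import pandas as pd

# Hypothetical values for illustration only.
toy = pd.DataFrame({
    "Country or region": ["A", "B", "C", "D"],
    "Score": [7.5, 6.4, 6.9, 4.1],
    "GDP per capita": [1.3, 1.7, 1.4, 0.4],
})

# Rank 1 is the highest happiness score.
toy["Score rank"] = toy["Score"].rank(ascending=False).astype(int)

# Row of the country with the highest GDP per capita.
richest = toy.loc[toy["GDP per capita"].idxmax()]
print(richest["Country or region"], richest["Score rank"])
```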

In [None]:
plt.figure( figsize=(30,10))
plt.scatter(df['Score'], df['Social support'], color='red')
plt.title('Score VS Social Support', fontsize=40, y=1.05)
plt.xlabel('Score', fontsize=14)
plt.ylabel('Social Support', fontsize=14)

The scatter graph of social support against score shows a tighter relationship at the upper end of the happiness scale. This supports the idea that social support is key to creating a happier country: the country with the most social support is Iceland, the 4th happiest country on the planet, and the top 4 happiest countries are the same as the top 4 countries for social support.
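
A claim about overlapping top-4 lists can be checked directly with `nlargest` and a set intersection. The sketch below runs on six made-up countries, so its result is not the result for the real data. `Country or region` is assumed to be the country column name.

```python
import pandas as pd

# Hypothetical values for illustration only.
toy = pd.DataFrame({
    "Country or region": ["A", "B", "C", "D", "E", "F"],
    "Score": [7.8, 7.6, 7.5, 7.4, 6.0, 5.0],
    "Social support": [1.58, 1.57, 1.59, 1.62, 1.60, 1.20],
})

# Top 4 countries by each measure, as sets so that order does not matter.
top_score = set(toy.nlargest(4, "Score")["Country or region"])
top_social = set(toy.nlargest(4, "Social support")["Country or region"])

# Countries that appear in both top-4 lists.
overlap = top_score & top_social
print(sorted(overlap))
```

If the two sets are equal on the real data, the statement above holds exactly.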

In [None]:
plt.figure( figsize=(30,10))
plt.scatter(df['Score'], df['Healthy life expectancy'], color='orange')
plt.title('Score VS Healthy Life Expectancy', fontsize=40, y=1.05)
plt.xlabel('Score', fontsize=14)
plt.ylabel('Healthy Life Expectancy', fontsize=14)

Similar to the previous scatter graph, the graph of happiness score against healthy life expectancy also shows a tighter relationship at the higher scores. However, the curve appears to flatten as healthy life expectancy approaches a value of 1, which suggests that beyond this point further gains in healthy life expectancy are associated with little additional happiness.

# Creating a linear model
I will now use the variables with the greatest correlation to happiness score to create a linear regression model. This can be used to predict a country's happiness score based on other factors.

In [None]:
from sklearn.linear_model import LinearRegression
lm = LinearRegression()

# Define predictor and target variables
x = df[["GDP per capita", "Social support", "Healthy life expectancy"]]
y = df["Score"]

lm.fit(x, y)

yhat = lm.predict(x)

lm.intercept_


In [None]:
lm.coef_

The fitted linear regression model, with coefficients rounded to 2 d.p., is:

yhat = 2.14 + 0.81 × (GDP per capita) + 1.32 × (Social support) + 1.30 × (Healthy life expectancy)
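
To show how the equation is used, the sketch below applies the rounded coefficients quoted above by hand. The function name `predict_score` and the input values are hypothetical and do not describe a real country.

```python
def predict_score(gdp, social, healthy):
    """Apply the rounded regression equation quoted in the text."""
    return 2.14 + 0.81 * gdp + 1.32 * social + 1.30 * healthy

# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
example = predict_score(gdp=1.0, social=1.0, healthy=1.0)
print(round(example, 2))  # 5.57
```

Because the coefficients are rounded, results will differ slightly from `lm.predict`, which uses the unrounded values.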