# Happiness Report


Column descriptions:
- GDP per capita: A measure of a country's economic output divided by its population.
- Social support: Having friends, family, and other people to turn to in times of need or crisis. Social support enhances quality of life and provides a buffer against adverse life events.
- Healthy life expectancy: The average number of years that a newborn can expect to live in "full health", that is, not hampered by disabling illnesses or injuries.
- Freedom to make life choices: An individual's opportunity and autonomy to perform an action selected from at least two available options, unconstrained by external parties.
- Generosity: The quality of being kind and generous.
- Perceptions of corruption: The Corruption Perceptions Index (CPI) is an index published annually by Transparency International since 1995 that ranks countries "by their perceived levels of public sector corruption, as determined by expert assessments and opinion surveys."

In [None]:
## Imports
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

## 1. Data Visualization

In [1]:
## Load the dataset and show the shape and the first rows
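
The notebook does not name the data file, so this is only a sketch. It assumes a CSV with a `country` column plus the six feature columns used later (`gdp`, `social_support`, `life_expectancy`, `freedom`, `generosity`, `corruption`). The in-memory CSV and its values are invented placeholders, not report figures; with the real data, pass the file path to `pd.read_csv` instead.

```python
import io
import pandas as pd

# Placeholder rows with invented values, standing in for the real file.
sample_csv = """country,gdp,social_support,life_expectancy,freedom,generosity,corruption
A,1.30,1.50,0.95,0.60,0.25,0.40
B,0.90,1.20,0.80,0.45,0.10,0.08
C,0.35,0.70,0.40,0.30,0.20,0.12
"""

# With the real data this would be pd.read_csv(<path to the CSV>).
df = pd.read_csv(io.StringIO(sample_csv))

print(df.shape)   # (number of rows, number of columns)
print(df.head())  # first rows of the table
```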

In [2]:
## Show the types of each column
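
One possible way to inspect the column types, shown on a tiny made-up frame (`df_demo` and its values are placeholders). `select_dtypes` is handy here because the later scaling and PCA steps only accept numeric columns.

```python
import pandas as pd

# Placeholder frame: one text column and one numeric column.
df_demo = pd.DataFrame({"country": ["A", "B"], "gdp": [1.3, 0.9]})

print(df_demo.dtypes)  # dtype of every column

# Names of the numeric columns only.
numeric_cols = df_demo.select_dtypes(include="number").columns.tolist()
print(numeric_cols)
```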

In [3]:
## Show some plots to help understand how the data looks
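
A minimal sketch of two quick looks at the distributions: per-column histograms and a box plot. The data below is random and only stands in for the real table; the column names are assumed to match the ones used in the normalization step.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Random placeholder data with the assumed column names.
rng = np.random.default_rng(0)
cols = ["gdp", "social_support", "life_expectancy",
        "freedom", "generosity", "corruption"]
df_demo = pd.DataFrame(rng.random((50, 6)), columns=cols)

# One histogram per column, arranged in a 2 x 3 grid.
axes = df_demo.hist(bins=15, figsize=(12, 6), layout=(2, 3))
plt.tight_layout()

# Box plots side by side make differences in scale and outliers visible.
fig, ax = plt.subplots(figsize=(10, 4))
df_demo.boxplot(ax=ax)
ax.set_title("Distribution of each column")
```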


In [None]:
## Plotting pairwise relationships in the dataset.

# pairplot draws a scatter plot for every pair of numeric columns
sns.pairplot(df)

## Are all attributes on the same scale?
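
One way to answer this numerically is to compare the minimum, maximum, and spread of each column. The frame below is a made-up placeholder with two columns on visibly different scales.

```python
import pandas as pd

# Placeholder values: "gdp" spans a wider range than "generosity".
df_demo = pd.DataFrame({
    "gdp": [0.2, 1.0, 1.6],
    "generosity": [0.05, 0.15, 0.30],
})

# One row per column, with basic statistics and the range (max - min).
summary = df_demo.agg(["min", "max", "mean", "std"]).T
summary["range"] = summary["max"] - summary["min"]
print(summary)
```

If the ranges differ a lot between columns, distance-based methods such as K-Means will be dominated by the wide columns unless the data is scaled first.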

In [None]:
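## Heatmap for correlation among the variables:

f, ax = plt.subplots(figsize=(20, 15))
# numeric_only=True skips text columns such as country names (pandas >= 1.5)
sns.heatmap(df.corr(numeric_only=True), annot=True, ax=ax)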

## Are there correlated variables?
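
Besides reading the heatmap, the strongly correlated pairs can be listed directly. This is a sketch on synthetic data: column `b` is built as a noisy copy of `a`, and the 0.7 cut-off is an arbitrary illustrative threshold, not a value from the notebook.

```python
import numpy as np
import pandas as pd

# Synthetic data: "b" is nearly a copy of "a", "c" is independent.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
df_demo = pd.DataFrame({
    "a": x,
    "b": x + 0.1 * rng.normal(size=200),
    "c": rng.normal(size=200),
})

corr = df_demo.corr(numeric_only=True)

# Keep only the upper triangle so each pair appears once.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().sort_values(key=abs, ascending=False)

# Pairs whose absolute correlation exceeds the illustrative cut-off.
strong = pairs[pairs.abs() > 0.7]
print(strong)
```

Strongly correlated columns carry overlapping information, which is one motivation for the PCA step later in the notebook.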

## 2. Data Normalization

In [None]:
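from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Feature columns to scale (names must match the columns in df)
cols = ['gdp','social_support','life_expectancy','freedom','generosity','corruption']
mms = MinMaxScaler()
scaler = StandardScaler()

# Rescale each column to [0, 1], then standardize to zero mean and unit variance.
# Note: the standardized result equals applying StandardScaler to df[cols] directly.
df_range = mms.fit_transform(df[cols])
df_scaled = pd.DataFrame(scaler.fit_transform(df_range), columns=cols)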
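
A small check of how the two scalers interact, on synthetic data (`X` is random, not the report data). Standardization removes any per-column shift and positive rescaling, so running `MinMaxScaler` first should not change what `StandardScaler` returns.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Synthetic columns with very different locations and spreads.
rng = np.random.default_rng(0)
X = rng.normal(loc=[0, 50, -3], scale=[1, 10, 0.1], size=(100, 3))

# Min-max scaling followed by standardization, as in the cell above.
two_step = StandardScaler().fit_transform(MinMaxScaler().fit_transform(X))

# Standardization alone.
one_step = StandardScaler().fit_transform(X)

# Expected to print True: both routes give the same values up to rounding.
print(np.allclose(two_step, one_step))
```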

## 3. Dimension Reduction by PCA

In [None]:
from sklearn.decomposition import PCA

pca = PCA()
pca.fit(df_scaled)

percentage_variance = np.round(pca.explained_variance_ratio_ * 100, decimals=2)
components = list(range(1, len(percentage_variance) + 1))
xlabels = ['PC' + str(x) for x in components]

plt.plot(components, percentage_variance, '-o')
plt.xticks(components, xlabels)  # label the ticks PC1, PC2, ...
plt.axvline(x=3, color='red', linestyle='--')  # marks the 3 components kept in the next cell
plt.ylabel('Percentage of Explained Variance')
plt.xlabel('Principal Component')
plt.title('Scree Plot')
plt.show()
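
As a complement to reading the scree plot by eye, the number of components can be picked from the cumulative explained variance. This is a sketch on random data; the 80% threshold is an assumption chosen for illustration, not a value from the notebook.

```python
import numpy as np
from sklearn.decomposition import PCA

# Random placeholder data with six features.
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 6))

pca_demo = PCA().fit(X)

# Running total of the explained variance ratio, component by component.
cumulative = np.cumsum(pca_demo.explained_variance_ratio_)

# Smallest number of components whose cumulative share reaches the threshold.
threshold = 0.80  # illustrative value
n_components = int(np.searchsorted(cumulative, threshold) + 1)

print(np.round(cumulative, 3))
print(n_components)
```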

In [None]:
pca = PCA(n_components=3,random_state = 7)
PC = pca.fit_transform(df_scaled)
pca_happiness = pd.DataFrame(data = PC,
               columns = ['PC 1', 'PC 2','PC 3'])
 
pca_happiness.head(6)

## 4. Clustering
### 4.1 K-Means

In [None]:
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, davies_bouldin_score, calinski_harabasz_score

## Apply Kmeans with different k and plot the scores
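
A possible shape for this step, sketched on synthetic blobs rather than the PCA output (the three centers are made up so the expected answer is known). It records the inertia and the three imported scores for each `k`, then picks the `k` with the highest silhouette; using silhouette as the deciding score is an assumption.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (silhouette_score, davies_bouldin_score,
                             calinski_harabasz_score)

# Synthetic data: three well-separated groups in three dimensions.
X, _ = make_blobs(n_samples=150,
                  centers=[[0, 0, 0], [10, 10, 10], [-10, 10, 0]],
                  cluster_std=0.5, random_state=7)

rows = []
for k in range(2, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=7).fit(X)
    rows.append({
        "k": k,
        "inertia": km.inertia_,                                      # lower is tighter
        "silhouette": silhouette_score(X, km.labels_),               # higher is better
        "davies_bouldin": davies_bouldin_score(X, km.labels_),       # lower is better
        "calinski_harabasz": calinski_harabasz_score(X, km.labels_), # higher is better
    })

scores = pd.DataFrame(rows).set_index("k")
best_k = scores["silhouette"].idxmax()
print(scores)
print(best_k)

# One panel per score, with k on the x-axis.
scores.plot(subplots=True, marker="o", figsize=(8, 8))
```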


In [4]:
## Apply Kmeans with the best k
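
Once a value of `k` is chosen, the final fit could look like the sketch below. The data is synthetic, `best_k = 3` is assumed for illustration, and `pca_demo` is a placeholder for the `pca_happiness` frame built earlier.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the three principal components.
X, _ = make_blobs(n_samples=90,
                  centers=[[0, 0, 0], [6, 6, 6], [-6, 6, 0]],
                  cluster_std=0.5, random_state=7)
pca_demo = pd.DataFrame(X, columns=["PC 1", "PC 2", "PC 3"])

best_k = 3  # assumed result of the k sweep
kmeans = KMeans(n_clusters=best_k, n_init=10, random_state=7)

# fit_predict returns one cluster label per row.
pca_demo["cluster"] = kmeans.fit_predict(pca_demo[["PC 1", "PC 2", "PC 3"]])

# Number of rows assigned to each cluster.
print(pca_demo["cluster"].value_counts().sort_index())
```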

In [5]:
## Plot the centroids
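
Centroids found in PCA space are hard to interpret on their own. One option, sketched here on random data, is to map them back to the scaled feature space with `PCA.inverse_transform` and plot one bar group per feature. The column names are assumed to match the ones used in the normalization step.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Random placeholder for the scaled features.
rng = np.random.default_rng(7)
cols = ["gdp", "social_support", "life_expectancy",
        "freedom", "generosity", "corruption"]
X = pd.DataFrame(rng.normal(size=(120, 6)), columns=cols)

pca_demo = PCA(n_components=3, random_state=7)
PC = pca_demo.fit_transform(X)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=7).fit(PC)

# Centroids in PCA coordinates.
centroids_pc = pd.DataFrame(kmeans.cluster_centers_,
                            columns=["PC 1", "PC 2", "PC 3"])

# The same centroids expressed (approximately) in the scaled feature space.
centroids_features = pd.DataFrame(
    pca_demo.inverse_transform(kmeans.cluster_centers_), columns=cols)

# One group of bars per feature, one bar per cluster.
ax = centroids_features.T.plot(kind="bar", figsize=(10, 5))
ax.set_ylabel("Centroid value (scaled units)")
ax.set_title("Cluster centroids per feature")
```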

In [6]:
## Plot some representations of the results
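
One simple representation is a scatter plot of the first two principal components, colored by cluster, with the centroids overlaid. The sketch uses synthetic blobs in place of the PCA output.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the principal components.
PC, _ = make_blobs(n_samples=90,
                   centers=[[0, 0, 0], [6, 6, 6], [-6, 6, 0]],
                   cluster_std=0.5, random_state=7)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=7).fit(PC)
centers = kmeans.cluster_centers_

fig, ax = plt.subplots(figsize=(8, 6))
# Points colored by their cluster label.
ax.scatter(PC[:, 0], PC[:, 1], c=kmeans.labels_, cmap="viridis", s=30)
# Centroids drawn on top as large red crosses.
ax.scatter(centers[:, 0], centers[:, 1], c="red", marker="X", s=200,
           label="Centroids")
ax.set_xlabel("PC 1")
ax.set_ylabel("PC 2")
ax.legend()
```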

### 4.2 Hierarchical Clustering

In [None]:
from scipy.cluster.hierarchy import linkage, dendrogram

fig, ax = plt.subplots(figsize=(15, 10))
# Cluster the scaled numeric features; the raw df may contain non-numeric columns
merg = linkage(df_scaled, method="complete")
dendrogram(merg, leaf_rotation=90, ax=ax)
plt.xlabel("Countries")
plt.ylabel("Distances")
plt.show()
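
The dendrogram only shows the merge tree. To get cluster labels out of it, the tree can be cut with SciPy's `fcluster`. This sketch uses synthetic blobs, and the choice of three clusters is an assumption for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.datasets import make_blobs

# Synthetic data: three well-separated groups of 20 points each.
X, _ = make_blobs(n_samples=60, centers=[[0, 0], [8, 8], [-8, 8]],
                  cluster_std=0.5, random_state=7)

Z = linkage(X, method="complete")

# Cut the tree so that at most 3 flat clusters remain; labels start at 1.
labels = fcluster(Z, t=3, criterion="maxclust")

# Size of each cluster (index 0 is unused because labels start at 1).
print(np.bincount(labels)[1:])
```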