<a href="https://colab.research.google.com/github/nallagondu/datatrained_inter_public/blob/main/World_Happiness_Report_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Project Description:**

The World Happiness Report is a landmark survey of the state of global happiness. The first report was published in 2012, the second in 2013, the third in 2015, and the fourth in the 2016 Update. The World Happiness Report 2017, which ranks 155 countries by their happiness levels, was released at the United Nations at an event celebrating the International Day of Happiness on March 20th. The report continues to gain global recognition as governments, organizations and civil society increasingly use happiness indicators to inform their policy-making decisions. Leading experts across fields – economics, psychology, survey analysis, national statistics, health, public policy and more – describe how measurements of well-being can be used effectively to assess the progress of nations. The reports review the state of happiness in the world today and show how the new science of happiness explains personal and national variations in happiness.


**What is Dystopia?**
**Dystopia** is an imaginary country that has the world’s least-happy people. The purpose in establishing Dystopia is to have a benchmark against which all countries can be favorably compared (no country performs more poorly than Dystopia) in terms of each of the six key variables, thus allowing each sub-bar to be of positive width. The lowest scores observed for the six key variables, therefore, characterize Dystopia. Since life would be very unpleasant in a country with the world’s lowest incomes, lowest life expectancy, lowest generosity, most corruption, least freedom and least social support, it is referred to as “Dystopia,” in contrast to Utopia.


**What are the residuals?**
The residuals, or unexplained components, differ for each country, reflecting the extent to which the six variables either over- or under-explain average life evaluations. These residuals have an average value of approximately zero over the whole set of countries.


**What do the columns following the Happiness Score (such as Family and Generosity) describe?**

The columns GDP per Capita, Family, Life Expectancy, Freedom, Generosity, and Trust (Government Corruption) describe the extent to which each factor contributes to the happiness evaluation of each country.
The Dystopia Residual is the Dystopia happiness score (1.85) plus the residual, i.e. the unexplained component, for each country.
It is already provided in the dataset.
Adding all of these columns together gives the Happiness Score, so a model that uses all of them to predict the score would mostly recover this sum, and its results would be unreliable as a measure of predictive skill.
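
The additive relationship described above can be checked directly. The following is a minimal sketch, assuming the column names used in this dataset; the helper name `component_sum_gap` and the toy rows are hypothetical and only illustrate the arithmetic.

```python
import pandas as pd

# Component columns as they are named in the dataset.
COMPONENTS = [
    "Economy (GDP per Capita)",
    "Family",
    "Health (Life Expectancy)",
    "Freedom",
    "Trust (Government Corruption)",
    "Generosity",
    "Dystopia Residual",
]


def component_sum_gap(frame: pd.DataFrame) -> pd.Series:
    """Return |Happiness Score - sum of components| for each row."""
    return (frame["Happiness Score"] - frame[COMPONENTS].sum(axis=1)).abs()


# Toy rows with made-up values (NOT taken from the real dataset).
toy = pd.DataFrame(
    {
        "Happiness Score": [6.0, 4.5],
        "Economy (GDP per Capita)": [1.0, 0.5],
        "Family": [1.0, 0.75],
        "Health (Life Expectancy)": [0.75, 0.5],
        "Freedom": [0.5, 0.25],
        "Trust (Government Corruption)": [0.25, 0.25],
        "Generosity": [0.25, 0.25],
        "Dystopia Residual": [2.25, 2.0],
    }
)
gaps = component_sum_gap(toy)
print(gaps.max())
```

Once the real data is loaded, `component_sum_gap(df).max()` should be close to zero if the relationship holds; a small gap due to rounding in the published values would not be surprising.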


You need to predict the happiness score considering all the other factors mentioned in the dataset.
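
One way to see why the additive structure matters for this task is to fit a linear model with and without the residual column. The sketch below uses synthetic stand-in data (random numbers, not the real dataset) built so that the score is exactly the sum of seven components. The column names `c0` to `c5` and `residual` are hypothetical, and `LinearRegression` is only one possible baseline, not necessarily the model used in this project.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: seven random components whose sum is the score.
rng = np.random.default_rng(0)
cols = [f"c{i}" for i in range(6)] + ["residual"]
X = pd.DataFrame(rng.uniform(0.0, 1.0, size=(150, 7)), columns=cols)
y = X.sum(axis=1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# All seven components: the model can recover the sum exactly.
full = LinearRegression().fit(X_train, y_train)
r2_full = r2_score(y_test, full.predict(X_test))

# Without the residual column: part of the score stays unexplained.
six = cols[:-1]
partial = LinearRegression().fit(X_train[six], y_train)
r2_partial = r2_score(y_test, partial.predict(X_test[six]))

print(f"R^2 with all components:  {r2_full:.3f}")
print(f"R^2 without the residual: {r2_partial:.3f}")
```

A near-perfect score with all components reflects the arithmetic identity rather than predictive skill, so it may be more informative to evaluate models that exclude `Dystopia Residual` (and `Happiness Rank`, which is an ordering of the score itself).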

**Dataset Link:**
https://github.com/FlipRoboTechnologies/ML-Datasets/blob/main/World%20Happiness/happiness_score_dataset.csv


Personal copy: https://github.com/nallagondu/ML-Datasets/blob/main/World%20Happiness/happiness_score_dataset.csv


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import warnings
warnings.filterwarnings("ignore")




In [None]:
from sklearn.model_selection import train_test_split
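# Classification metrics as originally imported (not applicable to a continuous target)
from sklearn.metrics import confusion_matrix,classification_report,accuracy_score
# Regression metrics suited to predicting the continuous Happiness Score
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score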
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

In [None]:
url = "https://raw.githubusercontent.com/nallagondu/ML-Datasets/main/World%20Happiness/happiness_score_dataset.csv"
df = pd.read_csv(url)
df

In [None]:
df.shape

In [None]:
df.describe()

In [None]:
df.info()

In [None]:
# List the unique entries in Country
unique_countries = df['Country'].unique()
print(unique_countries)

In [None]:
df.describe(include = 'object')

In [None]:
# List the unique entries in Region
unique_regions = df['Region'].unique()
print(unique_regions)

In [None]:
region_count = df['Region'].value_counts()
print("Region count :", region_count)

In [None]:
df.isnull().sum()

**There are no missing values in this dataset.**

In [None]:
df.describe(include = 'number')

In [None]:
# Number of countries per region
sns.countplot(x = 'Region', data = df,color = 'darkblue')
plt.xticks(rotation = 90 )
plt.show()

In [None]:
plt.figure(figsize=(25,10))
qty = df['Region'].value_counts()
sns.barplot(x=qty.index,y=qty.values,order=qty.index,palette='Dark2')
plt.title("Distribution of countries by region", fontsize=12)
# Annotate each bar with its count, centered above the bar
for index,value in enumerate(qty.values):
  plt.text(index, value, value, fontsize=10, ha='center', va='bottom')
plt.tight_layout()
plt.show()


In [None]:
sns.boxplot(data=df['Happiness Score'], orient = 'v', palette = "Set1")
plt.xticks(rotation = 30)
plt.show()

In [None]:
plt.figure(figsize=(10,6))
sns.histplot(df['Happiness Score'],kde =True,bins =30)
plt.title("Happiness Score Distribution")
plt.xlabel('Happiness Score')
plt.ylabel("Frequency")
plt.show()

In [None]:
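plt.figure(figsize=(10,6))
# Compare the score distribution across regions, matching the Region / Happiness Score axis labels
sns.boxplot(x='Region', y='Happiness Score', data=df)
plt.title("Happiness Score by Region")
plt.xlabel('Region')
plt.ylabel("Happiness Score")
plt.xticks(rotation=90)
plt.show()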

In [None]:
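# Keep only the happiest (> 7.5) and least happy (< 4) countries.
# Use a plain variable: assigning a new attribute on the DataFrame is not supported by pandas.
filtered_happiness = df[(df['Happiness Score'] > 7.5) | (df['Happiness Score'] < 4)]
sns.barplot(x='Happiness Score', y='Country', data=filtered_happiness, palette="Set1")
plt.show()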

In [None]:
plt.figure(figsize = (15,9))
sns.kdeplot(data=df, x='Happiness Score', hue='Region', fill=True)
plt.axvline(df['Happiness Score'].mean(), c='black')  # overall mean score
plt.title("Happiness Score by Region")
plt.ylim(0,0.2)
# seaborn draws the hue legend itself; calling plt.legend() here would replace it
plt.show()

In [None]:
df.dtypes

In [None]:
numerical_cols  = df.select_dtypes(include = 'number')
correlation_mat = numerical_cols.corr()

mask = np.triu(np.ones_like(correlation_mat,dtype=bool))

plt.figure(figsize= (14,12))
sns.heatmap(correlation_mat,annot=True,cmap='coolwarm',fmt='.2f',linewidths=0.5,mask =mask )
plt.xticks(rotation=25, ha = 'right')
plt.title("Correlation heatmap ")
plt.show()

In [None]:
df.head()

In [None]:
plt.figure(figsize = (15,9))
# Hue should be categorical: the continuous Happiness Score would create one group per distinct score
sns.kdeplot(data=df, x='Economy (GDP per Capita)', hue='Region', fill=True)
#plt.axvline(df['Economy (GDP per Capita)'].mean(), c = 'black')
plt.title("Economy (GDP per Capita) by Region")
#plt.ylim(0,0.2)
plt.show()

In [None]:
df.columns

In [None]:
sns.pairplot(df, vars= ['Happiness Rank', 'Happiness Score',
       'Standard Error', 'Economy (GDP per Capita)', 'Family',
       'Health (Life Expectancy)', 'Freedom', 'Trust (Government Corruption)',
       'Generosity', 'Dystopia Residual'])
plt.show()

In [None]:
#correlation
df_correlation = df[['Happiness Rank', 'Happiness Score',
       'Standard Error', 'Economy (GDP per Capita)', 'Family',
       'Health (Life Expectancy)', 'Freedom', 'Trust (Government Corruption)',
       'Generosity', 'Dystopia Residual']].dropna().corr()
df_correlation

In [None]:
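# @title Happiness Rank vs Economy (GDP per Capita)

# Plot the observations ordered by rank; df_correlation holds correlation coefficients, not data rows
df.sort_values('Happiness Rank').plot.line(x='Happiness Rank', y='Economy (GDP per Capita)')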

In [None]:
sns.heatmap(df_correlation,annot=True)