**The World Happiness Report was written by a group of independent experts acting in their personal capacities. Any views expressed in this report do not necessarily reflect the views of any organization, agency or programme of the United Nations.**

![](https://lh3.googleusercontent.com/proxy/YHhLL1cnxdBKOewhApDZITRkRRK6Vj7M6FpK418Px1L4pjAs1KKEed0Gzdjx7Lp28p3H8IiTDl4UuKk2c9sF5Lv6FD_zXXuQkw2J9BFrxvYq7DjcNsudyRNzBLXdQNfeT6n-ns6Z)

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")

In [None]:
data=pd.read_csv("../input/world-happiness-index-report/world-happiness-report-2021.csv")

In [None]:
data.info()

In [None]:
data.describe()

In [None]:
data.isnull().sum()

In [None]:
data.duplicated().sum()

**Dataset columns**

In [None]:
data.columns

**Checking the correlation between the columns for better understanding**

In [None]:
plt.figure(figsize=(12,8))
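# Correlation is only defined for numeric columns, so drop the text columns first
sns.heatmap(data.select_dtypes(include="number").corr(),annot=True)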
plt.xticks(rotation=90)
plt.title("HeatMap")

**Let's understand the dataset better**


    Country name = name of the country
    Regional indicator = region of the world the country belongs to
    Ladder score = our target column
    upperwhisker = upper bound of the confidence interval
    lowerwhisker = lower bound of the confidence interval
    Logged GDP per capita = log of GDP per capita, a measure of purchasing power in USD terms
    Social support = index of getting help from society, friends and relatives
    Healthy life expectancy = healthy life expectancy at birth, based on WHO data
    Generosity = donations and charity
    Freedom to make life choices = perceived freedom to make one's own life choices


**Our target column "Ladder score" is most strongly correlated with "Logged GDP per capita", "Healthy life expectancy", "Social support" and "Freedom to make life choices".**
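
The heatmap reading can be cross-checked numerically. A minimal sketch, assuming a hypothetical helper `rank_correlations` and a small made-up frame `toy` (illustrative values, not the report's data), that ranks the numeric columns by the strength of their correlation with the target:

```python
import pandas as pd

def rank_correlations(df: pd.DataFrame, target: str) -> pd.Series:
    """Pearson correlation of every numeric column with `target`, strongest first."""
    numeric = df.select_dtypes(include="number")
    corr = numeric.corr()[target].drop(target)
    # Order by absolute value so strong negative correlations are not buried.
    return corr.reindex(corr.abs().sort_values(ascending=False).index)

# Made-up toy values for illustration only.
toy = pd.DataFrame({
    "Country name": ["A", "B", "C", "D", "E"],
    "Ladder score": [7.5, 6.8, 6.0, 5.1, 4.2],
    "Logged GDP per capita": [11.0, 10.5, 9.8, 9.0, 8.1],
    "Generosity": [0.10, -0.05, 0.20, -0.10, 0.05],
    "Perceptions of corruption": [0.20, 0.45, 0.60, 0.75, 0.85],
})
ranked = rank_correlations(toy, "Ladder score")
print(ranked)
```

On the real data the same call would be `rank_correlations(data, "Ladder score")`.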

***Data Visualization***

In [None]:
sns.pairplot(data)

Using pairplot with the default kind='scatter' to understand the correlations visually. Ladder score is positively correlated with 'Logged GDP per capita', 'Social support', 'Healthy life expectancy' and 'Freedom to make life choices'. With 'Generosity' the points are widely scattered, so little can be concluded. With 'Perceptions of corruption' the correlation is negative.

In [None]:
sns.pairplot(data, hue='Regional indicator')

Pairplot with hue='Regional indicator' helps show how these relationships differ across the regions of the globe.
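
The per-region view can also be put into numbers. A rough sketch, assuming a hypothetical helper `correlation_by_group` and made-up toy values, that computes the correlation between two columns separately for each region:

```python
import pandas as pd

def correlation_by_group(df, group_col, x, y):
    """Pearson correlation between x and y, computed separately per group."""
    return pd.Series({name: g[x].corr(g[y]) for name, g in df.groupby(group_col)})

# Made-up toy values for illustration only.
toy = pd.DataFrame({
    "Regional indicator": ["Region X"] * 3 + ["Region Y"] * 3,
    "Ladder score": [7.0, 6.0, 5.0, 5.0, 4.0, 3.0],
    "Logged GDP per capita": [11.0, 10.0, 9.0, 8.0, 9.0, 10.0],
})
by_region = correlation_by_group(
    toy, "Regional indicator", "Ladder score", "Logged GDP per capita"
)
print(by_region)
```

A region whose correlation differs sharply from the overall one is worth a closer look in the pairplot.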

In [None]:
data_subset= data[['Country name', 'Regional indicator', 'Ladder score', 'upperwhisker', 'lowerwhisker','Logged GDP per capita', 'Social support', 'Healthy life expectancy','Freedom to make life choices', 'Generosity','Perceptions of corruption']]

In [None]:
data_subset.head()

In [None]:
data_subset.tail()

In [None]:
plt.figure(figsize=(11,7))
sns.scatterplot(x='Ladder score',y='Logged GDP per capita',data=data_subset,hue='Regional indicator')
plt.title("Ladder Score VS GDP per capita as per Regional Indicator")
plt.legend(loc="lower right")


    1. The Western Europe and the North America and ANZ regions have the highest and most closely clustered Ladder score and Logged GDP.
    2. The Central and Eastern Europe region dominates the central part of the graph.
    3. Sub-Saharan Africa sits in the lowest part of the graph, with Ladder score between 3 and 5 and Logged GDP ranging from below 7 up to 9. The points are not tightly clustered: these countries share a narrow Ladder score range, but their Logged GDP values differ considerably.
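
The ranges and the "closely clustered" reading from the chart can be checked with a per-region summary. A minimal sketch, assuming a hypothetical helper `region_summary` and made-up toy values; a smaller standard deviation means a tighter cluster:

```python
import pandas as pd

def region_summary(df, group_col, columns):
    """Min, max and standard deviation of each column per group."""
    return df.groupby(group_col)[columns].agg(["min", "max", "std"])

# Made-up toy values for illustration only.
toy = pd.DataFrame({
    "Regional indicator": ["Region X"] * 3 + ["Region Y"] * 3,
    "Ladder score": [7.0, 7.2, 7.4, 3.0, 4.0, 5.0],
    "Logged GDP per capita": [10.8, 10.9, 11.0, 7.0, 8.0, 9.0],
})
summary = region_summary(
    toy, "Regional indicator", ["Ladder score", "Logged GDP per capita"]
)
print(summary)
```

The result has two-level columns, so a single value is read as `summary.loc["Region Y", ("Ladder score", "min")]`.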


In [None]:
plt.figure(figsize=(11,7))
sns.scatterplot(x='Ladder score',y='Social support',data=data_subset,hue='Regional indicator')
plt.title("Social Support vs Ladder Score by Regional Indicator")
plt.legend(loc="lower right")


   1.  Western Europe and North America and ANZ top the graph, forming a cluster that sits almost like an outlier relative to the others.
   2.  The centre of the graph is dominated by Latin America and Caribbean, Central and Eastern Europe, a few Middle East and North Africa countries, and Southeast Asia. These countries form a very closely correlated cluster.
   3.  One South Asian country sits at the very bottom; apart from that, the Sub-Saharan Africa countries are loosely clustered and form the bottom-most group.



In [None]:
plt.figure(figsize=(11,7))
sns.scatterplot(x='Ladder score',y='Healthy life expectancy',data=data_subset,hue='Regional indicator')
plt.title("How Healthy Life Expectancy affects Ladder Score by Regional Indicator")
plt.legend(loc="lower right")


   1.  A few North America and ANZ and Western Europe countries have the highest Healthy life expectancy, in the 70+ range. One Latin American and one Southeast Asian country also fall in this range.
   2.  The central cluster, i.e. 60-70, is dominated by Central and Eastern Europe, Latin America and Caribbean, and the Commonwealth of Independent States.
   3.  Sub-Saharan Africa has the lowest Ladder score range (approx. 3.5-5.2) and a Healthy life expectancy of approx. 48-62, again loosely clustered.



In [None]:
# jointplot is figure-level and creates its own figure, so set the size with height
sns.jointplot(x='Ladder score',y='Freedom to make life choices',data=data_subset,kind='kde',height=8)


Ladder score and Freedom to make life choices show a positive relationship, but it is worth checking how the relationship spreads out for the countries with lower Ladder score and lower Freedom to make life choices.
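
One way to look at the lower end separately is to split the countries at the median Ladder score and compare the correlation in each half. A rough sketch, assuming a hypothetical helper `correlation_by_half` and made-up toy values:

```python
import pandas as pd

def correlation_by_half(df, x, y):
    """Correlation between x and y for rows at or below, and above, the median of x."""
    median = df[x].median()
    lower = df[df[x] <= median]
    upper = df[df[x] > median]
    return {"lower": lower[x].corr(lower[y]), "upper": upper[x].corr(upper[y])}

# Made-up toy values for illustration only.
toy = pd.DataFrame({
    "Ladder score": [3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "Freedom to make life choices": [0.50, 0.90, 0.40, 0.80, 0.85, 0.90],
})
halves = correlation_by_half(toy, "Ladder score", "Freedom to make life choices")
print(halves)
```

A much weaker correlation in the lower half would match a wider spread of the density contours at the low end of the joint plot.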

In [None]:
sns.catplot(x='Regional indicator',y='Perceptions of corruption',data=data_subset)
plt.xticks(rotation=90)
plt.title("Corruption Index in different regions")


    1. Western Europe has the widest spread of Perceptions of corruption, from approx. 0.2 to 0.9.
    2. Latin America and Caribbean: the distribution is quite dense, especially around 0.8.
    3. Southeast Asia is surprising: it contains the lowest value across all regions but also some of the highest, so its values are widely spread.
    4. Sub-Saharan Africa has the most tightly clustered values, which suggests that corruption is perceived as high in almost all of its countries.
    5. Central and Eastern Europe has the highest values of all the regions, reaching above 0.8. Many of its countries sit above 0.8, some almost at 1.0.
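
Statements such as "many countries above 0.8" can be counted directly. A minimal sketch, assuming a hypothetical helper `share_above` and made-up toy values, that reports how many countries per region exceed a threshold and what share of the region that is:

```python
import pandas as pd

def share_above(df, group_col, value_col, threshold):
    """Count and share of rows per group whose value exceeds the threshold."""
    flags = df[value_col] > threshold
    result = flags.groupby(df[group_col]).agg(["sum", "mean"])
    return result.rename(columns={"sum": "count", "mean": "share"})

# Made-up toy values for illustration only.
toy = pd.DataFrame({
    "Regional indicator": ["Region X"] * 3 + ["Region Y"] * 2,
    "Perceptions of corruption": [0.85, 0.90, 0.30, 0.20, 0.75],
})
high = share_above(toy, "Regional indicator", "Perceptions of corruption", 0.8)
print(high)
```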



In [None]:
sns.swarmplot(x='Regional indicator',y='Generosity',data=data_subset)
plt.xticks(rotation=90)
plt.title("Generosity in different regions")


    1. Western Europe: the most widespread, ranging from below -0.2 to 0.2.
    2. Latin America and Caribbean: clustered between -0.2 and 0.0.
    3. Sub-Saharan Africa: has the most data points, probably because it has the most countries; the cluster is packed between -0.1 and 0.1.
    4. Southeast Asia: has the highest Generosity, crossing 0.4. That point acts like an outlier for the whole graph.
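
Both the "number of countries" explanation and the highest point can be looked up rather than read off the swarm. A small sketch on made-up toy values (country and region names are placeholders):

```python
import pandas as pd

# Made-up toy values for illustration only.
toy = pd.DataFrame({
    "Country name": ["A", "B", "C", "D", "E", "F"],
    "Regional indicator": ["Region X"] * 2 + ["Region Y"] * 4,
    "Generosity": [0.00, 0.10, -0.10, 0.05, 0.00, 0.50],
})

# How many countries (data points) each region contributes to the plot.
counts = toy["Regional indicator"].value_counts()

# The row holding the single highest Generosity value.
top = toy.loc[toy["Generosity"].idxmax()]

print(counts)
print(top)
```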



In [None]:
plt.figure(figsize=(11,6))
sns.pointplot(x='Country name',y='Ladder score', data=data_subset[:10],linestyles='--')
plt.xticks(rotation=90)
plt.title("Top 10 countries with best ladder score")

**This point plot shows the Ladder score for the top 10 countries across all regions**

In [None]:
plt.figure(figsize=(11,6))
sns.pointplot(x='Country name',y='Ladder score', data=data_subset[-10:],linestyles='--',color='red')
plt.xticks(rotation=90)
plt.title("10 countries with the lowest ladder score")

**This point plot shows the Ladder score for the lowest 10 countries across all regions**
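
Slicing with `[:10]` and `[-10:]` only gives the top and bottom countries when the rows are already sorted by Ladder score. A sketch that does not depend on row order, using pandas `nlargest` and `nsmallest` on made-up toy values:

```python
import pandas as pd

# Made-up toy values, deliberately not sorted by Ladder score.
toy = pd.DataFrame({
    "Country name": ["A", "B", "C", "D", "E"],
    "Ladder score": [7.1, 3.2, 7.8, 5.0, 2.9],
})

top2 = toy.nlargest(2, "Ladder score")      # highest scores first
bottom2 = toy.nsmallest(2, "Ladder score")  # lowest scores first

print(top2)
print(bottom2)
```

On the real data, `data_subset.nlargest(10, 'Ladder score')` could be passed to `sns.pointplot` in place of `data_subset[:10]`.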

In [None]:
plt.figure(figsize=(11,6))
sns.barplot(x='Regional indicator',y='Ladder score',data=data_subset)
plt.xticks(rotation=90)
plt.title("Ladder Score")

**Bar chart showing the mean Ladder score for each region**
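
Since seaborn's `barplot` aggregates with the mean by default, each bar height corresponds to a per-region mean. A minimal sketch that computes those means as a sorted table, on made-up toy values:

```python
import pandas as pd

# Made-up toy values for illustration only.
toy = pd.DataFrame({
    "Regional indicator": ["Region X", "Region X", "Region Y", "Region Y", "Region Y"],
    "Ladder score": [7.0, 6.0, 4.0, 5.0, 3.0],
})

# Mean Ladder score per region, highest first.
means = (
    toy.groupby("Regional indicator")["Ladder score"]
    .mean()
    .sort_values(ascending=False)
)
print(means)
```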

In [None]:
plt.figure(figsize=(11,6))
sns.boxplot(x='Regional indicator',y='Logged GDP per capita',data=data_subset)
plt.xticks(rotation=90)
plt.title("Logged GDP per capita")
plt.figure(figsize=(11,6))
sns.boxplot(x='Regional indicator',y='Healthy life expectancy',data=data_subset)
plt.xticks(rotation=90)
plt.title("Healthy Life Expectancy")

In [None]:
# Select several columns from a groupby with a list; the bare tuple form fails in recent pandas
top_10countries_ladderscore=data_subset[:10].groupby('Ladder score')[['Regional indicator','Country name','Ladder score']]

In [None]:
top_10countries_ladderscore.head()

In [None]:
# Sum only the numeric columns; country names should not be concatenated
data_subset.groupby('Regional indicator').sum(numeric_only=True)

**European Countries**

In [None]:
Europe=data[(data['Regional indicator']=='Central and Eastern Europe') | (data['Regional indicator']=='Western Europe')]

In [None]:
Europe.head()

In [None]:
plt.figure(figsize=(11,6))
sns.scatterplot(data=Europe[:10],x='Logged GDP per capita',y='Ladder score',hue='Country name')
plt.title("Ladder Score correlation with Logged GDP for top 10 European countries",fontsize=15)

1. Finland has the highest Ladder score, but its Logged GDP is below 10.8.
2. For most of these European countries, Logged GDP ranges between 10.8 and 11.2.
3. Luxembourg is the outlier in the chart: it has the highest Logged GDP, but its Ladder score is among the lowest of the ten.
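
An outlier like this can also be flagged with a simple z-score rule. A rough sketch, assuming a hypothetical helper `zscore_outliers`, an arbitrary threshold of 2 standard deviations, and made-up toy values:

```python
import pandas as pd

def zscore_outliers(df, label_col, value_col, threshold=2.0):
    """Labels of rows whose value is more than `threshold` std devs from the mean."""
    values = df[value_col]
    z = (values - values.mean()) / values.std()  # sample standard deviation (ddof=1)
    return df.loc[z.abs() > threshold, label_col].tolist()

# Made-up toy values for illustration only.
toy = pd.DataFrame({
    "Country name": list("ABCDEFGHIJ"),
    "Logged GDP per capita": [10.8, 10.9, 11.0, 10.9, 10.8, 11.0, 10.9, 10.9, 10.9, 11.7],
})
flagged = zscore_outliers(toy, "Country name", "Logged GDP per capita")
print(flagged)
```

With only ten points a z-score rule is coarse, so it is best treated as a sanity check on what the scatter plot already shows.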

In [None]:
# relplot is figure-level and creates its own figure, so set the size with height
sns.relplot(data=Europe[:10],x='Ladder score',y='Healthy life expectancy',hue='Country name',height=9)
plt.title("Ladder Score correlation with Healthy Life Expectation for top 10 European countries",fontsize=15)

1. Switzerland has the highest Healthy life expectancy and acts as an outlier.
2. For most of the countries, Healthy life expectancy ranges from 72.5 to just under 73.5.
3. Finland has the lowest Healthy life expectancy in this group, even though it has the highest Ladder score.



In [None]:
# relplot is figure-level and creates its own figure, so set the size with height and aspect
sns.relplot(data=Europe[:10],x='Ladder score',y='Perceptions of corruption',hue='Country name',height=6,aspect=11/6)
plt.title("Ladder Score correlation with Perceptions of corruption for top 10 European countries",fontsize=15)

1. Germany and Denmark have the lowest Perceptions of corruption.
2. Iceland has the highest value, easily crossing 0.6, and acts like an outlier relative to the others.
3. Among the rest of the countries the values vary widely: the perception of corruption is not uniformly distributed and the points are loosely scattered.



**Asian Countries**

In [None]:
Asia=data[(data['Regional indicator']=='South Asia') | (data['Regional indicator']=='Southeast Asia')]

In [None]:
Asia.head(10)

In [None]:
plt.figure(figsize=(11,6))
sns.scatterplot(data=Asia[:10],x='Logged GDP per capita',y='Ladder score',hue='Country name')
plt.title("Ladder Score correlation with Logged GDP for top 10 Asian countries",fontsize=15)


1. Singapore has the highest Ladder score, completely outscoring the other countries in this region.
2. Nepal, one of the smaller developing countries, has the lowest Logged GDP per capita in this group.
3. The remaining countries are widely spread, showing large differences in both Ladder score and Logged GDP per capita.



In [None]:
# relplot is figure-level and creates its own figure, so set the size with height
sns.relplot(data=Asia[:10],x='Ladder score',y='Healthy life expectancy',hue='Country name',height=9)
plt.title("Ladder Score correlation with Healthy Life Expectation for top 10 Asian countries",fontsize=15)

1. Singapore has the highest Healthy life expectancy; no other country comes close.
2. Laos has the lowest value among these Asian countries.
3. The rest of the countries have a Healthy life expectancy between 62.5 and 70.0.



In [None]:
# relplot is figure-level and creates its own figure, so set the size with height and aspect
sns.relplot(data=Asia[:10],x='Ladder score',y='Perceptions of corruption',hue='Country name',height=6,aspect=11/6)
plt.title("Ladder Score correlation with Perceptions of corruption for top 10 Asian countries",fontsize=15)

1. Singapore has the lowest Perceptions of corruption, close to 0.1, which is far below the rest of the countries.
2. Thailand has the highest Perceptions of corruption, about 0.9.
3. The largest cluster of countries lies between 0.8 and just under 0.9.



In [None]:
# relplot is figure-level and creates its own figure, so set the size with height and aspect
sns.relplot(data=Asia[:10],x='Ladder score',y='Social support',hue='Country name',height=6,aspect=11/6)
plt.title("Ladder Score correlation with Social Support for top 10 Asian countries",fontsize=15)

1. Singapore, Maldives and Thailand have some of the highest Social support values.
2. Bangladesh and Laos are among the lowest.
3. For most of the countries, Social support is between 0.80 and 0.85.
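
The "highest" and "lowest" readings from these charts can be collected into one table. A minimal sketch, assuming a hypothetical helper `extremes` and made-up toy values, that names the country with the lowest and highest value for each metric:

```python
import pandas as pd

def extremes(df, label_col, metrics):
    """For each metric, the label of the row with the lowest and highest value."""
    m = df.set_index(label_col)[metrics]
    return pd.DataFrame({"lowest": m.idxmin(), "highest": m.idxmax()})

# Made-up toy values for illustration only.
toy = pd.DataFrame({
    "Country name": ["A", "B", "C"],
    "Healthy life expectancy": [76.0, 60.0, 66.0],
    "Perceptions of corruption": [0.10, 0.90, 0.80],
    "Social support": [0.90, 0.70, 0.80],
})
table = extremes(
    toy,
    "Country name",
    ["Healthy life expectancy", "Perceptions of corruption", "Social support"],
)
print(table)
```

On the real data the same helper could be applied to `Asia[:10]` or `Europe[:10]`.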



**Thank You**
Please upvote if you liked it! Feel free to comment. 