Basic understanding of the column meanings

Life ladder: Respondents rate their current life on a ladder from 0 to 10. The top of the ladder represents the best possible life for you and the bottom of the ladder represents the worst possible life for you.

GDP per capita: Gross Domestic Product (GDP) per person. It is obtained by dividing total GDP by the population.
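
As a quick illustration of that division, the sketch below computes GDP per capita for made-up figures and then takes its logarithm, since the dataset columns used later ("Logged GDP per capita", "Log GDP per capita") store a logged value. The helper name `gdp_per_capita` and the numbers are hypothetical, and the use of the natural logarithm is an assumption about the dataset's scale.

```python
import math


def gdp_per_capita(total_gdp: float, population: int) -> float:
    """Divide total GDP by population (hypothetical helper, not from the dataset)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return total_gdp / population


# Made-up figures, for illustration only
per_capita = gdp_per_capita(total_gdp=500_000_000_000, population=10_000_000)

# Natural log; assumed to match the scale of the logged GDP columns
log_per_capita = math.log(per_capita)

print(per_capita)
print(round(log_per_capita, 3))
```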

Social support: Social support means having friends and other people, including family, to turn to in times of need or crisis

Healthy life expectancy at birth: The average equivalent number of years of full health that a newborn could expect to live

Freedom to make life choices: Freedom of choice describes an individual's opportunity and autonomy to perform an action selected from at least two available options, unconstrained by external parties.

Generosity: The quality of being kind and generous.

Perceptions of corruption: Perceived levels of public sector corruption, as determined by expert assessments and opinion surveys.

In [None]:
#Importing necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
import plotly.express as px
import warnings
warnings.filterwarnings("ignore")

In [None]:
#Loading data
df = pd.read_csv("../input/world-happiness-report-2021/world-happiness-report-2021.csv")

In [None]:
df.head(10)

In [None]:
#Creating new column Total score
df["Total_score"]=(df["Logged GDP per capita"]+df["Social support"]+df["Healthy life expectancy"]+df["Freedom to make life choices"]+df["Generosity"]-df["Perceptions of corruption"])
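
To see how this additive score behaves, here is a minimal sketch on two made-up rows. The column names are copied from the cell above, while the values and the extra `life_expectancy_share` column are invented for illustration. It also shows how much of the total comes from the one column measured in years.

```python
import pandas as pd

# Made-up rows; only the column names come from the notebook
toy = pd.DataFrame({
    "Logged GDP per capita": [10.0, 8.0],
    "Social support": [0.9, 0.6],
    "Healthy life expectancy": [72.0, 55.0],
    "Freedom to make life choices": [0.9, 0.7],
    "Generosity": [0.1, -0.1],
    "Perceptions of corruption": [0.2, 0.8],
})

positive = [
    "Logged GDP per capita",
    "Social support",
    "Healthy life expectancy",
    "Freedom to make life choices",
    "Generosity",
]

# Sum the positive factors and subtract perceived corruption
toy["Total_score"] = toy[positive].sum(axis=1) - toy["Perceptions of corruption"]

# Share of the score contributed by life expectancy alone (hypothetical helper column)
toy["life_expectancy_share"] = toy["Healthy life expectancy"] / toy["Total_score"]

print(toy[["Total_score", "life_expectancy_share"]])
```

Because the inputs are on different scales, the column with the largest raw values contributes most of the sum in this toy example.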

In [None]:
df.describe().transpose()

In [None]:
#Exploring the most satisfied regions
regions = df.groupby('Regional indicator').mean(numeric_only=True).sort_values('Total_score', ascending=False).reset_index()
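
The same group-and-rank pattern can be tried on a small made-up table. This sketch passes `numeric_only=True`, which recent pandas versions need when the frame also holds text columns such as the country name. The data and the name `regions_toy` are hypothetical.

```python
import pandas as pd

# Made-up data with one text column besides the grouping key
toy = pd.DataFrame({
    "Country name": ["A", "B", "C", "D"],
    "Regional indicator": ["North", "North", "South", "South"],
    "Total_score": [80.0, 90.0, 60.0, 70.0],
})

regions_toy = (
    toy.groupby("Regional indicator")
       .mean(numeric_only=True)                      # average numeric columns only
       .sort_values("Total_score", ascending=False)  # best region first
       .reset_index()                                # make the region a regular column again
)

print(regions_toy)
```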

In [None]:
regions

In [None]:
#Creating the barplot, that shows total score of the regions
sns.barplot(x = 'Total_score', y = 'Regional indicator', data = regions, palette = 'magma')

In [None]:
#Creating the heatmap that shows correlation between columns
sns.heatmap(regions.corr(numeric_only=True))

In [None]:
#Exploring the head and the tail of the list of ladder score rate
ladder_score = df[['Country name', 'Ladder score',"Standard error of ladder score"]].sort_values("Ladder score",ascending=False)

In [None]:
# Seeing top 10 ladder scores
fig = px.histogram(ladder_score.head(10), x = "Ladder score" , y = "Country name",color='Country name',title="Top 10 Ladder Scores")
fig.show()

In [None]:
# Seeing bottom 10 ladder scores
fig = px.histogram(ladder_score.tail(10), x = "Ladder score" , y = "Country name",color='Country name',title="Bottom 10 Ladder Scores")
fig.show()

In [None]:
#Reading one more dataset - happiness by years
by_years = pd.read_csv("../input/world-happiness-report-2021/world-happiness-report.csv")

In [None]:
by_years.head(10)

In [None]:
#Exploring info about the dataset
by_years.info()

In [None]:
#Cleaning the data: missing values are replaced with 0 (this lowers averages computed later)
by_years.fillna(0, inplace=True)
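
Filling missing values with 0 affects every average computed afterwards. The sketch below compares, on made-up data, a yearly mean that skips missing values with one computed after zero-filling. The variable names and numbers are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up data with one missing value in 2019
toy = pd.DataFrame({
    "year": [2019, 2019, 2020, 2020],
    "Generosity": [0.2, np.nan, 0.4, 0.0],
})

# pandas skips NaN by default when averaging
mean_skipping_nan = toy.groupby("year")["Generosity"].mean()

# After zero-filling, the missing entry counts as a real 0
mean_zero_filled = toy.fillna(0).groupby("year")["Generosity"].mean()

print(pd.DataFrame({"skip NaN": mean_skipping_nan, "zero-filled": mean_zero_filled}))
```

Only the year with a missing entry differs between the two columns, which is one way to check how much the fill step influences a later plot.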

In [None]:
#Creating Total score column again
by_years["Total_score"]=(by_years["Life Ladder"]+by_years["Log GDP per capita"]+by_years["Social support"]+by_years["Healthy life expectancy at birth"]+by_years["Freedom to make life choices"]+by_years["Generosity"]+by_years["Positive affect"])- (by_years["Perceptions of corruption"]+by_years["Negative affect"])

In [None]:
by_years.head()

In [None]:
#Exploring correlation between Log GDP and Total score and its distribution
#jointplot creates its own figure, so the size is set with height instead of plt.figure
sns.jointplot(x="Log GDP per capita",y= "Total_score",kind= "kde",data= by_years, height=7)
plt.show()

In [None]:
#Exploring correlation between Total score and Healthy life expectancy at birth and its distribution
#jointplot creates its own figure, so the size is set with height instead of plt.figure
sns.jointplot(x="Total_score",y= "Healthy life expectancy at birth",data= by_years, kind = 'hex', height=7)
plt.show()

In [None]:
#Exploring overall Healthy life expectancy at birth by years
plt.figure(figsize=(15,5))
x= by_years.groupby("year")["Healthy life expectancy at birth"].mean()
#Recent seaborn versions require x and y to be passed as keyword arguments
ax= sns.lineplot(x=x.index, y=x.values)
ax.set_ylabel("Healthy life expectancy at birth (years)")
plt.show()

In [None]:
#Exploring overall Freedom to make life choices by years
plt.figure(figsize=(15,5))
x= by_years.groupby("year")["Freedom to make life choices"].mean()
#Recent seaborn versions require x and y to be passed as keyword arguments
ax= sns.lineplot(x=x.index, y=x.values)
ax.set_ylabel("Freedom to make life choices (score)")
plt.show()

In [None]:
#Looking at the 10 countries with the highest average freedom to make life choices
by_years.groupby("Country name")["Freedom to make life choices"].mean().sort_values(ascending= False).head(10)

In [None]:
#Exploring Russian Log GDP
x = by_years[by_years["Country name"]=="Russia"]
plt.figure(figsize=(15,5))
sns.lineplot(x= "year",y= "Log GDP per capita",data= x)
plt.show()

In [None]:
#Exploring Russian Healthy life expectancy at birth
x = by_years[by_years["Country name"]=="Russia"]
plt.figure(figsize=(15,5))
sns.lineplot(x= "year",y= "Healthy life expectancy at birth",data= x)
plt.show()

In [None]:
#Exploring Russian Perceptions of corruption
x= by_years[by_years["Country name"]=="Russia"]
plt.figure(figsize=(15,5))
sns.lineplot(x= "year",y= "Perceptions of corruption",data= x)
plt.show()

In [None]:
#Exploring Russian Total score growth
x= by_years[by_years["Country name"]=="Russia"]
plt.figure(figsize=(15,5))
sns.lineplot(x= "year",y= "Total_score",data= x)
plt.show()

Now let's compare the scores of my native country, Russia, with those of other northern countries: Finland, Sweden, Norway and Canada. Politicians often say that the low standard of living in most Russian regions is caused by the country's unfortunate northern location and harsh climate, so I decided to compare Russia's indicators with those of its northern neighbors around the globe.

In [None]:
#Keeping only the five northern countries
northern_countries = ["Norway", "Finland", "Sweden", "Canada", "Russia"]
compare = by_years[by_years["Country name"].isin(northern_countries)]
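
Selecting a fixed set of countries can be written either as `==` comparisons joined with `|` or with `Series.isin`. The sketch below checks on made-up rows that both forms give the same mask. The toy frame and variable names are hypothetical.

```python
import pandas as pd

# Made-up rows; only the column names mirror the dataset
toy = pd.DataFrame({
    "Country name": ["Russia", "Brazil", "Finland", "Japan", "Canada"],
    "year": [2020] * 5,
})

northern = ["Norway", "Finland", "Sweden", "Canada", "Russia"]

# Membership test against the list
mask_isin = toy["Country name"].isin(northern)

# Equivalent chain of comparisons
mask_or = (
    (toy["Country name"] == "Norway")
    | (toy["Country name"] == "Finland")
    | (toy["Country name"] == "Sweden")
    | (toy["Country name"] == "Canada")
    | (toy["Country name"] == "Russia")
)

compare_toy = toy[mask_isin]
print(compare_toy)
```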

In [None]:
#Comparing Total score over the years for 5 northern countries
plt.figure(figsize=(10,7))
sns.lineplot(x= "year",y= "Total_score",data = compare, hue = "Country name", legend = 'full')
plt.show()

In [None]:
#Comparing Log GDP per capita over the years for 5 northern countries
plt.figure(figsize=(10,7))
sns.lineplot(x= "year",y = "Log GDP per capita",data = compare, hue = "Country name")
plt.show()

In [None]:
#Comparing Freedom to make life choices over the years for 5 northern countries
plt.figure(figsize=(10,7))
sns.lineplot(x= "year",y= "Freedom to make life choices",data = compare, hue = "Country name", legend = 'full')
plt.show()

In [None]:
#Comparing Healthy life expectancy at birth over the years for 5 northern countries
plt.figure(figsize=(10,7))
sns.lineplot(x= "year",y= "Healthy life expectancy at birth",data = compare, hue = "Country name", legend = 'full')
plt.show()

In [None]:
#Comparing Perceptions of corruption over the years for 5 northern countries
plt.figure(figsize=(10,7))
sns.lineplot(x= "year",y= "Perceptions of corruption",data = compare, hue = "Country name", legend = 'full')
plt.show()

Finally, let's check whether there is a correlation between Perceptions of corruption and Total score.

In [None]:
sns.lmplot(y = 'Perceptions of corruption', x = 'Total_score', data = compare)
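
The regression line drawn by `lmplot` can be complemented with numbers. As a minimal sketch on made-up values (not taken from the dataset), the code below computes the Pearson correlation coefficient with `Series.corr` and a straight-line fit with `np.polyfit`. The variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up values; only the column names mirror the notebook
toy = pd.DataFrame({
    "Total_score": [60.0, 65.0, 70.0, 75.0, 80.0],
    "Perceptions of corruption": [0.9, 0.8, 0.6, 0.4, 0.3],
})

# Pearson correlation between the two columns
r = toy["Total_score"].corr(toy["Perceptions of corruption"])

# Least-squares line: corruption = slope * Total_score + intercept
slope, intercept = np.polyfit(
    toy["Total_score"], toy["Perceptions of corruption"], deg=1
)

print(round(r, 3), round(slope, 4), round(intercept, 3))
```

A coefficient close to -1 with a negative slope would match a downward-sloping line in the plot. Running the same two calls on `compare` gives the figures for the real data.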

The conclusions are disappointing. Higher perceived corruption goes together with a lower standard of living, and the only indicator on which Russia outscores its neighbors is the level of perceived corruption (based on the available data). In the last plot, we looked at how one indicator depends on the other. Accordingly, to improve the standard of living, it is necessary first of all to bring corruption under control.