# Data Analysis of the World Happiness Report

## Introduction
* The World Happiness Report is a landmark survey of the state of global happiness.
* The report continues to gain global recognition as governments, organizations and civil society increasingly use happiness indicators to inform their policy-making decisions.
* Leading experts across fields – economics, psychology, survey analysis, national statistics, health, public policy and more – describe how measurements of well-being can be used effectively to assess the progress of nations.
* The reports review the state of happiness in the world today and show how the new science of happiness explains personal and national variations in happiness.

![happiness report](https://ichef.bbci.co.uk/news/800/cpsprodpb/1344B/production/_117632987_mediaitem117632986.jpg)

# Table of Contents
1. [Python Libraries](#one)
2. [Data Content](#two)
3. [Analysis Data](#three)
4. [Report distribution in 2021](#four)
5. [Happiest and Unhappiest Countries in 2021](#five)
6. [Regional Happiness Distribution](#six)
7. [Ladder Score Compared by Countries](#seven)
8. [Most and Least Generous Countries](#eight)
9. [Generosity Distribution on a Map](#nine)
10. [Generosity Distribution in 2021 by Regional Indicator](#ten)
11. [Relationship between Happiness and Income](#eleven)
12. [Relationship between Happiness and Freedom](#twelve)
13. [Relationship Between Happiness and Corruption](#thirteen)
14. [Relationship Between Features](#fourteen)
15. [Conclusions](#fifteen)

<a id="one"></a>
## Python Libraries
* In this section, we import the libraries used throughout this kernel.

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# data visualization
import matplotlib.pyplot as plt 
import seaborn as sns
sns.set_style("whitegrid")

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

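# Interactive world maps and charts
import plotly.express as px 
import plotly.graph_objs as go
from plotly.offline import init_notebook_mode, iplot
# Matplotlib 3.6+ renamed its bundled seaborn styles to "seaborn-v0_8-*"; fall back to the old name on older versions
plt.style.use("seaborn-v0_8-notebook" if "seaborn-v0_8-notebook" in plt.style.available else "seaborn-notebook")

# Ignore code warnings
import warnings
warnings.filterwarnings("ignore")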

<a id="two"></a>
## Data Content
* The happiness scores and rankings use data from the Gallup World Poll.
    * Gallup World Poll: In 2005, Gallup began its World Poll, which continually surveys citizens in 160 countries, representing more than 98% of the world's adult population. The Gallup World Poll consists of more than 100 global questions as well as region-specific items.
* The columns following the happiness score estimate the extent to which each of six factors – economic production, social support, life expectancy, freedom, absence of corruption, and generosity – contribute to making life evaluations higher in each country than they are in Dystopia, a hypothetical country that has values equal to the world’s lowest national averages for each of the six factors.
* **Ladder score**: Happiness score or subjective well-being. This is the national average response to the question of life evaluations.
* **Logged GDP per capita**: The logarithm of GDP per capita. The GDP-per-capita time series was extended from 2019 to 2020 using country-specific forecasts of real GDP growth in 2020.
* **Social support**: Social support refers to assistance or support provided by members of social networks to an individual.
* **Healthy life expectancy**: Healthy life expectancy is the average life in good health - that is to say without irreversible limitation of activity in daily life or incapacities - of a fictitious generation subject to the conditions of mortality and morbidity prevailing that year.
* **Freedom to make life choices**: Freedom to make life choices is the national average of binary responses to the GWP (Gallup World Poll) question “Are you satisfied or dissatisfied with your freedom to choose what you do with your life?”
* **Generosity**: Generosity is the residual of regressing national average of response to the GWP question “Have you donated money to a charity in the past month?” on GDP per capita.
* **Perceptions of corruption**: The measure is the national average of the survey responses to two questions in the GWP: “Is corruption widespread throughout the government or not” and “Is corruption widespread within businesses or not?”
* **Ladder score in Dystopia**: It has values equal to the world’s lowest national averages. Dystopia serves as a benchmark against which to compare contributions from each of the six factors. Dystopia is an imaginary country that has the world's least-happy people. ... Since life would be very unpleasant in a country with the world's lowest incomes, lowest life expectancy, lowest generosity, most corruption, least freedom, and least social support, it is referred to as “Dystopia,” in contrast to Utopia.
* World Happiness Report official website: [worldhappiness.report](https://worldhappiness.report/)
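The Generosity definition above is a regression residual: the part of the donation rate that GDP per capita does not predict. A minimal sketch of that idea, assuming an ordinary least-squares line fit with `numpy.polyfit`; the arrays are hypothetical national averages, not Gallup World Poll data, and the report's exact regression specification may differ.

```python
import numpy as np

# Hypothetical national averages (NOT real Gallup World Poll data)
log_gdp = np.array([7.5, 8.6, 9.4, 10.1, 10.8, 11.2])
donated = np.array([0.15, 0.20, 0.35, 0.30, 0.55, 0.50])  # share answering "yes"

# Fit donated ~ slope * log_gdp + intercept by ordinary least squares
slope, intercept = np.polyfit(log_gdp, donated, deg=1)
predicted = slope * log_gdp + intercept

# Generosity = what GDP per capita does not explain (the residual)
generosity = donated - predicted

for g, r in zip(log_gdp, generosity):
    print(f"log GDP {g:5.1f} -> generosity residual {r:+.3f}")
```

Because the fit includes an intercept, the residuals average to zero, which is why Generosity takes both positive and negative values in the data.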

<a id="three"></a>
## Analysis Data

### Read data

In [None]:
data_before_2021 = pd.read_csv("/kaggle/input/world-happiness-report-2021/world-happiness-report.csv") # data before 2021
data_2021 = pd.read_csv("/kaggle/input/world-happiness-report-2021/world-happiness-report-2021.csv") # 2021 data

### Show Data

In [None]:
data_before_2021.head().style.background_gradient()

In [None]:
data_2021.head().style.background_gradient()

### Descriptive Statistics

In [None]:
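# Color positive statistics green and zero/negative ones red
def above_zero(val):
    color = 'green' if val > 0 else 'red'
    return 'color: %s' % color
# Note: pandas 2.1 renamed Styler.applymap to Styler.map; applymap still works but is deprecated
data_before_2021.describe().style.applymap(above_zero)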

In [None]:
data_2021.describe().style.applymap(above_zero)

### Data Info

In [None]:
data_before_2021.info() # Info about data before 2021

In [None]:
data_2021.info() # Info about data in 2021
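`info()` reports non-null counts, so missing values have to be inferred by subtraction. A short sketch that counts them directly with `DataFrame.isna`; the toy frame is hypothetical and only mimics two of the real columns. On the real data, replace `toy` with `data_before_2021` or `data_2021`.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for data_before_2021 (values are made up)
toy = pd.DataFrame({
    "Country name": ["A", "B", "C", "D"],
    "Life Ladder": [5.1, 6.3, np.nan, 4.8],
    "Generosity": [0.05, np.nan, np.nan, -0.10],
})

# Count and share of missing values per column, largest first
missing = toy.isna().sum().sort_values(ascending=False)
missing_share = (missing / len(toy)).round(2)
print(pd.DataFrame({"missing": missing, "share": missing_share}))
```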

<a id="four"></a>
## Report distribution in 2021
* [Unique Countries](#2021-1)
* [Count Regional Indicator](#2021-2)
* [Distribution of Remaining Features](#2021-3)

<a id="2021-1"></a>
### Unique Countries

In [None]:
data_2021["Country name"].unique()

<a id="2021-2"></a>
### Count Regional Indicator

In [None]:
sns.countplot(x = "Regional indicator", data = data_2021)
plt.xticks(rotation = 75)
plt.show()

<a id="2021-3"></a>
### Distribution of Remaining Features

In [None]:
# Distribution features list 1
list_features = ["Social support", "Freedom to make life choices", "Generosity", "Perceptions of corruption"]
sns.boxplot(data = data_2021.loc[:, list_features], orient = "h", palette = "Set3")
plt.show()

In [None]:
# distribution of feature set 2
list_features = ["Ladder score", "Logged GDP per capita"]
sns.boxplot(data = data_2021.loc[:, list_features], orient = "h", palette = "Set1")
plt.show()

In [None]:
# distribution of feature set 3
list_features = ["Healthy life expectancy"]
sns.boxplot(data = data_2021.loc[:, list_features], orient = "h", palette = "Set2")
plt.show()
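The points drawn beyond the whiskers in these box plots follow the usual convention: seaborn's `boxplot` defaults to `whis=1.5`, so values more than 1.5 times the interquartile range (IQR) outside the quartiles are plotted individually. A sketch that lists those values for one column; `iqr_outliers` is a hypothetical helper, the series is made up, and it assumes linear-interpolation quartiles (pandas' default).

```python
import pandas as pd

def iqr_outliers(values: pd.Series, whis: float = 1.5) -> pd.Series:
    """Return the values a box plot with this whisker length would draw as outliers."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    low, high = q1 - whis * iqr, q3 + whis * iqr
    return values[(values < low) | (values > high)]

# Hypothetical "Perceptions of corruption" values (not the real 2021 data)
corruption = pd.Series([0.72, 0.75, 0.78, 0.80, 0.82, 0.85, 0.88, 0.20, 0.25])
print(iqr_outliers(corruption))
```

On the real data, `iqr_outliers(data_2021["Perceptions of corruption"].dropna())` would list the countries drawn as separate points.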

<a id="five"></a>
## Happiest and Unhappiest Countries in 2021

In [None]:
print("Lowest Ladder score (unhappiest country):", data_2021["Ladder score"].min())
print("Highest Ladder score (happiest country):", data_2021["Ladder score"].max())
print("Average Ladder score:", data_2021["Ladder score"].mean())

In [None]:
# Five countries with the highest Ladder score
data_2021_happiest = data_2021.nlargest(5, "Ladder score")
sns.barplot(x="Ladder score",y="Country name",data=data_2021_happiest,palette="rocket")
plt.title("Happiest Countries")
plt.xlabel("Happiness Score")
plt.ylabel("Country")
plt.show()

In [None]:
# Five countries with the lowest Ladder score, plotted from highest to lowest
data_2021_unhappiest = data_2021.nsmallest(5, "Ladder score").iloc[::-1]
sns.barplot(x="Ladder score",y="Country name",data=data_2021_unhappiest,palette="mako")
plt.title("Unhappiest Countries")
plt.xlabel("Happiness Score")
plt.ylabel("Country")
plt.show()

#### Western European countries are happier than Eastern European countries

<a id="six"></a>
## Regional Happiness Distribution

In [None]:
plt.figure(figsize = (15,8))
sns.kdeplot(data = data_2021, x = "Ladder score", hue = "Regional indicator", fill = True, linewidth = 2)
plt.axvline(data_2021["Ladder score"].mean(), c = "black")
plt.title("Ladder Score Distribution by Regional Indicator")
plt.show()

In [None]:
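# Three happiest regions, ranked by mean Ladder score
region_scores = data_2021.groupby("Regional indicator")["Ladder score"].mean().sort_values(ascending=False)
for rank, (region, score) in enumerate(region_scores.head(3).items(), start=1):
    print(f"{rank}. region = {region} (mean score {score:.2f})")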

In [None]:
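# Three unhappiest regions, ranked by mean Ladder score
region_scores = data_2021.groupby("Regional indicator")["Ladder score"].mean().sort_values(ascending=True)
for rank, (region, score) in enumerate(region_scores.head(3).items(), start=1):
    print(f"{rank}. region = {region} (mean score {score:.2f})")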
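Ranking regions requires aggregating the scores first. The sketch below contrasts sorting the region names (which only orders them alphabetically) with ranking regions by their mean Ladder score; the toy scores are invented for illustration, and the mean is one reasonable aggregate (a median would be more robust to outlier countries).

```python
import pandas as pd

# Hypothetical scores: region names deliberately do not sort in happiness order
toy_scores = pd.DataFrame({
    "Regional indicator": ["Western Europe", "Western Europe", "East Asia", "East Asia",
                           "South Asia", "South Asia", "Central and Eastern Europe"],
    "Ladder score": [7.5, 7.1, 5.9, 5.6, 4.2, 3.0, 6.0],
})

# Pitfall: sorting the names orders regions alphabetically, not by happiness
alphabetical = toy_scores["Regional indicator"].sort_values(ascending=False).unique()[:3]

# Fix: aggregate the score per region, then rank by that aggregate
by_mean = (toy_scores.groupby("Regional indicator")["Ladder score"]
                     .mean()
                     .sort_values(ascending=False))

print("alphabetical:", list(alphabetical))
print("by mean score:", list(by_mean.index[:3]))
```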

<a id="seven"></a>
## Ladder Score Compared by Countries

In [None]:
compared_countries_before_2021 = data_before_2021.select_dtypes(include = ["float64", "int64"])
sns.pairplot(compared_countries_before_2021);

In [None]:
compared_countries_2021 = data_2021.select_dtypes(include = ["float64", "int64"])
sns.pairplot(compared_countries_2021);

In [None]:
# The country with the highest Life Ladder score in each year
data_before_2021.loc[data_before_2021.groupby("year")["Life Ladder"].idxmax(), ["year", "Country name", "Life Ladder"]]

In [None]:
# The country with the lowest Life Ladder score in each year
data_before_2021.loc[data_before_2021.groupby("year")["Life Ladder"].idxmin(), ["year", "Country name", "Life Ladder"]]
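Calling `.max()` or `.min()` on a grouped selection of several columns aggregates each column independently, so the country name and the score it returns can come from different rows. A sketch on a toy panel (values made up) that shows the pitfall and the row-wise alternative with `GroupBy.idxmax`:

```python
import pandas as pd

# Toy panel in the shape of data_before_2021 (values are made up)
panel = pd.DataFrame({
    "year": [2019, 2019, 2019, 2020, 2020],
    "Country name": ["Zimbabwe", "Finland", "Denmark", "Afghanistan", "Finland"],
    "Life Ladder": [3.3, 7.8, 7.7, 2.5, 7.9],
})

# Pitfall: .max() works column by column, so "Zimbabwe" appears only
# because it is the alphabetical maximum, next to Finland's score
naive = panel.groupby("year")[["Country name", "Life Ladder"]].max()

# Fix: take the row label of the best score per year, then select whole rows
best = panel.loc[panel.groupby("year")["Life Ladder"].idxmax()]

print(naive)
print(best)
```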

In [None]:
fig = px.choropleth(data_before_2021.sort_values("year"),
                    locations = "Country name",
                    color = "Life Ladder",
                    locationmode = "country names",
                    animation_frame = "year")
fig.update_layout(title = "Life Ladder Comparison by Countries")
fig.show()

<a id="eight"></a>
## Most and Least Generous Countries

In [None]:
df2021_g = data_2021[(data_2021.loc[:, "Generosity"] > 0.4) | (data_2021.loc[:, "Generosity"] < -0.2)]
sns.barplot(x = "Generosity", y = "Country name", data = df2021_g, palette = "coolwarm")
plt.title("Most and Least Generous Countries in 2021")
plt.show()
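The cutoffs 0.4 and -0.2 above are hand-picked, so the number of countries shown depends on the data. An alternative sketch (not the author's method) selects a fixed number of countries from each end with `DataFrame.nlargest` and `DataFrame.nsmallest`; `top_and_bottom` is a hypothetical helper and the toy values are invented.

```python
import pandas as pd

def top_and_bottom(df: pd.DataFrame, column: str, k: int = 5) -> pd.DataFrame:
    """Return the k highest and k lowest rows of `column`, ordered highest first."""
    top = df.nlargest(k, column)
    bottom = df.nsmallest(k, column)
    # Reverse the bottom rows so the whole result runs from highest to lowest
    return pd.concat([top, bottom.iloc[::-1]])

# Hypothetical generosity residuals (not the real 2021 values)
toy = pd.DataFrame({
    "Country name": list("ABCDEFGH"),
    "Generosity": [0.54, -0.29, 0.10, 0.42, -0.25, 0.00, 0.33, -0.05],
})
print(top_and_bottom(toy, "Generosity", k=2))
```

On the real data this could feed the same plot, e.g. `sns.barplot(x="Generosity", y="Country name", data=top_and_bottom(data_2021, "Generosity"))`. If `2 * k` exceeds the number of rows, some countries appear twice.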

In [None]:
print(data_2021.loc[:, "Generosity"].mean()) # Average generosity score
print(data_2021.loc[:, "Generosity"].max()) # Highest generosity score
print(data_2021.loc[:, "Generosity"].min()) # Lowest generosity score

In [None]:
data_2021.loc[data_2021["Generosity"].idxmax(), "Country name"] # Most generous country

In [None]:
data_2021.loc[data_2021["Generosity"].idxmin(), "Country name"] # Least generous country

<a id="nine"></a>
## Generosity Distribution on a Map

In [None]:
fig = px.choropleth(data_before_2021.sort_values("year"),
                   locations = "Country name",
                   color = "Generosity",
                   locationmode = "country names",
                   animation_frame = "year")
fig.update_layout(title = "Generosity Comparison by Countries")
fig.show()

In [None]:
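# The least generous country in each year (rows with missing Generosity are skipped)
generosity_known = data_before_2021.dropna(subset=["Generosity"])
generosity_known.loc[generosity_known.groupby("year")["Generosity"].idxmin(), ["year", "Country name", "Generosity"]].head()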

In [None]:
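# The most generous country in each year (rows with missing Generosity are skipped)
generosity_known = data_before_2021.dropna(subset=["Generosity"])
generosity_known.loc[generosity_known.groupby("year")["Generosity"].idxmax(), ["year", "Country name", "Generosity"]].head()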

<a id="ten"></a>
## Generosity Distribution in 2021 by Regional Indicator

In [None]:
sns.swarmplot(x="Regional indicator",y="Generosity",data=data_2021)
plt.xticks(rotation = 75)
plt.title("Generosity Distribution in 2021 by Regional Indicator")
plt.show()

In [None]:
# The 10 most generous countries in 2021 and their regions
data_2021[["Country name","Regional indicator","Generosity"]].sort_values(by="Generosity",ascending=False)[:10]

In [None]:
# The 10 least generous countries in 2021 and their regions
data_2021[["Country name","Regional indicator","Generosity"]].sort_values(by="Generosity",ascending=True)[:10]

<a id="eleven"></a>
## Relationship between Happiness and Income

In [None]:
# Load an additional dataset with the total population per country and year
pop = pd.read_csv("/kaggle/input/world-population-19602018/population_total_long.csv")
pop.head()

In [None]:
# Map each 2021 country to its regional indicator, and list the countries
# that appear in the pre-2021 data but not in the 2021 report
country_continent = dict(zip(data_2021["Country name"], data_2021["Regional indicator"]))
all_countries = data_before_2021["Country name"].unique().tolist()
all_countries_2021 = data_2021["Country name"].unique().tolist()
all_countries_not_2021 = [c for c in all_countries if c not in all_countries_2021]

In [None]:
# Assign regions to the 17 pre-2021 countries that are missing from the 2021 report;
# every other country takes its region from the 2021 data (country_continent)
extra_regions = {
    "Angola": "Sub-Saharan Africa",
    "Belize": "Latin America and Caribbean",
    "Congo (Kinshasa)": "Sub-Saharan Africa",
    "Syria": "Middle East and North Africa",
    "Trinidad and Tobago": "Latin America and Caribbean",
    "Cuba": "Latin America and Caribbean",
    "Qatar": "Middle East and North Africa",
    "Sudan": "Middle East and North Africa",
    "Central African Republic": "Sub-Saharan Africa",
    "Djibouti": "Sub-Saharan Africa",
    "Somaliland region": "Sub-Saharan Africa",
    "South Sudan": "Middle East and North Africa",
    "Somalia": "Sub-Saharan Africa",
    "Oman": "Middle East and North Africa",
    "Guyana": "Latin America and Caribbean",
    "Bhutan": "South Asia",
    "Suriname": "Latin America and Caribbean",
}
data_before_2021["region"] = [extra_regions[c] if c in extra_regions else country_continent[c]
                              for c in data_before_2021["Country name"]]
data_before_2021.head()

In [None]:
# Countries in the happiness data that have no match in the population data
all_countries = data_before_2021["Country name"].unique().tolist()
all_countries_pop = pop["Country Name"].unique().tolist()

del_cou = [x for x in all_countries if x not in all_countries_pop]
del_cou
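Some countries in `del_cou` may be missing only because the two datasets spell their names differently, not because population data is absent. A sketch of how to check that with set difference and recover matches through an alias table; both name lists and the `aliases` mapping are illustrative assumptions (World Bank-derived tables often use forms like "Syrian Arab Republic"), so verify them against the actual `pop["Country Name"]` values.

```python
# Hypothetical name lists; the real ones come from data_before_2021 and pop
happiness_names = ["Finland", "Congo (Kinshasa)", "Syria", "Somaliland region"]
population_names = ["Finland", "Congo, Dem. Rep.", "Syrian Arab Republic"]

# Countries that would be dropped because no population row has the same name
unmatched = sorted(set(happiness_names) - set(population_names))
print(unmatched)

# Hypothetical alias table: happiness-report name -> population-data name
aliases = {"Congo (Kinshasa)": "Congo, Dem. Rep.", "Syria": "Syrian Arab Republic"}
still_unmatched = sorted({aliases.get(n, n) for n in happiness_names} - set(population_names))
print(still_unmatched)
```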

In [None]:
# List the columns of the pre-2021 data
data_before_2021.columns

In [None]:
# Select the columns needed for the population-based comparisons
pop_df = data_before_2021[['Log GDP per capita', 'Life Ladder', 'Country name', 'year', 'Social support', 'Healthy life expectancy at birth',
       'Freedom to make life choices', 'Generosity',"region",'Perceptions of corruption']].copy()
pop_df.head()

In [None]:
# Keep only countries with population data, and only the years 2008-2017
pop_df = pop_df[~pop_df["Country name"].isin(del_cou)]
pop_df = pop_df[~pop_df.year.isin([2006,2005,2007,2018,2019,2020,2021])]
# Nested lookup table: pop_dict[year][country] -> population count
pop_dict = {x:{} for x in range(2008,2018)}
for i in range(len(pop)):
    if(pop["Year"][i] in range(2008,2018)):
        pop_dict[pop["Year"][i]][pop["Country Name"][i]] = pop["Count"][i]

In [None]:
population = []
for i in pop_df.index:
    population.append(pop_dict[pop_df["year"][i]][pop_df["Country name"][i]])
pop_df["population"] = population
pop_df.head()
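The nested `pop_dict` lookup raises a `KeyError` if any country-year pair is missing from the population table. An alternative sketch (a swapped-in technique, not what the notebook does) attaches population with a left `DataFrame.merge`; the toy frames only mimic the column names of `pop_df` and `pop`, and the population counts are invented.

```python
import pandas as pd

# Toy stand-ins for pop_df and pop (values are made up)
toy_scores = pd.DataFrame({"Country name": ["Finland", "Chad", "Finland", "Chad"],
                           "year": [2010, 2010, 2011, 2011],
                           "Life Ladder": [7.4, 3.9, 7.4, 4.2]})
toy_pop = pd.DataFrame({"Country Name": ["Finland", "Chad", "Finland"],
                        "Year": [2010, 2010, 2011],
                        "Count": [5_360_000, 11_950_000, 5_390_000]})

# Left join on (country, year): rows without a population match get NaN
# instead of raising a KeyError, as the nested dict lookup would
merged = toy_scores.merge(
    toy_pop.rename(columns={"Country Name": "Country name", "Year": "year", "Count": "population"}),
    on=["Country name", "year"],
    how="left",
    validate="one_to_one",  # fail loudly if either side has duplicate keys
)
print(merged)
```

Rows left with a missing population could then be dropped with `merged.dropna(subset=["population"])` before plotting.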

In [None]:
# Population Life Ladder and Log GDP per capita Comparison by Countries via Regions for each Year
fig = px.scatter(pop_df, 
                 x = "Log GDP per capita",
                 y = "Life Ladder",
                 animation_frame = "year",
                 animation_group = "Country name",
                 size = "population",
                 template = "plotly_white",
                 color = "region", 
                 hover_name = "Country name", 
                 size_max = 60)
fig.update_layout(title = "Life Ladder and Log GDP per capita Comparison by Countries via Regions for each Year")
fig.show()

In [None]:
# As population increases, Life Ladder tends to decrease
# China has the largest population
pop_df.groupby("population")[["Country name","Life Ladder","year"]].max().sort_values(by="population",ascending=False).head()

In [None]:
# As population decreases, Life Ladder tends to increase
# Iceland has the smallest population
pop_df.groupby("population")[["Country name","Life Ladder","year"]].min().sort_values(by="population",ascending=True).head()
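The top-5 and bottom-5 tables show only the extreme rows, so by themselves they cannot establish that Life Ladder falls as population grows. One way to quantify the claim is a rank (Spearman) correlation over all rows; this sketch assumes a monotonic, not necessarily linear, relationship is what matters, and the numbers are invented stand-ins for `pop_df`. A value near 0 would indicate no monotonic relationship.

```python
import pandas as pd

# Hypothetical country-year rows (not real data)
toy_rows = pd.DataFrame({
    "population": [1_390_000_000, 330_000_000, 84_000_000, 5_500_000, 340_000],
    "Life Ladder": [5.1, 6.9, 5.5, 7.8, 7.5],
})

# Spearman rank correlation = Pearson correlation of the ranks
# (computed by hand so that SciPy is not required)
rho = toy_rows["population"].rank().corr(toy_rows["Life Ladder"].rank())
print(f"Spearman rho = {rho:.2f}")
```

On the real data the same line would read `pop_df["population"].rank().corr(pop_df["Life Ladder"].rank())`.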

<a id="twelve"></a>
## Relationship between Happiness and Freedom

In [None]:
# How freedom to make life choices relates to happiness across countries
fig = px.scatter(pop_df, 
                 x = "Freedom to make life choices",
                 y = "Life Ladder",
                 animation_frame = "year",
                 animation_group = "Country name",
                 size = "population",
                 template = "plotly_dark",
                 color = "region", 
                 hover_name = "Country name", 
                 size_max = 60)
fig.update_layout(title = "Life Ladder and Freedom Comparison by Countries via Regions for each Year")
fig.show()

In [None]:
pop_df.groupby(by="Freedom to make life choices").max().sort_values(by="Freedom to make life choices",ascending=False)[["Life Ladder","year","Country name","population"]].head(15)

In [None]:
pop_df.groupby(by="Freedom to make life choices").max().sort_values(by="Freedom to make life choices",ascending=True)[["Life Ladder","year","Country name","population"]].head(15)

<a id="thirteen"></a>
## Relationship Between Happiness and Corruption

In [None]:
fig = px.scatter(pop_df, 
                 x = "Perceptions of corruption",
                 y = "Life Ladder",
                 animation_frame = "year",
                 animation_group = "Country name",
                 size = "population",
                 color = "region", 
                 hover_name = "Country name", 
                 size_max = 60)
fig.update_layout(title = "Life Ladder and Corruption Comparison by Countries via Regions for each Year")
fig.show()

In [None]:
# As perceived corruption increases, Life Ladder tends to decrease
pop_df.groupby(by="Perceptions of corruption").max().sort_values(by="Perceptions of corruption",ascending=False)[["Life Ladder","year","Country name","population"]].head(15)

In [None]:
pop_df.groupby(by="Perceptions of corruption").max().sort_values(by="Perceptions of corruption",ascending=True)[["Life Ladder","year","Country name","population"]].head(15)
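The sorted tables above show only the 15 highest and 15 lowest corruption values. A coarser check of the trend is to bin corruption perceptions into quartiles with `pd.qcut` and compare the mean Life Ladder per bin. A sketch on invented data; on the real data, use `pop_df` (rows with a missing corruption score get no bin and drop out of the groups).

```python
import pandas as pd

# Hypothetical rows standing in for pop_df (values are made up)
toy_corr = pd.DataFrame({
    "Perceptions of corruption": [0.20, 0.35, 0.50, 0.65, 0.80, 0.85, 0.90, 0.95],
    "Life Ladder": [7.6, 7.1, 6.4, 6.0, 5.5, 5.2, 4.9, 4.4],
})

# Split the corruption scores into 4 equal-count bins (quartiles)
toy_corr["corruption quartile"] = pd.qcut(toy_corr["Perceptions of corruption"], q=4,
                                          labels=["Q1 (low)", "Q2", "Q3", "Q4 (high)"])

# Mean Life Ladder per quartile; a falling sequence is consistent with the claim
quartile_means = toy_corr.groupby("corruption quartile", observed=True)["Life Ladder"].mean()
print(quartile_means)
```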

<a id="fourteen"></a>
## Relationship Between Features

In [None]:
# Life Ladder is positively correlated with Log GDP per capita,
# with healthy life expectancy at birth, and with social support
# numeric_only=True skips the text columns ("Country name", "region"); pandas 2.x requires it here
sns.heatmap(data_before_2021.corr(numeric_only=True),annot=True,fmt=".2f",linewidths=.7)
plt.title("Relationship Between Features ")
plt.show()

In [None]:
sns.clustermap(data_before_2021.corr(numeric_only=True), center = 0, cmap = "vlag", dendrogram_ratio = (0.1, 0.2), annot = True, linewidths = .7, figsize=(10,10))
plt.show()
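The heatmap and clustermap show every pairwise correlation at once. To back conclusions such as "Life Ladder is positively correlated with Log GDP per capita", it can help to pull out just the Life Ladder column and sort it. A sketch on a toy frame with invented values; `numeric_only=True` matters because the real `data_before_2021` contains text columns.

```python
import pandas as pd

# Toy frame with a text column, like data_before_2021 (values are made up)
toy_features = pd.DataFrame({
    "Country name": ["A", "B", "C", "D", "E"],
    "Life Ladder": [4.0, 5.0, 6.0, 7.0, 7.5],
    "Log GDP per capita": [8.0, 9.0, 9.5, 10.5, 11.0],
    "Generosity": [0.10, -0.20, 0.05, -0.10, 0.20],
    "Perceptions of corruption": [0.90, 0.85, 0.70, 0.50, 0.30],
})

# Correlation of every numeric feature with Life Ladder, strongest positive first
corr_with_ladder = (toy_features.corr(numeric_only=True)["Life Ladder"]
                                .drop("Life Ladder")
                                .sort_values(ascending=False))
print(corr_with_ladder.round(2))
```

On the real data, `data_before_2021.corr(numeric_only=True)["Life Ladder"].drop("Life Ladder").sort_values()` gives the same ranked view.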

<a id="fifteen"></a>
## Conclusions
1. The happiest country in 2021 is Finland.
2. The unhappiest country in 2021 is Afghanistan.
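3. By mean Ladder score in 2021, the happiest region is North America and ANZ, closely followed by Western Europe.
4. By mean Ladder score in 2021, the unhappiest region is South Asia, followed by Sub-Saharan Africa.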
5. There is no clear relationship between healthy life expectancy and generosity.
6. Life Ladder is positively correlated with Log GDP per capita.
7. Life Ladder is positively correlated with healthy life expectancy at birth.
8. Life Ladder is positively correlated with social support.
9. As population increases, Life Ladder tends to decrease.
10. As perceived corruption increases, Life Ladder tends to decrease.