# World Happiness Report EDA

##### This dataset is well suited to exploratory data analysis and data visualization. I am using it for my first project with the plotly library, which makes it possible to draw really cool charts.

# Importing Libraries

In [None]:
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings("ignore")
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots

# Loading Data

In [None]:
data = pd.read_csv("../input/world-happiness-report-2021/world-happiness-report.csv")
data.fillna(0, inplace=True)
data.info()

In [None]:
data.rename(columns={"Country name" : "country",
                    "Life Ladder" : "ladder_score",
                    "Log GDP per capita" : "gdp_per_capita",
                    "Social support" : "social_support",
                    "Healthy life expectancy at birth" : "life_expectancy",
                    "Freedom to make life choices" : "freedom",
                    "Generosity" : "generosity",
                    "Perceptions of corruption" : "corruption",
                    "Positive affect" : "positive_affect",
                    "Negative affect" : "negative_affect"},
           inplace=True)

In [None]:
data.describe()

In [None]:
data.head(20)

In [None]:
data_ = pd.read_csv("../input/world-happiness-report-2021/world-happiness-report-2021.csv")
data_.head(3)

#### The code cell below copies the region feature from the second dataframe (`data_`) into the first (`data`). The loop walks through the countries in `data_`; whenever the same country appears in `data`, that country's region is written to the region column of the matching rows.

In [None]:
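data_.rename(columns={"Regional indicator": "region", "Country name": "country"}, inplace=True)

# Start with an empty (object-typed) region column
data["region"] = pd.Series(np.nan, index=data.index, dtype="object")

# For every country in data_, copy its region to all matching rows of data.
# .loc avoids chained assignment, which can silently fail to update data.
for country, region in zip(data_["country"], data_["region"]):
    data.loc[data["country"] == country, "region"] = region

data.head(10)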
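
The same lookup can also be written without an explicit loop. This is a minimal sketch, assuming each country appears only once in the lookup table; the frames `history` and `latest` are toy stand-ins for `data` and `data_`, not the real files.

```python
import pandas as pd

# Toy stand-ins for the two dataframes (illustrative values only)
history = pd.DataFrame({
    "country": ["Finland", "Finland", "Cuba", "Turkey"],
    "year": [2019, 2020, 2006, 2020],
})
latest = pd.DataFrame({
    "country": ["Finland", "Turkey"],
    "region": ["Western Europe", "Middle East and North Africa"],
})

# Build a country -> region lookup and apply it to every row;
# countries missing from the lookup come back as NaN
region_by_country = latest.set_index("country")["region"]
history["region"] = history["country"].map(region_by_country)

print(history)
```

Countries that exist only in the first frame (Cuba here) keep a NaN region, which is why a null check is still needed afterwards.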

# Filling NaN entries

#### Checking whether there are any null entries after merging.

In [None]:
data.isna().sum()

#### There are NaN values in the region column. I will fill them with the appropriate region, chosen according to each country's location.



In [None]:
data.region.unique()

In [None]:
null = data[data["region"].isnull()]
null['country'].unique()

In [None]:
# Regions for the countries missing from the 2021 file, chosen by location
missing_regions = {
    "Angola": "Sub-Saharan Africa",
    "Belize": "Latin America and Caribbean",
    "Bhutan": "South Asia",
    "Central African Republic": "Sub-Saharan Africa",
    "Congo (Kinshasa)": "Sub-Saharan Africa",
    "Cuba": "Latin America and Caribbean",
    "Djibouti": "Middle East and North Africa",
    "Guyana": "Latin America and Caribbean",
    "Oman": "Middle East and North Africa",
    "Qatar": "Middle East and North Africa",
    "Somalia": "Middle East and North Africa",
    "Somaliland region": "Sub-Saharan Africa",
    "South Sudan": "Sub-Saharan Africa",
    "Sudan": "Middle East and North Africa",
    "Suriname": "Latin America and Caribbean",
    "Syria": "Middle East and North Africa",
    "Trinidad and Tobago": "Latin America and Caribbean",
}

# Fill only the rows whose region is still NaN; .loc avoids chained assignment
mask = data["region"].isnull()
data.loc[mask, "region"] = data.loc[mask, "country"].map(missing_regions)

In [None]:
data.isnull().sum()

#### *Since there are no missing values left, we can move on to visualization.*

# Exploratory Data Analysis

### Overview

In [None]:
fig = px.box(data, x="ladder_score",color_discrete_sequence = ['red'],hover_data=["country"])
fig.update_layout(width=900,height=450,
xaxis_title_text = 'Ladder Score',
template='plotly_dark')
fig.show()

In [None]:
fig = px.box(data, x="freedom",color_discrete_sequence = ['mediumspringgreen'],hover_data=["country"])
fig.update_layout(width=900,height=450,
xaxis_title_text = 'Freedom',
template='plotly_dark')
fig.show()

In [None]:
fig = px.box(data[["corruption","country"]][data["corruption"] > 0], x="corruption",color_discrete_sequence = ['darkviolet'],hover_data=["country"])
fig.update_layout(width=900,height=450,
xaxis_title_text = 'Corruption',
template='plotly_dark')
fig.show()

In [None]:
fig = px.box(data[["social_support","country"]][data["social_support"] > 0], x="social_support",color_discrete_sequence = ['darkorange'],hover_data=["country"])
fig.update_layout(width=900,height=450,
xaxis_title_text = 'Social Support',
template='plotly_dark')
fig.show()

In [None]:
# numeric_only=True skips the text column (region), which cannot be averaged
group = data.groupby('country', as_index=False).mean(numeric_only=True).sort_values(by='ladder_score',ascending=False)

fig = make_subplots(rows=1, cols=2,
                   subplot_titles=['The Happiest 10 Countries', 'The Unhappiest 10 Countries'])
fig.append_trace(go.Bar(x=group['ladder_score'].head(10),
                       y=group["country"].head(10),
                        orientation='h',
                       marker={'color': '#3366ff','line': dict(color='#3366ff', width=1)},
                       ), 1,1
                )
fig.append_trace(go.Bar(x=group['ladder_score'].tail(10),
                       y=group['country'].tail(10),
                        orientation='h',
                        marker={'color': '#ff9933','line': dict(color='#ff9933', width=1)},
                       ), 1,2
                )
fig.update_layout(
    template='plotly_dark',
    showlegend=False)
fig.show()

In [None]:
# numeric_only=True skips the text column (region), which cannot be averaged
gdp = data.groupby('country', as_index=False).mean(numeric_only=True).sort_values(by='gdp_per_capita',ascending=False)
# Missing values were filled with 0, so drop zeros before taking the lowest 10
lowest_gdp = gdp[gdp['gdp_per_capita'] > 0].tail(10)

fig = make_subplots(rows=1, cols=2,
                   subplot_titles=['Top 10 Countries by GDP per Capita', 'Lowest 10 Countries by GDP per Capita'])
fig.append_trace(go.Bar(x=gdp['gdp_per_capita'].head(10),
                       y=gdp["country"].head(10),
                        orientation='h',
                       marker={'color': '#ccccff','line': dict(color='#ccccff', width=1)},
                       ), 1,1
                )
# Take values and labels from the same filtered frame so the bars match the countries
fig.append_trace(go.Bar(x=lowest_gdp['gdp_per_capita'],
                       y=lowest_gdp['country'],
                        orientation='h',
                        marker={'color': '#99ccff','line': dict(color='#99ccff', width=1)},
                       ), 1,2
                )
fig.update_layout(
    template='plotly_dark',
    showlegend=False,
    title_font_size=16)
fig.show()

In [None]:
# One row per country with its mean freedom score; a second groupby is not needed
free = data.groupby("country",as_index=False).mean(numeric_only=True).sort_values(by='freedom', ascending=False)

fig = make_subplots(rows=1, cols=2,
                   subplot_titles=['Top 10 Countries by Freedom', 'Lowest 10 Countries by Freedom'])
fig.append_trace(go.Bar(x=free['freedom'].head(10),
                       y=free["country"].head(10),
                        orientation='h'
                       ), 1,1
                )
fig.append_trace(go.Bar(x=free['freedom'].tail(10),
                       y=free['country'].tail(10),
                        orientation='h'
                       ), 1,2
                )
fig.update_layout(
    template='ggplot2',
    showlegend=False)
fig.show()

In [None]:
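# numeric_only=True skips the text column (region), which cannot be averaged
corruption = data.groupby(["country"],as_index=False).mean(numeric_only=True).sort_values(by="corruption", ascending=False)
# Missing values were filled with 0, so drop zeros before taking the lowest 10
lowest_corruption = corruption[corruption['corruption'] > 0].tail(10)

fig = make_subplots(rows=1, cols=2,
                   subplot_titles=['Top 10 Countries by Perceived Corruption', 'Lowest 10 Countries by Perceived Corruption'])
fig.append_trace(go.Bar(x=corruption['corruption'].head(10),
                       y=corruption["country"].head(10),
                        orientation='h'
                       ), 1,1
                )
# Take values and labels from the same filtered frame so the bars match the countries
fig.append_trace(go.Bar(x=lowest_corruption['corruption'],
                       y=lowest_corruption['country'],
                        orientation='h'
                       ), 1,2
                )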
fig.update_layout(
    template='seaborn',
    showlegend=False)
fig.show()

### Regional Analysis

#### GDP per capita and Life expectancy

In [None]:
fig = px.scatter(data, y="life_expectancy", x="gdp_per_capita", size="ladder_score", color="region",
           hover_name="country", animation_frame="year", animation_group="country", log_x=True, size_max=12, template="ggplot2", title="GDP per capita by regions")
fig.show()

In [None]:
for region in data.region.unique():
    fig = px.scatter(data[data["region"]==region], y="life_expectancy", x="gdp_per_capita", size="ladder_score", color="country",
           hover_name="country", animation_frame="year", animation_group="country", log_x=True, size_max=15, template="plotly_dark", title=region)
    fig.show()

#### Looking at these animated plots region by region, life expectancy tends to rise as gdp_per_capita rises.
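
A visual impression like this can be backed by a number. The sketch below computes a Pearson correlation per region on a small toy frame. The values and the region labels `A` and `B` are invented for illustration and do not come from the report; only the column names follow this notebook.

```python
import pandas as pd

# Toy values for illustration only, not taken from the report
toy = pd.DataFrame({
    "region": ["A", "A", "A", "B", "B", "B"],
    "gdp_per_capita": [7.0, 8.0, 9.0, 9.5, 10.0, 11.0],
    "life_expectancy": [52.0, 58.0, 63.0, 66.0, 70.0, 73.0],
})

# Pearson correlation between the two columns, computed separately per region
corr_by_region = {
    region: grp["gdp_per_capita"].corr(grp["life_expectancy"])
    for region, grp in toy.groupby("region")
}
print(corr_by_region)
```

On the notebook's own dataframe the same idea would presumably be applied to rows where both columns are greater than 0, since missing values were filled with 0 when the data was loaded.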

#### GDP per capita / Ladder Score

In [None]:
for region in data.region.unique():
    fig = px.scatter(data[data["region"]==region], y="ladder_score", x="gdp_per_capita", size="freedom", color="country",
           hover_name="country", animation_frame="year", animation_group="country", log_x=False, size_max=15, template="plotly_dark", title=region)
    fig.show()