 ###### ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 

 <h1><a id="Top">Analysis on World Happiness</a></h1>
<h5> Date Created: July 22, 2022<br>
     Team Members: Aastha Mittal, Marina Lui, Janna Fasheh, Mubai Li, Jesus Cruz</h5>

 ###### ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 



 <h2><a id="intro">Introduction</a></h2>

- Using the **World Happiness Report, the 2021 Happiness Index, and the 2020 Population** datasets, we explored how factors such as GDP, life expectancy, and social support relate to the happiness index of each country and continent. The happiness of countries and continents changes constantly in response to both positive and negative events. To understand which factors have the greatest impact, we chose to visualize data related to this topic.

- The goal of this notebook is to perform **exploratory data analysis** and learn what affects the happiness index of a country, and how. This is visualized through scatter plots, bar graphs, line plots, and more throughout the notebook.



 <h2><a id="questions">Research Questions</a></h2>



**Hypothesis:** Happiness is influenced by the factors explored in the following questions:

- Do social support, GDP, freedom, and corruption have an effect on the happiness of a country?
- How are life expectancy and happiness correlated?
- How different are continents' happiness?
- Does population affect happiness? Also, as the population changes over time, how does this affect happiness?



 <h2><a id="data">Data Description</a></h2>



For our analysis, we obtained three separate datasets:

- **World Happiness Report:** This dataset contains the world happiness report for 146 countries each year from 2008\-2020.
  - [https://www.kaggle.com/datasets/ajaypalsinghlo/world\-happiness\-report\-2021?select=world\-happiness\-report.csv](https://www.kaggle.com/datasets/ajaypalsinghlo/world-happiness-report-2021?select=world-happiness-report.csv)
- **2021 Happiness Index:** This dataset contains the world happiness report for 146 countries for the year 2021.
  - [https://www.kaggle.com/datasets/ajaypalsinghlo/world\-happiness\-report\-2021?select=world\-happiness\-report\-2021.csv](https://www.kaggle.com/datasets/ajaypalsinghlo/world-happiness-report-2021?select=world-happiness-report-2021.csv)
- **2020 Populations:** This dataset contains the population for 146 countries in the year 2020.
  - [https://www.kaggle.com/datasets/tanuprabhu/population\-by\-country\-2020](https://www.kaggle.com/datasets/tanuprabhu/population-by-country-2020)

We merged them together for our convenience, which resulted in a total of twelve columns used in our analysis, along with 146 different countries studied. 



 <h2><a id="packages">Importing required packages</a></h2>



In [5]:
import pandas as pd # data manipulation
import numpy as np # mathematical operations

#statistics
import statsmodels

# visuals
import plotly.express as px 
import plotly.graph_objects as go
import geopandas
import matplotlib.pyplot as plt
import plotly.figure_factory as ff

import pycountry
from IPython.display import Image


In [6]:
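# function required in the notebook

def alpha3code(column):
    """Return the ISO alpha-3 code for each country name, or 'None' when no match is found."""
    CODE = []
    for country in column:
        try:
            # depending on the pycountry version, an unknown name returns None or raises a LookupError
            code = pycountry.countries.get(name=country).alpha_3
            CODE.append(code)
        except (AttributeError, LookupError):
            CODE.append('None')
    return CODE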

<h2><a id="Top">Importing the Dataset</a></h2>



In [7]:
happy_2021_df = pd.read_csv('data/world-happiness-report-2021.csv')

In [8]:
happy_2021_df.head()

Unnamed: 0,Country name,Regional indicator,Ladder score,Standard error of ladder score,upperwhisker,lowerwhisker,Logged GDP per capita,Social support,Healthy life expectancy,Freedom to make life choices,Generosity,Perceptions of corruption,Ladder score in Dystopia,Explained by: Log GDP per capita,Explained by: Social support,Explained by: Healthy life expectancy,Explained by: Freedom to make life choices,Explained by: Generosity,Explained by: Perceptions of corruption,Dystopia + residual
0,Finland,Western Europe,7.842,0.032,7.904,7.78,10.775,0.954,72.0,0.949,-0.098,0.186,2.43,1.446,1.106,0.741,0.691,0.124,0.481,3.253
1,Denmark,Western Europe,7.62,0.035,7.687,7.552,10.933,0.954,72.7,0.946,0.03,0.179,2.43,1.502,1.108,0.763,0.686,0.208,0.485,2.868
2,Switzerland,Western Europe,7.571,0.036,7.643,7.5,11.117,0.942,74.4,0.919,0.025,0.292,2.43,1.566,1.079,0.816,0.653,0.204,0.413,2.839
3,Iceland,Western Europe,7.554,0.059,7.67,7.438,10.878,0.983,73.0,0.955,0.16,0.673,2.43,1.482,1.172,0.772,0.698,0.293,0.17,2.967
4,Netherlands,Western Europe,7.464,0.027,7.518,7.41,10.932,0.942,72.4,0.913,0.175,0.338,2.43,1.501,1.079,0.753,0.647,0.302,0.384,2.798


<h2><a id="Top">Data Cleaning</a></h2>

- We converted the 10 unique regional indicators into 6 continents, because such a fine regional division is unnecessary for our analysis.

- We dropped the extra columns that contained “Explained by” information.

- We renamed several columns so that their names are clearer: for example, “happy_score” replaced “Ladder score” and “Continents” replaced “Regional indicator”.

- We filled in missing information in the “Continents” column by creating “Oceania” and assigning “Australia” and “New Zealand” to it.



<h5>
    <a>Drop Columns</a>
</h5> 

In [9]:
columns_to_drop = ['Ladder score in Dystopia',
       'Explained by: Log GDP per capita', 'Explained by: Social support',
       'Explained by: Healthy life expectancy',
       'Explained by: Freedom to make life choices',
       'Explained by: Generosity', 'Explained by: Perceptions of corruption',
       'Dystopia + residual','Standard error of ladder score', 'upperwhisker', 'lowerwhisker']

happy_2021_df.drop(columns_to_drop, axis = 1, inplace = True)

In [10]:
happy_2021_df.rename(columns = {'Ladder score': 'happy_score'}, inplace = True)

<h5>
    <a>Change the unique regions to continents</a>
</h5> 

In [11]:
# Map each regional indicator to a continent; regions not listed here are left unchanged.
# Assigning the result back avoids relying on inplace changes to a selected column.
region_to_continent = {
    'Western Europe': 'Europe',
    'Central and Eastern Europe': 'Europe',
    'North America and ANZ': 'North America',
    'Latin America and Caribbean': 'South America',
    'Southeast Asia': 'Asia',
    'East Asia': 'Asia',
    'South Asia': 'Asia',
    'Sub-Saharan Africa': 'Africa',
    'Middle East and North Africa': 'Africa',
}
happy_2021_df['Regional indicator'] = happy_2021_df['Regional indicator'].replace(region_to_continent)

<h5>
    <a>Assigning some countries to their continents</a>
</h5> 

In [12]:
# Assign the remaining countries to their continents individually.
country_to_continent = {
    'Russia': 'Europe', 'Belarus': 'Europe', 'Ukraine': 'Europe',
    'Moldova': 'Europe', 'Georgia': 'Europe',
    'Azerbaijan': 'Asia', 'Uzbekistan': 'Asia', 'Kazakhstan': 'Asia',
    'Kyrgyzstan': 'Asia', 'Tajikistan': 'Asia', 'Turkmenistan': 'Asia',
    'Armenia': 'Asia',
}
for country, continent in country_to_continent.items():
    happy_2021_df.loc[happy_2021_df['Country name'] == country, 'Regional indicator'] = continent

In [13]:
# Australia and New Zealand form their own continent, Oceania.
oceania = happy_2021_df['Country name'].isin(['Australia', 'New Zealand'])
happy_2021_df.loc[oceania, 'Regional indicator'] = 'Oceania'

In [14]:
happy_2021_df.rename(columns = {'Regional indicator': 'Continents'}, inplace = True)

In [15]:
# create a column for code 
happy_2021_df_copy = happy_2021_df.copy()
happy_2021_df_copy['CODE']=alpha3code(happy_2021_df_copy['Country name'])

<h4><a>Data Columns</a></h4>

- **Country name:** The name of each country in the data frame
- **Continents:** The continent (one of six) that the country belongs to
- **Happy Score:** The happiness index of the country
- **Logged GDP per capita:** The logarithm of the GDP per capita of the country
- **Social support:** Social support index of the country, a decimal value that measures the amount of social support a person receives
- **Healthy life expectancy:** The healthy life expectancy in the country, in years
- **Freedom to make life choices:** Freedom index of the country
- **Generosity:** Generosity index of the country
- **Perceptions of corruption:** An index that measures the perception of corruption in the country



###### ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
<h1><a>Analysis</a></h1>

###### ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

<h2> <a id = "World Map"> World Happiness Index</a></h2> 

The following visual aims to show the difference in happiness index by country, which is indicated by the colors on the map. For example, darker colors indicate a lower happiness score, while lighter colors show a higher happiness score. Through the interactive world map, one can hover over and scroll through countries to understand their happiness index.



In [16]:
fig=px.choropleth(happy_2021_df_copy, locations="CODE", color="happy_score",
                 hover_name="Country name",color_continuous_scale=px.colors.sequential.Plasma,projection="natural earth")

fig.update_layout(title="World Happiness Index",
                 font_family="sans-serif",
                 title_font={"size":25})


fig.show()

<h2> <a id = "heatmap"> Heatmap</a> </h2>

- Using pandas and plotly, we created a heatmap from the 2021 Happiness Index dataset. The goal of the heatmap is to show which columns (factors) have a stronger correlation with “happy_score” in our dataset.
- Similar to the world map, darker colors show a lower (negative) correlation, while lighter colors show a higher (positive) correlation.
- As seen in the heatmap, **“Logged GDP per capita”** has the highest (positive) correlation with the happiness score, which tells us that the happiness score tends to increase as GDP per capita increases. The column **“Perceptions of corruption”** has the lowest (negative) correlation, which suggests that the happiness score tends to be lower where perceived corruption is higher.



In [17]:
df_corr = happy_2021_df.corr(numeric_only=True)  # skip text columns such as country and continent names

x = list(df_corr.columns)
y = list(df_corr.index)
z = np.array(df_corr)

fig = ff.create_annotated_heatmap(
    z,
    x = x,
    y = y ,
    annotation_text = np.around(z, decimals=2),
    hoverinfo='z',
    colorscale='sunset'
    )

fig.update_layout(
    autosize=False,
    width=700,
    height=600,
   )
fig.show()
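The heatmap's `happy_score` row can also be read as a ranked list. The following is a minimal sketch, assuming a small hand-copied sample (`sample_df`, a hypothetical name) built from the five `head()` rows shown earlier; `rank_correlations` is likewise an illustrative helper, and with only five top-ranked countries the values say nothing about the full dataset.

```python
import pandas as pd

# Five rows copied from the head() output above; a stand-in for happy_2021_df.
sample_df = pd.DataFrame({
    "Country name": ["Finland", "Denmark", "Switzerland", "Iceland", "Netherlands"],
    "happy_score": [7.842, 7.62, 7.571, 7.554, 7.464],
    "Logged GDP per capita": [10.775, 10.933, 11.117, 10.878, 10.932],
    "Social support": [0.954, 0.954, 0.942, 0.983, 0.942],
    "Perceptions of corruption": [0.186, 0.179, 0.292, 0.673, 0.338],
})

def rank_correlations(df, target="happy_score"):
    """Correlation of every numeric column with `target`, strongest positive first."""
    corr = df.corr(numeric_only=True)[target].drop(target)
    return corr.sort_values(ascending=False)

ranked = rank_correlations(sample_df)
print(ranked)
```

Calling `rank_correlations(happy_2021_df)` on the cleaned data frame should list the same values as the `happy_score` row of the heatmap, sorted from most positive to most negative.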


<h2> <a id = "GDP"> GDP and Happiness Score Variation by Continents in the Year 2021</a> </h2>

- To analyze which factors have an impact on the happiness index and answer our research questions, we developed a scatter plot to visualize the correlation between GDP and the happy score.
- We plotted the logged GDP per capita on the x-axis and the happy score on the y-axis. To further analyze this data, we assigned the “Continents” column to the color attribute and “Freedom to make life choices” to the marker size.
- GDP measures the monetary value of the goods and services a country produces. Countries like **Afghanistan** appear low on the plot, which may be linked to recent terrorist activity in the country that restricts freedom and decreases happiness.



In [18]:
px.scatter(happy_2021_df, 'Logged GDP per capita', 'happy_score', color = 'Continents', size = 'Freedom to make life choices',hover_data=['Country name'], title = "GDP and Happiness Score Variation by Continents", trendline = 'ols', trendline_scope="overall", template = 'plotly')
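The trendline in the plot can be summarized numerically. The following is a rough sketch using `numpy.polyfit` rather than the statsmodels fit that plotly performs for `trendline = 'ols'`; the six data points are copied from outputs printed in this notebook (the five `head()` rows plus Afghanistan), and `fit_line` is a hypothetical helper name. The numbers from six points are illustrative only.

```python
import numpy as np

# Six (logged GDP per capita, happy_score) pairs copied from this notebook's outputs.
log_gdp = np.array([10.775, 10.933, 11.117, 10.878, 10.932, 7.695])
happy = np.array([7.842, 7.62, 7.571, 7.554, 7.464, 2.523])

def fit_line(x, y):
    """Least-squares straight line through (x, y); returns slope, intercept and R^2."""
    slope, intercept = np.polyfit(x, y, deg=1)
    r = np.corrcoef(x, y)[0, 1]
    return slope, intercept, r ** 2

slope, intercept, r_squared = fit_line(log_gdp, happy)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={r_squared:.3f}")
```

A positive slope means the happy score rises with logged GDP per capita, and R² indicates how much of the variation the straight line accounts for.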

<h2> <a id = "avgcont"> Average Happiness Score of Continents</a> </h2>

- To analyze which continents have the highest and lowest average happiness scores, we plotted the following bar chart. To calculate the average happiness score for each continent, we used the “groupby” method to group the rows by continent and then took the mean.

- Excluding Oceania, which is represented by only 2 countries, North America has the highest average happiness score, while Africa has the lowest.

- A number of things could explain this graph. North America is home to advanced economies with access to a number of resources, a higher GDP, and a higher life expectancy, all of which are associated with a higher happiness score. Many countries in Africa, by contrast, lack access to modern resources and have a lower GDP, which likely lowers their happiness score and explains their position on the graph.



In [19]:
continent_avgscore = happy_2021_df.groupby('Continents')[['happy_score']].mean().reset_index()

In [20]:
continent_avgscore.sort_values(by=['happy_score'], ascending=False, inplace=True)

In [43]:
fig = px.bar(continent_avgscore, x='Continents', y='happy_score', title = "Average Happiness Score of Continents", template = "plotly")
fig.update_layout(
    autosize=False,
    width=1200,
    height=400,)
fig.show()
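Because some continents are represented by very few countries, it can help to report the group size next to each mean. The following is a minimal sketch, assuming a six-row sample (`sample_df`, a hypothetical name) copied from outputs printed in this notebook rather than the full data frame.

```python
import pandas as pd

# Six rows copied from this notebook's outputs; a stand-in for happy_2021_df.
sample_df = pd.DataFrame({
    "Country name": ["Finland", "Denmark", "Switzerland", "Iceland", "Netherlands", "Afghanistan"],
    "Continents": ["Europe"] * 5 + ["Asia"],
    "happy_score": [7.842, 7.62, 7.571, 7.554, 7.464, 2.523],
})

# Mean happy score and number of countries per continent, highest mean first.
summary = (
    sample_df.groupby("Continents")["happy_score"]
    .agg(["mean", "count"])
    .sort_values("mean", ascending=False)
    .reset_index()
)
print(summary)
```

A small `count` is a signal that the corresponding bar in the chart rests on only a handful of countries.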

<h2><a>Which country had the highest happiness index?</a></h2>



In [23]:
happy_2021_df[happy_2021_df['happy_score'] == happy_2021_df['happy_score'].max()]

Unnamed: 0,Country name,Continents,happy_score,Logged GDP per capita,Social support,Healthy life expectancy,Freedom to make life choices,Generosity,Perceptions of corruption
0,Finland,Europe,7.842,10.775,0.954,72.0,0.949,-0.098,0.186


<h2><a>Which country had the lowest happiness index?</a></h2>


In [24]:
happy_2021_df[happy_2021_df['happy_score'] == happy_2021_df['happy_score'].min()]

Unnamed: 0,Country name,Continents,happy_score,Logged GDP per capita,Social support,Healthy life expectancy,Freedom to make life choices,Generosity,Perceptions of corruption
148,Afghanistan,Asia,2.523,7.695,0.463,52.493,0.382,-0.102,0.924


<h2> <a id = "ss"> Relationship of Social Support and Happiness Index</a> </h2>

- Social support has a **positive correlation** with the happiness score. Each dot represents one country, colored by the continent it is located in, with the size of the dot representing the perception of corruption.
- Most African countries are clustered near the bottom of the plot, with a low happy score and low social support, while most South American and European countries have higher happy and social support scores.
- Countries in Asia, on the other hand, are scattered throughout the graph. Countries such as Finland, Denmark, and Norway could have **higher social support due to their small size, small population, and wealth**.



In [25]:
px.scatter(happy_2021_df, x = "Social support", y = "happy_score", hover_data = ["Country name"],color = 'Continents', size = 'Perceptions of corruption',trendline = "ols", trendline_scope = "overall", title="Relationship of Social Support and Happiness Index", template = 'plotly')

<h2> <a id = "life_expectancy"> How Happiness Affects Life Expectancy</a> </h2>

- This scatterplot shows a **positive correlation** between healthy life expectancy and happiness score for most countries.
- Lower life expectancy in most **African countries** may be a result of limited access to healthcare and welfare and of the HIV/AIDS epidemic.
- High life expectancies in **European countries** may be attributed to greater spending on healthcare and to the focus on, and access to, non-processed foods in these countries' diets.



In [26]:
px.scatter(happy_2021_df, x = 'Healthy life expectancy', y = 'happy_score', hover_data = ["Country name"], title = "How Happiness Affects Life Expectancy", size= "Social support", color = 'Continents', trendline = "ols", trendline_scope = "overall", template = 'plotly')

<h2> <a id = "Box Plot Continents">Box Plot of happiness scores for each continent </a> </h2>

- We used a box plot to compare the distribution of happy scores across continents. The **median of each continent's happiness is represented by the line in the center of each box**. The boxes and the lines protruding from them (the whiskers) represent the range of the quartiles. The dots are outlier countries whose happiness scores are extreme compared with the other countries in their continent.
- Asia and South America both have outliers extending from the bottom of their box plots, while Africa has four outliers extending from the top of its box plot.
- Africa has the lowest median happiness score.
- Asia contains the lowest happiness score.
- Europe contains the highest happiness score.
- Oceania and North America have high happiness scores, but with only a few countries in each, comparisons with other continents can be skewed.



In [27]:
px.box(happy_2021_df, x="Continents", y="happy_score", color = 'Continents')
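The outlier dots can also be listed by name. The following is a minimal sketch of the common 1.5 × IQR rule, assuming toy values that are not taken from the dataset; `iqr_outliers` is a hypothetical helper, and because plotly may compute quartiles slightly differently from `pandas.Series.quantile`, the flagged rows are not guaranteed to match the plot exactly.

```python
import pandas as pd

# Toy values for illustration only; they are NOT taken from the dataset.
toy_df = pd.DataFrame({
    "Continents": ["A"] * 7 + ["B"] * 5,
    "happy_score": [4.0, 4.2, 4.4, 4.5, 4.6, 4.8, 7.0,
                    5.0, 5.1, 5.2, 5.3, 5.4],
})

def iqr_outliers(df, group_col="Continents", value_col="happy_score"):
    """Return the rows lying more than 1.5 * IQR outside their group's quartiles."""
    grouped = df.groupby(group_col)[value_col]
    # transform broadcasts each group's quartile back onto that group's rows
    q1 = grouped.transform(lambda s: s.quantile(0.25))
    q3 = grouped.transform(lambda s: s.quantile(0.75))
    iqr = q3 - q1
    is_outlier = (df[value_col] < q1 - 1.5 * iqr) | (df[value_col] > q3 + 1.5 * iqr)
    return df[is_outlier]

outliers = iqr_outliers(toy_df)
print(outliers)
```

On the cleaned data, `iqr_outliers(happy_2021_df)` would return the full rows, including `Country name`, for the flagged countries.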

<h2> <a id = "Box Plot Continents">The Relationship between Freedom to make life choices and the Perception of Corruption in each country</a> </h2>

- Freedom to make life choices and perceived corruption do not show a clear linear trend.

- The majority of countries have a high perception of corruption, ranging from around 0.6 to 0.95, regardless of freedom to make life choices.

- Oceanian countries, along with European countries that score high on freedom to make life choices, have a much lower perception of corruption, which may be related to their lower populations.

- **Afghanistan** is an outlier, with both very low freedom to make life choices and high perceived corruption. The low freedom may be attributed to the Taliban, an extremist political group that has taken control of the government.

- On the opposite end, **Singapore** is the country with the highest freedom of choice and the lowest level of perceived corruption. Singapore is an independent state and one of the richest countries, which may be a major factor in its position on the graph.



In [28]:
px.scatter(happy_2021_df, x = 'Freedom to make life choices', y = 'Perceptions of corruption', hover_data = ["Country name"], title = "", size= "happy_score", color = 'Continents', template = 'plotly')

###### ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
<h2><a>Importing Population Dataframe</a></h2>

###### ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

In [29]:
population_df = pd.read_csv('data/population_by_country_2020.csv')

In [30]:
population_df.head()

Unnamed: 0,Country (or dependency),Population (2020),Yearly Change,Net Change,Density (P/Km²),Land Area (Km²),Migrants (net),Fert. Rate,Med. Age,Urban Pop %,World Share
0,China,1440297825,0.39 %,5540090,153,9388211,-348399.0,1.7,38,61 %,18.47 %
1,India,1382345085,0.99 %,13586631,464,2973190,-532687.0,2.2,28,35 %,17.70 %
2,United States,331341050,0.59 %,1937734,36,9147420,954806.0,1.8,38,83 %,4.25 %
3,Indonesia,274021604,1.07 %,2898047,151,1811570,-98955.0,2.3,30,56 %,3.51 %
4,Pakistan,221612785,2.00 %,4327022,287,770880,-233379.0,3.6,23,35 %,2.83 %


In [31]:
population_df.columns

Index(['Country (or dependency)', 'Population (2020)', 'Yearly Change',
       'Net Change', 'Density (P/Km²)', 'Land Area (Km²)', 'Migrants (net)',
       'Fert. Rate', 'Med. Age', 'Urban Pop %', 'World Share'],
      dtype='object')

In [32]:
population_df.rename(columns = {'Country (or dependency)': 'Country name'}, inplace = True)

In [33]:
population = population_df[['Population (2020)','Country name']]

In [34]:
happy_2021_df = pd.merge(happy_2021_df, population, on='Country name')

In [35]:
happy_2021_df.head(3)

Unnamed: 0,Country name,Continents,happy_score,Logged GDP per capita,Social support,Healthy life expectancy,Freedom to make life choices,Generosity,Perceptions of corruption,Population (2020)
0,Finland,Europe,7.842,10.775,0.954,72.0,0.949,-0.098,0.186,5542237
1,Denmark,Europe,7.62,10.933,0.954,72.7,0.946,0.03,0.179,5795780
2,Switzerland,Europe,7.571,11.117,0.942,74.4,0.919,0.025,0.292,8665615
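`pd.merge` performs an inner join by default, so any country whose name is spelled differently in the two files is dropped without warning. The following is a minimal sketch of how such mismatches could be listed; the frames `left` and `right` and the country "Exampleland" are hypothetical, and only the Finland and Denmark populations come from the output above.

```python
import pandas as pd

# Hypothetical stand-ins for the happiness and population frames.
left = pd.DataFrame({"Country name": ["Finland", "Denmark", "Exampleland"]})
right = pd.DataFrame({
    "Country name": ["Finland", "Denmark", "Exampleland (EX)"],  # same country, spelled differently
    "Population (2020)": [5542237, 5795780, 1000],
})

# A left join with indicator=True keeps every happiness row and records where it matched.
checked = left.merge(right, on="Country name", how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "Country name"].tolist()
print(unmatched)
```

Names that show up in `unmatched` could then be renamed in one of the frames before the real merge.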


<h2> <a id = "pop/happy"> Does greater population indicate a change in the happiness score?</a> </h2>

- To study whether a greater population indicates a change in a country’s happiness score, we created a scatterplot with the 2020 population on the x-axis (log scale) and the happiness score on the y-axis.
- This scatterplot shows **no clear relationship** between the population of a country and its happiness score. This suggests that the population of a country has little effect on its happiness score, and vice versa.
- A reason for this may be that a person’s happiness is dictated not by the population of their country but by the quality of life of its people.
- An example of this is **Afghanistan and Iraq**. Both have a population of almost 40 million people, but Afghanistan’s happiness score is 2.5, while Iraq’s is almost double, at 4.8.



In [38]:
fig = px.scatter(happy_2021_df, x = 'Population (2020)', y = 'happy_score', log_x=True,hover_data = ["Country name"], title = "Relationship of population and happiness score", color = 'Continents', template = 'plotly')
fig.update_traces(marker=dict(size=12,
                              line=dict(width=0,
                                        )),
                  selector=dict(mode='markers'))
fig.show()


<h2> <a id = "pop/corr/happy"> Population, Perception of Corruption, and Happiness Score</a> </h2>

- To study whether the population and the perception of corruption of a country are related to its happiness level, we created a scatterplot with the 2020 population on the x-axis and the perception of corruption on the y-axis, with the size of the dot indicating the happiness level.
- The scatterplot shows no clear positive or negative correlation, but there is a fairly clustered horizontal band centered at a perception of corruption of 0.8. This shows that the majority of countries have a **fairly similar perception of corruption**, no matter their population.
- However, many of the countries with a lower perception of corruption have a population of less than 100 million people. A reason for this may be that, with fewer people, it is easier to keep the perception of corruption low.
- We can also see the happiness index of each country from the size of its dot, which seems to have **no relation** to whether the country has a low or high perception of corruption. This can be seen with **Singapore and Turkmenistan**: both countries have a population of about 6 million people and a happiness score between 5 and 6. However, Singapore has the lowest perception of corruption, at 0.08, while Turkmenistan has a perception of corruption of almost 0.9.



In [39]:
px.scatter(happy_2021_df, x =  "Population (2020)" , y = "Perceptions of corruption",hover_data = ["Country name"],size = "happy_score", log_x = True, color = "Continents", title = "Variation in Corruption and Happiness Index with Population (2020)")

<h2> <a id = "pop/life/happy"> Population, Healthy Life Expectancy, and Happiness Score</a> </h2>

- To study whether the population and the healthy life expectancy of a country are related to its happiness level, we created a scatterplot with the population on the x-axis, the healthy life expectancy on the y-axis, and the happiness level denoted by the size of the dot.
- The plot shows **no clear relationship** between population and healthy life expectancy.
- However, the happiness level seems to **decrease** as the healthy life expectancy **decreases**. This can be seen in countries like **Chad and the Netherlands**. Both have a population of about 17 million people, but Chad’s healthy life expectancy is only 48 years, while the Netherlands’ is 72 years. Chad also has a happiness score of about 4.4, while the Netherlands has a happiness score of about 7.5.



In [44]:
px.scatter(happy_2021_df, x = 'Population (2020)', y = 'Healthy life expectancy',  color = 'Continents', size  = 'happy_score', hover_data = ["Country name"], log_x=True)

<h2> <a id = "gdp/life/happy"> GDP Per Capita, Healthy Life Expectancy, and Happiness Score</a> </h2>

- To study whether healthy life expectancy, GDP per capita, and happiness level are related, we created a 3-D plot with the healthy life expectancy on the x-axis, the logged GDP per capita on the y-axis, and the happiness score on the z-axis.
- There seems to be a **positive relationship** between these factors: as GDP and life expectancy increase, the happiness index increases as well. The relationship is not exact, however, as **Finland and Japan** show. Both countries have a logged GDP per capita of about 10.75, but Japan has a happiness score of 5.9 while Finland’s is about 7.8, even though the healthy life expectancy in Japan is 75 years and Finland’s is a bit lower, at 72 years.



In [45]:
px.scatter_3d(happy_2021_df, x =  "Healthy life expectancy" , y = "Logged GDP per capita", z='happy_score',
              color='Continents',hover_data = ["Country name"])

###### ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
<h2><a> Importing Happiness Dataset (2008-2019)</a></h2>

###### ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


In [46]:
happy_all_df = pd.read_csv('data/world-happiness-report.csv')

In [47]:
happy_all_df.columns

Index(['Country name', 'year', 'Life Ladder', 'Log GDP per capita',
       'Social support', 'Healthy life expectancy at birth',
       'Freedom to make life choices', 'Generosity',
       'Perceptions of corruption', 'Positive affect', 'Negative affect'],
      dtype='object')

In [48]:
continents = happy_2021_df[['Continents','Country name']]

In [49]:
happy_all_df.rename(columns={'Life Ladder': 'happy_score'}, inplace =True)

In [50]:
happy_all_df = pd.merge(happy_all_df, continents, on='Country name')

In [51]:
happy_all_df.head()

Unnamed: 0,Country name,year,happy_score,Log GDP per capita,Social support,Healthy life expectancy at birth,Freedom to make life choices,Generosity,Perceptions of corruption,Positive affect,Negative affect,Continents
0,Afghanistan,2008,3.724,7.37,0.451,50.8,0.718,0.168,0.882,0.518,0.258,Asia
1,Afghanistan,2009,4.402,7.54,0.552,51.2,0.679,0.19,0.85,0.584,0.237,Asia
2,Afghanistan,2010,4.758,7.647,0.539,51.6,0.6,0.121,0.707,0.618,0.275,Asia
3,Afghanistan,2011,3.832,7.62,0.521,51.92,0.496,0.162,0.731,0.611,0.267,Asia
4,Afghanistan,2012,3.783,7.705,0.521,52.24,0.531,0.236,0.776,0.71,0.268,Asia


<h2> <a id = "heatmap"> Happiness Score over the years </a> </h2>

- Worldwide, the happy score declined significantly in 2006, stabilized after that, and started to pick up in 2019.

- The relationship between happiness score, GDP, and years:

> Economic growth and job growth both fell in 2006 compared with previous years as the residential housing boom came to an end. _**The Great Recession**_ then followed and caused a global economic crisis. As the economy gradually recovered after 2009, the global happy score _**started to increase**_, then stabilized from 2012 to 2018. Surprisingly, the global happy score _**did not stop growing**_ during 2019 and 2020 despite Covid. The global pandemic was a hard time for everyone, but the hardship may also have motivated people to discover the good in life and pursue happiness.



In [52]:
year_score = happy_all_df.groupby('year')[['happy_score']].mean().reset_index()

In [53]:
fig = px.line(year_score,x = "year", y = "happy_score", text = 'happy_score')
fig.update_traces(textposition="bottom right")
fig.show()
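Declines and recoveries like those described above can be located numerically with a year-over-year difference. The following is a minimal sketch, assuming Afghanistan's 2008 to 2012 rows from the `head()` output as a stand-in for the global yearly means in `year_score`; the frame name `year_score_sample` is hypothetical.

```python
import pandas as pd

# Afghanistan's rows from the head() output, standing in for the yearly global means.
year_score_sample = pd.DataFrame({
    "year": [2008, 2009, 2010, 2011, 2012],
    "happy_score": [3.724, 4.402, 4.758, 3.832, 3.783],
})

# Change relative to the previous year (NaN for the first year).
year_score_sample["change"] = year_score_sample["happy_score"].diff()

# Row with the largest year-over-year drop.
largest_drop = year_score_sample.loc[year_score_sample["change"].idxmin()]
print(int(largest_drop["year"]), round(largest_drop["change"], 3))
```

Applying the same two lines to `year_score` would show which years carry the largest global declines and gains.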

<h2><a>Happiness over the year for continents</a></h2>

- This line graph shows the happy score of each continent over the years. The continents' happy scores follow different trends, except for a **common decrease** during the Great Recession from 2006 to 2009. In 2019, the happy score started to increase in most continents; only **South America** had a **significant decline**.



In [54]:
cont_year_score = happy_all_df.groupby(['Continents','year'])[['happy_score']].mean().reset_index()

In [55]:
fig = px.line(cont_year_score,x = "year", y = "happy_score" ,color =  'Continents')
fig.show()

<h2><a>Top 2 countries from each continent</a></h2>



In [56]:
India = happy_all_df[happy_all_df["Country name"] == "India"]
China = happy_all_df[happy_all_df["Country name"] == "China"]
USA = happy_all_df[happy_all_df["Country name"] == "United States"]
Canada = happy_all_df[happy_all_df["Country name"] == "Canada"]
Israel = happy_all_df[happy_all_df["Country name"] == "Israel"]
Australia = happy_all_df[happy_all_df["Country name"] == "Australia"]
NZ = happy_all_df[happy_all_df["Country name"] == "New Zealand"]
Finland = happy_all_df[happy_all_df["Country name"] == "Finland"]
Iceland = happy_all_df[happy_all_df["Country name"] == "Iceland"]
Uruguay = happy_all_df[happy_all_df["Country name"] == "Uruguay"]
Saudi_Arabia = happy_all_df[happy_all_df["Country name"] == "Saudi Arabia"]
Chile = happy_all_df[happy_all_df["Country name"] == "Chile"]

In [57]:
top_10_final = pd.concat([India,China, USA, Canada, Israel, Australia, NZ, Finland, Iceland, Uruguay, Saudi_Arabia, Chile])

In [58]:
top_10_final.head()

| | Country name | year | happy_score | Log GDP per capita | Social support | Healthy life expectancy at birth | Freedom to make life choices | Generosity | Perceptions of corruption | Positive affect | Negative affect | Continents |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 671 | India | 2006 | 5.348 | 8.145 | 0.707 | 55.72 | 0.774 | NaN | 0.855 | 0.687 | 0.199 | Asia |
| 672 | India | 2007 | 5.027 | 8.204 | 0.569 | 56.14 | 0.729 | -0.051 | 0.862 | 0.668 | 0.253 | Asia |
| 673 | India | 2008 | 5.146 | 8.22 | 0.684 | 56.56 | 0.756 | -0.072 | 0.891 | 0.674 | 0.259 | Asia |
| 674 | India | 2009 | 4.522 | 8.281 | 0.653 | 56.98 | 0.679 | -0.026 | 0.895 | 0.771 | 0.301 | Asia |
| 675 | India | 2010 | 4.989 | 8.349 | 0.605 | 57.4 | 0.783 | 0.058 | 0.863 | 0.697 | 0.267 | Asia |


<h2><a>Happy score over the years for the top 2 countries from each continent</a></h2>

- This line graph demonstrates the trend of happy scores in different countries with respect to years. 

- Each line's color represents a country, and the points along each line correspond to the year on the x-axis and the country's happy score in that year. The countries' happiness trends all differ, except that in 2019 (when Covid broke out) most top countries' happiness scores **decreased** or **did not increase much.**

- This differs from the general trend (both global and continental), because most of these top countries were **significantly impacted by the global pandemic.**



In [59]:
fig = px.line(top_10_final,x = "year", y = "happy_score",color = 'Country name')
fig.update_traces(textposition="bottom right")
fig.show()

<h2><a>GDP over the years in 12 countries</a></h2>

- This line graph shows the trend of GDP (measured as Log GDP per capita) in the 12 countries over the years. Besides the year and GDP on the x- and y-axes, each data point is labeled with the country's happy score in that year.

- The graph also lets us compare happy score and GDP within a country. For instance, there appears to be a **positive correlation between China's happy score and GDP**, but a **negative correlation between India's happy score and GDP.**



In [60]:
figs= px.line(top_10_final,x = "year", y = "Log GDP per capita", text= "happy_score",color = 'Country name')
figs.update_traces(textposition="bottom right")
figs.show()
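
The per-country relationship between GDP and happy score can also be checked numerically rather than read off the labels. The sketch below computes a Pearson correlation for each country on invented data: `toy_df` and the countries "X" and "Y" are hypothetical, and only the column names come from the notebook.

```python
import pandas as pd

# Hypothetical data: country X rises with GDP, country Y falls with GDP.
toy_df = pd.DataFrame({
    "Country name": ["X"] * 4 + ["Y"] * 4,
    "Log GDP per capita": [8.0, 8.1, 8.2, 8.3, 9.0, 9.1, 9.2, 9.3],
    "happy_score": [4.5, 4.7, 4.9, 5.1, 5.4, 5.2, 5.0, 4.8],
})

# Pearson correlation between GDP and happy score, computed within each country.
corr_by_country = (
    toy_df.groupby("Country name")[["Log GDP per capita", "happy_score"]]
    .apply(lambda g: g["Log GDP per capita"].corr(g["happy_score"]))
)

print(corr_by_country)
```

With only one observation per country per year, such correlations rest on few points, so they are better read as a description of the trend than as evidence of a causal effect.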

###### ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
<h2><a id="Conclusion">Conclusion</a></h2>

###### ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

- GDP has a significant effect on the happiness of a country. In the heatmap we see that “Logged GDP per Capita” has the highest correlation with the happiness score, and our graphs also show that GDP and happiness are closely related. Freedom also has an impact on the happiness of a country: it is not as strongly correlated with happiness as GDP, but the correlation is still positive. Corruption does not appear to have an impact on the happiness of a country, as seen in the heatmap.

- Life expectancy is significantly correlated with happiness in each country. With low life expectancy, people die more frequently, which can result in the trauma of losing someone and grieving. This can be observed in the Life Expectancy and Happiness scatter plot. Life expectancy can be influenced by many factors such as healthcare, disease, diet, pollution, and more. In most African countries, low life expectancy can be attributed to a lack of access to healthcare, while in most European countries more money is put into healthcare, which is a major factor behind their high scores.

- Generally, continents with more **developed countries** have **significantly higher happy scores**, for instance **North America, Europe and Oceania**, because countries in those continents have higher GDP, more social support and more freedom of life choices. The other important factor is **population**. The two **most populated countries** in the world, China and India, are both located in Asia and **have relatively low happiness scores**, because a massive population places a heavy load on social welfare and quality of life and creates a fiercely competitive social environment. As an opposite example, the happiest country in the world, Finland, is located in Europe and has a population of only about 5.5 million.

- Population **does not affect** happiness levels. In the scatterplot “Does greater population indicate a change in the happiness score?” we see no correlation between the two variables. However, the happiness score seems to **change dramatically** over time as the population changes. In the line plot “Happiness Score over the years”, the happiness score starts quite high in 2005, at around 6.5, then drops dramatically in 2006 to 5.2. It then stays fairly constant at around 5.5 until 2019, when it begins to increase again.

- So yes, according to this data, the factors we analyzed above can affect happiness.



<h4><a id="nextsteps">Next Steps</a></h4>



- For further analysis, we can include more indices, if available, and add more information.

- Some countries were not present in the dataset, such as some African countries and North Korea.

- Extend the scope of the analysis to more direct factors such as crime, agriculture, transportation and disease.

###### ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

In [61]:
Image(url="minions.png", width=1200, height=600)