## Data from World Happiness Report

The World Happiness Report is an annual publication of the United Nations Sustainable Development Solutions Network. It contains articles and rankings of national happiness based on respondents' ratings of their own lives, which the report also correlates with various life factors.

In this notebook we will explore the happiness of different countries and the features associated with it.
The datasets that we will use are available in *Data*: **happiness2020.csv** and **countries_info.csv**.

Although the features are self-explanatory, here is a summary:

**happiness2020.csv**
* country: *Name of the country*
* happiness_score: *Happiness score*
* social_support: *Social support (mitigating the effects of inequality)*
* healthy_life_expectancy: *Healthy Life Expectancy*
* freedom_of_choices: *Freedom to make life choices*
* generosity: *Generosity (charity, volunteers)*
* perception_of_corruption: *Corruption Perception*
* world_region: *Area of the world of the country*

**countries_info.csv**
* country_name: *Name of the country*
* area: *Area in sq mi*
* population: *Number of people*
* literacy: *Literacy percentage*

In [1]:

!head Data/countries_info.csv

country_name,area,population,literacy
afghanistan,647500,31056997,"36,0"
albania,28748,3581655,"86,5"
algeria,2381740,32930091,"70,0"
argentina,2766890,39921833,"97,1"
armenia,29800,2976372,"98,6"
australia,7686850,20264082,"100,0"
austria,83870,8192880,"98,0"
azerbaijan,86600,7961619,"97,0"
bahrain,665,698585,"89,1"


In [2]:
import pandas as pd
%matplotlib inline

DATA_FOLDER = 'Data/'

HAPPINESS_DATASET = DATA_FOLDER + "happiness2020.csv"
COUNTRIES_DATASET = DATA_FOLDER + "countries_info.csv"

## Task 1: Load the data

Load the two datasets into pandas dataframes (called *happiness* and *countries*), and show the first rows.


**Hint**: Use the correct reader and verify the data has the expected format.

In [17]:
happiness = pd.read_csv(HAPPINESS_DATASET)
print(happiness.dtypes)
print(happiness.describe())
happiness

country                      object
happiness_score             float64
social_support              float64
healthy_life_expectancy     float64
freedom_of_choices          float64
generosity                  float64
perception_of_corruption    float64
world_region                 object
dtype: object
       happiness_score  social_support  healthy_life_expectancy  \
count       135.000000      135.000000               135.000000   
mean          5.525062        0.815165                64.762495   
std           1.123414        0.116311                 6.694776   
min           2.566900        0.468671                48.003624   
25%           4.749000        0.740405                59.809444   
50%           5.541500        0.836419                66.480164   
75%           6.292700        0.910313                69.145870   
max           7.808700        0.974670                76.804581   

       freedom_of_choices  generosity  perception_of_corruption  
count          135.000000  1

,country,happiness_score,social_support,healthy_life_expectancy,freedom_of_choices,generosity,perception_of_corruption,world_region
0,Afghanistan,2.5669,0.470367,52.590000,0.396573,-0.096429,0.933687,South Asia
1,Albania,4.8827,0.671070,68.708138,0.781994,-0.042309,0.896304,Central and Eastern Europe
2,Algeria,5.0051,0.803385,65.905174,0.466611,-0.121105,0.735485,Middle East and North Africa
3,Argentina,5.9747,0.900568,68.803802,0.831132,-0.194914,0.842010,Latin America and Caribbean
4,Armenia,4.6768,0.757479,66.750656,0.712018,-0.138780,0.773545,Commonwealth of Independent States
...,...,...,...,...,...,...,...,...
130,Venezuela,5.0532,0.890408,66.505341,0.623278,-0.169091,0.837038,Latin America and Caribbean
131,Vietnam,5.3535,0.849987,67.952736,0.939593,-0.094533,0.796421,Southeast Asia
132,Yemen,3.5274,0.817981,56.727283,0.599920,-0.157735,0.800288,Middle East and North Africa
133,Zambia,3.7594,0.698824,55.299377,0.806500,0.078037,0.801290,Sub-Saharan Africa


In [28]:


countries = pd.read_csv(COUNTRIES_DATASET)
countries.literacy = pd.to_numeric(countries.literacy.str.replace(",", "."), downcast="float")
print(countries.dtypes)
print(countries.describe())
countries

country_name     object
area              int64
population        int64
literacy        float32
dtype: object
               area    population    literacy
count  1.350000e+02  1.350000e+02  133.000000
mean   9.007829e+05  4.552204e+07   81.851143
std    2.244994e+06  1.505270e+08   20.514482
min    3.160000e+02  2.993880e+05   17.600000
25%    6.540500e+04  4.636146e+06   70.000000
50%    2.375000e+05  1.023546e+07   90.900002
75%    7.000570e+05  2.967980e+07   98.400002
max    1.707520e+07  1.313974e+09  100.000000


,country_name,area,population,literacy
0,afghanistan,647500,31056997,36.000000
1,albania,28748,3581655,86.500000
2,algeria,2381740,32930091,70.000000
3,argentina,2766890,39921833,97.099998
4,armenia,29800,2976372,98.599998
...,...,...,...,...
130,venezuela,912050,25730435,93.400002
131,vietnam,329560,84402966,90.300003
132,yemen,527970,21456188,50.200001
133,zambia,752614,11502010,80.599998
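
As an alternative to converting `literacy` after loading, the decimal comma can be handled by the reader itself. The following is a minimal sketch, assuming the file looks like the `head` output above; the in-memory `sample_csv` and the name `countries_sample` are only there to make the example self-contained.

```python
import io

import pandas as pd

# First rows of countries_info.csv, copied from the `head` output above.
sample_csv = io.StringIO(
    'country_name,area,population,literacy\n'
    'afghanistan,647500,31056997,"36,0"\n'
    'albania,28748,3581655,"86,5"\n'
    'algeria,2381740,32930091,"70,0"\n'
)

# decimal="," tells the parser that the comma is the decimal separator,
# so literacy is read as a float column without a separate conversion step.
countries_sample = pd.read_csv(sample_csv, decimal=",")
print(countries_sample.dtypes)
```

With the real file, the same keyword would be passed as `pd.read_csv(COUNTRIES_DATASET, decimal=",")`.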


## Task 2: Let's merge the data

Create a dataframe called *country_features* by merging *happiness* and *countries*. A row of this dataframe must describe all the features that we have about a country.

**Hint**: Verify that the final dataframe contains all the rows.

In [43]:
happiness.country = happiness.country.str.lower()
merged = pd.merge(happiness, countries, left_on="country", right_on="country_name")
assert len(merged) == len(happiness)
assert len(merged) == len(countries)
merged

,country,happiness_score,social_support,healthy_life_expectancy,freedom_of_choices,generosity,perception_of_corruption,world_region,country_name,area,population,literacy
0,afghanistan,2.5669,0.470367,52.590000,0.396573,-0.096429,0.933687,South Asia,afghanistan,647500,31056997,36.000000
1,albania,4.8827,0.671070,68.708138,0.781994,-0.042309,0.896304,Central and Eastern Europe,albania,28748,3581655,86.500000
2,algeria,5.0051,0.803385,65.905174,0.466611,-0.121105,0.735485,Middle East and North Africa,algeria,2381740,32930091,70.000000
3,argentina,5.9747,0.900568,68.803802,0.831132,-0.194914,0.842010,Latin America and Caribbean,argentina,2766890,39921833,97.099998
4,armenia,4.6768,0.757479,66.750656,0.712018,-0.138780,0.773545,Commonwealth of Independent States,armenia,29800,2976372,98.599998
...,...,...,...,...,...,...,...,...,...,...,...,...
130,venezuela,5.0532,0.890408,66.505341,0.623278,-0.169091,0.837038,Latin America and Caribbean,venezuela,912050,25730435,93.400002
131,vietnam,5.3535,0.849987,67.952736,0.939593,-0.094533,0.796421,Southeast Asia,vietnam,329560,84402966,90.300003
132,yemen,3.5274,0.817981,56.727283,0.599920,-0.157735,0.800288,Middle East and North Africa,yemen,527970,21456188,50.200001
133,zambia,3.7594,0.698824,55.299377,0.806500,0.078037,0.801290,Sub-Saharan Africa,zambia,752614,11502010,80.599998
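
The length assertions tell us *that* rows were lost, but not *which* ones. One way to find them is an outer merge with `indicator=True`, which adds a `_merge` column saying where each row came from. This is a sketch on tiny hypothetical frames; "Atlantis" is a made-up entry added only to force a mismatch.

```python
import pandas as pd

# Tiny hypothetical frames; "Atlantis" is made up to force a mismatch.
happiness_sample = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Atlantis"],
    "happiness_score": [2.5669, 4.8827, 5.0],
})
countries_sample = pd.DataFrame({
    "country_name": ["afghanistan", "albania", "algeria"],
    "population": [31056997, 3581655, 32930091],
})

# Normalise the key, then keep unmatched rows from both sides and tag their origin.
happiness_sample["country"] = happiness_sample["country"].str.lower()
check = happiness_sample.merge(
    countries_sample,
    left_on="country",
    right_on="country_name",
    how="outer",
    indicator=True,
)

# Rows that did not find a partner on the other side.
unmatched = check[check["_merge"] != "both"]
print(unmatched[["country", "country_name", "_merge"]])
```

If `unmatched` is empty on the real data, the inner merge used above has not dropped anything.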


## Task 3: Where are people happier?

Print the top 10 countries based on their happiness score (high is better).

In [45]:
merged.sort_values(["happiness_score"], ascending=False).head(10)

,country,happiness_score,social_support,healthy_life_expectancy,freedom_of_choices,generosity,perception_of_corruption,world_region,country_name,area,population,literacy
38,finland,7.8087,0.95433,71.900825,0.949172,-0.059482,0.195445,Western Europe,finland,338145,5231372,100.0
31,denmark,7.6456,0.955991,72.402504,0.951444,0.066202,0.168489,Western Europe,denmark,43094,5450661,100.0
115,switzerland,7.5599,0.942847,74.102448,0.921337,0.105911,0.303728,Western Europe,switzerland,41290,7523934,99.0
50,iceland,7.5045,0.97467,73.0,0.948892,0.246944,0.71171,Western Europe,iceland,103000,299388,99.900002
92,norway,7.488,0.952487,73.200783,0.95575,0.134533,0.263218,Western Europe,norway,323802,4610820,100.0
87,netherlands,7.4489,0.939139,72.300919,0.908548,0.207612,0.364717,Western Europe,netherlands,41526,16491461,99.0
114,sweden,7.3535,0.926311,72.600769,0.939144,0.111615,0.25088,Western Europe,sweden,449964,9016596,99.0
88,new zealand,7.2996,0.949119,73.202629,0.936217,0.191598,0.221139,North America and ANZ,new zealand,268680,4076140,99.0
6,austria,7.2942,0.928046,73.002502,0.899989,0.085429,0.499955,Western Europe,austria,83870,8192880,98.0
72,luxembourg,7.2375,0.906912,72.599998,0.905636,-0.004621,0.367084,Western Europe,luxembourg,2586,474413,100.0


We want to know in which world region people are happiest.

Create and print a dataframe with (1) the average happiness score and (2) the number of countries for each world region.
Sort the result to show the happiness ranking.

In [82]:
# region_happiness = merged.groupby(["world_region"]).happiness_score.mean().sort_values(ascending=False)
# region_happiness
merged.groupby(["world_region"])[["happiness_score", "social_support"]].agg(["mean", "count"]).sort_values(
    ("happiness_score", "mean"), ascending=False)

world_region,happiness_score (mean),happiness_score (count),social_support (mean),social_support (count)
North America and ANZ,7.173525,4,0.933842,4
Western Europe,6.967405,20,0.917773,20
Latin America and Caribbean,5.97128,20,0.853971,20
Central and Eastern Europe,5.891393,14,0.883027,14
Southeast Asia,5.517788,8,0.829054,8
East Asia,5.483633,3,0.87315,3
Commonwealth of Independent States,5.358342,12,0.856729,12
Middle East and North Africa,5.269306,16,0.794934,16
Sub-Saharan Africa,4.393856,32,0.694164,32
South Asia,4.355083,6,0.674968,6
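
If only the two requested columns are needed, named aggregation gives a flat result without the two-level header. The sketch below uses a handful of rows copied from the tables above; the output column names `mean_happiness` and `n_countries` are arbitrary choices.

```python
import pandas as pd

# A few rows copied from the tables above.
sample = pd.DataFrame({
    "country": ["finland", "denmark", "new zealand", "afghanistan"],
    "world_region": [
        "Western Europe",
        "Western Europe",
        "North America and ANZ",
        "South Asia",
    ],
    "happiness_score": [7.8087, 7.6456, 7.2996, 2.5669],
})

# One output column per (input column, aggregation) pair.
region_summary = (
    sample.groupby("world_region")
    .agg(
        mean_happiness=("happiness_score", "mean"),
        n_countries=("country", "count"),
    )
    .sort_values("mean_happiness", ascending=False)
)
print(region_summary)
```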


The first region has only a few countries! Which countries are they, and what are their scores?

In [63]:
# Region with the highest mean happiness score
first = merged.groupby("world_region").happiness_score.mean().idxmax()
merged.loc[merged.world_region == first, ["country", "happiness_score"]]

## Task 4: How literate is the world?

Print the names of the countries with a literacy level of 100%.

For each country, print the name and the world region with the format: *{region name} - {country name} ({happiness score})*

In [None]:
# Write your code here
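
One possible approach, sketched on three rows copied from the top-10 table above: filter with a boolean mask, then build each line with an f-string. The names `sample`, `fully_literate` and `lines` are illustrative only.

```python
import pandas as pd

# Three rows copied from the top-10 table above.
sample = pd.DataFrame({
    "country": ["finland", "denmark", "switzerland"],
    "world_region": ["Western Europe"] * 3,
    "happiness_score": [7.8087, 7.6456, 7.5599],
    "literacy": [100.0, 100.0, 99.0],
})

# Keep only the countries whose literacy is exactly 100%.
fully_literate = sample[sample["literacy"] == 100]

# itertuples() exposes each column as an attribute of the row.
lines = [
    f"{row.world_region} - {row.country} ({row.happiness_score})"
    for row in fully_literate.itertuples()
]
print("\n".join(lines))
```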

What is the global average?

In [None]:
# Write your code here

Calculate the proportion of countries with a literacy level below 50%. Print the value as a percentage, formatted with 2 decimals.

In [None]:
# Write your code here
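
A sketch of one way to do this. The mean of a boolean Series is the share of `True` values, and the `.2%` format specifier takes care of the percentage formatting. Since `literacy` has missing values (133 non-null out of 135 in the `describe()` output above), the example includes one missing entry to show that the choice of denominator matters; the values themselves are a small illustrative sample.

```python
import pandas as pd

# Literacy values from rows shown above, plus one missing value
# (added here for illustration).
literacy = pd.Series([36.0, 86.5, 70.0, 97.1, None], name="literacy")

# Comparisons with NaN are False, so this counts missing values
# as "not below 50%" and divides by all countries.
share_all = (literacy < 50).mean()

# Alternative: ignore the countries with unknown literacy.
share_known = (literacy.dropna() < 50).mean()

print(f"{share_all:.2%}")
print(f"{share_known:.2%}")
```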

Print the raw number and the percentage of world population that is illiterate.

In [None]:
# Write your code here
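
One reading of this question, sketched on three rows from the tables above: estimate each country's illiterate population as `population * (100 - literacy) / 100` and sum over countries. This assumes the literacy rate applies to the whole population, which is a simplification.

```python
import pandas as pd

# Three rows copied from the countries table above.
sample = pd.DataFrame({
    "country": ["afghanistan", "albania", "algeria"],
    "population": [31056997, 3581655, 32930091],
    "literacy": [36.0, 86.5, 70.0],
})

# Estimated number of illiterate people per country, summed up.
# Countries with missing literacy would yield NaN and be skipped by sum().
illiterate = (sample["population"] * (100 - sample["literacy"]) / 100).sum()

# Share of the total population in the sample.
share = illiterate / sample["population"].sum()

print(f"{illiterate:,.0f} people ({share:.2%})")
```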

## Task 5: Population density

Add a new field called *population_density* to the dataframe, computed by dividing *population* by *area*.

In [None]:
# Write your code here
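
Column arithmetic in pandas is element-wise, so the new field is a single division. A minimal sketch on three rows taken from the top-10 table above:

```python
import pandas as pd

# Three rows copied from the top-10 table above.
sample = pd.DataFrame({
    "country": ["iceland", "luxembourg", "finland"],
    "area": [103000, 2586, 338145],  # sq mi, as in countries_info.csv
    "population": [299388, 474413, 5231372],
})

# People per square mile, computed row by row.
sample["population_density"] = sample["population"] / sample["area"]
print(sample)
```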

What are the happiness scores of the 3 countries with the lowest population density?

In [None]:
# Write your code here
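
`DataFrame.nsmallest` returns the rows with the smallest values in a column, already sorted. The sketch below recomputes the density on four rows from the top-10 table so that it runs on its own; `sparsest` is an illustrative name.

```python
import pandas as pd

# Four rows copied from the top-10 table above.
sample = pd.DataFrame({
    "country": ["iceland", "luxembourg", "finland", "austria"],
    "happiness_score": [7.5045, 7.2375, 7.8087, 7.2942],
    "area": [103000, 2586, 338145, 83870],
    "population": [299388, 474413, 5231372, 8192880],
})

# Add the density, then keep the three least dense countries.
sparsest = sample.assign(
    population_density=sample["population"] / sample["area"]
).nsmallest(3, "population_density")

print(sparsest[["country", "happiness_score", "population_density"]])
```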

## Task 6: Healthy and happy?

Draw a scatter plot of the happiness score (x) against healthy life expectancy (y).

In [None]:
# Write your code here
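
pandas wraps matplotlib, so a scatter plot can be drawn directly from the dataframe. A minimal sketch, assuming matplotlib is installed, on five rows copied from the happiness table above; the title is an arbitrary choice.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Five rows copied from the happiness table above.
sample = pd.DataFrame({
    "happiness_score": [2.5669, 4.8827, 5.0051, 5.9747, 7.8087],
    "healthy_life_expectancy": [52.59, 68.708138, 65.905174, 68.803802, 71.900825],
})

# plot.scatter returns the matplotlib Axes, which can be customised further.
ax = sample.plot.scatter(x="happiness_score", y="healthy_life_expectancy")
ax.set_title("Happiness vs. healthy life expectancy")
```

In the notebook, `%matplotlib inline` displays the figure below the cell.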

Feel free to continue the exploration of the dataset! We'll release the solutions next week.

----
Enjoy EPFL and be happy, next year Switzerland must be #1.