<a href="https://colab.research.google.com/github/njaincode/python_for_data_science/blob/main/How_happy_is_the_world.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# How happy is the world?

![HappinessImage-Benjamin Scott](https://drive.google.com/uc?id=1wWdXTclLAjPpZUiKkR8XGg0Sl7iD1u8T)  

(Image by Benjamin Scott, [source](https://www.natureindex.com/news-blog/data-visualization-these-are-the-happiest-countries-world-happiness-report-twenty-nineteen))   

The Sustainable Development Solutions Network (SDSN) compiles data from across the world relating to happiness and uses it to rank countries by happiness score.

This is not an exact science, but it can give food for thought about which factors might have the most impact on a nation's happiness.

The data is taken from the Gallup World Poll, so it is not collected directly by SDSN. Countries are grouped by region.

### The factors included are:
- Economy (measured as GDP per capita)
- Family (support systems)
- Health (measured by life expectancy)
- Freedom (sense of freedom)
- Trust (government corruption)
- Generosity (charitable inclinations)
- Dystopia Residual
  - Dystopia is a hypothetical least happy country, with the lowest levels of all six of the factors above
  - The residual measure is calculated as the average of the six distances from the lowest levels

Let's take a look at the data.


---
### Open a data set

Open the data set, an Excel file with only one sheet (so the `sheet_name` argument is not needed), from here: https://github.com/futureCodersSE/working-with-data/blob/main/Happiness-Data/2015.xlsx?raw=true

Interrogate the data (`head()`, `tail()`, `iloc`) to get to know what it contains.


```python
import pandas as pd

def happiness_stat():
  excel_url = 'https://github.com/futureCodersSE/working-with-data/blob/main/Happiness-Data/2015.xlsx?raw=true'
  df_hap_2015 = pd.read_excel(excel_url)

  # Set some display options when run
  pd.options.display.max_rows = 20
  pd.options.display.max_columns = 12

  # Show summary information - number of rows, columns and their data types.
  # info() prints its report itself and returns None, so call it directly
  # rather than placing it inside an f-string.
  # df_hap_2015.info()

  # Describe the dataset - shows count/mean/std/min/quartiles/max
  # print(f'Happiness data description \n {df_hap_2015.describe()}')

  # Display head
  print(f'Happiness data head \n {df_hap_2015.head()}')

  # Display tail
  print(f'Happiness data tail \n {df_hap_2015.tail()}')

# Call the function at the top level, not inside its own body
happiness_stat()
```

```
Happiness data head 
        Country          Region  Happiness Rank  Happiness Score  \
0  Switzerland  Western Europe               1            7.587   
1      Iceland  Western Europe               2            7.561   
2      Denmark  Western Europe               3            7.527   
3       Norway  Western Europe               4            7.522   
4       Canada   North America               5            7.427   

   Standard Error  Economy (GDP per Capita)   Family  \
0         0.03411                   1.39651  1.34951   
1         0.04884                   1.30232  1.40223   
2         0.03328                   1.32548  1.36058   
3         0.03880                   1.45900  1.33095   
4         0.03553                   1.32629  1.32261   

   Health (Life Expectancy)  Freedom  Trust (Government Corruption)  \
0                   0.94143  0.66557                        0.41978   
1                   0.94784  0.62877                        0.14145   
2                   0.874
```
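The task also mentions `iloc`, which the cell above does not use. The sketch below shows position-based selection; to keep it runnable without downloading the file, it uses a hypothetical stand-in DataFrame, `df_sample`, built from a few columns of the five rows shown in the `head()` output.

```python
import pandas as pd

# Hypothetical stand-in for df_hap_2015: a few columns from the five rows
# shown in the head() output, so the sketch runs without downloading the file.
df_sample = pd.DataFrame({
    "Country": ["Switzerland", "Iceland", "Denmark", "Norway", "Canada"],
    "Region": ["Western Europe"] * 4 + ["North America"],
    "Happiness Rank": [1, 2, 3, 4, 5],
    "Happiness Score": [7.587, 7.561, 7.527, 7.522, 7.427],
    "Family": [1.34951, 1.40223, 1.36058, 1.33095, 1.32261],
})

# iloc selects by integer position; a single row comes back as a Series
first_row = df_sample.iloc[0]
print(first_row)

# Rows at positions 1 and 2 (the stop position 3 is excluded),
# and the columns at positions 0 and 1 (Country and Region)
middle_rows = df_sample.iloc[1:3, 0:2]
print(middle_rows)

# Negative positions count from the end, much like tail(1)
last_country = df_sample.iloc[-1]["Country"]
print(last_country)
```

On the real data, the same calls work on `df_hap_2015` inside `happiness_stat()`.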

---
### Sort the data in different ways

The data is currently sorted by rank. To sort it on a different column, run the code below; the column to sort on goes inside the brackets of `sort_values()`.

Then, **try sorting on other columns**. *Note: you must type the column heading inside the quotes exactly as it appears in the table (including capitalisation).* To sort on multiple columns, enter a list of column headings in the brackets (e.g. `.sort_values(['Region', 'Freedom'])`).



```python
import pandas as pd

def happiness_stat_sort():
  excel_url = 'https://github.com/futureCodersSE/working-with-data/blob/main/Happiness-Data/2015.xlsx?raw=true'
  df_hap_2015 = pd.read_excel(excel_url)

  # Set some display options when run
  pd.options.display.max_rows = 20
  pd.options.display.max_columns = 12

  # Sort by Family score, highest first, and show only a few columns
  display_key = ["Country", "Happiness Rank", "Family"]
  sorted_family_table = df_hap_2015.sort_values(['Family'], ascending=False)
  print('sorted_family_table \n')
  print(sorted_family_table[display_key])

happiness_stat_sort()
```

```
sorted_family_table 

                      Country  Happiness Rank   Family
1                     Iceland               2  1.40223
17                    Ireland              18  1.36948
2                     Denmark               3  1.36058
0                 Switzerland               1  1.34951
43                 Uzbekistan              44  1.34043
..                        ...             ...      ...
116                     India             117  0.38174
154                     Benin             155  0.35386
152               Afghanistan             153  0.30285
157                      Togo             158  0.13995
147  Central African Republic             148  0.00000

[158 rows x 3 columns]
```
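The multi-column sort suggested in the instructions can be sketched as follows. The stand-in DataFrame `df_freedom` is hypothetical: it holds only the ten Region/Freedom pairs (with their original index labels) printed in the Freedom output of this notebook, so the result illustrates the ordering rather than the full table. Passing a list to `ascending` sorts regions A to Z and, within each region, Freedom from highest to lowest.

```python
import pandas as pd

# Hypothetical stand-in: ten Region/Freedom pairs (with their index labels)
# taken from the Freedom-sorted output in this notebook.
df_freedom = pd.DataFrame(
    {
        "Region": [
            "Western Europe", "Western Europe", "Southeastern Asia",
            "Western Europe", "Central and Eastern Europe", "Sub-Saharan Africa",
            "Sub-Saharan Africa", "Central and Eastern Europe", "Western Europe",
            "Middle East and Northern Africa",
        ],
        "Freedom": [0.66973, 0.66557, 0.66246, 0.65980, 0.65821,
                    0.10384, 0.10081, 0.09245, 0.07699, 0.00000],
    },
    index=[3, 0, 144, 7, 43, 136, 117, 95, 101, 111],
)

# Sort by Region (A to Z), then by Freedom (highest first) within each region
by_region = df_freedom.sort_values(["Region", "Freedom"], ascending=[True, False])
print(by_region)

# Column headings must match exactly: the wrong capitalisation raises KeyError
try:
    df_freedom.sort_values(["freedom"])
    lowercase_accepted = True
except KeyError:
    lowercase_accepted = False
print("Lower-case 'freedom' accepted?", lowercase_accepted)
```

The `try`/`except` part shows why the note about typing headings exactly matters: pandas does not guess the intended column.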


---
### Summarising the data

Look at the happiness DataFrame and create new DataFrames from ranges of rows, selected columns, statistical summaries, and so on.

For each DataFrame, add a text cell to explain what it is showing.
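A minimal sketch of the kinds of new DataFrame described above. It assumes a hypothetical stand-in, `df_sample`, holding a few columns of the five rows shown in the earlier `head()` output, so it runs offline; with the real file, replace `df_sample` with `df_hap_2015`.

```python
import pandas as pd

# Hypothetical stand-in for df_hap_2015, using the five rows from the head() output
df_sample = pd.DataFrame({
    "Country": ["Switzerland", "Iceland", "Denmark", "Norway", "Canada"],
    "Region": ["Western Europe"] * 4 + ["North America"],
    "Happiness Rank": [1, 2, 3, 4, 5],
    "Happiness Score": [7.587, 7.561, 7.527, 7.522, 7.427],
    "Family": [1.34951, 1.40223, 1.36058, 1.33095, 1.32261],
})

# 1. A range of rows: the first three positions (the top three by rank)
top_three = df_sample.iloc[0:3]

# 2. A subset of columns
country_family = df_sample[["Country", "Family"]]

# 3. Rows that meet a condition (a boolean mask)
western_europe = df_sample[df_sample["Region"] == "Western Europe"]

# 4. Statistical information as a new DataFrame (count, mean, std, quartiles...)
score_stats = df_sample[["Happiness Score", "Family"]].describe()

print(top_three, country_family, western_europe, score_stats, sep="\n\n")
```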

```python
# Region, Freedom

import pandas as pd

def happiness_stat_analysis():
  excel_url = 'https://github.com/futureCodersSE/working-with-data/blob/main/Happiness-Data/2015.xlsx?raw=true'
  df_hap_2015 = pd.read_excel(excel_url)

  # Set some display options when run
  pd.options.display.max_rows = 20
  pd.options.display.max_columns = 12

  # Sort by Freedom, highest first, and show each country's region
  # (open question: how to aggregate the data across regions?)
  display_key = ["Region", "Freedom"]
  sorted_freedom_table = df_hap_2015.sort_values(['Freedom'], ascending=False)
  print('sorted_freedom_table \n')
  print(sorted_freedom_table[display_key])

happiness_stat_analysis()
```

```
sorted_freedom_table 

                              Region  Freedom
3                     Western Europe  0.66973
0                     Western Europe  0.66557
144                Southeastern Asia  0.66246
7                     Western Europe  0.65980
43        Central and Eastern Europe  0.65821
..                               ...      ...
136               Sub-Saharan Africa  0.10384
117               Sub-Saharan Africa  0.10081
95        Central and Eastern Europe  0.09245
101                   Western Europe  0.07699
111  Middle East and Northern Africa  0.00000

[158 rows x 2 columns]
```
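The question in the code comment about combining data across regions can be approached with `groupby()`. This is a minimal sketch on a hypothetical stand-in, `df_freedom`, holding only the ten Region/Freedom pairs printed above; on the full `df_hap_2015` the averages would differ because every country is included.

```python
import pandas as pd

# Hypothetical stand-in: the ten Region/Freedom pairs visible in the output above
df_freedom = pd.DataFrame(
    {
        "Region": [
            "Western Europe", "Western Europe", "Southeastern Asia",
            "Western Europe", "Central and Eastern Europe", "Sub-Saharan Africa",
            "Sub-Saharan Africa", "Central and Eastern Europe", "Western Europe",
            "Middle East and Northern Africa",
        ],
        "Freedom": [0.66973, 0.66557, 0.66246, 0.65980, 0.65821,
                    0.10384, 0.10081, 0.09245, 0.07699, 0.00000],
    },
    index=[3, 0, 144, 7, 43, 136, 117, 95, 101, 111],
)

# Average Freedom per region, highest first
region_freedom = (
    df_freedom.groupby("Region")["Freedom"]
    .mean()
    .sort_values(ascending=False)
)
print(region_freedom)

# Several summaries at once: mean, minimum, maximum and number of countries
region_summary = df_freedom.groupby("Region")["Freedom"].agg(["mean", "min", "max", "count"])
print(region_summary)
```

Including `count` is a useful check: a regional average based on very few countries says less than one based on many.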


---
### Next steps

Data sets are available for the years 2015 to 2019. To try out another year, change 2015 to the required year in the `excel_url` in each code cell, and leave the rest of the code exactly as it is.

Other years may have different column headings and so there will be different data to play with.
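Rather than editing the URL by hand in every cell, the year can be passed in as a parameter. This is a sketch under the assumption, taken from the instruction above, that every year's file follows the same naming pattern as the 2015 URL; `happiness_url` and `load_happiness` are hypothetical helper names.

```python
import pandas as pd

# Assumption: each year's file uses the same URL pattern as the 2015 file
BASE_URL = ("https://github.com/futureCodersSE/working-with-data/blob/main/"
            "Happiness-Data/{year}.xlsx?raw=true")

def happiness_url(year: int) -> str:
    """Build the download URL for one year's data (hypothetical helper)."""
    if not 2015 <= year <= 2019:
        raise ValueError(f"No data set for {year}; expected a year from 2015 to 2019")
    return BASE_URL.format(year=year)

def load_happiness(year: int) -> pd.DataFrame:
    """Download and read one year's Excel file (needs an internet connection)."""
    return pd.read_excel(happiness_url(year))

# Usage (downloads the file, so it is left commented out here):
# df_hap_2016 = load_happiness(2016)
# print(df_hap_2016.columns.tolist())  # headings may differ from 2015
```

Printing `columns.tolist()` for each year is a quick way to spot the differing column headings before reusing code written for 2015.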