
# How happy is the world?

![HappinessImage-Benjamin Scott](https://drive.google.com/uc?id=1wWdXTclLAjPpZUiKkR8XGg0Sl7iD1u8T)  

(Image by Benjamin Scott, [source](https://www.natureindex.com/news-blog/data-visualization-these-are-the-happiest-countries-world-happiness-report-twenty-nineteen))   

The Sustainable Development Solutions Network (SDSN) publishes data on happiness from across the world and uses it to rank countries by happiness score.

This is not an exact science, but it gives food for thought about which factors might have the most impact on a nation's happiness.

The data comes from the Gallup World Poll rather than being collected directly by SDSN.  
Countries are grouped by region.  

### The factors included are:
*  Economy (measured in GDP per capita)
*  Family (support systems)
*  Health (measured by life expectancy)
*  Freedom (sense of)
*  Trust (government corruption)
*  Generosity (charitable inclinations)
*  Dystopia Residual
    *  Dystopia is a hypothetical country, the unhappiest possible, with the lowest level of each of the six factors above  
    *  The residual is the part of a country's score that the six factors do not explain; it is reported added to Dystopia's score

Let's take a look at the data.


---
### Open a data set

Open the data set, an Excel file with only one sheet (so `sheet_name` is not necessary), from here: https://github.com/futureCodersSE/working-with-data/blob/main/Happiness-Data/2019.xlsx?raw=true

Explore the data with `head`, `tail` and `iloc` to get to know what it contains.


In [None]:
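import pandas as pd

# Load the 2019 World Happiness data (an Excel file with a single sheet)
df = pd.read_excel('https://github.com/futureCodersSE/working-with-data/blob/main/Happiness-Data/2019.xlsx?raw=true')
print(df.head())  # first five rows
print(df.tail())  # last five rows
mid = len(df) // 2  # position of the middle row
print(df.iloc[mid - 3:mid + 3])  # six rows around the middle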


   Overall rank Country or region  Score  GDP per capita  Social support  \
0             1           Finland  7.769           1.340           1.587   
1             2           Denmark  7.600           1.383           1.573   
2             3            Norway  7.554           1.488           1.582   
3             4           Iceland  7.494           1.380           1.624   
4             5       Netherlands  7.488           1.396           1.522   

   Healthy life expectancy  Freedom to make life choices  Generosity  \
0                    0.986                         0.596       0.153   
1                    0.996                         0.592       0.252   
2                    1.028                         0.603       0.271   
3                    1.026                         0.591       0.354   
4                    0.999                         0.557       0.322   

   Perceptions of corruption  
0                      0.393  
1                      0.410  
2                

---
### Sort the data in different ways

The data is currently sorted in order of rank.  To sort the data in the table, run the code below, which identifies the column on which to sort in the brackets.

Then, **try sorting on other columns**. *Note: you must type the column heading in the quotes exactly as it appears in the table (including capitalisation).* To sort on multiple columns, enter a list of column headings in the brackets (e.g. `.sort_values(['Healthy life expectancy','GDP per capita'])`).
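
When sorting on several columns, the second column only matters where the first has ties, and each column can have its own direction. The sketch below illustrates this on five rows copied from the `head()` output above, so it runs without downloading anything. The `GDP band` column is a hypothetical helper (GDP per capita rounded to one decimal place), added only to create ties; it is not part of the data set.

```python
import pandas as pd

# Five rows copied from the head() output; the rest of the table is left out
sample = pd.DataFrame({
    'Country or region': ['Finland', 'Denmark', 'Norway', 'Iceland', 'Netherlands'],
    'Score': [7.769, 7.600, 7.554, 7.494, 7.488],
    'GDP per capita': [1.340, 1.383, 1.488, 1.380, 1.396],
})

# Hypothetical helper column: rounding creates ties so the second sort key has an effect
sample['GDP band'] = sample['GDP per capita'].round(1)

# Highest GDP band first; within the same band, lowest Score first
result = sample.sort_values(by=['GDP band', 'Score'], ascending=[False, True])
print(result)
```

Passing a list to `ascending` gives one direction per column, in the same order as the list passed to `by`.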



In [None]:
import pandas as pd

df = pd.read_excel('https://github.com/futureCodersSE/working-with-data/blob/main/Happiness-Data/2019.xlsx?raw=true')

# Sort on one column, highest healthy life expectancy first
sorted_table = df.sort_values(by='Healthy life expectancy', ascending=False)
print(sorted_table)  # output the table below

# Sort on two columns: ties in life expectancy are ordered by GDP per capita
sorted_table1 = df.sort_values(by=['Healthy life expectancy', 'GDP per capita'], ascending=False)
print(sorted_table1)


     Overall rank         Country or region  Score  GDP per capita  \
33             34                 Singapore  6.262           1.572   
75             76                 Hong Kong  5.430           1.438   
57             58                     Japan  5.886           1.327   
29             30                     Spain  6.354           1.286   
5               6               Switzerland  7.480           1.452   
..            ...                       ...    ...             ...   
98             99               Ivory Coast  4.944           0.569   
131           132                      Chad  4.350           0.350   
143           144                   Lesotho  3.802           0.489   
154           155  Central African Republic  3.083           0.026   
134           135                 Swaziland  4.212           0.811   

     Social support  Healthy life expectancy  Freedom to make life choices  \
33            1.463                    1.141                         0.556   
75 

---
### Summarising the data

Look at the happiness dataframe.  Create new dataframes from a range of rows, columns, statistical information, etc.

For each dataframe, add a text cell explaining what it shows.
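
As a starting point, here is a minimal sketch of three common ways to derive a new dataframe: selecting rows and columns, computing summary statistics, and filtering on a condition. It uses five rows copied from the `head()` output earlier in the notebook rather than the full file, and the threshold of 0.3 is an arbitrary value chosen for illustration.

```python
import pandas as pd

# Five rows copied from the head() output; the rest of the table is left out
sample = pd.DataFrame({
    'Country or region': ['Finland', 'Denmark', 'Norway', 'Iceland', 'Netherlands'],
    'Score': [7.769, 7.600, 7.554, 7.494, 7.488],
    'Generosity': [0.153, 0.252, 0.271, 0.354, 0.322],
})

# Rows 0 to 2 and two named columns (loc slices include the end label)
top_three = sample.loc[0:2, ['Country or region', 'Score']]
print(top_three)

# count, mean, std, min, quartiles and max for the numeric columns
summary = sample[['Score', 'Generosity']].describe()
print(summary)

# Keep only the rows where the condition is True (0.3 is an arbitrary threshold)
generous = sample[sample['Generosity'] > 0.3]
print(generous)
```

Each result is itself a dataframe, so it can be sorted, filtered or summarised again.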

In [6]:
import pandas as pd

df = pd.read_excel('https://github.com/futureCodersSE/working-with-data/blob/main/Happiness-Data/2019.xlsx?raw=true')

print('Information about Happiness Dataframe')
print()
df.info()  # column names, non-null counts and dtypes
print()

print('statistical information for some of the columns')
# agg takes a dict mapping each column to the statistics wanted for it
gen = df.agg(
    {
        "Freedom to make life choices": ["mean", "max", "min"],
        "Generosity": ["mean", "max", "min", "std"],
        "GDP per capita": ["mean", "min", "max", "std"],
    })
print(gen)
print()

print('dataframe showing rank, country, gdp & social support')
W = df.iloc[:, [0, 1, 3, 4]]  # select columns by position
print(W)
print()

print('countries filtered on basis of healthy life expectancy of 1 or more')
lex = df.loc[df['Healthy life expectancy'] >= 1]
print('number of countries fitting the requirement:', len(lex))
print(lex)
print()




Information about Happiness Dataframe

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 156 entries, 0 to 155
Data columns (total 9 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   Overall rank                  156 non-null    int64  
 1   Country or region             156 non-null    object 
 2   Score                         156 non-null    float64
 3   GDP per capita                156 non-null    float64
 4   Social support                156 non-null    float64
 5   Healthy life expectancy       156 non-null    float64
 6   Freedom to make life choices  156 non-null    float64
 7   Generosity                    156 non-null    float64
 8   Perceptions of corruption     156 non-null    float64
dtypes: float64(7), int64(1), object(1)
memory usage: 11.1+ KB

statistical information for some of the columns
      Freedom to make life choices  Generosity  GDP per capita
mean                      0.39257

---
### Next steps

Data sets are available for the years 2015 to 2019.  To try out another year, change 2019 to the required year in the URL passed to `pd.read_excel`.  Leave the rest exactly as it is.  

Other years may have different column headings, so there will be different data to play with.
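
Because the headings can change from year to year, it may help to check which ones two years share before reusing code written for another year. The sketch below is one possible way to do that, not part of the original exercise. The helper names `year_url`, `load_year` and `shared_columns` are hypothetical, and the URL pattern assumes every year's file follows the same naming as the link used above.

```python
import pandas as pd

# Assumed pattern: only the year changes between files
BASE_URL = 'https://github.com/futureCodersSE/working-with-data/blob/main/Happiness-Data/{year}.xlsx?raw=true'


def year_url(year):
    """Build the download URL for one year's file (2015 to 2019)."""
    if year not in range(2015, 2020):
        raise ValueError(f'No data set for {year}')
    return BASE_URL.format(year=year)


def load_year(year):
    """Download one year's data (needs network access and an Excel reader)."""
    return pd.read_excel(year_url(year))


def shared_columns(frames):
    """Return the headings found in every dataframe, in the first frame's order."""
    common = set(frames[0].columns)
    for frame in frames[1:]:
        common &= set(frame.columns)
    return [col for col in frames[0].columns if col in common]
```

For example, `shared_columns([load_year(2015), load_year(2019)])` would list the headings that appear in both years; any heading missing from that list needs a different name in `sort_values`, `agg` or `loc` for at least one of the two years.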