<a href="https://colab.research.google.com/github/nishah8/dataandpython/blob/main/Copy_of_How_happy_is_the_world.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# How happy is the world?

![HappinessImage-Benjamin Scott](https://drive.google.com/uc?id=1wWdXTclLAjPpZUiKkR8XGg0Sl7iD1u8T)  

(Image by Benjamin Scott, [source](https://www.natureindex.com/news-blog/data-visualization-these-are-the-happiest-countries-world-happiness-report-twenty-nineteen))   

The Sustainable Development Solutions Network (SDSN) compiles data from across the world relating to happiness and uses it to rank countries by happiness score.

This is not an exact science, but it gives food for thought about which factors might have the most impact on a nation's happiness levels.

The data is taken from the Gallup World Poll, so it is not collected directly by SDSN.  
Countries are grouped by region.  

### The factors included are:
* Economy (measured in GDP per Capita)
* Family (support systems)
* Health (measured by Life Expectancy)
* Freedom (sense of)
* Trust (Government Corruption)
* Generosity (charitable inclinations)
* Dystopia Residual
    * Dystopia is a theoretical unhappiest country, with the lowest levels in all six of the above factors
    * The Residual measure is calculated as the average of the six distances from the lowest

Let's take a look at the data.


---
### Open a data set

Open the data set, an Excel file with only one sheet (so `sheet_name` is not necessary), from here: https://github.com/futureCodersSE/working-with-data/blob/main/Happiness-Data/2015.xlsx?raw=true

Interrogate the data (`head`, `tail`, `iloc`) to get to know what it contains.


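In [7]:
import pandas as pd

# Load the 2015 happiness data (single-sheet Excel file)
df = pd.read_excel('https://github.com/futureCodersSE/working-with-data/blob/main/Happiness-Data/2015.xlsx?raw=true')
print(df.head())  # first five rows
print(df.tail())  # last five rows

# Six rows from the middle of the table
mid = len(df) // 2
print(df.iloc[mid - 3:mid + 3])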


       Country          Region  Happiness Rank  Happiness Score  \
0  Switzerland  Western Europe               1            7.587   
1      Iceland  Western Europe               2            7.561   
2      Denmark  Western Europe               3            7.527   
3       Norway  Western Europe               4            7.522   
4       Canada   North America               5            7.427   

   Standard Error  Economy (GDP per Capita)   Family  \
0         0.03411                   1.39651  1.34951   
1         0.04884                   1.30232  1.40223   
2         0.03328                   1.32548  1.36058   
3         0.03880                   1.45900  1.33095   
4         0.03553                   1.32629  1.32261   

   Health (Life Expectancy)  Freedom  Trust (Government Corruption)  \
0                   0.94143  0.66557                        0.41978   
1                   0.94784  0.62877                        0.14145   
2                   0.87464  0.64938           
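
As a quick reference for the three inspection tools used in the cell above, here is a minimal sketch on a small made-up table. The `toy` dataframe and its values are hypothetical and are not taken from the happiness data.

```python
import pandas as pd

# Hypothetical toy data standing in for the happiness table (values are made up)
toy = pd.DataFrame({
    "Country": [f"Country {i}" for i in range(1, 11)],
    "Score": [7.5, 7.4, 7.1, 6.8, 6.2, 5.9, 5.5, 5.0, 4.4, 3.9],
})

first_rows = toy.head(3)   # first 3 rows (head() defaults to 5)
last_rows = toy.tail(3)    # last 3 rows (tail() defaults to 5)

# iloc slices by row position, so integer division finds the middle row
mid = len(toy) // 2
middle_rows = toy.iloc[mid - 2:mid + 2]  # 4 rows around the midpoint

print(first_rows)
print(last_rows)
print(middle_rows)
```

Using `//` keeps the slice bounds as integers, which `iloc` requires, without wrapping each bound in `int()`.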

---
### Sort the data in different ways

The data is currently sorted by rank.  To sort the table on a different column, run the code below, which names the column to sort on in the brackets.

Then **try sorting on other columns**. *Note: you must type the column heading in quotes, exactly as it appears in the table (including capitalisation).*  To sort on multiple columns, enter a list of column headings in the brackets (e.g. `.sort_values(['Region','Freedom'])`).



In [8]:
import pandas as pd
df = pd.read_excel('https://github.com/futureCodersSE/working-with-data/blob/main/Happiness-Data/2015.xlsx?raw=true')

# Sort by life expectancy, highest first
sorted_table = df.sort_values(by=['Health (Life Expectancy)'], ascending=False)
print(sorted_table)  # output the table below

# Sort by life expectancy, then by GDP per capita to break ties (both highest first)
sorted_table1 = df.sort_values(by=['Health (Life Expectancy)', 'Economy (GDP per Capita)'], ascending=False)
print(sorted_table1)


                      Country              Region  Happiness Rank  \
23                  Singapore   Southeastern Asia              24   
71                  Hong Kong        Eastern Asia              72   
45                      Japan        Eastern Asia              46   
46                South Korea        Eastern Asia              47   
35                      Spain      Western Europe              36   
..                        ...                 ...             ...   
96                    Lesotho  Sub-Saharan Africa              97   
100                 Swaziland  Sub-Saharan Africa             101   
147  Central African Republic  Sub-Saharan Africa             148   
127                  Botswana  Sub-Saharan Africa             128   
122              Sierra Leone  Sub-Saharan Africa             123   

     Happiness Score  Standard Error  Economy (GDP per Capita)   Family  \
23             6.798         0.03780                   1.52186  1.02000   
71             5.474 
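
When sorting on several columns, each column can be given its own direction by passing a list to `ascending`. The sketch below uses a small hypothetical table (the `toy` dataframe and its values are made up) to sort regions alphabetically and, within each region, put the highest `Freedom` value first.

```python
import pandas as pd

# Hypothetical toy data (values are made up for illustration)
toy = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Region": ["West", "East", "West", "East"],
    "Freedom": [0.61, 0.48, 0.66, 0.52],
})

# Region A-Z, then Freedom high-to-low within each region
by_region = toy.sort_values(by=["Region", "Freedom"], ascending=[True, False])
print(by_region)

# sort_values returns a new dataframe; the original row order is unchanged
print(toy)
```

The `ascending` list must have one entry per column named in `by`.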

---
### Summarising the data

Look at the happiness dataframe.  Create new dataframes from a range of rows, columns, statistical information, etc.

For each dataframe, add a text cell to explain what it shows.

In [11]:
import pandas as pd
df = pd.read_excel('https://github.com/futureCodersSE/working-with-data/blob/main/Happiness-Data/2015.xlsx?raw=true')

# Column names, dtypes and non-null counts
print('Information about Happiness Dataframe')
print()
df.info()
print()

# Summary statistics for selected columns (statistics not requested for a column show as NaN)
print('Statistical information for some of the columns')
gen = df.agg(
    {
        "Family": ["mean", "max", "min"],
        "Generosity": ["mean", "max", "min", "std"],
        "Economy (GDP per Capita)": ["mean", "min", "max", "std"],
    })
print(gen)
print()

# Select columns by position: Country, Happiness Rank, Freedom, Trust
print('Dataframe showing country, happiness rank, freedom and government trust')
W = df.iloc[:, [0, 2, 8, 9]]
print(W)
print()

# Keep only the rows where the life expectancy value is at least 0.9
print('Countries filtered on basis of life expectancy of at least 0.9')
lex = df.loc[df['Health (Life Expectancy)'] >= 0.9]
print('Number of countries fitting the requirement:', len(lex))
print(lex)
print()




Information about Happiness Dataframe

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 158 entries, 0 to 157
Data columns (total 12 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   Country                        158 non-null    object 
 1   Region                         158 non-null    object 
 2   Happiness Rank                 158 non-null    int64  
 3   Happiness Score                158 non-null    float64
 4   Standard Error                 158 non-null    float64
 5   Economy (GDP per Capita)       158 non-null    float64
 6   Family                         158 non-null    float64
 7   Health (Life Expectancy)       158 non-null    float64
 8   Freedom                        158 non-null    float64
 9   Trust (Government Corruption)  158 non-null    float64
 10  Generosity                     158 non-null    float64
 11  Dystopia Residual              158 non-null    float64
dtypes: float64(
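
Since the countries are grouped by region, another way to build a summary dataframe is to aggregate per group. This is only a sketch on a small hypothetical table (the `toy` dataframe and its values are made up); the same calls should apply to the `Region` column of the happiness data.

```python
import pandas as pd

# Hypothetical toy data (values are made up for illustration)
toy = pd.DataFrame({
    "Region": ["West", "East", "West", "East", "West"],
    "Score": [7.0, 5.0, 6.0, 4.0, 8.0],
    "Freedom": [0.6, 0.4, 0.7, 0.5, 0.8],
})

# New dataframe: one row per region, holding the mean of each selected column
region_means = toy.groupby("Region")[["Score", "Freedom"]].mean()
print(region_means)

# New dataframe: only the rows scoring above the overall mean
above_average = toy[toy["Score"] > toy["Score"].mean()]
print(above_average)
```

Comparing against `toy["Score"].mean()` rather than a hard-coded number keeps the filter and its description in step if the data changes.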

---
### Next steps

Data sets are available for the years 2015 to 2019.  To try out another year, change 2015 to the required year in the URL in the first code cell.  Leave the rest of the URL exactly as it is.  

Other years may have different column headings, so there will be different data to play with, and any code that refers to a column by name may need updating.
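
The year swap described above can be sketched as a small helper. The names `BASE_URL`, `happiness_url` and `load_year` are illustrative assumptions, and the range check simply reflects the years 2015 to 2019 mentioned above. Loading needs network access and an Excel engine such as openpyxl.

```python
import pandas as pd

# URL pattern taken from the first code cell, with the year left as a placeholder
BASE_URL = ("https://github.com/futureCodersSE/working-with-data/"
            "blob/main/Happiness-Data/{year}.xlsx?raw=true")


def happiness_url(year):
    """Return the data set URL for the given year (hypothetical helper)."""
    if year not in range(2015, 2020):
        raise ValueError(f"No data set for {year}; expected 2015 to 2019")
    return BASE_URL.format(year=year)


def load_year(year):
    """Download one year's data into a dataframe (hypothetical helper)."""
    return pd.read_excel(happiness_url(year))


print(happiness_url(2016))

# Example use (requires network access):
# df_2016 = load_year(2016)
# print(df_2016.columns.tolist())  # check the headings before reusing earlier code
```

Printing the column list first is a quick way to see which headings the earlier cells would need to use for that year.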