
# How happy is the world?

![HappinessImage-Benjamin Scott](https://drive.google.com/uc?id=1wWdXTclLAjPpZUiKkR8XGg0Sl7iD1u8T)  

(Image by Benjamin Scott, [source](https://www.natureindex.com/news-blog/data-visualization-these-are-the-happiest-countries-world-happiness-report-twenty-nineteen))   

The Sustainable Development Solutions Network (SDSN) compiles data relating to happiness from across the world and uses it to rank countries by happiness score.

This is not an exact science, but it gives food for thought about which factors might have the most impact on a nation's happiness levels.

The data is taken from the Gallup World Poll, so it is not collected directly by SDSN.  
Countries are grouped by region.  

### The factors included are:
*  Economy (measured in GDP per Capita)
*  Family (support systems)
*  Health (measured by Life Expectancy)
*  Freedom (sense of)
*  Trust (Government Corruption)
*  Generosity (charitable inclinations)
*  Dystopia Residual
   *  Dystopia is a theoretical unhappiest country, with the lowest levels in all six of the above factors
   *  The Residual measure is calculated as the average of the six distances from lowest

Let's take a look at the data.


---
### Open a data set

Open the data set, an Excel file with only one sheet (so `sheet_name` is not necessary), from here: https://github.com/futureCodersSE/working-with-data/blob/main/Happiness-Data/2015.xlsx?raw=true

Interrogate the data (head, tail, iloc) to get to know what it contains.


In [None]:
import pandas as pd
url = "https://github.com/futureCodersSE/working-with-data/blob/main/Happiness-Data/2015.xlsx?raw=true"
happiness = pd.read_excel(url)

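# Only the last expression in a cell is displayed; wrap the others in print() to see them too
happiness.head(10)      # first 10 rows
happiness.tail(30)      # last 30 rows
happiness.iloc[70:100]  # rows 70 to 99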

Unnamed: 0,Country,Region,Happiness Rank,Happiness Score,Standard Error,Economy (GDP per Capita),Family,Health (Life Expectancy),Freedom,Trust (Government Corruption),Generosity,Dystopia Residual
70,Mauritius,Sub-Saharan Africa,71,5.477,0.07197,1.00761,0.98521,0.7095,0.56066,0.07521,0.37744,1.76145
71,Hong Kong,Eastern Asia,72,5.474,0.05051,1.38604,1.05818,1.01328,0.59608,0.37124,0.39478,0.65429
72,Estonia,Central and Eastern Europe,73,5.429,0.04013,1.15174,1.22791,0.77361,0.44888,0.15184,0.0868,1.58782
73,Indonesia,Southeastern Asia,74,5.399,0.02596,0.82827,1.08708,0.63793,0.46611,0.0,0.51535,1.86399
74,Vietnam,Southeastern Asia,75,5.36,0.03107,0.63216,0.91226,0.74676,0.59444,0.10441,0.1686,2.20173
75,Turkey,Middle East and Northern Africa,76,5.332,0.03864,1.06098,0.94632,0.73172,0.22815,0.15746,0.12253,2.08528
76,Kyrgyzstan,Central and Eastern Europe,77,5.286,0.03823,0.47428,1.15115,0.65088,0.43477,0.04232,0.3003,2.2327
77,Nigeria,Sub-Saharan Africa,78,5.268,0.04192,0.65435,0.90432,0.16007,0.34334,0.0403,0.27233,2.89319
78,Bhutan,Southern Asia,79,5.253,0.03225,0.77042,1.10395,0.57407,0.53206,0.15445,0.47998,1.63794
79,Azerbaijan,Central and Eastern Europe,80,5.212,0.03363,1.02389,0.93793,0.64045,0.3703,0.16065,0.07799,2.00073
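
In the rows shown above, the six factor columns plus the Dystopia Residual appear to add up to the Happiness Score. The sketch below checks this on three rows copied from that output; the `Component Sum` and `Difference` column names are made up for this illustration, and the check only covers these three rows rather than the whole data set.

```python
import pandas as pd

# Three rows copied from the output above (2015 data)
rows = pd.DataFrame({
    "Country": ["Mauritius", "Hong Kong", "Estonia"],
    "Happiness Score": [5.477, 5.474, 5.429],
    "Economy (GDP per Capita)": [1.00761, 1.38604, 1.15174],
    "Family": [0.98521, 1.05818, 1.22791],
    "Health (Life Expectancy)": [0.7095, 1.01328, 0.77361],
    "Freedom": [0.56066, 0.59608, 0.44888],
    "Trust (Government Corruption)": [0.07521, 0.37124, 0.15184],
    "Generosity": [0.37744, 0.39478, 0.0868],
    "Dystopia Residual": [1.76145, 0.65429, 1.58782],
})

# The six factors plus the Dystopia Residual
components = [
    "Economy (GDP per Capita)",
    "Family",
    "Health (Life Expectancy)",
    "Freedom",
    "Trust (Government Corruption)",
    "Generosity",
    "Dystopia Residual",
]

# Hypothetical helper columns, added only for this check
rows["Component Sum"] = rows[components].sum(axis=1)
rows["Difference"] = (rows["Happiness Score"] - rows["Component Sum"]).abs()

print(rows[["Country", "Happiness Score", "Component Sum", "Difference"]])
```

Any small difference that remains is consistent with the Happiness Score being rounded to three decimal places.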


---
### Sort the data in different ways

The data is currently sorted in order of rank.  To sort the data in the table, run the code below, which identifies the column on which to sort in the brackets.

Then **try sorting on other columns**. *Note: you must type the column heading in the quotes exactly as it appears in the table (including capitalisation).* To sort on multiple columns, enter a list of column headings in the brackets (e.g. `.sort_values(['Region','Freedom'])`).



In [6]:
import pandas as pd
url = "https://github.com/futureCodersSE/working-with-data/blob/main/Happiness-Data/2015.xlsx?raw=true"
happiness = pd.read_excel(url)

sorted_table = happiness.sort_values(['Health (Life Expectancy)'])
sorted_table  # output the table below

Unnamed: 0,Country,Region,Happiness Rank,Happiness Score,Standard Error,Economy (GDP per Capita),Family,Health (Life Expectancy),Freedom,Trust (Government Corruption),Generosity,Dystopia Residual
122,Sierra Leone,Sub-Saharan Africa,123,4.507,0.07068,0.33024,0.95571,0.00000,0.40840,0.08786,0.21488,2.51009
127,Botswana,Sub-Saharan Africa,128,4.332,0.04934,0.99355,1.10464,0.04776,0.49495,0.12474,0.10461,1.46181
147,Central African Republic,Sub-Saharan Africa,148,3.678,0.06112,0.07850,0.00000,0.06699,0.48879,0.08289,0.23835,2.72230
100,Swaziland,Sub-Saharan Africa,101,4.867,0.08742,0.71206,1.07284,0.07566,0.30658,0.03060,0.18259,2.48676
96,Lesotho,Sub-Saharan Africa,97,4.898,0.09438,0.37545,1.04103,0.07612,0.31767,0.12504,0.16388,2.79832
...,...,...,...,...,...,...,...,...,...,...,...,...
35,Spain,Western Europe,36,6.329,0.03468,1.23011,1.31379,0.95562,0.45951,0.06398,0.18227,2.12367
46,South Korea,Eastern Asia,47,5.984,0.04098,1.24461,0.95774,0.96538,0.33208,0.07857,0.18557,2.21978
45,Japan,Eastern Asia,46,5.987,0.03581,1.27074,1.25712,0.99111,0.49615,0.18060,0.10705,1.68435
71,Hong Kong,Eastern Asia,72,5.474,0.05051,1.38604,1.05818,1.01328,0.59608,0.37124,0.39478,0.65429
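
As a minimal sketch of sorting on more than one column, the example below uses six rows copied from the output above instead of the full file. It also passes a list to the `ascending` parameter of `sort_values`, which the worksheet does not cover, so that regions are in alphabetical order while Freedom runs from highest to lowest within each region.

```python
import pandas as pd

# Six rows copied from the sorted output above (2015 data)
sample = pd.DataFrame({
    "Country": ["Sierra Leone", "Botswana", "Spain", "South Korea", "Japan", "Hong Kong"],
    "Region": ["Sub-Saharan Africa", "Sub-Saharan Africa", "Western Europe",
               "Eastern Asia", "Eastern Asia", "Eastern Asia"],
    "Freedom": [0.40840, 0.49495, 0.45951, 0.33208, 0.49615, 0.59608],
})

# Sort by Region (A to Z), then by Freedom (highest first) within each region
by_region_freedom = sample.sort_values(["Region", "Freedom"], ascending=[True, False])
print(by_region_freedom)
```

The first column in the list takes priority; the second is only used to order rows that share the same value in the first.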


---
### Summarising the data

Look at the happiness dataframe.  Create new dataframes from a range of rows, columns, statistical information, etc.

For each dataframe, add a text cell to explain what it is showing.

In [None]:


# What are the highest and lowest Trust (Government Corruption) values?
# Which countries have the highest and lowest Trust values?
# Display max and min Family values, and max, min and median Health values
# Group by region and find the average Happiness Score; which region is the happiest?
# Group by region and find the average life expectancy; which region is the lowest?


In [29]:
import pandas as pd
url = "https://github.com/futureCodersSE/working-with-data/blob/main/Happiness-Data/2015.xlsx?raw=true"
happiness = pd.read_excel(url)

# What are the highest and lowest Trust (Government Corruption) values?
print(happiness.agg({"Trust (Government Corruption)": ["max", "min"]}))

# Which countries have the lowest and highest Trust values?
sorted_table = happiness.sort_values(["Trust (Government Corruption)"])
print(sorted_table.iloc[0], sorted_table.iloc[-1])  # lowest first, then highest

# Display max and min Family values, and max, min and median Health values
print(happiness.agg({
    "Family": ["max", "min"],
    "Health (Life Expectancy)": ["max", "min", "median"]
}))

# Group by region, using the median Happiness Score as the average; which region is the happiest?
print(happiness.groupby("Region")["Happiness Score"].median())

# Group by region, using the median life expectancy as the average; which region is the lowest?
region_life = happiness.groupby("Region")["Health (Life Expectancy)"].median()
print(region_life)

# Sort the regions by median life expectancy, lowest first
print(region_life.sort_values())



     Trust (Government Corruption)
max                        0.55191
min                        0.00000
Country                                  Indonesia
Region                           Southeastern Asia
Happiness Rank                                  74
Happiness Score                              5.399
Standard Error                             0.02596
Economy (GDP per Capita)                   0.82827
Family                                     1.08708
Health (Life Expectancy)                   0.63793
Freedom                                    0.46611
Trust (Government Corruption)                  0.0
Generosity                                 0.51535
Dystopia Residual                          1.86399
Name: 73, dtype: object Country                                      Rwanda
Region                           Sub-Saharan Africa
Happiness Rank                                  154
Happiness Score                               3.465
Standard Error                              0.03464
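
An alternative to sorting the whole table is to look up the row label of the maximum or minimum directly with `idxmax` and `idxmin`. The sketch below is one possible approach, not the worksheet's own solution; it uses eight rows copied from the first output in this worksheet, so its answers apply to that small sample only and not to the full data set.

```python
import pandas as pd

# Eight rows copied from the first output in this worksheet (2015 data)
sample = pd.DataFrame({
    "Country": ["Mauritius", "Hong Kong", "Estonia", "Indonesia",
                "Vietnam", "Kyrgyzstan", "Nigeria", "Azerbaijan"],
    "Region": ["Sub-Saharan Africa", "Eastern Asia", "Central and Eastern Europe",
               "Southeastern Asia", "Southeastern Asia", "Central and Eastern Europe",
               "Sub-Saharan Africa", "Central and Eastern Europe"],
    "Health (Life Expectancy)": [0.7095, 1.01328, 0.77361, 0.63793,
                                 0.74676, 0.65088, 0.16007, 0.64045],
    "Trust (Government Corruption)": [0.07521, 0.37124, 0.15184, 0.0,
                                      0.10441, 0.04232, 0.0403, 0.16065],
})

trust = "Trust (Government Corruption)"

# idxmax / idxmin return the index label of the row holding the extreme value
highest_trust = sample.loc[sample[trust].idxmax(), "Country"]
lowest_trust = sample.loc[sample[trust].idxmin(), "Country"]
print("Highest Trust in sample:", highest_trust)
print("Lowest Trust in sample:", lowest_trust)

# Median life expectancy per region, sorted so the lowest region comes first
region_life = sample.groupby("Region")["Health (Life Expectancy)"].median().sort_values()
print(region_life)
```

Because `groupby(...).median()` returns a Series indexed by region, `sort_values()` needs no column name.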

---
### Next steps

There are data sets for the years 2015 to 2019 available.  To access and try out other years, change 2015 to the required year in the URL in the first code cell.  Leave the rest exactly as it is.  

Other years may have different column headings and so there will be different data to play with.
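
One way to cope with differing column headings is to rename one year's columns to match another's before comparing them. The sketch below is illustrative only: the 2019 names are the ones used in the code cell of this section, but the mapping itself (in particular, treating 2015's `Family` as the counterpart of 2019's `Social support`) is an assumption, and `COLUMN_MAP_2015` is a hypothetical name.

```python
import pandas as pd

# Hypothetical mapping from 2015 headings to 2019 headings (an assumption,
# especially the pairing of "Family" with "Social support")
COLUMN_MAP_2015 = {
    "Country": "Country or region",
    "Family": "Social support",
    "Health (Life Expectancy)": "Healthy life expectancy",
}

# One row copied from the 2015 output earlier in this worksheet
row_2015 = pd.DataFrame({
    "Country": ["Estonia"],
    "Family": [1.22791],
    "Health (Life Expectancy)": [0.77361],
})

# rename returns a new dataframe; headings not in the mapping are left unchanged
harmonised = row_2015.rename(columns=COLUMN_MAP_2015)
print(list(harmonised.columns))
```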

In [43]:
import pandas as pd
url = "https://github.com/futureCodersSE/working-with-data/blob/main/Happiness-Data/2019.xlsx?raw=true"
happiness = pd.read_excel(url)

happiness.head(10)  # not displayed here: only a cell's last expression is shown

# Sort countries by social support, then by healthy life expectancy, and report
# any countries that appear in both lists of the 10 lowest values.
sorted_support = happiness.sort_values(["Social support"])
sorted_expectancy = happiness.sort_values(["Healthy life expectancy"])

lowest_support = sorted_support.head(10)["Country or region"]
lowest_expectancy = sorted_expectancy.head(10)["Country or region"]

# isin keeps the matches in the order of the social support ranking
for country in lowest_support[lowest_support.isin(lowest_expectancy)]:
    print(" Country with both lowest expectancy and support: ", country)




 Country with both lowest expectancy and support:  Central African Republic
 Country with both lowest expectancy and support:  South Sudan
