# World Happiness Report

This case study is based on the [7th World Happiness Report](https://worldhappiness.report/ed/2019/). The first report was released in April 2012 in support of a UN High Level Meeting on “Wellbeing and Happiness: Defining a New Economic Paradigm”. It presented the available global data on national happiness and reviewed related evidence from the emerging science of happiness, showing that the quality of people’s lives can be coherently, reliably, and validly assessed by a variety of subjective well-being measures, collectively referred to then and in subsequent reports as “happiness.”

The 2019 World Happiness Report focuses on happiness and the community: how happiness has evolved over the past dozen years, with a focus on the technologies, social norms, conflicts and government policies that have driven those changes.

I have downloaded the data from [Chapter 2: Online Data](https://s3.amazonaws.com/happiness-report/2019/Chapter2OnlineData.xls) and filtered out data prior to 2018. The result is available in CSV format, as the file `happiness-report.csv`. _Data Prep Notes:_ The Happiness Score column is from Figure 2.6 in the downloaded report; the other data columns are from Table 2.1 in the same report. If a country wasn't in either list, it wasn't included in the CSV file.

### 1. Validate the data

Even though you are given the downloaded data in `happiness-report.csv`, follow the data prep description in the paragraph above to recreate the file yourself, and save your version as `happiness-report-your-copy.csv`. Do you get the same results?

In [None]:
# Execute this cell to compare the two files (comm -3 prints only the lines unique to one file or the other)
!comm -3 happiness-report.csv happiness-report-your-copy.csv

Did you get the same result?

**Explain:**

**Irrespective of your answer, use `happiness-report.csv` for the remainder of this notebook.**
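
Note that `comm` expects both input files to be sorted, so a copy whose rows are in a different order can produce misleading output. As a rough alternative, the sketch below compares two files as sets of lines, ignoring row order. The helper name `unique_lines` and the tiny CSV strings are made up for illustration; with real files you would read their contents first. Because it uses sets, it also ignores duplicate lines.

```python
def unique_lines(left_text, right_text):
    """Return (only_in_left, only_in_right) as sorted lists of lines."""
    left = set(left_text.splitlines())
    right = set(right_text.splitlines())
    return sorted(left - right), sorted(right - left)


# Made-up file contents, for illustration only.
original = "Country,Year\nAlbania,2018\nZambia,2018\n"
your_copy = "Country,Year\nZambia,2018\nAlbania,2018\nYemen,2018\n"

only_original, only_copy = unique_lines(original, your_copy)
print(only_original)  # lines found only in the first text
print(only_copy)      # lines found only in the second text
```

Two empty lists would mean the files contain the same set of lines.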

In [1]:
import pandas as pd
data1 = pd.read_csv('happiness-report.csv')
data1

Unnamed: 0,Country,Year,HappinessScore,LifeLadder,LogGDP,SocialSupport,HealthyLifeExpectancyAtBirth,FreedomToMakeLifeChoices,Generosity,PerceptionsOfCorruption,PositiveAffect,NegativeAffect,ConfidenceInNationalGovernment
0,Afghanistan,2018,3.203,2.694303,7.494588,0.507516,52.599998,0.373536,-0.084888,0.927606,0.424125,0.404904,0.364666
1,Albania,2018,4.719,5.004403,9.412399,0.683592,68.699997,0.824212,0.005385,0.899129,0.713300,0.318997,0.435338
2,Algeria,2018,5.211,5.043086,9.557952,0.798651,65.900002,0.583381,-0.172413,0.758704,0.591043,0.292946,
3,Argentina,2018,6.086,5.792797,9.809972,0.899912,68.800003,0.845895,-0.206937,0.855255,0.820310,0.320502,0.261352
4,Armenia,2018,4.559,5.062449,9.119424,0.814449,66.900002,0.807644,-0.149109,0.676826,0.581488,0.454840,0.670828
...,...,...,...,...,...,...,...,...,...,...,...,...,...
131,Venezuela,2018,4.707,5.005663,9.270281,0.886882,66.500000,0.610855,-0.176156,0.827560,0.759221,0.373658,0.260700
132,Vietnam,2018,5.175,5.295547,8.783416,0.831945,67.900002,0.909260,-0.039124,0.808423,0.692222,0.191061,
133,Yemen,2018,3.380,3.057514,,0.789422,56.700001,0.552726,,0.792587,0.461114,0.314870,0.308151
134,Zambia,2018,4.107,4.041488,8.223958,0.717720,55.299999,0.790626,0.036644,0.810731,0.702698,0.350963,0.606715


### 2. Copy the header from the dataframe &hellip;

&hellip; and obtain column names for the first 7 columns.

In [2]:
colnames = 'Country	Year	HappinessScore	LifeLadder	LogGDP	SocialSupport	HealthyLifeExpectancyAtBirth'.split('\t')
colnames

['Country',
 'Year',
 'HappinessScore',
 'LifeLadder',
 'LogGDP',
 'SocialSupport',
 'HealthyLifeExpectancyAtBirth']
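
As an alternative to pasting the header text, the names can be read from the dataframe itself by slicing its `columns` attribute. This is a minimal sketch on a made-up frame (`toy` is hypothetical and stands in for `data1`); it assumes the columns you want really are the leading ones, which would not hold if the CSV carried an extra leading index column.

```python
import pandas as pd

# Hypothetical small frame standing in for data1.
toy = pd.DataFrame({
    "Country": ["Albania"],
    "Year": [2018],
    "LogGDP": [9.4],
    "Generosity": [0.005],
})

# columns is an Index; slicing keeps order, tolist() gives a plain list.
first_two = toy.columns[:2].tolist()
print(first_two)
```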

### 3. Create a new dataframe `data2` with just the first 7 columns.

In [3]:
data2 = data1[colnames]
data2

Unnamed: 0,Country,Year,HappinessScore,LifeLadder,LogGDP,SocialSupport,HealthyLifeExpectancyAtBirth
0,Afghanistan,2018,3.203,2.694303,7.494588,0.507516,52.599998
1,Albania,2018,4.719,5.004403,9.412399,0.683592,68.699997
2,Algeria,2018,5.211,5.043086,9.557952,0.798651,65.900002
3,Argentina,2018,6.086,5.792797,9.809972,0.899912,68.800003
4,Armenia,2018,4.559,5.062449,9.119424,0.814449,66.900002
...,...,...,...,...,...,...,...
131,Venezuela,2018,4.707,5.005663,9.270281,0.886882,66.500000
132,Vietnam,2018,5.175,5.295547,8.783416,0.831945,67.900002
133,Yemen,2018,3.380,3.057514,,0.789422,56.700001
134,Zambia,2018,4.107,4.041488,8.223958,0.717720,55.299999


### 4. Drop invalid rows

Look up the pandas method `dropna` and use it to create a new dataframe `data3` with cleaned up data.

In [4]:
data3 = data2.dropna()
data3

Unnamed: 0,Country,Year,HappinessScore,LifeLadder,LogGDP,SocialSupport,HealthyLifeExpectancyAtBirth
0,Afghanistan,2018,3.203,2.694303,7.494588,0.507516,52.599998
1,Albania,2018,4.719,5.004403,9.412399,0.683592,68.699997
2,Algeria,2018,5.211,5.043086,9.557952,0.798651,65.900002
3,Argentina,2018,6.086,5.792797,9.809972,0.899912,68.800003
4,Armenia,2018,4.559,5.062449,9.119424,0.814449,66.900002
...,...,...,...,...,...,...,...
130,Uzbekistan,2018,6.174,6.205460,8.773365,0.920821,65.099998
131,Venezuela,2018,4.707,5.005663,9.270281,0.886882,66.500000
132,Vietnam,2018,5.175,5.295547,8.783416,0.831945,67.900002
134,Zambia,2018,4.107,4.041488,8.223958,0.717720,55.299999
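
Before dropping rows, it can help to see how many values are missing and where. The sketch below uses a made-up three-row frame (`toy` is hypothetical, not the report data) to show `isna().sum()` for counting, plain `dropna()` for removing any row with a missing value, and the `subset` parameter for restricting the check to chosen columns.

```python
import numpy as np
import pandas as pd

# Made-up data with two missing values.
toy = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "LogGDP": [9.4, np.nan, 8.2],
    "Generosity": [0.1, 0.2, np.nan],
})

# Number of missing values in each column.
missing_per_column = toy.isna().sum()

# Drop every row that has at least one missing value.
all_complete = toy.dropna()

# Drop rows only when LogGDP is missing.
gdp_complete = toy.dropna(subset=["LogGDP"])

print(missing_per_column)
print(all_complete)
print(gdp_complete)
```

`dropna` returns a new dataframe and keeps the original index labels, which is why the output above jumps from 132 to 134 once the Yemen row (133) is removed.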


### 5. Drop a Column

Create a new dataframe `data4` that is the same as `data3` but with the **LifeLadder** column dropped.

In [5]:
data4 = data3.drop(columns = 'LifeLadder')
data4

Unnamed: 0,Country,Year,HappinessScore,LogGDP,SocialSupport,HealthyLifeExpectancyAtBirth
0,Afghanistan,2018,3.203,7.494588,0.507516,52.599998
1,Albania,2018,4.719,9.412399,0.683592,68.699997
2,Algeria,2018,5.211,9.557952,0.798651,65.900002
3,Argentina,2018,6.086,9.809972,0.899912,68.800003
4,Armenia,2018,4.559,9.119424,0.814449,66.900002
...,...,...,...,...,...,...
130,Uzbekistan,2018,6.174,8.773365,0.920821,65.099998
131,Venezuela,2018,4.707,9.270281,0.886882,66.500000
132,Vietnam,2018,5.175,8.783416,0.831945,67.900002
134,Zambia,2018,4.107,8.223958,0.717720,55.299999
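
One point worth checking is that `drop` returns a new dataframe by default and leaves the one it was called on untouched. A minimal sketch on a made-up frame (`toy` and `trimmed` are hypothetical names):

```python
import pandas as pd

# Made-up single-row frame.
toy = pd.DataFrame({"Country": ["A"], "LifeLadder": [5.0], "LogGDP": [9.4]})

# drop(columns=...) returns a new frame without the named column.
trimmed = toy.drop(columns="LifeLadder")

print(list(trimmed.columns))  # column removed in the result
print(list(toy.columns))      # original frame still has it
```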


### 6. From `data4`, show the row for United States

The DataFrame outputs in the cells above are truncated and don't show the US, so you don't know whether the country is listed as 'USA', 'United States', 'United States of America', or something else. _You could look at the underlying file, but that isn't the point of this question. The point is to be able to search for things in the dataframe._

In [6]:
data4[data4['Country'].str.lower().str.contains('states')]

Unnamed: 0,Country,Year,HappinessScore,LogGDP,SocialSupport,HealthyLifeExpectancyAtBirth
128,United States,2018,6.892,10.922465,0.903856,68.300003
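
The same search can be written without the explicit `str.lower()` step by passing `case=False` to `str.contains`. The sketch below uses a made-up frame (`toy` is hypothetical) that includes a missing name, to show that `na=False` treats missing values as non-matches instead of leaving gaps in the boolean mask.

```python
import pandas as pd

# Made-up country list with one missing name.
toy = pd.DataFrame(
    {"Country": ["United States", "United Kingdom", None, "Mexico"]}
)

# case=False ignores case; na=False maps missing names to False.
mask = toy["Country"].str.contains("states", case=False, na=False)
matches = toy[mask]
print(matches)
```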


### 7. Change a Row

Hypothetical scenario: _Belarus_ decides to officially change its name to _Belarussian Republic_ and its happiness score increases by 5%. Copy `data4` into `data5` and make these changes to `data5`. Show all rows in `data5` that match the substring `russia` (ignoring case, so both Russia and Belarussian Republic would be shown).

In [7]:
# .copy() makes an independent dataframe; plain `data5 = data4` would only give data4 a second name
data5 = data4.copy()
data5.loc[data5.Country == 'Belarus', 'Country'] = 'Belarussian Republic'

# Since we changed the name of the country first, we must use the new name in this next line
is_belarus = data5.Country == 'Belarussian Republic'
data5.loc[is_belarus, 'HappinessScore'] *= 1.05
data5[data5['Country'].str.lower().str.contains('russia')]

Unnamed: 0,Country,Year,HappinessScore,LogGDP,SocialSupport,HealthyLifeExpectancyAtBirth
9,Belarussian Republic,2018,5.58915,9.778739,0.904569,66.099998
101,Russia,2018,5.648,10.13239,0.908726,64.300003


In [8]:
# Applying the increase again to the same data5 compounds it: 5.58915 * 1.05 is about 5.868608
data5.loc[data5.Country == 'Belarussian Republic', 'HappinessScore'] *= 1.05
data5[data5['Country'].str.lower().str.contains('russia')]

Unnamed: 0,Country,Year,HappinessScore,LogGDP,SocialSupport,HealthyLifeExpectancyAtBirth
9,Belarussian Republic,2018,5.868608,9.778739,0.904569,66.099998
101,Russia,2018,5.648,10.13239,0.908726,64.300003
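
A detail that matters for this exercise: plain assignment does not copy a dataframe. It only binds a second name to the same object, so a change made through either name is visible through both, whereas `DataFrame.copy()` creates an independent frame. The sketch below illustrates the difference on a made-up one-row frame (`toy`, `alias` and `independent` are hypothetical names, and the score is invented).

```python
import pandas as pd

# Made-up frame with an invented score.
toy = pd.DataFrame({"Country": ["Belarus"], "HappinessScore": [5.0]})

alias = toy               # same object under a second name
independent = toy.copy()  # separate object with its own data

# Modify the data through the alias only.
alias.loc[alias.Country == "Belarus", "HappinessScore"] *= 2

print(alias is toy)                            # True: one object, two names
print(toy.loc[0, "HappinessScore"])            # changed via the alias
print(independent.loc[0, "HappinessScore"])    # unaffected by the change
```

This is also why re-running a cell that scales a value in place compounds the change each time it is executed.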


# When you're done, submit the notebook

1. **Run all the cells in order.**

2. Submit the notebook by saving it as PDF. 
    * In the cluster environment, it's File | Print (Save as PDF) and submit to [Gradescope](https://www.gradescope.com/courses/182658)<sup>&dagger;</sup>, 
    * On other versions, it may be File | Download As (PDF) and then submit to [Gradescope](https://www.gradescope.com/courses/182658)<sup>&dagger;</sup>.

<sup>&dagger;</sup>To submit to Gradescope, log into the website, add course 9W7PW3 (if not already added) and submit. The assignment name should match the name of this notebook.

![The end](https://live.staticflickr.com/32/89187454_3ae6aded89_b.jpg)