# Day 10 Workout - Operating on Data

This notebook uses the [GapMinder data set](https://www.gapminder.org).

In [2]:
import pandas as pd
data = pd.read_csv("data/gapminder.tsv", sep='\t')
data.head()  # show the first five rows

Unnamed: 0,country,continent,year,lifeExp,pop,gdpPercap
0,Afghanistan,Asia,1952,28.801,8425333,779.445314
1,Afghanistan,Asia,1957,30.332,9240934,820.85303
2,Afghanistan,Asia,1962,31.997,10267083,853.10071
3,Afghanistan,Asia,1967,34.02,11537966,836.197138
4,Afghanistan,Asia,1972,36.088,13079460,739.981106


Create a new column in the data frame called `lifeExpMonth` that holds `lifeExp` expressed in months (i.e., multiply `lifeExp` by 12).

In [5]:
data['lifeExpMonth'] = data['lifeExp'] * 12
data.head()

Unnamed: 0,country,continent,year,lifeExp,pop,gdpPercap,lifeExpMonth
0,Afghanistan,Asia,1952,28.801,8425333,779.445314,345.612
1,Afghanistan,Asia,1957,30.332,9240934,820.85303,363.984
2,Afghanistan,Asia,1962,31.997,10267083,853.10071,383.964
3,Afghanistan,Asia,1967,34.02,11537966,836.197138,408.24
4,Afghanistan,Asia,1972,36.088,13079460,739.981106,433.056


Create a new column `gdp` that holds the total GDP (i.e., multiply `pop` by `gdpPercap`).

In [7]:
data['gdp'] = data.gdpPercap * data['pop']
data.head()

Unnamed: 0,country,continent,year,lifeExp,pop,gdpPercap,lifeExpMonth,gdp
0,Afghanistan,Asia,1952,28.801,8425333,779.445314,345.612,6567086000.0
1,Afghanistan,Asia,1957,30.332,9240934,820.85303,363.984,7585449000.0
2,Afghanistan,Asia,1962,31.997,10267083,853.10071,383.964,8758856000.0
3,Afghanistan,Asia,1967,34.02,11537966,836.197138,408.24,9648014000.0
4,Afghanistan,Asia,1972,36.088,13079460,739.981106,433.056,9678553000.0
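Both derived columns above come from element-wise arithmetic on whole columns. The sketch below repeats the idea on a three-row toy frame retyped from the output above, using `assign` so the original frame is left unchanged. The names `toy` and `derived` are illustrative only. It also shows why the `pop` column needs bracket access: `pop` is a `DataFrame` method as well, so `toy.pop` returns the method rather than the column.

```python
import pandas as pd

# First three rows from the output above, retyped by hand as a toy frame.
toy = pd.DataFrame({
    "country": ["Afghanistan"] * 3,
    "year": [1952, 1957, 1962],
    "lifeExp": [28.801, 30.332, 31.997],
    "pop": [8425333, 9240934, 10267083],
    "gdpPercap": [779.445314, 820.85303, 853.10071],
})

# `pop` is also a DataFrame method, so attribute access yields the method,
# not the column. Bracket access is required for this column.
pop_is_method = callable(toy.pop)

# assign() returns a new frame; `toy` itself is not modified.
derived = toy.assign(
    lifeExpMonth=lambda d: d["lifeExp"] * 12,
    gdp=lambda d: d["pop"] * d["gdpPercap"],
)
print(derived[["year", "lifeExpMonth", "gdp"]])
```

Whether to mutate `data` in place, as the cells above do, or to build a new frame with `assign` is a matter of taste; in a notebook, `assign` makes cells safe to re-run.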


Round `lifeExp` to the nearest whole number, convert it to an integer datatype, and overwrite the values in `lifeExp`.

In [17]:
data['lifeExp'] = data['lifeExp'].round().astype(int)  # round first, then cast to int
data.head()

Unnamed: 0,country,continent,year,lifeExp,pop,gdpPercap,lifeExpMonth,gdp
0,Afghanistan,Asia,1952,29,8425333,779.445314,345.612,6567086000.0
1,Afghanistan,Asia,1957,30,9240934,820.85303,363.984,7585449000.0
2,Afghanistan,Asia,1962,32,10267083,853.10071,383.964,8758856000.0
3,Afghanistan,Asia,1967,34,11537966,836.197138,408.24,9648014000.0
4,Afghanistan,Asia,1972,36,13079460,739.981106,433.056,9678553000.0
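Rounding and the integer cast are two separate steps, and the order matters. The short sketch below uses three `lifeExp` values from the earlier output to contrast rounding followed by a cast with a cast alone, which simply drops the fractional part. The variable names are illustrative only.

```python
import pandas as pd

# Three lifeExp values taken from the earlier output.
life_exp = pd.Series([28.801, 30.332, 31.997])

rounded = life_exp.round().astype(int)   # nearest whole number, then cast
truncated = life_exp.astype(int)         # cast alone drops the fraction

print(rounded.tolist())    # [29, 30, 32]
print(truncated.tolist())  # [28, 30, 31]

# Exact halves are rounded to the nearest even integer.
halves = pd.Series([0.5, 1.5, 2.5]).round().astype(int)
print(halves.tolist())     # [0, 2, 2]
```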


Assuming $1,000 is the poverty threshold, which countries have ever been considered poor? (i.e., `gdpPercap` less than $1,000 in at least one year)

In [18]:
set(data[data['gdpPercap'] < 1000]['country'])

{'Afghanistan',
 'Bangladesh',
 'Benin',
 'Bosnia and Herzegovina',
 'Botswana',
 'Burkina Faso',
 'Burundi',
 'Cambodia',
 'Central African Republic',
 'Chad',
 'China',
 'Comoros',
 'Congo, Dem. Rep.',
 'Equatorial Guinea',
 'Eritrea',
 'Ethiopia',
 'Gambia',
 'Ghana',
 'Guinea',
 'Guinea-Bissau',
 'India',
 'Indonesia',
 'Kenya',
 'Lesotho',
 'Liberia',
 'Madagascar',
 'Malawi',
 'Mali',
 'Mauritania',
 'Mongolia',
 'Mozambique',
 'Myanmar',
 'Nepal',
 'Niger',
 'Pakistan',
 'Rwanda',
 'Sao Tome and Principe',
 'Sierra Leone',
 'Somalia',
 'Tanzania',
 'Thailand',
 'Togo',
 'Uganda',
 'Vietnam',
 'Yemen, Rep.',
 'Zimbabwe'}
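The filter above works in two steps: the comparison builds a boolean mask with one entry per row, and indexing with that mask keeps only the rows where it is `True`. A minimal sketch on a made-up frame follows; the countries `A`, `B`, `C` and their values are hypothetical and not taken from the data set.

```python
import pandas as pd

# Hypothetical toy data, not taken from the GapMinder file.
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B", "C"],
    "year": [1952, 1957, 1952, 1957, 1952],
    "gdpPercap": [800.0, 1200.0, 5000.0, 6000.0, 950.0],
})

mask = toy["gdpPercap"] < 1000        # boolean Series, one entry per row
below = toy.loc[mask, "country"]      # country names for rows under the threshold
ever_poor = sorted(below.unique())    # each country listed once

print(mask.tolist())   # [True, False, False, False, True]
print(ever_poor)       # ['A', 'C']
```

Country `A` is included even though it is below the threshold in only one of its two years, which is why the result reads as "ever poor" rather than "always poor".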

- Calculate the average `gdpPercap`.
- Define poverty as having a `gdpPercap` below half of that average. Which countries have been poor under this definition?

In [19]:
avg_gdpPercap = data['gdpPercap'].mean()
avg_gdpPercap

7215.327081212149

In [20]:
set(data[data['gdpPercap'] < avg_gdpPercap / 2]['country'])

{'Afghanistan',
 'Albania',
 'Algeria',
 'Angola',
 'Bangladesh',
 'Benin',
 'Bolivia',
 'Bosnia and Herzegovina',
 'Botswana',
 'Brazil',
 'Bulgaria',
 'Burkina Faso',
 'Burundi',
 'Cambodia',
 'Cameroon',
 'Central African Republic',
 'Chad',
 'China',
 'Colombia',
 'Comoros',
 'Congo, Dem. Rep.',
 'Congo, Rep.',
 'Costa Rica',
 "Cote d'Ivoire",
 'Croatia',
 'Djibouti',
 'Dominican Republic',
 'Ecuador',
 'Egypt',
 'El Salvador',
 'Equatorial Guinea',
 'Eritrea',
 'Ethiopia',
 'Gambia',
 'Ghana',
 'Greece',
 'Guatemala',
 'Guinea',
 'Guinea-Bissau',
 'Haiti',
 'Honduras',
 'Hong Kong, China',
 'India',
 'Indonesia',
 'Iran',
 'Iraq',
 'Jamaica',
 'Japan',
 'Jordan',
 'Kenya',
 'Korea, Dem. Rep.',
 'Korea, Rep.',
 'Lesotho',
 'Liberia',
 'Libya',
 'Madagascar',
 'Malawi',
 'Malaysia',
 'Mali',
 'Mauritania',
 'Mauritius',
 'Mexico',
 'Mongolia',
 'Montenegro',
 'Morocco',
 'Mozambique',
 'Myanmar',
 'Namibia',
 'Nepal',
 'Nicaragua',
 'Niger',
 'Nigeria',
 'Oman',
 'Pakistan',
 'Panam

The above calculation ignores that the value of money changes over time.
- Write a function that takes a year and returns the average `gdpPercap` in that year.
- What are the poor countries in 1967?
- What are the poor countries in 2007?
- What countries were poor in 1967 and remained poor in 2007?

In [21]:
def average_gdp(year):
    subset = data[data['year'] == year]
    return subset['gdpPercap'].mean()

In [22]:
average_gdp(1967)

5483.653046835212

In [23]:
average_gdp(2007)

11680.071819878167
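The per-year average can also be combined with the half-of-average poverty rule in a single helper. The sketch below is one possible shape for it. `poor_countries_in`, its `fraction` parameter, and the toy frame with countries `A`, `B`, `C` are all hypothetical and not part of the original notebook.

```python
import pandas as pd

def poor_countries_in(df, year, fraction=0.5):
    """Return countries whose gdpPercap in `year` is below
    `fraction` times that year's mean gdpPercap."""
    subset = df[df["year"] == year]
    cutoff = fraction * subset["gdpPercap"].mean()
    return set(subset.loc[subset["gdpPercap"] < cutoff, "country"])

# Hypothetical toy data, not taken from the GapMinder file.
toy = pd.DataFrame({
    "country": ["A", "B", "C", "A", "B", "C"],
    "year": [1967, 1967, 1967, 2007, 2007, 2007],
    "gdpPercap": [100.0, 500.0, 900.0, 400.0, 500.0, 600.0],
})

poor_1967 = poor_countries_in(toy, 1967)  # mean 500, cutoff 250
poor_2007 = poor_countries_in(toy, 2007)  # mean 500, cutoff 250
print(poor_1967)  # {'A'}
print(poor_2007)  # set()
```

Passing the frame as an argument, rather than reading a global `data`, keeps the helper usable on any subset of the data.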

In [24]:
data['average_year'] = data['year'].apply(average_gdp)
data.head()

Unnamed: 0,country,continent,year,lifeExp,pop,gdpPercap,lifeExpMonth,gdp,average_year
0,Afghanistan,Asia,1952,29,8425333,779.445314,345.612,6567086000.0,3725.276046
1,Afghanistan,Asia,1957,30,9240934,820.85303,363.984,7585449000.0,4299.408345
2,Afghanistan,Asia,1962,32,10267083,853.10071,383.964,8758856000.0,4725.812342
3,Afghanistan,Asia,1967,34,11537966,836.197138,408.24,9648014000.0,5483.653047
4,Afghanistan,Asia,1972,36,13079460,739.981106,433.056,9678553000.0,6770.082815
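Calling `average_gdp` through `apply` filters the whole frame once for every row. A possible alternative, sketched here on a made-up frame (countries `A`, `B` and their values are hypothetical), is `groupby(...).transform("mean")`, which computes each year's mean once and broadcasts it back to the matching rows. The sketch also checks that both approaches give the same column.

```python
import pandas as pd

# Hypothetical toy data, not taken from the GapMinder file.
toy = pd.DataFrame({
    "country": ["A", "B", "A", "B"],
    "year": [1952, 1952, 1957, 1957],
    "gdpPercap": [100.0, 300.0, 200.0, 600.0],
})

# One mean per year, aligned back to the original rows.
toy["average_year"] = toy.groupby("year")["gdpPercap"].transform("mean")

# Row-by-row version in the style of the cell above, for comparison.
def average_gdp(year):
    subset = toy[toy["year"] == year]
    return subset["gdpPercap"].mean()

via_apply = toy["year"].apply(average_gdp)

print(toy["average_year"].tolist())  # [200.0, 200.0, 400.0, 400.0]
```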


In [25]:
poor_countries = data[data['gdpPercap'] < 0.5 * data['average_year']]
poor_countries.head()

Unnamed: 0,country,continent,year,lifeExp,pop,gdpPercap,lifeExpMonth,gdp,average_year
0,Afghanistan,Asia,1952,29,8425333,779.445314,345.612,6567086000.0,3725.276046
1,Afghanistan,Asia,1957,30,9240934,820.85303,363.984,7585449000.0,4299.408345
2,Afghanistan,Asia,1962,32,10267083,853.10071,383.964,8758856000.0,4725.812342
3,Afghanistan,Asia,1967,34,11537966,836.197138,408.24,9648014000.0,5483.653047
4,Afghanistan,Asia,1972,36,13079460,739.981106,433.056,9678553000.0,6770.082815


In [26]:
poor_countries_1967 = set(poor_countries[poor_countries.year == 1967].country)
poor_countries_1967

{'Afghanistan',
 'Bangladesh',
 'Benin',
 'Bolivia',
 'Bosnia and Herzegovina',
 'Botswana',
 'Burkina Faso',
 'Burundi',
 'Cambodia',
 'Cameroon',
 'Central African Republic',
 'Chad',
 'China',
 'Colombia',
 'Comoros',
 'Congo, Dem. Rep.',
 'Congo, Rep.',
 "Cote d'Ivoire",
 'Dominican Republic',
 'Egypt',
 'Equatorial Guinea',
 'Eritrea',
 'Ethiopia',
 'Gambia',
 'Ghana',
 'Guinea',
 'Guinea-Bissau',
 'Haiti',
 'Honduras',
 'India',
 'Indonesia',
 'Jordan',
 'Kenya',
 'Korea, Dem. Rep.',
 'Korea, Rep.',
 'Lesotho',
 'Liberia',
 'Madagascar',
 'Malawi',
 'Malaysia',
 'Mali',
 'Mauritania',
 'Mauritius',
 'Mongolia',
 'Morocco',
 'Mozambique',
 'Myanmar',
 'Nepal',
 'Niger',
 'Nigeria',
 'Pakistan',
 'Paraguay',
 'Philippines',
 'Rwanda',
 'Sao Tome and Principe',
 'Senegal',
 'Sierra Leone',
 'Somalia',
 'Sri Lanka',
 'Sudan',
 'Swaziland',
 'Syria',
 'Taiwan',
 'Tanzania',
 'Thailand',
 'Togo',
 'Tunisia',
 'Uganda',
 'Vietnam',
 'West Bank and Gaza',
 'Yemen, Rep.',
 'Zambia',
 'Zim

In [27]:
poor_countries_2007 = set(poor_countries[poor_countries.year == 2007].country)
poor_countries_2007

{'Afghanistan',
 'Angola',
 'Bangladesh',
 'Benin',
 'Bolivia',
 'Burkina Faso',
 'Burundi',
 'Cambodia',
 'Cameroon',
 'Central African Republic',
 'Chad',
 'China',
 'Comoros',
 'Congo, Dem. Rep.',
 'Congo, Rep.',
 "Cote d'Ivoire",
 'Djibouti',
 'Egypt',
 'El Salvador',
 'Eritrea',
 'Ethiopia',
 'Gambia',
 'Ghana',
 'Guatemala',
 'Guinea',
 'Guinea-Bissau',
 'Haiti',
 'Honduras',
 'India',
 'Indonesia',
 'Iraq',
 'Jordan',
 'Kenya',
 'Korea, Dem. Rep.',
 'Lesotho',
 'Liberia',
 'Madagascar',
 'Malawi',
 'Mali',
 'Mauritania',
 'Mongolia',
 'Morocco',
 'Mozambique',
 'Myanmar',
 'Namibia',
 'Nepal',
 'Nicaragua',
 'Niger',
 'Nigeria',
 'Pakistan',
 'Paraguay',
 'Philippines',
 'Rwanda',
 'Sao Tome and Principe',
 'Senegal',
 'Sierra Leone',
 'Somalia',
 'Sri Lanka',
 'Sudan',
 'Swaziland',
 'Syria',
 'Tanzania',
 'Togo',
 'Uganda',
 'Vietnam',
 'West Bank and Gaza',
 'Yemen, Rep.',
 'Zambia',
 'Zimbabwe'}

In [28]:
stuck_in_poverty = poor_countries_1967 & poor_countries_2007
stuck_in_poverty

{'Afghanistan',
 'Bangladesh',
 'Benin',
 'Bolivia',
 'Burkina Faso',
 'Burundi',
 'Cambodia',
 'Cameroon',
 'Central African Republic',
 'Chad',
 'China',
 'Comoros',
 'Congo, Dem. Rep.',
 'Congo, Rep.',
 "Cote d'Ivoire",
 'Egypt',
 'Eritrea',
 'Ethiopia',
 'Gambia',
 'Ghana',
 'Guinea',
 'Guinea-Bissau',
 'Haiti',
 'Honduras',
 'India',
 'Indonesia',
 'Jordan',
 'Kenya',
 'Korea, Dem. Rep.',
 'Lesotho',
 'Liberia',
 'Madagascar',
 'Malawi',
 'Mali',
 'Mauritania',
 'Mongolia',
 'Morocco',
 'Mozambique',
 'Myanmar',
 'Nepal',
 'Niger',
 'Nigeria',
 'Pakistan',
 'Paraguay',
 'Philippines',
 'Rwanda',
 'Sao Tome and Principe',
 'Senegal',
 'Sierra Leone',
 'Somalia',
 'Sri Lanka',
 'Sudan',
 'Swaziland',
 'Syria',
 'Tanzania',
 'Togo',
 'Uganda',
 'Vietnam',
 'West Bank and Gaza',
 'Yemen, Rep.',
 'Zambia',
 'Zimbabwe'}
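The `&` operator above is plain Python set intersection, and the same two sets answer related questions with other set operators. The sketch below uses small hand-picked subsets of the 1967 and 2007 sets printed earlier. The variable names are illustrative only.

```python
# Hand-picked subsets of the 1967 and 2007 sets printed earlier.
poor_1967 = {"Afghanistan", "Malaysia", "Thailand"}
poor_2007 = {"Afghanistan", "Angola"}

stuck = poor_1967 & poor_2007      # below the threshold in both years
left = poor_1967 - poor_2007       # below in 1967 but not in 2007
entered = poor_2007 - poor_1967    # below in 2007 but not in 1967

print(sorted(stuck))    # ['Afghanistan']
print(sorted(left))     # ['Malaysia', 'Thailand']
print(sorted(entered))  # ['Angola']
```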