# Country populations

In this notebook, we'll use pandas to join some relational data:
- `../data/country-codes.csv` -- a table of ISO country codes and country names
- `../data/country-population.csv` -- country population data from the U.N.

👉 For more information on merging data in pandas, [check out this notebook](../reference/Merging%20data%20in%20pandas.ipynb).

First, let's import pandas under its conventional alias, `pd`.

In [13]:
import pandas as pd

When we read in the CSVs, we need to make sure that pandas doesn't parse the ISO codes as numbers, for two reasons. First, we want to keep any leading zeroes. Second, we're going to match on those codes for the join later on, and pandas raises an error if we try to join a text column to a numeric one. So, in addition to providing the path to each CSV, we'll also pass a keyword argument called `dtype` to specify that values in the `code` column need to be parsed as strings (`str`).
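To see what `dtype` changes, here is a minimal sketch that reads the same text twice, once with default parsing and once with `dtype={'code': str}`. The two-row CSV is a hypothetical sample built in memory with `io.StringIO`; it is not one of the notebook's data files.

```python
import io
import pandas as pd

# Hypothetical two-row CSV, used only to illustrate the dtype behavior.
sample_csv = "code,country\n004,Afghanistan\n108,Burundi\n"

# Default parsing: pandas infers integers, so "004" becomes 4.
as_numbers = pd.read_csv(io.StringIO(sample_csv))

# With dtype, the column stays text and the leading zeroes survive.
as_text = pd.read_csv(io.StringIO(sample_csv), dtype={'code': str})

print(as_numbers['code'].tolist())  # [4, 108]
print(as_text['code'].tolist())     # ['004', '108']
```

Only the `code` column is forced to text here; any other column is still inferred as usual.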

In [14]:
country_codes = pd.read_csv('../data/country-codes.csv', dtype={'code': str})

In [15]:
country_codes.head()

Unnamed: 0,code,country
0,108,Burundi
1,174,Comoros
2,262,Djibouti
3,232,Eritrea
4,231,Ethiopia


In [16]:
country_pop = pd.read_csv('../data/country-population.csv', dtype={'code': str})

In [17]:
country_pop.head()

Unnamed: 0,code,pop2000,pop2001,pop2002,pop2003,pop2004,pop2005,pop2006,pop2007,pop2008,pop2009,pop2010,pop2011,pop2012,pop2013,pop2014,pop2015
0,108,6401.0,6556.0,6742.0,6953.0,7182.0,7423.0,7675.0,7940.0,8212.0,8489.0,8767.0,9044.0,9320.0,9600.0,9892.0,10199.0
1,174,542.0,556.0,569.0,583.0,597.0,612.0,626.0,642.0,657.0,673.0,690.0,707.0,724.0,742.0,759.0,777.0
2,262,718.0,733.0,746.0,759.0,771.0,783.0,796.0,809.0,823.0,837.0,851.0,866.0,881.0,897.0,912.0,927.0
3,232,3393.0,3497.0,3615.0,3738.0,3859.0,3969.0,4067.0,4153.0,4233.0,4310.0,4391.0,4475.0,4561.0,4651.0,4746.0,4847.0
4,231,66537.0,68492.0,70497.0,72545.0,74624.0,76727.0,78851.0,81000.0,83185.0,85416.0,87703.0,90047.0,92444.0,94888.0,97367.0,99873.0


### Question: Which country had the largest population change, as a percentage, from 2000 to 2015?

First step: Calculate the percentage change as a new column. The formula is `((New number - Old number) / Old number) * 100`.

To create a new column in a pandas dataframe, assign a new column name in square brackets and set that equal to the formula you're calculating.
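As a quick check of the formula, the sketch below applies it to a small dataframe built by hand. The values are the 2000 and 2015 figures for the first two rows of the `country_pop.head()` output above; the dataframe name `sample` is just for illustration.

```python
import pandas as pd

# Two rows copied from the country_pop.head() output above.
sample = pd.DataFrame({
    'code': ['108', '174'],
    'pop2000': [6401.0, 542.0],
    'pop2015': [10199.0, 777.0],
})

# Arithmetic on columns is element-wise, so this yields one result per row.
sample['pct_change'] = ((sample['pop2015'] - sample['pop2000']) / sample['pop2000']) * 100

print(sample)
```

Because dataframes already have a method named `pct_change()`, this column has to be read back with square brackets (`sample['pct_change']`); dot notation would return the method instead of the column.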

In [18]:
country_pop['pct_change'] = ((country_pop['pop2015'] - country_pop['pop2000']) / country_pop['pop2000']) * 100

In [19]:
country_pop.head()

Unnamed: 0,code,pop2000,pop2001,pop2002,pop2003,pop2004,pop2005,pop2006,pop2007,pop2008,pop2009,pop2010,pop2011,pop2012,pop2013,pop2014,pop2015,pct_change
0,108,6401.0,6556.0,6742.0,6953.0,7182.0,7423.0,7675.0,7940.0,8212.0,8489.0,8767.0,9044.0,9320.0,9600.0,9892.0,10199.0,59.334479
1,174,542.0,556.0,569.0,583.0,597.0,612.0,626.0,642.0,657.0,673.0,690.0,707.0,724.0,742.0,759.0,777.0,43.357934
2,262,718.0,733.0,746.0,759.0,771.0,783.0,796.0,809.0,823.0,837.0,851.0,866.0,881.0,897.0,912.0,927.0,29.108635
3,232,3393.0,3497.0,3615.0,3738.0,3859.0,3969.0,4067.0,4153.0,4233.0,4310.0,4391.0,4475.0,4561.0,4651.0,4746.0,4847.0,42.852933
4,231,66537.0,68492.0,70497.0,72545.0,74624.0,76727.0,78851.0,81000.0,83185.0,85416.0,87703.0,90047.0,92444.0,94888.0,97367.0,99873.0,50.101447


### Sort the data and select the relevant columns

In [20]:
top_change = country_pop.sort_values('pct_change', ascending=False)[['code', 'pop2000', 'pop2015', 'pct_change']]

In [21]:
top_change.head()

Unnamed: 0,code,pop2000,pop2015,pct_change
102,634,592.0,2482.0,319.256757
107,784,3155.0,9154.0,190.142631
93,48,665.0,1372.0,106.315789
99,414,2051.0,3936.0,91.906387
26,226,614.0,1175.0,91.368078


### Join the data with the country codes lookup table

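To join data in pandas, we can use the [`merge()`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html) function, called here as `pd.merge()`; it's also available as a dataframe method. By default, it performs an inner join, which keeps only the rows whose `code` value appears in both tables.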

In [22]:
merged = pd.merge(top_change, country_codes, on='code')

In [23]:
merged.head()

Unnamed: 0,code,pop2000,pop2015,pct_change,country
0,634,592.0,2482.0,319.256757,Qatar
1,784,3155.0,9154.0,190.142631,United Arab Emirates
2,414,2051.0,3936.0,91.906387,Kuwait
3,226,614.0,1175.0,91.368078,Equatorial Guinea
4,512,2268.0,4200.0,85.185185,Oman
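Note that code `48`, which ranked third in `top_change`, does not appear in the merged output, which suggests the default inner join dropped it for lack of a match in the lookup table. One way to surface rows like this is a left join with `indicator=True`, sketched below on hypothetical miniature tables. The `48` versus `048` mismatch shown here is an assumption about the cause, not something the notebook's output confirms.

```python
import pandas as pd

# Hypothetical miniature tables; the '48' vs. '048' mismatch is an assumed cause.
left = pd.DataFrame({'code': ['634', '48'], 'pct_change': [319.26, 106.32]})
lookup = pd.DataFrame({'code': ['634', '048'], 'country': ['Qatar', 'Bahrain']})

# how='left' keeps every row of `left`; indicator=True adds a `_merge` column
# recording whether each row matched ('both') or not ('left_only').
checked = pd.merge(left, lookup, on='code', how='left', indicator=True)

# Rows that found no partner in the lookup table.
unmatched = checked[checked['_merge'] == 'left_only']
print(unmatched[['code', 'pct_change']])
```

Running the same check on the real tables, with `top_change` and `country_codes` in place of `left` and `lookup`, would list any codes that need cleaning before the join.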
