In this exercise, you'll redo the data gathering phase of the UNData Exploration project by calling an API instead of downloading CSV files.

You'll use the [World Bank Indicators API](https://datahelpdesk.worldbank.org/knowledgebase/articles/889392-about-the-indicators-api-documentation). This API does not require an API key. Before attempting the exercise, it's a good idea to skim the documentation page and read the [Basic Call Structure article](https://datahelpdesk.worldbank.org/knowledgebase/articles/898581).
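The Basic Call Structure article describes URLs of the form `https://api.worldbank.org/v2/country/{country}/indicator/{indicator}` followed by query parameters such as `format` and `per_page`. A minimal sketch of assembling such a URL with the standard library, assuming that layout; the helper name `build_indicator_url` is hypothetical and not part of the API:

```python
from urllib.parse import urlencode

BASE_URL = "https://api.worldbank.org/v2"


def build_indicator_url(indicator: str, country: str = "all", **params) -> str:
    """Build a World Bank Indicators API URL (hypothetical helper).

    `country` is "all" or a country code; extra keyword arguments become
    query parameters, in the order they are passed.
    """
    path = f"{BASE_URL}/country/{country}/indicator/{indicator}"
    query = urlencode(params)
    return f"{path}?{query}" if query else path


print(build_indicator_url("NY.GDP.PCAP.PP.KD", format="json", per_page=341))
# https://api.worldbank.org/v2/country/all/indicator/NY.GDP.PCAP.PP.KD?format=json&per_page=341
```

In the notebook below, `requests.get(endpoint, params=params)` does the same query-string encoding for you.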

In [29]:
import requests
import matplotlib.pyplot as plt
import pandas as pd

### Call the endpoint - https://api.worldbank.org/v2/country/all/indicator/NY.GDP.PCAP.PP.KD

In [30]:
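# Note: per_page=341 returns only the first 341 records (page 1 of the results).
# "country/all" covers every economy and aggregate, so this is not the full series;
# res[0] holds the paging metadata and res[1] holds the records.
endpoint = 'https://api.worldbank.org/v2/country/all/indicator/NY.GDP.PCAP.PP.KD'

params = {
    'format': 'json',
    'per_page': 341,
}

response = requests.get(endpoint, params=params)
response.raise_for_status()  # fail loudly on HTTP errors

res = response.json()  # [metadata, list_of_records]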
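The JSON response is a two-element list: the first element is paging metadata and the second is the list of records. Comparing the number of records returned with the reported `total` shows whether a single request captured everything. A sketch, assuming the metadata carries `page`, `pages` and `total` keys; the helper name `page_summary` is hypothetical:

```python
def page_summary(payload):
    """Summarize the paging metadata of a World Bank API JSON payload.

    Assumes payload == [metadata, records]; int() guards against numbers
    being returned as strings.
    """
    if len(payload) < 2:
        raise ValueError(f"Unexpected response (possibly an error message): {payload}")
    meta, records = payload[0], payload[1] or []
    summary = {
        "page": int(meta["page"]),
        "pages": int(meta["pages"]),
        "returned": len(records),
        "total": int(meta["total"]),
    }
    # Complete only if this single page already holds every record
    summary["complete"] = summary["returned"] >= summary["total"]
    return summary
```

Calling `page_summary(res)` right after the request makes it obvious when `per_page` was too small and more pages need to be fetched.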

In [31]:
# Review the fields available in each record
res[1][0].keys()

dict_keys(['indicator', 'country', 'countryiso3code', 'date', 'value', 'unit', 'obs_status', 'decimal'])

In [32]:
# Inspect the first record to see how the data is nested
res[1][0]

{'indicator': {'id': 'NY.GDP.PCAP.PP.KD',
  'value': 'GDP per capita, PPP (constant 2021 international $)'},
 'country': {'id': 'ZH', 'value': 'Africa Eastern and Southern'},
 'countryiso3code': 'AFE',
 'date': '2023',
 'value': 4047.00703115979,
 'unit': '',
 'obs_status': '',
 'decimal': 0}
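`pd.json_normalize` flattens the nested `indicator` and `country` dictionaries into dotted column names such as `indicator.id` and `country.value`, which is where the column names in the next output come from. A small self-contained illustration using the first record shown above:

```python
import pandas as pd

# First record returned by the GDP endpoint (copied from the output above)
record = {
    "indicator": {"id": "NY.GDP.PCAP.PP.KD",
                  "value": "GDP per capita, PPP (constant 2021 international $)"},
    "country": {"id": "ZH", "value": "Africa Eastern and Southern"},
    "countryiso3code": "AFE",
    "date": "2023",
    "value": 4047.00703115979,
    "unit": "",
    "obs_status": "",
    "decimal": 0,
}

# One row; nested keys become "indicator.id", "indicator.value", "country.id", ...
flat = pd.json_normalize([record])
print(flat.columns.tolist())
```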

In [38]:
gdp_ppp = pd.json_normalize(res[1])  # flatten the nested 'indicator' and 'country' dicts

In [47]:
gdp_ppp.sample(10)

| | countryiso3code | date | value | unit | obs_status | decimal | indicator.id | indicator.value | country.id | country.value |
|---|---|---|---|---|---|---|---|---|---|---|
| 254 | CSS | 1961 | NaN | | | 0 | NY.GDP.PCAP.PP.KD | GDP per capita, PPP (constant 2021 internation... | S3 | Caribbean small states |
| 311 | CEB | 1968 | NaN | | | 0 | NY.GDP.PCAP.PP.KD | GDP per capita, PPP (constant 2021 internation... | B8 | Central Europe and the Baltics |
| 276 | CEB | 2003 | 21525.620473 | | | 0 | NY.GDP.PCAP.PP.KD | GDP per capita, PPP (constant 2021 internation... | B8 | Central Europe and the Baltics |
| 183 | ARB | 1968 | NaN | | | 0 | NY.GDP.PCAP.PP.KD | GDP per capita, PPP (constant 2021 internation... | 1A | Arab World |
| 281 | CEB | 1998 | 17872.38035 | | | 0 | NY.GDP.PCAP.PP.KD | GDP per capita, PPP (constant 2021 internation... | B8 | Central Europe and the Baltics |
| 220 | CSS | 1995 | 13829.144184 | | | 0 | NY.GDP.PCAP.PP.KD | GDP per capita, PPP (constant 2021 internation... | S3 | Caribbean small states |
| 56 | AFE | 1967 | NaN | | | 0 | NY.GDP.PCAP.PP.KD | GDP per capita, PPP (constant 2021 internation... | ZH | Africa Eastern and Southern |
| 327 | EAR | 2016 | 9868.255725 | | | 0 | NY.GDP.PCAP.PP.KD | GDP per capita, PPP (constant 2021 internation... | V2 | Early-demographic dividend |
| 130 | ARB | 2021 | 15776.984852 | | | 0 | NY.GDP.PCAP.PP.KD | GDP per capita, PPP (constant 2021 internation... | 1A | Arab World |
| 232 | CSS | 1983 | NaN | | | 0 | NY.GDP.PCAP.PP.KD | GDP per capita, PPP (constant 2021 internation... | S3 | Caribbean small states |


Let's clean up the columns: drop the ones we don't need, then give the rest clearer names.

In [59]:
gdp_ppp = gdp_ppp.drop(columns=['countryiso3code','unit','obs_status','indicator.id','indicator.value','decimal'])

In [66]:
gdp_ppp = gdp_ppp.rename(columns={"date": "Year", "value": "GDP_Per_Capita","country.id":"country_id","country.value":"country"})

In [67]:
gdp_ppp

| | Year | GDP_Per_Capita | country_id | country |
|---|---|---|---|---|
| 0 | 2023 | 4047.007031 | ZH | Africa Eastern and Southern |
| 1 | 2022 | 4038.638689 | ZH | Africa Eastern and Southern |
| 2 | 2021 | 3994.171654 | ZH | Africa Eastern and Southern |
| 3 | 2020 | 3919.499230 | ZH | Africa Eastern and Southern |
| 4 | 2019 | 4130.057222 | ZH | Africa Eastern and Southern |
| ... | ... | ... | ... | ... |
| 336 | 2007 | 7569.420392 | V2 | Early-demographic dividend |
| 337 | 2006 | 7257.102626 | V2 | Early-demographic dividend |
| 338 | 2005 | 6941.349729 | V2 | Early-demographic dividend |
| 339 | 2004 | 6652.360703 | V2 | Early-demographic dividend |


### Call the endpoint - https://api.worldbank.org/v2/country/all/indicator/SP.DYN.LE00.IN

In [40]:
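# Same request as above for life expectancy at birth; per_page=341 again
# returns only the first 341 records, matching the GDP rows
endpoint = 'https://api.worldbank.org/v2/country/all/indicator/SP.DYN.LE00.IN'

params = {
    'format': 'json',
    'per_page': 341,
}

response = requests.get(endpoint, params=params)
response.raise_for_status()  # fail loudly on HTTP errors

res = response.json()  # [metadata, list_of_records]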
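The GDP and life-expectancy requests differ only in the indicator code, and both stop at the first 341 records. A sketch of a reusable helper that walks every page by incrementing the `page` query parameter until it reaches the `pages` value in the metadata. The name `fetch_indicator`, the default `per_page=1000`, and the `get` argument (any callable shaped like `requests.get`, handy for testing without network access) are illustrative assumptions, not part of the original notebook:

```python
def fetch_indicator(indicator, country="all", per_page=1000, get=None):
    """Return every record for an indicator by following the API's paging."""
    if get is None:
        import requests  # imported lazily so a stub can be passed in tests
        get = requests.get

    url = f"https://api.worldbank.org/v2/country/{country}/indicator/{indicator}"
    records, page, pages = [], 1, 1
    while page <= pages:
        response = get(url, params={"format": "json", "per_page": per_page, "page": page})
        response.raise_for_status()
        payload = response.json()
        # Assumption: error responses come back as a one-element list with a message
        if len(payload) < 2:
            raise ValueError(f"Unexpected response: {payload}")
        meta, batch = payload
        pages = int(meta["pages"])  # total number of pages reported by the API
        records.extend(batch or [])
        page += 1
    return records
```

With this helper, each indicator becomes a single line, for example `pd.json_normalize(fetch_indicator("SP.DYN.LE00.IN"))`, and the result is no longer limited to the first page.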

In [43]:
# Inspect the first record
res[1][0]

{'indicator': {'id': 'SP.DYN.LE00.IN',
  'value': 'Life expectancy at birth, total (years)'},
 'country': {'id': 'ZH', 'value': 'Africa Eastern and Southern'},
 'countryiso3code': 'AFE',
 'date': '2023',
 'value': None,
 'unit': '',
 'obs_status': '',
 'decimal': 0}

In [62]:
df_le_birth_total = pd.json_normalize(res[1])  # flatten the nested 'indicator' and 'country' dicts

In [63]:
df_le_birth_total.sample(10)

| | countryiso3code | date | value | unit | obs_status | decimal | indicator.id | indicator.value | country.id | country.value |
|---|---|---|---|---|---|---|---|---|---|---|
| 299 | CEB | 1980 | 69.888985 | | | 0 | SP.DYN.LE00.IN | Life expectancy at birth, total (years) | B8 | Central Europe and the Baltics |
| 232 | CSS | 1983 | 65.91458 | | | 0 | SP.DYN.LE00.IN | Life expectancy at birth, total (years) | S3 | Caribbean small states |
| 209 | CSS | 2006 | 69.942879 | | | 0 | SP.DYN.LE00.IN | Life expectancy at birth, total (years) | S3 | Caribbean small states |
| 269 | CEB | 2010 | 75.293966 | | | 0 | SP.DYN.LE00.IN | Life expectancy at birth, total (years) | B8 | Central Europe and the Baltics |
| 18 | AFE | 2005 | 54.219651 | | | 0 | SP.DYN.LE00.IN | Life expectancy at birth, total (years) | ZH | Africa Eastern and Southern |
| 256 | CEB | 2023 | NaN | | | 0 | SP.DYN.LE00.IN | Life expectancy at birth, total (years) | B8 | Central Europe and the Baltics |
| 42 | AFE | 1981 | 50.057073 | | | 0 | SP.DYN.LE00.IN | Life expectancy at birth, total (years) | ZH | Africa Eastern and Southern |
| 205 | CSS | 2010 | 71.093767 | | | 0 | SP.DYN.LE00.IN | Life expectancy at birth, total (years) | S3 | Caribbean small states |
| 170 | ARB | 1981 | 57.692929 | | | 0 | SP.DYN.LE00.IN | Life expectancy at birth, total (years) | 1A | Arab World |
| 267 | CEB | 2012 | 75.993489 | | | 0 | SP.DYN.LE00.IN | Life expectancy at birth, total (years) | B8 | Central Europe and the Baltics |


Now drop the columns we don't need and rename the rest, as we did for the GDP data.

In [64]:
df_le_birth_total = df_le_birth_total.drop(columns=['countryiso3code','unit','obs_status','indicator.id','indicator.value','decimal'])

In [72]:
df_le_birth_total = df_le_birth_total.rename(columns={"date": "Year", "value": "Life_expectancy","country.id":"country_id","country.value":"country"})
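Both indicators go through the same drop-and-rename step, so it could be factored into one function. A sketch that selects the columns to keep instead of listing the ones to drop; the name `tidy_indicator` is hypothetical, and the demo record is a trimmed-down record in the API's shape using the 2022 value from the merged output further down:

```python
import pandas as pd


def tidy_indicator(raw: pd.DataFrame, value_name: str) -> pd.DataFrame:
    """Keep and rename the columns needed for analysis (hypothetical helper).

    `raw` is the output of pd.json_normalize on the API records;
    `value_name` becomes the name of the indicator's value column.
    """
    keep = {
        "date": "Year",
        "value": value_name,
        "country.id": "country_id",
        "country.value": "country",
    }
    # Selecting by the dict's keys gives the order Year, value, country_id, country
    return raw[list(keep)].rename(columns=keep)


# Demo on one record shaped like the API output
raw = pd.json_normalize([{
    "indicator": {"id": "SP.DYN.LE00.IN", "value": "Life expectancy at birth, total (years)"},
    "country": {"id": "ZH", "value": "Africa Eastern and Southern"},
    "countryiso3code": "AFE", "date": "2022", "value": 62.899031,
    "unit": "", "obs_status": "", "decimal": 0,
}])
print(tidy_indicator(raw, "Life_expectancy"))
```

Applied to the notebook, `tidy_indicator(pd.json_normalize(res[1]), "Life_expectancy")` does the work of the drop and rename cells in one call.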

In [70]:
gdp_ppp.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 341 entries, 0 to 340
Data columns (total 4 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Year            341 non-null    object 
 1   GDP_Per_Capita  191 non-null    float64
 2   country_id      341 non-null    object 
 3   country         341 non-null    object 
dtypes: float64(1), object(3)
memory usage: 10.8+ KB


In [73]:
df_le_birth_total.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 341 entries, 0 to 340
Data columns (total 4 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Year             341 non-null    object 
 1   Life_expectancy  335 non-null    float64
 2   country_id       341 non-null    object 
 3   country          341 non-null    object 
dtypes: float64(1), object(3)
memory usage: 10.8+ KB
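The `info()` outputs point to two things worth handling before analysis. `Year` is stored as `object` because the API returns dates as strings such as `'2023'`. The value columns also have gaps: `GDP_Per_Capita` has only 191 non-null values out of 341, which is consistent with this PPP series starting in 1990, while `Life_expectancy` is missing its 2023 values. A sketch of converting `Year` to integers and counting missing values, run on a tiny stand-in frame built from rows in the outputs above:

```python
import pandas as pd

# Tiny stand-in for gdp_ppp, using rows from the outputs above
df = pd.DataFrame({
    "Year": ["2023", "2003", "1968"],
    "GDP_Per_Capita": [4047.007031, 21525.620473, None],
    "country_id": ["ZH", "B8", "B8"],
    "country": ["Africa Eastern and Southern",
                "Central Europe and the Baltics",
                "Central Europe and the Baltics"],
})

df["Year"] = df["Year"].astype(int)  # '2023' -> 2023, so years sort and compare numerically
print(df.isna().sum())               # number of missing values per column

# Keep only rows that actually have a GDP value
df_valid = df.dropna(subset=["GDP_Per_Capita"])
print(df_valid)
```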


### Merge the two data frames

In [76]:
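# Join on the shared key columns; an inner join keeps only rows present in both frames
gdp_le_df = gdp_ppp.merge(df_le_birth_total, on=['Year', 'country_id', 'country'], how='inner')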

In [77]:
gdp_le_df

| | Year | GDP_Per_Capita | country_id | country | Life_expectancy |
|---|---|---|---|---|---|
| 0 | 2023 | 4047.007031 | ZH | Africa Eastern and Southern | NaN |
| 1 | 2022 | 4038.638689 | ZH | Africa Eastern and Southern | 62.899031 |
| 2 | 2021 | 3994.171654 | ZH | Africa Eastern and Southern | 62.454590 |
| 3 | 2020 | 3919.499230 | ZH | Africa Eastern and Southern | 63.313860 |
| 4 | 2019 | 4130.057222 | ZH | Africa Eastern and Southern | 63.755678 |
| ... | ... | ... | ... | ... | ... |
| 336 | 2007 | 7569.420392 | V2 | Early-demographic dividend | 66.884712 |
| 337 | 2006 | 7257.102626 | V2 | Early-demographic dividend | 66.592030 |
| 338 | 2005 | 6941.349729 | V2 | Early-demographic dividend | 66.191503 |
| 339 | 2004 | 6652.360703 | V2 | Early-demographic dividend | 65.742902 |
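Without an `on` argument, `merge` joins on every column the two frames share (here `Year`, `country_id` and `country`) and performs an inner join, so a year or country present in only one frame is dropped silently. A sketch of making those choices explicit and checking the result with the `validate` and `indicator` parameters of `DataFrame.merge`, on two tiny stand-in frames built from the outputs above; the switch to `how="outer"` is shown as an option, not what the notebook uses:

```python
import pandas as pd

# Tiny stand-ins for gdp_ppp and df_le_birth_total
gdp = pd.DataFrame({"Year": ["2023", "2022"],
                    "GDP_Per_Capita": [4047.007031, 4038.638689],
                    "country_id": ["ZH", "ZH"],
                    "country": ["Africa Eastern and Southern"] * 2})
le = pd.DataFrame({"Year": ["2023", "2022"],
                   "Life_expectancy": [None, 62.899031],
                   "country_id": ["ZH", "ZH"],
                   "country": ["Africa Eastern and Southern"] * 2})

merged = gdp.merge(
    le,
    on=["Year", "country_id", "country"],
    how="outer",            # keep rows found in either frame
    validate="one_to_one",  # raise MergeError if a (Year, country) key is duplicated
    indicator=True,         # add a _merge column: both / left_only / right_only
)
print(merged["_merge"].value_counts())
```

Any `left_only` or `right_only` rows in `_merge` show exactly which country-years an inner join would have discarded.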
