# **UNData API Exercise**


In this exercise, you'll redo the data-gathering phase of the UNData Exploration project using APIs instead of downloading CSV files.

You'll make use of the [World Bank Indicators API](https://datahelpdesk.worldbank.org/knowledgebase/articles/889392-about-the-indicators-api-documentation). Note that this API does not require an API key. Before attempting the exercise, it would be a good idea to skim through the Documentation page and to check out the [Basic Call Structure article](https://datahelpdesk.worldbank.org/knowledgebase/articles/898581).

In [1]:
import requests
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
import json

### 1. Use the API to get all available data for the _GDP per capita, PPP (constant 2021 international $)_ indicator. Hint: this indicator has code "NY.GDP.PCAP.PP.KD". Adjust the query parameters so that you can retrieve all available rows. Convert the results to a DataFrame.

In [28]:
#from MH

country = "all"
indicator = "NY.GDP.PCAP.PP.KD"

URL = f"http://api.worldbank.org/v2/country/{country}/indicator/{indicator}"

params = {
    "format": "json",
    "per_page": 16758
}

response = requests.get(URL, params = params)

gdp_per_capita = pd.json_normalize(response.json()[1])
gdp_per_capita = gdp_per_capita.rename(columns = {'value': 'gdp_per_capita'})
gdp_per_capita.head(20)

Unnamed: 0,countryiso3code,date,gdp_per_capita,unit,obs_status,decimal,indicator.id,indicator.value,country.id,country.value
0,AFE,2023,4047.007031,,,0,NY.GDP.PCAP.PP.KD,"GDP per capita, PPP (constant 2021 internation...",ZH,Africa Eastern and Southern
1,AFE,2022,4038.638689,,,0,NY.GDP.PCAP.PP.KD,"GDP per capita, PPP (constant 2021 internation...",ZH,Africa Eastern and Southern
2,AFE,2021,3994.171654,,,0,NY.GDP.PCAP.PP.KD,"GDP per capita, PPP (constant 2021 internation...",ZH,Africa Eastern and Southern
3,AFE,2020,3919.49923,,,0,NY.GDP.PCAP.PP.KD,"GDP per capita, PPP (constant 2021 internation...",ZH,Africa Eastern and Southern
4,AFE,2019,4130.057222,,,0,NY.GDP.PCAP.PP.KD,"GDP per capita, PPP (constant 2021 internation...",ZH,Africa Eastern and Southern
5,AFE,2018,4142.950313,,,0,NY.GDP.PCAP.PP.KD,"GDP per capita, PPP (constant 2021 internation...",ZH,Africa Eastern and Southern
6,AFE,2017,4135.767937,,,0,NY.GDP.PCAP.PP.KD,"GDP per capita, PPP (constant 2021 internation...",ZH,Africa Eastern and Southern
7,AFE,2016,4125.361392,,,0,NY.GDP.PCAP.PP.KD,"GDP per capita, PPP (constant 2021 internation...",ZH,Africa Eastern and Southern
8,AFE,2015,4129.05512,,,0,NY.GDP.PCAP.PP.KD,"GDP per capita, PPP (constant 2021 internation...",ZH,Africa Eastern and Southern
9,AFE,2014,4108.190054,,,0,NY.GDP.PCAP.PP.KD,"GDP per capita, PPP (constant 2021 internation...",ZH,Africa Eastern and Southern
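
The hard-coded `per_page` of 16758 only works while the indicator has no more rows than that. Below is a minimal sketch of one alternative. It assumes the first element of the response is a metadata object with a `total` field, and that the API accepts a `per_page` that large. `fetch_all_rows` and `get_json` are hypothetical names; `get_json` stands for any callable that performs the HTTP request and returns the parsed JSON.

```python
def fetch_all_rows(url, get_json):
    """Return every row for `url` by sizing `per_page` from the metadata.

    `get_json(url, params)` is a caller-supplied function (hypothetical name)
    that performs the request and returns the parsed JSON: [metadata, rows].
    """
    # First request: ask for a single row just to read the metadata.
    meta, _ = get_json(url, {"format": "json", "per_page": 1})
    total = int(meta["total"])  # assumption: metadata exposes "total"
    if total == 0:
        return []
    # Second request: ask for everything in one page.
    _, rows = get_json(url, {"format": "json", "per_page": total})
    return rows
```

With `requests`, `get_json` could be `lambda url, params: requests.get(url, params=params, timeout=30).json()`, and the returned rows can be passed to `pd.json_normalize` as in the cell above.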


An alternative approach requests the data in pages of 50 rows and loops over every page:

In [19]:
#from TW

#variables
lst = []
endpoint = "http://api.worldbank.org/v2/country/all/indicator/"
indicator = 'NY.GDP.PCAP.PP.KD'
params = {
        'format': 'json',
        'per_page': 50
}
url = endpoint + indicator

#get number of pages
res = requests.get(url, params).json()
pages = res[0]['pages']

for page in range(1, pages + 1):
        params['page'] = page  # the query parameter is 'page'; 'pages' is only a metadata field
        res = requests.get(url, params).json()
        lst.append(res[1])

gdp = pd.json_normalize([r for d in lst for r in d])
gdp.head(20)

Unnamed: 0,countryiso3code,date,value,unit,obs_status,decimal,indicator.id,indicator.value,country.id,country.value
0,AFE,2023,4047.007031,,,0,NY.GDP.PCAP.PP.KD,"GDP per capita, PPP (constant 2021 internation...",ZH,Africa Eastern and Southern
1,AFE,2022,4038.638689,,,0,NY.GDP.PCAP.PP.KD,"GDP per capita, PPP (constant 2021 internation...",ZH,Africa Eastern and Southern
2,AFE,2021,3994.171654,,,0,NY.GDP.PCAP.PP.KD,"GDP per capita, PPP (constant 2021 internation...",ZH,Africa Eastern and Southern
3,AFE,2020,3919.49923,,,0,NY.GDP.PCAP.PP.KD,"GDP per capita, PPP (constant 2021 internation...",ZH,Africa Eastern and Southern
4,AFE,2019,4130.057222,,,0,NY.GDP.PCAP.PP.KD,"GDP per capita, PPP (constant 2021 internation...",ZH,Africa Eastern and Southern
5,AFE,2018,4142.950313,,,0,NY.GDP.PCAP.PP.KD,"GDP per capita, PPP (constant 2021 internation...",ZH,Africa Eastern and Southern
6,AFE,2017,4135.767937,,,0,NY.GDP.PCAP.PP.KD,"GDP per capita, PPP (constant 2021 internation...",ZH,Africa Eastern and Southern
7,AFE,2016,4125.361392,,,0,NY.GDP.PCAP.PP.KD,"GDP per capita, PPP (constant 2021 internation...",ZH,Africa Eastern and Southern
8,AFE,2015,4129.05512,,,0,NY.GDP.PCAP.PP.KD,"GDP per capita, PPP (constant 2021 internation...",ZH,Africa Eastern and Southern
9,AFE,2014,4108.190054,,,0,NY.GDP.PCAP.PP.KD,"GDP per capita, PPP (constant 2021 internation...",ZH,Africa Eastern and Southern
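
Because `head()` shows the same first rows whichever pages were actually downloaded, it is worth checking that a paginated download collected the expected number of distinct rows. Here is a small sketch, assuming each raw row carries a nested `country` object with an `id` and a `date` field, as the flattened `country.id` and `date` columns above suggest. `pagination_report` is a hypothetical helper name.

```python
def pagination_report(rows, expected_total):
    """Summarise how many rows were collected and how many are repeats."""
    # A (country id, year) pair should identify one observation of an indicator.
    keys = [(row["country"]["id"], row["date"]) for row in rows]
    return {
        "rows": len(keys),
        "expected": expected_total,
        "duplicates": len(keys) - len(set(keys)),
    }
```

Here `rows` would be the flattened list `[r for d in lst for r in d]`, and `expected_total` could come from the response metadata (assuming it includes a `total` field). A non-zero `duplicates` count suggests the same page was fetched more than once.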


### 2. Now, use the API to get all available data for Life expectancy at birth, total (years). This indicator has code "SP.DYN.LE00.IN". Again, convert the results to a DataFrame.

In [4]:
country = "all"
indicator = "SP.DYN.LE00.IN"

URL = f"http://api.worldbank.org/v2/country/{country}/indicator/{indicator}"

params = {
    "format": "json",
    "per_page": 16758,
}

response = requests.get(URL, params = params)

life_expectancy = pd.json_normalize(response.json()[1])
life_expectancy = life_expectancy.rename(columns = {'value': 'life_expectancy'})
life_expectancy.head(20)

Unnamed: 0,countryiso3code,date,life_expectancy,unit,obs_status,decimal,indicator.id,indicator.value,country.id,country.value
0,AFE,2023,,,,0,SP.DYN.LE00.IN,"Life expectancy at birth, total (years)",ZH,Africa Eastern and Southern
1,AFE,2022,62.899031,,,0,SP.DYN.LE00.IN,"Life expectancy at birth, total (years)",ZH,Africa Eastern and Southern
2,AFE,2021,62.45459,,,0,SP.DYN.LE00.IN,"Life expectancy at birth, total (years)",ZH,Africa Eastern and Southern
3,AFE,2020,63.31386,,,0,SP.DYN.LE00.IN,"Life expectancy at birth, total (years)",ZH,Africa Eastern and Southern
4,AFE,2019,63.755678,,,0,SP.DYN.LE00.IN,"Life expectancy at birth, total (years)",ZH,Africa Eastern and Southern
5,AFE,2018,63.365863,,,0,SP.DYN.LE00.IN,"Life expectancy at birth, total (years)",ZH,Africa Eastern and Southern
6,AFE,2017,62.92239,,,0,SP.DYN.LE00.IN,"Life expectancy at birth, total (years)",ZH,Africa Eastern and Southern
7,AFE,2016,62.44405,,,0,SP.DYN.LE00.IN,"Life expectancy at birth, total (years)",ZH,Africa Eastern and Southern
8,AFE,2015,61.856458,,,0,SP.DYN.LE00.IN,"Life expectancy at birth, total (years)",ZH,Africa Eastern and Southern
9,AFE,2014,61.337917,,,0,SP.DYN.LE00.IN,"Life expectancy at birth, total (years)",ZH,Africa Eastern and Southern
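
Indexing `response.json()[1]` assumes the request succeeded. The defensive sketch below assumes a successful response is a two-element list `[metadata, rows]`. It also assumes an error response is a list whose first element contains a `message` key; this is an assumption about the error format and should be checked against the API documentation. `extract_rows` is a hypothetical name.

```python
def extract_rows(payload):
    """Return the data rows from a parsed API response, or raise on errors."""
    if not isinstance(payload, list) or not payload:
        raise ValueError(f"Unexpected payload: {payload!r}")
    # Assumed error shape: [{"message": [...]}]
    if isinstance(payload[0], dict) and "message" in payload[0]:
        raise ValueError(f"API returned an error: {payload[0]['message']}")
    # Metadata only, or an explicit null for the rows: treat as no data.
    if len(payload) < 2 or payload[1] is None:
        return []
    return payload[1]
```

The result can then be passed to `pd.json_normalize` in place of `response.json()[1]`.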


### 3. Merge the two results DataFrames together. You may want to rename or drop columns prior to merging.



Clean up the GDP per capita DataFrame: (1) keep the columns of interest, (2) rename the column headers.

In [11]:
gdp_per_capita = gdp_per_capita[['date', 'gdp_per_capita', 'country.value']]
gdp_per_capita

Unnamed: 0,date,gdp_per_capita,country.value
0,2023,4047.007031,Africa Eastern and Southern
1,2022,4038.638689,Africa Eastern and Southern
2,2021,3994.171654,Africa Eastern and Southern
3,2020,3919.499230,Africa Eastern and Southern
4,2019,4130.057222,Africa Eastern and Southern
...,...,...,...
16753,1974,,Virgin Islands (U.S.)
16754,1973,,Virgin Islands (U.S.)
16755,1972,,Virgin Islands (U.S.)
16756,1971,,Virgin Islands (U.S.)


In [17]:
# Assign the result instead of using inplace=True on a column subset
gdp_per_capita = gdp_per_capita.rename(columns={'date': 'year', 'country.value': 'country'})
gdp_per_capita

Unnamed: 0,year,gdp_per_capita,country
0,2023,4047.007031,Africa Eastern and Southern
1,2022,4038.638689,Africa Eastern and Southern
2,2021,3994.171654,Africa Eastern and Southern
3,2020,3919.499230,Africa Eastern and Southern
4,2019,4130.057222,Africa Eastern and Southern
...,...,...,...
16753,1974,,Virgin Islands (U.S.)
16754,1973,,Virgin Islands (U.S.)
16755,1972,,Virgin Islands (U.S.)
16756,1971,,Virgin Islands (U.S.)


Clean up the life expectancy DataFrame: (1) keep the columns of interest, (2) rename the column headers.

In [15]:
life_expectancy = life_expectancy[['date', 'life_expectancy', 'country.value']]
life_expectancy

Unnamed: 0,date,life_expectancy,country.value
0,2023,,Africa Eastern and Southern
1,2022,62.899031,Africa Eastern and Southern
2,2021,62.454590,Africa Eastern and Southern
3,2020,63.313860,Africa Eastern and Southern
4,2019,63.755678,Africa Eastern and Southern
...,...,...,...
16753,1974,67.283683,Virgin Islands (U.S.)
16754,1973,67.070488,Virgin Islands (U.S.)
16755,1972,66.389878,Virgin Islands (U.S.)
16756,1971,66.183220,Virgin Islands (U.S.)


In [16]:
# Assign the result instead of using inplace=True on a column subset,
# which avoids pandas' SettingWithCopyWarning
life_expectancy = life_expectancy.rename(columns={'date': 'year', 'country.value': 'country'})
life_expectancy

Unnamed: 0,year,life_expectancy,country
0,2023,,Africa Eastern and Southern
1,2022,62.899031,Africa Eastern and Southern
2,2021,62.454590,Africa Eastern and Southern
3,2020,63.313860,Africa Eastern and Southern
4,2019,63.755678,Africa Eastern and Southern
...,...,...,...
16753,1974,67.283683,Virgin Islands (U.S.)
16754,1973,67.070488,Virgin Islands (U.S.)
16755,1972,66.389878,Virgin Islands (U.S.)
16756,1971,66.183220,Virgin Islands (U.S.)
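
The select-then-rename steps are repeated for each indicator, so they can be folded into one chained helper. Because `rename` returns a new DataFrame here, nothing is modified in place on a column subset, which is the situation pandas' `SettingWithCopyWarning` is about. This is a sketch using the column names from the outputs above; `tidy_indicator` is a hypothetical name.

```python
import pandas as pd

def tidy_indicator(df, value_name):
    """Keep year, value and country columns, returning a new DataFrame."""
    return (
        df[["date", value_name, "country.value"]]
        .rename(columns={"date": "year", "country.value": "country"})
    )
```

For example, `tidy_indicator(life_expectancy, 'life_expectancy')` would be applied to the frame as it comes out of `pd.json_normalize`, after the `value` column has been renamed.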


Merge the two DataFrames on `year` and `country`:

In [18]:
Merged_LifeExpectancy_GDP_PerCapita = pd.merge(life_expectancy, gdp_per_capita, on =['year', 'country'])
Merged_LifeExpectancy_GDP_PerCapita

Unnamed: 0,year,life_expectancy,country,gdp_per_capita
0,2023,,Africa Eastern and Southern,4047.007031
1,2022,62.899031,Africa Eastern and Southern,4038.638689
2,2021,62.454590,Africa Eastern and Southern,3994.171654
3,2020,63.313860,Africa Eastern and Southern,3919.499230
4,2019,63.755678,Africa Eastern and Southern,4130.057222
...,...,...,...,...
16753,1974,67.283683,Virgin Islands (U.S.),
16754,1973,67.070488,Virgin Islands (U.S.),
16755,1972,66.389878,Virgin Islands (U.S.),
16756,1971,66.183220,Virgin Islands (U.S.),
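
`pd.merge` defaults to an inner join, so any year/country pair missing from either frame is dropped without notice. The sketch below shows a more explicit merge on toy data (the country and values are made up for illustration). `how='outer'` keeps unmatched rows, `validate='one_to_one'` raises if a key is duplicated on either side, and `indicator=True` adds a `_merge` column recording where each row came from.

```python
import pandas as pd

# Toy frames with one overlapping year (illustrative values only).
life = pd.DataFrame({
    "year": ["2020", "2021"],
    "country": ["Toyland", "Toyland"],
    "life_expectancy": [70.0, 70.5],
})
gdp = pd.DataFrame({
    "year": ["2021", "2022"],
    "country": ["Toyland", "Toyland"],
    "gdp_per_capita": [1000.0, 1100.0],
})

merged = pd.merge(
    life, gdp,
    on=["year", "country"],
    how="outer",              # keep rows found in only one frame
    validate="one_to_one",    # fail loudly on duplicated keys
    indicator=True,           # adds the _merge column
)

# Rows that did not find a partner in the other frame.
unmatched = merged[merged["_merge"] != "both"]
```

Inspecting `unmatched` shows which observations an inner join would have discarded.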


### 4. You can also get more information about the available countries (region, capital city, income level classification, etc.) by using the Country API. Use this API to pull in all available data. Merge this with your other datasets. Use this to now remove the rows that correspond to regions and not countries.

First, pull in one more indicator, Access to electricity (% of population), which has code "EG.ELC.ACCS.ZS":

In [23]:
country = "all"
indicator = "EG.ELC.ACCS.ZS"

URL = f"http://api.worldbank.org/v2/country/{country}/indicator/{indicator}"

params = {
    "format": "json",
    "per_page": 16758,
}

response = requests.get(URL, params = params)

Acc_to_Electricity = pd.json_normalize(response.json()[1])
Acc_to_Electricity = Acc_to_Electricity.rename(columns = {'value': 'Percent_of_Pop_with_Access_to_Electricity'})
Acc_to_Electricity

Unnamed: 0,countryiso3code,date,Percent_of_Pop_with_Access_to_Electricity,unit,obs_status,decimal,indicator.id,indicator.value,country.id,country.value
0,AFE,2023,,,,1,EG.ELC.ACCS.ZS,Access to electricity (% of population),ZH,Africa Eastern and Southern
1,AFE,2022,48.711995,,,1,EG.ELC.ACCS.ZS,Access to electricity (% of population),ZH,Africa Eastern and Southern
2,AFE,2021,48.100862,,,1,EG.ELC.ACCS.ZS,Access to electricity (% of population),ZH,Africa Eastern and Southern
3,AFE,2020,46.264875,,,1,EG.ELC.ACCS.ZS,Access to electricity (% of population),ZH,Africa Eastern and Southern
4,AFE,2019,44.381259,,,1,EG.ELC.ACCS.ZS,Access to electricity (% of population),ZH,Africa Eastern and Southern
...,...,...,...,...,...,...,...,...,...,...
16753,VIR,1974,,,,1,EG.ELC.ACCS.ZS,Access to electricity (% of population),VI,Virgin Islands (U.S.)
16754,VIR,1973,,,,1,EG.ELC.ACCS.ZS,Access to electricity (% of population),VI,Virgin Islands (U.S.)
16755,VIR,1972,,,,1,EG.ELC.ACCS.ZS,Access to electricity (% of population),VI,Virgin Islands (U.S.)
16756,VIR,1971,,,,1,EG.ELC.ACCS.ZS,Access to electricity (% of population),VI,Virgin Islands (U.S.)


Clean up the DataFrame above: (1) keep only the most pertinent columns

In [24]:
Acc_to_Electricity = Acc_to_Electricity[['date', 'Percent_of_Pop_with_Access_to_Electricity', 'country.value']]
Acc_to_Electricity

Unnamed: 0,date,Percent_of_Pop_with_Access_to_Electricity,country.value
0,2023,,Africa Eastern and Southern
1,2022,48.711995,Africa Eastern and Southern
2,2021,48.100862,Africa Eastern and Southern
3,2020,46.264875,Africa Eastern and Southern
4,2019,44.381259,Africa Eastern and Southern
...,...,...,...
16753,1974,,Virgin Islands (U.S.)
16754,1973,,Virgin Islands (U.S.)
16755,1972,,Virgin Islands (U.S.)
16756,1971,,Virgin Islands (U.S.)


(2) Rename the column headers

In [25]:
# Assign the result instead of using inplace=True on a column subset,
# which avoids pandas' SettingWithCopyWarning
Acc_to_Electricity = Acc_to_Electricity.rename(columns={'date': 'year', 'country.value': 'country'})
Acc_to_Electricity

Unnamed: 0,year,Percent_of_Pop_with_Access_to_Electricity,country
0,2023,,Africa Eastern and Southern
1,2022,48.711995,Africa Eastern and Southern
2,2021,48.100862,Africa Eastern and Southern
3,2020,46.264875,Africa Eastern and Southern
4,2019,44.381259,Africa Eastern and Southern
...,...,...,...
16753,1974,,Virgin Islands (U.S.)
16754,1973,,Virgin Islands (U.S.)
16755,1972,,Virgin Islands (U.S.)
16756,1971,,Virgin Islands (U.S.)


Merge the new DataFrame (`Acc_to_Electricity`) with the merged GDP per capita and life expectancy DataFrame:

In [26]:
Merged_Electricity_LifeExpectancy_GDP_PerCapita = pd.merge(Acc_to_Electricity, Merged_LifeExpectancy_GDP_PerCapita, on =['year', 'country'])
Merged_Electricity_LifeExpectancy_GDP_PerCapita

Unnamed: 0,year,Percent_of_Pop_with_Access_to_Electricity,country,life_expectancy,gdp_per_capita
0,2023,,Africa Eastern and Southern,,4047.007031
1,2022,48.711995,Africa Eastern and Southern,62.899031,4038.638689
2,2021,48.100862,Africa Eastern and Southern,62.454590,3994.171654
3,2020,46.264875,Africa Eastern and Southern,63.313860,3919.499230
4,2019,44.381259,Africa Eastern and Southern,63.755678,4130.057222
...,...,...,...,...,...
16753,1974,,Virgin Islands (U.S.),67.283683,
16754,1973,,Virgin Islands (U.S.),67.070488,
16755,1972,,Virgin Islands (U.S.),66.389878,
16756,1971,,Virgin Islands (U.S.),66.183220,
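
When more than two indicator frames share the same keys, the pairwise merges can be expressed as a single fold. This is a sketch using `functools.reduce`. `merge_all` is a hypothetical helper, and it assumes every frame has `year` and `country` columns plus one uniquely named value column.

```python
from functools import reduce

import pandas as pd

def merge_all(frames, keys=("year", "country")):
    """Inner-merge a list of DataFrames on the shared key columns."""
    return reduce(
        lambda left, right: pd.merge(left, right, on=list(keys)),
        frames,
    )
```

For instance, `merge_all([Acc_to_Electricity, life_expectancy, gdp_per_capita])` would give the same columns as the two-step merge above.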


To keep only the rows for countries (excluding regions): (1) retrieve the full list of entries from the Country API

In [31]:
URL = 'https://api.worldbank.org/v2/country'

params = {
    "format": "json",
    "per_page": 16758,
}

response = requests.get(URL, params = params)

ListOfCountries = pd.json_normalize(response.json()[1])
ListOfCountries = ListOfCountries.rename(columns = {'name': 'country'})
ListOfCountries

Unnamed: 0,id,iso2Code,country,capitalCity,longitude,latitude,region.id,region.iso2code,region.value,adminregion.id,adminregion.iso2code,adminregion.value,incomeLevel.id,incomeLevel.iso2code,incomeLevel.value,lendingType.id,lendingType.iso2code,lendingType.value
0,ABW,AW,Aruba,Oranjestad,-70.0167,12.5167,LCN,ZJ,Latin America & Caribbean,,,,HIC,XD,High income,LNX,XX,Not classified
1,AFE,ZH,Africa Eastern and Southern,,,,,,Aggregates,,,,,,Aggregates,,,Aggregates
2,AFG,AF,Afghanistan,Kabul,69.1761,34.5228,SAS,8S,South Asia,SAS,8S,South Asia,LIC,XM,Low income,IDX,XI,IDA
3,AFR,A9,Africa,,,,,,Aggregates,,,,,,Aggregates,,,Aggregates
4,AFW,ZI,Africa Western and Central,,,,,,Aggregates,,,,,,Aggregates,,,Aggregates
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
291,XZN,A5,Sub-Saharan Africa excluding South Africa and ...,,,,,,Aggregates,,,,,,Aggregates,,,Aggregates
292,YEM,YE,"Yemen, Rep.",Sana'a,44.2075,15.352,MEA,ZQ,Middle East & North Africa,MNA,XQ,Middle East & North Africa (excluding high inc...,LIC,XM,Low income,IDX,XI,IDA
293,ZAF,ZA,South Africa,Pretoria,28.1871,-25.746,SSF,ZG,Sub-Saharan Africa,SSA,ZF,Sub-Saharan Africa (excluding high income),UMC,XT,Upper middle income,IBD,XF,IBRD
294,ZMB,ZM,Zambia,Lusaka,28.2937,-15.3982,SSF,ZG,Sub-Saharan Africa,SSA,ZF,Sub-Saharan Africa (excluding high income),LMC,XN,Lower middle income,IDX,XI,IDA


In [32]:
ListOfCountries = ListOfCountries[['country']]
ListOfCountries

Unnamed: 0,country
0,Aruba
1,Africa Eastern and Southern
2,Afghanistan
3,Africa
4,Africa Western and Central
...,...
291,Sub-Saharan Africa excluding South Africa and ...
292,"Yemen, Rep."
293,South Africa
294,Zambia


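(2) Merge the list of countries with the combined indicator DataFrame. Note that the Country API list still includes aggregates such as "Africa Eastern and Southern", so this inner merge on `country` does not by itself remove the regions: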

In [33]:
Merged_Countries_Electricity_LifeExpectancy_GDP_PerCapita = pd.merge(ListOfCountries, Merged_Electricity_LifeExpectancy_GDP_PerCapita, on =['country'])
Merged_Countries_Electricity_LifeExpectancy_GDP_PerCapita

Unnamed: 0,country,year,Percent_of_Pop_with_Access_to_Electricity,life_expectancy,gdp_per_capita
0,Aruba,2023,,,
1,Aruba,2022,99.9,74.992,42260.451837
2,Aruba,2021,100.0,74.626,38226.146157
3,Aruba,2020,100.0,75.723,29935.070278
4,Aruba,2019,100.0,76.248,39432.128933
...,...,...,...,...,...
16625,South Africa,1964,,53.906,
16626,South Africa,1963,,53.633,
16627,South Africa,1962,,53.376,
16628,South Africa,1961,,53.085,
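
To exclude regions explicitly, the Country API result can be filtered before it is merged. The sketch below assumes aggregates are exactly the rows whose `region.value` equals "Aggregates", as the Country API output above suggests. It also assumes the country frame still has its `region.value` and `incomeLevel.value` columns, that is, before it is reduced to the `country` column alone. `drop_aggregates` is a hypothetical name.

```python
import pandas as pd

def drop_aggregates(countries, data):
    """Keep only rows of `data` whose country is not an aggregate.

    `countries` is the normalized Country API frame with `name` already
    renamed to `country`; `data` is the merged indicator frame.
    """
    is_country = countries["region.value"] != "Aggregates"
    real = countries.loc[
        is_country, ["country", "region.value", "incomeLevel.value"]
    ]
    # Inner merge: aggregate names are no longer in `real`, so they drop out.
    return pd.merge(real, data, on="country", how="inner")
```

Called as `drop_aggregates(ListOfCountries, Merged_Electricity_LifeExpectancy_GDP_PerCapita)`, this also carries region and income level along for later analysis. Merging on a code such as `countryiso3code` rather than the display name would likely be more robust, but that would require keeping the code column in the indicator frames.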
