In this exercise, you'll redo the data-gathering phase of the UNData Exploration project by using APIs instead of downloading CSV files.

You'll make use of the [World Bank Indicators API](https://datahelpdesk.worldbank.org/knowledgebase/articles/889392-about-the-indicators-api-documentation). Note that this API does not require an API key. Before attempting the exercise, skim through the documentation page and read the [Basic Call Structure article](https://datahelpdesk.worldbank.org/knowledgebase/articles/898581).
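
To make the basic call structure concrete, here is a minimal sketch of how the pieces of an Indicators API URL fit together. The helper `build_indicator_url` is a hypothetical name introduced for illustration, not part of the API or of `requests`; it only builds a string and makes no network call.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.worldbank.org/v2"


def build_indicator_url(countries, indicator, **params):
    """Compose an Indicators API URL from its parts (hypothetical helper)."""
    # Several country codes are separated by ';' in the path
    country_part = ";".join(countries)
    # Always ask for JSON, then add any extra query parameters
    query = urlencode({"format": "json", **params})
    return f"{BASE_URL}/country/{country_part}/indicator/{indicator}?{query}"


print(build_indicator_url(["all"], "NY.GDP.PCAP.PP.KD", per_page=17024))
```

In the notebook itself, `requests.get(endpoint, params=params)` assembles the query string for you, so a helper like this is only useful for seeing what gets sent.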

In [27]:
import requests
import matplotlib.pyplot as plt
import pandas as pd

### Call the endpoint - https://api.worldbank.org/v2/country/all/indicator/NY.GDP.PCAP.PP.KD

In [28]:
# Set per_page to 17024 so that all rows come back in a single page
endpoint = 'https://api.worldbank.org/v2/country/all/indicator/NY.GDP.PCAP.PP.KD'

params = {
    'format': 'json',
    "per_page" : 17024
}

response = requests.get(endpoint, params = params)

res = response.json()
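
The first element of the parsed response is a metadata dictionary with `page`, `pages`, `per_page`, and `total`, so it can confirm that one request really returned every row. The sketch below is one way to run that check; `is_complete` is a hypothetical helper, and the sample metadata is illustrative rather than fetched from the API.

```python
def is_complete(meta, records):
    """Return True when a single page holds every available record."""
    # Metadata values can arrive as strings or ints, so convert defensively
    return int(meta["pages"]) == 1 and len(records) == int(meta["total"])


# Illustrative metadata in the shape the API returns (not a live response)
one_page = {"page": 1, "pages": 1, "per_page": "50", "total": 1}
many_pages = {"page": 1, "pages": 6, "per_page": "50", "total": 296}

print(is_complete(one_page, [{"id": "USA"}]))  # True
print(is_complete(many_pages, [{}] * 50))      # False
```

In the notebook, `is_complete(res[0], res[1])` would run the same check on the live response.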

In [29]:
#review keys
res[1][0].keys()

dict_keys(['indicator', 'country', 'countryiso3code', 'date', 'value', 'unit', 'obs_status', 'decimal'])

In [30]:
#review how to access data
res[1][0]

{'indicator': {'id': 'NY.GDP.PCAP.PP.KD',
  'value': 'GDP per capita, PPP (constant 2021 international $)'},
 'country': {'id': 'ZH', 'value': 'Africa Eastern and Southern'},
 'countryiso3code': 'AFE',
 'date': '2023',
 'value': 4047.00703115979,
 'unit': '',
 'obs_status': '',
 'decimal': 0}

In [31]:
gdp_ppp = pd.json_normalize(response.json()[1])
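
`pd.json_normalize` turns the nested `indicator` and `country` dictionaries into dotted column names such as `country.id`. The plain-Python sketch below illustrates the idea on the record shown above; `flatten` is a hypothetical helper written for this example, not how pandas implements it.

```python
def flatten(record, sep="."):
    """Flatten nested dictionaries into a single level with dotted keys."""
    flat = {}
    for key, value in record.items():
        if isinstance(value, dict):
            # Prefix each nested key with its parent key
            for sub_key, sub_value in flatten(value, sep).items():
                flat[f"{key}{sep}{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat


# The first record from the GDP response, copied from the output above
record = {
    "indicator": {"id": "NY.GDP.PCAP.PP.KD",
                  "value": "GDP per capita, PPP (constant 2021 international $)"},
    "country": {"id": "ZH", "value": "Africa Eastern and Southern"},
    "countryiso3code": "AFE",
    "date": "2023",
    "value": 4047.00703115979,
    "unit": "",
    "obs_status": "",
    "decimal": 0,
}

flat = flatten(record)
print(flat["country.id"], flat["country.value"], flat["date"])
```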

In [32]:
gdp_ppp.sample(10)

| | countryiso3code | date | value | unit | obs_status | decimal | indicator.id | indicator.value | country.id | country.value |
|---|---|---|---|---|---|---|---|---|---|---|
| 9598 | KAZ | 1961 | | | | 0 | NY.GDP.PCAP.PP.KD | GDP per capita, PPP (constant 2021 internation... | KZ | Kazakhstan |
| 15929 | TKM | 1966 | | | | 0 | NY.GDP.PCAP.PP.KD | GDP per capita, PPP (constant 2021 internation... | TM | Turkmenistan |
| 8223 | GUM | 1992 | | | | 0 | NY.GDP.PCAP.PP.KD | GDP per capita, PPP (constant 2021 internation... | GU | Guam |
| 2006 | MNA | 2001 | 9626.812308 | | | 0 | NY.GDP.PCAP.PP.KD | GDP per capita, PPP (constant 2021 internation... | XQ | Middle East & North Africa (excluding high inc... |
| 16530 | VUT | 2005 | 2993.003868 | | | 0 | NY.GDP.PCAP.PP.KD | GDP per capita, PPP (constant 2021 internation... | VU | Vanuatu |
| 8190 | GRD | 1961 | | | | 0 | NY.GDP.PCAP.PP.KD | GDP per capita, PPP (constant 2021 internation... | GD | Grenada |
| 1369 | IDA | 1998 | 2668.393453 | | | 0 | NY.GDP.PCAP.PP.KD | GDP per capita, PPP (constant 2021 internation... | XG | IDA total |
| 5257 | KHM | 2014 | 3621.718465 | | | 0 | NY.GDP.PCAP.PP.KD | GDP per capita, PPP (constant 2021 internation... | KH | Cambodia |
| 9292 | ITA | 2011 | 50289.856294 | | | 0 | NY.GDP.PCAP.PP.KD | GDP per capita, PPP (constant 2021 internation... | IT | Italy |
| 2578 | PRE | 2005 | 3406.427695 | | | 0 | NY.GDP.PCAP.PP.KD | GDP per capita, PPP (constant 2021 internation... | V1 | Pre-demographic dividend |


Let's clean up the columns by dropping the ones we don't need and renaming the rest.

In [33]:
gdp_ppp = gdp_ppp.drop(columns=['countryiso3code','unit','obs_status','indicator.id','indicator.value','decimal'])

In [34]:
gdp_ppp = gdp_ppp.rename(columns={"date": "Year", "value": "GDP_Per_Capita","country.id":"country_id","country.value":"country"})

In [35]:
gdp_ppp

| | Year | GDP_Per_Capita | country_id | country |
|---|---|---|---|---|
| 0 | 2023 | 4047.007031 | ZH | Africa Eastern and Southern |
| 1 | 2022 | 4038.638689 | ZH | Africa Eastern and Southern |
| 2 | 2021 | 3994.171654 | ZH | Africa Eastern and Southern |
| 3 | 2020 | 3919.499230 | ZH | Africa Eastern and Southern |
| 4 | 2019 | 4130.057222 | ZH | Africa Eastern and Southern |
| ... | ... | ... | ... | ... |
| 17019 | 1964 | | ZW | Zimbabwe |
| 17020 | 1963 | | ZW | Zimbabwe |
| 17021 | 1962 | | ZW | Zimbabwe |
| 17022 | 1961 | | ZW | Zimbabwe |


### Call the endpoint - https://api.worldbank.org/v2/country/all/indicator/SP.DYN.LE00.IN

In [36]:
endpoint = 'https://api.worldbank.org/v2/country/all/indicator/SP.DYN.LE00.IN'

params = {
    'format': 'json',
    "per_page" : 17024
}

response = requests.get(endpoint, params = params)

res = response.json()

In [37]:
#review data
res[1][0]

{'indicator': {'id': 'SP.DYN.LE00.IN',
  'value': 'Life expectancy at birth, total (years)'},
 'country': {'id': 'ZH', 'value': 'Africa Eastern and Southern'},
 'countryiso3code': 'AFE',
 'date': '2023',
 'value': None,
 'unit': '',
 'obs_status': '',
 'decimal': 0}

In [38]:
df_le_birth_total = pd.json_normalize(response.json()[1])

In [39]:
df_le_birth_total[df_le_birth_total['country.id'] == 'GE']

| | countryiso3code | date | value | unit | obs_status | decimal | indicator.id | indicator.value | country.id | country.value |
|---|---|---|---|---|---|---|---|---|---|---|
| 7744 | GEO | 2023 | | | | 0 | SP.DYN.LE00.IN | Life expectancy at birth, total (years) | GE | Georgia |
| 7745 | GEO | 2022 | 71.587 | | | 0 | SP.DYN.LE00.IN | Life expectancy at birth, total (years) | GE | Georgia |
| 7746 | GEO | 2021 | 71.694 | | | 0 | SP.DYN.LE00.IN | Life expectancy at birth, total (years) | GE | Georgia |
| 7747 | GEO | 2020 | 72.765 | | | 0 | SP.DYN.LE00.IN | Life expectancy at birth, total (years) | GE | Georgia |
| 7748 | GEO | 2019 | 73.470 | | | 0 | SP.DYN.LE00.IN | Life expectancy at birth, total (years) | GE | Georgia |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 7803 | GEO | 1964 | 62.333 | | | 0 | SP.DYN.LE00.IN | Life expectancy at birth, total (years) | GE | Georgia |
| 7804 | GEO | 1963 | 62.228 | | | 0 | SP.DYN.LE00.IN | Life expectancy at birth, total (years) | GE | Georgia |
| 7805 | GEO | 1962 | 61.858 | | | 0 | SP.DYN.LE00.IN | Life expectancy at birth, total (years) | GE | Georgia |
| 7806 | GEO | 1961 | 61.416 | | | 0 | SP.DYN.LE00.IN | Life expectancy at birth, total (years) | GE | Georgia |


Let's drop the columns that are not needed and rename the rest.

In [40]:
df_le_birth_total = df_le_birth_total.drop(columns=['countryiso3code','unit','obs_status','indicator.id','indicator.value','decimal'])

In [41]:
df_le_birth_total = df_le_birth_total.rename(columns={"date": "Year", "value": "Life_expectancy","country.id":"country_id","country.value":"country"})

In [42]:
gdp_ppp.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17024 entries, 0 to 17023
Data columns (total 4 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Year            17024 non-null  object 
 1   GDP_Per_Capita  8175 non-null   float64
 2   country_id      17024 non-null  object 
 3   country         17024 non-null  object 
dtypes: float64(1), object(3)
memory usage: 532.1+ KB


In [43]:
df_le_birth_total.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17024 entries, 0 to 17023
Data columns (total 4 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Year             17024 non-null  object 
 1   Life_expectancy  16124 non-null  float64
 2   country_id       17024 non-null  object 
 3   country          17024 non-null  object 
dtypes: float64(1), object(3)
memory usage: 532.1+ KB


### Merge the two data frames

In [44]:
# Inner merge on the columns the two frames share
gdp_le_df = gdp_ppp.merge(df_le_birth_total, on=['Year', 'country_id', 'country'])

In [45]:
# Look up the rows for a single country_id (here 'A5')
gdp_le_df[gdp_le_df['country_id'] == 'A5']

| | Year | GDP_Per_Capita | country_id | country | Life_expectancy |
|---|---|---|---|---|---|


#### You can also get more information about the available countries (region, capital city, income level classification, etc.) by using the [Country API](https://datahelpdesk.worldbank.org/knowledgebase/articles/898590-country-api-queries). Use this API to pull in all available data and merge it with your other datasets. Then use it to remove the rows that correspond to regions rather than countries.

In [46]:
endpoint = 'https://api.worldbank.org/v2/country'

params = {
    'format': 'json',
    "per_page" : 296
}

response = requests.get(endpoint, params = params)

res = response.json()

In [47]:
# Collect the fields of interest from each country record
country_records = [
    {
        'country_id': rec['iso2Code'],
        'country': rec['name'],
        'capital': rec['capitalCity'],
        'region': rec['region']['value'],
        'income_level': rec['incomeLevel']['value'],
    }
    for rec in res[1]
]
country_df = pd.DataFrame(country_records)

print(country_df)

    country_id                                            country     capital  \
0           AW                                              Aruba  Oranjestad   
1           ZH                        Africa Eastern and Southern               
2           AF                                        Afghanistan       Kabul   
3           A9                                             Africa               
4           ZI                         Africa Western and Central               
..         ...                                                ...         ...   
291         A5  Sub-Saharan Africa excluding South Africa and ...               
292         YE                                        Yemen, Rep.      Sana'a   
293         ZA                                       South Africa    Pretoria   
294         ZM                                             Zambia      Lusaka   
295         ZW                                           Zimbabwe      Harare   

                         re

In [48]:
gdp_le_country_final = gdp_le_df.merge(country_df,how='outer')
gdp_le_country_final

| | Year | GDP_Per_Capita | country_id | country | Life_expectancy | capital | region | income_level |
|---|---|---|---|---|---|---|---|---|
| 0 | 2023 | 4047.007031 | ZH | Africa Eastern and Southern | | | Aggregates | Aggregates |
| 1 | 2022 | 4038.638689 | ZH | Africa Eastern and Southern | 62.899031 | | Aggregates | Aggregates |
| 2 | 2021 | 3994.171654 | ZH | Africa Eastern and Southern | 62.454590 | | Aggregates | Aggregates |
| 3 | 2020 | 3919.499230 | ZH | Africa Eastern and Southern | 63.313860 | | Aggregates | Aggregates |
| 4 | 2019 | 4130.057222 | ZH | Africa Eastern and Southern | 63.755678 | | Aggregates | Aggregates |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 17051 | | | 6N | IDA countries not classified as fragile situat... | | | Aggregates | Aggregates |
| 17052 | | | R6 | Resource rich Sub-Saharan Africa countries | | | Aggregates | Aggregates |
| 17053 | | | ZG | Sub-Saharan Africa | | | Aggregates | Aggregates |
| 17054 | | | A4 | Sub-Saharan Africa excluding South Africa | | | Aggregates | Aggregates |


In [49]:
# Rows with region 'Aggregates' are regional or income groupings rather than countries
gdp_le_country_final[gdp_le_country_final['region'] == 'Aggregates']

| | Year | GDP_Per_Capita | country_id | country | Life_expectancy | capital | region | income_level |
|---|---|---|---|---|---|---|---|---|
| 0 | 2023 | 4047.007031 | ZH | Africa Eastern and Southern | | | Aggregates | Aggregates |
| 1 | 2022 | 4038.638689 | ZH | Africa Eastern and Southern | 62.899031 | | Aggregates | Aggregates |
| 2 | 2021 | 3994.171654 | ZH | Africa Eastern and Southern | 62.454590 | | Aggregates | Aggregates |
| 3 | 2020 | 3919.499230 | ZH | Africa Eastern and Southern | 63.313860 | | Aggregates | Aggregates |
| 4 | 2019 | 4130.057222 | ZH | Africa Eastern and Southern | 63.755678 | | Aggregates | Aggregates |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 17051 | | | 6N | IDA countries not classified as fragile situat... | | | Aggregates | Aggregates |
| 17052 | | | R6 | Resource rich Sub-Saharan Africa countries | | | Aggregates | Aggregates |
| 17053 | | | ZG | Sub-Saharan Africa | | | Aggregates | Aggregates |
| 17054 | | | A4 | Sub-Saharan Africa excluding South Africa | | | Aggregates | Aggregates |
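
The Country API labels regional and income groupings with the region value `Aggregates`, so filtering on that column is one way to keep only actual countries. The sketch below uses a small hand-built frame with values copied from outputs in this notebook; `sample` and `countries_only` are names introduced here for illustration, and rows with a missing `region` would pass this filter and need a separate decision.

```python
import pandas as pd

# Illustrative rows shaped like gdp_le_country_final (not the full data)
sample = pd.DataFrame({
    "Year": ["2022", "2021", "2021"],
    "country_id": ["ZH", "ZH", "US"],
    "country": ["Africa Eastern and Southern",
                "Africa Eastern and Southern",
                "United States"],
    "Life_expectancy": [62.899031, 62.454590, 76.329268],
    "region": ["Aggregates", "Aggregates", "North America"],
})

# Keep every row whose region is not the 'Aggregates' label
countries_only = sample[sample["region"] != "Aggregates"].reset_index(drop=True)
print(countries_only)
```

Applied to the notebook's data, the same filter would be `gdp_le_country_final[gdp_le_country_final['region'] != 'Aggregates']`.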




**Bonus Questions:** After doing a basic request to get all records, you can attempt these exercises to get additional practice using query parameters.

#### 1. Adjust your request so that it returns data just for the United States.

In [50]:
endpoint = 'https://api.worldbank.org/v2/country/us'

params = {
    'format': 'json'
}

response = requests.get(endpoint, params = params)

res = response.json()
res

[{'page': 1, 'pages': 1, 'per_page': '50', 'total': 1},
 [{'id': 'USA',
   'iso2Code': 'US',
   'name': 'United States',
   'region': {'id': 'NAC', 'iso2code': 'XU', 'value': 'North America'},
   'adminregion': {'id': '', 'iso2code': '', 'value': ''},
   'incomeLevel': {'id': 'HIC', 'iso2code': 'XD', 'value': 'High income'},
   'lendingType': {'id': 'LNX', 'iso2code': 'XX', 'value': 'Not classified'},
   'capitalCity': 'Washington D.C.',
   'longitude': '-77.032',
   'latitude': '38.8895'}]]

#### 2. Adjust your request so that it returns data just for the United States for the year 2021.

In [51]:
endpoint = 'https://api.worldbank.org/v2/country/us/indicator/SP.DYN.LE00.IN'

params = {
    'format': 'json',
    'date' : 2021
}

response = requests.get(endpoint, params = params)

res = response.json()
res

[{'page': 1,
  'pages': 1,
  'per_page': 50,
  'total': 1,
  'sourceid': '2',
  'lastupdated': '2024-10-24'},
 [{'indicator': {'id': 'SP.DYN.LE00.IN',
    'value': 'Life expectancy at birth, total (years)'},
   'country': {'id': 'US', 'value': 'United States'},
   'countryiso3code': 'USA',
   'date': '2021',
   'value': 76.3292682926829,
   'unit': '',
   'obs_status': '',
   'decimal': 0}]]

#### 3. Adjust your request so that it returns data just for the United States for the years 2000 through 2021.

In [52]:
endpoint = 'https://api.worldbank.org/v2/country/us/indicator/SP.DYN.LE00.IN'

params = {
    'format': 'json',
    'date' : '2000:2021'
}

response = requests.get(endpoint, params = params)

res = response.json()
res
#response.url

[{'page': 1,
  'pages': 1,
  'per_page': 50,
  'total': 22,
  'sourceid': '2',
  'lastupdated': '2024-10-24'},
 [{'indicator': {'id': 'SP.DYN.LE00.IN',
    'value': 'Life expectancy at birth, total (years)'},
   'country': {'id': 'US', 'value': 'United States'},
   'countryiso3code': 'USA',
   'date': '2021',
   'value': 76.3292682926829,
   'unit': '',
   'obs_status': '',
   'decimal': 0},
  {'indicator': {'id': 'SP.DYN.LE00.IN',
    'value': 'Life expectancy at birth, total (years)'},
   'country': {'id': 'US', 'value': 'United States'},
   'countryiso3code': 'USA',
   'date': '2020',
   'value': 76.9804878048781,
   'unit': '',
   'obs_status': '',
   'decimal': 0},
  {'indicator': {'id': 'SP.DYN.LE00.IN',
    'value': 'Life expectancy at birth, total (years)'},
   'country': {'id': 'US', 'value': 'United States'},
   'countryiso3code': 'USA',
   'date': '2019',
   'value': 78.7878048780488,
   'unit': '',
   'obs_status': '',
   'decimal': 0},
  {'indicator': {'id': 'SP.DYN.LE00.I

#### 4. Adjust your request so that it returns data for the United States and Canada for the years 2000 through 2021.

In [53]:
endpoint = 'https://api.worldbank.org/v2/country/us;ca/indicator/SP.DYN.LE00.IN'

params = {
    'format': 'json',
    "per_page" : 128,
    'date' : '2000:2021'
}

response = requests.get(endpoint, params = params)

res = response.json()
res
#response.url

[{'page': 1,
  'pages': 1,
  'per_page': 128,
  'total': 44,
  'sourceid': '2',
  'lastupdated': '2024-10-24'},
 [{'indicator': {'id': 'SP.DYN.LE00.IN',
    'value': 'Life expectancy at birth, total (years)'},
   'country': {'id': 'CA', 'value': 'Canada'},
   'countryiso3code': 'CAN',
   'date': '2021',
   'value': 81.5870731707317,
   'unit': '',
   'obs_status': '',
   'decimal': 0},
  {'indicator': {'id': 'SP.DYN.LE00.IN',
    'value': 'Life expectancy at birth, total (years)'},
   'country': {'id': 'CA', 'value': 'Canada'},
   'countryiso3code': 'CAN',
   'date': '2020',
   'value': 81.6604878048781,
   'unit': '',
   'obs_status': '',
   'decimal': 0},
  {'indicator': {'id': 'SP.DYN.LE00.IN',
    'value': 'Life expectancy at birth, total (years)'},
   'country': {'id': 'CA', 'value': 'Canada'},
   'countryiso3code': 'CAN',
   'date': '2019',
   'value': 82.2239024390244,
   'unit': '',
   'obs_status': '',
   'decimal': 0},
  {'indicator': {'id': 'SP.DYN.LE00.IN',
    'value': 'Li
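
The `total` field in these responses follows from the request: one record per country per year in the `date` range. Here is a small sketch of that arithmetic, assuming an annual indicator and a `start:end` range that is inclusive at both ends; `expected_total` is a hypothetical helper, not part of the API.

```python
def expected_total(country_codes, date):
    """Estimate how many records a request should return."""
    if ":" in date:
        start, end = (int(part) for part in date.split(":"))
    else:
        # A single year such as '2021'
        start = end = int(date)
    years = end - start + 1  # the range is inclusive at both ends
    return len(country_codes) * years


print(expected_total(["us"], "2021"))             # 1
print(expected_total(["us"], "2000:2021"))        # 22
print(expected_total(["us", "ca"], "2000:2021"))  # 44
```

These match the `total` values of 1, 22, and 44 reported in the responses above. Both range requests fit within the default page size of 50, so a larger `per_page` is not strictly needed for them.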

#### 5. If you haven't already done so and you would like to get some additional practice using loops, use the page parameter in order to pull all records. Do not change the value of the per_page parameter. You will likely need to utilize a loop of some kind in order to pull all records.

In [63]:
endpoint = 'https://api.worldbank.org/v2/country/'
params = {
    'format': 'json',
    'page': 1
}

# The first request also tells us how many pages there are
response = requests.get(endpoint, params = params)
res = response.json()
total_pages = res[0]['pages']

# Keep the records from page 1, then request the remaining pages
final_data = list(res[1])
for page in range(2, total_pages + 1):
    params['page'] = page
    response = requests.get(endpoint, params = params)
    final_data.extend(response.json()[1])

total_pages

6
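
The page count of 6 follows from the default page size: the responses above report 50 records per page when `per_page` is not set. A quick check of that arithmetic, assuming the Country API holds 296 records (the number of rows in `country_df` earlier in the notebook, so treat it as an assumption rather than a documented figure):

```python
import math

total_records = 296    # assumed: row count of country_df earlier in the notebook
default_per_page = 50  # default page size reported in the response metadata

# Round up, because a partly filled last page still counts as a page
pages = math.ceil(total_records / default_per_page)
print(pages)  # 6
```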