<a href="https://colab.research.google.com/github/franzruch/rrf24_training_ulli/blob/main/1-foundations/4-api-and-dataviz/foundations-s4-api.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Interacting with APIs using Python

Now that we understand what APIs are, we can start interacting with them programmatically.

# The `requests` library

- `requests` is a Python library to interact with information from the internet
- It sends and receives data to and from URLs
- You can think of it as a web browser (Chrome, Firefox), but without the graphic interface
- APIs are accessed through URLs, so we can use `requests` to interact with APIs in Python

To use `requests`, import it first:

In [4]:
import requests

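`requests` is a third-party library, not part of the Python standard library. It comes pre-installed in Google Colab and in the Anaconda distribution; in other environments you can install it with `pip install requests`.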

## Sending a request and receiving information from the web

- When you access a URL in your web browser, it is sending a request to receive information from a web server
- Usually, most of the information your browser receives as a response to your request is in HTML format
- Your web browser then renders the HTML information in the response and shows it to you
- A familiar example:

`https://www.worldbank.org/ext/en/home`

![world-bank-site](https://github.com/worldbank/dec-python-course/blob/main/1-foundations/4-api-and-dataviz/img/world-bank-site.png?raw=1)

Interacting with the web using Python through `requests` is no different from this. The basic syntax of `requests` is the following:

`requests.get(my_url)`

- `my_url` is a string with the URL address you want to access
- the `get()` function of `requests` uses your internet connection to send a request to that URL and obtain a response with information from it
- you can save the response in a Python variable this way:

`response = requests.get(my_url)`

See the following example:

In [1]:
url = 'https://www.worldbank.org/ext/en/home'

In [5]:
response = requests.get(url)

`response` is a variable of an ad-hoc type used by the `requests` library, similarly to how data frames and series are custom variable types from the Pandas library.

In [6]:
type(response)

The response to your request will be stored in `response` even if it failed. To check if your request was successful or not, use the attribute `status_code` of this variable type or print `response`.

In [7]:
response.status_code

200

In [8]:
print(response)

<Response [200]>


A status code of 200 means that your request received a successful response. These are some of the most common types of response codes:

- **200 - OK:** successful request
- **403 - Forbidden:** the user (you) is not authorized to access that URL
- **404 - Not found:** the web server cannot find the requested resource (often because your URL is incorrect)
- **429 - Too many requests:** the web server has a limit on how many requests a user can send over a period of time (rate limit) and you went over that limit
- **500 - Internal server error:** the request didn't work because of an unspecified error originated by the web server, not the user
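The codes in the list above are part of the HTTP standard, and Python's standard library already knows their names. A minimal sketch, assuming you only need a readable label for a code (`describe_status` and `is_success` are hypothetical helper names, not part of `requests`):

```python
from http import HTTPStatus


def describe_status(status_code):
    """Return a label such as '404 - Not Found' for an HTTP status code."""
    try:
        status = HTTPStatus(status_code)
    except ValueError:
        # The code is not one of the standard codes known to Python
        return f'{status_code} - unknown status code'
    return f'{status.value} - {status.phrase}'


def is_success(status_code):
    """Codes from 200 to 299 indicate a successful request."""
    return 200 <= status_code < 300


for code in (200, 403, 404, 429, 500):
    print(describe_status(code), '| success:', is_success(code))
```

With a real response, you could call `describe_status(response.status_code)`.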

If a request is successful, the response variable will contain the content (also called "body") of the response from the server. When you access a URL from your web browser, this is the part that contains the HTML code that your browser renders.

In `response`, the response content is in the `content` attribute:

In [9]:
response.content

b'<!DOCTYPE html>\n<html>\n  <head>\n    <title>World Bank Group - International Development, Poverty and Sustainability</title>\n    <link rel="canonical" href="https://www.worldbank.org/ext/en/home">\n    <meta name="description" content="With 189 member countries, the World Bank Group is a unique global partnership fighting poverty worldwide through sustainable solutions.">\n    <meta property="og:title" content="World Bank Group - International Development, Poverty and Sustainability">\n    <meta property="og:description" content="With 189 member countries, the World Bank Group is a unique global partnership fighting poverty worldwide through sustainable solutions.">\n    <meta property="og:url" content="https://www.worldbank.org/ext/en/home">\n    <meta property="og:image" content="https://www.worldbank.org/ext/en/media_13e1e0ebaf2e8653d9a179f051a057aba984252e3.jpeg?width=1200&#x26;format=pjpg&#x26;optimize=medium">\n    <meta property="og:image:secure_url" content="https://www.wo
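The `b` prefix in the output above means that `content` is a `bytes` object, not a string. `requests` also exposes a decoded version in `response.text`. The sketch below reproduces the idea offline on a shortened copy of the bytes shown above, assuming UTF-8 encoding (`sample_content` is a stand-in for `response.content`):

```python
import re

# Shortened stand-in for response.content (bytes, note the b prefix)
sample_content = (
    b'<!DOCTYPE html>\n<html>\n  <head>\n'
    b'    <title>World Bank Group - International Development, '
    b'Poverty and Sustainability</title>\n'
)

# Decode the bytes into a regular Python string (assuming UTF-8)
sample_text = sample_content.decode('utf-8')
print(type(sample_content), type(sample_text))

# Pull the page title out of the HTML with a simple regular expression
match = re.search(r'<title>(.*?)</title>', sample_text)
page_title = match.group(1) if match else None
print(page_title)
```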

## Interacting with APIs

- `requests` works the same way when you use it to interact with an API instead of a URL that returns HTML code in the response content
- the only difference is that the content of an API response will be in a data-friendly format, such as JSON
- JSON is a format for storing data that is commonly used to transfer data on the web
- Python can read JSON data into lists or dictionaries (more on this below)
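To make the last point concrete, the sketch below uses Python's built-in `json` module on small hand-written JSON strings (the samples are illustrative, not an API response): JSON objects become dictionaries and JSON arrays become lists.

```python
import json

# A JSON object (curly braces) containing a JSON array (square brackets)
json_text = '{"name": "example", "tags": ["a", "b"], "count": 2}'

parsed_object = json.loads(json_text)
print(type(parsed_object))           # a JSON object becomes a dict
print(type(parsed_object['tags']))   # a JSON array becomes a list

# A JSON document can also be an array at the top level
parsed_array = json.loads('[1, 2, 3]')
print(type(parsed_array))

# JSON true/false/null map to Python True/False/None
parsed_flags = json.loads('{"ok": true, "missing": null}')
print(parsed_flags)
```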

Remember the API to get live data from the ISS? http://open-notify.org

We'll retrieve information from one of its two endpoints:

- Astronauts in space now - http://api.open-notify.org/astros.json

In [10]:
url = 'http://api.open-notify.org/astros.json'

In [11]:
response = requests.get(url)

In [12]:
response.status_code

200

In [13]:
response.content

b'{"people": [{"craft": "ISS", "name": "Oleg Kononenko"}, {"craft": "ISS", "name": "Nikolai Chub"}, {"craft": "ISS", "name": "Tracy Caldwell Dyson"}, {"craft": "ISS", "name": "Matthew Dominick"}, {"craft": "ISS", "name": "Michael Barratt"}, {"craft": "ISS", "name": "Jeanette Epps"}, {"craft": "ISS", "name": "Alexander Grebenkin"}, {"craft": "ISS", "name": "Butch Wilmore"}, {"craft": "ISS", "name": "Sunita Williams"}, {"craft": "Tiangong", "name": "Li Guangsu"}, {"craft": "Tiangong", "name": "Li Cong"}, {"craft": "Tiangong", "name": "Ye Guangfu"}], "number": 12, "message": "success"}'

Response objects from `requests` have a `json()` method that converts the JSON content of a response into a Python dictionary or list, depending on the structure of the JSON data.

In [15]:
response.json()

{'people': [{'craft': 'ISS', 'name': 'Oleg Kononenko'},
  {'craft': 'ISS', 'name': 'Nikolai Chub'},
  {'craft': 'ISS', 'name': 'Tracy Caldwell Dyson'},
  {'craft': 'ISS', 'name': 'Matthew Dominick'},
  {'craft': 'ISS', 'name': 'Michael Barratt'},
  {'craft': 'ISS', 'name': 'Jeanette Epps'},
  {'craft': 'ISS', 'name': 'Alexander Grebenkin'},
  {'craft': 'ISS', 'name': 'Butch Wilmore'},
  {'craft': 'ISS', 'name': 'Sunita Williams'},
  {'craft': 'Tiangong', 'name': 'Li Guangsu'},
  {'craft': 'Tiangong', 'name': 'Li Cong'},
  {'craft': 'Tiangong', 'name': 'Ye Guangfu'}],
 'number': 12,
 'message': 'success'}

The result of `response.json()` is a Python dictionary for this example.

In [16]:
data_dic = response.json()
type(data_dic)

dict

Now that the response content is saved in a dictionary, we can explore its content.

In [17]:
data_dic.keys()

dict_keys(['people', 'number', 'message'])

In [18]:
data_dic['number']

12

In [19]:
data_dic['people']

[{'craft': 'ISS', 'name': 'Oleg Kononenko'},
 {'craft': 'ISS', 'name': 'Nikolai Chub'},
 {'craft': 'ISS', 'name': 'Tracy Caldwell Dyson'},
 {'craft': 'ISS', 'name': 'Matthew Dominick'},
 {'craft': 'ISS', 'name': 'Michael Barratt'},
 {'craft': 'ISS', 'name': 'Jeanette Epps'},
 {'craft': 'ISS', 'name': 'Alexander Grebenkin'},
 {'craft': 'ISS', 'name': 'Butch Wilmore'},
 {'craft': 'ISS', 'name': 'Sunita Williams'},
 {'craft': 'Tiangong', 'name': 'Li Guangsu'},
 {'craft': 'Tiangong', 'name': 'Li Cong'},
 {'craft': 'Tiangong', 'name': 'Ye Guangfu'}]

In [None]:
data_dic['message']

Furthermore, we can transform the list inside `data_dic['people']` into a Pandas dataframe for further analysis.

In [20]:
import pandas as pd

In [21]:
df = pd.DataFrame(data=data_dic['people'])

In [22]:
df

| | craft | name |
|---|---|---|
| 0 | ISS | Oleg Kononenko |
| 1 | ISS | Nikolai Chub |
| 2 | ISS | Tracy Caldwell Dyson |
| 3 | ISS | Matthew Dominick |
| 4 | ISS | Michael Barratt |
| 5 | ISS | Jeanette Epps |
| 6 | ISS | Alexander Grebenkin |
| 7 | ISS | Butch Wilmore |
| 8 | ISS | Sunita Williams |
| 9 | Tiangong | Li Guangsu |


**Note:** the content and format of the JSON data in the response is specific to the API endpoint you access.
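Because `data_dic['people']` is a plain list of dictionaries, it can also be summarized without Pandas. A small sketch, using a copy of the list shown above so that it runs without an internet connection, counts the people on each craft with `collections.Counter`:

```python
from collections import Counter

# Copy of the list stored in data_dic['people'] in the example above
people = [
    {'craft': 'ISS', 'name': 'Oleg Kononenko'},
    {'craft': 'ISS', 'name': 'Nikolai Chub'},
    {'craft': 'ISS', 'name': 'Tracy Caldwell Dyson'},
    {'craft': 'ISS', 'name': 'Matthew Dominick'},
    {'craft': 'ISS', 'name': 'Michael Barratt'},
    {'craft': 'ISS', 'name': 'Jeanette Epps'},
    {'craft': 'ISS', 'name': 'Alexander Grebenkin'},
    {'craft': 'ISS', 'name': 'Butch Wilmore'},
    {'craft': 'ISS', 'name': 'Sunita Williams'},
    {'craft': 'Tiangong', 'name': 'Li Guangsu'},
    {'craft': 'Tiangong', 'name': 'Li Cong'},
    {'craft': 'Tiangong', 'name': 'Ye Guangfu'},
]

# Count how many people are on each craft
people_per_craft = Counter(person['craft'] for person in people)
print(people_per_craft)

# List the names on one craft
tiangong_crew = [p['name'] for p in people if p['craft'] == 'Tiangong']
print(tiangong_crew)
```

With the dataframe, `df['craft'].value_counts()` gives the same counts.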

# Exercise 1

Create a function that repeats the steps shown above and returns the latitude and longitude of the current location of the ISS. The location endpoint is this: `http://api.open-notify.org/iss-now.json`.

Suggested steps:

1. use `requests` to send a request for this URL and store the result in a variable
2. If the response was not successful, return `None`
3. If it was, extract the response content into a dictionary
4. Extract the latitude and longitude from the dictionary and return them in your function

In [None]:
# Note: this function should not take any inputs. Leave the parentheses empty after the function name
def iss_location():

    # === REPLACE THE EMPTY STRINGS AND None BELOW TO ADD YOUR SOLUTION ===

    url = 'http://api.open-notify.org/iss-now.json'            # 1. Add URL here
    response = requests.get(url)     # 2. use requests to get the response of the URL

    if response.status_code == 200:

        data = response.json()         # 3. Extract the data from response with the json() method
        latitude = float(data['iss_position']['latitude'])     # 4. Extract the latitude from the data dictionary
        longitude = float(data['iss_position']['longitude'])   # 5. Extract the longitude from the data dictionary

        return latitude, longitude

    # === DO NOT MODIFY THE FUNCTION FROM THIS POINT ON ===
    else:
        print('Request failed!')
        return None

Remove the first line of the following block and run it to verify that your solution works. It should not raise an error.

In [None]:
%%script echo Remove this line after filling in your own code

position = iss_location()
assert isinstance(position[0], float) and isinstance(position[1], float)

# Coding a simple API client

- An **API client** is a piece of code that facilitates the interaction with an API
- In the example with the astronauts, we had to go through several coding steps to execute the request, obtain the JSON data, and load it into a Pandas data frame
- All of those steps could be packed in a Python function that simplified the process of interacting with the API. That function is an API client
- In fact, in exercise 1 you were inadvertently creating an API client!
- Most APIs require custom information to be passed in the URL. This can be incorporated into an API client, as in the examples below
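As a sketch of the idea, the astronaut steps can be packed into one function. To keep the example runnable without an internet connection, the function receives the function that performs the request as the argument `get`; this argument, `FakeResponse`, `fake_get`, and `failing_get` are assumptions made for this illustration only.

```python
ASTROS_URL = 'http://api.open-notify.org/astros.json'


def fetch_astronauts(get):
    """Small API client: return the list of people in space, or None.

    `get` is any function that takes a URL and returns an object with a
    `status_code` attribute and a `json()` method, such as `requests.get`.
    """
    response = get(ASTROS_URL)

    if response.status_code != 200:
        print('Request failed!')
        return None

    return response.json()['people']


# --- Offline stand-ins used only to demonstrate the client ---
class FakeResponse:
    def __init__(self, status_code, payload):
        self.status_code = status_code
        self._payload = payload

    def json(self):
        return self._payload


def fake_get(url):
    # Pretends the request succeeded and returns a tiny payload
    payload = {'people': [{'craft': 'ISS', 'name': 'Sunita Williams'}],
               'number': 1, 'message': 'success'}
    return FakeResponse(200, payload)


def failing_get(url):
    # Pretends the server answered with an internal error
    return FakeResponse(500, {})


people = fetch_astronauts(fake_get)
print(people)
print(fetch_astronauts(failing_get))
```

In the notebook, with `requests` imported, `fetch_astronauts(requests.get)` would send the real request.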

## Example: URL-based parameters

We previously introduced the geoBoundaries API example. After exploring this API and its documentation, we know that it takes URLs with the following generic form:

`https://www.geoboundaries.org/api/current/gbOpen/[3-letter-iso-code]/[admin-level]/`

Then, we can build a function that takes the 3-letter ISO code and the administrative boundaries level as parameters to automate API calls.

In [None]:
def fetch_geoboundaries_data(country_code, admin_level):

    endpoint = 'https://www.geoboundaries.org/api/current/gbOpen/'
    url = endpoint + country_code + '/ADM' + str(admin_level)
    response = requests.get(url)

    if response.status_code == 200:

        data = response.json()
        return data

    else:

        print('Request failed!')
        return None

We can use our new function to fetch admin-1 level data from Kenya.

In [None]:
kenya_data = fetch_geoboundaries_data('KEN', 1)

In [None]:
kenya_data

A few notes:
- You might have noticed that the result stored in `kenya_data` is not the actual admin-1 level boundaries, but _metadata_ about the boundaries data
- A visual inspection of `kenya_data` shows that a URL to the data is in the key `simplifiedGeometryGeoJSON`

In [None]:
kenya_data['simplifiedGeometryGeoJSON']

- You can use `requests` once again to fetch the data from this URL

**Important:**
- Not all APIs will provide direct access to the information you need
- Many will require additional coding to get from the initial API call to the data of your interest
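When parameters are inserted into the URL itself, as in `fetch_geoboundaries_data()`, a typo in an argument produces a URL that the API does not recognize. A minimal sketch of a helper that checks the inputs before building the URL (`build_geoboundaries_url` is a hypothetical name; the checks are assumptions based on the URL pattern shown earlier, which expects a 3-letter ISO code and an admin level):

```python
ENDPOINT = 'https://www.geoboundaries.org/api/current/gbOpen/'


def build_geoboundaries_url(country_code, admin_level):
    """Build the metadata URL for a country and administrative level."""
    code = country_code.strip().upper()

    if len(code) != 3 or not code.isalpha():
        raise ValueError('country_code must be a 3-letter ISO code, e.g. KEN')

    level = int(admin_level)
    if level < 0:
        raise ValueError('admin_level must be 0 or greater')

    return f'{ENDPOINT}{code}/ADM{level}/'


print(build_geoboundaries_url('ken', 1))
```

Failing early with a clear message is usually easier to debug than a `Request failed!` printed after the request has already been sent.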

## Coding API clients - Takeaways

- Programming a client for an API requires reviewing its documentation and understanding what the API offers and how that fits your needs
- Many APIs use several URL-based parameters to pass information in API queries. A good API client takes that into account to build the correct query URL and parameters argument
- Remember that further coding might be needed to get from the API result to the information that is relevant for a user
- Some APIs split results with many observations into pages and return only a limited number of pages. It's up to the user to review the results and take measures to ensure the data is complete (check the notebook with bonus content for examples on this)

# Exercise 2a

Create a function that builds upon `fetch_geoboundaries_data()` and returns the actual geographic data from the API.

Suggested steps:

1. Inspect the result of `fetch_geoboundaries_data()` to locate the URL with the geographic data in the resulting dictionary
1. Access the URL using `requests.get()` and store the response in a variable
1. Use the `.json()` method to transform the response content into a dictionary and return that variable

In [None]:
#  == DO NOT MODIFY THIS FUNCTION BUT THE NEXT ==
def fetch_geoboundaries_data(country_code, admin_level):

    endpoint = 'https://www.geoboundaries.org/api/current/gbOpen/'
    url = endpoint + country_code + '/ADM' + str(admin_level)
    response = requests.get(url)

    if response.status_code == 200:

        data = response.json()
        return data

    else:

        print('Request failed!')
        return None

In [None]:
# == MODIFY THIS FUNCTION FOR YOUR ANSWER ==
def obtain_geodata(country_code, admin_level):

    metadata = fetch_geoboundaries_data(country_code, admin_level)

    if metadata is None:
        # this is a check to see if the previous line worked. Do not modify it
        return None

    else:
        # === REPLACE THE None BELOW TO ADD YOUR SOLUTION ===

        data_url = None       # 1. Extract the URL containing the data from the metadata dictionary
        response = None       # 2. Use requests to get a new response from that URL

        if response.status_code == 200:
            geojson_data = None    # 3. Use the json() method to extract the data from response
            return geojson_data

        # === DO NOT MODIFY THE FUNCTION FROM THIS POINT ON ===
        else:
            print('Request failed!')
            return None

# Python API client libraries

- Programming a client for an API is not always needed
- Many APIs have their own client in the form of a Python library
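- Check the example of [`geopy`](https://geopy.readthedocs.io/), a Python client for geocoding services such as [Nominatim](https://nominatim.org/). Converting an address into coordinates is called "geocoding"; the opposite operation, from coordinates to an address, is called "reverse geocoding"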

In [None]:
# Install geopy in your Python environment
!pip install geopy

In [None]:
from geopy.geocoders import Nominatim

In [None]:
# Important: please add a user alias in this function
geolocator = Nominatim(user_agent='write-your-alias-here')

In [None]:
query = geolocator.geocode('1818 H St NW, Washington DC, USA')

In [None]:
print(query)

`query` has an ad-hoc variable type from this library.

In [None]:
type(query)

If you want to check the attributes and methods of this variable type, you can use `help(query)`, `dir(query)`, `query?`, or check the [`geopy` library documentation](https://geopy.readthedocs.io/en/stable/).

In [None]:
query?

In [None]:
print('The address of the WB main building is: {}'.format(query.address))
print('The location of the WB main building is: {}, {}'.format(query.latitude, query.longitude))

When an API client is a Python library, it will most probably have ad-hoc variable types with particular attributes and methods. You will have to check the corresponding library documentation to learn how to operate with them.

Please also note the following:
- We didn't have to code an API client using `requests` this time: `geopy` is the API client
- The results of our query are not in JSON format. `geopy` returns an ad-hoc variable class with the attributes `.address`, `.latitude`, and `.longitude`, among others
- We've mostly seen examples of database query APIs, but APIs can do much more!
    + Nominatim, which we accessed through `geopy`, is an example of an API that does some data processing with the information passed (the address)
    + Remember: in general, an API is a channel to interact with a web server

A few notes about API authentication:
- Some APIs require some form of authentication to control API overuse
    + That's why `Nominatim()` requires the `user_agent` parameter: it's a way of detecting which API calls come from the same alias
    + This is a very soft form of authentication
- When authentication is needed, most APIs will require users to register an account and will provide a unique combination of characters called an _API key_ that identifies the user
- When they are required, API keys are usually passed as parameters. If the API has a dedicated client library, it will usually ask for the key when you set up the client (as with `user_agent` in `Nominatim()`)
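As an illustration of passing a key as a parameter, the sketch below reads a key from an environment variable and adds it to the query parameters. Everything specific here is hypothetical: the variable name `EXAMPLE_API_KEY`, the parameter name `api_key`, and the helper `with_api_key`; each API documents its own names. Reading the key from the environment avoids writing it in a notebook that might be shared.

```python
import os


def with_api_key(parameters, env_var='EXAMPLE_API_KEY', key_name='api_key'):
    """Return a copy of `parameters` with the API key added."""
    api_key = os.environ.get(env_var)
    if api_key is None:
        raise RuntimeError(f'Set the {env_var} environment variable first')

    # Copy so that the caller's dictionary is not modified
    return {**parameters, key_name: api_key}


# Demonstration with a placeholder value (not a real key)
os.environ['EXAMPLE_API_KEY'] = 'not-a-real-key'
query_parameters = with_api_key({'format': 'json'})
print(query_parameters)
```

The resulting dictionary could then be passed to a request, for example `requests.get(url, params=query_parameters)`.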

## The World Bank API Python library

The WBG has an extensive API with country indicators, among many other data. The following example uses the endpoint of the total population indicator to fetch country population data.

Documentation and use examples of the WBG API can be found in the [WBG Knowledge Base](https://datahelpdesk.worldbank.org/knowledgebase) and the [Developer Information resources](https://datahelpdesk.worldbank.org/knowledgebase/topics/125589-developer-information).

In [None]:
def fetch_population_by_year(year):

    endpoint = 'https://api.worldbank.org/v2/country/all/indicator/SP.POP.TOTL/'
    parameters = {'date': year, 'format': 'json'}
    # note: the API documentation specifies that format=json
    # is a required parameter in order to return the results as JSON
    # requests appends the parameters to the URL as ?date=...&format=json
    response = requests.get(endpoint, params=parameters)

    if response.status_code == 200:

        data = response.json()
        return data

    else:

        print('Request failed!')
        return None

In [None]:
pop_2015 = fetch_population_by_year(2015)

In [None]:
pop_2015
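The query string after `?` in the request URL is just the `date` and `format` parameters encoded as `key=value` pairs. A minimal sketch using the standard library's `urllib.parse.urlencode` (`build_query_url` is a hypothetical helper name) shows how a dictionary of parameters maps to that string:

```python
from urllib.parse import urlencode

ENDPOINT = 'https://api.worldbank.org/v2/country/all/indicator/SP.POP.TOTL/'


def build_query_url(endpoint, parameters):
    """Append the parameters to the endpoint as a query string."""
    return endpoint + '?' + urlencode(parameters)


url_2015 = build_query_url(ENDPOINT, {'date': 2015, 'format': 'json'})
print(url_2015)
```

`requests.get()` accepts the same kind of dictionary through its `params` argument and performs an equivalent encoding, so the URL does not have to be assembled by hand.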

A few notes:
- This is the same data you obtain when accessing https://api.worldbank.org/v2/country/all/indicator/SP.POP.TOTL?date=2015&format=json on a web browser
- In this case, the resulting JSON is not interpreted by Python as a dictionary but a list
- The first element of this list contains metadata about the data returned by the API. The second element contains the actual data
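The metadata in the first element matters because this API splits long results into pages. A hedged sketch of collecting every page follows. It assumes that each call returns `[metadata, records]`, as in the notes above, and that the metadata dictionary has a `pages` key with the total number of pages. `collect_all_pages`, `fetch_page`, and `fake_fetch_page` are hypothetical names, and the fake records are made up so that the example runs offline.

```python
def collect_all_pages(fetch_page):
    """Call fetch_page(1), fetch_page(2), ... and merge the records.

    `fetch_page(page_number)` must return a two-element list:
    [metadata, records], where metadata['pages'] is the total page count.
    """
    all_records = []
    page = 1

    while True:
        metadata, records = fetch_page(page)
        # `or []` guards against a missing records element
        all_records.extend(records or [])

        if page >= metadata['pages']:
            break
        page += 1

    return all_records


# --- Offline stand-in: 5 made-up records split into pages of 2 ---
FAKE_RECORDS = [{'row': i} for i in range(5)]


def fake_fetch_page(page, per_page=2):
    start = (page - 1) * per_page
    metadata = {'page': page, 'pages': 3, 'per_page': per_page, 'total': 5}
    return [metadata, FAKE_RECORDS[start:start + per_page]]


records = collect_all_pages(fake_fetch_page)
print(len(records))
```

With the real API, `fetch_page` would send the request with an additional page parameter, for example `{'date': 2015, 'format': 'json', 'page': page}`; check the API documentation for the exact parameter names.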

Programming this complicated client is not needed at all to interact with the World Bank API. It has a dedicated Python client library that greatly facilitates its use.
- Release blog post [here](https://blogs.worldbank.org/opendata/introducing-wbgapi-new-python-package-accessing-world-bank-data)
- Documentation [here](https://pypi.org/project/wbgapi/)
- Examples [here](https://nbviewer.org/github/tgherzog/wbgapi/blob/master/examples/wbgapi-cookbook.ipynb)

In [None]:
!pip install wbgapi

In [None]:
import wbgapi as wb

This example gets the total population of Brazil for all years available in a Pandas dataframe:

In [None]:
wb.data.DataFrame('SP.POP.TOTL', 'BRA', labels=True)

This URL would have returned a similar result in JSON format:
https://api.worldbank.org/v2/country/BRA/indicator/SP.POP.TOTL?date=1960:2021&format=json

We can also get the series for multiple countries if we specify a list instead of a single string:

In [None]:
wb.data.DataFrame('SP.POP.TOTL', ['BRA', 'ARG', 'URY', 'PRY'], labels=True)

Lastly, we can specify the years we want in a population query:

In [None]:
countries = ['BRA', 'ARG', 'URY', 'PRY']
years = range(2015, 2021) # note: range() excludes the stop value, so this covers 2015 to 2020
wb.data.DataFrame('SP.POP.TOTL', countries, years, labels=True)

The WB API has hundreds of indicators available. They can be explored with `wb.series.info()`:

In [None]:
wb.series.info()

## Python API client libraries - Main takeaways

- API client libraries greatly facilitate the use of APIs. You don't have to code your own client anymore!
    - However, not all APIs have a client library.
- You always need to review the library documentation and experiment with the library to learn how to use it
- The results returned by client libraries might not be in JSON format
    + `geopy` returned an ad-hoc variable class
    + `wb.data.DataFrame` returned a Pandas dataframe, which is very convenient for further data analysis
- Many client libraries do much more than just retrieving the API results

**Final note only if you're working on Colab:**
Remember to go to `File` > `Save a copy in Drive` to save a copy of this notebook in your Google account.