# Web APIs and HTTP requests

In this class, we will cover the basics of accessing public data through web APIs
(*Application Programming Interfaces*).
In a nutshell, web APIs are (usually) publicly available interfaces through which third parties (this is us!) can access
some data resources in a remote, reliable and programmable manner. There are plenty of private APIs too, but we do not care about them here since we cannot use them.

What does it mean in practice?

* **Remote.** Users can access the resource from anywhere, provided they have an internet connection.

* **Reliable.** The interface exposed to users is independent of the internal
details of the system that produces the data. In practice, this means that a user
does not have to know anything about how the system works;
it is enough to know the API.

* **Programmable.** An API can be interacted with through a predefined set of commands/methods (an interface) in a way that can be expressed in a programming language. This is usually achieved with the HTTP protocol, which is the standard communication protocol of the Web and for which utilities are available in every major programming language.
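
To make the "programmable" point a bit more concrete, here is a minimal sketch that uses only Python's standard library. It builds an HTTP request as an ordinary Python object and deliberately does not send it. The address `https://example.org/api` is a placeholder chosen for illustration, not a real API.

```python
from urllib.request import Request

# Placeholder address used purely for illustration (an assumption, not a real API).
API_URL = "https://example.org/api"

# A request is just a Python object: it has a target URL,
# an HTTP method and (optionally) some headers.
request = Request(API_URL, headers={"Accept": "application/json"})

print(request.get_method())          # the HTTP method used when no body is attached
print(request.full_url)              # where the request would be sent
print(request.get_header("Accept"))  # a header we attached ourselves
```

Nothing travels over the network here; the point is only that a request can be described and manipulated in code.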

## Practical example

But what does all that really mean? Let us now turn to an example to understand it better. We will use the public API of Wikipedia (we all know what it is, right?).

The public Wikipedia API can be used for many purposes. Among other things, it makes publicly available a lot (in fact almost all) of the data stored within Wikipedia,
such as page statistics, registered users, etc.

We mentioned that, in some sense, an API is an interface that allows third parties to communicate with some platform and request various things from it in an orderly and programmable manner. Let us now see a real example of such an interface.

The Wikipedia API (for English Wikipedia) lives at this URL:

* [https://en.wikipedia.org/w/api.php](https://en.wikipedia.org/w/api.php)

The URL takes us to an ugly webpage that contains documentation on all the so-called API endpoints exposed by the Wikipedia API. What are they? Endpoints are *commands/requests* that the API understands and that can be used to extract some data from it. They define exactly the interface through which one can communicate with some external system via an API.

So, summing up, an API understood as an interface is:

* a publicly available *place* on the internet (associated with a particular URL)
* a set of endpoints (commands) that define possible interactions with the API.

OK, so we have seen that the Wikipedia API lives at a particular URL. However, the URL by itself just leads us to documentation describing all the endpoints. So how can we use a particular endpoint to actually do something? Let us inspect the endpoint called [query](https://en.wikipedia.org/w/api.php?action=help&modules=query):

`https://en.wikipedia.org/w/api.php?action=help&modules=query`

Now we see the documentation for the endpoint `query`. It is quite complex as it kind of defines another nested API within the top API. From now on we will work exclusively with this part of the Wikipedia API since this is the one we have to use to extract data from Wikipedia.

Note that the URL already has a particular form:

`<URL>?<key-value pair>&<key-value pair> ...`

The part after the `?` sign is crucial here, as it defines a so-called query string that can be passed along with a URL. A query string does not specify a different location (like a URL does); instead, it attaches some additional data to a request sent to the location specified in the standard `<URL>` part. This additional data is what allows us to communicate with APIs through the HTTP protocol.

Now it is clear that `https://en.wikipedia.org/w/api.php?action=help&modules=query` is still the same address as `https://en.wikipedia.org/w/api.php`, but enhanced with additional data that told the Wikipedia API to take us to the help page of the module (endpoint) `query`.
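
The relationship between the two addresses can be checked in code. The sketch below uses the standard library's `urllib.parse` to split both URLs into their parts; it sends no request, it only inspects the strings.

```python
from urllib.parse import urlsplit, parse_qs

base = "https://en.wikipedia.org/w/api.php"
help_url = "https://en.wikipedia.org/w/api.php?action=help&modules=query"

base_parts = urlsplit(base)
help_parts = urlsplit(help_url)

# Same location: scheme, host and path are identical in both URLs.
same_location = (
    (base_parts.scheme, base_parts.netloc, base_parts.path)
    == (help_parts.scheme, help_parts.netloc, help_parts.path)
)
print(same_location)

# Only the query string differs. parse_qs turns it into a dict
# that maps every key to a list of its values.
params = parse_qs(help_parts.query)
print(params)
```

Each value comes back as a list because, in general, the same key may appear more than once in a query string.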

So let us now try to finally do something useful.

### Extracting list of Wikiprojects from Wikipedia API

Now from the docs of the `query` endpoint, we select the `projects` [(sub)endpoint](https://en.wikipedia.org/w/api.php?action=help&modules=query%2Bprojects). The documentation gives us instructions on how to use the endpoint as well as some usage examples.

When we click the link from the first example, we see a long list of project names. These are so-called *Wikiprojects*, which are registered semi-official groups of editors dedicated to working on a specific topic/theme. They can give us some basic insight into what kinds of topics are of most interest to Wikipedia editors (but do not base any claims solely on this simple information!).

The URL from the first example looks like this:

`api.php?action=query&list=projects`

Again, it has the URL part (some of it omitted) and the query string part, which specifies that we use the `query` endpoint and ask it to list all the projects.
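
Because part of the URL is omitted, the example is a *relative* URL. Here is a small sketch of how it can be completed, assuming that it is meant to be resolved against the API address we have been using so far.

```python
from urllib.parse import urljoin

# Assumption: the shortened example is relative to the API address used above.
base = "https://en.wikipedia.org/w/api.php"
relative = "api.php?action=query&list=projects"

# urljoin resolves a relative URL against a base URL,
# much like a browser does when we click a link.
full_url = urljoin(base, relative)
print(full_url)
# prints https://en.wikipedia.org/w/api.php?action=query&list=projects
```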

This is great! We can look at the list in our browsers. However, even this list is somewhat too long to deal with like this, so we would like to process it in Python.

## Talking to an API from Python

Fortunately, Python makes it very easy to build HTTP requests and talk to an API. Utilities for this kind of work can be found in the `requests` package.

### What is a package?

A package in Python is a set of functions (and classes etc.) designed to solve
a specific set of problems, wrapped together in one code bundle so they can be imported and called by other code.
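
As a quick illustration, here are three common ways of importing code from a package. The sketch uses packages from Python's standard library, so nothing has to be installed; the variable names are made up for the example.

```python
# 1. Import a whole package and reach its functions through the package name.
import json

# 2. Import a single function from a (sub)package.
from urllib.parse import quote

# 3. Import a package under a shorter alias.
import json as js

encoded = json.dumps({"list": "projects"})  # Python dict -> JSON text
escaped = quote("query+projects")           # escape characters unsafe in URLs
decoded = js.loads(encoded)                 # JSON text -> Python dict

print(encoded)
print(escaped)
print(decoded)
```

The escaped string is the same `query%2Bprojects` form that can be spotted in the address of the documentation page for the `projects` (sub)endpoint.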

In [None]:
import requests
# From now on we can refer to the `requests` module
# by its name (it is saved as a variable!)

How can we use it to get some data from an API?

Let us decompose this problem into several steps.

In [None]:
# First define the base API url
URL = "https://en.wikipedia.org/w/api.php"
# Then define the query string parameters you want to pass with your requests
# often called 'payload'
payload = {
    'action': 'query',
    'list': 'projects',
    'format': 'json'
}
# 'requests' package wants us to define payload as a dict
# since this makes it easy to build GET requests dynamically
# in a program
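
To see how such a dict relates to the query strings discussed earlier, here is a minimal sketch that encodes it by hand with the standard library. This only approximates what `requests` does for us behind the scenes; it is not its actual implementation.

```python
from urllib.parse import urlencode

URL = "https://en.wikipedia.org/w/api.php"
payload = {
    "action": "query",
    "list": "projects",
    "format": "json",
}

# urlencode joins the key-value pairs with '&' and escapes
# any characters that are not allowed in a URL.
query_string = urlencode(payload)
full_url = f"{URL}?{query_string}"

print(query_string)
print(full_url)
```

The result has exactly the `<URL>?<key-value pair>&<key-value pair> ...` form described above.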

In [None]:
# Now with the URL and the payload ready we can send a request
# to the Wikipedia API to kindly ask for the list of projects
#
# We do this like this:
response = requests.get(URL, params=payload)
response
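
When displayed, a response object typically shows its HTTP status code, for example `<Response [200]>`. The sketch below shows one way to interpret such codes using only the standard library; `describe_status` is a hypothetical helper written for this example, not a part of `requests`.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return a short human-readable description of an HTTP status code."""
    status = HTTPStatus(code)  # raises ValueError for unknown codes
    # Codes in the 200-299 range mean that the request succeeded.
    outcome = "success" if 200 <= code < 300 else "not a success"
    return f"{code} {status.phrase} ({outcome})"


print(describe_status(200))
print(describe_status(404))
```

With a real response, the code is available as `response.status_code`, and `response.raise_for_status()` raises an exception if the server reported a client or server error (4xx or 5xx).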

In [None]:
# By a time-honored tradition of countless generations
# of computer programmers we call the result of a web request
# a 'response'
#
# Now we would like to extract the actual data from it
data = response.json()
# What do we have in the response?
data.keys()
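
Roughly speaking, `response.json()` parses the JSON text sent by the server into ordinary Python dicts and lists. The sketch below imitates this step with the standard `json` module. The sample text is made up: it only mimics the nested shape used in this class and is not actual Wikipedia output.

```python
import json

# Made-up sample (NOT real API output); the project names are invented.
raw_text = '{"query": {"projects": ["Example project A", "Example project B"]}}'

sample = json.loads(raw_text)

print(type(sample).__name__)        # the outer JSON object becomes a dict
print(list(sample.keys()))          # its top-level keys
print(sample["query"]["projects"])  # nested values are reached step by step
```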

In [None]:
# Probably we want to focus on the 'query' part
data['query'].keys()
# And here are the projects

In [None]:
# Now let us save the projects in a variable
# to save us some typing
projects = data['query']['projects']
# Now we can easily count the projects
len(projects)

In [None]:
# and see them if we like ...
# ... but maybe not all of them at once
# maybe just first ten
projects[:10]

In [None]:
# and last ten
projects[-10:]

The point here is that we loaded the data as a list of strings. These are objects and data types we know! So we can work with them in Python and compute anything we want!
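
For instance, here is a small sketch of the kind of computation that becomes possible. It uses a short made-up list, `sample_projects`, as a stand-in for the real data, so the names and counts below say nothing about Wikipedia itself.

```python
from collections import Counter

# Hypothetical stand-in for the real list of projects (made-up names).
sample_projects = ["Astronomy", "Biology", "Birds", "Chemistry", "Chess", "Cats"]

# Count how many names start with each letter.
first_letters = Counter(name[0].upper() for name in sample_projects)
print(first_letters.most_common(2))
# prints [('C', 3), ('B', 2)]

# Keep only the names that contain a given piece of text (ignoring case).
matches = [name for name in sample_projects if "ch" in name.lower()]
print(matches)
# prints ['Chemistry', 'Chess']
```

The same lines should work on the real `projects` list, as long as it is indeed a list of strings.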