# Learning REST APIs with NoMaD

REST APIs are a common pattern for accessing content across the internet. 
This notebook will explain how to understand the API of a new service and make requests of it using [NoMaD](https://nomad-lab.eu/nomad-lab/nomad.html) as an example. 
You will learn:

1. How to create a web request via Python
2. How to read API specifications
3. How to debug basic errors in web requests

In [None]:
import requests

Configuration

In [None]:
base_url = 'https://nomad-lab.eu/prod/v1/api/v1'

## Web Requests
The [Requests library](https://requests.readthedocs.io/en/latest/) is a third-party Python package that simplifies interacting with websites.
Every web request has a few key components, which include:

1. _Method_: The type of request. There are a few different types. "GET", for example, reads data from a source
1. _URL_: The location of the web service being contacted
1. _params_: Any options attached to the URL (the part after the "?")
1. _body_: Content that you are sending to the server
1. _headers_: Metadata about the content you are sending, such as a security key to authenticate its source

For example, you can perform a Google search by sending the parameters `{"q": "rest api"}` to `google.com/search`.

In [None]:
response = requests.request('GET', 'http://google.com/search', params={"q": "rest api"}, data=None, headers=None)
response.url
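To make the role of `params` concrete, here is a minimal sketch of how a client can fold them into the URL, using only Python's standard library. The helper name `build_url` is made up for illustration; Requests does the equivalent encoding for you when you pass `params`.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_url(base, params):
    """Append query parameters to a base URL, as a client does for a GET request."""
    return f"{base}?{urlencode(params)}"

# Encode the same search used above; the space becomes "+"
url = build_url("http://google.com/search", {"q": "rest api"})
print(url)

# Going the other way: recover the parameters from the URL
query = parse_qs(urlsplit(url).query)
print(query)
```

Everything after the "?" is the encoded form of the dictionary, which is why the parameters are visible in `response.url`.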

In return, the web service sends you a reply with a few parts:

1. _Response code and reason_: One of an [established list](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status) [of codes](https://http.cat/)
2. _Headers_: Metadata about the content (e.g., what type it is)
3. _Content_: What you asked for


In [None]:
(response.status_code, response.reason)

In [None]:
response.headers

In [None]:
response.content[:128].decode('ascii')
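The first digit of a response code tells you the broad outcome, which is often enough to start debugging a failed request. The sketch below uses the standard library's `http.HTTPStatus` to label a code; `describe_status` and `CLASSES` are hypothetical names for this example, not part of Requests.

```python
from http import HTTPStatus

# Broad meaning of each status-code class, keyed by the first digit
CLASSES = {
    1: 'informational',
    2: 'success',
    3: 'redirection',
    4: 'client error',
    5: 'server error',
}

def describe_status(code):
    """Return a short description such as '404 Not Found (client error)'."""
    status = HTTPStatus(code)  # Raises ValueError for codes Python does not know
    return f'{code} {status.phrase} ({CLASSES[code // 100]})'

for code in (200, 404, 500):
    print(describe_status(code))
```

A 4xx code usually means something is wrong with the request you built (URL, parameters, or credentials), while a 5xx code points at the service. With Requests, `response.raise_for_status()` turns either kind into a Python exception.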

These actions and data structures are the basic language of web applications. 

Google's normal web page returns data as HTML, a format that your web browser turns into something legible to humans.

REST APIs, by contrast, return data meant for machines.

## The NoMaD API Documentation
The NoMaD web API is a classic example of a REST API and, even better, one that follows the [OpenAPI specification](https://en.wikipedia.org/wiki/OpenAPI_Specification). 
We'll use it to teach you the basics.

Start by opening their [API specification](https://nomad-lab.eu/prod/v1/staging/api/v1/extensions/docs#/). The top level of the page lists the different "endpoints" for making API requests.

![nomad-toplevel](./figures/nomad-api-toplevel.png)

Each row defines the request method and the address (relative to the webpage root) for a different type of operation.

Some endpoints, like `/materials/{material_id}/`, describe a pattern of URLs where an element in the URL has a specific meaning, like the index number of a material in a database.
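A URL pattern like this maps naturally onto Python string formatting. The sketch below fills in a template and escapes the value so it cannot break the URL; `fill_endpoint` is a hypothetical helper and `example-id` is a placeholder, not a real NoMaD material ID.

```python
from urllib.parse import quote

base_url = 'https://nomad-lab.eu/prod/v1/api/v1'

def fill_endpoint(template, **values):
    """Substitute values into an endpoint template such as '/materials/{material_id}'."""
    # Percent-encode each value so characters like "/" stay inside their slot
    safe = {key: quote(str(value), safe='') for key, value in values.items()}
    return base_url + template.format(**safe)

url = fill_endpoint('/materials/{material_id}', material_id='example-id')
print(url)
```

The result is an ordinary URL that can be passed to `requests.get` like any other.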

Let's start with the `/materials/` endpoint

In [None]:
response = requests.get(f'{base_url}/materials')
response.content[:128]

The first point to note is that the response is a JSON object. JSON is such a common format that the response object has a special method to parse it.

In [None]:
data = response.json()
data.keys()
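Parsing JSON gives you ordinary Python dictionaries and lists, so you can explore a large reply with ordinary code. The sketch below parses a small, invented payload (not real NoMaD output) with the standard `json` module and prints only its structure; `outline` is a hypothetical helper.

```python
import json

# An invented reply, standing in for the raw bytes a service might send
raw = b'{"owner": "public", "pagination": {"page_size": 10, "total": 3}, "data": [{"material_id": "example-id"}]}'
data = json.loads(raw)

def outline(obj, depth=0, max_depth=2):
    """Return lines naming each key and its type, without printing the values."""
    lines = []
    if isinstance(obj, dict) and depth < max_depth:
        for key, value in obj.items():
            lines.append('  ' * depth + f'{key}: {type(value).__name__}')
            lines.extend(outline(value, depth + 1, max_depth))
    elif isinstance(obj, list) and obj and depth < max_depth:
        # Describe a list by the structure of its first element
        lines.extend(outline(obj[0], depth, max_depth))
    return lines

print('\n'.join(outline(data)))
```

Calling `outline(response.json())` on a real reply would give a similar bird's-eye view, assuming the reply is a JSON object.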

We don't print the full response because it's huge. But you can see, just from the keys, that we don't yet fully understand what the data means.

The API documentation explains that part to you.

![response-description](./figures/nomad-api-schema.png)

The data is in JSON and the API documentation webpage is rendered from [a sophisticated JSON schema.](https://nomad-lab.eu/prod/v1/staging/api/v1/openapi.json)
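Because an OpenAPI specification is itself JSON, you can explore it with code. The sketch below lists the operations in a tiny, hand-written fragment shaped like an OpenAPI document (a `paths` object mapping each path to its methods). The fragment and its summaries are invented for illustration and are not copied from NoMaD's specification.

```python
# Keys inside a path entry that name HTTP methods (other keys hold shared settings)
HTTP_METHODS = {'get', 'post', 'put', 'patch', 'delete', 'head', 'options'}

# A made-up fragment with the same layout as an OpenAPI document
spec = {
    'openapi': '3.0.2',
    'paths': {
        '/entries': {'get': {'summary': 'Search entries'}},
        '/entries/query': {'post': {'summary': 'Search entries with a JSON query'}},
        '/materials/{material_id}': {'get': {'summary': 'Get one material'}},
    },
}

def list_operations(spec):
    """Return sorted (METHOD, path, summary) tuples for every operation in the spec."""
    operations = []
    for path, item in spec['paths'].items():
        for method, details in item.items():
            if method in HTTP_METHODS:
                operations.append((method.upper(), path, details.get('summary', '')))
    return sorted(operations)

for method, path, summary in list_operations(spec):
    print(f'{method:5s} {path:28s} {summary}')
```

The same function should work on a full specification loaded with `requests.get(...).json()`, provided it follows the OpenAPI layout.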

Everything is documented, and NoMaD is in the good company of other well-documented REST APIs.

## Practical Skills
Parameters and POSTing, pagination, and authentication are a few skills that will come up often when using web APIs.

### Parameters and POSTing
POSTing and parameters are the mechanisms used to pass form data in web pages. Parameters are everything after the "?" at the end of a web address, and POST forms are the ones that make your browser ask whether "you're sure you want to resubmit a form" when you refresh the page.

Those mechanisms are also used through a REST API. Let's use the [/entries](https://nomad-lab.eu/prod/v1/staging/api/v1/extensions/docs#/entries%2Fmetadata/get_entries_metadata_entries_get) and [/entries/query](https://nomad-lab.eu/prod/v1/staging/api/v1/extensions/docs#/entries%2Fmetadata/post_entries_metadata_query_entries_query_post) endpoints as examples, which support running queries via parameters and POST data (respectively).

Start by reading the documentation for `/entries/` to see that it takes a few options, like how to sort data. We can use that to see that the most recent data was published very recently.

In [None]:
response = requests.get(f'{base_url}/entries', params={'owner': 'visible', 'order_by': 'publish_time', 'order': 'desc'})
result = response.json()
result['data'][0]['publish_time']

And that the first data was published almost 10 years ago.

In [None]:
result = requests.get(f'{base_url}/entries', params={'owner': 'visible', 'order_by': 'publish_time', 'order': 'asc'}).json()
result['data'][0]['publish_time']

The `entries/query` endpoint provides the same functionality but through a different mechanism, `POST`. Any options you pass to the web service are sent in the body of the request rather than in the URL, which is where the `GET` method puts them.

In [None]:
response.url

So, the options are sent by providing a JSON-able object instead of providing them as "params".

In [None]:
response = requests.post(f'{base_url}/entries/query', json={'owner': 'visible', 'pagination': {'order_by': 'publish_time', 'order': 'asc'}})
result = response.json()
result['data'][0]['publish_time']

In [None]:
response.url
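The difference between the two mechanisms is easy to see by building both requests without sending them. This sketch uses the standard library's `urllib.request.Request`, which only describes a request; the single `owner` option is chosen for brevity.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

base_url = 'https://nomad-lab.eu/prod/v1/api/v1'
options = {'owner': 'visible'}

# GET: the options are encoded into the URL and there is no body
get_req = Request(f'{base_url}/entries?{urlencode(options)}', method='GET')

# POST: the URL stays clean and the options travel as a JSON body
post_req = Request(
    f'{base_url}/entries/query',
    data=json.dumps(options).encode('utf-8'),
    headers={'Content-Type': 'application/json'},
    method='POST',
)

for req in (get_req, post_req):
    print(req.get_method(), req.full_url, req.data)
```

Passing `json=...` to `requests.post` performs the same serialization step for you.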

### Pagination
REST APIs work best with small message sizes (typically less than 10MB). Consequently, a service may provide you with only part of the results, along with a shortcut for finding the rest of them. This pattern is known as "pagination".

The last query we sent to NoMaD had pagination information

In [None]:
result.keys()

In [None]:
result['pagination']

Note how there are 12M records - far more than NoMaD wants to send you and you want to receive.

That information tells us how to request the next page of results. There is variation across web services, but all provide something that can be pasted into your next request.

In [None]:
result['pagination']['page_after_value'] = result['pagination']['next_page_after_value']
result = requests.post(f'{base_url}/entries/query', json={'owner': 'visible', 'pagination': result['pagination']}).json()  # Just paste in the last result
result['data'][0]['publish_time']

In [None]:
result['pagination']

The pages move forward each time we run.

Look for "pagination" in the documentation of your web service to figure out how to move forward. Not all are the same, but they all rhyme.
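The "ask, read the cursor, ask again" loop can be wrapped in a generator so the rest of your code sees one continuous stream of records. Below is a minimal sketch, assuming a service that reports its cursor as `next_page_after_value` and accepts it back as `page_after_value`, as in the cells above. `iterate_pages` and `fake_fetch` are hypothetical names, and the "service" is an in-memory stand-in so the example runs without a network.

```python
def iterate_pages(fetch_page, max_pages=None):
    """Yield records page by page until the service stops offering a next page."""
    pagination = {}
    pages_read = 0
    while max_pages is None or pages_read < max_pages:
        result = fetch_page(pagination)
        yield from result['data']
        pages_read += 1
        next_value = result['pagination'].get('next_page_after_value')
        if next_value is None:
            break  # No cursor means this was the last page
        pagination = {'page_after_value': next_value}

# A stand-in for a web service: five records served two at a time
RECORDS = ['a', 'b', 'c', 'd', 'e']

def fake_fetch(pagination, page_size=2):
    start = int(pagination.get('page_after_value', 0))
    end = start + page_size
    reply = {'data': RECORDS[start:end], 'pagination': {'total': len(RECORDS)}}
    if end < len(RECORDS):
        reply['pagination']['next_page_after_value'] = str(end)
    return reply

print(list(iterate_pages(fake_fetch)))
print(list(iterate_pages(fake_fetch, max_pages=2)))
```

For a real service, `fetch_page` would be a small function that sends the request with the given pagination options and returns the parsed JSON. The `max_pages` cap is a safety valve for result sets with millions of records.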

### Authentication

Authentication flows are composed of two segments: a "login flow" to get a token, and an area to place a token within a request.

The "industry-grade" login flow standard is OAuth2, which can grant different applications different levels of access and work with users authenticated by a different service than the one being protected. [It's a mind-bender](https://www.digitalocean.com/community/tutorials/an-introduction-to-oauth-2), so not the best teaching example.

NoMaD's authentication is a simpler example. It uses a single endpoint, `/auth/token`, which grants a "token" that verifies requests are coming from you.

That token will be used as part of the header of any requests that require a user account.

In [None]:
token = 'whatever_auth/token_gave_you'  # Not real, of course

In [None]:
requests.get(f'{base_url}/users/me').json()

In [None]:
requests.get(f'{base_url}/users/me', headers={'Authorization': f'Bearer {token}'}).json()

Of course, this example did not work because the token was fake. 

Like pagination, you'll see a few variations of the theme for tokens:

1. Tokens which expire after a time
1. Tokens which are only good for certain requests
1. Tokens which are only used to re-create expired tokens
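Expiring tokens are the variation you are most likely to handle yourself. The sketch below caches a token and fetches a new one once it is older than a fixed lifetime. `TokenCache` is a hypothetical class, the lifetime is an assumption you would take from your service's documentation, and `fake_fetch_token` stands in for a real call to a login endpoint.

```python
import time

class TokenCache:
    """Hold a token and replace it once it is older than its lifetime."""

    def __init__(self, fetch_token, lifetime_seconds, clock=time.monotonic):
        self._fetch_token = fetch_token  # Callable that returns a fresh token
        self._lifetime = lifetime_seconds
        self._clock = clock              # Injectable so the example can fake time
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            self._token = self._fetch_token()
            self._expires_at = now + self._lifetime
        return self._token

    def headers(self):
        """Build the header used to attach the token to a request."""
        return {'Authorization': f'Bearer {self.get()}'}

# Demonstration with a fake login call and a fake clock
issued = []
def fake_fetch_token():
    issued.append(f'token-{len(issued) + 1}')
    return issued[-1]

fake_now = [0.0]
cache = TokenCache(fake_fetch_token, lifetime_seconds=60, clock=lambda: fake_now[0])

first = cache.headers()   # Fetches the first token
fake_now[0] = 30.0
second = cache.headers()  # Still fresh, so the token is reused
fake_now[0] = 61.0
third = cache.headers()   # Expired, so a new token is fetched
print(first, second, third)
```

The dictionary from `cache.headers()` can be passed as the `headers` argument of a request, in the same place as the hand-written header above.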

You'll also rarely need to inject your own tokens into headers. Many APIs (even [NOMAD's](https://nomad-lab.eu/prod/v1/staging/docs/apis/pythonlib.html)) provide libraries which hide the tedious parts of using their APIs, such as injecting tokens or pagination.

## Exercises

Try to answer a few questions about NoMaD to test your skills with REST interfaces.

You may want to see [NOMAD's query documentation](https://nomad-lab.eu/prod/v1/staging/docs/apis/api.html#queries)

- Find the largest number of atoms in any entry (HINT: atom count is `optimade.nsites`)

My solution is hidden below
<code hidden>
data = requests.post(f'{base_url}/entries/query', json={'pagination': {'order_by': 'optimade.nsites', 'order': 'desc', 'page_size': 1}}).json()['data']
data[0]['optimade']['nsites']
</code>

- Find the chemical formula of the largest calculation performed with VASP (HINT: use the [results explorer to see available results fields](https://nomad-lab.eu/prod/v1/gui/search/entries/entry/id/zQJMKax7xk384h_rx7VW_-6bRIgi/data/results/method))

<code hidden>
data = requests.post(f'{base_url}/entries/query', json={'query': {'results.method.simulation.program_name': 'VASP'}, 'pagination': {'order_by': 'optimade.nsites', 'order': 'desc', 'page_size': 1}}).json()['data']
data[0]['results']['material']['chemical_formula_hill']
</code>

- Count the number of CP2K calculations above 2000 atoms

<code hidden>
result = requests.post(f'{base_url}/entries/query', json={'query': {'results.method.simulation.program_name': 'CP2K', 'optimade.nsites': {'gt': 2000}}, 'pagination': {'order_by': 'optimade.nsites', 'order': 'desc', 'page_size': 1}}).json()
result['pagination']['total']
</code>

- Count the number of times each `program_name` was used to run a `results.method.method_name` of DFT above 2000 atoms. HINTS: Use [Counters](https://docs.python.org/3/library/collections.html#collections.Counter.update); there is no `next_page_after_value` if there are no results after the current page

<code hidden>
from collections import Counter
counter = Counter()
pagination = {}
while (result := requests.post(f'{base_url}/entries/query', json={'query': {'results.method.method_name': 'DFT', 'optimade.nsites': {'gt': 2000}}, 'pagination': pagination}).json()) is not None:
    # Update the counter
    counter.update(
        x['results']['method']['simulation'].get('program_name') for x in result['data']
    )

    # Break if done
    if 'next_page_after_value' not in result['pagination']:
        break
        
    # Make the new pagination
    pagination = result['pagination'].copy()
    pagination['page_after_value'] = pagination['next_page_after_value']
counter
</code>