# Automating the API

Now that we have successfully run one query, we can start to automate things. Automation makes work that would otherwise be very laborious much easier.

Say, for example, we wanted to find out what records are held at TNA for every ship in the Royal Navy during the War of 1812. By hand, this would require a huge amount of browsing. With automation, it becomes much easier.

The first thing we need is a list of ship names. Helpfully, this is [available on Wikipedia](https://en.wikipedia.org/wiki/Category:War_of_1812_ships_of_the_United_Kingdom). The full list is shown in [the additional data file](./aditional_data.py). Here, we are going to show just the first 3 ships.
Note that, to simplify the code, we are only using ships that carry the HMS prefix and have a single-word name. The prefix itself is left out of the list.

In [None]:
three_ships = [
    "Devastation",
    "Dreadnought",
    "Invincible"
]

With some research, we can see that TNA has provided guides on how to search the archives. In particular, they have provided information on [how to find logs and records from Royal Navy ships](https://www.nationalarchives.gov.uk/help-with-your-research/research-guides/royal-navy-ships-voyages-log-books/). From this, we can create another list - the various record series we would want to search.

From the API documentation we saw before, we can see that these record series have to be included in a specific format. As with the list of ship names, we've included the full list in [the additional data file](./aditional_data.py). Here, we are going to show just 3 record series.

In [None]:
three_record_series = [
    "sps.recordSeries=ADM%2055",   # record_series_ADM_55
    "sps.recordSeries=ADM%20101",  # record_series_ADM_101
    "sps.recordSeries=MT%2032"     # record_series_MT_32
]

With two lists, we can start to automate the requests. 
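
Before sending anything, it helps to understand the format of the record series strings. The `%20` in them is a URL-encoded space, so `ADM%2055` stands for the series reference `ADM 55`. As a minimal sketch, the standard library can produce the same strings. The helper name `series_parameter` is our own, for illustration only, and is not part of the Discovery API.

```python
from urllib.parse import quote


def series_parameter(series_reference):
    """Build a 'sps.recordSeries=...' query parameter from a series reference.

    Hypothetical helper for illustration: it only reproduces the string
    format used for the record series in this notebook.
    """
    # quote() percent-encodes the space in the reference as %20
    return "sps.recordSeries=" + quote(series_reference)


print(series_parameter("ADM 55"))  # sps.recordSeries=ADM%2055
```

Writing the references with a plain space and encoding them in code makes the list easier to read and to extend.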

We will start by importing the libraries we need, along with the full length lists. 

In [None]:
%pip install requests
# json is part of the Python standard library, so it does not need installing
import requests
import json

import aditional_data

ship_list = aditional_data.ships

record_series = aditional_data.admiralty_record_series

ship_data = []

As we are going to automate the requests, we are going to need a place to store the results. We are going to use a list of dictionaries, one per ship, each holding its own list of record dictionaries. This makes our results both easy to read and easy to work with.

From the Discovery API documentation, we can decide what bits of information we want. We are going to use the following:
- The ship name
- The record start date
- The record end date
- The record ID
- The record description
- Whether the record is digitised

It is often helpful to write an example of the results you want. It helps to visualise your goal, making it easier to build code. An example of the results we are looking for is below. 

In [None]:
[
    {
        "ship-name": "Devastation",
        "records" : [
            {
                "startDate": "1871-01-01",
                "endDate": "1871-12-31",
                "id": "C1234567",
                "description": "A description of the record",
                "digitised": True
            },
            {
                "startDate": "1872-01-01",
                "endDate": "1872-12-31",
                "id": "C1234568",
                "description": "A description of the record",
                "digitised": True
            }
        ]
    }, # These outer braces, and all data within, are then repeated for each ship
]

So, now we have a list of ships, a list of record series, and a list of the information we want. We can start to build our code.

This next cell is where the requests actually happen - hence, there are line-by-line comments to give a detailed explanation of what is happening. The general aim of this cell is to show the power of an API and automation. Here, instead of having to search for each ship by hand, we can loop through the list and search for each ship automatically.

There are two further bits of information we need to note. 
1 - We are going to artificially limit the searches to the first 10 entries in the list. This is to prevent the Discovery API from rejecting our requests. The [help page](https://www.nationalarchives.gov.uk/help/discovery-for-developers-about-the-application-programming-interface-api/) asks users to make fewer than 3,000 requests a day, at a rate of approximately one per second. Limits like these are often placed on APIs to prevent servers from being overloaded - if several people were to separately run all the requests in this notebook, an accidental [DDoS](https://en.wikipedia.org/wiki/Denial-of-service_attack) could occur. This is not going to happen with the requests in this notebook, but it is good practice to limit the number of requests you make, and for servers to limit the number of requests they accept.

Another approach to this would be to slow down the requests - you could wait for a set period of time between each request by using the [time.sleep()](https://docs.python.org/3/library/time.html#time.sleep) function. 

2 - We are going to need to do these requests in two steps. From the documentation, we can see that the start date, end date and ID are all available from the general search, but the description and digitisation status are not. To get these, we need to make a second request to the [`/records/v1/details/{id}` endpoint](https://discovery.nationalarchives.gov.uk/API/sandbox/index#/Records). These requests have to be made for individual records, and provide the extra information we need.
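
As a minimal sketch of the slower-paced approach mentioned in point 1, the generator below pauses between items using `time.sleep()`. The name `paced` and its `delay_seconds` parameter are our own, for illustration only. The default of one second reflects the approximate rate quoted on the help page.

```python
import time


def paced(items, delay_seconds=1.0):
    """Yield each item in turn, pausing between items to space out calls."""
    for index, item in enumerate(items):
        if index > 0:
            # Wait before every item except the first
            time.sleep(delay_seconds)
        yield item


# Usage: the loop body is where one API request per ship would go.
# A short delay is used here only so the demonstration finishes quickly.
for ship in paced(["Devastation", "Dreadnought", "Invincible"], delay_seconds=0.1):
    print("Would request records for", ship)
```

Wrapping the pause in a generator keeps the request loop itself unchanged: `for ship in ship_list` simply becomes `for ship in paced(ship_list)`.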

Note: this next cell can take a couple of minutes to run. 

In [None]:
for ship in ship_list:                                                                  # We start by looping through each ship in the list
        url = "https://discovery.nationalarchives.gov.uk/API/search/records?"           # For every ship, we need to build a custom URL
        for record in record_series:                                                    # We need to add each record series we want to search within, so we loop through the list of record series
                url += record                                                           # We add the record series to the URL
                url += "&"                                                              # An ampersand is added to separate query parameters
        url += "sps.searchQuery=" + ship                                                # We add the query parameter to the URL - this is the ship name
        headers = {
                'Accept': 'application/json'                                            # Setting the headers indicates to the API that we want to receive JSON data. This will make it easier to work with the response data
        }
        response = requests.request("GET", url, headers=headers)                        # The request is actually made! We store the response in a variable called response
        response_json = response.json()                                                 # We parse the JSON response into Python data structures (dictionaries and lists)
        if response_json["records"] != []:                                              # In case some ships don't have any records, we need to check that the records list isn't empty
                records_found = []                                                      # Creating the list to store information about each record found
                for record in response_json["records"]:                        
                        records_found.append(                                           # For each record found, we add a dictionary to the list of records found
                                {
                                        "id": record["id"],                             # This dictionary contains the id
                                        "startDate": record["startDate"],               # Start date
                                        "endDate": record["endDate"]                    # And end date of the record - this is all this API endpoint returns
                                }
                        )
                ship_data.append(                                                       # Once we've looped through all the records, we add a dictionary to the list of ship data
                        {
                                "ship-name": ship,                                      # We want the ship name
                                "records": records_found                                # And the list of records found
                        }
                )

print(json.dumps(ship_data, indent=4, sort_keys=True))                                   # After we've looked through all the ship names, print the ship data to check it's worked. Here, we use the json.dumps() function to print the data in a readable format
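
The loop above assumes that every record in the response has `id`, `startDate` and `endDate` keys. As a hedged sketch of a more defensive version, the hypothetical helper below uses `dict.get()`, so a missing key gives `None` instead of raising a `KeyError`. The sample response is made up for illustration and is not real API output.

```python
def extract_records(response_json):
    """Pick the id and dates out of a search response, tolerating missing keys."""
    records_found = []
    # .get("records", []) also covers a response with no "records" key at all
    for record in response_json.get("records", []):
        records_found.append(
            {
                "id": record.get("id"),
                "startDate": record.get("startDate"),
                "endDate": record.get("endDate"),
            }
        )
    return records_found


# Made-up sample in which the second date is missing
sample_response = {"records": [{"id": "C1234567", "startDate": "1871-01-01"}]}
print(extract_records(sample_response))
# [{'id': 'C1234567', 'startDate': '1871-01-01', 'endDate': None}]
```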

Knowing whether the record is digitised is going to be valuable. If it is, we can download it from the Discovery GUI - using the API means we can go directly to the records, rather than having to browse. This is also going to be useful for the next notebook, where we are going to visualise various aspects of the records.

As we have seen in the documentation, we are going to need to call a different endpoint to get the extra information we need. This is going to be a separate request, and will need to be done for each record. We then pick the information we want from the result, and add it to the data we've already collected for each record.

As we indicated before, we are going to limit the number of requests to avoid getting rejected by the Discovery API. Here, rather than limiting the rate of requests, we are simply going to limit them to the first 10 ships. The mechanism for doing this is explained in-line, and will be used again later in the notebook where we need it.

In [None]:
details_url = "https://discovery.nationalarchives.gov.uk/API/records/v1/details/" 

for ship in ship_data[:10]:             # The [:10] here is a slice - it means we only want to look at the first 10 ships in the list
    for record in ship["records"]:      # For each ship, we want to look at each record. Note that this gives us a double-loop - this is usually a sign that we are going to create a lot of API calls!
        url = details_url + record["id"] # Build the URL for the details endpoint
        headers = {
            'Accept': 'application/json'
        }
        response = requests.request("GET", url, headers=headers)
        response_json = response.json()
        record["description"] = response_json["scopeContent"]["description"] # We can add more data to the dictionary for each record by simply adding a new key and value
        record["digitised"] = response_json["digitised"]                 

print(json.dumps(ship_data[:10], indent=4, sort_keys=True))  # Print the ship data again to check the new data has been added
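
Once the `digitised` flag has been added, the nested structure can be filtered. As a minimal sketch, the hypothetical helper below collects the IDs of digitised records for each ship. It uses `.get()` because only the first 10 ships were enriched, so records for later ships have no `digitised` key. The example data is made up and follows the shape of the example results shown earlier.

```python
def digitised_record_ids(ship_data):
    """Map each ship name to the IDs of its records marked as digitised."""
    result = {}
    for ship in ship_data:
        # A missing "digitised" key is treated the same as False
        ids = [record["id"] for record in ship["records"] if record.get("digitised")]
        if ids:
            result[ship["ship-name"]] = ids
    return result


# Made-up data for illustration only
example_data = [
    {
        "ship-name": "Devastation",
        "records": [
            {"id": "C1234567", "digitised": True},
            {"id": "C1234568", "digitised": False},
        ],
    },
    {
        "ship-name": "Dreadnought",
        "records": [{"id": "C1234569"}],  # details were never fetched
    },
]

print(digitised_record_ids(example_data))  # {'Devastation': ['C1234567']}
```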