## What is Discovery?

[Discovery](https://discovery.nationalarchives.gov.uk/) is the catalogue for [The National Archives](https://www.nationalarchives.gov.uk/). It contains descriptions of over 32 million records held by The National Archives and more than 2,500 archives across the country. Over 9 million records are available for download. 

The catalogue is arranged hierarchically: records are grouped into series, and each series is divided into successively lower levels. At the highest level, a series typically represents a government department (if the records are held by TNA) or an archive (if they are held elsewhere). Note that TNA holds records spanning the last 1,000 years, including those of departments that no longer exist, so the list of departments may be longer than expected.

The list of record series/departments can be found here: {url}. For these notebooks, we will focus on records held at TNA, so start by selecting the first letter of the department you are looking for. Selecting the details of a record series at this level gives an overview of the series, including a description, date range, and physical description, among other data.

The other option is to browse by hierarchy, using the boxes to the side of the main list. These are the first “child” level series of the main series selected on the left of the page. Clicking a box's details button shows similar information to the main record series (although often in less detail), while clicking its title takes you one level “deeper” into the hierarchy. These steps can be repeated until you reach a page such as this: the edge of the tree, where each entry on the page is an actual record.

Discovery is powered by a database: it stores the information about each record and is queried every time a search is made, whether through the web user interface or the API. It is the sheer number of records in Discovery that makes the API a valuable option. Rather than using the web interface separately for each query, by following the rest of this notebook you will be able to automate the process, allowing you to filter more efficiently and retrieve large amounts of data more effectively.

## Gathering information

The key thing you need when working with an API is its documentation. This tells you what requests you can build: which endpoints are available, which query parameters they take, which HTTP methods they accept, and so on. The documentation is usually published alongside the API itself, or is just as easy to reach.

The [main help page](https://www.nationalarchives.gov.uk/help/discovery-for-developers-about-the-application-programming-interface-api/) provides useful information, particularly the terms of use, information on catalogue levels, and guidance on limiting the rate of requests. From this page, you can navigate to the [API sandbox](https://discovery.nationalarchives.gov.uk/API/sandbox/index). The sandbox serves as more thorough documentation of the API's functionality, showing the available endpoints, parameters, and responses. It is also interactive, allowing you to build and test requests in the browser.
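The help page asks users to limit how often they call the API. One simple way to respect that in a script is to enforce a minimum gap between requests. The sketch below does this; the one-second interval is a placeholder assumption rather than the figure TNA specifies, so check the help page for the current guidance. The `clock` and `sleep` arguments exist only so the class can be tested without actually waiting.

```python
import time


class Throttle:
    """Enforce a minimum gap (in seconds) between successive API calls.

    min_interval=1.0 is a PLACEHOLDER; use the rate guidance from the help page.
    """

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call = None  # time of the previous permitted call

    def wait(self):
        """Block until at least min_interval seconds have passed since the last call."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now
```

Calling `throttle.wait()` immediately before each request keeps a loop of many requests at or below the chosen rate.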

At this stage, try it out! Have a go with the sandbox and see what you can retrieve.

Several key pieces of information can be gathered from the sandbox, including the base URL, the endpoints relevant to the queries we want to make, and the HTTP methods those endpoints use.

```python
base_discovery_url = "https://discovery.nationalarchives.gov.uk/API"

search_endpoint = "/search/records"                          # GET

# {parentId} is a placeholder for the ID of the parent record series
find_children_endpoint = "/records/children/{parentId}"      # GET
```

The search endpoint returns records that match the query sent to it; these can be at any level of the hierarchy. The find children endpoint is useful for navigating the hierarchy: give it the ID of a record series, and it returns the list of sub-series or records that series contains.
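The find children endpoint lends itself to walking the hierarchy programmatically. The sketch below fills the `{parentId}` placeholder with `str.format` and performs a depth-first walk, stopping at leaves (entries with no children, i.e. actual records) or at a depth limit. `get_children` is a hypothetical callable you would supply: it would request the children URL and extract the child IDs from the response, and the field names it needs should be taken from the sandbox's response model rather than assumed.

```python
from urllib.parse import quote

base_discovery_url = "https://discovery.nationalarchives.gov.uk/API"
find_children_endpoint = "/records/children/{parentId}"


def children_url(parent_id):
    """Fill the {parentId} placeholder to give the full find-children URL."""
    # quote() guards against characters that are not safe in a URL path segment.
    return base_discovery_url + find_children_endpoint.format(parentId=quote(parent_id, safe=""))


def walk_hierarchy(root_id, get_children, max_depth=10):
    """Depth-first walk from root_id, yielding (depth, record_id) pairs.

    get_children is a HYPOTHETICAL callable returning a list of child IDs;
    an empty list means the entry is a leaf, i.e. an actual record.
    """
    stack = [(0, root_id)]
    while stack:
        depth, record_id = stack.pop()
        yield depth, record_id
        if depth < max_depth:
            # Push in reverse so children are visited in the order returned.
            for child_id in reversed(get_children(record_id)):
                stack.append((depth + 1, child_id))
```

Passing `get_children` in as a function keeps the traversal separate from the HTTP details, so the walk can be tried on a small in-memory tree before pointing it at the live API.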

Note that this list of endpoints is far from exhaustive: there are also endpoints for finding repositories and for retrieving the details or authorities of records, amongst others.

### Example 

For this notebook, we are going to demonstrate with the search endpoint. 

Expanding the endpoint in the web interface reveals a wealth of information.

The first section is a model of the response you can expect from the endpoint. This provides you with all the fields you can expect, and the type of value in each field – such as strings or ints. 

The next section lists all parameters available on this endpoint, with a description of what each query parameter should be used for, and what type of value should be provided. In the web sandbox UI, you can input values into each of these parameters to test. Note that the search query parameter is in bold, and is marked as the only required parameter. 
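Once you know which parameters an endpoint accepts, the request the sandbox builds can be assembled in Python. This is a minimal sketch: it assumes the required search query parameter is named `sps.searchQuery` (confirm the exact name in the sandbox) and accepts any further parameters as a dictionary, since parameter names containing dots cannot be Python keyword arguments. `build_search_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

base_discovery_url = "https://discovery.nationalarchives.gov.uk/API"
search_endpoint = "/search/records"


def build_search_url(search_query, extra_params=None):
    """Build a GET URL for the search endpoint.

    "sps.searchQuery" is ASSUMED to be the required parameter's name;
    check it (and any optional parameter names) in the sandbox.
    """
    if not search_query:
        # The sandbox marks the search query as the only required parameter.
        raise ValueError("a search query is required")
    params = {"sps.searchQuery": search_query}
    if extra_params:
        params.update(extra_params)
    # urlencode percent-encodes values (spaces become "+") and joins pairs with "&".
    return f"{base_discovery_url}{search_endpoint}?{urlencode(params)}"
```

For example, `build_search_url("Lord Nelson", {"sps.resultsPageSize": 50})` appends a page-size parameter; that parameter name is likewise an assumption to check against the sandbox.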

Below this, the possible responses from the endpoint are listed.

If you press the “Try it out!” button, the UI will send a request to the API, and the response will be displayed in some new sections. 

The first two sections show the URL built from the endpoint and query parameters, and an equivalent cURL command that sends the same request to the API.
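The cURL command shown by the sandbox is essentially a GET request to the built URL with an `Accept` header. The sketch below produces a command of that shape for any URL, which can be handy for re-running a request from a terminal. The `application/json` default assumes you want JSON back; `as_curl` is a hypothetical helper, and `shlex.quote` is used so URLs containing `?` or `&` survive the shell.

```python
import shlex


def as_curl(url, accept="application/json"):
    """Return a cURL command that sends a GET request to url."""
    return " ".join([
        "curl", "-X", "GET",
        # Quote the header and URL so spaces, "?" and "&" are not interpreted by the shell.
        "--header", shlex.quote(f"Accept: {accept}"),
        shlex.quote(url),
    ])
```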

After this, the response from the API is shown, separated into the response code, headers, and the response body. 
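The same three parts (response code, headers, body) can be retrieved from Python with the standard library. This sketch assumes the API returns JSON when the `Accept` header asks for it; check the `Content-Type` header in the sandbox's response to confirm. `get_json` is a hypothetical helper, and its `opener` argument only exists so it can be tested without a network connection. Note that `urlopen` raises `urllib.error.HTTPError` for non-2xx status codes rather than returning them.

```python
import json
import urllib.request


def get_json(url, opener=urllib.request.urlopen, timeout=30):
    """Send a GET request and return (status code, headers, parsed JSON body).

    ASSUMES the API honours "Accept: application/json".
    """
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with opener(request, timeout=timeout) as response:
        status = response.status                      # e.g. 200
        headers = dict(response.headers.items())      # header name -> value
        body = json.loads(response.read().decode("utf-8"))
    return status, headers, body
```

In practice you would call `get_json` with a URL built from `base_discovery_url` and `search_endpoint`, ideally pausing between calls to respect the rate guidance on the help page.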
