# 2 - Working Through The Result(s)

In [1.2](/1-intro-to-discovery-api.ipynb) we fetched record(s) through the Discovery API search endpoint and displayed the list of records returned. Now we will look more closely at the response and modify the API call to receive the records we want.

As always, we will start by importing the libraries we need and running an example API call.

In [None]:
%pip install -q requests
import requests
import json  # json is part of the Python standard library, so it does not need installing

In [None]:
base_discovery_url = "https://discovery.nationalarchives.gov.uk/API"

search_endpoint = "/search/records"

search_query_parameter = "sps.searchQuery"

search_query = "London"

url = base_discovery_url + search_endpoint + "?" + search_query_parameter + "=" + search_query

response = requests.request("GET", url)

print(json.dumps(response.json(), indent=4))
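The URL above is built by plain string concatenation, which works for a single-word query such as "London". For queries containing spaces or other special characters, the parameters usually need percent-encoding. The following is a minimal sketch using only the standard library; `build_search_url` is a hypothetical helper name introduced for illustration, and the query "Tower of London" is just an example value.

```python
from urllib.parse import urlencode, quote

base_discovery_url = "https://discovery.nationalarchives.gov.uk/API"
search_endpoint = "/search/records"


def build_search_url(query, extra_params=None):
    """Hypothetical helper: build a search URL with percent-encoded parameters."""
    params = {"sps.searchQuery": query}
    if extra_params:
        params.update(extra_params)
    # quote_via=quote encodes spaces as %20 rather than +
    return base_discovery_url + search_endpoint + "?" + urlencode(params, quote_via=quote)


example_url = build_search_url("Tower of London")
print(example_url)
```

The resulting string can be passed to `requests.request("GET", ...)` in the same way as the hand-built URL.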

## 2.1 - Number of Records

Along with the list of records, the response supplies a range of other fields. These include:
- The list of time periods covered by the records
- Where the records are held (in the National Archives, or in another institution), and the count per location
- The total number of records found

Some of these data can be used to make quick decisions about the records returned. For example, if we are looking for records from the 1800s and all the records returned are from the 1900s, we know that either the query needs tweaking or the records may not exist. One is a quick fix; the other is an interesting research question.

We can use these details to start to build up more complete views of the data. 

The first step we will take is to display the number of results received through the API call. 

In [None]:
print(len(response.json()["records"]))


We can see that we've retrieved 15 results! However, if we look at the total number of results (`count`), we can see that it's much higher (see the response from the example query earlier in the notebook). This is because the default result page size for the API is 15. To change the number of records we receive from the API, we add the query parameter `&sps.resultsPageSize=30` to our URL (if we want 30 records). The number should be between 0 and 1000.

In [None]:
result_size = "&sps.resultsPageSize=30"

Now, if we display the number of records received through the API call, we should see 30:

In [None]:
full_search_url = base_discovery_url + search_endpoint + "?" + search_query_parameter + "=" + search_query + result_size

response = requests.request("GET", full_search_url)

print(len(response.json()["records"]))

## 2.2 - Multiple Pages of Results

If we want to check how many page(s) of records there are, it's simple maths: divide the total count by the number of records per page and round up.

In [None]:
import math
number_of_pages = math.ceil(int(response.json()["count"])/len(response.json()["records"]))
print(number_of_pages)
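The page calculation is a ceiling division, and it can be checked without calling the API. The sketch below uses a hypothetical helper, `count_pages`, and made-up totals purely for illustration. It divides by the requested page size rather than by the number of records in a response, on the assumption that a final page may hold fewer records than the page size.

```python
import math


def count_pages(total_records, page_size):
    """Hypothetical helper: number of pages needed to hold total_records."""
    if page_size <= 0:
        raise ValueError("page_size must be a positive integer")
    return math.ceil(total_records / page_size)


# Illustrative numbers only, not real API counts
print(count_pages(100, 30))  # 3 full pages plus 1 partial page
print(count_pages(90, 30))   # divides exactly
print(count_pages(0, 30))    # no records, no pages
```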

If we want to get all the results, we add a new query parameter, `&sps.page=page_number`, and step it through each page in turn. Let's get the second page to test it out.

In [None]:
page_number = 2

result_page =  "&sps.page=" + str(page_number)

full_search_url = base_discovery_url + search_endpoint + "?" + search_query_parameter + "=" + search_query + result_size + result_page

response = requests.request("GET", full_search_url)

## First, make sure we got a 200 response

print(response.status_code)

Now, let's print the results and have a look at what we've got.

In [None]:
print(json.dumps(response.json(), indent=4))

To get all pages of results, we will start by defining a new list in which we will store all the records.

In [None]:
result_array = []

And then fetch results for all the pages and store them in the list. Note that for this notebook, we will only fetch the first 5 pages of results, to keep the notebook running time short.

In [None]:
# To fetch every page, comment out the `for` line below and uncomment this one (pages are numbered from 1):
# for i in range(1, number_of_pages + 1):
for i in range(1, 6):
  result_page = "&sps.page=" + str(i)
  full_search_url = base_discovery_url + search_endpoint + "?" + search_query_parameter + "=" + search_query + result_size + result_page
  response = requests.request("GET", full_search_url)
  # Append only this page's records; adding result_array to itself would duplicate earlier pages
  result_array += response.json()["records"]

print(len(result_array))
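The accumulation pattern in the loop can be tried without touching the network. The sketch below is an illustration only: `collect_pages` and `fake_fetch_page` are hypothetical names, and the fake fetcher stands in for the real API call by returning small made-up pages. It also stops early when a page comes back empty, which is an assumption about how you might want to handle running past the last page.

```python
def collect_pages(fetch_page, first_page, last_page):
    """Hypothetical helper: gather records from first_page to last_page inclusive."""
    records = []
    for page in range(first_page, last_page + 1):
        page_records = fetch_page(page)
        if not page_records:
            break  # an empty page means there is nothing more to collect
        records.extend(page_records)  # add this page's records exactly once
    return records


def fake_fetch_page(page, page_size=3, total=7):
    """Stand-in for the API call: 7 made-up records served 3 per page."""
    start = (page - 1) * page_size
    end = min(start + page_size, total)
    return [{"id": "R" + str(n)} for n in range(start, end)]


all_records = collect_pages(fake_fetch_page, 1, 5)
print(len(all_records))
```

With the real API, `fetch_page` would wrap the `requests.request("GET", ...)` call and return `response.json()["records"]`.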


## 2.3 - Record Details, and the details API endpoint

Now that we have the record(s) for our query, we will interrogate one of them to see what extra information it holds. To do this, we take the ID from our result and use a new API endpoint, `/records/v1/details/{id}`, to get the details of the record.

In [None]:
record_id = result_array[1]["id"]
print(record_id)

In [None]:
details_endpoint = "/records/v1/details/"

Now, we will fetch the details for that particular record ID

In [None]:
full_search_url = base_discovery_url + details_endpoint + record_id

response = requests.request("GET", full_search_url)

print(json.dumps(response.json(), indent=4))

That's a lot of possible data! 

There are some particularly useful fields here: the citable reference, the description, whether the record has been digitised, and the covering dates. These fields will always have data. There are also a vast number of other fields, which will have data depending on the record and its contents. These data, combined with the data from the search endpoint, are more than enough to build up a single page in the Discovery UI.
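Because many fields are only populated for some records, it can help to read them defensively. The sketch below is one possible approach, not part of the API: `pick_fields` is a hypothetical helper, the sample record is made up, and the field names (`title`, `scopeContent`, `recordOpeningDate`, `parentId`) are the ones used later in this notebook.

```python
def pick_fields(record):
    """Hypothetical helper: extract selected fields, tolerating missing ones."""
    # scopeContent may be absent or null, so fall back to an empty dict
    scope = record.get("scopeContent") or {}
    return {
        "title": record.get("title"),
        "description": scope.get("description"),
        "openingDate": record.get("recordOpeningDate"),
        "parentID": record.get("parentId"),
    }


# Made-up record with several fields missing
sample_record = {"id": "X1", "title": "Example title", "scopeContent": None}
picked = pick_fields(sample_record)
print(picked)
```

Missing fields come back as `None` instead of raising a `KeyError`, so a loop over many records does not stop at the first incomplete one.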

## 2.4 - Getting the details we want

With data available from both the search and details endpoints, we could easily gather a very large dataset that becomes unwieldy. It is often better to think first about which data you will actually use. So, we will create a new dictionary of records, keyed by record ID, holding only selected information: title, description, record opening date, and parent ID (all taken from the details endpoint).

In [None]:
new_result_array = {}

Then we make a request to the details endpoint for each record ID and store the results in the new dictionary. Note that we only fetch the first 10 records, to keep the notebook's running time short and to avoid hitting the API rate limit.

In [None]:
for i in result_array[:10]:
  record_id = i["id"]
  # details_endpoint was defined in section 2.3
  full_search_url = base_discovery_url + details_endpoint + record_id
  response = requests.request("GET", full_search_url)
  details = response.json()  # parse the JSON body once and reuse it
  new_result_array[details["id"]] = {
    "title": details["title"],
    "description": details["scopeContent"]["description"],
    "openingDate": details["recordOpeningDate"],
    "parentID": details["parentId"]
  }

Now we display the entry in the new dictionary that corresponds to the ID of the second record in the old list (index 1), followed by its description.

In [None]:
print(new_result_array[result_array[1]["id"]])
print(new_result_array[result_array[1]["id"]]["description"])