# Drawing graphs from Discovery data

## Introduction

This case study, a short series of notebooks, looks at using data from Discovery to draw meaningful graphs. The instructions work through a single example in detail, so that you can then apply the same techniques to other data sets drawn from the large range of variables available in Discovery. This is a two-step process: this first notebook requests the data from Discovery and uses the pandas library to manipulate it into a form suitable for graphing. The second notebook then uses the matplotlib library to draw the graphs.

## Setting up

As always, we need to start by installing and importing the required libraries.

In [1]:
%pip install -q pandas
%pip install -q requests
# json is part of the Python standard library, so it does not need installing
 
import pandas as pd
import json
import requests

import aditional_data

record_series = aditional_data.admiralty_record_series
ship_list = aditional_data.ships


Using the techniques learned in the main series of notebooks, we are going to use the Discovery API to download a dataset relating to ships' logs: the record series suggested in [this]() research guide from The National Archives. This cell takes a minute or two to run because it makes a lot of requests to the API.

In [12]:
ship_data = []

base_discovery_url = "https://discovery.nationalarchives.gov.uk/API/search/records?"
headers = {'Accept': 'application/json'}

for ship in ship_list:
    # Build the query: every record series filter first, then the ship name as the search term
    url = base_discovery_url
    for series in record_series:
        url += series
        url += "&"
    url += "sps.searchQuery=" + ship
    response = requests.get(url, headers=headers)
    response_json = response.json()
    # Print the raw response for inspection; comment this out to keep the output short
    print(response_json)
    # Skip ships with no matching records, and keep only the fields needed for graphing
    if response_json["records"] != []:
        found_data = []
        for record in response_json["records"]:
            found_data.append(
                {
                    "id": record["id"],
                    "title": record["title"],
                    "startDate": record["startDate"],
                    "endDate": record["endDate"]
                }
            )
        ship_data.append(
            {
                "ship": ship,
                "data": found_data
            }
        )

#print(json.dumps(ship_data, indent=4))

{'records': [{'altName': '', 'places': [], 'corpBodies': [], 'taxonomies': ['C10065', 'C10072'], 'formerReferenceDep': '', 'formerReferencePro': '', 'heldBy': ['The National Archives, Kew'], 'context': "Admiralty, and Ministry of Defence, Navy Department: Ships' Logs.", 'content': '', 'urlParameters': None, 'department': 'ADM', 'note': '', 'adminHistory': '', 'arrangement': '', 'mapDesignation': '', 'mapScale': '', 'physicalCondition': '', 'catalogueLevel': 6, 'openingDate': '', 'closureStatus': 'O', 'closureType': 'N', 'closureCode': '30', 'documentType': None, 'coveringDates': '1918 Sept. 1 - 1918 Oct. 31', 'description': 'ACASTA.', 'endDate': '31/10/1918', 'numEndDate': 19181031, 'numStartDate': 19180901, 'startDate': '01/09/1918', 'id': 'C1496956', 'reference': 'ADM 53/32606', 'score': 199.3816, 'source': '100', 'title': 'ACASTA'}, {'altName': '', 'places': [], 'corpBodies': [], 'taxonomies': ['C10065', 'C10072'], 'formerReferenceDep': '', 'formerReferencePro': '', 'heldBy': ['The 
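
The cell above builds each URL by joining strings, which works as long as the ship names and series filters contain no characters that need escaping. As a minimal sketch of a more defensive approach, the query string can be percent-encoded with the standard library. The function name `build_search_url`, the ship name "Ark Royal", and the use of a repeated `sps.recordSeries` parameter are assumptions for illustration; the entries in `record_series` may already be formatted differently.

```python
from urllib.parse import urlencode

BASE_URL = "https://discovery.nationalarchives.gov.uk/API/search/records?"


def build_search_url(series_codes, ship_name):
    """Return a search URL with every parameter value percent-encoded."""
    # Assumption: each record series is sent as its own sps.recordSeries parameter.
    params = [("sps.recordSeries", code) for code in series_codes]
    # The ship name is the free-text search term, as in the cell above.
    params.append(("sps.searchQuery", ship_name))
    return BASE_URL + urlencode(params)


# Hypothetical ship name containing a space, to show the encoding.
print(build_search_url(["ADM 53"], "Ark Royal"))
```

Spaces in the series code and the ship name are encoded as `+`, so the request still reaches the API as a single well-formed query.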


We now have a large nested data structure (a list of dictionaries, in the same shape as JSON) containing all the data we want. To make it easier to work with when drawing graphs, we are going to convert it to a dataframe using the pandas library. A dataframe lets us make modifications such as specifying the format of a column (for example, a date column) or filtering to include only the rows that match certain criteria.
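
As a small illustration of those two modifications, the sketch below parses date strings into real dates and then filters on them. It uses three records taken from the Discovery results in this notebook; the variable names and the cut-off year of 1915 are chosen only for the example.

```python
import pandas as pd

# Three ship's log records as returned by Discovery (dates are day/month/year strings).
sample = pd.DataFrame(
    [
        {"ship": "Acasta", "id": "C1496956", "startDate": "01/09/1918", "endDate": "31/10/1918"},
        {"ship": "Acasta", "id": "C1496944", "startDate": "08/02/1915", "endDate": "09/04/1915"},
        {"ship": "Acasta", "id": "C1480967", "startDate": "01/08/1913", "endDate": "30/09/1913"},
    ]
)

# Specify the column format: convert the strings to datetime values.
for column in ["startDate", "endDate"]:
    sample[column] = pd.to_datetime(sample[column], format="%d/%m/%Y")

# Filter: keep only the logs that start in 1915 or later.
from_1915 = sample[sample["startDate"].dt.year >= 1915]
print(from_1915)
```

Giving `format="%d/%m/%Y"` explicitly avoids any ambiguity between day-first and month-first dates.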

The first step is to flatten the JSON.

In [9]:
ship_data_flat = []

for ship in ship_data:
    for record in ship["data"]:
        ship_data_flat.append(
            {
                "ship": ship["ship"],
                "id": record["id"],
                "title": record["title"],
                "startDate": record["startDate"],
                "endDate": record["endDate"]
            }
        )

print(json.dumps(ship_data_flat, indent=4))

[
    {
        "ship": "Acasta",
        "id": "C1496956",
        "title": "ACASTA",
        "startDate": "01/09/1918",
        "endDate": "31/10/1918"
    },
    {
        "ship": "Acasta",
        "id": "C1496944",
        "title": "ACASTA",
        "startDate": "08/02/1915",
        "endDate": "09/04/1915"
    },
    {
        "ship": "Acasta",
        "id": "C1496952",
        "title": "ACASTA",
        "startDate": "01/11/1917",
        "endDate": "02/02/1918"
    },
    {
        "ship": "Acasta",
        "id": "C1496951",
        "title": "ACASTA",
        "startDate": "01/05/1917",
        "endDate": "01/07/1917"
    },
    {
        "ship": "Acasta",
        "id": "C1480967",
        "title": "ACASTA",
        "startDate": "01/08/1913",
        "endDate": "30/09/1913"
    },
    {
        "ship": "Acasta",
        "id": "C1496964",
        "title": "ACASTA",
        "startDate": "01/01/1920",
        "endDate": "29/02/1920"
    },
    {
        "ship": "Acasta",
        "id"
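
The loop above flattens the data by hand. As an alternative sketch, pandas can do the same flattening with `pd.json_normalize`, which expands the nested `data` list and copies the parent `ship` value onto every row. The variable `ship_data_example` is a cut-down stand-in for `ship_data`, and this is not the approach the notebook itself uses.

```python
import pandas as pd

# A cut-down stand-in for ship_data, with the same nested shape.
ship_data_example = [
    {
        "ship": "Acasta",
        "data": [
            {"id": "C1496956", "title": "ACASTA", "startDate": "01/09/1918", "endDate": "31/10/1918"},
            {"id": "C1496944", "title": "ACASTA", "startDate": "08/02/1915", "endDate": "09/04/1915"},
        ],
    }
]

# record_path names the nested list to expand into rows;
# meta names the parent fields to repeat on each of those rows.
flat = pd.json_normalize(ship_data_example, record_path="data", meta=["ship"])
print(flat)
```

This produces a dataframe directly, with the `ship` column placed last rather than first.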

With the data in a flat format, we can easily convert it to a dataframe. When we print it, we can see that it looks a lot like a spreadsheet. Because dataframes are similar to spreadsheets, we can save them as CSV files, which can then be opened in Excel or other spreadsheet software, or loaded into a different Python script by re-opening them with pandas. Here, we are going to save the dataframe so that it can be opened in the next notebook, which focuses on drawing graphs.

In [11]:
ship_data_frame = pd.DataFrame(ship_data_flat)

print(ship_data_frame)

ship_data_frame.to_csv("ship_data.csv")

        ship        id   title   startDate     endDate
0     Acasta  C1496956  ACASTA  01/09/1918  31/10/1918
1     Acasta  C1496944  ACASTA  08/02/1915  09/04/1915
2     Acasta  C1496952  ACASTA  01/11/1917  02/02/1918
3     Acasta  C1496951  ACASTA  01/05/1917  01/07/1917
4     Acasta  C1480967  ACASTA  01/08/1913  30/09/1913
...      ...       ...     ...         ...         ...
1893   Wolfe  C1589635   WOLFE  01/03/1948  31/03/1948
1894   Wolfe  C1589634   WOLFE  01/02/1948  28/02/1948
1895   Wolfe  C1588748   WOLFE  01/06/1947  30/06/1947
1896   Wolfe  C1587964   WOLFE  31/08/1946  30/09/1946
1897   Wolfe  C1587962   WOLFE  01/07/1946  31/07/1946

[1898 rows x 5 columns]
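
To show what re-opening the file with pandas might look like, here is a minimal sketch of the round trip. It writes a two-row sample to a temporary directory rather than to the real `ship_data.csv`, so the path and variable names are illustrative only.

```python
import os
import tempfile

import pandas as pd

frame = pd.DataFrame(
    [
        {"ship": "Acasta", "id": "C1496956", "title": "ACASTA", "startDate": "01/09/1918", "endDate": "31/10/1918"},
        {"ship": "Wolfe", "id": "C1589635", "title": "WOLFE", "startDate": "01/03/1948", "endDate": "31/03/1948"},
    ]
)

# Save to a temporary location so the example does not overwrite any real file.
path = os.path.join(tempfile.mkdtemp(), "ship_data.csv")
frame.to_csv(path)  # the row index is written as an unnamed first column

# index_col=0 tells pandas to treat that first column as the index again.
reloaded = pd.read_csv(path, index_col=0)
print(reloaded)
```

Without `index_col=0`, the saved index would come back as an extra column called `Unnamed: 0`. The dates are still plain strings at this point and would need converting before being used on a graph axis.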
