<h1>GHAP Place Search API Demo (v1.0)</h1>

<h2>Introduction</h2>

This notebook offers a brief demonstration of using Python to interact with the GHAP web API to search for places and obtain their details in JSON format. The current version covers fuzzy search and bounding box search. Although the API returns responses in JSON, this guide includes the code needed to compile the results into a dataframe for convenient viewing and to export them to CSV files.

We begin by importing the dependencies required to run the code in this notebook. See the code below:

In [1]:
import requests # used for sending requests to the API and receiving its responses
import pandas as pd # used to store the responses as dataframes and to export them as CSV
from pandas import json_normalize # A utility for converting nested JSON into dataframe columns

<h2>Fuzzy Search</h2>

Fuzzy search looks for text that closely matches a search phrase without the condition of an exact match. It is particularly useful when searching a database for place names that have slight variations in spelling, have diminutive nicknames, are often combined with descriptors, or are stored as part of a title.

Fuzzy search for place names can be conducted on the GHAP API by using the 'fuzzyname' keyword in the query string. The implementation of fuzzy search in GHAP first checks whether an entry contains the provided input, and then checks whether that entry contains words that are similar to the provided input.
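To make the two-stage idea concrete, here is a minimal sketch of a "contains, then similar" match in plain Python. It is not GHAP's implementation: the function name `fuzzy_match`, the 0.8 similarity threshold, and the use of `difflib.SequenceMatcher` are all assumptions chosen for illustration, and the server-side matching may well behave differently.

```python
from difflib import SequenceMatcher

def fuzzy_match(query, entry, threshold=0.8):
    """Return True if `entry` contains `query`, or contains a word similar to it.

    Hypothetical illustration only; GHAP's server-side matching may differ.
    """
    query = query.lower()
    entry = entry.lower()

    # Stage 1: does the entry contain the search phrase as-is?
    if query in entry:
        return True

    # Stage 2: is any single word in the entry similar to the search phrase?
    for word in entry.split():
        if SequenceMatcher(None, query, word).ratio() >= threshold:
            return True
    return False

names = ["The Newcastle Chronicle", "Newcastel", "Sydney"]
matches = [name for name in names if fuzzy_match("newcastle", name)]
print(matches)
```

In this sketch the first name matches at stage 1 (it contains the phrase), the misspelt second name matches at stage 2, and the third matches neither.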

To run a demo fuzzy search on place names in GHAP, follow the demo code below.

<h3>Enter Parameters</h3>

The main parameter here is the search term (i.e., 'fuzzyname'), which is typically the name of a place that has corresponding geographical coordinates. The GHAP web API also allows users to filter their search by state by adding the 'state' keyword to the query.

In the block of code below, the search term is set to "newcastle" and the state is set to "NSW". Please feel free to change these variables to create your own fuzzy search query.

In [2]:
# Enter the term or phrase you would like to search
search_term = "newcastle"

# Add a state to filter results, or leave as an empty string
state = "NSW"

<h3>Run Search</h3>

The next block of code constructs the search query based on the 'search_term' and 'state' variables that were defined in the previous step. It then sends the query to the GHAP web API, and stores the server's response in a dataframe after converting it from JSON.

There is no need to change the code below - you just need to run it!

In [3]:
# Compile the query into a url
url = "https://ghap.tlcmap.org/search?format=json&fuzzyname=" + str(search_term) + "&state=" + str(state)

# Send the url as a request to the GHAP API
response = requests.get(url)

# Check to see whether the GHAP server successfully received and processed the request. If yes, the JSON response is converted
# to a dataframe and stored in memory. If not, then the appropriate error code is printed.
if response.status_code == 200:
    df = pd.DataFrame(json_normalize(response.json()["features"]))
    print("Data loaded successfully. " + str(len(df)) + " results found for the following query: " + url)
else:
    print("Failed to retrieve data. Status code:", response.status_code)

Data loaded successfully. 131 results found for the following query: https://ghap.tlcmap.org/search?format=json&fuzzyname=newcastle&state=NSW
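String concatenation works for simple terms such as "newcastle", but a search term containing spaces or special characters should be URL-encoded. The sketch below shows one way to do that with the standard library. The helper name `build_search_url` is hypothetical, and dropping empty parameters assumes the API treats an omitted 'state' the same as an empty one, which is not confirmed here.

```python
from urllib.parse import urlencode

BASE_URL = "https://ghap.tlcmap.org/search"

def build_search_url(**params):
    """Build a GHAP search URL, skipping parameters that are empty.

    Hypothetical helper; assumes omitted and empty parameters are equivalent.
    """
    query = {"format": "json"}
    for key, value in params.items():
        if value not in ("", None):
            query[key] = value
    # urlencode escapes spaces and other special characters in each value
    return BASE_URL + "?" + urlencode(query)

print(build_search_url(fuzzyname="newcastle", state="NSW"))
print(build_search_url(fuzzyname="port macquarie", state=""))
```

The first call reproduces the query used above; the second shows a space being encoded and the empty state filter being left out.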


<h3>Show Results</h3>

The only thing left to do now is display the results on screen by calling the dataframe variable (i.e., 'df'). The block of code below does this after modifying the dataframe's display settings via Pandas so that neither the columns nor the rows are truncated.

In [4]:
# Use Pandas to modify the dataframe's display to prevent row truncation
pd.set_option('display.max_rows', None)

# Use Pandas to modify the dataframe's display to prevent column truncation
pd.set_option('display.max_columns', None)

# Display the dataframe and hence the results of the search
df

Unnamed: 0,type,geometry.type,geometry.coordinates,properties.name,properties.placename,properties.id,properties.state,properties.datestart,properties.dateend,properties.udatestart,properties.udateend,properties.latitude,properties.longitude,properties.linkback,properties.TLCMapLinkBack,properties.TLCMapDataset,properties.title_id,properties.place_id,display.popup.links,properties.description,properties.original_id,properties.street,properties.suburb,properties.postcode,properties.web_links,properties.feature_term,properties.category,properties.group,properties.authority,properties.supply_date,properties.parish,properties.lga,properties.source,properties.original_data_source
0,Feature,Point,"[151.7846, -32.9318]",The Voice of the North (NSW : 1918 - 1933),Newcastle,t38ac,NSW,1918,1918,-1641031000000.0,-1641031000000.0,-32.9318,151.7846,https://trove.nla.gov.au/newspaper/title/457,https://ghap.tlcmap.org/search?id=t38ac,https://ghap.tlcmap.org/publicdatasets/140,457.0,NSW42141,"[{'text': 'TLCMap Record: t38ac', 'link': 'htt...",,,,,,,,,,,,,,,
1,Feature,Point,"[151.75, -32.916667]",Newcastle,Newcastle,tf17b,NSW,1825-05-11,2017-03-26,-4564548000000.0,-4564548000000.0,-32.916667,151.75,https://www.ausstage.edu.au/pages/venue/10582,https://ghap.tlcmap.org/search?id=tf17b,https://ghap.tlcmap.org/publicdatasets/379,,,"[{'text': 'TLCMap Record: tf17b', 'link': 'htt...",A venue from The Australian Live Performance D...,10582.0,,Newcastle,2300.0,,,,,,,,,,
2,Feature,Point,"[151.7846, -32.9318]",The Newcastle Chronicle and Hunter River Distr...,Newcastle,t380a,NSW,1859,1859,-3502865000000.0,-3502865000000.0,-32.9318,151.7846,https://trove.nla.gov.au/newspaper/title/353,https://ghap.tlcmap.org/search?id=t380a,https://ghap.tlcmap.org/publicdatasets/140,353.0,NSW42141,"[{'text': 'TLCMap Record: t380a', 'link': 'htt...",,,,,,,,,,,,,,,
3,Feature,Point,"[151.7846, -32.9318]",The Newcastle Chronicle (NSW : 1866 - 1876),Newcastle,t3809,NSW,1866,1866,-3281940000000.0,-3281940000000.0,-32.9318,151.7846,https://trove.nla.gov.au/newspaper/title/354,https://ghap.tlcmap.org/search?id=t3809,https://ghap.tlcmap.org/publicdatasets/140,354.0,NSW42141,"[{'text': 'TLCMap Record: t3809', 'link': 'htt...",,,,,,,,,,,,,,,
4,Feature,Point,"[151.7846, -32.9318]",The Newcastle Argus and District Advertiser (N...,Newcastle,t3808,NSW,1916,1916,-1704190000000.0,-1704190000000.0,-32.9318,151.7846,https://trove.nla.gov.au/newspaper/title/513,https://ghap.tlcmap.org/search?id=t3808,https://ghap.tlcmap.org/publicdatasets/140,513.0,NSW42141,"[{'text': 'TLCMap Record: t3808', 'link': 'htt...",,,,,,,,,,,,,,,
5,Feature,Point,"[151.7178036, -32.86508436]",Newcastle,Newcastle,n839cf,NSW,,,,,-32.86508436,151.7178036,,https://ghap.tlcmap.org/search?id=n839cf,https://ghap.tlcmap.org,,,"[{'text': 'TLCMap Record: n839cf', 'link': 'ht...",,,,,,,parish,administrative area,administration,NSW,2021-07-13,,,,
6,Feature,Point,"[151.7844691, -32.9317512]",Newcastle,Newcastle,n839ce,NSW,,,,,-32.9317512,151.7844691,,https://ghap.tlcmap.org/search?id=n839ce,https://ghap.tlcmap.org,,,"[{'text': 'TLCMap Record: n839ce', 'link': 'ht...",,,,,,,locality,administrative area,administration,NSW,2021-07-13,,,,
7,Feature,Point,"[149.5344651, -28.76510084]",Newcastle,Newcastle,n839cd,NSW,,,,,-28.76510084,149.5344651,,https://ghap.tlcmap.org/search?id=n839cd,https://ghap.tlcmap.org,,,"[{'text': 'TLCMap Record: n839cd', 'link': 'ht...",,,,,,,parish,administrative area,administration,NSW,2021-07-13,,,,
8,Feature,Point,"[151.7846, -32.9318]",Miners' Advocate and Northumberland Recorder (...,Newcastle,t357e,NSW,1873,1873,-3061015000000.0,-3061015000000.0,-32.9318,151.7846,https://trove.nla.gov.au/newspaper/title/355,https://ghap.tlcmap.org/search?id=t357e,https://ghap.tlcmap.org/publicdatasets/140,355.0,NSW42141,"[{'text': 'TLCMap Record: t357e', 'link': 'htt...",,,,,,,,,,,,,,,
9,Feature,Point,"[151.78444444444443, -32.931666666666665]",Newcastle,Newcastle,acc43,NSW,,,,,-32.931666666666665,151.78444444444443,,https://ghap.tlcmap.org/search?id=acc43,https://ghap.tlcmap.org,,,"[{'text': 'TLCMap Record: acc43', 'link': 'htt...",A city about 2 km SE by S of Carrington and ab...,,,,,,suburb,,,,,NEWCASTLE,NEWCASTLE,State Records (TLCM)\n1. Cessnock 1826 - 1954;...,State Records (TLCM)\n1. Cessnock 1826 - 1954;...
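The dotted column names in these results (for example 'geometry.type' and 'properties.name') come from flattening each nested JSON feature into a single row. The sketch below illustrates the idea in plain Python. It is a simplified stand-in, not the code Pandas uses internally, and the `flatten` function and the trimmed-down sample feature are written here purely for illustration.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dictionaries into a single level with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the key prefix along
            flat.update(flatten(value, full_key, sep))
        else:
            # Lists (such as coordinates) are kept whole in a single cell
            flat[full_key] = value
    return flat

# A hand-written feature, trimmed down from the shape of the results above
feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [151.75, -32.916667]},
    "properties": {"name": "Newcastle", "state": "NSW"},
}

row = flatten(feature)
print(list(row.keys()))
```

Features that lack a given property simply have no value for that column, which is why many cells in the results are empty.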


<h3>Save Results</h3>

Finally, if required, the dataframe can be saved as a CSV file so that a copy of the search results is stored in the current directory.

In [5]:
df.to_csv("ghap_fuzzy_output.csv")

<h2>Bounding Box Search</h2>

Bounding box search works by passing two pairs of latitude and longitude coordinates that define opposite corners of a box; every place located inside the box is returned. It is achieved by using the 'bbox' keyword in the query string.

Bounding box search is particularly useful when you wish to find all places within an area that has no predefined boundary, for example when the area of interest spans two states, which makes the 'state' keyword unsuitable.
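As a rough illustration of what "inside the box" means, the sketch below tests whether a single point falls within a box given by its minimum and maximum latitude and longitude. The function name `in_bbox` is hypothetical, edges are treated as inside by assumption, and boxes that cross the 180° meridian are not handled. How the GHAP server performs this test is not described here.

```python
def in_bbox(lat, lon, lat_min, lon_min, lat_max, lon_max):
    """Return True if the point (lat, lon) lies inside the box, edges included.

    Simplified sketch: does not handle boxes that cross the 180 degree meridian.
    """
    return lat_min <= lat <= lat_max and lon_min <= lon <= lon_max

# A box spanning latitudes -34 to -33 and longitudes 143 to 144
box = (-34, 143, -33, 144)

print(in_bbox(-33.43, 143.13, *box))  # a point inside the box
print(in_bbox(-32.93, 151.78, *box))  # a point well to the east of the box
```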

To run a demo bounding box search on places in GHAP, follow the demo code below.

<h3>Enter Parameters</h3>

The code below defines the bounding box we will use in our search as the (lat, lon) coordinates (-34, 143) and (-33, 144).
Feel free to change these coordinates to define your own bounding box.

In [6]:
# Enter the start and end latitudes of the bounding box
latitude_start = "-34"
latitude_end = "-33"

# Enter the start and end longitudes of the bounding box
longitude_start = "143"
longitude_end = "144"
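If you change these values, it is easy to swap a start and end value or to mix up latitude and longitude. The sketch below shows one way to guard against that: it takes two (lat, lon) corners in any order, checks their ranges, and formats them as "lon_min,lat_min,lon_max,lat_max". The helper name `make_bbox_string` is hypothetical, and the assumption that the API expects the smaller values first is based only on the query used in this demo.

```python
def make_bbox_string(corner_a, corner_b):
    """Format two (lat, lon) corners as 'lon_min,lat_min,lon_max,lat_max'.

    Hypothetical helper; the min-before-max ordering is an assumption.
    """
    (lat_a, lon_a), (lat_b, lon_b) = corner_a, corner_b

    # Reject values that cannot be valid latitudes or longitudes
    for lat in (lat_a, lat_b):
        if not -90 <= lat <= 90:
            raise ValueError(f"Latitude out of range: {lat}")
    for lon in (lon_a, lon_b):
        if not -180 <= lon <= 180:
            raise ValueError(f"Longitude out of range: {lon}")

    # Order each axis so the smaller value always comes first
    lat_min, lat_max = sorted((lat_a, lat_b))
    lon_min, lon_max = sorted((lon_a, lon_b))
    return f"{lon_min},{lat_min},{lon_max},{lat_max}"

# The corners may be given in either order
print(make_bbox_string((-33, 144), (-34, 143)))
```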

<h3>Run Search</h3>

The next block of code constructs the search query based on the bounding box variables that were defined in the previous step. As in the fuzzy search example, it then sends the query to the GHAP web API and stores the server's response in a dataframe after converting it from JSON.

As before, there is no need to change the code below - you just need to run it!

In [7]:
# Join the coordinates into a bounding box string (longitude first, then latitude)
bbox_string = longitude_start + "," + latitude_start + "," + longitude_end + "," + latitude_end

# Compile the query into a url and send it as a request to the GHAP API
url = "https://ghap.tlcmap.org/search?format=json&bbox=" + bbox_string
response = requests.get(url)

# Check to see whether the GHAP server successfully received and processed the request. If yes, the JSON response is converted
# to a dataframe and stored in memory. If not, then the appropriate error code is printed.
if response.status_code == 200:
    df = pd.DataFrame(json_normalize(response.json()["features"]))
    print("Data loaded successfully. " + str(len(df)) + " results found for the following query: " + url)
else:
    print("Failed to retrieve data. Status code:", response.status_code)

Data loaded successfully. 278 results found for the following query: https://ghap.tlcmap.org/search?format=json&bbox=143,-34,144,-33


<h3>Show Results</h3>

Just like last time, we display the results by calling the dataframe variable (i.e., 'df'). This time, however, we do not need to modify the dataframe's display settings, since Pandas retains the settings applied earlier for the rest of the session.

In [8]:
df

Unnamed: 0,type,geometry.type,geometry.coordinates,properties.name,properties.placename,properties.id,properties.state,properties.feature_term,properties.latitude,properties.longitude,properties.TLCMapLinkBack,properties.TLCMapDataset,properties.category,properties.group,properties.authority,properties.supply_date,display.popup.links,properties.description,properties.linkback,properties.datestart,properties.dateend,properties.udatestart,properties.udateend,properties.StationID,properties.Years,properties.Percent,properties.AWS,properties.place,properties.language_code,properties.language_name,properties.language_synonym,properties.thesaurus_heading_language,properties.thesaurus_heading_people,properties.approximate_latitude_of_language_variety,properties.approximate_longitude_of_language_variety,properties.uri,properties.State,properties.Newspaper Place of Publication,properties.Newspaper,properties.Article Word Count,properties.Article Link,properties.original_id,properties.street,properties.suburb,properties.postcode,properties.web_links,properties.source,properties.original_data_source,properties.parish,properties.lga
0,Feature,Point,"[143.1346115, -33.43181783]",Garnpung Lake,Garnpung Lake,n88dc4,NSW,lake,-33.43181783,143.1346115,https://ghap.tlcmap.org/search?id=n88dc4,https://ghap.tlcmap.org,waterbody,hydrology,NSW,2021-07-13,"[{'text': 'TLCMap Record: n88dc4', 'link': 'ht...",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,Feature,Point,"[143.7012684, -33.29847835]",Four Mile Well,Four Mile Well,n88e57,NSW,bore,-33.29847835,143.7012684,https://ghap.tlcmap.org/search?id=n88e57,https://ghap.tlcmap.org,water point,hydrology,NSW,2021-07-13,"[{'text': 'TLCMap Record: n88e57', 'link': 'ht...",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,Feature,Point,"[143.265726, -33.90543026]",Greasy Catch Tank,Greasy Catch Tank,n895b6,NSW,water tank,-33.90543026,143.265726,https://ghap.tlcmap.org/search?id=n895b6,https://ghap.tlcmap.org,water point,hydrology,NSW,2021-07-13,"[{'text': 'TLCMap Record: n895b6', 'link': 'ht...",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,Feature,Point,"[143.9679293, -33.16514199]",Gunnaramby Swamp,Gunnaramby Swamp,n89942,NSW,wetland,-33.16514199,143.9679293,https://ghap.tlcmap.org/search?id=n89942,https://ghap.tlcmap.org,waterbody,hydrology,NSW,2021-07-13,"[{'text': 'TLCMap Record: n89942', 'link': 'ht...",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
4,Feature,Point,"[143.734608, -33.83181408]",Hatfield,Hatfield,n89966,NSW,settlement,-33.83181408,143.734608,https://ghap.tlcmap.org/search?id=n89966,https://ghap.tlcmap.org,populated place,administration,NSW,2021-07-13,"[{'text': 'TLCMap Record: n89966', 'link': 'ht...",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
5,Feature,Point,"[143.0089284220361, -33.36590266631033]",Mulurula,,tb8690,,,-33.36590266631033,143.0089284220361,https://ghap.tlcmap.org/search?id=tb8690,https://ghap.tlcmap.org/publicdatasets/549,,,,,"[{'text': 'TLCMap Record: tb8690', 'link': 'ht...",8P Mulurula E Pooncarie -\t\nRef: Stretch Ledg...,https://livinghistories.newcastle.edu.au/nodes...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
6,Feature,Point,"[143.0171425183895, -33.73275305827356]",Mungo,,tb86aa,,,-33.73275305827356,143.0171425183895,https://ghap.tlcmap.org/search?id=tb86aa,https://ghap.tlcmap.org/publicdatasets/549,,,,,"[{'text': 'TLCMap Record: tb86aa', 'link': 'ht...",6Q Mungo E Pooncarie - Aboriginal meaning: a m...,https://livinghistories.newcastle.edu.au/nodes...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
7,Feature,Point,"[143.3426064051232, -33.08741813498178]",Pan Ban,,tb8849,,,-33.08741813498178,143.3426064051232,https://ghap.tlcmap.org/search?id=tb8849,https://ghap.tlcmap.org/publicdatasets/549,,,,,"[{'text': 'TLCMap Record: tb8849', 'link': 'ht...",7P Pan Ban S E Menindie - Aboriginal meaning: ...,https://livinghistories.newcastle.edu.au/nodes...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
8,Feature,Point,"[143.9375, -33.4021]",HATFIELD (CLARE),,t17828,,,-33.4021,143.9375,https://ghap.tlcmap.org/search?id=t17828,https://ghap.tlcmap.org/publicdatasets/461,,,,,"[{'text': 'TLCMap Record: t17828', 'link': 'ht...",,http://www.bom.gov.au/jsp/ncc/cdio/weatherData...,1873-01-01,2021-01-01,-3061015000000.0,-3061015000000.0,49008.0,148.1,100.0,N,,,,,,,,,,,,,,,,,,,,,,,
9,Feature,Point,"[143.9679878541338, -33.505177475732]",Clare,,tb7e35,,,-33.505177475732,143.9679878541338,https://ghap.tlcmap.org/search?id=tb7e35,https://ghap.tlcmap.org/publicdatasets/549,,,,,"[{'text': 'TLCMap Record: tb7e35', 'link': 'ht...",13Q Clare W Hillston -,https://livinghistories.newcastle.edu.au/nodes...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


<h3>Save Results</h3>

Finally, if required, the dataframe can be saved as a CSV file so that a copy of the search results is stored in the current directory.

In [9]:
df.to_csv("ghap_bbox_output.csv")

<h2>Demo RO-Crate Output</h2>

[RO-Crate](https://www.researchobject.org/ro-crate/) (Research Object Crate) is a lightweight, community-driven approach to packaging research data along with its associated metadata in a machine-readable and interpretable format. It's designed to improve the reusability and reproducibility of digital research data by encapsulating both the data and its descriptive context in a structured, standardised way. RO-Crate uses JSON-LD (JavaScript Object Notation for Linked Data) to describe the data (such as datasets, code, and articles) and the relationships between them, making it easier to share and understand the components and origins of research projects. 

The following block of code provides simple RO-Crate output for the results we saved for each of our demo API searches ('ghap_fuzzy_output.csv' and 'ghap_bbox_output.csv'). Please feel free to adapt this code to fit your custom queries.

For more information, and to learn how to make more complex RO-Crates, please visit [this link](https://training.galaxyproject.org/topics/fair/tutorials/ro-crate-in-python/tutorial.html).

In [10]:
import json
import os

# Define the names of your CSV files
fuzzy_search_results_file_name = 'ghap_fuzzy_output.csv'
bbox_search_results_file_name = 'ghap_bbox_output.csv'

# Define the RO-Crate metadata
metadata = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {
            "@id": "search_results1",
            "@type": "Dataset",
            "name": "Results for 'GHAP Place Search API Demo (v1.0)' Fuzzy Search",
            "description": "This dataset contains the results of a fuzzy name search on the GHAP web API. The search was conducted via an API call using a Jupyter Notebook. The Jupyter Notebook serves as a demo for making API calls to GHAP and was commissioned by ARDC.",
            "license": "https://creativecommons.org/licenses/by/4.0/"
        },
        {
            "@id": "ghap_fuzzy_output.csv",
            "@type": "File",
            "name": "Results for 'GHAP Place Search API Demo (v1.0)' Fuzzy Search",
            "description": "CSV file containing results from a fuzzy name search on the GHAP web API, which was conducted via a Jupyter Notebook as part of a demo of the GHAP place search API. Each row represents a geographical feature and contains a series of columns, including its coordinates.",
            "encodingFormat": "text/csv",
        },
        {
            "@id": "search_results2",
            "@type": "Dataset",
            "name": "Results for 'GHAP Place Search API Demo (v1.0)' Bounding Box Search",
            "description": "This dataset contains the results of a bounding box search on the GHAP web API. The search was conducted via an API call using a Jupyter Notebook. The Jupyter Notebook serves as a demo for making API calls to GHAP and was commissioned by ARDC.",
            "license": "https://creativecommons.org/licenses/by/4.0/"
        },
        {
            "@id": "ghap_bbox_output.csv",
            "@type": "File",
            "name": "Results for 'GHAP Place Search API Demo (v1.0)' Bounding Box Search",
            "description": "CSV file containing results from a bounding box search on the GHAP web API, which was conducted via a Jupyter Notebook as part of a demo of the GHAP place search API. Each row represents a geographical feature and contains a series of columns, including its coordinates.",
            "encodingFormat": "text/csv",
        },
        {
            "@id": "#author1",
            "@type": "Person",
            "name": "Kaine Usher",
            "affiliation": "GeoJikuu x Australian Research Data Commons (ARDC)",
        },
        {
            "@id": "./",
            "@type": "CreativeWork",
            "description": "This RO-Crate includes the results of two demo calls to the GHAP place search API. The first set of results is from a fuzzy name search and the second set of results is from a bounding box search.",
        }
    ]
}

# Check if 'ro-crate-metadata.jsonld' already exists in the current directory
if not os.path.exists('ro-crate-metadata.jsonld'):
    # If no, create it
    metadata_file_path = os.path.join(os.getcwd(), 'ro-crate-metadata.jsonld')
    with open(metadata_file_path, 'w') as metadata_file:
        json.dump(metadata, metadata_file, indent=4)
else:
    # If yes, append any new information to the existing file
    with open('ro-crate-metadata.jsonld', 'r') as file:
        existing_metadata = json.load(file)
    existing_ids = {item['@id'] for item in existing_metadata['@graph'] if '@id' in item}
    final_append = None
    for data in metadata["@graph"]:
        if isinstance(data, dict) and '@id' in data:
            if data['@id'] == './':
                # Hold the root entity aside so it can replace the existing one
                final_append = data
                continue
            if data['@id'] not in existing_ids:
                existing_metadata['@graph'].append(data)

    # Replace any existing root entity ('./') with the new one. Building a new
    # list avoids deleting items from a list while iterating over it.
    if final_append is not None:
        existing_metadata['@graph'] = [
            item for item in existing_metadata['@graph']
            if not (isinstance(item, dict) and item.get('@id') == './')
        ]
        existing_metadata['@graph'].append(final_append)
    
    with open('ro-crate-metadata.jsonld', 'w') as file:
        json.dump(existing_metadata, file, indent=4)

print(f"RO-Crate metadata file saved to: {'ro-crate-metadata.jsonld'}")

RO-Crate metadata file saved to: ro-crate-metadata.jsonld
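Because the merge step above decides what to append by comparing '@id' values, each entity in the '@graph' list needs its own distinct identifier. The sketch below is a small sanity check you could run on a graph before writing it out. The helper name `duplicate_ids` and the sample graph are made up for illustration and are not part of any RO-Crate library.

```python
from collections import Counter

def duplicate_ids(graph):
    """Return the sorted '@id' values that occur more than once in a '@graph' list."""
    counts = Counter(
        item["@id"] for item in graph if isinstance(item, dict) and "@id" in item
    )
    return sorted(entity_id for entity_id, n in counts.items() if n > 1)

# A made-up graph in which one file entity has been listed twice
sample_graph = [
    {"@id": "./", "@type": "Dataset"},
    {"@id": "a.csv", "@type": "File"},
    {"@id": "a.csv", "@type": "File"},
    {"@id": "b.csv", "@type": "File"},
]

print(duplicate_ids(sample_graph))
```

An empty list means every identifier is unique; anything else points to entities that would shadow one another.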
