# Knowledge Exploration with SPARQL Queries

This notebook demonstrates how to explore, navigate, and analyze the *Gazetteers of Scotland Knowledge Graph* (1803–1901) using SPARQL queries over a remote Fuseki endpoint. The data is modeled using the [Heritage Textual Ontology (HTO)](https://w3id.org/hto), and includes semantically enriched descriptions of places, texts, volumes, and their provenance.


We have divided the questions into two main blocks.

a) The first block of queries aims to familiarise the user with the knowledge graph.

b) The second block of queries addresses the key questions that we wanted to answer when we designed the HTO ontology.



## Setup

Make sure **SPARQLWrapper** is installed in your Python environment.

In [9]:
!pip install SPARQLWrapper



## Connection

Choose one of the two connection options. The remote one is recommended.

### Remote Fuseki Connection


We connect to the remote SPARQL server hosting the Gazetteers knowledge graph. The data is served via a [Fuseki SPARQL endpoint](http://query.frances-ai.com/hto_gazetteers), and includes RDF resources describing volumes, series, articles, locations, pages, and provenance information.

In [1]:
from SPARQLWrapper import SPARQLWrapper, JSON
sparql = SPARQLWrapper(
    "http://query.frances-ai.com/hto_gaz"
)
sparql.setReturnFormat(JSON)

### Local Fuseki Connection

Use this option if you want to query a locally hosted version of the Gazetteer RDF graph (e.g., using `gaz.ttl`). The cell below loads the file into an in-memory RDFLib graph.

In [10]:
from rdflib import Graph, URIRef, Namespace
from rdflib.plugins.sparql import prepareQuery

# Create a new RDFLib Graph
basic_graph = Graph()

# Load the rdf file into the graph
basic_graph_file = "./gaz.ttl"
basic_graph.parse(basic_graph_file, format="turtle")


hto = Namespace("https://w3id.org/hto#")
# Print the number of "triples" in the Graph
print(f"Basic Graph has {len(basic_graph)} statements.")
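
The cell above loads the graph but does not query it. The following is a minimal sketch of how a SELECT query could be run against the in-memory graph. It assumes `basic_graph` is the RDFLib `Graph` loaded above; the helper name `local_select` is hypothetical. It relies on `Graph.query()` returning a result object whose `vars` attribute names the selected variables and which yields one tuple-like row per solution.

```python
def local_select(graph, query):
    """Run a SPARQL SELECT on an RDFLib-style graph and return rows as dicts.

    `graph` is expected to behave like rdflib.Graph: `graph.query(query)`
    returns an iterable of rows plus a `vars` list naming the variables.
    """
    result = graph.query(query)
    names = [str(v) for v in result.vars]
    rows = []
    for row in result:
        # Unbound variables (e.g. from OPTIONAL) come back as None.
        rows.append({n: (None if v is None else str(v)) for n, v in zip(names, row)})
    return rows
```

With the local graph loaded, a call such as `local_select(basic_graph, "SELECT DISTINCT ?type WHERE { ?s a ?type } LIMIT 5")` should return a list of dictionaries keyed by variable name.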

## Block A - Basic Queries

The following questions are addressed using 16 targeted SPARQL queries:

1. **What RDF classes are defined in the dataset?**  
   → Query 1 lists all `rdf:type` instances used from the HTO namespace.

2. **What properties are used across the knowledge graph?**  
   → Query 2 enumerates all `hto:` properties in use.

3. **What Gazetteer series are present?**  
   → Query 3 retrieves all `hto:Series` instances and their titles.

4. **Who are the editors associated with volumes or series?**  
   → Query 4 explores the `schema:editor` property and retrieves the linked `schema:Person` entities and their names via `foaf:name`.

5. **What volumes are included, and how are they organized into series and collections?**  
   → Query 5 lists all `hto:Volume` entities with their series and parent `hto:WorkCollection`.

6. **What metadata properties describe a given volume?**  
   → Query 6 selects a sample volume and lists all associated RDF triples.

7. **How are `hto:OriginalDescription` entries structured?**  
   → Query 7 lists all properties used to describe article-level entries.

8. **What is the text and source of each article?**  
   → Query 8 retrieves full text and source documents for descriptions.

9. **What are all RDF triples for a specific article?**  
   → Query 9 drills into a selected `hto:OriginalDescription` and inspects its metadata.

10. **How is a `LocationRecord` structured?**  
    → Query 10 retrieves a sample `hto:LocationRecord` and lists all associated properties including name, description, and pages.

11. **What is the article title, text, and page range for each location record?**  
    → Query 11 aggregates key fields (name, full text, start/end pages) from each `hto:LocationRecord`.

12. **How has a specific place (e.g., "DUNDEE") been described across the corpus?**  
    → Query 12 retrieves all articles titled "DUNDEE", including their text, source volume, parent series, and publication year.

13. **What are the longest Gazetteer articles, and where do they appear?**  
    → Query 13 lists the 10 longest `hto:LocationRecord` entries by text length, showing the article title, a text excerpt, the volume and series in which the article was published, and the year of publication. This helps surface dense or historically significant entries for further analysis.

14. **Which Gazetteer articles refer to other entries, and what do they reference?**  
    → Query 14 explores internal semantic links using the `hto:refersTo` property. It returns `hto:LocationRecord` entries that refer to other records, displaying both the source and target names. This enables tracing redirects, summaries, and cross-references within the Gazetteers knowledge graph.

15. **Which article titles are reused across multiple Gazetteer entries?**  
    → Query 15 groups `hto:LocationRecord` entries by their `hto:name` and lists those names that appear in multiple records. These cases reveal reused or ambiguous place names across volumes or editions (e.g., "LOGIE", "KIRKHILL"), useful for disambiguation or tracking editorial duplication over time.

16. **Which Gazetteer articles include alternate or variant names?**  
    → Query 16 identifies articles that contain both a primary name (`hto:name`) and one or more alternate names (`rdfs:label`), typically derived from metadata fields such as "Alternative names." This supports fuzzy search, historical variant matching, and linguistic normalization.



Each question is addressed using targeted SPARQL queries, executed through `SPARQLWrapper` against the remote Fuseki endpoint (or the local one).



### Query 1: Explore all RDF classes (types) defined in the HTO namespace

This query retrieves all distinct RDF types (`rdf:type`) that are used in the dataset and belong to the Heritage Textual Ontology (HTO).

In RDF, the `rdf:type` predicate is used to declare the class of a resource (e.g., `hto:Volume`, `hto:LocationRecord`, `hto:OriginalDescription`). Listing these types gives us a high-level overview of the entity types that populate the knowledge graph.

This is particularly useful at the beginning of an exploration session to understand the shape and semantics of the dataset.


In [2]:
sparql.setQuery("""
PREFIX hto: <https://w3id.org/hto#>

SELECT DISTINCT ?type WHERE {
  ?s a ?type .
  FILTER(STRSTARTS(STR(?type), "https://w3id.org/hto#"))
}
""")

try:
    ret = sparql.queryAndConvert()
    for r in ret["results"]["bindings"]:
        print(f"Type: {r['type']['value']}")
except Exception as e:
    print(e)



Type: https://w3id.org/hto#FeatureType
Type: https://w3id.org/hto#AuthorityType
Type: https://w3id.org/hto#LocationRecord
Type: https://w3id.org/hto#InternalRecord
Type: https://w3id.org/hto#Page
Type: https://w3id.org/hto#OriginalDescription
Type: https://w3id.org/hto#TextSegment
Type: https://w3id.org/hto#ConceptRecord
Type: https://w3id.org/hto#TextQuality
Type: https://w3id.org/hto#TermRecord
Type: https://w3id.org/hto#ExternalRecord
Type: https://w3id.org/hto#WorkCollection
Type: https://w3id.org/hto#Activity
Type: https://w3id.org/hto#Agent
Type: https://w3id.org/hto#Series
Type: https://w3id.org/hto#Volume
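
Every cell in this notebook walks `ret["results"]["bindings"]` by hand. A small helper can flatten that structure once and be reused for the later queries. The sketch below is illustrative only: the name `bindings_to_rows` is hypothetical, and the sample dictionary is hand-written in the shape of the SPARQL JSON results format (a `head.vars` list plus `results.bindings`), which is what `queryAndConvert()` returns when the return format is `JSON`.

```python
def bindings_to_rows(ret):
    """Flatten a SPARQL JSON result into a list of {variable: value} dicts."""
    variables = ret["head"]["vars"]
    rows = []
    for binding in ret["results"]["bindings"]:
        # A variable is absent from a binding when it is unbound (e.g. OPTIONAL).
        rows.append({v: binding[v]["value"] if v in binding else None
                     for v in variables})
    return rows


# Hand-written sample in the shape returned by sparql.queryAndConvert().
sample = {
    "head": {"vars": ["type"]},
    "results": {"bindings": [
        {"type": {"type": "uri", "value": "https://w3id.org/hto#Series"}},
        {"type": {"type": "uri", "value": "https://w3id.org/hto#Volume"}},
    ]},
}

for row in bindings_to_rows(sample):
    print(f"Type: {row['type']}")
```

Returning plain dictionaries also makes it easy to pass results on to other tools, since each row no longer carries the nested `type`/`value` wrapper.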


### Query 2: Explore all properties defined in the HTO namespace

This query returns all distinct RDF properties (predicates) in the dataset that belong to the Heritage Textual Ontology (HTO).

In RDF, predicates express the relationships between resources or between a resource and a literal (e.g., `hto:title`, `hto:text`, `hto:startsAtPage`). By listing all used properties, we gain insight into the kinds of metadata and semantic links available in the graph.

This is useful for discovering which attributes are used to describe volumes, pages, descriptions, places, people, and other entities.


In [3]:
sparql.setQuery("""
PREFIX hto: <https://w3id.org/hto#>

SELECT DISTINCT ?p WHERE {
  ?s ?p ?o .
  FILTER(STRSTARTS(STR(?p), "https://w3id.org/hto#"))
}
""")

try:
    ret = sparql.queryAndConvert()
    print("All hto: properties in use:")
    for r in ret["results"]["bindings"]:
        print(f"{r['p']['value']}")
except Exception as e:
    print(e)


All hto: properties in use:
https://w3id.org/hto#number
https://w3id.org/hto#permanentURL
https://w3id.org/hto#hasAnnotation
https://w3id.org/hto#hasTextQuality
https://w3id.org/hto#text
https://w3id.org/hto#wasExtractedFrom
https://w3id.org/hto#isTextQualityLowerThan
https://w3id.org/hto#birthYear
https://w3id.org/hto#deathYear
https://w3id.org/hto#isTextQualityHigherThan
https://w3id.org/hto#endsAtPage
https://w3id.org/hto#hasOriginalDescription
https://w3id.org/hto#refersToModernPlace
https://w3id.org/hto#startsAtPage
https://w3id.org/hto#genre
https://w3id.org/hto#language
https://w3id.org/hto#mmsid
https://w3id.org/hto#printedAt
https://w3id.org/hto#subtitle
https://w3id.org/hto#title
https://w3id.org/hto#yearPublished
https://w3id.org/hto#physicalDescription
https://w3id.org/hto#shelfLocator
https://w3id.org/hto#numberOfPages
https://w3id.org/hto#hasConceptRecord
https://w3id.org/hto#hasAuthorityType
https://w3id.org/hto#hasFeatureType
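
Full URIs make long listings like the one above hard to scan. As a sketch, the URIs can be shortened to prefixed names using the namespaces that appear in this notebook. The `PREFIXES` table and the function name `compact_uri` are assumptions made for illustration; extend the table as needed.

```python
# Namespaces that appear in this notebook's queries and outputs.
PREFIXES = {
    "hto": "https://w3id.org/hto#",
    "schema": "https://schema.org/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "prov": "http://www.w3.org/ns/prov#",
}


def compact_uri(uri, prefixes=PREFIXES):
    """Return prefix:local for a known namespace, or the URI unchanged."""
    # Try longer namespaces first so the most specific prefix wins.
    for prefix, namespace in sorted(prefixes.items(), key=lambda kv: -len(kv[1])):
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return uri


print(compact_uri("https://w3id.org/hto#startsAtPage"))  # hto:startsAtPage
print(compact_uri("https://w3id.org/hto/Volume/9910440713804340_97424370"))  # unchanged
```

Note that instance URIs such as `https://w3id.org/hto/Volume/...` use a slash after `hto`, so they do not fall under the `hto:` prefix (`https://w3id.org/hto#`) and are returned as-is.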


### Query 3: Retrieve all Gazetteer series and their titles

This query retrieves all resources of type `hto:Series` and their corresponding titles using the `hto:title` property.

A `Series` in the HTO knowledge graph represents a multi-volume work (e.g., the *Imperial Gazetteer of Scotland* or the *Ordnance Gazetteer of Scotland*). Each series may consist of multiple volumes published across different years or editions.

This query helps establish the top-level bibliographic structure of the Gazetteers collection.


In [4]:
sparql.setQuery("""
PREFIX hto: <https://w3id.org/hto#>

SELECT ?series ?title WHERE {
  ?series a hto:Series ;
          hto:title ?title .
}
LIMIT 20
""")

try:
    ret = sparql.queryAndConvert()
    for r in ret["results"]["bindings"]:
        print(f"Series URI: {r['series']['value']} - Title: {r['title']['value']}")
except Exception as e:
    print(e)


Series URI: https://w3id.org/hto/Series/9910440713804340 - Title: gazetteer of Scotland. [With plates and maps.]
Series URI: https://w3id.org/hto/Series/9928112733804340 - Title: imperial gazetteer of Scotland; or, Dictionary of Scottish topography, compiled from the most recent authorities, and forming a complete body of Scottish geography, physical, statistical, and historical
Series URI: https://w3id.org/hto/Series/9928151783804340 - Title: topographical dictionary of Scotland
Series URI: https://w3id.org/hto/Series/9928228793804340 - Title: Ordnance gazetteer of Scotland
Series URI: https://w3id.org/hto/Series/9930626093804340 - Title: Ordnance gazetteer of Scotland
Series URI: https://w3id.org/hto/Series/9931003343804340 - Title: gazetteer of Scotland
Series URI: https://w3id.org/hto/Series/9931344573804340 - Title: gazetteer of Scotland: containing a particular and concise description of the counties, parishes, islands, cities ... With ... map
Series URI: https://w3

### Query 4: Explore editorial metadata in the Gazetteers knowledge graph

This set of queries investigates how editorial contributions are modeled in the dataset using the `schema:editor` property and linked `schema:Person` entities. Editors are critical figures in shaping the content and structure of the Gazetteers.




#### 🔹 Query 4.1: Find all resources with an associated editor

This query retrieves all resources that declare a `schema:editor`, along with the URI of the editor (typically a `schema:Person`).

This helps identify which series or volumes are explicitly linked to known editors.



In [5]:
sparql.setQuery("""
PREFIX schema: <https://schema.org/>

SELECT ?subject ?editor WHERE {
  ?subject schema:editor ?editor .
}
LIMIT 20
""")

try:
    ret = sparql.queryAndConvert()
    for r in ret["results"]["bindings"]:
        print(f"{r['subject']['value']} → {r['editor']['value']}")
except Exception as e:
    print(e)


https://w3id.org/hto/Series/9931344583804340 → https://w3id.org/hto/Person/4607874226
https://w3id.org/hto/Series/9910440713804340 → https://w3id.org/hto/Person/5247046190
https://w3id.org/hto/Series/9928151783804340 → https://w3id.org/hto/Person/7593396701
https://w3id.org/hto/Series/9928112733804340 → https://w3id.org/hto/Person/4251664498
https://w3id.org/hto/Series/9933057493804340 → https://w3id.org/hto/Person/4251664498
https://w3id.org/hto/Series/9928228793804340 → https://w3id.org/hto/Person/9594167312
https://w3id.org/hto/Series/9930626093804340 → https://w3id.org/hto/Person/9594167312
https://w3id.org/hto/Series/9931003343804340 → https://w3id.org/hto/Person/4957971131
https://w3id.org/hto/Series/9931344573804340 → https://w3id.org/hto/Person/4957971131
https://w3id.org/hto/Series/9931344933804340 → https://w3id.org/hto/Person/4957971131
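
The output above shows that several series share the same editor URI. One way to make this explicit is to group the subjects by editor on the Python side. The sketch below copies the ten pairs from the output (shortened to their local identifiers for readability); the function name `group_by_editor` is hypothetical.

```python
from collections import defaultdict

# (subject, editor) pairs copied from the output above, shortened to local IDs.
pairs = [
    ("Series/9931344583804340", "Person/4607874226"),
    ("Series/9910440713804340", "Person/5247046190"),
    ("Series/9928151783804340", "Person/7593396701"),
    ("Series/9928112733804340", "Person/4251664498"),
    ("Series/9933057493804340", "Person/4251664498"),
    ("Series/9928228793804340", "Person/9594167312"),
    ("Series/9930626093804340", "Person/9594167312"),
    ("Series/9931003343804340", "Person/4957971131"),
    ("Series/9931344573804340", "Person/4957971131"),
    ("Series/9931344933804340", "Person/4957971131"),
]


def group_by_editor(pairs):
    """Map each editor to the list of resources that name it as editor."""
    grouped = defaultdict(list)
    for subject, editor in pairs:
        grouped[editor].append(subject)
    return dict(grouped)


for editor, subjects in group_by_editor(pairs).items():
    print(f"{editor}: {len(subjects)} series")
```

The same grouping could be done inside SPARQL with `GROUP BY ?editor` and `COUNT(?subject)`; doing it in Python keeps the original query unchanged.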


#### 🔹 Query 4.2: Inspect all properties of a specific editor (Person)

Given the URI of a specific `schema:Person`, this query lists all associated properties. This typically includes:

- `rdf:type` (should include `schema:Person`)
- `foaf:name` (if available)
- `hto:birthYear`, `hto:deathYear` (if known)

This allows us to inspect how editors are semantically described in the graph.


In [6]:
person_uri = "https://w3id.org/hto/Person/5247046190"

sparql.setQuery(f"""
SELECT ?p ?o WHERE {{
  <{person_uri}> ?p ?o .
}}
""")

try:
    ret = sparql.queryAndConvert()
    print(f"All properties for {person_uri}:\n")
    for r in ret["results"]["bindings"]:
        print(f"{r['p']['value']} → {r['o']['value']}")
except Exception as e:
    print(e)


All properties for https://w3id.org/hto/Person/5247046190:

http://www.w3.org/1999/02/22-rdf-syntax-ns#type → https://schema.org/Person
http://www.w3.org/1999/02/22-rdf-syntax-ns#type → http://xmlns.com/foaf/0.1/Person
http://www.w3.org/1999/02/22-rdf-syntax-ns#type → https://w3id.org/hto#Agent
http://xmlns.com/foaf/0.1/name → Chambers, William
https://w3id.org/hto#birthYear → 1800
https://w3id.org/hto#deathYear → 1883
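
Query 4.2 interpolates a URI into the query text with an f-string. That is fine for a hard-coded value, but if the URI ever comes from user input it is worth checking it first. The sketch below is one possible guard, not part of the original notebook: the function name `properties_query` is hypothetical, and the character set it rejects follows the characters that the SPARQL grammar does not allow inside `<...>`.

```python
import re

# Characters that may not appear inside a SPARQL IRI reference (<...>).
_ILLEGAL_IRI_CHARS = re.compile(r'[<>"{}|^`\\\x00-\x20]')


def properties_query(uri):
    """Build the 'all properties of one resource' query for a validated URI."""
    if not uri.startswith(("http://", "https://")) or _ILLEGAL_IRI_CHARS.search(uri):
        raise ValueError(f"Not a safe absolute IRI: {uri!r}")
    return f"SELECT ?p ?o WHERE {{\n  <{uri}> ?p ?o .\n}}"


print(properties_query("https://w3id.org/hto/Person/5247046190"))
```

The returned string can be passed to `sparql.setQuery(...)` exactly like the inline f-string above.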


#### 🔹 Query 4.3: List all distinct editor names

This query navigates from edited resources (`schema:editor`) to the linked `schema:Person`, then retrieves the person's name using `foaf:name`.

This gives a clean list of all editors represented in the graph by name, which is useful for documentation, indexing, or attribution.

Together, these queries illustrate how biographical and editorial metadata is encoded and linked across multiple entity types.

In [7]:
sparql.setQuery("""
PREFIX schema: <https://schema.org/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT DISTINCT ?name WHERE {
  ?instance schema:editor ?editor .
  ?editor foaf:name ?name .
}
LIMIT 20
""")

try:
    ret = sparql.queryAndConvert()
    print("Editor names (via foaf:name):\n")
    for r in ret["results"]["bindings"]:
        print(f"- {r['name']['value']}")
except Exception as e:
    print(e)


Editor names (via foaf:name):

- Scotland. [Appendix. - Descriptions, Topography & Travels.]
- Chambers, William
- Lewis, Samuel
- Wilson, John Marius.
- Groome, Francis Hindes
- Scotland. [Appendix. - Descriptions, Topography and Travels.]


### Query 5: Retrieve volumes and their parent series from the Gazetteers of Scotland collection

This query lists all `hto:Volume` resources that are members of a `hto:Series`, which in turn belongs to the broader `hto:WorkCollection` titled *Gazetteers of Scotland Collection*.

Each volume is returned with:
- Its title (`hto:title`)
- The title of the series it belongs to

This hierarchical query traverses three levels of structure:
1. **Collection** → `hto:WorkCollection`
2. **Series** → `hto:Series`
3. **Volume** → `hto:Volume`

The result gives a curated view of how individual volumes are organized into series and grouped within the overall collection. It is useful for bibliographic exploration and user-facing navigation interfaces.


In [8]:
sparql.setQuery("""
    PREFIX hto: <https://w3id.org/hto#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX schema: <https://schema.org/>
    SELECT * WHERE {
        ?volume a hto:Volume;
            hto:title ?vol_title.
        ?series a hto:Series;
            hto:title ?series_title;
            schema:hasPart ?volume.
        ?collection a hto:WorkCollection;
            rdfs:label "Gazetteers of Scotland Collection";
            schema:hasPart ?series.
        }
    """
)

try:
    ret = sparql.queryAndConvert()

    for r in ret["results"]["bindings"]:
        print(f"Volume title: {r['vol_title']['value']}, in series: {r['series_title']['value']}")
except Exception as e:
    print(e)

Volume title: gazetteer of Scotland. [With plates and maps.] 1838, Volume 1, in series: gazetteer of Scotland. [With plates and maps.]
Volume title: gazetteer of Scotland. [With plates and maps.] 1838, Volume 2, in series: gazetteer of Scotland. [With plates and maps.]
Volume title: imperial gazetteer of Scotland; or, Dictionary of Scottish topography, compiled from the most recent authorities, and forming a complete body of Scottish geography, physical, statistical, and historical 1868, Volume 1, in series: imperial gazetteer of Scotland; or, Dictionary of Scottish topography, compiled from the most recent authorities, and forming a complete body of Scottish geography, physical, statistical, and historical
Volume title: imperial gazetteer of Scotland; or, Dictionary of Scottish topography, compiled from the most recent authorities, and forming a complete body of Scottish geography, physical, statistical, and historical 1868, Volume 2, in series: imperial gazetteer of Scotland; or, Dic

### Query 6: Explore metadata properties of a sample Gazetteer volume

This pair of queries is used to inspect the metadata of a single `hto:Volume` in detail.


#### 🔹 Query 6.1: Select a sample volume URI

This query selects one instance of a `hto:Volume` from the dataset. It serves as a dynamic starting point for detailed inspection of that volume's metadata.

This step ensures that we are querying an actual, existing volume in the graph.


In [9]:
sparql.setQuery("""
PREFIX hto: <https://w3id.org/hto#>

SELECT ?volume WHERE {
  ?volume a hto:Volume .
}
LIMIT 1
""")

try:
    ret = sparql.queryAndConvert()
    if ret["results"]["bindings"]:
        volume_uri = ret["results"]["bindings"][0]["volume"]["value"]
        print("Volume URI:", volume_uri)
    else:
        print("No volumes found.")
except Exception as e:
    print(e)



Volume URI: https://w3id.org/hto/Volume/9910440713804340_97424370


#### 🔹 Query 6.2: Retrieve all properties of the selected volume

Using the volume URI obtained in the previous step (e.g., `https://w3id.org/hto/Volume/9910440713804340_97424370`), this query retrieves all RDF properties and values linked to it.

Typical metadata includes:
- `hto:title`: full title of the volume
- `hto:number`: volume number
- `hto:numberOfPages`: total page count
- `hto:permanentURL`: link to the digitized version
- `schema:isPartOf`: series the volume belongs to
- `schema:hasPart`: individual pages contained in the volume

Together, these queries allow for close inspection of how volume-level bibliographic and structural metadata is modeled in the HTO knowledge graph.

In [10]:
sparql.setQuery("""
SELECT ?p ?o WHERE {
  <https://w3id.org/hto/Volume/9910440713804340_97424370> ?p ?o .
}

LIMIT 20
""")

try:
    ret = sparql.queryAndConvert()
    print("Properties for Volume 9910440713804340_97424370:\n")
    for r in ret["results"]["bindings"]:
        print(f"{r['p']['value']} → {r['o']['value']}")
except Exception as e:
    print(e)


Properties for Volume 9910440713804340_97424370:

http://www.w3.org/1999/02/22-rdf-syntax-ns#type → https://schema.org/CreativeWork
http://www.w3.org/1999/02/22-rdf-syntax-ns#type → https://w3id.org/hto#Volume
http://purl.org/dc/terms/identifier → 97424370
https://w3id.org/hto#number → 1
https://w3id.org/hto#numberOfPages → 538
https://w3id.org/hto#permanentURL → https://digital.nls.uk/97424370
https://w3id.org/hto#title → gazetteer of Scotland. [With plates and maps.] 1838, Volume 1
https://schema.org/isPartOf → https://w3id.org/hto/Series/9910440713804340
https://schema.org/hasPart → https://w3id.org/hto/LocationRecord/9910440713804340_97424370_1007501475_0
https://schema.org/hasPart → https://w3id.org/hto/Page/9910440713804340_97424370_470
https://schema.org/hasPart → https://w3id.org/hto/LocationRecord/9910440713804340_97424370_1007608998_0
https://schema.org/hasPart → https://w3id.org/hto/Page/9910440713804340_97424370_471
https://schema.org/hasPart → htt
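
In the output above, the local name of the volume URI appears to combine the series identifier with the value of `dcterms:identifier`, which is also the last path segment of `hto:permanentURL`. The sketch below splits the URI on that basis. This pattern is an observation from this one sample, not a documented guarantee of the HTO URI scheme, and the function name `split_volume_uri` is hypothetical.

```python
def split_volume_uri(volume_uri):
    """Split a volume URI's local name into (series_id, volume_id).

    Assumes the pattern <base>/Volume/<series_id>_<volume_id> seen in the
    sample output; raises ValueError if the local name has no underscore.
    """
    local_name = volume_uri.rsplit("/", 1)[-1]
    series_id, _, volume_id = local_name.partition("_")
    if not series_id or not volume_id:
        raise ValueError(f"Unexpected volume URI: {volume_uri!r}")
    return series_id, volume_id


series_id, volume_id = split_volume_uri(
    "https://w3id.org/hto/Volume/9910440713804340_97424370"
)
print(f"https://w3id.org/hto/Series/{series_id}")  # matches schema:isPartOf above
print(f"https://digital.nls.uk/{volume_id}")       # matches hto:permanentURL above
```

For anything beyond quick inspection, reading `schema:isPartOf` and `hto:permanentURL` from the graph is the safer route, since those values are stated explicitly.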

### Query 7: List all properties used in `hto:OriginalDescription` entries

This query retrieves all distinct RDF properties that appear in resources of type `hto:OriginalDescription`. These represent individual article-level entries extracted from the gazetteers.

By examining which properties are used on `hto:OriginalDescription`, we can understand how each entry is semantically described, including its content, provenance, and quality metadata.

Typical properties include:
- `hto:text`: the full textual content of the article
- `hto:hasTextQuality`: a quality indicator (e.g., Low, High)
- `hto:wasExtractedFrom`: the source page or document
- `prov:wasAttributedTo`: the responsible agent (e.g., MappingChange pipeline)

This query is useful for schema exploration and understanding how article-level data is structured.


In [11]:
sparql.setQuery("""
PREFIX hto: <https://w3id.org/hto#>

SELECT DISTINCT ?p WHERE {
  ?desc a hto:OriginalDescription ;
        ?p ?o .
}
LIMIT 50
""")

try:
    ret = sparql.queryAndConvert()
    for r in ret["results"]["bindings"]:
        print(f"Property: {r['p']['value']}")
except Exception as e:
    print(e)



Property: http://www.w3.org/1999/02/22-rdf-syntax-ns#type
Property: https://w3id.org/hto#hasAnnotation
Property: https://w3id.org/hto#hasTextQuality
Property: https://w3id.org/hto#text
Property: https://w3id.org/hto#wasExtractedFrom
Property: http://www.w3.org/ns/prov#wasAttributedTo


### Query 8: Retrieve the text of Gazetteer articles and their source documents

This query returns sample `hto:OriginalDescription` entries, showing the full article text and the `hto:InformationResource` from which it was extracted (typically an ALTO XML file or digitized page).

Each result includes:
- The URI of the article description (`?desc`)
- The full text content (`hto:text`)
- The source document or page URI (`hto:wasExtractedFrom`)

This query provides a window into the actual semantic content of the gazetteer entries, enabling inspection of OCR outputs and understanding of how content is linked to its digitized provenance.

It is especially useful for building content previews, search indexes, or validating extraction quality.


In [12]:
sparql.setQuery("""
PREFIX hto: <https://w3id.org/hto#>

SELECT ?desc ?text ?source WHERE {
  ?desc a hto:OriginalDescription ;
        hto:text ?text ;
        hto:wasExtractedFrom ?source .
}
LIMIT 10
""")

try:
    ret = sparql.queryAndConvert()
    for r in ret["results"]["bindings"]:
        print(f"\nDescription: {r['desc']['value']}\nSource: {r['source']['value']}\nText: {r['text']['value'][:200]}...\n")
except Exception as e:
    print(e)




Description: https://w3id.org/hto/OriginalDescription/9910440713804340_97424370_1007501475_0NLS
Source: https://w3id.org/hto/DigitalFile/97424370_alto_97430004_34_xml
Text: a united parish on the mainland of Orkney, of nine miles in length, with a varying breadth, lying west of Kirkwall. In its centre is the lake of S tennis or Stenhouse, which is nearly divided in two b...


Description: https://w3id.org/hto/OriginalDescription/9910440713804340_97424370_1007608998_0NLS
Source: https://w3id.org/hto/DigitalFile/97424370_alto_97430016_34_xml
Text: an inlet of the sea on the south-east coast of Sutherlandshire, across the narrow neck of which there is a ferry, on the thoroughfare along the coast northwards from Dornoch....


Description: https://w3id.org/hto/OriginalDescription/9910440713804340_97424370_1016059890_0NLS
Source: https://w3id.org/hto/DigitalFile/97424370_alto_97430196_34_xml
Text: FRODA, an islet on the west coast of Skye....


Description: https://w3id.org/hto/OriginalDesc
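
The cell above previews each article with a fixed slice (`[:200]`), which can cut a word in half and always appends `...`, even to short texts. For content previews, the standard library's `textwrap.shorten` is one alternative: it collapses whitespace, cuts at a word boundary, and only adds the placeholder when the text was actually shortened. The helper name `preview` and the chosen widths are illustrative.

```python
import textwrap


def preview(text, width=200):
    """Collapse whitespace and cut at a word boundary, adding '...' if shortened."""
    return textwrap.shorten(text, width=width, placeholder="...")


# Article texts copied from the output above.
sample_text = (
    "an inlet of the sea on the south-east coast of Sutherlandshire, across the "
    "narrow neck of which there is a ferry, on the thoroughfare along the coast "
    "northwards from Dornoch."
)
print(preview(sample_text, width=60))
print(preview("FRODA, an islet on the west coast of Skye."))
```

The second call returns the text unchanged because it already fits within the default width.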

### Query 9: Inspect all RDF properties of a single `hto:OriginalDescription` entry

This two-part query inspects one specific Gazetteer article by first selecting an example description and then listing all of its associated RDF triples.


#### 🔹 Query 9.1: Select a sample `OriginalDescription` URI

This query retrieves one resource of type `hto:OriginalDescription`. This URI will be used to examine all the semantic properties associated with that individual article.


In [13]:
sparql.setQuery("""
PREFIX hto: <https://w3id.org/hto#>

SELECT ?desc ?p ?o WHERE {
  ?desc a hto:OriginalDescription ;
        ?p ?o .
}
LIMIT 1
""")

# Read the description URI from the first (and only) result row
try:
    ret = sparql.queryAndConvert()
    binding = ret["results"]["bindings"][0]
    desc_uri = binding["desc"]["value"]
    print(f"Using description URI: {desc_uri}")
except Exception as e:
    print("Could not get description URI:", e)




Using description URI: https://w3id.org/hto/OriginalDescription/9910440713804340_97424370_1007501475_0NLS


#### 🔹 Query 9.2: Retrieve all RDF properties of that description

Once a description URI is selected, this second query prints all RDF triples where that URI is the subject. This includes key metadata such as:

- `hto:text`: the full article content
- `hto:wasExtractedFrom`: the page or document source
- `hto:hasTextQuality`: quality annotation (e.g., Low, High)
- `prov:wasAttributedTo`: the agent responsible for the extraction
- Any other custom properties used in semantic modeling

Together, these queries allow you to deeply inspect the structure and provenance of individual gazetteer entries.

In [14]:
sparql.setQuery(f"""
SELECT ?p ?o WHERE {{
  <{desc_uri}> ?p ?o .
}}
""")

try:
    ret = sparql.queryAndConvert()
    print(f"\nAll properties for {desc_uri}:\n")
    for r in ret["results"]["bindings"]:
        print(f"{r['p']['value']} → {r['o']['value']}")
except Exception as e:
    print(e)



All properties for https://w3id.org/hto/OriginalDescription/9910440713804340_97424370_1007501475_0NLS:

http://www.w3.org/1999/02/22-rdf-syntax-ns#type → https://w3id.org/hto#OriginalDescription
http://www.w3.org/1999/02/22-rdf-syntax-ns#type → http://www.w3.org/ns/prov#Entity
https://w3id.org/hto#hasAnnotation → https://w3id.org/hto/Annotation/9910440713804340_97424370_1007501475_0NLS106_114
https://w3id.org/hto#hasAnnotation → https://w3id.org/hto/Annotation/9910440713804340_97424370_1007501475_0NLS145_153
https://w3id.org/hto#hasAnnotation → https://w3id.org/hto/Annotation/9910440713804340_97424370_1007501475_0NLS329_336
https://w3id.org/hto#hasAnnotation → https://w3id.org/hto/Annotation/9910440713804340_97424370_1007501475_0NLS35_41
https://w3id.org/hto#hasAnnotation → https://w3id.org/hto/Annotation/9910440713804340_97424370_1007501475_0NLS376_386
https://w3id.org/hto#hasAnnotation → https://w3id.org/hto/Annotation/9910440713804340_97424370_1007501475_0NLS905_917
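
Listings like the one above repeat the same predicate once per value, which becomes noisy for multi-valued properties such as `hto:hasAnnotation`. A possible way to summarise them is to group objects by predicate before printing. The sketch below uses a few rows copied from the output; the names `group_by_predicate` and `_row` are hypothetical helpers.

```python
from collections import defaultdict


def group_by_predicate(bindings):
    """Collect all object values per predicate from ?p ?o result bindings."""
    grouped = defaultdict(list)
    for b in bindings:
        grouped[b["p"]["value"]].append(b["o"]["value"])
    return dict(grouped)


def _row(p, o):
    # Build one binding in the SPARQL JSON result shape.
    return {"p": {"type": "uri", "value": p}, "o": {"type": "uri", "value": o}}


RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
HAS_ANNOTATION = "https://w3id.org/hto#hasAnnotation"
ANNOTATION_BASE = "https://w3id.org/hto/Annotation/9910440713804340_97424370_1007501475_0NLS"

# A subset of the rows shown in the output above.
sample_bindings = [
    _row(RDF_TYPE, "https://w3id.org/hto#OriginalDescription"),
    _row(RDF_TYPE, "http://www.w3.org/ns/prov#Entity"),
    _row(HAS_ANNOTATION, ANNOTATION_BASE + "106_114"),
    _row(HAS_ANNOTATION, ANNOTATION_BASE + "145_153"),
    _row(HAS_ANNOTATION, ANNOTATION_BASE + "329_336"),
]

for predicate, objects in group_by_predicate(sample_bindings).items():
    print(f"{predicate}: {len(objects)} value(s)")
```

With real results, `group_by_predicate(ret["results"]["bindings"])` can be applied directly to the output of Queries 4.2, 6.2, 9.2, and 10.2, since they all select `?p ?o`.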

### Query 10: Retrieve and inspect a `hto:LocationRecord`

This two-part query focuses on exploring a `hto:LocationRecord`, which represents a semantically enriched article entry linked to a specific place.



#### 🔹 Query 10.1: Retrieve a sample `LocationRecord` URI

This query selects a single resource of type `hto:LocationRecord`. This record aggregates structured metadata about an article that refers to a geographic location.


In [15]:
sparql.setQuery("""
PREFIX hto: <https://w3id.org/hto#>

SELECT ?record WHERE {
  ?record a hto:LocationRecord .
}
LIMIT 1
""")

try:
    ret = sparql.queryAndConvert()
    record_uri = ret["results"]["bindings"][0]["record"]["value"]
    print(f"Using LocationRecord URI: {record_uri}")
except Exception as e:
    print("Could not retrieve a LocationRecord:", e)




Using LocationRecord URI: https://w3id.org/hto/LocationRecord/9910440713804340_97424370_1007501475_0


#### 🔹 Query 10.2: List all RDF properties of that `LocationRecord`

Once the URI is obtained, this query lists all RDF properties associated with the record. These typically include:

- `rdfs:label`: the article heading (e.g., "FIRTH AND STENNIS")
- `hto:hasOriginalDescription`: link to the full textual description
- `hto:startsAtPage` / `hto:endsAtPage`: page-level provenance
- `hto:refersToModernPlace`: the linked `crm:Place` resource

This structure allows rich querying of articles by location, supports place-based exploration, and connects textual content with bibliographic context.

In [16]:
sparql.setQuery("""
SELECT ?p ?o WHERE {
  <https://w3id.org/hto/LocationRecord/9910440713804340_97424370_1007501475_0> ?p ?o .
}
""")

try:
    ret = sparql.queryAndConvert()
    print("All properties for the LocationRecord:\n")
    for r in ret["results"]["bindings"]:
        print(f"{r['p']['value']} → {r['o']['value']}")
except Exception as e:
    print(e)


All properties for the LocationRecord:

http://www.w3.org/1999/02/22-rdf-syntax-ns#type → https://w3id.org/hto#LocationRecord
http://www.w3.org/1999/02/22-rdf-syntax-ns#type → https://w3id.org/hto#InternalRecord
http://www.w3.org/1999/02/22-rdf-syntax-ns#type → https://w3id.org/hto#ConceptRecord
http://www.w3.org/1999/02/22-rdf-syntax-ns#type → https://w3id.org/hto#TermRecord
http://www.w3.org/1999/02/22-rdf-syntax-ns#type → https://schema.org/CreativeWork
http://www.w3.org/2000/01/rdf-schema#label → FIRTH AND STENNIS
https://w3id.org/hto#endsAtPage → https://w3id.org/hto/Page/9910440713804340_97424370_470
https://w3id.org/hto#hasOriginalDescription → https://w3id.org/hto/OriginalDescription/9910440713804340_97424370_1007501475_0NLS
https://w3id.org/hto#refersToModernPlace → https://w3id.org/hto/SP2_Phenomenal_Place/7055516556
https://w3id.org/hto#startsAtPage → https://w3id.org/hto/Page/9910440713804340_97424370_470
https://schema.org/isPartOf → https://w3id.org/


### Query 11: Retrieve article metadata including title, text, and page range

This query aggregates key metadata about Gazetteer articles modeled as `hto:LocationRecord` resources.

Each result includes:
- `rdfs:label`: the article title or heading (e.g., "FIRTH AND STENNIS")
- `hto:text`: the full textual content of the article (from `hto:OriginalDescription`)
- `hto:startsAtPage` and `hto:endsAtPage`: the page span in the digitized volume

The query joins the `hto:LocationRecord` with its corresponding `hto:OriginalDescription`, providing a compact view of what each article covers, how long it is, and where it appears in the source volume.

This is useful for content previews, document navigation interfaces, or comparative analysis across editions.


In [17]:
sparql.setQuery("""
PREFIX hto: <https://w3id.org/hto#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?record ?name ?text ?startPage ?endPage WHERE {
  ?record a hto:LocationRecord ;
          rdfs:label ?name ;
          hto:hasOriginalDescription ?desc ;
          hto:startsAtPage ?startPage ;
          hto:endsAtPage ?endPage .
  ?desc hto:text ?text .
}
LIMIT 20
""")

try:
    ret = sparql.queryAndConvert()
    for r in ret["results"]["bindings"]:
        print(f"Article title: {r['name']['value']}")
        print(f"Start page: {r['startPage']['value']}")
        print(f"End page:   {r['endPage']['value']}")
        print(f"Text: {r['text']['value'][:200]}...\n")
except Exception as e:
    print(e)



Article title: FIRTH AND STENNIS
Start page: https://w3id.org/hto/Page/9910440713804340_97424370_470
End page:   https://w3id.org/hto/Page/9910440713804340_97424370_470
Text: a united parish on the mainland of Orkney, of nine miles in length, with a varying breadth, lying west of Kirkwall. In its centre is the lake of S tennis or Stenhouse, which is nearly divided in two b...

Article title: FLEET LOCH
Start page: https://w3id.org/hto/Page/9910440713804340_97424370_471
End page:   https://w3id.org/hto/Page/9910440713804340_97424370_471
Text: an inlet of the sea on the south-east coast of Sutherlandshire, across the narrow neck of which there is a ferry, on the thoroughfare along the coast northwards from Dornoch....

Article title: FRODA
Start page: https://w3id.org/hto/Page/9910440713804340_97424370_486
End page:   https://w3id.org/hto/Page/9910440713804340_97424370_486
Text: FRODA, an islet on the west coast of Skye....

Article title: CLYNE
Start page: https://w3id.org/hto/Page/9910
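
The cell above cuts each text at a fixed 200 characters, which can split a word in half. For content previews, a minimal sketch using the standard-library `textwrap.shorten` trims at a word boundary instead. The helper name `preview` and the 80-character width are illustrative choices, not part of the notebook.

```python
import textwrap


def preview(text: str, width: int = 80) -> str:
    """Return a one-line preview of at most `width` characters."""
    # shorten() collapses runs of whitespace and appends the placeholder
    # only when part of the text was actually dropped.
    return textwrap.shorten(text, width=width, placeholder="...")


# Article text copied from the FLEET LOCH result shown above.
sample = (
    "an inlet of the sea on the south-east coast of Sutherlandshire, "
    "across the narrow neck of which there is a ferry, on the "
    "thoroughfare along the coast northwards from Dornoch."
)
print(preview(sample))
```

Texts shorter than `width` are returned unchanged, so the same helper can be applied to every row.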

### Query 12: Retrieve all Gazetteer entries titled "DUNDEE" with article text, volume, series, and year

This query returns all Gazetteer entries with the title `"DUNDEE"` from the knowledge graph, using the `rdfs:label` property on `hto:LocationRecord`.

For each entry, the query retrieves:
- The full article text (`hto:text`)
- Start and end pages in the source volume (`hto:startsAtPage`, `hto:endsAtPage`)
- The volume (`hto:Volume`) it belongs to, with title
- The series (`hto:Series`) the volume is part of, with title
- The year of publication, taken from the series (`hto:yearPublished`)

This query is ideal for comparing how a single place, such as Dundee, has been described across different editions and series in the Gazetteers collection.


In [18]:
sparql.setQuery("""
PREFIX hto: <https://w3id.org/hto#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <https://schema.org/>

SELECT DISTINCT ?record ?desc ?text ?startPage ?endPage ?volume ?volumeTitle ?series ?seriesTitle ?seriesYear WHERE {
  ?record a hto:LocationRecord ;
          rdfs:label "DUNDEE" ;
          hto:hasOriginalDescription ?desc ;
          hto:startsAtPage ?startPage ;
          hto:endsAtPage ?endPage .

  ?desc hto:text ?text .

  # Get the volume from the page
  ?volume schema:hasPart ?page .
  FILTER (?page = ?startPage || ?page = ?endPage)

  OPTIONAL { ?volume hto:title ?volumeTitle . }
  OPTIONAL { ?series schema:hasPart ?volume }
  OPTIONAL { ?series hto:title ?seriesTitle . }
  OPTIONAL { ?series hto:yearPublished ?seriesYear . }
}
LIMIT 20
""")

try:
    ret = sparql.queryAndConvert()
    for r in ret["results"]["bindings"]:
        print(f"📘 Record: {r['record']['value']}")
        print(f"📄 Start Page: {r['startPage']['value']} → End Page: {r['endPage']['value']}")
        print(f"📚 Volume: {r.get('volumeTitle', {}).get('value', 'N/A')}")
        print(f"📦 Series: {r.get('seriesTitle', {}).get('value', 'N/A')}")
        print(f"📅 Year: {r.get('seriesYear', {}).get('value', 'N/A')}")
        print(f"📝 Text: {r['text']['value'][:300]}...\n")
except Exception as e:
    print(e)



📘 Record: https://w3id.org/hto/LocationRecord/9910440713804340_97424370_769119998_0
📄 Start Page: https://w3id.org/hto/Page/9910440713804340_97424370_258 → End Page: https://w3id.org/hto/Page/9910440713804340_97424370_268
📚 Volume: gazetteer of Scotland. [With plates and maps.] 1838, Volume 1
📦 Series: gazetteer of Scotland. [With plates and maps.]
📅 Year: 1838
📝 Text: DUNDEE. 229 of some high rocks close to the river, and about a quarter of a mile from the church, was erected, in early times, a tolerably secure fortress, similar to that still nearly entire, at Broughty, a few miles farther down the Tay. Little is satisfactorily known of the castle of Dundee. Afte...

📘 Record: https://w3id.org/hto/LocationRecord/9928112733804340_97459138_769119998_0
📄 Start Page: https://w3id.org/hto/Page/9928112733804340_97459138_562 → End Page: https://w3id.org/hto/Page/9928112733804340_97459138_572
📚 Volume: imperial gazetteer of Scotland; or, Dictionary of Scottish t
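
Variables bound inside `OPTIONAL { ... }` are simply absent from a result row when nothing matched, which is why the cell above uses `r.get(..., {}).get('value', 'N/A')`. A small sketch of the same pattern as a reusable helper follows. The function names and the sample rows are hypothetical; only the row shape (a dict of `{"type": ..., "value": ...}` per variable) follows the standard SPARQL JSON results format.

```python
def binding_value(row: dict, var: str, default: str = "N/A") -> str:
    """Return the value bound to `var` in one result row, or `default` if unbound."""
    return row.get(var, {}).get("value", default)


def flatten(bindings: list, variables: list) -> list:
    """Turn SPARQL JSON bindings into plain dicts with one key per variable."""
    return [{var: binding_value(row, var) for var in variables} for row in bindings]


# Hand-written sample rows (not real endpoint output): the second row has
# no ?seriesYear binding, as happens when an OPTIONAL pattern does not match.
sample = [
    {"record": {"type": "uri", "value": "https://example.org/record/1"},
     "seriesYear": {"type": "literal", "value": "1838"}},
    {"record": {"type": "uri", "value": "https://example.org/record/2"}},
]
rows = flatten(sample, ["record", "seriesYear"])
print(rows)
```

With the notebook's results this would be called as `flatten(ret["results"]["bindings"], [...])`.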

### Query 13: List the longest Gazetteer articles by text length

This query retrieves the top 10 `hto:LocationRecord` entries in the Gazetteers knowledge graph, ranked by the length of their textual content (`hto:text`).

Each result includes:
- The article title (`rdfs:label`)
- The URI of the location record
- A sample of the full text
- Volume title

Sorting articles by character length is a useful heuristic for identifying substantial entries, such as major cities, counties, or complex place groupings, which often span multiple paragraphs or pages. These long entries are ideal candidates for in-depth analysis or LLM summarization.


In [19]:
sparql.setQuery("""
PREFIX hto: <https://w3id.org/hto#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <https://schema.org/>

SELECT ?record ?name ?text ?volumeTitle WHERE {
  ?record a hto:LocationRecord ;
          rdfs:label ?name ;
          hto:hasOriginalDescription ?desc ;
          hto:startsAtPage ?page .

  ?desc hto:text ?text .

  ?volume schema:hasPart ?page .
  OPTIONAL { ?volume hto:title ?volumeTitle . }
}
ORDER BY DESC(STRLEN(?text))
LIMIT 10
""")

try:
    ret = sparql.queryAndConvert()
    for r in ret["results"]["bindings"]:
        print(f"📍 Title: {r['name']['value']}")
        print(f"📝 Record: {r['record']['value']}")
        # The query does not select a ?year variable, so only the volume title is printed.
        print(f"📚 Volume: {r.get('volumeTitle', {}).get('value', 'N/A')}")
        print(f"📏 Length: {len(r['text']['value'])} characters")
        print(f"🔍 Excerpt: {r['text']['value'][:300]}...\n")
except Exception as e:
    print(e)





📍 Title: EDINBURGH
📝 Record: https://w3id.org/hto/LocationRecord/9910440713804340_97424370_5424703086_0
📚 Volume: gazetteer of Scotland. [With plates and maps.] 1838, Volume 1
📏 Length: 800374 characters
🔍 Excerpt: EDINBURGH. 285 monarch, held his first parliament in Edinburgh, in the year 1214, and this event served to give it still more the air of a capital and seat of supreme justice. When Alexander, in 1221, married Joan, the princess of England, he made Edinburgh the place of his residence for some time. ...

📍 Title: EDINBURGH
📝 Record: https://w3id.org/hto/LocationRecord/9928112733804340_97459138_5424703086_0
📚 Volume: imperial gazetteer of Scotland; or, Dictionary of Scottish topography, compiled from the most recent authorities, and forming a complete body of Scottish geography, physical, statistical, and historical 1868, Volume 1
📏 Length: 678908 characters
🔍 Excerpt: EDINBURGH. -,:;:; EDINBURGH. burgh and Glasgow railway and the term
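
Character length is only one possible measure of how substantial an entry is. When rows have already been fetched, the ranking can be redone client-side with a different measure, such as word count. This is a minimal sketch, assuming each row is a plain dict with `name` and `text` keys (a hypothetical shape, not what `queryAndConvert()` returns directly). The two sample texts are taken from the Query 11 results.

```python
import heapq


def word_count(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())


def longest_articles(rows, n=10, measure=len):
    """Return the `n` rows whose text scores highest under `measure`."""
    return heapq.nlargest(n, rows, key=lambda row: measure(row["text"]))


rows = [
    {"name": "FRODA", "text": "FRODA, an islet on the west coast of Skye."},
    {"name": "FLEET LOCH",
     "text": "an inlet of the sea on the south-east coast of Sutherlandshire, "
             "across the narrow neck of which there is a ferry, on the "
             "thoroughfare along the coast northwards from Dornoch."},
]
for row in longest_articles(rows, n=2, measure=word_count):
    print(row["name"], word_count(row["text"]))
```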

### Query 14: Show article-to-article references (`rdfs:seeAlso`) by name

This query identifies and displays semantic links between Gazetteer entries that refer to one another using the `rdfs:seeAlso` property.

Each result shows:
- The source article title (`rdfs:label`) and URI (`hto:LocationRecord`)
- The referred-to article's title and URI

By joining the `rdfs:seeAlso` target with its own `rdfs:label`, the query outputs human-readable relations such as:

> `CRAWFURDSDIKES, see also GREENOCK.`

This is useful for:
- Mapping internal cross-references within the Gazetteers corpus
- Detecting redirects, summaries, or composite place descriptions
- Building link graphs or knowledge navigation tools


In [20]:
sparql.setQuery("""
PREFIX hto: <https://w3id.org/hto#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?record ?recordName ?ref ?refName WHERE {
  ?record a hto:LocationRecord ;
          rdfs:label ?recordName ;
          rdfs:seeAlso ?ref .

  ?ref rdfs:label ?refName .
}
ORDER BY ?record
LIMIT 20
""")

try:
    ret = sparql.queryAndConvert()
    for r in ret["results"]["bindings"]:
        print(f"🔗 {r['recordName']['value']} refers to {r['refName']['value']}")
        print(f"   ↳ Record: {r['record']['value']}")
        print(f"   ↳ See also reference: {r['ref']['value']}\n")
except Exception as e:
    print(e)





🔗 CRAWFURDSDIKES refers to GREENOCK
   ↳ Record: https://w3id.org/hto/LocationRecord/9910440713804340_97424370_2371760060_0
   ↳ See also reference: https://w3id.org/hto/LocationRecord/9910440713804340_97430830_2306452501_0

🔗 ANDERSTON refers to GLASGOW
   ↳ Record: https://w3id.org/hto/LocationRecord/9910440713804340_97424370_318570476_0
   ↳ See also reference: https://w3id.org/hto/LocationRecord/9910440713804340_97424370_4307577365_0

🔗 AYR NEWTON UPON refers to NEWTON UP ON AYR
   ↳ Record: https://w3id.org/hto/LocationRecord/9910440713804340_97424370_3325373239_0
   ↳ See also reference: https://w3id.org/hto/LocationRecord/9910440713804340_97430830_9816079482_0

🔗 CUNNINGHAM refers to AYRSHIRE
   ↳ Record: https://w3id.org/hto/LocationRecord/9910440713804340_97424370_3437658370_0
   ↳ See also reference: https://w3id.org/hto/LocationRecord/9910440713804340_97424370_3947268454_0

🔗 CUTHBERTS ST refers to EDINBURGH
   ↳ Record: https://w3id.org/hto/
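
To illustrate the link-graph use case, the (source, target) name pairs printed above can be collected into a simple adjacency structure. This is only a sketch: the helper names are hypothetical, and keying the graph by title rather than by record URI is a simplification, since Query 15 shows that titles are not unique.

```python
from collections import defaultdict


def build_link_graph(pairs):
    """Map each source title to the set of titles it refers to."""
    graph = defaultdict(set)
    for source, target in pairs:
        graph[source].add(target)
    return graph


def referring_to(graph, target):
    """List the titles whose 'see also' links point at `target`."""
    return sorted(src for src, targets in graph.items() if target in targets)


# Name pairs taken from the output of the cell above.
pairs = [
    ("CRAWFURDSDIKES", "GREENOCK"),
    ("ANDERSTON", "GLASGOW"),
    ("AYR NEWTON UPON", "NEWTON UP ON AYR"),
    ("CUNNINGHAM", "AYRSHIRE"),
    ("CUTHBERTS ST", "EDINBURGH"),
]
graph = build_link_graph(pairs)
print(referring_to(graph, "GLASGOW"))
```

For real analysis, the record URIs (`?record`, `?ref`) would be the safer node identifiers.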

### Query 15: Identify Gazetteer article titles reused across multiple records

This query counts how many times each `rdfs:label` (place or article title) appears across the corpus of `hto:LocationRecord` entries.

It groups records by name and returns those names that are used more than once, showing how many distinct records share the same title.

Each result includes:
- The name/title (`rdfs:label`)
- The number of associated records (e.g., entries across different volumes or years)

This is essential for:
- Detecting reused or ambiguous names (e.g., "LOGIE", "KIRKHILL")
- Understanding how a place was described differently across sources
- Supporting disambiguation, temporal analysis, or cross-edition alignment


In [21]:
sparql.setQuery("""
PREFIX hto: <https://w3id.org/hto#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?name (COUNT(?record) AS ?count) WHERE {
  ?record a hto:LocationRecord ;
          rdfs:label ?name .
}
GROUP BY ?name
HAVING (COUNT(?record) > 1)
ORDER BY DESC(?count)
LIMIT 20
""")

try:
    ret = sparql.queryAndConvert()
    print("🧭 Repeated article names across records:\n")
    for r in ret["results"]["bindings"]:
        print(f"{r['name']['value']} - {r['count']['value']} records")
except Exception as e:
    print(e)




🧭 Repeated article names across records:

GRANGE - 37 records
KIRKHILL - 36 records
LOGIE - 34 records
KIRKMICHAEL - 33 records
KINCARDINE - 32 records
MILTON - 32 records
ABBEY - 30 records
CARRON - 29 records
BANKHEAD - 26 records
NEWTON - 26 records
BENMORE - 25 records
BRIDGEND - 25 records
LADYKIRK - 25 records
FLADDA - 24 records
GREENLAW - 23 records
KIRKLAND - 23 records
INCH - 22 records
LESLIE - 22 records
NEWBIGGING - 22 records
KILBRIDE - 21 records
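
The `GROUP BY` / `HAVING` / `ORDER BY DESC` combination in this query has a direct client-side counterpart, which can be handy when the labels have already been downloaded (for instance from the local `rdflib` graph). This is a minimal sketch with toy labels, not data from the endpoint; `repeated_names` is a hypothetical helper.

```python
from collections import Counter


def repeated_names(labels, min_count=2):
    """Return (name, count) pairs for names used at least `min_count` times,
    most frequent first, mirroring GROUP BY ?name HAVING (COUNT > 1)."""
    counts = Counter(labels)
    return [(name, n) for name, n in counts.most_common() if n >= min_count]


# Toy input: one label per location record.
labels = ["LOGIE", "KIRKHILL", "LOGIE", "FRODA", "KIRKHILL", "LOGIE"]
print(repeated_names(labels))
```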


### Query 16: Retrieve Gazetteer articles with alternate names

This query identifies `hto:LocationRecord` entries that include both a primary name (`rdfs:label`) and one or more alternate or variant names stored using the `skos:altLabel` property.

Each result includes:
- The main article title (`rdfs:label`)
- An alternate name (`skos:altLabel`) such as a historical spelling, synonym, or variant
- The URI of the Gazetteer record

These alternate names are typically extracted from metadata fields like "Alternative names" in the original digitized sources. Including them is important for:
- Enhancing place name disambiguation
- Supporting fuzzy search and variant recognition
- Preserving historical name usage and orthographic shifts across editions


In [22]:
sparql.setQuery("""
PREFIX hto: <https://w3id.org/hto#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT DISTINCT ?record ?name ?altName WHERE {
  ?record a hto:LocationRecord ;
          rdfs:label ?name ;
          skos:altLabel ?altName .
}
LIMIT 20
""")

try:
    ret = sparql.queryAndConvert()
    for r in ret["results"]["bindings"]:
        print(f"📝 {r['name']['value']} - also known as: {r['altName']['value']}")
        print(f"   ↳ URI: {r['record']['value']}\n")
except Exception as e:
    print(e)



📝 AUCHTERTOUL - also known as: AUCHTERTEEL
   ↳ URI: https://w3id.org/hto/LocationRecord/9910440713804340_97424370_1403281519_0

📝 ABBS HEAD - also known as: ST ABBS HEAD
   ↳ URI: https://w3id.org/hto/LocationRecord/9910440713804340_97424370_1683840159_0

📝 AVENDALE - also known as: STRATHAVEN
   ↳ URI: https://w3id.org/hto/LocationRecord/9910440713804340_97424370_1813584905_0

📝 FINDON - also known as: FINNAN
   ↳ URI: https://w3id.org/hto/LocationRecord/9910440713804340_97424370_1859902221_0

📝 COPINSHA COPINSHAY - also known as: CAPINSHAY
   ↳ URI: https://w3id.org/hto/LocationRecord/9910440713804340_97424370_1902924006_0

📝 CON - also known as: CHON LOCH
   ↳ URI: https://w3id.org/hto/LocationRecord/9910440713804340_97424370_2039597840_0

📝 GLASSFORD - also known as: GLASFORD
   ↳ URI: https://w3id.org/hto/LocationRecord/9910440713804340_97424370_2273882797_0

📝 ATHOL - also known as: ATHOLE
   ↳ URI: https://w3id.org/hto/Loc
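
As a sketch of how these pairs could support variant recognition, the (primary, alternate) names above can be loaded into a lookup table, with the standard-library `difflib.get_close_matches` as a fallback for spellings that are not listed. The helper names and the 0.8 similarity cutoff are illustrative assumptions, and mapping names to titles instead of record URIs ignores the title ambiguity shown in Query 15.

```python
import difflib


def build_name_index(pairs):
    """Map every known name (primary or alternate, upper-cased) to its primary name."""
    index = {}
    for primary, alternate in pairs:
        index[primary.upper()] = primary
        index[alternate.upper()] = primary
    return index


def resolve(name, index, cutoff=0.8):
    """Resolve `name` to a primary name: exact match first, then closest spelling."""
    key = name.upper()
    if key in index:
        return index[key]
    close = difflib.get_close_matches(key, list(index), n=1, cutoff=cutoff)
    return index[close[0]] if close else None


# Pairs taken from the output of the cell above.
pairs = [
    ("AUCHTERTOUL", "AUCHTERTEEL"),
    ("ABBS HEAD", "ST ABBS HEAD"),
    ("AVENDALE", "STRATHAVEN"),
    ("GLASSFORD", "GLASFORD"),
    ("ATHOL", "ATHOLE"),
]
index = build_name_index(pairs)
print(resolve("Strathaven", index))   # exact match on an alternate name
print(resolve("Strathavon", index))   # unlisted spelling, resolved by similarity
```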

## Block B: Five key competency questions

CQ1: How is a place described over time?

CQ2: How is an article extracted?

CQ3: Where is the place that an article primarily describes?

CQ4: What places are mentioned in an article?

CQ5: How do the socio-economic roles of a place change over time?


### CQ1: How is a place described over time?

Three queries are introduced here to:
1. Get the URIs of all concepts described in location records named 'Brucehaven'.
2. Given one of these concepts, list all location records that describe it.
3. Given one of these concepts, show how it is described in Wikidata.

This is essential for:
- Understanding how a place was described across years
- Supporting disambiguation, temporal analysis, or cross-edition alignment

#### Query 1.1:
This query gets the URIs of all concepts (`skos:Concept`) described in location records named 'Brucehaven'. Concepts and location records are linked via the `hto:hasConceptRecord` property. Multiple concepts for 'Brucehaven' would indicate several different interpretations of the name.


In [28]:
# Get all concept uris of the place Brucehaven
sparql.setQuery("""
PREFIX hto: <https://w3id.org/hto#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT DISTINCT ?concept WHERE {
    ?concept a skos:Concept ;
          hto:hasConceptRecord ?record.
    ?record a hto:LocationRecord ;
        rdfs:label ?label.
    FILTER (LCASE(?label) = LCASE("Brucehaven"))
}
""")

try:
    ret = sparql.queryAndConvert()
    concepts = {}
    for r in ret["results"]["bindings"]:
        print(f"Concept uri for the Brucehaven: {r['concept']['value']}")
except Exception as e:
    print(e)

Concept uri for the Brucehaven: https://w3id.org/hto/Concept/gaz2337271726_1
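
The place name is hard-coded in the query above. To reuse it for other places, the name can be injected as a properly escaped string literal. The sketch below is one way to do that; `sparql_string_literal` and `concept_query` are hypothetical helper names, and the escaping covers only backslashes, double quotes, and line breaks, which is what a double-quoted SPARQL literal requires.

```python
def sparql_string_literal(value: str) -> str:
    """Quote a Python string as a double-quoted SPARQL string literal."""
    escaped = (
        value.replace("\\", "\\\\")   # backslash first, so later escapes survive
             .replace('"', '\\"')
             .replace("\n", "\\n")
             .replace("\r", "\\r")
    )
    return f'"{escaped}"'


# Same graph pattern as the cell above, with the name left as a placeholder.
CONCEPT_QUERY = """
PREFIX hto: <https://w3id.org/hto#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT DISTINCT ?concept WHERE {
    ?concept a skos:Concept ;
          hto:hasConceptRecord ?record .
    ?record a hto:LocationRecord ;
        rdfs:label ?label .
    FILTER (LCASE(?label) = LCASE(%s))
}
"""


def concept_query(place_name: str) -> str:
    """Build the concept lookup query for an arbitrary place name."""
    return CONCEPT_QUERY % sparql_string_literal(place_name)


print(concept_query("Dundee"))
```

The resulting string can be passed to `sparql.setQuery(...)` exactly as in the cell above.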


#### Query 1.2:
Given one of the concept URIs, this query lists all location records that describe it, ordered by publication year.

Each result includes:
1. The location record (`hto:LocationRecord`) with its name (`rdfs:label`) and URI.
2. The start and end pages of the record (`hto:startsAtPage`, `hto:endsAtPage`), with their page numbers (`hto:number`).
3. The volume (`hto:Volume`) containing the record, with its title (`hto:title`) and the publication year of its series (`hto:yearPublished`).
4. A sample of the full text (`hto:text`).

In [30]:
# How the concept https://w3id.org/hto/Concept/gaz2337271726_1 is described across editions.
import string
sparql.setQuery("""
PREFIX hto: <https://w3id.org/hto#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX schema: <https://schema.org/>

SELECT * WHERE {
    <https://w3id.org/hto/Concept/gaz2337271726_1> a skos:Concept ;
          hto:hasConceptRecord ?record.
    ?record a hto:LocationRecord ;
        rdfs:label ?name;
        hto:startsAtPage ?s_page ;
        hto:endsAtPage ?e_page ;
        hto:hasOriginalDescription ?desc .
    ?desc hto:text ?text .
    ?s_page a hto:Page;
        hto:number ?s_page_num.
    ?e_page a hto:Page;
        hto:number ?e_page_num.
    ?volume a hto:Volume;
        hto:title ?vol_title ;
        schema:hasPart ?s_page.
    ?series a hto:Series;
        schema:hasPart ?volume;
        hto:title ?series_title ;
        hto:yearPublished ?year .
} ORDER BY ?year
""")

try:
    ret = sparql.queryAndConvert()
    for r in ret["results"]["bindings"]:
        print(f"📘 Record: {string.capwords(r['name']['value'])} - {r['record']['value']}")
        print(f"\t📄 in page: {r['s_page_num']['value']} -{r['e_page_num']['value']}")
        print(f"\t📚 in volume: <{r['vol_title']['value']}> published in {r['year']['value']}")
        print("with the following description:")
        print(f"{r['text']['value'][:200]}.....\n")
except Exception as e:
    print(e)

📘 Record: Brucehaven - https://w3id.org/hto/LocationRecord/9931344583804340_97421702_2337271726_0
	📄 in page: 38 -38
	📚 in volume: <Gazetteer of Scotland; arranged under the various descriptions of counties, parishes, islands 1825?> published in 1825
with the following description:
a small village in the parish of Dunfermline, Fifeshire, adjoining the village of Limekilns, where there is a brewery and a quay......

📘 Record: Brucehaven - https://w3id.org/hto/LocationRecord/9910440713804340_97424370_2337271726_0
	📄 in page: 142 -142
	📚 in volume: <gazetteer of Scotland. [With plates and maps.] 1838, Volume 1> published in 1838
with the following description:
a small village in Fife, on the coast of the Firth of Forth, in the parish of Dunfermline......

📘 Record: Brucehaven - https://w3id.org/hto/LocationRecord/9928112733804340_97459138_2337271726_0
	📄 in page: 305 -305
	📚 in volume: <imperial gazetteer of Scotland; or, Dictionary of Scottish topography, compi
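
Once the descriptions are ordered by year, consecutive editions can be compared directly. As a rough illustration (not a method used by the notebook's authors), the standard-library `difflib` module can score how similar two descriptions are and list the words that were dropped or introduced. The two texts are the 1825 and 1838 Brucehaven descriptions shown above; the helper names are hypothetical.

```python
import difflib

editions = {
    1825: "a small village in the parish of Dunfermline, Fifeshire, adjoining "
          "the village of Limekilns, where there is a brewery and a quay.",
    1838: "a small village in Fife, on the coast of the Firth of Forth, "
          "in the parish of Dunfermline.",
}


def similarity(old: str, new: str) -> float:
    """Word-level similarity between two descriptions, from 0.0 to 1.0."""
    return difflib.SequenceMatcher(None, old.split(), new.split()).ratio()


def word_changes(old: str, new: str):
    """Return (words only in `old`, words only in `new`), in text order."""
    removed, added = [], []
    for token in difflib.ndiff(old.split(), new.split()):
        if token.startswith("- "):
            removed.append(token[2:])
        elif token.startswith("+ "):
            added.append(token[2:])
    return removed, added


removed, added = word_changes(editions[1825], editions[1838])
print(f"similarity: {similarity(editions[1825], editions[1838]):.2f}")
print("dropped in 1838:", removed)
print("new in 1838:", added)
```

Tokens keep their punctuation here (for example `Forth,`), so a real comparison would likely normalise the text first.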

#### Query 1.3:
Given one of the above concepts, this [federated query](https://www.w3.org/TR/sparql11-federated-query/) retrieves the linked Wikidata item that describes it, combining the SPARQL query service for our knowledge graph with the one provided by Wikidata (`https://query.wikidata.org/sparql`). The result includes the Wikidata item URI, its label, and its description (`schema:description`).

In [31]:
# How the concept https://w3id.org/hto/Concept/gaz2337271726_1 is described in Wikidata.
sparql.setQuery("""
PREFIX hto: <https://w3id.org/hto#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <http://schema.org/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?record ?label ?description WHERE {
    <https://w3id.org/hto/Concept/gaz2337271726_1> a skos:Concept ;
        hto:hasConceptRecord ?record.
    ?record a hto:ExternalRecord;
        hto:hasAuthorityType hto:WikidataItem.
    SERVICE <https://query.wikidata.org/sparql> {
        ?record rdfs:label ?label ;
            schema:description ?description .
        FILTER (lang(?label) = "en")
        FILTER (lang(?description) = "en")
    }
}
""")

try:
    ret = sparql.queryAndConvert()
    concepts = {}
    for r in ret["results"]["bindings"]:
        #print(r['record']['value'])
        print(f"{r['label']['value']} has the following description in Wikidata <{r['record']['value']}>:\n {r['description']['value']}")
except Exception as e:
    print(e)


Brucehaven has the following description in Wikidata <http://www.wikidata.org/entity/Q56614958>:
 architectural structure in Fife, Scotland, UK


### CQ2: How is an article extracted?

This question concerns the provenance of an article, including its source files and the method used during extraction. This query retrieves this information for the Brucehaven article in the 1825 edition.

Each result includes:
- The description text (`hto:text`) of the article.
- The extraction tool (`prov:wasAttributedTo`).
- The label of the source where this article was extracted from.
- The dataset which the source belongs to.

This query is essential for tracing the extraction provenance of an article.

In [35]:
article_uri = "<https://w3id.org/hto/LocationRecord/9931344583804340_97421702_2337271726_0>"
sparql.setQuery("""
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX hto: <https://w3id.org/hto#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX schema: <https://schema.org/>

SELECT ?text ?extraction_tool ?source_label ?dataset
WHERE {
    %s hto:hasOriginalDescription ?desc .
    ?desc hto:text ?text ;
        hto:wasExtractedFrom ?source ;
        prov:wasAttributedTo ?extraction_tool .
    ?extraction_tool a prov:SoftwareAgent .
    ?source rdfs:label ?source_label .
    ?dataset schema:hasPart ?source .
}
""" % article_uri)

try:
    ret = sparql.queryAndConvert()
    concepts = {}
    for r in ret["results"]["bindings"]:
        print(f"📘Record {article_uri} has the following text:\n {r['text']['value']}\n"
              f"📄this text is extracted from {r['source_label']['value']} using {r['extraction_tool']['value']} \n"
              f"\t this source is part of the dataset: {r['dataset']['value']}")
except Exception as e:
    print(e)

📘Record <https://w3id.org/hto/LocationRecord/9931344583804340_97421702_2337271726_0> has the following text:
 a small village in the parish of Dunfermline, Fifeshire, adjoining the village of Limekilns, where there is a brewery and a quay.
📄this text is extracted from 97421702/alto/97422152.34.xml using https://github.com/francesNLP/MappingChange 
	 this source is part of the dataset: https://data.nls.uk/data/digitised-collections/gazetteers-of-scotland/
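
The cell above splices `article_uri` into the query with `%`, with the angle brackets written by hand. When the URI comes from user input or from another query's results, it is worth checking it before interpolation. The sketch below uses a hypothetical helper, `as_iri_ref`, which rejects the characters that the SPARQL grammar does not allow inside `<...>` and adds the brackets itself.

```python
import re

# Characters forbidden inside a SPARQL IRI reference: < > " { } | ^ ` \
# and the control/space range U+0000 to U+0020.
_FORBIDDEN_IN_IRI = re.compile(r'[<>"{}|^`\\\x00-\x20]')


def as_iri_ref(uri: str) -> str:
    """Wrap `uri` in angle brackets for use in a query, or raise ValueError."""
    if _FORBIDDEN_IN_IRI.search(uri):
        raise ValueError(f"not usable as a SPARQL IRI: {uri!r}")
    return f"<{uri}>"


article = as_iri_ref(
    "https://w3id.org/hto/LocationRecord/9931344583804340_97421702_2337271726_0"
)
print(article)
```

The returned value has the same form as the `article_uri` string defined in the cell above, so it can be used with the same `%` template.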


### CQ3: Where is the place that an article primarily describes?

Visualizing places on a map provides an immediate spatial reference and requires either point coordinates or boundary geometries. When both historical and modern geometries are available, spatial change can be examined visually over time. The query retrieves the coordinates associated with the article describing Dundee in the 1803 edition, as well as the coordinates supplied by a modern gazetteer.

The resulting map visualization displays only the modern coordinates recognized in 2025, as this work does not extract historical geometries from the text due to their complexity and irregularity.

This query is essential for visualizing historical and modern geometries of a place to track spatial change over time.

In [37]:
sparql.setQuery("""
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX hto: <https://w3id.org/hto#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX crmgeo: <http://www.ics.forth.gr/isl/CRMgeo/>
PREFIX crm: <http://www.cidoc-crm.org/cidoc-crm/>

SELECT ?modern_place_wkt ?historical_place_wkt
?modern_place_wktLabel ?historical_place_wktLabel
WHERE {
  VALUES ?record {
    <https://w3id.org/hto/LocationRecord/9931003343804340_97343436_769119998_0>
  }
  ?record hto:refersToModernPlace ?modern_place ;
          rdfs:label ?record_name .
  ?modern_place a crmgeo:SP2_Phenomenal_Place ;
                geo:hasCentroid ?centroid .
  ?centroid a crmgeo:SP6_Declarative_Place ;
            geo:asWKT ?modern_place_wkt .
  ?modern_sp crm:P161_has_spatial_projection ?modern_place ;
             crm:P160_has_temporal_projection ?mp_temporal .
  ?mp_temporal rdfs:label ?modern_place_temporal_label .
  BIND(CONCAT(?record_name, ", ", ?modern_place_temporal_label)
       AS ?modern_place_wktLabel)
  OPTIONAL {
    ?record hto:describesPlace ?historical_place .
    ?historical_place a crmgeo:SP2_Phenomenal_Place ;
                      geo:hasCentroid ?historical_centroid .
    ?historical_centroid a crmgeo:SP6_Declarative_Place ;
                         geo:asWKT ?historical_place_wkt .
    ?historical_sp crm:P161_has_spatial_projection ?historical_place ;
                   crm:P160_has_temporal_projection ?hp_temporal .
    ?hp_temporal rdfs:label ?historical_place_temporal_label .
    BIND(CONCAT(?record_name, ", ", ?historical_place_temporal_label)
         AS ?historical_place_wktLabel)
  }
}
""")

try:
    ret = sparql.queryAndConvert()
    for r in ret["results"]["bindings"]:
        print(f"📍Centroid coordinates for {r['modern_place_wktLabel']['value']}: {r['modern_place_wkt']['value']}")
        if "historical_place_wkt" in r:
            print(f"📍Centroid coordinates for {r['historical_place_wktLabel']['value']}: {r['historical_place_wkt']['value']}")
except Exception as e:
    print(e)

📍Centroid coordinates for DUNDEE, 2025: POINT(-2.97489 56.46913)
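
Most mapping libraries expect numeric latitude and longitude, not a WKT string. This is a minimal sketch of the conversion, assuming the literals are plain 2D points with no CRS prefix and in longitude-latitude order, as the Dundee value above suggests. `parse_wkt_point` is a hypothetical helper; a full WKT parser such as the one in `shapely` would be more robust.

```python
import re

_NUMBER = r"(-?\d+(?:\.\d+)?)"
_POINT = re.compile(rf"^\s*POINT\s*\(\s*{_NUMBER}\s+{_NUMBER}\s*\)\s*$", re.IGNORECASE)


def parse_wkt_point(wkt: str):
    """Parse 'POINT(lon lat)' into a (lon, lat) tuple of floats."""
    match = _POINT.match(wkt)
    if match is None:
        raise ValueError(f"not a simple WKT point: {wkt!r}")
    return float(match.group(1)), float(match.group(2))


lon, lat = parse_wkt_point("POINT(-2.97489 56.46913)")
print(f"Dundee: latitude {lat}, longitude {lon}")
```

Note the swap when plotting: WKT stores longitude first, while many map APIs take `(lat, lon)`.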


### CQ4: What places are mentioned in an article?
When neither historical nor modern coordinates are available for the primary place, the locations mentioned within the article can provide valuable clues for approximating its spatial context. This query takes the Brucehaven article as an example and lists all places mentioned in it, together with their modern coordinates and their positions in the text, which allows further examination of spatial relations.

In [41]:
sparql.setQuery("""
    PREFIX geo: <http://www.opengis.net/ont/geosparql#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX hto: <https://w3id.org/hto#>
    PREFIX oa: <http://www.w3.org/ns/oa#>
    PREFIX crmgeo: <http://www.ics.forth.gr/isl/CRMgeo/>

    SELECT ?modern_place_wkt ?modern_place_wktLabel
    WHERE {
      VALUES ?article
      {%s}
      ?article hto:hasOriginalDescription ?description.
      ?description hto:text ?text.
      ?annotation oa:hasBody ?location;
                  oa:hasTarget ?text_segment.
      ?location rdfs:label ?place_name;
                geo:hasCentroid ?centroid.
      ?centroid geo:asWKT ?modern_place_wkt.
      ?text_segment oa:hasSource ?description;
                    oa:hasSelector ?selector.
      ?selector oa:start ?start;
                oa:end ?end.
      BIND(CONCAT(?place_name, ", appears in text: | ",
          ?text, "| at ", STR(?start), "-", STR(?end)) AS ?modern_place_wktLabel)
    } ORDER BY ?start
    """ % article_uri
)

try:
    ret = sparql.queryAndConvert()
    text = ""
    for r in ret["results"]["bindings"]:
        print(f"{r['modern_place_wktLabel']['value']}\n This place has centroid coordinates for {r['modern_place_wkt']['value']}")

except Exception as e:
    print(e)

Dunfermline, appears in text: | a small village in the parish of Dunfermline, Fifeshire, adjoining the village of Limekilns, where there is a brewery and a quay.| at 33-44
 This place has centroid coordinates for POINT(-3.45887 56.07156)
Fifeshire, appears in text: | a small village in the parish of Dunfermline, Fifeshire, adjoining the village of Limekilns, where there is a brewery and a quay.| at 46-55
 This place has centroid coordinates for POINT(-3.0 56.33333)
Limekilns, appears in text: | a small village in the parish of Dunfermline, Fifeshire, adjoining the village of Limekilns, where there is a brewery and a quay.| at 82-91
 This place has centroid coordinates for POINT(-3.47713 56.03336)
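
The start and end positions returned above behave like Python slice indices (0-based, end-exclusive), so each mention can be cut straight out of the text. The sketch below checks this against the three results and then averages the mentioned centroids as a very rough stand-in for the location of the undescribed place. The averaging is a naive illustration under our own assumptions, not a method proposed by this notebook.

```python
text = ("a small village in the parish of Dunfermline, Fifeshire, adjoining "
        "the village of Limekilns, where there is a brewery and a quay.")

# (place name, start, end, (lon, lat)) copied from the output above.
mentions = [
    ("Dunfermline", 33, 44, (-3.45887, 56.07156)),
    ("Fifeshire", 46, 55, (-3.0, 56.33333)),
    ("Limekilns", 82, 91, (-3.47713, 56.03336)),
]


def offsets_match(text, mentions):
    """Check that every (start, end) pair slices out the mentioned name."""
    return all(text[start:end] == name for name, start, end, _ in mentions)


def mean_point(mentions):
    """Average the centroids of the mentioned places (naive, unweighted)."""
    lons = [point[0] for _, _, _, point in mentions]
    lats = [point[1] for _, _, _, point in mentions]
    return sum(lons) / len(lons), sum(lats) / len(lats)


print(offsets_match(text, mentions))
print(mean_point(mentions))
```

An unweighted mean treats a county (Fifeshire) the same as a neighbouring village (Limekilns), so weighting by place type or by phrases such as "adjoining" would be a natural refinement.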


### CQ5: How do the socio-economic roles of a place change over time?

Beyond identifying where a place is located, it is often crucial to understand how its socio-economic role evolves across editions. CQ5 therefore asks how economic and infrastructural activities associated with a place (e.g., harbours, mills, factories, railways) emerge, shift, or disappear over time. Using `hto:LocationRecord` together with the edition hierarchy (`hto:Edition`, `hto:Volume`, `hto:Page`) and the original article text (`hto:text`), we can follow these changes in a structured and temporally aware manner.

#### Query 5.A: Retrieve socio-economic descriptions across editions

This query identifies all editions in which socio-economic keywords appear in the article for a given place.

The query returns:
1. each edition year,
2. the article's URI,
3. a short snippet of descriptive text,
4. the modern centroid of the place as WKT (for mapping), and
5. a concatenated list of all socio-economic keywords detected in that edition.

This makes it possible to compare not only when socio-economic terms appear, but also which combinations occur together in the same description.

In [45]:

sparql.setQuery("""
    PREFIX rdfs:  <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX hto:   <https://w3id.org/hto#>
    PREFIX schema:<https://schema.org/>
    PREFIX geo:   <http://www.opengis.net/ont/geosparql#>
    PREFIX skos:  <http://www.w3.org/2004/02/skos/core#>

    SELECT ?year ?record ?snippet ?wkt
    (GROUP_CONCAT(DISTINCT ?kw; SEPARATOR=", ") AS ?keywordsFound)
    WHERE {
      # Place concept
      ?concept a skos:Concept ;
               rdfs:label "Edinburgh" ;
               hto:hasConceptRecord ?record .

      ?record a hto:LocationRecord ;
              hto:hasOriginalDescription ?desc ;
              hto:refersToModernPlace ?modernPlace .

      # Edition / year
      ?record hto:startsAtPage ?page .
      ?volume  schema:hasPart ?page .
      ?edition schema:hasPart ?volume ;
               hto:yearPublished ?year .

      # Text
      ?desc hto:text ?text .
      BIND(LCASE(?text) AS ?ltxt)

      # Socio-economic keywords
      VALUES ?kw { "harbour" "railway" "factory" "mill" }

      # Keep only articles where this keyword appears
      FILTER(CONTAINS(?ltxt, ?kw))

      # Geometry (for mapping if needed)
      ?modernPlace geo:hasCentroid ?geom .
      ?geom geo:asWKT ?wkt .

      # Snippet
      BIND(SUBSTR(?text, 1, 500) AS ?snippet)
    }
    GROUP BY ?year ?record ?snippet ?wkt
    ORDER BY ?year
    """
)

try:
    ret = sparql.queryAndConvert()
    for r in ret["results"]["bindings"]:
        year = r["year"]["value"]
        record_uri = r["record"]["value"]
        snippet = r["snippet"]["value"]
        keywords = r["keywordsFound"]["value"]
        wkt = r["wkt"]["value"]
        print(f"For article {record_uri} published in {year}\n"
              f"It describes the place has modern location here 📍: {wkt}\n"
              f"In the snippet of its description: {snippet}\n"
              f"We found these keywords📚: {keywords}\n")

except Exception as e:
    print(e)

For article https://w3id.org/hto/LocationRecord/9931344573804340_97414570_5424703086_0 published in 1806
It describes the place has modern location here 📍: POINT(-3.19648 55.95206)
In the snippet of its description: the metropolis of Scotland, and the county town of Mid-Lothian, to which county it often gives its name, lies in 55°57' N. latitude, and 3°14' W. longitude from London. It stands in the northern part of the county, about two miles S. from the Frith of Forth. The situation of the city is elevated, and it may be said without much impropriety, to stand on three hills. These run in a direction from E. to W. ; and the central hill, upon which the most ancient part of the city stands, is terminated on
We found these keywords📚: harbour

For article https://w3id.org/hto/LocationRecord/9931344583804340_97421702_5424703086_0 published in 1825
It describes the place has modern location here 📍: POINT(-3.19648 55.95206)
In the snippet of its description: the Metropolis of Sco
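
One caveat about the `FILTER(CONTAINS(?ltxt, ?kw))` test: it matches substrings, so `"mill"` also fires on words such as "million" or "miller". The sketch below contrasts that behaviour with a word-boundary match in Python. The sample sentence is invented for illustration, and the optional trailing "s" is a simplification, not a full treatment of plurals.

```python
import re

KEYWORDS = ["harbour", "railway", "factory", "mill"]


def substring_hits(text, keywords=KEYWORDS):
    """Keywords found anywhere in the text, like CONTAINS(LCASE(?text), ?kw)."""
    lowered = text.lower()
    return [kw for kw in keywords if kw in lowered]


def word_hits(text, keywords=KEYWORDS):
    """Keywords found as whole words, allowing a simple plural 's'."""
    lowered = text.lower()
    return [kw for kw in keywords
            if re.search(rf"\b{re.escape(kw)}s?\b", lowered)]


# Invented sentence, not taken from the Gazetteers.
sample = "A population of half a million; the old harbours were improved."
print(substring_hits(sample))
print(word_hits(sample))
```

On the SPARQL side, a comparable effect could be obtained with `REGEX(?text, "\\bmills?\\b", "i")`, at the cost of slower evaluation on long articles.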

#### Query 5.B: Aggregate socio-economic signals over time

This query complements the qualitative view by aggregating how many Edinburgh articles per year mention each socio-economic keyword. Instead of returning snippets, it produces a compact temporal profile of socio-economic terminology.

In [46]:
sparql.setQuery("""
    PREFIX rdfs:  <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX hto:   <https://w3id.org/hto#>
    PREFIX schema:<https://schema.org/>
    PREFIX geo:   <http://www.opengis.net/ont/geosparql#>
    PREFIX skos:  <http://www.w3.org/2004/02/skos/core#>

    SELECT ?year ?keyword (COUNT(DISTINCT ?record) AS ?articleCount)
    WHERE {
      # Fix the place concept (Edinburgh here)
      ?concept a skos:Concept ;
               rdfs:label "Edinburgh" ;
               hto:hasConceptRecord ?record .

      ?record a hto:LocationRecord ;
              hto:hasOriginalDescription ?desc ;
              hto:refersToModernPlace ?modernPlace .
      # Edition / year
      ?record hto:startsAtPage ?page .
      ?volume schema:hasPart ?page .
      ?edition schema:hasPart ?volume ;
               hto:yearPublished ?year .
      # Text of the article
      ?desc hto:text ?text .
      BIND(LCASE(?text) AS ?ltxt)
      # Socio-economic keywords
      VALUES ?keyword { "harbour" "railway" "factory" "mill" }
      # Keep rows where the article mentions this keyword
      FILTER(CONTAINS(?ltxt, ?keyword))
    }
    GROUP BY ?year ?keyword
    ORDER BY ?year ?keyword
    """
)

try:
    ret = sparql.queryAndConvert()
    for r in ret["results"]["bindings"]:
        year = r["year"]["value"]
        keyword = r["keyword"]["value"]
        article_counts = r["articleCount"]["value"]
        print(f"For articles in {year} describing Edinburgh, {article_counts} of them mentioned this keyword: {keyword} ")

except Exception as e:
    print(e)

For articles in 1806 describing Edinburgh, 1 of them mentioned this keyword: harbour 
For articles in 1825 describing Edinburgh, 1 of them mentioned this keyword: harbour 
For articles in 1825 describing Edinburgh, 1 of them mentioned this keyword: mill 
For articles in 1838 describing Edinburgh, 1 of them mentioned this keyword: factory 
For articles in 1838 describing Edinburgh, 1 of them mentioned this keyword: harbour 
For articles in 1838 describing Edinburgh, 1 of them mentioned this keyword: mill 
For articles in 1842 describing Edinburgh, 1 of them mentioned this keyword: harbour 
For articles in 1842 describing Edinburgh, 1 of them mentioned this keyword: mill 
For articles in 1846 describing Edinburgh, 1 of them mentioned this keyword: harbour 
For articles in 1846 describing Edinburgh, 1 of them mentioned this keyword: mill 
For articles in 1846 describing Edinburgh, 1 of them mentioned this keyword: railway 
For articles in 1868 describing Edinburgh, 1 of them mentioned thi
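
The flat (year, keyword) rows above are easier to read as a timeline. A small sketch of two possible summaries follows: the set of keywords per year, and the first year in which each keyword appears. The rows are copied from the complete lines of the output above (the 1868 rows are cut off in this notebook and therefore left out); the helper names are hypothetical.

```python
from collections import defaultdict

rows = [
    (1806, "harbour"),
    (1825, "harbour"), (1825, "mill"),
    (1838, "factory"), (1838, "harbour"), (1838, "mill"),
    (1842, "harbour"), (1842, "mill"),
    (1846, "harbour"), (1846, "mill"), (1846, "railway"),
]


def keywords_by_year(rows):
    """Group keywords into one set per publication year."""
    grouped = defaultdict(set)
    for year, keyword in rows:
        grouped[year].add(keyword)
    return dict(grouped)


def first_appearance(rows):
    """Return the earliest year in which each keyword is mentioned."""
    first = {}
    for year, keyword in sorted(rows):
        first.setdefault(keyword, year)
    return first


print(keywords_by_year(rows))
print(first_appearance(rows))
```

Given the substring matching used by the query, a first appearance here signals only that the string occurs in that edition's text, not necessarily that the activity was new.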