# Knowledge Exploration With SPARQL queries
This notebook explores the knowledge in knowledge graphs generated in this repository using sparql queries. Note that, this notebook are **only designed to test or explore sample knowledge graphs** as loading large graph here can be time-consuming!
Overall, we will parse an RDF source to a Graph using rdflib, and explore the knowledge by querying the graph.

Questions this knowledge graph should answer:
1. List all digitalised collections in this graph.
2. What volumes, editions, or series does a digitalised collection _C_ include?
3. What time period does a digitalised collection _C_ cover?
4. When was edition _E_, series _S_, or volume _V_ published?
5. Who published edition _E_, series _S_, or volume _V_?
6. Who edited edition _E_, series _S_, or volume _V_?
7. Which genre does an edition _E_, series _S_, or volume _V_ belongs to?
8. Where was an edition _E_, series _S_, or volume _V_ published or printed?
9. Which language did an edition _E_, series _S_, or volume _V_ use?
10. In EB, what articles a volume _V_ include?
11. Where an EB article _A_ was described (in a page, volume, edition)?
12. What are EB articles related to another EB article _T_?
13. What are EB articles which has similar description to _T_?
14. How a term with name _T_ was described in all editions?
15. What is the text in a page?
16. What sources the text descriptions of article _T_ or a page _P_ are extracted from?
17. What is the high description of term _T_?
18. What are the descriptions of term _T_ with the highest text quality?
19. What software was used to extract the description of article _T_ or a page _P_?
20. What software was used to digitise a document?
21. What is the primary name and alternative names of an article T
22. List all articles which have more than one name
23. List editions which was revision of another edition
24. Find the summary of the description of a topic article _A_
25. List all records for a concept
26. Given a description uri, track the source


## Load the graph

In [1]:
from rdflib import Graph, URIRef, Namespace

# Create a new RDFLib Graph
graph = Graph()

# Load the rdf file into the graph
ontology_file = "results/hto_eb.ttl"
graph.parse(ontology_file, format="turtle")
hto = Namespace("https://w3id.org/hto#")

In [2]:
# Print the number of "triples" in the Graph
print(f"Graph g has {len(graph)} statements.")

Graph g has 4150776 statements.


## Query the graph

### Question 1: List all digital collections

In [6]:
from rdflib.plugins.sparql import prepareQuery
query = prepareQuery('''
    PREFIX hto: <https://w3id.org/hto#>
    SELECT ?collection ?name WHERE {
        ?collection a hto:WorkCollection;
            hto:name ?name.
        FILTER (regex(?name, "Collection$", "i"))
        }
  '''
)

for r in graph.query(query):
      print("%s %s" % (r.collection, r.name))

https://w3id.org/hto/WorkCollection/EncyclopaediaBritannica Encyclopaedia Britannica Collection


### Question 2: What volumes, editions, or series does a digitalised collection _C_ include?

In [18]:
## List all editions of Encyclopaedia Britannica collection
eb_collection = "<https://w3id.org/hto/WorkCollection/EncyclopaediaBritannica>"

query = prepareQuery("""
    PREFIX hto: <https://w3id.org/hto#>
    SELECT * WHERE {
        %s  a hto:WorkCollection;
            hto:hadMember ?edition.
        ?edition a hto:Edition;
            hto:mmsid ?mmsid;
            hto:title ?title.
        OPTIONAL {
            ?edition hto:subtitle ?subtitle;
        }
        OPTIONAL {
            ?edition hto:number ?number.
        }
} ORDER BY ?number
    """ % eb_collection
)

try:
    for r in graph.query(query):
        subtitle = r["subtitle"]
        number = r["number"]
        print("Edition uri: %s | MMSID: %s | title: %s | subtitle: %s | number: %s" % (r["edition"], r["mmsid"], r["title"], subtitle,  number))
except Exception as e:
    print(e)

Edition uri: https://w3id.org/hto/Edition/9910796343804340 | MMSID: 9910796343804340 | title: Supplement to the third edition of the Encyclopaedia Britannica ... Illustrated with ... copperplates | subtitle: None | number: None
Edition uri: https://w3id.org/hto/Edition/9910796373804340 | MMSID: 9910796373804340 | title: Supplement to the fourth, fifth and sixth editions of the Encyclopaedia Britannica | subtitle: None | number: None
Edition uri: https://w3id.org/hto/Edition/9929192893804340 | MMSID: 9929192893804340 | title: Encyclopaedia Britannica: or, A dictionary of arts and sciences | subtitle: compiled upon a new plan. In which the different sciences and arts are digested into distinct treatises or systems; and the various technical terms, &c. are explained as they occur in the order of the alphabet. Illustrated with one hundred and sixty copperplates | number: 1
Edition uri: https://w3id.org/hto/Edition/992277653804341 | MMSID: 992277653804341 | title: Encyclopaedia Britannica; 

In [19]:
## List all volumes in Encyclopaedia Britannica 7th Edition
eb_edition7_uri = "<https://w3id.org/hto/Edition/9910796273804340>"

query = prepareQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     SELECT * WHERE {
        %s  a hto:Edition;
            hto:hadMember ?volume.
        ?volume a hto:Volume;
            hto:title ?title;
            hto:number ?number;
            hto:volumeId ?volumeId;
            hto:permanentURL ?permanentURL.
        OPTIONAL {
            ?volume hto:letters ?letters;
        }
    } ORDER BY ?number
    """ % eb_edition7_uri
)

try:
    for r in graph.query(query):
        letters = r["letters"]
        print("Volume uri: %s | title: %s | number: %s | id: %s |  permanent url: %s | letters: %s" % (r["volume"], r["title"] ,r["number"], r["volumeId"], r["permanentURL"], letters))
except Exception as e:
    print(e)

Volume uri: https://w3id.org/hto/Volume/9910796273804340_192547789 | title: Seventh edition, General index | number: 0 | id: 192547789 |  permanent url: https://digital.nls.uk/192547789 | letters: None
Volume uri: https://w3id.org/hto/Volume/9910796273804340_192984258 | title: Seventh edition, Volume 1, Preliminary dissertations | number: 1 | id: 192984258 |  permanent url: https://digital.nls.uk/192984258 | letters: Preliminarydissertations
Volume uri: https://w3id.org/hto/Volume/9910796273804340_192984259 | title: Seventh edition, Volume 2, A-Anatomy | number: 2 | id: 192984259 |  permanent url: https://digital.nls.uk/192984259 | letters: A-Anatomy
Volume uri: https://w3id.org/hto/Volume/9910796273804340_193057500 | title: Seventh edition, Volume 3, Anatomy-Astronomy | number: 3 | id: 193057500 |  permanent url: https://digital.nls.uk/193057500 | letters: Anatomy-Astronomy
Volume uri: https://w3id.org/hto/Volume/9910796273804340_193108322 | title: Seventh edition, Volume 4, Astronomy

### Question 3: What time period does a digitalised collection _C_ cover? (next version)

In [11]:
# At current version, this can be done check the publication years of a collection.
# For example, the publication year of each edition for EB collection.
eb_collection_uri = "<https://w3id.org/hto/WorkCollection/EncyclopaediaBritannica>"
query = prepareQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     SELECT * WHERE {
        %s hto:hadMember ?edition.
        ?edition hto:yearPublished ?yearPublished.
    }
    """ % eb_collection_uri
)

try:
    years = []
    for r in graph.query(query):
            years.append(int(r["yearPublished"]))
    years.sort()
    print(years)
except Exception as e:
    print(e)

[1771, 1773, 1778, 1797, 1801, 1810, 1815, 1823, 1824, 1842, 1853]


In [33]:
# TODO Get time period which a digitalised collection _C_ cover through direct link to the collection (next version)

### Question 4: When was edition _E_, series _S_, or volume _V_ published?

In [12]:
eb_edition7_uri = "<https://w3id.org/hto/Edition/9910796273804340>"

query = prepareQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     SELECT * WHERE {
        %s  a hto:Edition;
            hto:mmsid ?mmsid;
            hto:title ?title;
            hto:number ?number;
            hto:yearPublished ?year_published;
    }
    """ % eb_edition7_uri
)

try:
    for r in graph.query(query):
        print("MMSID: %s | title: %s |  number: %s | year published: %s " % (r["mmsid"], r["title"] ,r["number"], r["year_published"]))
except Exception as e:
    print(e)


MMSID: 9910796273804340 | title: Encyclopaedia Britannica |  number: 7 | year published: 1842 


### Question 5: Who published edition _E_, series _S_, or volume _V_?

In [24]:
eb_edition7_uri = "<https://w3id.org/hto/Edition/9910796273804340>"

query = prepareQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     PREFIX foaf: <http://xmlns.com/foaf/0.1/>
     SELECT * WHERE {
        %s  a hto:Edition;
            hto:mmsid ?mmsid;
            hto:title ?title;
            hto:number ?number;
            hto:yearPublished ?year_published.
        OPTIONAL {
            %s hto:publisher ?publisher.
            ?publisher foaf:name ?publisher_name;
        }
    }
    """ % (eb_edition7_uri, eb_edition7_uri)
)

try:
    for r in graph.query(query):
        publisher = r["publisher"]
        publisher_name = r["publisher_name"]
        print("MMSID: %s | title: %s | number: %s | year published: %s | | (publisher: %s | name: %s) " % (r["mmsid"], r["title"] ,r["number"], r["year_published"], publisher, publisher_name))
except Exception as e:
    print(e)

MMSID: 9910796273804340 | title: Encyclopaedia Britannica | number: 7 | year published: 1842 | | (publisher: None | name: None) 


### Question 6: Who edited edition _E_, series _S_, or volume _V_?

In [25]:
eb_edition7_uri = "<https://w3id.org/hto/Edition/9910796273804340>"

query = prepareQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     PREFIX foaf: <http://xmlns.com/foaf/0.1/>
     SELECT * WHERE {
        %s  a hto:Edition;
            hto:mmsid ?mmsid;
            hto:title ?title;
            hto:number ?number;
            hto:yearPublished ?year_published.
        OPTIONAL {
            %s hto:editor ?editor.
            ?editor a hto:Person;
                foaf:name ?editor_name;
            OPTIONAL {
                ?editor hto:birthYear ?birthYear.
            }
            OPTIONAL {
                ?editor hto:deathYear ?deathYear.
            }
            OPTIONAL {
                ?editor hto:termsOfAddress ?termsOfAddress.
            }
        }
}
    """ % (eb_edition7_uri, eb_edition7_uri)
)

try:

    for r in graph.query(query):
        editor = r["editor"]
        editor_name = r["editor_name"]
        birthYear = r["birthYear"]
        deathYear = r["deathYear"]
        termsOfAddress = r["termsOfAddress"]
        print("MMSID: %s | title: %s | number: %s | year published: %s | (editor: %s | name: %s | terms of address: %s | %s-%s ) " % (r["mmsid"], r["title"] ,r["number"], r["year_published"], editor, editor_name, termsOfAddress, birthYear, deathYear))
except Exception as e:
    print(e)

MMSID: 9910796273804340 | title: Encyclopaedia Britannica | number: 7 | year published: 1842 | (editor: https://w3id.org/hto/Person/1436491835 | name: Stewart, Dugald | terms of address: Sir | 1753-1828 ) 


### Question 7: Which genre does a volume _V_ belongs to?

In [26]:
eb_edition7 = "<https://w3id.org/hto/Edition/9910796273804340>"

query = prepareQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     SELECT * WHERE {
        %s  a hto:Edition;
            hto:mmsid ?mmsid;
            hto:title ?title;
            hto:number ?number;
            hto:genre ?genre;
    }
    """ % eb_edition7
)

try:
    for r in graph.query(query):
        print("MMSID: %s | title: %s |  number: %s | genre: %s " % (r["mmsid"], r["title"] ,r["number"], r["genre"]))
except Exception as e:
    print(e)

MMSID: 9910796273804340 | title: Encyclopaedia Britannica |  number: 7 | genre: encyclopedia 


### Question 8: Where was a volume _V_ published or printed?

In [27]:
eb_edition7_uri = "<https://w3id.org/hto/Edition/9910796273804340>"

query = prepareQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
     SELECT * WHERE {
        %s  a hto:Edition;
            hto:mmsid ?mmsid;
            hto:title ?title;
            hto:number ?number;
            hto:printedAt ?placePrinted_uri;
            hto:shelfLocator ?shelfLocator_uri.
        ?placePrinted_uri rdfs:label ?placePrinted.
        ?shelfLocator_uri rdfs:label ?shelfLocator.
    }
    """ % eb_edition7_uri
)

try:
    for r in graph.query(query):
        print("MMSID: %s | title: %s |  number: %s | place printed: %s | shelf locator: %s " % (r["mmsid"], r["title"] ,r["number"], r["placePrinted"], r["shelfLocator"]))
except Exception as e:
    print(e)

MMSID: 9910796273804340 | title: Encyclopaedia Britannica |  number: 7 | place printed: Edinburgh | shelf locator: EB.15 


### Question 9: Which language does a volume _V_ use?

In [28]:
eb_edition7_uri = "<https://w3id.org/hto/Edition/9910796273804340>"
query = prepareQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     SELECT * WHERE {
        %s  a hto:Edition;
            hto:mmsid ?mmsid;
            hto:title ?title;
            hto:number ?number;
            hto:language ?language;
}
    """ % eb_edition7_uri
)

try:
    for r in graph.query(query):
        print("MMSID: %s | title: %s |  number: %s | language: %s" % (r["mmsid"], r["title"],r["number"], r["language"]))
except Exception as e:
    print(e)

MMSID: 9910796273804340 | title: Encyclopaedia Britannica |  number: 7 | language: eng


### Question 10: In EB, what articles a volume _V_ include?

In [36]:
# List 20 articles in volume 2 of 7th edition.
eb_edition7_volume2 = "<https://w3id.org/hto/Volume/9910796273804340_192984259>"

query = prepareQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     SELECT * WHERE {
        %s a hto:Volume;
            hto:hadMember ?page.
        ?termRecord a ?term_type;
                hto:name ?name.
        FILTER (?term_type = hto:ArticleTermRecord || ?term_type = hto:TopicTermRecord)
    } LIMIT 20
    """ % eb_edition7_volume2
)

try:
    
    for r in graph.query(query):
        print("Term uri: %s, name: %s" % (r["termRecord"], r["name"]))
except Exception as e:
    print(e)

Term uri: https://w3id.org/hto/ArticleTermRecord/9910796233804340_191253819_1007164464_0, name: ALZIRA
Term uri: https://w3id.org/hto/ArticleTermRecord/9910796233804340_191253819_1018837245_0, name: ALEURITES
Term uri: https://w3id.org/hto/ArticleTermRecord/9910796233804340_191253819_102256234_0, name: ALTIN
Term uri: https://w3id.org/hto/ArticleTermRecord/9910796233804340_191253819_102256234_1, name: ALTIN
Term uri: https://w3id.org/hto/ArticleTermRecord/9910796233804340_191253819_103000667_0, name: ALAN
Term uri: https://w3id.org/hto/ArticleTermRecord/9910796233804340_191253819_1033477591_0, name: ALGOL
Term uri: https://w3id.org/hto/ArticleTermRecord/9910796233804340_191253819_1083154598_0, name: ALMEHRAB
Term uri: https://w3id.org/hto/ArticleTermRecord/9910796233804340_191253819_1115449109_0, name: ALEXIS
Term uri: https://w3id.org/hto/ArticleTermRecord/9910796233804340_191253819_1148828355_0, name: ALABARCHA
Term uri: https://w3id.org/hto/ArticleTermRecord/9910796233804340_1912538

### Question 11: Where an EB article _A_ was described (in a page, volume, edition)?

In [53]:
# Show which page an article starts at, which volume and edition, collection, this article was described
article_uri = "<https://w3id.org/hto/ArticleTermRecord/9910796233804340_193108317_6364534740_0>"

query = prepareQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     SELECT * WHERE {
        %s a hto:ArticleTermRecord;
            hto:name ?term_name;
            hto:startsAtPage ?page.
        ?volume a hto:Volume;
                hto:hadMember ?page.
        ?edition hto:hadMember ?volume.
        ?collection hto:hadMember ?edition.
    } 
    """ % article_uri
)

try:
    for r in graph.query(query):
        print("term name: %s | starts at page: %s | volume: %s | edition %s | collection: %s" % (r["term_name"], r["page"], r["volume"], r["edition"], r["collection"]))
except Exception as e:
    print(e)

term name: SUGAR | starts at page: https://w3id.org/hto/Page/9910796233804340_193108317_435 | volume: https://w3id.org/hto/Volume/9910796233804340_193108316 | edition https://w3id.org/hto/Edition/9910796233804340 | collection: https://w3id.org/hto/WorkCollection/EncyclopaediaBritannica


### Question 12: What are EB articles related to another EB article _T_?

In [54]:
article_sugar = "<https://w3id.org/hto/ArticleTermRecord/992277653804341_144133903_6364534740_0>"

query = prepareQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     SELECT * WHERE {
        %s a hto:ArticleTermRecord;
            hto:name ?name;
        OPTIONAL {
            %s hto:refersTo ?see_term.
            ?see_term a ?term_type;
                hto:name ?see_term_name
            FILTER (?term_type = hto:ArticleTermRecord || ?term_type = hto:TopicTermRecord)
        }
    }
    """ % (article_sugar, article_sugar)
)

try:
    for r in graph.query(query):
        refersTo = r["see_term"]
        print("term: %s %s | see also term: %s %s" % (article_sugar, r["name"], refersTo, r["see_term_name"]))
except Exception as e:
    print(e)

term: <https://w3id.org/hto/ArticleTermRecord/992277653804341_144133903_6364534740_0> SUGAR | see also term: https://w3id.org/hto/ArticleTermRecord/992277653804341_144133903_790527472_0 SACCHARUM


### Question 13: What are EB articles which has similar description to _T_?

In [55]:
# In current version of KGs,the similar terms are not linked as they can be queried through elastic search service. So example below will print nothing
article_sugar = "<https://w3id.org/hto/ArticleTermRecord/992277653804341_144133903_6364534740_0>"

query = prepareQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     SELECT * WHERE {
        %s a hto:TermRecord;
            hto:name ?name.
        OPTIONAL {
            %s hto:similarTo ?similar_term.
            ?similar_term a hto:TermRecord.
        }
    }
    """ % (article_sugar, article_sugar)
)

try:
    for r in graph.query(query):
        similar_term = r["similar_term"]
        print("term: %s | similar term: %s" % (r["name"], similar_term))
except Exception as e:
    print(e)

### Question 14: How a term with name _T_ was described in all editions?

In [64]:
term_name = "'EARTH'"
# TODO: fix bug: literal value in RDFlib query does not work

query = prepareQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     SELECT * WHERE {
        ?term a hto:ArticleTermRecord;
            hto:name %s;
            hto:startsAtPage ?page;
            hto:hasOriginalDescription ?desc.
        ?desc hto:text ?content.
        ?vol hto:hadMember ?page.
        ?edition a hto:Edition;
            hto:hadMember ?vol;
            hto:title ?title.
        OPTIONAL {
            ?edition hto:number ?number.
        }
    } ORDER BY ?number
    """ % term_name
)

try:
    for r in graph.query(query):
        edition_number = r["number"]
        print("term uri: %s | description: %s |  edition: %s | edition title: %s | edition number: %s" % (r["term"], r["content"], r["edition"], r["title"], edition_number))
except Exception as e:
    print(e)

### Question 15: What is the text in a page?

In [65]:
# check the text of a page from Chapbooks collection
query = prepareQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     SELECT * WHERE {
        ?page a hto:Page;
            hto:hasOriginalDescription ?desc.
        ?desc hto:text ?text;
            hto:hasTextQuality ?textQuality.
    }
    LIMIT 5
    """
)

try:
    for r in graph.query(query):
        print("page uri: %s | content: %s | quality: %s" % (r["page"], r["text"], r["textQuality"]))
except Exception as e:
    print(e)

### Question 16: What sources the text descriptions of article _T_ or a page _P_ are extracted from?

In [67]:
article_sugar = "<https://w3id.org/hto/ArticleTermRecord/992277653804341_144133903_6364534740_0>"

query = prepareQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     PREFIX prov: <http://www.w3.org/ns/prov#>
     SELECT * WHERE {
        %s a hto:ArticleTermRecord;
            hto:name ?name;
            hto:hasOriginalDescription ?desc.
        ?desc hto:wasExtractedFrom ?source;
            hto:text ?text.
        ?source prov:wasAttributedTo ?agent.
    }
    """ % article_sugar
)

try:
    for r in graph.query(query):
        print("name: %s | source: %s | agent: %s | description: %s " % (r["name"], r["source"], r["agent"], r["text"]))
except Exception as e:
    print(e)

name: SUGAR | source: https://w3id.org/hto/InformationResource/__data_EB_1_3_clean_cut_txt | agent: https://w3id.org/hto/Organization/Ash | description: in natural history, is properly the essential salt of the sugar-cane, as tartar is of the grape. See CHEMISTRY p. 161. and SACCHARUM. This plant tises to eight, nine, or more feet high ; the stalk, conic earthen. stalk, or cane, being round, jointed, and two or three inches in diameter at the bottom : the joints are three or four inches afunder, and in a rich foil more : the leaves are long and narrow and of a yellowish green colour whio s ornamented is is also the stalk itself, the top t ciodier of ariadincos tiowers, two with a panicle or three feet in length. They propagate the sugar-cane, by planting cuttings of it in the ground in furrows, dug parailel for that purpose ; the cuttings are laid level and even, and are covered up with earth ; they soon sho nt out new plants from their knots or joints : the ground is to be kept clear,

### Question 17: What is the High quality description of term _T_?

In [68]:
article_sugar = "<https://w3id.org/hto/ArticleTermRecord/992277653804341_144133903_6364534740_0>"

query = prepareQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     PREFIX prov: <http://www.w3.org/ns/prov#>
     SELECT * WHERE {
        %s a hto:ArticleTermRecord;
            hto:name ?name;
            hto:hasOriginalDescription ?desc.
        ?desc hto:wasExtractedFrom ?source;
            hto:text ?text;
            hto:hasTextQuality hto:High.
        ?source prov:wasAttributedTo ?agent.

    }
    """ % article_sugar
)

try:
    for r in graph.query(query):
        print("name: %s | description: %s | source: %s | agent: %s" % (r["name"], r["text"], r["source"], r["agent"]))
except Exception as e:
    print(e)

name: SUGAR | description: in natural history, is properly the essential salt of the sugar-cane, as tartar is of the grape. See CHEMISTRY p. 161. and SACCHARUM. This plant tises to eight, nine, or more feet high ; the stalk, conic earthen. stalk, or cane, being round, jointed, and two or three inches in diameter at the bottom : the joints are three or four inches afunder, and in a rich foil more : the leaves are long and narrow and of a yellowish green colour whio s ornamented is is also the stalk itself, the top t ciodier of ariadincos tiowers, two with a panicle or three feet in length. They propagate the sugar-cane, by planting cuttings of it in the ground in furrows, dug parailel for that purpose ; the cuttings are laid level and even, and are covered up with earth ; they soon sho nt out new plants from their knots or joints : the ground is to be kept clear, at times, from weeds ; and the eanes grow so quick, that in cipht, ten, or twelve months, they are sit to cut for making of s

### Question 18: What are the descriptions of term _T_ with the highest text quality?

In [69]:
article_sugar = "<https://w3id.org/hto/ArticleTermRecord/9910796233804340_193108317_6364534740_0>"

query = prepareQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     PREFIX prov: <http://www.w3.org/ns/prov#>
     SELECT * WHERE {
        %s a hto:ArticleTermRecord;
            hto:name ?name;
            hto:hasOriginalDescription ?desc.
        ?desc ?derived_type ?source;
            hto:text ?text;
            hto:hasTextQuality ?textQuality.
        FILTER (?derived_type = prov:wasDerivedFrom || ?derived_type = hto:wasExtractedFrom)
        ?source prov:wasAttributedTo ?agent.
        FILTER NOT EXISTS {
          %s hto:hasOriginalDescription [hto:hasTextQuality [hto:isTextQualityHigherThan ?textQuality]].
        }
    }
    """ % (article_sugar, article_sugar)
)

try:
    for r in graph.query(query):
        print("name: %s | description: %s | source: %s | agent: %s | text quality: %s" % (r["name"], r["text"], r["source"], r["agent"], r["textQuality"]))
except Exception as e:
    print(e)

name: SUGAR | description: a solid sweet substance juice of the sugar-cane, or, according essential fair, capable of cryftallization, agreeable flavour, and contained in a greater type in almost every species of vegetables, abundant in the sugar-cane. As the sugar-cane is the principal West Indies, and the great source of is so important in a commercial view, meant which it gives to seamen, and the opens for merchants; and besides now safety of life and it may just be esteemed valuable plants in the world. The in Europe is estimated at nine million demand would probably be greater if a reduced price. Since freighter is curious a commodity, it must be an one Ct persons of curiosity and research, to obtain knowledge of the history and nature which it is produced, as well as to access by which the juice is extracted will therefore first inquire in what countries flouriftied, and when it was brought and became an article of commerce. From the few remains of the Grecian tigers which have su

### Question 19: What software was used to extract the description of article _T_ or a page _P_?

In [70]:
article_sugar = "<https://w3id.org/hto/ArticleTermRecord/992277653804341_144133903_6364534740_0>"

query = prepareQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     PREFIX prov: <http://www.w3.org/ns/prov#>
     SELECT * WHERE {
        %s a hto:ArticleTermRecord;
            hto:name ?name;
            hto:hasOriginalDescription ?desc.
        ?desc ?derived_type ?source;
            hto:text ?text;
            prov:wasAttributedTo ?software.
        FILTER (?derived_type = prov:wasDerivedFrom || ?derived_type = hto:wasExtractedFrom)
        ?software a hto:SoftwareAgent.
        ?source prov:wasAttributedTo ?agent.
        ?agent a ?agentType.
    }
    """ % (article_sugar)
)

try:
    for r in graph.query(query):
        print("name: %s | description: %s | extracted using software: %s | source: %s | agent: %s " % (r["name"], r["text"], r["software"], r["source"], r["agent"]))
except Exception as e:
    print(e)

name: SUGAR | description: in natural history, is properly the essential salt of the sugar-cane, as tartar is of the grape. See CHEMISTRY p. 161. and SACCHARUM. This plant tises to eight, nine, or more feet high ; the stalk, conic earthen. stalk, or cane, being round, jointed, and two or three inches in diameter at the bottom : the joints are three or four inches afunder, and in a rich foil more : the leaves are long and narrow and of a yellowish green colour whio s ornamented is is also the stalk itself, the top t ciodier of ariadincos tiowers, two with a panicle or three feet in length. They propagate the sugar-cane, by planting cuttings of it in the ground in furrows, dug parailel for that purpose ; the cuttings are laid level and even, and are covered up with earth ; they soon sho nt out new plants from their knots or joints : the ground is to be kept clear, at times, from weeds ; and the eanes grow so quick, that in cipht, ten, or twelve months, they are sit to cut for making of s

### Question 20: What software was used to digitise a document?

In [72]:
# Find software used to generate the source which the description of an article was extracted.
article_earth = "<https://w3id.org/hto/ArticleTermRecord/9910796273804340_193322688_5297117738_0>"

query = prepareQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     PREFIX prov: <http://www.w3.org/ns/prov#>
     SELECT * WHERE {
        %s a hto:ArticleTermRecord;
            hto:name ?name;
            hto:hasOriginalDescription ?desc.
        ?desc ?derived_type ?source;
            hto:text ?text.
        FILTER (?derived_type = prov:wasDerivedFrom || ?derived_type = hto:wasExtractedFrom)
        OPTIONAL {
            ?source prov:wasAttributedTo ?software.
            ?software a hto:SoftwareAgent.
        }
    }
    """ % (article_earth)
)

try:
    for r in graph.query(query):
        software = r["software"]
        print("name: %s | description: %s | source: %s | created using %s" % (r["name"], r["text"], r["source"], software))
except Exception as e:
    print(e)

name: EARTH | description: amongst ancient philosophers, owe of the four elements of which the whole system of nature was believed to be composed. | source: https://raw.githubusercontent.com/TU-plogan/kp-editions/main/eb07/TXT_v2/e8/kp-eb0708-039107-8218-v2.txt | created using https://pdf.abbyy.com
name: EARTH | description: amongst ancient philosophers, one elements of which the whole system of nature was Earth, in Astronomy and Geography, one mary planets, being the terraqueous globe which habit. (See the articles Figure of the Earth, | source: https://w3id.org/hto/InformationResource/193322688_alto_193328220_34_xml | created using None
name: EARTH | description: amongst ancient philosophers, one elements of which the whole system of nature was Earth, in Astronomy and Geography, one many planets, being the terraqueous globe which habit.( See the articles Figure of the Earth, | source: https://w3id.org/hto/OriginalDescription/9910796273804340_193322688_5297117738_0NLS | created using 

### Question 21: List all articles which have more than one name

In [96]:
article_aurora = "<https://w3id.org/hto/TopicTermRecord/9910796273804340_193108322_9268244686_0>"

query = prepareQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
     SELECT * WHERE {
        %s a hto:TopicTermRecord;
            hto:name ?primary_name.
        OPTIONAL {
          %s rdfs:label ?alter_name.
        }
    }
    """ % (article_aurora, article_aurora)
)

try:
    names = {} 
    for r in graph.query(query):
        names["primary_name"] = r["primary_name"].value
        if "alter_names" in names:
            names["alter_names"].append(r["alter_name"].value)
        else:
            names["alter_names"] = [r["alter_name"].value]
    print(names)       
except Exception as e:
    print(e)

{'primary_name': 'AURORA BOREALIS', 'alter_names': ['NORTHERN LIGHTS', 'POLAR LIGHT', 'STREAMERS']}


### Question 22: List editions which was revision of another edition

In [97]:
query = prepareQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
     SELECT ?term (COUNT(?alter_name) AS ?n) WHERE {
        ?term a hto:ArticleTermRecord;
            rdfs:label ?alter_name.
    }
    GROUP BY ?term
    LIMIT 10
    """
)

try:
    for r in graph.query(query):
        print("term uri: %s | total number of names: %s" % (r["term"], int(r["n"]) + 1))
except Exception as e:
    print(e)

term uri: https://w3id.org/hto/ArticleTermRecord/9910796273804340_192693199_1079539839_0 | total number of names: 2
term uri: https://w3id.org/hto/ArticleTermRecord/9910796273804340_192693199_1125738681_0 | total number of names: 2
term uri: https://w3id.org/hto/ArticleTermRecord/9910796273804340_192693199_1271253687_0 | total number of names: 2
term uri: https://w3id.org/hto/ArticleTermRecord/9910796273804340_192693199_1306982326_0 | total number of names: 3
term uri: https://w3id.org/hto/ArticleTermRecord/9910796273804340_192693199_1356498803_0 | total number of names: 2
term uri: https://w3id.org/hto/ArticleTermRecord/9910796273804340_192693199_1362453666_0 | total number of names: 2
term uri: https://w3id.org/hto/ArticleTermRecord/9910796273804340_192693199_1380627302_0 | total number of names: 2
term uri: https://w3id.org/hto/ArticleTermRecord/9910796273804340_192693199_1503725347_0 | total number of names: 2
term uri: https://w3id.org/hto/ArticleTermRecord/9910796273804340_192693

### Question 23: List editions which was revision of another edition


In [98]:
query = prepareQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     PREFIX prov: <http://www.w3.org/ns/prov#>
     SELECT * WHERE {
        ?revision a hto:Edition;
            prov:wasRevisionOf ?edition;
            hto:number ?r_edition_num.
        ?edition a hto:Edition;
            hto:number ?edition_num.
    }
    """
)

try:
    for r in graph.query(query):
        print("edition %s - < %s > was revision of edition %s -< %s >" % (r["revision"], r["r_edition_num"], r["edition"], r["edition_num"]))
except Exception as e:
    print(e)

edition https://w3id.org/hto/Edition/9929192893804340 - < 1 > was revision of edition https://w3id.org/hto/Edition/992277653804341 -< 1 >


### Question 24: Find the summary of the description of a topic article _A_

In [99]:
topic_a = "<https://w3id.org/hto/TopicTermRecord/9910796273804340_192693199_9335647130_0>"

query = prepareQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     SELECT * WHERE {
        %s a hto:TopicTermRecord;
            hto:name ?name;
            hto:hasOriginalDescription ?description.
        ?description a hto:OriginalDescription;
            hto:hasSummary ?summary.
        ?summary hto:text ?text.
    }
    """ % topic_a
)

try:
    for r in graph.query(query):
        print("term name: %s | summary: %s" % (r["name"], r["text"]))
except Exception as e:
    print(e)


term name: HYGROMETRY | summary: The formation of steam or aqueous vapour, and its diffusion in space or in a gaseous medium, have already been considered under the article Evaporation. We now propose first to take a view of various methods and devices which have been employed to detect the presence of aqueous vapour, and to ascertain its amount, or how much of it is contained in a given volume, whether when alone or diffused in a gaseous medium. It will, however, be proper briefly to notice a few of those either already become obsolete, or soon to be so, were it only to show their imperfection. It is therefore of no other use than as a mere toy; for the value of an instrument employed as a measure of any kind must depend not only on its being at first accurately constructed, but likewise upon its indications not being, ccBtøris paribus, liable to change. The preceding remarks upon the effects of a change of humidity on organic substances may enable us to correct what we consider a gre

### Question 25: List all records for a concept

In [100]:
concept_uri = "<https://w3id.org/hto/Concept/6194477897_2>"

queery = prepareQuery("""
    PREFIX hto: <https://w3id.org/hto#>
    SELECT * WHERE {
        %s a hto:Concept;
            hto:hadConceptRecord ?record.
    }
    """ % concept_uri
)

try:
    for r in graph.query(query):
        print("record uri: %s " % (r["record"]))
except Exception as e:
    print(e)

'record'


### Question 26: Given a description uri, track the source

In [110]:
"""
In this example, we show how to keep tracking the source of an article description, including what entity it was derived from (an entity here is anything with provenance information, such as another description, or a text file, or a xml file, or an image), what software was used, who created it. 
Given a given description uri, here we will return a dictionary with entity uri as the key, and entity object as the value. This entity object have the following format: {'source_uri': '', 'entity_type': '', 'agent_type': '', 'agent_name': ''}. 
For example, we have the following result:
{'entityA': 
    {'source_uri': 'entityB', 'entity_type': 'OriginalDescription', 'agent_type': 'SoftwareAgent', 'agent_name': 'frances information extraction'}, 
'entityB': 
    {'source_uri': None, 'entity_type': 'InformationResource', 'agent_type': 'Person', 'agent_name': 'Ash Charlton'}
}
You can find the source of an entity by mapping the source_uri to the entity_uri. In above example, we can find that InformationResource entityB is the source of the OriginalDescription entityA. 
"""

# entity_uri = "https://raw.githubusercontent.com/TU-plogan/kp-editions/main/eb07/TXT_v2/e8/kp-eb0708-039107-8218-v2.txt"

entity_uri = "https://w3id.org/hto/OriginalDescription/992277653804341_144133903_6364534740_0Ash"
#entity_uri = "https://w3id.org/hto/InformationResource/__data_EB_1_3_clean_cut_txt"

def get_source(entity_uri):
    if entity_uri is None:
        return None

    entity_uri = "<" + entity_uri + ">"
    query = prepareQuery("""
        PREFIX hto: <https://w3id.org/hto#>
        PREFIX prov: <http://www.w3.org/ns/prov#>
        PREFIX foaf: <http://xmlns.com/foaf/0.1/>
        SELECT * WHERE {
            %s a ?entity_type;
                prov:wasAttributedTo ?agent.
            ?agent a ?agent_type;
                foaf:name ?agent_name.
            OPTIONAL {
            %s ?derived_type ?source.
            FILTER (?derived_type = prov:wasDerivedFrom || ?derived_type = hto:wasExtractedFrom)
            }
        } LIMIT 20
        """ % (entity_uri, entity_uri)
    )

    try:
        source = {}
        for r in graph.query(query):
            source_uri = str(r['source'])
            if source_uri == "None":
                source_uri = None
            source["source_uri"] = source_uri
            source["entity_type"] = r['entity_type']
            agent_type = r['agent_type']
            agent_type_name = agent_type.split("#")[-1]
            source["agent_type"] = agent_type_name
            source["agent_name"] = r['agent_name'].value

        return source
    except Exception as e:
        print(e)
        return None


def track_all_sources(entity_uri):
    sources = {}
    tmp_source_uri = entity_uri
    while tmp_source_uri:
        current_source_info = get_source(tmp_source_uri)
        sources[tmp_source_uri] = current_source_info
        tmp_source_uri = current_source_info["source_uri"]
    return sources
print(track_all_sources(entity_uri))

{'https://w3id.org/hto/OriginalDescription/992277653804341_144133903_6364534740_0Ash': {'source_uri': 'https://w3id.org/hto/InformationResource/__data_EB_1_3_clean_cut_txt', 'entity_type': rdflib.term.URIRef('https://w3id.org/hto#OriginalDescription'), 'agent_type': 'SoftwareAgent', 'agent_name': 'frances information extraction'}, 'https://w3id.org/hto/InformationResource/__data_EB_1_3_clean_cut_txt': {'source_uri': None, 'entity_type': rdflib.term.URIRef('https://w3id.org/hto#InformationResource'), 'agent_type': 'Person', 'agent_name': 'Ash Charlton'}}
