# Knowledge Exploration With SPARQL queries
This notebook uses SPARQL queries to explore the knowledge graphs generated in this repository.
We parse an RDF source into an rdflib `Graph` and then explore its contents by querying the graph.

Questions this knowledge graph should answer:
1. List all digitalised collections in this graph.
2. What volumes, editions, or series does a digitalised collection _C_ include?
3. What time period does a digitalised collection _C_ cover? (next version)
4. When was edition _E_, series _S_, or volume _V_ published?
5. Who published edition _E_, series _S_, or volume _V_?
6. Who edited edition _E_, series _S_, or volume _V_?
7. Which genre does an edition _E_, series _S_, or volume _V_ belong to?
8. Where was an edition _E_, series _S_, or volume _V_ published or printed?
9. Which language did an edition _E_, series _S_, or volume _V_ use?
10. In EB, what articles does a volume _V_ include?
11. Where was an EB article _A_ described (in a page, volume, edition)?
12. What are the EB articles which appear in all editions?
13. What EB articles appeared only once in edition _E_?
14. Which EB articles are related to EB article _T_?
15. Which EB articles have a description similar to that of _T_?
16. How was a term with name _T_ described in all editions?
17. What is the text in a page?
18. What sources were the text descriptions of article _T_ or page _P_ extracted from?
19. What is the clean description of term _T_?
20. What are the descriptions of term _T_ with the highest text quality?
21. What software was used to extract the description of article _T_ or a page _P_?
22. What software was used to digitise a document?

## Load the graph

In [1]:
from rdflib import Graph, URIRef, Namespace

# Create a new RDFLib Graph
graph = Graph()

# Load the rdf file into the graph
ontology_file = "results/hto_total.ttl"
graph.parse(ontology_file, format="turtle")
hto = Namespace("https://w3id.org/hto#")

In [2]:
# Print the number of "triples" in the Graph
print(f"Graph g has {len(graph)} statements.")

Graph g has 640078 statements.


## Query the graph

### Question 1: List all digital collections

In [3]:
from rdflib.plugins.sparql import prepareQuery
q1 = prepareQuery('''
    SELECT ?collection ?name WHERE {
        ?collection a hto:WorkCollection;
            hto:name ?name.
        FILTER (regex(?name, "Collection$", "i"))
}
  ''',
  initNs = { "hto": hto}
)

for r in graph.query(q1):
      print("%s %s" % (r.collection, r.name))

https://w3id.org/hto/WorkCollection/ChapbooksOfScotland Chapbooks printed in Scotland Collection


### Question 2: What volumes, editions, or series does a digitalised collection _C_ include?

In [4]:
## List all editions of Encyclopaedia Britannica collection
eb_collection = "<https://w3id.org/hto/WorkCollection/EncyclopaediaBritannica>"
q2 = prepareQuery('''
    SELECT * WHERE {
        %s  a hto:WorkCollection;
            hto:hadMember ?edition.
        ?edition a hto:Edition;
            hto:mmsid ?mmsid;
            hto:title ?title.
        OPTIONAL {
            ?edition hto:subtitle ?subtitle;
        }
        OPTIONAL {
            ?edition hto:number ?number.
        }
}
  ''' % eb_collection,
  initNs = { "hto": hto}
)

for r in graph.query(q2):
      print("uri: %s | MMSID: %s | title: %s | subtitle: %s | number: %s" % (r.edition, r.mmsid, r.title , r.subtitle,  r.number))

uri: https://w3id.org/hto/Edition/9910796343804340 | MMSID: 9910796343804340 | title: Supplement to the third edition of the Encyclopaedia Britannica ... Illustrated with ... copperplates | subtitle: None | number: None
uri: https://w3id.org/hto/Edition/9929192893804340 | MMSID: 9929192893804340 | title: Encyclopaedia Britannica: or, A dictionary of arts and sciences | subtitle: compiled upon a new plan. In which the different sciences and arts are digested into distinct treatises or systems; and the various technical terms, &c. are explained as they occur in the order of the alphabet. Illustrated with one hundred and sixty copperplates | number: 1
uri: https://w3id.org/hto/Edition/992277653804341 | MMSID: 992277653804341 | title: Encyclopaedia Britannica; or, A dictionary of arts and sciences, compiled upon a new plan | subtitle: Illustrated with one hundred and sixty copperplates | number: 1
uri: https://w3id.org/hto/Edition/9910796373804340 | MMSID: 9910796373804340 | title: Supplem

In [5]:
## List all volumes in Encyclopaedia Britannica 7th Edition
eb_edition7 = "<https://w3id.org/hto/Edition/9910796273804340>"
q2 = prepareQuery('''
    SELECT * WHERE {
        %s  a hto:Edition;
            hto:hadMember ?volume.
        ?volume a hto:Volume;
            hto:title ?title;
            hto:number ?number;
            hto:volumeId ?volumeId;
            hto:permanentURL ?permanentURL.
        OPTIONAL {
            ?volume hto:letters ?letters;
        }
}
  ''' % eb_edition7,
  initNs = { "hto": hto}
)

for r in graph.query(q2):
      print("uri: %s | title: %s | number: %s | id: %s |  permanent url: %s | letters: %s" % (r.volume, r.title ,r.number, r.volumeId, r.permanentURL, r.letters))

uri: https://w3id.org/hto/Volume/9910796273804340_192547789 | title: Seventh edition, General index | number: 0 | id: 192547789 |  permanent url: https://digital.nls.uk/192547789 | letters: None
uri: https://w3id.org/hto/Volume/9910796273804340_192693199 | title: Seventh edition, Volume 12, Hydrodynamics-KYR | number: 12 | id: 192693199 |  permanent url: https://digital.nls.uk/192693199 | letters: Hydrodynamics-KYR
uri: https://w3id.org/hto/Volume/9910796273804340_192984258 | title: Seventh edition, Volume 1, Preliminary dissertations | number: 1 | id: 192984258 |  permanent url: https://digital.nls.uk/192984258 | letters: Preliminarydissertations
uri: https://w3id.org/hto/Volume/9910796273804340_192984259 | title: Seventh edition, Volume 2, A-Anatomy | number: 2 | id: 192984259 |  permanent url: https://digital.nls.uk/192984259 | letters: A-Anatomy
uri: https://w3id.org/hto/Volume/9910796273804340_193057500 | title: Seventh edition, Volume 3, Anatomy-Astronomy | number: 3 | id: 193057

### Question 3: What time period does a digitalised collection _C_ cover? (next version)

In [33]:
# TODO: Get the time period that a digitalised collection _C_ covers

### Question 4: When was edition _E_, series _S_, or volume _V_ published?

In [8]:
eb_edition7 = "<https://w3id.org/hto/Edition/9910796273804340>"
q4 = prepareQuery('''
    SELECT * WHERE {
        %s  a hto:Edition;
            hto:mmsid ?mmsid;
            hto:title ?title;
            hto:number ?number;
            hto:yearPublished ?year_published;
}
  ''' % eb_edition7,
  initNs = { "hto": hto}
)

for r in graph.query(q4):
      print("MMSID: %s | title: %s |  number: %s | year published: %s " % (r.mmsid, r.title , r.number, r.year_published))

MMSID: 992277653804341 | title: Encyclopaedia Britannica; or, A dictionary of arts and sciences, compiled upon a new plan |  number: 1 | year published: 1771 


### Question 5: Who published edition _E_, series _S_, or volume _V_?

In [9]:
from rdflib.namespace import FOAF

eb_edition7 = "<https://w3id.org/hto/Edition/9910796273804340>"
q5 = prepareQuery('''
    SELECT * WHERE {
        %s  a hto:Edition;
            hto:mmsid ?mmsid;
            hto:title ?title;
            hto:number ?number;
            hto:yearPublished ?year_published.
        OPTIONAL {
            %s hto:publisher ?publisher.
            ?publisher foaf:name ?publisher_name;
        }
}
  ''' % (eb_edition7, eb_edition7),
  initNs = { "hto": hto, "foaf": FOAF}
)

for r in graph.query(q5):
      print("MMSID: %s | title: %s | number: %s | year published: %s | | (publisher: %s | name: %s) " % (r.mmsid, r.title , r.number, r.year_published, r.publisher, r.publisher_name))

MMSID: 992277653804341 | title: Encyclopaedia Britannica; or, A dictionary of arts and sciences, compiled upon a new plan | number: 1 | year published: 1771 | | (publisher: https://w3id.org/hto/Organization/CMacfarquharColinMacfarquhar | name: , C. Macfarquhar, Colin Macfarquhar) 


### Question 6: Who edited edition _E_, series _S_, or volume _V_?

In [36]:
from rdflib.namespace import FOAF

eb_edition7 = "<https://w3id.org/hto/Edition/9910796273804340>"
q6 = prepareQuery('''
    SELECT * WHERE {
        %s  a hto:Edition;
            hto:mmsid ?mmsid;
            hto:title ?title;
            hto:number ?number;
            hto:yearPublished ?year_published.
        OPTIONAL {
            %s hto:editor ?editor.
            ?editor a hto:Person;
                foaf:name ?editor_name;
            OPTIONAL {
                ?editor hto:birthYear ?birthYear.
            }
            OPTIONAL {
                ?editor hto:deathYear ?deathYear.
            }
            OPTIONAL {
                ?editor hto:termsOfAddress ?termsOfAddress.
            }
        }
}
  ''' % (eb_edition7, eb_edition7),
  initNs = { "hto": hto, "foaf": FOAF}
)

for r in graph.query(q6):
      print("MMSID: %s | title: %s | number: %s | year published: %s | (editor: %s | name: %s | terms of address: %s | %s-%s ) " % (r.mmsid, r.title , r.number, r.year_published, r.editor, r.editor_name, r.termsOfAddress,r.birthYear, r.deathYear))

MMSID: 9910796273804340 | title: Seventh edition, Volume 2, A-Anatomy | number: 7 | year published: 1842 | (editor: https://w3id.org/hto/Person/StewartDugald | name: Stewart, Dugald | terms of address: Sir | 1753-1828 ) 


### Question 7: Which genre does an edition _E_, series _S_, or volume _V_ belong to?

In [37]:
eb_edition7 = "<https://w3id.org/hto/Edition/9910796273804340>"
q7 = prepareQuery('''
    SELECT * WHERE {
        %s  a hto:Edition;
            hto:mmsid ?mmsid;
            hto:title ?title;
            hto:number ?number;
            hto:genre ?genre;
}
  ''' % eb_edition7,
  initNs = { "hto": hto}
)

for r in graph.query(q7):
      print("MMSID: %s | title: %s |  number: %s | genre: %s " % (r.mmsid, r.title , r.number, r.genre))

MMSID: 9910796273804340 | title: Seventh edition, Volume 2, A-Anatomy |  number: 7 | genre: encyclopedia 


### Question 8: Where was a volume _V_ published or printed?

In [38]:
eb_edition7 = "<https://w3id.org/hto/Edition/9910796273804340>"
q8 = prepareQuery('''
    SELECT * WHERE {
        %s  a hto:Edition;
            hto:mmsid ?mmsid;
            hto:title ?title;
            hto:number ?number;
        OPTIONAL {
            %s hto:printedAt ?placePrinted;
        }
        OPTIONAL {
            %s hto:shelfLocator ?shelfLocator;
        }
}
  ''' % (eb_edition7, eb_edition7, eb_edition7),
  initNs = { "hto": hto}
)

for r in graph.query(q8):
      print("MMSID: %s | title: %s |  number: %s | place printed: %s | shelf locator: %s " % (r.mmsid, r.title , r.number, r.placePrinted, r.shelfLocator))

MMSID: 9910796273804340 | title: Seventh edition, Volume 2, A-Anatomy |  number: 7 | place printed: https://w3id.org/hto/Location/Edinburgh | shelf locator: https://w3id.org/hto/Location/EB15 


### Question 9: Which language does a volume _V_ use?

In [39]:
eb_edition7 = "<https://w3id.org/hto/Edition/9910796273804340>"
q9 = prepareQuery('''
    SELECT * WHERE {
        %s  a hto:Edition;
            hto:mmsid ?mmsid;
            hto:title ?title;
            hto:number ?number;
            hto:language ?language;
}
  ''' % eb_edition7,
  initNs = { "hto": hto}
)

for r in graph.query(q9):
      print("MMSID: %s | title: %s |  number: %s | language: %s " % (r.mmsid, r.title , r.number, r.language))

MMSID: 9910796273804340 | title: Seventh edition, Volume 2, A-Anatomy |  number: 7 | language: eng 


### Question 10: In EB, what articles does a volume _V_ include?

In [10]:
# List 20 articles in volume 2 of 7th edition.
eb_edition7_volume2 = "<https://w3id.org/hto/Volume/9910796273804340_192984259>"
q10 = prepareQuery('''
    SELECT * WHERE {
        %s a hto:Volume;
            hto:hadMember ?termRecord.
        ?termRecord a hto:TermRecord.
    } LIMIT 20
  ''' % eb_edition7_volume2,
  initNs = { "hto": hto}
)

for r in graph.query(q10):
      print("volume %s" % (r.termRecord))

volume https://w3id.org/hto/ArticleTermRecord/992277653804341_144133902_CAABA_0
volume https://w3id.org/hto/ArticleTermRecord/992277653804341_144133902_CABALIST_0
volume https://w3id.org/hto/ArticleTermRecord/992277653804341_144133902_CABALLARIA_0
volume https://w3id.org/hto/ArticleTermRecord/992277653804341_144133902_CABALLEROS_0
volume https://w3id.org/hto/ArticleTermRecord/992277653804341_144133902_CABBAGE_0
volume https://w3id.org/hto/ArticleTermRecord/992277653804341_144133902_CABBAGING_0
volume https://w3id.org/hto/ArticleTermRecord/992277653804341_144133902_CABBALA_0
volume https://w3id.org/hto/ArticleTermRecord/992277653804341_144133902_CABBALISTS_0
volume https://w3id.org/hto/ArticleTermRecord/992277653804341_144133902_CABECA_0
volume https://w3id.org/hto/ArticleTermRecord/992277653804341_144133902_CABERAH_0
volume https://w3id.org/hto/ArticleTermRecord/992277653804341_144133902_CABIDOS_0
volume https://w3id.org/hto/ArticleTermRecord/992277653804341_144133902_CABINET_0
volume 

### Question 11: Where was an EB article _A_ described (in a page, volume, edition)?

In [41]:
# Show the page an article starts at, and the volume, edition, and collection in which it was described
article_accent = "<https://w3id.org/hto/ArticleTermRecord/9910796273804340_192984259_ACCENT_1>"
q11 = prepareQuery('''
    SELECT * WHERE {
        %s a hto:ArticleTermRecord;
            hto:name ?name;
            hto:startsAtPage ?page;
            hto:wasMemberOf ?volume.
        ?volume a hto:Volume.
        OPTIONAL {
            %s hto:wasMemberOf ?edition.
            ?edition a hto:Edition.
        }
        OPTIONAL {
            %s hto:wasMemberOf ?collection.
            ?collection hto:name ?collection_name.
            FILTER (regex(?collection_name, "Collection$", "i"))
        }

    } LIMIT 20
  ''' % (article_accent, article_accent, article_accent),
  initNs = { "hto": hto}
)

for r in graph.query(q11):
      print("term: %s | starts at page: %s | volume: %s | edition %s | collection: %s" % (r.name, r.page, r.volume, r.edition, r.collection))

term: ACCENT | starts at page: https://w3id.org/hto/Page/9910796273804340_192984259_96 | volume: https://w3id.org/hto/Volume/9910796273804340_192984259 | edition https://w3id.org/hto/Edition/9910796273804340 | collection: https://w3id.org/hto/WorkCollection/EncyclopaediaBritannica


### Question 12: What are the EB articles which appear in all editions?

In [42]:
q12 = prepareQuery('''
    SELECT * WHERE {
        ?term a hto:TermRecord;
            hto:name ?name.

    } LIMIT 20
  ''',
  initNs = { "hto": hto}
)

for r in graph.query(q12):
      print("term uri: %s | name: %s" % (r.term, r.name))

term uri: https://w3id.org/hto/ArticleTermRecord/9910796273804340_192693199_HYDROGRAPHICALCHARTS_0 | name: HYDROGRAPHICAL CHARTS
term uri: https://w3id.org/hto/ArticleTermRecord/9910796273804340_192693199_HYDROGRAPHICALCHARTS_0 | name: HYDROGRAPHICAL MAPS
term uri: https://w3id.org/hto/ArticleTermRecord/9910796273804340_192693199_HYDROGRAPHY_0 | name: HYDROGRAPHY
term uri: https://w3id.org/hto/ArticleTermRecord/9910796273804340_192693199_HYDROMEL_0 | name: HYDROMEL
term uri: https://w3id.org/hto/ArticleTermRecord/9910796273804340_192693199_HYDROMETER_0 | name: HYDROMETER
term uri: https://w3id.org/hto/ArticleTermRecord/9910796273804340_192693199_HYDROPHANES_0 | name: HYDROPHANES
term uri: https://w3id.org/hto/ArticleTermRecord/9910796273804340_192693199_HYDROPHANES_0 | name: OCULUS MUNDI
term uri: https://w3id.org/hto/ArticleTermRecord/9910796273804340_192693199_HYDROPHOBIA_0 | name: HYDROPHOBIA
term uri: https://w3id.org/hto/ArticleTermRecord/9910796273804340_192693199_HYDROPHYLACIA_0

### Question 13: What EB articles appeared only once in edition _E_?

In [43]:
# TODO Question 13: What EB articles appeared only once in edition _E_?

### Question 14: Which EB articles are related to EB article _T_?

In [44]:
article_accent = "<https://w3id.org/hto/ArticleTermRecord/9910796273804340_192984259_ACCENT_1>"
q14 = prepareQuery('''
    SELECT * WHERE {
        %s a hto:ArticleTermRecord;
            hto:name ?name;
        OPTIONAL {
            %s hto:refersTo ?see_term.
            ?see_term a hto:TermRecord.
        }
    } LIMIT 20
  ''' % (article_accent, article_accent),
  initNs = { "hto": hto}
)

for r in graph.query(q14):
      print("term: %s | see term: %s" % (r.name, r.see_term))

term: ACCENT | see term: None


### Question 15: Which EB articles have a description similar to that of _T_?

In [45]:
article_accent = "<https://w3id.org/hto/ArticleTermRecord/9910796273804340_192984259_ACCENT_1>"
q15 = prepareQuery('''
    SELECT * WHERE {
        %s a hto:ArticleTermRecord;
            hto:name ?name.
        OPTIONAL {
            %s hto:similarTo ?similar_term.
            ?similar_term a hto:TermRecord.
        }
    } LIMIT 20
  ''' % (article_accent, article_accent),
  initNs = { "hto": hto}
)

for r in graph.query(q15):
      print("term: %s | similar term: %s" % (r.name, r.similar_term))

term: ACCENT | similar term: None


### Question 16: How was a term with name _T_ described in all editions?

In [46]:
term_name = "'EARTH'"
q16 = prepareQuery('''
    SELECT * WHERE {
        ?term a hto:TermRecord;
            hto:name %s;
            hto:wasMemberOf ?edition;
            hto:hasOriginalDescription ?desc.
        ?desc hto:text ?content.
        ?edition a hto:Edition;
            hto:title ?title.
    }
  ''' % term_name,
  initNs = { "hto": hto}
)

for r in graph.query(q16):
      print("term uri: %s | description: %s |  edition: %s | edition title: %s" % (r.term, r.content,  r.edition, r.title))

term uri: https://w3id.org/hto/ArticleTermRecord/9910796273804340_193322688_EARTH_0 | description: amongst ancient philosophers, owe of the four elements of which the whole system of nature was believed to be composed. |  edition: https://w3id.org/hto/Edition/9910796273804340 | edition title: Seventh edition, Volume 2, A-Anatomy
term uri: https://w3id.org/hto/ArticleTermRecord/9910796273804340_193322688_EARTH_1 | description: in Astronomy and Geography, one of the primary planets, bci□g the terraqueous globe which we inhabit. (See the articles Figure of the Earth, and Geology.) |  edition: https://w3id.org/hto/Edition/9910796273804340 | edition title: Seventh edition, Volume 2, A-Anatomy


### Question 17: What is the text in a page?

In [5]:
# check the text of a page from Chapbooks collection

q17 = prepareQuery('''
    SELECT * WHERE {
        ?page a hto:Page;
            hto:hasOriginalDescription ?desc.
        ?desc hto:text ?text;
            hto:hasTextQuality ?textQuality.
    }
    LIMIT 5
  ''',
  initNs = { "hto": hto}
)

for r in graph.query(q17):
      print("page uri: %s | content: %s | quality: %s" % (r.page, r.text, r.textQuality))

page uri: https://w3id.org/hto/Page/9910029463804340_104184208_1 | content: OR, THE Affectionate Lovers A T^ue Lore Song. PRINTED BY C: Mll ACHL 4N, For John Sinda r, Bock.e‘;:rr D U M f R.J £ S. | quality: https://w3id.org/hto#Low
page uri: https://w3id.org/hto/Page/9910029463804340_104184208_2 | content: TH» Mridt's Burtd J, &f. ' » ' \ ^ ! Come mourn, come niourn v/kir 1 ye loyal lovrs all; ( me ; Lament my loss in weeds of woe, whom gripping death doth thrall Like to the drooping vine, " cut by the gard‘ner‘* knife, Bven so my heart, with (orryflain doth niourn for my i Trcet wife, j 15y death, that grizly Goft, my turtle dove was slain, And I’m left, unhappy man, to spend my days iavaip. 'Her beauty, late so bright, like rotes in tlieir*prime, Is jailed like the mountainfnc w; by Iro U of V , ■ ^ ' I 'll k " V ■ • ' ’ <' ] t | quality: https://w3id.org/hto#Low
page uri: https://w3id.org/hto/Page/9910029463804340_104184208_3 | content: ( 3 ) Her fair and coloured cheek?, now pale a

### Question 18: What sources were the text descriptions of article _T_ or page _P_ extracted from?

In [48]:
from rdflib import PROV

article_accent = "<https://w3id.org/hto/ArticleTermRecord/9910796273804340_192984259_ACCENT_1>"
q18 = prepareQuery('''
    SELECT * WHERE {
        %s a hto:TermRecord;
            hto:name ?name;
            hto:hasOriginalDescription ?desc.
        ?desc hto:wasExtractedFrom ?source;
            hto:text ?text.
        ?source prov:wasAttributedTo ?agent.
    }
  ''' % article_accent,
  initNs = { "hto": hto, "prov": PROV}
)

for r in graph.query(q18):
      print("name: %s | description: %s | source: %s | agent: %s" % (r.name,  r.text, r.source, r.agent))

name: ACCENT | description: in Music, is a certain enforcement of particular sounds, whether by the voice or instruments, generally used at the beginning of bars.
Acceptance, in Commerce, is the subscribing, signing, and making one’s self debtor for the sum contained in a bill of exchange or other obligation. | source: https://raw.githubusercontent.com/TU-plogan/kp-editions/main/eb07/TXT_v2/a2/kp-eb0702-008305-0888-v2.txt | agent: https://pdf.abbyy.com
name: ACCENT | description: in Music, is a certain enforcement of particular sounds, whether by the voice or instruments, generally used at the beginning of bars.
Acceptance, in Commerce, is the subscribing, signing, and making one’s self debtor for the sum contained in a bill of exchange or other obligation. | source: https://raw.githubusercontent.com/TU-plogan/kp-editions/main/eb07/TXT_v2/a2/kp-eb0702-008305-0888-v2.txt | agent: https://w3id.org/hto/Organization/NCKP


### Question 19: What is the clean description of term _T_?

In [15]:
from rdflib import PROV

article_accent = "<https://w3id.org/hto/ArticleTermRecord/9910796273804340_192984259_ACCENT_0>"
q19 = prepareQuery('''
    SELECT * WHERE {
        %s a hto:ArticleTermRecord;
            hto:name ?name;
            hto:hasOriginalDescription ?desc.
        ?desc hto:wasExtractedFrom ?source;
            hto:text ?text;
            hto:hasTextQuality hto:Clean.
        ?source prov:wasAttributedTo ?agent.

    }
  ''' % article_accent,
  initNs = { "hto": hto, "prov": PROV}
)

for r in graph.query(q19):
      print("name: %s | description: %s | source: %s | agent: %s" % (r.name,  r.text, r.source, r.agent))

### Question 20: What are the descriptions of term _T_ with the highest text quality?

In [50]:
from rdflib import PROV

article_accent = "<https://w3id.org/hto/ArticleTermRecord/9910796273804340_192984259_ACCENT_1>"
q20 = prepareQuery('''
    SELECT * WHERE {
        %s a hto:TermRecord;
            hto:name ?name;
            hto:hasOriginalDescription ?desc.
        ?desc hto:wasExtractedFrom ?source;
            hto:text ?text;
            hto:hasTextQuality ?textQuality.
        ?source prov:wasAttributedTo ?agent.
        ?agent a ?agentType.
        FILTER (?agentType = prov:Person || ?agentType = prov:Organization)
        FILTER NOT EXISTS {
          %s hto:hasOriginalDescription [hto:hasTextQuality [hto:isTextQualityHigherThan ?textQuality]].
        }
    }
  ''' % (article_accent, article_accent),
  initNs = { "hto": hto, "prov": PROV}
)

for r in graph.query(q20):
      print("name: %s | description: %s | source: %s | agent: %s | text quality: %s" % (r.name,  r.text, r.source, r.agent, r.textQuality))

name: ACCENT | description: in Music, is a certain enforcement of particular sounds, whether by the voice or instruments, generally used at the beginning of bars.
Acceptance, in Commerce, is the subscribing, signing, and making one’s self debtor for the sum contained in a bill of exchange or other obligation. | source: https://raw.githubusercontent.com/TU-plogan/kp-editions/main/eb07/TXT_v2/a2/kp-eb0702-008305-0888-v2.txt | agent: https://w3id.org/hto/Organization/NCKP | text quality: https://w3id.org/hto#High


### Question 21: What software was used to extract the description of article _T_ or a page _P_?

In [51]:
from rdflib import PROV

article_accent = "<https://w3id.org/hto/ArticleTermRecord/9910796273804340_192984259_ACCENT_1>"
q21 = prepareQuery('''
    SELECT * WHERE {
        %s a hto:TermRecord;
            hto:name ?name;
            hto:hasOriginalDescription ?desc.
        ?desc hto:wasExtractedFrom ?source;
            hto:text ?text;
            prov:wasAttributedTo ?software.
        ?software a prov:SoftwareAgent.
        ?source prov:wasAttributedTo ?agent.
        ?agent a ?agentType.
        FILTER (?agentType = prov:Person || ?agentType = prov:Organization)
    }
  ''' % article_accent,
  initNs = { "hto": hto, "prov": PROV}
)

for r in graph.query(q21):
      print("name: %s | description: %s | extracted using software: %s | source: %s | agent: %s " % (r.name,  r.text, r.software, r.source, r.agent))

name: ACCENT | description: in Music, is a certain enforcement of particular sounds, whether by the voice or instruments, generally used at the beginning of bars.
Acceptance, in Commerce, is the subscribing, signing, and making one’s self debtor for the sum contained in a bill of exchange or other obligation. | extracted using software: https://github.com/frances-ai/frances-InformationExtraction | source: https://raw.githubusercontent.com/TU-plogan/kp-editions/main/eb07/TXT_v2/a2/kp-eb0702-008305-0888-v2.txt | agent: https://w3id.org/hto/Organization/NCKP 


### Question 22: What software was used to digitise a document?

In [52]:
from rdflib import PROV
# Find the software used to generate the source from which the description of an article was extracted.
article_accent = "<https://w3id.org/hto/ArticleTermRecord/9910796273804340_192984259_ACCENT_1>"
q22 = prepareQuery('''
    SELECT * WHERE {
        %s a hto:TermRecord;
            hto:name ?name;
            hto:hasOriginalDescription ?desc.
        ?desc hto:wasExtractedFrom ?source;
            hto:text ?text.
        ?source prov:wasAttributedTo ?agent.
        ?agent a ?agentType.
        FILTER (?agentType = prov:Person || ?agentType = prov:Organization)
        OPTIONAL {
            ?source prov:wasAttributedTo ?software.
            ?software a prov:SoftwareAgent.
        }
    }
  ''' % article_accent,
  initNs = { "hto": hto, "prov": PROV}
)

for r in graph.query(q22):
      print("name: %s | description: %s | source: %s | created by %s using %s" % (r.name,  r.text, r.source, r.agent, r.software))

name: ACCENT | description: in Music, is a certain enforcement of particular sounds, whether by the voice or instruments, generally used at the beginning of bars.
Acceptance, in Commerce, is the subscribing, signing, and making one’s self debtor for the sum contained in a bill of exchange or other obligation. | source: https://raw.githubusercontent.com/TU-plogan/kp-editions/main/eb07/TXT_v2/a2/kp-eb0702-008305-0888-v2.txt | created by https://w3id.org/hto/Organization/NCKP using https://pdf.abbyy.com


### Question 23: List all articles which have more than one name

In [31]:

q23 = prepareQuery('''
    SELECT ?term (COUNT(?name) AS ?n) WHERE {
        ?term a hto:ArticleTermRecord;
            hto:name ?name.
    }
    GROUP BY ?term
    HAVING (COUNT(?name) > 1)
  ''',
  initNs = { "hto": hto}
)

for r in graph.query(q23):
      print("term uri: %s | number of names: %s" % (r.term, r.n))

term uri: https://w3id.org/hto/ArticleTermRecord/992277653804341_144133901_AATTER_0 | number of names: 2
term uri: https://w3id.org/hto/ArticleTermRecord/992277653804341_144133901_ABACTORES_0 | number of names: 2
term uri: https://w3id.org/hto/ArticleTermRecord/992277653804341_144133901_ABANCAI_0 | number of names: 2
term uri: https://w3id.org/hto/ArticleTermRecord/992277653804341_144133901_ABASSI_0 | number of names: 2
term uri: https://w3id.org/hto/ArticleTermRecord/992277653804341_144133901_ABBREVIATION_0 | number of names: 2
term uri: https://w3id.org/hto/ArticleTermRecord/992277653804341_144133901_ABCDARY_0 | number of names: 2
term uri: https://w3id.org/hto/ArticleTermRecord/992277653804341_144133901_ABELIANS_0 | number of names: 3
term uri: https://w3id.org/hto/ArticleTermRecord/992277653804341_144133901_ABELMOSCH_0 | number of names: 2
term uri: https://w3id.org/hto/ArticleTermRecord/992277653804341_144133901_ABELTREE_0 | number of names: 2
term uri: https://w3id.org/hto/Articl

### Question 24: List editions which were revisions of another edition

In [6]:
from rdflib.namespace import PROV
q24 = prepareQuery('''
    SELECT * WHERE {
        ?revision a hto:Edition;
            prov:wasRevisionOf ?edition;
            hto:number ?r_edition_num.
        ?edition a hto:Edition;
            hto:number ?edition_num.
    }
  ''',
  initNs = { "hto": hto, "prov": PROV}
)

for r in graph.query(q24):
      print("edition %s - < %s > was revision of edition %s -< %s >" % (r.revision, r.r_edition_num, r.edition, r.edition_num))

edition https://w3id.org/hto/Edition/9929192893804340 - < 1 > was revision of edition https://w3id.org/hto/Edition/992277653804341 -< 1 >
