# Knowledge Exploration with SPARQL Queries
This notebook explores the knowledge graphs generated in this repository using SPARQL queries.
All queries are sent to a remote SPARQL query server.

Questions this knowledge graph should answer:
1. List all digitalised collections in this graph.
2. What volumes, editions, or series does a digitalised collection _C_ include?
3. What time period does a digitalised collection _C_ cover? (next version)
4. When was edition _E_, series _S_, or volume _V_ published?
5. Who published edition _E_, series _S_, or volume _V_?
6. Who edited edition _E_, series _S_, or volume _V_?
7. Which genre does an edition _E_, series _S_, or volume _V_ belong to?
8. Where was an edition _E_, series _S_, or volume _V_ published or printed?
9. Which language did an edition _E_, series _S_, or volume _V_ use?
10. In EB, what articles does a volume _V_ include?
11. Where was an EB article _A_ described (in a page, volume, edition)?
12. What are the EB articles which appear in all editions?
13. What EB articles appeared only once in edition _E_?
14. What are EB articles related to another EB article _T_?
15. What are EB articles which have a similar description to _T_?
16. How was a term with name _T_ described in all editions?
17. What is the text in a page?
18. What sources are the text descriptions of article _T_ or a page _P_ extracted from?
19. What is the clean description of term _T_?
20. What are the descriptions of term _T_ with the highest text quality?
21. What software was used to extract the description of article _T_ or a page _P_?
22. What software was used to digitise a document?

## Load the graph

In [21]:
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper(
    "http://query.frances-ai.com/hto"
)
sparql.setReturnFormat(JSON)
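
Every cell below walks `ret["results"]["bindings"]` and checks each OPTIONAL variable by hand. A minimal sketch of a helper that could do this in one place is shown here. `flatten_bindings` is a hypothetical name, not part of SPARQLWrapper, and the sketch assumes the standard SPARQL JSON results layout (`head.vars` plus `results.bindings`).

```python
def flatten_bindings(result):
    """Turn a SPARQL JSON result into a list of plain dicts.

    Variables left unbound (for example by OPTIONAL) map to None.
    """
    variables = result.get("head", {}).get("vars", [])
    rows = []
    for binding in result["results"]["bindings"]:
        # Fall back to the binding's own keys when head.vars is absent.
        names = variables or list(binding)
        rows.append({name: binding[name]["value"] if name in binding else None
                     for name in names})
    return rows


# Hand-written sample in the SPARQL JSON results layout (not real endpoint output).
sample = {
    "head": {"vars": ["edition", "number"]},
    "results": {"bindings": [
        {"edition": {"type": "uri", "value": "https://example.org/e1"},
         "number": {"type": "literal", "value": "1"}},
        {"edition": {"type": "uri", "value": "https://example.org/e2"}},
    ]},
}
for row in flatten_bindings(sample):
    print(row)
```

With such a helper, a cell could call `flatten_bindings(sparql.queryAndConvert())` and print `row["subtitle"]` directly, since missing values are already `None`.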

## Query the graph

### Question 1: List all digital collections

In [2]:
sparql.setQuery("""
    PREFIX hto: <https://w3id.org/hto#>
    SELECT ?collection ?name WHERE {
        ?collection a hto:WorkCollection;
            hto:name ?name.
        FILTER (regex(?name, "Collection$", "i"))
        }
    """
)

try:
    ret = sparql.queryAndConvert()

    for r in ret["results"]["bindings"]:
        print("collection uri: %s| name: %s" % (r["collection"]["value"], r["name"]["value"]))
except Exception as e:
    print(e)


collection uri: https://w3id.org/hto/WorkCollection/EncyclopaediaBritannica| name: Encyclopaedia Britannica Collection


### Question 2: What volumes, editions, or series does a digitalised collection _C_ include?

In [3]:
## List all editions of Encyclopaedia Britannica collection
eb_collection = "<https://w3id.org/hto/WorkCollection/EncyclopaediaBritannica>"

sparql.setQuery("""
    PREFIX hto: <https://w3id.org/hto#>
    SELECT * WHERE {
        %s  a hto:WorkCollection;
            hto:hadMember ?edition.
        ?edition a hto:Edition;
            hto:mmsid ?mmsid;
            hto:title ?title.
        OPTIONAL {
            ?edition hto:subtitle ?subtitle;
        }
        OPTIONAL {
            ?edition hto:number ?number.
        }
} ORDER BY ?number
    """ % eb_collection
)

try:
    ret = sparql.queryAndConvert()

    for r in ret["results"]["bindings"]:
        subtitle = None
        number = None
        if "subtitle" in r:
            subtitle = r["subtitle"]["value"]
        if "number" in r:
            number = r["number"]["value"]
        print("Edition uri: %s | MMSID: %s | title: %s | subtitle: %s | number: %s" % (r["edition"]["value"], r["mmsid"]["value"], r["title"]["value"], subtitle,  number))
except Exception as e:
    print(e)

Edition uri: https://w3id.org/hto/Edition/9910796343804340 | MMSID: 9910796343804340 | title: Supplement to the third edition of the Encyclopaedia Britannica ... Illustrated with ... copperplates | subtitle: None | number: None
Edition uri: https://w3id.org/hto/Edition/9910796373804340 | MMSID: 9910796373804340 | title: Supplement to the fourth, fifth and sixth editions of the Encyclopaedia Britannica | subtitle: None | number: None
Edition uri: https://w3id.org/hto/Edition/992277653804341 | MMSID: 992277653804341 | title: Encyclopaedia Britannica; or, A dictionary of arts and sciences, compiled upon a new plan | subtitle: Illustrated with one hundred and sixty copperplates | number: 1
Edition uri: https://w3id.org/hto/Edition/9929192893804340 | MMSID: 9929192893804340 | title: Encyclopaedia Britannica: or, A dictionary of arts and sciences | subtitle: compiled upon a new plan. In which the different sciences and arts are digested into distinct treatises or systems; and the various tec

In [4]:
## List all volumes in Encyclopaedia Britannica 7th Edition
eb_edition7 = "<https://w3id.org/hto/Edition/9910796273804340>"

sparql.setQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     SELECT * WHERE {
        %s  a hto:Edition;
            hto:hadMember ?volume.
        ?volume a hto:Volume;
            hto:title ?title;
            hto:number ?number;
            hto:volumeId ?volumeId;
            hto:permanentURL ?permanentURL.
        OPTIONAL {
            ?volume hto:letters ?letters;
        }
    } ORDER BY ?number
    """ % eb_edition7
)

try:
    ret = sparql.queryAndConvert()

    for r in ret["results"]["bindings"]:
        letters = None
        if "letters" in r:
            letters = r["letters"]["value"]
        print("Volume uri: %s | title: %s | number: %s | id: %s |  permanent url: %s | letters: %s" % (r["volume"]["value"], r["title"]["value"] ,r["number"]["value"], r["volumeId"]["value"], r["permanentURL"]["value"], letters))
except Exception as e:
    print(e)

Volume uri: https://w3id.org/hto/Volume/9910796273804340_192547789 | title: Seventh edition, General index | number: 0 | id: 192547789 |  permanent url: https://digital.nls.uk/192547789 | letters: None
Volume uri: https://w3id.org/hto/Volume/9910796273804340_192984258 | title: Seventh edition, Volume 1, Preliminary dissertations | number: 1 | id: 192984258 |  permanent url: https://digital.nls.uk/192984258 | letters: Preliminarydissertations
Volume uri: https://w3id.org/hto/Volume/9910796273804340_192984259 | title: Seventh edition, Volume 2, A-Anatomy | number: 2 | id: 192984259 |  permanent url: https://digital.nls.uk/192984259 | letters: A-Anatomy
Volume uri: https://w3id.org/hto/Volume/9910796273804340_193057500 | title: Seventh edition, Volume 3, Anatomy-Astronomy | number: 3 | id: 193057500 |  permanent url: https://digital.nls.uk/193057500 | letters: Anatomy-Astronomy
Volume uri: https://w3id.org/hto/Volume/9910796273804340_193108322 | title: Seventh edition, Volume 4, Astronomy

### Question 3: What time period does a digitalised collection _C_ cover? (next version)

In [33]:
# TODO: Get the time period that a digitalised collection _C_ covers
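
One possible way to fill this TODO, offered as an untested sketch: list the publication year of every edition in the collection and take the earliest and latest. It assumes each edition of interest carries `hto:yearPublished` and that the value is a plain year such as `1842`. `build_years_query` and `covered_period` are hypothetical helper names, and the query has not been run against the endpoint.

```python
def build_years_query(collection_iri):
    """Return a SELECT query listing the publication year of each edition.

    collection_iri must already be wrapped in angle brackets, as in the
    other cells of this notebook.
    """
    return """
    PREFIX hto: <https://w3id.org/hto#>
    SELECT ?edition ?year WHERE {
        %s a hto:WorkCollection;
            hto:hadMember ?edition.
        ?edition a hto:Edition;
            hto:yearPublished ?year.
    }
    """ % collection_iri


def covered_period(bindings):
    """Return (earliest, latest) year from result bindings, or None if empty."""
    years = [int(b["year"]["value"]) for b in bindings if "year" in b]
    if not years:
        return None
    return min(years), max(years)


# Hand-written bindings for illustration; the years are not real results.
example = [
    {"year": {"value": "1801"}},
    {"year": {"value": "1790"}},
    {"year": {"value": "1815"}},
]
print(covered_period(example))  # (1790, 1815)
```

Against the endpoint this would be used as `sparql.setQuery(build_years_query(eb_collection))` followed by `covered_period(sparql.queryAndConvert()["results"]["bindings"])`.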

### Question 4: When was edition _E_, series _S_, or volume _V_ published?

In [5]:
eb_edition7 = "<https://w3id.org/hto/Edition/9910796273804340>"

sparql.setQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     SELECT * WHERE {
        %s  a hto:Edition;
            hto:mmsid ?mmsid;
            hto:title ?title;
            hto:number ?number;
            hto:yearPublished ?year_published;
    }
    """ % eb_edition7
)

try:
    ret = sparql.queryAndConvert()

    for r in ret["results"]["bindings"]:
        print("MMSID: %s | title: %s |  number: %s | year published: %s " % (r["mmsid"]["value"], r["title"]["value"] ,r["number"]["value"], r["year_published"]["value"]))
except Exception as e:
    print(e)


MMSID: 9910796273804340 | title: Encyclopaedia Britannica |  number: 7 | year published: 1842 


### Question 5: Who published edition _E_, series _S_, or volume _V_?

In [6]:
from rdflib.namespace import FOAF

eb_edition7 = "<https://w3id.org/hto/Edition/9910796273804340>"

sparql.setQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     PREFIX foaf: <http://xmlns.com/foaf/0.1/>
     SELECT * WHERE {
        %s  a hto:Edition;
            hto:mmsid ?mmsid;
            hto:title ?title;
            hto:number ?number;
            hto:yearPublished ?year_published.
        OPTIONAL {
            %s hto:publisher ?publisher.
            ?publisher foaf:name ?publisher_name;
        }
    }
    """ % (eb_edition7, eb_edition7)
)

try:
    ret = sparql.queryAndConvert()

    for r in ret["results"]["bindings"]:
        publisher = None
        if "publisher" in r:
            publisher = r["publisher"]["value"]
        publisher_name = None
        if "publisher_name" in r:
            publisher_name = r["publisher_name"]["value"]
        print("MMSID: %s | title: %s | number: %s | year published: %s | | (publisher: %s | name: %s) " % (r["mmsid"]["value"], r["title"]["value"] ,r["number"]["value"], r["year_published"]["value"], publisher, publisher_name))
except Exception as e:
    print(e)

MMSID: 9910796273804340 | title: Encyclopaedia Britannica | number: 7 | year published: 1842 | | (publisher: None | name: None) 


### Question 6: Who edited edition _E_, series _S_, or volume _V_?

In [7]:
from rdflib.namespace import FOAF

eb_edition7 = "<https://w3id.org/hto/Edition/9910796273804340>"

sparql.setQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     PREFIX foaf: <http://xmlns.com/foaf/0.1/>
     SELECT * WHERE {
        %s  a hto:Edition;
            hto:mmsid ?mmsid;
            hto:title ?title;
            hto:number ?number;
            hto:yearPublished ?year_published.
        OPTIONAL {
            %s hto:editor ?editor.
            ?editor a hto:Person;
                foaf:name ?editor_name;
            OPTIONAL {
                ?editor hto:birthYear ?birthYear.
            }
            OPTIONAL {
                ?editor hto:deathYear ?deathYear.
            }
            OPTIONAL {
                ?editor hto:termsOfAddress ?termsOfAddress.
            }
        }
}
    """ % (eb_edition7, eb_edition7)
)

try:
    ret = sparql.queryAndConvert()

    for r in ret["results"]["bindings"]:
        editor = None
        if "editor" in r:
            editor = r["editor"]["value"]
        editor_name = None
        if "editor_name" in r:
            editor_name = r["editor_name"]["value"]
        birthYear = None
        if "birthYear" in r:
            birthYear = r["birthYear"]["value"]
        deathYear = None
        if "deathYear" in r:
            deathYear = r["deathYear"]["value"]
        termsOfAddress = None
        if "termsOfAddress" in r:
            termsOfAddress = r["termsOfAddress"]["value"]
        print("MMSID: %s | title: %s | number: %s | year published: %s | (editor: %s | name: %s | terms of address: %s | %s-%s ) " % (r["mmsid"]["value"], r["title"]["value"] ,r["number"]["value"], r["year_published"]["value"], editor, editor_name, termsOfAddress, birthYear, deathYear))
except Exception as e:
    print(e)

MMSID: 9910796273804340 | title: Encyclopaedia Britannica | number: 7 | year published: 1842 | (editor: https://w3id.org/hto/Person/1436491835 | name: Stewart, Dugald | terms of address: Sir | 1753-1828 ) 


### Question 7: Which genre does a volume _V_ belong to?

In [8]:
eb_edition7 = "<https://w3id.org/hto/Edition/9910796273804340>"

sparql.setQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     SELECT * WHERE {
        %s  a hto:Edition;
            hto:mmsid ?mmsid;
            hto:title ?title;
            hto:number ?number;
            hto:genre ?genre;
    }
    """ % eb_edition7
)

try:
    ret = sparql.queryAndConvert()

    for r in ret["results"]["bindings"]:
        print("MMSID: %s | title: %s |  number: %s | genre: %s " % (r["mmsid"]["value"], r["title"]["value"] ,r["number"]["value"], r["genre"]["value"]))
except Exception as e:
    print(e)

MMSID: 9910796273804340 | title: Encyclopaedia Britannica |  number: 7 | genre: encyclopedia 


### Question 8: Where was a volume _V_ published or printed?

In [9]:
eb_edition7 = "<https://w3id.org/hto/Edition/9910796273804340>"

sparql.setQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     SELECT * WHERE {
        %s  a hto:Edition;
            hto:mmsid ?mmsid;
            hto:title ?title;
            hto:number ?number;
            hto:printedAt ?placePrinted;
            hto:shelfLocator ?shelfLocator.
    }
    """ % eb_edition7
)

try:
    ret = sparql.queryAndConvert()

    for r in ret["results"]["bindings"]:
        print("MMSID: %s | title: %s |  number: %s | place printed: %s | shelf locator: %s " % (r["mmsid"]["value"], r["title"]["value"] ,r["number"]["value"], r["placePrinted"]["value"], r["shelfLocator"]["value"]))
except Exception as e:
    print(e)

MMSID: 9910796273804340 | title: Encyclopaedia Britannica |  number: 7 | place printed: https://w3id.org/hto/Location/3744239112 | shelf locator: https://w3id.org/hto/Location/881055320 


### Question 9: Which language does a volume _V_ use?

In [10]:
eb_edition7 = "<https://w3id.org/hto/Edition/9910796273804340>"
sparql.setQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     SELECT * WHERE {
        %s  a hto:Edition;
            hto:mmsid ?mmsid;
            hto:title ?title;
            hto:number ?number;
            hto:language ?language;
}
    """ % eb_edition7
)

try:
    ret = sparql.queryAndConvert()

    for r in ret["results"]["bindings"]:
        print("MMSID: %s | title: %s |  number: %s | language: %s" % (r["mmsid"]["value"], r["title"]["value"] ,r["number"]["value"], r["language"]["value"]))
except Exception as e:
    print(e)

MMSID: 9910796273804340 | title: Encyclopaedia Britannica |  number: 7 | language: eng


### Question 10: In EB, what articles does a volume _V_ include?

In [14]:
# List 20 articles in volume 2 of 7th edition.
eb_edition7_volume2 = "<https://w3id.org/hto/Volume/9910796273804340_192984259>"

sparql.setQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     SELECT * WHERE {
        %s a hto:Volume;
            hto:hadMember ?page.
        ?termRecord a ?term_type;
                hto:startsAtPage ?page.
        FILTER (?term_type = hto:ArticleTermRecord || ?term_type = hto:TopicTermRecord)
    } LIMIT 20
    """ % eb_edition7_volume2
)

try:
    ret = sparql.queryAndConvert()

    for r in ret["results"]["bindings"]:
        print("Term uri: %s" % (r["termRecord"]["value"]))
except Exception as e:
    print(e)

Term uri: https://w3id.org/hto/ArticleTermRecord/9910796273804340_192984259_7983463252_0
Term uri: https://w3id.org/hto/ArticleTermRecord/9910796273804340_192984259_189797561_0
Term uri: https://w3id.org/hto/ArticleTermRecord/9910796273804340_192984259_163232880_0
Term uri: https://w3id.org/hto/ArticleTermRecord/9910796273804340_192984259_5725605898_1
Term uri: https://w3id.org/hto/ArticleTermRecord/9910796273804340_192984259_158244943_0
Term uri: https://w3id.org/hto/ArticleTermRecord/9910796273804340_192984259_391757613_0
Term uri: https://w3id.org/hto/ArticleTermRecord/9910796273804340_192984259_9018018225_0
Term uri: https://w3id.org/hto/ArticleTermRecord/9910796273804340_192984259_140356593_0
Term uri: https://w3id.org/hto/ArticleTermRecord/9910796273804340_192984259_720629256_0
Term uri: https://w3id.org/hto/ArticleTermRecord/9910796273804340_192984259_3666381975_0
Term uri: https://w3id.org/hto/ArticleTermRecord/9910796273804340_192984259_1563118920_0
Term uri: https://w3id.org/

### Question 11: Where was an EB article _A_ described (in a page, volume, edition)?

In [15]:
# Show the page at which an article starts, and the volume, edition, and collection in which it was described
article_name = "'ACCENT'"

sparql.setQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     SELECT * WHERE {
        ?term a hto:ArticleTermRecord;
            hto:name %s;
            hto:startsAtPage ?page;
            hto:wasMemberOf ?volume.
        ?volume a hto:Volume.
        OPTIONAL {
            ?term hto:wasMemberOf ?edition.
            ?edition a hto:Edition.
        }
        OPTIONAL {
            ?term hto:wasMemberOf ?collection.
            ?collection hto:name ?collection_name.
            FILTER (regex(?collection_name, "Collection$", "i"))
        }

    } LIMIT 20
    """ % article_name
)

try:
    ret = sparql.queryAndConvert()

    for r in ret["results"]["bindings"]:
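        # ?edition and ?collection come from OPTIONAL blocks and may be unbound
        edition = r["edition"]["value"] if "edition" in r else None
        collection = r["collection"]["value"] if "collection" in r else None
        print("term: %s | starts at page: %s | volume: %s | edition %s | collection: %s" % (r["term"]["value"], r["page"]["value"], r["volume"]["value"], edition, collection))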
except Exception as e:
    print(e)
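
The cells in this notebook splice URIs and names such as `'ACCENT'` into the query text with `%`. That works for the fixed values used here, but a name containing a quote would break the query. A small sketch of two hypothetical helpers (`sparql_iri` and `sparql_literal`, neither of which is part of SPARQLWrapper) that escape values before they are inserted:

```python
import re

# Characters the SPARQL grammar forbids inside <...> IRI references.
_BAD_IRI_CHARS = re.compile(r'[\x00-\x20<>"{}|^`\\]')


def sparql_iri(uri):
    """Wrap a URI in angle brackets, rejecting characters not allowed there."""
    if _BAD_IRI_CHARS.search(uri):
        raise ValueError("not a valid IRI for a SPARQL query: %r" % uri)
    return "<%s>" % uri


def sparql_literal(text):
    """Return text as a double-quoted SPARQL string literal with escapes."""
    escaped = (text.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n")
                   .replace("\r", "\\r")
                   .replace("\t", "\\t"))
    return '"%s"' % escaped


print(sparql_literal("ACCENT"))
print(sparql_iri("https://w3id.org/hto/Edition/9910796273804340"))
```

For example, `article_name = sparql_literal("ACCENT")` could stand in for the hand-quoted `"'ACCENT'"` used above.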

### Question 12: What are the EB articles which appear in all editions?

In [35]:
# TODO Question 12: What are the EB articles which appear in all editions?
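
A possible starting point for this TODO, as an untested sketch: fetch one row per (article name, edition) pair, then keep the names that occur in every edition. The query reuses the page, volume, and edition pattern from Question 16; `NAME_EDITION_QUERY` and `names_in_all_editions` are hypothetical names. Note the assumptions: "all editions" here means the editions that appear in the result (supplements typed as `hto:Edition` are counted too), and the unrestricted query may be slow on the remote endpoint.

```python
from collections import defaultdict

# Candidate query: one row per (article name, edition) pair. Untested.
NAME_EDITION_QUERY = """
PREFIX hto: <https://w3id.org/hto#>
SELECT DISTINCT ?name ?edition WHERE {
    ?term a hto:ArticleTermRecord;
        hto:name ?name;
        hto:startsAtPage ?page.
    ?vol hto:hadMember ?page.
    ?edition a hto:Edition;
        hto:hadMember ?vol.
}
"""


def names_in_all_editions(pairs):
    """Return the sorted names that occur in every edition seen in pairs."""
    editions_by_name = defaultdict(set)
    all_editions = set()
    for name, edition in pairs:
        editions_by_name[name].add(edition)
        all_editions.add(edition)
    return sorted(name for name, editions in editions_by_name.items()
                  if editions == all_editions)


# Made-up pairs for illustration only.
pairs = [("EARTH", "e1"), ("EARTH", "e2"), ("ACCENT", "e1")]
print(names_in_all_editions(pairs))  # ['EARTH']
```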

### Question 13: What EB articles appeared only once in edition _E_?

In [43]:
# TODO Question 13: What EB articles appeared only once in edition _E_?
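
One way this TODO might be approached, as an untested sketch: group the article records of a single edition by name and keep the names with exactly one record, in the same GROUP BY and HAVING style as Question 23. This reads the question as "names with exactly one article record in edition _E_", which is an assumption. `build_single_occurrence_query` and `names_occurring_once` are hypothetical names.

```python
from collections import Counter


def build_single_occurrence_query(edition_iri, limit=20):
    """Return a query for names with exactly one article record in an edition.

    edition_iri must already be wrapped in angle brackets.
    """
    return """
    PREFIX hto: <https://w3id.org/hto#>
    SELECT ?name WHERE {
        %s a hto:Edition;
            hto:hadMember ?vol.
        ?vol hto:hadMember ?page.
        ?term a hto:ArticleTermRecord;
            hto:name ?name;
            hto:startsAtPage ?page.
    }
    GROUP BY ?name
    HAVING (COUNT(DISTINCT ?term) = 1)
    LIMIT %d
    """ % (edition_iri, limit)


def names_occurring_once(names):
    """Local equivalent of the HAVING clause: names seen exactly once."""
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n == 1)


print(names_occurring_once(["ACCENT", "EARTH", "EARTH"]))  # ['ACCENT']
```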

### Question 14: What are EB articles related to another EB article _T_?

In [68]:
article_accent = "<https://w3id.org/hto/ArticleTermRecord/9910796273804340_192984259_ACCENT_0>"

sparql.setQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     SELECT * WHERE {
        %s a hto:TermRecord;
            hto:name ?name.
        OPTIONAL {
            %s hto:refersTo ?see_term.
            ?see_term a hto:TermRecord.
        }
    }
    """ % (article_accent, article_accent)
)

try:
    ret = sparql.queryAndConvert()
    for r in ret["results"]["bindings"]:
        refersTo = None
        if "see_term" in r:
            refersTo = r["see_term"]["value"]
        print("term: %s | see term: %s" % (r["name"]["value"], refersTo))
except Exception as e:
    print(e)


### Question 15: What are EB articles which have a similar description to _T_?

In [72]:
article_accent = "<https://w3id.org/hto/ArticleTermRecord/9910796273804340_192984259_ACCENT_0>"

sparql.setQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     SELECT * WHERE {
        %s a hto:TermRecord;
            hto:name ?name.
        OPTIONAL {
            %s hto:similarTo ?similar_term.
            ?similar_term a hto:TermRecord.
        }
    }
    """ % (article_accent, article_accent)
)

try:
    ret = sparql.queryAndConvert()
    for r in ret["results"]["bindings"]:
        similar_term = None
        if "similar_term" in r:
            similar_term = r["similar_term"]["value"]
        print("term: %s | similar term: %s" % (r["name"]["value"], similar_term))
except Exception as e:
    print(e)

term: ACCENT | similar term: None


### Question 16: How was a term with name _T_ described in all editions?

In [23]:
term_name = "'EARTH'"

sparql.setQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     SELECT * WHERE {
        ?term a hto:ArticleTermRecord;
            hto:name %s;
            hto:startsAtPage ?page;
            hto:hasOriginalDescription ?desc.
        ?desc hto:text ?content.
        ?vol hto:hadMember ?page.
        ?edition a hto:Edition;
            hto:hadMember ?vol;
            hto:title ?title.
        OPTIONAL {
            ?edition hto:number ?number.
        }
    } ORDER BY ?number
    """ % term_name
)

try:
    ret = sparql.queryAndConvert()
    for r in ret["results"]["bindings"]:
        edition_number = None
        if "number" in r:
            edition_number = r["number"]["value"]
        print("term uri: %s | description: %s |  edition: %s | edition title: %s | edition number: %s" % (r["term"]["value"], r["content"]["value"], r["edition"]["value"], r["title"]["value"], edition_number))
except Exception as e:
    print(e)


term uri: https://w3id.org/hto/ArticleTermRecord/992277653804341_144133902_5297117738_0 | description: a ' profile, or terreflrial matter, wherever our globe but really on lists. See Vol. I. p 67. Earth, in allronomy and geography, one of the prim w planets, being it a terraqueous globe where we inhabit. See Astronomy and Geography..E-ARTHQUAKE, in natural history, a violent agitation and sometimes with an eruption of fire, water, wind, be. See Pneumatics. EASEL-pieces, a denomination given by painters to such pieces as are contained in frames, in contradiction from those painted on ceilings, be. |  edition: https://w3id.org/hto/Edition/992277653804341 | edition title: Encyclopaedia Britannica; or, A dictionary of arts and sciences, compiled upon a new plan | edition number: 1
term uri: https://w3id.org/hto/ArticleTermRecord/9929192893804340_144850367_5297117738_0 | description: a follile, or terredrial matter, whereof our globe partly confids. See Vol. I. p 67. Earth, in adronoray and

### Question 17: What is the text in a page?

In [19]:
# Show the text and text quality of five pages (the query does not filter by collection)
sparql.setQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     SELECT * WHERE {
        ?page a hto:Page;
            hto:hasOriginalDescription ?desc.
        ?desc hto:text ?text;
            hto:hasTextQuality ?textQuality.
    }
    LIMIT 5
    """
)

try:
    ret = sparql.queryAndConvert()
    for r in ret["results"]["bindings"]:
        print("page uri: %s | content: %s | quality: %s" % (r["page"]["value"], r["text"]["value"], r["textQuality"]["value"]))
except Exception as e:
    print(e)

### Question 18: What sources are the text descriptions of article _T_ or a page _P_ extracted from?

In [80]:
from rdflib import PROV

article_accent = "<https://w3id.org/hto/ArticleTermRecord/9910796273804340_192984259_ACCENT_0>"

sparql.setQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     PREFIX prov: <http://www.w3.org/ns/prov#>
     SELECT * WHERE {
        %s a hto:ArticleTermRecord;
            hto:name ?name;
            hto:hasOriginalDescription ?desc.
        ?desc hto:wasExtractedFrom ?source;
            hto:text ?text.
        ?source prov:wasAttributedTo ?agent.
    }
    """ % article_accent
)

try:
    ret = sparql.queryAndConvert()
    for r in ret["results"]["bindings"]:
        print("name: %s | description: %s | source: %s | agent: %s" % (r["name"]["value"], r["text"]["value"], r["source"]["value"], r["agent"]["value"]))
except Exception as e:
    print(e)


name: ACCENT | description: in reading or speaking, an inflection of the voice, which gives to each syllable of a word its due pitch in-respect of height or lowness. See Reading. The word is originally Latin, accentus ; a compound of ad, to, and cano, to sing. Accentus quasi adcantus, or juχtα can-turn. In this sense, accent is synonymous with the Greek τονος ; the Latin tenor, or tonor ; and the Hebrew t□yto, gustus, taste, Accent, among grammarians, is a certain mark, or  character placed over a syllable to direct the stress of its pronunciation. We generally reckon three grammatical accents in ordinary use, all borrowed from the Greeks, viz. the acute accent ('), which shows when the tone of the voice is to be raised; the grave accent ('), when the note or tone of the voice is to be depressed; and the circumflex accent ( a ), which is composed of both the acute and the grave, and points out a kind of undulation of the voice. The Latins have made the same use as the Greeks of these t

### Question 19: What is the high-quality description of term _T_?

In [81]:
from rdflib import PROV

article_accent = "<https://w3id.org/hto/ArticleTermRecord/9910796273804340_192984259_ACCENT_0>"

sparql.setQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     PREFIX prov: <http://www.w3.org/ns/prov#>
     SELECT * WHERE {
        %s a hto:ArticleTermRecord;
            hto:name ?name;
            hto:hasOriginalDescription ?desc.
        ?desc hto:wasExtractedFrom ?source;
            hto:text ?text;
            hto:hasTextQuality hto:High.
        ?source prov:wasAttributedTo ?agent.

    }
    """ % article_accent
)

try:
    ret = sparql.queryAndConvert()
    for r in ret["results"]["bindings"]:
        print("name: %s | description: %s | source: %s | agent: %s" % (r["name"]["value"], r["text"]["value"], r["source"]["value"], r["agent"]["value"]))
except Exception as e:
    print(e)


name: ACCENT | description: in reading or speaking, an inflection of the voice, which gives to each syllable of a word its due pitch in-respect of height or lowness. See Reading. The word is originally Latin, accentus ; a compound of ad, to, and cano, to sing. Accentus quasi adcantus, or juχtα can-turn. In this sense, accent is synonymous with the Greek τονος ; the Latin tenor, or tonor ; and the Hebrew t□yto, gustus, taste, Accent, among grammarians, is a certain mark, or  character placed over a syllable to direct the stress of its pronunciation. We generally reckon three grammatical accents in ordinary use, all borrowed from the Greeks, viz. the acute accent ('), which shows when the tone of the voice is to be raised; the grave accent ('), when the note or tone of the voice is to be depressed; and the circumflex accent ( a ), which is composed of both the acute and the grave, and points out a kind of undulation of the voice. The Latins have made the same use as the Greeks of these t

### Question 20: What are the descriptions of term _T_ with the highest text quality?

In [82]:
from rdflib import PROV

article_accent = "<https://w3id.org/hto/ArticleTermRecord/9910796273804340_192984259_ACCENT_0>"

sparql.setQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     PREFIX prov: <http://www.w3.org/ns/prov#>
     SELECT * WHERE {
        %s a hto:ArticleTermRecord;
            hto:name ?name;
            hto:hasOriginalDescription ?desc.
        ?desc hto:wasExtractedFrom ?source;
            hto:text ?text;
            hto:hasTextQuality ?textQuality.
        ?source prov:wasAttributedTo ?agent.
        ?agent a ?agentType.
        FILTER (?agentType = hto:Person || ?agentType = hto:Organization)
        FILTER NOT EXISTS {
          %s hto:hasOriginalDescription [hto:hasTextQuality [hto:isTextQualityHigherThan ?textQuality]].
        }
    }
    """ % (article_accent, article_accent)
)

try:
    ret = sparql.queryAndConvert()
    for r in ret["results"]["bindings"]:
        print("name: %s | description: %s | source: %s | agent: %s | text quality: %s" % (r["name"]["value"], r["text"]["value"], r["source"]["value"], r["agent"]["value"], r["textQuality"]["value"]))
except Exception as e:
    print(e)


name: ACCENT | description: in reading or speaking, an inflection of the voice, which gives to each syllable of a word its due pitch in-respect of height or lowness. See Reading. The word is originally Latin, accentus ; a compound of ad, to, and cano, to sing. Accentus quasi adcantus, or juχtα can-turn. In this sense, accent is synonymous with the Greek τονος ; the Latin tenor, or tonor ; and the Hebrew t□yto, gustus, taste, Accent, among grammarians, is a certain mark, or  character placed over a syllable to direct the stress of its pronunciation. We generally reckon three grammatical accents in ordinary use, all borrowed from the Greeks, viz. the acute accent ('), which shows when the tone of the voice is to be raised; the grave accent ('), when the note or tone of the voice is to be depressed; and the circumflex accent ( a ), which is composed of both the acute and the grave, and points out a kind of undulation of the voice. The Latins have made the same use as the Greeks of these t
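
The `FILTER NOT EXISTS` clause in this query keeps a description only when no other description of the same term has a text quality ranked above it. The same selection can be sketched in plain Python. The quality labels `Moderate` and `Low` and the ranking pairs below are hypothetical stand-ins for whatever `hto:isTextQualityHigherThan` statements the graph holds; only `hto:High` appears in this notebook.

```python
def best_descriptions(descriptions, higher_than):
    """Keep descriptions whose quality no other description outranks.

    descriptions: list of (text, quality) tuples for one term.
    higher_than: set of (better, worse) quality pairs.
    """
    qualities = {quality for _, quality in descriptions}
    return [(text, quality) for text, quality in descriptions
            if not any((other, quality) in higher_than for other in qualities)]


# Hypothetical quality levels and ranking, for illustration only.
ranking = {("High", "Moderate"), ("High", "Low"), ("Moderate", "Low")}
descs = [("clean text", "Moderate"), ("raw OCR text", "Low")]
print(best_descriptions(descs, ranking))  # [('clean text', 'Moderate')]
```

As in the query, only directly stated pairs are checked, so the result depends on the ranking being stated for every pair of levels rather than derived transitively.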

### Question 21: What software was used to extract the description of article _T_ or a page _P_?

In [84]:
from rdflib import PROV

article_accent = "<https://w3id.org/hto/ArticleTermRecord/9910796273804340_192984259_ACCENT_0>"

sparql.setQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     PREFIX prov: <http://www.w3.org/ns/prov#>
     SELECT * WHERE {
        %s a hto:TermRecord;
            hto:name ?name;
            hto:hasOriginalDescription ?desc.
        ?desc hto:wasExtractedFrom ?source;
            hto:text ?text;
            prov:wasAttributedTo ?software.
        ?software a prov:SoftwareAgent.
        ?source prov:wasAttributedTo ?agent.
        ?agent a ?agentType.
        FILTER (?agentType = prov:Person || ?agentType = prov:Organization)
    }
    """ % (article_accent)
)

try:
    ret = sparql.queryAndConvert()
    for r in ret["results"]["bindings"]:
        print("name: %s | description: %s | extracted using software: %s | source: %s | agent: %s " % (r["name"]["value"], r["text"]["value"], r["software"]["value"], r["source"]["value"], r["agent"]["value"]))
except Exception as e:
    print(e)


name: ACCENT | description: in reading or speaking, an inflection of the voice, which gives to each syllable of a word its due pitch in-respect of height or lowness. See Reading. The word is originally Latin, accentus ; a compound of ad, to, and cano, to sing. Accentus quasi adcantus, or juχtα can-turn. In this sense, accent is synonymous with the Greek τονος ; the Latin tenor, or tonor ; and the Hebrew t□yto, gustus, taste, Accent, among grammarians, is a certain mark, or  character placed over a syllable to direct the stress of its pronunciation. We generally reckon three grammatical accents in ordinary use, all borrowed from the Greeks, viz. the acute accent ('), which shows when the tone of the voice is to be raised; the grave accent ('), when the note or tone of the voice is to be depressed; and the circumflex accent ( a ), which is composed of both the acute and the grave, and points out a kind of undulation of the voice. The Latins have made the same use as the Greeks of these t

### Question 22: What software was used to digitise a document?

In [87]:
from rdflib import PROV
# Find the software used to generate the source from which the description of an article was extracted.
article_accent = "<https://w3id.org/hto/ArticleTermRecord/9910796273804340_192984259_ACCENT_0>"

sparql.setQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     PREFIX prov: <http://www.w3.org/ns/prov#>
     SELECT * WHERE {
        %s a hto:TermRecord;
            hto:name ?name;
            hto:hasOriginalDescription ?desc.
        ?desc hto:wasExtractedFrom ?source;
            hto:text ?text.
        ?source prov:wasAttributedTo ?agent.
        ?agent a ?agentType.
        FILTER (?agentType = prov:Person || ?agentType = prov:Organization)
        OPTIONAL {
            ?source prov:wasAttributedTo ?software.
            ?software a prov:SoftwareAgent.
        }
    }
    """ % (article_accent)
)

try:
    ret = sparql.queryAndConvert()
    for r in ret["results"]["bindings"]:
        software = None
        if "software" in r:
            software = r["software"]["value"]
        print("name: %s | description: %s | source: %s | created by %s using %s" % (r["name"]["value"], r["text"]["value"], r["source"]["value"], r["agent"]["value"], software))
except Exception as e:
    print(e)


name: ACCENT | description: in reading or speaking, an inflection of the voice, which gives to each syllable of a word its due pitch in-respect of height or lowness. See Reading. The word is originally Latin, accentus ; a compound of ad, to, and cano, to sing. Accentus quasi adcantus, or juχtα can-turn. In this sense, accent is synonymous with the Greek τονος ; the Latin tenor, or tonor ; and the Hebrew t□yto, gustus, taste, Accent, among grammarians, is a certain mark, or  character placed over a syllable to direct the stress of its pronunciation. We generally reckon three grammatical accents in ordinary use, all borrowed from the Greeks, viz. the acute accent ('), which shows when the tone of the voice is to be raised; the grave accent ('), when the note or tone of the voice is to be depressed; and the circumflex accent ( a ), which is composed of both the acute and the grave, and points out a kind of undulation of the voice. The Latins have made the same use as the Greeks of these t

### Question 23: List all articles which have more than one name

In [18]:
sparql.setQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     SELECT ?term (COUNT(?name) AS ?n) WHERE {
        ?term a hto:ArticleTermRecord;
            hto:name ?name.
    }
    GROUP BY ?term
    HAVING (COUNT(?name) > 1)
    LIMIT 10
    """
)

try:
    ret = sparql.queryAndConvert()
    for r in ret["results"]["bindings"]:
        if "term" in r:
            print("term uri: %s | number of names: %s" % (r["term"]["value"], r["n"]["value"]))
except Exception as e:
    print(e)


### Question 24: List editions that are revisions of another edition

In [13]:
sparql.setQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     PREFIX prov: <http://www.w3.org/ns/prov#>
     SELECT * WHERE {
        ?revision a hto:Edition;
            prov:wasRevisionOf ?edition;
            hto:number ?r_edition_num.
        ?edition a hto:Edition;
            hto:number ?edition_num.
    }
    """
)

try:
    ret = sparql.queryAndConvert()
    for r in ret["results"]["bindings"]:
        print("edition %s (number %s) is a revision of edition %s (number %s)" % (r["revision"]["value"], r["r_edition_num"]["value"], r["edition"]["value"], r["edition_num"]["value"]))
except Exception as e:
    print(e)
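
The query returns one row per direct `prov:wasRevisionOf` link. To read the links as lineages, the rows can be chained together in Python. This is a rough sketch, assuming each edition has at most one direct revision in the result set. `revision_chains` and the `urn:example:` URIs are hypothetical, used only so the example runs offline.

```python
def revision_chains(bindings):
    """Chain (revision -> edition) rows into lists ordered from oldest to newest."""
    # Map each edition URI to the URI of the edition that revises it.
    revised_by = {r["edition"]["value"]: r["revision"]["value"] for r in bindings}
    revisions = set(revised_by.values())
    # A root is an edition that is not itself a revision of anything in the result.
    roots = sorted(e for e in revised_by if e not in revisions)
    chains = []
    for root in roots:
        chain, seen = [root], {root}
        while chain[-1] in revised_by and revised_by[chain[-1]] not in seen:
            nxt = revised_by[chain[-1]]
            chain.append(nxt)
            seen.add(nxt)
        chains.append(chain)
    return chains


def _row(revision, edition):
    # Build a row shaped like one entry of ret["results"]["bindings"].
    return {"revision": {"type": "uri", "value": revision},
            "edition": {"type": "uri", "value": edition}}


# Made-up rows: e2 revises e1, e3 revises e2, e6 revises e5.
sample_bindings = [
    _row("urn:example:e3", "urn:example:e2"),
    _row("urn:example:e2", "urn:example:e1"),
    _row("urn:example:e6", "urn:example:e5"),
]
chains = revision_chains(sample_bindings)
print(chains)
```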


### Question 25: Find the summary of the description of a topic article _A_

In [16]:
topic_a = "<https://w3id.org/hto/TopicTermRecord/9910796273804340_192693199_HYGROMETRY_0>"

sparql.setQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     SELECT * WHERE {
        %s a hto:TopicTermRecord;
            hto:name ?name;
            hto:hasOriginalDescription ?description.
        ?description a hto:OriginalDescription;
            hto:hasSummary ?summary.
        ?summary hto:text ?text.
    }
    """ % topic_a
)

try:
    ret = sparql.queryAndConvert()
    for r in ret["results"]["bindings"]:
        # The summary text is bound to ?text; ?summary holds only the summary node's URI.
        print("name: %s | summary: %s" % (r["name"]["value"], r["text"]["value"]))
except Exception as e:
    print(e)
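
The topic URI is wrapped in angle brackets by hand and substituted into the query text with `%`. A small sketch of a helper that does the wrapping and rejects any character the SPARQL grammar forbids inside `<...>`, so a malformed value cannot alter the query. `as_iri` is a hypothetical name, not part of SPARQLWrapper.

```python
import re

# Characters not allowed inside a SPARQL IRIREF: control chars, space, < > " { } | ^ ` \
_IRI_FORBIDDEN = re.compile(r'[\x00-\x20<>"{}|^`\\]')


def as_iri(uri):
    """Return uri wrapped as <uri>, raising ValueError if it is not a safe IRI."""
    uri = uri.strip()
    # Accept values that are already wrapped, so the helper is idempotent.
    if uri.startswith("<") and uri.endswith(">"):
        uri = uri[1:-1]
    if not uri or _IRI_FORBIDDEN.search(uri):
        raise ValueError("not a valid IRI for a SPARQL query: %r" % uri)
    return "<" + uri + ">"


topic_iri = as_iri("https://w3id.org/hto/TopicTermRecord/9910796273804340_192693199_HYGROMETRY_0")
print(topic_iri)
```

The result can be passed to the `%s` placeholder in place of a hand-wrapped string.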


In [24]:
# List the people and organisations to whom the sources of extracted descriptions are attributed.
sparql.setQuery("""
    PREFIX hto: <https://w3id.org/hto#>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    PREFIX prov: <http://www.w3.org/ns/prov#>
    SELECT DISTINCT ?agent ?name WHERE {
        ?desc hto:wasExtractedFrom ?source.
        ?source prov:wasAttributedTo ?agent.
        ?agent a ?agentType;
            foaf:name ?name.
        FILTER (?agentType = prov:Person || ?agentType = prov:Organization)
    }
    """
)

try:
    ret = sparql.queryAndConvert()
    for r in ret["results"]["bindings"]:
        print("AGENT: %s" % (r["name"]["value"]))
except Exception as e:
    print(e)


AGENT: Ash Charlton
AGENT: Nineteenth-Century Knowledge Project
AGENT: National Library of Scotland


### Question 26: List all concepts

In [25]:
sparql.setQuery("""
    PREFIX hto: <https://w3id.org/hto#>
    SELECT * WHERE {
        ?concept a hto:Concept.
    } LIMIT 20
    """
)

try:
    ret = sparql.queryAndConvert()
    for r in ret["results"]["bindings"]:
        print("Concept uri: %s" % (r["concept"]["value"]))
except Exception as e:
    print(e)

Concept uri: https://w3id.org/hto/Concept/5209900143_1
Concept uri: https://w3id.org/hto/Concept/6194477897_2
Concept uri: https://w3id.org/hto/Concept/9438111967_1
Concept uri: https://w3id.org/hto/Concept/528009415_1
Concept uri: https://w3id.org/hto/Concept/6957054916_1
Concept uri: https://w3id.org/hto/Concept/3646294525_1
Concept uri: https://w3id.org/hto/Concept/8992274475_1
Concept uri: https://w3id.org/hto/Concept/6536088635_1
Concept uri: https://w3id.org/hto/Concept/1224675172_2
Concept uri: https://w3id.org/hto/Concept/2771374226_1
Concept uri: https://w3id.org/hto/Concept/2441206593_1
Concept uri: https://w3id.org/hto/Concept/5451806916_27
Concept uri: https://w3id.org/hto/Concept/6005223580_1
Concept uri: https://w3id.org/hto/Concept/5932165500_1
Concept uri: https://w3id.org/hto/Concept/1310772381_1
Concept uri: https://w3id.org/hto/Concept/120789143_2
Concept uri: https://w3id.org/hto/Concept/4803695092_2
Concept uri: https://w3id.org/hto/Concept/274228744_2
Concept uri:

### Question 26: List all term records for a concept

In [2]:
concept_uri = "<https://w3id.org/hto/Concept/6194477897_2>"

sparql.setQuery("""
    PREFIX hto: <https://w3id.org/hto#>
    SELECT * WHERE {
        %s a hto:Concept;
            hto:hadConceptRecord ?record.
        ?record a ?term_type;
                hto:name ?name;
                hto:hasOriginalDescription ?desc.
        ?desc hto:text ?content;
            hto:hasTextQuality hto:Low.
        FILTER (?term_type = hto:ArticleTermRecord || ?term_type = hto:TopicTermRecord)

    } LIMIT 20
    """ % concept_uri
)

try:
    ret = sparql.queryAndConvert()
    for r in ret["results"]["bindings"]:
        print("record name: %s | uri: %s | description: %s " % (r["name"]["value"], r["record"]["value"], r["content"]["value"]))
except Exception as e:
    print(e)

record name: DIVISION | uri: https://w3id.org/hto/ArticleTermRecord/992277653804341_144133902_6194477897_0 | description: in general, is the separating a thing into two or more parts. Division, in arithmetic. SeeVol. I.p. 376. Division, in algebra. SeeVol.I. p. 82. 
record name: DIVISION | uri: https://w3id.org/hto/ArticleTermRecord/9929192893804340_144850367_6194477897_0 | description: in general, is the separating a thing into two or more parts. Division, in arithmetic., See Vol. I.p. 376. Division, in algebra. SeeVol.I. p. 82. 
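
The two DIVISION records above carry descriptions that differ only in punctuation and spacing. One way to quantify how close two descriptions are is the standard library's `difflib`. This is a sketch only: `description_similarity` is a hypothetical helper, and the graph itself may well store similarity in another form.

```python
from difflib import SequenceMatcher


def description_similarity(a, b):
    """Return a ratio in [0, 1]; 1.0 means identical after whitespace/case normalisation."""
    def normalise(s):
        # Collapse runs of whitespace and ignore case before comparing.
        return " ".join(s.split()).lower()
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


# The two descriptions printed above.
desc_a = ("in general, is the separating a thing into two or more parts. "
          "Division, in arithmetic. SeeVol. I.p. 376. Division, in algebra. SeeVol.I. p. 82. ")
desc_b = ("in general, is the separating a thing into two or more parts. "
          "Division, in arithmetic., See Vol. I.p. 376. Division, in algebra. SeeVol.I. p. 82. ")
score = description_similarity(desc_a, desc_b)
print("similarity: %.3f" % score)
```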


### Question 27: Given a description, track the source

In [23]:
description_uri = "https://raw.githubusercontent.com/TU-plogan/kp-editions/main/eb07/TXT_v2/e8/kp-eb0708-039107-8218-v2.txt"

#description_uri = "https://w3id.org/hto/OriginalDescription/992277653804341_144133903_6364534740_0Ash"
#description_uri = "https://w3id.org/hto/InformationResource/__data_EB_1_3_clean_cut_txt"

def get_source(entity_uri):
    if entity_uri is None:
        return None

    entity_uri = "<" + entity_uri + ">"
    sparql.setQuery("""
        PREFIX hto: <https://w3id.org/hto#>
        PREFIX prov: <http://www.w3.org/ns/prov#>
        PREFIX foaf: <http://xmlns.com/foaf/0.1/>
        SELECT * WHERE {
            %s a ?entity_type;
                prov:wasAttributedTo ?agent.
            ?agent a ?agent_type;
                foaf:name ?agent_name.
            OPTIONAL {
            %s ?derived_type ?source.
            FILTER (?derived_type = prov:wasDerivedFrom || ?derived_type = hto:wasExtractedFrom)
            }
        } LIMIT 20
        """ % (entity_uri, entity_uri)
    )

    # prov:wasAttributedTo points to a Person, an Organization, or a SoftwareAgent.
    # prov:wasDerivedFrom points to a description; hto:wasExtractedFrom points to an information resource.

    try:
        ret = sparql.queryAndConvert()
        source = {}
        for r in ret["results"]["bindings"]:
            source_uri = None
            if 'source' in r:
                source_uri = r['source']['value']
            source["source_uri"] = source_uri
            source["entity_type"] = r['entity_type']['value']
            agent_type = r['agent_type']['value']
            agent_type_name = agent_type.split("#")[-1]
            source[agent_type_name] = r['agent_name']['value']

        return source
    except Exception as e:
        print(e)
        return None


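def track_all_sources(entity_uri):
    """Follow source links from entity_uri until no further source is found."""
    sources = {}
    tmp_source_uri = entity_uri
    while tmp_source_uri:
        current_source_info = get_source(tmp_source_uri)
        sources[tmp_source_uri] = current_source_info
        # get_source returns None on a query error and {} when nothing matches.
        tmp_source_uri = (current_source_info or {}).get("source_uri")
    return sources


print(track_all_sources(description_uri))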

{'https://raw.githubusercontent.com/TU-plogan/kp-editions/main/eb07/TXT_v2/e8/kp-eb0708-039107-8218-v2.txt': {'source_uri': None, 'entity_type': 'https://w3id.org/hto#InformationResource', 'Organization': 'Nineteenth-Century Knowledge Project', 'SoftwareAgent': 'ABBYY FineReader'}}
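
The traversal above keeps following `source_uri` until none remains, so a cyclic provenance link in the graph would keep it looping. Below is a sketch of a variant that stops on repeated URIs, caps the number of hops, and takes the lookup function as a parameter so it can be tried without the endpoint. `walk_provenance`, `max_hops`, and the `urn:example:` data are hypothetical.

```python
def walk_provenance(start_uri, lookup, max_hops=20):
    """Follow 'source_uri' links from start_uri; returns {uri: info} in visit order."""
    chain = {}
    uri = start_uri
    # Stop when there is no further source, on a revisit (cycle), or at the hop limit.
    while uri and uri not in chain and len(chain) < max_hops:
        info = lookup(uri) or {}  # tolerate lookups that return None
        chain[uri] = info
        uri = info.get("source_uri")
    return chain


# In-memory stand-in for get_source, with a deliberate cycle b -> a.
fake_graph = {
    "urn:example:a": {"source_uri": "urn:example:b"},
    "urn:example:b": {"source_uri": "urn:example:a"},
}
visited = walk_provenance("urn:example:a", fake_graph.get)
print(list(visited))
```

With the notebook's own function this would be called as `walk_provenance(description_uri, get_source)`.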
