# Knowledge Exploration With SPARQL queries
This notebook explores the knowledge in knowledge graphs generated in this repository using sparql queries.
Overall, we will query the graph from a remote SPARQL Query Server.

Questions this knowledge graph should answer:
1. List all digitalised collections in this graph.
2. What volumes, editions, or series does a digitalised collection _C_ include?
3. What time period does a digitalised collection _C_ cover?
4. When was edition _E_, series _S_, or volume _V_ published?
5. Who published edition _E_, series _S_, or volume _V_?
6. Who edited edition _E_, series _S_, or volume _V_?
7. Which genre does an edition _E_, series _S_, or volume _V_ belongs to?
8. Where was an edition _E_, series _S_, or volume _V_ published or printed?
9. Which language did an edition _E_, series _S_, or volume _V_ use?
10. In EB, what articles a volume _V_ include?
11. Where an EB article _A_ was described (in a page, volume, edition)?
12. What are EB articles related to another EB article _T_?
13. What are EB articles which has similar description to _T_?
14. How a term with name _T_ was described in all editions?
15. What is the text in a page?
16. What sources the text descriptions of article _T_ or a page _P_ are extracted from?
17. What is the high description of term _T_?
18. What are the descriptions of term _T_ with the highest text quality?
19. What software was used to extract the description of article _T_ or a page _P_?
20. What software was used to digitise a document?
21. What is the primary name and alternative names of an article T
22. List all articles which have more than one name
23. List editions which was revision of another edition
24. Find the summary of the description of a topic article _A_
25. List all records for a concept
26. Given a description uri, track the source

## Set up SPARQLWrapper

In [93]:
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper(
    "http://query.frances-ai.com/hto"
)
sparql.setReturnFormat(JSON)

## Query the graph

### Question 1: List all digital collections

In [94]:
sparql.setQuery("""
    PREFIX hto: <https://w3id.org/hto#>
    SELECT ?collection ?name WHERE {
        ?collection a hto:WorkCollection;
            hto:name ?name.
        FILTER (regex(?name, "Collection$", "i"))
        }
    """
)

try:
    ret = sparql.queryAndConvert()

    for r in ret["results"]["bindings"]:
        print("collection uri: %s| name: %s" % (r["collection"]["value"], r["name"]["value"]))
except Exception as e:
    print(e)


collection uri: https://w3id.org/hto/WorkCollection/EncyclopaediaBritannica| name: Encyclopaedia Britannica Collection
collection uri: https://w3id.org/hto/WorkCollection/LadiesEdinburghDebatingSociety| name: Ladies’ Edinburgh Debating Society Collection
collection uri: https://w3id.org/hto/WorkCollection/ChapbooksprintedinScotland| name: Chapbooks printed in Scotland Collection
collection uri: https://w3id.org/hto/WorkCollection/GazetteersofScotland| name: Gazetteers of Scotland Collection


### Question 2: What volumes, editions, or series does a digitalised collection _C_ include?

In [95]:
## List all editions of Encyclopaedia Britannica collection
eb_collection = "<https://w3id.org/hto/WorkCollection/EncyclopaediaBritannica>"

sparql.setQuery("""
    PREFIX hto: <https://w3id.org/hto#>
    SELECT * WHERE {
        %s  a hto:WorkCollection;
            hto:hadMember ?edition.
        ?edition a hto:Edition;
            hto:mmsid ?mmsid;
            hto:title ?title.
        OPTIONAL {
            ?edition hto:subtitle ?subtitle;
        }
        OPTIONAL {
            ?edition hto:number ?number.
        }
} ORDER BY ?number
    """ % eb_collection
)

try:
    ret = sparql.queryAndConvert()

    for r in ret["results"]["bindings"]:
        subtitle = None
        number =None
        if "subtitle" in r:
            subtitle = r["subtitle"]["value"]
        if "number" in r:
            number = r["number"]["value"]
        print("Edition uri: %s | MMSID: %s | title: %s | subtitle: %s | number: %s" % (r["edition"]["value"], r["mmsid"]["value"], r["title"]["value"], subtitle,  number))
except Exception as e:
    print(e)

Edition uri: https://w3id.org/hto/Edition/9910796343804340 | MMSID: 9910796343804340 | title: Supplement to the third edition of the Encyclopaedia Britannica ... Illustrated with ... copperplates | subtitle: None | number: None
Edition uri: https://w3id.org/hto/Edition/9910796373804340 | MMSID: 9910796373804340 | title: Supplement to the fourth, fifth and sixth editions of the Encyclopaedia Britannica | subtitle: None | number: None
Edition uri: https://w3id.org/hto/Edition/992277653804341 | MMSID: 992277653804341 | title: Encyclopaedia Britannica; or, A dictionary of arts and sciences, compiled upon a new plan | subtitle: Illustrated with one hundred and sixty copperplates | number: 1
Edition uri: https://w3id.org/hto/Edition/9929192893804340 | MMSID: 9929192893804340 | title: Encyclopaedia Britannica: or, A dictionary of arts and sciences | subtitle: compiled upon a new plan. In which the different sciences and arts are digested into distinct treatises or systems; and the various tec

In [96]:
## List all volumes in Encyclopaedia Britannica 7th Edition
eb_edition7_uri = "<https://w3id.org/hto/Edition/9910796273804340>"

sparql.setQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     SELECT * WHERE {
        %s  a hto:Edition;
            hto:hadMember ?volume.
        ?volume a hto:Volume;
            hto:title ?title;
            hto:number ?number;
            hto:volumeId ?volumeId;
            hto:permanentURL ?permanentURL.
        OPTIONAL {
            ?volume hto:letters ?letters;
        }
    } ORDER BY ?number
    """ % eb_edition7_uri
)

try:
    ret = sparql.queryAndConvert()

    for r in ret["results"]["bindings"]:
        letters = None
        if "letters" in r:
            letters = r["letters"]["value"]
        print("Volume uri: %s | title: %s | number: %s | id: %s |  permanent url: %s | letters: %s" % (r["volume"]["value"], r["title"]["value"] ,r["number"]["value"], r["volumeId"]["value"], r["permanentURL"]["value"], letters))
except Exception as e:
    print(e)

Volume uri: https://w3id.org/hto/Volume/9910796273804340_192547789 | title: Seventh edition, General index | number: 0 | id: 192547789 |  permanent url: https://digital.nls.uk/192547789 | letters: None
Volume uri: https://w3id.org/hto/Volume/9910796273804340_192984258 | title: Seventh edition, Volume 1, Preliminary dissertations | number: 1 | id: 192984258 |  permanent url: https://digital.nls.uk/192984258 | letters: Preliminarydissertations
Volume uri: https://w3id.org/hto/Volume/9910796273804340_192984259 | title: Seventh edition, Volume 2, A-Anatomy | number: 2 | id: 192984259 |  permanent url: https://digital.nls.uk/192984259 | letters: A-Anatomy
Volume uri: https://w3id.org/hto/Volume/9910796273804340_193057500 | title: Seventh edition, Volume 3, Anatomy-Astronomy | number: 3 | id: 193057500 |  permanent url: https://digital.nls.uk/193057500 | letters: Anatomy-Astronomy
Volume uri: https://w3id.org/hto/Volume/9910796273804340_193108322 | title: Seventh edition, Volume 4, Astronomy

### Question 3: What time period does a digitalised collection _C_ cover?

In [97]:
# At current version, this can be done check the publication years of a collection.
# For example, the publication year of each edition for EB collection.
eb_collection_uri = "<https://w3id.org/hto/WorkCollection/EncyclopaediaBritannica>"
sparql.setQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     SELECT * WHERE {
        %s hto:hadMember ?edition.
        ?edition hto:yearPublished ?yearPublished.
    }
    """ % eb_collection_uri
)

try:
    ret = sparql.queryAndConvert()
    years = []
    for r in ret["results"]["bindings"]:
        if "yearPublished" in r:
            years.append(r["yearPublished"]["value"])
    years.sort()
    print(years)
except Exception as e:
    print(e)


['1771', '1773', '1778', '1797', '1801', '1810', '1815', '1823', '1824', '1842', '1853']


In [None]:
# TODO Get time period which a digitalised collection _C_ cover through direct link to the collection (next version)

### Question 4: When was edition _E_, series _S_, or volume _V_ published?

In [5]:
eb_edition7_uri = "<https://w3id.org/hto/Edition/9910796273804340>"

sparql.setQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     SELECT * WHERE {
        %s  a hto:Edition;
            hto:mmsid ?mmsid;
            hto:title ?title;
            hto:number ?number;
            hto:yearPublished ?year_published;
    }
    """ % eb_edition7_uri
)

try:
    ret = sparql.queryAndConvert()

    for r in ret["results"]["bindings"]:
        print("MMSID: %s | title: %s |  number: %s | year published: %s " % (r["mmsid"]["value"], r["title"]["value"] ,r["number"]["value"], r["year_published"]["value"]))
except Exception as e:
    print(e)


MMSID: 9910796273804340 | title: Encyclopaedia Britannica |  number: 7 | year published: 1842 


### Question 5: Who published edition _E_, series _S_, or volume _V_?

In [6]:
eb_edition7_uri = "<https://w3id.org/hto/Edition/9910796273804340>"

sparql.setQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     PREFIX foaf: <http://xmlns.com/foaf/0.1/>
     SELECT * WHERE {
        %s  a hto:Edition;
            hto:mmsid ?mmsid;
            hto:title ?title;
            hto:number ?number;
            hto:yearPublished ?year_published.
        OPTIONAL {
            %s hto:publisher ?publisher.
            ?publisher foaf:name ?publisher_name;
        }
    }
    """ % (eb_edition7_uri, eb_edition7_uri)
)

try:
    ret = sparql.queryAndConvert()

    for r in ret["results"]["bindings"]:
        publisher = None
        if "publisher" in r:
            publisher = r["publisher"]["value"]
        publisher_name = None
        if "publisher_name" in r:
            publisher_name = r["publisher_name"]["value"]
        print("MMSID: %s | title: %s | number: %s | year published: %s | | (publisher: %s | name: %s) " % (r["mmsid"]["value"], r["title"]["value"] ,r["number"]["value"], r["year_published"]["value"], publisher, publisher_name))
except Exception as e:
    print(e)

MMSID: 9910796273804340 | title: Encyclopaedia Britannica | number: 7 | year published: 1842 | | (publisher: None | name: None) 


### Question 6: Who edited edition _E_, series _S_, or volume _V_?

In [8]:
eb_edition7_uri = "<https://w3id.org/hto/Edition/9910796273804340>"

sparql.setQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     PREFIX foaf: <http://xmlns.com/foaf/0.1/>
     SELECT * WHERE {
        %s  a hto:Edition;
            hto:mmsid ?mmsid;
            hto:title ?title;
            hto:number ?number;
            hto:yearPublished ?year_published.
        OPTIONAL {
            %s hto:editor ?editor.
            ?editor a hto:Person;
                foaf:name ?editor_name;
            OPTIONAL {
                ?editor hto:birthYear ?birthYear.
            }
            OPTIONAL {
                ?editor hto:deathYear ?deathYear.
            }
            OPTIONAL {
                ?editor hto:termsOfAddress ?termsOfAddress.
            }
        }
}
    """ % (eb_edition7_uri, eb_edition7_uri)
)

try:
    ret = sparql.queryAndConvert()

    for r in ret["results"]["bindings"]:
        editor = None
        if "editor" in r:
            editor = r["editor"]["value"]
        editor_name = None
        if "editor_name" in r:
            editor_name = r["editor_name"]["value"]
        birthYear = None
        if "birthYear" in r:
            birthYear = r["birthYear"]["value"]
        deathYear = None
        if "deathYear" in r:
            deathYear = r["deathYear"]["value"]
        termsOfAddress = None
        if "termsOfAddress" in r:
            termsOfAddress = r["termsOfAddress"]["value"]
        print("MMSID: %s | title: %s | number: %s | year published: %s | (editor: %s | name: %s | terms of address: %s | %s-%s ) " % (r["mmsid"]["value"], r["title"]["value"] ,r["number"]["value"], r["year_published"]["value"], editor, editor_name, termsOfAddress, birthYear, deathYear))
except Exception as e:
    print(e)

MMSID: 9910796273804340 | title: Encyclopaedia Britannica | number: 7 | year published: 1842 | (editor: https://w3id.org/hto/Person/1436491835 | name: Stewart, Dugald | terms of address: Sir | 1753-1828 ) 


### Question 7: Which genre does a volume _V_ belongs to?

In [10]:
eb_edition7 = "<https://w3id.org/hto/Edition/9910796273804340>"

sparql.setQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     SELECT * WHERE {
        %s  a hto:Edition;
            hto:mmsid ?mmsid;
            hto:title ?title;
            hto:number ?number;
            hto:genre ?genre;
    }
    """ % eb_edition7
)

try:
    ret = sparql.queryAndConvert()

    for r in ret["results"]["bindings"]:
        print("MMSID: %s | title: %s |  number: %s | genre: %s " % (r["mmsid"]["value"], r["title"]["value"] ,r["number"]["value"], r["genre"]["value"]))
except Exception as e:
    print(e)

MMSID: 9910796273804340 | title: Encyclopaedia Britannica |  number: 7 | genre: encyclopedia 


### Question 8: Where was a volume _V_ published or printed?

In [10]:
eb_edition7_uri = "<https://w3id.org/hto/Edition/9910796273804340>"

sparql.setQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
     SELECT * WHERE {
        %s  a hto:Edition;
            hto:mmsid ?mmsid;
            hto:title ?title;
            hto:number ?number;
            hto:printedAt ?placePrinted_uri;
            hto:shelfLocator ?shelfLocator_uri.
        ?placePrinted_uri rdfs:label ?placePrinted.
        ?shelfLocator_uri rdfs:label ?shelfLocator.
    }
    """ % eb_edition7_uri
)

try:
    ret = sparql.queryAndConvert()

    for r in ret["results"]["bindings"]:
        print("MMSID: %s | title: %s |  number: %s | place printed: %s | shelf locator: %s " % (r["mmsid"]["value"], r["title"]["value"] ,r["number"]["value"], r["placePrinted"]["value"], r["shelfLocator"]["value"]))
except Exception as e:
    print(e)

MMSID: 9910796273804340 | title: Encyclopaedia Britannica |  number: 7 | place printed: Edinburgh | shelf locator: EB.15 


### Question 9: Which language does a volume _V_ use?

In [11]:
eb_edition7_uri = "<https://w3id.org/hto/Edition/9910796273804340>"
sparql.setQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     SELECT * WHERE {
        %s  a hto:Edition;
            hto:mmsid ?mmsid;
            hto:title ?title;
            hto:number ?number;
            hto:language ?language;
}
    """ % eb_edition7_uri
)

try:
    ret = sparql.queryAndConvert()

    for r in ret["results"]["bindings"]:
        print("MMSID: %s | title: %s |  number: %s | language: %s" % (r["mmsid"]["value"], r["title"]["value"] ,r["number"]["value"], r["language"]["value"]))
except Exception as e:
    print(e)

MMSID: 9910796273804340 | title: Encyclopaedia Britannica |  number: 7 | language: eng


### Question 10: In EB, what articles a volume _V_ include?

In [13]:
# List 20 articles in volume 2 of 7th edition.
eb_edition7_volume2 = "<https://w3id.org/hto/Volume/9910796273804340_192984259>"

sparql.setQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     SELECT * WHERE {
        %s a hto:Volume;
            hto:hadMember ?page.
        ?termRecord a ?term_type;
                hto:name ?name.
        FILTER (?term_type = hto:ArticleTermRecord || ?term_type = hto:TopicTermRecord)
    } LIMIT 20
    """ % eb_edition7_volume2
)

try:
    ret = sparql.queryAndConvert()

    for r in ret["results"]["bindings"]:
        print("Term uri: %s, name: %s" % (r["termRecord"]["value"], r["name"]["value"]))
except Exception as e:
    print(e)

Term uri: https://w3id.org/hto/ArticleTermRecord/9910796233804340_191253819_1007164464_0, name: ALZIRA
Term uri: https://w3id.org/hto/ArticleTermRecord/9910796233804340_191253819_1018837245_0, name: ALEURITES
Term uri: https://w3id.org/hto/ArticleTermRecord/9910796233804340_191678901_7881872715_0, name: INDEX
Term uri: https://w3id.org/hto/ArticleTermRecord/9910796233804340_191253819_102256234_0, name: ALTIN
Term uri: https://w3id.org/hto/ArticleTermRecord/9910796233804340_191253819_102256234_1, name: ALTIN
Term uri: https://w3id.org/hto/ArticleTermRecord/9910796233804340_191253819_103000667_0, name: ALAN
Term uri: https://w3id.org/hto/ArticleTermRecord/9910796233804340_191253819_1033477591_0, name: ALGOL
Term uri: https://w3id.org/hto/ArticleTermRecord/9910796233804340_191253819_1083154598_0, name: ALMEHRAB
Term uri: https://w3id.org/hto/ArticleTermRecord/9910796233804340_191679020_3926448654_0, name: KEBLA
Term uri: https://w3id.org/hto/ArticleTermRecord/9910796233804340_191253819_11

### Question 11: Where an EB article _A_ was described (in a page, volume, edition)?

In [37]:
# Show which page an article starts at, which volume and edition, collection, this article was described
article_name = "'SUGAR'"

sparql.setQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     SELECT * WHERE {
        ?term a hto:ArticleTermRecord;
            hto:name %s;
            hto:startsAtPage ?page.
        ?volume a hto:Volume;
                hto:hadMember ?page.
        ?edition hto:hadMember ?volume.
        ?collection hto:hadMember ?edition.
    } LIMIT 20
    """ % article_name
)

try:
    ret = sparql.queryAndConvert()

    for r in ret["results"]["bindings"]:
        print("term: %s | starts at page: %s | volume: %s | edition %s | collection: %s" % (r["term"]["value"], r["page"]["value"], r["volume"]["value"], r["edition"]["value"], r["collection"]["value"]))
except Exception as e:
    print(e)

term: https://w3id.org/hto/ArticleTermRecord/9910796233804340_193108317_6364534740_0 | starts at page: https://w3id.org/hto/Page/9910796233804340_193108317_435 | volume: https://w3id.org/hto/Volume/9910796233804340_193108316 | edition https://w3id.org/hto/Edition/9910796233804340 | collection: https://w3id.org/hto/WorkCollection/EncyclopaediaBritannica
term: https://w3id.org/hto/ArticleTermRecord/9910796253804340_193057497_6364534740_0 | starts at page: https://w3id.org/hto/Page/9910796253804340_193057497_877 | volume: https://w3id.org/hto/Volume/9910796253804340_193057497 | edition https://w3id.org/hto/Edition/9910796253804340 | collection: https://w3id.org/hto/WorkCollection/EncyclopaediaBritannica
term: https://w3id.org/hto/ArticleTermRecord/9922270543804340_192200899_6364534740_0 | starts at page: https://w3id.org/hto/Page/9922270543804340_192200899_873 | volume: https://w3id.org/hto/Volume/9922270543804340_192200899 | edition https://w3id.org/hto/Edition/9922270543804340 | collect

### Question 12: What are EB articles related to another EB article _T_?

In [98]:
article_sugar = "<https://w3id.org/hto/ArticleTermRecord/992277653804341_144133903_6364534740_0>"

sparql.setQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     SELECT * WHERE {
        %s a hto:ArticleTermRecord;
            hto:name ?name;
        OPTIONAL {
            %s hto:refersTo ?see_term.
            ?see_term a ?term_type;
                hto:name ?see_term_name
            FILTER (?term_type = hto:ArticleTermRecord || ?term_type = hto:TopicTermRecord)
        }
    }
    """ % (article_sugar, article_sugar)
)

try:
    ret = sparql.queryAndConvert()
    for r in ret["results"]["bindings"]:
        refersTo = None
        if "see_term" in r:
            refersTo = r["see_term"]["value"]
        print("term: %s %s | see also term: %s %s" % (article_sugar, r["name"]["value"], refersTo, r["see_term_name"]["value"]))
except Exception as e:
    print(e)


term: <https://w3id.org/hto/ArticleTermRecord/992277653804341_144133903_6364534740_0> SUGAR | see also term: https://w3id.org/hto/ArticleTermRecord/992277653804341_144133903_790527472_0 SACCHARUM


### Question 13: What are EB articles which has similar description to _T_?

In [99]:
# In current version of KGs,the similar terms are not linked as they can be queried through elastic search service. So example below will print nothing
article_sugar = "<https://w3id.org/hto/ArticleTermRecord/992277653804341_144133903_6364534740_0>"

sparql.setQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     SELECT * WHERE {
        %s a hto:TermRecord;
            hto:name ?name.
        OPTIONAL {
            %s hto:similarTo ?similar_term.
            ?similar_term a hto:TermRecord.
        }
    }
    """ % (article_sugar, article_sugar)
)

try:
    ret = sparql.queryAndConvert()
    for r in ret["results"]["bindings"]:
        similar_term = None
        if "similar_term" in r:
            similar_term = r["similar_term"]["value"]
        print("term: %s | similar term: %s" % (r["name"]["value"], similar_term))
except Exception as e:
    print(e)

### Question 14: How a term with name _T_ was described in all editions?

In [100]:
term_name = "'EARTH'"

sparql.setQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     SELECT * WHERE {
        ?term a hto:ArticleTermRecord;
            hto:name %s;
            hto:startsAtPage ?page;
            hto:hasOriginalDescription ?desc.
        ?desc hto:text ?content.
        ?vol hto:hadMember ?page.
        ?edition a hto:Edition;
            hto:hadMember ?vol;
            hto:title ?title.
        OPTIONAL {
            ?edition hto:number ?number.
        }
    } ORDER BY ?number
    """ % term_name
)

try:
    ret = sparql.queryAndConvert()
    for r in ret["results"]["bindings"]:
        edition_number = None
        if "number" in r:
            edition_number = r["number"]["value"]
        print("term uri: %s | description: %s |  edition: %s | edition title: %s | edition number: %s" % (r["term"]["value"], r["content"]["value"], r["edition"]["value"], r["title"]["value"], edition_number))
except Exception as e:
    print(e)


term uri: https://w3id.org/hto/ArticleTermRecord/992277653804341_144133902_5297117738_0 | description: a ' profile, or terreflrial matter, wherever our globe but really on lists. See Vol. I. p 67. Earth, in allronomy and geography, one of the prim w planets, being it a terraqueous globe where we inhabit. See Astronomy and Geography..E-ARTHQUAKE, in natural history, a violent agitation and sometimes with an eruption of fire, water, wind, be. See Pneumatics. EASEL-pieces, a denomination given by painters to such pieces as are contained in frames, in contradiction from those painted on ceilings, be. |  edition: https://w3id.org/hto/Edition/992277653804341 | edition title: Encyclopaedia Britannica; or, A dictionary of arts and sciences, compiled upon a new plan | edition number: 1
term uri: https://w3id.org/hto/ArticleTermRecord/9929192893804340_144850367_5297117738_0 | description: a follile, or terredrial matter, whereof our globe partly confids. See Vol. I. p 67. Earth, in adronoray and

### Question 15: What is the text in a page?

In [101]:
# check the text of a page from Chapbooks collection
sparql.setQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     SELECT * WHERE {
        ?page a hto:Page;
            hto:hasOriginalDescription ?desc.
        ?desc hto:text ?text;
            hto:hasTextQuality ?textQuality.
    }
    LIMIT 5
    """
)

try:
    ret = sparql.queryAndConvert()
    for r in ret["results"]["bindings"]:
        print("page uri: %s | content: %s | quality: %s" % (r["page"]["value"], r["text"]["value"], r["textQuality"]["value"]))
except Exception as e:
    print(e)

page uri: https://w3id.org/hto/Page/9927010233804340_103655658_1 | content:  | quality: https://w3id.org/hto#Low
page uri: https://w3id.org/hto/Page/9927010233804340_103655658_10 | content: 2 The Ladies Edinhuagh Magazine. may be the light of our lives, but Avhich no act of our own will can bring back. It is not till the distinction has been appreciated between nature as it is and nature as we make it to be, between that which we see and that which ' having not seen, we love,' that any branch of art can be reckoned in its proper value." The writer of these remarks then goes on to contrast the matter-of-fact aspect of the source of our knowledge with the aspect of philosophy, art, and religion. The former takes our knowledge to be exclusively the result of the action of human thought; the latter admits the co-operation of nature by the transmission of images. The view of art, taking its stand on this basis, involves the absolute fusion of thought and things. The habitual interpretation 

### Question 16: What sources the text descriptions of article _T_ or a page _P_ are extracted from?

In [102]:
article_sugar = "<https://w3id.org/hto/ArticleTermRecord/992277653804341_144133903_6364534740_0>"

sparql.setQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     PREFIX prov: <http://www.w3.org/ns/prov#>
     SELECT * WHERE {
        %s a hto:ArticleTermRecord;
            hto:name ?name;
            hto:hasOriginalDescription ?desc.
        ?desc hto:wasExtractedFrom ?source;
            hto:text ?text.
        ?source prov:wasAttributedTo ?agent.
    }
    """ % article_sugar
)

try:
    ret = sparql.queryAndConvert()
    for r in ret["results"]["bindings"]:
        print("name: %s | source: %s | agent: %s | description: %s " % (r["name"]["value"], r["source"]["value"], r["agent"]["value"], r["text"]["value"]))
except Exception as e:
    print(e)


name: SUGAR | source: https://w3id.org/hto/InformationResource/__data_EB_1_3_clean_cut_txt | agent: https://w3id.org/hto/Organization/Ash | description: in natural history, is properly the essential salt of the sugar-cane, as tartar is of the grape. See CHEMISTRY p. 161. and SACCHARUM. This plant tises to eight, nine, or more feet high ; the stalk, conic earthen. stalk, or cane, being round, jointed, and two or three inches in diameter at the bottom : the joints are three or four inches afunder, and in a rich foil more : the leaves are long and narrow and of a yellowish green colour whio s ornamented is is also the stalk itself, the top t ciodier of ariadincos tiowers, two with a panicle or three feet in length. They propagate the sugar-cane, by planting cuttings of it in the ground in furrows, dug parailel for that purpose ; the cuttings are laid level and even, and are covered up with earth ; they soon sho nt out new plants from their knots or joints : the ground is to be kept clear,

### Question 17: What is the High quality description of term _T_?

In [103]:
article_sugar = "<https://w3id.org/hto/ArticleTermRecord/992277653804341_144133903_6364534740_0>"

sparql.setQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     PREFIX prov: <http://www.w3.org/ns/prov#>
     SELECT * WHERE {
        %s a hto:ArticleTermRecord;
            hto:name ?name;
            hto:hasOriginalDescription ?desc.
        ?desc hto:wasExtractedFrom ?source;
            hto:text ?text;
            hto:hasTextQuality hto:High.
        ?source prov:wasAttributedTo ?agent.

    }
    """ % article_sugar
)

try:
    ret = sparql.queryAndConvert()
    for r in ret["results"]["bindings"]:
        print("name: %s | description: %s | source: %s | agent: %s" % (r["name"]["value"], r["text"]["value"], r["source"]["value"], r["agent"]["value"]))
except Exception as e:
    print(e)


name: SUGAR | description: in natural history, is properly the essential salt of the sugar-cane, as tartar is of the grape. See CHEMISTRY p. 161. and SACCHARUM. This plant tises to eight, nine, or more feet high ; the stalk, conic earthen. stalk, or cane, being round, jointed, and two or three inches in diameter at the bottom : the joints are three or four inches afunder, and in a rich foil more : the leaves are long and narrow and of a yellowish green colour whio s ornamented is is also the stalk itself, the top t ciodier of ariadincos tiowers, two with a panicle or three feet in length. They propagate the sugar-cane, by planting cuttings of it in the ground in furrows, dug parailel for that purpose ; the cuttings are laid level and even, and are covered up with earth ; they soon sho nt out new plants from their knots or joints : the ground is to be kept clear, at times, from weeds ; and the eanes grow so quick, that in cipht, ten, or twelve months, they are sit to cut for making of s

### Question 18: What are the descriptions of term _T_ with the highest text quality?

In [115]:
article_sugar = "<https://w3id.org/hto/ArticleTermRecord/9910796233804340_193108317_6364534740_0>"

sparql.setQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     PREFIX prov: <http://www.w3.org/ns/prov#>
     SELECT * WHERE {
        %s a hto:ArticleTermRecord;
            hto:name ?name;
            hto:hasOriginalDescription ?desc.
        ?desc ?derived_type ?source;
            hto:text ?text;
            hto:hasTextQuality ?textQuality.
        FILTER (?derived_type = prov:wasDerivedFrom || ?derived_type = hto:wasExtractedFrom)
        ?source prov:wasAttributedTo ?agent.
        FILTER NOT EXISTS {
          %s hto:hasOriginalDescription [hto:hasTextQuality [hto:isTextQualityHigherThan ?textQuality]].
        }
    }
    """ % (article_sugar, article_sugar)
)

try:
    ret = sparql.queryAndConvert()
    for r in ret["results"]["bindings"]:
        print("name: %s | description: %s | source: %s | agent: %s | text quality: %s" % (r["name"]["value"], r["text"]["value"], r["source"]["value"], r["agent"]["value"], r["textQuality"]["value"]))
except Exception as e:
    print(e)


name: SUGAR | description: a solid sweet substance juice of the sugar-cane, or, according essential fair, capable of cryftallization, agreeable flavour, and contained in a greater type in almost every species of vegetables, abundant in the sugar-cane. As the sugar-cane is the principal West Indies, and the great source of is so important in a commercial view, meant which it gives to seamen, and the opens for merchants; and besides now safety of life and it may just be esteemed valuable plants in the world. The in Europe is estimated at nine million demand would probably be greater if a reduced price. Since freighter is curious a commodity, it must be an one Ct persons of curiosity and research, to obtain knowledge of the history and nature which it is produced, as well as to access by which the juice is extracted will therefore first inquire in what countries flouriftied, and when it was brought and became an article of commerce. From the few remains of the Grecian tigers which have su

### Question 19: What software was used to extract the description of article _T_ or a page _P_?

In [118]:
article_sugar = "<https://w3id.org/hto/ArticleTermRecord/992277653804341_144133903_6364534740_0>"

sparql.setQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     PREFIX prov: <http://www.w3.org/ns/prov#>
     SELECT * WHERE {
        %s a hto:ArticleTermRecord;
            hto:name ?name;
            hto:hasOriginalDescription ?desc.
        ?desc ?derived_type ?source;
            hto:text ?text;
            prov:wasAttributedTo ?software.
        FILTER (?derived_type = prov:wasDerivedFrom || ?derived_type = hto:wasExtractedFrom)
        ?software a hto:SoftwareAgent.
        ?source prov:wasAttributedTo ?agent.
        ?agent a ?agentType.
    }
    """ % (article_sugar)
)

try:
    ret = sparql.queryAndConvert()
    for r in ret["results"]["bindings"]:
        print("name: %s | description: %s | extracted using software: %s | source: %s | agent: %s " % (r["name"]["value"], r["text"]["value"], r["software"]["value"], r["source"]["value"], r["agent"]["value"]))
except Exception as e:
    print(e)


name: SUGAR | description: in natural history, is properly the essential part of the sugar-cane, as tartar is of the grape. See Chemistry p. 161. and Saccharum. This plant rises to light, nine, or more feet high; their cane, being round, joined, and two or three inches in diameter at the bottom: the joints are three or four inches angular, and in a rich soil more: the leaves are long and narrow, and of a yellowish green colour; as is also the stalk itself, the top of which is ornamented with a panicle, or fuller of arundinacecus flowers, two or three feet in length. They propagate the sugar-cane, by planting cuttings of it in the ground in furrows, dug parallel for that purpose; the cuttings are laid level and even, and are covered up with earth; they look ( shoot out new plants from their knots or joints: the ground is to be kept clear, at times, from weeds; and the canes grow so quick, that in eight, ten, or twelve months, they are set to cut for making of sugar from them. The manner

### Question 20: What software was used to digitise a document?

In [128]:
# Find software used to generate the source which the description of an article was extracted.
article_earth = "<https://w3id.org/hto/ArticleTermRecord/9910796273804340_193322688_5297117738_0>"

sparql.setQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     PREFIX prov: <http://www.w3.org/ns/prov#>
     SELECT * WHERE {
        %s a hto:ArticleTermRecord;
            hto:name ?name;
            hto:hasOriginalDescription ?desc.
        ?desc ?derived_type ?source;
            hto:text ?text.
        FILTER (?derived_type = prov:wasDerivedFrom || ?derived_type = hto:wasExtractedFrom)
        OPTIONAL {
            ?source prov:wasAttributedTo ?software.
            ?software a hto:SoftwareAgent.
        }
    }
    """ % (article_earth)
)

try:
    ret = sparql.queryAndConvert()
    for r in ret["results"]["bindings"]:
        software = None
        if "software" in r:
            software = r["software"]["value"]
        print("name: %s | description: %s | source: %s | created using %s" % (r["name"]["value"], r["text"]["value"], r["source"]["value"], software))
except Exception as e:
    print(e)


name: EARTH | description: amongst ancient philosophers, one elements of which the whole system of nature was Earth, in Astronomy and Geography, one many planets, being the terraqueous globe which habit.( See the articles Figure of the Earth, | source: https://w3id.org/hto/OriginalDescription/9910796273804340_193322688_5297117738_0NLS | created using https://github.com/defoe-code/defoe
name: EARTH | description: amongst ancient philosophers, owe of the four elements of which the whole system of nature was believed to be composed. | source: https://raw.githubusercontent.com/TU-plogan/kp-editions/main/eb07/TXT_v2/e8/kp-eb0708-039107-8218-v2.txt | created using https://pdf.abbyy.com
name: EARTH | description: amongst ancient philosophers, one elements of which the whole system of nature was Earth, in Astronomy and Geography, one mary planets, being the terraqueous globe which habit. (See the articles Figure of the Earth, | source: https://w3id.org/hto/InformationResource/193322688_alto_19

### Question 21: What is the primary name and alternative names of an article T

In [138]:
article_aurora = "<https://w3id.org/hto/TopicTermRecord/9910796273804340_193108322_9268244686_0>"

sparql.setQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
     SELECT * WHERE {
        %s a hto:TopicTermRecord;
            hto:name ?primary_name.
        OPTIONAL {
          %s rdfs:label ?alter_name.
        }
    }
    """ % (article_aurora, article_aurora)
)

try:
    ret = sparql.queryAndConvert()
    names = {} 
    for r in ret["results"]["bindings"]:
        names["primary_name"] = r["primary_name"]["value"]
        if "alter_names" in names:
            names["alter_names"].append(r["alter_name"]["value"])
        else:
            names["alter_names"] = [r["alter_name"]["value"]]
    print(names)       
except Exception as e:
    print(e)

{'primary_name': 'AURORA BOREALIS', 'alter_names': ['NORTHERN LIGHTS', 'POLAR LIGHT', 'STREAMERS']}


### Question 22: List all articles which have more than one name

In [133]:
sparql.setQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
     SELECT ?term (COUNT(?alter_name) AS ?n) WHERE {
        ?term a hto:ArticleTermRecord;
            rdfs:label ?alter_name.
    }
    GROUP BY ?term
    LIMIT 10
    """
)

try:
    ret = sparql.queryAndConvert()
    for r in ret["results"]["bindings"]:
        if "term" in r:
            print("term uri: %s | total number of names: %s" % (r["term"]["value"], int(r["n"]["value"]) + 1))
except Exception as e:
    print(e)


term uri: https://w3id.org/hto/ArticleTermRecord/992277653804341_144133902_5359492829_0 | total number of names: 3
term uri: https://w3id.org/hto/ArticleTermRecord/9910796273804340_192984259_5290102410_0 | total number of names: 2
term uri: https://w3id.org/hto/ArticleTermRecord/992277653804341_144133902_7766210292_0 | total number of names: 2
term uri: https://w3id.org/hto/ArticleTermRecord/9910796273804340_193819043_206098831_0 | total number of names: 2
term uri: https://w3id.org/hto/ArticleTermRecord/9910796273804340_193322690_9390612085_0 | total number of names: 2
term uri: https://w3id.org/hto/ArticleTermRecord/992277653804341_144133902_3676360847_0 | total number of names: 2
term uri: https://w3id.org/hto/ArticleTermRecord/9910796273804340_193057500_1074856715_0 | total number of names: 2
term uri: https://w3id.org/hto/ArticleTermRecord/9910796273804340_193057500_3869508535_1 | total number of names: 2
term uri: https://w3id.org/hto/ArticleTermRecord/9910796273804340_193469091_

### Question 23: List editions which was revision of another edition

In [71]:
sparql.setQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     PREFIX prov: <http://www.w3.org/ns/prov#>
     SELECT * WHERE {
        ?revision a hto:Edition;
            prov:wasRevisionOf ?edition;
            hto:number ?r_edition_num.
        ?edition a hto:Edition;
            hto:number ?edition_num.
    }
    """
)

try:
    ret = sparql.queryAndConvert()
    for r in ret["results"]["bindings"]:
        print("edition %s - < %s > was revision of edition %s -< %s >" % (r["revision"]["value"], r["r_edition_num"]["value"], r["edition"]["value"], r["edition_num"]["value"]))
except Exception as e:
    print(e)


edition https://w3id.org/hto/Edition/9929192893804340 - < 1 > was revision of edition https://w3id.org/hto/Edition/992277653804341 -< 1 >


### Question 24: Find the summary of the description of a topic article _A_

In [72]:
topic_a = "<https://w3id.org/hto/TopicTermRecord/9910796273804340_192693199_9335647130_0>"

sparql.setQuery("""
     PREFIX hto: <https://w3id.org/hto#>
     SELECT * WHERE {
        %s a hto:TopicTermRecord;
            hto:name ?name;
            hto:hasOriginalDescription ?description.
        ?description a hto:OriginalDescription;
            hto:hasSummary ?summary.
        ?summary hto:text ?text.
    }
    """ % topic_a
)

try:
    ret = sparql.queryAndConvert()
    for r in ret["results"]["bindings"]:
        print("term name: %s | summary: %s" % (r["name"]["value"], r["text"]["value"]))
except Exception as e:
    print(e)


term name: HYGROMETRY | summary: The formation of steam or aqueous vapour, and its diffusion in space or in a gaseous medium, have already been considered under the article Evaporation. We now propose first to take a view of various methods and devices which have been employed to detect the presence of aqueous vapour, and to ascertain its amount, or how much of it is contained in a given volume, whether when alone or diffused in a gaseous medium. It will, however, be proper briefly to notice a few of those either already become obsolete, or soon to be so, were it only to show their imperfection. It is therefore of no other use than as a mere toy; for the value of an instrument employed as a measure of any kind must depend not only on its being at first accurately constructed, but likewise upon its indications not being, ccBtøris paribus, liable to change. The preceding remarks upon the effects of a change of humidity on organic substances may enable us to correct what we consider a gre

### Question 25: List all records for a concept

In [89]:
concept_uri = "<https://w3id.org/hto/Concept/6194477897_2>"

sparql.setQuery("""
    PREFIX hto: <https://w3id.org/hto#>
    SELECT * WHERE {
        %s a hto:Concept;
            hto:hadConceptRecord ?record.
    }
    """ % concept_uri
)

try:
    ret = sparql.queryAndConvert()
    for r in ret["results"]["bindings"]:
        print("record uri: %s " % (r["record"]["value"]))
except Exception as e:
    print(e)

record uri: http://www.wikidata.org/entity/Q1226939 
record uri: https://w3id.org/hto/ArticleTermRecord/9910796273804340_193322688_6194477897_1 
record uri: https://w3id.org/hto/ArticleTermRecord/992277653804341_144133902_6194477897_0 
record uri: https://w3id.org/hto/ArticleTermRecord/9929192893804340_144850367_6194477897_0 


### Question 26: Given a description uri, track the source

In [142]:
"""
In this example, we show how to keep tracking the source of an article description, including what entity it was derived from (an entity here is anything with provenance information, such as another description, or a text file, or a xml file, or an image), what software was used, who created it. 
Given a given description uri, here we will return a dictionary with entity uri as the key, and entity object as the value. This entity object have the following format: {'source_uri': '', 'entity_type': '', 'agent_type': '', 'agent_name': ''}. 
For example, we have the following result:
{'entityA': 
    {'source_uri': 'entityB', 'entity_type': 'OriginalDescription', 'agent_type': 'SoftwareAgent', 'agent_name': 'frances information extraction'}, 
'entityB': 
    {'source_uri': None, 'entity_type': 'InformationResource', 'agent_type': 'Person', 'agent_name': 'Ash Charlton'}
}
You can find the source of an entity by mapping the source_uri to the entity_uri. In above example, we can find that InformationResource entityB is the source of the OriginalDescription entityA. 
"""

# entity_uri = "https://raw.githubusercontent.com/TU-plogan/kp-editions/main/eb07/TXT_v2/e8/kp-eb0708-039107-8218-v2.txt"

entity_uri = "https://w3id.org/hto/OriginalDescription/992277653804341_144133903_6364534740_0Ash"
#entity_uri = "https://w3id.org/hto/InformationResource/__data_EB_1_3_clean_cut_txt"

def get_source(entity_uri):
    if entity_uri is None:
        return None

    entity_uri = "<" + entity_uri + ">"
    sparql.setQuery("""
        PREFIX hto: <https://w3id.org/hto#>
        PREFIX prov: <http://www.w3.org/ns/prov#>
        PREFIX foaf: <http://xmlns.com/foaf/0.1/>
        SELECT * WHERE {
            %s a ?entity_type;
                prov:wasAttributedTo ?agent.
            ?agent a ?agent_type;
                foaf:name ?agent_name.
            OPTIONAL {
            %s ?derived_type ?source.
            FILTER (?derived_type = prov:wasDerivedFrom || ?derived_type = hto:wasExtractedFrom)
            }
        } LIMIT 20
        """ % (entity_uri, entity_uri)
    )

    try:
        ret = sparql.queryAndConvert()
        source = {}
        for r in ret["results"]["bindings"]:
            source_uri = None
            if 'source' in r:
                source_uri = r['source']['value']
            source["source_uri"] = source_uri
            source["entity_type"] = r['entity_type']['value']
            agent_type = r['agent_type']['value']
            agent_type_name = agent_type.split("#")[-1]
            source["agent_type"] = agent_type_name
            source["agent_name"] = r['agent_name']['value']

        return source
    except Exception as e:
        print(e)
        return None


def track_all_sources(entity_uri):
    sources = {}
    tmp_source_uri = entity_uri
    while tmp_source_uri:
        current_source_info = get_source(tmp_source_uri)
        sources[tmp_source_uri] = current_source_info
        tmp_source_uri = current_source_info["source_uri"]
    return sources
print(track_all_sources(entity_uri))

{'https://w3id.org/hto/OriginalDescription/992277653804341_144133903_6364534740_0Ash': {'source_uri': 'https://w3id.org/hto/InformationResource/__data_EB_1_3_clean_cut_txt', 'entity_type': 'https://w3id.org/hto#OriginalDescription', 'agent_type': 'SoftwareAgent', 'agent_name': 'frances information extraction'}, 'https://w3id.org/hto/InformationResource/__data_EB_1_3_clean_cut_txt': {'source_uri': None, 'entity_type': 'https://w3id.org/hto#InformationResource', 'agent_type': 'Person', 'agent_name': 'Ash Charlton'}}
