# SPARQL Queries - Chapbooks Scotland - ChapbooksScotland-KG


This notebook runs several SPARQL queries against the ChapbooksScotland-KG Knowledge Graph (chapbooks_scotland.ttl). It covers two ways of querying the RDF data:

 - Querying the KG locally: Using rdflib.plugins.sparql
 - Querying the KG from an Apache Fuseki Server: Using SPARQLWrapper

The ChapbooksScotland-KG implements the [NLS-Ontology](https://francesnlp.github.io/NLS-ontology/doc/index-en.html), which has the following [Data Model](https://francesnlp.github.io/NLS-ontology/doc/dataModel.png).

The ChapbooksScotland-KG (chapbooks_scotland.ttl) can be downloaded from [Zenodo](https://zenodo.org/record/6673995#.YrHcBZBBx_A).

In general terms this NLS-Ontology can be described as follows:
1. A Digital Collection can have several Series.
2. A Serie can have several Volumes (Books).
3. A Serie can have several Editors, Publishers and reference books
4. A Volume has several Pages
5. A Page has text
6. A Serie can have a Supplement
7. A Supplement can have several Editors, Publishers and reference books
8. A Supplement can have several Volumes
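
The containment chain in the list above (Collection, Serie, Volume, Page) can be pictured with plain Python dataclasses. This is only an illustrative sketch, not the ontology's actual definition: the class and field names are assumptions, Supplements, Editors and Publishers are left out, and the collection name and page text are placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Page:
    number: str  # e.g. "Page1"
    text: str


@dataclass
class Volume:
    volume_id: str
    title: str
    pages: List[Page] = field(default_factory=list)


@dataclass
class Serie:
    mmsid: str
    title: str
    volumes: List[Volume] = field(default_factory=list)


@dataclass
class Collection:
    name: str
    series: List[Serie] = field(default_factory=list)

    def count_pages(self) -> int:
        """Walk Collection -> Serie -> Volume -> Page and count the pages."""
        return sum(len(v.pages) for s in self.series for v in s.volumes)


# Identifiers and titles reuse the example values listed in this notebook;
# the collection name and page text are placeholders.
page = Page(number="Page1", text="(page text goes here)")
volume = Volume(volume_id="104184105",
                title="song in praise of the highland lads",
                pages=[page])
serie = Serie(mmsid="9937033633804341",
              title="Chapbooks printed in Scotland",
              volumes=[volume])
collection = Collection(name="example collection", series=[serie])
print(collection.count_pages())
```

In the knowledge graph itself these links are expressed with properties such as nls:hasPart, which the queries in this notebook traverse.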


Example of values of the ChapbooksScotland-KG properties:


    mmsid: 9937033633804341
    serie title: Chapbooks printed in Scotland
    editor name: Milne, John
    editor_date: 1792-1871
    genre: Chapbooks-Scotland-Aberdeen-1801-1900
    language: eng
    metsXML: 104184105-mets.xml
    termsOfAddress: None
    numberOfPages: 8
    numberOfWords: 53
    permanentURL: https://digital.nls.uk/104184105
    physicalDescription: 8 p. ; 18 cm.
    place: Aberdeen
    publisher: Printed by A. Imlay, 22, Long Acre
    referencedBy: None
    shelfLocator: L.C.2786.A(1)
    altoXML: 104184105/alto/107134030.34.xml
    serie subtitle: to the tune of Johnny Cop
    text: A SONG JRAISB OP THE ^ HIGHLAND LADS. To the T...
    page number: Page1
    volume title: song in praise of the highland lads
    volumeId: 104184105
    year: 1826
    serie number: 0
    part: 0
    publisherPersons: []
    numberOfVolumes: 3080
    volume number: 1



### Loading the necessary libraries

In [1]:
# List the pip packages installed in the current Jupyter kernel
import sys
!{sys.executable} -m pip list

Package                  Version
------------------------ ----------
altgraph                 0.17.2
anyio                    3.6.2
appnope                  0.1.3
argon2-cffi              21.3.0
argon2-cffi-bindings     21.2.0
arrow                    1.2.3
astroid                  2.14.2
asttokens                2.2.1
attrs                    22.2.0
backcall                 0.2.0
beautifulsoup4           4.11.2
bertopic                 0.10.0
bleach                   6.0.0
blis                     0.7.9
catalogue                2.0.8
certifi                  2022.12.7
cffi                     1.15.1
charset-normalizer       3.0.1
chart-studio             1.1.0
click                    8.1.3
comm                     0.1.2
confection               0.0.4
contourpy                1.0.7
cycler                   0.11.0
cymem                    2.0.7
Cython                   0.29.33
debugpy                  1.6.6
decorator                5.1.1
defusedxml        

In [2]:
import rdflib
from rdflib.extras.external_graph_libs import rdflib_to_networkx_multidigraph
import networkx as nx
import matplotlib.pyplot as plt
from rdflib import Graph, ConjunctiveGraph, Namespace, Literal
from rdflib.plugins.sparql import prepareQuery

In [3]:
import networkx as nx
import matplotlib.pyplot as plt
from SPARQLWrapper import SPARQLWrapper, JSON

In [4]:
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = [15, 10]

### Functions

In [5]:
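def plot_resource(results):
    """Plot an rdflib graph as a networkx multidigraph."""
    G = rdflib_to_networkx_multidigraph(results)
    # Compute one layout and reuse it so nodes and edge labels line up
    pos = nx.spring_layout(G, scale=3)
    edge_labels = nx.get_edge_attributes(G, 'r')
    nx.draw_networkx_edge_labels(G, pos, edge_labels=edge_labels)
    nx.draw(G, pos, with_labels=True)
    plt.show()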

### Type 1: Working with the Knowledge Graph locally

In [6]:
g = Graph()
## Modify the path if necessary
g.parse("./results/chapbooks_scotland.ttl", format="ttl")

FileNotFoundError: [Errno 2] No such file or directory: '/Users/ly40/Documents/frances-ai/frances-api/results/chapbooks_scotland.ttl'
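
The FileNotFoundError above suggests that the relative path did not resolve from the notebook's working directory. As a minimal sketch, assuming the file may sit in one of a few candidate locations, a small helper can locate it before parsing. The helper name `find_kg_file` and the second candidate path are hypothetical.

```python
from pathlib import Path
from typing import Iterable, Optional


def find_kg_file(candidates: Iterable[str]) -> Optional[Path]:
    """Return the first candidate path that exists as a file, else None."""
    for candidate in candidates:
        path = Path(candidate).expanduser()
        if path.is_file():
            return path.resolve()
    return None


# Hypothetical search locations; adjust to wherever the Zenodo download was saved.
kg_path = find_kg_file([
    "./results/chapbooks_scotland.ttl",
    "./chapbooks_scotland.ttl",
])

if kg_path is None:
    print("chapbooks_scotland.ttl not found; download it from Zenodo first.")
else:
    print(f"Found knowledge graph at {kg_path}")
    # The graph could then be parsed as in the cell above, e.g.:
    # g.parse(str(kg_path), format="ttl")
```

Checking the path first gives a clearer message than the traceback and avoids running the later query cells against an empty graph.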

Query 1: List all the resources with the property nls:editor in the ChapbooksScotland-KG

In [None]:
nls = Namespace("https://w3id.org/nls#")
q1 = prepareQuery('''
  SELECT ?Subject WHERE {
    ?Subject nls:editor ?Editor.
  }
  ''',
  initNs = { "nls": nls}
)

for r in g.query(q1):
    print(r.Subject)

Query 2: Same query but asking more information about the resources obtained.

In [None]:
q2 = prepareQuery('''
  SELECT ?Subject ?Editor WHERE {
    ?Subject nls:editor ?Editor.
  }
  ''',
  initNs = { "nls": nls}
)

for r in g.query(q2):
  print(r.Subject, r.Editor)

Query 3: Obtaining just the editors' names

In [None]:
q3= prepareQuery('''
SELECT DISTINCT ?name
    WHERE {
     ?instance nls:editor ?Editor.
     ?Editor nls:name ?name .
    }
    ''',
  initNs = { "nls": nls}
)

for r in g.query(q3):
      print(r.name)

In [None]:
res=g.query(q3)
a=list(res)[0]
a.name

Query 4: Same query asking for the first 10 resources with the property nls:name

In [None]:
q4 = prepareQuery('''
  SELECT ?Subject ?Name WHERE {
    ?Subject nls:name ?Name.
  }
  LIMIT 10
  ''',
  initNs = { "nls": nls}
)

for r in g.query(q4):
    print(r.Subject, r.Name)

Query 5: Obtain the titles of the resources printed in Glasgow

In [None]:
from rdflib import XSD
q5 = prepareQuery('''
SELECT ?name ?place
WHERE {
 ?Subject nls:printedAt ?place;
          nls:title ?name.
}

  ''',
  initNs = { "nls": nls}
)

for r in g.query(q5, initBindings = {'?place' : Literal('Glasgow', datatype=XSD.string)}):
    print(r.name)

Query 7: Asking for resources whose name is "Berry, Edward"

In [None]:
from rdflib import XSD
q7 = prepareQuery('''
  SELECT ?Subject WHERE {
    ?Subject nls:name ?Family.
  }
  ''',
    initNs = { "nls": nls}
)

for r in g.query(q7, initBindings = {'?Family' : Literal('Berry, Edward', datatype=XSD.string)}):
  print(r.Subject)

Query 8: Counting the number of "Serie" resources

In [None]:
q8 = prepareQuery('''
    SELECT ?serie
    WHERE {
       ?serie rdf:type nls:Serie .
    }
    ''',
  initNs = { "nls": nls}
)

print(len(g.query(q8)))


Query: How many series have more than 1 volume?

In [None]:
q8 = prepareQuery('''
    SELECT ?serie ?nv
    WHERE {
       ?serie rdf:type nls:Serie .
       ?serie nls:numberOfVolumes ?nv .
       FILTER (?nv > 1).
     }
    ''',
  initNs = { "nls": nls}
)

print(len(g.query(q8)))


IMPORTANT: This query shows the title of each serie that has more than one volume, together with its number of volumes. In this collection a serie has more than one volume because the same book appears more than once, so the number of volumes is the number of repetitions. Repeated books are specific to this digital collection. When a book is printed twice, both printings belong to the same serie (same mmsid) but each has a different volume_id.

In [None]:
q8 = prepareQuery('''
    SELECT ?serie ?title ?nv
    WHERE {
       ?serie rdf:type nls:Serie .
       ?serie nls:title ?title .
       ?serie nls:numberOfVolumes ?nv .
       FILTER (?nv > 1).
     }
    ''',
  initNs = { "nls": nls}
)

for r in g.query(q8):
    print("Serie Uri: %s,  Serie Title: %s, Num volume %s" %(r.serie, r.title, r.nv))
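
The note on repeated books can be made concrete with a small sketch. It assumes, based only on the example URIs used in this notebook, that a volume URI ends in `<mmsid>_<volumeId>`; that layout is an assumption, not something the ontology documentation is quoted as guaranteeing. The helper names and the second and third URIs are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def split_volume_uri(volume_uri: str) -> Tuple[str, str]:
    """Split '.../Volume/<mmsid>_<volumeId>' into (mmsid, volume_id)."""
    local_name = volume_uri.rstrip("/").rsplit("/", 1)[-1]
    mmsid, sep, volume_id = local_name.partition("_")
    if not sep or not mmsid or not volume_id:
        raise ValueError(f"Unexpected volume URI layout: {volume_uri}")
    return mmsid, volume_id


def group_by_mmsid(volume_uris: Iterable[str]) -> Dict[str, List[str]]:
    """Group volume ids under the mmsid of the serie they belong to."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for uri in volume_uris:
        mmsid, volume_id = split_volume_uri(uri)
        groups[mmsid].append(volume_id)
    return dict(groups)


uris = [
    "https://w3id.org/nls/i/Volume/9937038023804340_104184129",  # URI used in this notebook
    "https://w3id.org/nls/i/Volume/9937038023804340_000000001",  # made-up second printing
    "https://w3id.org/nls/i/Volume/9900000000000000_000000002",  # made-up, different serie
]

# Series with more than one volume, i.e. repeated books
repeated = {m: v for m, v in group_by_mmsid(uris).items() if len(v) > 1}
print(repeated)
```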

Query 8bis: Obtaining Series URIs and collection name

In [None]:
q8 = prepareQuery('''
    SELECT ?serie ?collection
    WHERE {
       ?serie rdf:type nls:Serie .
       ?serie nls:collection ?collection .
    }
    ''',
  initNs = { "nls": nls}
)

for r in g.query(q8):
    print(r.serie, r.collection)

q8c: Obtaining series URIs and titles

In [None]:
q8 = prepareQuery('''
    SELECT ?serie ?title
    WHERE {
       ?serie rdf:type nls:Serie .
       ?serie nls:title ?title .
    }
    ''',
  initNs = { "nls": nls}
)

for r in g.query(q8):
    print(r.serie, r.title)

q8d: Obtaining volume URIs and titles

In [None]:
q8 = prepareQuery('''
    SELECT ?volume ?title
    WHERE {
       ?volume rdf:type nls:Volume .
       ?volume nls:title ?title .
    }
    ''',
  initNs = { "nls": nls}
)

for r in g.query(q8):
    print(r.volume, r.title)

Query 9: Printing the first 10 "Pages" URIs

In [None]:
q9 = prepareQuery('''
    SELECT ?page
    WHERE {
       ?page rdf:type nls:Page .
    }
    LIMIT 10
    ''',
  initNs = { "nls": nls}
)

for r in g.query(q9):
    print(r.page)

Query 10: Obtaining the text and the page number of the first 5 pages

In [None]:
q10 = prepareQuery('''
    SELECT *
    WHERE {
       ?page rdf:type nls:Page .
       ?page nls:text ?text .
       ?page nls:number ?number .

    }
    LIMIT 5
    ''',
  initNs = { "nls": nls}
)

for r in g.query(q10):
    print("---")
    print(r.text, r.number)

Query 11: Obtaining the title of the first 10 resources

In [None]:
q11 = prepareQuery('''
  SELECT ?serie ?title
    WHERE {
    ?serie rdf:type nls:Serie .
    ?serie nls:title ?title .
    }
    LIMIT 10
    ''',
  initNs = { "nls": nls}
)

for r in g.query(q11):
      print(r.title)

Query 12: Obtaining each serie's number, year, number of volumes, mmsid, title, subtitle and language (only the title and subtitle are printed)

In [None]:
q12 = prepareQuery('''
SELECT ?enum ?s ?y ?nv ?mmsid ?title ?subtitle  ?lang WHERE {
       ?s a nls:Serie ;
            nls:number ?enum ;
            nls:mmsid ?mmsid ;
            nls:publicationYear ?y ;
            nls:numberOfVolumes ?nv ;
            nls:title ?title ;
            nls:subtitle ?subtitle ;
            nls:language ?lang .

    }
''',
  initNs = { "nls": nls}
)

for r in g.query(q12):
    #print(r.enum, r.y, r.nv, r.mmsid, r.lang, r.title, r.subtitle)
    print(r.title, r.subtitle)

In [None]:
# Same pattern for volumes: list every volume title
q12 = prepareQuery('''
SELECT ?title WHERE {
       ?s a nls:Volume ;
            nls:title ?title .
    }
''',
  initNs = { "nls": nls}
)

for r in g.query(q12):
    print(r.title)

Query 13: Obtaining the text of pages in which the string "woman" appears.

In [None]:
q13 = prepareQuery('''
SELECT * WHERE {
       ?page a nls:Page .
       ?page nls:text ?text .
      FILTER regex(?text, "woman")
           }
    LIMIT 3
''',
  initNs = { "nls": nls}
)

for r in g.query(q13):
    print(r)
    print("---")
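
A regex filter matches anywhere in the text unless the pattern is anchored, so a leading `^` restricts matches to pages whose text starts with the word. The sketch below uses Python's `re` module on made-up strings to illustrate that difference. SPARQL's regex() follows XPath regular-expression rules rather than Python's, but for a simple pattern like this the behaviour should be the same.

```python
import re

# Made-up sample texts, not taken from the knowledge graph
texts = [
    "woman of the town",             # starts with the word
    "The old woman sang a song",     # contains the word later on
    "A song in praise of the lads",  # does not contain it
]

anchored = re.compile(r"^woman")                       # only at the start of the text
unanchored = re.compile(r"woman")                      # anywhere in the text
case_insensitive = re.compile(r"woman", re.IGNORECASE)

anchored_hits = [bool(anchored.search(t)) for t in texts]
unanchored_hits = [bool(unanchored.search(t)) for t in texts]

print(anchored_hits)
print(unanchored_hits)
print(bool(case_insensitive.search("WOMAN AND CHILD")))
```

In SPARQL, case-insensitive matching is requested with a third argument, as in `regex(?text, "woman", "i")`.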

Query 14: Counting the number of Volumes

In [None]:
q14 = prepareQuery('''
SELECT (COUNT (DISTINCT ?v) as ?count)
    WHERE {
        ?serie a nls:Serie .
        ?serie nls:hasPart ?v .
        }

''',
  initNs = { "nls": nls}
)

for r in g.query(q14):
    print("-Number of Volumes--")
    print(r)

Query 15: Counting the number of Pages

In [None]:
q15 = prepareQuery('''
SELECT (COUNT (DISTINCT ?p) as ?count)
    WHERE {
        ?volume a nls:Volume .
        ?volume nls:hasPart ?p .
        }

''',
  initNs = { "nls": nls}
)

for r in g.query(q15):
    print("-Number of PAGES--")
    print(r)


Query 16: Counting the number of Series

In [None]:
q16 = prepareQuery('''
SELECT (COUNT (DISTINCT ?s) as ?count)
    WHERE {
        ?serie a nls:Serie .
        ?serie nls:mmsid ?s .
        }

''',
  initNs = { "nls": nls}
)

for r in g.query(q16):
    print("-Number of Series--")
    print(r)


Query 17: Obtaining the URI, publication year, volume number, number of pages and number of words for the first 3 pages

In [None]:
q17 =prepareQuery('''
SELECT ?uri ?year ?vnum ?numPages ?numWords
        WHERE {
        ?uri a nls:Page .
        ?uri nls:numberOfWords ?numWords.
        ?v nls:hasPart ?uri .
        ?v nls:number ?vnum .
        ?v nls:numberOfPages ?numPages .
        ?e nls:hasPart ?v .
        ?e nls:publicationYear ?year.

        }
         LIMIT 3
        ''',
  initNs = { "nls": nls}
)

for r in g.query(q17):
    print("-Page details--")
    print(r.uri, r.year, r.vnum, r.numPages, r.numWords)

In [None]:
#G = rdflib_to_networkx_multidigraph(result)

# Plot Networkx instance of RDF Graph
#pos = nx.spring_layout(G, scale=2)
#edge_labels = nx.get_edge_attributes(G, 'r')
#nx.draw_networkx_edge_labels(G, pos, edge_labels=edge_labels)
#nx.draw(G, with_labels=True)

#if not in interactive mode for
#plt.show()


### Type 2: Connecting with FUSEKI and using SPARQLWrapper

**Before running these queries, the knowledge graph (chapbooks_scotland.ttl) must be uploaded to an Apache Fuseki server.**

Query 18: Basic query

In [None]:
sparql = SPARQLWrapper("http://localhost:3030/chapbooks_scotland/sparql")
sparql.setQuery("""
    SELECT ?subject ?predicate ?object WHERE {   ?subject ?predicate ?object } LIMIT 5
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

results

for result in results["results"]["bindings"]:
    print(result["subject"]["value"], result["predicate"]["value"], result["object"]["value"] )
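
The nested `results["results"]["bindings"]` structure is repeated in every SPARQLWrapper cell. As a minimal sketch, assuming the standard SPARQL 1.1 JSON results layout, a helper can flatten it into plain dictionaries. The function name `bindings_to_rows` and the sample data are illustrative and are not real output from the Fuseki server.

```python
from typing import Any, Dict, List, Optional


def bindings_to_rows(results: Dict[str, Any]) -> List[Dict[str, Optional[str]]]:
    """Flatten SPARQL JSON results into a list of {variable: value} dicts.

    Variables left unbound in a row (for example inside OPTIONAL)
    are returned as None instead of raising KeyError.
    """
    variables = results["head"]["vars"]
    rows = []
    for binding in results["results"]["bindings"]:
        rows.append({var: binding.get(var, {}).get("value") for var in variables})
    return rows


# Hand-written sample in the SPARQL JSON results layout (not real query output)
sample = {
    "head": {"vars": ["subject", "predicate", "object"]},
    "results": {"bindings": [
        {
            "subject": {"type": "uri", "value": "https://w3id.org/nls/i/Serie/9937393453804340"},
            "predicate": {"type": "uri", "value": "https://w3id.org/nls#title"},
            "object": {"type": "literal", "value": "example title"},
        },
        {
            "subject": {"type": "uri", "value": "https://w3id.org/nls/i/Serie/9937393453804340"},
            "predicate": {"type": "uri", "value": "https://w3id.org/nls#subtitle"},
            # "object" deliberately left unbound
        },
    ]},
}

rows = bindings_to_rows(sample)
for row in rows:
    print(row["subject"], row["predicate"], row["object"])
```

With such a helper, the `if "part" in i` checks needed for OPTIONAL variables reduce to testing for None.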


Query 19: Obtaining the URI, publication year and number of each Serie.

In [None]:
sparql = SPARQLWrapper("http://localhost:3030/chapbooks_scotland/sparql")
sparql.setQuery("""
    PREFIX nls: <https://w3id.org/nls#>
    SELECT ?serie ?year ?snum WHERE {
       ?serie a nls:Serie ;
              nls:number ?snum ;
              nls:publicationYear ?year
    }

""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
r=results["results"]["bindings"]
for i in r:
    print(i["serie"]["value"], i["year"]["value"], i["snum"]["value"])

Query 20: Obtaining publication year, number, title, subtitle, printedAt, physical description, mmsid, shelflocator, number of volumes of the resource with the uri: https://w3id.org/nls/i/Serie/9937393453804340

In [None]:
sparql = SPARQLWrapper("http://localhost:3030/chapbooks_scotland/sparql")
uri="<https://w3id.org/nls/i/Serie/9937393453804340>"
query="""
PREFIX nls: <https://w3id.org/nls#>
SELECT ?publicationYear ?num ?title ?subtitle ?printedAt ?physicalDescription ?mmsid ?shelfLocator ?numberOfVolumes  WHERE {
       %s nls:publicationYear ?publicationYear ;
          nls:number ?num;
          nls:title ?title;
          nls:subtitle ?subtitle ;
          nls:printedAt ?printedAt;
          nls:physicalDescription ?physicalDescription;
          nls:mmsid ?mmsid;
          nls:shelfLocator ?shelfLocator;
          nls:numberOfVolumes ?numberOfVolumes.


}
""" % (uri)
sparql.setQuery(query)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
r=results["results"]["bindings"]
for i in r:
    print(i["publicationYear"]["value"], i["num"]["value"], i["title"]["value"], i["subtitle"]["value"],\
          i["printedAt"]["value"], i["physicalDescription"]["value"], i["mmsid"]["value"], i["shelfLocator"]["value"],\
          i["numberOfVolumes"]["value"])
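
Queries like the one above splice a URI into the query text with `%`. If the URI ever comes from user input or another query, a stray `>` or space would produce a malformed (or unintended) query. The sketch below shows one possible guard. `as_iri`, `QUERY_TEMPLATE` and `build_title_query` are hypothetical names, and the character check follows the SPARQL grammar's IRIREF rule as I understand it.

```python
import re

# Characters the SPARQL grammar does not allow inside <...>:
# control characters and space, plus < > " { } | ^ ` and backslash.
_FORBIDDEN = re.compile(r'[\x00-\x20<>"{}|^`\\]')


def as_iri(uri: str) -> str:
    """Return the URI wrapped in angle brackets, or raise ValueError."""
    uri = uri.strip()
    if uri.startswith("<") and uri.endswith(">"):
        uri = uri[1:-1]  # accept already-bracketed input
    if not uri or _FORBIDDEN.search(uri):
        raise ValueError(f"Not a valid IRI for SPARQL: {uri!r}")
    return f"<{uri}>"


# Doubled braces are literal braces for str.format
QUERY_TEMPLATE = """
PREFIX nls: <https://w3id.org/nls#>
SELECT ?title WHERE {{
    {serie} nls:title ?title .
}}
"""


def build_title_query(serie_uri: str) -> str:
    """Build a query for the title of one serie, validating the URI first."""
    return QUERY_TEMPLATE.format(serie=as_iri(serie_uri))


query_text = build_title_query("https://w3id.org/nls/i/Serie/9937393453804340")
print(query_text)
```

The resulting string can be passed to `sparql.setQuery(...)` exactly as in the cells of this section.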

Query 21: Obtaining the number of volumes of the serie with the uri: https://w3id.org/nls/i/Serie/9937393453804340

In [None]:
sparql = SPARQLWrapper("http://localhost:3030/chapbooks_scotland/sparql")
uri="<https://w3id.org/nls/i/Serie/9937393453804340>"
query="""
PREFIX nls: <https://w3id.org/nls#>
SELECT (COUNT (DISTINCT ?v) as ?count)
    WHERE {
        %s nls:hasPart ?v.
    	?v ?b ?c
}
""" % (uri)
sparql.setQuery(query)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
results["results"]["bindings"][0]["count"]["value"]
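
Note that values in SPARQL JSON results arrive as strings, so the count above is a string such as "3" rather than an integer. A minimal sketch of a helper that extracts it as an `int` follows. The name `scalar_count` and the sample data are illustrative only.

```python
from typing import Any, Dict


def scalar_count(results: Dict[str, Any], variable: str = "count") -> int:
    """Return the single aggregate value of a COUNT query as an int."""
    bindings = results["results"]["bindings"]
    if not bindings or variable not in bindings[0]:
        raise ValueError(f"No value bound for ?{variable}")
    return int(bindings[0][variable]["value"])


# Hand-written sample in the SPARQL JSON results layout (not real query output)
sample_count = {
    "head": {"vars": ["count"]},
    "results": {"bindings": [
        {"count": {
            "type": "literal",
            "datatype": "http://www.w3.org/2001/XMLSchema#integer",
            "value": "3",
        }},
    ]},
}

number_of_volumes = scalar_count(sample_count)
print(number_of_volumes + 1)  # arithmetic works because the value is now an int
```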


Query to obtain the volume URIs, titles and numbers for a particular serie

In [None]:
sparql = SPARQLWrapper("http://localhost:3030/chapbooks_scotland/sparql")
uri="<https://w3id.org/nls/i/Serie/9937185433804340>"
query="""
PREFIX nls: <https://w3id.org/nls#>
SELECT ?v ?title ?vnum
    WHERE {
        %s nls:hasPart ?v.
        ?v nls:title ?title .
        ?v nls:number ?vnum .
}
""" % (uri)
sparql.setQuery(query)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
r=results["results"]["bindings"]
for i in r:
    print(i["v"]["value"], i["title"]["value"], i["vnum"]["value"])

Query 22: Given the volume with the URI <https://w3id.org/nls/i/Volume/9937038023804340_104184129>, this query obtains the URIs and numbers of its pages.

In [None]:
sparql = SPARQLWrapper("http://localhost:3030/chapbooks_scotland/sparql")
uri="<https://w3id.org/nls/i/Volume/9937038023804340_104184129>"
query="""
PREFIX nls: <https://w3id.org/nls#>
SELECT ?v ?vnum ?part  WHERE {
       %s nls:hasPart ?v .
       ?v nls:number ?vnum .
       OPTIONAL { ?v nls:part ?part . }


}
""" % (uri)
sparql.setQuery(query)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
r=results["results"]["bindings"]
for i in r:
    if "part" in i:
        print(i["v"]["value"], i["vnum"]["value"], i["part"]["value"])
    else:
        print(i["v"]["value"], i["vnum"]["value"])

Query 23: Given the volume with the URI <https://w3id.org/nls/i/Volume/9937038023804340_104184129>, this query obtains its number, title, part, metsXML, volume ID, permanent URL and number of pages.

In [None]:
sparql = SPARQLWrapper("http://localhost:3030/chapbooks_scotland/sparql")
uri="<https://w3id.org/nls/i/Volume/9937038023804340_104184129>"
query="""
PREFIX nls: <https://w3id.org/nls#>
SELECT ?num ?title ?part ?metsXML ?volumeId ?permanentURL ?numberOfPages  WHERE {
       %s nls:number ?num ;
          nls:title ?title;
          nls:metsXML ?metsXML;
          nls:volumeId ?volumeId;
          nls:permanentURL ?permanentURL;
          nls:numberOfPages ?numberOfPages .
       OPTIONAL { %s nls:part ?part . }


}
""" % (uri, uri)
sparql.setQuery(query)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
r=results["results"]["bindings"]
for i in r:
    print(i)

Query 24: Given the volume with the URI https://w3id.org/nls/i/Volume/9937038023804340_104184129, this query obtains the number of pages.


In [None]:
sparql = SPARQLWrapper("http://localhost:3030/chapbooks_scotland/sparql")
uri="<https://w3id.org/nls/i/Volume/9937038023804340_104184129>"
query="""
PREFIX nls: <https://w3id.org/nls#>
SELECT (COUNT (DISTINCT ?p) as ?count)
    WHERE {
        %s nls:hasPart ?p .
        ?p a nls:Page

}
""" % (uri)


sparql.setQuery(query)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
results

In [None]:
sparql = SPARQLWrapper("http://localhost:3030/chapbooks_scotland/sparql")
uri="<https://w3id.org/nls/i/Volume/9937038023804340_104184129>"
query="""
    PREFIX nls: <https://w3id.org/nls#>
       PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
       SELECT (count (DISTINCT ?a) as ?count)
       WHERE {
            %s nls:hasPart ?b .
            ?b a nls:Page .
            ?b nls:altoXML ?a.
      }


""" % (uri)

sparql.setQuery(query)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
results["results"]["bindings"][0]["count"]["value"]

Query 25: Given the volume with the URI https://w3id.org/nls/i/Volume/9937038023804340_104184129, this query describes that resource.

In [None]:
sparql = SPARQLWrapper("http://localhost:3030/chapbooks_scotland/sparql")
sparql.setQuery("""
    DESCRIBE <https://w3id.org/nls/i/Volume/9937038023804340_104184129>
""")

sparql.setReturnFormat(JSON)
results = sparql.query().convert()
results

Query 27: This query obtains the following properties for the first 3 pages: page URI, publication year, serie title, serie number, volume URI, volume number, part, metsXML, page number, text, number of words and number of pages

In [None]:
sparql = SPARQLWrapper("http://localhost:3030/chapbooks_scotland/sparql")
sparql.setQuery("""
PREFIX nls: <https://w3id.org/nls#>
SELECT ?uri ?year ?title ?snum ?vnum ?v ?part ?metsXML ?page ?text ?numberOfWords ?numberOfPages
        WHERE {
        ?uri a nls:Page .
        ?uri nls:text ?text .
        ?uri nls:number ?page .
        ?uri nls:numberOfWords ?numberOfWords .
        ?v nls:hasPart ?uri.
        ?v nls:number ?vnum.
        ?v nls:numberOfPages ?numberOfPages .
        ?v nls:metsXML ?metsXML.
        ?s nls:hasPart ?v.
        ?s nls:publicationYear ?year.
        ?s nls:number ?snum.
        ?s nls:title ?title.
        OPTIONAL {?v nls:part ?part; }

        }
        LIMIT 3

""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
r=results["results"]["bindings"]
for i in r:
    print(i["uri"]["value"], i["year"]["value"], i["title"]["value"], i["snum"]["value"],
          i["v"]["value"], i["vnum"]["value"], i["numberOfPages"]["value"],
          i["metsXML"]["value"], i["page"]["value"], i["text"]["value"], i["numberOfWords"]["value"])
    print("---")
