# SPARQL Update, RDF Datasets

In this notebook we explore how to use rdflib for SPARQL Update and for working with RDF datasets.

Author: Bernd Neumayr, JKU

## Preparations

In [170]:
# Install required packages
!pip install -q rdflib     # comment out this line to avoid re-installing on every re-run

### Imports and Functions 

We reuse the `sparql_select` helper function, which runs a SELECT query against a graph and returns the result as a pandas DataFrame.

In [171]:
# Imports
import pandas as pd
import rdflib
from rdflib import Graph, Literal, RDF, URIRef, BNode, Namespace


# Convenience functions
def sparql_select(graph,query,use_prefixes=True):
  results = graph.query(query)          # execute the query against the graph, resulting in an rdflib.plugins.sparql.processor.SPARQLResult
  rows = []                             # a list of dictionaries, as intermediate format to construct the pandas DataFrame
  for result in results:                # iterate over the result set of the query, a result is an instance of rdflib.query.ResultRow
    row = {}                            #     create a dictionary to hold a single row of the result
    for var in results.vars:            #     iterate over the variables of the SPARQLResult to add a dictionary entry for each variable
      if (isinstance(result[var],URIRef) and use_prefixes):
        row[var] = result[var].n3(graph.namespace_manager)   # use namespace prefixes to shorten URIs
      else:
        row[var] = result[var]          #     keep literals, blank nodes and unbound values (None) as they are
    rows.append(row)                    #     add the dictionary (row) to the list
  # return a pandas DataFrame constructed from the list of dictionaries, with the variables from the result set as columns
  return pd.DataFrame(rows,columns=results.vars)


## SPARQL Update on single RDF Graphs

### Insert Data

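We create an RDF graph, add data to it using SPARQL Update, and display the updated graph in Turtle syntax.

With INSERT DATA, we specify the triples to be added to the graph directly. The request below contains two INSERT DATA operations, separated by a semicolon and executed in order. Parsing the Turtle snippet binds the `:` prefix in the graph, and rdflib passes a graph's bound prefixes to `update` and `query` by default, so the SPARQL strings can use `:` without a PREFIX declaration.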

In [172]:
g = rdflib.Graph().parse(format="turtle",data="""
  @prefix : <http://example.org/> .""")

g.update("""
INSERT DATA { 
  :jane  a :Person; 
    :gender "female"@en; :age 22;
    :friend :mary, :bob, :bill;
    :loves :bill.
  :mary  a :Person; 
    :gender "female"; :age 22;
    :friend :bob;
    :loves :bill.
  :bob  a :Person;  
    :age 26; 
    :loves :jane.
};

INSERT DATA { 
  :mary :age 24.
  :bob a :Person; 
    :age 28.
  :bill  a :Person; 
    :gender "male";
    :friend :mary, :jane. 
}
""")

print(g.serialize(format="turtle"))

@prefix : <http://example.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

:bob a :Person ;
    :age 26,
        28 ;
    :loves :jane .

:jane a :Person ;
    :age 22 ;
    :friend :bill,
        :bob,
        :mary ;
    :gender "female"@en ;
    :loves :bill .

:mary a :Person ;
    :age 22,
        24 ;
    :friend :bob ;
    :gender "female" ;
    :loves :bill .

:bill a :Person ;
    :friend :jane,
        :mary ;
    :gender "male" .




### Delete Data

With DELETE DATA, we list the concrete triples to be removed from the graph; variables are not allowed. A listed triple that is not in the graph (here, `:bob :age 43`) is simply ignored.

In [173]:
g.update("""
DELETE DATA 
  { :mary :age 24.
    :bob :age 28.
    :bob :age 43.
  }""")

print(g.serialize(format="turtle"))

@prefix : <http://example.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

:bob a :Person ;
    :age 26 ;
    :loves :jane .

:jane a :Person ;
    :age 22 ;
    :friend :bill,
        :bob,
        :mary ;
    :gender "female"@en ;
    :loves :bill .

:mary a :Person ;
    :age 22 ;
    :friend :bob ;
    :gender "female" ;
    :loves :bill .

:bill a :Person ;
    :friend :jane,
        :mary ;
    :gender "male" .




### DELETE/INSERT

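With DELETE … WHERE and INSERT … WHERE, the triples to be removed or added are not listed explicitly. Instead, the WHERE clause is evaluated on the RDF graph, and the DELETE or INSERT template is instantiated with each of its solutions. If the triples to delete are exactly those matched by the pattern, the shorthand DELETE WHERE { … } can be used.

Both can be combined in a single DELETE … INSERT … WHERE operation. The WHERE clause is evaluated first; then the instantiated DELETE template is removed and the instantiated INSERT template is added.

For example, we increase the age of every person in the graph by one. Persons without an `:age` (here, `:bill`) yield no solution and remain unchanged.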

In [174]:
g.update("""
DELETE {?p :age ?age_old}
INSERT {?p :age ?age_new}
WHERE 
  { ?p a :Person. 
    ?p :age ?age_old.
    BIND(?age_old + 1 AS ?age_new)	
  }""")

print(g.serialize(format="turtle"))

@prefix : <http://example.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

:bob a :Person ;
    :age 27 ;
    :loves :jane .

:jane a :Person ;
    :age 23 ;
    :friend :bill,
        :bob,
        :mary ;
    :gender "female"@en ;
    :loves :bill .

:mary a :Person ;
    :age 23 ;
    :friend :bob ;
    :gender "female" ;
    :loves :bill .

:bill a :Person ;
    :friend :jane,
        :mary ;
    :gender "male" .




### INSERT with Subquery in WHERE clause

To store aggregated information in the graph, we need a subquery in the WHERE clause, because the WHERE clause of an update has no GROUP BY of its own. For example, we calculate and store the number of friends of each person in the graph.

In [175]:
g.update("""
INSERT {?p :nrOfFriends ?nr}
WHERE 
  { SELECT ?p (COUNT(?f) AS ?nr)
    WHERE 
      { ?p a :Person.
        ?p :friend ?f.
      }
    GROUP BY ?p  	
  }""")

print(g.serialize(format="turtle"))

@prefix : <http://example.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

:bob a :Person ;
    :age 27 ;
    :loves :jane .

:jane a :Person ;
    :age 23 ;
    :friend :bill,
        :bob,
        :mary ;
    :gender "female"@en ;
    :loves :bill ;
    :nrOfFriends 3 .

:mary a :Person ;
    :age 23 ;
    :friend :bob ;
    :gender "female" ;
    :loves :bill ;
    :nrOfFriends 1 .

:bill a :Person ;
    :friend :jane,
        :mary ;
    :gender "male" ;
    :nrOfFriends 2 .




### Update and Re-Calculation

To keep materialized derived information correct after the underlying triples change, we have to delete the outdated derived statements and insert newly computed ones.

In this example, we remove a friendship triple and then update the materialized information: we first delete all `:nrOfFriends` statements and then insert recomputed ones based on the new data. The three operations are sent in one request, separated by semicolons, and are executed in order.

In [176]:
g.update("""
DELETE DATA 
  { :jane :friend :bill };
 

DELETE WHERE {?p :nrOfFriends ?nr};

INSERT {?p :nrOfFriends ?nr}
WHERE 
  { SELECT ?p (COUNT(?f) AS ?nr)
    WHERE 
      { ?p a :Person.
        ?p :friend ?f.
      }
    GROUP BY ?p  	
  }
""")

print(g.serialize(format="turtle"))

@prefix : <http://example.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

:bill a :Person ;
    :friend :jane,
        :mary ;
    :gender "male" ;
    :nrOfFriends 2 .

:bob a :Person ;
    :age 27 ;
    :loves :jane .

:jane a :Person ;
    :age 23 ;
    :friend :bob,
        :mary ;
    :gender "female"@en ;
    :loves :bill ;
    :nrOfFriends 2 .

:mary a :Person ;
    :age 23 ;
    :friend :bob ;
    :gender "female" ;
    :loves :bill ;
    :nrOfFriends 1 .




## Querying RDF Datasets

An RDF dataset consists of exactly one default graph and zero or more named graphs, each identified by an IRI.

### Workaround: Default Graph 

rdflib's support for datasets, especially regarding the default graph, does not seem to fully adhere to the standards. The default graph has to be parsed separately, and the TriG serializer writes the default graph as an unlabeled block in curly brackets (a form the TriG specification does permit). In addition, it appears that the default graph cannot be updated with SPARQL Update requests.

To work around these limitations, we use a dedicated named graph (here called `:main`) in place of the default graph. The examples presented here have been adapted from the lecture slides accordingly. For your homework assignment, you will need to make similar modifications, since solutions written for SemAI.jar (which is built on the fully standard-compliant Apache Jena SPARQL engine) will not work as intended with rdflib.

Independently of rdflib's shortcomings, one may argue that this workaround is simpler and more transparent, and thus preferable to the examples in the slides and the SemAI.jar solutions, which rely heavily on the default graph.

In summary, we use a named graph (in this case, `:main` within the example namespace) as an alternative to the default graph.

### Insert Data

In this example, we use a `:main` named graph as a substitute for the default graph, along with three additional named graphs. The `:main` graph stores metadata about the other graphs, assigning an owner (a person) to each of the three other graphs. Separating metadata from the content is just one approach to handling metadata. Alternatively, we could incorporate the metadata directly into the person-owned graphs.

In [177]:
ds = rdflib.Dataset()

ds.parse(format="trig", data="""
@prefix : <http://example.org/> .

:main {
  :Jane a :Person;
    :owns :JanesGraph.

  :Mary a :Person;
    :owns :MarysGraph.
    
  :Bill a :Person;
    :owns :BillsGraph.
}

:JanesGraph { 
  :Jane :likes :Mary.
  :Bill :likes :Jane, :Mary.
}

:MarysGraph {
  :Jane :likes :Mary.
  :Bill :likes :Jane.
  :Mary :likes :Jane.
}

:BillsGraph {
  :Jane :likes :Mary, :Bill.
  :Bill :likes :Mary, :Jane.
} 
""")

print(ds.serialize(format="trig"))

@prefix : <http://example.org/> .

:MarysGraph {
    :Bill :likes :Jane .

    :Mary :likes :Jane .

    :Jane :likes :Mary .
}

:BillsGraph {
    :Bill :likes :Jane,
            :Mary .

    :Jane :likes :Bill,
            :Mary .
}

:main {
    :Bill a :Person ;
        :owns :BillsGraph .

    :Mary a :Person ;
        :owns :MarysGraph .

    :Jane a :Person ;
        :owns :JanesGraph .
}

:JanesGraph {
    :Bill :likes :Jane,
            :Mary .

    :Jane :likes :Mary .
}




### Querying all Triples and Quadruples in a Dataset

This query retrieves all triples and quadruples in the dataset. To stay as close as possible to the slides, we query the `:main` graph separately, emulating the use of a default graph.



In [178]:
df = sparql_select(ds,"""
SELECT ?s ?p ?o ?g
WHERE {  
  { GRAPH :main { ?s ?p ?o } }
  UNION
  { GRAPH ?g {?s ?p ?o}
   } 
}
ORDER BY ?g ?s ?p ?o
""")
df

|   | s | p | o | g |
|---|---|---|---|---|
| 0 | :Bill | :owns | :BillsGraph | |
| 1 | :Bill | rdf:type | :Person | |
| 2 | :Jane | :owns | :JanesGraph | |
| 3 | :Jane | rdf:type | :Person | |
| 4 | :Mary | :owns | :MarysGraph | |
| 5 | :Mary | rdf:type | :Person | |
| 6 | :Bill | :likes | :Jane | :BillsGraph |
| 7 | :Bill | :likes | :Mary | :BillsGraph |
| 8 | :Jane | :likes | :Bill | :BillsGraph |
| 9 | :Jane | :likes | :Mary | :BillsGraph |


#### Simplification: Query all quadruples in the dataset

Since we don't have a true default graph, there's no need to differentiate between triples and quadruples. Every statement is associated with a named graph, allowing us to simplify the query for all statements in the dataset as follows.


In [179]:
df = sparql_select(ds,"""
SELECT ?s ?p ?o ?g
WHERE {  
 GRAPH ?g {?s ?p ?o} 
}
ORDER BY ?g ?s ?p ?o
""")
df

|   | s | p | o | g |
|---|---|---|---|---|
| 0 | :Bill | :likes | :Jane | :BillsGraph |
| 1 | :Bill | :likes | :Mary | :BillsGraph |
| 2 | :Jane | :likes | :Bill | :BillsGraph |
| 3 | :Jane | :likes | :Mary | :BillsGraph |
| 4 | :Bill | :likes | :Jane | :JanesGraph |
| 5 | :Bill | :likes | :Mary | :JanesGraph |
| 6 | :Jane | :likes | :Mary | :JanesGraph |
| 7 | :Bill | :likes | :Jane | :MarysGraph |
| 8 | :Jane | :likes | :Mary | :MarysGraph |
| 9 | :Mary | :likes | :Jane | :MarysGraph |


### Querying a specific Named Graph

In [180]:
df = sparql_select(ds,"""
SELECT *
WHERE 
  {  GRAPH :JanesGraph 
         {:Bill :likes ?o}      
  }
""")
df

|   | o |
|---|---|
| 0 | :Jane |
| 1 | :Mary |


### Intersection/Join of Named Graphs

In [181]:
df = sparql_select(ds,"""
SELECT DISTINCT ?s ?p ?o
WHERE 
  {  GRAPH :JanesGraph 
         {?s ?p ?o}
     GRAPH :MarysGraph 
         {?s ?p ?o}
  }
""")
df

|   | s | p | o |
|---|---|---|---|
| 0 | :Bill | :likes | :Jane |
| 1 | :Jane | :likes | :Mary |


### Union of Named Graphs

In [182]:
df = sparql_select(ds,"""
SELECT DISTINCT ?s ?p ?o
WHERE 
  {  { GRAPH :JanesGraph 
         {?s ?p ?o}
     }
     UNION 
     { GRAPH :MarysGraph 
         {?s ?p ?o}
     }
  } 
""")
df

|   | s | p | o |
|---|---|---|---|
| 0 | :Bill | :likes | :Jane |
| 1 | :Bill | :likes | :Mary |
| 2 | :Jane | :likes | :Mary |
| 3 | :Mary | :likes | :Jane |


### Querying graphs with a description fulfilling a given condition

Note that this query again uses the `:main` named graph as a substitute for the default graph.

In [183]:
df = sparql_select(ds,"""
SELECT *
WHERE { 
    GRAPH :main { 
      :Jane :owns ?g.
    }
    GRAPH ?g 
         {:Bill :likes ?o}
      
  }
""")
df

|   | g | o |
|---|---|---|
| 0 | :JanesGraph | :Jane |
| 1 | :JanesGraph | :Mary |


#### Union of all person-owned graphs

In [184]:
df = sparql_select(ds,"""
SELECT DISTINCT ?s ?p ?o
WHERE {  
  GRAPH :main {
    [] a :Person; :owns ?g.
  }
  GRAPH ?g {
    ?s ?p ?o
  }
}
""")
df

|   | s | p | o |
|---|---|---|---|
| 0 | :Bill | :likes | :Jane |
| 1 | :Bill | :likes | :Mary |
| 2 | :Jane | :likes | :Mary |
| 3 | :Mary | :likes | :Jane |
| 4 | :Jane | :likes | :Bill |


### Correlating inner and outer queries

How do the owners of graphs describe themselves?

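Again, we use the `:main` named graph as a substitute for the default graph. The variable `?s` occurs in both GRAPH patterns, so only triples whose subject is the owner of graph `?g` are returned.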



In [185]:
df = sparql_select(ds,"""
SELECT ?s ?p ?o
WHERE {  
  GRAPH :main {
    ?s a :Person; 
       :owns ?g.
  }
  GRAPH ?g {
    ?s ?p ?o
  }
}
""")
df

|   | s | p | o |
|---|---|---|---|
| 0 | :Jane | :likes | :Mary |
| 1 | :Mary | :likes | :Jane |
| 2 | :Bill | :likes | :Jane |
| 3 | :Bill | :likes | :Mary |


## Updating the Dataset

...

### Insert Data into a Named Graph

In [186]:
ds.update("""
INSERT DATA { 
  GRAPH :MarysGraph {
    :Mary :likes :Bill.
  }
}
""")

print(ds.serialize(format="trig"))

@prefix : <http://example.org/> .

:MarysGraph {
    :Bill :likes :Jane .

    :Mary :likes :Bill,
            :Jane .

    :Jane :likes :Mary .
}

:BillsGraph {
    :Bill :likes :Jane,
            :Mary .

    :Jane :likes :Bill,
            :Mary .
}

:main {
    :Bill a :Person ;
        :owns :BillsGraph .

    :Mary a :Person ;
        :owns :MarysGraph .

    :Jane a :Person ;
        :owns :JanesGraph .
}

:JanesGraph {
    :Bill :likes :Jane,
            :Mary .

    :Jane :likes :Mary .
}




### Insert Data into a **new** Named Graph

In [187]:
ds.update("""
INSERT DATA { 
  GRAPH :NewGraph {
    :Mary :likes :Bill.
  }
}
""")

print(ds.serialize(format="trig"))

@prefix : <http://example.org/> .

:MarysGraph {
    :Bill :likes :Jane .

    :Mary :likes :Bill,
            :Jane .

    :Jane :likes :Mary .
}

:BillsGraph {
    :Bill :likes :Jane,
            :Mary .

    :Jane :likes :Bill,
            :Mary .
}

:main {
    :Bill a :Person ;
        :owns :BillsGraph .

    :Mary a :Person ;
        :owns :MarysGraph .

    :Jane a :Person ;
        :owns :JanesGraph .
}

:JanesGraph {
    :Bill :likes :Jane,
            :Mary .

    :Jane :likes :Mary .
}

:NewGraph {
    :Mary :likes :Bill .
}




### Cut and Paste

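We first empty `:NewGraph` with DELETE WHERE. We then cut all `:Bill :likes ?o` triples from whichever named graphs contain them and paste them into a new graph `:BGraph`: the WHERE clause finds the triples together with their graph `?g`, the DELETE template removes them from `?g`, and the INSERT template adds them to `:BGraph`.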

In [188]:
ds.update("""
DELETE WHERE { GRAPH :NewGraph { ?s ?p ?o. } }
""")

ds.update("""
DELETE {  
  GRAPH ?g 
    { :Bill :likes ?o. } } 
INSERT { 
  GRAPH :BGraph 
    { :Bill :likes ?o. } }
WHERE {
  GRAPH ?g 
    { :Bill :likes ?o. } }
""")

print(ds.serialize(format="trig"))

@prefix : <http://example.org/> .

:MarysGraph {
    :Jane :likes :Mary .

    :Mary :likes :Bill,
            :Jane .
}

:BillsGraph {
    :Jane :likes :Bill,
            :Mary .
}

:main {
    :Bill a :Person ;
        :owns :BillsGraph .

    :Jane a :Person ;
        :owns :JanesGraph .

    :Mary a :Person ;
        :owns :MarysGraph .
}

:JanesGraph {
    :Jane :likes :Mary .
}

:BGraph {
    :Bill :likes :Jane,
            :Mary .
}


