Course Instructor: Bernd Neumayr, JKU

# UE04 - SPARQL Update and RDF Datasets

Complete the **8 tasks (1 point per task)** in the `3. SPARQL Update` sheet of `SemAI.jar` first and then transfer them to this notebook.

For each task include:
- A headline including the task number
- The task description 
- Your solution in executable form: your solutions for SemAI.jar will make use of the default graph. In this notebook you have to transform your solutions according to the workaround exemplified in `V04_SPARQL_Update.ipynb`.
- After executing the update request, print a serialization of the dataset in TriG format.

**Task 9 (2 points)**  is to develop a nice visualization of RDF datasets using `visualize_graph_pyvis` from UE02 as a starting point. The requirements are as follows:
- Each named graph must be represented as an independent graph. This means, for example, that :Jane in :JanesGraph is a different node than :Jane in :BillsGraph. There are no edges between nodes in different graphs.
- It is not strictly necessary to draw a box around each named graph, as seen in the slides. The different named graphs should simply be visually distinguishable and not overlap.
- If not all nodes within a named graph are connected, make sure in the visualization that the named graph still forms a coherent visual unit in some way.

## Preparations

In [None]:
# Install required packages
#!pip install -q rdflib     # comment to avoid re-install with every re-run
#!pip install networkx pyvis matplotlib


### Imports and Functions 

We are re-using the sparql_select function. 

In [None]:
# Imports
import pandas as pd
import rdflib
from rdflib import Graph, Literal, RDF, URIRef, BNode, Namespace
import networkx as nx
from pyvis.network import Network
import os
from IPython.display import display, HTML, Image
import matplotlib.pyplot as plt


# Convenient Functions
def sparql_select(graph,query,use_prefixes=True):
  results = graph.query(query)          # execute the query against the graph, resulting in a rdflib.plugins.sparql.processor.SPARQLResult
  rows = []                             # a list of dictionaries, as intermediate format to construct the pandas DataFrame
  for result in results:                # iterate over the result set of the query, a result is an instance of rdflib.query.ResultRow
    row = {}                            #     create a dictionary to hold a single row of the result
    for var in results.vars:            #     iterate over the variables of the SPARQLResult to add a dictionary entry for each variable
      if (isinstance(result[var],URIRef) and use_prefixes):
        row[var] = result[var].n3(graph.namespace_manager)   # use namespace prefixes to shorten URIs
      else:
        row[var] = result[var]                  
    rows.append(row)                    #     add the dictionary (row) to the list 
  return pd.DataFrame(rows,columns=results.vars)        
                                        # return a pandas DataFrame constructed from the list of dictionaries, with the variables from the result set as columns      


# Create empty dataset

In [None]:
g = rdflib.Dataset()

g.parse(format="turtle",data="""
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix : <http://example.org/> .
@prefix xrdf: <urn:x-rdflib:> .
""")

# Task 1
You start with an empty dataset. Insert statements into the default graph stating that :Peter is the author of :G1 and :Mary is the author of :G2.

In [None]:
update_str = """
insert data { GRAPH xrdf:default {
    :G1 :author :Peter .
    :G2 :author :Mary .
  }
}
"""
g.update(update_str)

print(g.serialize(format="trig"))

# Task 2
Write { :Mary :knows :Peter, :John, :Mary. } into the named graph :G1 and { :Peter :knows :Mary. :John :knows :Mary. } into the named graph :G2.



In [None]:
update_str = """
insert data {
  GRAPH :G1 {
    :Mary :knows :Mary, :John, :Peter.
  }
  GRAPH :G2 {
    :John :knows :Mary .
    :Peter :knows :Mary .
  }
}
"""
g.update(update_str)

print(g.serialize(format="trig"))

# Task 3
Using INSERT-WHERE, query the :knows relationships in :G2 and insert their inverse :knownBy relationships into the default graph.
Your update request must not contain: [Mary, Peter, John]

In [None]:
update_str = """
insert { GRAPH xrdf:default { ?o :knownBy ?s } }
where {  GRAPH :G2 {?s :knows ?o}

}
"""
g.update(update_str)

print(g.serialize(format="trig"))

# Task 4
Using DELETE-WHERE, delete all :knownBy relationships from the default graph.
Your update request must not contain: [Mary, Peter, John]

In [None]:
update_str = """
delete { GRAPH xrdf:default { ?s :knownBy ?o } }
where { GRAPH xrdf:default { ?s :knownBy ?o } }

"""
g.update(update_str)

print(g.serialize(format="trig"))

# Task 5
Using INSERT-WHERE, determine for each named graph its number of statements with the property :knows and write these counts into the default graph.
Your update request must not contain: [G1, G2]

In [None]:
update_str = """
insert { GRAPH xrdf:default { ?g :knowsCount ?gnr } }
#select *
where {
 select ?g (count(?g) as ?gnr) {
  GRAPH ?g {
    ?s1 :knows ?o1.
  }
  
 } group by ?g
}
"""
g.update(update_str)

print(g.serialize(format="trig"))

# Task 6
Using INSERT-WHERE, determine the number of named graphs and write it into the default graph.
Your update request must not contain: [2]

In [None]:
update_str = """

insert { GRAPH xrdf:default { :ds :graphCount ?gcount } }
where {
  select (count(distinct ?g) as ?gcount)
  where {
    GRAPH ?g { ?s ?p ?o }
  }
}
"""

g.update(update_str)

print(g.serialize(format="trig"))


# Task 7
Using DELETE-INSERT-WHERE, move all metadata about named graphs (i.e., statements that have a named graph as their subject) into the corresponding named graph.
Your update request must not contain: [G1, G2]

In [None]:
update_str = """
DELETE { GRAPH xrdf:default { ?g ?p ?o } }
Insert { GRAPH ?g {
  ?g ?p ?o }
}
where {
  GRAPH ?g { ?x ?y ?z }
  GRAPH xrdf:default { ?g ?p ?o }
}
"""
g.update(update_str)

print(g.serialize(format="trig"))

# Task 8
Write into each named graph a statement that the author of the respective named graph knows :Susi, and update the knowsCount with the same update request.
Your update request must not contain: [Peter, Mary]

In [None]:
update_str = """
DELETE { GRAPH ?g { ?g :knowsCount ?c }}
INSERT { GRAPH ?g {?a :knows :Susi. }}
where { 
  Graph ?g { ?g :author ?a.
     ?g :knowsCount ?c .
     }
};


INSERT { GRAPH ?g { ?g :knowsCount ?kc } }
WHERE {
  SELECT ?g (count(?s) as ?kc) 
  WHERE { GRAPH ?g { ?s :knows ?o } 

  }   group by ?g
 }
;
"""
g.update(update_str)

print(g.serialize(format="trig"))

# Task 9
Develop a nice visualization of RDF datasets using `visualize_graph_pyvis` from UE02 as a starting point. The requirements are as follows:

* [x]  Each named graph must be represented as an independent graph. This means, for example, that :Jane in :JanesGraph is a different node than :Jane in :BillsGraph. There are no edges between nodes in different graphs.
* [x]  It is not strictly necessary to draw a box around each named graph, as seen in the slides. The different named graphs should simply be visually distinguishable and not overlap.
* [x]  If not all nodes within a named graph are connected, make sure in the visualization that the named graph still forms a coherent visual unit in some way.
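
The first requirement can be met by making the graph name part of every node id. The following is a minimal sketch in plain Python, independent of networkx and pyvis; the helper `per_graph_elements`, the separator, and the sample quads are hypothetical and only illustrate the idea.

```python
def per_graph_elements(quads, sep=" | "):
    """Build nodes and edges in which every node id is scoped to its graph."""
    nodes, edges = {}, []
    for graph, s, p, o in quads:
        for term in (s, o):
            # The graph name is part of the id, so equal terms in
            # different graphs become different nodes.
            nodes[f"{graph}{sep}{term}"] = {"label": term, "group": graph}
        edges.append({
            "from": f"{graph}{sep}{s}",
            "to": f"{graph}{sep}{o}",
            "label": p,
        })
    return nodes, edges


# Hypothetical sample data using the names from the task description
quads = [
    (":JanesGraph", ":Jane", ":knows", ":Bill"),
    (":BillsGraph", ":Bill", ":knows", ":Jane"),
]
nodes, edges = per_graph_elements(quads)
print(sorted(nodes))
```

Because both endpoints of an edge are built from the same graph name, no edge can connect nodes of different graphs. The `group` value can then be used to color or position each named graph separately.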


In [None]:
def visualize_graph_pyvis(g, base=None):
    # Create the NetworkX graph
    nx_graph = nx.MultiDiGraph()
    legend_nodes = []
    legend_nodes_count = 0
    step = 50
    x = -400
    y = -400
    for ng in g.graphs():
      #nx_graph = nx.DiGraph()
      legend_nodes_count += 1
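      ng_title = ng.identifier.n3(g.namespace_manager)   # prefixed graph name, e.g. :G1
      if not isinstance(ng.identifier, BNode):           # label only graphs named by an IRI
        # fixed legend box for the graph, positioned with numeric coordinates
        nx_graph.add_node(ng_title, group=ng_title, label=ng_title, size=30,
            physics=False,
            x=x,
            y=y + legend_nodes_count*step,
            shape='box',
            widthConstraint=200
        )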

      for s, p, o in ng:

        s_shape = "dot"
        if (isinstance(s, Literal)):
            s_shape = "square"
          
        if (isinstance(s, BNode)):
            s = ""

        if (isinstance(s, URIRef)):
            s = s.n3(g.namespace_manager)


        o_shape = "dot"
        if (isinstance(o, Literal)):
            o_shape = "square"
          
        if (isinstance(o, BNode)):
            o = ""
            
        if (isinstance(o, URIRef)):
            o = o.n3(g.namespace_manager)


        p = p.n3(g.namespace_manager)


        nx_graph.add_node(ng_title+s, title=s, label=s, shape=s_shape, group=ng_title)
        nx_graph.add_node(ng_title+o, title=o, label=o, shape=o_shape, group=ng_title)
        nx_graph.add_edge(ng_title+s, ng_title+o, label=p, group=ng_title)
        


    # Create a PyVis network graph
    pyvis_graph = Network(notebook=True, cdn_resources='in_line',bgcolor="#EEEEEE", directed=True )
    ###pyvis_graph.barnes_hut()
    ###pyvis_graph.show_buttons(filter_=['physics'])

    pyvis_graph.from_nx(nx_graph)


    # Customize the node appearance
    for node in pyvis_graph.nodes:
        # node["shape"] = "dot"
        node["size"] = 10
        node["font"] = {"size": 10}

    # Customize the edge appearance
    for edge in pyvis_graph.edges:
        edge["width"] = 1
        edge["font"] = {"size": 8, "align": "middle"}
        edge["arrows"] = "to"

    # Define the HTML file name
    html_file = 'graph.html'    
    
    # Show the graph in the notebook
    pyvis_graph.prep_notebook()
    pyvis_graph.show(html_file)

    # Check if the file exists
    if os.path.isfile(html_file):
        # Read the content of the HTML file
        with open(html_file, 'r') as file:
            html_content = file.read()
        # Display the HTML content in the notebook
        display(HTML(html_content))
    else:
        print(f"File not found: {html_file}")

visualize_graph_pyvis(g)