# UE02 - RDF and RDF Schema

Before you start with this notebook, complete the eight tasks in the `1. RDF` sheet of `SemAI.jar`. You will then make use of your solutions in this notebook. 

## Task 0: Preparation

Preparation (Installs and Imports). 




In [None]:
# Install required packages in the current Jupyter kernel
!pip install -q rdflib 
!pip install -q pydot
!pip install -q owlrl

!pip install networkx pyvis

import rdflib
from rdflib import Graph, Literal, RDF, URIRef, BNode, Namespace, Dataset
import networkx as nx
from pyvis.network import Network
import requests
from IPython.display import display, HTML, Image
import os
import pydot
import owlrl
from rdflib.namespace import FOAF , XSD , RDFS 



In [None]:
# A function to produce a graphical visualization of an RDF graph
def visualize_graph(g,base=None):

  def node_id(graph,term):
    if isinstance(term,Literal): return term.n3(graph.namespace_manager)
    else: return f"\"{term.n3(graph.namespace_manager)}\""

  def add_node(dg,g,t,base):
    if isinstance(t,URIRef):
      lbl = f"\"{t.n3(g.namespace_manager)}\""
      if(base): lbl = lbl.replace(base,"")
      if(len(lbl)>25): lbl = lbl[:12] + "..." +  lbl[-12:] 
      dg.add_node(pydot.Node( node_id(g,t), label=lbl ))
    if isinstance(t,Literal):
      dg.add_node(pydot.Node( node_id(g,t), label=t.n3(g.namespace_manager), shape="box"))
    if isinstance(t,BNode):
      dg.add_node(pydot.Node( node_id(g,t), label=""))    

  def add_edge(dg,g,s,p,o):
    dg.add_edge(pydot.Edge(node_id(g,s), node_id(g,o), label=f"\"{p.n3(g.namespace_manager)}\""))

  dg = pydot.Dot('my_graph', graph_type='digraph',layout='sfdp', splines='curved' )

  for subj in g.subjects(None,None): add_node(dg,g,subj,base)
  for obj in g.objects(None,None): add_node(dg,g,obj,base)
  for (s,p,o) in g: add_edge(dg,g,s,p,o)   

  display(Image(dg.create_png()))

In [None]:
rdf_str = """BASE   <http://example.org/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX schema: <http://schema.org/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX wd: <http://www.wikidata.org/entity/>
 
<bob#me>
   a foaf:Person ;
   foaf:knows [a foaf:Person];
   foaf:knows <alice#me> ;
   schema:birthDate "1990-07-04"^^xsd:date ;
   foaf:topic_interest wd:Q12418 .
   
wd:Q12418
  dcterms:title "Mona Lisa" ;
  dcterms:creator <http://dbpedia.org/resource/Leonardo_da_Vinci> .

<http://data.europeana.eu/item/04802/243FA8618938F4117025F17A8B813C5F9AA4D619>
  dcterms:subject wd:Q12418 .
"""

g = Graph()
g.parse(format="turtle",data=rdf_str)
visualize_graph(g,base="http://example.org/")

## Task 1:  Improve interactive RDF graph visualization (1 pt)

Improve the function `visualize_graph_pyvis` (from `V01_rdf.ipynb`) as follows:
- Add an optional `base` parameter.
- Abbreviate the labels of nodes and edges in the same way as in `visualize_graph`.
- Make sure that blank node IDs are not shown in the visualization.

Optional features: 
- use different graphical forms for literals and URIs
- (add further improvements as you like)

Test the function with `rdf_str` and with your solution to task `0. Intro` in the `1. RDF` sheet in `SemAI.jar`.
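
The labelling rule used by `visualize_graph` (strip the base URI, then shorten labels longer than 25 characters to their first and last 12 characters) can be isolated in a small helper. This is only a sketch: the name `abbreviate` and its `max_len` and `keep` parameters are assumptions for illustration and are not part of `V01_rdf.ipynb`.

```python
def abbreviate(label, base=None, max_len=25, keep=12):
    """Hypothetical helper: strip an optional base URI and shorten long labels."""
    if base:
        # remove the base URI wherever it occurs in the label
        label = label.replace(base, "")
    if len(label) > max_len:
        # keep the first and last `keep` characters, as visualize_graph does
        label = label[:keep] + "..." + label[-keep:]
    return label


short = abbreviate("<http://example.org/bob#me>", base="http://example.org/")
print(short)  # <bob#me>

long_label = abbreviate("<http://dbpedia.org/resource/Leonardo_da_Vinci>")
print(len(long_label))  # 27 (12 + 3 + 12 characters)
```

Keeping the rule in one place would let node and edge labels share exactly the same behaviour.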

In [None]:
def visualize_graph_pyvis(g, base=None):

    def short_label(term):
        # Blank nodes get a single space as label so that their internal ID is not shown
        # (a space rather than an empty string, since pyvis appears to fall back
        # to the node ID when the label is empty)
        if isinstance(term, BNode): return " "
        lbl = term.n3(g.namespace_manager)
        # replace base
        if base: lbl = lbl.replace(base, "")
        # abbreviate long labels in the same way as visualize_graph
        if len(lbl) > 25: lbl = lbl[:12] + "..." + lbl[-12:]
        return lbl

    # Create the NetworkX graph
    nx_graph = nx.DiGraph()

    for s, p, o in g:
        # Use the full N3 form as node ID so that two different terms
        # with the same abbreviated label are not merged into one node
        for term in (s, o):
            nx_graph.add_node(term.n3(g.namespace_manager), label=short_label(term))
        nx_graph.add_edge(s.n3(g.namespace_manager), o.n3(g.namespace_manager), label=short_label(p))



    # Create a PyVis network graph
    pyvis_graph = Network(notebook=True, cdn_resources='in_line',bgcolor="#EEEEEE")
    ###pyvis_graph.barnes_hut()
    ###pyvis_graph.show_buttons(filter_=['physics'])

    pyvis_graph.from_nx(nx_graph)

    # Customize the node appearance
    for node in pyvis_graph.nodes:
        node["shape"] = "dot"
        node["size"] = 10
        node["font"] = {"size": 10}

    # Customize the edge appearance
    for edge in pyvis_graph.edges:
        edge["width"] = 0.5
        edge["font"] = {"size": 8, "align": "middle"}
        edge["arrows"] = "to"

    # Define the HTML file name
    html_file = 'graph.html'    
    
    # Show the graph in the notebook
    pyvis_graph.show(html_file)

    # Check if the file exists
    if os.path.isfile(html_file):
        # Read the content of the HTML file
        with open(html_file, 'r') as file:
            html_content = file.read()
        # Display the HTML content in the notebook
        display(HTML(html_content))
    else:
        print(f"File not found: {html_file}")

In [None]:
visualize_graph_pyvis(g,base="http://data.europeana.eu/item/04802/")

## Task 2:  Print RDF graph as HTML table (1 pt)

Implement a function `rdf2htmltable(g)` that 
- takes as parameter an rdflib.Graph 
- generates and displays an HTML table representing that graph with
  - one row per RDF statement 
  - three columns (subject, predicate, object) 
  - URIs shown in abbreviated form and rendered as links (`href=<full URI>`)

Test the function with `rdf_str` and with your solution to task `0. Intro` in the `1. RDF` sheet in `SemAI.jar`.

In [None]:
def rdf2htmltable(g:Graph):
  from html import escape

  def cell(term):
    # only URIs become links; literals and blank nodes are shown as plain text
    if isinstance(term, URIRef):
      # abbreviated form (prefixed name if a prefix is bound), without angle brackets
      label = term.n3(g.namespace_manager).strip('<>')
      return '<a href="{0}">{1}</a>'.format(escape(str(term)), escape(label))
    return escape(term.n3(g.namespace_manager))

  table_rows = []
  for s, p, o in g:
    row = '<tr><td>{0}</td><td>{1}</td><td>{2}</td></tr>'.format(cell(s), cell(p), cell(o))
    table_rows.append(row)
  table_header =  '<tr><th>subject</th><th>predicate</th><th>object</th></tr>'
  table_html = '<table>{0}{1}</table>'.format(table_header, ''.join(table_rows))
  display(HTML(table_html))


In [None]:
g = Graph()
g.parse(format="turtle",data=rdf_str)
rdf2htmltable(g)

## Task 3: A function for parsing and displaying an RDF graph (1 pt)

Implement a function `parse_display_rdf(str)` that takes as parameter a string which represents an RDF graph in Turtle notation and 
- produces an rdflib.Graph from that string
- prints the graph in Turtle notation
- prints the graph in RDF/XML
- visualizes it using `visualize_graph` (to be taken from `V01_rdf.ipynb`)
- visualizes it using (your improved version of) `visualize_graph_pyvis`
- outputs it using `rdf2htmltable` (only if you have implemented this function)

Test the function with `rdf_str` and with your solution to task `0. Intro` in the `1. RDF` sheet in `SemAI.jar`.

In [None]:
def parse_display_rdf(rdf_str,base=None):
  # produce an rdflib.Graph from the Turtle string
  g = Graph()
  g.parse(format="turtle",data=rdf_str)
  
  # print the graph in Turtle notation
  print(g.serialize(format="turtle" ))

  # print the graph in RDF/XML
  print(g.serialize(format="xml"))
  
  # visualize it using visualize_graph (the optional base is passed through)
  visualize_graph(g,base=base)

  # visualize it using the improved version of visualize_graph_pyvis
  visualize_graph_pyvis(g,base=base)

  # output it as an HTML table using rdf2htmltable
  rdf2htmltable(g)

### 0. Intro

Create an RDF graph in Turtle notation. Use the RDF and FOAF vocabularies where applicable and the example namespace (prefix ex:) for the other resources.

* John is a Person.
* John knows Mary.

In [None]:
rdf_str = """@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex:   <http://www.ex.org/> .

ex:John rdf:type foaf:Person .
ex:John foaf:knows ex:Mary ."""

In [None]:
parse_display_rdf(rdf_str)

[1] John is a Person\
[1] John knows Mary

## Tasks 4-10 (1 point each)

For each of the remaining 7 tasks in the "1. RDF" sheet in `SemAI.jar` do the following: 
- add a text cell in this notebook 
  - with the description of the task from `SemAI.jar` 
  - with number and title (e.g., **1. Simple Data Graph**)  from `SemAI.jar` as header 
- add a code cell where you apply `parse_display_rdf(str)` to your solution

### 1. Simple Data Graph
Create an RDF graph in Turtle notation. Use the FOAF vocabulary to state the following.

* Mary and Jim are persons.
* Mary knows Jim.
* Mary is 27 years old.

The URIs for the two persons should be `http://www.ex.org/person#Mary` and `http://www.ex.org/person#Jim`. The age of Mary should be represented as an integer.

In [None]:
rdf_str = """@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix :     <http://www.ex.org/person#> .

:Mary rdf:type foaf:Person .
:Jim rdf:type foaf:Person .
:Mary foaf:knows :Jim .
:Mary foaf:age "27"^^xsd:integer .
"""

In [None]:
parse_display_rdf(rdf_str)

[1] Mary knows Jim\
[1] Mary is a Person\
[1] The graph contains exactly 4 statements.\
[1] Jim is a Person\
[1] Mary is 27 years old

### 2. Simple Schema

Create a vocabulary using RDFS in Turtle. Specify

* Classes Company, Employee, and Person
* Property worksFor between Employee and Company
* Property salary of Employee with Integer as data type
* Class Employee is a subclass of Person

Use XSD for data types. The URIs of classes and properties are in namespace <http://www.ex.org/vocabulary#>, for example, <http://www.ex.org/vocabulary#Company>

In [None]:
rdf_str = """@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> . 
@prefix :     <http://www.ex.org/vocabulary#> .


:Company a rdfs:Class .
:Employee a rdfs:Class.
:Person a rdfs:Class .
:Employee rdfs:subClassOf :Person .
:worksFor rdf:type rdf:Property;
  rdfs:domain :Employee;
  rdfs:range :Company.
:salary rdf:type rdf:Property;
  rdfs:domain :Employee;
  rdfs:range xsd:integer.
"""

In [None]:
parse_display_rdf(rdf_str)

[1] employee is a subclass of person\
[1] salary has employee as domain\
[1] The graph contains exactly 10 statements.\
[1] salary has integer as range\
[1] Company is a class.\
[1] worksFor has employee as domain\
[1] worksFor has company as range\
[1] Person is a class\
[1] salary is a property\
[1] Employee is a class\
[1] worksFor is a property

### 3. Reification

Create an RDF graph in Turtle notation. Use the RDF vocabulary where applicable and the example namespace (ex:) for all other resources (ex:Mary, ex:John, ex:says, ex:loves). Hint: the lecture slides contain a similar reification example.

* Mary says that John loves her.


In [None]:
rdf_str = """@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ex:   <http://www.ex.org/> .

ex:aStmt rdf:predicate ex:loves .
ex:aStmt rdf:object ex:Mary .
ex:aStmt rdf:subject ex:John .
ex:aStmt rdf:type rdf:Statement .
ex:Mary ex:says ex:aStmt .
"""

In [None]:
parse_display_rdf(rdf_str)

[1] loves is the predicate of the reified statement\
[1] Mary says something which is classified as Statement\
[1] Mary is the object of the reified statement\
[1] John is the subject of the reified statement

### 4. Blank Node

Create an RDF graph in Turtle notation. Use the RDF and FOAF vocabularies where applicable and the example namespace (ex:) for the other resources.

* John knows a person, who knows Mary.
* Use a blank node to represent that anonymous person.

In [None]:
rdf_str = """@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex:   <http://www.ex.org/> .

ex:John foaf:knows [foaf:knows ex:Mary; rdf:type foaf:Person] .
"""

In [None]:
parse_display_rdf(rdf_str)

[1] John knows a Person\
[1] John knows someone who is a Person\
[1] John knows a Person who knows Mary and that person is represented as a blank node.\
[1] John knows a Person who knows Mary.

### 5. Multiple Classification

Create an RDF graph in Turtle notation. Use the RDF vocabulary where applicable and the example namespace (ex:) for the other resources.

* John is an instance of SoccerPlayer and of Student.

Comment: why does `rdf:type` also work in place of `a` in SemAI? Are the two equivalent?

In [None]:
rdf_str = """@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ex:   <http://www.ex.org/> .

#ex:John rdf:type ex:SoccerPlayer .
ex:John a ex:SoccerPlayer .
ex:John a ex:Student.
"""

In [None]:
parse_display_rdf(rdf_str)

[1] John is an instance of SoccerPlayer\
[1] John is an instance of Student

### 6. Metamodeling
Create an RDF graph in Turtle notation. Use the RDF and RDF Schema vocabularies where applicable and the example namespace (ex:) for the other resources.

* Dog and Cat are instances of Species and subclasses of Animal.
* Lassie is an instance of Dog.

In [None]:
rdf_str = """@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://www.ex.org/> .

ex:Dog a ex:Species; rdfs:subClassOf ex:Animal .
ex:Cat a ex:Species; rdfs:subClassOf ex:Animal .
ex:Lassie a ex:Dog .
"""

In [None]:
parse_display_rdf(rdf_str)

[1] Dog is an instance of Species\
[1] Dog is a subclass of Animal\
[1] Lassie is an instance of Dog\
[1] Cat is an instance of Species.\
[1] Cat is a subclass of Animal.

### 7. Properties

Create an RDF graph in Turtle notation. Use the RDF and RDF Schema vocabularies where applicable and the example namespace (ex:) for the other resources, e.g., ex:childOf, ex:descendantOf.

* Everyone who is a child of someone, is also a descendant of that someone.

In [None]:
rdf_str = """@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://www.ex.org/> .

ex:childOf rdfs:subPropertyOf ex:descendantOf.
"""

In [None]:
parse_display_rdf(rdf_str)

[1] every child-of relationship is also a descendant-of relationship