# SPARQL: The RDF Query Language

In this exercise we look at SPARQL, the query language for RDF, in practice. We use a small RDF dataset describing the Pink Floyd discography and run several queries against it. At the end, you will write your own RDF and a few SPARQL queries for it.

In [None]:
!pip install rdflib

In [1]:
import pandas as pd
from io import BytesIO, StringIO
from rdflib import Graph
from rdflib.plugins.sparql.results.csvresults import CSVResultSerializer
from IPython.display import display

rdf = """
@prefix ex: <http://example.org#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

[] a ex:Album ;
   ex:title "The Dark Side of the Moon"^^xsd:string ;
   ex:label "Harvest, EMI"@en ;
   ex:released [ 
     ex:day "16"^^xsd:int ;
     ex:month "03"^^xsd:int ;
     ex:year "1973"^^xsd:int 
   ] .
   
[] a ex:Album ;
   ex:title "The Wall" ;
   ex:label "Harvest, EMI" ;
   ex:released [ 
     ex:day 30 ;
     ex:month "11"^^xsd:string ;
     ex:year "1979"^^xsd:int 
   ] .

[] a ex:Single ;
   ex:title "What God Wants, Part 1"^^xsd:string ;
   ex:author [
     ex:firstname "Roger" ;
     ex:lastname "Waters"
   ] ;
   ex:released [ 
     ex:year "1992"^^xsd:int 
   ] .
"""

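# Parse the Turtle document into an in-memory graph.
g = Graph()

r = g.parse(data=rdf, format='turtle')

def query(q):
    """Run the SPARQL query `q` against the global graph `g` and display the result as a DataFrame."""
    # Serialize the result as CSV into a byte buffer, then let pandas read it back.
    serializer = CSVResultSerializer(g.query(q))
    output = BytesIO()
    serializer.serialize(output)
    display(pd.read_csv(StringIO(output.getvalue().decode())))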

Now run the following queries and answer the questions.

In [2]:
query("""
SELECT ?a ?b WHERE { 
  ?a ex:title ?b
}
""")

Unnamed: 0,a,b
0,ub1bL14C1,The Wall
1,ub1bL23C1,"What God Wants, Part 1"
2,ub1bL5C1,The Dark Side of the Moon


In [3]:
query("""
SELECT ?work ?title WHERE { 
  ?work ex:title ?title
}
""")

Unnamed: 0,work,title
0,ub1bL14C1,The Wall
1,ub1bL23C1,"What God Wants, Part 1"
2,ub1bL5C1,The Dark Side of the Moon


In [5]:
query("""
SELECT ?title WHERE { 
  [] ex:title ?title
}
""")

# What is the difference from the previous example? Answer: This example uses a blank node in the subject position, so the subject is matched but not bound to a variable and only ?title is returned

Unnamed: 0,title
0,The Wall
1,"What God Wants, Part 1"
2,The Dark Side of the Moon


In [7]:
query("""
SELECT ?title WHERE { 
 ?work rdf:type ex:Album .
 ?work ex:title ?title
}
""")

# Why only two results? Answer: Because the query asks explicitly for albums, and there are only two of them (the third work is an ex:Single)

Unnamed: 0,title
0,The Dark Side of the Moon
1,The Wall


In [9]:
query("""
SELECT ?s ?p ?o WHERE { 
  ?s ?p ?o
}
""")

# What does this return? Answer: All triples in the data

Unnamed: 0,s,p,o
0,ub1bL5C1,http://example.org#released,ub1bL8C16
1,ub1bL17C16,http://example.org#year,1979
2,ub1bL14C1,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://example.org#Album
3,ub1bL17C16,http://example.org#day,30
4,ub1bL23C1,http://example.org#title,"What God Wants, Part 1"
5,ub1bL5C1,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://example.org#Album
6,ub1bL25C14,http://example.org#lastname,Waters
7,ub1bL5C1,http://example.org#label,"Harvest, EMI"
8,ub1bL14C1,http://example.org#label,"Harvest, EMI"
9,ub1bL23C1,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://example.org#Single


In [10]:
query("""
SELECT ?title ?year WHERE { 
  [] rdf:type ex:Album ; 
     ex:title ?title ;
     ex:released [ ex:year ?year ]
  FILTER (?year > 1973)
}
""")

Unnamed: 0,title,year
0,The Wall,1979


In [11]:
query("""
SELECT ?title ?year WHERE { 
  {
    [] rdf:type ex:Album ; 
       ex:title ?title ;
       ex:released [ ex:year ?year ]
    FILTER (?year > 1973)
  }
  UNION
  {
    [] rdf:type ex:Single ; 
       ex:title ?title ;
       ex:released [ ex:year ?year ]
    FILTER (?year <= 2000 )
  }
}
""")

Unnamed: 0,title,year
0,The Wall,1979
1,"What God Wants, Part 1",1992


In [13]:
query("""
SELECT ?title ?label WHERE { 
    ?work ex:title ?title .
    OPTIONAL { ?work ex:label ?label }
}
""")

# Why is the label of "What God Wants, Part 1" NaN? Answer: The single has no label, but because the label pattern is OPTIONAL the single still appears in the result set, with ?label left unbound

Unnamed: 0,title,label
0,The Wall,"Harvest, EMI"
1,"What God Wants, Part 1",
2,The Dark Side of the Moon,"Harvest, EMI"


In [14]:
query("""
SELECT ?title WHERE { 
  [] rdf:type ex:Album ;
     ex:title ?title ;
     ex:label ?label
  FILTER (LANG(?label) = "en")
}
""")

Unnamed: 0,title
0,The Dark Side of the Moon


In [15]:
query("""
SELECT ?title WHERE { 
  [] rdf:type ex:Album ;
     ex:title ?title ;
     ex:released [ ex:day ?day ]
  FILTER (?day > 15)
}
""")

Unnamed: 0,title
0,The Dark Side of the Moon
1,The Wall


In [17]:
query("""
SELECT ?title WHERE { 
  [] rdf:type ex:Album ;
     ex:title ?title ;
     ex:released [ ex:month ?month ]
  FILTER (DATATYPE(?month) = xsd:string)
}
""")

# Why do we get only "The Wall" here? Answer: Because the release month of the other album has a different datatype (xsd:int instead of xsd:string)

Unnamed: 0,title
0,The Wall


Now write your own RDF and evaluate a few SPARQL queries on it.

In [18]:
rdf = """
<http://example.org#Wako5UJ_7t> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.org#Adress> .
<http://example.org#Wako5UJ_7t> <http://example.org#hasStreetName> "Tintentrift" .
<http://example.org#Wako5UJ_7t> <http://example.org#hasStreetNumber> "33" .
<http://example.org#Wako5UJ_7t> <http://example.org#hasZip> "30176" .
<http://example.org#Wako5UJ_7t> <http://example.org#hasCity> "Hannover" .
"""

g = Graph()

r = g.parse(data=rdf, format='nt')

In [22]:
query("""
PREFIX ex: <http://example.org#> 
SELECT *
WHERE {
  [] ex:hasStreetName ?name ;
     ex:hasStreetNumber ?number ;
     ex:hasCity "Hannover" 
}
""")

Unnamed: 0,number,name
0,33,Tintentrift
