# Introduction to SPARQL Queries

In this notebook, we query RDF graphs with SPARQL using the rdflib library in Python. The aim is to provide a practical introduction to working with RDF and SPARQL from Python.

In [16]:
# Install required packages.

!pip install -q rdflib pandas

## Import Libraries and Define Convenient Functions

We use the `query` method of `rdflib.Graph` and represent SPARQL result sets as `pandas.DataFrame` objects.
- `sparql_select` returns the result set of a SELECT query as a `pandas.DataFrame`, which makes the results easy to inspect and display as a table.
- `sparql_construct` returns the result graph of a CONSTRUCT or DESCRIBE query as an `rdflib.Graph` object that uses the same namespace prefixes as the original graph.
- `sparql_ask` returns the result of an ASK query as a Python `bool`.



In [17]:
import pandas as pd
from rdflib import Graph, Literal, RDF, URIRef, BNode, Namespace
from rdflib.namespace import FOAF, XSD, RDFS, NamespaceManager

def sparql_select(graph, query, use_prefixes=True):
    # Execute the query against the graph; the result is an rdflib.plugins.sparql.processor.SPARQLResult
    results = graph.query(query)
    # Build one dictionary per solution as an intermediate format for the DataFrame;
    # if use_prefixes is set, abbreviate URIs with the graph's namespace prefixes
    rows = [
        {var: res[var].n3(graph.namespace_manager) if (use_prefixes and isinstance(res[var], URIRef)) else res[var]
         for var in results.vars}
        for res in results
    ]
    # Return a DataFrame with one column per variable of the result set
    return pd.DataFrame(rows, columns=results.vars)

def sparql_construct(graph, query):
    # Create a Graph that reuses the namespace prefixes of the queried graph
    result_graph = Graph(namespace_manager=graph.namespace_manager)
    # Execute the CONSTRUCT/DESCRIBE query and add the resulting triples to the new graph
    result_graph += graph.query(query)
    return result_graph

def sparql_ask(graph, query):
    # An ASK query has a boolean result, which is returned as such
    return bool(graph.query(query))


## Create a Graph
First, let's create an `rdflib.Graph` object by parsing the example RDF graph used in the lecture slides, which can also be queried in the SPARQL playground of SemAI.jar. This example graph is available in the SemAI GitHub repository at https://github.com/jku-win-dke/SemAI/blob/main/data/social.ttl.

In [18]:
# Create a Graph object by parsing social.ttl from the GitHub repo (the example from the slides)
g = Graph().parse("https://raw.githubusercontent.com/jku-win-dke/SemAI/main/data/social.ttl", format="turtle")
g.bind("", Namespace("http://example.org/"))  # bind the empty prefix ':' (note the trailing slash)
print(g.serialize(format='turtle'))

@prefix : <http://example.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

:bob a :Person ;
    :age 26 ;
    :loves :jane .

:jane a :Person ;
    :age 22 ;
    :friend :bill,
        :bob,
        :mary ;
    :gender "female"@en ;
    :loves :bill .

:mary a :Person ;
    :age 22 ;
    :friend :bob ;
    :gender "female" ;
    :loves :bill .

:bill a :Person ;
    :friend :jane,
        :mary ;
    :gender "male" .




## SPARQL SELECT queries
With rdflib, it's not necessary to declare namespace prefixes in SPARQL queries: unless told otherwise, `Graph.query` uses the namespace prefixes already bound to the graph being queried. In the query below, `:loves` therefore resolves to `http://example.org/loves`.

In [19]:
q = """
SELECT ?x ?y
WHERE { ?x :loves ?y . }
"""

for r in g.query(q):
    print(r["x"], r["y"])

http://example.org/jane http://example.org/bill
http://example.org/mary http://example.org/bill
http://example.org/bob http://example.org/jane


## SPARQL SELECT queries to pandas.DataFrame

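A SPARQL SELECT query yields a result set, similar to the result of an SQL query. To inspect and print this result set easily, we convert it into a pandas DataFrame with our function `sparql_select`. In Jupyter, a DataFrame returned as the last expression of a cell is rendered as an HTML table. The query below computes, for each person, the average age of their friends and keeps only those whose average exceeds 23; `use_prefixes=False` shows full URIs instead of prefixed names.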


In [20]:
df = sparql_select(g,"""
SELECT  ?p (avg(?age) AS ?avgAge)
WHERE
  { ?p rdf:type :Person .
    ?p :friend ?f .
    ?f :age ?age
  }
GROUP BY ?p
HAVING ( avg(?age) > 23 )
""",use_prefixes=False)
df

|   | p | avgAge |
|---|---|--------|
| 0 | http://example.org/jane | 24 |
| 1 | http://example.org/mary | 26 |


## SPARQL CONSTRUCT queries

A SPARQL CONSTRUCT query applied to an RDF graph produces another RDF graph as its output. By using our `sparql_construct` function, we ensure that the resulting graph retains the namespace prefixes from the original graph.

In [21]:
g2 = sparql_construct(g,"""
  CONSTRUCT { ?y :lovedBy ?x . }
  WHERE { ?x :loves ?y . }
""")

print(g2.serialize(format='turtle'))

@prefix : <http://example.org/> .

:bill :lovedBy :jane,
        :mary .

:jane :lovedBy :bob .




## SPARQL ASK queries

An ASK query checks whether the query pattern has at least one match and returns a boolean.

In [22]:
print(sparql_ask(g, "ASK { :jane :loves :bill }"))

if sparql_ask(g, "ASK { :jane :loves :mary }"):
    print("Jane loves Mary.")
else:
    print("Jane doesn't love Mary.")

True
Jane doesn't love Mary.


## SPARQL DESCRIBE queries

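SPARQL DESCRIBE queries also return an RDF graph, so they can be handled with `sparql_construct` just like CONSTRUCT queries. The query below describes every resource that some person loves. The SPARQL specification leaves the exact content of a description to the query engine; in the output below, each described resource comes with all triples in which it is the subject.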

In [23]:
g2 = sparql_construct(g,"""
  DESCRIBE ?b
  WHERE { ?a  a :Person;
              :loves ?b . }
""")

print(g2.serialize(format='turtle'))

@prefix : <http://example.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

:jane a :Person ;
    :age 22 ;
    :friend :bill,
        :bob,
        :mary ;
    :gender "female"@en ;
    :loves :bill .

:bill a :Person ;
    :friend :jane,
        :mary ;
    :gender "male" .




# Load and query the solar system graph

The tasks in the SPARQL sheet of SemAI.jar involve querying an RDF graph that represents our solar system. This graph can be found in the SemAI GitHub repository at the following URL: https://github.com/jku-win-dke/SemAI/blob/main/data/solarsystem.ttl.

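Let's load that graph and list every planet together with its apoapsis and the apoapsis' unit of measurement. Thanks to `OPTIONAL`, a planet without a recorded apoapsis would still be listed, and a planet with apoapsis values in several units, such as Earth, appears once per value. The prefixes `dbo:` and `v:` are bound in the Turtle file itself, so the query does not declare them.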

In [24]:
g_solar = Graph()
g_solar.parse("https://raw.githubusercontent.com/jku-win-dke/SemAI/main/data/solarsystem.ttl",format="turtle")

df = sparql_select(g_solar,"""
  SELECT ?planet ?apoapsis ?apoapsis_uom
  WHERE { 
    ?planet rdf:type dbo:Planet . 
    OPTIONAL { ?planet v:apoapsis [rdf:value ?apoapsis ; v:uom ?apoapsis_uom ].  }
  }
""")
df

|   | planet | apoapsis | apoapsis_uom |
|---|--------|----------|--------------|
| 0 | :Mercury | 0.467 | unit:AU |
| 1 | :Venus | 0.728 | unit:AU |
| 2 | :Earth | 1.017 | unit:AU |
| 3 | :Earth | 149597871.0 | unit:KM |
| 4 | :Mars | 1.666 | unit:AU |
| 5 | :Jupiter | 5.4588 | unit:AU |
| 6 | :Saturn | 9.0412 | unit:AU |
| 7 | :Uranus | 20.11 | unit:AU |
| 8 | :Neptune | 30.33 | unit:AU |


### Making Use of `BASE`
A base IRI lets you write relative IRIs such as `<solarsystem/Earth>`, which are resolved against it; this is convenient in both Turtle and SPARQL. However, support for base IRIs in rdflib (and in some other RDF libraries) is not always reliable. Rather than passing the `base` parameter to the `Graph` class, it seems more reliable to declare the base IRI with `BASE` directly in the SPARQL query.

In [25]:
df = sparql_select(g_solar,"""
  BASE <http://dke.jku.at/example/>
  SELECT ?p ?o
  WHERE { 
    <solarsystem/Earth> ?p ?o . 
  }
""")
df

|   | p | o |
|---|---|---|
| 0 | skos:exactMatch | dbr:Earth |
| 1 | rdf:type | dbo:Planet |
| 2 | sdo:name | Planet Earth |
| 3 | sdo:name | Erde |
| 4 | v:orbits | :Sun |
| 5 | v:apoapsis | n095783546f5647fd9ef4db874b488daab13 |
| 6 | v:apoapsis | n095783546f5647fd9ef4db874b488daab14 |
| 7 | :temperature | n095783546f5647fd9ef4db874b488daab15 |
| 8 | :temperature | n095783546f5647fd9ef4db874b488daab16 |
| 9 | v:nrOfMoons | 1 |
