# SPARQL Tutorial

From the very fine Jena SPARQL Tutorial: https://jena.apache.org/tutorials

The objective of this SPARQL tutorial is to give a fast course in SPARQL. The tutorial covers the major features of the query language through examples but does not aim to be complete.

If you are looking for a short introduction to SPARQL and Jena, try Search RDF data with SPARQL. If you want to execute SPARQL queries in code and already know SPARQL, then you likely want to read the ARQ Documentation instead.

SPARQL is a query language and a protocol for accessing RDF designed by the W3C RDF Data Access Working Group. 

As a query language, SPARQL is “data-oriented” in that it only queries the information held in the models; there is no inference in the query language itself.  Of course, the Jena model may be ‘smart’ in that it provides the impression that certain triples exist by creating them on-demand, including OWL reasoning.  SPARQL does not do anything other than take the description of what the application wants, in the form of a query, and return that information, in the form of a set of bindings or an RDF graph.

## First, some DataFrame help

It's going to be useful to both view and manipulate these data using DataFrames. So, first a helpful function to convert a SPARQL query to a DataFrame.

In [1]:
#%load_ext nb_black

import rdflib
import io
import pandas as pd
import json
from pprint import pprint
from box import Box


def to_dataframe(sparql_query_result):
    """
    Convert the result of a SPARQL query into a Pandas DataFrame
    
    A bit naive, in that I'm not handling types at all.
    """

    qres_dict = json.loads(sparql_query_result.serialize(format="json"))
    b = Box(qres_dict)

    vars = b.head.vars

    rows = []
    for v in b.results.bindings:
        row = {}
        for var in vars:
            # Unbound variables (e.g. from OPTIONAL or UNION) are absent
            # from a binding, so record them explicitly as None.
            row[var] = v[var].value if var in v else None
        rows.append(row)

    df = pd.DataFrame(rows)
    return df

## Data Formats

First, we need to be clear about what data is being queried. SPARQL queries RDF graphs. An RDF graph is a set of triples (Jena calls RDF graphs “models” and triples “statements” because that is what they were called at the time the Jena API was first designed).

It is important to realize that it is the triples that matter, not the serialization. The serialization is just a way to write the triples down. RDF/XML is the W3C recommendation but it can be difficult to see the triples in the serialized form because there are multiple ways to encode the same graph.  In this tutorial, we use a more “triple-like” serialization, called Turtle (see also N3 language described in the W3C semantic web primer).

In [2]:
turtle_data = """
@prefix vCard:   <http://www.w3.org/2001/vcard-rdf/3.0#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

<http://somewhere/MattJones/>  vCard:FN   "Matt Jones" .
<http://somewhere/MattJones/>  vCard:N    _:b0 .
_:b0  vCard:Family "Jones" .
_:b0  vCard:Given  "Matthew" .


<http://somewhere/RebeccaSmith/> vCard:FN    "Becky Smith" .
<http://somewhere/RebeccaSmith/> vCard:N     _:b1 .
_:b1 vCard:Family "Smith" .
_:b1 vCard:Given  "Rebecca" .

<http://somewhere/JohnSmith/>    vCard:FN    "John Smith" .
<http://somewhere/JohnSmith/>    vCard:N     _:b2 .
_:b2 vCard:Family "Smith" .
_:b2 vCard:Given  "John"  .

<http://somewhere/SarahJones/>   vCard:FN    "Sarah Jones" .
<http://somewhere/SarahJones/>   vCard:N     _:b3 .
_:b3 vCard:Family  "Jones" .
_:b3 vCard:Given   "Sarah" .
"""

g = rdflib.Graph()
g.parse(format="turtle", data=turtle_data)

<Graph identifier=Ncce4190850254b26a77906f7c51ec03e (<class 'rdflib.graph.Graph'>)>

## A First SPARQL Query



In [3]:
sparql_query = """
SELECT ?x
WHERE { ?x  <http://www.w3.org/2001/vcard-rdf/3.0#FN>  "John Smith" }
"""

qres = g.query(sparql_query)
df = to_dataframe(qres)
df

Unnamed: 0,x
0,http://somewhere/JohnSmith/


This works by matching the triple pattern in the WHERE clause against the triples in the RDF graph. The predicate and object of the triple are fixed values so the pattern is going to match only triples with those values. The subject is a variable, and there are no other restrictions on the variable. The pattern matches any triples with these predicate and object values, and it matches with solutions for x.

The item enclosed in <> is a URI (actually, it’s an IRI) and the item enclosed in "" is a plain literal. Just as in Turtle, N3 or N-Triples, typed literals are written with ^^ and language tags can be added with @.

?x is a variable called x. The ? does not form part of the name which is why it does not appear in the table output.

There is one match. The query returns the match in the x query variable. The output shown here was obtained by running the query with rdflib and converting the result with to_dataframe.

## Basic patterns

This section covers basic patterns and solutions, the main building blocks of SPARQL queries.

### Solutions

Query solutions are a set of pairs of a variable name with a value. A SELECT query directly exposes the solutions (after order/limit/offset are applied) as the result set - other query forms use the solutions to make a graph. The solution is the way the pattern matched - which values the variables must take for a pattern to match.

The first query example had a single solution. Change the pattern to this second query:

In [4]:
sparql_query = """
SELECT ?x ?fname
WHERE {?x  <http://www.w3.org/2001/vcard-rdf/3.0#FN>  ?fname}
"""

qres = g.query(sparql_query)
df = to_dataframe(qres)
df

Unnamed: 0,x,fname
0,http://somewhere/MattJones/,Matt Jones
1,http://somewhere/RebeccaSmith/,Becky Smith
2,http://somewhere/JohnSmith/,John Smith
3,http://somewhere/SarahJones/,Sarah Jones


This has 4 solutions, one for each vCard name property (vCard:FN) triple in the data source.

So far, with triple patterns and basic patterns, every variable will be defined in every solution. The solutions to a query can be thought of as a table, but in the general case, it is a table where not every row will have a value for every column. As we shall see later, a solution to a SPARQL query does not have to have a value for every variable.

### Basic Patterns

A basic pattern is a set of triple patterns. It matches when the triple patterns all match with the same value used each time the variable with the same name is used.

In [5]:
sparql_query = """
SELECT ?givenName
WHERE
  { ?y  <http://www.w3.org/2001/vcard-rdf/3.0#Family>  "Smith" .
    ?y  <http://www.w3.org/2001/vcard-rdf/3.0#Given>  ?givenName .
  }
"""

qres = g.query(sparql_query)
df = to_dataframe(qres)
df

Unnamed: 0,givenName
0,Rebecca
1,John


This query involves two triple patterns, each ending in a ‘.’ (the dot after the last one can be omitted, as it was in the single triple pattern example). The variable y has to have the same value in each triple pattern match.
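
To see why the shared variable matters, here is a toy, pure-Python sketch of basic pattern matching. It is not how rdflib or Jena implement it; triples are plain tuples of strings, anything starting with `?` is treated as a variable, and the function names are made up for this illustration.

```python
def match_triple(pattern, triple, binding):
    """Return `binding` extended so that `pattern` matches `triple`, else None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable keeps the value it was given by an earlier pattern.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def match_basic_pattern(patterns, triples):
    """Return the bindings that satisfy every triple pattern at once."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in triples:
                extended = match_triple(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


triples = [
    ("_:b0", "Family", "Jones"), ("_:b0", "Given", "Matthew"),
    ("_:b1", "Family", "Smith"), ("_:b1", "Given", "Rebecca"),
    ("_:b2", "Family", "Smith"), ("_:b2", "Given", "John"),
    ("_:b3", "Family", "Jones"), ("_:b3", "Given", "Sarah"),
]
patterns = [("?y", "Family", "Smith"), ("?y", "Given", "?givenName")]

solutions = match_basic_pattern(patterns, triples)
given_names = [s["?givenName"] for s in solutions]
print(given_names)
```

Because `?y` is already bound by the first pattern, the second pattern can only pick up the given names attached to the same node.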

### QNames

There is a shorthand mechanism for writing long URIs using prefixes. The query above is more clearly written as the query:

In [6]:
sparql_query = """
PREFIX vcard:      <http://www.w3.org/2001/vcard-rdf/3.0#>

SELECT ?givenName
WHERE
 { ?y vcard:Family "Smith" .
   ?y vcard:Given  ?givenName .
 }
"""

qres = g.query(sparql_query)
df = to_dataframe(qres)
df

Unnamed: 0,givenName
0,Rebecca
1,John


This is a prefixing mechanism - the two parts of the URIs, from the prefix declaration and from the part after the “:” in the qname, are concatenated together. This is strictly not what an XML qname is but uses the RDF rule for turning a qname into a URI by concatenating the parts.

### Blank Nodes

Note that rdflib renders ?y as a full blank node (BNode) identifier. The two identifiers below look almost the same, but they differ in the final character, so they are distinct nodes.

In [7]:
sparql_query = """
PREFIX vcard:      <http://www.w3.org/2001/vcard-rdf/3.0#>

SELECT ?y ?givenName
WHERE
 { ?y vcard:Family "Smith" .
   ?y vcard:Given  ?givenName .
 }
"""

qres = g.query(sparql_query)
df = to_dataframe(qres)
df

Unnamed: 0,y,givenName
0,n973bfd03599844f6bf11b15bfa373099b2,Rebecca
1,n973bfd03599844f6bf11b15bfa373099b3,John


## Filters

Graph matching allows patterns in the graph to be found. This section describes how the values in a solution can be restricted. There are many comparisons available - we just cover two cases here.

### String Matching

SPARQL provides an operation to test strings, based on regular expressions.  This includes the ability to ask SQL “LIKE” style tests, although the syntax of the regular expression is different from SQL.

The syntax is:

`FILTER regex(?x, "pattern" [, "flags"])`

The flags argument is optional.  The flag “i” means a case-insensitive pattern match is done.

The example query finds given names with an “r” or “R” in them.

In [8]:
sparql_query = """
PREFIX vcard: <http://www.w3.org/2001/vcard-rdf/3.0#>

SELECT ?g
WHERE
{ ?y vcard:Given ?g .
  FILTER regex(?g, "r", "i") }
"""

qres = g.query(sparql_query)
df = to_dataframe(qres)
df

Unnamed: 0,g
0,Rebecca
1,Sarah


The regular expression language is the same as the XQuery regular expression language, which is a codified version of that found in Perl.
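
For a feel of what the “i” flag does, here is a small analogy using Python's `re` module. Python's regular expressions are not the XQuery dialect, so treat this only as an illustration; for a simple pattern like this one the behaviour should be the same.

```python
import re

given_names = ["Matthew", "Rebecca", "John", "Sarah"]

# re.IGNORECASE plays the role of the SPARQL "i" flag.
with_r = [name for name in given_names if re.search("r", name, re.IGNORECASE)]

# Without the flag, only a lower-case "r" matches.
with_lower_r = [name for name in given_names if re.search("r", name)]

print(with_r, with_lower_r)
```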

### Testing Values

There are times when the application wants to filter on the value of a variable.  In the data, we have added an extra field for age.  Age is not defined by the vCard schema so we have created a new property for the purpose of this tutorial.  RDF allows such mixing of different definitions of information because URIs are unique. Note also that the info:age property value is typed.

In [9]:
xml_data = """
<rdf:RDF
  xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
  xmlns:vCard='http://www.w3.org/2001/vcard-rdf/3.0#'
  xmlns:info='http://somewhere/peopleInfo#'
   >

  <rdf:Description rdf:about="http://somewhere/JohnSmith">
    <vCard:FN>John Smith</vCard:FN>
    <info:age rdf:datatype='http://www.w3.org/2001/XMLSchema#integer'>25</info:age>
    <vCard:N rdf:parseType="Resource">
    <vCard:Family>Smith</vCard:Family>
    <vCard:Given>John</vCard:Given>
    </vCard:N>
  </rdf:Description>

  <rdf:Description rdf:about="http://somewhere/RebeccaSmith">
    <vCard:FN>Becky Smith</vCard:FN>
    <info:age rdf:datatype='http://www.w3.org/2001/XMLSchema#integer'>23</info:age>
    <vCard:N rdf:parseType="Resource">
    <vCard:Family>Smith</vCard:Family>
    <vCard:Given>Rebecca</vCard:Given>
    </vCard:N>
  </rdf:Description>

  <rdf:Description rdf:about="http://somewhere/SarahJones">
    <vCard:FN>Sarah Jones</vCard:FN>
    <vCard:N rdf:parseType="Resource">
    <vCard:Family>Jones</vCard:Family>
    <vCard:Given>Sarah</vCard:Given>
    </vCard:N>
  </rdf:Description>

  <rdf:Description rdf:about="http://somewhere/MattJones">
    <vCard:FN>Matt Jones</vCard:FN>
    <vCard:N
    vCard:Family="Jones"
    vCard:Given="Matthew"/>
  </rdf:Description>

</rdf:RDF>
"""

g = rdflib.Graph()
g.parse(format="xml", data=xml_data)

<Graph identifier=N84d9f6d14b2e49c6bd6f2a8f0c725a06 (<class 'rdflib.graph.Graph'>)>

So, a query to find the people who are 24 or older is:

In [10]:
sparql_query = """
PREFIX info: <http://somewhere/peopleInfo#>

SELECT ?resource
WHERE
  {
    ?resource info:age ?age .
    FILTER (?age >= 24)
  }
"""

qres = g.query(sparql_query)
df = to_dataframe(qres)
df

Unnamed: 0,resource
0,http://somewhere/JohnSmith


The arithmetic expression must be in parentheses (round brackets). There is just one match, the resource URI for John Smith. Turning this round to ask for those younger than 24 also yields one match, for Rebecca Smith.  Nothing is returned about the Joneses.

The database contains no age information about the Joneses: there are no info:age properties on these vCards, so the variable age did not get a value and so was not tested by the filter.

## Optional information

RDF is semi-structured data, so SPARQL has the ability to query for data without failing the query when that data does not exist. Such a query uses an optional part to extend the information found in a query solution, but returns the non-optional information anyway.

### OPTIONALs

This query gets the name of a person and also their age if that piece of information is available.

In [11]:
sparql_query = """
PREFIX info:    <http://somewhere/peopleInfo#>
PREFIX vcard:   <http://www.w3.org/2001/vcard-rdf/3.0#>

SELECT ?name ?age
WHERE
{
    ?person vcard:FN  ?name .
    OPTIONAL { ?person info:age ?age }
}
"""

qres = g.query(sparql_query)
df = to_dataframe(qres)
df

Unnamed: 0,name,age
0,John Smith,25.0
1,Becky Smith,23.0
2,Sarah Jones,
3,Matt Jones,


Two of the four people in the data have age properties so two of the query solutions have that information.  However, because the triple pattern for the age is optional, there is a pattern solution for the people who don’t have age information.
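
OPTIONAL behaves much like a left join between two sets of solutions. The following toy sketch in plain Python (solutions as dictionaries, `left_join` a made-up helper) mimics that behaviour; it ignores details such as filters inside the optional part.

```python
def left_join(solutions, optional_solutions):
    """Extend each solution with every compatible optional one, or keep it as is."""
    result = []
    for solution in solutions:
        extensions = [
            {**solution, **extra}
            for extra in optional_solutions
            # Compatible means: shared variables carry the same value.
            if all(solution.get(key, value) == value for key, value in extra.items())
        ]
        result.extend(extensions or [solution])
    return result


names = [
    {"person": "JohnSmith", "name": "John Smith"},
    {"person": "RebeccaSmith", "name": "Becky Smith"},
    {"person": "SarahJones", "name": "Sarah Jones"},
    {"person": "MattJones", "name": "Matt Jones"},
]
ages = [
    {"person": "JohnSmith", "age": 25},
    {"person": "RebeccaSmith", "age": 23},
]

rows = left_join(names, ages)
for row in rows:
    print(row["name"], row.get("age"))
```

All four people survive the join; only the two with a matching age solution gain an `age` entry.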

If the optional clause had not been there, no age information would have been retrieved. If the triple pattern had been included but not optional then we would have the query with only two solutions because the `info:age` property must now be present in a solution:

In [12]:
sparql_query = """
PREFIX info:   <http://somewhere/peopleInfo#>
PREFIX vcard:  <http://www.w3.org/2001/vcard-rdf/3.0#>

SELECT ?name ?age
WHERE
{
    ?person vcard:FN  ?name .
    ?person info:age ?age .
}
"""

qres = g.query(sparql_query)
df = to_dataframe(qres)
df

Unnamed: 0,name,age
0,John Smith,25
1,Becky Smith,23


### OPTIONALs with FILTERs

OPTIONAL is a binary operator that combines two graph patterns. The optional pattern is any group pattern and may involve any SPARQL pattern types.  If the group matches, the solution is extended, if not, the original solution is given. So, if we filter for ages greater than 24 in the optional part, we will still get 4 solutions (from the vcard:FN pattern) but only get ages if they pass the test.

In [13]:
sparql_query = """
PREFIX info:        <http://somewhere/peopleInfo#>
PREFIX vcard:      <http://www.w3.org/2001/vcard-rdf/3.0#>

SELECT ?name ?age
WHERE
{
    ?person vcard:FN  ?name .
    OPTIONAL { ?person info:age ?age . FILTER ( ?age > 24 ) }
}
"""

qres = g.query(sparql_query)
df = to_dataframe(qres)
df

Unnamed: 0,name,age
0,John Smith,25.0
1,Becky Smith,
2,Sarah Jones,
3,Matt Jones,


No age is included for “Becky Smith” because it is not greater than 24.

If the filter condition is moved out of the optional part, then it can influence the number of solutions but it may be necessary to make the filter more complicated to allow for variable age being unbound.

If a solution has an age variable, then it must be greater than 24. It can also be unbound.  There are now three solutions:

In [14]:
sparql_query = """
PREFIX info:        <http://somewhere/peopleInfo#>
PREFIX vcard:      <http://www.w3.org/2001/vcard-rdf/3.0#>

SELECT ?name ?age
WHERE
{
    ?person vcard:FN  ?name .
    OPTIONAL { ?person info:age ?age . }
    FILTER ( !bound(?age) || ?age > 24 )
}
"""

qres = g.query(sparql_query)
df = to_dataframe(qres)
df

Unnamed: 0,name,age
0,John Smith,25.0
1,Sarah Jones,
2,Matt Jones,


Evaluating an expression which has an unbound variable where a bound one was expected causes an evaluation exception and the whole expression fails.
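
The effect of such evaluation errors can be imitated in a few lines of plain Python. This is only a sketch of the idea: `Unbound`, `value_of` and `passes` are invented names, and Python's short-circuiting `or` stands in for SPARQL's `||`, which gives the same answer here because the `bound` check comes first.

```python
class Unbound(Exception):
    """Raised when an expression needs a variable that has no value."""


def value_of(solution, var):
    if var not in solution:
        raise Unbound(var)
    return solution[var]


def passes(test, solution):
    """A FILTER keeps a solution only if the test is true; an error rejects it."""
    try:
        return bool(test(solution))
    except Unbound:
        return False


solutions = [
    {"name": "John Smith", "age": 25},
    {"name": "Becky Smith", "age": 23},
    {"name": "Sarah Jones"},
    {"name": "Matt Jones"},
]

# FILTER (?age > 24): rows without an age raise an error and are dropped.
plain = [s["name"] for s in solutions if passes(lambda s: value_of(s, "age") > 24, s)]

# FILTER (!bound(?age) || ?age > 24): unbound ages are let through explicitly.
guarded = [
    s["name"]
    for s in solutions
    if passes(lambda s: "age" not in s or value_of(s, "age") > 24, s)
]

print(plain, guarded)
```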

### OPTIONALs and Order Dependent Queries

One thing to be careful of is using the same variable in two or more optional clauses (and not in some basic pattern as well):

In [15]:
sparql_query = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX vCard: <http://www.w3.org/2001/vcard-rdf/3.0#>

SELECT ?name
WHERE
{
  ?x a foaf:Person .
  OPTIONAL { ?x foaf:name ?name }
  OPTIONAL { ?x vCard:FN  ?name }
}
"""

qres = g.query(sparql_query)
df = to_dataframe(qres)
df

If the first optional binds `?name` and `?x` to some values, the second OPTIONAL is an attempt to match the ground triples (`?x` and `?name` have values). If the first optional did not match the optional part, then the second one is an attempt to match its triple with two variables.

## Alternatives

Another way of dealing with the semi-structured data is to query for one of a number of possibilities. This section covers UNION patterns, where one of a number of possibilities is tried.

### UNION - two ways to the same data

Both the vCard vocabulary and the FOAF vocabulary have properties for people’s names.  In vCard, it is vCard:FN, the “formatted name”, and in FOAF, it is foaf:name. In this section, we will look at a small set of data where the names of people can be given by either the FOAF or the vCard vocabulary.

Suppose we have an RDF graph that contains name information using both the vCard and FOAF vocabularies.

In [16]:
rdf_data = """
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix vcard: <http://www.w3.org/2001/vcard-rdf/3.0#> .

_:a foaf:name   "Matt Jones" .

_:b foaf:name   "Sarah Jones" .

_:c vcard:FN    "Becky Smith" .

_:d vcard:FN    "John Smith" .
"""

g = rdflib.Graph()
g.parse(format="turtle", data=rdf_data)

<Graph identifier=N73c20b6fbcaa46aca88915aed1346c18 (<class 'rdflib.graph.Graph'>)>

A query to access the name information, when it can be in either form, could be:

In [17]:
sparql_query = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX vCard: <http://www.w3.org/2001/vcard-rdf/3.0#>

SELECT ?name
WHERE
{
   { [] foaf:name ?name } UNION { [] vCard:FN ?name }
}
"""

qres = g.query(sparql_query)
df = to_dataframe(qres)
df

Unnamed: 0,name
0,Matt Jones
1,Sarah Jones
2,Becky Smith
3,John Smith


It didn’t matter which form of expression was used for the name; the `?name` variable is set either way. The same result can be achieved using a FILTER, as this query shows:

In [18]:
sparql_query = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX vCard: <http://www.w3.org/2001/vcard-rdf/3.0#>

SELECT ?name
WHERE
{
  [] ?p ?name
  FILTER ( ?p = foaf:name || ?p = vCard:FN )
}
"""

qres = g.query(sparql_query)
df = to_dataframe(qres)
df

Unnamed: 0,name
0,Becky Smith
1,Matt Jones
2,John Smith
3,Sarah Jones


This query tests whether the property is one URI or another. The solutions may not come out in the same order.  The first form is more likely to be faster, depending on the data and the storage used, because the second form may have to get all the triples from the graph to match the triple pattern with unbound variables (or blank nodes) in each slot, then test each ?p to see if it matches one of the values. Whether the query optimizer spots that it can perform the query more efficiently, and is able to pass the constraint down to the storage layer, depends on its sophistication.

### UNION - remembering where the data was found.

The example above used the same variable in each branch. If different variables are used, the application can discover which sub-pattern caused the match:

In [19]:
sparql_query = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX vCard: <http://www.w3.org/2001/vcard-rdf/3.0#>

SELECT ?name1 ?name2
WHERE
{
   { [] foaf:name ?name1 } UNION { [] vCard:FN ?name2 }
}
"""

qres = g.query(sparql_query)
df = to_dataframe(qres)
df

Unnamed: 0,name1,name2
0,Matt Jones,
1,Sarah Jones,
2,,Becky Smith
3,,John Smith


This second query has retained information of where the name of the person came from by assigning the name to different variables.

### OPTIONAL and UNION

In practice, OPTIONAL is more common than UNION but they both have their uses. OPTIONAL is useful for augmenting the solutions found, UNION is useful for concatenating the solutions from two possibilities. They don’t necessarily return the information in the same way:

In [20]:
sparql_query = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX vCard: <http://www.w3.org/2001/vcard-rdf/3.0#>

SELECT ?name1 ?name2
WHERE
{
  ?x a foaf:Person
  OPTIONAL { ?x  foaf:name  ?name1 }
  OPTIONAL { ?x  vCard:FN   ?name2 }
}
"""

qres = g.query(sparql_query)
df = to_dataframe(qres)
df

Beware of using the same variable, such as ?name, in each OPTIONAL, because that makes the query order-dependent.

## Named Graphs

_Work-in-progress! rdflib calls these Conjunctive Graphs, but I haven't quite cracked the pathing or the data reads yet._

This section covers RDF Datasets - an RDF Dataset is the unit that is queried by a SPARQL query. It consists of a default graph, and a number of named graphs.

### Querying datasets
The graph matching operations (basic patterns, OPTIONALs, and UNIONs) work on one RDF graph.  This starts out being the default graph of the dataset but it can be changed by the GRAPH keyword.

```
GRAPH uri { ... pattern ... }

GRAPH var { ... pattern ... }
```

If a URI is given, the pattern will be matched against the graph in the dataset with that name - if there isn’t one, the GRAPH clause fails to match at all.

If a variable is given, all the named graphs (not the default graph) are tried.  The variable may be used elsewhere so that if, during execution, its value is already known for a solution, only the specific named graph is tried.

#### Example Data
An RDF dataset can take a variety of forms.  Two common setups are to have the default graph being the union (the RDF merge) of all the named graphs or to have the default graph be an inventory of the named graphs (where they came from, when they were read etc).  There are no limitations - one graph can be included twice under different names, or some graphs may share triples with others.

In the examples below we will use the following dataset that might occur for an RDF aggregator of book details:

##### Named Graph #1

In [21]:
named1_data = """
@prefix dc: <http://purl.org/dc/elements/1.1/> .

[] dc:title "Harry Potter and the Philospher's Stone" .
[] dc:title "Harry Potter and the Chamber of Secrets" .
"""

g1 = rdflib.Graph()
g1.parse(format="turtle", data=named1_data)

<Graph identifier=Nff391fb6480c49c384523dc28930e11f (<class 'rdflib.graph.Graph'>)>

##### Named Graph #2

In [22]:
named2_data = """
@prefix dc: <http://purl.org/dc/elements/1.1/> .

[] dc:title "Harry Potter and the Sorcerer's Stone" .
[] dc:title "Harry Potter and the Chamber of Secrets" .
"""

g2 = rdflib.Graph()
g2.parse(format="turtle", data=named2_data)

<Graph identifier=N15fbd361071e4e5f95beab0d8ace61f5 (<class 'rdflib.graph.Graph'>)>

##### Default graph

In [23]:
default_data = """
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<resources/ds-ng-1.ttl> dc:date "2005-07-14T03:18:56+0100"^^xsd:dateTime .
<resources/ds-ng-2.ttl> dc:date "2005-09-22T05:53:05+0100"^^xsd:dateTime .
"""

g = rdflib.ConjunctiveGraph()
g.parse(format="turtle", data=default_data)

<Graph identifier=Ne559da47edec41a0a806db2ff2010690 (<class 'rdflib.graph.Graph'>)>

### Accessing the Dataset

The first example just accesses the default graph.

This is the default graph only - nothing from the named graphs because they aren’t queried unless explicitly indicated via GRAPH.

In [24]:
sparql_query = """
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX : <.>

SELECT *
{ ?s ?p ?o }
"""

qres = g.query(sparql_query)
df = to_dataframe(qres)
df

Unnamed: 0,p,s,o
0,http://purl.org/dc/elements/1.1/date,file:///C:/Work/sparql_tutorial/resources/ds-n...,2005-09-22T05:53:05+01:00
1,http://purl.org/dc/elements/1.1/date,file:///C:/Work/sparql_tutorial/resources/ds-n...,2005-07-14T03:18:56+01:00


We can query for all triples by querying the default graph and the named graphs giving:

In [25]:
sparql_query = """
PREFIX  xsd:    <http://www.w3.org/2001/XMLSchema#>
PREFIX  dc:     <http://purl.org/dc/elements/1.1/>
PREFIX  :       <.>

SELECT *
{
    { ?s ?p ?o } UNION { GRAPH ?g { ?s ?p ?o } }
}
"""

qres = g.query(sparql_query)
df = to_dataframe(qres)
df

Unnamed: 0,p,s,o,g
0,http://purl.org/dc/elements/1.1/date,file:///C:/Work/sparql_tutorial/resources/ds-n...,2005-09-22T05:53:05+01:00,
1,http://purl.org/dc/elements/1.1/date,file:///C:/Work/sparql_tutorial/resources/ds-n...,2005-07-14T03:18:56+01:00,
2,http://purl.org/dc/elements/1.1/date,file:///C:/Work/sparql_tutorial/resources/ds-n...,2005-09-22T05:53:05+01:00,Ne559da47edec41a0a806db2ff2010690
3,http://purl.org/dc/elements/1.1/date,file:///C:/Work/sparql_tutorial/resources/ds-n...,2005-07-14T03:18:56+01:00,Ne559da47edec41a0a806db2ff2010690


#### Querying a specific graph
If the application knows the name of the graph, it can directly ask a query such as finding all the titles in a given graph:

In [26]:
sparql_query = """
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX : <.>

SELECT ?title
{
  GRAPH :ds-ng-2.ttl
    { ?b dc:title ?title }
}
"""

qres = g.query(sparql_query)
df = to_dataframe(qres)
df

#### Querying to find data from graphs that match a pattern
The name of the graphs to be queried can be determined with the query itself. The same process for variables applies whether they are part of a graph pattern or the GRAPH form. The query below sets a condition on the variable used to select named graphs, based on information in the default graph.

The results of executing this query on the example dataset are the titles in one of the graphs, the one with the date later than 1 August 2005.

In [27]:
sparql_query = """
PREFIX  xsd:    <http://www.w3.org/2001/XMLSchema#>
PREFIX  dc:     <http://purl.org/dc/elements/1.1/>
PREFIX  :       <.>

SELECT ?date ?title
{
  ?g dc:date ?date . FILTER (?date > "2005-08-01T00:00:00Z"^^xsd:dateTime )
  GRAPH ?g
      { ?b dc:title ?title }
}
"""

qres = g.query(sparql_query)
df = to_dataframe(qres)
df

### Describing RDF Datasets - FROM and FROM NAMED

A query execution can be given the dataset when the execution object is built or it can be described in the query itself. When the details are on the command line, a temporary dataset is created but an application can create datasets and then use them in many queries.

When described in the query, FROM <url> is used to identify the contents to be in the default graph. There can be more than one FROM clause and the default graph is the result of reading each file into the default graph. It is the RDF merge of the individual graphs.

Don’t be confused by the fact the default graph is described by one or more URLs in FROM clauses. This is where the data is read from, not the name of the graph. As several FROM clauses can be given, the data can be read in from several places but none of them become the graph name.

FROM NAMED <url> is used to identify a named graph. The graph is given the name url and the data is read from that location. Multiple FROM NAMED clauses cause multiple graphs to be added to the dataset.

Note that graphs are loaded with the Jena FileManager, which includes the ability to provide alternative locations for files. For example, the query may have FROM NAMED <http://example/data>, and the data may actually be read from file:local.rdf. The name of the graph will be http://example/data, as in the query.

For example, the query to find all the triples in both default graph and named graphs could be written as (q-ds-5.rq):

In [28]:
sparql_query = """
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dc:  <http://purl.org/dc/elements/1.1/>
PREFIX :    <.>

SELECT *
FROM       <resources/ds-dft.ttl>
FROM NAMED <resources/ds-ng-1.ttl>
FROM NAMED <resources/ds-ng-2.ttl>
{
   { ?s ?p ?o } UNION { GRAPH ?g { ?s ?p ?o } }
}
"""

qres = g.query(sparql_query)
df = to_dataframe(qres)
df

Unnamed: 0,p,s,o,g
0,http://purl.org/dc/elements/1.1/title,n6a15e8d4417042e5a207da6a6f0e2055b2,Harry Potter and the Chamber of Secrets,
1,http://purl.org/dc/elements/1.1/date,file:///C:/Work/sparql_tutorial/resources/ds-n...,2005-09-22T05:53:05+01:00,
2,http://purl.org/dc/elements/1.1/title,n6a15e8d4417042e5a207da6a6f0e2055b1,Harry Potter and the Sorcerer's Stone,
3,http://purl.org/dc/elements/1.1/title,n67e56208b43049a78768ed5a1937bebfb1,Harry Potter and the Philospher's Stone,
4,http://purl.org/dc/elements/1.1/date,file:///C:/Work/sparql_tutorial/resources/ds-n...,2005-07-14T03:18:56+01:00,
5,http://purl.org/dc/elements/1.1/title,n67e56208b43049a78768ed5a1937bebfb2,Harry Potter and the Chamber of Secrets,
6,http://purl.org/dc/elements/1.1/title,n6a15e8d4417042e5a207da6a6f0e2055b1,Harry Potter and the Sorcerer's Stone,file:///C:/Work/sparql_tutorial/resources/ds-n...
7,http://purl.org/dc/elements/1.1/title,n6a15e8d4417042e5a207da6a6f0e2055b2,Harry Potter and the Chamber of Secrets,file:///C:/Work/sparql_tutorial/resources/ds-n...
8,http://purl.org/dc/elements/1.1/title,n67e56208b43049a78768ed5a1937bebfb1,Harry Potter and the Philospher's Stone,file:///C:/Work/sparql_tutorial/resources/ds-n...
9,http://purl.org/dc/elements/1.1/title,n67e56208b43049a78768ed5a1937bebfb2,Harry Potter and the Chamber of Secrets,file:///C:/Work/sparql_tutorial/resources/ds-n...


## Producing Result Sets

SPARQL has four result forms:

* SELECT – Return a table of results.
* CONSTRUCT – Return an RDF graph, based on a template in the query.
* DESCRIBE – Return an RDF graph, based on what the query processor is configured to return.
* ASK – Ask a boolean query.

The SELECT form directly returns a table of solutions as a result set, while DESCRIBE and CONSTRUCT use the outcome of matching to build RDF graphs.

### Solution Modifiers
Pattern matching produces a set of solutions. This set can be modified in various ways:

* Projection - keep only selected variables
* OFFSET/LIMIT - chop the number of solutions (best used with ORDER BY)
* ORDER BY - sorted results
* DISTINCT - yield only one row for one combination of variables and values.

The solution modifiers OFFSET/LIMIT and ORDER BY always apply to all result forms.

#### OFFSET and LIMIT
A set of solutions can be abbreviated by specifying the offset (the start index) and the limit (the number of solutions) to be returned. Using LIMIT alone can be useful to ensure not too many solutions are returned, to restrict the effect of some unexpected situation.  LIMIT and OFFSET can be used in conjunction with sorting to take a defined slice through the solutions found.

#### ORDER BY
SPARQL solutions are sorted by expression, including custom functions.

```
ORDER BY ?x ?y

ORDER BY DESC(?x)

ORDER BY x:func(?x)  # Custom sorting condition
```

#### DISTINCT
The SELECT result form can take the DISTINCT modifier which ensures that no two solutions returned are the same - this takes place after projection to the requested variables.

### SELECT
The SELECT result form is a projection of the solution set, with DISTINCT applied if requested. SELECT identifies which named variables are in the result set.  This may be “*” meaning “all named variables” (blank nodes in the query act like variables for matching but are never returned).

### CONSTRUCT
CONSTRUCT builds an RDF graph based on a graph template.  The graph template can have variables which are bound by a WHERE clause.  The effect is to calculate the graph fragment, given the template, for each solution from the WHERE clause, after taking into account any solution modifiers. The graph fragments, one per solution, are merged into a single RDF graph which is the result.

Any blank nodes explicitly mentioned in the graph template are created afresh for each time the template is used for a solution.

### DESCRIBE
The CONSTRUCT form takes an application template for the graph results. The DESCRIBE form also creates a graph, but the form of that graph is provided by the query processor, not the application. For each URI found, or explicitly mentioned in the DESCRIBE clause, the query processor should provide a useful fragment of RDF, such as all the known details of a book. ARQ allows domain-specific description handlers to be written.

### ASK
The ASK result form returns a boolean: true if the pattern matched, otherwise false.