# Federated Query Examples

This notebook demonstrates federated querying capabilities using `rdflib` and the SPARQL `SERVICE` keyword.

### Run a Federated Query using `rdflib` on a remote SPARQL endpoint

In the following example, we query a remote graph through the SPARQL endpoint
http://linkeddata.uriburner.com/sparql. We first pass the query string through
`service_patch_rdflib`, which is described in the "SERVICE patch for rdflib" section at
the end of this notebook. The query returns a list of (subject, predicate, object)
triples and limits the output to 3 results.

Federated queries add flexibility to data queries and make it possible to integrate
remote data with in-memory graphs. Consider a simple analogy such as Google: the
website essentially aggregates data from various sources and combines it into one UI
where it is presented to the user. In the same way, federated queries gather knowledge
from distributed sources into one aggregated knowledge graph.

In [None]:
import ipyradiant
from rdflib import Graph

In [None]:
graph = Graph()
query_str = """
    SELECT DISTINCT ?s ?p ?o
    WHERE
      { 
        SERVICE <http://linkeddata.uriburner.com/sparql> 
          {
            SELECT ?s ?p ?o
            WHERE {?s ?p ?o}
            LIMIT 3                
          }
      }
"""
query_str = ipyradiant.service_patch_rdflib(query_str)
print(query_str)

In [None]:
res = graph.query(query_str)
list(res)

### Query an in-memory graph and a remote graph

This example shows a powerful way to use remote SPARQL endpoints and federated queries:
combining data from remote graphs with an existing in-memory graph. This capability
expands the breadth of data sources and gives users more flexibility.

In [None]:
from rdflib import Graph, Literal, URIRef, namespace

In [None]:
graph = Graph()
graph.add(
    (
        URIRef("http://www.w3.org/1999/02/22-rdf-syntax-ns#type"),
        URIRef("https://i.imgur.com/0HuwV7e.jpg"),
        URIRef("http://www.w3.org/2001/XMLSchema#anyURI"),
    )
)
list(graph)

In [None]:
query_str = """
    SELECT ?p ?o
    WHERE {
        <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?p ?o.
        
        service <http://linkeddata.uriburner.com/sparql>
        {
            SELECT ?p ?o
            WHERE {
                <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://i.imgur.com/4OESFVu.jpg> <http://www.w3.org/2001/XMLSchema#anyURI>.
            }
        
        }
    
    
    }
"""
res = graph.query(query_str)
list(res)

### Another Example: Query an in-memory and remote graph

Load a local file into an rdflib in-memory graph:

In [None]:
graph = Graph()
graph = graph.parse("data/starwars.ttl", format="ttl")
query_str = """
    SELECT *
    WHERE {
        ?s ?p ?o .
    }
    LIMIT 5
"""
res = graph.query(query_str)
list(res)

Run a remote query from the in-memory graph:

In [None]:
query_str = """
    SELECT *
    WHERE {
        service <http://linkeddata.uriburner.com/sparql> 
          {
            SELECT ?s ?p ?o
            WHERE {?s ?p ?o}
            LIMIT 3                
          }
    }
    LIMIT 5
"""
res = graph.query(query_str)
list(res)

### Query two remote endpoints

This example shows rdflib's ability to run two sibling service queries in one query and
aggregate their data in a higher-level query structure:

In [None]:
graph = Graph()
query_str = """
    SELECT ?o ?p
    WHERE {
        service <http://linkeddata.uriburner.com/sparql> 
          {
            SELECT ?o
            WHERE {?s ?p ?o}
            LIMIT 4                
          }
        service <http://linkeddata.uriburner.com/sparql> 
          {
            SELECT ?p
            WHERE {?s ?p ?o}
            LIMIT 4                
          }
    }
    LIMIT 4
"""
res = graph.query(query_str)
list(res)
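The two subqueries above project different variables (`?o` and `?p`) and share none, so under standard SPARQL join semantics their results combine as a cross product before the outer `LIMIT` applies. The sketch below imitates that step in plain Python without contacting any endpoint. The values are placeholders, and the row order a real engine returns is not guaranteed.

```python
from itertools import islice, product

# Placeholder values standing in for what each SERVICE block might return;
# real results depend on the remote endpoint.
objects = [{"o": f"object_{i}"} for i in range(4)]        # SELECT ?o ... LIMIT 4
predicates = [{"p": f"predicate_{i}"} for i in range(4)]  # SELECT ?p ... LIMIT 4

# With no shared variable, joining the two solution sets yields every pairing.
joined = [{**left, **right} for left, right in product(objects, predicates)]

# The outer LIMIT 4 keeps only the first four combined rows.
limited = list(islice(joined, 4))

print(len(joined))  # 16
print(limited)
```

In this sketch the four surviving rows all pair the first object with a different predicate, which is why the output of such a query can look repetitive.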

### Broken Example: Nested Service Calls using rdflib

rdflib does not currently support nested service calls. The following is an example of
what NOT to do when querying both an in-memory graph and a remote graph:

In [None]:
WD = namespace.Namespace("https://www.wikidata.org/wiki/")
graph = Graph()
graph.add((WD.Q28792126, WD.example, Literal("Example")))
list(graph)

In [None]:
query_str = """
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX bd: <http://www.bigdata.com/rdf#>

SELECT ?p ?o
WHERE {
    wd:Q28792126 ?p ?o .

    service <https://query.wikidata.org/> 
    {
        SELECT ?p ?o 
        WHERE 
        {
          BIND(wikibase:label AS ?p)
          wd:Q28792126 wdt:P31 wd:Q146
          service wikibase:label { 
              bd:serviceParam wikibase:language "en" . 
          }
        } 
        LIMIT 10
    }
}

"""

In [None]:
res = graph.query(query_str)
list(res)
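One way to guard against this situation is to check a query string for nested service clauses before running it. The following is a minimal sketch of such a check. `has_nested_service` is a hypothetical helper, not part of rdflib or ipyradiant, and it relies on simple token scanning rather than a real SPARQL parser, so the word "service" inside a string literal could mislead it.

```python
import re

# Hypothetical helper for illustration only (not an rdflib or ipyradiant API).
IRI = re.compile(r"<[^<>\s]*>")
TOKEN = re.compile(r"\bservice\b|[{}]", re.IGNORECASE)


def has_nested_service(query: str) -> bool:
    """Return True if a service clause appears inside another service block."""
    query = IRI.sub("", query)  # ignore the word "service" inside IRIs
    depth = 0
    service_depths = []  # brace depth at which each open service block began
    pending_service = False  # saw the keyword, waiting for its opening brace
    for match in TOKEN.finditer(query):
        token = match.group().lower()
        if token == "service":
            if service_depths:
                return True
            pending_service = True
        elif token == "{":
            depth += 1
            if pending_service:
                service_depths.append(depth)
                pending_service = False
        else:  # closing brace
            if service_depths and service_depths[-1] == depth:
                service_depths.pop()
            depth -= 1
    return False


nested_query = """
SELECT ?p ?o WHERE {
    service <https://query.wikidata.org/> {
        ?s ?p ?o .
        service wikibase:label { bd:serviceParam wikibase:language "en" . }
    }
}
"""
sibling_query = """
SELECT ?o ?p WHERE {
    service <http://linkeddata.uriburner.com/sparql> { SELECT ?o WHERE {?s ?p ?o} LIMIT 4 }
    service <http://linkeddata.uriburner.com/sparql> { SELECT ?p WHERE {?s ?p ?o} LIMIT 4 }
}
"""
print(has_nested_service(nested_query))   # True
print(has_nested_service(sibling_query))  # False
```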

### SERVICE patch for rdflib

Currently, rdflib contains a bug where the `SERVICE` clause is not supported properly.
ipyradiant detects when `SERVICE` is used for federated queries and converts the keyword
to lower case so that rdflib accepts it. A warning is issued when `SERVICE` is detected.
This patch can be removed for rdflib releases later than 5.0.0.

Here is a working example of converting the query string to a form that rdflib
supports:

In [None]:
import ipyradiant

query_str = """
    SELECT DISTINCT ?s ?p ?o
    WHERE
      { 
        SERVICE <http://linkeddata.uriburner.com/sparql> 
          {
            SELECT ?s ?p ?o
            WHERE {?s ?p ?o}               
          }
      }
"""

query_str = ipyradiant.service_patch_rdflib(query_str)
print(query_str)
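To make the conversion described above concrete, here is a minimal sketch of a keyword rewrite using a regular expression. `lowercase_service_keyword` is a hypothetical name and this is not ipyradiant's actual implementation. It also issues no warning, and it would rewrite the word inside string literals or IRIs too.

```python
import re

# Hypothetical helper, not ipyradiant's actual implementation: it only
# illustrates the kind of rewrite described above.
SERVICE_PATTERN = re.compile(r"\bservice\b", re.IGNORECASE)


def lowercase_service_keyword(query: str) -> str:
    """Return the query with every standalone SERVICE keyword lower-cased."""
    return SERVICE_PATTERN.sub("service", query)


example_query = """
    SELECT ?s ?p ?o
    WHERE {
        SERVICE <http://linkeddata.uriburner.com/sparql> {
            SELECT ?s ?p ?o WHERE {?s ?p ?o} LIMIT 3
        }
    }
"""
patched_query = lowercase_service_keyword(example_query)
print(patched_query)
```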

Once rdflib is updated, the warning can be silenced by setting the logger level to
`CRITICAL`:

In [None]:
ipyradiant.set_logger_level("CRITICAL")

query_str = """
    SELECT DISTINCT ?s ?p ?o
    WHERE
      { 
        SERVICE <http://linkeddata.uriburner.com/sparql> 
          {
            SELECT ?s ?p ?o
            WHERE {?s ?p ?o}               
          }
      }
"""

query_str = ipyradiant.service_patch_rdflib(query_str)
print(query_str)
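Raising the level to `CRITICAL` hides the warning because loggers drop records below their configured level. The sketch below shows this with Python's standard `logging` module. The logger name `federated_query_demo` and the message text are assumptions for illustration, and ipyradiant's own logger may be set up differently.

```python
import logging

# Hypothetical logger name; ipyradiant's real logger may be named differently.
logger = logging.getLogger("federated_query_demo")
logger.propagate = False  # keep the demo's records out of the root logger

captured = []


class ListHandler(logging.Handler):
    """Collect emitted messages in a list so they can be inspected."""

    def emit(self, record: logging.LogRecord) -> None:
        captured.append(record.getMessage())


logger.addHandler(ListHandler())

logger.setLevel(logging.WARNING)
logger.warning("SERVICE clause detected")  # recorded

logger.setLevel(logging.CRITICAL)
logger.warning("SERVICE clause detected")  # dropped: WARNING is below CRITICAL

print(captured)  # ['SERVICE clause detected']
```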