# Federated Query Examples

This notebook demonstrates federated querying capabilities using `rdflib` and the SPARQL `SERVICE` keyword.

In [None]:
from rdflib import Graph, Literal, URIRef, namespace

import ipyradiant

> Note: The federated queries in this notebook work with Wikidata endpoints but not with
> linkeddata endpoints.

### Run a Federated Query using `rdflib` on a remote SPARQL endpoint

In the following example, we query a remote graph at the SPARQL endpoint
https://query.wikidata.org/sparql. We validate the query string using
`ipyradiant.service_patch_rdflib`, as described in the
<a href="examples/RemoteQuery_Example.ipynb">Remote Query Examples</a> notebook. The
query returns a list of matching items, limited to the first 5 results.

In [None]:
graph = Graph()
query_str = """
    PREFIX wd: <http://www.wikidata.org/entity/>
    PREFIX wdt: <http://www.wikidata.org/prop/direct/>
    PREFIX wikibase: <http://wikiba.se/ontology#>
    PREFIX bd: <http://www.bigdata.com/rdf#>
    SELECT ?item
    WHERE {
        SERVICE <https://query.wikidata.org/sparql>
        {
            SELECT ?item
            WHERE {
                ?item wdt:P31 wd:Q146 .
            }
            LIMIT 5
        }
    }
"""
res = graph.query(ipyradiant.service_patch_rdflib(query_str))
list(res)
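
The SERVICE wrapper used above follows a fixed shape, so it can be generated from an
endpoint and a subquery. The following is a minimal sketch; `wrap_in_service` is a
hypothetical helper written for illustration and is not part of `rdflib` or `ipyradiant`.

```python
from textwrap import dedent, indent
from typing import Sequence


def wrap_in_service(endpoint: str, subquery: str, variables: Sequence[str]) -> str:
    """Wrap a SELECT subquery in a SERVICE clause (hypothetical helper)."""
    # Outer projection, e.g. ["item"] -> "?item"
    projection = " ".join(f"?{name}" for name in variables)
    # Normalise the subquery indentation so it nests inside the SERVICE block
    body = indent(dedent(subquery).strip(), " " * 8)
    return (
        f"SELECT {projection}\n"
        "WHERE {\n"
        f"    SERVICE <{endpoint}> {{\n"
        f"{body}\n"
        "    }\n"
        "}"
    )


inner = """
    SELECT ?item
    WHERE { ?item <http://www.wikidata.org/prop/direct/P31> <http://www.wikidata.org/entity/Q146> . }
    LIMIT 5
"""
federated_query = wrap_in_service("https://query.wikidata.org/sparql", inner, ["item"])
print(federated_query)
```

The resulting string can be passed through `ipyradiant.service_patch_rdflib` and then to
`graph.query`, as in the cell above.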

Federated queries make data retrieval more flexible and make it easier to integrate
remote data with in-memory graphs. Consider a simple analogy such as Google: the website
aggregates data from various sources and presents it to the user in a single UI.
Similarly, a federated query gathers knowledge from distributed sources and aggregates
it into a single query result.

### Query an in-memory graph and a remote graph

This example shows a powerful use of remote SPARQL endpoints and federated queries:
combining data from remote graphs with an existing in-memory graph. This capability
expands the breadth of available data sources and gives users more flexibility.

Here we create a graph containing a single triple, which serves as our in-memory graph:

In [None]:
graph = Graph()
graph.add(
    (
        URIRef("http://www.wikidata.org/entity/Q378619"),
        URIRef("http://www.example.org/ExamplePredicate"),
        URIRef("http://www.example.org/ExampleObject"),
    )
)
list(graph)

> Note: You can also use parsed data as a local graph:
> `graph = Graph().parse("example.ttl", format="ttl")`

Next we perform a federated query that returns the subject from the in-memory graph
together with results from a service call to Wikidata's SPARQL endpoint.

> Note the DISTINCT on the outer SELECT. Without it, the results contain duplicates.

In this example, we supplement a remote graph with local data and return the results:

In [None]:
# goal is for results to include local data and remote data
query_str = """
    PREFIX wd: <http://www.wikidata.org/entity/>
    PREFIX wdt: <http://www.wikidata.org/prop/direct/>
    PREFIX wikibase: <http://wikiba.se/ontology#>
    PREFIX bd: <http://www.bigdata.com/rdf#>
    SELECT DISTINCT ?s ?p ?o
    WHERE {
        BIND(?item as ?s)
        ?s ?p ?o.
        
        service <https://query.wikidata.org/sparql>
        {
            SELECT ?item
            WHERE {
                ?item wdt:P31 wd:Q146 .
            }
        }
    }
"""
res = graph.query(query_str)
list(res)
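
The effect of DISTINCT can be illustrated without any network access. The sketch below
assumes that the local pattern and the SERVICE block end up sharing no bound variable, in
which case a SPARQL join behaves like a cross product. The rows are made-up stand-ins
rather than real query results, and this is only one plausible explanation for the
duplicates seen here.

```python
from itertools import product

# Hypothetical solution mappings: one local match and three remote matches.
local_rows = [{"s": "wd:Q378619", "p": "ex:ExamplePredicate", "o": "ex:ExampleObject"}]
remote_rows = [{"item": "wd:Q1"}, {"item": "wd:Q2"}, {"item": "wd:Q3"}]

# With no shared variables, joining the two sides pairs every local row
# with every remote row.
joined = [{**left, **right} for left, right in product(local_rows, remote_rows)]

# Projecting only ?s ?p ?o drops ?item, which leaves identical rows behind.
projected = [(row["s"], row["p"], row["o"]) for row in joined]

# DISTINCT removes the duplicates; dict.fromkeys keeps first-seen order.
distinct = list(dict.fromkeys(projected))

print(len(projected), len(distinct))
```

Each additional remote match adds one more copy of the local row to the projected
output, which is why the number of duplicates grows with the size of the remote result.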

In this example, we supplement our local graph with data returned from the Wikidata
endpoint. To make the results more interesting, the query returns the entity's
descriptions ('cloned cat' and 'chat clone') in English and French:

In [None]:
query_str = """
    PREFIX wd: <http://www.wikidata.org/entity/>
    PREFIX wdt: <http://www.wikidata.org/prop/direct/>
    PREFIX wikibase: <http://wikiba.se/ontology#>
    PREFIX bd: <http://www.bigdata.com/rdf#>
    SELECT ?s ?p_remote ?o_remote
    WHERE {     
        ?s ?p ?o.
            
        service <https://query.wikidata.org/sparql>
        {
            SELECT ?p_remote ?o_remote
            WHERE {
                BIND(<http://schema.org/description> as ?p_remote)
                ?s ?p_remote ?o_remote .
                FILTER(LANG(?o_remote)=?languages)
                VALUES(?languages){('en')('fr')}
            }
        }
    
    }
"""
res = graph.query(query_str)
list(res)
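
The language restriction above is expressed with FILTER and VALUES. As a hedged
alternative, the same restriction can be written with the SPARQL 1.1 `IN` operator and
generated from a Python list. `language_filter` is a hypothetical helper, and we have not
checked the generated clause against the Wikidata endpoint.

```python
from typing import Sequence


def language_filter(variable: str, languages: Sequence[str]) -> str:
    """Build a FILTER clause restricting a literal variable to the given language tags."""
    # Quote each tag so it becomes a SPARQL string literal, e.g. "en"
    quoted = ", ".join(f'"{tag}"' for tag in languages)
    return f"FILTER(LANG(?{variable}) IN ({quoted}))"


print(language_filter("o_remote", ["en", "fr"]))
```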

### Broken Example: Query two remote endpoints

This example attempts to run two service queries in parallel using rdflib and combine
their results with UNION. There is an issue with the way the results are aggregated at
the end, and there may not be an easy way to do this:

In [None]:
graph = Graph()
query_str = """
    SELECT ?s
    WHERE {
        {
            service <https://query.wikidata.org/sparql>
            {
                SELECT DISTINCT ?s
                WHERE {?s ?p ?o}
                LIMIT 4
            }
        }
        UNION
        {
            service <https://query.wikidata.org/sparql>
            {
                SELECT DISTINCT ?s
                WHERE {?s ?p ?o}
                LIMIT 4
                OFFSET 4
            }
        }
    }
"""
res = graph.query(query_str)
assert len(res) <= 8
list(res)
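
One possible workaround, which sidesteps the UNION rather than fixing it, is to run each
SERVICE query separately and combine the rows in Python. The sketch below shows only the
merging step. `merge_results` is a hypothetical helper, and the tuples stand in for the
rows that `list(graph.query(...))` would return for each query.

```python
from typing import Hashable, Iterable, List


def merge_results(*result_sets: Iterable[Hashable]) -> List[Hashable]:
    """Concatenate several result sets, dropping duplicates but keeping order."""
    merged = {}
    for rows in result_sets:
        for row in rows:
            # A dict keeps insertion order, so the first occurrence wins.
            merged.setdefault(row, None)
    return list(merged)


# Stand-ins for the rows of two separately executed queries (made-up values).
first_page = [("wd:Q1",), ("wd:Q2",)]
second_page = [("wd:Q2",), ("wd:Q3",)]

combined = merge_results(first_page, second_page)
print(combined)
```

This relies on the rows being hashable, which holds for plain tuples. We assume it also
holds for the row objects that rdflib returns.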

### Broken Example: Nested Service Calls using rdflib

rdflib does not currently support nested service calls. The following is an example of
what NOT to do when querying both an in-memory graph and a remote graph: one SERVICE
block is placed inside another. This results in a
`RecursionError: maximum recursion depth exceeded`:

In [None]:
query_str = """
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX bd: <http://www.bigdata.com/rdf#>

SELECT ?item1 ?item2
WHERE {
    
    service <https://query.wikidata.org/sparql> 
    {
        SELECT ?item1 ?item2
        WHERE 
        {
        
          ?item1 wdt:P31 wd:Q146 .
          
          service <https://query.wikidata.org/sparql>
          { 
            SELECT ?item2
            WHERE {
                ?item2 wdt:P31 wd:Q146 .
            }
            LIMIT 10
          }
        } 
        LIMIT 10
    }
}

"""