## SPARQLQueryFramer and Metaclass Values

One of the challenges of working with RDF is passing new values into a query string as
bindings. A pythonic way to specify values is safer than re-engineering the query each
time the values change. This notebook demonstrates the use of a `metaclass` together
with the `ipyradiant` `SPARQLQueryFramer` class to support (semi-)pythonic
specification of SPARQL `VALUES` blocks.

In [None]:
from rdflib import URIRef

### Load an example RDF graph

In this example, we will use the `ipyradiant` `FileManager`.

In [None]:
from ipyradiant import FileManager, PathLoader

lw = FileManager(loader=PathLoader(path="data"))
# Hard-code the file for this example; ideally a user would choose the file to work with.
lw.loader.file_picker.value = lw.loader.file_picker.options["starwars.ttl"]
lw.loader.file_picker.disabled = True  # disabling for the example
graph = lw.graph  # convenience variable
lw

### Example Query with `rdflib`

One common way to perform a SPARQL query in python is using the `query` method on a
`rdflib.graph.Graph`. This requires that we specify namespaces and bindings at runtime,
or within the query body itself.

In [None]:
qres = graph.query(
    """
    SELECT DISTINCT ?subject ?label
    WHERE {
        ?subject a ?type ;
            rdfs:label ?label .
    }
    LIMIT 3
    """,
    initBindings={"type": URIRef("https://swapi.co/vocabulary/Character")},
)
list(qres)

> Note that `initBindings` is specified at runtime. The query evaluates to the same
> result as the following query string:

```sparql
SELECT DISTINCT ?subject ?label
WHERE {
    ?subject a <https://swapi.co/vocabulary/Character> ;
        rdfs:label ?label .
}
LIMIT 3
```

which is the same as using the shorthand prefix and defining the prefix within the
SPARQL string:

```sparql
PREFIX voc: <https://swapi.co/vocabulary/>
SELECT DISTINCT ?subject ?label
WHERE {
    ?subject a voc:Character ;
        rdfs:label ?label .
}
LIMIT 3
```

Now, imagine we wanted to run this query many times with different bindings for `type`.
We would have to replicate a lot of code, or fall back on f-string formatted SPARQL
queries. What we need is a common class pattern for defining SPARQL queries and tracking
variables such as the SPARQL string, the bindings, and the namespaces.
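
To make the duplication concrete, the sketch below shows what the string-formatting
fallback might look like for two types. It only builds query strings and does not
execute anything. The names `QUERY_TEMPLATE` and `query_for_type` are hypothetical,
introduced here purely for illustration.

```python
# Hypothetical names; this only builds strings and runs no query.
# Literal SPARQL braces must be doubled so that str.format leaves them alone.
QUERY_TEMPLATE = """
SELECT DISTINCT ?subject ?label
WHERE {{
    ?subject a <{type_iri}> ;
        rdfs:label ?label .
}}
LIMIT 3
"""


def query_for_type(type_iri: str) -> str:
    """Return the query text with the type IRI pasted into the template."""
    return QUERY_TEMPLATE.format(type_iri=type_iri)


type_iris = [
    "https://swapi.co/vocabulary/Character",
    "https://swapi.co/vocabulary/Film",
]
queries = [query_for_type(iri) for iri in type_iris]
print(queries[1])
```

Every literal brace has to be escaped and the caller is responsible for valid IRI
syntax, which is the kind of bookkeeping a query class can take over.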

### Example Query with `SPARQLQueryFramer`

The `SPARQLQueryFramer` has a number of useful capabilities. Building upon the example
above, the `SPARQLQueryFramer` class allows us to build new sub-classes for SPARQL
queries that can neatly maintain the query, namespaces, bindings, etc.

The following is an example of the query shown above using the framer class:

In [None]:
from ipyradiant.query.framer import SPARQLQueryFramer


class CharacterLabels(SPARQLQueryFramer):
    sparql = """
    SELECT DISTINCT ?subject ?label
    WHERE {
        ?subject a voc:Character ;
            rdfs:label ?label .
    }
    LIMIT 3
    """
    initNs = {"voc": "https://swapi.co/vocabulary/"}

We can easily run this query on any number of graphs by using the classmethod
`run_query`. This accepts a `rdflib.graph.Graph` as input, and returns a
`pandas.DataFrame` with inferred columns (which can be manually specified).

In [None]:
CharacterLabels.run_query(graph)

We can still use the `rdflib.graph.Graph.query` kwargs at runtime. For example, if we
knew which subject we wanted to query for, we could use the `initBindings` kwarg:

In [None]:
CharacterLabels.run_query(
    graph, initBindings={"subject": URIRef("https://swapi.co/resource/mirialan/64")}
)

Alternately, we could use the binding variable as the kwarg key directly:

In [None]:
CharacterLabels.run_query(
    graph, subject=URIRef("https://swapi.co/resource/mirialan/64")
)

> Note: take care when using the binding variables as kwargs. Names of python built-ins
> (e.g. `type`, `id`) will cause problems.

Now imagine we want to run this query for multiple `type`s (e.g. `voc:Character` and
`voc:Film`). How would we do this?

### Querying using `VALUES` block

SPARQL provides a built-in `VALUES` clause that allows us to specify multiple values for
a variable within a SPARQL string. The following basic query extends the previous
example to return subjects of both `voc:Character` and `voc:Film` types (i.e.
`rdf:type`).

In [None]:
qres = graph.query(
    """
    PREFIX voc: <https://swapi.co/vocabulary/>
    SELECT DISTINCT ?subject ?type
    WHERE {
        ?subject a ?type .
        
        VALUES (?type) {
            (voc:Character)
            (voc:Film)
        }
    }
    ORDER BY DESC(?type)  # so that we can see film and character subjects
    LIMIT 10
    """
)
list(qres)

If we tried to create a query class using `SPARQLQueryFramer` (or another
implementation), we would have to know the `VALUES` before defining the query string.
This poses a problem for many applications.

Imagine two queries. The first returns subjects of a particular type (like our example
above), and the second uses the returned subjects to query for specific information. The
second query would require the results of the first, which would prevent us from being
able to define the query class up front (we don't know the `VALUES` a priori).

We could define a class with an unformatted `VALUES` block, but this would require us to
get the VALUES syntax correct each time, and would make our query class clunky (when do
we tell the class to format?). Instead, we can use a python metaclass to achieve the
same effect, while providing a (slightly?) simpler interface.
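
As a rough illustration of what "getting the `VALUES` syntax correct" involves, the
sketch below renders a dictionary of variable names and IRIs into a `VALUES` block. The
helper `format_values_block` is hypothetical and is not the `ipyradiant`
implementation. It assumes every value is an IRI string and that all lists have the
same length.

```python
from typing import Dict, List


def format_values_block(values: Dict[str, List[str]]) -> str:
    """Render {"var": [iri, ...]} as a SPARQL VALUES block (IRIs only)."""
    if not values:
        raise ValueError("at least one variable is required")
    lengths = {len(items) for items in values.values()}
    if len(lengths) != 1:
        raise ValueError("every variable needs the same number of values")

    header = " ".join(f"?{name}" for name in values)
    # zip(*...) turns the per-variable columns into one tuple per row.
    rows = [
        "(" + " ".join(f"<{iri}>" for iri in row) + ")"
        for row in zip(*values.values())
    ]
    body = "\n".join(f"    {row}" for row in rows)
    return f"VALUES ({header}) {{\n{body}\n}}"


block = format_values_block(
    {
        "type": [
            "https://swapi.co/vocabulary/Character",
            "https://swapi.co/vocabulary/Film",
        ]
    }
)
print(block)
```

Even this small helper has to handle the variable header, row tuples, and brace
escaping, which is the detail we would rather not repeat in every query class.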

### Querying using `SPARQLQueryFramer` with `metaclass`

If you don't know about python metaclasses, I encourage you to review the
[official python docs](https://docs.python.org/3/reference/datamodel.html#metaclasses)
and
[this great overview](https://jeffknupp.com/blog/2013/12/28/improve-your-python-metaclasses-and-dynamic-classes-with-type/).

Essentially, `SPARQLQueryFramer` uses `metaclass` and `@property` to allow us to define
`VALUES` pythonically and dynamically. The internals are fairly advanced, so we will
simplify by showing a basic example.

The first step is to define a `metaclass` with an unformatted `VALUES` block. This
`metaclass` will contain the `_sparql` string, and will maintain the values that will be
used to format it.

In [None]:
from ipyradiant.query.framer import build_values


class MetaSubjectOfType(type):
    """Metaclass to query for type and label for specific VALUES."""

    _sparql = """
        SELECT DISTINCT ?subject ?type
        WHERE {{
            ?subject a ?type .

            VALUES ({}) {{
                {}
            }}
        }}
        ORDER BY DESC(?type)
        LIMIT 10
    """
    values = None

    @property
    def sparql(cls):
        return build_values(cls._sparql, cls.values)

> Note 1: IMPORTANT: the metaclass attribute that holds the template is `_sparql`, not
> `sparql`. The name `sparql` is taken by the `@property`.

> Note 2: The `metaclass` is not subclassed from `SPARQLQueryFramer`, so we cannot run
> the query directly.

> Note 3: If this seems like a complicated way to still end up f-string formatting the
> sparql string, you are right! There are a number of other underlying capabilities that
> further motivate the requirement for using a python metaclass here. If you have a
> potential solution that you think avoids the need for metaclasses, please
> [submit an issue/PR](https://github.com/jupyrdf/ipyradiant/issues).
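
The notes above rest on one piece of python behavior: a `@property` defined on a
metaclass is evaluated on every access to the class attribute, so it always reflects
the current state of the class. The toy below shows only that mechanism, using plain
string formatting. `MetaGreeting` and `Greeting` are made-up names, and none of this is
`ipyradiant` code.

```python
class MetaGreeting(type):
    """Toy metaclass: `text` is computed from `_template` and `name`."""

    _template = "Hello, {}!"
    name = None

    @property
    def text(cls):
        # Runs on every `SomeClass.text` access, so it sees the current `name`.
        if cls.name is None:
            raise AssertionError("set `name` before reading `text`")
        return cls._template.format(cls.name)


class Greeting(metaclass=MetaGreeting):
    name = "Luke"


first = Greeting.text  # computed from the class attribute set above
Greeting.name = "Leia"  # reassigning the class attribute...
second = Greeting.text  # ...changes what the property returns
print(first, second)
```

If the template were also stored under the name `text`, the `@property` definition
would replace it in the metaclass namespace, which is why the raw string lives under a
separate, underscore-prefixed name.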

We can now define a barebones `SPARQLQueryFramer` class.

In [None]:
class SubjectsOfType(SPARQLQueryFramer, metaclass=MetaSubjectOfType):
    values = None  # Note, we could have passed a dictionary here if we wanted to

We still can't run the query because we did not define a set of values.

In [None]:
try:
    SubjectsOfType.run_query(graph)
except AssertionError as e:
    print("Unable to run without specifying VALUES. See error below:")
    print(f"AssertionError: {e}")

We could easily set a default `values` for the class (if applicable). You can refer to
the docs for the `values` schema.

In [None]:
class SubjectsOfType(SPARQLQueryFramer, metaclass=MetaSubjectOfType):
    values = {"type": [URIRef("https://swapi.co/vocabulary/Character")]}


SubjectsOfType.run_query(graph)

And now we can pythonically update `VALUES` (e.g. to include `voc:Film`) for the class
as needed. Simple as that!

In [None]:
# We can use rdflib.namespace.Namespace here too!
from rdflib.namespace import Namespace

VOC = Namespace("https://swapi.co/vocabulary/")

SubjectsOfType.values = {"type": [VOC.Character, VOC.Film]}
SubjectsOfType.run_query(graph)

### Conclusion

`SPARQLQueryFramer` and `metaclass` provide a flexible way to build SPARQL queries in
python. This allows us to dynamically specify queries from reusable query classes.

There are a lot of other cool ways to use `SPARQLQueryFramer` classes. Check out usages
in the `ipyradiant` source code
[here](https://github.com/jupyrdf/ipyradiant/search?q=SPARQLQueryFramer).