# RDF to NetworkX Custom Queries

The `RDF2NX` transformation class included in `ipyradiant` provides default
queries that define generic transformation behavior
([check out the example here](RDF_to_NX.ipynb)). However, there are many cases where
custom transformation behavior is needed. This example demonstrates a few ways to
customize the transformation class by overriding those default queries.

### Load an RDF graph

In this example, we will use the `ipyradiant` `FileManager`.

In [None]:
from ipyradiant import FileManager, PathLoader

lw = FileManager(loader=PathLoader(path="data"))
# Here we hard-code the file to load; ideally a user would choose a file to work with.
lw.loader.file_picker.value = lw.loader.file_picker.options["starwars.ttl"]
lw

## Example Process (Custom)

This custom transformation process has two major parts. Given an RDF graph containing
data on Star Wars characters, ships, planets, etc., we want to:

1. Generate a labeled property graph (LPG) with only a few select subject nodes
2. Create a shorthand node property based on a one-step removed path

The requirements above are only a few of the (near infinite) ways that the
transformation behavior can be customized. While the current version of `ipyradiant`
requires users to understand SPARQL in order to define custom behavior, future versions
may include utility tools to support custom transformation without extensive RDF/SPARQL
experience.

### 1. Return only select nodes during the transformation

The `RDF2NX` class defines an attribute for each query class, and each query class
captures a specific aspect of the transformation behavior. The nodes that are converted
during the process are defined by the `RDF2NX.node_iris` attribute. The default class
`ipyradiant.rdf2nx.nodes.NodeIRIs` returns all IRIs for subjects in the RDF graph.

The following custom query class (subclass of `SPARQLQueryFramer`) is designed to return
only IRIs that represent one of three types from the RDF graph:

- Humans
- Starships
- Gungans

We will define a static `VALUES` statement in the SPARQL query to capture this
requirement.

In [None]:
from ipyradiant.query.framer import SPARQLQueryFramer


class HumanAndStarshipIRIs(SPARQLQueryFramer):
    sparql = """
    PREFIX voc: <https://swapi.co/vocabulary/>

    SELECT DISTINCT ?iri
    WHERE {
      ?iri a ?type .
      
      VALUES (?type) {
          (voc:Human)
          (voc:Starship)
          (voc:Gungan)
      }
    }
    """

In [None]:
# Simple execution to verify query is working
HumanAndStarshipIRIs.run_query(lw.graph).head(3)
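
The `VALUES` block in the query above is written by hand. As a minimal sketch, the same
block could be generated from a Python list when the set of types changes often. The
helper name `build_values_clause` is hypothetical and not part of `ipyradiant`, and the
terms are inserted as-is, so this should only be used with trusted prefixed names.

```python
def build_values_clause(variable, terms):
    """Return a SPARQL VALUES block binding `variable` to each prefixed term."""
    rows = "\n".join(f"      ({term})" for term in terms)
    return f"VALUES (?{variable}) {{\n{rows}\n}}"


# Hypothetical usage: the same three types as the static query above
type_terms = ["voc:Human", "voc:Starship", "voc:Gungan"]
values_clause = build_values_clause("type", type_terms)

# Doubled braces are literal braces inside an f-string
sparql = f"""
PREFIX voc: <https://swapi.co/vocabulary/>

SELECT DISTINCT ?iri
WHERE {{
  ?iri a ?type .
  {values_clause}
}}
"""
print(sparql)
```

The resulting string could then be assigned to the `sparql` attribute of a query class
in the same way as the static version.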

### 2. Return data from neighboring node as custom node attribute

The `RDF2NX` class defines an attribute for each query class, and each query class
captures a specific aspect of the transformation behavior. The query for determining
data properties stored on nodes within the LPG is defined by the
`RDF2NX.node_properties` attribute. The default class
`ipyradiant.rdf2nx.nodes.NodeProperties` returns all values for all predicates attached
to the source node in the RDF graph (this includes literal and IRI values).

There are many situations where data that is relevant to some node is stored on a
separate node object several edges away from the source. When visualizing a graph, it
may be valuable to bring that data forward and present it on the node of interest.

The following custom query class (subclass of `SPARQLQueryFramer`) is designed to create
a new attribute. For nodes that are connected to a Starship (via `voc:starship`), the
custom query class will create a new attribute (`ex:starshipName`) that returns the name
of the connected Starship as a data attribute on the node itself (e.g., the pilot of a
Starship will now have data for the Starship names rather than only a connection to
their Starships). See the image below for a visual example:

<img src="assets/RDF2NX_1.png"></img>

> Note: We don't want to replace the original `NodeProperties` query. Rather, we want to
> add to the results of the query. Therefore, we need to specify both as valid queries
> for the converter class. Notice how the `RDF2NX.node_properties` attribute (specified
> further down) is a list of queries, the results of which are aggregated together.

In [None]:
class CustomNodeProperty(SPARQLQueryFramer):
    # Note: construct queries are valid too (must specify columns)
    sparql = """
    CONSTRUCT{
      ?iri ?predicate ?value.
    } WHERE {
      ?iri voc:starship/rdfs:label ?value .
           
      BIND (ex:starshipName AS ?predicate)
    }
    """
    # Note: we can specify the namespaces on the query class too
    initNs = {"ex": "https://www.example.org/", "voc": "https://swapi.co/vocabulary/"}
    columns = ["iri", "predicate", "value"]

#### Verify the custom query is working

It is common to make mistakes that result in namespace/parse/etc. errors within SPARQL
queries. Since many of these are silent errors within larger processes (e.g. `RDF2NX`),
we should verify that our query will return valid results when passed one of the nodes
of interest.

In [None]:
from rdflib import URIRef

CustomNodeProperty.run_query(lw.graph, iri=URIRef("https://swapi.co/resource/human/1"))

> Note: we can see that the query successfully returns the `voc:starship/rdfs:label` as
> `ex:starshipName` (i.e. the predicate)
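
To make the two-hop lookup concrete, here is a plain-Python sketch of what the property
path `voc:starship/rdfs:label` does conceptually. It does not use `rdflib` or
`ipyradiant`. The triples and the `follow_path` helper are made up for illustration
only.

```python
# Toy triples (subject, predicate, object); the names are illustrative only
triples = [
    ("res:luke", "voc:starship", "res:xwing"),
    ("res:luke", "voc:starship", "res:shuttle"),
    ("res:xwing", "rdfs:label", "X-wing"),
    ("res:shuttle", "rdfs:label", "Imperial shuttle"),
    ("res:leia", "rdfs:label", "Leia Organa"),
]


def follow_path(triples, first, second, new_predicate):
    """Emulate `?iri first/second ?value`, relabeling matches with `new_predicate`."""
    results = []
    for subj, pred, mid in triples:
        if pred != first:
            continue
        # Second hop: look up `second` on the intermediate node
        for subj2, pred2, value in triples:
            if subj2 == mid and pred2 == second:
                results.append((subj, new_predicate, value))
    return results


rows = follow_path(triples, "voc:starship", "rdfs:label", "ex:starshipName")
for row in rows:
    print(row)
```

Only `res:luke` appears in the output, because it is the only subject with a
`voc:starship` edge. The intermediate starship nodes are skipped over, and their labels
end up attached directly to the pilot.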

### Execute the `RDF2NX` process

In [None]:
from ipyradiant.rdf2nx import RDF2NX

# We must import the original query in order to add to the transformation class
from ipyradiant.rdf2nx.nodes import NodeProperties

# Namespaces defined for shortened URIs
initNs = {
    "ex": "https://www.example.org/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "res": "https://swapi.co/resource/",
    "voc": "https://swapi.co/vocabulary/",
    "base": "https://swapi.co/resource/",
}

# overwrite the default node_iris query
RDF2NX.node_iris = HumanAndStarshipIRIs

# overwrite the default node_properties query with two queries
RDF2NX.node_properties = [NodeProperties, CustomNodeProperty]

# run the converter
nx_graph = RDF2NX.convert(lw.graph, namespaces=initNs)
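
The list assigned to `RDF2NX.node_properties` above is described as having its query
results aggregated together. The following is a rough sketch of what such an aggregation
might look like in plain Python. It is not `ipyradiant`'s actual implementation, and the
rows and the `aggregate_properties` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical (iri, predicate, value) rows, as two separate queries might return them
default_rows = [
    ("res:luke", "rdfs:label", "Luke Skywalker"),
    ("res:luke", "voc:eyeColor", "blue"),
]
custom_rows = [
    ("res:luke", "ex:starshipName", "X-wing"),
    ("res:luke", "ex:starshipName", "Imperial shuttle"),
]


def aggregate_properties(*result_sets):
    """Merge rows from several queries into {iri: {predicate: [values]}}."""
    merged = defaultdict(lambda: defaultdict(list))
    for rows in result_sets:
        for iri, predicate, value in rows:
            merged[iri][predicate].append(value)
    # Convert to plain dicts for easier inspection
    return {iri: dict(props) for iri, props in merged.items()}


node_data = aggregate_properties(default_rows, custom_rows)
print(node_data["res:luke"])
```

Collecting values in lists keeps multi-valued predicates (such as several starship
names) intact instead of letting a later row overwrite an earlier one.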

#### Post-process the networkx graph

We don't want to visualize nodes without connections. We can solve this problem using
built-in `networkx` capabilities (now that our graph is an LPG).

> Note: in a future version of `ipyradiant`, this will be encapsulated in the behavior
> of the `CytoscapeViewer` widget.

In [None]:
import networkx as nx

nx_graph.remove_nodes_from(list(nx.isolates(nx_graph)))
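
As a small self-contained illustration of the same call on a toy graph (the node names
are made up), `nx.isolates` yields every node with no edges, which can then be removed:

```python
import networkx as nx

# Toy graph: two connected nodes and one node with no edges
toy = nx.Graph()
toy.add_edge("luke", "x-wing")
toy.add_node("lonely-droid")

# Materialize the isolates first: removing nodes while iterating over
# the nx.isolates generator would mutate the graph mid-iteration
isolated = list(nx.isolates(toy))
toy.remove_nodes_from(isolated)

print(isolated)
print(sorted(toy.nodes))
```

The `list(...)` wrapper is the important detail, since `nx.isolates` returns a lazy
iterator over the graph being modified.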

## Example Graph Visualization with Cytoscape

The purpose of creating these projections is often to generate a view that is easier for
humans to interpret (or to leverage graph algorithms for LPGs). The following cells
demonstrate how the `ipyradiant` visualization widget `InteractiveViewer` can be used to
visualize and inspect an LPG.

> Note: the option to remove disconnected nodes will be added as part of ticket
> [#110](https://github.com/jupyrdf/ipyradiant/issues/110)

In [None]:
from ipyradiant.visualization import InteractiveViewer

iv = InteractiveViewer()
iv._rdf_converter = RDF2NX
iv.rdf_graph = lw.graph
iv

> Note: all nodes are of at least one type specified by the custom `RDF2NX.node_iris`
> query.

> Note: any character node with starships has a data attribute `ex:starshipName` that
> was created by the custom `RDF2NX.node_properties` query.