## RDF to Networkx (an LPG)

This process helps reduce data complexity by transforming an RDF graph into a networkx
labeled property graph (LPG). Unlike similar tools
(`rdflib.extras.external_graph_libs.rdflib_to_networkx_multidigraph`), this process
allows properties to be collapsed onto LPG nodes. This reduces visual complexity and
can improve the performance of graph algorithms such as distance calculations, because
the resulting graph has fewer nodes.
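To make the node-count argument concrete, here is a toy sketch in plain Python (no rdflib or networkx). The triples, the `ex:` names, and the convention that a leading `"` marks a literal are all hypothetical. It compares a naive mapping that creates one node per distinct subject or object (roughly what a direct triple-to-edge conversion does) with a collapsed mapping that stores literals as node properties and `rdf:type` objects as node types. It illustrates the idea only and is not how `ipyradiant` implements `rdf2nx`.

```python
RDF_TYPE = "rdf:type"

# Hypothetical toy triples; a leading '"' marks a literal value.
triples = [
    ("ex:luke", RDF_TYPE, "ex:Human"),
    ("ex:luke", "rdfs:label", '"Luke Skywalker"'),
    ("ex:luke", "ex:height", '"172"'),
    ("ex:vader", RDF_TYPE, "ex:Human"),
    ("ex:vader", "rdfs:label", '"Darth Vader"'),
    ("ex:luke", "ex:father", "ex:vader"),
]


def is_literal(term: str) -> bool:
    return term.startswith('"')


def naive_nodes(triples):
    """One node per distinct subject or object, literals and classes included."""
    return {s for s, _, _ in triples} | {o for _, _, o in triples}


def collapsed_graph(triples):
    """Literals become node properties and rdf:type objects become node types."""
    nodes, edges = {}, []
    for s, p, o in triples:
        node = nodes.setdefault(s, {"types": set(), "props": {}})
        if p == RDF_TYPE:
            node["types"].add(o)
        elif is_literal(o):
            node["props"][p] = o.strip('"')  # a later value overwrites an earlier one
        else:
            nodes.setdefault(o, {"types": set(), "props": {}})
            edges.append((s, p, o))
    return nodes, edges


nodes, edges = collapsed_graph(triples)
print("naive nodes:", len(naive_nodes(triples)))  # 6
print("collapsed nodes:", len(nodes), "edges:", len(edges))  # 2 nodes, 1 edge
```

With the same six triples, the naive graph has six nodes while the collapsed graph has two, so traversal-based algorithms have less to visit.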

The process is designed so that the behavior of the transformation can be customized.
This work is still in progress, so bugs and unexpected behavior may occur.

In [None]:
import rdflib

### Load an RDF graph

In this example, we will use the `ipyradiant` `FileManager`.

In [None]:
from ipyradiant import FileManager, PathLoader

lw = FileManager(loader=PathLoader(path="data"))
# Hard-code the file selection here; interactively, a user would pick a file from the widget.
lw.loader.file_picker.value = lw.loader.file_picker.options["starwars.ttl"]
lw

Create a simple graph with just data about character `Luke Skywalker`, character
`Darth Vader`, and film `A New Hope` (to reduce complexity).

> Note: Cytoscape cannot visualize large graphs, so they will not work with the
> visualization portion of this example.

In [None]:
qres = lw.graph.query(
    """
    PREFIX hum: <https://swapi.co/resource/human/>
    PREFIX film: <https://swapi.co/resource/film/>
    
    CONSTRUCT {
        ?s ?p ?o .
    }
    WHERE {
        ?s ?p ?o .
        
        VALUES (?s) {
            (hum:1)  # Luke
            (hum:4)  # Vader
            (film:1) # A New Hope
        }
    }
    """
)

# Parse the CONSTRUCT result (serialized as RDF/XML) into a standalone graph
simple_graph = rdflib.graph.Graph().parse(data=qres.serialize(format="xml"), format="xml")
print("# triples in our simple graph:", len(simple_graph))

### URI Converters (configurable)

[id_converter_link]:
  https://github.com/Rothamsted/rdf2neo/blob/master/rdf2neo/src/main/java/uk/ac/rothamsted/rdf/neo4j/idconvert/DefaultIri2IdConverter.java

These converters allow us to simplify the representation of URIs in the networkx LPG.
They can be configured to perform custom conversion within the larger `rdf2nx` process.

[Adapted from this KnetMiner `rdf2neo` (RDF to neo4j) process.][id_converter_link]

For more examples, see [this notebook.](URI_Converter_Examples.ipynb)
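The exact output of `URItoID` and `URItoShortID` is defined by `ipyradiant`. As a rough, hypothetical illustration of the two strategies, the plain-Python functions below (`uri_to_id` and `uri_to_short_id` are names made up for this sketch) keep either the local name of a URI or a `prefix:local` form built from a namespace map.

```python
import re

# Hypothetical namespace map, in the same shape as `ns` in the cell below
NS = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "res": "https://swapi.co/resource/",
}
HUMAN_1 = "https://swapi.co/resource/human/1"
FILM_1 = "https://swapi.co/resource/film/1"


def uri_to_id(uri: str) -> str:
    """Keep only the local name: the text after the last '#' or '/'."""
    return re.split(r"[#/]", uri.rstrip("#/"))[-1]


def uri_to_short_id(uri: str, ns: dict) -> str:
    """Replace the longest matching namespace with its prefix ('prefix:local')."""
    matches = [(prefix, base) for prefix, base in ns.items() if uri.startswith(base)]
    if not matches:
        return uri  # no known namespace: keep the full URI to avoid collisions
    prefix, base = max(matches, key=lambda pair: len(pair[1]))
    return f"{prefix}:{uri[len(base):]}"


for uri in (HUMAN_1, FILM_1):
    print(uri, "->", uri_to_id(uri), "|", uri_to_short_id(uri, NS))
```

Both example resources have the local name `1`, so a local-name-only converter would merge them; a namespace-aware short ID keeps them apart (`res:human/1` vs. `res:film/1`).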

In [None]:
from rdflib.namespace import RDFS

from ipyradiant.rdf2nx import URItoID, URItoShortID

uri = RDFS.label
ns = {"rdfs": str(RDFS)}
print(f"URItoID:\n  -  {uri} -> {URItoID(uri)}")
print("URItoShortID w/ specified namespace:")
print(f"  -  prefix: namespace = rdfs: {RDFS}")
print(f"  -  {uri} -> {URItoShortID(uri, ns=ns)}")

## Queries to facilitate RDF -> LPG (networkx)

[rdf2neo_link]: https://github.com/Rothamsted/rdf2neo/blob/master/README.md

[Adapted from this KnetMiner `rdf2neo` (RDF to neo4j) process.][rdf2neo_link]

## `rdf2nx` Node Queries

Node queries directly enable the creation of LPG nodes and their properties.

1. NodeIRIs:

- A SPARQL query that lists the IRIs of all RDF resources that represent a node.
- Typically returns instances of target classes, but may also capture resources of
  interest by targeting the subjects or objects of given relations.
- It is <b>very important</b> that the query returns <u>distinct</u> results.

2. NodeTypes (<i>label</i> in neo4j):

- Invoked once for each IRI returned by `NodeIRIs`, and parameterized over that single
  node.
- Lists all the types to be assigned to the node.
- A type can be an IRI, a literal, or a string. IRIs are translated into identifiers via
  the configured `URItoID` converter.

3. NodeProperties:

- Invoked once per node (`?iri` bound to a single node).
- Returns all predicate/value pairs to be assigned to the LPG node as properties.
- Every node must have an `iri` property so that RDF-defined relations can be processed.
  This property is always indexed and has distinct values.
- Every node has a default type (`label` in neo4j).
  - In future versions, the predefined value can be changed by configuring a
    `defaultNodeLabel`.
  - It is used to find specific nodes.
- Literal values are converted (e.g., RDF numbers to Python numbers); this will become a
  configuration option in a future version.
- Names are typically converted to shorthand IDs using the configured `URItoID`
  converter.
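To show how the three node queries fit together, here is a plain-Python sketch that imitates them over an in-memory list of triples instead of running SPARQL. The `example.org` data, the `Lit` wrapper, the `to_id` helper, and the rule "a node is any resource with an `rdf:type`" are assumptions made for this illustration; the default queries in `ipyradiant` may select nodes and properties differently.

```python
import re
from dataclasses import dataclass

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
EX = "http://example.org/"  # hypothetical namespace for this sketch


@dataclass(frozen=True)
class Lit:
    """Stand-in for an RDF literal (rdflib has its own Literal class)."""
    value: object


TRIPLES = [
    (EX + "luke", RDF_TYPE, EX + "Human"),
    (EX + "luke", RDF_TYPE, EX + "Character"),
    (EX + "luke", RDFS_LABEL, Lit("Luke Skywalker")),
    (EX + "luke", EX + "height", Lit(172)),
    (EX + "luke", EX + "film", EX + "aNewHope"),
    (EX + "aNewHope", RDF_TYPE, EX + "Film"),
    (EX + "aNewHope", RDFS_LABEL, Lit("A New Hope")),
]


def to_id(uri):
    """URItoID-like conversion: keep the text after the last '#' or '/'."""
    return re.split(r"[#/]", uri)[-1]


def node_iris(triples):
    """NodeIRIs: distinct typed resources (the set plays the role of DISTINCT)."""
    return sorted({s for s, p, _ in triples if p == RDF_TYPE})


def node_types(triples, iri):
    """NodeTypes: all types of a single node, converted to IDs."""
    return sorted(to_id(o) for s, p, o in triples if s == iri and p == RDF_TYPE)


def node_properties(triples, iri):
    """NodeProperties: literal-valued pairs for a single node, plus 'iri'."""
    props = {"iri": iri}
    for s, p, o in triples:
        if s == iri and isinstance(o, Lit):
            props[to_id(p)] = o.value
    return props


lpg_nodes = {
    iri: {"types": node_types(TRIPLES, iri), **node_properties(TRIPLES, iri)}
    for iri in node_iris(TRIPLES)
}
print(lpg_nodes[EX + "luke"])
```

The IRI-valued triple `luke ex:film aNewHope` is deliberately ignored here; it belongs to the relation queries rather than to the node properties.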

## `rdf2nx` Property Queries

[reification_link]: https://www.w3.org/wiki/RdfReification

As with nodes, `rdf2nx` first needs a list of the relations to be created. Relations must
refer to the nodes they link by means of the node URIs (mapped earlier via the `iri`
property).

Relation queries must always return certain variables after the SELECT keyword. Among
these is the relation URI, which has to be computed for plain (non-reified) triples too.

As with nodes, relation URIs (i.e., `?iri`) are needed by `rdf2nx` to look up each
relation's properties with the relation property query. They are also a good way to keep
track of multiple statements about the same subject/predicate/object.
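A plain triple has no IRI of its own, so one must be computed. A minimal sketch of one possible scheme, assuming a hash of the triple is acceptable as a stable identifier: `plain_relation_iri`, `plain_relations`, and the `http://example.org/rel/` base are hypothetical, and `ipyradiant` may build its relation IRIs differently.

```python
import hashlib

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"            # hypothetical data namespace
REL_BASE = "http://example.org/rel/"  # hypothetical base for computed relation IRIs


def plain_relation_iri(s, p, o):
    """Derive a stable relation IRI for a plain (non-reified) triple."""
    # IRIs cannot contain newlines, so joining on a newline keeps the key unambiguous.
    key = "\n".join((s, p, o)).encode("utf-8")
    return REL_BASE + hashlib.sha1(key).hexdigest()


def plain_relations(triples, node_iris):
    """Keep triples whose subject and object are both LPG nodes."""
    return [
        {"iri": plain_relation_iri(s, p, o), "type": p, "from": s, "to": o}
        for s, p, o in triples
        if s in node_iris and o in node_iris
    ]


triples = [
    (EX + "luke", EX + "father", EX + "vader"),
    (EX + "luke", EX + "film", EX + "aNewHope"),
    (EX + "luke", RDF_TYPE, EX + "Human"),  # ex:Human is not a node, so no relation
]
rels = plain_relations(triples, {EX + "luke", EX + "vader", EX + "aNewHope"})
for rel in rels:
    print(rel["type"], rel["iri"])
```

Because the IRI is derived from the triple itself, converting the same graph twice yields the same relation IRIs, which keeps later property lookups consistent.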

1. RelationTypes

- Relation types are based on triple predicates (e.g., `ex:birthPlace`).
- Returns a list of relations to be created in the LPG.
- A fictitious IRI is created for plain relations. This is used to uniquely identify
  specific relationship instances.

2. ReifiedRelations

- Similar to the `RelationTypes`, but collects the same information for the [RDF
  reification pattern][reification_link].

3. RelationProperties

> Note: once reified relationships are selected with the query above, a simple
> relationship property query is used to get additional properties of the relationship.
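For the reified case, a hypothetical plain-Python sketch of `ReifiedRelations` followed by `RelationProperties`: each `rdf:Statement` becomes one relation whose IRI is assumed to be the statement's own IRI, and every predicate on the statement other than `rdf:type`, `rdf:subject`, `rdf:predicate`, and `rdf:object` is treated as a relation property. The data and function names are made up for illustration.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/"  # hypothetical data namespace

# One reified statement: "luke ex:father vader", annotated with a confidence value.
TRIPLES = [
    (EX + "stmt1", RDF + "type", RDF + "Statement"),
    (EX + "stmt1", RDF + "subject", EX + "luke"),
    (EX + "stmt1", RDF + "predicate", EX + "father"),
    (EX + "stmt1", RDF + "object", EX + "vader"),
    (EX + "stmt1", EX + "confidence", 0.9),
]

# Map the reification vocabulary onto the relation fields
REIFICATION_KEYS = {RDF + "subject": "from", RDF + "predicate": "type", RDF + "object": "to"}


def reified_relations(triples):
    """ReifiedRelations-like: one relation per rdf:Statement, keyed by its IRI."""
    statements = {s for s, p, o in triples if p == RDF + "type" and o == RDF + "Statement"}
    rels = {}
    for s, p, o in triples:
        if s in statements and p in REIFICATION_KEYS:
            rels.setdefault(s, {"iri": s})[REIFICATION_KEYS[p]] = o
    return rels


def relation_properties(triples, rel_iri):
    """RelationProperties-like: every other predicate on the statement."""
    skip = {RDF + "type", *REIFICATION_KEYS}
    return {p: o for s, p, o in triples if s == rel_iri and p not in skip}


for iri, rel in reified_relations(TRIPLES).items():
    print(rel, relation_properties(TRIPLES, iri))
```

Since each statement keeps its own IRI, two statements asserting the same subject/predicate/object with different annotations remain two separate relations.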

## Example Process (Vanilla)

No custom logic is applied. This uses the default conversion queries built into
`ipyradiant`.

For examples of how to apply custom logic, see [this notebook](RDF_to_NX_Custom.ipynb).

In [None]:
from ipyradiant.rdf2nx import RDF2NX

In [None]:
# Namespaces defined for shortened URIs
initNs = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "res": "https://swapi.co/resource/",
    "voc": "https://swapi.co/vocabulary/",
    "base": "https://swapi.co/resource/",
}

In [None]:
rdf_graph = simple_graph  # assign to variable for easy access
nx_graph = RDF2NX.convert(rdf_graph, namespaces=initNs)

## Example Graph Visualization with Cytoscape

In [None]:
from ipyradiant.visualization import CytoscapeViewer

cv = CytoscapeViewer()
# specify the label key for the nx graph (default="label")
cv._nx_label = "rdfs:label"
cv.graph = nx_graph
cv

Check out [this example](JSON_Interactive_Example.ipynb) for a demonstration of linking the visualization widget to a JSON inspector for viewing node data.