## RDF to Networkx (an LPG)
This process helps reduce data complexity by transforming an RDF graph into a networkx labeled property graph (LPG). Unlike similar capabilities (e.g. `rdflib.extras.external_graph_libs.rdflib_to_networkx_multidigraph`), this process allows properties to be collapsed onto LPG nodes. This reduces visual complexity and can improve the performance of graph algorithms such as distance calculations, because the resulting graph has fewer nodes.

The process is designed so that the behavior of the transformation can be customized. This work is still in progress, so bugs and unexpected behavior may occur.
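
To make the idea of collapsing properties concrete, here is a minimal, library-free sketch. It is not the `ipyradiant` implementation: triples are plain tuples, the `Lit` wrapper and the `collapse` helper are hypothetical names, and the data is made up for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for an RDF literal; any other value is treated as an IRI.
Lit = namedtuple("Lit", "value")

triples = [
    ("ex:luke", "rdfs:label", Lit("Luke Skywalker")),
    ("ex:luke", "voc:height", Lit(172)),
    ("ex:vader", "rdfs:label", Lit("Darth Vader")),
    ("ex:luke", "voc:father", "ex:vader"),
]


def triple_per_edge(triples):
    """Naive mapping: every subject and every object becomes a node."""
    nodes = set()
    for s, _, o in triples:
        nodes.update((s, o))
    return nodes


def collapse(triples):
    """Collapsed mapping: literal-valued triples become node attributes."""
    nodes, edges = {}, []
    for s, p, o in triples:
        nodes.setdefault(s, {})
        if isinstance(o, Lit):
            nodes[s][p] = o.value  # property stored on the node itself
        else:
            nodes.setdefault(o, {})
            edges.append((s, o, p))  # only IRI-to-IRI triples stay edges
    return nodes, edges


naive_nodes = triple_per_edge(triples)
lpg_nodes, lpg_edges = collapse(triples)
print(len(naive_nodes), "nodes before,", len(lpg_nodes), "nodes after")
```

In this toy data the three literal values stop being nodes of their own, which is the source of the reduced node count described above.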


In [None]:
import pandas
import rdflib
from ipyradiant import (
    CustomURIRef,
    FileManager,
    MultiPanelSelect,
    PathLoader,
    PredicateMultiselectApp,
    collapse_predicates,
)
from ipyradiant.sparql.api import SPARQLQueryFramer, build_values
from networkx import MultiDiGraph
from rdflib import URIRef

### Load an RDF graph
In this example, we will use the `ipyradiant` `FileManager`. 

In [None]:
lw = FileManager(loader=PathLoader(path="data"))
# Here we hard-code the file selection; ideally a user would choose a file to work with.
lw.loader.file_picker.value = lw.loader.file_picker.options["starwars.ttl"]
lw

Create a simple graph with just data about character `Luke Skywalker`, character `Darth Vader`, and film `A New Hope` (to reduce complexity).
> Note: larger graphs cannot be visualized with cytoscape, and will not work with the visualization portion of this example. 

In [None]:
qres = lw.graph.query(
    """
    PREFIX hum: <https://swapi.co/resource/human/>
    PREFIX film: <https://swapi.co/resource/film/>
    
    CONSTRUCT {
        ?s ?p ?o .
    }
    WHERE {
        ?s ?p ?o .
        
        VALUES (?s) {
            (hum:1)  # Luke
            (hum:4)  # Vader
            (film:1) # A New Hope
        }
    }
    """
)

# Pass the format explicitly so parsing does not depend on rdflib's default guess
simple_graph = rdflib.graph.Graph().parse(data=qres.serialize(format="xml"), format="xml")
print("# triples in our simple graph:", len(simple_graph))

### Converters
TODO discussion

#### URItoID default converter (configurable)
[id_converter_link]: https://github.com/Rothamsted/rdf2neo/blob/master/rdf2neo/src/main/java/uk/ac/rothamsted/rdf/neo4j/idconvert/DefaultIri2IdConverter.java

These converters allow us to simplify the representation of URIs in the networkx LPG. They can be configured to perform custom conversion within the larger `rdf2nx` process. 

[Adapted from this KnetMiner `rdf2neo` (RDF to neo4j) process.][id_converter_link]
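
As a rough illustration of what such a conversion involves, the sketch below shortens a URI either to a `prefix:local` form or to its trailing local name. The functions `local_name` and `short_id` are hypothetical and are not the `ipyradiant` converters; their exact output format is an assumption made for this example.

```python
def local_name(uri):
    """Return the text after the last '/' or '#' (assumed ID convention)."""
    trimmed = uri.rstrip("/#")
    cut = max(trimmed.rfind("/"), trimmed.rfind("#"))
    return trimmed[cut + 1:]


def short_id(uri, namespaces=None):
    """Return 'prefix:local' when a namespace matches, else the local name."""
    candidates = sorted(
        (namespaces or {}).items(),
        key=lambda item: len(item[1]),
        reverse=True,  # try the longest (most specific) namespace first
    )
    for prefix, base in candidates:
        if uri.startswith(base):
            return f"{prefix}:{uri[len(base):]}"
    return local_name(uri)


ns = {"ex": "https://www.example.com/"}
with_slash = local_name("https://www.example.com/Person")
with_hash = local_name("https://www.other_example.com/Person#Person1")
known_ns = short_id("https://www.example.com/Person", ns)
unknown_ns = short_id("https://www.other_example.com/Person#Person1", ns)
```

Trying the longest namespace first is a design choice of this sketch: it keeps the result deterministic when two registered namespaces share a common prefix.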

> Future implementation note: we will want to store converted URIs in a dict and check
> the dict before calling `URItoID`
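
The caching idea in the note above could look roughly like the following. `CachedConverter` and `toy_converter` are hypothetical names; `toy_converter` only stands in for a real converter so that the example runs on its own.

```python
class CachedConverter:
    """Wrap a URI-to-ID callable and remember previous results in a dict."""

    def __init__(self, converter):
        self.converter = converter
        self.cache = {}
        self.calls = 0  # how many times the wrapped converter actually ran

    def __call__(self, uri):
        if uri not in self.cache:
            self.calls += 1
            self.cache[uri] = self.converter(uri)
        return self.cache[uri]


def toy_converter(uri):
    # Hypothetical stand-in: keep the text after the last '/' or '#'.
    return uri.replace("#", "/").rstrip("/").rsplit("/", 1)[-1]


convert = CachedConverter(toy_converter)
ids = [convert(uri) for uri in ["https://swapi.co/resource/human/1"] * 3]
```

After the three lookups, the wrapped converter has run only once. The standard library's `functools.lru_cache` would give similar behavior with less code, at the cost of a less inspectable cache.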

In [None]:
from ipyradiant.rdf2nx import URItoID, URItoShortID

#### Converter Examples
> TODO make this easier to read or move to another notebook.

In [None]:
test_uri_1 = URIRef("https://www.example.com/Person")
test_uri_2 = URIRef("https://www.other_example.com/Person#Person1")
test_namespaces = {
    "ex": URIRef("https://www.example.com/")
}  # could use dict(lw.graph.namespaces)

print("Trailing '/':", URItoID(test_uri_1))
print("Trailing '#':", URItoID(test_uri_2))
print("No namespaces provided (URItoID):", URItoShortID(test_uri_1))
print(
    "Namespaces provided and valid for URI:",
    URItoShortID(test_uri_1, ns=test_namespaces),
)
print(
    "Namespaces provided but none for URI:",
    URItoShortID(test_uri_2, ns=test_namespaces),
)

### Literal to python type (mapping configurable)

Used to convert typed Literals to other types.

This is a work in progress and does not currently have an implementation. (This is a stub section).
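
Until an implementation exists, the sketch below shows one way a configurable mapping might work. It does not use `LiteralMap` or `LiteralTyping`; the names `literal_map` and `to_python` are hypothetical, and the set of datatypes covered is an arbitrary choice for the example.

```python
from datetime import date
from decimal import Decimal

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical, user-configurable mapping from datatype IRI to a converter.
literal_map = {
    XSD + "integer": int,
    XSD + "double": float,
    XSD + "decimal": Decimal,
    XSD + "boolean": lambda text: text.strip() in ("true", "1"),
    XSD + "date": date.fromisoformat,
}


def to_python(lexical, datatype=None, mapping=literal_map):
    """Convert a literal's lexical form; unknown datatypes stay as strings."""
    converter = mapping.get(datatype)
    return converter(lexical) if converter else lexical


height = to_python("172", XSD + "integer")
is_hero = to_python("true", XSD + "boolean")
released = to_python("1977-05-25", XSD + "date")
plain = to_python("Luke Skywalker")
```

Because the mapping is an ordinary dict, a caller can replace or extend individual entries without touching the conversion function. Note that `rdflib` literals also expose `toPython()`, which may already cover the standard XSD types.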

In [None]:
from ipyradiant.rdf2nx.literal_map import LiteralMap, LiteralTyping

## Queries to facilitate RDF -> LPG (networkx)
[rdf2neo_link]: https://github.com/Rothamsted/rdf2neo/blob/master/README.md
[Adapted from this KnetMiner `rdf2neo` (RDF to neo4j) process.][rdf2neo_link]

## `rdf2nx` Node Queries
Node queries directly enable the creation of LPG nodes and their properties. 

1. NodeIRIs: 
  * SPARQL query that lists the IRIs of all RDF resources that represent a node.
  * Will typically return instances of target classes, although may also catch resources of interest by targeting subjects or objects of given relations.
  * It is <b>very important</b> that the query returns <u>distinct</u> results.
0. NodeTypes (<i>label</i> in neo4j):
  * Invoked once for each IRI returned by `NodeIRIs`, with the query parameterized over that single node.
  * Its purpose is to list all types that have to be assigned to the node.
  * A type can be IRI, literal, or string. If it's an IRI, it will be translated into an identifier via the configured URItoID converter.
0. NodeProperties:
  * Invoked once per node (`?iri` bound to a single node).
  * Returns a list of all pairs of predicate+value that will be assigned to the LPG node.
  * Every node must have an `iri` property in order to process RDF-defined relations. This property is always indexed, and has distinct values.
  * Every node has a default type (`label` in neo4j). 
    * The predefined value for this can be changed by configuring a `defaultNodeLabel` (in future versions).
    * Is used to find specific nodes.
  * Literal values will be converted (e.g. RDF numbers to Python numbers); this will be a configuration option in a future version.
  * Names are typically converted to a shorthand ID using the configured `URItoID` converter.
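
The three node stages above can be sketched without SPARQL over a toy list of tuples. This is only an illustration of the flow of data between the stages; the function names and the four-element tuple layout are assumptions, not the queries shipped with `ipyradiant`.

```python
RDF_TYPE = "rdf:type"

# Toy data: (subject, predicate, object, object_is_literal)
triples = [
    ("res:human/1", RDF_TYPE, "voc:Human", False),
    ("res:human/1", RDF_TYPE, "voc:Character", False),
    ("res:human/1", "rdfs:label", "Luke Skywalker", True),
    ("res:human/1", "voc:height", 172, True),
    ("res:film/1", RDF_TYPE, "voc:Film", False),
    ("res:film/1", "rdfs:label", "A New Hope", True),
    ("res:film/1", "voc:character", "res:human/1", False),
]


def node_iris(triples):
    """Stage 1: distinct IRIs of typed resources, in first-seen order."""
    return list(dict.fromkeys(s for s, p, _, _ in triples if p == RDF_TYPE))


def node_types(triples, iri):
    """Stage 2: every type assigned to one node."""
    return [o for s, p, o, _ in triples if s == iri and p == RDF_TYPE]


def node_properties(triples, iri):
    """Stage 3: literal-valued predicate/value pairs for one node.

    A repeated predicate keeps only its last value in this sketch.
    """
    return {p: o for s, p, o, is_literal in triples if s == iri and is_literal}


nodes = {}
for iri in node_iris(triples):
    attrs = {"iri": iri, "types": node_types(triples, iri)}
    attrs.update(node_properties(triples, iri))
    nodes[iri] = attrs
```

The `dict.fromkeys` call in stage 1 plays the role of `DISTINCT`: without it, `res:human/1` would be processed twice because it has two types.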

## `rdf2nx` Property Queries
[reification_link]: https://www.w3.org/wiki/RdfReification

As with nodes, rdf2lpg first needs a list of the relations to be created. These must
refer to the nodes they link by means of the node URIs (mapped earlier via the `iri`
property).

The relation queries must always return certain variables in the SELECT clause.
Among these is the relation URI, which has to be computed even for straight
(non-reified) triples.

As with nodes, relation URIs (i.e., `?iri`) are needed by rdf2lpg in order to look up
relation properties with the relation property query. They are also a good way to
keep track of multiple statements about the same subject/predicate/property.

1. RelationTypes
 * Relation types are based on triple predicates (e.g. `ex:birthPlace`)
 * Returns a list of relations to be created in the LPG. 
 * A fictitious IRI is created for plain relations. This is used to uniquely identify specific relationship instances.
0. ReifiedRelations
  * Similar to the `RelationTypes`, but collects the same information for the [RDF reification pattern][reification_link]. 
0. RelationProperties

> Note: once reified relationships are selected by `ReifiedRelations`, a simple
> relationship property query is used to get additional properties of the relationship.
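
The difference between plain and reified relations can be sketched as follows. Everything here is an assumption made for illustration: the `urn:relation:` scheme, the MD5-based identifier, the helper names, and the `ex:confidence` property are not taken from `ipyradiant`.

```python
import hashlib


def relation_iri(s, p, o):
    """Build a deterministic, made-up IRI for a plain (non-reified) triple."""
    digest = hashlib.md5("|".join((s, p, o)).encode("utf-8")).hexdigest()
    return "urn:relation:" + digest


def plain_relations(triples, node_iris):
    """One edge per triple whose subject and object are both known nodes."""
    return [
        {"iri": relation_iri(s, p, o), "type": p, "source": s, "target": o}
        for s, p, o in triples
        if s in node_iris and o in node_iris
    ]


def reified_relations(statements):
    """One edge per reified statement; extra keys become edge properties."""
    core = ("rdf:subject", "rdf:predicate", "rdf:object")
    edges = []
    for iri, stmt in statements.items():
        props = {k: v for k, v in stmt.items() if k not in core}
        edges.append({
            "iri": iri,  # a reified statement already has its own IRI
            "type": stmt["rdf:predicate"],
            "source": stmt["rdf:subject"],
            "target": stmt["rdf:object"],
            "properties": props,
        })
    return edges


known_nodes = {"res:human/1", "res:film/1"}
triples = [
    ("res:film/1", "voc:character", "res:human/1"),
    ("res:human/1", "rdfs:label", "Luke Skywalker"),  # literal: not an edge
]
statements = {
    "ex:stmt1": {
        "rdf:subject": "res:human/1",
        "rdf:predicate": "voc:film",
        "rdf:object": "res:film/1",
        "ex:confidence": 0.9,  # made-up property carried by the statement
    }
}
edges = plain_relations(triples, known_nodes) + reified_relations(statements)
```

Hashing the subject, predicate, and object gives each plain relation a stable identifier, so a later property lookup can address the edge in the same way it addresses a reified statement.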

## Example Process (Vanilla)
No custom logic is applied. This uses the default conversion queries built into `ipyradiant`. 

In [None]:
from ipyradiant.rdf2nx import RDF2NX

In [None]:
# Namespaces defined for shortened URIs
# TODO what if these were URIRefs, Namespaces, or NamespaceManager?
initNs = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "res": "https://swapi.co/resource/",
    "voc": "https://swapi.co/vocabulary/",
    "base": "https://swapi.co/resource/",
}

In [None]:
rdf_graph = simple_graph
nx_graph = RDF2NX.convert(rdf_graph, namespaces=initNs)

## Example Graph Visualization with Cytoscape
Very basic for now. Just to illustrate the LPG. 
> Note: Cytoscape appears to cast tuples to list for the visualization.
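
A plausible explanation for the tuple-to-list behavior, not confirmed against the `ipycytoscape` source, is that node data is serialized to JSON on its way to the browser, and JSON has no tuple type. The effect can be reproduced with the standard library alone; the `node_data` dict is made up for the example.

```python
import json

# A tuple-valued attribute, as might be stored on a networkx node.
node_data = {"id": "res:human/1", "position": (1, 2)}

# Serializing and parsing again mimics a trip through a JSON channel.
round_tripped = json.loads(json.dumps(node_data))
print(type(round_tripped["position"]).__name__)
```

If tuple semantics matter after visualization, the values would need to be converted back explicitly.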

In [None]:
import json
import ipycytoscape
import ipywidgets as W

In [None]:
# Here we will build a cytoscape representation of the RDF graph to show the reduced complexity
from rdflib.extras.external_graph_libs import rdflib_to_networkx_multidigraph

cyto_from_rdf = ipycytoscape.CytoscapeWidget()
cyto_from_rdf.graph.add_graph_from_networkx(
    rdflib_to_networkx_multidigraph(rdf_graph),
    multiple_edges=True, 
    directed=True,
)
for node in cyto_from_rdf.graph.nodes:
    # deal with inability to handle colons
    node.data["_label"] = str(node.data.get("id", None))
    
cyto_from_rdf.set_layout(animate=False)
cyto_from_rdf.set_style(
    [
        {
            "selector": "node",
            "css": {
                "label": "data(_label)",
                "text-wrap": "wrap",
                "text-max-width": "150px",
                "text-valign": "center",
                "text-halign": "center",
                "font-size": "10",
                "font-family": '"Gill Sans", sans-serif',
                "color": "blue"
            },
        },
        {
            'selector': 'edge.directed',
            'style': {
                'curve-style': 'bezier',
                'target-arrow-shape': 'triangle',
            }
        },
        {
            'selector': 'edge.multiple_edges', 
            'style': {'curve-style': 'bezier'}
        },
    ]
)

In [None]:
directed = ipycytoscape.CytoscapeWidget()
directed.graph.add_graph_from_networkx(nx_graph, multiple_edges=True, directed=True)

In [None]:
for node in directed.graph.nodes:
    # deal with inability to handle colons
    node.data["_label"] = node.data.get("rdfs:label", None)
    node.data["_attrs"] = json.dumps(node.data, indent=2)  # TODO remove iri, private attrs, etc.?

In [None]:
# TODO set layout and CSS within ipyradiant library
directed.set_layout(name="dagre", animate=False, randomize=False, maxSimulationTime=2000)
# Workaround for style overwriting
directed.set_style(
    [
        {
            "selector": "node",
            "css": {
                "label": "data(_label)",
                "text-wrap": "wrap",
                "text-max-width": "150px",
                "text-valign": "center",
                "text-halign": "center",
                "font-size": "10",
                "font-family": '"Gill Sans", sans-serif',
                "color": "blue"
            },
        },
        {
            "selector": "edge",
            "css": {
                "label": "data(_label)",
                "text-wrap": "wrap",
                "text-max-width": "150px",
                "text-valign": "center",
                "text-halign": "center",
                "font-size": "10",
                "font-family": '"Gill Sans", sans-serif',
                "color": "green"
            },
        },
        {
            'selector': 'edge.directed',
            'style': {
                'curve-style': 'bezier',
                'target-arrow-shape': 'triangle',
            }
        },
        {
            'selector': 'edge.multiple_edges', 
            'style': {'curve-style': 'bezier'}
        },
        {
            "selector": ":active ",
            "css": {
                "label": "data(_attrs)",
                "text-wrap": "wrap",
                "text-max-width": "500px",
                "text-valign": "bottom",
                "text-halign": "right",
                'text-background-opacity': 0.9,
                'text-background-color': 'white',
                'text-background-shape': 'roundrectangle',
                "color": "black",
            }
        }
    ]
)

In [None]:
W.HBox([cyto_from_rdf, directed])

## Example Process (Custom)

### Nodes
We can overwrite the `cls.sparql` attribute of each query class in order to change the behavior of the `rdf2lpg` process. 

> Note: Make sure to overwrite the `cls.initNs` if custom namespaces are used. 
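
The override pattern can be sketched with stand-in classes. `NodeIRIsQuery`, `HumanNodeIRIs`, and the `describe` method are hypothetical and only mimic the idea of a query class with `sparql` and `initNs` class attributes; the real `ipyradiant` class names and methods may differ.

```python
class NodeIRIsQuery:
    """Hypothetical query class; `sparql` and `initNs` are class attributes."""

    initNs = {"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#"}
    sparql = """
    SELECT DISTINCT ?iri
    WHERE { ?iri rdf:type ?type . }
    """

    @classmethod
    def describe(cls):
        # A real query class would execute the query; this one only reports it.
        return {"prefixes": sorted(cls.initNs), "sparql": cls.sparql.strip()}


class HumanNodeIRIs(NodeIRIsQuery):
    """Custom variant: only resources typed as voc:Human become nodes."""

    # The new prefix is registered alongside the inherited ones.
    initNs = {**NodeIRIsQuery.initNs, "voc": "https://swapi.co/vocabulary/"}
    sparql = """
    SELECT DISTINCT ?iri
    WHERE { ?iri rdf:type voc:Human . }
    """


default_info = NodeIRIsQuery.describe()
custom_info = HumanNodeIRIs.describe()
```

Subclassing leaves the default query untouched, which makes it easy to compare the vanilla and custom conversions side by side. Assigning to the attribute on the original class would also work, but it changes the behavior for every later use of that class.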

### Edges
Similar to [Nodes](#Nodes), we can overwrite the `cls.sparql` attribute of each query class in order to change the behavior of the `rdf2lpg` process. 

> Note: Make sure to overwrite the `cls.initNs` if custom namespaces are used. 