readwrite module for interfacing with RDF #855
TODO: implement building from generic and rgml graphs
…ctions and readwrite modules
+ # I think we don't need to consider disconnected nodes, since that
+ # is impossible to represent in straight RDF. That is, RDF has a
+ # concept of disconnected triples, but not terms, which are
+ # properly analogous to nodes in networkx
- Fix error in to_rdfgraph() dispatching mechanism due to iterators being consumed and later reused
- reduced cyclomatic complexity on {to,from}_rgmlgraph below 10
- improved pep8 compliance
- tools used: pep8, pyflakes, pylint, pygenie, pymetrics
Make Python3 compatible
Skip failing tests
Skip tests completely if no rdflib
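The iterator-reuse bug named in the commit list above is a common Python pitfall: once an iterator has been consumed, a second pass over it yields nothing. The sketch below is only an illustration of that class of bug and one way to avoid it; the function names are hypothetical and this is not the code from the PR's `to_rdfgraph()`.

```python
def summarize_buggy(triples):
    """Count the triples, then try to collect them from the same iterator."""
    it = iter(triples)
    count = sum(1 for _ in it)   # this pass exhausts the iterator
    remaining = list(it)         # so this second pass sees nothing
    return count, remaining


def summarize_fixed(triples):
    """Materialize the iterable once, then reuse the list as often as needed."""
    items = list(triples)
    return len(items), items


sample = [("s1", "p1", "o1"), ("s2", "p2", "o2")]
buggy_result = summarize_buggy(sample)
fixed_result = summarize_fixed(sample)
```

Materializing into a list trades memory for safety; for very large inputs, `itertools.tee` or re-creating the iterator may be preferable.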
Pull request at pedros#1 with some suggested changes.
In addition to the changes from pedros#1,
Updates for Python3, skip tests
@pedros Thanks for the contribution! And sorry for the delay in review 😅 As of today it looks like rdflib maintains connectors to networkx https://github.com/RDFLib/rdflib/blob/9625ed0b432c9085e2d9dda1fd8acf707b9022ab/rdflib/extras/external_graph_libs.py#L72 so we don't need to add them here :) The link to RGML seems broken; I think this is the current one: https://www.cs.rpi.edu/~puninj/rgml.html . Also, from previous experience, maintaining markup languages in networkx becomes a maintenance burden, so we are trying to avoid adding new readwrite modules, especially ones that don't seem to have robust support outside of networkx already. Let me know what you think. Thanks again!
I agree - there's not a lot to gain by duplicating format-conversion functionality, and the RDF library seems like a more natural place for these to live. I will go ahead and close this - thanks all for the proposal & discussion!
def rdflib_to_networkx_digraph(
    graph,
    calc_weights=True,
    edge_attrs=lambda s, p, o: {"triples": [(s, p, o)]},
    **kwds,
):
    ...
def rdflib_to_networkx_multidigraph(
    graph, edge_attrs=lambda s, p, o: {"key": p}, **kwds
):
    ...
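To make the default `edge_attrs` callbacks in the two signatures quoted above more concrete, here is a pure-Python sketch of what they appear to imply: the multidigraph variant keeps one edge per triple, keyed by predicate, while the digraph variant collapses parallel triples into a single edge that accumulates a `triples` list and, with `calc_weights=True`, a count. This is an assumption-laden illustration over plain tuples and dicts, not rdflib's implementation; the helper names `multidigraph_edges` and `digraph_edges` are hypothetical.

```python
def multidigraph_edges(triples, edge_attrs=lambda s, p, o: {"key": p}):
    """One edge per triple: (subject, object, attributes)."""
    return [(s, o, edge_attrs(s, p, o)) for s, p, o in triples]


def digraph_edges(
    triples,
    calc_weights=True,
    edge_attrs=lambda s, p, o: {"triples": [(s, p, o)]},
):
    """One edge per (subject, object) pair, merging parallel triples.

    Assumes edge_attrs returns a dict with a "triples" list, as the
    default callback does.
    """
    edges = {}
    for s, p, o in triples:
        attrs = edge_attrs(s, p, o)
        data = edges.setdefault((s, o), {"triples": []})
        data["triples"].extend(attrs.get("triples", []))
        if calc_weights:
            # Weight here is simply the number of triples merged into the edge.
            data["weight"] = data.get("weight", 0) + 1
    return edges


triples = [("a", "knows", "b"), ("a", "likes", "b"), ("b", "knows", "c")]
multi = multidigraph_edges(triples)
simple = digraph_edges(triples)
```

In this sketch the pair `("a", "b")` yields two keyed edges in `multi` but a single edge in `simple` that records both triples.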
@pedros @rossbar Would this rdflib read/write code be best as a third-party module? https://github.com/networkx/networkx/blob/main/setup.py:
@westurner The plugin bits are currently set up to work for backend computation plugins, not readwrite modules. But this is something that can indeed be given more thought :) For readwrite to Arrow/Parquet, I think we can have a readwrite module inside networkx too! (just my opinion) Arrow is a robust data format outside of networkx, and if there is an efficient way of reading/writing into it, I think that's a plus. Now if someone comes along and implements algorithms on top of Arrow data structures for graphs, that would be great :D. We would be able to latch directly onto that as a backend.
RDF support would be worthwhile as a core "read write plugin" or as a third-party adapter with its own integration tests that depend upon the rdflib import IMHO. These have C/C++-based
https://github.com/rapidsai/cugraph/blob/branch-23.02/readme_pages/algorithms.md https://github.com/rapidsai/cugraph#apache-arrow-on-gpu-- :
https://www.phoronix.com/news/Intel-oneAPI-2023 :
rdflib already has all the support for conversion between networkx and RDF; I'm not sure what else we can/should add.
Well, we will (soon) support cugraph as a backend for networkx, so that's good news. But Arrow is still a columnar memory layout, which doesn't work that well with graph algorithms, so having Arrow support for the graph data structure itself is not that straightforward. Yes, it can work as a dumping ground for graph data, but it is not something we can write code on top of (which is the more interesting thing to me).
https://arrow.apache.org/powered_by/ Ctrl-F "graph" doesn't appear to list e.g. CuGraph, which is built on Apache Arrow. Does pyarrow already support SparseTensors?
FWIW, rdflib-hdt also includes support for RDF HDT (Header, Dictionary, Triples), which IIUC this PR would make easier to read from and write to? https://en.wikipedia.org/wiki/HDT_(data_format)
TLDR Model serialization: https://en.wikipedia.org/wiki/Serialization
Just found this, which mentions SQLAlchemy and rdflib: https://github.com/unum-cloud/NetworkXum#project-structure
Read and write graphs in RDF format. Requires rdflib. This module adds 8 new functions:
It can load arbitrary RDF graphs, or RGML-namespaced graphs. RGML is the RDF Graph Modeling Language, an RDF ontology for generic graphs.
Arbitrary RDF graphs can be represented in two ways. The first follows the RDF specification and treats them as directed labeled multigraphs, in which a term that occurs as a subject or object and, later on, as a predicate is reified multiple times. Since this is not optimal for connectivity analyses, the second recognizes that RDF graphs are, in fact, hypergraphs, and represents them as bipartite graphs where each node in one partition has 3 edges pointing to subject, predicate, and object nodes in the other partition.
For more information, see Hayes, J. (2004). A graph model for RDF. Technische Universität Darmstadt
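The bipartite representation described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the module's code: the function name `triples_to_bipartite`, the `("stmt", i)` node labels, and the sample triples are all hypothetical. Each triple becomes a statement node with exactly three role-labeled edges to term nodes, so a term used as a predicate in one triple and as a subject in another is still a single node.

```python
def triples_to_bipartite(triples):
    """Build a bipartite view of (subject, predicate, object) triples.

    Returns (statements, terms, edges), where every edge is
    (statement_node, term_node, role).
    """
    statements = []
    terms = set()
    edges = []
    for i, (s, p, o) in enumerate(triples):
        stmt = ("stmt", i)  # one node per triple in the statement partition
        statements.append(stmt)
        for role, term in (("subject", s), ("predicate", p), ("object", o)):
            terms.add(term)  # one shared node per distinct term
            edges.append((stmt, term, role))
    return statements, terms, edges


example = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:knows", "rdfs:label", '"knows"'),
]
statements, terms, edges = triples_to_bipartite(example)

# "ex:knows" is a predicate in the first triple and a subject in the second,
# yet it appears once in the term partition and is reachable from both statements.
knows_degree = sum(1 for _, term, _ in edges if term == "ex:knows")
```

Because the shared term stays a single node, path and connectivity queries can pass through predicates, which is the motivation given above for this representation.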
All serialization formats supported by rdflib are supported, currently: