Dispatch classes such as nx.Graph(backend=...) #7760
Conversation
@aMahanna, I updated dispatching graph classes. I think things are much simpler (if ever-so-slightly less capable) if we only dispatch __new__. Another perfectly valid (and probably better) approach is to dispatch to functions and not the classes:

def graph__new__(cls, incoming_graph_data=None, **attr):
    return object.__new__(Graph)  # Backend Graph, not nx.Graph

lmk when you're able to take another look and whether this is or isn't helpful. I'm happy to help.
# Args are passed
G1 = nx.Graph([(0, 1), (1, 2)])
assert not isinstance(G1, LoopbackGraph)
G2 = nx.Graph([(0, 1), (1, 2)], backend="nx_loopback")
assert isinstance(G2, LoopbackGraph)
# Should `backend=` argument that gets passed to __init__ be
# ignored or set as graph attribute?
G1.graph["backend"] = "nx_loopback"
assert nx.utils.misc.graphs_equal(G1, G2)
I added a test, and in doing so noticed that specifying the backend explicitly such as Graph(backend="foo") sets the graph attribute G.graph["backend"] == "foo". We can consider this a feature 😂 (indeed, it may actually be nice). This has to do with how arguments to __new__ are passed to __init__ (see https://docs.python.org/3/reference/datamodel.html#object.__new__). This also means any backend-only keyword arguments will be added to G.graph. I'm not sure if this is as nice, because it may be surprising.
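A minimal sketch (hypothetical stand-in classes, not the real nx.Graph) of the mechanism described above: Python passes the same arguments to both __new__ and __init__, so a backend= keyword inspected by a dispatching __new__ still reaches __init__ and lands in G.graph.

```python
class Graph:
    def __new__(cls, incoming_graph_data=None, **attr):
        # A dispatching __new__ could inspect attr.get("backend") here,
        # but it has no way to stop Python from forwarding the arguments.
        return object.__new__(cls)

    def __init__(self, incoming_graph_data=None, **attr):
        # Every keyword argument -- including backend= -- arrives here
        # too and is stored as a graph attribute.
        self.graph = dict(attr)

G = Graph(backend="foo")
assert G.graph == {"backend": "foo"}
```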
There are several variations available. I think the current approach of only dispatching __new__ is the simplest and gets us pretty far. Also dispatching __init__ is possible, but can become complicated when trying to call e.g. super().__init__. We could also create a metaclass for Graph to handle __new__ and __init__ the way we want. I would prefer to explore this option before (or in addition to) exploring the option of dispatching __init__. Metaclasses are usually avoided, but sometimes they can be the most functional, maintainable option. If we don't like saving backend keyword arguments to G.graph, I think metaclasses are a very reasonable option. Again, though, I also think it's (probably) fine to add these to G.graph.
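A hedged sketch of the metaclass option mentioned above (all class names here are hypothetical, not part of the PR): a metaclass's __call__ controls the entire instantiation sequence, so it can strip backend= before __init__ ever sees it.

```python
class BackendDispatchMeta(type):
    def __call__(cls, *args, backend=None, **kwargs):
        if backend is not None:
            # A real implementation would look up the backend's graph
            # class via entry points; a plain dict stands in here.
            cls = BACKEND_CLASSES[backend]
        instance = cls.__new__(cls, *args, **kwargs)
        instance.__init__(*args, **kwargs)  # backend= already stripped
        return instance

class Graph(metaclass=BackendDispatchMeta):
    def __init__(self, **attr):
        self.graph = dict(attr)

class LoopbackGraph(Graph):
    pass

BACKEND_CLASSES = {"nx_loopback": LoopbackGraph}

G = Graph(backend="nx_loopback", color="red")
assert isinstance(G, LoopbackGraph)
assert "backend" not in G.graph  # unlike the __new__-only approach
```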
Should we mention this behavior in the docs?
Yes -- we should describe this feature in the docs. And we should add comments for the __new__ method definitions (maybe just before?) that state that the method can be ignored in terms of learning the graph classes and/or implementing your own graph class.
I took the liberty of adding a comment above each __new__ method definition in the base classes.
The proposed text as added is:
# This __new__ method just does what Python itself does automatically.
# We include it here as part of the dispatchable/backend interface.
# If your goal is to understand how the graph classes work, you can ignore
# this method, even when subclassing the base classes. If you are subclassing
# in order to provide a backend that allows class instantiation, this method
# can be overridden to return your own backend graph class.
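A sketch (with hypothetical classes) of a Python detail this interface leans on: when __new__ returns an object that is not an instance of the requested class, Python skips calling that class's __init__ on it entirely, so the backend object comes back untouched.

```python
class BackendGraph:
    def __init__(self):
        self.initialized_by = "BackendGraph.__init__"

class Graph:
    # This __new__ does what Python does automatically -- unless a
    # (hypothetical) backend is requested, in which case it returns an
    # instance of a different, unrelated class.
    def __new__(cls, backend=None, **attr):
        if backend == "fake_backend":
            return BackendGraph()  # not a Graph: Graph.__init__ is skipped
        return object.__new__(cls)

    def __init__(self, backend=None, **attr):
        self.initialized_by = "Graph.__init__"

g = Graph()
assert g.initialized_by == "Graph.__init__"
bg = Graph(backend="fake_backend")
assert isinstance(bg, BackendGraph)
assert bg.initialized_by == "BackendGraph.__init__"
```

(Backend classes that subclass Graph, like LoopbackGraph in the test above, do get __init__ called, which is why backend= ends up in G.graph there.)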
LMKWYT
FYI, this doesn't affect stack traces when there is an error in __init__:

In [1]: class A:
...: def __new__(cls):
...: print("A.__new__")
...: return object.__new__(cls)
...: def __init__(self):
...: print("A.__init__")
...: 1/0
...:
In [2]: class B(A):
...: def __init__(self):
...: print("B.__init__")
...: 1/0
...:
In [3]: A()
A.__new__
A.__init__
---------------------------------------------------------------------------
ZeroDivisionError Traceback (most recent call last)
Cell In[3], line 1
----> 1 A()
Cell In[1], line 7, in A.__init__(self)
5 def __init__(self):
6 print("A.__init__")
----> 7 1/0
ZeroDivisionError: division by zero
In [4]: B()
A.__new__
B.__init__
---------------------------------------------------------------------------
ZeroDivisionError Traceback (most recent call last)
Cell In[4], line 1
----> 1 B()
Cell In[2], line 4, in B.__init__(self)
2 def __init__(self):
3 print("B.__init__")
----> 4 1/0
ZeroDivisionError: division by zero
I tried testing these changes in colab with GPU runtime
! pip uninstall -y networkx
! pip install git+https://github.com/eriknw/networkx.git@dispatch_for_classes
! pip install git+https://github.com/networkx/nx-parallel.git@main
import logging
nxl = logging.getLogger("networkx")
nxl.addHandler(logging.StreamHandler())
nxl.setLevel(logging.DEBUG)
import networkx as nx
nx.config.backend_priority = ["cugraph", "parallel", "networkx"]
G = nx.Graph()
G.add_edges_from([(1, 2), (2, 3), (4, 1), (5, 3)])
nx.all_pairs_node_connectivity(G)  # algorithm supported by nx-parallel but not nx-cugraph

Output error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-2-3e3b96de6dae> in <cell line: 0>()
7 nx.config.backend_priority = ["cugraph", "parallel", "networkx"]
8
----> 9 G = nx.Graph()
10 G.add_edges_from([(1, 2), (2, 3), (4, 1), (5, 3)])
11 nx.all_pairs_node_connectivity(G) # algorithm supported by nx-parallel but not nx-cugraph
2 frames
/usr/local/lib/python3.11/dist-packages/networkx/utils/decorators.py in func(_argmap__wrapper, *args, **kwargs)
782
783 def func(*args, __wrapper=None, **kwargs):
--> 784 return argmap._lazy_compile(__wrapper)(*args, **kwargs)
785
786 # standard function-wrapping stuff
/usr/local/lib/python3.11/dist-packages/networkx/utils/decorators.py in argmap___new___1(cls, incoming_graph_data, backend, **attr)
1 import bz2
2 import collections
----> 3 import gzip
4 import inspect
5 import itertools
/usr/local/lib/python3.11/dist-packages/networkx/utils/backends.py in _call_if_any_backends_installed(self, backend, *args, **kwargs)
606 backend_priority = nx.config.backend_priority.get(
607 self.name,
--> 608 nx.config.backend_priority.classes
609 if self.name.endswith("__new__")
610 else nx.config.backend_priority.generators
AttributeError: 'dict' object has no attribute 'classes'
running the following also didn't work:
nx.config.backend_priority.classes = ["cugraph", "parallel", "networkx"]

for more, see https://colab.research.google.com/drive/10OHlDP0kGpB7SGJnHo1R6Ge7CoTjaij0?usp=sharing
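A minimal reproduction (with hypothetical values, independent of networkx) of the failure mode above: the config object was a plain dict, which supports .get() but has no .classes attribute, so the fallback lookup in backends.py raised AttributeError.

```python
# Stand-in for the config the traceback shows: a plain dict.
backend_priority = {"all_pairs_node_connectivity": ["parallel"]}

assert backend_priority.get("graph__new__") is None  # .get() works fine

try:
    backend_priority.classes  # the attribute access backends.py attempted
except AttributeError as exc:
    error_message = str(exc)

assert "classes" in error_message  # 'dict' object has no attribute 'classes'
```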
Thanks @Schefflera-Arboricola for catching and reporting this!
Great catch @Schefflera-Arboricola! I pushed a fix to this PR and also made a new PR with the same fix: #8034
dschult
left a comment
I approve this PR!
In the list of things to do, it'd be great to have a test suite for the backend structure. Just to make sure things work the way we expect them to. And to catch things like config settings. :} I know, I know, there's lots to do.
Thanks @eriknw!! and also @rlratzel, @Schefflera-Arboricola for this work and review conversations.
amcandio
left a comment
This is a very cool feature!
Thanks :)

Thanks!
Thanks for trying @Schefflera-Arboricola! I think adding
if incoming_graph_data is not None:
    convert.to_networkx_graph(incoming_graph_data, create_using=self)
# load graph attributes (must be after convert)
attr.pop("backend", None)  # Ignore explicit `backend="networkx"`
Why not this?
- attr.pop("backend", None)  # Ignore explicit `backend="networkx"`
+ self.__networkx_cache__ = attr.pop("backend", self.__networkx_cache__)
Thanks for taking a look @amcandio, and sorry for missing this comment! Here's my late reply.
attr.pop("backend", None) makes it so that nx.Graph(backend="networkx") doesn't add "networkx" to G.graph. It is currently unrelated to the cache. If you're proposing a way to pre-populate the cache--which sounds like a worthwhile goal!--I would suggest raising that as a separate issue or PR as a new feature.
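A tiny illustration of the dict.pop pattern discussed above (example values are hypothetical): the second argument is a default, which makes the pop safe whether or not the key is present.

```python
attr = {"backend": "networkx", "day": "Friday"}

attr.pop("backend", None)  # discard an explicit backend="networkx"
assert attr == {"day": "Friday"}

attr.pop("backend", None)  # no KeyError when the key is already gone
assert attr == {"day": "Friday"}
```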
dschult
left a comment
I think this PR is pretty much ready -- but I would prefer to merge it after the release so we can play with it more.
rossbar
left a comment
Dispatching the creation of graph objects is certainly a step towards a world where networkx can truly be used "just" as an API. With this feature, users could run "pure" NX code and, depending on their environment configuration, end up with execution that does not involve NetworkX implementations at all!
To the plus side, this means users who have dispatching activated both for functions and for graph instantiation can now realize the performance/scalability boost from backend algorithm implementations without having to pay the cost of creating a NetworkX graph object and translating it to the backend's data structure. For most cases this is most likely to result in a performance boost where the data conversion is a significant bottleneck. Even more compelling IMO - this also reduces the barrier towards backends focused on scalability by removing the requirement that data ever have to be stored in-memory.
The other edge of this sword is the complete decoupling of what the source code says it's doing and what's actually being done. Arguably this is already the case with dispatching to backend implementations of functions, but I have a separate, increased level of trepidation when this pattern applies to the data structures. To me, the representation of graphs as nested mappings storing adjacency info is at the core of NetworkX. Indeed there are many other representations (e.g. sparse and dense adjacency matrices) baked into the library, but conversion to these formats is explicit (e.g. nx.to_numpy_array). The ability to run nx.Graph() and get something other than the core data structure without any explicit change in the source code is quite a departure from the typical user experience/expectation. It differs significantly from the dispatching systems used by other libraries such as numpy, which provides a protocol for overriding functions applied to arrays but AFAIK doesn't enable a pattern where np.array() would ever give you e.g. a torch.Tensor (not without modifying imports or kwargs; see e.g. NEP 37 and NEP 35, respectively).
While I personally would prefer an approach that is more explicit, e.g. type-based dispatching, I certainly understand the arguments for zero-code change and the benefits of allowing folks to explore other approaches to solving graph problems with very little barrier-to-entry. The dispatching system has been a great project with really interesting ideas, and I would like to continue supporting exploration and innovation in that direction! Therefore I am in favor of adding this feature as long as some guardrails are in place (which this PR includes), namely:
- This behavior is opt-in only.
- The dispatching of the creation of graph objects is dictated by an additional, separate configuration option. AIUI this is the way things are currently set up: NETWORKX_BACKEND_PRIORITY is the toggle that controls dispatch of "analysis" functions (e.g. betweenness_centrality); NETWORKX_BACKEND_PRIORITY_GENERATORS controls dispatch of "creation" functions (e.g. complete_graph); and NETWORKX_BACKEND_PRIORITY_CLASSES (new in this PR) controls the dispatch of generic graph creation via the class constructors (e.g. nx.Graph()).
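For illustration, the three separate opt-in knobs described above could be set as environment variables before launching Python (the "cugraph" value here is just an example backend name):

```shell
export NETWORKX_BACKEND_PRIORITY="cugraph"             # "analysis" functions
export NETWORKX_BACKEND_PRIORITY_GENERATORS="cugraph"  # "creation" functions
export NETWORKX_BACKEND_PRIORITY_CLASSES="cugraph"     # class constructors (new in this PR)
```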
Another concern is the readability of the source code - IMO the comments surrounding the dispatching of the __new__ method are sufficient to make readers aware that the affected bits are only relevant for dispatching.
Thanks to @dschult @Schefflera-Arboricola and @amcandio for the review and discussion - as usual I learned a ton from reading through everyone's comments and perspectives! And special thanks to @eriknw for putting this together - I can't wait to see what applications this unlocks!
Thanks for the thoughtful review and reply @rossbar! I believe you understand things correctly, and I agree with your guiding principles and general view of things--including how helpful other reviewers have been :)
This PR goes with networkx/networkx#7760 to allow e.g. `nx.Graph()` or `nx.Graph(backend="cugraph")` to create `nxcg.Graph`. Graphs created in this way are networkx-compatible graphs by default. If the user is running in "strict", GPU-only mode by setting `nx.config.backends.cugraph.use_compat_graphs` to False, then CUDA graphs will be returned.

Also, I removed the `yesqa` pre-commit hook, which is unnecessary b/c this is handled by ruff via [RUF100](https://docs.astral.sh/ruff/rules/unused-noqa/). `yesqa` could occasionally be a minor nuisance by removing a `noqa` understood by ruff (as was happening in earlier commits of this PR).

Despite the "WIP" / "Draft" labels, this is more-or-less ready for review and is awaiting networkx/networkx#7760 (which is also ready for review).

Authors:
- Erik Welch (https://github.com/eriknw)

Approvers:
- Rick Ratzel (https://github.com/rlratzel)
- Kyle Edwards (https://github.com/KyleFromNVIDIA)
- Bradley Dice (https://github.com/bdice)

URL: #92
I was experimenting with running libraries that depend on networkx, and I wanted to also dispatch classes to backends just to see. This PR dispatches e.g. nx.Graph(...).

When we previously discussed possibly doing this, we agreed "not yet", so let's discuss again. I think the changes to enable this are actually pretty minimal; as ever, testing would be nice to have.

CC @rlratzel. Also, @aMahanna, you may find this interesting; you could do e.g. nx.Graph(backend="arangodb", db=db).