nx-cugraph: Fixes dependency on missing feature, adds attrs to allow auto-dispatch for generators #4558

@rlratzel rlratzel commented Jul 27, 2024

This PR fixes a breaking change merged in #4629, which assumed that NetworkX 3.4 would include a feature that currently exists only in an unmerged NetworkX PR. The result is the following error:

ImportError: cannot import name '_get_cache_key' from 'networkx.utils.backends' (/opt/conda/envs/nightly/lib/python3.11/site-packages/networkx/utils/backends.py)

We may revisit this fix later, and possibly just close this PR, if that feature is merged into NetworkX.
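One common way to avoid a hard failure like this is to guard the import of the not-yet-released helper. The following is a generic sketch of that pattern, not necessarily how this PR fixes the problem; `HAS_GET_CACHE_KEY` and `describe_cache_support` are hypothetical names introduced only for illustration.

```python
# Guarded import: tolerate NetworkX versions (or environments) where the
# private helper `_get_cache_key` does not exist.
try:
    from networkx.utils.backends import _get_cache_key
except ImportError:  # also raised when networkx itself is not installed
    _get_cache_key = None

# Hypothetical flag that downstream code can check before using the helper.
HAS_GET_CACHE_KEY = _get_cache_key is not None


def describe_cache_support():
    """Return a short status string (illustrative only)."""
    if HAS_GET_CACHE_KEY:
        return "using networkx's _get_cache_key"
    return "falling back: _get_cache_key is unavailable"


print(describe_cache_support())
```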

This PR also updates nx-cugraph Graph and DiGraph classes to inherit from nx.Graph, and adds the appropriate cached_properties to lazily convert and cache to a NetworkX Graph and expose the appropriate dictionaries accordingly. These changes allow a nx_cugraph.Graph instance to be drop-in compatible with networkx functions that are not yet supported by nx_cugraph.
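The lazy convert-and-cache idea can be illustrated with a toy class. This is only a sketch under stated assumptions: `LazyAdjacencyGraph` and its attributes are hypothetical and do not reflect the actual nx-cugraph classes; plain Python lists stand in for GPU arrays.

```python
from functools import cached_property


class LazyAdjacencyGraph:
    """Toy graph that stores edges as flat lists (standing in for device
    arrays) and builds a dict-of-dicts adjacency only when first accessed."""

    def __init__(self, src, dst):
        self.src = list(src)
        self.dst = list(dst)
        self.conversions = 0  # counts how many times the slow path ran

    @cached_property
    def _adj(self):
        # Expensive conversion: runs once, then the result is cached
        # on the instance and reused by later accesses.
        self.conversions += 1
        adj = {}
        for u, v in zip(self.src, self.dst):
            adj.setdefault(u, {})[v] = {}
            adj.setdefault(v, {})
        return adj

    def neighbors(self, node):
        return list(self._adj[node])


g = LazyAdjacencyGraph([0, 0, 1], [1, 2, 2])
g.neighbors(0)  # first access triggers the conversion
g.neighbors(1)  # second access reuses the cached adjacency
print(g.conversions)
```

A real implementation would also need to invalidate the cached structure when the graph is mutated, which this sketch does not attempt.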

Combined with the changes in the linked NetworkX PR, which auto-dispatch generators if they return compatible backend types and allow compatible backend types to fall back to networkx, this lets users maximize e2e acceleration for their workflows without code changes.
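To make the dispatch-with-fallback behavior concrete, here is a toy dispatcher. It is a simplified sketch, not NetworkX's actual dispatch machinery: `BACKENDS`, `FastGraph`, and `dispatchable` are hypothetical names, and the fallback here simply runs the default implementation on the same object, which is only possible because the backend graph type is compatible with it.

```python
from functools import wraps

# Hypothetical registry: backend name -> {function name: implementation}
BACKENDS = {"fast": {}}


class FastGraph:
    """Stand-in for a backend graph type."""

    backend = "fast"

    def __init__(self, edges):
        self.edges = list(edges)


def dispatchable(func):
    """Route a call to the backend that owns the graph, else fall back."""

    @wraps(func)
    def wrapper(G, *args, **kwargs):
        backend = getattr(G, "backend", None)
        impl = BACKENDS.get(backend, {}).get(func.__name__)
        if impl is not None:
            return impl(G, *args, **kwargs)
        # No backend implementation: run the default code on the same object.
        return func(G, *args, **kwargs)

    return wrapper


@dispatchable
def number_of_edges(G):
    return len(G.edges)


@dispatchable
def degree_sum(G):
    return 2 * len(G.edges)


# The "fast" backend only implements number_of_edges.
BACKENDS["fast"]["number_of_edges"] = lambda G: ("fast", len(G.edges))

G = FastGraph([(0, 1), (1, 2)])
print(number_of_edges(G))  # handled by the "fast" backend
print(degree_sum(G))       # falls back to the default implementation
```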

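import networkx as nx
import pandas as pd

# Timer is a small timing helper that is not shown in this snippet
# (presumably defined in the attached zcc_demo.py.txt).

edgelist_csv = "/datasets/cugraph/csv/directed/cit-Patents.csv"
edgelist_df = pd.read_csv(edgelist_csv, sep=" ", names=["src", "dst"], dtype="int32")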

with Timer("from_pandas_edgelist"):
    G = nx.from_pandas_edgelist(
        edgelist_df, source="src", target="dst", create_using=nx.DiGraph)

print(type(G))

with Timer("number of nodes and edges"):
    print(f"{G.number_of_nodes()=}, {G.number_of_edges()=}")

with Timer("pagerank"):
    pr = nx.pagerank(G)

with Timer("coloring"):
    c1 = nx.coloring.greedy_color(G)

with Timer("coloring (again)"):
    c2 = nx.coloring.greedy_color(G)

with Timer("adding a node"):
    G.add_edge(0, (3.14159, "string_in_tuple"))

print(type(G))
print(f"{G.number_of_nodes()=}, {G.number_of_edges()=}")

with Timer("re-running pagerank"):
    pr2 = nx.pagerank(G)

print(f"new vs. orig nodes: {pr2.keys() - pr.keys()}")

with Timer("pad_graph (this mutates the input graph)"):
    cc = nx.coloring.equitable_coloring.pad_graph(G, 11)

print(type(G))
print(f"{G.number_of_nodes()=}, {G.number_of_edges()=}")

with Timer("re-running pagerank"):
    pr3 = nx.pagerank(G)

print(f"new vs. orig nodes: {pr3.keys() - pr.keys()}")

Timer.print_total()
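The script uses a `Timer` helper whose definition is not included above. A minimal sketch that would produce output in the same shape as the runs below ("label...", "Done in: ...", "Total time: ...") might look like this; it is an assumption about the helper, not the code from the attached file.

```python
import time
from datetime import timedelta


class Timer:
    """Context manager that prints a label, then the elapsed time on exit."""

    total = timedelta(0)  # accumulated across all Timer blocks

    def __init__(self, label):
        self.label = label

    def __enter__(self):
        print(f"\n{self.label}...")
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb):
        elapsed = timedelta(seconds=time.perf_counter() - self._start)
        Timer.total += elapsed
        print(f"Done in: {elapsed}")
        return False  # do not suppress exceptions raised in the block

    @classmethod
    def print_total(cls):
        print(f"Total time: {cls.total}")
```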

No backends used:

(nx) root@8546eec3d49d:~# python zcc_demo.py

from_pandas_edgelist...
Done in: 0:00:50.219987
<class 'networkx.classes.digraph.DiGraph'>

number of nodes and edges...
G.number_of_nodes()=3774768, G.number_of_edges()=16518948
Done in: 0:00:01.851362

pagerank...
Done in: 0:01:10.388206

coloring...
Done in: 0:00:13.802888

coloring (again)...
Done in: 0:00:13.793485

adding a node...
Done in: 0:00:00.000018
<class 'networkx.classes.digraph.DiGraph'>
G.number_of_nodes()=3774769, G.number_of_edges()=16518949

re-running pagerank...
Done in: 0:01:03.532062
new vs. orig nodes: {(3.14159, 'string_in_tuple')}

pad_graph (this mutates the input graph)...
Done in: 0:00:00.000764
<class 'networkx.classes.digraph.DiGraph'>
G.number_of_nodes()=3774771, G.number_of_edges()=16518950

re-running pagerank...
Done in: 0:01:16.790938
new vs. orig nodes: {(3.14159, 'string_in_tuple'), 3774769, 3774770}
Total time: 0:04:50.379710

nx-cugraph backend used. nx-cugraph does not yet support nx.coloring.greedy_color() or nx.coloring.equitable_coloring.pad_graph(). Note that the first call to coloring includes the conversion to a networkx Graph, while the second uses the cached conversion:

(nx) root@8546eec3d49d:~# NETWORKX_BACKEND_PRIORITY=cugraph python zcc_demo.py

from_pandas_edgelist...
Done in: 0:00:00.664462
<class 'nx_cugraph.classes.digraph.DiGraph'>

number of nodes and edges...
G.number_of_nodes()=3774768, G.number_of_edges()=16518948
Done in: 0:00:00.000008

pagerank...
Done in: 0:00:03.741143

coloring...
Done in: 0:01:11.706015

coloring (again)...
Done in: 0:00:11.752219

adding a node...
Done in: 0:00:13.415563
<class 'nx_cugraph.classes.digraph.DiGraph'>
G.number_of_nodes()=3774769, G.number_of_edges()=16518949

re-running pagerank...
Done in: 0:00:00.878451
new vs. orig nodes: {(3.14159, 'string_in_tuple')}

pad_graph (this mutates the input graph)...
Done in: 0:00:13.069187
<class 'nx_cugraph.classes.digraph.DiGraph'>
G.number_of_nodes()=3774771, G.number_of_edges()=16518950

re-running pagerank...
Done in: 0:00:00.896314
new vs. orig nodes: {3774769, 3774770, (3.14159, 'string_in_tuple')}
Total time: 0:01:56.123361
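For a quick comparison of the two runs, the reported times can be turned into ratios. The sketch below copies a few "Done in" values from the output above; `parse_elapsed` is a hypothetical helper written for this example.

```python
from datetime import timedelta


def parse_elapsed(text):
    """Parse an 'H:MM:SS.ffffff' string, as printed in the runs above."""
    hours, minutes, seconds = text.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes), seconds=float(seconds))


# (no backend, nx-cugraph backend) timings copied from the two runs.
timings = {
    "from_pandas_edgelist": ("0:00:50.219987", "0:00:00.664462"),
    "pagerank": ("0:01:10.388206", "0:00:03.741143"),
    "coloring": ("0:00:13.802888", "0:01:11.706015"),
    "total": ("0:04:50.379710", "0:01:56.123361"),
}

# Dividing one timedelta by another yields a plain float ratio.
speedups = {
    name: parse_elapsed(cpu) / parse_elapsed(gpu)
    for name, (cpu, gpu) in timings.items()
}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x")
```

A ratio below 1, as for the first coloring call, reflects the one-time conversion to a networkx Graph described above.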

Also note, when debug logging is enabled, you can see calls made from within networkx functions being dispatched appropriately:

pad_graph (this mutates the input graph)...
DEBUG:networkx.utils.backends:no backends are available to handle the call to `pad_graph` with graph types {'cugraph'}
DEBUG:networkx.utils.backends:falling back to backend 'networkx' for call to `pad_graph' with args: (<nx_cugraph.classes.digraph.DiGraph object at 0x7efb84138d60>, 11), kwargs: {}
DEBUG:networkx.utils.backends:using backend 'cugraph' for call to `complete_graph' with args: (2, None), kwargs: {}
DEBUG:networkx.utils.backends:using backend 'cugraph' for call to `relabel_nodes' with args: (<nx_cugraph.classes.graph.Graph object at 0x7efb84139c60>, {0: 3774769, 1: 3774770}, True), kwargs: {}
Done in: 0:00:13.226258
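The log lines above come from the `networkx.utils.backends` logger. One way to enable them with the standard `logging` module is sketched below; the exact setup used for the run above is not shown in this PR, so this configuration is an assumption.

```python
import logging

# Send log records to stderr using the default "LEVEL:logger:message" format.
logging.basicConfig(level=logging.DEBUG)

# Make sure the dispatch logger itself emits DEBUG records.
backend_logger = logging.getLogger("networkx.utils.backends")
backend_logger.setLevel(logging.DEBUG)
```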

zcc_demo.py.txt

@rlratzel rlratzel added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Jul 27, 2024
@github-actions github-actions bot removed the conda label Jul 31, 2024
@rlratzel rlratzel changed the title Updates to make nx_cugraph.Graph a subclass of nx.Graph, adds attrs for auto-dispatch for generators Updates to make nx_cugraph.Graph a drop-in replacement for nx.Graph, adds attrs for auto-dispatch for generators Jul 31, 2024
@rlratzel rlratzel added this to the 24.10 milestone Jul 31, 2024
rapids-bot bot pushed a commit that referenced this pull request Sep 24, 2024
…workX Graph classes, needed for zero code change graph generators (#4629)

This is an alternative approach to #4558 for enabling GPU-accelerated NetworkX to "just work". It has similarities to #4558. I opted to make separate classes such as `ZeroGraph`, which I think makes for cleaner separation and gives us and users more control.

There are a few lingering TODOs and code comments to tidy up, but I don't think there are any show-stoppers. I have not updated methods (such as `number_of_nodes`) to optimistically try to use GPU if possible, b/c this is not strictly necessary, but we should update these soon.

I have run NetworkX tests with these classes using networkx/networkx#7585 and networkx/networkx#7600. We need the behavior in 7585, and 7600 is only useful for testing.

There are only 3 new failing tests, and 3 tests that hang (I'll run them overnight to see if they finish). Here's a test summary:
```
5548 passed, 24 skipped, 16 xfailed, 25 xpassed
```
Note that 25 tests that were failing now pass. I have not investigated test failures, xfails, or xpasses yet. I would like to add tests too.

We rely heavily on the networkx cache. I think this is preferred.

It is late for me. I will describe and show how and why this works later.

I opted for `zero=` and `ZeroGraph`, because I find them delightful! Renaming is trivial if other terms are preferred.

CC @quasiben

Authors:
  - Erik Welch (https://github.com/eriknw)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Rick Ratzel (https://github.com/rlratzel)

URL: #4629
@rlratzel rlratzel changed the title Updates to make nx_cugraph.Graph a drop-in replacement for nx.Graph, adds attrs for auto-dispatch for generators nx-cugraph: Adds attrs to allow auto-dispatch for generators Sep 25, 2024
@rlratzel rlratzel changed the title nx-cugraph: Adds attrs to allow auto-dispatch for generators nx-cugraph: Fixes dependency on missing feature, adds attrs to allow auto-dispatch for generators Sep 26, 2024
@rlratzel rlratzel self-assigned this Sep 26, 2024
@rlratzel rlratzel added bug Something isn't working and removed improvement Improvement / enhancement to an existing function labels Sep 26, 2024
@rlratzel rlratzel marked this pull request as ready for review September 26, 2024 20:56
@rlratzel rlratzel requested a review from a team as a code owner September 26, 2024 20:56
rlratzel commented Oct 1, 2024

Closing this now that networkx/networkx#7578 has been merged: both the upstream code dependency and the needed dispatching feature are now in place.

@rlratzel rlratzel closed this Oct 1, 2024
Labels
bug Something isn't working non-breaking Non-breaking change python