# Benchmarking Performance of NetworkX without and with the RAPIDS GPU-based nx-cugraph backend

This notebook collects the run-times without and with the nx-cugraph backend enabled for three popular NetworkX algorithms: Betweenness Centrality, Breadth First Search, and Louvain Community Detection.

Here is a sample minimal script to demonstrate no-code-change GPU acceleration using nx-cugraph.

----
bc_demo.ipy:

```
import pandas as pd
import networkx as nx

url = "https://data.rapids.ai/cugraph/datasets/cit-Patents.csv"
df = pd.read_csv(url, sep=" ", names=["src", "dst"], dtype="int32")
G = nx.from_pandas_edgelist(df, source="src", target="dst")

%time result = nx.betweenness_centrality(G, k=10)
```
----
Running it first without and then with the nx-cugraph backend looks like this:
```
user@machine:/# ipython bc_demo.ipy
CPU times: user 7min 38s, sys: 5.6 s, total: 7min 44s
Wall time: 7min 44s

user@machine:/# NETWORKX_BACKEND_PRIORITY=cugraph ipython bc_demo.ipy
CPU times: user 18.4 s, sys: 1.44 s, total: 19.9 s
Wall time: 20 s
```
----


First, import the needed packages.

In [1]:
import os
import pandas as pd
import networkx as nx

This installs nx-cugraph if not already present.

In [2]:
try:
    import nx_cugraph
except ModuleNotFoundError:
    # -y auto-confirms conda's prompt so the install does not wait for input
    os.system('conda install -y -c rapidsai -c conda-forge -c nvidia nx-cugraph')
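
To check whether a package is available without importing it or triggering an install, the standard library's `importlib.util.find_spec` can be used. This is a minimal sketch; `is_installed` is a hypothetical helper name, not part of NetworkX or nx-cugraph:

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


# Report which of the packages used in this notebook are importable.
for name in ("pandas", "networkx", "nx_cugraph"):
    status = "available" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

Unlike the `try`/`except` cell above, this only reports the state of the environment and leaves the decision to install to the user.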

Download a patent citation dataset containing 3,774,768 nodes and 16,518,948 edges and load it into a NetworkX graph.

In [3]:
filepath = "./data/cit-Patents.csv"

if os.path.exists(filepath):
    url = filepath
else:
    url = "https://data.rapids.ai/cugraph/datasets/cit-Patents.csv"
    print(f"File {filepath} not found, downloading {url}")

df = pd.read_csv(url, sep=" ", names=["src", "dst"], dtype="int32")
G = nx.from_pandas_edgelist(df, source="src", target="dst")

File ./data/cit-Patents.csv not found, downloading https://data.rapids.ai/cugraph/datasets/cit-Patents.csv
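
The cell above reads straight from the URL when the local file is missing, and nothing in it writes the file to `./data`, so the next run would download it again. One possible way to keep a local copy is sketched below, assuming the standard-library `urllib.request.urlretrieve` is acceptable for the download; `ensure_local` is a hypothetical helper, not part of the notebook:

```python
import urllib.request
from pathlib import Path

REMOTE_URL = "https://data.rapids.ai/cugraph/datasets/cit-Patents.csv"


def ensure_local(local_path, remote_url=REMOTE_URL):
    """Return local_path as a Path, downloading remote_url to it first if needed."""
    path = Path(local_path)
    if not path.is_file():
        print(f"File {path} not found, downloading {remote_url}")
        # Create ./data (or any other parent directory) before writing.
        path.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(remote_url, path)
    return path
```

With this helper, the load step would become `pd.read_csv(ensure_local("./data/cit-Patents.csv"), sep=" ", names=["src", "dst"], dtype="int32")`, and later runs would read the saved file.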


Define a function that can be used to run various NetworkX algorithms on the Graph created above. This can be used to compare run-times for NetworkX both without `nx-cugraph` and with `nx-cugraph` enabled.

The following NetworkX calls will be run:
* [Betweenness Centrality](https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.centrality.betweenness_centrality.html)
* [Breadth First Search](https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.traversal.breadth_first_search.bfs_edges.html)
* [Louvain Community Detection](https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.community.louvain.louvain_communities.html)

This code does not require modification to use with nx-cugraph and can be used with NetworkX as-is even when no backends are installed.

In [4]:
def run_algos():
   print("\nRunning Betweenness Centrality...")
   %time nx.betweenness_centrality(G, k=10)

   print("\nRunning Breadth First Search (bfs_edges)...")
   %time list(nx.bfs_edges(G, source=1))  # yields individual edges, use list() to force the full computation

   print("\nRunning Louvain...")
   %time nx.community.louvain_communities(G, threshold=1e-04)
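
`%time` is an IPython magic, so `run_algos()` as written only runs under IPython or Jupyter. For a plain Python script, a rough equivalent can be sketched with `time.perf_counter`. The `timed` helper below is hypothetical and reports wall time only, not the CPU-time breakdown that `%time` prints:

```python
import time


def timed(label, func, *args, **kwargs):
    """Call func(*args, **kwargs), print its wall time, and return (result, seconds)."""
    print(f"\nRunning {label}...")
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"Wall time: {elapsed:.3f} s")
    return result, elapsed


# Stand-in workload so the sketch runs without the graph loaded.
total, seconds = timed("sum of squares", lambda n: sum(i * i for i in range(n)), 100_000)
```

In the notebook's setting, the calls would look like `timed("Betweenness Centrality", nx.betweenness_centrality, G, k=10)`. For `bfs_edges`, the generator still needs to be consumed, for example with `timed("Breadth First Search", lambda: list(nx.bfs_edges(G, source=1)))`.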

## NetworkX (no backend) Benchmark Runs
**_NOTE: NetworkX benchmarks without a backend for the graph used in this notebook can take a very long time. Using an Intel(R) Xeon(R) Gold 6128 CPU @ 3.40GHz with 45GB of memory, the three algo runs took approximately 50 minutes._**

In [5]:
run_algos()


Running Betweenness Centrality...
CPU times: user 7min 47s, sys: 5.61 s, total: 7min 53s
Wall time: 7min 52s

Running Breadth First Search (bfs_edges)...
CPU times: user 28.9 s, sys: 336 ms, total: 29.2 s
Wall time: 29.1 s

Running Louvain...
CPU times: user 42min 46s, sys: 4.8 s, total: 42min 51s
Wall time: 42min 50s


## NetworkX with `nx-cugraph` Benchmark Runs
Use the `nx.config` API, introduced in [NetworkX 3.3](https://networkx.org/documentation/stable/reference/backends.html#networkx.utils.configs.NetworkXConfig), to configure NetworkX to use nx-cugraph. Both options used below can also be set using environment variables.

In [6]:
# Set the prioritized list of backends to automatically try. If none of the backends in the list
# support the algorithm, NetworkX will use the default implementation.
#
# This can also be set using the environment variable NETWORKX_BACKEND_PRIORITY which accepts a
# comma-separated list.
nx.config.backend_priority = ["cugraph"]  # Try the "cugraph" (nx-cugraph) backend first, then
                                          # fall back to NetworkX
#nx.config.backend_priority = []          # Do not use any backends

# Enable caching of graph conversions. When set to False (the default) nx-cugraph will convert
# the CPU-based NetworkX graph object to a nx-cugraph GPU-based graph object each time an algorithm
# is run. When True, the conversion will happen once and be saved for future use *if* the graph has
# not been modified via a supported method such as G.add_edge(u, v, weight=val)
#
# This can also be set using the environment variable NETWORKX_CACHE_CONVERTED_GRAPHS
nx.config.cache_converted_graphs = True
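
The environment-variable route mentioned in the comments can be sketched as follows. This assumes the variable is read when NetworkX is imported, so it has to be in place before the Python process that imports `networkx` starts; `backend_env` is a hypothetical helper. Only `NETWORKX_BACKEND_PRIORITY` is shown, because the accepted values for `NETWORKX_CACHE_CONVERTED_GRAPHS` are not stated here:

```python
import os


def backend_env(backends, base_env=None):
    """Return a copy of the environment with NETWORKX_BACKEND_PRIORITY set.

    `backends` is an ordered list of backend names; the variable takes them
    as a comma-separated list.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["NETWORKX_BACKEND_PRIORITY"] = ",".join(backends)
    return env


env = backend_env(["cugraph"])
print(env["NETWORKX_BACKEND_PRIORITY"])  # prints: cugraph
```

Passing the result as `env=` to `subprocess.run(["ipython", "bc_demo.ipy"], env=env)` would correspond to the `NETWORKX_BACKEND_PRIORITY=cugraph ipython bc_demo.ipy` command line shown at the top of this notebook.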


**Note the warning message NetworkX generates to remind us a cached graph should not be manually mutated. This is shown because caching was enabled, and the initial call resulted in a cached graph conversion for use with subsequent nx-cugraph calls.**

In [7]:
run_algos()


Running Betweenness Centrality...
CPU times: user 17.9 s, sys: 1.5 s, total: 19.4 s
Wall time: 19.1 s

Running Breadth First Search (bfs_edges)...



For the cache to be consistent (i.e., correct), the input graph must not have been manually mutated since the cached graph was created. Examples of manually mutating the graph data structures resulting in an inconsistent cache include:

    >>> G[u][v][key] = val

and

    >>> for u, v, d in G.edges(data=True):
    ...     d[key] = val

Using methods such as `G.add_edge(u, v, weight=val)` will correctly clear the cache to keep it consistent. You may also use `G.__networkx_cache__.clear()` to manually clear the cache, or set `G.__networkx_cache__` to None to disable caching for G. Enable or disable caching via `nx.config.cache_converted_graphs` config.


CPU times: user 50.5 s, sys: 589 ms, total: 51 s
Wall time: 50.7 s

Running Louvain...
CPU times: user 27.4 s, sys: 3.36 s, total: 30.7 s
Wall time: 30.6 s
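
The wall times from the two `run_algos()` outputs can be turned into per-algorithm speedups with a few lines of arithmetic. The numbers below are copied from the outputs above; the dictionary names are only for illustration:

```python
# Wall times in seconds, copied from the two run_algos() outputs above.
cpu_wall = {
    "Betweenness Centrality": 7 * 60 + 52,  # 7min 52s
    "Breadth First Search": 29.1,
    "Louvain": 42 * 60 + 50,                # 42min 50s
}
gpu_wall = {
    "Betweenness Centrality": 19.1,
    "Breadth First Search": 50.7,
    "Louvain": 30.6,
}

# Speedup = time without backend / time with nx-cugraph (values below 1 mean slower).
speedups = {name: cpu_wall[name] / gpu_wall[name] for name in cpu_wall}
for name, factor in speedups.items():
    print(f"{name}: {factor:.2f}x")

cpu_total = sum(cpu_wall.values())
gpu_total = sum(gpu_wall.values())
print(f"Total without backend: {cpu_total / 60:.1f} min")
print(f"Total with nx-cugraph: {gpu_total:.1f} s")
```

This works out to roughly 24.7x for Betweenness Centrality and 84.0x for Louvain, and the total without a backend comes to about 51 minutes, in line with the "approximately 50 minutes" noted earlier. Breadth First Search was slower with the backend enabled in this particular run (50.7 s versus 29.1 s); the recorded output does not say why.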


The Betweenness Centrality call above included a conversion from a NetworkX Graph to an nx-cugraph Graph because it was the first call to use nx-cugraph. Since caching is enabled, a second call shows the run-time for Betweenness Centrality without the conversion.

In [8]:
print("\nRunning Betweenness Centrality (again)...")
%time result = nx.betweenness_centrality(G, k=10)


Running Betweenness Centrality (again)...





CPU times: user 1.84 s, sys: 312 ms, total: 2.15 s
Wall time: 2.12 s


___
Each user is responsible for checking the content of datasets and the applicable licenses and determining if suitable for the intended use.

Information on the U.S. Patent Citation Network dataset used in this notebook is as follows:
* Authors: Jure Leskovec and Andrej Krevl
* Title: SNAP Datasets, Stanford Large Network Dataset Collection
* URL: http://snap.stanford.edu/data
* Date: June 2014
___
Copyright (c) 2024, NVIDIA CORPORATION.

Licensed under the Apache License, Version 2.0 (the "License");  you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
___