# Skip notebook test
-----

# Running cuGraph algorithms on synthetic and benchmark data on a single-node multi-GPU (SNMG) cluster


This notebook measures the execution times of many cuGraph algorithms on a single GPU or on multiple GPUs, using either synthetic data at multiple scales or existing benchmark datasets.

The notebook uses the R-MAT data generator, which can create graphs at various scales. Several sets of scales and benchmark datasets are predefined, and users are free to change or add to them.

### Timing 

 

This benchmark produces two performance metrics:
 - (1) The run time of each algorithm
 - (2) A separate graph creation time for each data set

Since GPU memory is a precious resource, temporary data is not left lying around: once a graph is created, the raw data is dropped.
 
__What is not timed__:  Generating the data with R-MAT

__What is timed__:     (1) creating a Graph, (2) running the algorithm
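
The split between the two timed phases can be sketched with `perf_counter`, the same timer this notebook uses. The snippet is only an illustration: `build_graph` and `run_algorithm` are hypothetical stand-ins that work on a plain Python edge list rather than on cuGraph objects.

```python
from time import perf_counter

def build_graph(edges):
    # Hypothetical stand-in for graph creation: build an adjacency dict
    adj = {}
    for src, dst in edges:
        adj.setdefault(src, set()).add(dst)
        adj.setdefault(dst, set()).add(src)
    return adj

def run_algorithm(adj):
    # Hypothetical stand-in for an algorithm: return the maximum degree
    return max(len(nbrs) for nbrs in adj.values())

# Data generation happens before any timer starts, so it is not timed
edges = [(i, (i + 1) % 1000) for i in range(1000)]

# (1) graph creation time
t1 = perf_counter()
graph = build_graph(edges)
creation_time = perf_counter() - t1

# The raw data is dropped once the graph exists
del edges

# (2) algorithm run time
t1 = perf_counter()
result = run_algorithm(graph)
algorithm_time = perf_counter() - t1

print(f"creation: {creation_time:.6f}s, algorithm: {algorithm_time:.6f}s")
```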


### Algorithms

|        Algorithm        |  Type         | Undirected Graph | Directed Graph |   Notes
| ------------------------|---------------|------ | ------- |-------------
| WCC                     | Components    |   X   |         |
| Katz                    | Centrality    |   X   |         |
| Betweenness Centrality  | Centrality    |   X   |         | Estimated, k = 100
| K Truss                 | Community     |   X   |         |
| Louvain                 | Community     |   X   |         | Uses python-louvain for comparison
| Triangle Counting       | Community     |   X   |         |
| Core Number             | Core          |   X   |         |
| PageRank                | Link Analysis |       |    X    |
| Jaccard                 | Similarity    |   X   |         | One-hop over all connected nodes instead of the 2-hop default
| BFS                     | Traversal     |   X   |         | No depth limit
| SSSP                    | Traversal     |   X   |         |


### Test Data
Data is generated using a Recursive MATrix (R-MAT) graph generation algorithm. 
The generator is documented in the [cuGraph generator API reference](https://docs.rapids.ai/api/cugraph/stable/api_docs/generator.html).



## Import Modules

In [None]:
# system and other
import gc
import os
import importlib
from time import perf_counter
import pandas as pd
from collections import defaultdict

# rapids
import cugraph
import cugraph.datasets as ds

# libraries to set up the dask cluster and client
from dask.distributed import Client
from dask_cuda import LocalCUDACluster
from cugraph.dask.comms import comms as Comms


# RMAT data generator
from cugraph.generators import rmat
from cugraph.structure import NumberMap

### Determine the scale of the test data
R-MAT generates a graph in which the number of vertices is a power of 2 and the number of edges is an edge factor times the number of vertices.

Since R-MAT tends to generate about 50% isolated vertices, those vertices are dropped from the graph data. Hence the number of vertices is closer to `(2 ** scale) / 2`.


| Scale | Vertices (est) | Edges  |
| ------|----------------|--------|
| 10 | 512 | 16,384 | 
| 11 | 1,024 | 32,768| 
| 12 | 2,048 | 65,536| 
| 13 | 4,096 | 131,072| 
| 14 | 8,192 | 262,144| 
| 15 | 16,384 | 524,288 | 
| 16 | 32,768 | 1,048,576 | 
| 17 | 65,536 | 2,097,152 | 
| 18 | 131,072 | 4,194,304 | 
| 19 | 262,144 | 8,388,608 | 
| 20 | 524,288 | 16,777,216 | 
| 21 | 1,048,576 | 33,554,432 | 
| 22 | 2,097,152 | 67,108,864 | 
| 23 | 4,194,304 | 134,217,728 | 
| 24 | 8,388,608 | 268,435,456 | 
| 25 | 16,777,216 | 536,870,912 | 
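
The rows of this table follow directly from the scale. A minimal sketch of the arithmetic, assuming the edge factor of 16 that `generate_data` uses by default (`rmat_size_estimate` is a hypothetical helper, not part of cuGraph):

```python
def rmat_size_estimate(scale, edgefactor=16):
    """Return (estimated vertices, edges) for an R-MAT scale."""
    # About half of the 2**scale vertex ids end up isolated and are dropped
    est_vertices = (2 ** scale) // 2
    # Edge count requested from the generator: (2 ** scale) * edgefactor
    edges = (2 ** scale) * edgefactor
    return est_vertices, edges

for scale in (10, 16, 25):
    vertices, edges = rmat_size_estimate(scale)
    print(f"scale {scale}: ~{vertices:,} vertices, {edges:,} edges")
```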


In [None]:
# Test Data Sizes
# Here you can create an array of test data sizes.   Then set the "data" variable to the array you want
# the dictionary format is 'name' : scale


# These scales are used by R-MAT to determine the number of vertices/edges in the synthetic data graph.
data_full = {
    'data_scale_12'   :  12,
    'data_scale_14'  :   14,
    'data_scale_16'  :   16,
    'data_scale_18'  :   18,
    'data_scale_20'  :   20,
    'data_scale_22'  :   22,
}

# for quick testing
data_quick = {
   'data_scale_9' : 9,
   'data_scale_10' : 10,
   'data_scale_11' : 11,
}

# for existing benchmark datasets
data_sets = {
    'netscience' : -1,
    'hollywood' : -1,
    'cit_patents' : -1,
    'email_Eu_core' : -1,
}

# Which dataset is to be used
data = data_sets


### Generate data
Synthetic data is generated once per graph, and only when a scale is given (that is, when the entry is not one of the existing benchmark datasets).
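
In the dictionaries above, a scale of `-1` marks a named benchmark dataset rather than an R-MAT scale. The dispatch can be sketched as follows; `describe_source` is a hypothetical helper used only for illustration and is not part of the notebook or of cuGraph.

```python
def describe_source(name, scale):
    # A scale of -1 is the sentinel for an existing benchmark dataset
    if scale != -1:
        return f"{name}: generate with R-MAT at scale {scale}"
    return f"{name}: load existing dataset"

example = {"data_scale_10": 10, "netscience": -1}
for name, scale in example.items():
    print(describe_source(name, scale))
```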

In [None]:
# Data generator 
#  The result is an edgelist of the size determined by the scale and edge factor
def generate_data(scale, edgefactor=16, mg=False):
    _gdf = rmat(
        scale,
        (2 ** scale) * edgefactor,
        0.57,
        0.19,
        0.19,
        42,
        clip_and_flip=False,
        scramble_vertex_ids=True,
        create_using=None,  # return edgelist instead of Graph instance
        mg=mg # determines whether generated data will be used on one or multiple GPUs
        )

    clean_coo = NumberMap.renumber(_gdf, src_col_names="src", dst_col_names="dst")[0]
    # rename() returns a new dataframe, and dask dataframes do not support inplace=True,
    # so assign the result back for both the single-GPU and multi-GPU cases
    clean_coo = clean_coo.rename(columns={"renumbered_src": "src", "renumbered_dst": "dst"})

    print(f'Generated a dataframe of type {type(clean_coo)}, with {len(clean_coo)} edges')
    
    return clean_coo

In [None]:
import subprocess

def get_gpus():
    try:
        # ask nvidia-smi for one GPU name per line
        gpu_info = subprocess.check_output(
            ['nvidia-smi', '--query-gpu=name', '--format=csv,noheader'],
            encoding='utf-8'
        )
        gpus = [line.strip() for line in gpu_info.strip().split('\n') if line.strip()]
        gpu_count = len(gpus)
        # join the distinct GPU names into a single string
        return ''.join(set(gpus)), gpu_count
    except Exception:
        # nvidia-smi is missing or failed
        return "no_gpus", 0


In [None]:
get_gpus()
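
To see what `get_gpus` does with the command output without needing a GPU, the parsing step can be exercised on a sample string. The sample text and the `parse_gpu_names` helper are assumptions made for illustration; real `nvidia-smi` output depends on the installed hardware. Unlike `get_gpus`, this sketch sorts the distinct names so the result is deterministic.

```python
def parse_gpu_names(gpu_info):
    # One GPU name per non-empty line, as produced by
    # `nvidia-smi --query-gpu=name --format=csv,noheader`
    gpus = [line.strip() for line in gpu_info.strip().split('\n') if line.strip()]
    # Join the distinct names; sorted so the result is deterministic
    return ''.join(sorted(set(gpus))), len(gpus)

sample_output = "NVIDIA A100-SXM4-80GB\nNVIDIA A100-SXM4-80GB\n"  # assumed sample
name, count = parse_gpu_names(sample_output)
print(name, count)
```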

## Create Graph functions
There are two types of graphs created, both through `create_cu_graph`:
* Directed Graphs - call `create_cu_graph` with `directed=True`.
* Undirected Graphs - call `create_cu_graph` with `directed=False` (the default); the graph is fully symmetrized.

In [None]:
# cuGraph
def create_cu_graph(_df, transpose=True, directed=False, mg=False):
    t1 = perf_counter()
    _g = cugraph.Graph(directed=directed)

    if mg:
        _g.from_dask_cudf_edgelist(_df, source="src", destination="dst", edge_attr=None)
    else:
        _g.from_cudf_edgelist(_df,
                            source='src',
                            destination='dst',
                            edge_attr=None,
                            renumber=False,
                            store_transposed=transpose)
    t2 = perf_counter() - t1

    return _g, t2

## Algorithm Execution

### 	Weakly Connected Components (WCC)

In [None]:

def cu_wcc(_G, mg=False):
    t1 = perf_counter()
    if mg:
        _ = cugraph.dask.weakly_connected_components(_G)
    else:
        _ = cugraph.weakly_connected_components(_G)
    t2 = perf_counter() - t1
    return t2

### Katz
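
The benchmark passes `alpha = 1 / deg_max` to Katz centrality, where `deg_max` is the largest vertex degree in the graph. Katz centrality converges only when `alpha` is below the reciprocal of the largest eigenvalue of the adjacency matrix, and that eigenvalue is at most the maximum degree, which makes `1 / deg_max` a convenient value that needs no eigenvalue computation. The sketch below mirrors that calculation in plain Python on a small, made-up edge list; it does not use cuGraph.

```python
from collections import Counter

# Made-up undirected edge list, for illustration only
edges = [(0, 1), (0, 2), (0, 3), (1, 2)]

# Count each endpoint once per edge to get vertex degrees
degree = Counter()
for src, dst in edges:
    degree[src] += 1
    degree[dst] += 1

deg_max = max(degree.values())
alpha = 1 / deg_max
print(f"max degree = {deg_max}, alpha = {alpha:.4f}")
```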

In [None]:

def cu_katz(_G, alpha, mg=False):
    t1 = perf_counter()
    if mg:
        _ = cugraph.dask.katz_centrality(_G, alpha)
    else:
        _ = cugraph.katz_centrality(_G, alpha)
    t2 = perf_counter() - t1
    return t2


### Betweenness Centrality (BC)

In [None]:

def cu_bc(_G, _k, mg=False):
    t1 = perf_counter()
    if mg:
        _ = cugraph.dask.betweenness_centrality(_G, k=_k)
    else:   
        _ = cugraph.betweenness_centrality(_G, k=_k)
    t2 = perf_counter() - t1
    return t2

### Louvain

In [None]:
def cu_louvain(_G, mg=False):
    t1 = perf_counter()
    if mg:
        _, modularity = cugraph.dask.louvain(_G)
        print (f'modularity: {modularity}')
    else:
        _,_ = cugraph.louvain(_G)
    t2 = perf_counter() - t1
    return t2



### K-Truss

In [None]:
def cu_ktruss(_G, k, mg=False):
    t1 = perf_counter()
    if mg:
        _ = cugraph.dask.ktruss_subgraph(_G,k)
    else:
        _ = cugraph.ktruss_subgraph(_G,k)
    t2 = perf_counter() - t1
    return t2

### Triangle Counting

In [None]:
def cu_tc(_G, mg=False):
    t1 = perf_counter()
    if mg:
        _ = cugraph.dask.triangle_count(_G)
    else:
        _ = cugraph.triangle_count(_G)
    t2 = perf_counter() - t1
    return t2


### Core Number

In [None]:
def cu_core_num(_G, mg=False):
    t1 = perf_counter()
    if mg:
        _ = cugraph.dask.core_number(_G)
    else:
        _ = cugraph.core_number(_G)
    t2 = perf_counter() - t1
    return t2


### PageRank

In [None]:
def cu_pagerank(_G, mg=False):
    t1 = perf_counter()
    if mg:
        _ = cugraph.dask.pagerank(_G)
    else:
        _ = cugraph.pagerank(_G)
    t2 = perf_counter() - t1
    return t2


### Jaccard

In [None]:
def cu_jaccard(_G, mg=False):
    edge_list = _G.view_edge_list()
    t1 = perf_counter()
    if mg:
        _ = cugraph.dask.jaccard(_G, vertex_pair = edge_list )
    else:
        _ = cugraph.jaccard(_G, vertex_pair = edge_list)
    t2 = perf_counter() - t1
    return t2


### Breadth First Search (BFS)

In [None]:
def cu_bfs(_G, seed=0, mg=False):
    t1 = perf_counter()
    if mg:
        _ = cugraph.dask.bfs(_G, seed)
    else:
        _ = cugraph.bfs(_G, seed)
    t2 = perf_counter() - t1
    return t2


### Single Source Shortest Path (SSSP)

In [None]:
def cu_sssp(_G, seed = 0, mg=False):
    
    t1 = perf_counter()
    # SSSP requires weighted graph
    if mg:
        if _G.weighted: 
            _ = cugraph.dask.sssp(_G, seed)
        else:
            _ = cugraph.dask.bfs(_G, seed)

    else:
        if _G.weighted:
            _ = cugraph.sssp(_G, seed)
        else:
            _ = cugraph.bfs(_G, seed)

    t2 = perf_counter() - t1
    return t2


## SG/MG Benchmark

### Initialize multi-GPU environment
Before we get started, we need to set up a dask (local) cluster of workers to execute our work, and a client to coordinate and schedule work for that cluster.


In [None]:
# Setup a local dask cluster of workers, and a client
cluster = LocalCUDACluster()
client = Client(cluster)
Comms.initialize(p2p=True)

### Run cuGraph algorithms for datasets
The `mg` parameter determines whether multiple GPUs are used when available.
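
For each dataset, `run_algorithms` (defined in the next cell) returns a dictionary mapping algorithm names to run times, plus a one-element list holding the graph creation time. The shape of those results can be sketched without a GPU; `fake_algorithm` and `run_algorithms_sketch` are hypothetical placeholders for the `cu_*` timing functions and the real driver.

```python
from collections import defaultdict
from time import perf_counter

def fake_algorithm():
    # Hypothetical placeholder for a cu_* function: time a trivial workload
    t1 = perf_counter()
    sum(range(1000))
    return perf_counter() - t1

def run_algorithms_sketch(algorithms):
    run_times = defaultdict()
    for name in algorithms:
        run_times[name] = fake_algorithm()
    creation_time = [0.0]  # placeholder for the timed graph creation
    return run_times, creation_time

times, creation = run_algorithms_sketch(["WCC", "Katz", "BFS"])
print(dict(times), creation)
```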

In [None]:
def run_algorithms( dataset, scale, mg):

    cugraph_algo_run_times = defaultdict()

    # generate data
    print("------------------------------")
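    if (scale != -1):
        gdf = generate_data(scale, edgefactor=16, mg=mg)
        # gdf = gdf.repartition(gdf.npartitions * 3)
    else:
        current_set = getattr(ds, dataset)
        # use a dask_cudf edgelist for multi-GPU runs and a cudf edgelist otherwise
        if mg:
            gdf = current_set.get_dask_edgelist(download=True)
        else:
            gdf = current_set.get_edgelist(download=True)
        print(type(gdf))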
    # create cuGraph
    g_cu, tcu = create_cu_graph(gdf, mg=mg, transpose =True)
    cugraph_graph_creation_time = [tcu]
    del gdf

    # prep
    deg = g_cu.degree()
    if mg:
        deg_max = deg['degree'].max().compute()
    else:
        deg_max = deg['degree'].max()
    alpha = 1 / deg_max
    num_nodes = g_cu.number_of_vertices()
    del deg
    gc.collect()

    #-- WCC
    algorithm = "WCC"
    print(f"\t{algorithm}  ", end = '')
    tc = cu_wcc(g_cu, mg=mg)
    print("")
    
    cugraph_algo_run_times[algorithm] = tc

    #-- Katz
    algorithm = "Katz"
    print(f"\t{algorithm}  ", end = '')
    tc = cu_katz(g_cu, alpha, mg=mg)
    print("")
    
    cugraph_algo_run_times[algorithm] = tc

    #-- K Truss
    algorithm = "K_Truss"
    print(f"\t{algorithm}  ", end = '')
    k = 5
    tc = cu_ktruss(g_cu, k=k, mg=mg)
    print("")
    
    cugraph_algo_run_times[algorithm] = tc

    #-- BC
    algorithm = "BC"
    print(f"\t{algorithm}  ", end = '')
    k = 100
    if k > num_nodes:
        k = int(num_nodes)
    tc = cu_bc(g_cu, k, mg=mg)
    print(" ")
    cugraph_algo_run_times[algorithm] = tc


    #-- Louvain
    algorithm = "Louvain"
    print(f"\t{algorithm}  ", end = '')
    tc = cu_louvain(g_cu, mg=mg)
    print(" ")

    cugraph_algo_run_times[algorithm] = tc

    #-- TC
    algorithm = "TC"
    print(f"\t{algorithm}  ", end = '')
    tc = cu_tc(g_cu, mg=mg)
    print(" ")
    
    cugraph_algo_run_times[algorithm] = tc

    #-- Core Number
    algorithm = "Core Number"
    print(f"\t{algorithm}  ", end = '')
    tc = cu_core_num(g_cu, mg=mg)
    print(" ")

    cugraph_algo_run_times[algorithm] = tc

    #-- PageRank
    algorithm = "PageRank"
    print(f"\t{algorithm}  ", end = '')
    tc = cu_pagerank(g_cu, mg=mg)
    print(" ")

    cugraph_algo_run_times[algorithm] = tc

    #-- Jaccard
    algorithm = "Jaccard"
    print(f"\t{algorithm}  ", end = '')
    tc = cu_jaccard(g_cu, mg=mg)
    print(" ")

    cugraph_algo_run_times[algorithm] = tc

    # Seed for BFS and SSSP
    if mg:
        cu_seed = g_cu.nodes().compute().to_pandas().iloc[0]
    else:
        cu_seed = g_cu.nodes().to_pandas().iloc[0]

    #-- BFS
    algorithm = "BFS"
    print(f"\t{algorithm}  ", end = '')
    tc = cu_bfs(g_cu, seed=cu_seed, mg=mg)
    print(" ")

    cugraph_algo_run_times[algorithm] = tc

    #-- SSSP
    algorithm = "SSSP"
    print(f"\t{algorithm}  ", end = '')
    tc = cu_sssp(g_cu, seed=cu_seed, mg=mg)
    print(" ")

    cugraph_algo_run_times[algorithm] = tc

    del g_cu
    gc.collect()

    return cugraph_algo_run_times, cugraph_graph_creation_time


### cuGraph execution times for different algorithms
Set `mg=True` to run on multiple GPUs, or `mg=False` to run on a single GPU.

In [None]:
mg=True

In [None]:
cugraph_algo_run_times = defaultdict(defaultdict)
cugraph_graph_creation_time = defaultdict()
for dataset, scale in data.items():
    cugraph_algo_run_times[dataset], cugraph_graph_creation_time[dataset] = run_algorithms(dataset, scale, mg=mg )
gpu,count = get_gpus()

In [None]:
cugraph_algo_run_times

In [None]:
cugraph_graph_creation_time

In [None]:
# cuGraph execution times for different algorithms
cugraph_run_times = pd.DataFrame()
for dataset in cugraph_algo_run_times.keys():
    temp_df = pd.DataFrame({'cuGraph': cugraph_algo_run_times[dataset]})
    temp_df.loc['Creation Time'] = cugraph_graph_creation_time[dataset]
    columns = [(dataset, 'cuGraph')]
    temp_df.columns = pd.MultiIndex.from_tuples(columns)
    cugraph_run_times = pd.concat([temp_df, cugraph_run_times], axis=1)

print(f'\n\t------cuGraph execution times for different algorithms-----mg={mg}\n')
print(cugraph_run_times)

Function to Capture the GPU info

In [None]:
import subprocess

# Same helper as defined earlier in this notebook: returns the distinct
# GPU names joined into one string, plus the number of GPUs
def get_gpus():
    try:
        gpu_info = subprocess.check_output(
            ['nvidia-smi', '--query-gpu=name', '--format=csv,noheader'],
            encoding='utf-8'
        )
        gpus = [line.strip() for line in gpu_info.strip().split('\n') if line.strip()]
        gpu_count = len(gpus)
        return ''.join(set(gpus)), gpu_count
    except Exception:
        return "no_gpus", 0

Save the benchmarks

In [None]:
from datetime import datetime
time_now = datetime.now()
formatted_time = time_now.strftime("%Y-%m-%d_%H-%M-%S")
filename = f"algo-run-{count}-{gpu}-{formatted_time}.csv"

# keep the index: it holds the algorithm name for each row
cugraph_run_times.to_csv(filename)
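
GPU names reported by `nvidia-smi` usually contain spaces, which then end up in the CSV filename. If that is a problem, the name can be sanitized first. This is a sketch under assumptions: `safe_filename_part` is a hypothetical helper, and the GPU name and count are example values rather than real output.

```python
import re
from datetime import datetime

def safe_filename_part(text):
    # Replace anything other than letters, digits, dot, underscore, or hyphen
    return re.sub(r'[^A-Za-z0-9._-]+', '_', text).strip('_')

gpu = "NVIDIA A100-SXM4-80GB"  # assumed example name
count = 2                      # assumed example GPU count
formatted_time = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
filename = f"algo-run-{count}-{safe_filename_part(gpu)}-{formatted_time}.csv"
print(filename)
```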

### Clean up multi-GPU environment

In [None]:
Comms.destroy()
client.close()
cluster.close()

___
Copyright (c) 2020-2025, NVIDIA CORPORATION.

Licensed under the Apache License, Version 2.0 (the "License");  you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
___