# Multi-GPU Batch Betweenness Centrality
In this notebook, we compute Betweenness Centrality scores for the vertices of a graph using cuGraph, and show how to use multiple GPUs to speed up the computation.

RAPIDS Versions: 0.15

Test Hardware:
4 Tesla V100-DGX 32G, CUDA 10.1

## Introduction
Betweenness Centrality can be slow to compute on large graphs. To speed up the process, we can leverage multiple GPUs.
In this notebook we first show how the computation is done with a single GPU, then how it can be done with multiple GPUs.

### Preparation:

In [1]:
import cugraph
import cudf

import dask
import dask_cuda
import dask_cudf
import cugraph.comms as Comms

import time
import os

In [2]:
datafile='../data/csv/directed/soc-LiveJournal1.csv'

## Single GPU

### Reading the Data - Single GPU
The following shows how we read the csv file with a single GPU, as is commonly done when using cuGraph on one device.

In [3]:
t_start_read_sg = time.perf_counter()
e_list = cudf.read_csv(datafile, delimiter=' ', names=['src', 'dst'], dtype=['int32', 'int32'])
t_stop_read_sg = time.perf_counter()

In [4]:
print("SG Read time: {}s".format(t_stop_read_sg - t_start_read_sg))

SG Read time: 1.4486296999966726s


### Building the Graph - Single GPU
Once we have read the file, we need to build the graph. We use a DiGraph, with the content extracted from the .csv file as the edge list.

In [5]:
t_start_build_sg = time.perf_counter()
G = cugraph.DiGraph()
G.from_cudf_edgelist(e_list, source='src', destination='dst')
t_stop_build_sg = time.perf_counter()

In [6]:
print("SG Build time: {}s".format(t_stop_build_sg - t_start_build_sg))

SG Build time: 0.8520178709877655s


### Calling the Algorithm - Single GPU
Now that our graph is built, we can compute its betweenness centrality scores. Here we use a sub-sample of 1024 source vertices (`k=1024`) to approximate the overall betweenness centrality instead of using every vertex as a source. We set the seed for comparability with the multi-GPU version that comes next.
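To make the role of `k` and `seed` concrete, here is a minimal pure-Python sketch of source-sampled betweenness centrality based on Brandes' algorithm. It is an illustration only, not cuGraph's implementation: the function name `sampled_betweenness` and the adjacency-dict input are hypothetical, the scores are left unnormalized, and a library may rescale its results differently.

```python
import random
from collections import deque

def sampled_betweenness(adj, k, seed=None):
    """Approximate betweenness centrality from k randomly chosen sources.

    adj maps every vertex to a list of its out-neighbors (directed, unweighted).
    Scores are unnormalized; with k == len(adj) they are exact.
    """
    vertices = sorted(adj)
    rng = random.Random(seed)            # a fixed seed picks the same sources every run
    sources = rng.sample(vertices, k)
    score = {v: 0.0 for v in vertices}
    for s in sources:
        # Forward pass: BFS from s, counting shortest paths (sigma).
        sigma = {v: 0 for v in vertices}
        dist = {v: -1 for v in vertices}
        preds = {v: [] for v in vertices}
        sigma[s], dist[s] = 1, 0
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:          # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Backward pass: accumulate dependencies, farthest vertices first.
        delta = {v: 0.0 for v in vertices}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score

# Directed path 0 -> 1 -> 2 -> 3
path = {0: [1], 1: [2], 2: [3], 3: []}
exact = sampled_betweenness(path, k=4)               # every vertex is a source
approx_a = sampled_betweenness(path, k=2, seed=123)  # two sampled sources
approx_b = sampled_betweenness(path, k=2, seed=123)  # same seed, same sources
```

Because both sampled runs use the same seed, they process the same sources and return identical scores, which is what makes a single-GPU and a multi-GPU run comparable.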

In [7]:
t_start_sg = time.perf_counter()
sg_df = cugraph.betweenness_centrality(G, k=1024, seed=123)
t_stop_sg = time.perf_counter()

In [8]:
print("SG Time elapsed: {}s".format(t_stop_sg - t_start_sg))

SG Time elapsed: 57.219933932996355s


## Now let's use multiple GPUs!

### A single chunk
To use multi-GPU Batch Betweenness Centrality, we need to ensure that the data is represented as a single chunk on a single GPU.
The following function gives us the size required to fit the entire data in a single chunk.

In [9]:
def get_chunksize(input_file):
    return os.path.getsize(input_file)
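
As a rough illustration of why the file size is the right chunk size, the sketch below assumes the reader cuts the file every `chunksize` bytes, so the partition count is the file size divided by `chunksize`, rounded up. The helper `expected_partitions` is hypothetical and only models that assumption; it does not call `dask_cudf`.

```python
import math
import os
import tempfile

def expected_partitions(input_file, chunksize):
    """Hypothetical helper: partitions produced if the file is cut every `chunksize` bytes."""
    return math.ceil(os.path.getsize(input_file) / chunksize)

# Write a small stand-in edge list so the sketch runs without the real dataset.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as f:
    f.write(b"0 1\n" * 250)          # 250 lines of 4 bytes each = 1000 bytes
    sample_file = f.name

file_size = os.path.getsize(sample_file)
whole = expected_partitions(sample_file, file_size)   # chunk size equal to file size
quarter = expected_partitions(sample_file, 256)       # smaller chunks, more partitions
os.remove(sample_file)

print(file_size, whole, quarter)
```

Under that assumption, any chunk size at least as large as the file yields one partition, while a smaller chunk size would spread the edge list over several.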

### Using a Dask Cluster
To use multiple GPUs, we need a Dask cluster and client running. Furthermore, we need to initialize the cuGraph communicator.

In [10]:
cluster = dask_cuda.LocalCUDACluster()
client = dask.distributed.Client(cluster)
Comms.initialize()

### Reading the Data - Multiple GPUs
This step is quite similar to the single GPU version:
1. We need to get the `chunksize`.
2. We call `dask_cudf.read_csv` instead of `cudf.read_csv`, and use `chunksize=chunksize` to ensure that the data is in a single partition.

As you can see, the differences are minimal.

In [11]:
t_start_read_mg = time.perf_counter()
chunksize = get_chunksize(datafile)
dask_e_list = dask_cudf.read_csv(datafile, chunksize=chunksize, delimiter=' ', names=['src', 'dst'], dtype=['int32', 'int32'])
t_stop_read_mg = time.perf_counter()

In [12]:
print("MG Read time: {}s".format(t_stop_read_mg - t_start_read_mg))

MG Read time: 0.02092634595464915s


### Building the Graph - Multiple GPUs
Once we have read the file, we need to build the graph. We use a DiGraph, with the content extracted from the .csv file as the edge list. This step comes with some overhead.

In [13]:
t_start_build_mg = time.perf_counter()
RG = cugraph.DiGraph()
RG.from_dask_cudf_edgelist(dask_e_list)
t_stop_build_mg = time.perf_counter()

In [14]:
print("MG Build time: {}s".format(t_stop_build_mg - t_start_build_mg))

MG Build time: 7.060216261015739s


### Calling the Algorithm - Multiple GPUs
We call the algorithm the same way as before, but this time it runs much faster because multiple GPUs compute the Betweenness Centrality scores: about 15.8s in the run recorded below, against 57.2s on a single GPU.
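One way to picture why extra GPUs help: assume, as the notebook does not spell out, that every worker holds a full copy of the graph and handles its own share of the sampled sources, and that the per-worker partial scores are then summed. The pure-Python sketch below models that idea with plain loops standing in for workers; `partial_scores`, `split_round_robin`, and `batched_scores` are hypothetical names, not cuGraph functions.

```python
from collections import deque

def partial_scores(adj, sources):
    """Unnormalized betweenness contributions of the given sources only."""
    score = dict.fromkeys(adj, 0.0)
    for s in sources:
        sigma = dict.fromkeys(adj, 0)     # number of shortest paths from s
        dist = dict.fromkeys(adj, -1)     # BFS depth, -1 means unreached
        sigma[s], dist[s] = 1, 0
        order, queue = [], deque([s])
        while queue:                      # BFS counting shortest paths
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
        delta = dict.fromkeys(adj, 0.0)
        for v in reversed(order):         # farthest vertices first
            for w in adj[v]:
                if dist[w] == dist[v] + 1:
                    delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if v != s:
                score[v] += delta[v]
    return score

def split_round_robin(sources, n_workers):
    """Deal the sources out to n_workers batches, one batch per worker."""
    return [sources[i::n_workers] for i in range(n_workers)]

def batched_scores(adj, sources, n_workers):
    """Sum the partial scores that each (simulated) worker computes on its batch."""
    total = dict.fromkeys(adj, 0.0)
    for batch in split_round_robin(sources, n_workers):
        for v, x in partial_scores(adj, batch).items():
            total[v] += x
    return total

# Small directed graph: 0 -> 1 -> 2 -> 3 plus a shortcut 0 -> 2
graph = {0: [1, 2], 1: [2], 2: [3], 3: []}
sources = [0, 1, 2, 3]
single = partial_scores(graph, sources)
batched = batched_scores(graph, sources, n_workers=3)
```

Each source contributes independently, so the batches can be processed in any order or at the same time, and the summed result matches the single-worker result.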

In [15]:
t_start_mg = time.perf_counter()
mg_df = cugraph.betweenness_centrality(RG, k=1024, seed=123)
t_stop_mg = time.perf_counter()


In [16]:
print("MG Time elapsed: {}s".format(t_stop_mg - t_start_mg))

MG Time elapsed: 15.80565904499963s
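
The timings printed in this notebook can be put side by side with a few lines of arithmetic. The numbers are copied from the cell outputs above; the GPU count of 4 is an assumption based on the listed test hardware, with one Dask worker per GPU. The multi-GPU read is probably deferred until the graph is built, which would explain the very small read time and the larger build time, so the end-to-end totals are the fairer comparison.

```python
# Timings in seconds, copied from the cell outputs in this notebook.
sg = {"read": 1.4486296999966726, "build": 0.8520178709877655, "algo": 57.219933932996355}
mg = {"read": 0.02092634595464915, "build": 7.060216261015739, "algo": 15.80565904499963}

n_gpus = 4  # assumption: one Dask worker per GPU on the 4-GPU test machine

algo_speedup = sg["algo"] / mg["algo"]                 # algorithm call only
total_speedup = sum(sg.values()) / sum(mg.values())    # read + build + algorithm
efficiency = algo_speedup / n_gpus                     # fraction of ideal scaling

print(f"algorithm speedup:   {algo_speedup:.2f}x")
print(f"end-to-end speedup:  {total_speedup:.2f}x")
print(f"parallel efficiency: {efficiency:.1%}")
```

In this run the algorithm call is roughly 3.6 times faster on multiple GPUs, while the end-to-end gain is closer to 2.6 times because of the extra graph-building overhead.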


Do not forget to destroy the communicator and close the client and cluster once they are no longer needed.

In [17]:
Comms.destroy()
client.close()
cluster.close()