# Multiple GPU Louvain in cuGraph

| Author Credit |    Date    |  Update          | cuGraph Version |  Test Hardware        |
|---------------|------------|------------------|-----------------|-----------------------|
| Chuck Hastings| 11/08/2021 | created          | 21.10 nightly   |                       |
| Don Acosta    | 01/30/2023 | updated          | 23.02 nightly   |  2xA6000 CUDA 11.7    |


In this notebook, we will show how to use multiple GPUs in cuGraph to compute the Louvain partitions and global modularity score for a dataset.

This notebook was tested using RAPIDS 23.02 and CUDA 11.5. Your system may differ, and you may need to modify the code or install packages to run the examples below. If you think you have found a bug or an error, please file an issue in [cuGraph](https://github.com/rapidsai/cugraph/issues).


cuGraph's multi-GPU features leverage Dask. RAPIDS has other Dask-based projects, such as dask-cudf and dask-cuda, which are also used in this example. Check out [RAPIDS.ai](https://rapids.ai/) to learn more about these technologies.

## Multi GPU Louvain with cuGraph
### Basic setup

In [None]:
# Import needed libraries. We recommend using the [cugraph_dev](https://github.com/rapidsai/cugraph/tree/branch-21.12/conda/environments) env through conda
from dask.distributed import Client, wait
from dask_cuda import LocalCUDACluster
from cugraph.dask.comms import comms as Comms
import cugraph.dask as dask_cugraph
import cugraph
import dask_cudf
import time


### Get the data

The Hollywood dataset is stored in our S3 bucket as a gzip-compressed file.
1. Create a folder for the data at `../data`, relative to the notebook directory
1. Download the compressed data into that folder from S3 (it will take some time, as it is 6GB)
1. Decompress the data for use (it will take some time, as it is 26GB)

In [None]:
import urllib.request
import os

data_dir = '../data/'
if not os.path.exists(data_dir):
    print('creating data directory')
    # Create the directory from Python rather than shelling out to mkdir
    os.makedirs(data_dir, exist_ok=True)

# download the Hollywood dataset
base_url = 'https://data.rapids.ai/cugraph/benchmark/'
fn = 'hollywood.csv'
comp = '.gz'

if not os.path.isfile(data_dir+fn):
    if not os.path.isfile(data_dir+fn+comp):
        print(f'Downloading {base_url+fn+comp} to {data_dir+fn+comp}')
        urllib.request.urlretrieve(base_url+fn+comp, data_dir+fn+comp)
    print(f'Decompressing {data_dir+fn+comp}...')
    os.system('gunzip '+data_dir+fn+comp)
    print(f'{data_dir+fn+comp} decompressed!')
else:
    print(f'Your data file, {data_dir+fn}, already exists')

# File path, assuming Notebook directory
input_data_path = data_dir+fn
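
The cell above shells out to `gunzip`, which may not be available on every system. As a rough sketch, the same decompression step can be done with the Python standard library alone. The helper name `decompress_gzip` and the throwaway test file are hypothetical and are not part of the notebook. Note that, unlike `gunzip`, this version leaves the `.gz` file in place.

```python
import gzip
import os
import shutil
import tempfile


def decompress_gzip(src_path, dst_path, chunk_bytes=1 << 20):
    """Hypothetical helper: stream-decompress a .gz file into dst_path.

    Data is copied in chunks, so the whole file is never held in memory.
    """
    with gzip.open(src_path, "rb") as f_in, open(dst_path, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out, chunk_bytes)
    return dst_path


# Self-check on a tiny throwaway file (not the Hollywood data)
with tempfile.TemporaryDirectory() as tmp:
    gz_path = os.path.join(tmp, "edges.csv.gz")
    csv_path = os.path.join(tmp, "edges.csv")
    with gzip.open(gz_path, "wb") as f:
        f.write(b"0 1\n1 2\n")
    decompress_gzip(gz_path, csv_path)
    with open(csv_path, "rb") as f:
        roundtrip = f.read()

print(roundtrip)
```

In the notebook, this would be called as `decompress_gzip(data_dir+fn+comp, data_dir+fn)` in place of the `os.system('gunzip ...')` line.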

### Initialize multi-GPU environment
Before we get started, we need to set up a local Dask cluster of workers to execute our work and a client to coordinate and schedule work for that cluster. As shown below, we can start a cluster and client using only three lines of code.

In [None]:
cluster = LocalCUDACluster()
client = Client(cluster)
Comms.initialize(p2p=True)

### Read the data from disk
cuGraph depends on cuDF for data loading and the initial DataFrame creation. The CSV data file contains an edge list, in which each row represents a connection from one vertex to another. This list of source-destination pairs is known as Coordinate Format (COO). In this test case, the data has just two columns.

In [None]:
# Start ETL timer
t_start = time.time()

# Helper function to set the reader chunk size to automatically get one partition per GPU  
chunksize = dask_cugraph.get_chunksize(input_data_path)

# Multi-GPU CSV reader
e_list = dask_cudf.read_csv(input_data_path, chunksize = chunksize, delimiter=' ', names=['src', 'dst'], dtype=['int32', 'int32'])
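
The notebook does not show how `get_chunksize` arrives at its value. One plausible reading of "one partition per GPU" for a single input file is to divide the file size by the number of workers and round up. The sketch below illustrates only that idea; the function name `one_partition_per_worker_chunksize` is hypothetical, and the real helper may behave differently (for example, with multiple input files).

```python
import math
import os
import tempfile


def one_partition_per_worker_chunksize(path, n_workers):
    """Hypothetical helper: bytes per chunk so that one file splits
    into roughly n_workers partitions."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    return math.ceil(os.path.getsize(path) / n_workers)


# Self-check on a 1001-byte throwaway file split across two workers
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 1001)
    tmp_path = tmp.name
try:
    example_chunksize = one_partition_per_worker_chunksize(tmp_path, n_workers=2)
finally:
    os.remove(tmp_path)

print(example_chunksize)  # 501
```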

### Create a graph


In [None]:
# Create an undirected graph using the source (src) and destination (dst) vertex pairs from the DataFrame
G = cugraph.Graph(directed=False)
G.from_dask_cudf_edgelist(e_list, source='src', destination='dst')

# Print time
print("Read, load and renumber: ", time.time()-t_start, "s")
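
To make the coordinate format and the `directed=False` setting concrete, here is a small pure-Python sketch that needs no GPU. The four-edge list is made up for illustration and uses the same space-delimited `src dst` layout as the CSV file. It only shows the idea of treating each pair as a two-way connection and is not how cuGraph stores the graph internally.

```python
from collections import defaultdict

# Hypothetical edge list in the same "src dst" layout as the CSV file
csv_text = "0 1\n0 2\n1 2\n2 3\n"

# COO: two parallel arrays, one entry per edge
src, dst = [], []
for line in csv_text.splitlines():
    s, d = line.split(" ")
    src.append(int(s))
    dst.append(int(d))

# Undirected: each (s, d) pair connects both endpoints
neighbors = defaultdict(set)
for s, d in zip(src, dst):
    neighbors[s].add(d)
    neighbors[d].add(s)

degree = {v: len(n) for v, n in sorted(neighbors.items())}
print(src, dst)
print(degree)
```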

### Call Louvain algorithm


In [None]:
# Start Louvain timer
t_start = time.time()

# Get the Louvain partition assignments for each vertex and the global modularity score.
(louvain_df, modularity) = dask_cugraph.louvain(G)

# Print time
print("Louvain: ", time.time()-t_start, "s")

It was that easy! Louvain should take 5-10 seconds to run on this 1.5GB input with two GPUs.
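
To give a feel for what the modularity score measures, the sketch below computes it by hand for a toy graph. It uses the standard definition for an unweighted, undirected graph without self-loops, $Q = \sum_c \left[ L_c/m - (d_c/(2m))^2 \right]$, where $m$ is the number of edges, $L_c$ is the number of edges inside community $c$, and $d_c$ is the sum of the degrees of its vertices. The toy graph (two triangles joined by one edge) and the function are illustrative assumptions; cuGraph's implementation may differ in details such as edge weights.

```python
from collections import defaultdict


def modularity(edges, partition):
    """Modularity of an unweighted, undirected graph (illustrative sketch).

    edges: non-empty list of (u, v) pairs, each undirected edge listed once.
    partition: dict mapping each vertex to its community id.
    """
    m = len(edges)
    intra = defaultdict(int)       # edges with both ends in the community
    degree_sum = defaultdict(int)  # total degree of the community's vertices
    for u, v in edges:
        cu, cv = partition[u], partition[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            intra[cu] += 1
    return sum(intra[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Two triangles joined by a single bridge edge (2, 3)
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

q_split = modularity(edges, {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1})
q_single = modularity(edges, {v: 0 for v in range(6)})
print(q_split, q_single)
```

Splitting the graph into its two triangles gives a score of 5/14 (about 0.357), while placing every vertex in one community gives 0. Louvain searches for a partition that makes this score large.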

### Display subset of the Louvain result

For now, just display the Louvain result.

In [None]:
louvain_df.compute()
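
A common next step is to summarize how many vertices landed in each community. The sketch below does this in plain Python on made-up rows. It assumes the result can be read as (vertex, partition) pairs; the rows and variable names are hypothetical, and the actual column names should be checked against the DataFrame displayed above.

```python
from collections import Counter

# Hypothetical (vertex, partition) rows standing in for the computed result
rows = [(0, 0), (1, 0), (2, 0), (3, 1), (4, 1), (5, 2)]

# Count the vertices assigned to each partition
sizes = Counter(partition for _, partition in rows)

# The two largest communities as (partition id, vertex count) pairs
largest = sizes.most_common(2)
print(len(sizes), largest)
```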

In [None]:
# Clean up: shut down the cuGraph comms, then the Dask client and cluster
Comms.destroy()
client.close()
cluster.close()

___
Copyright (c) 2021-2023, NVIDIA CORPORATION.

Licensed under the Apache License, Version 2.0 (the "License");  you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
___