# Oversubscribing GPU memory in cuGraph
#### Author : Alex Fender
# Skip notebook test

In this notebook, we will show how to **scale to 4x larger graphs than before** without incurring a performance drop using managed memory features in cuGraph. We will compute the PageRank of each user in Twitter's dataset on a single GPU as an example. This technique applies to all cuGraph features, not only PageRank.

Unified Memory is a single memory address space accessible from any processor in a system. If a kernel accesses pages that are not resident in GPU memory, the Page Migration Engine migrates them on demand. When GPU memory is full, the least recently used pages are evicted to make room. In other words, Unified Memory transparently enables oversubscribing GPU memory, which makes out-of-core computation possible.
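The eviction behaviour described above can be pictured with a toy model: device memory as a fixed number of page slots with least-recently-used (LRU) eviction. This is only an illustration of the policy named in the paragraph, assuming pure LRU; the real driver's migration and eviction heuristics are more involved than this. The class name `LRUPageCache` is hypothetical.

```python
from collections import OrderedDict


class LRUPageCache:
    """Toy model of device memory holding at most `capacity` pages.

    On a miss (page fault) the page is migrated in; if the device is full,
    the least recently used page is evicted first.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.resident = OrderedDict()  # page id -> None, oldest access first
        self.faults = 0                # accesses that required a migration
        self.evictions = 0             # pages pushed back out to host memory

    def access(self, page):
        if page in self.resident:
            # Hit: mark the page as most recently used.
            self.resident.move_to_end(page)
            return
        # Miss: the page has to be migrated to the device.
        self.faults += 1
        if len(self.resident) >= self.capacity:
            # Device is full: evict the least recently used page.
            self.resident.popitem(last=False)
            self.evictions += 1
        self.resident[page] = None


# A working set of 8 pages on a "device" that only fits 4 of them.
cache = LRUPageCache(capacity=4)
for _ in range(2):          # two sweeps over the data, like two solver iterations
    for page in range(8):
        cache.access(page)
print(cache.faults, cache.evictions)
```

The computation still completes even though the data is twice the "device" size, but every access in the cyclic sweep misses. With `capacity=8` the same loop faults only 8 times (first sweep) and evicts nothing, which is one way to see why heavier oversubscription runs slower.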


This notebook was tested on an NVIDIA 48GB RTX8000 GPU using RAPIDS 0.14 and CUDA 10.2. Please be aware that your system may be different, and you may need to modify the code or install packages to run the examples below. If you think you have found a bug or an error, please file an issue in [cuGraph](https://github.com/rapidsai/cugraph/issues).

### Data
We will be analyzing **1.47 billion social relations** among 41.7 million user profiles from the Twitter dataset. The CSV file is 26GB and comes from:<br>
*What is Twitter, a social network or a news media? Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. 2010.*<br> 

__Notice__ that the memory requirement to read this 26GB dataset is already bigger than the memory of a single GPU. While we are not limited by the device memory size in this case, the whole system (host+device memory) should still have at least 84GB of memory available.  Additionally, the more that host memory is used, the slower the process will be.  
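A back-of-the-envelope estimate shows how large the graph structures alone are. The sketch below uses the edge and vertex counts quoted above and assumes 4-byte (`int32`) vertex IDs, matching the `dtype` used when reading the CSV later. It counts only the final COO edge list and one unweighted CSR adjacency structure, not the raw CSV text or parser buffers, so it does not attempt to derive the 84GB figure. The helper names are hypothetical.

```python
# Figures quoted in the text: 1.47 billion edges, 41.7 million vertices.
NUM_EDGES = 1_470_000_000
NUM_VERTICES = 41_700_000
INT32_BYTES = 4
GB = 1e9


def coo_bytes(num_edges, index_bytes=INT32_BYTES):
    """Bytes for a COO edge list: one src and one dst index per edge."""
    return 2 * num_edges * index_bytes


def csr_bytes(num_edges, num_vertices, index_bytes=INT32_BYTES):
    """Bytes for an unweighted CSR: (V + 1) offsets plus E neighbor indices."""
    return (num_vertices + 1) * index_bytes + num_edges * index_bytes


print(f"COO edge list (int32): {coo_bytes(NUM_EDGES) / GB:.2f} GB")
print(f"CSR adjacency (int32): {csr_bytes(NUM_EDGES, NUM_VERTICES) / GB:.2f} GB")
```

With these inputs it prints 11.76 GB for the COO list and 6.05 GB for one CSR structure; storing both a CSR and its transpose roughly doubles the second figure.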

If you run into memory issues (for example, the kernel crashes), you can limit the amount of data loaded by passing the _nrows_ option to _read_csv_.
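`cudf.read_csv` accepts an `nrows` argument, so the reader call below could, for example, add `nrows=500_000_000`. Keep in mind that this keeps the *first* N lines of the file: you analyze the subgraph made of those edges rather than a random sample, and vertices that only appear later drop out entirely. The pure-Python sketch below mimics that truncation on a tiny temporary file; `read_edges` is a hypothetical helper, not part of cuDF.

```python
import itertools
import os
import tempfile


def read_edges(path, nrows=None):
    """Read a space-delimited 'src dst' edge list, stopping after `nrows` lines
    (all lines when nrows is None), similar in spirit to read_csv's nrows."""
    with open(path) as f:
        lines = itertools.islice(f, nrows)
        return [tuple(int(x) for x in line.split()) for line in lines]


# A tiny stand-in for twitter-2010.csv.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
    f.write("0 1\n0 2\n1 2\n2 0\n3 2\n")
    path = f.name

first_three = read_edges(path, nrows=3)
print(first_three)  # vertex 3 never appears in the truncated edge list
os.remove(path)
```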

## Initialize RMM
RAPIDS Memory Manager (RMM) is a central place for all device memory allocations in RAPIDS libraries. Using RMM in Python code is straightforward: simply `import rmm` and configure the RMM options before making any call to a RAPIDS library.

In [None]:
import rmm 

In [None]:
#Set RMM to allocate all memory as managed memory (cudaMallocManaged underlying allocator)
rmm.mr.set_current_device_resource(rmm.mr.ManagedMemoryResource())
assert rmm.is_initialized()

## PageRank with cuGraph
### Basic setup

In [None]:
# Import needed libraries. We recommend using cugraph_dev env through conda
import time
import cudf 
import cugraph 

### Get the data

The Twitter dataset is stored gzip-compressed in our S3 bucket.
1. We'll need to create a `../data` folder (relative to this notebook) for our data
1. Download the compressed data into that folder from S3 (it will take some time as it is 6GB)
1. Decompress the data for use (it will take some time as it is 26GB)

In [None]:
import time
import urllib.request
import os

data_dir = '../data/'
if not os.path.exists(data_dir):
    print('creating data directory')
    # create the folder (and any missing parents) without shelling out
    os.makedirs(data_dir)

In [None]:
# download the Twitter dataset
base_url = 'https://data.rapids.ai/cugraph/benchmark/'
fn = 'twitter-2010.csv'
comp = '.gz'
if not os.path.isfile(data_dir+fn):
    if not os.path.isfile(data_dir+fn+comp):
        print(f'Downloading {base_url+fn+comp} to {data_dir+fn+comp}')
        urllib.request.urlretrieve(base_url+fn+comp, data_dir+fn+comp)
    print(f'Decompressing {data_dir+fn+comp}...')
    os.system('gunzip '+data_dir+fn+comp)
    print(f'{data_dir+fn+comp} decompressed!')
else:
    print(f'Your data file, {data_dir+fn}, already exists')

# File path, assuming Notebook directory
input_data_path = data_dir+fn
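The cell above shells out to `gunzip`, which must be installed and on the `PATH`. If it isn't, the same step can be done with Python's standard library. The sketch below streams the decompression in fixed-size chunks so the 26GB output never has to fit in host memory at once. `gunzip_file` is a hypothetical helper name, and unlike `gunzip` it leaves the `.gz` file in place; in this notebook the call would be `gunzip_file(data_dir+fn+comp, data_dir+fn)`.

```python
import gzip
import os
import shutil
import tempfile


def gunzip_file(src_gz, dst):
    """Stream-decompress src_gz into dst, 16 MiB at a time."""
    with gzip.open(src_gz, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout, length=16 * 1024 * 1024)


# Self-check on a tiny temporary file instead of the real dataset.
tmp = tempfile.mkdtemp()
gz_path = os.path.join(tmp, "edges.csv.gz")
out_path = os.path.join(tmp, "edges.csv")
with gzip.open(gz_path, "wb") as f:
    f.write(b"0 1\n1 2\n")
gunzip_file(gz_path, out_path)
with open(out_path, "rb") as f:
    restored = f.read()
shutil.rmtree(tmp)  # clean up the temporary directory
```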

### Read the data from disk
cuGraph depends on cuDF for data loading and the initial DataFrame creation. The CSV data file contains an edge list: each row connects a source vertex to a destination vertex. These source-destination pairs are what is known as Coordinate Format (COO). In this test case, the data is just two columns.

In [None]:
# Start timer
t_start = time.time()

# CSV reader
e_list = cudf.read_csv(input_data_path, delimiter=' ', names=['src', 'dst'], dtype=['int32', 'int32'])

# Print time
print("Reader: ", time.time()-t_start, "s")

### Create a graph


In [None]:
t_start = time.time()

# Create a directed graph using the source (src) and destination (dst) vertex pairs from the Dataframe 
G = cugraph.DiGraph()
G.from_cudf_edgelist(e_list, source='src', destination='dst', renumber=False)

# (optional) compute the transposed adjacency list now, so that the PageRank timing below measures the solver alone
G.view_transposed_adj_list()

# Print time
print("Load and transpose: ", time.time()-t_start, "s")
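A graph built from an edge list is stored in a compressed adjacency format rather than as raw COO pairs. The sketch below is a plain counting-sort conversion from COO to CSR (row offsets plus neighbor indices), not cuGraph's own code. Building the same structure with `src` and `dst` swapped gives the transposed adjacency, which lists each vertex's in-neighbors; presumably that is why the cell above requests it, since PageRank gathers rank from in-neighbors. `coo_to_csr` is a hypothetical name, and vertex IDs are assumed to already be contiguous integers `0..V-1`.

```python
def coo_to_csr(num_vertices, src, dst):
    """Build CSR (offsets, indices) from parallel COO arrays via counting sort on src."""
    # Count the out-degree of every row, shifted by one slot.
    offsets = [0] * (num_vertices + 1)
    for s in src:
        offsets[s + 1] += 1
    # Prefix sum turns degrees into row start positions.
    for v in range(num_vertices):
        offsets[v + 1] += offsets[v]
    # Scatter each edge's destination into its row's next free slot.
    indices = [0] * len(src)
    cursor = offsets[:-1]  # a copy: next free slot per row
    for s, d in zip(src, dst):
        indices[cursor[s]] = d
        cursor[s] += 1
    return offsets, indices


src = [0, 0, 1, 2, 3]
dst = [1, 2, 2, 0, 2]
csr = coo_to_csr(4, src, dst)         # out-neighbors of each vertex
transposed = coo_to_csr(4, dst, src)  # in-neighbors: same routine, columns swapped
```

In `transposed`, the in-neighbors of vertex 2 are the contiguous slice `indices[offsets[2]:offsets[3]]`, i.e. `[0, 1, 3]`, which is the access pattern a pull-based PageRank iteration wants.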

### Call PageRank algorithm


In [None]:
# Start timer
t_start = time.time()

# Get the pagerank scores
pr_df = cugraph.pagerank(G, tol=1e-4)

# Print time
print("Pagerank: ", time.time()-t_start, "s")
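For intuition about what `cugraph.pagerank` computes and what `tol` controls, here is a textbook power-iteration PageRank in plain Python. It is a sketch, not cuGraph's implementation: the damping factor `alpha=0.85` matches cuGraph's default, but the stopping rule (L1 change between iterations below `tol`) and the uniform redistribution of rank held by dangling vertices are assumptions of this example.

```python
def pagerank(num_vertices, edges, alpha=0.85, tol=1e-4, max_iter=100):
    """Power-iteration PageRank over a COO list of (src, dst) pairs.

    Assumed stopping rule: the L1 change between iterations drops below tol.
    Rank held by dangling vertices (no out-edges) is spread uniformly.
    """
    n = num_vertices
    out_degree = [0] * n
    for s, _ in edges:
        out_degree[s] += 1
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in range(n) if out_degree[v] == 0)
        # Teleport share plus the redistributed dangling rank.
        base = (1.0 - alpha) / n + alpha * dangling / n
        new_rank = [base] * n
        # Each vertex sends alpha * rank / out_degree along every out-edge.
        for s, d in edges:
            new_rank[d] += alpha * rank[s] / out_degree[s]
        delta = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if delta < tol:
            break
    return rank


scores = pagerank(4, [(0, 1), (0, 2), (1, 2), (2, 0), (3, 2)])
top = sorted(range(4), key=lambda v: scores[v], reverse=True)
print(top, [round(x, 3) for x in scores])
```

On this toy graph, vertex 2, which all three other vertices link to, ranks first, while vertex 3, which has no incoming edges, keeps only the teleport share `(1 - alpha) / n`. A smaller `tol` means more iterations before the loop stops.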

It was that easy! PageRank should only take a few seconds to run on this 26GB input with one GPU.<br>
Check out how it compares to published Spark results in the [Annex](#annex_cell).

### Further analysis on the PageRank result

We can now identify the most influential users in the network.<br>
Notice that the PageRank result is already in a regular `cudf.DataFrame`. We can then sort by PageRank value and print the *Top 3*.

In [None]:
# Start timer
t_start = time.time()

# Sort, descending order
pr_sorted_df = pr_df.sort_values('pagerank',ascending=False)

# Print time
print(time.time()-t_start, "s")

# Print the Top 3
print(pr_sorted_df.head(3))
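Sorting all 41.7 million scores only to print three of them does more work than needed. A partial top-k selection keeps a heap of size k, costing O(n log k) instead of O(n log n); cuDF DataFrames also provide an `nlargest` method for this. The sketch below only shows the idea with Python's `heapq` on a small dict of made-up scores, and is not how cuDF implements it; `top_k` is a hypothetical helper.

```python
import heapq


def top_k(scores, k):
    """Return the k (vertex, score) pairs with the highest scores, best first.

    heapq.nlargest maintains a heap of size k while scanning the input once.
    """
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


# Made-up scores for illustration only.
example_scores = {0: 0.10, 1: 0.50, 2: 0.30, 3: 0.05, 4: 0.05}
print(top_k(example_scores, 3))
```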

We can now use the [map](https://data.rapids.ai/cugraph/benchmark/twitter-2010-ids.csv.gz) to convert Vertex IDs into Twitter's numeric IDs. The user name can also be retrieved using the [TwitterID](https://tweeterid.com/) web app.<br>
The table below shows more information on our *Top 3*. Notice that this ranking is much better at capturing network influence than, for instance, the number of followers. Further analysis of this dataset was published [here](https://doi.org/10.1145/1772690.1772751).

| Vertex ID	| Twitter ID	| User name	| Description |
| --------- |  ---------   | --------   |   ----------  |
| 21513299	| 813286	| barackobama	| US President (2009-2017) |
| 23933989	| 14224719	| 10DowningStreet | UK Prime Minister's office |
| 23933986	| 15131310	| WholeFoods	| Food store from Austin |



## Annex
<a id='annex_cell'></a>
An experiment comparing various products for this workflow was published in *GraphX: Graph Processing in a Distributed Dataflow Framework*, OSDI 2014. The authors used 16 m2.4xlarge worker nodes on Amazon EC2, for a total of 128 CPU cores and 1TB of memory in this 2014 setup.

![twitter-2010-spark.png](twitter-2010-spark.png)

___
Copyright (c) 2020, NVIDIA CORPORATION.

Licensed under the Apache License, Version 2.0 (the "License");  you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
___