# GPU & Multi-GPU / Bigger-Than-Memory Demo

1. Get data
2. Use from local GPU process: single GPU, in-memory, with cudf
3. Use from remote GPU process(es): multi GPU, bigger-than-memory, with dask_cudf
4. Use cuGraph to compute pagerank and then color Graphistry nodes using it


## 1. Get data

* We strongly recommend Parquet for both single-GPU and multi-GPU use
* In multi-GPU / bigger-than-memory (dask) mode:
  * Each `file.parquet/partXYZ` is a distributed unit of work
  * Aim for partitions of 10MB - 2GB on GPUs with ~10GB of memory
  * ... as this leaves 2-10X headroom for working memory + co-resident tenants
  * If parallel storage arrays are available, try striping partitions by PCI card
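
As a rough illustration of the sizing guideline above, the sketch below estimates how many partitions a dataset would need for a chosen target partition size. The helper name `suggest_npartitions` and the 1GB default target are assumptions made for illustration; they are not part of Graphistry, cuDF, or Dask.

```python
import math


def suggest_npartitions(total_bytes: int, target_partition_bytes: int = 1024**3) -> int:
    """Hypothetical helper: partition count so each partition is at most the target size."""
    if total_bytes < 0 or target_partition_bytes <= 0:
        raise ValueError("sizes must be non-negative (total) and positive (target)")
    # Always keep at least one partition, even for an empty dataset
    return max(1, math.ceil(total_bytes / target_partition_bytes))


# Example: a 50GB dataset split into ~1GB partitions
print(suggest_npartitions(50 * 1024**3))  # prints 50
```

A count like this could then be passed to a dask dataframe's `repartition(npartitions=...)` before writing Parquet, though the right target depends on the GPU and on how much memory other tenants use.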

In [None]:
file_names = [
    'https://github.com/Teradata/kylo/blob/master/samples/sample-data/parquet/userdata1.parquet?raw=true',
    'https://github.com/Teradata/kylo/blob/master/samples/sample-data/parquet/userdata2.parquet?raw=true',
    'https://github.com/Teradata/kylo/blob/master/samples/sample-data/parquet/userdata3.parquet?raw=true',
    'https://github.com/Teradata/kylo/blob/master/samples/sample-data/parquet/userdata4.parquet?raw=true',
    'https://github.com/Teradata/kylo/blob/master/samples/sample-data/parquet/userdata5.parquet?raw=true'
]

! wget https://github.com/Teradata/kylo/blob/master/samples/sample-data/parquet/userdata1.parquet?raw=true
! wget https://github.com/Teradata/kylo/blob/master/samples/sample-data/parquet/userdata2.parquet?raw=true
! wget https://github.com/Teradata/kylo/blob/master/samples/sample-data/parquet/userdata3.parquet?raw=true
! wget https://github.com/Teradata/kylo/blob/master/samples/sample-data/parquet/userdata4.parquet?raw=true
! wget https://github.com/Teradata/kylo/blob/master/samples/sample-data/parquet/userdata5.parquet?raw=true

In [41]:
file_names2 = [
    'userdata1.parquet?raw=true',
    'userdata2.parquet?raw=true',
    'userdata3.parquet?raw=true',
    'userdata4.parquet?raw=true',
    'userdata5.parquet?raw=true'
]
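
The local names above keep the `?raw=true` suffix because `wget`, when run without `-O`, names the saved file after the last URL path component plus the query string. As a small sketch under that assumption, the local list can be derived from the URLs instead of being typed by hand; `wget_default_name` is a hypothetical helper, not part of any library.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def wget_default_name(url: str) -> str:
    """Hypothetical helper: file name wget is assumed to use when no -O is given."""
    parts = urlsplit(url)
    name = PurePosixPath(parts.path).name
    # Keep the query string, since it ends up in the saved file name
    return f"{name}?{parts.query}" if parts.query else name


base = 'https://github.com/Teradata/kylo/blob/master/samples/sample-data/parquet'
urls = [f'{base}/userdata{i}.parquet?raw=true' for i in range(1, 6)]
local_names = [wget_default_name(u) for u in urls]
print(local_names[0])  # prints userdata1.parquet?raw=true
```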

In [14]:
import cudf, cugraph, dask.dataframe as da, dask_cudf, graphistry, pandas as pd, numpy as np
from dask.distributed import Client
graphistry.__version__, cudf.__version__

('0.20.1', '21.10.01')

In [15]:
#graphistry.register(api=3, username='admin', password='i-instanceid')

## 2. Use from local GPU process: single GPU, in-memory, with cudf

Avoid leaving dangling references to GPU dataframes, so that GPU memory is freed by the end of each cell.
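
One way to follow this advice is to keep the dataframe inside a function and return only a small plain-Python result. The sketch below is illustrative: `count_rows` is a hypothetical helper, and the stand-in reader exists only so the example runs without a GPU. In this notebook the reader would be `cudf.read_parquet`.

```python
def count_rows(path, reader):
    """Hypothetical helper: load a frame, reduce it to an int, and drop the frame."""
    df = reader(path)  # with cudf.read_parquet, this is where GPU memory is allocated
    n = len(df)
    del df  # explicit for clarity; returning from the function also drops the reference
    return n


# Stand-in reader so the sketch runs anywhere; on a GPU machine one might call
# count_rows('./userdata1.parquet?raw=true', cudf.read_parquet) instead.
def fake_reader(path):
    return list(range(1000))


print(count_rows('ignored-path', fake_reader))  # prints 1000
```

Because only the integer leaves the function, no notebook variable keeps the dataframe alive after the cell finishes.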

In [51]:
pd.read_parquet('./userdata1.parquet?raw=true').sample(5)

| | registration_dttm | id | first_name | last_name | email | gender | ip_address | cc | country | birthdate | salary | title | comments |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 960 | 2016-02-03 04:57:20 | 961 | Fred | Patterson | fpattersonqo@globo.com | Male | 176.143.33.162 | | Indonesia | 7/18/1996 | 215783.39 | Senior Quality Engineer | |
| 466 | 2016-02-03 08:15:40 | 467 | Joyce | Carpenter | jcarpentercy@tamu.edu | Female | 122.240.54.87 | 4026953290166042.0 | Argentina | | 271799.8 | | |
| 64 | 2016-02-03 05:50:26 | 65 | Raymond | Jacobs | rjacobs1s@sohu.com | Male | 188.52.98.175 | 5048378563875353.0 | Indonesia | | 13673.35 | | |
| 81 | 2016-02-03 23:43:15 | 82 | Carol | Franklin | cfranklin29@marketwatch.com | Female | 32.189.30.244 | 6.709764757287374e+16 | China | 6/5/1978 | 31572.53 | Automation Specialist II | |
| 409 | 2016-02-03 16:49:00 | 410 | Kelly | Nguyen | knguyenbd@google.co.uk | Female | 150.16.62.11 | | Philippines | 9/27/1963 | 194611.56 | Office Assistant III | |


In [44]:
%%time
len(pd.read_parquet('./userdata1.parquet?raw=true'))

CPU times: user 5.96 ms, sys: 464 µs, total: 6.43 ms
Wall time: 4.03 ms


1000

In [45]:
%%time
len(cudf.read_parquet('./userdata1.parquet?raw=true'))

CPU times: user 6.9 ms, sys: 0 ns, total: 6.9 ms
Wall time: 6.3 ms


1000

In [50]:
%%time
len(cudf.read_parquet(file_names2 * 100))

CPU times: user 40 ms, sys: 44.5 ms, total: 84.5 ms
Wall time: 83.5 ms


500000

In [69]:
(graphistry
 .edges(cudf.read_parquet(file_names2, num_rows=100000), 'first_name', 'last_name')).plot()

In [10]:
! nvidia-smi

Mon Oct 11 06:45:19 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.91.03    Driver Version: 460.91.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla V100-PCIE...  Off  | 00000001:00:00.0 Off |                  Off |
| N/A   24C    P0    39W / 250W |   5122MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [12]:
! nvidia-smi

Mon Oct 11 06:45:30 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.91.03    Driver Version: 460.91.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla V100-PCIE...  Off  | 00000001:00:00.0 Off |                  Off |
| N/A   24C    P0    34W / 250W |   5122MiB / 16160MiB |      3%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

## 3. Use from remote GPU process(es): multi GPU, bigger-than-memory, with dask_cudf

For slower tasks, bigger-than-memory tasks, or generally crash-prone tasks, we recommend `dask_cudf` over `cudf`, run through the Graphistry-provided `dask-scheduler` GPU service rather than a local (in-process) dask GPU scheduler.

This route is more resilient and fair. For example, GPU code experiments in Jupyter might freeze the Jupyter service for everyone. Auto-restarts are in place, but going through `dask-cuda-worker` prevents Jupyter from freezing, and auto-restarts kick in much faster for GPU tasks.

The folder `/dask-shared` is shared between Jupyter and `dask_cuda` tasks, so put any files that the workers need to read there.
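
As a minimal sketch of staging a file for the workers, the helper below copies a local file into the shared folder and returns the new path. `stage_for_workers` is a hypothetical name; only the `/dask-shared` default comes from the text above. The demo writes to a temporary directory so it runs on any machine.

```python
import shutil
import tempfile
from pathlib import Path


def stage_for_workers(src, shared_dir='/dask-shared'):
    """Hypothetical helper: copy a file into the shared folder and return its new path."""
    dest_dir = Path(shared_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / Path(src).name
    shutil.copy2(src, dest)
    return str(dest)


# Demo against a temporary directory instead of the real /dask-shared
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / 'example.parquet'
    src.write_bytes(b'placeholder bytes, not a real Parquet file')
    staged = stage_for_workers(src, shared_dir=Path(tmp) / 'shared')
    print(Path(staged).name)  # prints example.parquet
```

The returned path is what one would hand to `dask_cudf.read_parquet`, so that every worker resolves the same location.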


In [65]:
%%time
# Automatically multi-GPU-accelerated analytics
# On 1 GPU, this is not expected to be faster than plain cudf
with Client('dask-scheduler:8786'):
    dgdf = dask_cudf.read_parquet(file_names)
    # Latest registration timestamp across all partitions
    dttm_max = dgdf['registration_dttm'].max().compute()
    del dgdf

dttm_max

CPU times: user 31.7 ms, sys: 4.7 ms, total: 36.4 ms
Wall time: 399 ms


Timestamp('2016-02-04 23:59:55')

In [70]:
def clean_gdf(gdf):
    return gdf[ gdf['email'].str.len() > 0 ]

with Client('dask-scheduler:8786'):

    # lazy parallel read: less urgency to explicitly free
    edges_dgdf = dask_cudf.read_parquet(file_names, num_rows=100000)

    print('# partitions: ', edges_dgdf.npartitions)
    print('mem: ', edges_dgdf.map_partitions(lambda gdf: gdf.memory_usage().sum()).compute())

    # "head()" (or ".compute()") will gather distributed chunks into single GPU in-memory (cudf)
    # No urgency to explicitly free the cudf df because 'p' is html; collected GPU df is auto-GC'd
    p = graphistry.edges(edges_dgdf.map_partitions(clean_gdf).head(1000), 'first_name', 'last_name').plot()

p

# partitions:  5
mem:  0    158262
1    155663
2    157159
3    156183
4    156689
dtype: int64


In [15]:
! nvidia-smi

Mon Oct 11 06:45:37 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.91.03    Driver Version: 460.91.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla V100-PCIE...  Off  | 00000001:00:00.0 Off |                  Off |
| N/A   24C    P0    35W / 250W |   5268MiB / 16160MiB |      7%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [None]:
# Monitor tasks in the Dask dashboard (Dask menu tab): http://dask-scheduler:8787

## 4. Use cuGraph to compute pagerank and then color Graphistry nodes using it
- load the edges into a GPU dataframe (cuDF)
- convert into GPU graph (cuGraph)
- compute pagerank cuDF table for vertices
- plot, binding pagerank to color
- free memory
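
To make the pagerank step concrete, here is a pure-Python power-iteration sketch on a tiny made-up edge list. It is not cuGraph's implementation: it runs on the CPU, uses a fixed iteration count rather than a convergence tolerance, and treats edges as undirected, which is assumed here to mirror a default `cugraph.Graph()`. The function name, the damping value of 0.85, and the node labels are illustrative assumptions.

```python
def pagerank_undirected(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank over an undirected edge list (illustrative sketch)."""
    neighbors = {}
    for src, dst in edges:
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)

    n = len(neighbors)
    rank = {v: 1.0 / n for v in neighbors}
    for _ in range(iterations):
        # Every vertex gets a baseline share, then receives rank from its neighbors
        new_rank = {v: (1.0 - damping) / n for v in neighbors}
        for v, nbrs in neighbors.items():
            share = damping * rank[v] / len(nbrs)
            for u in nbrs:
                new_rank[u] += share
        rank = new_rank
    return rank


# Made-up edges: 'hub' touches three other vertices
toy_edges = [('a', 'hub'), ('b', 'hub'), ('c', 'hub'), ('c', 'd')]
ranks = pagerank_undirected(toy_edges)
print(max(ranks, key=ranks.get))  # prints hub
```

The scores sum to 1, and better-connected vertices score higher, which is the property the color encoding in the plot relies on.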

In [78]:
e_df = cudf.read_parquet(file_names2)
e_df = e_df[ e_df['email'].str.len() > 0 ]

e_df.head(3)

| | registration_dttm | id | first_name | last_name | email | gender | ip_address | cc | country | birthdate | salary | title | comments |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2016-02-03 07:55:29 | 1 | Amanda | Jordan | ajordan0@com.com | Female | 1.197.201.2 | 6759521864920116.0 | Indonesia | 3/8/1971 | 49756.53 | Internal Auditor | 100.0 |
| 1 | 2016-02-03 17:04:03 | 2 | Albert | Freeman | afreeman1@is.gd | Male | 218.111.175.34 | | Canada | 1/16/1968 | 150280.17 | Accountant IV | |
| 2 | 2016-02-03 01:09:31 | 3 | Evelyn | Morgan | emorgan2@altervista.org | Female | 7.161.136.94 | 6767119071901597.0 | Russia | 2/1/1960 | 144972.51 | Structural Engineer | |


In [79]:
%%time

G = cugraph.Graph()
G.from_cudf_edgelist(e_df, source='first_name', destination='last_name')

df_page = cugraph.pagerank(G)

df_page.sample(10)

CPU times: user 55.6 ms, sys: 620 µs, total: 56.2 ms
Wall time: 54.9 ms


| | pagerank | vertex |
|---|---|---|
| 181 | 0.001075 | Cooper |
| 344 | 0.002483 | Pamela |
| 304 | 0.003036 | Mary |
| 111 | 0.001882 | Mason |
| 432 | 0.002071 | Day |
| 383 | 0.004471 | George |
| 231 | 0.002171 | Ramos |
| 151 | 0.0017 | Edwards |
| 365 | 0.002431 | Alan |
| 410 | 0.001713 | Gonzales |


In [80]:
g = (graphistry
     .edges(e_df, 'first_name', 'last_name')
     .nodes(df_page, 'vertex')
     .encode_point_color('pagerank', ['blue', 'yellow', 'red'], as_continuous=True))

g.plot()

In [81]:
! nvidia-smi

Thu Nov 18 10:40:55 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00002C34:00:00.0 Off |                  Off |
| N/A   45C    P0    27W /  70W |   7077MiB / 16127MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [82]:
del e_df
del df_page
del G

In [71]:
! nvidia-smi

Mon Oct 11 07:19:41 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.91.03    Driver Version: 460.91.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla V100-PCIE...  Off  | 00000001:00:00.0 Off |                  Off |
| N/A   24C    P0    39W / 250W |   8712MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces