# Benchmarking: GPU vs CPU (UMAP and HDBSCAN)

## GPU part

---

Author: Jianheng Liu @ Rui Zhang's Lab, SYSU, China

Email: jhfoxliu@gmail.com

Date: Jan, 2022

**Note \#1: This notebook assumes that you have finished the `RNA_editing_landscape_GPU_part_I` notebook, so that the `onehot.all_kmers.npy` file is available.**

**Note \#2: HDBSCAN (CPU version) is incompatible with the numpy v1.20.3 shipped in RAPIDS. We therefore split this notebook into three parts (GPU, CPU, and figure).**

## Hardware

- System: Ubuntu 18.04.5 LTS
- CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (12 cores)
- Disk: SAMSUNG MZ7LH960HAJR-00005 (SSD)
- RAM: 64 GB (2 x 32 GB) DDR4, 2666 MHz
- GPU: RTX 2080 Ti (Driver Version: 495.29.05, CUDA Version: 11.5)

## Container Environment

**RAPIDS 21.12** (see https://rapids.ai/start.html)
- Ubuntu 20.04
- All packages
- Python 3.8
- CUDA 11.5

## 0. Environment

In [1]:
import numpy as np
import pandas as pd
import random

from cuml import UMAP
from cuml import HDBSCAN

import time
time0 = time.time()

# make it reproducible
np.random.seed(42)

## 1. Load data

In [2]:
all_kmers_500M = np.load("onehot.all_kmers.npy")

## 2. Test UMAP

Note: 4,000,000 items require ~8.5 GB of GPU memory, and 5,000,000 items are close to the memory limit of the RTX 2080 Ti.
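As a rough sanity check of the note above, the arithmetic can be sketched as follows. It assumes that GPU memory use grows linearly with the number of items, which is an assumption made here for illustration and was not measured; `estimate_gb` is a hypothetical helper name. The RTX 2080 Ti has 11 GB of memory.

```python
# Back-of-envelope extrapolation from the single measurement in the note.
# Assumption (not measured): memory use is proportional to the item count.
measured_items = 4_000_000
measured_gb = 8.5          # GPU memory observed at 4,000,000 items
gpu_capacity_gb = 11.0     # RTX 2080 Ti

bytes_per_item = measured_gb * 1e9 / measured_items


def estimate_gb(n_items):
    """Linear estimate of GPU memory (in GB) for n_items."""
    return n_items * bytes_per_item / 1e9


est_5m = estimate_gb(5_000_000)
print("Estimated memory for 5,000,000 items: {:.3f} GB".format(est_5m))
print("Headroom on the card: {:.3f} GB".format(gpu_capacity_gb - est_5m))
```

Under this assumption, 5,000,000 items leave well under 1 GB of headroom, which is consistent with that size being close to the limit.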

In [3]:
def test_UMAP(arr):
    global umap_2d
    size = arr.shape[0]
    time0 = time.time()
    
    proj_2d = umap_2d.fit_transform(arr)
    time2 = time.time() - time0
    print("Items:{}: UMAP={} sec".format(size, time2))

    return time2, proj_2d

sizes = [1000, 10000, 50000, 100000 , 250000, 500000, 750000, 1000000, 2000000, 3000000, 4000000, 5000000]
Size_out = []
Time_out = []
Iter_out = []
proj_out = []

for s in sizes:
    # np.random.seed() does not affect default_rng(), so seed the generator explicitly
    rng = np.random.default_rng(42)
    test_arr = rng.choice(all_kmers_500M, s, replace=False)
    for i in range(3):
        umap_2d = UMAP(init="random", random_state=42, min_dist=0.01, n_neighbors=20)
        t, p = test_UMAP(test_arr)
        Size_out.append(s)
        Iter_out.append(i)
        Time_out.append(t)
        if i == 0:
            proj_out.append(p)

Items:1000: UMAP=0.6943676471710205 sec
Items:1000: UMAP=0.12947678565979004 sec
Items:1000: UMAP=0.12945103645324707 sec
Items:10000: UMAP=0.1695263385772705 sec
Items:10000: UMAP=0.1737990379333496 sec
Items:10000: UMAP=0.1710057258605957 sec
Items:50000: UMAP=0.3412034511566162 sec
Items:50000: UMAP=0.29175376892089844 sec
Items:50000: UMAP=0.29529762268066406 sec
Items:100000: UMAP=0.6215031147003174 sec
Items:100000: UMAP=0.6038358211517334 sec
Items:100000: UMAP=0.6042344570159912 sec
Items:250000: UMAP=2.533763885498047 sec
Items:250000: UMAP=2.4848456382751465 sec
Items:250000: UMAP=2.490208148956299 sec
Items:500000: UMAP=8.590981483459473 sec
Items:500000: UMAP=8.536256790161133 sec
Items:500000: UMAP=8.537830829620361 sec
Items:750000: UMAP=18.568747520446777 sec
Items:750000: UMAP=18.577062845230103 sec
Items:750000: UMAP=18.662296772003174 sec
Items:1000000: UMAP=32.38437509536743 sec
Items:1000000: UMAP=32.23235034942627 sec
Items:1000000: UMAP=32.264782190322876 sec
Item
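In the output above, the first replicate at 1,000 items (0.69 sec) is several times slower than the next two (0.13 sec), which probably reflects one-time initialization rather than the cost of UMAP itself. A minimal sketch of a timing helper that runs an untimed warm-up call before the measured replicates is shown below. `time_call` and `dummy_workload` are hypothetical names used for illustration; the stand-in workload replaces the UMAP call so the example runs without a GPU.

```python
import time


def time_call(fn, *args, repeats=3, warmup=1):
    """Run `warmup` untimed calls of fn(*args), then return `repeats` wall-clock durations."""
    for _ in range(warmup):
        fn(*args)
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        durations.append(time.perf_counter() - start)
    return durations


def dummy_workload(n):
    """Stand-in for the real model call (an assumption for this example)."""
    return sum(i * i for i in range(n))


timings = time_call(dummy_workload, 100_000)
print(["{:.4f}".format(t) for t in timings])
```

`time.perf_counter()` is used instead of `time.time()` because it is a monotonic, high-resolution clock intended for measuring short intervals.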

In [4]:
df_umap = pd.DataFrame(np.stack([Size_out, Iter_out, Time_out],axis=1), columns=["Scale", "Replicate", "Time (sec)"])
df_umap["Scale"] = df_umap["Scale"].astype(int)
df_umap["Replicate"] = df_umap["Replicate"].astype(int)
df_umap["Method"] = "GPU"
print(df_umap)

      Scale  Replicate  Time (sec) Method
0      1000          0    0.694368    GPU
1      1000          1    0.129477    GPU
2      1000          2    0.129451    GPU
3     10000          0    0.169526    GPU
4     10000          1    0.173799    GPU
5     10000          2    0.171006    GPU
6     50000          0    0.341203    GPU
7     50000          1    0.291754    GPU
8     50000          2    0.295298    GPU
9    100000          0    0.621503    GPU
10   100000          1    0.603836    GPU
11   100000          2    0.604234    GPU
12   250000          0    2.533764    GPU
13   250000          1    2.484846    GPU
14   250000          2    2.490208    GPU
15   500000          0    8.590981    GPU
16   500000          1    8.536257    GPU
17   500000          2    8.537831    GPU
18   750000          0   18.568748    GPU
19   750000          1   18.577063    GPU
20   750000          2   18.662297    GPU
21  1000000          0   32.384375    GPU
22  1000000          1   32.232350

In [5]:
df_umap.to_csv("UMAP_GPU_test.csv")
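For a quick look at the replicates before plotting, the per-scale timings can be summarized with a `groupby`. The sketch below rebuilds a small part of the table printed above by hand so that it is self-contained; in the notebook the same call could be applied to `df_umap` directly. The median is used here because it is less sensitive to a slow first replicate than the mean.

```python
import pandas as pd

# First nine rows of the UMAP timing table, copied from the output above.
df = pd.DataFrame({
    "Scale": [1000] * 3 + [10000] * 3 + [50000] * 3,
    "Replicate": [0, 1, 2] * 3,
    "Time (sec)": [0.694368, 0.129477, 0.129451,
                   0.169526, 0.173799, 0.171006,
                   0.341203, 0.291754, 0.295298],
})

# One row per scale with the median, fastest, and slowest replicate.
summary = df.groupby("Scale")["Time (sec)"].agg(["median", "min", "max"])
print(summary)
```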

## 3. Test HDBSCAN

The runtime of HDBSCAN depends strongly on the structure of the data and can approach O(N^2) in the worst case, so we do not try to characterize its scaling here and only compare it against the CPU version.

750,000 items consume ~8.5 GB of GPU memory, which is close to the memory limit of the RTX 2080 Ti. To keep the test simple, we only run HDBSCAN on UMAP results with sample size <= 500,000.

In [6]:
def test_HDBSCAN(arr):
    global model
    size = arr.shape[0]
    if size > 500000:
        return -1, -1
    time0 = time.time()
    yhat = model.fit(arr)
    time3 = time.time() - time0
    print("Items:{}: HDBSCAN={} sec".format(size, time3))

    del yhat
    
    return size, time3

Size_out = []
Time_out = []
Iter_out = []

for umap_out in proj_out:
    for i in range(3):
        model = HDBSCAN(min_cluster_size=100, min_samples=100)
        s, time_used = test_HDBSCAN(umap_out)
        Size_out.append(s)
        Iter_out.append(i)
        Time_out.append(time_used)

Items:1000: HDBSCAN=0.729628324508667 sec
Items:1000: HDBSCAN=0.27799463272094727 sec
Items:1000: HDBSCAN=0.26853370666503906 sec
Items:10000: HDBSCAN=0.5593869686126709 sec
Items:10000: HDBSCAN=0.5621559619903564 sec
Items:10000: HDBSCAN=0.5667362213134766 sec
Items:50000: HDBSCAN=0.9886634349822998 sec
Items:50000: HDBSCAN=0.9659252166748047 sec
Items:50000: HDBSCAN=0.9683268070220947 sec
Items:100000: HDBSCAN=1.9370884895324707 sec
Items:100000: HDBSCAN=1.9390549659729004 sec
Items:100000: HDBSCAN=1.9670257568359375 sec
Items:250000: HDBSCAN=7.105342864990234 sec
Items:250000: HDBSCAN=7.029197454452515 sec
Items:250000: HDBSCAN=7.065461874008179 sec
Items:500000: HDBSCAN=22.936002016067505 sec
Items:500000: HDBSCAN=22.923251152038574 sec
Items:500000: HDBSCAN=22.9518301486969 sec
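To get a feel for how the timings above scale on this particular dataset, one can fit a straight line to log(time) versus log(size); the slope is an empirical scaling exponent. The sketch below uses the fastest replicate per size from the output above and fits only the three largest sizes, where fixed overhead matters least. This choice of points is an assumption made for illustration, and the result describes this dataset and hardware only, not HDBSCAN in general.

```python
import numpy as np

# Fastest replicate per size, copied from the HDBSCAN output above.
sizes = np.array([1000, 10000, 50000, 100000, 250000, 500000])
times = np.array([0.268534, 0.559387, 0.965925, 1.937088, 7.029197, 22.923251])

# Fit log(time) = slope * log(size) + intercept on the three largest sizes.
slope, intercept = np.polyfit(np.log(sizes[-3:]), np.log(times[-3:]), 1)
print("Empirical scaling exponent: {:.2f}".format(slope))
```

On these numbers the exponent falls between 1 (linear) and 2 (quadratic).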


In [7]:
df_hdbscan = pd.DataFrame(np.stack([Size_out, Iter_out, Time_out],axis=1), columns=["Scale", "Replicate", "Time (sec)"])
df_hdbscan["Scale"] = df_hdbscan["Scale"].astype(int)
df_hdbscan["Replicate"] = df_hdbscan["Replicate"].astype(int)
df_hdbscan["Method"] = "GPU"
print(df_hdbscan)

     Scale  Replicate  Time (sec) Method
0     1000          0    0.729628    GPU
1     1000          1    0.277995    GPU
2     1000          2    0.268534    GPU
3    10000          0    0.559387    GPU
4    10000          1    0.562156    GPU
5    10000          2    0.566736    GPU
6    50000          0    0.988663    GPU
7    50000          1    0.965925    GPU
8    50000          2    0.968327    GPU
9   100000          0    1.937088    GPU
10  100000          1    1.939055    GPU
11  100000          2    1.967026    GPU
12  250000          0    7.105343    GPU
13  250000          1    7.029197    GPU
14  250000          2    7.065462    GPU
15  500000          0   22.936002    GPU
16  500000          1   22.923251    GPU
17  500000          2   22.951830    GPU
18      -1          0   -1.000000    GPU
19      -1          1   -1.000000    GPU
20      -1          2   -1.000000    GPU
21      -1          0   -1.000000    GPU
22      -1          1   -1.000000    GPU
23      -1      

In [8]:
df_hdbscan.to_csv("HDBSCAN_GPU_test.csv")
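The table printed above contains rows with `Scale` equal to -1. These are the placeholder values that `test_HDBSCAN` returns for inputs larger than 500,000 items, not real measurements. A minimal sketch of dropping them before plotting or merging with the CPU results is shown below. The small frame is built by hand so that the example is self-contained, and `df_valid` is a hypothetical name.

```python
import pandas as pd

# A few rows in the same layout as df_hdbscan, including skipped sizes (-1).
df = pd.DataFrame({
    "Scale": [250000, 500000, -1, -1],
    "Replicate": [0, 0, 0, 1],
    "Time (sec)": [7.105343, 22.936002, -1.0, -1.0],
    "Method": "GPU",
})

# Keep only rows that correspond to an actual HDBSCAN run.
df_valid = df[df["Scale"] > 0].reset_index(drop=True)
print(df_valid)
```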