# Benchmarking: GPU vs CPU (UMAP and HDBSCAN)

## CPU part

---

Author: Jianheng Liu @ Rui Zhang's Lab, SYSU, China

Email: jhfoxliu@gmail.com

Date: Jan, 2022

**Note \#1: This notebook assumes that you have finished the `RNA_editing_landscape_GPU_part_I` notebook, so that the `onehot.all_kmers.npy` file is available.**

**Note \#2: HDBSCAN (CPU version) is incompatible with the numpy v1.20.3 used in RAPIDS. We therefore split this notebook into three parts (GPU, CPU, and figure).**

**Note \#3: We use 6 cores in this analysis. UMAP and HDBSCAN generally scale with the number of cores, so using X-fold more cores may make them up to X-fold faster.**

## Hardware

- System: Ubuntu 18.04.5 LTS
- CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (12 cores)
- Disk: SAMSUNG MZ7LH960HAJR-00005 (SSD)
- RAM: 64 GB (2 x 32 GB) DDR4 2666 MHz
- GPU: RTX2080Ti (Driver Version: 495.29.05, CUDA Version: 11.5)

## Running environment
- Python==3.7.8
- numpy==1.20.0
- umap-learn==0.5.2
- scikit-learn==0.23.1
- hdbscan==0.8.27

## 0. Environment

In [1]:
import numpy as np
import pandas as pd
import random

from umap import UMAP
from hdbscan import HDBSCAN

import time
time0 = time.time()

# make it reproducible
np.random.seed(42)

## 1. Load data

In [2]:
all_kmers_500M = np.load("onehot.all_kmers.npy")

## 2. Test UMAP

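On the CPU, the first model that is fitted takes much longer than the following ones. I don't know the exact mechanism behind it; it might be related to memory allocation or to one-time JIT compilation. For this reason, an extra warm-up run is performed at the two smallest sizes (1,000 and 10,000); its time is printed but not recorded.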
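The warm-up idea described above can be sketched as a small, generic timing helper. This is only a minimal illustration, not the code used for the benchmark: the name `time_after_warmup` and its `warmup` and `repeats` parameters are hypothetical, and a cheap built-in function stands in for the UMAP call so that the sketch runs without any extra dependency.

```python
import time


def time_after_warmup(fn, *args, warmup=1, repeats=3):
    """Run fn(*args) `warmup` times without recording, then time `repeats` runs.

    Hypothetical helper for illustration; returns wall-clock timings in seconds.
    """
    for _ in range(warmup):
        fn(*args)  # discarded run: absorbs one-time start-up costs

    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        timings.append(time.perf_counter() - start)
    return timings


# Stand-in workload (assumption): a cheap built-in instead of UMAP.fit_transform
timings = time_after_warmup(sum, range(100_000))
print(timings)
```

In this notebook, the stand-in would be replaced by a function that builds a fresh `UMAP` model and calls `fit_transform` on the subsampled array.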

In [3]:
def test_UMAP(arr):
    global umap_2d
    size = arr.shape[0]
    time0 = time.time()
    
    proj_2d = umap_2d.fit_transform(arr)
    time2 = time.time() - time0
    print("Items:{}: UMAP={} sec".format(size, time2))

    return time2, proj_2d

sizes = [1000, 10000, 50000, 100000, 250000, 500000, 750000, 1000000]  # , 2000000, 3000000, 4000000, 5000000]
Size_out = []
Time_out = []
Iter_out = []
proj_out = []


for s in sizes:
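    rng = np.random.default_rng(42)  # seeded Generator; np.random.seed() above does not affect default_rng
    test_arr = rng.choice(all_kmers_500M, s, replace=False)
    # warm-up run (not recorded) so one-time JIT compilation does not inflate the timings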
    if s == 1000 or s == 10000:
        umap_2d = UMAP(init="random", random_state=42, min_dist=0.01, n_neighbors=20, n_jobs=6)
        t, p = test_UMAP(test_arr)
    for i in range(3):
        umap_2d = UMAP(init="random", random_state=42, min_dist=0.01, n_neighbors=20, n_jobs=6)
        t, p = test_UMAP(test_arr)
        Size_out.append(s)
        Iter_out.append(i)
        Time_out.append(t)
        if i == 0:
            proj_out.append(p)

Items:1000: UMAP=8.366478681564331 sec
Items:1000: UMAP=3.5953409671783447 sec
Items:1000: UMAP=3.698909044265747 sec
Items:1000: UMAP=3.5366437435150146 sec
Items:10000: UMAP=176.5087184906006 sec
Items:10000: UMAP=26.43204975128174 sec
Items:10000: UMAP=26.3529953956604 sec
Items:10000: UMAP=26.6754207611084 sec
Items:50000: UMAP=62.91027045249939 sec
Items:50000: UMAP=63.266845703125 sec
Items:50000: UMAP=63.728842973709106 sec
Items:100000: UMAP=131.7481803894043 sec
Items:100000: UMAP=138.1359577178955 sec
Items:100000: UMAP=131.2667429447174 sec
Items:250000: UMAP=334.883403301239 sec
Items:250000: UMAP=357.117484331131 sec
Items:250000: UMAP=335.63633012771606 sec
Items:500000: UMAP=727.0350477695465 sec
Items:500000: UMAP=723.5665755271912 sec
Items:500000: UMAP=682.6840653419495 sec
Items:750000: UMAP=1081.8391349315643 sec
Items:750000: UMAP=1065.4647288322449 sec
Items:750000: UMAP=1059.9477207660675 sec
Items:1000000: UMAP=1441.94624376297 sec
Items:1000000: UMAP=1546.95140

In [4]:
df_umap = pd.DataFrame(np.stack([Size_out, Iter_out, Time_out],axis=1), columns=["Scale", "Replicate", "Time (sec)"])
df_umap["Scale"] = df_umap["Scale"].astype(int)
df_umap["Replicate"] = df_umap["Replicate"].astype(int)
df_umap["Method"] = "CPU"
print(df_umap)

      Scale  Replicate   Time (sec) Method
0      1000          0     3.595341    CPU
1      1000          1     3.698909    CPU
2      1000          2     3.536644    CPU
3     10000          0    26.432050    CPU
4     10000          1    26.352995    CPU
5     10000          2    26.675421    CPU
6     50000          0    62.910270    CPU
7     50000          1    63.266846    CPU
8     50000          2    63.728843    CPU
9    100000          0   131.748180    CPU
10   100000          1   138.135958    CPU
11   100000          2   131.266743    CPU
12   250000          0   334.883403    CPU
13   250000          1   357.117484    CPU
14   250000          2   335.636330    CPU
15   500000          0   727.035048    CPU
16   500000          1   723.566576    CPU
17   500000          2   682.684065    CPU
18   750000          0  1081.839135    CPU
19   750000          1  1065.464729    CPU
20   750000          2  1059.947721    CPU
21  1000000          0  1441.946244    CPU
22  1000000

In [5]:
df_umap.to_csv("UMAP_CPU_test.csv")
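The replicate timings can be condensed into a mean and standard deviation per scale before plotting. The sketch below is only an illustration: it re-creates the first six rows of the table printed above by hand instead of reading `UMAP_CPU_test.csv`, and the column name `sec per 1000 items` is an assumption introduced here.

```python
import pandas as pd

# First six rows of the df_umap table printed above (times in seconds)
df = pd.DataFrame({
    "Scale": [1000, 1000, 1000, 10000, 10000, 10000],
    "Replicate": [0, 1, 2, 0, 1, 2],
    "Time (sec)": [3.595341, 3.698909, 3.536644,
                   26.432050, 26.352995, 26.675421],
})

# Mean and sample standard deviation of the three replicates per scale
summary = df.groupby("Scale")["Time (sec)"].agg(["mean", "std"])

# Normalized cost, useful for spotting deviations from linear scaling
summary["sec per 1000 items"] = summary["mean"] / summary.index.to_numpy() * 1000
print(summary)
```

Applied to the full `df_umap`, the same `groupby` call would give one summary row per tested scale.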

## 3. Test HDBSCAN

The complexity of HDBSCAN depends on the structure of the data and can reach O(N^2) in the worst case, so we simply compare the measured GPU and CPU running times.

750,000 items consume ~8.5 GB of memory, which is close to the limit of the RTX2080Ti. To keep the test simple, we only run HDBSCAN on UMAP results with sample size <= 500,000.

In [6]:
def test_HDBSCAN(arr):
    global model
    size = arr.shape[0]
    if size > 500000:
        return -1, -1
    time0 = time.time()
    yhat = model.fit(arr)
    time3 = time.time() - time0
    print("Items:{}: HDBSCAN={} sec".format(size, time3))

    del yhat
    
    return size, time3

Size_out = []
Time_out = []
Iter_out = []

for umap_out in proj_out:
    for i in range(3):
        model = HDBSCAN(min_cluster_size=100, min_samples=100, core_dist_n_jobs=6)
        s, time_used = test_HDBSCAN(umap_out)
        Size_out.append(s)
        Iter_out.append(i)
        Time_out.append(time_used)

Items:1000: HDBSCAN=0.026746749877929688 sec
Items:1000: HDBSCAN=0.023360729217529297 sec
Items:1000: HDBSCAN=0.02311992645263672 sec
Items:10000: HDBSCAN=0.24923968315124512 sec
Items:10000: HDBSCAN=0.23403286933898926 sec
Items:10000: HDBSCAN=0.24277925491333008 sec
Items:50000: HDBSCAN=2.4807608127593994 sec
Items:50000: HDBSCAN=0.8802845478057861 sec
Items:50000: HDBSCAN=0.8844475746154785 sec
Items:100000: HDBSCAN=1.9044508934020996 sec
Items:100000: HDBSCAN=1.9112699031829834 sec
Items:100000: HDBSCAN=1.894136905670166 sec
Items:250000: HDBSCAN=5.649293899536133 sec
Items:250000: HDBSCAN=5.616645336151123 sec
Items:250000: HDBSCAN=5.646224021911621 sec
Items:500000: HDBSCAN=13.55514407157898 sec
Items:500000: HDBSCAN=13.531664609909058 sec
Items:500000: HDBSCAN=13.579429864883423 sec
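
Since the cost of HDBSCAN depends on the data, one way to read the timings printed above is to fit an empirical scaling exponent on a log-log scale. The sketch below is an illustration under stated assumptions, not part of the original analysis: it copies the printed timings (rounded to six decimals) by hand, uses the median of the three replicates to damp the slow first run at 50,000 items, and fits a straight line with `np.polyfit`.

```python
import numpy as np

# Sample sizes and the three replicate timings (seconds) printed above
sizes = np.array([1000, 10000, 50000, 100000, 250000, 500000])
times = np.array([
    [0.026747, 0.023361, 0.023120],
    [0.249240, 0.234033, 0.242779],
    [2.480761, 0.880285, 0.884448],
    [1.904451, 1.911270, 1.894137],
    [5.649294, 5.616645, 5.646224],
    [13.555144, 13.531665, 13.579430],
])

# Median per size is less sensitive to a single slow replicate than the mean
median_time = np.median(times, axis=1)

# Slope of log(time) vs log(size) approximates k in time ~ N^k
slope, intercept = np.polyfit(np.log10(sizes), np.log10(median_time), 1)
print(f"empirical scaling exponent: {slope:.2f}")
```

For these 2D UMAP embeddings the fitted exponent comes out close to 1, far below the quadratic worst case, although a different data structure could behave differently.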


In [7]:
df_hdbscan = pd.DataFrame(np.stack([Size_out, Iter_out, Time_out],axis=1), columns=["Scale", "Replicate", "Time (sec)"])
df_hdbscan["Scale"] = df_hdbscan["Scale"].astype(int)
df_hdbscan["Replicate"] = df_hdbscan["Replicate"].astype(int)
df_hdbscan["Method"] = "CPU"
print(df_hdbscan)

     Scale  Replicate  Time (sec) Method
0     1000          0    0.026747    CPU
1     1000          1    0.023361    CPU
2     1000          2    0.023120    CPU
3    10000          0    0.249240    CPU
4    10000          1    0.234033    CPU
5    10000          2    0.242779    CPU
6    50000          0    2.480761    CPU
7    50000          1    0.880285    CPU
8    50000          2    0.884448    CPU
9   100000          0    1.904451    CPU
10  100000          1    1.911270    CPU
11  100000          2    1.894137    CPU
12  250000          0    5.649294    CPU
13  250000          1    5.616645    CPU
14  250000          2    5.646224    CPU
15  500000          0   13.555144    CPU
16  500000          1   13.531665    CPU
17  500000          2   13.579430    CPU
18      -1          0   -1.000000    CPU
19      -1          1   -1.000000    CPU
20      -1          2   -1.000000    CPU
21      -1          0   -1.000000    CPU
22      -1          1   -1.000000    CPU
23      -1      

In [8]:
df_hdbscan.to_csv("HDBSCAN_CPU_test.csv")
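
The rows with `-1` in the table above are placeholders for the sizes that `test_HDBSCAN` skipped, so they should probably be dropped before plotting or averaging. A minimal sketch, using rows 15 to 20 of the printed table re-created by hand (the variable name `measured` is an assumption):

```python
import pandas as pd

# Rows 15-20 of the df_hdbscan table printed above
df = pd.DataFrame({
    "Scale": [500000, 500000, 500000, -1, -1, -1],
    "Replicate": [0, 1, 2, 0, 1, 2],
    "Time (sec)": [13.555144, 13.531665, 13.579430, -1.0, -1.0, -1.0],
    "Method": "CPU",
})

# Keep only rows that correspond to a real measurement
measured = df[df["Scale"] > 0].reset_index(drop=True)
print(measured)
```

The same filter could be applied after reading `HDBSCAN_CPU_test.csv` back in the figure notebook.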