# RAPIDS

https://rapids.ai/

The RAPIDS data science framework is a collection of libraries for running end-to-end data science pipelines entirely on the GPU. The libraries are designed to look and feel familiar to anyone working in Python, while using optimized NVIDIA® CUDA® primitives and high-bandwidth GPU memory under the hood.


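cuDF is the RAPIDS GPU DataFrame library, with an API modeled on pandas. Its `cudf.pandas` accelerator mode is designed to speed up existing pandas code without code changes. The cells below use the `cudf` API directly and compare it with pandas, NumPy, and Numba on the same task.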

GIS stands for Geographic Information System

In [31]:
import pandas as pd
import numpy as np
from numba import njit
import cudf

In [34]:
! nvidia-smi

Thu May  9 19:49:42 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.07             Driver Version: 537.34       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  NVIDIA GeForce GTX 1650        On  | 00000000:01:00.0  On |                  N/A |
| 54%   52C    P8              N/A /  75W |   3587MiB /  4096MiB |     25%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

In [32]:
N = 1000000
A_list = np.random.randint(1, 200, N)
B_list = np.random.randint(1, 200, N)
df = pd.DataFrame({'A': A_list, 'B': B_list})
df.head()

|   | A   | B   |
|---|-----|-----|
| 0 | 130 | 22  |
| 1 | 130 | 165 |
| 2 | 116 | 60  |
| 3 | 172 | 151 |
| 4 | 164 | 144 |


In [33]:
def f(x, y):
    return x + y

In [5]:
@njit
def f_jit(x, y):
    return x + y

@njit(parallel=True)
def f_jit_parallel(x, y):
    return x + y

@njit(cache=True)
def f_jit_cache(x, y):
    return x + y

@njit(cache=True, parallel=True)
def f_jit_cache_p(x, y):
    return x + y

In [6]:
%timeit df['apply'] = df.apply(lambda row: f(row['A'], row['B']), axis=1)

17.2 s ± 59.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [7]:
%timeit f_jit(df['A'].values, df['B'].values) 

6.35 ms ± 1.97 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [8]:
%timeit f_jit_parallel(df['A'].values, df['B'].values) 

The slowest run took 9.70 times longer than the fastest. This could mean that an intermediate result is being cached.
3.02 ms ± 2.33 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [9]:
%timeit f_jit_cache(df['A'].values, df['B'].values) 

4.13 ms ± 489 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [10]:
%timeit f_jit_cache_p(df['A'].values, df['B'].values) 

The slowest run took 4.35 times longer than the fastest. This could mean that an intermediate result is being cached.
1.19 ms ± 788 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
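
Numba compiles a jitted function the first time it is called with a given argument type (or loads it from disk when `cache=True`), so the first call is much slower than later ones and may distort a benchmark. A common precaution is to call the function once before timing it. The sketch below is a minimal illustration that uses the standard-library `timeit` module instead of the `%timeit` magic. `time_after_warmup` and `add` are hypothetical names for this example, not part of Numba or IPython.

```python
import timeit


def time_after_warmup(func, *args, repeat=7, number=10):
    """Return the best per-call time in seconds, excluding the first call."""
    # Warm-up call: for a Numba-jitted function this triggers compilation
    # (or a cache load), so that one-time cost is not measured.
    func(*args)
    runs = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    # Each entry in `runs` is the total time for `number` calls.
    return min(runs) / number


def add(x, y):
    # Stand-in for f_jit; any callable can be timed the same way.
    return x + y


best = time_after_warmup(add, 1, 2)
print(f"best: {best * 1e6:.3f} us per call")
```

With the functions defined above, the call would look like `time_after_warmup(f_jit, df['A'].values, df['B'].values)`.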


In [11]:
%timeit df['vectorize'] = np.vectorize(f)(df['A'], df['B'])

363 ms ± 9.85 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [12]:
import cudf

In [13]:
dfcuda = cudf.DataFrame({'A': A_list, 'B': B_list})
dfcuda.head()

|   | A   | B   |
|---|-----|-----|
| 0 | 136 | 147 |
| 1 | 130 | 65  |
| 2 | 194 | 126 |
| 3 | 137 | 112 |
| 4 | 28  | 95  |


In [15]:
def fcuda(row):
    return row["A"] + row["B"]

In [16]:
%timeit dfcuda.apply(fcuda, axis=1)

12.2 ms ± 1.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


# Strings

cuDF's `apply` has experimental support for a subset of string operations inside user-defined functions (UDFs). The following string operations are currently supported:

- `str.count`
- `str.startswith`
- `str.endswith`
- `str.find`
- `str.rfind`
- `str.isalnum`
- `str.isdecimal`
- `str.isdigit`
- `str.islower`
- `str.isupper`
- `str.isalpha`
- `str.istitle`
- `str.isspace`
- `==`, `!=`, `>=`, `<=`, `>`, `<` (between two strings)
- `len` (e.g. `len(some_string)`)
- `in` (e.g. `'abc' in some_string`)
- `strip`
- `lstrip`
- `rstrip`
- `upper`
- `lower`
- `+` (string concatenation)
- `replace`
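
As a rough illustration of the list above, the sketch below defines a UDF that uses only listed operations (`len`, `==`, `startswith`, `endswith`, and `in`). It applies the function with `cudf.Series.apply` when cuDF can be imported and used, and otherwise falls back to plain Python so the logic can still be checked on a machine without a GPU. The function name `classify` and the sample values are made up for this example.

```python
def classify(st):
    # Uses only operations from the supported list:
    # len, ==, startswith, endswith, and `in`.
    if len(st) == 0:
        return 0
    if st == "adam":
        return 1
    if st.startswith("el") and st.endswith("za"):
        return 2
    if "an" in st:
        return 3
    return -1


names = ["adam", "chang", "eliza", "odom", ""]

try:
    import cudf  # requires a RAPIDS install and a supported GPU

    result = cudf.Series(names).apply(classify).to_pandas().tolist()
except Exception:
    # Assumption for this sketch: if cuDF is missing or unusable,
    # run the same function on the CPU instead.
    result = [classify(name) for name in names]

print(result)  # [1, 3, 2, -1, 0]
```

The broad `except` keeps the sketch runnable anywhere, but it also hides UDF compilation errors. In real use it is safer to let those surface.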

In [35]:
name_series = pd.Series(np.random.choice(['adam', 'chang', 'eliza', 'odom'], replace=True, size=100000))


def f(st):
    if len(st) > 0:
        if st.startswith("a") or st.startswith("o"):
            return 1
        elif "eliz" in st:
            return 2
        else:
            return -1
    else:
        return 42

In [41]:
%timeit name_series = pd.Series(np.random.choice(['adam', 'chang', 'eliza', 'odom'], replace=True, size=100000)); name_series.apply(f)

132 ms ± 1.76 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [37]:
sr = cudf.Series(np.random.choice(['adam', 'chang', 'eliza', 'odom'], replace=True, size=100000))

In [40]:
%timeit sr = cudf.Series(np.random.choice(['adam', 'chang', 'eliza', 'odom'], replace=True, size=100000)) ; sr.apply(f)

41 ms ± 273 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [42]:
@njit
def f(st):
    if len(st) > 0:
        if st.startswith("a") or st.startswith("o"):
            return 1
        elif "eliz" in st:
            return 2
        else:
            return -1
    else:
        return 42

In [43]:
%timeit name_series = pd.Series(np.random.choice(['adam', 'chang', 'eliza', 'odom'], replace=True, size=100000)); name_series.apply(f)

527 ms ± 11.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
