# Accelerating Dask with GPUs (via RAPIDS)

We've seen in lecture how the [RAPIDS libraries](https://rapids.ai/) make it possible to accelerate common analytical workflows on GPUs using libraries like `cudf` (for GPU DataFrames) and `cuml` (for basic GPU machine learning operations on DataFrames). When your data gets especially large (e.g. exceeding the memory capacity of a single GPU) or your computations get especially cumbersome, Dask makes it possible to scale these workflows out even further -- distributing work out across a cluster of GPUs.

This notebook is intended to be run in a Google Cloud Vertex AI User-Managed Notebook server with the environment set to "RAPIDS 0.18" and 2 T4 GPUs requested. To do so, be sure to redeem your student code for Google Cloud credits (posted on the Ed Discussion Forum) and follow [these steps](https://cloud.google.com/vertex-ai/docs/workbench/user-managed/create-new) to set up your Google Cloud account + create a User-Managed Notebook environment. Note that you [will need to request an increase in your GPU quota](https://cloud.google.com/compute/quotas#requesting_additional_quota) in order to request more than one GPU. For instance, here, we are requesting the ability to launch 2 T4 GPUs in the us-central1 region:

![](screenshot.png)

In AWS Educate, recall that we cannot create GPU clusters. However, this notebook should also be runnable on multi-GPU EC2 instances and clusters (on AWS) if you use a personal account to request these resources.

If we run the command below, you'll see the type of GPUs being used (2 NVIDIA T4s):

In [1]:
!nvidia-smi

Sun Nov 14 00:16:22 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   67C    P0    30W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            Off  | 00000000:00:05.0 Off |                    0 |
| N/A   71C    P0    22W /  70W |      0MiB / 15109MiB |      0%      Default |
|       

Let's use `dask_cuda`'s API to launch a Dask GPU cluster and pass this cluster object to our `dask.distributed` client. `LocalCUDACluster()` will count each available GPU in our cluster (in this case, 1 GPU) as a Dask worker and assign it work.

In [2]:
from dask_cuda import LocalCUDACluster
from dask.distributed import Client

cluster = LocalCUDACluster() # Identify all available GPUs
client = Client(cluster)

From here, we can use `dask_cudf` to automate the process of partitioning our data across our GPU workers and instantiating a GPU-based DataFrame on our GPU that we can work with. Let's load in the same AirBnB data that we were working with in the `numba` + `dask` CPU demonstration:

In [3]:
import dask_cudf

df = dask_cudf.read_csv('listings*.csv')
df.head()

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,3781,HARBORSIDE-Walk to subway,4804,Frank,,East Boston,42.36413,-71.02991,Entire home/apt,125,32,19,2021-02-26,0.27,1,106
1,6695,$99 Special!! Home Away! Condo,8229,Terry,,Roxbury,42.32802,-71.09387,Entire home/apt,169,29,115,2019-11-02,0.81,4,40
2,10813,"Back Bay Apt-blocks to subway, Newbury St, The...",38997,Michelle,,Back Bay,42.35061,-71.08787,Entire home/apt,96,29,5,2020-12-02,0.08,11,307
3,10986,North End (Waterfront area) CLOSE TO MGH & SU...,38997,Michelle,,North End,42.36377,-71.05206,Entire home/apt,96,29,2,2016-05-23,0.03,11,293
4,13247,Back Bay studio apartment,51637,Susan,,Back Bay,42.35164,-71.08752,Entire home/apt,75,91,0,,,2,0


Once we have that data, we can perform many of the standard DataFrame operations we perform on CPUs -- just accelerated by our GPU cluster!

In [4]:
df.groupby(['neighbourhood', 'room_type']) \
  .price \
  .mean() \
  .compute()

neighbourhood   room_type      
North Center    Private room        75.818182
Ashburn         Entire home/apt    100.857143
Edgewater       Entire home/apt    140.142857
South Lawndale  Entire home/apt     79.826087
Auburn Gresham  Entire home/apt    135.000000
                                      ...    
Lakeshore       Entire home/apt    205.500000
Brighton Park   Shared room         39.000000
Lake View       Hotel room         656.400000
North Beach     Shared room         31.900000
Clearing        Entire home/apt     90.000000
Name: price, Length: 341, dtype: float64

One thing to note, though, is that not all of the functionality we might expect out of CPU clusters is available yet in the `cudf`/`dask_cudf` DataFrame implementation.

For instance (and of particular note!), our ability to apply custom functions is still pretty limited. `cudf` uses Numba's CUDA compiler to translate this code for the GPU and [many standard `numpy` operations are not supported](https://numba.pydata.org/numba-doc/dev/cuda/cudapysupported.html#numpy-support) (for instance, if you try to apply the distance calculation with performed in the Numba+Dask CPU demonstration notebook for today, this will fail to compile correctly for the GPU).

That being said, we can perform many base-Python operations inside of custom functions, so if you can express your custom functions in this way, it might be worth your while to do this work on a GPU. For example, let's create a custom price index that indicates whether an AirBnB is "Cheap" (0), "Moderately Expensive" (1), or "Very Expensive" (2) using `cudf`'s [`apply_rows` method](https://docs.rapids.ai/api/cudf/stable/guide-to-udfs.html#DataFrame-UDFs):


In [5]:
def expensive(x, price_index):
    # passed through Numba's CUDA compiler and auto-parallelized for GPU
    # for loop is automatically parallelized
    for i, price in enumerate(x):
        if price < 50:
            price_index[i] = 0
        elif price < 100:
            price_index[i] = 1
        else:
            price_index[i] = 2

# Use cudf's `apply_rows` API for applying function to every row in DataFrame
df = df.apply_rows(expensive,
                   incols={'price':'x'},
                   outcols={'price_index': int})

# Confirm that price index created correctly
df[['price', 'price_index']].head()

Unnamed: 0,price,price_index
0,125,2
1,169,2
2,96,1
3,96,1
4,75,1


In addition to preprocessing and analyzing data on GPUs, we can also train (a limited set of) Machine Learning models directly on our GPU cluster using the `cuml` library in the RAPIDS ecoystem as well. 

For instance, let's train a linear regression model based on our data from San Francisco, Chicago, and Boston to predict the price of an AirBnB based on other values in its listing information (e.g. "reviews per month" and "minimum nights"). We'll then use this model to make predictions about the price of AirBnBs in another city (NYC):

In [6]:
from cuml.dask.linear_model import LinearRegression
import numpy as np

X = df[['reviews_per_month', 'minimum_nights']].astype(np.float32).dropna()
y = df[['price']].astype(np.float32).dropna()
fit = LinearRegression().fit(X, y)

Then, we can read in the NYC dataset and make predictions about what prices will be in NYC on the basis of the model we trained on data from our three original cities:

In [7]:
df_nyc = dask_cudf.read_csv('test*.csv')
X_test = df_nyc[['reviews_per_month', 'minimum_nights']].astype(np.float32) \
                                                        .dropna()
fit.predict(X_test) \
   .compute() \
   .head()

0    184.802887
1    188.286636
2    184.802887
3    183.658218
4    186.646774
dtype: float32