# High Performance Jupyter

## GPU time with RAPIDS

<img src="https://rapids.ai/assets/images/RAPIDS-logo-purple.svg" width="400">

We will do the same analysis as [laptop.ipynb](laptop.ipynb), but accelerated on a GPU using RAPIDS. This notebook should work on any Linux machine with a CUDA-capable GPU that [RAPIDS supports](https://rapids.ai/start.html).

Outputs here are from an AWS g4dn.xlarge instance (NVIDIA T4 GPU, 16 GB GPU RAM).

Open up a few windows from the JupyterLab NVDashboard pane on the left sidebar to monitor GPU utilization!

In [1]:
# cudf is the RAPIDS dataframe library (a.k.a. pandas on GPU)
import cudf
import numpy as np
import datetime
import s3fs
import warnings
warnings.simplefilter("ignore")

data_path = 's3://nyc-tlc/trip data'
seed = 42

# Load and explore data

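Notice that `cudf` closely mirrors the `pandas` API: the calls below (`read_csv`, `head`, `dtypes`, `describe`) look just like their `pandas` counterparts.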

In [2]:
fs = s3fs.S3FileSystem(anon=True)

In [3]:
%%time

taxi = cudf.read_csv(
    fs.open(f'{data_path}/yellow_tripdata_2019-01.csv'),
    parse_dates=['tpep_pickup_datetime', 'tpep_dropoff_datetime'],
)

CPU times: user 3.74 s, sys: 1.94 s, total: 5.68 s
Wall time: 8.85 s


In [4]:
print(f"Row count: {len(taxi)}")
print(f"Size in GB: {taxi.memory_usage(deep=True).sum() / 1e9}")

Row count: 7667792
Size in GB: 1.082117204


In [5]:
taxi.head()

| | VendorID | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | RatecodeID | store_and_fwd_flag | PULocationID | DOLocationID | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount | congestion_surcharge |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2019-01-01 00:46:40 | 2019-01-01 00:53:20 | 1 | 1.5 | 1 | N | 151 | 239 | 1 | 7.0 | 0.5 | 0.5 | 1.65 | 0.0 | 0.3 | 9.95 | |
| 1 | 1 | 2019-01-01 00:59:47 | 2019-01-01 01:18:59 | 1 | 2.6 | 1 | N | 239 | 246 | 1 | 14.0 | 0.5 | 0.5 | 1.0 | 0.0 | 0.3 | 16.3 | |
| 2 | 2 | 2018-12-21 13:48:30 | 2018-12-21 13:52:40 | 3 | 0.0 | 1 | N | 236 | 236 | 1 | 4.5 | 0.5 | 0.5 | 0.0 | 0.0 | 0.3 | 5.8 | |
| 3 | 2 | 2018-11-28 15:52:25 | 2018-11-28 15:55:45 | 5 | 0.0 | 1 | N | 193 | 193 | 2 | 3.5 | 0.5 | 0.5 | 0.0 | 0.0 | 0.3 | 7.55 | |
| 4 | 2 | 2018-11-28 15:56:57 | 2018-11-28 15:58:33 | 5 | 0.0 | 2 | N | 193 | 193 | 2 | 52.0 | 0.0 | 0.5 | 0.0 | 0.0 | 0.3 | 55.55 | |


In [6]:
taxi.dtypes

VendorID                          int64
tpep_pickup_datetime     datetime64[ns]
tpep_dropoff_datetime    datetime64[ns]
passenger_count                   int64
trip_distance                   float64
RatecodeID                        int64
store_and_fwd_flag               object
PULocationID                      int64
DOLocationID                      int64
payment_type                      int64
fare_amount                     float64
extra                           float64
mta_tax                         float64
tip_amount                      float64
tolls_amount                    float64
improvement_surcharge           float64
total_amount                    float64
congestion_surcharge            float64
dtype: object
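
As a rough sanity check on the size printed earlier, the sketch below rebuilds the byte count from the row count and the column types listed above. The memory layout it uses is an assumption, not something this notebook states: 8 bytes per value for the 17 numeric and datetime columns, an Arrow-style string column for `store_and_fwd_flag` (one character per row plus `int32` offsets), and a single validity bitmask, padded to 64 bytes, for one column that contains nulls.

```python
# Back-of-envelope estimate of the dataframe's GPU memory footprint.
# The layout below is an ASSUMPTION (Arrow-style columns), not a documented guarantee.
n_rows = 7_667_792        # row count printed by the notebook
fixed_width_cols = 17     # int64, float64 and datetime64[ns] columns
bytes_per_value = 8

fixed_bytes = fixed_width_cols * bytes_per_value * n_rows

# store_and_fwd_flag: assume one byte of character data per row,
# plus (n_rows + 1) int32 offsets into that character buffer
string_bytes = n_rows * 1 + 4 * (n_rows + 1)

# assume exactly one column carries a validity bitmask:
# 1 bit per row, rounded up to whole bytes, then padded to a multiple of 64 bytes
mask_bytes = ((n_rows + 7) // 8 + 63) // 64 * 64

total_bytes = fixed_bytes + string_bytes + mask_bytes
print(f"Estimated size in GB: {total_bytes / 1e9}")
```

Under these assumptions the estimate comes to 1,082,117,204 bytes, which agrees with the figure reported by `memory_usage`. Nearly all of it is the fixed-width columns, so casting them to 32-bit types would roughly halve the footprint.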

In [7]:
%%time 
taxi.describe().T

CPU times: user 587 ms, sys: 234 ms, total: 822 ms
Wall time: 821 ms


| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| VendorID | 7667792.0 | 1.636775 | 0.53982 | 1.0 | 1.0 | 2.0 | 2.0 | 4.0 |
| passenger_count | 7667792.0 | 1.567078 | 1.224431 | 0.0 | 1.0 | 1.0 | 2.0 | 9.0 |
| trip_distance | 7667792.0 | 2.801084 | 3.737529 | 0.0 | 0.9 | 1.53 | 2.8 | 831.8 |
| RatecodeID | 7667792.0 | 1.058371 | 0.678089 | 1.0 | 1.0 | 1.0 | 1.0 | 99.0 |
| PULocationID | 7667792.0 | 165.500918 | 66.3918 | 1.0 | 130.0 | 162.0 | 234.0 | 265.0 |
| DOLocationID | 7667792.0 | 163.752906 | 70.364452 | 1.0 | 113.0 | 162.0 | 234.0 | 265.0 |
| payment_type | 7667792.0 | 1.291776 | 0.473323 | 1.0 | 1.0 | 1.0 | 2.0 | 4.0 |
| fare_amount | 7667792.0 | 12.409409 | 262.072058 | -362.0 | 6.0 | 8.5 | 13.5 | 623259.86 |
| extra | 7667792.0 | 0.328039 | 0.507479 | -60.0 | 0.0 | 0.0 | 0.5 | 535.38 |
| mta_tax | 7667792.0 | 0.496846 | 0.053378 | -0.5 | 0.5 | 0.5 | 0.5 | 60.8 |


# Feature engineering

Same feature engineering from [laptop.ipynb](laptop.ipynb), using the same code!

In [8]:
numeric_feat = [
    'pickup_weekday', 
    'pickup_hour', 
    'pickup_week_hour', 
    'pickup_minute', 
    'passenger_count',
]
categorical_feat = [
    'PULocationID', 
    'DOLocationID',
]
features = numeric_feat + categorical_feat
y_col = 'high_tip'

In [9]:
def prep_df(df: cudf.DataFrame) -> cudf.DataFrame:
    '''
    Generate features from a raw taxi dataframe.
    Use 32-bit precision for GPU processing.
    '''
    df = df[df.fare_amount > 0]  # keep positive fares only (also avoids divide-by-zero)
    df['tip_fraction'] = df.tip_amount / df.fare_amount
    df['high_tip'] = (df['tip_fraction'] > 0.2)  # class label: tip above 20% of the fare
    
    df['pickup_weekday'] = df.tpep_pickup_datetime.dt.weekday
    # as of version 0.15, cudf doesn't support weekofyear
    # df['pickup_weekofyear'] = df.tpep_pickup_datetime.dt.weekofyear
    df['pickup_hour'] = df.tpep_pickup_datetime.dt.hour
    df['pickup_week_hour'] = (df.pickup_weekday * 24) + df.pickup_hour
    df['pickup_minute'] = df.tpep_pickup_datetime.dt.minute
    df = df[features + [y_col]].astype('float32').fillna(-1)
    df[y_col] = df[y_col].astype('int32')
    
    return df
    
taxi = prep_df(taxi)

In [10]:
taxi.head()

| | pickup_weekday | pickup_hour | pickup_week_hour | pickup_minute | passenger_count | PULocationID | DOLocationID | high_tip |
|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 0.0 | 24.0 | 46.0 | 1.0 | 151.0 | 239.0 | 1 |
| 1 | 1.0 | 0.0 | 24.0 | 59.0 | 1.0 | 239.0 | 246.0 | 0 |
| 2 | 4.0 | 13.0 | 109.0 | 48.0 | 3.0 | 236.0 | 236.0 | 0 |
| 3 | 2.0 | 15.0 | 63.0 | 52.0 | 5.0 | 193.0 | 193.0 | 0 |
| 4 | 2.0 | 15.0 | 63.0 | 56.0 | 5.0 | 193.0 | 193.0 | 0 |
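
To make the derived columns concrete, here is a minimal plain-Python sketch of the per-row logic in `prep_df`, applied to two of the raw rows shown earlier. The helper `row_features` is hypothetical and exists only for illustration; it uses the standard library's `datetime`, whose `weekday()` counts Monday as 0, which is consistent with the values in the table above.

```python
from datetime import datetime

def row_features(pickup: datetime, tip_amount: float, fare_amount: float) -> dict:
    """Plain-Python version of the per-row feature logic (illustration only)."""
    weekday = pickup.weekday()  # Monday = 0 ... Sunday = 6
    hour = pickup.hour
    return {
        'pickup_weekday': weekday,
        'pickup_hour': hour,
        'pickup_week_hour': weekday * 24 + hour,  # hour of the week, 0..167
        'pickup_minute': pickup.minute,
        'high_tip': int(tip_amount / fare_amount > 0.2),  # tip above 20% of fare
    }

# Raw rows 0 and 2 from the taxi.head() output
first = row_features(datetime(2019, 1, 1, 0, 46, 40), tip_amount=1.65, fare_amount=7.0)
third = row_features(datetime(2018, 12, 21, 13, 48, 30), tip_amount=0.0, fare_amount=4.5)
print(first)
print(third)
```

The first trip was picked up on a Tuesday just after midnight, so its hour of the week is 1 × 24 + 0 = 24, and its tip of 1.65 on a fare of 7.0 is about 23.6%, which makes it a high tip.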


# Random forest

We're doing the same data splitting and training the same model as [laptop.ipynb](laptop.ipynb).

In [11]:
%%time
from cuml.preprocessing import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    taxi[features], taxi[y_col], test_size=0.33, random_state=seed)

CPU times: user 499 ms, sys: 137 ms, total: 636 ms
Wall time: 636 ms
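
For readers unfamiliar with what a seeded split does, the following is a minimal sketch of the idea in plain Python: shuffle the row positions with a fixed seed, then set aside the requested fraction for testing. The helper `split_indices` is hypothetical, and cuML's own shuffling and rounding may well differ; the sketch only illustrates why fixing `random_state` makes the split reproducible.

```python
import random

def split_indices(n_rows: int, test_size: float, seed: int):
    """Shuffle row positions with a fixed seed, then cut off the test fraction."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)   # private RNG, so the result depends only on seed
    n_test = round(n_rows * test_size)
    return idx[n_test:], idx[:n_test]  # (train positions, test positions)

train_idx, test_idx = split_indices(n_rows=100, test_size=0.33, seed=42)
print(len(train_idx), len(test_idx))

# Same seed, same split
again_train, again_test = split_indices(n_rows=100, test_size=0.33, seed=42)
print(train_idx == again_train and test_idx == again_test)
```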


In [12]:
from cuml.ensemble import RandomForestClassifier
rfc = RandomForestClassifier(
    n_estimators=100, 
    max_depth=5, 
    seed=seed,
)

In [13]:
%%time
_ = rfc.fit(X_train, y_train)

CPU times: user 3.19 s, sys: 1.35 s, total: 4.54 s
Wall time: 1.31 s


In [14]:
%%time
from cuml.metrics import roc_auc_score

preds = rfc.predict_proba(X_test)[1]  # column 1: predicted probability of the positive class (high_tip = 1)
roc_auc_score(y_test, preds)

CPU times: user 674 ms, sys: 160 ms, total: 833 ms
Wall time: 466 ms


array(0.5313459, dtype=float32)
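
To help interpret that score, here is a minimal sketch of one standard way to define ROC AUC: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The function `roc_auc` is a hypothetical plain-Python helper, quadratic in the number of rows and meant only for tiny inputs; it is not how cuML computes the metric.

```python
def roc_auc(labels, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

mostly_right = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
uninformative = roc_auc([0, 1, 0, 1], [0.5, 0.5, 0.5, 0.5])
print(mostly_right)   # 0.75
print(uninformative)  # 0.5
```

A model that assigns every row the same score lands at 0.5, so the value of about 0.53 above says this model ranks trips only slightly better than chance; the gain in this notebook is in speed, not accuracy.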

# Nice one!

This is way faster than [laptop.ipynb](laptop.ipynb): fitting the random forest took about 1.3 seconds of wall time on the T4. Hooray for GPUs! The next logical step is: what if my data is too big for a single GPU's memory?

Dask saves the day once again! RAPIDS uses Dask to parallelize GPU computation across multiple GPUs and multiple nodes. Buckle up, and check out [rapids-dask.ipynb](rapids-dask.ipynb).