- Uses the pre-built RAPIDS image on Google Cloud's AI Platform Notebooks with one T4 GPU, 8 vCPUs, and 30 GB RAM
- https://cloud.google.com/ai-platform/notebooks/docs/images#deciding
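- The image should provide CUDA 10.0 and RAPIDS 0.12; the version check below reports cuDF and cuML 0.16.0, so confirm the versions in your own environment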

In [None]:
%%bash
nvidia-smi
nvcc --version
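
The same check can be run from Python, which lets a notebook fall back to the CPU path when the GPU tooling is missing. This is a minimal sketch using only the standard library; `tool_output` is a hypothetical helper name, not part of CUDA or RAPIDS.

```python
import shutil
import subprocess


def tool_output(command):
    """Return the stdout of `command`, or None if the tool is missing or fails."""
    if shutil.which(command[0]) is None:
        return None  # executable is not on PATH
    try:
        result = subprocess.run(command, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.SubprocessError):
        return None
    return result.stdout if result.returncode == 0 else None


# Same two commands as the %%bash cell above.
for cmd in (["nvidia-smi"], ["nvcc", "--version"]):
    out = tool_output(cmd)
    print(" ".join(cmd), "->", "available" if out is not None else "not available")
```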

In [None]:
#!conda create -n rapids-0.16 -c rapidsai -c nvidia -c conda-forge -c defaults rapids=0.16 python=3.7 cudatoolkit=11.0

In [1]:
import numpy as np; print('numpy Version:', np.__version__)
import pandas as pd; print('pandas Version:', pd.__version__)
import xgboost as xgb; print('XGBoost Version:', xgb.__version__)
import cudf; print('cudf Version:', cudf.__version__)
import cuml; print('cuml Version:', cuml.__version__)
import gcsfs; print('gcsfs Version:', gcsfs.__version__)
import time

numpy Version: 1.19.4
pandas Version: 1.1.4
XGBoost Version: 1.3.0-SNAPSHOT
cudf Version: 0.16.0
cuml Version: 0.16.0
gcsfs Version: 0.7.1


Download the HIGGS dataset and unzip it
https://archive.ics.uci.edu/ml/datasets/HIGGS

In [None]:
# %%bash
# wget https://archive.ics.uci.edu/ml/machine-learning-databases/00280/HIGGS.csv.gz -P /home/jupyter/
# gzip -d /home/jupyter/HIGGS.csv.gz
# ls -lh /home/jupyter/
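
If shell access is not convenient, the decompression step can also be done from Python. The sketch below uses only the standard library; `gunzip` is a hypothetical helper name, and unlike `gzip -d` it leaves the compressed file in place.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(src, dest=None):
    """Decompress the .gz file `src` to `dest`, streaming in chunks.

    If `dest` is omitted, the trailing '.gz' is dropped from `src`.
    """
    src = Path(src)
    dest = Path(dest) if dest is not None else src.with_suffix("")
    with gzip.open(src, "rb") as f_in, open(dest, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)  # avoids loading the whole file into memory
    return dest


# Example (path taken from the cell above):
# gunzip('/home/jupyter/HIGGS.csv.gz')  # writes /home/jupyter/HIGGS.csv
```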

In [2]:
colnames = ['label'] + ['feature-%02d' % i for i in range(1, 29)]
#filname = '/home/jupyter/HIGGS.csv'
filname = 'gs://mchrestkha-github-ml-examples/higgs/HIGGS.csv'
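
Because the CSV has no header row, a mismatch between `colnames` and the file only shows up after the full load. A quick pre-check on the first few lines can catch that early. This is a sketch for a local copy of the file only (it does not read `gs://` paths); `check_column_count` is a hypothetical helper name.

```python
import csv

colnames = ['label'] + ['feature-%02d' % i for i in range(1, 29)]


def check_column_count(path, expected=len(colnames), rows=5):
    """Read up to `rows` lines of a local CSV and confirm each has `expected` fields."""
    with open(path, newline="") as f:
        for line_no, row in enumerate(csv.reader(f), start=1):
            if len(row) != expected:
                raise ValueError(
                    f"line {line_no}: expected {expected} fields, found {len(row)}"
                )
            if line_no >= rows:
                break
    return True


# Example (local path from the commented-out line above):
# check_column_count('/home/jupyter/HIGGS.csv')
```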

In [5]:
start_time = time.time()
df=cudf.read_csv(filname, header=None, names=colnames)
print("[INFO]: ------ Data Ingestion is completed in {} seconds ---".format((time.time() - start_time)))
start_time = time.time()
X = df[df.columns.difference(['label'])]
y = df['label']
dtrain=xgb.DMatrix(X,y)
print("[INFO]: ------ DMatrix is completed in {} seconds ---".format((time.time() - start_time)))

start_time = time.time()
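param = {
    'max_depth': 8,
    # Regression objective on a 0/1 label; 'binary:logistic' is the usual
    # choice when class probabilities are needed.
    'objective': 'reg:squarederror',
    'tree_method': 'gpu_hist'
}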
bst = xgb.train(param, dtrain)
print("[INFO]: ------ Training is completed in {} seconds ---".format((time.time() - start_time)))

[INFO]: ------ Data Ingestion is completed in 40.777522802352905 seconds ---
[INFO]: ------ DMatrix is completed in 0.21851801872253418 seconds ---
[INFO]: ------ Training is completed in 4.119179964065552 seconds ---
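
The repeated `start_time = time.time()` / `print(...)` pairs can be folded into a small context manager. This is a sketch, not part of the original notebook; `timed` and `timings` are hypothetical names, and the log line copies the format used above.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(step, results=None):
    """Time the enclosed block and print the result in the notebook's log format.

    Nothing is printed or recorded if the block raises.
    """
    start_time = time.time()
    yield
    elapsed = time.time() - start_time
    if results is not None:
        results[step] = elapsed  # keep the number for later comparison
    print("[INFO]: ------ {} is completed in {} seconds ---".format(step, elapsed))


timings = {}
with timed("Example step", timings):
    sum(range(100000))  # stand-in for read_csv, DMatrix construction, or training
```

Collecting the elapsed times in a dict makes it easier to compare CPU and GPU runs afterwards instead of copying numbers out of the printed log.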


## Single Node with CPUs (pandas + XGBoost) or a Single GPU (RAPIDS cuDF + XGBoost)
- XGBoost with RAPIDS examples: https://rapids.ai/xgboost.html

### Expected CPU numbers
[INFO]: ------ Data Ingestion is completed in 104.7611632347107 seconds ---   
TODO: Add data transformation steps  
[INFO]: ------ Training is completed in 30.218074321746826 seconds ---

### Expected GPU numbers
[INFO]: ------ Data Ingestion is completed in 18.212464094161987 seconds ---  
TODO: Add data transformation steps  
[INFO]: ------ Training is completed in 5.825598955154419 seconds ---

In [13]:
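def xgboost_fun(gpu_cpu, tree_method, filname):
    """Load the HIGGS CSV, build a DMatrix, and train XGBoost, timing each step.

    gpu_cpu: 'cpu' reads the file with pandas; any other value reads it with cuDF.
    tree_method: XGBoost tree method, e.g. 'hist' (CPU) or 'gpu_hist' (GPU).
    filname: local or gs:// path to the header-less HIGGS CSV.
    """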
    colnames = ['label'] + ['feature-%02d' % i for i in range(1, 29)]
    
    start_time = time.time()
    if gpu_cpu=='cpu':
        df=pd.read_csv(filname, header=None, names=colnames)
    else: 
        df=cudf.read_csv(filname, header=None, names=colnames)
    print("[INFO]: ------ Data Ingestion is completed in {} seconds ---".format((time.time() - start_time)))

    start_time = time.time()
    X = df[df.columns.difference(['label'])]
    y = df['label']
    dtrain=xgb.DMatrix(X,y)
    print("[INFO]: ------ DMatrix is completed in {} seconds ---".format((time.time() - start_time)))

    start_time = time.time()
    param = {
        # 'max_depth': 8,
        'objective': 'reg:squarederror',
        'tree_method': tree_method
    }
    bst = xgb.train(param, dtrain)
    print("[INFO]: ------ Training is completed in {} seconds ---".format((time.time() - start_time)))
    return bst

In [14]:
bst=xgboost_fun('gpu','gpu_hist',filname)

[INFO]: ------ Data Ingestion is completed in 37.2753381729126 seconds ---
[INFO]: ------ DMatrix is completed in 0.08628320693969727 seconds ---
[INFO]: ------ Training is completed in 3.8763697147369385 seconds ---


In [15]:
bst=xgboost_fun('cpu','hist',filname)

[INFO]: ------ Data Ingestion is completed in 209.98585748672485 seconds ---
[INFO]: ------ DMatrix is completed in 3.794123888015747 seconds ---
[INFO]: ------ Training is completed in 16.614694833755493 seconds ---
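
The per-step speedup follows directly from the two logs above. The numbers below are copied from those outputs; the dictionary names are only for illustration.

```python
# Wall-clock seconds copied from the GPU (In [14]) and CPU (In [15]) runs above.
gpu = {
    "Data Ingestion": 37.2753381729126,
    "DMatrix": 0.08628320693969727,
    "Training": 3.8763697147369385,
}
cpu = {
    "Data Ingestion": 209.98585748672485,
    "DMatrix": 3.794123888015747,
    "Training": 16.614694833755493,
}

# Speedup = CPU time / GPU time for each step, plus the end-to-end ratio.
speedup = {step: cpu[step] / gpu[step] for step in cpu}
speedup["Total"] = sum(cpu.values()) / sum(gpu.values())

for step, ratio in speedup.items():
    print("{:<15} {:.1f}x".format(step, ratio))
```

On this single run the GPU path is roughly 5.6x faster for ingestion, 44x for DMatrix construction, and 4.3x for training, or about 5.6x end to end. Both runs read the file from the `gs://` path, so the ingestion figures likely include transfer time from Cloud Storage as well as CSV parsing.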


## TODO: Single Node with Multiple GPUs (Dask + RAPIDS): scales to 4 T4s, 8 V100s, or 16 A100s on GCP


## TODO: Multi-Node with Multiple GPUs (Dask + RAPIDS): scales to 64+ GPUs