

Training an XGBoost Classifier 



In [0]:
!pip3 install -U xgboost

In [0]:
import xgboost as xgb

In [0]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

In [0]:
from sklearn.datasets import fetch_openml
covtyp = fetch_openml(name='covertype', version=4)

In [0]:
covtyp.data.shape

In [0]:
np.unique(covtyp.target)

In [0]:
!nvidia-smi

In [0]:
cov_df = pd.DataFrame(data= np.c_[covtyp['data'], covtyp['target']],
                     columns= covtyp['feature_names'] + ['target'])

In [0]:
cov_df.memory_usage(index=True).sum()

In [0]:
cov_df.head()

In [0]:
print("Rows     :", cov_df.shape[0])
print("Columns  :", cov_df.shape[1])

In [0]:
cov_df.target.value_counts()

In [0]:
cov_df.dtypes

In [0]:
# Convert every column to a numeric dtype (the combined frame may hold object-typed columns)
for col in cov_df.columns:
  cov_df[col] = pd.to_numeric(cov_df[col])

In [0]:
# Shift class labels from 1..7 to 0..6 to match num_class=7 below
cov_df['target'] = cov_df['target'] - 1
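
The subtraction above re-indexes the class labels so they start at zero; XGBoost's multi-class objectives expect integer labels in the range [0, num_class). A minimal sketch of that shift plus a range check, run on a small made-up label array rather than the real covertype target. The helper name `to_zero_based` is hypothetical and not part of XGBoost or scikit-learn.

```python
import numpy as np

def to_zero_based(labels, num_class, offset=1):
    """Subtract `offset` from integer labels and verify they fall in [0, num_class).

    Hypothetical helper for illustration only.
    """
    shifted = np.asarray(labels).astype(int) - offset
    if shifted.min() < 0 or shifted.max() >= num_class:
        raise ValueError("shifted labels fall outside [0, num_class)")
    return shifted

# Toy labels in 1..7, mimicking the seven cover types (made-up values)
toy = np.array([1, 7, 3, 3, 2])
zero_based = to_zero_based(toy, num_class=7)
print(zero_based)  # [0 6 2 2 1]
```

Running a check like this before building the `DMatrix` surfaces an out-of-range label early instead of during training.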

In [0]:
cov_df_X = cov_df.copy()
cov_df_y =  cov_df_X.pop('target')

In [0]:
X_train, X_test, y_train, y_test = train_test_split(cov_df_X, cov_df_y, train_size=0.75, test_size=0.25)

In [0]:
np.save('test.npy', arr=X_test.to_numpy())

In [0]:
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

In [0]:
!nvidia-smi

In [0]:
# Number of boosting rounds and maximum tree depth
num_round = 1000
maxdepth = 10
param = {
  'colsample_bylevel': 1,
  'colsample_bytree': 1,
  'gamma': 0,
  'learning_rate': 0.1,
  'random_state': 1010,
  'objective': 'multi:softmax',
  'num_class': 7,
}

In [0]:
param['tree_method'] = 'gpu_hist'
param['grow_policy'] = 'depthwise'
param['max_depth'] = maxdepth
param['max_leaves'] = 0
param['verbosity'] = 0
param['gpu_id'] = 0
param['updater'] = 'grow_gpu_hist'
param['predictor'] = 'gpu_predictor'

gpu_result = {} 

# Train with the parameters above, evaluating on the test set and logging every 20 rounds
xgb_model = xgb.train(param, dtrain, num_round, evals=[(dtest, 'test')], evals_result=gpu_result, verbose_eval=20)



In [0]:
xgb_model.save_model('xgb_covtype.model')

Test Performance of XGBoost CPU and GPU

https://scnakandala.github.io/papers/TR_2020_Hummingbird.pdf



In [1]:
!nvidia-smi

Sat Jun 13 01:42:25 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.36.06    Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   35C    P0    26W / 250W |      0MiB / 16280MiB |      0%      Default |
|                               |                      |                 ERR! |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [0]:
import xgboost as xgb
import numpy as np

In [0]:
test=np.load('test.npy')

In [4]:
test

array([[3.355e+03, 5.700e+01, 2.000e+01, ..., 0.000e+00, 0.000e+00,
        1.000e+00],
       [3.150e+03, 2.130e+02, 2.000e+01, ..., 0.000e+00, 0.000e+00,
        0.000e+00],
       [2.952e+03, 2.100e+02, 1.500e+01, ..., 0.000e+00, 0.000e+00,
        0.000e+00],
       ...,
       [2.955e+03, 3.240e+02, 9.000e+00, ..., 0.000e+00, 0.000e+00,
        0.000e+00],
       [2.981e+03, 1.490e+02, 2.200e+01, ..., 0.000e+00, 0.000e+00,
        0.000e+00],
       [2.845e+03, 1.940e+02, 2.800e+01, ..., 0.000e+00, 0.000e+00,
        0.000e+00]])

In [5]:
test.shape

(145253, 54)

In [0]:
dtest_all=xgb.DMatrix(test)
dtest_1=xgb.DMatrix(test[0:1])
dtest_10k=xgb.DMatrix(test[0:10000])

Model settings: number of boosting rounds = 1000, tree depth = 10

In [0]:
model = xgb.Booster({'nthread': 2})  
model.load_model('xgb_covtype.model')

In [8]:
%%timeit -r 3
# Run XGBoost native scoring on CPU - single record
model.predict(dtest_1)

The slowest run took 1623.54 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 14.1 µs per loop


In [9]:
%%timeit -r 3
# Run XGBoost native scoring on CPU - 10,000 records
model.predict(dtest_10k)

The slowest run took 22172.79 times longer than the fastest. This could mean that an intermediate result is being cached.
1 loop, best of 3: 319 µs per loop


In [10]:
%%timeit -r 3
# Run XGBoost native scoring on CPU - all 145,253 records
model.predict(dtest_all)

The slowest run took 20896.49 times longer than the fastest. This could mean that an intermediate result is being cached.
1 loop, best of 3: 4.82 ms per loop


In [0]:
model_gpu = xgb.Booster({"predictor": "gpu_predictor"})  
model_gpu.load_model('xgb_covtype.model')

In [12]:
%%timeit -r 3
# Run XGBoost native scoring on GPU - single record
model_gpu.predict(dtest_1)

The slowest run took 7700.07 times longer than the fastest. This could mean that an intermediate result is being cached.
1 loop, best of 3: 47.1 µs per loop


In [13]:
%%timeit -r 3
# Run XGBoost native scoring on GPU - 10,000 records
model_gpu.predict(dtest_10k)

The slowest run took 1390.24 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 69.3 µs per loop


In [14]:
%%timeit -r 3
# Run XGBoost native scoring on GPU - all 145,253 records
model_gpu.predict(dtest_all)

The slowest run took 1202.11 times longer than the fastest. This could mean that an intermediate result is being cached.
1 loop, best of 3: 644 µs per loop
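
The `%%timeit` warnings above report that the slowest run took hundreds to thousands of times longer than the fastest, so the "best of 3" figures may reflect cached results rather than a full scoring pass. A minimal sketch of a timing helper that keeps every run instead of only the best one, so that spread stays visible. It uses only the standard library; `time_runs` and `summarize` are hypothetical names, and the workload below is a stand-in, not a model call.

```python
import time

def time_runs(fn, repeats=5):
    """Call fn() `repeats` times and return each call's wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations

def summarize(durations):
    """Report the first run separately, since it is the one least likely to hit a cache."""
    ordered = sorted(durations)
    return {
        "first": durations[0],
        "fastest": ordered[0],
        "median": ordered[len(ordered) // 2],  # upper middle value for even counts
        "slowest": ordered[-1],
    }

# Stand-in workload; replace with the call being benchmarked
stats = summarize(time_runs(lambda: sum(range(100_000)), repeats=5))
print(stats)
```

In this notebook, `fn` could be `lambda: model.predict(dtest_10k)`. If caching tied to the `DMatrix` object is suspected, building a fresh `xgb.DMatrix` inside the lambda is one way to check, at the cost of including construction time in the measurement.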


Test Hummingbird performance. Installing hummingbird-ml will install an older version of XGBoost, so the kernel might need a restart.

In [0]:
!pip install hummingbird-ml

In [0]:
import torch
import numpy as np
from hummingbird.ml import convert
import xgboost as xgb

In [0]:
test=np.load('test.npy')

In [3]:
test.shape

(145253, 54)

In [0]:
model = xgb.XGBClassifier()
model.load_model('xgb_covtype.model')

In [0]:
hb_model = convert(model, 'pytorch', extra_config={"n_features": 54})

In [6]:
%%timeit -r 3
# Run Hummingbird on CPU - single record
hb_model.predict(test[0:1])

The slowest run took 6.70 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 8.49 ms per loop


In [7]:
%%timeit -r 3
# Run Hummingbird on CPU - 10,000 records
hb_model.predict(test[0:10000])

KeyboardInterrupt: ignored

In [0]:
hb_model.to('cuda')

In [9]:
%%timeit -r 3
# Run Hummingbird on GPU - single record
hb_model.predict(test[0:1])

The slowest run took 46.38 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 1.17 ms per loop


In [10]:
%%timeit -r 3
# Run Hummingbird on GPU - 10,000 records
hb_model.predict(test[0:10000])

10 loops, best of 3: 180 ms per loop


In [11]:
%%timeit -r 3
# Run Hummingbird on GPU - all 145,253 records
hb_model.predict(test)

RuntimeError: ignored
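
The error message for the full 145,253-row run is not shown, so its cause is unknown. If it turns out to be a GPU memory limit, one option is to score the array in fixed-size chunks and concatenate the results. A minimal sketch, assuming the model exposes a `predict` callable that accepts a 2-D NumPy array and returns one prediction per row. `predict_in_chunks` is a hypothetical helper, and the stand-in predictor and data below are made up.

```python
import numpy as np

def predict_in_chunks(predict_fn, X, chunk_size=10_000):
    """Apply predict_fn to consecutive row slices of X and join the outputs.

    Hypothetical helper; assumes X is non-empty and predict_fn returns
    an array with one entry per input row.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    parts = [
        predict_fn(X[start:start + chunk_size])
        for start in range(0, len(X), chunk_size)
    ]
    return np.concatenate(parts)

def fake_predict(batch):
    # Stand-in for a real model: class 1 when the first feature is positive
    return (batch[:, 0] > 0).astype(int)

X = np.arange(-5, 5, dtype=float).reshape(10, 1)
preds = predict_in_chunks(fake_predict, X, chunk_size=3)
print(preds)  # [0 0 0 0 0 0 1 1 1 1]
```

With the notebook's objects this would read `predict_in_chunks(hb_model.predict, test, chunk_size=10_000)`; the 10,000-row chunk size matches the GPU run above that completed.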