## The simplest usage example of py_boost

### Installation (if needed)

**Note**: replace cupy-cuda110 with the package that matches your CUDA version!

In [1]:
# !pip install cupy-cuda110 py-boost

### Imports

In [2]:
import os
# Optional: set the device to run
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

os.makedirs('../data', exist_ok=True)

import joblib
from sklearn.datasets import make_regression
import numpy as np

# simple case - just one class is used
from py_boost import GradientBoosting, TLPredictor, TLCompiledPredictor
from py_boost.cv import CrossValidation

### Generation of dummy regression data

In [3]:
%%time
X, y = make_regression(150000, 100, n_targets=10, random_state=42)
X_test, y_test = X[:50000], y[:50000]
X, y = X[-50000:], y[-50000:]

CPU times: user 2.25 s, sys: 1.7 s, total: 3.94 s
Wall time: 840 ms


### Training a GBDT model

The only argument required here is a loss function. It, together with the input target shape, determines the task type. The loss function can be passed as a Loss instance or using a string alias:

* ***'mse'*** for the regression/multitask regression
* ***'msle'*** for the regression/multitask regression
* ***'bce'*** for the binary/multilabel classification
* ***'crossentropy'*** for the multiclass classification

Training is done by calling the .fit method. The possible arguments are the following:

* ***'X'*** 
* ***'y'*** 
* ***'sample_weight'*** 
* ***'eval_sets'***  
Each validation set is passed as a dict with the possible keys ['X', 'y', 'sample_weight'], and eval_sets is a list of such dicts. Note: if multiple validation sets are passed, the best model is selected using the last one.
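As a minimal sketch of the eval_sets layout described above, the snippet below builds two validation sets from synthetic NumPy data and checks them before training. The array names and sizes are hypothetical, and the final fit call is left as a comment because it needs a GPU and py_boost installed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data; only the shapes matter for this illustration.
X_train, y_train = rng.normal(size=(80, 5)), rng.normal(size=80)
X_val_a, y_val_a = rng.normal(size=(20, 5)), rng.normal(size=20)
X_val_b, y_val_b = rng.normal(size=(30, 5)), rng.normal(size=30)

# Optional per-row weights for the second validation set.
w_val_b = np.ones(30, dtype=np.float32)

# One dict per validation set; 'sample_weight' may be omitted.
eval_sets = [
    {'X': X_val_a, 'y': y_val_a},
    {'X': X_val_b, 'y': y_val_b, 'sample_weight': w_val_b},
]

# Sanity checks before training: allowed keys and consistent row counts.
allowed_keys = {'X', 'y', 'sample_weight'}
for i, es in enumerate(eval_sets):
    assert set(es) <= allowed_keys, f'unexpected keys in set {i}'
    assert all(len(v) == len(es['X']) for v in es.values()), f'length mismatch in set {i}'

# Per the note above, the last set is the one used for best-model selection.
selection_set = eval_sets[-1]
print(len(selection_set['X']))  # 30

# The list would then be passed to training, e.g.:
# model.fit(X_train, y_train, eval_sets=eval_sets)
```

Putting the set you care about most at the end of the list keeps model selection tied to it.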

#### The example below illustrates how to train a model for a simple regression task.

In [4]:
%%time
model = GradientBoosting('mse')

model.fit(X, y[:, 0], eval_sets=[{'X': X_test, 'y': y_test[:, 0]},])

[13:46:50] Stdout logging level is INFO.
[13:46:50] GDBT train starts. Max iter 100, early stopping rounds 100
[13:46:50] Iter 0; Sample 0, rmse = 173.68515684801378; 
[13:46:51] Iter 10; Sample 0, rmse = 133.2329530295129; 
[13:46:51] Iter 20; Sample 0, rmse = 107.90963556466895; 
[13:46:51] Iter 30; Sample 0, rmse = 90.084247821124; 
[13:46:51] Iter 40; Sample 0, rmse = 76.43082690808421; 
[13:46:51] Iter 50; Sample 0, rmse = 65.55840290366771; 
[13:46:51] Iter 60; Sample 0, rmse = 56.76828179857891; 
[13:46:51] Iter 70; Sample 0, rmse = 49.56530732970541; 
[13:46:51] Iter 80; Sample 0, rmse = 43.58915629705123; 
[13:46:52] Iter 90; Sample 0, rmse = 38.67230605797685; 
[13:46:52] Iter 99; Sample 0, rmse = 34.99810538399258; 
CPU times: user 8.07 s, sys: 843 ms, total: 8.91 s
Wall time: 7.07 s


<py_boost.gpu.boosting.GradientBoosting at 0x7f226c682e50>

### Training a GBDT model in a multiregression case

Each built-in loss function has its own default metric, so specifying a metric is optional.
If you need a different evaluation metric, you can pass a Metric instance or use a string alias.

#### Default metrics:

* ***'rmse'*** is the default for the ***'mse'*** loss
* ***'rmsle'*** is the default for the ***'msle'*** loss
* ***'bce'*** is the default for the ***'bce'*** loss
* ***'crossentropy'*** is the default for the ***'crossentropy'*** loss

#### Non-default metrics:

* ***'r2_score'*** for the regression/multitask regression
* ***'auc'*** for the binary classification
* ***'accuracy'*** for any classification task
* ***'precision'*** for any classification task
* ***'recall'*** for any classification task
* ***'f1'*** for any classification task
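To make the two regression metrics concrete, here is a plain NumPy sketch of RMSE and R2 using their textbook definitions. The function names are hypothetical, and the uniform averaging of R2 over output columns is an assumption; py_boost's own implementations may aggregate differently.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error over all elements."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination, averaged uniformly over output columns."""
    y_true = y_true.reshape(len(y_true), -1)
    y_pred = y_pred.reshape(len(y_pred), -1)
    ss_res = ((y_true - y_pred) ** 2).sum(axis=0)
    ss_tot = ((y_true - y_true.mean(axis=0)) ** 2).sum(axis=0)
    return float(np.mean(1.0 - ss_res / ss_tot))

y_true = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
perfect = y_true.copy()
baseline = np.tile(y_true.mean(axis=0), (3, 1))  # always predicts the column mean

print(rmse(y_true, perfect), r2(y_true, perfect))  # 0.0 1.0
print(r2(y_true, baseline))                        # 0.0
```

RMSE is lower-is-better and in the units of the target, while R2 is higher-is-better and scale-free, which is why the training logs for the two metrics move in opposite directions.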

It is possible to specify other common GBDT hyperparameters as shown below.

#### The following example demonstrates how to train a model for a multioutput regression task. No extra configuration is needed to switch to multioutput: just pass a multidimensional target.

In [5]:
%%time
model = GradientBoosting('mse', 'r2_score',
                         ntrees=1000, lr=.01, verbose=100, es=200, lambda_l2=1,
                         subsample=.8, colsample=.8, min_data_in_leaf=10, min_gain_to_split=0, 
                         max_bin=256, max_depth=6)

model.fit(X, y, eval_sets=[{'X': X_test, 'y': y_test},])

[13:46:52] Stdout logging level is INFO.
[13:46:52] GDBT train starts. Max iter 1000, early stopping rounds 200
[13:46:52] Iter 0; Sample 0, R2_score = 0.00839443442782406; 
[13:46:54] Iter 100; Sample 0, R2_score = 0.5168097848746246; 
[13:46:56] Iter 200; Sample 0, R2_score = 0.724303880811622; 
[13:46:59] Iter 300; Sample 0, R2_score = 0.8327414123391066; 
[13:47:01] Iter 400; Sample 0, R2_score = 0.8949407844670823; 
[13:47:03] Iter 500; Sample 0, R2_score = 0.932041592450705; 
[13:47:05] Iter 600; Sample 0, R2_score = 0.954701378867479; 
[13:47:08] Iter 700; Sample 0, R2_score = 0.9687442771860237; 
[13:47:10] Iter 800; Sample 0, R2_score = 0.9776054193117595; 
[13:47:12] Iter 900; Sample 0, R2_score = 0.9832915747998822; 
[13:47:14] Iter 999; Sample 0, R2_score = 0.9869841448112808; 
CPU times: user 22.2 s, sys: 2.64 s, total: 24.8 s
Wall time: 22.7 s


<py_boost.gpu.boosting.GradientBoosting at 0x7f22654c43a0>

## Inference

#### Prediction can be done by calling the .predict method

In [6]:
%%time
preds = model.predict(X_test)

preds.shape

CPU times: user 492 ms, sys: 19.9 ms, total: 512 ms
Wall time: 511 ms


(50000, 10)

In [7]:
preds

array([[-231.01654  , -140.90642  , -273.48715  , ..., -133.48174  ,
        -209.62552  , -227.96652  ],
       [-110.47679  , -108.942795 ,  -55.01467  , ..., -125.84514  ,
        -113.05262  ,  -17.904222 ],
       [ -30.944475 ,  -53.859447 ,  147.24858  , ...,   21.4143   ,
         -19.186203 , -204.1296   ],
       ...,
       [ -78.442665 ,  139.2149   ,   84.7346   , ...,  230.55833  ,
          36.89756  ,   18.752386 ],
       [  -6.682429 ,  139.4572   ,  245.92271  , ...,  150.75662  ,
         173.935    ,  206.14671  ],
       [  -2.8269227,   42.224876 ,  172.65169  , ...,   96.963524 ,
          29.174818 ,   14.396346 ]], dtype=float32)

#### Prediction for certain iterations can be done by calling the .predict_staged method

In [8]:
%%time
preds = model.predict_staged(X_test, iterations=[100, 300, 500])

preds.shape

CPU times: user 31.5 ms, sys: 3.89 ms, total: 35.4 ms
Wall time: 34.5 ms


(3, 50000, 10)
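The staged output has one slice per requested iteration, so a per-stage error curve takes a single broadcasted expression. The sketch below uses synthetic arrays with the same (stages, rows, outputs) layout; the data and variable names are illustrative and no py_boost call is involved.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: 3 stages, 1000 rows, 10 outputs.
y_true = rng.normal(size=(1000, 10))
noise = rng.normal(size=(1000, 10))
# Later stages are constructed to be closer to the target.
staged = np.stack([y_true + scale * noise for scale in (1.0, 0.5, 0.1)])

assert staged.shape == (3, 1000, 10)

# Broadcasting (3, n, k) against (n, k) gives one error array per stage;
# averaging over rows and outputs leaves one RMSE per stage.
rmse_per_stage = np.sqrt(((staged - y_true) ** 2).mean(axis=(1, 2)))
best_stage = int(np.argmin(rmse_per_stage))

print(rmse_per_stage.shape, best_stage)  # (3,) 2
```

With real staged predictions, the same expression shows how quality changes with the number of trees without refitting the model.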

#### Tree leaf indices for certain iterations can be predicted by calling the .predict_leaves method

In [9]:
%%time
preds = model.predict_leaves(X_test, iterations=[100, 300, 500])

preds.shape

CPU times: user 8.73 ms, sys: 8.16 ms, total: 16.9 ms
Wall time: 16 ms


(3, 50000, 1)

In [10]:
preds.T[0]

array([[14, 17,  9],
       [50, 43, 27],
       [32, 43, 55],
       ...,
       [54, 50,  9],
       [30, 43, 19],
       [60, 43, 27]], dtype=int32)
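One generic way to use leaf indices is as categorical features for a downstream model. The sketch below one-hot encodes a small, hand-written index matrix laid out like preds.T[0] above. This is a general technique rather than anything py_boost does internally, and the bound of 64 leaves per tree is an assumption based on a binary tree of depth 6.

```python
import numpy as np

# Leaf indices for 4 rows across 3 trees (rows = samples, columns = requested
# iterations). The values are illustrative.
leaves = np.array([[14, 17,  9],
                   [50, 43, 27],
                   [32, 43, 55],
                   [14, 43,  9]], dtype=np.int32)

n_leaves = 64  # assumed upper bound on leaf indices per tree (2 ** 6)

# One-hot encode every tree separately and lay the blocks side by side:
# tree t occupies columns [t * n_leaves, (t + 1) * n_leaves).
one_hot = np.zeros((leaves.shape[0], leaves.shape[1] * n_leaves), dtype=np.float32)
rows = np.repeat(np.arange(leaves.shape[0]), leaves.shape[1])
cols = (np.arange(leaves.shape[1]) * n_leaves + leaves).ravel()
one_hot[rows, cols] = 1.0

print(one_hot.shape)  # (4, 192)

# Rows that share leaves have overlapping encodings: rows 0 and 3 agree on 2 trees.
print(int(one_hot[0] @ one_hot[3]))  # 2
```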

#### Feature importances

In [11]:
model.get_feature_importance()

array([  33.,   44.,   39.,   51.,   69.,   49., 5690.,   53.,   44.,
         64.,   41.,   51.,   49.,   46.,   47., 6074., 5400.,   51.,
         46., 5448.,   24.,   35.,   48.,   64.,   49.,   28.,   40.,
         47.,   45.,   53.,   54.,   49.,   51.,   44.,   33.,   54.,
       5994.,   33.,   46.,   40.,   53.,   48.,   69.,   39.,   52.,
         43.,   53.,   35.,   56.,   50.,   55.,   53., 5946.,   44.,
         49.,   52.,   60.,   37.,   38.,   37.,   60.,   37.,   37.,
         45.,   68.,   48.,   48.,   37.,   49.,   59.,   35.,   45.,
         43.,   55.,   57.,   38.,   55.,   43.,   44.,   58.,   49.,
         43.,   44.,   47.,   53.,   40., 5528., 3624.,   41., 5803.,
         39., 6118.,   43.,   35.,   66.,   47.,   39.,   48.,   46.,
         59.], dtype=float32)
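The importance vector has one score per input column, so ranking features is a matter of sorting it. A minimal sketch on a shortened, illustrative vector follows; the values are picked to resemble the output above and are not the full result.

```python
import numpy as np

# A short illustrative vector: a few dominant features among weak ones.
importance = np.array([33., 44., 5690., 51., 6074., 49., 5400., 24.], dtype=np.float32)

top_k = 3
# argsort is ascending, so reverse it to rank from most to least important.
ranking = np.argsort(importance)[::-1]
top_features = ranking[:top_k]

# Normalised shares make the scores easier to compare between models.
share = importance / importance.sum()

print(top_features)  # [4 2 6]
print(float(share[top_features].sum()))
```

In the full output above, the handful of features with scores in the thousands stand out from the rest in the same way.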

#### The trained model can be saved as a pickle for inference

In [12]:
joblib.dump(model, '../data/temp_model.pkl')

new_model = joblib.load('../data/temp_model.pkl')
new_model.predict(X_test)

array([[-231.01654  , -140.90642  , -273.48715  , ..., -133.48174  ,
        -209.62552  , -227.96652  ],
       [-110.47679  , -108.942795 ,  -55.01467  , ..., -125.84514  ,
        -113.05262  ,  -17.904222 ],
       [ -30.944475 ,  -53.859447 ,  147.24858  , ...,   21.4143   ,
         -19.186203 , -204.1296   ],
       ...,
       [ -78.442665 ,  139.2149   ,   84.7346   , ...,  230.55833  ,
          36.89756  ,   18.752386 ],
       [  -6.682429 ,  139.4572   ,  245.92271  , ...,  150.75662  ,
         173.935    ,  206.14671  ],
       [  -2.8269227,   42.224876 ,  172.65169  , ...,   96.963524 ,
          29.174818 ,   14.396346 ]], dtype=float32)

### CPU Inference via treelite

We can also save the model for CPU inference via the treelite library. For that purpose, use the TL wrappers (for both compiled and built-in inference).

In [13]:
%%time
tl_model = TLPredictor(model)

100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [00:06<00:00, 156.89it/s]


CPU times: user 6.33 s, sys: 97.8 ms, total: 6.42 s
Wall time: 6.41 s


In [14]:
%%time
tl_model.predict(X_test, nthread=4)

CPU times: user 3.65 s, sys: 22.7 ms, total: 3.67 s
Wall time: 967 ms


array([[-231.01654  , -140.9064   , -273.48718  , ..., -133.48172  ,
        -209.6255   , -227.96655  ],
       [-110.47679  , -108.9428   ,  -55.014668 , ..., -125.84514  ,
        -113.05263  ,  -17.904224 ],
       [ -30.944464 ,  -53.859463 ,  147.2486   , ...,   21.414305 ,
         -19.186205 , -204.1296   ],
       ...,
       [ -78.44266  ,  139.21492  ,   84.734604 , ...,  230.55836  ,
          36.89758  ,   18.752384 ],
       [  -6.682427 ,  139.4572   ,  245.9227   , ...,  150.75665  ,
         173.935    ,  206.14671  ],
       [  -2.8269308,   42.22487  ,  172.65169  , ...,   96.96352  ,
          29.174807 ,   14.396349 ]], dtype=float32)

The treelite model can be saved to a folder for use in the next session.

In [15]:
%%time
tl_model.dump('../data/tl_dump', rewrite=True)
tl_model = TLPredictor.load('../data/tl_dump')

CPU times: user 12.4 ms, sys: 12.1 ms, total: 24.4 ms
Wall time: 23.1 ms


In [16]:
%%time
tl_model.predict(X_test, nthread=4)

CPU times: user 3.6 s, sys: 6.49 ms, total: 3.61 s
Wall time: 929 ms


array([[-231.01654  , -140.9064   , -273.48718  , ..., -133.48172  ,
        -209.6255   , -227.96655  ],
       [-110.47679  , -108.9428   ,  -55.014668 , ..., -125.84514  ,
        -113.05263  ,  -17.904224 ],
       [ -30.944464 ,  -53.859463 ,  147.2486   , ...,   21.414305 ,
         -19.186205 , -204.1296   ],
       ...,
       [ -78.44266  ,  139.21492  ,   84.734604 , ...,  230.55836  ,
          36.89758  ,   18.752384 ],
       [  -6.682427 ,  139.4572   ,  245.9227   , ...,  150.75665  ,
         173.935    ,  206.14671  ],
       [  -2.8269308,   42.22487  ,  172.65169  , ...,   96.96352  ,
          29.174807 ,   14.396349 ]], dtype=float32)

For a further speed-up, you can compile the model to make inference even more efficient.

In [17]:
%%time
tl_compiled = tl_model.compile('gcc', '../data/templib.so', nthread=4)

[13:47:25] ../src/compiler/ast/split.cc:29: Parallel compilation enabled; member trees will be divided into 28 translation units.
CPU times: user 1.33 s, sys: 421 ms, total: 1.75 s
Wall time: 1min 22s


In [18]:
%%time
tl_compiled.predict(X_test)

CPU times: user 7.62 s, sys: 242 ms, total: 7.86 s
Wall time: 1.98 s


array([[-231.01653  , -140.90643  , -273.4874   , ..., -133.48174  ,
        -209.62552  , -227.96649  ],
       [-110.47675  , -108.9429   ,  -55.014687 , ..., -125.84509  ,
        -113.05263  ,  -17.904226 ],
       [ -30.944471 ,  -53.85942  ,  147.24878  , ...,   21.414314 ,
         -19.186174 , -204.12958  ],
       ...,
       [ -78.4427   ,  139.21492  ,   84.73472  , ...,  230.55849  ,
          36.89757  ,   18.752378 ],
       [  -6.6824355,  139.45705  ,  245.92262  , ...,  150.75633  ,
         173.93489  ,  206.14703  ],
       [  -2.8269129,   42.224827 ,  172.65163  , ...,   96.96337  ,
          29.174755 ,   14.396325 ]], dtype=float32)
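The GPU, treelite, and compiled outputs shown above differ only in the last printed digits, which is typical of float32 arithmetic performed in a different order. A hedged way to check such agreement is a tolerance-based comparison rather than exact equality. The sketch below uses the first printed row of each output; the tolerances are an illustrative choice.

```python
import numpy as np

# First visible row of each prediction array, copied from the outputs above.
gpu = np.array([-231.01654, -140.90642, -273.48715,
                -133.48174, -209.62552, -227.96652], dtype=np.float32)
treelite = np.array([-231.01654, -140.9064, -273.48718,
                     -133.48172, -209.6255, -227.96655], dtype=np.float32)
compiled = np.array([-231.01653, -140.90643, -273.4874,
                     -133.48174, -209.62552, -227.96649], dtype=np.float32)

results = {}
for name, other in (('treelite', treelite), ('compiled', compiled)):
    results[name] = {
        'exact': bool(np.array_equal(gpu, other)),         # bitwise-identical values?
        'close': bool(np.allclose(gpu, other, rtol=1e-5)),  # equal up to relative tolerance?
        'max_abs_diff': float(np.abs(gpu - other).max()),
    }
    print(name, results[name])
```

On these rows the exact comparison fails while the tolerance-based one passes, so a check of this kind is a more useful regression test when switching inference backends.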

A compiled model can also be saved, but note that the instance saves only the metadata. After loading, it expects to find the compiled library at the same path. Otherwise, you can update the library path via the .set_libpath method.

In [19]:
%%time
tl_compiled.dump('../data/tl_compiled.pkl')
tl_compiled = TLCompiledPredictor.load('../data/tl_compiled.pkl')
# optional - if libpath changed or if you want to change nthreads
tl_compiled.set_libpath(nthread=1)

CPU times: user 407 ms, sys: 381 ms, total: 789 ms
Wall time: 333 ms


In [20]:
%%time
tl_compiled.predict(X_test)

CPU times: user 7.62 s, sys: 35.2 ms, total: 7.66 s
Wall time: 7.65 s


array([[-231.01653  , -140.90643  , -273.4874   , ..., -133.48174  ,
        -209.62552  , -227.96649  ],
       [-110.47675  , -108.9429   ,  -55.014687 , ..., -125.84509  ,
        -113.05263  ,  -17.904226 ],
       [ -30.944471 ,  -53.85942  ,  147.24878  , ...,   21.414314 ,
         -19.186174 , -204.12958  ],
       ...,
       [ -78.4427   ,  139.21492  ,   84.73472  , ...,  230.55849  ,
          36.89757  ,   18.752378 ],
       [  -6.6824355,  139.45705  ,  245.92262  , ...,  150.75633  ,
         173.93489  ,  206.14703  ],
       [  -2.8269129,   42.224827 ,  172.65163  , ...,   96.96337  ,
          29.174755 ,   14.396325 ]], dtype=float32)

### Cross Validation

py_boost also provides a built-in cross-validation wrapper that produces out-of-fold predictions.

In [21]:
%%time
model = GradientBoosting('mse')
cv = CrossValidation(model)

oof_pred = cv.fit_predict(X, y, cv=5)

pred = cv.predict(X_test)
((pred - y_test) ** 2).mean() ** .5

[13:48:57] Stdout logging level is INFO.
[13:48:57] GDBT train starts. Max iter 100, early stopping rounds 100
[13:48:57] Iter 0; Sample 0, rmse = 175.91900028751053; 
[13:48:58] Iter 10; Sample 0, rmse = 144.86752139166103; 
[13:48:58] Iter 20; Sample 0, rmse = 123.02825623032723; 
[13:48:58] Iter 30; Sample 0, rmse = 106.3482861664181; 
[13:48:58] Iter 40; Sample 0, rmse = 93.17366772305697; 
[13:48:59] Iter 50; Sample 0, rmse = 82.30149701624404; 
[13:48:59] Iter 60; Sample 0, rmse = 73.16025141008643; 
[13:48:59] Iter 70; Sample 0, rmse = 65.31353463481472; 
[13:48:59] Iter 80; Sample 0, rmse = 58.64621885613945; 
[13:49:00] Iter 90; Sample 0, rmse = 52.96590318099443; 
[13:49:00] Iter 99; Sample 0, rmse = 48.44187587193623; 
[13:49:00] Stdout logging level is INFO.
[13:49:00] GDBT train starts. Max iter 100, early stopping rounds 100
[13:49:00] Iter 0; Sample 0, rmse = 175.84208721886034; 
[13:49:00] Iter 10; Sample 0, rmse = 144.8819560235163; 
[13:49:00] Iter 20; Sample 0, rmse 

47.2855939592917
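For readers unfamiliar with out-of-fold prediction, the idea can be sketched in plain NumPy: split the rows into folds, fit on all folds but one, and predict the held-out fold, so that every row is predicted by a model that never saw it. The ordinary least squares stand-in model and all names below are illustrative assumptions, not the CrossValidation implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, n_folds = 200, 5, 5

# Synthetic linear data with a little noise.
X = rng.normal(size=(n_samples, n_features))
true_coef = rng.normal(size=n_features)
y = X @ true_coef + 0.1 * rng.normal(size=n_samples)

# Shuffle the row indices once and cut them into n_folds disjoint parts.
folds = np.array_split(rng.permutation(n_samples), n_folds)

oof_pred = np.full(n_samples, np.nan)
for val_idx in folds:
    train_mask = np.ones(n_samples, dtype=bool)
    train_mask[val_idx] = False
    # Stand-in model: ordinary least squares fitted on the training folds only.
    coef, *_ = np.linalg.lstsq(X[train_mask], y[train_mask], rcond=None)
    # Each row is predicted by the one model that never saw it.
    oof_pred[val_idx] = X[val_idx] @ coef

oof_rmse = float(np.sqrt(np.mean((oof_pred - y) ** 2)))
print(np.isnan(oof_pred).any(), oof_rmse < 0.2)  # False True
```

Because no row is scored by a model trained on it, the out-of-fold error is an estimate of generalisation error computed on the training data itself.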