## The simplest usage example of py_boost

### Installation (if needed)

**Note**: replace cupy-cuda110 with the CuPy package that matches your CUDA version!

In [1]:
# !pip install cupy-cuda110 py-boost

### Imports

In [2]:
import os
# Optional: set the device to run
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

os.makedirs('../data', exist_ok=True)

import joblib
from sklearn.datasets import make_regression
import numpy as np

# simple case: only the GradientBoosting class is needed; the other imports are for CPU inference and cross-validation
from py_boost import GradientBoosting, TLPredictor, TLCompiledPredictor
from py_boost.cv import CrossValidation

### Generation of dummy regression data

In [3]:
%%time
X, y = make_regression(150000, 100, n_targets=10, random_state=42)
X_test, y_test = X[:50000], y[:50000]
X, y = X[-50000:], y[-50000:]

CPU times: user 3.2 s, sys: 370 ms, total: 3.57 s
Wall time: 866 ms


### Training a GBDT model

The only required argument is the loss function. Together with the shape of the target, it determines the task type. The loss can be passed either as a Loss instance or as a string alias:

* ***'mse'*** for the regression/multitask regression
* ***'msle'*** for the regression/multitask regression
* ***'bce'*** for the binary/multilabel classification
* ***'crossentropy'*** for the multiclass classification
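As background on what a loss contributes to training: second-order GBDT implementations fit each new tree to the gradient and hessian of the loss with respect to the current raw prediction. The NumPy sketch below computes them for MSE (assuming the loss `0.5 * (pred - y)^2`) and for BCE on raw scores (logits). The function names are hypothetical, and this is a generic illustration, not py_boost's Loss interface.

```python
import numpy as np

def mse_grad_hess(y_true, y_pred):
    # loss = 0.5 * (y_pred - y_true) ** 2, so d/dpred = y_pred - y_true and d2/dpred2 = 1
    grad = y_pred - y_true
    hess = np.ones_like(y_pred)
    return grad, hess

def bce_grad_hess(y_true, raw_score):
    # raw_score is a logit; the sigmoid maps it to a probability
    prob = 1.0 / (1.0 + np.exp(-raw_score))
    grad = prob - y_true
    hess = prob * (1.0 - prob)
    return grad, hess

y = np.array([0.0, 1.0, 1.0])
pred = np.zeros(3)
g, h = mse_grad_hess(y, pred)
print(g, h)    # [ 0. -1. -1.] [1. 1. 1.]
gb, hb = bce_grad_hess(y, pred)
print(gb, hb)  # [ 0.5 -0.5 -0.5] [0.25 0.25 0.25]
```

Because the tree-building step only sees gradients and hessians, the same boosting loop can serve every task type; the loss decides what those quantities are.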

Training is done by calling the .fit method. Its arguments are the following:

* ***'X'***: the feature matrix
* ***'y'***: the target
* ***'sample_weight'***: optional per-sample weights
* ***'eval_sets'***: validation sets, passed as a list of dicts with the possible keys ['X', 'y', 'sample_weight']. Note: if multiple validation sets are passed, the best model is selected using the last one.
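The `eval_sets` format can be checked before calling `.fit`. The sketch below builds two validation sets in that format with NumPy and validates them with a hypothetical helper, `check_eval_sets` (not part of py_boost); treating 'X' and 'y' as required keys is an assumption. Since the best model is selected on the last set, the set intended for model selection goes last.

```python
import numpy as np

ALLOWED_KEYS = {'X', 'y', 'sample_weight'}

def check_eval_sets(eval_sets, n_features):
    """Hypothetical helper: validate a list of dicts in the eval_sets format."""
    for i, vset in enumerate(eval_sets):
        extra = set(vset) - ALLOWED_KEYS
        if extra:
            raise ValueError(f"eval set {i}: unexpected keys {sorted(extra)}")
        if 'X' not in vset or 'y' not in vset:  # assumption: both are required
            raise ValueError(f"eval set {i}: 'X' and 'y' are required")
        if vset['X'].shape[1] != n_features:
            raise ValueError(f"eval set {i}: expected {n_features} features")
        if len(vset['X']) != len(vset['y']):
            raise ValueError(f"eval set {i}: X and y lengths differ")
        w = vset.get('sample_weight')
        if w is not None and len(w) != len(vset['y']):
            raise ValueError(f"eval set {i}: sample_weight length differs")
    return True

rng = np.random.default_rng(0)
X_a, y_a = rng.normal(size=(100, 5)), rng.normal(size=100)
X_b, y_b = rng.normal(size=(50, 5)), rng.normal(size=50)

eval_sets = [
    {'X': X_a, 'y': y_a},                                # monitored only
    {'X': X_b, 'y': y_b, 'sample_weight': np.ones(50)},  # last set: used for model selection
]
check_eval_sets(eval_sets, n_features=5)
```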

#### The example below illustrates how to train a simple regression task.

In [4]:
%%time
model = GradientBoosting('mse')

model.fit(X, y[:, 0], eval_sets=[{'X': X_test, 'y': y_test[:, 0]},])

[16:08:04] Stdout logging level is INFO.
[16:08:04] GDBT train starts. Max iter 100, early stopping rounds 100
[16:08:05] Iter 0; Sample 0, rmse = 173.68515691977944; 
[16:08:05] Iter 10; Sample 0, rmse = 133.23295011038803; 
[16:08:05] Iter 20; Sample 0, rmse = 107.90963333026548; 
[16:08:05] Iter 30; Sample 0, rmse = 90.08368631256529; 
[16:08:05] Iter 40; Sample 0, rmse = 76.43011229452102; 
[16:08:05] Iter 50; Sample 0, rmse = 65.57255537063156; 
[16:08:05] Iter 60; Sample 0, rmse = 56.77176734590884; 
[16:08:05] Iter 70; Sample 0, rmse = 49.60711914100726; 
[16:08:06] Iter 80; Sample 0, rmse = 43.62769085132933; 
[16:08:06] Iter 90; Sample 0, rmse = 38.6875278370893; 
[16:08:06] Iter 99; Sample 0, rmse = 35.01089441400534; 
CPU times: user 6.83 s, sys: 274 ms, total: 7.1 s
Wall time: 4.96 s


<py_boost.gpu.boosting.GradientBoosting at 0x7f0e81c43400>

### Training a GBDT model in a multioutput regression case

Each built-in loss function has its own default metric, so specifying a metric is optional.
To use a different evaluation metric, pass a Metric instance or a string alias.

#### Default metrics:

* ***'rmse'*** is the default for the ***'mse'*** loss
* ***'rmsle'*** is the default for the ***'msle'*** loss
* ***'bce'*** is the default for the ***'bce'*** loss
* ***'crossentropy'*** is the default for the ***'crossentropy'*** loss

#### Non-default metrics:

* ***'r2'*** for the regression/multitask regression
* ***'auc'*** for the binary classification
* ***'accuracy'*** for any classification task
* ***'precision'*** for any classification task
* ***'recall'*** for any classification task
* ***'f1'*** for any classification task
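For reference, the NumPy sketch below writes out the standard definitions behind the 'rmse', 'rmsle' and 'r2' metrics, plus a dict mirroring the default-metric list above. How py_boost aggregates R2 over several outputs is not stated in this tutorial; the sketch assumes a per-column score averaged over outputs, which is an assumption.

```python
import numpy as np

# mirrors the default-metric list above: loss alias -> default metric alias
DEFAULT_METRIC = {'mse': 'rmse', 'msle': 'rmsle', 'bce': 'bce', 'crossentropy': 'crossentropy'}

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def rmsle(y_true, y_pred):
    # requires values > -1, since log1p(x) = log(1 + x)
    return float(np.sqrt(np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2)))

def r2(y_true, y_pred):
    # 1 - SS_res / SS_tot per output column, then averaged (assumed aggregation)
    y_true = np.asarray(y_true, dtype=float).reshape(len(y_true), -1)
    y_pred = np.asarray(y_pred, dtype=float).reshape(len(y_pred), -1)
    ss_res = ((y_true - y_pred) ** 2).sum(axis=0)
    ss_tot = ((y_true - y_true.mean(axis=0)) ** 2).sum(axis=0)
    return float(np.mean(1.0 - ss_res / ss_tot))

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.0, 2.0, 3.0, 5.0])
print(rmse(y_true, y_pred))  # 0.5
print(r2(y_true, y_pred))    # 0.8
```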

Other common GBDT hyperparameters can also be specified, as shown below.

#### The following example trains a model for a multioutput regression task. No extra configuration is needed to switch to the multioutput setting: just pass a multidimensional target.

In [5]:
%%time
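model = GradientBoosting('mse', 'r2_score',
                         ntrees=1000, lr=.01,                       # number of trees, learning rate
                         verbose=100, es=200,                       # log period, early stopping rounds
                         lambda_l2=1, subsample=.8, colsample=.8,   # L2 regularization, row/column sampling ratios
                         min_data_in_leaf=10, min_gain_to_split=0,  # split constraints
                         max_bin=256, max_depth=6)                  # histogram bins, maximum tree depth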

model.fit(X, y, eval_sets=[{'X': X_test, 'y': y_test},])

[16:08:06] Stdout logging level is INFO.
[16:08:06] GDBT train starts. Max iter 1000, early stopping rounds 200
[16:08:06] Iter 0; Sample 0, R2_score = 0.008394434570827336; 
[16:08:08] Iter 100; Sample 0, R2_score = 0.5167311821304741; 
[16:08:10] Iter 200; Sample 0, R2_score = 0.724112018222454; 
[16:08:12] Iter 300; Sample 0, R2_score = 0.8326867808861295; 
[16:08:14] Iter 400; Sample 0, R2_score = 0.894878283490004; 
[16:08:16] Iter 500; Sample 0, R2_score = 0.9320441058906963; 
[16:08:18] Iter 600; Sample 0, R2_score = 0.9546838153329367; 
[16:08:20] Iter 700; Sample 0, R2_score = 0.9687535106845113; 
[16:08:22] Iter 800; Sample 0, R2_score = 0.9776117268479518; 
[16:08:24] Iter 900; Sample 0, R2_score = 0.9832966630583361; 
[16:08:26] Iter 999; Sample 0, R2_score = 0.9869894787290912; 
CPU times: user 22.2 s, sys: 213 ms, total: 22.4 s
Wall time: 20.2 s


<py_boost.gpu.boosting.GradientBoosting at 0x7f0f634abb20>
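The `es=200` argument sets the number of early stopping rounds, as the log line "early stopping rounds 200" shows. The plain-Python sketch below illustrates the usual patience rule: training stops once the validation score has not improved for `es` consecutive iterations, and the best iteration is kept. The function is hypothetical and generic; py_boost's exact bookkeeping may differ.

```python
def early_stopping_iteration(scores, es, higher_is_better=True):
    """Return (best_iter, stop_iter) for a sequence of validation scores.

    Generic patience rule: stop after `es` iterations without improvement.
    """
    best_iter, best_score = 0, scores[0]
    for it in range(1, len(scores)):
        s = scores[it]
        improved = s > best_score if higher_is_better else s < best_score
        if improved:
            best_iter, best_score = it, s
        elif it - best_iter >= es:
            return best_iter, it
    # never ran out of patience: training ends at the last iteration
    return best_iter, len(scores) - 1

# R2 improves until iteration 3, then stalls
r2_scores = [0.1, 0.4, 0.6, 0.7, 0.69, 0.68, 0.65]
print(early_stopping_iteration(r2_scores, es=2))  # (3, 5)
```

In the run above the R2 score was still improving at iteration 999, so training ended at the `ntrees=1000` limit rather than through early stopping.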

## Inference

#### Prediction is done by calling the .predict method

In [6]:
%%time
preds = model.predict(X_test)

preds.shape

CPU times: user 2.21 s, sys: 846 ms, total: 3.05 s
Wall time: 3.07 s


(50000, 10)

In [7]:
preds

array([[-223.87396  , -135.50194  , -265.8046   , ..., -129.46918  ,
        -205.92981  , -221.22426  ],
       [-105.44449  , -102.702545 ,  -46.391914 , ..., -119.10743  ,
        -103.06447  ,   -8.659139 ],
       [ -44.662792 ,  -64.49487  ,  139.20688  , ...,   14.63566  ,
         -27.567158 , -215.73244  ],
       ...,
       [ -88.6747   ,  126.8604   ,   77.65632  , ...,  221.90102  ,
          27.670258 ,    5.0210505],
       [  -5.025809 ,  140.92496  ,  243.93556  , ...,  150.28708  ,
         173.07477  ,  206.62967  ],
       [  -9.335781 ,   37.36901  ,  169.7846   , ...,   94.27019  ,
          27.018436 ,    8.480061 ]], dtype=float32)

#### Predictions for certain iterations can be obtained by calling the .predict_staged method

In [8]:
%%time
preds = model.predict_staged(X_test, iterations=[100, 300, 500])

preds.shape

CPU times: user 33 ms, sys: 8.07 ms, total: 41.1 ms
Wall time: 39.3 ms


(3, 50000, 10)
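Conceptually, a boosted ensemble predicts a base score plus the sum of per-tree outputs, so a staged prediction is that sum truncated after a given number of trees. The NumPy sketch below reproduces the `(n_iterations, n_samples, n_outputs)` layout shown above using random stand-in tree outputs. Two assumptions: the learning rate is already folded into each tree's output, and `iterations=[100, ...]` means "the first 100 trees" (the tutorial does not say whether the bound is inclusive). The function name is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trees, n_samples, n_outputs = 500, 4, 3
base_score = np.zeros(n_outputs)
# stand-in for each tree's (already learning-rate-scaled) output on each sample
tree_outputs = rng.normal(size=(n_trees, n_samples, n_outputs))

def predict_staged_sketch(tree_outputs, base_score, iterations):
    # cum[k] holds the sum of the first k + 1 trees
    cum = np.cumsum(tree_outputs, axis=0)
    # prediction after the first k trees -> index k - 1
    return np.stack([base_score + cum[k - 1] for k in iterations])

staged = predict_staged_sketch(tree_outputs, base_score, iterations=[100, 300, 500])
print(staged.shape)  # (3, 4, 3)
```

A single cumulative sum serves all requested iterations, which is one reason staged prediction can be much cheaper than calling a full predict once per iteration.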

#### Tree leaf indices for certain iterations can be predicted by calling the .predict_leaves method

In [9]:
%%time
preds = model.predict_leaves(X_test, iterations=[100, 300, 500])

preds.shape

CPU times: user 17.6 ms, sys: 307 µs, total: 17.9 ms
Wall time: 14 ms


(3, 50000, 1)

In [10]:
preds.T[0]

array([[10, 41, 10],
       [51, 41, 28],
       [32, 41, 54],
       ...,
       [54, 48, 10],
       [27, 41, 20],
       [60, 41, 28]], dtype=int32)
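Leaf indices are commonly used as learned categorical features, with each (tree, leaf) pair becoming one binary column. The NumPy sketch below one-hot encodes an array laid out like `preds.T[0]` above (samples by selected iterations). The helper name is hypothetical, and it assumes every leaf index lies in `[0, 2 ** max_depth)`, which the values shown here (all below 64 for `max_depth=6`) are consistent with.

```python
import numpy as np

def one_hot_leaves(leaves, n_leaves):
    """Hypothetical helper: one-hot encode an (n_samples, n_trees) array of leaf ids.

    Column t * n_leaves + j is 1 when the sample falls into leaf j of tree t.
    Assumes every leaf id is in [0, n_leaves).
    """
    n_samples, n_trees = leaves.shape
    out = np.zeros((n_samples, n_trees * n_leaves), dtype=np.uint8)
    rows = np.repeat(np.arange(n_samples), n_trees)
    cols = (np.arange(n_trees) * n_leaves + leaves).ravel()
    out[rows, cols] = 1
    return out

# first rows of the preds.T[0] output above (samples x selected iterations)
leaves = np.array([[10, 41, 10],
                   [51, 41, 28],
                   [32, 41, 54]])
encoded = one_hot_leaves(leaves, n_leaves=2 ** 6)  # max_depth=6 in the model above
print(encoded.shape)        # (3, 192)
print(encoded.sum(axis=1))  # [3 3 3]
```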

#### Feature importances

In [11]:
model.get_feature_importance()

array([  45.,   48.,   36.,   43.,   55.,   44., 5515.,   47.,   33.,
         70.,   35.,   46.,   45.,   44.,   36., 5894., 5514.,   47.,
         35., 5529.,   37.,   38.,   42.,   69.,   40.,   31.,   60.,
         62.,   40.,   46.,   50.,   59.,   61.,   57.,   38.,   42.,
       5865.,   40.,   48.,   37.,   56.,   35.,   51.,   55.,   38.,
         44.,   50.,   59.,   53.,   40.,   42.,   50., 5952.,   48.,
         56.,   71.,   50.,   35.,   50.,   41.,   49.,   35.,   35.,
         46.,   49.,   39.,   38.,   51.,   34.,   63.,   48.,   46.,
         44.,   31.,   46.,   56.,   47.,   32.,   44.,   45.,   58.,
         37.,   30.,   55.,   51.,   33., 5589., 3564.,   41., 5851.,
         46., 6212.,   54.,   39.,   52.,   51.,   35.,   49.,   42.,
         52.], dtype=float32)
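In the output above, ten features stand far above the rest, which is consistent with `make_regression` generating only a subset of informative features (its default `n_informative` is 10). A natural next step is to rank features by importance. The sketch below does this with `np.argsort` on a short, made-up importance vector; it makes no assumption about whether py_boost's importances are split counts or gains.

```python
import numpy as np

def top_features(importance, k):
    """Return the indices of the k largest importances, most important first."""
    importance = np.asarray(importance)
    order = np.argsort(importance)[::-1]  # descending order
    return order[:k]

# illustrative importances, not the model's actual output
importance = np.array([45., 48., 5515., 43., 5894., 44.])
print(top_features(importance, k=2))  # [4 2]
```

With the tutorial's model, `top_features(model.get_feature_importance(), k=10)` would list the ten dominant columns.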

#### The trained model can be saved as a pickle for inference



In [12]:
joblib.dump(model, '../data/temp_model.pkl')

new_model = joblib.load('../data/temp_model.pkl')
new_model.predict(X_test)

array([[-223.87396  , -135.50194  , -265.8046   , ..., -129.46918  ,
        -205.92981  , -221.22426  ],
       [-105.44449  , -102.702545 ,  -46.391914 , ..., -119.10743  ,
        -103.06447  ,   -8.659139 ],
       [ -44.662792 ,  -64.49487  ,  139.20688  , ...,   14.63566  ,
         -27.567158 , -215.73244  ],
       ...,
       [ -88.6747   ,  126.8604   ,   77.65632  , ...,  221.90102  ,
          27.670258 ,    5.0210505],
       [  -5.025809 ,  140.92496  ,  243.93556  , ...,  150.28708  ,
         173.07477  ,  206.62967  ],
       [  -9.335781 ,   37.36901  ,  169.7846   , ...,   94.27019  ,
          27.018436 ,    8.480061 ]], dtype=float32)

#### An alternative and more stable way to serialize the model is to dump it as JSON

***Note***: it is important to load the JSON into a GradientBoosting instance with the same loss function! The loss function matters for inference because it defines the postprocessing function (for example, sigmoid for bce or softmax for crossentropy). For details, please check [Tutorial_3_Custom_features](https://github.com/sb-ai-lab/Py-Boost/blob/master/tutorials/Tutorial_3_Custom_features.ipynb)

In [13]:
model.dump('../data/temp_model.pb')

# restore model
new_model = GradientBoosting('mse').load('../data/temp_model.pb')
new_model.predict(X_test)

100%|██████████| 1000/1000 [00:00<00:00, 1646.10it/s]


array([[-223.87396  , -135.50194  , -265.8046   , ..., -129.46918  ,
        -205.92981  , -221.22426  ],
       [-105.44449  , -102.702545 ,  -46.391914 , ..., -119.10743  ,
        -103.06447  ,   -8.659139 ],
       [ -44.662792 ,  -64.49487  ,  139.20688  , ...,   14.63566  ,
         -27.567158 , -215.73244  ],
       ...,
       [ -88.6747   ,  126.8604   ,   77.65632  , ...,  221.90102  ,
          27.670258 ,    5.0210505],
       [  -5.025809 ,  140.92496  ,  243.93556  , ...,  150.28708  ,
         173.07477  ,  206.62967  ],
       [  -9.335781 ,   37.36901  ,  169.7846   , ...,   94.27019  ,
          27.018436 ,    8.480061 ]], dtype=float32)
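To see why the loss matters at load time: the trees produce raw scores, and the loss decides how those scores become predictions. The NumPy sketch below applies the postprocessing the note above mentions (sigmoid for 'bce', softmax for 'crossentropy') plus identity for 'mse'. The `POSTPROCESS` dict and function names are hypothetical, not py_boost's API.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # subtract the row max for numerical stability
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

POSTPROCESS = {
    'mse': lambda z: z,       # regression: the raw score is the prediction
    'bce': sigmoid,           # binary/multilabel: per-column probabilities
    'crossentropy': softmax,  # multiclass: probabilities summing to 1 per row
}

raw = np.array([[0.0, 2.0], [1.0, -1.0]])
for loss in ('mse', 'bce', 'crossentropy'):
    print(loss, np.round(POSTPROCESS[loss](raw), 3))
```

Loading a bce model into an 'mse' instance would therefore return raw logits instead of probabilities.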

### CPU Inference via treelite

The model can also be converted for CPU inference with the treelite library. For this purpose, use the TL wrappers: TLPredictor for built-in inference and TLCompiledPredictor for compiled inference.

In [14]:
%%time
tl_model = TLPredictor(model)

100%|██████████| 1000/1000 [00:05<00:00, 180.50it/s]


CPU times: user 5.53 s, sys: 124 ms, total: 5.65 s
Wall time: 5.58 s


In [15]:
%%time
tl_model.predict(X_test, nthread=4)



CPU times: user 3.36 s, sys: 23.4 ms, total: 3.38 s
Wall time: 900 ms


array([[-223.87396  , -135.50195  , -265.8046   , ..., -129.46918  ,
        -205.92981  , -221.22426  ],
       [-105.44449  , -102.70254  ,  -46.39192  , ..., -119.10744  ,
        -103.06447  ,   -8.659145 ],
       [ -44.66279  ,  -64.494865 ,  139.2069   , ...,   14.635647 ,
         -27.567156 , -215.73244  ],
       ...,
       [ -88.67468  ,  126.860374 ,   77.65633  , ...,  221.90102  ,
          27.670267 ,    5.0210514],
       [  -5.025807 ,  140.92497  ,  243.93558  , ...,  150.28708  ,
         173.07477  ,  206.62967  ],
       [  -9.335782 ,   37.36902  ,  169.7846   , ...,   94.27018  ,
          27.018427 ,    8.480059 ]], dtype=float32)
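The treelite predictions above match the GPU predictions only up to float32 rounding (for example, -8.659139 vs -8.659145). When checking that a converted model is equivalent, compare with a tolerance rather than exact equality. The sketch below uses a few values copied from the two outputs above; the tolerances are an assumption and should be chosen relative to the scale of your targets.

```python
import numpy as np

# a few values copied from the GPU (model.predict) and treelite outputs above
gpu_preds = np.array([-223.87396, -135.50194, -8.659139, 5.0210505], dtype=np.float32)
tl_preds = np.array([-223.87396, -135.50195, -8.659145, 5.0210514], dtype=np.float32)

print(np.array_equal(gpu_preds, tl_preds))  # False: exact equality is too strict
# float32 keeps about 7 significant digits, so allow small relative and absolute errors
print(np.allclose(gpu_preds, tl_preds, rtol=1e-5, atol=1e-4))  # True
print(np.abs(gpu_preds - tl_preds).max())
```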

The treelite model can be saved to a folder and loaded in a later session.

In [16]:
%%time
tl_model.dump('../data/tl_dump', rewrite=True)
tl_model = TLPredictor.load('../data/tl_dump')

CPU times: user 31.7 ms, sys: 31.6 ms, total: 63.3 ms
Wall time: 63.7 ms


In [17]:
%%time
tl_model.predict(X_test, nthread=4)

CPU times: user 3.35 s, sys: 7.27 ms, total: 3.36 s
Wall time: 856 ms


array([[-223.87396  , -135.50195  , -265.8046   , ..., -129.46918  ,
        -205.92981  , -221.22426  ],
       [-105.44449  , -102.70254  ,  -46.39192  , ..., -119.10744  ,
        -103.06447  ,   -8.659145 ],
       [ -44.66279  ,  -64.494865 ,  139.2069   , ...,   14.635647 ,
         -27.567156 , -215.73244  ],
       ...,
       [ -88.67468  ,  126.860374 ,   77.65633  , ...,  221.90102  ,
          27.670267 ,    5.0210514],
       [  -5.025807 ,  140.92497  ,  243.93558  , ...,  150.28708  ,
         173.07477  ,  206.62967  ],
       [  -9.335782 ,   37.36902  ,  169.7846   , ...,   94.27018  ,
          27.018427 ,    8.480059 ]], dtype=float32)

To speed up inference further, you can compile the model into a shared library (here with gcc). Note that compilation itself can take a while.

In [18]:
%%time
tl_compiled = tl_model.compile('gcc', '../data/templib.so', nthread=4)

[16:08:40] ../src/compiler/ast/split.cc:29: Parallel compilation enabled; member trees will be divided into 56 translation units.




CPU times: user 1.29 s, sys: 90.3 ms, total: 1.38 s
Wall time: 56.2 s




In [19]:
%%time
tl_compiled.predict(X_test)



CPU times: user 6.62 s, sys: 588 ms, total: 7.21 s
Wall time: 1.83 s


array([[-223.87384  , -135.502    , -265.80447  , ..., -129.46922  ,
        -205.92975  , -221.22443  ],
       [-105.44452  , -102.70257  ,  -46.391895 , ..., -119.1074   ,
        -103.06445  ,   -8.659153 ],
       [ -44.662804 ,  -64.4949   ,  139.20688  , ...,   14.63566  ,
         -27.567184 , -215.73254  ],
       ...,
       [ -88.67466  ,  126.86047  ,   77.65633  , ...,  221.90108  ,
          27.670267 ,    5.021043 ],
       [  -5.0258074,  140.92477  ,  243.93553  , ...,  150.28697  ,
         173.0748   ,  206.62964  ],
       [  -9.335763 ,   37.36898  ,  169.78474  , ...,   94.270195 ,
          27.01838  ,    8.4801   ]], dtype=float32)

The compiled model can also be saved, but note that the instance saves only metadata: after loading, it expects the compiled library to be at the same path. If the library has moved, update the path via the .set_libpath method.

In [20]:
%%time
tl_compiled.dump('../data/tl_compiled.pkl')
tl_compiled = TLCompiledPredictor.load('../data/tl_compiled.pkl')
# optional: call if the library path changed or to change the number of threads
tl_compiled.set_libpath(nthread=1)

CPU times: user 441 ms, sys: 1.27 s, total: 1.71 s
Wall time: 732 ms
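Because the dumped `TLCompiledPredictor` stores only metadata, it can help to confirm that the shared library still exists before loading, and to look for it in other locations otherwise. The stdlib sketch below is a hypothetical helper, not part of py_boost. The path it returns would then be passed to `.set_libpath`; the name of that method's path argument is not shown in this tutorial, so check the py_boost source before relying on it.

```python
from pathlib import Path

def resolve_libpath(saved_path, search_dirs=()):
    """Hypothetical helper: return an existing path to the compiled library.

    Tries the original path first, then a file with the same name in each
    of `search_dirs`. Raises FileNotFoundError if nothing is found.
    """
    saved = Path(saved_path)
    candidates = [saved] + [Path(d) / saved.name for d in search_dirs]
    for candidate in candidates:
        if candidate.is_file():
            return candidate.resolve()
    raise FileNotFoundError(f"{saved.name} not found in {[str(c.parent) for c in candidates]}")

# usage: the original path from this tutorial plus a hypothetical new location
# libpath = resolve_libpath('../data/templib.so', search_dirs=['/opt/models'])
```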


In [21]:
%%time
tl_compiled.predict(X_test)

CPU times: user 6.5 s, sys: 3.48 ms, total: 6.5 s
Wall time: 6.49 s


array([[-223.87384  , -135.502    , -265.80447  , ..., -129.46922  ,
        -205.92975  , -221.22443  ],
       [-105.44452  , -102.70257  ,  -46.391895 , ..., -119.1074   ,
        -103.06445  ,   -8.659153 ],
       [ -44.662804 ,  -64.4949   ,  139.20688  , ...,   14.63566  ,
         -27.567184 , -215.73254  ],
       ...,
       [ -88.67466  ,  126.86047  ,   77.65633  , ...,  221.90108  ,
          27.670267 ,    5.021043 ],
       [  -5.0258074,  140.92477  ,  243.93553  , ...,  150.28697  ,
         173.0748   ,  206.62964  ],
       [  -9.335763 ,   37.36898  ,  169.78474  , ...,   94.270195 ,
          27.01838  ,    8.4801   ]], dtype=float32)
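The `%%time` measurements above come from single runs and depend on `nthread`: compare the two `tl_compiled.predict` timings, first with 4 threads and then with 1. To compare predictors more reliably, repeat each call and take the median. The sketch below uses only the standard library; `median_runtime` is a hypothetical helper, and the commented usage lines refer to the objects built in this tutorial.

```python
import statistics
import time

def median_runtime(fn, *args, repeats=5, warmup=1, **kwargs):
    """Return the median wall-clock time in seconds of fn(*args, **kwargs)."""
    for _ in range(warmup):
        fn(*args, **kwargs)  # untimed runs, e.g. to trigger lazy initialization
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        times.append(time.perf_counter() - start)
    return statistics.median(times)

# usage with the objects from this tutorial (not run here):
# print('gpu', median_runtime(model.predict, X_test))
# print('treelite', median_runtime(tl_model.predict, X_test, nthread=4))
# print('compiled', median_runtime(tl_compiled.predict, X_test))
```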

### Cross Validation

py_boost also provides a built-in cross-validation wrapper that produces out-of-fold predictions.

In [22]:
%%time
model = GradientBoosting('mse')
cv = CrossValidation(model)

oof_pred = cv.fit_predict(X, y, cv=5)

pred = cv.predict(X_test)
((pred - y_test) ** 2).mean() ** .5

[16:09:45] Stdout logging level is INFO.
[16:09:45] GDBT train starts. Max iter 100, early stopping rounds 100
[16:09:45] Iter 0; Sample 0, rmse = 175.61429211372365; 
[16:09:45] Iter 10; Sample 0, rmse = 144.7971654806504; 
[16:09:45] Iter 20; Sample 0, rmse = 122.99271744910226; 
[16:09:46] Iter 30; Sample 0, rmse = 106.4159816075694; 
[16:09:46] Iter 40; Sample 0, rmse = 93.21293236506831; 
[16:09:46] Iter 50; Sample 0, rmse = 82.41181507620963; 
[16:09:46] Iter 60; Sample 0, rmse = 73.33020428706712; 
[16:09:46] Iter 70; Sample 0, rmse = 65.59096934329497; 
[16:09:47] Iter 80; Sample 0, rmse = 58.97445734858388; 
[16:09:47] Iter 90; Sample 0, rmse = 53.305058355844025; 
[16:09:47] Iter 99; Sample 0, rmse = 48.81809291377473; 
[16:09:47] Stdout logging level is INFO.
[16:09:47] GDBT train starts. Max iter 100, early stopping rounds 100
[16:09:47] Iter 0; Sample 0, rmse = 177.1335485003128; 
[16:09:47] Iter 10; Sample 0, rmse = 145.70845025808; 
[16:09:48] Iter 20; Sample 0, rmse = 1

47.35542441585655
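Out-of-fold (OOF) prediction means that each training row is predicted by a model trained on the folds that exclude it, so OOF predictions can be scored like a held-out set. The NumPy sketch below illustrates the scheme with ordinary least squares standing in for GBDT. The function names are hypothetical, and averaging the fold models' predictions on new data is an assumption about what `cv.predict` does.

```python
import numpy as np

def oof_least_squares(X, y, n_folds=5, seed=0):
    """Return (oof_pred, fold_weights) using K-fold least squares as a stand-in model."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    oof = np.zeros_like(y, dtype=float)
    weights = []
    for val_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(X)), val_idx)
        w, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)
        oof[val_idx] = X[val_idx] @ w  # predicted by a model that never saw these rows
        weights.append(w)
    return oof, weights

def predict_mean(weights, X_new):
    # average the fold models' predictions (assumed behaviour of cv.predict)
    return np.mean([X_new @ w for w in weights], axis=0)

rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)
oof, weights = oof_least_squares(X_demo, y_demo)
rmse_oof = np.sqrt(np.mean((oof - y_demo) ** 2))
print(rmse_oof)
```

The same fold loop works for a multioutput target, since `np.linalg.lstsq` accepts a 2D right-hand side.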