## The simplest usage example of py_boost

### Installation (if needed)

**Note**: replace cupy-cuda110 with the package that matches your CUDA version!

In [1]:
# !pip install cupy-cuda110 py-boost

### Imports

In [2]:
import os
# Optional: set the device to run
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

os.makedirs('../data', exist_ok=True)

import joblib
from sklearn.datasets import make_regression
import numpy as np

# simple case - just one class is used
from py_boost import GradientBoosting, TLPredictor, TLCompiledPredictor
from py_boost.cv import CrossValidation

### Generation of dummy regression data

In [3]:
%%time
X, y = make_regression(150000, 100, n_targets=10, random_state=42)
X_test, y_test = X[:50000], y[:50000]
X, y = X[-50000:], y[-50000:]

CPU times: user 2.2 s, sys: 1.75 s, total: 3.95 s
Wall time: 840 ms


### Training a GBDT model

The only argument required here is a loss function. It, together with the input target shape, determines the task type. The loss function can be passed as a Loss instance or using a string alias:

* ***'mse'*** for regression/multitask regression
* ***'msle'*** for regression/multitask regression
* ***'bce'*** for binary/multilabel classification
* ***'crossentropy'*** for multiclass classification
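
As a rough illustration of how the loss alias and the target shape together determine the task, the helper below restates the list above in code. `describe_task` is a hypothetical function written for this example and is not part of py_boost; the rule that a 2-D target with more than one column means a multi-output task is an assumption drawn from the descriptions above.

```python
def describe_task(loss: str, y_shape: tuple) -> str:
    """Hypothetical helper (not part of py_boost): name the task implied by
    a loss alias and the shape of the target array."""
    # A 1-D target, or a 2-D target with one column, means a single output.
    n_outputs = 1 if len(y_shape) == 1 else y_shape[1]
    if loss in ("mse", "msle"):
        return "regression" if n_outputs == 1 else "multitask regression"
    if loss == "bce":
        return "binary classification" if n_outputs == 1 else "multilabel classification"
    if loss == "crossentropy":
        return "multiclass classification"
    raise ValueError(f"unknown loss alias: {loss!r}")


print(describe_task("mse", (50000,)))     # regression
print(describe_task("mse", (50000, 10)))  # multitask regression
```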

Training is done by calling the .fit method. The possible arguments are the following:

* ***'X'*** 
* ***'y'*** 
* ***'sample_weight'*** 
* ***'eval_sets'***  
A validation set is passed as a list of dicts with possible keys ['X', 'y', 'sample_weight']. Note: if multiple valid sets are passed, the best model is selected using the last one.
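
The structure of `eval_sets` can be sketched with plain NumPy arrays. The data below is random and only shows the layout; the array names and the sanity checks are assumptions made for illustration, not something py_boost requires.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two small hypothetical validation sets (random data, for structure only).
X_val_a, y_val_a = rng.normal(size=(20, 5)), rng.normal(size=20)
X_val_b, y_val_b = rng.normal(size=(30, 5)), rng.normal(size=30)

eval_sets = [
    {"X": X_val_a, "y": y_val_a},
    # 'sample_weight' is optional; here every row gets weight 1.
    {"X": X_val_b, "y": y_val_b, "sample_weight": np.ones(30)},
]

# Simple sanity checks on the structure before handing it to .fit
allowed_keys = {"X", "y", "sample_weight"}
for i, es in enumerate(eval_sets):
    assert set(es) <= allowed_keys, f"unexpected keys in set {i}"
    assert len(es["X"]) == len(es["y"]), f"X/y length mismatch in set {i}"

# Per the note above, the last entry is the one used to pick the best model.
selection_set = eval_sets[-1]
print(len(selection_set["X"]))
```

If one validation set matters more than the others for model selection, placing it last in the list follows from the note above.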

#### The example below illustrates how to train a simple regression task.

In [4]:
%%time
model = GradientBoosting('mse')

model.fit(X, y[:, 0], eval_sets=[{'X': X_test, 'y': y_test[:, 0]},])

[19:14:36] Stdout logging level is INFO.
[19:14:36] GDBT train starts. Max iter 100, early stopping rounds 100
[19:14:37] Iter 0; Sample 0, rmse = 173.68515729678407; 
[19:14:37] Iter 10; Sample 0, rmse = 133.23291182753496; 
[19:14:37] Iter 20; Sample 0, rmse = 107.9095745634706; 
[19:14:37] Iter 30; Sample 0, rmse = 90.08428101668538; 
[19:14:37] Iter 40; Sample 0, rmse = 76.43099211547967; 
[19:14:37] Iter 50; Sample 0, rmse = 65.55844096384095; 
[19:14:37] Iter 60; Sample 0, rmse = 56.76824308107686; 
[19:14:37] Iter 70; Sample 0, rmse = 49.56542493520833; 
[19:14:37] Iter 80; Sample 0, rmse = 43.58938208881945; 
[19:14:37] Iter 90; Sample 0, rmse = 38.672492217206646; 
[19:14:37] Iter 99; Sample 0, rmse = 34.997985689171; 
CPU times: user 7.4 s, sys: 1.4 s, total: 8.79 s
Wall time: 6.74 s


<py_boost.gpu.boosting.GradientBoosting at 0x7f4b6470cd30>

### Training a GBDT model in a multiregression case

Each built-in loss function has its own default metric, so defining a metric is optional.
If you need to specify the evaluation metric, you can pass a Metric instance or use a string alias.

#### Default metrics:

* ***'rmse'*** is the default for the ***'mse'*** loss
* ***'rmsle'*** is the default for the  ***'msle'*** loss
* ***'bce'*** is the default for the ***'bce'*** loss
* ***'crossentropy'*** is the default for the ***'crossentropy'*** loss

#### Non-default metrics:

* ***'r2_score'*** for regression/multitask regression
* ***'auc'*** for binary classification
* ***'accuracy'*** for any classification task
* ***'precision'*** for any classification task
* ***'recall'*** for any classification task
* ***'f1'*** for any classification task
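
To make the regression metrics concrete, here is a minimal NumPy sketch of RMSE and of an R² score. The uniform average across output columns in `r2_uniform` is an assumption for illustration; py_boost may aggregate multi-output scores differently.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error over all elements."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r2_uniform(y_true, y_pred):
    """R^2 per output column, then a plain average across columns
    (an assumption; the library may aggregate differently)."""
    y_true = y_true.reshape(len(y_true), -1)
    y_pred = y_pred.reshape(len(y_pred), -1)
    ss_res = ((y_true - y_pred) ** 2).sum(axis=0)
    ss_tot = ((y_true - y_true.mean(axis=0)) ** 2).sum(axis=0)
    return float(np.mean(1.0 - ss_res / ss_tot))


y_true = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
perfect = y_true.copy()
baseline = np.tile(y_true.mean(axis=0), (3, 1))  # always predict the column mean

print(rmse(y_true, perfect), r2_uniform(y_true, perfect))  # 0.0 1.0
print(r2_uniform(y_true, baseline))                        # 0.0
```

A perfect prediction scores 1 and the column-mean baseline scores 0, which is why the R² values in a training log climb toward 1 as the model improves.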

It is possible to specify other common GBDT hyperparameters as shown below.

#### The following example demonstrates how to train a model for a multioutput regression task (no extra definition is needed to switch to a multioutput task; just pass a multidimensional target).

In [5]:
%%time
model = GradientBoosting('mse', 'r2_score',
                         ntrees=1000, lr=.01, verbose=100, es=200, lambda_l2=1,
                         subsample=.8, colsample=.8, min_data_in_leaf=10, min_gain_to_split=0, 
                         max_bin=256, max_depth=6)

model.fit(X, y, eval_sets=[{'X': X_test, 'y': y_test},])

[19:14:38] Stdout logging level is INFO.
[19:14:38] GDBT train starts. Max iter 1000, early stopping rounds 200
[19:14:38] Iter 0; Sample 0, R2_score = 0.008394434549009278; 
[19:14:40] Iter 100; Sample 0, R2_score = 0.5168094390086947; 
[19:14:42] Iter 200; Sample 0, R2_score = 0.7243032718479143; 
[19:14:44] Iter 300; Sample 0, R2_score = 0.8327191017153759; 
[19:14:46] Iter 400; Sample 0, R2_score = 0.8949604280015734; 
[19:14:48] Iter 500; Sample 0, R2_score = 0.9320386213143411; 
[19:14:50] Iter 600; Sample 0, R2_score = 0.9546872525005246; 
[19:14:53] Iter 700; Sample 0, R2_score = 0.968740322258984; 
[19:14:55] Iter 800; Sample 0, R2_score = 0.9776125682679488; 
[19:14:57] Iter 900; Sample 0, R2_score = 0.983301943221187; 
[19:14:59] Iter 999; Sample 0, R2_score = 0.9869841095038282; 
CPU times: user 20.4 s, sys: 2.82 s, total: 23.2 s
Wall time: 21.5 s


<py_boost.gpu.boosting.GradientBoosting at 0x7f4b19167430>

## Inference

#### Prediction can be done by calling the .predict method

In [6]:
%%time
preds = model.predict(X_test)

preds.shape

CPU times: user 1.21 s, sys: 561 ms, total: 1.77 s
Wall time: 1.78 s


(50000, 10)

In [7]:
preds

array([[-231.57986  , -141.4888   , -276.73294  , ..., -134.38333  ,
        -211.01692  , -229.32335  ],
       [-118.59609  , -115.6356   ,  -60.311188 , ..., -132.56573  ,
        -119.318855 ,  -24.551783 ],
       [ -33.319748 ,  -56.017963 ,  146.31969  , ...,   20.21881  ,
         -21.19869  , -207.61913  ],
       ...,
       [ -83.02409  ,  129.98756  ,   72.42775  , ...,  219.36597  ,
          28.731098 ,   10.504779 ],
       [  -7.9215307,  136.1649   ,  244.74832  , ...,  147.08717  ,
         171.48286  ,  204.75542  ],
       [ -10.4112625,   38.52069  ,  169.79082  , ...,   95.44684  ,
          26.127382 ,    7.7268643]], dtype=float32)

#### Prediction for certain iterations can be done by calling the .predict_staged method

In [8]:
%%time
preds = model.predict_staged(X_test, iterations=[100, 300, 500])

preds.shape

CPU times: user 331 ms, sys: 244 ms, total: 575 ms
Wall time: 582 ms


(3, 50000, 10)
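
The staged output has one leading axis entry per requested iteration, so a per-stage error can be computed with broadcasting. The sketch below uses a synthetic stand-in array with the same layout (stages, samples, outputs); none of the numbers come from the model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_stages, n_samples, n_outputs = 3, 8, 2

y_true = rng.normal(size=(n_samples, n_outputs))
# Stand-in for the staged output: one prediction array per requested iteration,
# with noise shrinking at later stages (synthetic data, not model output).
staged = np.stack([y_true + rng.normal(scale=s, size=y_true.shape)
                   for s in (1.0, 0.5, 0.1)])

assert staged.shape == (n_stages, n_samples, n_outputs)

# Broadcasting y_true against the leading stage axis gives one RMSE per stage.
rmse_per_stage = np.sqrt(((staged - y_true) ** 2).mean(axis=(1, 2)))
print(rmse_per_stage.shape)  # (3,)
```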

#### Tree leaf index prediction for certain iterations can be done by calling the .predict_leaves method

In [9]:
%%time
preds = model.predict_leaves(X_test, iterations=[100, 300, 500])

preds.shape

CPU times: user 18.3 ms, sys: 0 ns, total: 18.3 ms
Wall time: 17 ms


(3, 50000, 1)

In [10]:
preds.T[0]

array([[14, 17, 19],
       [50, 43, 15],
       [32, 43, 22],
       ...,
       [54, 50, 28],
       [30, 43, 19],
       [60, 43, 27]], dtype=int32)
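
The `preds.T[0]` indexing above may look cryptic, so here is a small NumPy sketch of what it does to an array laid out as (iterations, samples, 1). The array is a synthetic stand-in, not real leaf indices.

```python
import numpy as np

n_iters, n_samples = 3, 5
# Stand-in with the same layout as the output above: (iterations, samples, 1).
leaves = np.arange(n_iters * n_samples, dtype=np.int32).reshape(n_iters, n_samples, 1)

# .T reverses all axes: (3, 5, 1) -> (1, 5, 3); [0] then drops the leading axis.
per_sample = leaves.T[0]
print(per_sample.shape)  # (5, 3)

# Row i holds the leaf index of sample i for each of the requested iterations.
assert np.array_equal(per_sample[:, 1], leaves[1, :, 0])
```

An equivalent and arguably more explicit spelling is `leaves[:, :, 0].T`.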

#### Feature importances

In [11]:
model.get_feature_importance()

array([  46.,   60.,   44.,   40.,   56.,   45., 5621.,   55.,   46.,
         53.,   33.,   70.,   43.,   52.,   40., 6120., 5571.,   42.,
         39., 5477.,   29.,   41.,   29.,   72.,   45.,   34.,   42.,
         56.,   51.,   42.,   43.,   57.,   64.,   41.,   47.,   53.,
       5946.,   36.,   36.,   41.,   55.,   51.,   53.,   45.,   39.,
         56.,   44.,   33.,   45.,   53.,   36.,   49., 5845.,   40.,
         49.,   63.,   46.,   42.,   41.,   46.,   53.,   37.,   43.,
         45.,   42.,   44.,   41.,   42.,   35.,   43.,   47.,   42.,
         54.,   41.,   41.,   41.,   52.,   42.,   50.,   49.,   56.,
         61.,   33.,   53.,   50.,   41., 5478., 3564.,   47., 5884.,
         36., 6169.,   52.,   40.,   49.,   41.,   25.,   43.,   47.,
         69.], dtype=float32)
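
In the output above, ten features have importances in the thousands while the rest stay below 100, which is consistent with `make_regression`'s default of 10 informative features. A ranking like that can be extracted with `np.argsort`; the sketch below uses a short toy vector with hypothetical values rather than the real model output.

```python
import numpy as np

# Toy importance vector (hypothetical values, not model output).
importance = np.array([46.0, 5621.0, 44.0, 6120.0, 40.0, 3564.0], dtype=np.float32)

top_k = 3
# argsort is ascending, so reverse it and keep the first top_k indices.
top_idx = np.argsort(importance)[::-1][:top_k]
print(top_idx.tolist())  # [3, 1, 5]

# Share of total importance carried by the top_k features.
share = float(importance[top_idx].sum() / importance.sum())
print(round(share, 3))
```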

#### The trained model can be saved as a pickle for inference

In [12]:
joblib.dump(model, '../data/temp_model.pkl')

new_model = joblib.load('../data/temp_model.pkl')
new_model.predict(X_test)

array([[-231.57986  , -141.4888   , -276.73294  , ..., -134.38333  ,
        -211.01692  , -229.32335  ],
       [-118.59609  , -115.6356   ,  -60.311188 , ..., -132.56573  ,
        -119.318855 ,  -24.551783 ],
       [ -33.319748 ,  -56.017963 ,  146.31969  , ...,   20.21881  ,
         -21.19869  , -207.61913  ],
       ...,
       [ -83.02409  ,  129.98756  ,   72.42775  , ...,  219.36597  ,
          28.731098 ,   10.504779 ],
       [  -7.9215307,  136.1649   ,  244.74832  , ...,  147.08717  ,
         171.48286  ,  204.75542  ],
       [ -10.4112625,   38.52069  ,  169.79082  , ...,   95.44684  ,
          26.127382 ,    7.7268643]], dtype=float32)

### CPU Inference via treelite

We can also save the model for CPU inference via the treelite library. For that purpose, use the TL wrappers (for both compiled and built-in inference).

In [13]:
%%time
tl_model = TLPredictor(model)

  0%|          | 0/1000 [00:00<?, ?it/s]

CPU times: user 6.93 s, sys: 73.1 ms, total: 7 s
Wall time: 6.96 s


In [14]:
%%time
tl_model.predict(X_test, nthread=4)

CPU times: user 3.67 s, sys: 16.3 ms, total: 3.69 s
Wall time: 951 ms


array([[-231.57983  , -141.48882  , -276.73294  , ..., -134.3833   ,
        -211.01695  , -229.32335  ],
       [-118.59609  , -115.635605 ,  -60.311188 , ..., -132.56573  ,
        -119.31886  ,  -24.551783 ],
       [ -33.319748 ,  -56.017986 ,  146.31972  , ...,   20.218803 ,
         -21.19869  , -207.61913  ],
       ...,
       [ -83.02409  ,  129.98753  ,   72.42776  , ...,  219.36597  ,
          28.731113 ,   10.504774 ],
       [  -7.921529 ,  136.1649   ,  244.74834  , ...,  147.08717  ,
         171.48286  ,  204.75542  ],
       [ -10.411263 ,   38.5207   ,  169.79083  , ...,   95.44683  ,
          26.127403 ,    7.7268615]], dtype=float32)

The treelite model can be saved to a folder for use in the next session

In [15]:
%%time
tl_model.dump('../data/tl_dump', rewrite=True)
tl_model = TLPredictor.load('../data/tl_dump')

CPU times: user 13.8 ms, sys: 11.4 ms, total: 25.2 ms
Wall time: 25.1 ms


In [16]:
%%time
tl_model.predict(X_test, nthread=4)

CPU times: user 3.61 s, sys: 8.37 ms, total: 3.62 s
Wall time: 925 ms


array([[-231.57983  , -141.48882  , -276.73294  , ..., -134.3833   ,
        -211.01695  , -229.32335  ],
       [-118.59609  , -115.635605 ,  -60.311188 , ..., -132.56573  ,
        -119.31886  ,  -24.551783 ],
       [ -33.319748 ,  -56.017986 ,  146.31972  , ...,   20.218803 ,
         -21.19869  , -207.61913  ],
       ...,
       [ -83.02409  ,  129.98753  ,   72.42776  , ...,  219.36597  ,
          28.731113 ,   10.504774 ],
       [  -7.921529 ,  136.1649   ,  244.74834  , ...,  147.08717  ,
         171.48286  ,  204.75542  ],
       [ -10.411263 ,   38.5207   ,  169.79083  , ...,   95.44683  ,
          26.127403 ,    7.7268615]], dtype=float32)

For a greater speedup, you can compile your model to make inference even more efficient

In [17]:
%%time
tl_compiled = tl_model.compile('gcc', '../data/templib.so', nthread=4)

[19:15:13] ../src/compiler/ast/split.cc:29: Parallel compilation enabled; member trees will be divided into 28 translation units.
CPU times: user 1.36 s, sys: 531 ms, total: 1.89 s
Wall time: 1min 24s


In [18]:
%%time
tl_compiled.predict(X_test)

CPU times: user 7.34 s, sys: 320 ms, total: 7.66 s
Wall time: 1.91 s


array([[-231.57999  , -141.48885  , -276.73315  , ..., -134.38322  ,
        -211.01698  , -229.32344  ],
       [-118.59618  , -115.63559  ,  -60.311268 , ..., -132.56573  ,
        -119.31883  ,  -24.551754 ],
       [ -33.31976  ,  -56.017975 ,  146.31967  , ...,   20.218842 ,
         -21.198748 , -207.6191   ],
       ...,
       [ -83.02403  ,  129.98749  ,   72.42781  , ...,  219.366    ,
          28.731108 ,   10.504768 ],
       [  -7.9215336,  136.16504  ,  244.74826  , ...,  147.0871   ,
         171.48293  ,  204.75563  ],
       [ -10.41129  ,   38.520718 ,  169.79054  , ...,   95.4468   ,
          26.127438 ,    7.7269163]], dtype=float32)
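
The GPU, treelite, and compiled predictions printed above agree only up to small float32 differences, so a tolerance-based comparison is more appropriate than exact equality. The sketch below uses the first three values of the first row copied from those outputs; the relative tolerance of 1e-5 is an assumed threshold, not a value recommended by py_boost or treelite.

```python
import numpy as np

# First three values of the first row, copied from the outputs above.
gpu = np.array([-231.57986, -141.4888, -276.73294], dtype=np.float32)
treelite = np.array([-231.57983, -141.48882, -276.73294], dtype=np.float32)
compiled = np.array([-231.57999, -141.48885, -276.73315], dtype=np.float32)

# Exact equality does not hold between the backends.
print(np.array_equal(gpu, treelite))

# With a relative tolerance (assumed here: 1e-5) the results match.
print(np.allclose(gpu, treelite, rtol=1e-5))
print(np.allclose(gpu, compiled, rtol=1e-5))

# Largest absolute gap between the GPU and compiled predictions.
max_gap = float(np.abs(gpu - compiled).max())
print(max_gap)
```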

The compiled model can also be saved, but note that the instance saves only the metadata. After loading, it expects the compiled library path to stay the same. Otherwise, you can update the library path via the .set_libpath method

In [19]:
%%time
tl_compiled.dump('../data/tl_compiled.pkl')
tl_compiled = TLCompiledPredictor.load('../data/tl_compiled.pkl')
# optional - if libpath changed or if you want to change nthreads
tl_compiled.set_libpath(nthread=1)

CPU times: user 348 ms, sys: 380 ms, total: 729 ms
Wall time: 333 ms


In [20]:
%%time
tl_compiled.predict(X_test)

CPU times: user 7.31 s, sys: 90.8 ms, total: 7.4 s
Wall time: 7.39 s


array([[-231.57999  , -141.48885  , -276.73315  , ..., -134.38322  ,
        -211.01698  , -229.32344  ],
       [-118.59618  , -115.63559  ,  -60.311268 , ..., -132.56573  ,
        -119.31883  ,  -24.551754 ],
       [ -33.31976  ,  -56.017975 ,  146.31967  , ...,   20.218842 ,
         -21.198748 , -207.6191   ],
       ...,
       [ -83.02403  ,  129.98749  ,   72.42781  , ...,  219.366    ,
          28.731108 ,   10.504768 ],
       [  -7.9215336,  136.16504  ,  244.74826  , ...,  147.0871   ,
         171.48293  ,  204.75563  ],
       [ -10.41129  ,   38.520718 ,  169.79054  , ...,   95.4468   ,
          26.127438 ,    7.7269163]], dtype=float32)

### Cross Validation

py_boost also supports a built-in cross-validation wrapper that produces out-of-fold predictions

In [21]:
%%time
model = GradientBoosting('mse')
cv = CrossValidation(model)

oof_pred = cv.fit_predict(X, y, cv=5)

pred = cv.predict(X_test)
((pred - y_test) ** 2).mean() ** .5

[19:16:48] Stdout logging level is INFO.
[19:16:48] GDBT train starts. Max iter 100, early stopping rounds 100
[19:16:48] Iter 0; Sample 0, rmse = 177.54048994833394; 
[19:16:48] Iter 10; Sample 0, rmse = 146.2775051551839; 
[19:16:48] Iter 20; Sample 0, rmse = 124.46384581534963; 
[19:16:48] Iter 30; Sample 0, rmse = 107.81194294424012; 
[19:16:48] Iter 40; Sample 0, rmse = 94.63195602132788; 
[19:16:49] Iter 50; Sample 0, rmse = 83.66130474594783; 
[19:16:49] Iter 60; Sample 0, rmse = 74.46850567155803; 
[19:16:49] Iter 70; Sample 0, rmse = 66.66850053490023; 
[19:16:49] Iter 80; Sample 0, rmse = 59.97216254154367; 
[19:16:50] Iter 90; Sample 0, rmse = 54.190201828155374; 
[19:16:50] Iter 99; Sample 0, rmse = 49.70896192258387; 
[19:16:50] Stdout logging level is INFO.
[19:16:50] GDBT train starts. Max iter 100, early stopping rounds 100
[19:16:50] Iter 0; Sample 0, rmse = 174.53567903144224; 
[19:16:50] Iter 10; Sample 0, rmse = 143.51674627960324; 
[19:16:50] Iter 20; Sample 0, rms

47.34613200723712
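
To make "out-of-fold prediction" concrete, here is a minimal NumPy sketch in which a trivial stand-in model predicts the mean of its training targets. `oof_predict_mean` is a hypothetical function written for this example; it does not reproduce how `CrossValidation` splits the data or trains its models.

```python
import numpy as np


def oof_predict_mean(y, n_folds=5, seed=0):
    """Out-of-fold predictions for a trivial 'model' that predicts the
    mean of its training targets (a stand-in for a real learner)."""
    rng = np.random.default_rng(seed)
    # Shuffle the row indices and cut them into n_folds nearly equal parts.
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    oof = np.empty_like(y, dtype=float)
    for k, val_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        # "Fit" on the other folds, then predict only the held-out rows.
        oof[val_idx] = y[train_idx].mean(axis=0)
    return oof


y = np.arange(20, dtype=float)
oof = oof_predict_mean(y)
rmse = float(np.sqrt(((oof - y) ** 2).mean()))
print(oof.shape, rmse)
```

Every row receives a prediction from a model that never saw that row during fitting, which is what makes out-of-fold predictions usable as an honest estimate of generalization error.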