## This tutorial shows how to convert your model to ONNX and use it for CPU inference

In [1]:
import os
# Optional: set the device to run
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

os.makedirs('../data', exist_ok=True)

import numpy as np
import joblib
import onnxruntime

from sklearn.datasets import make_regression

from py_boost import GradientBoosting
from py_boost import pb_to_onnx, ONNXPredictor

### Generate a dummy multilabel task and train the model

In [2]:
%%time
X, y = make_regression(150000, 100, n_targets=5, random_state=42)
# binarize
y = (y > y.mean(axis=0)).astype(np.float32)

model = GradientBoosting(
    'bce', lr=0.01, verbose=100, 
    ntrees=100, es=200, 
)
model.fit(X, y)
pp = model.predict(X)

[15:37:33] Stdout logging level is INFO.
[15:37:33] GDBT train starts. Max iter 100, early stopping rounds 200
[15:37:34] Iter 0; 
[15:37:37] Iter 99; 
CPU times: user 15.1 s, sys: 1.85 s, total: 16.9 s
Wall time: 10.1 s


### Convert the model to ONNX

The simplest way to convert a model is to use the `pb_to_onnx` function. Just pass the `py-boost` model and the path where the parsed model should be stored:

In [3]:
pb_to_onnx(model, '../data/pb_model.onnx')

100%|██████████| 100/100 [00:00<00:00, 1723.04it/s]


Once parsing is complete, you can start an `onnxruntime` session for inference:

In [4]:
%%time

# start session
sess = onnxruntime.InferenceSession(
    '../data/pb_model.onnx', 
    providers=["CPUExecutionProvider"]
)

# run inference
preds = sess.run(['Y'], {'X': X.astype(np.float32, copy=False)})[0]
preds = 1 / (1 + np.exp(-preds))
preds

CPU times: user 5.59 s, sys: 131 ms, total: 5.72 s
Wall time: 395 ms


array([[0.6264308 , 0.41568166, 0.5388822 , 0.4261355 , 0.57804173],
       [0.59586126, 0.42369062, 0.56585   , 0.57584757, 0.5392887 ],
       [0.72726965, 0.67056704, 0.49255225, 0.6711969 , 0.635281  ],
       ...,
       [0.5112887 , 0.38028964, 0.4761739 , 0.52265   , 0.4513791 ],
       [0.67362005, 0.54282206, 0.62851644, 0.6090929 , 0.7003519 ],
       [0.56341565, 0.52830017, 0.41594115, 0.43341845, 0.42639387]],
      dtype=float32)

In [5]:
np.abs(preds - pp).max()

2.3841858e-07
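The sigmoid applied by hand after `sess.run` above is the logistic function. Written as `1 / (1 + np.exp(-preds))`, it can emit overflow warnings when a raw score is a large negative number. The following is a minimal sketch of a numerically stable variant in plain Python on scalars, so it runs without the model; the function name `stable_sigmoid` is an assumption for illustration and is not part of `py-boost` or `onnxruntime`.

```python
import math


def stable_sigmoid(x):
    """Logistic function that never calls exp() on a large positive number."""
    if x >= 0:
        # exp(-x) is at most 1 here, so it cannot overflow
        return 1.0 / (1.0 + math.exp(-x))
    # for negative x, exp(x) is below 1, so it cannot overflow either
    z = math.exp(x)
    return z / (1.0 + z)


# Raw scores (logits) such as those returned by the ONNX session before postprocessing
raw_scores = [-1000.0, -2.0, 0.0, 2.0, 1000.0]
probabilities = [stable_sigmoid(s) for s in raw_scores]
print(probabilities)
```

The two branches are algebraically identical; they differ only in which exponential is evaluated.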

***Note*** : by default, the parser only collects the trees and the base score. It knows nothing about the postprocessing function (`sigmoid` in this case), which is why we apply the sigmoid after inference. To avoid this step, we can pass one of the built-in `ONNX` post transforms: 'NONE', 'SOFTMAX', 'LOGISTIC', 'SOFTMAX_ZERO', or 'PROBIT'. This may also be more efficient:

In [6]:
%%time
pb_to_onnx(model, '../data/pb_model.onnx', post_transform='LOGISTIC') # pass built-in post transform

# start session
sess = onnxruntime.InferenceSession(
    '../data/pb_model.onnx', 
    providers=["CPUExecutionProvider"]
)

# run inference
preds = sess.run(['Y'], {'X': X.astype(np.float32, copy=False)})[0]
preds

100%|██████████| 100/100 [00:00<00:00, 1670.84it/s]


CPU times: user 5.58 s, sys: 178 ms, total: 5.76 s
Wall time: 583 ms


array([[0.62643087, 0.41568172, 0.5388822 , 0.42613554, 0.57804173],
       [0.5958613 , 0.42369062, 0.56584996, 0.57584757, 0.5392887 ],
       [0.72726965, 0.67056704, 0.49255228, 0.6711969 , 0.6352811 ],
       ...,
       [0.5112887 , 0.3802896 , 0.47617394, 0.52265   , 0.45137918],
       [0.67362005, 0.54282206, 0.6285165 , 0.6090929 , 0.7003519 ],
       [0.56341565, 0.5283001 , 0.41594112, 0.43341845, 0.42639393]],
      dtype=float32)

In [7]:
np.abs(preds - pp).max()

2.3841858e-07
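To put the printed maximum difference in context, it helps to compare it with the float32 machine epsilon, 2^-23. The sketch below does that arithmetic in plain Python; the constant name `FLOAT32_EPS` is introduced here for illustration only.

```python
# float32 machine epsilon: gap between 1.0 and the next representable float32
FLOAT32_EPS = 2.0 ** -23

# Maximum absolute difference printed by the notebook above
max_abs_diff = 2.3841858e-07

ratio = max_abs_diff / FLOAT32_EPS
print(f"difference = {ratio:.2f} x float32 eps")
```

A gap of about two epsilons is at the level of float32 rounding, so the `ONNX` and `py-boost` predictions agree to within single-precision accuracy.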

***Filter outputs*** . Another option is to convert only some of the outputs to `ONNX`, for cases where inference needs only particular outputs. For example, suppose we want to keep only outputs 0 and 2 and don't want to compute the parts of the model that don't affect the result:

In [8]:
%%time
pb_to_onnx(model, '../data/pb_model.onnx', fltr=[0, 2], post_transform='LOGISTIC') # pass array to filter outputs

# start session
sess = onnxruntime.InferenceSession(
    '../data/pb_model.onnx', 
    providers=["CPUExecutionProvider"]
)

# run inference
preds = sess.run(['Y'], {'X': X.astype(np.float32, copy=False)})[0]
preds

100%|██████████| 100/100 [00:00<00:00, 2039.98it/s]


CPU times: user 5.31 s, sys: 178 ms, total: 5.48 s
Wall time: 528 ms


array([[0.62643087, 0.5388822 ],
       [0.5958613 , 0.56584996],
       [0.72726965, 0.49255228],
       ...,
       [0.5112887 , 0.47617394],
       [0.67362005, 0.6285165 ],
       [0.56341565, 0.41594112]], dtype=float32)

In [9]:
np.abs(preds - pp[:, [0, 2]]).max()

1.937151e-07
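The check above works because selecting outputs before summing the tree contributions gives the same result as summing everything and selecting afterwards. The toy sketch below illustrates only that arithmetic; the numbers are made up, and it does not show how `py-boost` actually prunes the model, which is not covered here.

```python
# Toy ensemble: each "tree" contributes one value per output (5 outputs).
# The values are invented for illustration only.
tree_outputs = [
    [0.10, -0.20, 0.30, 0.00, 0.05],
    [0.05, 0.10, -0.10, 0.20, -0.05],
    [-0.02, 0.04, 0.06, -0.08, 0.10],
]
keep = [0, 2]  # same indices as fltr=[0, 2] in the notebook

# Option 1: sum all outputs, then select columns (analogous to pp[:, [0, 2]])
full = [sum(column) for column in zip(*tree_outputs)]
select_after = [full[i] for i in keep]

# Option 2: drop the unused columns first, then sum only what is needed
trimmed = [[row[i] for i in keep] for row in tree_outputs]
select_before = [sum(column) for column in zip(*trimmed)]

print(select_after == select_before)
```

Option 2 does less work per sample, which is the motivation for filtering at conversion time.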

### Built-in wrapper

Alternatively, you can use a wrapper that hides all the manipulations with `ONNX` and lets you simply call `predict`. You can build the wrapper from a model:

In [10]:
onnx_predictor = ONNXPredictor(
    model, '../data/pb_model.onnx', 
    fltr=[0, 2], 
)

100%|██████████| 100/100 [00:00<00:00, 1909.37it/s]


In [11]:
%%time
preds = onnx_predictor.predict(X)
preds

CPU times: user 4.71 s, sys: 156 ms, total: 4.86 s
Wall time: 328 ms


array([[0.6264308 , 0.5388822 ],
       [0.59586126, 0.56585   ],
       [0.72726965, 0.49255225],
       ...,
       [0.5112887 , 0.4761739 ],
       [0.67362005, 0.62851644],
       [0.56341565, 0.41594115]], dtype=float32)

In [12]:
np.abs(preds - pp[:, [0, 2]]).max()

1.7881393e-07

***Note*** : You cannot save an `ONNXPredictor` object, since `onnxruntime.InferenceSession` is not picklable. Instead, to use it in another session, you can restore it from the `ONNX` model file. Note that in this case you will lose the information about the postprocessing function unless it was passed as `post_transform` to `ONNXPredictor`.

The first option is to pass `post_transform` to `ONNXPredictor`:

In [13]:
# build the predictor and save the parsed model as ../data/pb_model.onnx
onnx_predictor = ONNXPredictor(
    model, '../data/pb_model.onnx', 
    fltr=[0, 2], 
    post_transform='LOGISTIC' # provide the ONNX post_transform manually
)

# create new instance from ../data/pb_model.onnx
onnx_predictor = ONNXPredictor.from_onnx('../data/pb_model.onnx')
preds = onnx_predictor.predict(X)
np.abs(preds - pp[:, [0, 2]]).max()

100%|██████████| 100/100 [00:00<00:00, 2116.98it/s]


1.937151e-07

The second option is to provide the Python postprocessing function in the new session:

In [14]:
# build the predictor and save the parsed model as ../data/pb_model.onnx
onnx_predictor = ONNXPredictor(
    model, '../data/pb_model.onnx', 
    fltr=[0, 2], 
)

# create new instance from ../data/pb_model.onnx
onnx_predictor = ONNXPredictor.from_onnx(
    '../data/pb_model.onnx', 
    postprocess_fn=lambda x: 1 / (1 + np.exp(-x)) # provide py-boost postprocess_fn manually
)
preds = onnx_predictor.predict(X)
np.abs(preds - pp[:, [0, 2]]).max()

100%|██████████| 100/100 [00:00<00:00, 2232.89it/s]


1.7881393e-07
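The restore-from-file approach used in the last two cells follows a general pattern: when an object holds a resource that cannot be pickled, persist only what is needed to rebuild it. The sketch below illustrates the pattern with standard-library pieces. `FakeSession` and the file name `model.onnx` are hypothetical stand-ins, not `onnxruntime` or `py-boost` code, and a `threading.Lock` plays the role of the unpicklable native handle.

```python
import pickle
import threading


class FakeSession:
    """Hypothetical stand-in for an inference session (NOT onnxruntime)."""

    def __init__(self, path):
        self.path = path
        self._lock = threading.Lock()  # native handle that pickle rejects


def can_pickle(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, AttributeError, pickle.PicklingError):
        return False


session = FakeSession("model.onnx")  # hypothetical file name
session_ok = can_pickle(session)     # the live session cannot be stored

# Persist only what is needed to rebuild the session: the file path
blob = pickle.dumps({"path": session.path})
restored = FakeSession(pickle.loads(blob)["path"])

print(session_ok, restored.path)
```

Anything that lives only in Python memory, such as a postprocessing function, is not part of the saved file and has to be supplied again when the object is rebuilt.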