<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Gradient-Boosted-Tree-Inferencing" data-toc-modified-id="Gradient-Boosted-Tree-Inferencing-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Gradient Boosted Tree Inferencing</a></span><ul class="toc-item"><li><span><a href="#Preparation" data-toc-modified-id="Preparation-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Preparation</a></span><ul class="toc-item"><li><span><a href="#Regression" data-toc-modified-id="Regression-1.1.1"><span class="toc-item-num">1.1.1&nbsp;&nbsp;</span>Regression</a></span></li><li><span><a href="#Binary-Classification" data-toc-modified-id="Binary-Classification-1.1.2"><span class="toc-item-num">1.1.2&nbsp;&nbsp;</span>Binary Classification</a></span></li><li><span><a href="#Multiclass-Classification" data-toc-modified-id="Multiclass-Classification-1.1.3"><span class="toc-item-num">1.1.3&nbsp;&nbsp;</span>Multiclass Classification</a></span></li></ul></li><li><span><a href="#C++-Implementation" data-toc-modified-id="C++-Implementation-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>C++ Implementation</a></span></li></ul></li><li><span><a href="#Reference" data-toc-modified-id="Reference-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Reference</a></span></li></ul></div>

In [1]:
# 1. magic to print version
# 2. magic so that the notebook will reload external python modules
%matplotlib inline
%load_ext watermark
%load_ext autoreload
%autoreload 2

import os
import numpy as np
import pandas as pd
import m2cgen as m2c
import sklearn.datasets as datasets
from xgboost import XGBClassifier, XGBRegressor

# prevent scientific notations
pd.set_option('display.float_format', lambda x: '%.3f' % x)

%watermark -a 'Ethen' -d -t -v -p numpy,pandas,sklearn,m2cgen,xgboost

Ethen 2021-02-07 10:25:19 

CPython 3.6.4
IPython 7.15.0

numpy 1.18.5
pandas 1.0.5
sklearn 0.23.1
m2cgen 0.9.0
xgboost 1.2.1


# Gradient Boosted Tree Inferencing

It is common in industry to prototype a machine learning model in Python and then translate it into another language, such as C++ or Java, when it comes to deployment. This usually happens when the core application is written in that language and is so latency sensitive that we can't afford the cost of calling an external API to fetch the model prediction.

In this article, we'll look at how to achieve this with gradient boosted trees, specifically XGBoost. Different libraries might have different ways of doing this, but the concept should be similar.

**Tree Structure**

A typical model dump from XGBoost looks like the following:

```
booster[0]:
0:[bmi<0.00942232087] yes=1,no=2,missing=1
	1:[bmi<-0.0218342301] yes=3,no=4,missing=3
		3:[bmi<-0.0584798381] yes=7,no=8,missing=7
			7:leaf=25.84091
			8:leaf=33.0292702
		4:[bp<0.0270366594] yes=9,no=10,missing=9
			9:leaf=38.7487526
			10:leaf=51.0882378
	2:[bp<0.0235937908] yes=5,no=6,missing=5
		5:leaf=53.0696678
		6:leaf=69.4000015
booster[1]:
0:[bmi<0.00511107268] yes=1,no=2,missing=1
	1:[bp<0.0390867069] yes=3,no=4,missing=3
		3:[bmi<-0.0207564179] yes=7,no=8,missing=7
			7:leaf=21.0474758
			8:leaf=27.7326946
		4:[bmi<0.000799824367] yes=9,no=10,missing=9
			9:leaf=36.1850548
			10:leaf=14.9188232
	2:[bmi<0.0730132312] yes=5,no=6,missing=5
		5:[bp<6.75072661e-05] yes=11,no=12,missing=11
			11:leaf=31.3889732
			12:leaf=43.4056664
		6:[bp<-0.0498541184] yes=13,no=14,missing=13
			13:leaf=13.0395498
			14:leaf=59.377037
```

The dump contains three distinct pieces of information:

- `booster`: A gradient boosted tree model is an ensemble of trees, and each `booster` line marks the start of a new tree. The total number of trees equals the number of boosting rounds we specified for the model (e.g. for the sklearn XGBoost API, `n_estimators` controls this) multiplied by the number of trees built per round. For a regression or binary classification model, one tree is built per round, so the number of boosters in the model dump is exactly the number of trees we specified. For multi-class classification, one tree is built per class per round: with 3 classes, tree 0 contributes to the raw prediction of class 0, tree 1 to class 1, tree 2 to class 2, tree 3 to class 0 again, and so on.
- `node`: Following the booster line is the tree's if-else structure. e.g. at node 0, if the feature `bmi` is less than the threshold, we branch to node 1 (`yes`), otherwise we branch to node 2 (`no`). The `missing` entry gives the node to branch to when the feature value is missing.
- `leaf`: Once we reach a leaf, we add its value to the running prediction. e.g. node 7 is a leaf, and the value it contributes is 25.84091.
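
The three pieces above can be sketched in a few lines of Python. This is a minimal illustration that assumes the text dump format shown earlier; `parse_dump` and `predict_tree` are hypothetical helper names, not part of the XGBoost API, and the sample values are the `bmi` and `bp` of the first diabetes row rounded to three decimals. XGBoost itself compares float32 values, so a value sitting exactly on a threshold could branch differently here.

```python
import math
import re

# The two kinds of lines that follow a "booster[i]:" header in the text dump.
SPLIT_RE = re.compile(
    r"(\d+):\[([^<]+)<([^\]]+)\] yes=(\d+),no=(\d+),missing=(\d+)"
)
LEAF_RE = re.compile(r"(\d+):leaf=(\S+)")


def parse_dump(text):
    """Parse dump text into a list of trees; each tree maps node id -> node."""
    trees = []
    for line in text.splitlines():
        line = line.strip()  # indentation only reflects depth, ids carry the links
        if not line:
            continue
        if line.startswith("booster["):
            trees.append({})  # every booster line starts a new tree
            continue
        split = SPLIT_RE.fullmatch(line)
        if split:
            node_id, feature, threshold, yes, no, missing = split.groups()
            trees[-1][int(node_id)] = {
                "feature": feature,
                "threshold": float(threshold),
                "yes": int(yes),
                "no": int(no),
                "missing": int(missing),
            }
            continue
        leaf = LEAF_RE.fullmatch(line)
        if leaf:
            trees[-1][int(leaf.group(1))] = {"leaf": float(leaf.group(2))}
    return trees


def predict_tree(tree, row):
    """Walk one tree from node 0 down to a leaf and return the leaf value."""
    node = tree[0]
    while "leaf" not in node:
        value = row.get(node["feature"])
        if value is None or math.isnan(value):
            next_id = node["missing"]
        elif value < node["threshold"]:
            next_id = node["yes"]
        else:
            next_id = node["no"]
        node = tree[next_id]
    return node["leaf"]


# First tree of the dump shown above.
DUMP = """\
booster[0]:
0:[bmi<0.00942232087] yes=1,no=2,missing=1
    1:[bmi<-0.0218342301] yes=3,no=4,missing=3
        3:[bmi<-0.0584798381] yes=7,no=8,missing=7
            7:leaf=25.84091
            8:leaf=33.0292702
        4:[bp<0.0270366594] yes=9,no=10,missing=9
            9:leaf=38.7487526
            10:leaf=51.0882378
    2:[bp<0.0235937908] yes=5,no=6,missing=5
        5:leaf=53.0696678
        6:leaf=69.4000015
"""

trees = parse_dump(DUMP)
print(predict_tree(trees[0], {"bmi": 0.062, "bp": 0.022}))  # 53.0696678
```

Storing nodes in a dictionary keyed by node id keeps the traversal independent of the indentation in the dump, since the `yes`, `no`, and `missing` ids already encode the tree's shape.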


**Raw Prediction**

We mentioned that to get the prediction for a given input, we sum up the leaf values reached in each tree. This holds true for regression models, but for other models we need to apply a transformation on top of the raw prediction to get probabilities. e.g. for binary classification, a logistic transformation is applied to the raw prediction, whereas for multi-class classification, a softmax transformation is needed.
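
As a rough sketch of these two transformations, the snippet below uses only the standard library. The function names `sigmoid` and `softmax` are our own, and the inputs are assumed to be raw predictions (summed leaf values plus any base offset), not values taken from the models trained later.

```python
import math


def sigmoid(margin):
    """Logistic transform: raw prediction -> probability of the positive class."""
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    # For negative inputs, rewrite the formula so exp() cannot overflow.
    z = math.exp(margin)
    return z / (1.0 + z)


def softmax(margins):
    """Softmax transform: one raw prediction per class -> class probabilities."""
    shift = max(margins)  # subtracting the max leaves the result unchanged
    exps = [math.exp(m - shift) for m in margins]
    total = sum(exps)
    return [e / total for e in exps]


print(sigmoid(0.0))              # 0.5
print(softmax([0.0, 0.0, 0.0]))  # three equal probabilities of 1/3
```

A raw prediction of 0 maps to a probability of 0.5, and adding the same constant to every class's raw prediction leaves the softmax output unchanged.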

## Preparation

All the examples below, be it regression, binary classification, or multi-class classification, follow the same structure:

- We load some pre-processed data.
- Train a quick XGBoost model.
- Dump the raw model to disk.
- Generate a sample prediction so we can later verify that it matches the output of the model converted to C++.

### Regression

In [2]:
X, y = datasets.load_diabetes(return_X_y=True, as_frame=True)
X = X[["age", "sex", "bmi", "bp"]]
X.head()

|   | age | sex | bmi | bp |
|---|-----|-----|-----|----|
| 0 | 0.038 | 0.051 | 0.062 | 0.022 |
| 1 | -0.002 | -0.045 | -0.051 | -0.026 |
| 2 | 0.085 | 0.051 | 0.044 | -0.006 |
| 3 | -0.089 | -0.045 | -0.012 | -0.037 |
| 4 | 0.005 | -0.045 | -0.036 | 0.022 |


In [3]:
regression_model_params = {
    'n_estimators': 2,
    'max_depth': 3,
    'base_score': 0.0
}
regression_model = XGBRegressor(**regression_model_params).fit(X, y)
regression_model

XGBRegressor(base_score=0.0, booster='gbtree', colsample_bylevel=1,
             colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
             importance_type='gain', interaction_constraints='',
             learning_rate=0.300000012, max_delta_step=0, max_depth=3,
             min_child_weight=1, missing=nan, monotone_constraints='()',
             n_estimators=2, n_jobs=0, num_parallel_tree=1, random_state=0,
             reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
             tree_method='exact', validate_parameters=1, verbosity=None)

In [4]:
regression_model.get_booster().dump_model("regression.txt")

In [5]:
regression_model.predict(X.iloc[[0]])

array([96.475334], dtype=float32)
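
We can check this number by hand, assuming the dump shown under **Tree Structure** is the one written to `regression.txt` (its thresholds match this model's features). The first row has `bmi` ≈ 0.062 and `bp` ≈ 0.022. In tree 0, `bmi` is not below 0.00942, so we go to node 2, and `bp` is below 0.02359, which lands on leaf 5. In tree 1, `bmi` is not below 0.00511, so we go to node 2; `bmi` is below 0.07301, so we go to node 5; and `bp` is not below 6.75e-05, which lands on leaf 12. Adding the two leaf values to the `base_score` of 0.0 gives:

$$\hat{y} = 0.0 + 53.0696678 + 43.4056664 = 96.4753342$$

This agrees with the `96.475334` returned by `predict` up to float32 precision.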

### Binary Classification

In [6]:
X, y = datasets.make_classification(n_samples=10000, n_features=5, random_state=42, n_classes=2)
X

array([[-2.24456934, -1.36232827,  1.55433334, -2.0869092 , -1.27760482],
       [-0.46503462, -0.57657929, -0.2033143 ,  0.43042571,  1.98019634],
       [ 1.0967453 ,  1.31568265,  0.40073014, -0.88575625,  0.72711376],
       ...,
       [-3.17646599, -2.97878542,  0.32401442,  0.12710402, -0.63318634],
       [-0.41224819,  0.17380558,  1.04229889, -1.62625451, -1.24718999],
       [-1.02487223, -0.70828082,  0.55578021, -0.70007904, -0.43269446]])

In [7]:
binary_model_params = {
    'n_estimators': 3,
    'max_depth': 3,
    'tree_method': 'hist',
    'grow_policy': 'lossguide'
}
binary_model = XGBClassifier(**binary_model_params).fit(X, y)
binary_model

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
              grow_policy='lossguide', importance_type='gain',
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=3, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=3, n_jobs=0,
              num_parallel_tree=1, random_state=0, reg_alpha=0, reg_lambda=1,
              scale_pos_weight=1, subsample=1, tree_method='hist',
              validate_parameters=1, verbosity=None)

In [8]:
binary_model.get_booster().dump_model("binary_class.txt")

In [9]:
inputs = np.array([[0.0, 0.2, 0.4, 0.6, 0.8]])
binary_model.predict_proba(inputs)

array([[0.2894203, 0.7105797]], dtype=float32)

### Multiclass Classification

In [10]:
X, y = datasets.load_iris(return_X_y=True, as_frame=True)
X.head()

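|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|---|-------------------|------------------|-------------------|------------------|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 |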


In [11]:
multi_class_model_params = {
    'n_estimators': 2,
    'max_depth': 3
}
multi_class_model = XGBClassifier(**multi_class_model_params).fit(X, y)
multi_class_model

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
              importance_type='gain', interaction_constraints='',
              learning_rate=0.300000012, max_delta_step=0, max_depth=3,
              min_child_weight=1, missing=nan, monotone_constraints='()',
              n_estimators=2, n_jobs=0, num_parallel_tree=1,
              objective='multi:softprob', random_state=0, reg_alpha=0,
              reg_lambda=1, scale_pos_weight=None, subsample=1,
              tree_method='exact', validate_parameters=1, verbosity=None)

In [12]:
multi_class_model.get_booster().dump_model("multi_class.txt")

In [13]:
inputs = np.array([[5.1, 3.5, 1.4, 0.2]])
multi_class_model.predict_proba(inputs)

array([[0.6092037 , 0.19627656, 0.19451974]], dtype=float32)
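
With 2 estimators and 3 classes, the dump for this model should contain 6 boosters, assigned to classes in round-robin order as described earlier. The sketch below shows how the leaf values reached in each tree could be grouped into one raw prediction per class before the softmax transformation is applied. `class_margins` and its parameters are hypothetical names, and the leaf values are made up for illustration; they are not read from `multi_class.txt`.

```python
def class_margins(leaf_values, num_class, base_margin=0.0):
    """Sum leaf values per class, where tree i contributes to class i % num_class.

    leaf_values holds the leaf reached in each tree, in dump order.
    base_margin is an assumed constant offset added to every class.
    """
    margins = [base_margin] * num_class
    for tree_index, leaf in enumerate(leaf_values):
        margins[tree_index % num_class] += leaf
    return margins


# Made-up leaf values for 6 trees (2 rounds x 3 classes).
leaves = [0.4, -0.2, -0.2, 0.3, -0.1, -0.2]
print(class_margins(leaves, num_class=3))
```

Here trees 0 and 3 feed class 0, trees 1 and 4 feed class 1, and trees 2 and 5 feed class 2. Because softmax is unchanged when the same constant is added to every class, a shared offset does not affect the final probabilities.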

## C++ Implementation

The rest of the content is about implementing the boosted tree inferencing logic in C++. All the code resides in the [`gbt_inference`](https://github.com/ethen8181/machine-learning/tree/master/model_deployment/gbt_inference/gbt_inference) folder for those interested. In practice, we don't have to rely on a naive implementation of our own, which mainly serves to solidify our understanding. e.g. the [m2cgen (Model 2 Code Generator)](https://github.com/BayesWitnesses/m2cgen) project is one of many projects that focus on converting a trained model into native code. If we export our regression model, we can see that the inferencing logic is indeed a series of if-else statements followed by a summation at the very end. Note that the generated code tests `>=` and swaps the two branches, which is equivalent to the `<` splits in the model dump.

In [14]:
code = m2c.export_to_c(regression_model)
print(code)

double score(double * input) {
    double var0;
    if ((input[2]) >= (0.009422321)) {
        if ((input[3]) >= (0.02359379)) {
            var0 = 69.4;
        } else {
            var0 = 53.069668;
        }
    } else {
        if ((input[2]) >= (-0.02183423)) {
            if ((input[3]) >= (0.02703666)) {
                var0 = 51.088238;
            } else {
                var0 = 38.748753;
            }
        } else {
            if ((input[2]) >= (-0.058479838)) {
                var0 = 33.02927;
            } else {
                var0 = 25.84091;
            }
        }
    }
    double var1;
    if ((input[2]) >= (0.0051110727)) {
        if ((input[2]) >= (0.07301323)) {
            if ((input[3]) >= (-0.04985412)) {
                var1 = 59.377037;
            } else {
                var1 = 13.03955;
            }
        } else {
            if ((input[3]) >= (0.000067507266)) {
                var1 = 43.405666;
            } else {
                var1 = 31.388973

# Reference

- [Blog: Roll your own XGBoost model](https://medium.com/swlh/roll-your-own-xgboost-model-7490106b9523)