### This notebook demonstrates the `convert_to_onnx` helper function, which converts a model from any supported framework into ONNX for QPS inference. The function wraps the individual conversion functions from [ONNXMLTools](https://github.com/onnx/onnxmltools). The notebook provides conversion examples for PyTorch, Sklearn, Keras, and LightGBM; the TensorFlow example is still in progress.

In [1]:
import numpy as np
import pandas as pd
import onnxruntime as rt
import pickle

from mlisne import convert_to_onnx
from mlisne import estimate_qps_onnx

In general, all models can be converted with the same basic function call, as the minimal examples below demonstrate.

All that is typically needed for conversion is a model, the framework string, and dummy input data. 

For every framework except PyTorch, the conversion returns the converted model, which can then be passed directly into QPS estimation or an `InferenceSession`.
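
As a rough illustration of the wrapping pattern described above, the sketch below dispatches on the framework string to a per-framework converter. It is not the `mlisne` implementation: `convert_model`, `CONVERTERS`, `DummyModel`, and the stub converters are hypothetical names, and the stubs return placeholder strings instead of calling ONNXMLTools.

```python
# Hypothetical sketch of a framework dispatcher; not the mlisne source.
# The stub converters stand in for the real per-framework conversion functions.

def _convert_sklearn(model, dummy_input):
    return f"onnx<sklearn:{type(model).__name__}>"

def _convert_keras(model, dummy_input):
    return f"onnx<keras:{type(model).__name__}>"

def _convert_lightgbm(model, dummy_input):
    return f"onnx<lightgbm:{type(model).__name__}>"

# Map each framework string to its converter.
CONVERTERS = {
    "sklearn": _convert_sklearn,
    "keras": _convert_keras,
    "lightgbm": _convert_lightgbm,
}

def convert_model(model, framework, dummy_input):
    """Look up the converter for `framework` and apply it to `model`."""
    try:
        converter = CONVERTERS[framework.lower()]
    except KeyError:
        supported = ", ".join(sorted(CONVERTERS))
        raise ValueError(
            f"Unsupported framework {framework!r}; expected one of: {supported}"
        ) from None
    return converter(model, dummy_input)


class DummyModel:
    """Placeholder standing in for a trained model."""

result = convert_model(DummyModel(), "sklearn", dummy_input=[[0.1, 0.2]])
print(result)
```

Keeping the converters in a dictionary means an unsupported framework string fails early with a message that lists the accepted values.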

#### Basic examples: Sklearn, Keras, LightGBM

Note on the dummy input data: the data can be in any format or shape as long as it is exactly how you would pass it into a model for inference. 

`convert_to_onnx` supports models that take continuous and discrete inputs as separate arrays. Please refer to the PyTorch section for an in-depth example.

In [51]:
# Minimal example with Sklearn iris
data = pd.read_csv("data/iris_data.csv")
data = data.iloc[:, 1:]  # drop the first column
with open("models/iris_logreg.pickle", "rb") as f:
    model = pickle.load(f)
onnx = convert_to_onnx(model, "sklearn", data)
# We can save the ONNX model to file as well
onnx = convert_to_onnx(model, "sklearn", data, path="../tests/test_models/iris_test.onnx")

# We can estimate QPS/run inference directly with the converted model
qps_skl = estimate_qps_onnx(onnx.SerializeToString(), X_c = data)
print(qps_skl[:5])

[0.00224272 0.00168826 0.00191517 0.08352797 0.0016868 ]


In [18]:
# We can also specify a label for the ONNX input node
convert_to_onnx(model, "sklearn", data, path = "../tests/test_models/iris_test_named_node.onnx", input_names = ('new_name',))

sess1 = rt.InferenceSession("../tests/test_models/iris_test.onnx")
print("Default input name:", sess1.get_inputs()[0].name)

sess2 = rt.InferenceSession("../tests/test_models/iris_test_named_node.onnx")
print("Manual input name:", sess2.get_inputs()[0].name)

Default input name: c_inputs
Manual input name: new_name


In [22]:
# Minimal example with Keras iris
import keras

model = keras.models.load_model("models/keras_example")
print(model.summary())
onnx = convert_to_onnx(model, "keras", data)
print(onnx)

The ONNX operator number change on the optimization: 10 -> 6
The maximum opset needed by this model is only 9.


Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 10)                50        
_________________________________________________________________
dense_2 (Dense)              (None, 3)                 33        
Total params: 83
Trainable params: 83
Non-trainable params: 0
_________________________________________________________________
None
ir_version: 4
producer_name: "keras2onnx"
producer_version: "1.7.0"
domain: "onnxmltools"
model_version: 0
doc_string: ""
graph {
  node {
    input: "dense_1_input"
    input: "dense_1_3/kernel:0"
    output: "dense_10"
    name: "dense_1"
    op_type: "MatMul"
    doc_string: ""
    domain: ""
  }
  node {
    input: "dense_10"
    input: "dense_1_3/bias:0"
    output: "biased_tensor_name1"
    name: "Add1"
    op_type: "Add"
    domain: ""
  }
  node {
    input: "biased_tensor_name1"
    output: "dense_1_3

In [37]:
# Minimal example with LightGBM
import lightgbm as lgb

data = pd.read_csv("data/lgbm_regression.test", header=None, sep='\t')
data = data.drop(0, axis=1)  # drop column 0
model = lgb.Booster(model_file="models/lgbm_example.txt")
onnx = convert_to_onnx(model, "lightgbm", data)

# QPS estimation -- note that the return array is 2D for LightGBM 
estimate_qps_onnx(onnx.SerializeToString(), X_c = data[0:5])

The maximum opset needed by this model is only 1.


array([[0.6600082 ],
       [0.45872724],
       [0.3988191 ],
       [0.49400043],
       [0.3441565 ]], dtype=float32)

Once converted to ONNX, LightGBM models return predictions as a 2D array of shape `(n, 1)` rather than the 1D array produced by the original model. The values are otherwise the same as those from the original model, as the check below confirms.

In [50]:
sess = rt.InferenceSession(onnx.SerializeToString())
label_name = sess.get_outputs()[0].name
input_name = sess.get_inputs()[0].name

onnx_out = sess.run([label_name], {input_name: np.array(data)})[0]
print(onnx_out.shape)

og_out = model.predict(data)
print(og_out.shape)

np.testing.assert_array_almost_equal(onnx_out.flatten(), og_out, decimal = 5)

(500, 1)
(500,)


#### TensorFlow (In progress...)

#### PyTorch

In [3]:
import torch
import torch.nn as nn

We will load a pretrained model with the architecture defined below.

In [4]:
class CatModel(nn.Module):

    def __init__(self, embedding_size, num_numerical_cols, output_size, layers, p=0.4):
        super().__init__()
        self.all_embeddings = nn.ModuleList([nn.Embedding(ni, nf) for ni, nf in embedding_size])
        self.embedding_dropout = nn.Dropout(p)
        self.batch_norm_num = nn.BatchNorm1d(num_numerical_cols)

        all_layers = []
        num_categorical_cols = sum((nf for ni, nf in embedding_size))
        input_size = num_categorical_cols + num_numerical_cols

        for i in layers:
            all_layers.append(nn.Linear(input_size, i))
            all_layers.append(nn.ReLU(inplace=True))
            all_layers.append(nn.BatchNorm1d(i))
            all_layers.append(nn.Dropout(p))
            input_size = i

        all_layers.append(nn.Linear(layers[-1], output_size))

        self.layers = nn.Sequential(*all_layers)

        self.m = nn.Softmax(dim=1)

    def forward(self, x_categorical, x_numerical):
        embeddings = []
        for i,e in enumerate(self.all_embeddings):
            embeddings.append(e(x_categorical[:,i]))
        x = torch.cat(embeddings, 1)
        x = self.embedding_dropout(x)

        x_numerical = self.batch_norm_num(x_numerical)
        x = torch.cat([x, x_numerical], 1)
        x = self.layers(x)
        x = self.m(x)

        return x[:,1]

In [5]:
model = CatModel([(3, 2), (2, 1), (2, 1), (2, 1)], 6, 2, [200,100,50], p=0.4)
model.load_state_dict(torch.load("models/churn_categorical.pt"))

<All keys matched successfully>

Note that this model takes two separate inputs, `x_categorical` and `x_numerical`, for discrete and continuous variables, respectively. `convert_to_onnx` handles such cases.

In [6]:
# Read in input data 
data = pd.read_csv("data/churn_data.csv")

# Split discrete and continuous data
categorical_cols = ['Geography', 'Gender', 'HasCrCard', 'IsActiveMember']
numerical_cols = ['CreditScore', 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'EstimatedSalary']

for category in categorical_cols:
    data[category] = data[category].astype('category')
cat = [data[c].cat.codes.values for c in categorical_cols]

cat_data = np.stack(cat, 1)
num_data = np.array(data[numerical_cols])

cat_tensor = torch.tensor(cat_data).long()
num_tensor = torch.tensor(num_data).float()

PyTorch conversion differs slightly from the other frameworks in that it relies on a package-specific `export` function. This function does not return the converted ONNX model, so a file path to save to is required.


In [7]:
# Without a file path, the conversion will throw an error
try:
    convert_to_onnx(model, "pytorch", cat_data, num_data)
except ValueError as e:
    print(e)

For PyTorch to ONNX conversion a file path must be given.


`convert_to_onnx` supports models with up to two separate inputs. The inputs must be passed in the same order as they appear in the model's `forward` function. Input names can also be supplied so that they can be referenced during ONNX inference, but they are optional.
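
Because the dummy inputs must follow the order of the `forward` signature, it can help to read that order programmatically before calling the converter. The sketch below uses the standard-library `inspect` module on a hypothetical `TwoInputModel` class, which is a plain Python stand-in rather than a real `nn.Module`. `forward_input_names` is an illustrative helper and not part of `mlisne`.

```python
import inspect


class TwoInputModel:
    """Hypothetical stand-in for a model whose forward takes two inputs."""

    def forward(self, x_categorical, x_numerical):
        return x_categorical, x_numerical


def forward_input_names(model):
    """Return the parameter names of model.forward in declaration order."""
    # On a bound method, inspect.signature already omits `self`.
    return list(inspect.signature(model.forward).parameters)


names = forward_input_names(TwoInputModel())
print(names)  # ['x_categorical', 'x_numerical']
```

Here the discrete input comes first, which matches passing the categorical array as `dummy_input1` and the numerical array as `dummy_input2`.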

Models are typically sensitive to data types, so it is important to cast the input data to the expected type before passing it into conversion.
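
A minimal sketch of the casting step, assuming the same conventions the notebook uses elsewhere (`float32` for continuous inputs, `int64` for discrete inputs). The arrays here are made-up illustrative values, not the churn data.

```python
import numpy as np

# Illustrative data only; the values are made up for this sketch.
num_data = np.array([[619.0, 42.0], [608.0, 41.0]])   # floats default to float64
cat_data = np.array([[0, 1], [2, 0]], dtype=np.int8)  # e.g. compact category codes

# Cast explicitly to the types used for the ONNX inputs in this notebook.
num_ready = num_data.astype(np.float32)
cat_ready = cat_data.astype(np.int64)

print(num_data.dtype, "->", num_ready.dtype)
print(cat_data.dtype, "->", cat_ready.dtype)
```

Casting explicitly avoids relying on NumPy or pandas defaults, which may not match what the model expects.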

In [8]:
# Dummy data can be in any format, as long as it can be converted to a tensor and is multi-dimensional
# The calls below all produce the same result
ret = convert_to_onnx(model, "pytorch", dummy_input1 = cat_data.astype(np.int64), dummy_input2 = num_data.astype(np.float32), 
                      path = "../tests/test_models/churn_cat_test.onnx", input_names=("d_inputs", "c_inputs"))
ret2 = convert_to_onnx(model, "pytorch", dummy_input1 = cat_data[0,np.newaxis].astype(np.int64), dummy_input2 = num_data[0,np.newaxis].astype(np.float32), 
                       path = "../tests/test_models/churn_cat_test.onnx", input_names=("d_inputs", "c_inputs"))
ret3 = convert_to_onnx(model, "pytorch", dummy_input1 = cat_tensor, dummy_input2 = num_tensor, 
                       path = "../tests/test_models/churn_cat_test.onnx", input_names=("d_inputs", "c_inputs"))

print(ret, ret2, ret3)

True True True


We can also set the name(s) of the output node. 

In [9]:
convert_to_onnx(model, "pytorch", dummy_input1 = cat_tensor, dummy_input2 = num_tensor, output_names = ["cat_out"], 
                       path = "../tests/test_models/churn_cat_test.onnx", input_names=("d_inputs", "c_inputs"))

sess = rt.InferenceSession("../tests/test_models/churn_cat_test.onnx")
sess.get_outputs()[0].name

'cat_out'

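We compare inference between ONNX and the original model to validate the conversion. The PyTorch model should be in evaluation mode (`model.eval()`) for this check, because `Dropout` and `BatchNorm1d` behave differently in training mode.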

In [12]:
with torch.no_grad():
    torch_preds = model(cat_tensor, num_tensor).numpy()
sess = rt.InferenceSession("../tests/test_models/churn_cat_test.onnx")
output_name = sess.get_outputs()[0].name
onnx_preds = sess.run([output_name], {"c_inputs": num_data.astype(np.float32),
                                      "d_inputs": cat_data.astype(np.int64)})[0]

np.testing.assert_array_almost_equal(torch_preds, onnx_preds, decimal=5)

AssertionError: 
Arrays are not almost equal to 5 decimals

Mismatched elements: 9785 / 10000 (97.8%)
Max absolute difference: 0.9963715
Max relative difference: 13.200315
 x: array([0.51932, 0.21089, 1.     , ..., 0.06185, 0.07258, 0.10373],
      dtype=float32)
 y: array([0.28603, 0.2202 , 1.     , ..., 0.14621, 0.09255, 0.10823],
      dtype=float32)
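
One plausible explanation for the mismatch above, offered as an assumption rather than a confirmed diagnosis, is that the notebook never calls `model.eval()`, so the PyTorch forward pass still applies `Dropout` in training mode. The sketch below uses a small hypothetical network (not `CatModel`) to show that training-mode outputs change from call to call, even under `torch.no_grad()`, while evaluation-mode outputs are repeatable.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Small illustrative network with a Dropout layer (hypothetical, not CatModel).
net = nn.Sequential(nn.Linear(4, 8), nn.Dropout(p=0.4), nn.Linear(8, 2))
x = torch.randn(16, 4)

# Training mode: Dropout samples a new random mask on every forward pass.
net.train()
with torch.no_grad():
    train_a = net(x)
    train_b = net(x)

# Evaluation mode: Dropout is a no-op, so repeated passes agree.
net.eval()
with torch.no_grad():
    eval_a = net(x)
    eval_b = net(x)

train_outputs_match = torch.allclose(train_a, train_b)
eval_outputs_match = torch.allclose(eval_a, eval_b)
print("train mode repeatable:", train_outputs_match)
print("eval mode repeatable:", eval_outputs_match)
```

If this is the cause here, calling `model.eval()` before the comparison cell would be the first thing to try.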

Now we can run QPS inference.

In [18]:
estimate_qps_onnx("../tests/test_models/churn_cat_test.onnx", X_c = num_data.astype(np.float32), X_d = cat_data.astype(np.int64), 
                  input_type = 2)

Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running BatchNormalization node. Name:'BatchNormalization_13' Status Message: CUDNN error executing cudnnBatchNormalizationForwardInference( CudnnHandle(), cudnn_batch_norm_mode_, &alpha, &beta, data_desc, x_data, data_desc, y_data, bn_tensor_desc, scale_data, b_data, mean_data, var_data, epsilon_)