# Use functions when converting into ONNX

Once a scikit-learn model is converted into ONNX, there is no easy way to retrieve the original scikit-learn model. This notebook explores an alternative way to convert a model into ONNX by using functions. With this method, every piece of a pipeline becomes a function.

In [1]:
from jyquickhelper import add_notebook_menu
add_notebook_menu()

In [2]:
%matplotlib inline

In [3]:
%load_ext mlprodict

## A pipeline

In [4]:
from sklearn.pipeline import Pipeline
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn import set_config
set_config(display="diagram")

data = load_iris()
X, y = data.data, data.target
steps = [
    ("preprocessing", StandardScaler()),
    ("classifier", LogisticRegression(penalty='l1', solver="liblinear"))]
pipe = Pipeline(steps)
pipe.fit(X, y)

## Its conversion into ONNX

### Without functions

In [5]:
from mlprodict.plotting.text_plot import onnx_simple_text_plot
from mlprodict.onnx_conv import to_onnx

onx = to_onnx(pipe, X, options={'zipmap': False})
print(onnx_simple_text_plot(onx))

opset: domain='' version=14
opset: domain='ai.onnx.ml' version=1
input: name='X' type=dtype('float64') shape=[None, 4]
init: name='Su_Subcst' type=dtype('float64') shape=(4,) -- array([5.84333333, 3.05733333, 3.758     , 1.19933333])
init: name='Di_Divcst' type=dtype('float64') shape=(4,) -- array([0.82530129, 0.43441097, 1.75940407, 0.75969263])
init: name='coef' type=dtype('float64') shape=(12,)
init: name='intercept' type=dtype('float64') shape=(3,) -- array([-1.86506089, -0.89658497, -4.56614529])
init: name='classes' type=dtype('int32') shape=(3,) -- array([0, 1, 2])
init: name='shape_tensor' type=dtype('int64') shape=(1,) -- array([-1], dtype=int64)
init: name='axis' type=dtype('int64') shape=(1,) -- array([1], dtype=int64)
Sub(X, Su_Subcst) -> Su_C0
  Div(Su_C0, Di_Divcst) -> variable
    MatMul(variable, coef) -> multiplied
      Add(multiplied, intercept) -> raw_scores
        Sigmoid(raw_scores) -> raw_scoressig
          Abs(raw_scoressig) -> norm_abs
            ReduceSum(n

In [6]:
%onnxview onx

### With functions

In [7]:
onxf = to_onnx(pipe, X, as_function=True, options={'zipmap': False})
print(onnx_simple_text_plot(onxf))

opset: domain='' version=15
opset: domain='sklearn' version=1
input: name='X' type=dtype('float64') shape=[None, 4]
main___Pipeline_1734459081968[sklearn](X) -> main_classifier_label, main_classifier_probabilities
output: name='main_classifier_label' type=dtype('int64') shape=[None]
output: name='main_classifier_probabilities' type=dtype('float64') shape=[None, 3]
----- function name=main__preprocessing___StandardScaler_1734202136896 domain=sklearn
----- doc_string: HYPER:{"StandardScaler":{"copy": true, "with_mean": true, "with_std": true}}
opset: domain='' version=14
input: 'X'
Constant(value=[5.8433333...) -> Su_Subcst
  Sub(X, Su_Subcst) -> Su_C0
Constant(value=[0.8253012...) -> Di_Divcst
  Div(Su_C0, Di_Divcst) -> variable
output: name='variable' type=? shape=?
----- function name=main__classifier___LogisticRegression_1734202137184 domain=sklearn
----- doc_string: HYPER:{"LogisticR

In [8]:
%onnxview onxf

Based on that, it should be possible to rebuild the original scikit-learn pipeline. Each function stores the hyperparameters of its estimator in the attribute `doc_string`, as JSON prefixed with `HYPER:`.
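As a minimal sketch of how these hyperparameters could be read back, the snippet below parses the `HYPER:` payload with the standard `json` module. The helpers `parse_hyper` and `collect_hyperparameters` are hypothetical names introduced here, not part of mlprodict. The sketch assumes that a function's `doc_string` contains nothing but the `HYPER:` prefix followed by JSON, as in the output above. A small stand-in object replaces the converted model so that the example runs without ONNX installed.

```python
import json
from types import SimpleNamespace

# Prefix observed in the doc_string lines printed above.
HYPER_PREFIX = "HYPER:"


def parse_hyper(doc_string):
    """Return {estimator class name: hyperparameters}, or {} without a HYPER payload.

    Hypothetical helper: assumes the whole doc_string is 'HYPER:' + JSON.
    """
    if not doc_string.startswith(HYPER_PREFIX):
        return {}
    return json.loads(doc_string[len(HYPER_PREFIX):])


def collect_hyperparameters(model):
    """Map every function name found in model.functions to its hyperparameters."""
    return {f.name: parse_hyper(f.doc_string) for f in model.functions}


# Stand-in for the converted model: only the attributes `functions`,
# `name` and `doc_string` are used, filled with values copied from the output above.
demo_model = SimpleNamespace(functions=[
    SimpleNamespace(
        name="main__preprocessing___StandardScaler_1734202136896",
        doc_string='HYPER:{"StandardScaler":{"copy": true, '
                   '"with_mean": true, "with_std": true}}'),
])

hyper = collect_hyperparameters(demo_model)
for name, params in hyper.items():
    print(name, params)
```

With the real model, `collect_hyperparameters(onxf)` should work the same way, since an ONNX `ModelProto` exposes its local functions through the field `functions`. JSON cannot represent every Python value (a tuple comes back as a list, for instance), so rebuilding an estimator from these values may need some post-processing.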

## A more complex pipeline

In [9]:
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler

data = load_iris()
X, y = data.data, data.target
steps = [
    ("preprocessing", ColumnTransformer([
        ('A', StandardScaler(), [0, 1]),
        ('B', MinMaxScaler(), [2, 3])])),
    ("classifier", LogisticRegression(penalty='l1', solver="liblinear"))]
pipe = Pipeline(steps)
pipe.fit(X, y)

In [10]:
onxf = to_onnx(pipe, X, as_function=True, options={'zipmap': False})
print(onnx_simple_text_plot(onxf))

opset: domain='' version=15
opset: domain='sklearn' version=1
input: name='X' type=dtype('float64') shape=[None, 4]
main___Pipeline_1734198554880[sklearn](X) -> main_classifier_label, main_classifier_probabilities
output: name='main_classifier_label' type=dtype('int64') shape=[None]
output: name='main_classifier_probabilities' type=dtype('float64') shape=[None, 3]
----- function name=main__preprocessing__B___MinMaxScaler_1734196938256 domain=sklearn
----- doc_string: HYPER:{"MinMaxScaler":{"clip": false, "copy": true, "feature_range": [0, 1]}}
opset: domain='' version=14
input: 'X'
Cast(X, to=11) -> Ca_output0
Constant(value=[0.1694915...) -> Mu_Mulcst
  Mul(Ca_output0, Mu_Mulcst) -> Mu_C0
Constant(value=[-0.169491...) -> Ad_Addcst
  Add(Mu_C0, Ad_Addcst) -> variable
output: name='variable' type=? shape=?
----- function name=main__preprocessing__A___StandardScaler_1734196937584 domain=sklearn
----- doc_string: HYPER:{"StandardScaler":{"copy": true, "with_mean": true, "with_std": true}}

In [11]:
%onnxview onxf
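The function names printed above seem to encode where each estimator sits in the pipeline: step names joined by `__`, then `___`, the estimator class, and a numeric suffix. The sketch below splits a name along those separators. This naming scheme is inferred from the output only, not from mlprodict's documentation, and `split_function_name` is a hypothetical helper. It would break if a step name itself contained `___`.

```python
def split_function_name(name):
    """Split a function name into (path of step names, class name, numeric suffix).

    Assumes the pattern '<step>__<step>...___<ClassName>_<suffix>' seen in the
    printed output; this convention is inferred, not documented.
    """
    # The last '___' separates the path in the pipeline from the estimator part.
    path, _, tail = name.rpartition("___")
    # The last '_' separates the class name from the numeric suffix.
    class_name, _, suffix = tail.rpartition("_")
    return path.split("__"), class_name, suffix


# Names copied from the output above.
names = [
    "main___Pipeline_1734198554880",
    "main__preprocessing__B___MinMaxScaler_1734196938256",
    "main__preprocessing__A___StandardScaler_1734196937584",
]

for name in names:
    path, class_name, suffix = split_function_name(name)
    print(" / ".join(path), "->", class_name)
```

Combined with the hyperparameters stored in `doc_string`, this path gives the nesting needed to place each estimator back into a `Pipeline` or a `ColumnTransformer`.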