# Numpy API for ONNX

This notebook shows how to write Python functions with an API similar to the one numpy offers, and how to obtain functions which can be converted into ONNX.

In [1]:
from jyquickhelper import add_notebook_menu
add_notebook_menu()

In [2]:
%load_ext mlprodict

## A pipeline with FunctionTransformer

In [3]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
data = load_iris()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y)

In [4]:
import numpy
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler
from sklearn.linear_model import LogisticRegression

pipe = make_pipeline(
            FunctionTransformer(numpy.log),
            StandardScaler(),
            LogisticRegression())
pipe.fit(X_train, y_train)

Pipeline(steps=[('functiontransformer',
                 FunctionTransformer(func=<ufunc 'log'>)),
                ('standardscaler', StandardScaler()),
                ('logisticregression', LogisticRegression())])

Let's convert it into ONNX.

In [5]:
from mlprodict.onnx_conv import to_onnx
try:
    onx = to_onnx(pipe, X_train.astype(numpy.float64))
except RuntimeError as e:
    print(e)

FunctionTransformer is not supported unless the transform function is None (= identity). You may raise an issue at https://github.com/onnx/sklearn-onnx/issues.


## Use ONNX instead of numpy

The pipeline cannot be converted because the converter does not know how to convert the function (`numpy.log`) held by `FunctionTransformer` into ONNX. One way to avoid that is to replace it with a function `log` defined with *ONNX* operators and executed with an ONNX runtime.

In [6]:
import mlprodict.npy.numpy_onnx_pyrt as npnxrt

pipe = make_pipeline(
            FunctionTransformer(npnxrt.log),
            StandardScaler(),
            LogisticRegression())
pipe.fit(X_train, y_train)

Pipeline(steps=[('functiontransformer',
                 FunctionTransformer(func=<mlprodict.npy.onnx_numpy_wrapper.wrapper_onnxnumpy_np object at 0x000001B3233D70D0>)),
                ('standardscaler', StandardScaler()),
                ('logisticregression', LogisticRegression())])

In [7]:
onx = to_onnx(pipe, X_train.astype(numpy.float64), rewrite_ops=True)

In [8]:
%onnxview onx

The operator `Log` now belongs to the graph. Using this function instead of `numpy.log` adds some overhead on small matrices. The gap is much smaller on big matrices.

In [9]:
%timeit numpy.log(X_train)

4.3 µs ± 295 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [10]:
%timeit npnxrt.log(X_train)

14.4 µs ± 2.33 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
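The two measurements above use a single, small matrix. To see how the gap changes with the size, the same comparison can be repeated on several shapes. The code below is a minimal sketch using only `timeit` and numpy. The helper names (`time_per_call`, `compare`) and the function `wrapped_log` are hypothetical: `wrapped_log` is only a stand-in so that the example runs without *mlprodict*, and it says nothing about the actual overhead of `npnxrt.log`.

```python
import timeit
import numpy


def time_per_call(fct, x, number=20, repeat=3):
    """Return the best average duration (in seconds) of one call fct(x)."""
    timer = timeit.Timer(lambda: fct(x))
    return min(timer.repeat(repeat=repeat, number=number)) / number


def compare(fct_ref, fct_new, shapes, number=20):
    """Time both functions on random positive matrices of the given shapes."""
    rng = numpy.random.default_rng(0)
    rows = []
    for shape in shapes:
        x = rng.random(shape) + 0.1  # strictly positive, log is defined
        t_ref = time_per_call(fct_ref, x, number=number)
        t_new = time_per_call(fct_new, x, number=number)
        rows.append((shape, t_ref, t_new, t_new / t_ref))
    return rows


def wrapped_log(x):
    # Hypothetical stand-in for the function to compare against numpy.log.
    return numpy.log(numpy.asarray(x))


rows = compare(numpy.log, wrapped_log, [(112, 4), (10000, 4)])
for shape, t_ref, t_new, ratio in rows:
    print(shape, f"ref={t_ref:.2e}s", f"new={t_new:.2e}s", f"ratio={ratio:.2f}")
```

In the notebook, `compare(numpy.log, npnxrt.log, ...)` would replace the stand-in. The measured durations depend on the machine, so no expected numbers are given here.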


## More complex function

What about more complex functions? They require a bit more work: the previous syntax does not work.

In [11]:
def custom_fct(x):
    return npnxrt.log(x + 1)

pipe = make_pipeline(
            FunctionTransformer(custom_fct),
            StandardScaler(),
            LogisticRegression())
pipe.fit(X_train, y_train)

Pipeline(steps=[('functiontransformer',
                 FunctionTransformer(func=<function custom_fct at 0x000001B32339E430>)),
                ('standardscaler', StandardScaler()),
                ('logisticregression', LogisticRegression())])

In [12]:
try:
    onx = to_onnx(pipe, X_train.astype(numpy.float64), rewrite_ops=True)
except TypeError as e:
    print(e)

FunctionTransformer is not supported unless the transform function is of type <class 'function'> wrapped with onnxnumpy.
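The error message suggests that the converter looks at the type of the object held by `FunctionTransformer`: a plain Python function is rejected, a wrapped one is accepted. The toy code below only illustrates that idea. Every name in it (`ToyOnnxWrapper`, `toy_can_convert`) is hypothetical and this is not how *mlprodict* is implemented.

```python
import numpy


class ToyOnnxWrapper:
    """Hypothetical wrapper: it behaves like the function it wraps,
    but its type tells a converter that it can be handled."""

    def __init__(self, fct):
        self.fct = fct

    def __call__(self, x):
        return self.fct(x)


def toy_can_convert(func):
    """Hypothetical check: only wrapped functions are accepted."""
    return isinstance(func, ToyOnnxWrapper)


def plain_fct(x):
    return numpy.log(x + 1)


wrapped_fct = ToyOnnxWrapper(plain_fct)
x = numpy.array([[1.0, 2.0]])

# Both callables return the same values...
print(numpy.allclose(plain_fct(x), wrapped_fct(x)))
# ...but only the wrapped one passes the type check.
print(toy_can_convert(plain_fct), toy_can_convert(wrapped_fct))
```

A plain function calling `npnxrt.log` internally gives the converter nothing to inspect, which is consistent with the failure above.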


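The syntax is different. The function is decorated with `onnxnumpy_default`, annotated with the types of its input and output, and uses the functions from `mlprodict.npy.numpy_onnx_impl` instead of `mlprodict.npy.numpy_onnx_pyrt`.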

In [13]:
from mlprodict.npy import onnxnumpy_default, NDArray
import mlprodict.npy.numpy_onnx_impl as npnx

@onnxnumpy_default
def custom_fct(x: NDArray[(None, None), numpy.float64]) -> NDArray[(None, None), numpy.float64]:
    return npnx.log(x + numpy.float64(1))

pipe = make_pipeline(
            FunctionTransformer(custom_fct),
            StandardScaler(),
            LogisticRegression())
pipe.fit(X_train, y_train)

Pipeline(steps=[('functiontransformer',
                 FunctionTransformer(func=<mlprodict.npy.onnx_numpy_wrapper.onnxnumpy_custom_fct_None_None object at 0x000001B324CF29A0>)),
                ('standardscaler', StandardScaler()),
                ('logisticregression', LogisticRegression())])
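The function above adds `numpy.float64(1)` rather than the Python integer `1`. A likely reason, stated here as an assumption, is that ONNX is strictly typed: the operator `Add` expects both inputs to share the same element type, so the constant should have the dtype of the tensor. The sketch below shows the same habit in plain numpy. The helper `add_one` is hypothetical and only illustrates how to build a constant matching the dtype of an array.

```python
import numpy


def add_one(x):
    # x.dtype.type is the scalar class of the array
    # (numpy.float32 for a float32 array, numpy.float64 for a float64 one).
    one = x.dtype.type(1)
    return x + one


x32 = numpy.array([[0.5, 1.5]], dtype=numpy.float32)
x64 = x32.astype(numpy.float64)

y32 = add_one(x32)
y64 = add_one(x64)
print(y32.dtype, y64.dtype)  # float32 float64
```

With a constant of the same dtype, the result keeps the dtype of the input, which matches the annotation `NDArray[(None, None), numpy.float64]` used for both the input and the output of `custom_fct`.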

In [14]:
onx = to_onnx(pipe, X_train.astype(numpy.float64), rewrite_ops=True)
%onnxview onx

Let's compare the execution time to *numpy*.

In [15]:
def custom_numpy_fct(x):
    return numpy.log(x + numpy.float64(1))

%timeit custom_numpy_fct(X_train)

5.68 µs ± 156 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [16]:
%timeit custom_fct(X_train)

18.3 µs ± 878 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
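The timings compare speed only. It may also be worth checking that both implementations return the same values. The code below is a minimal sketch of such a check. The helper `max_abs_diff` is hypothetical, the data is random, and `numpy.log1p` is used as a stand-in for the second function so that the example runs without *mlprodict*.

```python
import numpy


def max_abs_diff(fct_ref, fct_new, x, rtol=1e-6):
    """Check both functions agree on x and return the largest absolute gap."""
    expected = numpy.asarray(fct_ref(x))
    got = numpy.asarray(fct_new(x))
    # Raises an AssertionError if the outputs differ beyond the tolerance.
    numpy.testing.assert_allclose(got, expected, rtol=rtol)
    return float(numpy.max(numpy.abs(got - expected)))


def reference(x):
    return numpy.log(x + numpy.float64(1))


rng = numpy.random.default_rng(0)
x = rng.random((20, 4)) * 8  # non-negative random values, not the iris data

# numpy.log1p(x) computes log(1 + x), a stand-in for the function to check.
diff = max_abs_diff(reference, numpy.log1p, x)
print(diff)
```

In the notebook, the call would be `max_abs_diff(custom_numpy_fct, custom_fct, X_train)`.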
