
onnxruntime for converted BernouilliNB, MultinomialNB (scikit-learn) does not produce the same results as the original model #151

Closed
xadupre opened this issue Oct 17, 2018 · 13 comments


@xadupre
Collaborator

xadupre commented Oct 17, 2018

No description provided.

@prabhat00155
Collaborator

Could you share your code and dataset?

@xadupre
Collaborator Author

xadupre commented Oct 17, 2018

This is part of PR #144.

You need to run the unit tests in the class TestNaiveBayesConverter to generate the model, the inputs, and the expected outputs; all of them are stored on disk. The runtime is then exercised in the class TestBackendWithOnnxRuntime. The code is in the branch https://github.com/xadupre/onnxmltools/tree/testrt/onnxmltools.
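For reference, a minimal sketch of that two-step pattern (the helper name and file formats here are illustrative, not the branch's actual code): the converter tests dump the model plus data to disk, and the backend test replays them against onnxruntime.

import pickle
import numpy as np
import onnxruntime

def replay_saved_case(model_path, inputs_path, expected_path, rtol=1e-5):
    # Load the artifacts the converter tests wrote to disk.
    sess = onnxruntime.InferenceSession(model_path)
    with open(inputs_path, "rb") as f:
        inputs = pickle.load(f)    # dict mapping input names to numpy arrays
    with open(expected_path, "rb") as f:
        expected = pickle.load(f)  # expected output array
    got = sess.run(None, inputs)[0]
    assert np.allclose(got, expected, rtol=rtol)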

@prabhat00155
Collaborator

I see you have made changes to the NaiveBayes.py file, which contains the NB converters. I had tested the test_NaiveBayesConverter.py file and the tests were running fine. Are you saying that the original file was causing test failures, or did you see the mismatch after your changes?

@xadupre
Collaborator Author

xadupre commented Oct 17, 2018

When you say running, do you mean just the conversion, or the conversion plus the execution of the converted model with an ONNX backend? That's what I did today. The converter was working fine, but the execution of the converted ONNX model with onnxruntime was either failing or producing different results. The mismatch was there before my changes.

@prabhat00155
Collaborator

I mean conversion + execution. Here is what I did:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB, BernoulliNB
import onnxmltools
from onnxmltools import convert_sklearn
from onnxmltools.convert.common.data_types import FloatTensorType

data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=42)

model_MNB = MultinomialNB().fit(X_train, y_train)
model_BNB = BernoulliNB().fit(X_train, y_train)
onnx_MNB = convert_sklearn(model_MNB, "model MNB", [('input', FloatTensorType([1, 4]))])
onnx_BNB = convert_sklearn(model_BNB, "model BNB", [('input', FloatTensorType([1, 4]))])
onnxmltools.utils.save_model(onnx_MNB, "onnx_MNB.onnx")
onnxmltools.utils.save_model(onnx_BNB, "onnx_BNB.onnx")

scikit_result_MNB = np.hstack([model_MNB.predict_proba(X_test), model_MNB.predict(X_test).reshape((-1, 1))])
scikit_result_BNB = np.hstack([model_BNB.predict_proba(X_test), model_BNB.predict(X_test).reshape((-1, 1))])

np.mean(onnx_res_mnb[:, 3] == scikit_result_MNB[:, 3]) # Gives 1 as output
np.mean(onnx_res_bnb[:, 3] == scikit_result_BNB[:, 3]) # Gives 1 as output

This means all the predictions match between scikit and onnx models.

np.mean(np.isclose(onnx_res_mnb, scikit_result_MNB)) # Gives 1 as output
np.mean(np.isclose(onnx_res_bnb, scikit_result_BNB)) # Gives 0.25 as output

MNB outputs (probabilities + labels) match, whereas the BNB probability values differ a little even though the labels match in this example: only the label column (1 of 4) is close, hence the 0.25.

@xadupre
Collaborator Author

xadupre commented Oct 18, 2018

How did you get onnx_res_mnb?

@prabhat00155
Collaborator

$ ./onnxruntime_exec.exe -m onnx_MNB.onnx -t iris_test.csv > result_MNB.csv
'onnx_MNB.onnx' loaded successfully.
Done loading model: onnx_MNB.onnx
Execution Status: OK

prroy@B115FFDGPUN03 /cygdrive/c/Users/prroy/LotusRT/Lotus_Oct9/Lotus/onnxruntime/cmake_build/Debug
$ ./onnxruntime_exec.exe -m onnx_BNB.onnx -t iris_test.csv > result_BNB.csv
'onnx_BNB.onnx' loaded successfully.
Done loading model: onnx_BNB.onnx
Execution Status: OK
In a Python notebook:
onnx_res_mnb = np.loadtxt(fname='result_MNB.csv', delimiter=',')
onnx_res_bnb = np.loadtxt(fname='result_BNB.csv', delimiter=',')

@xadupre
Collaborator Author

xadupre commented Oct 18, 2018

What version of onnxruntime are you using? I observed differences between versions. I'm currently testing against the version released on PyPI (1.3.0). The tests I put in place check all the outputs, both predictions and probabilities.
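For reference, a minimal sketch of such a check (not the repository's actual test code), reusing model_MNB and X_test from the snippet above; with the Python package, the probability output comes back as a list of {class: probability} dicts (ZipMap), which must be flattened before comparing:

import numpy as np
import onnxruntime

sess = onnxruntime.InferenceSession("onnx_MNB.onnx")
labels, probas = sess.run(None, {"input": X_test[:1].astype(np.float32)})
# Flatten the ZipMap output (list of dicts) into a (n, n_classes) array.
probas = np.array([[p[k] for k in sorted(p)] for p in probas])
assert np.array_equal(labels, model_MNB.predict(X_test[:1]))
assert np.allclose(probas, model_MNB.predict_proba(X_test[:1]), atol=1e-5)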

@prabhat00155
Collaborator

How do I check the version of onnxruntime? I cloned the Lotus repo on Oct 9 and built the onnxruntime project.

@xadupre
Collaborator Author

xadupre commented Oct 18, 2018

This should work then, except that I think onnxruntime only produces the predicted labels and not the scores. If you build it yourself, you can add an option to build the Python package too and check the converted model with it. The documentation for onnxruntime is here: https://docs.microsoft.com/en-us/python/api/overview/azure/onnx/examples-md?view=azure-onnx-py. I'll check with the runtime on my side tomorrow.
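(For the version question above: once the Python package is built and importable, the version is exposed as a standard attribute.)

import onnxruntime
print(onnxruntime.__version__)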

@xadupre
Collaborator Author

xadupre commented Oct 19, 2018

Here is what I get with the latest version of onnxruntime and the current onnxmltools; the Python package and the executable give the same results. In this example, MNB works and BNB does not.

import os
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB, BernoulliNB
from onnxmltools import convert_sklearn
from onnxmltools.convert.common.data_types import FloatTensorType
import onnxmltools
import numpy as np

data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=42)
model_MNB = MultinomialNB().fit(X_train, y_train)
model_BNB = BernoulliNB().fit(X_train, y_train)
onnx_MNB = convert_sklearn(model_MNB, "model MNB", [('input', FloatTensorType([1, 4]))])
onnx_BNB = convert_sklearn(model_BNB, "model BNB", [('input', FloatTensorType([1, 4]))])

# Build the test file read by onnxruntime_exec: label in the first column,
# features in the remaining ones.
xt = np.zeros((X_test.shape[0], X_test.shape[1] + 1))
xt[:, 1:] = X_test
xt[:, 0] = y_test
np.savetxt("iris_test.csv", xt, delimiter=',', fmt='%f')

onnxmltools.utils.save_model(onnx_MNB, "onnx_MNB.onnx")
onnxmltools.utils.save_model(onnx_BNB, "onnx_BNB.onnx")

fold = r"path_to_exec"
if True or not os.path.exists("result_BNB.csv"):
    cmd = fold + r"\onnxruntime_exec.exe -m onnx_BNB.onnx -t iris_test.csv > result_BNB.csv"
    os.system(cmd)
if True or not os.path.exists("result_MNB.csv"):
    cmd = fold + r"\onnxruntime_exec.exe -m onnx_MNB.onnx -t iris_test.csv > result_MNB.csv"
    os.system(cmd)

onnx_res_mnb = np.loadtxt("result_MNB.csv", delimiter=',')
onnx_res_bnb = np.loadtxt("result_BNB.csv", delimiter=',')

scikit_result_MNB = np.hstack([model_MNB.predict_proba(X_test), model_MNB.predict(X_test).reshape((-1, 1))])
scikit_result_BNB = np.hstack([model_BNB.predict_proba(X_test), model_BNB.predict(X_test).reshape((-1, 1))])

print(np.mean(onnx_res_mnb[:, 3] == scikit_result_MNB[:, 3])) # Gives 1 as output
print(np.mean(onnx_res_bnb[:, 3] == scikit_result_BNB[:, 3])) # Gives 1 as output

# This means all the predictions match between scikit and onnx models.
print(np.mean(np.isclose(onnx_res_mnb, scikit_result_MNB))) # Gives 1 as output
print(np.mean(np.isclose(onnx_res_bnb, scikit_result_BNB))) # Gives 0.25 as output

import onnxruntime

mnb = onnxruntime.InferenceSession('onnx_MNB.onnx')
# run returns [labels, probabilities]; output 1 is the ZipMap list of dicts
mnb_prd = mnb.run(None, {'input': X_test[:1].astype(np.float32)})
print("MNB")
print("ONNX-PY ", mnb_prd[1])
print("ONNX-EXE", onnx_res_mnb[:1, 1:])
print("SKL ", scikit_result_MNB[:1, :3])

bnb = onnxruntime.InferenceSession('onnx_BNB.onnx')
bnb_prd = bnb.run(None, {'input': X_test[:1].astype(np.float32)})
print("BNB")
print("ONNX-PY ", bnb_prd[1])
print("ONNX-EXE", onnx_res_bnb[:1, 1:])
print("SKL ", scikit_result_BNB[:1, :3])

Outputs:

0.0
0.0
0.0
0.0
MNB
ONNX-PY [{0: 0.04780237749218941, 1: 0.5113309621810913, 2: 0.4408663213253021}]
ONNX-EXE [[0.04780238 0.51133096 0.44086632]]
SKL [[0.04780234 0.51133139 0.44086628]]
BNB
ONNX-PY [{0: 0.3253963887691498, 1: 0.4300372302532196, 2: 0.24456651508808136}]
ONNX-EXE [[0.32539639 0.43003723 0.24456652]]
SKL [[0.33333227 0.34244143 0.3242263 ]]

@xadupre
Collaborator Author

xadupre commented Oct 30, 2018

BernoulliNB: the binarization of features is not part of the converter.

https://github.com/scikit-learn/scikit-learn/blob/bac89c2/sklearn/naive_bayes.py#L938
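For reference, a minimal sketch (illustrative, reusing the iris setup above) of why this matters: scikit-learn's BernoulliNB binarizes its input against the binarize threshold (0.0 by default) before computing likelihoods, so a converted graph that scores the raw features cannot match. On iris all features are positive, so the binarized vectors are all ones and the scikit-learn probabilities stay close to the class priors, which is exactly the SKL row printed above.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import BernoulliNB

X, y = load_iris(return_X_y=True)
model = BernoulliNB().fit(X, y)

# What scikit-learn actually scores internally: binarized features.
X_bin = (X > model.binarize).astype(np.float64)

# A model fitted without re-binarization reproduces the probabilities
# only when it is fed the binarized features.
model_no_bin = BernoulliNB(binarize=None).fit(X_bin, y)
assert np.allclose(model.predict_proba(X), model_no_bin.predict_proba(X_bin))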

@prabhat00155
Collaborator

Yup, here is the PR: #162

wenbingl pushed a commit that referenced this issue Feb 14, 2019
* remove unnecessary print, add quote around filenames in some places

* replaces as_matrix by values (pandas warnings)

* changes variable name to avoid getting warnings about invalid names

* better consistency for converted, allows targetted onnx version to be None

* Revert "better consistency for converted, allows targetted onnx version to be None"

This reverts commit e257ca1.

* handle the comparison of ONNX versions in only one place

* fix bug with OneHotEncoder and scikit-learn 0.20

* release the constraint on scikit-learn (0.20.0 allowed)

* fix one type issue for Python 2.7

* add documentation to compare_strict_version

* Fixes #151, BernouilliNB converter

* Removes unused nodes in graph

* Adresses issue #143, enables build with keras 2.1.2

* Revert modifications due to a wrong merge

* update keras version

* Disable test on keras/mobilenet as it does not work

* add unit test for xception (failing)

* remove duplicate install

* skip unit test if not installed (tensorflow still not available on python 3.7)

* Fix when keras is not available

* Fix missing import

* Update test_single_operator_with_cntk_backend.py

* Set up CI with Azure Pipelines

* Update azure pipeline

* Skip a unit test if tensorflow is not installed

* merge

* missing import

* Revert "Merge branch 'master' of https://github.com/onnx/onnxmltools"

This reverts commit 178e763, reversing
changes made to 1a617ef.

* revert changes

* Revert changes

* \r

* \r

* documentation

* Fix appveyor

* fix bad merge

* remove example on keras

* Delete requirements-deep.txt
wenbingl pushed a commit that referenced this issue Feb 14, 2019
* Fixes #151, BernouilliNB converter