Binary classification probability in ONNX for SVM with probability=False in sklearn #990

Open
qi-yuan-cresset opened this issue Apr 27, 2023 · 2 comments

@qi-yuan-cresset

Hello,
I'm confused by the output probability of SVM classifier converted to ONNX. For the following code:

import numpy as np
import onnxruntime as rt
import skl2onnx
from skl2onnx.common.data_types import FloatTensorType
from sklearn.datasets import load_iris
from sklearn.svm import SVC
from skl2onnx import convert_sklearn


X, y = load_iris(return_X_y=True)
X, y = X[y != 2], y[y != 2]
n_samples, n_features = X.shape
model = SVC(probability=False, random_state=5)
model.fit(X.astype(np.float32), y)

initial_type = [("float_input", FloatTensorType([None, 4]))]
onnx = convert_sklearn(model, initial_types=initial_type)
with open("model_t.onnx", "wb") as f:
    f.write(onnx.SerializeToString())

sess = rt.InferenceSession("model_t.onnx")
input_name = sess.get_inputs()[0].name
pred_onx = sess.run(None, {input_name: np.array(X[:10, :]).astype(np.float32)})
print(pred_onx[0])
print(pred_onx[1])

The output I got is as follows:

[0 0 0 0 0 0 0 0 0 0]
[[-1.2660402  1.2660402]
 [-1.1427525  1.1427525]
 [-1.2851433  1.2851433]
 [-1.1439595  1.1439595]
 [-1.3043578  1.3043578]
 [-1.0796759  1.0796759]
 [-1.2667828  1.2667828]
 [-1.1935505  1.1935505]
 [-1.1515036  1.1515036]
 [-1.13976    1.13976  ]]

All the "labels" are 0, while all the "probabilities" for class 1 are greater than those for class 0.
Is this a bug or expected behaviour?
The "labels" predicted are consistent between sklearn and ONNX; however, the probability values predicted by ONNX are opposite to the labels. This causes an inconsistency for me when building and converting voting classifiers with sklearn + ONNX, since ONNX does "soft voting" only.
I know that setting SVC with probability=True makes the probability prediction consistent between sklearn and ONNX, but the probability predicted by SVC in sklearn sometimes disagrees with the corresponding labels too, which could also lead to problems when building a voting classifier.
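For illustration, here is a rough sketch of the workaround I have been considering on my side (my own post-processing, not something skl2onnx provides). It assumes the two-column score layout shown above, where column 0 matches sklearn's decision_function, and squashes that decision value into a label-consistent pseudo-probability for soft voting:

import numpy as np

# pred_onx[1] holds the raw SVM scores from the ONNX model above,
# shape [n_samples, 2]; column 0 matches sklearn's decision_function.
scores = np.array(pred_onx[1])
decision = scores[:, 0]

# Squash the decision value into (0, 1) with a sigmoid so it can act
# as a pseudo-probability for class 1. This is not a calibrated
# probability (unlike probability=True, which uses Platt scaling),
# just a monotone, label-consistent score.
p_class1 = 1.0 / (1.0 + np.exp(-decision))
p = np.column_stack([1.0 - p_class1, p_class1])
labels = (decision > 0).astype(int)  # agrees with sklearn's predict here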

Any suggestions on this issue would be highly appreciated.

Thanks very much,

Qi

@xadupre
Collaborator

xadupre commented Jun 22, 2023

I took your model and added two nodes to extract the first column of your results. Would that be ok?

import numpy as np
import onnxruntime as rt
import skl2onnx
from skl2onnx.common.data_types import FloatTensorType
from sklearn.datasets import load_iris
from sklearn.svm import SVC
from skl2onnx import convert_sklearn


X, y = load_iris(return_X_y=True)
X = X.astype(np.float32)
X, y = X[y != 2], y[y != 2]
n_samples, n_features = X.shape
model = SVC(probability=False, random_state=5)
model.fit(X, y)
print(model.predict(X[:10]))
print(model.decision_function(X[:10]))
print("----------")

initial_type = [("float_input", FloatTensorType([None, 4]))]
onx = convert_sklearn(model, initial_types=initial_type, target_opset=18)

sess = rt.InferenceSession(onx.SerializeToString(), providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name
pred_onx = sess.run(None, {input_name: np.array(X[:10, :]).astype(np.float32)})
print(pred_onx[0])
print(pred_onx[1])

print("---------------------")
# see https://onnx.ai/onnx/intro/python.html
# Add one node to the model to extract the first column.
from onnx import TensorProto
from onnx.helper import make_node, make_tensor_value_info, make_model, make_graph
from onnx.numpy_helper import from_array
from onnx.version_converter import convert_version

# Make sure the ONNX opset is one of the latest.
# sklearn-onnx chooses the lowest possible value.
# In this case, it is 9. The operator Slice (opset 13) is added at the
# end of the graph to extract one column, so the model needs to be upgraded.
onx = convert_version(onx, target_version=18)

inits = list(onx.graph.initializer)
inits.extend(
    [
        # Constant tensors used as the Slice/Reshape parameters.
        from_array(np.array([0], dtype=np.int64), name="zero"),
        from_array(np.array([1], dtype=np.int64), name="one"),
        from_array(np.array([-1], dtype=np.int64), name="mone"),
    ]
)
nodes = list(onx.graph.node)
nodes.extend(
    [
        # Slice(probabilities, starts=[0], ends=[1], axes=[1]): keep column 0.
        make_node("Slice", ["probabilities", "zero", "one", "one"], ["new_scores2"]),
        # Reshape to a 1-D tensor of shape [-1].
        make_node("Reshape", ["new_scores2", "mone"], ["new_scores"]),
    ]
)
outputs = [
    onx.graph.output[0],
    make_tensor_value_info("new_scores", TensorProto.FLOAT, [None]),
]
graph = make_graph(nodes, onx.graph.name, onx.graph.input, outputs, inits)
new_model = make_model(graph, opset_imports=onx.opset_import)


sess = rt.InferenceSession(
    new_model.SerializeToString(), providers=["CPUExecutionProvider"]
)
input_name = sess.get_inputs()[0].name
pred_onx = sess.run(None, {input_name: np.array(X[:10, :]).astype(np.float32)})
print(pred_onx[0])
print(pred_onx[1])

Output is the following:

[0 0 0 0 0 0 0 0 0 0]
[-1.26604015 -1.14275248 -1.28514317 -1.14395955 -1.30435769 -1.07967584
 -1.2667828  -1.19355031 -1.15150364 -1.13976   ]
----------
[0 0 0 0 0 0 0 0 0 0]
[[-1.2660402  1.2660402]
 [-1.1427525  1.1427525]
 [-1.2851433  1.2851433]
 [-1.1439595  1.1439595]
 [-1.3043578  1.3043578]
 [-1.0796759  1.0796759]
 [-1.2667828  1.2667828]
 [-1.1935505  1.1935505]
 [-1.1515036  1.1515036]
 [-1.13976    1.13976  ]]
---------------------
[0 0 0 0 0 0 0 0 0 0]
[-1.2660402 -1.1427525 -1.2851433 -1.1439595 -1.3043578 -1.0796759
 -1.2667828 -1.1935505 -1.1515036 -1.13976  ]

@qi-yuan-cresset
Author

Hi @xadupre,

Thanks very much for your response. I understand that your code extracts the first column of the "probability" prediction from the ONNX model. However, if I understand correctly, that doesn't solve my problem: the "probability" of class 0 in the prediction is smaller than that of class 1, yet the predicted "label" is 0. For example, for the first sample, the "probability" of class 0 is -1.266 and the "probability" of class 1 is 1.266, but the predicted label is 0, which disagrees with the probability prediction.
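To make the mismatch concrete, here is a small check based on the outputs above (it assumes the two-column layout shown in this thread, where column 0 tracks sklearn's decision_function):

import numpy as np

scores = np.array(pred_onx[1])  # second ONNX output, shape [n, 2]

# argmax over the two score columns picks class 1 for every sample...
print(np.argmax(scores, axis=1))       # -> [1 1 1 1 1 1 1 1 1 1]

# ...yet the first ONNX output (the labels) says class 0 throughout.
print(pred_onx[0])                     # -> [0 0 0 0 0 0 0 0 0 0]

# Column 0 tracks sklearn's decision_function, so the label-consistent
# rule here is the sign of column 0, not the argmax over the columns.
print((scores[:, 0] > 0).astype(int))  # -> [0 0 0 0 0 0 0 0 0 0]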

Any suggestions would be appreciated, and let me know if I have misunderstood your code, or if anything from my question is unclear.

Best wishes,

Qi
