Batch predictions for nearest neighbors #293
Conversation
@@ -5,498 +5,236 @@
# --------------------------------------------------------------------------
Why do we need to change the entire KNN converter implementation to support batch prediction?
I wanted to reuse the cdist optimisation I made for the GaussianProcess. It was faster that way.
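The cdist optimisation mentioned above can be sketched in plain numpy (names illustrative, not the converter's actual code): instead of computing distances one test sample at a time, the whole test batch is handled with one batched expression, which maps to a handful of ONNX nodes.

```python
import numpy as np

def cdist_sqeuclidean(XA, XB):
    """Squared euclidean distances between all rows of XA and XB in one
    batched expression, using ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2."""
    sqA = (XA ** 2).sum(axis=1)[:, None]   # shape (n, 1)
    sqB = (XB ** 2).sum(axis=1)[None, :]   # shape (1, m)
    return sqA - 2.0 * XA @ XB.T + sqB     # shape (n, m)

# sanity check against a per-sample loop
rng = np.random.RandomState(0)
XA, XB = rng.randn(5, 3), rng.randn(7, 3)
loop = np.array([[((a - b) ** 2).sum() for b in XB] for a in XA])
assert np.allclose(cdist_sqeuclidean(XA, XB), loop)
```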
if training_labels.dtype == np.int32:
    training_labels = training_labels.astype(np.int64)
extracted = OnnxArrayFeatureExtractor(
Is OnnxArrayFeatureExtractor different from ArrayFeatureExtractor? The reason the KNN converter couldn't handle batch prediction was ArrayFeatureExtractor: it can only handle one homogeneous set of indices (Y is a tensor of int), which means you can't extract different values for different test examples (based on their nearest neighbours).
That's the same. It is used to extract the neighbours' labels. It is equivalent to GatherElements when the target dimension is one, but it is still needed when the target dimension is more than one. The previous version had more than one ArrayFeatureExtractor.
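The per-row gather the two comments above are discussing can be illustrated in numpy (toy data, names illustrative): each test example has its own TopK indices, so a single shared index list cannot extract the right labels for a whole batch.

```python
import numpy as np

training_labels = np.array([10, 20, 30, 40, 50])
# TopK indices for a batch of 3 test examples with k=2: each row differs.
topk_idx = np.array([[0, 2],
                     [4, 1],
                     [3, 3]])

# Per-row gather (GatherElements-style): each row gets its own labels.
neighbor_labels = training_labels[topk_idx]
assert neighbor_labels.tolist() == [[10, 30], [50, 20], [40, 40]]

# One homogeneous index set can only extract the SAME positions for
# every test example, which is why batch prediction failed before:
shared = training_labels[np.array([0, 2])]
assert shared.tolist() == [10, 30]
```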
@@ -150,3 +153,33 @@ def _onnx_cdist_minkowski(X, Y, dtype=None, op_version=None, p=2, **kwargs):
                      op_version=op_version)
    return OnnxTranspose(node[1], perm=[1, 0], op_version=op_version,
                         **kwargs)


def _onnx_cdist_manhattan(X, Y, dtype=None, op_version=None, **kwargs):
Do we need to implement such a function for every metric we want to support?
That seemed easy to do. I refactored to avoid duplicated code.
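A minimal sketch of the refactoring pattern being described (hypothetical names, not the actual converter code): one small function per metric plus a dispatch table, so adding a metric does not duplicate the surrounding logic.

```python
import numpy as np

def _cdist_sqeuclidean(XA, XB):
    return ((XA[:, None, :] - XB[None, :, :]) ** 2).sum(axis=2)

def _cdist_manhattan(XA, XB):
    return np.abs(XA[:, None, :] - XB[None, :, :]).sum(axis=2)

# dispatch table: adding a metric means adding one entry here
_METRICS = {'sqeuclidean': _cdist_sqeuclidean,
            'cityblock': _cdist_manhattan}

def cdist(XA, XB, metric='sqeuclidean'):
    try:
        fct = _METRICS[metric]
    except KeyError:
        raise ValueError("Unsupported metric %r" % metric)
    return fct(np.asarray(XA, dtype=np.float64),
               np.asarray(XB, dtype=np.float64))
```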
def _onnx_cdist_manhattan(X, Y, dtype=None, op_version=None, **kwargs):
    """
    Returns the ONNX graph which computes the Minkowski distance
Minkowski distance?
Manhattan. Sorry, wrong copy-paste.
from ..common.data_types import Int64TensorType
from ..algebra.onnx_ops import (
    OnnxTopK, OnnxMul, OnnxArrayFeatureExtractor,
It's easier to locate these if sorted, given there are so many.
Done
def _get_weights(scope, container, topk_values_name, distance_power):
    Retrieves the nearest neigbours *ONNX*.
    :param X: features or *OnnxOperatorMixin*
    :param Y: neighbours or *OnnxOperatorMixin*
Y can be confusing here. Can you name them appropriately instead of X and Y?
Replaced by XA, XB
opv = container.target_opset
dtype = container.dtype

if X.type.__class__ == Int64TensorType:
Can't we use isinstance()?
Not all cdist return integer, it seems easier to cast now than after.
I meant can't you write:

    if isinstance(X, Int64TensorType)

instead of

    if X.type.__class__ == Int64TensorType
Done
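The "cast now rather than after" point above can be sketched in numpy (illustrative helper name, not the converter's actual API): distance computations produce floats for most metrics anyway, so casting integer inputs once, up front, keeps every downstream node on a single floating-point type.

```python
import numpy as np

def prepare_input(X):
    """Cast integer inputs to float before any distance computation,
    so no cast is needed after each distance node."""
    X = np.asarray(X)
    if np.issubdtype(X.dtype, np.integer):
        return X.astype(np.float32)
    return X

X_int = np.array([[1, 2], [3, 4]], dtype=np.int64)
assert prepare_input(X_int).dtype == np.float32
# floating-point inputs pass through unchanged
assert prepare_input(np.ones((2, 2), dtype=np.float32)).dtype == np.float32
```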
k = op.n_neighbors
training_labels = op._y if hasattr(op, '_y') else None
distance_kwargs = {}
if metric == 'minkowski':
Aren't we handling this with cdist?
I preferred to have a cdist function which follows the scipy implementation, and to let the converter choose a shorter path when it is more appropriate.
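A numpy sketch of the "shorter path" idea (illustrative code, not the converter itself): the generic minkowski formula reduces to manhattan for p=1 (and euclidean for p=2), so when the parameters allow it the converter can emit the cheaper specialised graph and still match scipy.

```python
import numpy as np

def cdist_minkowski(XA, XB, p=2):
    d = np.abs(XA[:, None, :] - XB[None, :, :])
    return (d ** p).sum(axis=2) ** (1.0 / p)

def cdist_manhattan(XA, XB):
    return np.abs(XA[:, None, :] - XB[None, :, :]).sum(axis=2)

rng = np.random.RandomState(0)
XA, XB = rng.randn(4, 3), rng.randn(5, 3)
# p=1 reduces to manhattan: same result, fewer operations
assert np.allclose(cdist_minkowski(XA, XB, p=1), cdist_manhattan(XA, XB))
```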
@@ -61,14 +61,14 @@ def _onnx_squareform_pdist_sqeuclidean(X, dtype=None, op_version=None,
    return node[1]


-def onnx_cdist(X, Y, metric='sqeuclidean', dtype=None,
+def onnx_cdist(XA, XB, metric='sqeuclidean', dtype=None,
Any reason for having the names in capitals? PEP 8 recommends variable names be in lower case.
I reused the same names scipy is using.
This pull request introduces 4 alerts when merging 47f0afa into 4b769c5 - view on LGTM.com new alerts:
I get an error with the int labels in the regressor:

    data = load_digits()
    model = KNeighborsRegressor().fit(X_train, y_train)
    model_onnx = convert_sklearn(
        model, 'knn',
        [('input', Int64TensorType([None, X_test.shape[1]]))])

    Fail Traceback (most recent call last)
    ~/Documents/MachineLearning/onnx_projects/tmp_env/lib/python3.6/site-packages/onnxruntime/capi/session.py in __init__(self, path_or_bytes, sess_options)
    ~/Documents/MachineLearning/onnx_projects/tmp_env/lib/python3.6/site-packages/onnxruntime/capi/session.py in _load_model(self, providers)
    Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from knn.onnx failed:
    Type Error: Type (tensor(float)) of output arg (variable) of node
    (Re_ReduceMean) does not match expected type (tensor(int64)).
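The type mismatch in that error is consistent with how the regressor averages neighbour labels: even when the training labels are int64, their mean is floating point, so a graph that declares the ReduceMean output as tensor(int64) cannot load. A minimal numpy sketch of the mismatch:

```python
import numpy as np

labels = np.array([3, 4], dtype=np.int64)  # int64 neighbour labels
pred = labels.mean()                       # averaging yields a float
assert pred.dtype == np.float64
assert pred == 3.5
# so the int64 labels need an explicit Cast to float before the
# ReduceMean, rather than declaring its output as tensor(int64)
```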
training_labels = training_labels.ravel()
axis = 1

if training_labels.dtype == np.int32:
Where do you handle string labels?
Apparently I did not, which also means it is not covered by any unit test. I'll need to add more tests tomorrow.
Yeah, we don't have unit tests for that. Also, if you are updating the unit tests, can you make them use fit_classification_model() and fit_regression_model() from tests_helper.py.
All fixed.
The last failure comes from the neighbours: when several neighbours are at the exact same distance, scikit-learn and onnx don't necessarily select the same ones.
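The tie problem above can be reproduced in a few lines of numpy (toy distances): with equal distances, argpartition (what scikit-learn uses) and a runtime's TopK may return different indices, even though the selected distance values are identical.

```python
import numpy as np

dist = np.array([0.5, 0.5, 0.1, 0.5])  # three neighbours tied at 0.5
k = 2

part = np.argpartition(dist, k - 1)[:k]      # scikit-learn style selection
sort = np.argsort(dist, kind='stable')[:k]   # a TopK-style selection

# The selected distances agree, but the chosen indices need not,
# so predictions can differ when ties fall on the k-th neighbour.
assert sorted(dist[part]) == sorted(dist[sort])
```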
Okay.
Also, I see mismatches with integer features in the KNN regressor:

    X, y = make_regression(n_samples=1000, n_features=100, random_state=42)
This pull request introduces 1 alert when merging 578e789 into dd9159c - view on LGTM.com new alerts:
if np.issubdtype(op.classes_.dtype, np.floating):
    classes = op.classes_.astype(np.int32)
elif np.issubdtype(op.classes_.dtype, np.signedinteger):
Both the elif and else branches are the same; you can just remove the elif part.
Converts *KNeighborsRegressor* into *ONNX*.
The converted model may return different predictions depending
on how the runtime select the topk element.
*sciki-learn* uses function `argpartition
scikit
training_labels = training_labels.ravel()
axis = 1

if training_labels.dtype == np.int32:
Okay.
StrictVersion(onnxruntime.__version__) < StrictVersion("0.5.0"),
reason="not available")
def test_model_knn_classifier_multi_class_string(self):
    model, X = self._fit_model_multiclass_classification(
We may want to clean up this unit test and use utility functions defined in test_utils.py later.
No description provided.