Fix parallel backend neighbors (#12172)
tomMoral authored and jnothman committed Oct 15, 2018
1 parent a152cce commit cec0fba
Showing 3 changed files with 50 additions and 1 deletion.
26 changes: 26 additions & 0 deletions doc/whats_new/v0.20.rst
@@ -2,6 +2,32 @@

.. currentmodule:: sklearn

.. _changes_0_20_1:

Version 0.20.1
==============

**October XX, 2018**

This is a bug-fix release with some minor documentation improvements and
enhancements to features released in 0.20.0.

- |Efficiency| make :class:`cluster.MeanShift` no longer try to do nested
parallelism as the overhead would hurt performance significantly when
``n_jobs > 1``.
:issue:`12159` by :user:`Olivier Grisel <ogrisel>`.

- |Fix| :func:`linear_model.SGDClassifier` and variants
with ``early_stopping=True`` would not use a consistent validation
split in the multiclass case and this would cause a crash when using
those estimators as part of parallel parameter search or cross-validation.
:issue:`12122` by :user:`Olivier Grisel <ogrisel>`.

- |Fix| force the parallelism backend to :code:`threading` for
:class:`neighbors.KDTree` and :class:`neighbors.BallTree` in Python 2.7 to
avoid pickling errors caused by the serialization of their methods.
:issue:`12171` by :user:`Thomas Moreau <tomMoral>`

.. _changes_0_20:

Version 0.20.0
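
The last changelog entry above says that a thread-based backend avoids pickling errors. A minimal, standard-library-only sketch of why: threads share the parent's memory, so a bound method never has to be serialized before it is called. `UnpicklableTree` and `can_pickle` are hypothetical names for illustration, not scikit-learn or joblib code, and the lock is only an assumed way to make the object unpicklable.

```python
import pickle
import threading
from concurrent.futures import ThreadPoolExecutor


class UnpicklableTree:
    """Hypothetical stand-in for a tree whose bound methods cannot be pickled."""

    def __init__(self, points):
        self.points = list(points)
        # A lock cannot be pickled, so neither can a bound method of this object.
        self._lock = threading.Lock()

    def query(self, x):
        # Return the stored point closest to x.
        return min(self.points, key=lambda p: abs(p - x))


def can_pickle(obj):
    """Report whether pickle can serialize obj."""
    try:
        pickle.dumps(obj)
    except Exception:
        return False
    return True


tree = UnpicklableTree([0.0, 1.0, 5.0, 10.0])
queries = [0.4, 4.0, 9.0]

# A process-based backend would have to pickle tree.query to ship it to workers.
picklable = can_pickle(tree.query)

# A thread pool calls the bound method directly, with no serialization step.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(tree.query, queries))

print(picklable, results)
```

The thread pool here only stands in for a threading backend; it does not reproduce the joblib machinery this commit touches.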
4 changes: 3 additions & 1 deletion sklearn/neighbors/base.py
@@ -9,6 +9,7 @@
from functools import partial
from distutils.version import LooseVersion

import sys
import warnings
from abc import ABCMeta, abstractmethod

@@ -429,7 +430,8 @@ class from an array representing our data set and ask who's
raise ValueError(
"%s does not work with sparse matrices. Densify the data, "
"or set algorithm='brute'" % self._fit_method)
-if LooseVersion(joblib_version) < LooseVersion('0.12'):
+if (sys.version_info < (3,) or
+LooseVersion(joblib_version) < LooseVersion('0.12')):
# Deal with change of API in joblib
delayed_query = delayed(self._tree.query,
check_pickle=False)
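
The condition in this hunk can be read as a small predicate: take the `check_pickle=False` path on Python 2 or when joblib is older than 0.12. The sketch below restates it under some assumptions: `parse_version` and `needs_check_pickle_false` are hypothetical helpers, not part of scikit-learn, and a plain tuple comparison replaces `LooseVersion`, since `distutils` was removed from the standard library in Python 3.12.

```python
import sys


def parse_version(text):
    """Turn a dotted version such as '0.12.5' into a tuple of ints."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break  # stop at suffixes such as 'dev0' or 'rc1'
        parts.append(int(piece))
    return tuple(parts)


def needs_check_pickle_false(python_version, joblib_version):
    """Mirror the diff's condition: Python 2, or joblib older than 0.12."""
    return python_version < (3,) or parse_version(joblib_version) < (0, 12)


# Evaluate the predicate for the running interpreter and an assumed joblib version.
current = needs_check_pickle_false(tuple(sys.version_info[:2]), "0.12.5")
print(current)
```

Taking the interpreter version as an argument, instead of reading `sys.version_info` inside the function, keeps the predicate easy to test for both Python 2 and Python 3 inputs.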
21 changes: 21 additions & 0 deletions sklearn/neighbors/tests/test_neighbors.py
@@ -27,6 +27,8 @@
from sklearn.utils.testing import ignore_warnings
from sklearn.utils.validation import check_random_state

from sklearn.externals.joblib import parallel_backend

rng = np.random.RandomState(0)
# load and shuffle iris dataset
iris = datasets.load_iris()
@@ -1316,6 +1318,25 @@ def test_same_radius_neighbors_parallel(algorithm):
assert_array_almost_equal(graph, graph_parallel)


@pytest.mark.parametrize('backend', ['loky', 'multiprocessing', 'threading'])
@pytest.mark.parametrize('algorithm', ALGORITHMS)
def test_knn_forcing_backend(backend, algorithm):
# Non-regression test which ensures the knn methods work properly
# even when the global joblib backend is forced.
with parallel_backend(backend):
X, y = datasets.make_classification(n_samples=30, n_features=5,
n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y)

clf = neighbors.KNeighborsClassifier(n_neighbors=3,
algorithm=algorithm,
n_jobs=3)
clf.fit(X_train, y_train)
clf.predict(X_test)
clf.kneighbors(X_test)
clf.kneighbors_graph(X_test, mode='distance').toarray()


def test_dtype_convert():
classifier = neighbors.KNeighborsClassifier(n_neighbors=1)
CLASSES = 15