
[MRG] BUG ensure object array are properly casted when dtype=object #16076

Merged
TomDLT merged 32 commits into scikit-learn:master on Jan 15, 2020


Contributor

@alexshacked commented Jan 9, 2020

closes #16036

Fixes a bug where calling np.array(..., dtype=object) creates an N-D array, while the algorithms expect a 1-D array of objects (similar to a list).
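A minimal NumPy sketch of the pitfall described above. The neighbor-index arrays are made-up stand-ins for what a radius-based neighbors query can return (one array per query point); they are not taken from the linked issue.

```python
import numpy as np

# Made-up neighbor-index arrays, one per query point (illustrative only).
same_length = [np.array([0, 1]), np.array([2, 3])]
ragged = [np.array([0]), np.array([1, 2])]

# Equal-length items: NumPy descends into the inner arrays and builds a
# 2-D object array of scalars instead of a 1-D array of arrays.
as_2d = np.array(same_length, dtype=object)
print(as_2d.shape)  # (2, 2)

# Ragged items: the same call happens to produce the 1-D array of arrays
# that the calling code expects.
as_1d = np.array(ragged, dtype=object)
print(as_1d.shape)  # (2,)
```

Whether the caller gets a 1-D or a 2-D array therefore depends on the data itself, which is what makes np.array(..., dtype=object) fragile here.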

Member

@thomasjpfan left a comment


Thank you for the PR, @alexshacked!

sklearn/neighbors/tests/test_neighbors.py (outdated)
sklearn/neighbors/tests/test_neighbors.py (outdated)
@glemaitre changed the title from "[MRG] Using dbscan with precomputed neighbors gives an error in 0.22.…" to "[MRG] BUG ensure object array are properly casted when dtype=object" Jan 9, 2020
Contributor

@glemaitre left a comment


Looks good, apart from a couple of changes.

Please add an entry to the change log at doc/whats_new/v0.20.rst under bug fixes. Like the other entries there, please reference this pull request with :issue: and credit yourself (and other contributors if applicable) with :user:

sklearn/neighbors/_base.py (outdated)
sklearn/neighbors/_base.py (outdated)
sklearn/neighbors/_base.py (outdated)
sklearn/neighbors/tests/test_neighbors.py (outdated)
sklearn/neighbors/tests/test_neighbors.py (outdated)
sklearn/neighbors/tests/test_neighbors.py (outdated)
sklearn/neighbors/_base.py (outdated)
sklearn/neighbors/_base.py (outdated)
@jnothman
Member

jnothman commented Jan 9, 2020 via email

@glemaitre
Contributor

Oh I see, I was searching for np.array(..., dtype=object).
Then I agree to either keep it as it is now (I would rename it _to_object_array) or even move it into utils if we have something similar in another file. I will look at it.

@glemaitre
Contributor

So we need to call _to_object_array for sklearn.neighbors._base: l.945-948; l.952-953

NB: I searched for the pattern [:] = and kept only the cases where it was preceded by the creation of a NumPy object array.
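The search described above can be approximated with a short script. This is a hypothetical sketch: scan_source and scan_tree are names made up here, and the script assumes the object array is created on the line immediately before the [:] = assignment, which may be stricter than the manual search.

```python
import re
from pathlib import Path

# A slice assignment such as "arr[:] = ...".
SLICE_ASSIGN = re.compile(r"\[:\]\s*=")
# A line that creates an object array, e.g. "np.empty(n, dtype=object)".
OBJECT_ARRAY = re.compile(r"dtype\s*=\s*object")


def scan_source(text):
    """Return (line_number, line) for every "[:] =" assignment whose
    immediately preceding line mentions dtype=object."""
    lines = text.splitlines()
    hits = []
    for i in range(1, len(lines)):
        if SLICE_ASSIGN.search(lines[i]) and OBJECT_ARRAY.search(lines[i - 1]):
            hits.append((i + 1, lines[i].strip()))
    return hits


def scan_tree(root):
    """Apply scan_source to every .py file under root."""
    return [
        (str(path), lineno, line)
        for path in sorted(Path(root).rglob("*.py"))
        for lineno, line in scan_source(path.read_text(encoding="utf-8"))
    ]


# Usage on the pattern found in sklearn/preprocessing/tests/test_label.py:
snippet = (
    "tuple_classes = np.empty(3, dtype=object)\n"
    "tuple_classes[:] = [(1,), (2,), (3,)]\n"
)
print(scan_source(snippet))  # [(2, 'tuple_classes[:] = [(1,), (2,), (3,)]')]
```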

sklearn/neighbors/_base.py (outdated)
sklearn/neighbors/_base.py (outdated)
sklearn/neighbors/_base.py (outdated)
sklearn/neighbors/_base.py (outdated)
sklearn/neighbors/_base.py (outdated)
@jnothman
Member

jnothman commented Jan 9, 2020

> So we need to call _to_object_array for sklearn.neighbors._base: l.945-948; l.952-953

I see similar patterns here:

sklearn/preprocessing/tests/test_label.py=435=def test_multilabel_binarizer_non_integer_labels():
sklearn/preprocessing/tests/test_label.py:436:    tuple_classes = np.empty(3, dtype=object)
sklearn/preprocessing/tests/test_label.py-437-    tuple_classes[:] = [(1,), (2,), (3,)]
--
sklearn/neighbors/_classification.py:541:            pred_labels = np.zeros(len(neigh_ind), dtype=object)
sklearn/neighbors/_classification.py-542-            pred_labels[:] = [_y[ind, k] for ind in neigh_ind]

but otherwise I agree it's all in radius_neighbors.

@alexshacked
Contributor Author

@glemaitre, should the change log entry go in v0.20.rst? I thought it was v0.23.rst.

@glemaitre
Contributor

v0.23.rst

@glemaitre
Contributor

Oops, my auto-reply is broken :)

@alexshacked
Contributor Author

OK, v0.23 then. Thanks, @glemaitre.

alexshacked and others added 11 commits January 10, 2020 01:12
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Contributor

@glemaitre left a comment


We will need to apply the _to_object_array function to the following lines:

sklearn/preprocessing/tests/test_label.py=435=def test_multilabel_binarizer_non_integer_labels():
sklearn/preprocessing/tests/test_label.py:436:    tuple_classes = np.empty(3, dtype=object)
sklearn/preprocessing/tests/test_label.py-437-    tuple_classes[:] = [(1,), (2,), (3,)]

I propose adding a docstring (as you did earlier) and moving the _to_object_array function into sklearn/utils/__init__.py. Then we can import it in neighbors and preprocessing.

We just need to add a small test in sklearn/utils/tests/test_utils.py to check the expected behavior:

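import numpy as np
import pytest
from sklearn.utils import _to_object_array

@pytest.mark.parametrize(
    "sequence",
    [[np.array(1), np.array(2)], [[1, 2], [3, 4]]]
)
def test_to_object_array(sequence):
    out = _to_object_array(sequence)
    # The result must be a 1-D object array, never an N-D array.
    assert isinstance(out, np.ndarray)
    assert out.dtype.kind == 'O'
    assert out.ndim == 1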

doc/whats_new/v0.23.rst (outdated)
doc/whats_new/v0.23.rst (outdated)
doc/whats_new/v0.23.rst (outdated)
sklearn/neighbors/_base.py (outdated)
alexshacked and others added 2 commits January 10, 2020 15:44
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
alexshacked and others added 3 commits January 10, 2020 15:48
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
@alexshacked
Contributor Author

Hi @glemaitre. I moved the function to_object_array() to sklearn.utils and updated the message in the v0.23.rst change log.

Contributor

@glemaitre left a comment


Apart from making the function private, LGTM. @alexshacked, you can accept my suggestion and that will be enough.

sklearn/utils/__init__.py (outdated)
sklearn/preprocessing/tests/test_label.py (outdated)
sklearn/utils/__init__.py (outdated)
sklearn/neighbors/_base.py (outdated)
sklearn/neighbors/_base.py (outdated)
sklearn/neighbors/_base.py (outdated)
sklearn/neighbors/_base.py (outdated)
sklearn/neighbors/_base.py (outdated)
sklearn/neighbors/_base.py (outdated)
sklearn/neighbors/_base.py (outdated)
@alexshacked
Contributor Author

alexshacked commented Jan 10, 2020

Sorry about this, @glemaitre. I thought a single leading underscore meant private to the class, not private to the package. I will restore the underscore.

alexshacked and others added 11 commits January 10, 2020 20:18
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
@glemaitre added this to the 0.22.2 milestone Jan 13, 2020
@glemaitre
Contributor

LGTM. @jnothman @thomasjpfan, could you have a look? I added the regression tag and tagged it as a candidate for 0.22.2.

@TomDLT approved these changes Jan 13, 2020
Member

@TomDLT left a comment


LGTM

sklearn/utils/__init__.py
doc/whats_new/v0.23.rst (outdated)
sklearn/utils/__init__.py (outdated)
@alexshacked
Contributor Author

Thanks for your comments, @TomDLT. I will apply them all.

@TomDLT merged commit c4ea377 into scikit-learn:master Jan 15, 2020
@TomDLT
Member

TomDLT commented Jan 15, 2020

Thanks @alexshacked !

thomasjpfan pushed a commit to thomasjpfan/scikit-learn that referenced this pull request Feb 22, 2020
jeremiedbb pushed a commit to jeremiedbb/scikit-learn that referenced this pull request Feb 28, 2020
ogrisel added a commit that referenced this pull request Feb 28, 2020
* FIX ensure object array are properly casted when dtype=object (#16076)

* DOC Docstring example of classifier should import classifier (#16430)

* MNT Update nightly build URL and release staging config (#16435)

* BUG ensure that estimator_name is properly stored in the ROC display (#16500)

* BUG ensure that name is properly stored in the precision/recall display (#16505)

* ENH Perform KNN imputation without O(n^2) memory cost (#16397)

* bump scikit-learn version for binder

* bump version to 0.22.2

* MNT Skips failing SpectralCoclustering doctest (#16232)

* TST Updates test for deprecation in pandas.SparseArray (#16040)

* move 0.22.2 what's new entries (#16586)

* add 0.22.2 in the news of the web site frontpage

* skip test_ard_accuracy_on_easy_problem

Co-authored-by: alexshacked <al.shacked@gmail.com>
Co-authored-by: Oleksandr Pavlyk <oleksandr-pavlyk@users.noreply.github.com>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Joel Nothman <joel.nothman@gmail.com>
Co-authored-by: Thomas J Fan <thomasjpfan@gmail.com>
panpiort8 pushed a commit to panpiort8/scikit-learn that referenced this pull request Mar 3, 2020

Successfully merging this pull request may close these issues.

Using dbscan with precomputed neighbors gives an error in 0.22.X, but not in 0.21.3.
5 participants