New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] BUG ensure object array are properly casted when dtype=object #16076
Conversation
…X, but not in 0.21.3 (scikit-learn#16036)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR @alexshacked !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. A couple of changes.
Please add an entry to the change log at doc/whats_new/v0.20.rst under bug fixes. Like the other entries there, please reference this pull request with :issue: and credit yourself (and other contributors if applicable) with :user:
I see a similar idiom three times in the radius_neighbors method in this
file
|
Oh I see, I was seeking for |
So we need to call NB: I searched for the patter |
I see similar here:
but otherwise agree it's all in radius_neighbors |
@glemaitre change log is in v0.20.rst? I thought v0.23.rst |
v0.23.rst |
Ups my automatic answering is broken :) |
…in regression testing also distances
ok. v0.23 then. Thanks @glemaitre |
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will need to apply the _to_object_array
function on the following line:
sklearn/preprocessing/tests/test_label.py=435=def test_multilabel_binarizer_non_integer_labels():
sklearn/preprocessing/tests/test_label.py:436: tuple_classes = np.empty(3, dtype=object)
sklearn/preprocessing/tests/test_label.py-437- tuple_classes[:] = [(1,), (2,), (3,)]
I propose to add a docstring (as you did earlier) and move the _to_object_array
function in sklearn/utils/__init__.py
. Then, we can import it in neighbors and preprocessing.
We just need to add a small test in sklearn/utils/tests/test_utils.py
to check the expected behavior:
@pytest.mark.parametrize(
"sequence",
[[np.array(1), np.array(2)], [[1, 2], [3, 4]]]
)
test_to_object_array(sequence):
out = _to_object_array(sequence)
assert isinstance(out, ndarray)
assert out.dtype.kind == 'O'
assert out.ndim == 1
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
…moved to_object_array() to sklearn.utils
… fix PEP8 errors
Hi @glemaitre. Moved function to_object_array() to sklearn.utils and changed the message in the change log of v0.23.rst |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart of making the function private LGTM. @alexshacked you can accept my suggestion and this would be enough.
Sorry about this @glemaitre . I thought one underscore means private inside the class, not private inside the package. Will restore the underscore |
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
…removed unnecessary comment
…restored underscore
LGTM. @jnothman @thomasjpfan Could you have a look. I added the regression tag and tag it as a candidate for 0.22.2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks for your comments @TomDLT. Will apply them all. |
…improved _to_object_array() documentation
Thanks @alexshacked ! |
* FIX ensure object array are properly casted when dtype=object (#16076) * DOC Docstring example of classifier should import classifier (#16430) * MNT Update nightly build URL and release staging config (#16435) * BUG ensure that estimator_name is properly stored in the ROC display (#16500) * BUG ensure that name is properly stored in the precision/recall display (#16505) * ENH Perform KNN imputation without O(n^2) memory cost (#16397) * bump scikit-learn version for binder * bump version to 0.22.2 * MNT Skips failing SpectralCoclustering doctest (#16232) * TST Updates test for deprecation in pandas.SparseArray (#16040) * move 0.22.2 what's new entries (#16586) * add 0.22.2 in the news of the web site frontpage * skip test_ard_accuracy_on_easy_problem Co-authored-by: alexshacked <al.shacked@gmail.com> Co-authored-by: Oleksandr Pavlyk <oleksandr-pavlyk@users.noreply.github.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> Co-authored-by: Joel Nothman <joel.nothman@gmail.com> Co-authored-by: Thomas J Fan <thomasjpfan@gmail.com>
closes #16036
Fix a bug where calling
np.array(..., dtype=object)
will create a N-D array while algorithms are expecting a 1-D array with objects inside (similar to a list).