New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

check_array 'accept_sparse' parameter quirk since 0.17 #7880

Closed
lesteve opened this Issue Nov 15, 2016 · 6 comments

Comments

Projects
None yet
3 participants
@lesteve
Member

lesteve commented Nov 15, 2016

Description

Since 0.17 accept_sparse can be any type without any exception being raised. It would be great to add a test like this and make sure that we get an exception. Also I am not sure whether we accept accept_sparse=True. Looking at the code we do handle the specific case of accept_sparse=False. If we accept booleans at the very least the docstring should be amended.

Steps/Code to Reproduce

from scipy import sparse
from sklearn.utils import check_array

class MyClass(object): pass

garbage_parameter = MyClass()
check_array(sparse.csr_matrix([1, 2]), accept_sparse=garbage_parameter)

Expected Results

Some exception being raised because the input parameter does not mean anything

Actual Results

No exception raised

@jkarno

This comment has been minimized.

Show comment
Hide comment
@jkarno

jkarno Nov 15, 2016

Contributor

I can work on this.

A little unclear about your boolean comment. Do you think booleans should be allowed, or should we take out the False case?

Contributor

jkarno commented Nov 15, 2016

I can work on this.

A little unclear about your boolean comment. Do you think booleans should be allowed, or should we take out the False case?

@lesteve

This comment has been minimized.

Show comment
Hide comment
@lesteve

lesteve Nov 15, 2016

Member

Do you think booleans should be allowed, or should we take out the False case?

Not sure to be honest, accept_sparse=False is the same as accept_sparse=None it seems (I may be wrong about that, please double-check). accept_sparse=False does not seem to be used in the scikit-learn code. accept_sparse=True on the other hand seems used. I guess it can be useful if you want to allow any kind of sparse matrices. At the very least this should be documented in the docstring of check_array and all similar functions like check_X_y, etc ... use git grep -P 'accept_sparse ?:'
to get all the occurences.

At the moment I am guessing that the code path that handles accept_sparse=True is the same that the ones that handles accept_sparse=MyClass() (probably no if clause is hit). This should be tidied up by checking that accept_sparse is either None or a boolean or a Sequence. Adding tests with edge cases will be needed too.

For reference, the commit that introduced the changes in 0.17: 89c1018

Member

lesteve commented Nov 15, 2016

Do you think booleans should be allowed, or should we take out the False case?

Not sure to be honest, accept_sparse=False is the same as accept_sparse=None it seems (I may be wrong about that, please double-check). accept_sparse=False does not seem to be used in the scikit-learn code. accept_sparse=True on the other hand seems used. I guess it can be useful if you want to allow any kind of sparse matrices. At the very least this should be documented in the docstring of check_array and all similar functions like check_X_y, etc ... use git grep -P 'accept_sparse ?:'
to get all the occurences.

At the moment I am guessing that the code path that handles accept_sparse=True is the same that the ones that handles accept_sparse=MyClass() (probably no if clause is hit). This should be tidied up by checking that accept_sparse is either None or a boolean or a Sequence. Adding tests with edge cases will be needed too.

For reference, the commit that introduced the changes in 0.17: 89c1018

@amueller

This comment has been minimized.

Show comment
Hide comment
@amueller

amueller Nov 22, 2016

Member

@jkarno are you working on this? Then I'll remove the Need Contributor tag.

Member

amueller commented Nov 22, 2016

@jkarno are you working on this? Then I'll remove the Need Contributor tag.

@jkarno

This comment has been minimized.

Show comment
Hide comment
@jkarno

jkarno Nov 22, 2016

Contributor

Yep, I'm still working on it.

Contributor

jkarno commented Nov 22, 2016

Yep, I'm still working on it.

@amueller

This comment has been minimized.

Show comment
Hide comment
@amueller

amueller Nov 23, 2016

Member

Hm there are some weird issues with the current code-paths.
The fact that we convert strings to lists but not anything else, for example.
Also, the docstring of _ensure_sparse_format suggests that it accepts strings, but I don't think this is true. The casting of strings to lists should probably happen in _ensure_sparse_format, not in check_array (I'm pretty sure I'm guilty of introducing this mess).

I guess in _ensure_sparse_format we want

if accept_sparse in [None, False]:
    raise TypeError(...)
elif isinstance(accept_sparse, (list, tuple)):
    ...
else:
    raise ValueError("invalid parameter")

The question is whether we want to ensure that all elements of a list that is passed are sensible.
I don't think that's necessary. So if you pass ["csr", object()] I'm fine with not raising a ValueError, because there is pretty clear semantics on what's supposed to happen.

Member

amueller commented Nov 23, 2016

Hm there are some weird issues with the current code-paths.
The fact that we convert strings to lists but not anything else, for example.
Also, the docstring of _ensure_sparse_format suggests that it accepts strings, but I don't think this is true. The casting of strings to lists should probably happen in _ensure_sparse_format, not in check_array (I'm pretty sure I'm guilty of introducing this mess).

I guess in _ensure_sparse_format we want

if accept_sparse in [None, False]:
    raise TypeError(...)
elif isinstance(accept_sparse, (list, tuple)):
    ...
else:
    raise ValueError("invalid parameter")

The question is whether we want to ensure that all elements of a list that is passed are sensible.
I don't think that's necessary. So if you pass ["csr", object()] I'm fine with not raising a ValueError, because there is pretty clear semantics on what's supposed to happen.

@lesteve

This comment has been minimized.

Show comment
Hide comment
@lesteve

lesteve Nov 23, 2016

Member

The question is whether we want to ensure that all elements of a list that is passed are sensible.
I don't think that's necessary. So if you pass ["csr", object()] I'm fine with not raising a ValueError, because there is pretty clear semantics on what's supposed to happen.

Same feeling here.

Member

lesteve commented Nov 23, 2016

The question is whether we want to ensure that all elements of a list that is passed are sensible.
I don't think that's necessary. So if you pass ["csr", object()] I'm fine with not raising a ValueError, because there is pretty clear semantics on what's supposed to happen.

Same feeling here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment