Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

check_array 'accept_sparse' parameter quirk since 0.17 #7880

Closed
lesteve opened this issue Nov 15, 2016 · 6 comments
Closed

check_array 'accept_sparse' parameter quirk since 0.17 #7880

lesteve opened this issue Nov 15, 2016 · 6 comments
Labels

Comments

@lesteve
Copy link
Member

@lesteve lesteve commented Nov 15, 2016

Description

Since 0.17 accept_sparse can be any type without any exception being raised. It would be great to add a test like this and make sure that we get an exception. Also I am not sure whether we accept accept_sparse=True. Looking at the code we do handle the specific case of accept_sparse=False. If we accept booleans at the very least the docstring should be amended.

Steps/Code to Reproduce

from scipy import sparse
from sklearn.utils import check_array

class MyClass(object): pass

garbage_parameter = MyClass()
check_array(sparse.csr_matrix([1, 2]), accept_sparse=garbage_parameter)

Expected Results

Some exception being raised because the input parameter does not mean anything

Actual Results

No exception raised

@jkarno
Copy link
Contributor

@jkarno jkarno commented Nov 15, 2016

I can work on this.

A little unclear about your boolean comment. Do you think booleans should be allowed, or should we take out the False case?

@lesteve
Copy link
Member Author

@lesteve lesteve commented Nov 15, 2016

Do you think booleans should be allowed, or should we take out the False case?

Not sure to be honest, accept_sparse=False is the same as accept_sparse=None it seems (I may be wrong about that, please double-check). accept_sparse=False does not seem to be used in the scikit-learn code. accept_sparse=True on the other hand seems used. I guess it can be useful if you want to allow any kind of sparse matrices. At the very least this should be documented in the docstring of check_array and all similar functions like check_X_y, etc ... use git grep -P 'accept_sparse ?:'
to get all the occurences.

At the moment I am guessing that the code path that handles accept_sparse=True is the same that the ones that handles accept_sparse=MyClass() (probably no if clause is hit). This should be tidied up by checking that accept_sparse is either None or a boolean or a Sequence. Adding tests with edge cases will be needed too.

For reference, the commit that introduced the changes in 0.17: 89c1018

@amueller
Copy link
Member

@amueller amueller commented Nov 22, 2016

@jkarno are you working on this? Then I'll remove the Need Contributor tag.

@jkarno
Copy link
Contributor

@jkarno jkarno commented Nov 22, 2016

Yep, I'm still working on it.

@amueller
Copy link
Member

@amueller amueller commented Nov 23, 2016

Hm there are some weird issues with the current code-paths.
The fact that we convert strings to lists but not anything else, for example.
Also, the docstring of _ensure_sparse_format suggests that it accepts strings, but I don't think this is true. The casting of strings to lists should probably happen in _ensure_sparse_format, not in check_array (I'm pretty sure I'm guilty of introducing this mess).

I guess in _ensure_sparse_format we want

if accept_sparse in [None, False]:
    raise TypeError(...)
elif isinstance(accept_sparse, (list, tuple)):
    ...
else:
    raise ValueError("invalid parameter")

The question is whether we want to ensure that all elements of a list that is passed are sensible.
I don't think that's necessary. So if you pass ["csr", object()] I'm fine with not raising a ValueError, because there is pretty clear semantics on what's supposed to happen.

@lesteve
Copy link
Member Author

@lesteve lesteve commented Nov 23, 2016

The question is whether we want to ensure that all elements of a list that is passed are sensible.
I don't think that's necessary. So if you pass ["csr", object()] I'm fine with not raising a ValueError, because there is pretty clear semantics on what's supposed to happen.

Same feeling here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

3 participants
You can’t perform that action at this time.