-
-
Notifications
You must be signed in to change notification settings - Fork 5.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Informative exception when passing a sparse matrix #7566
Conversation
Proposed solution to scipy#7565
scipy/spatial/distance.py
Outdated
@@ -1364,6 +1364,11 @@ def pdist(X, metric='euclidean', p=None, w=None, V=None, VI=None): | |||
# between all pairs of vectors in X using the distance metric 'abc' but | |||
# with a more succinct, verifiable, but less efficient implementation. | |||
|
|||
|
|||
if scipy.sparse.issparse(X): | |||
raise NotImplementedError("pdist does not support sparse matrices, use skearn's pairwise_distances\nhttp://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.pairwise_distances.html\n") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docs indicate that NotImplementedError
should be used for abstract methods or in-progress methods. I don't think there's a plan to support sparse inputs here, so TypeError
might be more appropriate.
Also, I'd like a shorter message. Maybe something like: "pdist doesn't support sparse matrix inputs, use sklearn.metrics.pairwise_distances instead."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, Done...
Fix @perimosocordiae comments
scipy/spatial/distance.py
Outdated
@@ -1364,6 +1364,11 @@ def pdist(X, metric='euclidean', p=None, w=None, V=None, VI=None): | |||
# between all pairs of vectors in X using the distance metric 'abc' but | |||
# with a more succinct, verifiable, but less efficient implementation. | |||
|
|||
|
|||
if scipy.sparse.issparse(X): | |||
raise TypeError("pdist does not support sparse matrix input, use skearn's pairwise_distances instead") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: skearn -> sklearn
Also wrap the string to fit in 80 columns, please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, Done !
scipy/spatial/distance.py
Outdated
@@ -1364,6 +1364,11 @@ def pdist(X, metric='euclidean', p=None, w=None, V=None, VI=None): | |||
# between all pairs of vectors in X using the distance metric 'abc' but | |||
# with a more succinct, verifiable, but less efficient implementation. | |||
|
|||
|
|||
if scipy.sparse.issparse(X): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
scipy.sparse
needs to be imported before use.
@perimosocordiae , please re-review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks @urigoren!
My final concern is that this will cause scipy.spatial
to depend on importing scipy.sparse
, which will slightly increase the time it takes to run import scipy.spatial
the first time. I don't expect this to be a big slowdown, though, so I'm +1 to merge.
This doesn't look quite right to me:
|
Looks like a documentation change is a better way to go about this. And perhaps it would make sense to move the scikit-learn distance metrics into |
@rgommers , So what do you suggest to do next? Move this check to |
For now, let's just add a warning not to use sparse matrix inputs to the docstring of I do think that eventually |
What are the downsides of merging this PR, only the possible increase in loading time ? I agree with @perimosocordiae , that we definitely want |
No you can't. Measuring import time is quite tricky to get right (after the first import it's a lookup in globals so much faster) - we had some discussion on that on the numpy-discussion list last year. Either way, import time is not the main issue. The issues are the two I listed in #7566 (comment). |
Agreed
+1 |
Just noting that there's a similar utility function instead of a direct use of |
Using `_asarray_validated` instead of `issparse` as @ev-br suggested
Proposed solution to
#7565