-
-
Notifications
You must be signed in to change notification settings - Fork 5.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added sparse nonzero functionality to min, max, argmin, argmax #11899
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the proposed API makes sense, though I think we may want to post to the mailing list to solicit feedback once this PR is ready to go.
scipy/sparse/data.py
Outdated
if out is not None: | ||
raise ValueError(("Sparse matrices do not support " | ||
"an 'out' parameter.")) | ||
|
||
validateaxis(axis) | ||
|
||
if hasattr(self, 'has_canonical_format'): | ||
if not self.has_canonical_format and nonzero: | ||
self.eliminate_zeros() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should be eliminating zeros here. There are use-cases that distinguish between explicit and implicit zeros (like the csgraph module), so we can't always assume that explicit zeros should be ignored. It's also an in-place update, which I wouldn't expect to happen when calling .max()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I introduced this mainly to have compatibility with the non-canonical forms. There could be a couple of different alternative implementations. The first is it could consider explicit zeros that are left in the matrix. The second could be a forced zero elimination if any values are zeros in the data. The main issue with the former is that it could give unexpected behavior if the user didn't call eliminate zeros prior to calling the function. That may be the best alternative especially if the flag is updated from 'nonzero' to 'explicit' along with the documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for changing the name rather than eliminating zeros.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new keyword should be in place. I used "explicit", since it's considering values that are within the actual matrix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should also mention that I reworked the tests to allow the non-canonical sparse matrices to be correctly validated in the 'explicit' case.
return am | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the if-block always returns, I'd drop the else here and preserve the original indentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what you're thinking here. The if block was introduced to bypass the compare operation, otherwise the zero would be considered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I wasn't clear. What I meant was:
if nonzero:
if m == 0:
return []
return am
if compare(m, zero):
return mat.row...
...etc...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I understand now that the elses are redundant. I made those changes.
Added nonzero parsing funcitonality into sparse matrix methods Added sparse nonzero functionality Removed redundant else block Changed working and removed enforced summation Corrected formatting issues
@perimosocordiae Any additional thoughts about the proposed changes? |
I haven't had a chance to look this over again recently, but I'm still +1 on the general idea. I'll be on vacation for the next week, so I probably won't have more concrete comments for a little while. In the mean time, could you send an email to the scipy-dev@python.org mailing list with a summary of the proposed changes (and a link to this PR)? |
considered. If a row/column is empty, a zero will be returned | ||
to indicate it contains no nonzero values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a row/column is empty, a zero will be returned indicate it contains no nonzero values.
This is highly nonstandard behavior, no? max([])
is an exception in Python and csr_matrix((2,0)).max(axis=1)
is an exception in SciPy. I do not think returning 0 from max/min of an empty list is considered correct behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The row/column is empty only in the sense there are no explicit values contained in the row/column, so that in dense form it would only contain zeros. Returning a zero under those conditions is correct because only zeros are present in the row/column.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but when explicit=True
, the implicit zeros are not considered in the min
/max
operation. It is min
/max
ing over an empty list of explicit elements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but when explicit=True
, the implicit zeros are not considered in the min
/max
operation. It is min
/max
ing over an empty list of explicit elements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would returning a NaN be more appropriate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, min
/max
would return None on an empty vector, but you can't do that with a NumPy array. NaN is a fine sentinel for float32 and float64, but there is no equivalent for integer types.
Another possibility is min([]) == inf
and max([]) == -inf
, for float dtypes, and 232-1 and -232 for integer dtypes. But that is pretty ugly.
This is admittedly a hard problem when the data structure does not allow for missing integer values.
Hello, I am interested in picking up this PR if @dloney is not involved anymore. |
Thank you both for working on this. As said in #16467 everyone agreed to continue there. |
Added nonzero parsing functionality into sparse matrix min, max, argmin, and argmax functions.
What does this implement/fix?
This allows a user to selectively consider only the nonzero values in the sparse min, max, argmin, and argmax functions. A programmer will often use a sparse matrix when zeros do not matter. Previous to this change, the programmer will have needed to mathematically manipulate the sparse matrix so that these functions gave the correct result when considering nonzero values. By passing in the nonzero keyword, these functions are now able to consider only nonzero values within the matrix. The change defaults to the existing functionality and only activates these changes if the user activates the input keyword.
This issue exists in several of my existing programs. As a benchmark, this implementation improves performance between 1.15x and 1.35x compared to manipulating the sparse matrix to identify nonzero values.
Additional information
This broadly mirrors the same functionality as implemented in cupy, which should help the compatibility of the packages.