Added sparse nonzero functionality to min, max, argmin, argmax #11899

dloney · 2020-04-20T03:53:08Z

Added nonzero parsing functionality into sparse matrix min, max, argmin, and argmax functions.

What does this implement/fix?

This allows a user to selectively consider only the nonzero values in the sparse min, max, argmin, and argmax functions. A programmer will often use a sparse matrix when zeros do not matter. Previous to this change, the programmer will have needed to mathematically manipulate the sparse matrix so that these functions gave the correct result when considering nonzero values. By passing in the nonzero keyword, these functions are now able to consider only nonzero values within the matrix. The change defaults to the existing functionality and only activates these changes if the user activates the input keyword.

This issue exists in several of my existing programs. As a benchmark, this implementation improves performance between 1.15x and 1.35x compared to manipulating the sparse matrix to identify nonzero values.

Additional information

This broadly mirrors the same functionality as implemented in cupy, which should help the compatibility of the packages.

perimosocordiae

I think the proposed API makes sense, though I think we may want to post to the mailing list to solicit feedback once this PR is ready to go.

perimosocordiae · 2020-04-20T13:28:00Z

scipy/sparse/data.py

        if out is not None:
            raise ValueError(("Sparse matrices do not support "
                              "an 'out' parameter."))

        validateaxis(axis)

+        if hasattr(self, 'has_canonical_format'):
+            if not self.has_canonical_format and nonzero:
+                self.eliminate_zeros()


I don't think we should be eliminating zeros here. There are use-cases that distinguish between explicit and implicit zeros (like the csgraph module), so we can't always assume that explicit zeros should be ignored. It's also an in-place update, which I wouldn't expect to happen when calling .max().

I introduced this mainly to have compatibility with the non-canonical forms. There could be a couple of different alternative implementations. The first is it could consider explicit zeros that are left in the matrix. The second could be a forced zero elimination if any values are zeros in the data. The main issue with the former is that it could give unexpected behavior if the user didn't call eliminate zeros prior to calling the function. That may be the best alternative especially if the flag is updated from 'nonzero' to 'explicit' along with the documentation.

+1 for changing the name rather than eliminating zeros.

The new keyword should be in place. I used "explicit", since it's considering values that are within the actual matrix.

I should also mention that I reworked the tests to allow the non-canonical sparse matrices to be correctly validated in the 'explicit' case.

scipy/sparse/data.py

perimosocordiae · 2020-04-20T13:36:11Z

scipy/sparse/data.py

                        return am
+                else:


Since the if-block always returns, I'd drop the else here and preserve the original indentation.

I'm not sure what you're thinking here. The if block was introduced to bypass the compare operation, otherwise the zero would be considered.

Sorry I wasn't clear. What I meant was:

if nonzero: if m == 0: return [] return am if compare(m, zero): return mat.row... ...etc...

Thanks. I understand now that the elses are redundant. I made those changes.

Added nonzero parsing funcitonality into sparse matrix methods Added sparse nonzero functionality Removed redundant else block Changed working and removed enforced summation Corrected formatting issues

dloney · 2020-07-09T18:36:02Z

@perimosocordiae Any additional thoughts about the proposed changes?

perimosocordiae · 2020-07-11T05:47:44Z

I haven't had a chance to look this over again recently, but I'm still +1 on the general idea.

I'll be on vacation for the next week, so I probably won't have more concrete comments for a little while. In the mean time, could you send an email to the scipy-dev@python.org mailing list with a summary of the proposed changes (and a link to this PR)?

drhagen · 2020-07-15T12:53:30Z

scipy/sparse/data.py

+            considered. If a row/column is empty, a zero will be returned
+            to indicate it contains no nonzero values.


If a row/column is empty, a zero will be returned indicate it contains no nonzero values.

This is highly nonstandard behavior, no? max([]) is an exception in Python and csr_matrix((2,0)).max(axis=1) is an exception in SciPy. I do not think returning 0 from max/min of an empty list is considered correct behavior.

The row/column is empty only in the sense there are no explicit values contained in the row/column, so that in dense form it would only contain zeros. Returning a zero under those conditions is correct because only zeros are present in the row/column.

Yes, but when explicit=True, the implicit zeros are not considered in the min/max operation. It is min/maxing over an empty list of explicit elements.

Would returning a NaN be more appropriate?

Ideally, min/max would return None on an empty vector, but you can't do that with a NumPy array. NaN is a fine sentinel for float32 and float64, but there is no equivalent for integer types.

Another possibility is min([]) == inf and max([]) == -inf, for float dtypes, and 232-1 and -232 for integer dtypes. But that is pretty ugly.

This is admittedly a hard problem when the data structure does not allow for missing integer values.

quantresearch1 · 2022-06-17T00:16:32Z

Hello, I am interested in picking up this PR if @dloney is not involved anymore.
I have merged main into this code and resolved the conflicts here https://github.com/quantresearch1/scipy/tree/sparse_nonzero_dev . Should I raise a PR ?

tupui · 2022-06-25T16:07:54Z

Thank you both for working on this. As said in #16467 everyone agreed to continue there.

tylerjereddy added scipy.sparse enhancement A new feature or improvement labels Apr 20, 2020

perimosocordiae requested changes Apr 20, 2020

View reviewed changes

dloney force-pushed the sparse_nonzero branch from 1d5e28e to 822d9f8 Compare April 21, 2020 02:59

dloney force-pushed the sparse_nonzero branch from 822d9f8 to b8e0fed Compare June 11, 2020 03:30

Added sparse nonzero functionality

6672410

Added nonzero parsing funcitonality into sparse matrix methods Added sparse nonzero functionality Removed redundant else block Changed working and removed enforced summation Corrected formatting issues

dloney force-pushed the sparse_nonzero branch from b8e0fed to 6672410 Compare June 12, 2020 03:24

dloney mentioned this pull request Jul 9, 2020

Added min, max, argmin, argmax to sparse csr and csc matrices cupy/cupy#2711

Merged

leofang mentioned this pull request Jul 13, 2020

Mark nonzero parameters experimental in sparse min/max cupy/cupy#3583

Merged

drhagen reviewed Jul 15, 2020

View reviewed changes

takagi mentioned this pull request Jul 20, 2020

Fix sparse min/max to return sparse matrix cupy/cupy#3536

Merged

quantresearch1 mentioned this pull request Jun 25, 2022

ENH: sparse: add nonzero functionality to min, max, argmin, argmax #16467

Open

tupui closed this Jun 25, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added sparse nonzero functionality to min, max, argmin, argmax #11899

Added sparse nonzero functionality to min, max, argmin, argmax #11899

dloney commented Apr 20, 2020

perimosocordiae left a comment

perimosocordiae Apr 20, 2020

dloney Apr 21, 2020

perimosocordiae May 29, 2020

dloney Jun 11, 2020

dloney Jun 11, 2020

perimosocordiae Apr 20, 2020

dloney Apr 21, 2020

perimosocordiae May 29, 2020

dloney Jun 11, 2020

dloney commented Jul 9, 2020

perimosocordiae commented Jul 11, 2020

drhagen Jul 15, 2020

dloney Jul 23, 2020

drhagen Jul 26, 2020

drhagen Jul 26, 2020

dloney Aug 2, 2020

drhagen Aug 11, 2020

quantresearch1 commented Jun 17, 2022

tupui commented Jun 25, 2022

		considered. If a row/column is empty, a zero will be returned
		to indicate it contains no nonzero values.

Added sparse nonzero functionality to min, max, argmin, argmax #11899

Added sparse nonzero functionality to min, max, argmin, argmax #11899

Conversation

dloney commented Apr 20, 2020

What does this implement/fix?

Additional information

perimosocordiae left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dloney commented Jul 9, 2020

perimosocordiae commented Jul 11, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

quantresearch1 commented Jun 17, 2022

tupui commented Jun 25, 2022