ENH: sparse: Add indexing for 1D arrays #20120

dschult · 2024-02-20T18:43:03Z

Adds support for indexing with 1D arrays.

Builds on #19833 (csr-1d) to provide indexing for 1D CSR arrays, though this code also works for DOK with minor changes after #19715 (dok-1d) gets merged.

The diff will be much easier to read after #19833 is merged. Look at only the last commit to see the diff for just this PR. I will rebase as needed, but I think the CSR PR will be merged soon and the diff here will become cleaner.

- Adds test_indexing1d.py.
- Refactors IndexMixin._validate_indices to use an ndim-independent approach for both 1D and 2D.
- Supported formats need the methods _get_int, _get_slice and _get_array. Indexing dispatches to these after processing the index.
- Tries to incorporate all improvements from the recently merged defect: sparse: 1d bool mask with wrong shape should raise IndexError #19957 (and doesn't change those tests).
- The ndim-independent approach means some helper functions are no longer needed and/or have been inlined. The two remaining helper functions are now methods so they have access to self.ndim. They are _as_indices and _compatible_boolean_array.
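For readers skimming the thread, the dispatch described above can be sketched roughly as follows. This is a toy, dense-backed stand-in and not the SciPy implementation: only the method names `_get_int`, `_get_slice` and `_get_array` come from the description, while `Toy1D` and `_wrap` are hypothetical names made up for the illustration.

```python
class Toy1D:
    """Toy dense-backed 1D container used only to show the dispatch."""

    def __init__(self, values):
        self.values = list(values)
        self.shape = (len(self.values),)

    def _wrap(self, idx):
        # Hypothetical validation step: bounds check, then wrap negatives.
        n = self.shape[0]
        if not -n <= idx < n:
            raise IndexError(f"index ({idx}) out of range")
        return idx % n

    def __getitem__(self, key):
        # Process the index first, then dispatch on its kind.
        if isinstance(key, slice):
            return self._get_slice(key)
        if isinstance(key, int):
            return self._get_int(self._wrap(key))
        # Anything else is assumed to be a sequence of integer positions.
        return self._get_array([self._wrap(k) for k in key])

    # A format would supply its own versions of these three methods.
    def _get_int(self, idx):
        return self.values[idx]

    def _get_slice(self, idx):
        return Toy1D(self.values[idx])

    def _get_array(self, idx):
        return Toy1D([self.values[i] for i in idx])


a = Toy1D([0, 5, 0, 7])
print(a[1], a[-1], a[1:3].values, a[[3, 0]].values)
```

The point of the split is that each format only needs to answer three narrow questions, while index normalization lives in one place.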

Move TestGetSet1d to indexing1d.py. Simplify helper functions.

… fix fancy
This should set us up for nD indexing when the time comes.
Note: ndindex could work for much of this -- but not for sparse array boolean
In the future we could maybe implement sparse integer array indexing.
We should think about whether 0D return should be sparse array or ndarray. Currently ndarray.
This doesn't make reduction functions return sparse arrays yet -- only indexing

dschult · 2024-03-18T04:09:41Z

I've changed the tests to use np.testing assert_equal and assert_allclose in these new tests. I'll create a separate PR to change the new-but-merged-tests to use them too.

I've also added indexing support for np.newaxis/None (though it can't make 3D+). And that was an excuse to revamp some of the getitem/setitem code to be easier to move to nd. Quansight's ndindex library would give almost everything we need for validate_indices, but it doesn't support boolean sparse arrays as indexes. Anyway, we can now index 2d and 1d sparse arrays.

I believe the test failures are not related to this PR.

Note: This PR does not change reduction operations to have them return 1D sparse arrays. That will be a separate PR.

Note: Another separate PR: We need to decide how to handle 0D. A[3,4] should return a 0D object as per array_api. But should it return a numpy or scipy.sparse 0D object? In general, an array type "should" return its own type when indexed. But sparse is not a standard array type -- we don't expect the entire array api to be implemented. But the answer to this question will impact this same code.

dschult · 2024-06-04T00:57:58Z

I've made the suggested changes. Thanks for the note about Python 3.10+ being supported (3.9 dropped)

dschult · 2024-06-04T06:53:47Z

I updated as suggested (and realized why the else block suggestion was correct -- we don't need to convert the dtype at that location).

This should be ready. The failing test seems unrelated.

@stefanv you looked at this some a while back -- can you take another look?

perimosocordiae · 2024-06-04T18:52:20Z

This looks ready to merge from my end. @stefanv feel free to push the button if you agree.

dschult · 2024-06-06T14:49:03Z

Thoughts for release notes:
In New features section for sparse:

sparse arrays now support indexing for 1D-arrays using ints, slices and arrays.
Methods min, max, nanmin, nanmax return 1D COO arrays when axis
is specified.

Getting ahead of myself, but feedback is welcome:
Once the migration guide document is available we can make a Highlights entry something like:

sparse arrays are now fully functional replacements for sparse matrices.
Consider converting your code to use sparse arrays. See migration_to_sparray.rst
for guidelines to this process and open an issue if you have questions. We have
dedicated reviewer bandwidth to help with these conversions over the next two releases.

stefanv · 2024-06-19T19:15:01Z

scipy/sparse/_csr.py

+            if spot.size:
+                return self.data[spot[0]]
+            return self.data.dtype.type(0)
+        raise IndexError(f'index ({idx}) out of range')


It's a bit strange to have the error removed from the check so far; it's not necessary to change for this PR, but I think it'd be more intuitive to write:

if (idx < 0) or (idx >= self.shape[0]): raise IndexError(...)

Is the current if check correct? It has <= in both positions, but presumably idx should be strictly less than self.shape[0].

What about negative indices?

I agree with your style comment -- and I agree that the <= should be < self.shape[0].

Negative indices are handled in the _validate_indices method.

Which reminded me that _validate_indices also checks the size of idx, so there is no need to check it here. That is why the incorrect <= check wasn't flagged by the existing test in test_indexing1d.py::test_getelement for index values just beyond the limits of shape.

Thanks for flagging this -- I have removed this "if/raise" statement as it is redundant with logic in _validate_indices.
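As a rough illustration of the division of labour discussed in this thread (validate once up front, then do an unchecked lookup), here is a minimal sketch using plain NumPy arrays. The function names `normalize_index` and `get_int`, and the `indices`/`data` layout, are assumptions for the example and are not the code in _csr.py or _index.py.

```python
import numpy as np


def normalize_index(idx, length):
    # Hypothetical stand-in for the validation step: strict upper bound,
    # negative indices wrapped into range.
    if not -length <= idx < length:
        raise IndexError(f"index ({idx}) out of range")
    return idx + length if idx < 0 else idx


def get_int(indices, data, idx):
    # idx is assumed to be validated already, so no bounds check here.
    spot = np.flatnonzero(indices == idx)
    if spot.size:
        return data[spot[0]]
    return data.dtype.type(0)  # position not stored, so it is an implicit zero


# 1D vector of length 4 with stored values at positions 1 and 3.
indices = np.array([1, 3])
data = np.array([5.0, 7.0])

print(get_int(indices, data, normalize_index(-1, 4)))  # stored value
print(get_int(indices, data, normalize_index(0, 4)))   # implicit zero
```

With the check in one place, an off-by-one like `<=` versus `<` can only be wrong in one place too.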

stefanv · 2024-06-19T19:17:00Z

scipy/sparse/_dok.py

+        return self._get_array(list(i_range))
+
+    def _get_array(self, idx):
+        idx = np.asarray(idx)


This call can blow up the range, is that something we are worried about?

This is how the method _asindices handles the input idx. I can guess that we aren't worried about it blowing up because we don't support sparse non-boolean arrays as indices, and we handle boolean arrays (in _validate_indices) by converting them using ix.nonzero().
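To make the boolean-mask conversion mentioned above concrete, here is a small NumPy-only sketch. The helper name `mask_to_positions` and its error message are hypothetical; the only thing taken from the thread is the idea of turning a boolean index into integer positions with `nonzero()`.

```python
import numpy as np


def mask_to_positions(mask, length):
    # Hypothetical helper: reject masks of the wrong shape, then convert
    # the boolean mask to integer positions.
    mask = np.asarray(mask)
    if mask.shape != (length,):
        raise IndexError("boolean index shape does not match array shape")
    return mask.nonzero()[0]


mask = np.array([True, False, True, False])
positions = mask_to_positions(mask, 4)
print(positions)  # integer positions of the True entries
```

The resulting integer array has one entry per True value, which is what the array-indexing path then consumes.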

stefanv · 2024-06-19T19:19:15Z

scipy/sparse/_index.py

+            if res.shape == () and new_shape != ():
+                if len(new_shape) == 1:
+                    return self.__class__([res], shape=new_shape, dtype=self.dtype)
+                if len(new_shape) == 2:
+                    return self.__class__([[res]], shape=new_shape, dtype=self.dtype)


This is probably fine, but I don't 100% get which case we're handling here.

I've added a comment indicating that this is to handle multiple np.newaxis indices, i.e. A[3, 4, None, None].
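For reference, the dense NumPy behaviour that this branch appears to mirror can be checked directly. The `wrap_scalar` helper below is a hypothetical simplification of the branch in the diff, returning NumPy arrays instead of sparse arrays.

```python
import numpy as np

dense = np.arange(20).reshape(4, 5)

# Full integer indexing gives a scalar; each None adds a length-1 axis.
print(dense[3, 4].shape)              # ()
print(dense[3, 4, None].shape)        # (1,)
print(dense[3, 4, None, None].shape)  # (1, 1)


def wrap_scalar(res, new_shape):
    # Hypothetical helper: re-wrap a scalar result so it has new_shape.
    if len(new_shape) == 1:
        return np.array([res])
    if len(new_shape) == 2:
        return np.array([[res]])
    raise IndexError("only 1D or 2D results are supported in this sketch")


print(wrap_scalar(dense[3, 4], (1, 1)))
```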

stefanv · 2024-06-19T19:21:08Z

scipy/sparse/_index.py

+                N = len(idx_range)
+                if N == 1 and x.size == 1:
+                    self._set_int(idx_range[0], x.flat[0])
+                idx = np.arange(*idx.indices(self.shape[0]))


This does densify the range above; is that OK?

I was basing this idiom on the 2D case from this module, and get_arrayXslice in _csr.py. I think the comment above isn't clear. We use a Python range to check for the special case when the slice has a single element -- and we call _set_int instead of _set_array. Part of the lack of clarity is that this shortcut should be finished with a return and it isn't. So, in fact the code in the PR was setting the element once in _set_int and then again in _set_array later on. That's my mistake. And the tests don't catch it because it is just doing extra work -- not creating a wrong answer.

I have added a comment and a return statement to make this more clear, and actually take the shortcut when we can.
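The effect of the missing return is easy to reproduce in a toy setting. The sketch below uses a hypothetical `ToyStore` class with a dict as storage; only the names `_set_int` and `_set_array` and the range-based length check come from the discussion above.

```python
class ToyStore:
    """Toy 1D store that records which setter was called."""

    def __init__(self, length):
        self.shape = (length,)
        self.data = {}
        self.calls = []

    def _set_int(self, idx, value):
        self.calls.append("int")
        self.data[idx] = value

    def _set_array(self, idx, values):
        self.calls.append("array")
        for i, v in zip(idx, values):
            self.data[i] = v

    def set_slice(self, sl, values):
        # A Python range gives the slice length without materializing it.
        idx_range = range(*sl.indices(self.shape[0]))
        if len(idx_range) == 1 and len(values) == 1:
            self._set_int(idx_range[0], values[0])
            return  # without this, the element would be set a second time
        self._set_array(list(idx_range), values)


s = ToyStore(5)
s.set_slice(slice(2, 3), [9])        # single element: shortcut only
s.set_slice(slice(0, 4, 2), [1, 2])  # general case
print(s.calls, s.data)
```

Dropping the `return` leaves the stored values unchanged but adds a second call, which is why a value-only test would not notice.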

stefanv · 2024-06-19T19:25:08Z

scipy/sparse/_index.py

+def _compatible_boolean_index(idx, desired_ndim):
+    """Check for boolean array or array-like. peek before asarray for array-like"""
+    # assume already an array if attr ndim exists: skip to bottom
+    if not hasattr(idx, 'ndim'):


This feels a bit risky; but not sure there's any single attribute that would tell us whether we are dealing with a numpy array or any array-like derivative. Maybe .__array__, but again not sure what external arrays would implement; we'd perhaps have to check with someone who works on the Array API.

I agree that it feels risky -- but it is following the logic of the previous code (the removed line 384 below) which used ndim as an indicator of a compatible array type.

I will look for a more appropriate check for compatible array types, but I think that should be a separate PR.
I have added and improved comments explaining that this assumption is being made. I also added return None at the end of this function to make it explicit that returning None carries information here.
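A simplified, 1D-only sketch of the "peek before asarray" idea may help readers follow the concern. The function `boolean_index_or_none` is hypothetical and is not the body of `_compatible_boolean_index`; it only illustrates using `ndim` as the "already an array" signal and returning None as a meaningful result.

```python
import numpy as np


def boolean_index_or_none(idx):
    # Assumption under discussion: anything with an `ndim` attribute is
    # treated as an array already, so skip straight to the dtype check.
    if not hasattr(idx, "ndim"):
        try:
            first = next(iter(idx), None)  # peek at the first element
        except TypeError:
            return None  # not iterable, e.g. an int or a slice
        if not isinstance(first, (bool, np.bool_)):
            return None  # cheap rejection without building an array
        idx = np.asarray(idx)
    if idx.dtype.kind == "b":
        return idx
    return None  # explicit: "not a boolean index" is a meaningful answer


print(boolean_index_or_none([True, False]))
print(boolean_index_or_none([1, 2]))
print(boolean_index_or_none(3))
```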

remove superfluous check of idx add multple comments to clarify logic add clarifying `return None` at end of _compatible_boolean_index

stefanv · 2024-06-20T23:53:22Z

Thank you, @dschult!

dschult requested a review from perimosocordiae as a code owner February 20, 2024 18:43

github-actions bot added the scipy.sparse, Meson, and enhancement labels Feb 20, 2024

lucascolley removed the Meson label Feb 20, 2024

j-bowhay reviewed Feb 20, 2024

scipy/sparse/tests/test_arithmetic1d.py

tylerjereddy reviewed Mar 12, 2024

scipy/sparse/tests/test_arithmetic1d.py

dschult added 7 commits March 17, 2024 21:53

add csr 1d and test_arithmetic1d

96e24f3

clean up after rebase of _mul to _matmul namechange

e13323d

change matmul to produce 0d with vec @ vec. Adjust tests. clean up comments

ea55021

Add tests for 1d@1d. Fix 1d on the right of @. Slow to convert to csc

23b288d

update _index.py for 1d, refactor _validate_indices, add tests.

969631e

Move TestGetSet1d to indexing1d.py. Simplify helper functions.

convert assert tests to np.testing assert_equal and assert_allclose

2c6d4a8

dschult force-pushed the index-1d branch from 4662244 to 2c6d4a8 Compare March 18, 2024 02:01

dschult added 3 commits March 17, 2024 22:23

lint fixes

861da67

lint fixes

b72a0b2

fix error msg issue due to lint fix

a23880b

dschult added 8 commits March 18, 2024 16:04

add DOK support for 1D indexing.

e208dfc

add csr 1d and test_arithmetic1d

5ce91ef

clean up after rebase of _mul to _matmul namechange

fd8359d

change matmul to produce 0d with vec @ vec. Adjust tests. clean up comments

f0200f0

Add tests for 1d@1d. Fix 1d on the right of @. Slow to convert to csc

13f2088

fixup _coo_to_compressed from setdiag fix to work for 1d

394eef0

fix _coo.py review comments; add copy flag to 1D tocsr

c6e1d35

fix other review comments

4596915

perimosocordiae reviewed Mar 22, 2024

scipy/sparse/_index.py

perimosocordiae reviewed Mar 22, 2024

scipy/sparse/_index.py

rewrite fin new_shape logic. make copy of coords if asked.

e345c2c

dschult added 4 commits May 7, 2024 10:45

Merge branch 'main' into index-1d

ef0a3c8

remove override of get/setitem in csr

7add9a3

handle style changes

032b1e0

update _get_int and friends in compressed

59e6ec4

ilan-gold mentioned this pull request May 27, 2024

ENH: sparse: first pass at array API standard compat #20190

Draft

Merge branch 'main' into index-1d

82a0118

lucascolley added the needs-release-note label Jun 3, 2024

Update test_common1d.py

4e97dac

perimosocordiae requested changes Jun 3, 2024

scipy/sparse/_csr.py
scipy/sparse/_csr.py
scipy/sparse/_dok.py
scipy/sparse/_dok.py
scipy/sparse/_dok.py

adopt review suggestions

ae4a3c9

perimosocordiae approved these changes Jun 4, 2024

implement suggestions

f11c40d

perimosocordiae approved these changes Jun 4, 2024

stefanv reviewed Jun 19, 2024

address review

586785a

remove superfluous check of idx add multple comments to clarify logic add clarifying `return None` at end of _compatible_boolean_index

stefanv merged commit 66ec333 into scipy:main Jun 20, 2024

larsoner mentioned this pull request Jun 21, 2024

BUG: Indexing broken for sparse arrays #21016

Closed

dschult mentioned this pull request Jun 22, 2024

BUG: sparse: Fix advanced indexing using both slice and array #21022

Merged

MridulS mentioned this pull request Jul 2, 2024

FIX: scipy 1d indexing tripped up numpy? networkx/networkx#7541

Merged

dschult deleted the index-1d branch July 9, 2024 18:03

lucascolley removed the needs-release-note label Jun 9, 2025
