Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

Loading…

For ticket #1075 #425

Merged
merged 38 commits into from

5 participants

Daniel Smith Ralf Gommers Pauli Virtanen Jake Vanderplas Warren Weckesser
Daniel Smith

...test

to confirm 1075 was fixed and fixed slicing tests.

I fixed TestFancyIndexing under test_base.py, added some code into lil_matrix.__setitem_ and getitem to improve the handling of numpy arrays as indexes, and reorganized the code to allow for fancy indexing. If at least one of the two indexes is a scalar or a slice, the code runs exactly the same as before. However, if the code is two lists, the code is changed to set items just like a 2-d numpy array.

Daniel Smith added some commits
Ralf Gommers
Owner

Thanks for working on that issue Daniel.

It's on the 0.12.0 Milestone, so it would be great if someone could have a look at this and if it looks good, get it merged by Saturday. Any takers?

Pauli Virtanen
Owner
pv commented

I had this lying around, might be of interest here: https://github.com/pv/scipy-work/commits/bug/sparse-fixes

IIRC, fixing LIL indexing breaks backwards compatibility, and this needs to be mentioned in the release notes.

Jake Vanderplas
Collaborator

I'll do a full review by Saturday - I'm pretty swamped for the next 24 hrs or so, though.

Pauli Virtanen
Owner
pv commented

This passes the extended tests (with sparse_test_class() in gh-426), except:

S = lil_matrix(np.random.rand(5, 7))
S[1:4]  # <- fails

I = np.array([[1, 2, 3], [3, 4, 2]])
J = np.array([[5, 6, 3], [2, 3, 1]])
S[I,J] # <- fails

S[1:5,0] = range(1, 5) # <- fails

(The present LIL of course passes fewer of these tests.)

Daniel Smith

I think I have fixed the class to pass those two tests. I had a little bit of trouble running the new test suite, but I didn't break anything in the old test suite.

Daniel Smith
Daniel Smith
Daniel Smith Integrated test_base.py from gh-426, added tests for new indexing and…
… added

matrix indexing functioning.
6814b7f
Daniel Smith

As the commit log mentions, I integrated gh-426#426 with the new tests that I added. While I won't swear by including every use case, I think I got a good portion of them. I fixed the two bugs that pv pointed out.

I also added functionality so that:

A[ [[1, 2, 3], [4, 5, 6]], [1, 2, 3]] = 1

works. I looked into trying to replicate the behavior for putting an array on the right hand side of the above assignment, but numpy gives surprising and unintuitive results for that, so I just raise a NotImplementedError.

Pauli Virtanen
Owner
pv commented

Assignment with (multidimensional) index arrays is defined as

# A[i,j] = b
if np.issubdtype(i, np.bool_):
    i = np.nonzero(i)[0]
if np.issubdtype(j, np.bool_):
    j = np.nonzero(j)[0]
i, j, b = np.broadcast_arrays(i, j, b)
for ix, jx, bx in zip(i.ravel(), j.ravel(), b.ravel()):
    A[ix,jx] = bx

If you mix slices in this, things become a bit more complicated.

Jake Vanderplas
Collaborator

I'm seeing three test errors that seem to be related to this:

======================================================================
ERROR: test_fancy_indexing_randomized (test_base.TestCSC)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/vanderplas/Opensource/scipy/scipy/sparse/tests/test_base.py", line 1153, in test_fancy_indexing_randomized
    assert_raises(IndexError, S.__getitem__, ([I, I], slice(None)))
  File "/home/vanderplas/PythonEnv/pydev/local/lib/python2.7/site-packages/numpy/testing/utils.py", line 1008, in assert_raises
    return nose.tools.assert_raises(*args,**kwargs)
  File "/usr/lib/python2.7/unittest/case.py", line 471, in assertRaises
    callableObj(*args, **kwargs)
  File "/home/vanderplas/PythonEnv/pydev/local/lib/python2.7/site-packages/scipy/sparse/csc.py", line 153, in __getitem__
    return self.T[col,row].T
  File "/home/vanderplas/PythonEnv/pydev/local/lib/python2.7/site-packages/scipy/sparse/csr.py", line 246, in __getitem__
    P = extractor(col,self.shape[1]).T        #[1:2,[1,2]]
  File "/home/vanderplas/PythonEnv/pydev/local/lib/python2.7/site-packages/scipy/sparse/csr.py", line 224, in extractor
    return csr_matrix((data,indices,indptr), shape=shape)
  File "/home/vanderplas/PythonEnv/pydev/local/lib/python2.7/site-packages/scipy/sparse/compressed.py", line 85, in __init__
    self.check_format(full_check=False)
  File "/home/vanderplas/PythonEnv/pydev/local/lib/python2.7/site-packages/scipy/sparse/compressed.py", line 133, in check_format
    raise ValueError('data, indices, and indptr should be rank 1')
ValueError: data, indices, and indptr should be rank 1

======================================================================
ERROR: test_fancy_indexing_randomized (test_base.TestCSR)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/vanderplas/Opensource/scipy/scipy/sparse/tests/test_base.py", line 1153, in test_fancy_indexing_randomized
    assert_raises(IndexError, S.__getitem__, ([I, I], slice(None)))
  File "/home/vanderplas/PythonEnv/pydev/local/lib/python2.7/site-packages/numpy/testing/utils.py", line 1008, in assert_raises
    return nose.tools.assert_raises(*args,**kwargs)
  File "/usr/lib/python2.7/unittest/case.py", line 471, in assertRaises
    callableObj(*args, **kwargs)
  File "/home/vanderplas/PythonEnv/pydev/local/lib/python2.7/site-packages/scipy/sparse/csr.py", line 252, in __getitem__
    P = extractor(row, self.shape[0])        #[[1,2],j] or [[1,2],1:2]
  File "/home/vanderplas/PythonEnv/pydev/local/lib/python2.7/site-packages/scipy/sparse/csr.py", line 224, in extractor
    return csr_matrix((data,indices,indptr), shape=shape)
  File "/home/vanderplas/PythonEnv/pydev/local/lib/python2.7/site-packages/scipy/sparse/compressed.py", line 85, in __init__
    self.check_format(full_check=False)
  File "/home/vanderplas/PythonEnv/pydev/local/lib/python2.7/site-packages/scipy/sparse/compressed.py", line 133, in check_format
    raise ValueError('data, indices, and indptr should be rank 1')
ValueError: data, indices, and indptr should be rank 1

======================================================================
ERROR: test_set_slice (test_base.TestLIL)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/vanderplas/Opensource/scipy/scipy/sparse/tests/test_base.py", line 1000, in test_set_slice
    A[0,:] = list(range(100))
  File "/home/vanderplas/PythonEnv/pydev/local/lib/python2.7/site-packages/scipy/sparse/lil.py", line 427, in __setitem__
    self._setitem_setrow(row, data, j, xrow, xdata, xcols)
  File "/home/vanderplas/PythonEnv/pydev/local/lib/python2.7/site-packages/scipy/sparse/lil.py", line 336, in _setitem_setrow
    raise IndexError('invalid index')
IndexError: invalid index
Jake Vanderplas jakevdp commented on the diff
scipy/sparse/tests/test_base.py
@@ -839,29 +842,19 @@ def test_get_slices(self):
assert_array_equal(E[1:2, 1:2], F[1:2, 1:2].todense())
assert_array_equal(E[:, 1:], F[:, 1:].todense())
+ def test_non_unit_stride_2d_indexing(self):
+ # Regression test -- used to silently ignore the stride.
Jake Vanderplas Collaborator
jakevdp added a note

This could be misinterpreted - maybe say "previously a non-unit stride was silently ignored"

Also, because this is an API change that could break some code, it should be mentioned in the change notes.

Pauli Virtanen Owner
pv added a note

Good catch. This change was made in fa373d0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
scipy/sparse/tests/test_base.py
((15 lines not shown))
+ s = slice(int8(2),int8(4),None)
+ assert_equal(A[s,:].todense(), B[2:4,:])
+ assert_equal(A[:,s].todense(), B[:,2:4])
+
+
+class _TestSlicingAssign:
+ def test_slice_scalar_assign(self):
+ A = self.spmatrix((5, 5))
+ B = np.zeros((5, 5))
+ for C in [A, B]:
+ C[0:1,1] = 1
+ C[3:0,0] = 4
+ C[3:4,0] = 9
+ C[0,4:] = 1
+ C[3::-1,4:] = 9
+ assert_array_equal(A.A, B)
Jake Vanderplas Collaborator
jakevdp added a note

Perhaps it's just my personal preference, but I find writing A.toarray() more clear than A.A, especially when the matrix itself is called A. Would you mind changing those?

Pauli Virtanen Owner
pv added a note

Sorry, my bad. The use of some_array.A is what used to be in the whole file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Jake Vanderplas jakevdp commented on the diff
scipy/sparse/tests/test_base.py
((62 lines not shown))
+ B[2,1] = 7
+
+ expected = array([[10,0,0],
+ [0,0,6],
+ [0,14,0],
+ [0,0,0]])
+
+ B[:,:] = B+B
+ assert_array_equal(B.todense(),expected)
+
+ block = [[1,0],[0,4]]
+ B[:2,:2] = csc_matrix(array(block))
+ assert_array_equal(B.todense()[:2,:2],block)
+
+ def test_set_slice(self):
+ A = self.spmatrix((5,10))
Jake Vanderplas Collaborator
jakevdp added a note

I'd like to see the first set of tests in test_set_slice use a loop for more clarity; e.g.

left_slices = [slice(None), 1, slice(None)]
right_slices = [0, slice(None), slice(None)]
values = [1, 2, 3]
for (ls, rs, v) in zip(left_slices, right_slices, values):
    A[ls, rs] = v
    B[ls, rs] = v
    assert_array_equal(A.toarray(), B)
Pauli Virtanen Owner
pv added a note

All this is mostly just moving the existing tests there around. A cleanup would be nice, but maybe out of scope for this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
scipy/sparse/tests/test_base.py
((114 lines not shown))
+ except ValueError:
+ caught += 1
+ try:
+ A[0,:] = list(range(100))
+ except ValueError:
+ caught += 1
+ try:
+ A[:,1] = list(range(100))
+ except ValueError:
+ caught += 1
+ try:
+ A[:,1] = A.copy()
+ except:
+ caught += 1
+ assert_equal(caught,5)
+
Jake Vanderplas Collaborator
jakevdp added a note

Testing for exceptions could use a loop similar to above. I'd also prefer using numpy.testing.assert_raises so that any failure can be traced to one particular statement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
scipy/sparse/tests/test_base.py
((40 lines not shown))
+ (slice(2, 5), slice(5, -2))]:
+ _test_set_slice(i, j, 9)
+ for i, j in [(np.arange(3), np.arange(3)), ((0, 3, 4), (1, 2, 4))]:
+ _test_set_slice(i, j, 3)
+ # [[[1, 2], [1, 2]], [1, 2]]
+ for i, j, c in [(np.array([[1, 2], [1, 3]]), [1, 3], 3),
+ (np.array([0, 4]), [[0, 3], [1, 2]], 4),
+ ([[1, 2, 3], [0, 2, 4]], [[0, 4, 3], [4, 1, 2]], 6)]:
+ _test_set_slice(i, j, c)
+
+ A = self.spmatrix((5, 5))
+ assert_raises(NotImplementedError, A.__setitem__,
+ ([[1, 2], [0, 1]], [1, 2]), range(2))
+ assert_raises(IndexError, A.__setitem__,
+ ([[1, 2], [0, 1]], slice(None)), 1)
+
Jake Vanderplas Collaborator
jakevdp added a note

This test could be better written using a test generator, though I seem to recall that test generators for some reason don't work within classes. Does anybody know about that?

Pauli Virtanen Owner
pv added a note

They don't work with classes inheriting from TestCase. You can inherit from object, and then they work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
scipy/sparse/tests/test_base.py
@@ -1564,13 +1774,13 @@ class TestLIL( _TestCommon, _TestHorizSlicing, _TestVertSlicing,
B[3,0] = 10
- @dec.knownfailureif(True, "Fancy indexing is known to be broken for LIL" \
- " matrices")
+ #@dec.knownfailureif(True, "Fancy indexing is known to be broken for LIL" \
+ # " matrices")
def test_fancy_indexing_set(self):
Jake Vanderplas Collaborator
jakevdp added a note

The commented-out decorators could probably be removed rather than uncommented - I think this issue is fixed now! Same below.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Jake Vanderplas
Collaborator

Daniel - very nice work. Thanks for spending the time on this! The LIL improvements look good, and the more complete test suite is really nice. I have a few comments on the diff, and a few outstanding test failures. Once these are addressed, I would be fine to merge it!

Pauli Virtanen
Owner
pv commented

@Daniel-B-Smith: it's usually best to avoid mixing in code from different pull requests unless this is absolutely necessary, and then it's best to do it by rebasing or merging to preserve the commits, so that the logical structure is preserved. Here, I had some trouble seeing which tests were added to test the new LIL functionality.

Cleaned-up branch with the new commits on top of gh-426 is here: https://github.com/pv/scipy-work/commits/sparse-test-reorg-425

You may want to base further work on that git branch my-backup master; git reset --hard pv/sparse-test-reorg-425 (makes a backup branch, and resets the current one to point to sparse-test-reorg-425; this also resets any uncommitted changes)

***

Anyway, this looks mostly good, and passes the test suite.

Some comments however:

  • The aim is to support exactly the same operations as numpy dense array indexing. There are some quirks there, so we don't want to add tests in the test suite that check for failures when the dense array indexing would work.

  • Some code duplication in fancy indexing might be avoidable by using the above broadcast_arrays + ravel trick.

Daniel Smith

Thank you for the comments pv and jakevdp. I will base further work on the branch pv posted. Unfortunately, the behavior is slightly more complicated than the ravel trick posted, but I think I only need a few lines to process the input before sending it to ravel. That will definitely save some code.

First, I will work through the diffs above and add tests for the setting with an array and Boolean indices. My apologies for the multiple changes in the commit. I thought that since the tests and the LIL changes were connected that was safe. I will separate the commits this time.

I should probably spend more time on my real job this week, but after that, should I try to work on the CSR and CSC cases?

Also, thank you for spending time reviewing and critiquing my code. I'm new to open source things, and I know this takes up just as much of your time.

Thanks,
Daniel

Pauli Virtanen
Owner
pv commented

@Daniel-B-Smith: thanks for spending time on this --- it's very helpful, and this functionality has been missed. If you want to add support for full indexing features also to CSR and CSC, that would be quite useful as these matrix classes are used very often.

Jake Vanderplas
Collaborator

Daniel - sounds good - and if you can look at CSR and CSC after this, that would be awesome! This type of improvement is really important and will be useful to a lot of scipy users. And no worries if it it takes some time: I know how real jobs can get in the way of this sort of stuff :smiley:

Daniel Smith added some commits
Daniel Smith Updated test suite based on pull request comments. Still needs fixing at
line 1227 where C is undefined.
7b71477
Daniel Smith Updated code using ravel() and now able to set values based on an arr…
…ay. A

tradeoff has to be made between code duplication and memory usage. Making
everything into the same shape would be the most elegant, but A[:, :] = 1
would generate three arrays (i, j, x) all having the same size as A. If there
is desire to simplify the code, that can be easily done.
77494f2
Daniel Smith

The code has been updated. I made the simple changes (A.A and deleting the commented decorators) mentioned above to the test suite. I also completely rewrote setitem. There are still some tests that fail. At line 1227, I'm not sure what you are trying to test, so I left that alone. The other failure comes from whether you want lil_matrix to behave like a NumPy matrix or NumPy array. The two behaviors, are unfortunately in conflict. In particular, we have:

B = np.zeros((3, 3))
B[:, 0] = array([0, 0, 0])
B = np.asmatrix(B)
B[:, 0] = matrix([[0], [0], [0]])

For the first behavior, we can write:

i0 = [0, 1, 2]
B[i0, 0] = B[0, i0]

but we can't for the latter.

I left the code bringing up an error there, because I think it exposes a deeper problem. Without looking through the test code, I think there is some inconsistencies in there. The problem is that, after my changes, getitem acts like an array and setitem acts as a matrix. I wanted to rewrite getitem anyway, but I expect that it is now going to make test_base.py very angry.

Daniel Smith

Correction, 'getitem' is working fine, but 'setitem' is not working. And it makes sense that the test coverage is poor because that feature didn't exist until today. I think I've been staring at code too long for today. I'm now seeing a couple of errors about how 'setitem' is not working, so I'm going to let that code sit.

I did rewrite 'getitem' to make the code more readable and reduce code duplication. I'll commit that in a minute.

Daniel Smith added some commits
Daniel Smith Rewrote __getitem__ to be more readable and reduce code duplication. ccbcf41
Daniel Smith Copied code from __getitem__ to __setitem__ . This fixes one error in
test_base and brings up a new error:

line 1172
A[1:3, 1] = [[10], [20]]
now works

line 984
A[1:5, 0] = range(1, 5)
is now broken

line 984 does not work for NumPy matrix objects, so I think that test should
be deleted. I'm leaving that alone for now.
1128a3b
Daniel Smith Changed most of the try: ... except *Error: ... blocks
into assert_raises(*Error)
e5a2200
Daniel Smith Fixed and expanded test_fancy_assign_ndarray 41c056a
Daniel Smith Expanded _TestFancyMultidimAssign to ensure proper test coverage ee603c7
Daniel Smith Expanded the test coverage for slices 3c26e68
Daniel Smith

With those commits, this should be ready to go. I fixed the repeated try: ... excepts:... to be assert_raises, made sure comparisons for multi-dimensional array comparisons were being done against NumPy matrices. Finally, I expanded the test coverage in _TestFancyMultidimAssign such that I am now convinced that the LIL class is fully replicating the relevant NumPy matrix class behavior. The only use case I can think of that aren't covered is setting the matrix using a slice against against a 3-dim array. Someone setting a matrix with a 3-d object seems pretty surprising, so I'm comfortable not testing that use case. I can certainly add it if you think it would be useful.

Once this is integrated. I will create an updated fork and go to work on DOK/CSC/CSR. Thank you for all of your help.

Ralf Gommers
Owner

Still five test failures:

======================================================================
ERROR: test_fancy_indexing_randomized (test_base.TestCSC)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/rgommers/.tox/scipy/py27/local/lib/python2.7/site-packages/scipy/sparse/tests/test_base.py", line 1119, in test_fancy_indexing_randomized
    assert_raises(IndexError, S.__getitem__, ([I, I], slice(None)))
  File "/home/rgommers/.tox/scipy/py27/local/lib/python2.7/site-packages/numpy/testing/utils.py", line 1008, in assert_raises
    return nose.tools.assert_raises(*args,**kwargs)
  File "/usr/lib/python2.7/unittest/case.py", line 471, in assertRaises
    callableObj(*args, **kwargs)
  File "/home/rgommers/.tox/scipy/py27/local/lib/python2.7/site-packages/scipy/sparse/csc.py", line 153, in __getitem__
    return self.T[col,row].T
  File "/home/rgommers/.tox/scipy/py27/local/lib/python2.7/site-packages/scipy/sparse/csr.py", line 246, in __getitem__
    P = extractor(col,self.shape[1]).T        #[1:2,[1,2]]
  File "/home/rgommers/.tox/scipy/py27/local/lib/python2.7/site-packages/scipy/sparse/csr.py", line 224, in extractor
    return csr_matrix((data,indices,indptr), shape=shape)
  File "/home/rgommers/.tox/scipy/py27/local/lib/python2.7/site-packages/scipy/sparse/compressed.py", line 85, in __init__
    self.check_format(full_check=False)
  File "/home/rgommers/.tox/scipy/py27/local/lib/python2.7/site-packages/scipy/sparse/compressed.py", line 133, in check_format
    raise ValueError('data, indices, and indptr should be rank 1')
ValueError: data, indices, and indptr should be rank 1

======================================================================
ERROR: test_fancy_indexing_set (test_base.TestCSC)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/rgommers/.tox/scipy/py27/local/lib/python2.7/site-packages/scipy/sparse/tests/test_base.py", line 1080, in test_fancy_indexing_set
    _test_set_slice(i, j, 3)
  File "/home/rgommers/.tox/scipy/py27/local/lib/python2.7/site-packages/scipy/sparse/tests/test_base.py", line 1070, in _test_set_slice
    A[i, j] = 1
  File "/home/rgommers/.tox/scipy/py27/local/lib/python2.7/site-packages/scipy/sparse/compressed.py", line 468, in __setitem__
    raise NotImplementedError("Fancy indexing in assignment not "
NotImplementedError: Fancy indexing in assignment not supported for csr matrices.

======================================================================
ERROR: test_fancy_indexing_ndarray (test_base.TestCSR)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/rgommers/.tox/scipy/py27/local/lib/python2.7/site-packages/scipy/sparse/tests/test_base.py", line 1375, in wrapper
    return fc(*a, **kw)
  File "/home/rgommers/.tox/scipy/py27/local/lib/python2.7/site-packages/scipy/sparse/tests/test_base.py", line 1179, in test_fancy_indexing_ndarray
    assert_raises(IndexError, S.__geitem___, (I, slice(None)))
  File "/home/rgommers/.tox/scipy/py27/local/lib/python2.7/site-packages/scipy/sparse/base.py", line 400, in __getattr__
    raise AttributeError(attr + " not found")
AttributeError: __geitem___ not found

======================================================================
ERROR: test_fancy_indexing_randomized (test_base.TestCSR)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/rgommers/.tox/scipy/py27/local/lib/python2.7/site-packages/scipy/sparse/tests/test_base.py", line 1119, in test_fancy_indexing_randomized
    assert_raises(IndexError, S.__getitem__, ([I, I], slice(None)))
  File "/home/rgommers/.tox/scipy/py27/local/lib/python2.7/site-packages/numpy/testing/utils.py", line 1008, in assert_raises
    return nose.tools.assert_raises(*args,**kwargs)
  File "/usr/lib/python2.7/unittest/case.py", line 471, in assertRaises
    callableObj(*args, **kwargs)
  File "/home/rgommers/.tox/scipy/py27/local/lib/python2.7/site-packages/scipy/sparse/csr.py", line 252, in __getitem__
    P = extractor(row, self.shape[0])        #[[1,2],j] or [[1,2],1:2]
  File "/home/rgommers/.tox/scipy/py27/local/lib/python2.7/site-packages/scipy/sparse/csr.py", line 224, in extractor
    return csr_matrix((data,indices,indptr), shape=shape)
  File "/home/rgommers/.tox/scipy/py27/local/lib/python2.7/site-packages/scipy/sparse/compressed.py", line 85, in __init__
    self.check_format(full_check=False)
  File "/home/rgommers/.tox/scipy/py27/local/lib/python2.7/site-packages/scipy/sparse/compressed.py", line 133, in check_format
    raise ValueError('data, indices, and indptr should be rank 1')
ValueError: data, indices, and indptr should be rank 1

======================================================================
ERROR: test_fancy_indexing_set (test_base.TestCSR)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/rgommers/.tox/scipy/py27/local/lib/python2.7/site-packages/scipy/sparse/tests/test_base.py", line 1080, in test_fancy_indexing_set
    _test_set_slice(i, j, 3)
  File "/home/rgommers/.tox/scipy/py27/local/lib/python2.7/site-packages/scipy/sparse/tests/test_base.py", line 1070, in _test_set_slice
    A[i, j] = 1
  File "/home/rgommers/.tox/scipy/py27/local/lib/python2.7/site-packages/scipy/sparse/compressed.py", line 468, in __setitem__
    raise NotImplementedError("Fancy indexing in assignment not "
NotImplementedError: Fancy indexing in assignment not supported for csr matrices.
pv added some commits
Pauli Virtanen pv TST: sparse: more sparse-sparse assignment tests 79f4cad
Pauli Virtanen pv BUG: sparse/lil: simplify __setitem__ and fix bugs in it
The sparse matrix code path always uses .toarray(), so it can be
scrapped.
ce05e90
Pauli Virtanen pv BUG: sparse/lil: use isscalarlike instead
Also remove an indexing warning --- currently all code paths have this
efficiency issue...
8ea81ac
Pauli Virtanen
Owner
pv commented

There are typos left in code in untested and some tested branches. You can run scipy.sparse.test("full", verbose=2, raise_warnings="develop", coverage=True, extra_argv=["--cover-html", "--cover-html-dir=coverhtml"]) to get coverage information, if you have Python's coverage module installed.

I think that it's possible to still simplify the indexing code somewhat. This refactoring & typo fixes & more additional tests here: https://github.com/pv/scipy-work/commits/oth/Daniel-B-Smith/master
pv@3c26e68...oth/Daniel-B-Smith/master

The previous LIL matrix indexing implementation included a special case for LIL-LIL slice assignment, which would ignore zero entries (I may be mistaken here). In this PR, all code paths for a[i,j] = b convert b to a dense matrix. For efficiency reasons it might be useful to look if this special-case code can be resurrected... OTOH, the old __getitem__ code didn't contain such optimizations, so it's not clear how much this mattered in practice.

Because the fancy indexing/assignment as implemented in this PR always resolves more or less to element-by-element get/set, this implementation could actually be used for all sparse matrix classes that allow setting and getting single entries. It might be worthwhile to factor it out to helper routines. The special cases that can be implemented more efficiently in the different matrix formats could then be handled as special cases, rather than having the present situation with only partial indexing implementations. (Out of scope for this PR, getting it to work for LIL first is enough, I think.)

scipy/sparse/lil.py
((48 lines not shown))
else:
- raise IndexError
+ i = np.atleast_2d(i)
+ j = np.atleast_2d(j)
+ # Make i and j into same shape
Warren Weckesser Collaborator

Align the comment with the code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Warren Weckesser
Collaborator

It is great to see work being done on this issue. Thanks Daniel!

The code at the beginning of __getitem__ needs work. It gets fooled by something like b[[0,1]]. That refers to the first two rows of b, not b[0,1]. E.g.:

In [23]: b = lil_matrix([[0,0,1,2,0],[3,4,0,0,0],[0,0,0,5,0],[6,0,0,0,0]])

In [24]: b.toarray()
Out[24]: 
array([[0, 0, 1, 2, 0],
       [3, 4, 0, 0, 0],
       [0, 0, 0, 5, 0],
       [6, 0, 0, 0, 0]])

In [25]: b[[0,1]]
Out[25]: 0

In [26]: a = b.toarray()

In [27]: a[[0,1]]
Out[27]: 
array([[0, 0, 1, 2, 0],
       [3, 4, 0, 0, 0]])

This is not a regression (i.e. that's how it currently works), but it would be nice to get all the indexing behavior aligned with the ndarray, if possible.

Daniel B. Smith added some commits
Daniel B. Smith Turned on remaining tests for LIL and fixed indexing for multi-index
tests.
cb42901
Daniel B. Smith Changed name of test_fancy_indexing defined inside TestLIL to
test_fancy_indexing_lil to prevent the method in _TestFancyIndexing from
getting overridden.
a48089f
Daniel B. Smith More test fixes. Sparse matrices weren't broadcast to arrays. 066072a
Daniel B. Smith Fixed _index_to_arrays to handle cases such as b[[1, 3]] and for the
row vector special case.
4ca491f
Daniel Smith

Thank you very much for all your work on this pv. I had never heard of the coverage package, but it seems very useful. I turned the two remaining test sets on and fixed one function that got overridden. It looks like the test coverage is pretty good now. The row view section needs coverage.

Warren, I will also work on that issue. There was a test for that in test_base.py, but it was getting overridden in the TestLIL class and not run. That test actually brings up an issue I mentioned above. Do we want sparse matrices to behave like NumPy ndarrays or matrices? The tests seem to have a mixture of the two, and they have very different behaviors in terms of row vector indices. In the commit above, I added row vector behavior, which causes some tests to no longer give an error but breaks others. Is this something that I should bring up to the developer mailing list? I think the first priority before fixing anything else needs to be making that decision and making the tests consistent with that.

Daniel Smith

I went back through and changed all the comparisons to be with NumPy matrices and did some more cleaning of the tests. The class now passes all of the tests as written. Remaining issues:

  • getrowview and a number of the exceptions have no test coverage. Not necessarily relevant to this PR, but needs fixing.

  • The question of pv's suggested row slice optimization. For setting an individual row, getrowveiw would do exactly that, and the code for that is still there.

I think factoring this up to work for all of the sparse matrix classes with individual item setting/getting is a great idea, but is also probably a next step beyond this PR.

Please let me know what you think.

Pauli Virtanen

Commented-out code should be removed.

Pauli Virtanen

This check should be the other way around --- i.e., only tuples should be treated as a pair of indices (i, j)

The list isn't being treated as a pair of indices. Lists are being treated as the first index. Without the checks in this order, the incorrect behavior Warren commented on would remain.

The old code treated A[ [i, j] ] = A[i, j]. Testing for a list first ensures that A[ [i, j] ] = A[ [i, j], :].

Owner

What I meant is that you also need A[some_sequence] behave like A[[i,j]] --- the i, j = index unpacking should be guarded by isinstance(index, tuple) as A[i,j] <-> A[(i, j)] as per Python semantics.

It's true that lists and ndarrays catch most cases, but we can as well be thorough here.

Ah, I misunderstood your comment. The commit I forgot to push this morning covers most cases, but I can switch the order real quick.

Pauli Virtanen

Why is this change made? Seems equivalent to (J + 7).tolist(), but shouldn't it work also as an array?

The change was made because I and J are lists, and it was bringing up an error. I could have also made I and J into an array and used tolist(). I did it this way because they were started as lists, and this is consistent.

Pauli Virtanen pv commented on the diff
scipy/sparse/tests/test_base.py
((64 lines not shown))
+ assert_almost_equal(A.sum(), 6)
+ B = asmatrix(np.zeros((4, 3)))
+ B[(1, 2, 3), (0, 1, 2)] = [1, 2, 3]
+ assert_array_equal(A.todense(), B)
+
+class _TestFancyMultidim:
+ def test_fancy_indexing_ndarray(self):
+ sets = [
+ (np.array([[1], [2], [3]]), np.array([3, 4, 2])),
+ (np.array([[1], [2], [3]]), np.array([[3, 4, 2]])),
+ (np.array([[1, 2, 3]]), np.array([[3], [4], [2]])),
+ (np.array([1, 2, 3]), np.array([[3], [4], [2]])),
+ (np.array([[1, 2, 3], [3, 4, 2]]),
+ np.array([[5, 6, 3], [2, 3, 1]]))
+ ]
+ # These inputs generate 3-D outputs
Pauli Virtanen Owner
pv added a note

Commented-out code should be removed.

Done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
scipy/sparse/tests/test_base.py
((179 lines not shown))
+
+ D = np.asmatrix(np.random.rand(5, 7))
+ S = self.spmatrix(D)
+ X = np.random.rand(2, 3)
+
+ I = [[1, 2, 3], [3, 4, 2]]
+ J = [[5, 6, 3], [2, 3, 1]]
+
+ I_bad = [[ii + 5 for ii in i] for i in I]
+ J_bad = [[jj + 7 for jj in j] for j in J]
+
+ C = [1, 2, 3, 4, 5, 6, 7]
+ assert_raises(IndexError, S.__setitem__, (I_bad, slice(None)), C)
+ assert_raises(IndexError, S.__setitem__, (slice(None), J_bad), C)
+
+ # The below code requires 3-D intermediates inside __setitem__
Pauli Virtanen Owner
pv added a note

Should be removed. Or, you can convert some of them to assert_raises.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Daniel Smith

I forgot to push this to GitHub earlier. I think this handles the issue that you were concerned about. I could also protect the try with a tuple if you would prefer that.

Pauli Virtanen
Owner
pv commented

Thanks a lot for your help, I think this PR is about ready to merge. I have one Python 3 fix and one test fix in my branch that should come along, too.

Some special cases and fast paths could be useful to add (later on, possibly): some benchmarks

import numpy as np
from scipy.sparse import lil_matrix, rand
x = rand(10000,10000,0.001).tolil()

Now:

In []: %timeit x[32,:]
10 loops, best of 3: 113 ms per loop

In []: %timeit x[:,32]
1 loops, best of 3: 231 ms per loop

In []: %timeit x[np.arange(100),np.arange(100)]
1000 loops, best of 3: 1.52 ms per loop

In []: %timeit x[:,32] = 9
10 loops, best of 3: 92.6 ms per loop

Before:

In []: %timeit x[32,:]   # <- the old row-slicing fast path, apparently
10 loops, best of 3: 22.8 ms per loop

In []: %timeit x[:,32]
10 loops, best of 3: 129 ms per loop

In []: %timeit x[np.arange(100),np.arange(100)]  # <- the general case is the same speed
1000 loops, best of 3: 1.4 ms per loop

In []: %timeit x[:,32] = 9
10 loops, best of 3: 68.1 ms per loop

The new code is more general and there's some additional overhead --- possibly coming from using Numpy arrays rather than lists in for loops --- I'll take a look. Some micro-optimization for fast paths would probably give the factor of 2 back to the easy cases.

Pauli Virtanen
Owner
pv commented

Ok, found a micro-opt for the speed issue (try to guess what it is before looking at it, it's in my branch :)

In []: %timeit x[32,:]
10 loops, best of 3: 33.5 ms per loop

In []: %timeit x[:,32]
10 loops, best of 3: 154 ms per loop

In [13]: %timeit x[np.arange(100),np.arange(100)]
1000 loops, best of 3: 867 us per loop

In []: %timeit x[:,32] = 9
10 loops, best of 3: 52.3 ms per loop

I'll go ahead and merge this PR unless someone has still things to add.

Daniel Smith

I attempted to guess, but I completely spaced on integer arithmetic being that much faster.

Pauli Virtanen pv merged commit 7e866f0 into from
Pauli Virtanen
Owner
pv commented

Merged, thanks to everybody involved.

Boolean indexing is still missing --- this would probably not be too difficult to add.

Ralf Gommers
Owner

Question: the plan is to implement the currently missing features in the near future, right? I'm asking because I noticed we suddenly have >100 skipped tests, which looks quite odd.

Pauli Virtanen
Owner
pv commented

Not near future, but I think this is the long-term plan. It's of course possible to get rid of the skipped tests --- however, they are written so that if an AssertionError is raised, the test fails instead of skips, which aims to prevent something working in an incorrect way (e.g. the LIL matrix indexing before this fix) and acts as a reminder of what's not implemented.

Ralf Gommers
Owner

That's still useful then, so let's leave it as is. There's not really a good place to document where the SKIPs come from, so nothing to do then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Commits on Feb 7, 2013
  1. Fixed ticket #1075 for lil_matrix, added fancy slice indexing and add…

    Daniel Smith authored
    …ed test
    
    to confirm 1075 was fixed and fixed slicing tests.
  2. Attempted to fix bugs seen with sparse_test_class()

    Daniel Smith authored
  3. Fixed another thing for S[1:5, 0] = range(1, 5)y

    Daniel Smith authored
Commits on Feb 8, 2013
  1. Integrated test_base.py from gh-426, added tests for new indexing and…

    Daniel Smith authored
    … added
    
    matrix indexing functioning.
Commits on Feb 19, 2013
  1. Updated test suite based on pull request comments. Still needs fixing at

    Daniel Smith authored
    line 1227 where C is undefined.
  2. Updated code using ravel() and now able to set values based on an arr…

    Daniel Smith authored
    …ay. A
    
    tradeoff has to be made between code duplication and memory usage. Making
    everything into the same shape would be the most elegant, but A[:, :] = 1
    would generate three arrays (i, j, x) all having the same size as A. If there
    is desire to simplify the code, that can be easily done.
  3. Copied code from __getitem__ to __setitem__ . This fixes one error in

    Daniel Smith authored
    test_base and brings up a new error:
    
    line 1172
    A[1:3, 1] = [[10], [20]]
    now works
    
    line 984
    A[1:5, 0] = range(1, 5)
    is now broken
    
    line 984 does not work for NumPy matrix objects, so I think that test should
    be deleted. I'm leaving that alone for now.
Commits on Feb 28, 2013
  1. Changed most of the try: ... except *Error: ... blocks

    Daniel Smith authored
    into assert_raises(*Error)
  2. Fixed and expanded test_fancy_assign_ndarray

    Daniel Smith authored
  3. Expanded the test coverage for slices

    Daniel Smith authored
Commits on Mar 2, 2013
  1. Pauli Virtanen

    TST: sparse: clean up the test suite

    pv authored
    Move indexing tests to appropriate classes splitting some of them into
    parts. Remove duplicated tests. Remove dead code. Fix typos.
  2. Pauli Virtanen
  3. Pauli Virtanen

    BUG: sparse/lil: more robust _slicetoseq implementation

    pv authored
    The current failed for negative strides + start point past the end.
  4. Pauli Virtanen
  5. Pauli Virtanen
  6. Pauli Virtanen
  7. Pauli Virtanen
  8. Pauli Virtanen
  9. Pauli Virtanen
  10. Pauli Virtanen
  11. Pauli Virtanen
  12. Pauli Virtanen

    BUG: sparse/lil: simplify __setitem__ and fix bugs in it

    pv authored
    The sparse matrix code path always uses .toarray(), so it can be
    scrapped.
  13. Pauli Virtanen

    BUG: sparse/lil: use isscalarlike instead

    pv authored
    Also remove an indexing warning --- currently all code paths have this
    efficiency issue...
Commits on Mar 4, 2013
  1. Changed name of test_fancy_indexing defined inside TestLIL to

    Daniel B. Smith authored
    test_fancy_indexing_lil to prevent the method in _TestFancyIndexing from
    getting overridden.
  2. Fixed _index_to_arrays to handle cases such as b[[1, 3]] and for the

    Daniel B. Smith authored
    row vector special case.
Commits on Mar 7, 2013
  1. Daniel Smith
  2. Minor cleanup of test_base.py including changing a few comparison arrays

    Daniel Smith authored
    to test sparse matrix against NumPy matrix.
  3. More polishing of test_base.py

    Daniel Smith authored
  4. Commented out tests that required 3-D structures

    Daniel Smith authored
Commits on Mar 8, 2013
  1. Deleted commented out code.

    Daniel B. Smith authored
  2. Got rid of some code duplication in _index_to_arrays

    Daniel B. Smith authored
  3. Changed order of index checks

    Daniel B. Smith authored
Something went wrong with that request. Please try again.