Fancy indexing for CSR and CSC matrices. #2689
scipy/sparse/compressed.py
Outdated
Typo in ValueError; run pyflakes to check whether there are more of those.
|
I think it may be better to submit this without sparse boolean indexing, since it is taking a while to complete it. But all other fancy indexing is complete. I'll submit the sparse boolean indexing in my next PR. |
|
@cowlicks: why does the boolean indexing path need to be guarded by |
scipy/sparse/compressed.py
Outdated
This check is unnecessary --- this internal routine will always be called with scalar row and col?
|
@pv Yeah I noticed this last night. It almost works, and it is simpler than what I tried at first. But indexing with vector-like sparse matrices doesn't work yet. This is because |
|
I don't think indexing with vector-like sparse matrices should be special cased. They are in reality always 2-dimensional, |
|
So basically... Only indexing with boolean sparse matrices that are the same shape as the indexee should be supported? |
|
Yes, I think so, at least for this PR. Note that Numpy itself doesn't enforce shape restriction --- it just does The decision whether it's a good idea to try to start guessing the user's intent needs some more thought. Numpy itself is not consistent with how it treats 2-d arrays: I think Scipy should raise an error for the second case, for now. |
|
Anyway, other than that, looks good to me! |
|
I'm clueless on this build error in py 2.6. But 2.7 and 3.3 passed. edit: close & reopen fixed it. |
scipy/sparse/csr.py
Outdated
EDIT:
What I said below was incorrect. Disregard.
This path is not necessary because we can do the same thing with extractor as with _get_submatrix. _get_submatrix is limited to slices with step = 1, but I left it in because @Daniel-B-Smith did. I figured there must be some reason for this, like _get_submatrix being faster for step = 1 or something. Is this the case?
|
This introduces a failing test due to a bug in CSC's I will try to fix the bug causing this after lunch. |
|
I'm not sure if giving CSC its own |
|
I think if the CSC |
scipy/sparse/csc.py
Outdated
Unused routine
scipy/sparse/tests/test_base.py
Outdated
These yield statements have no effect: the rest of the test is not written as a test generator.
Also, else: seems to be missing, and the comparison is the wrong way around?
|
Cleaned up the test, sent a PR. cowlicks#1 |
Pr 2689
|
Looks like more problems related to not having size 0 matrices. When I change the check in So I think I'm stuck tracking down everything that could give a size zero matrix, and adding checks so it will return a 1x1 empty matrix. |
|
Also I think there would be a lot of non-obvious decisions to be made when squishing sparse matrix schemes into a 1D form. For example would COO just store things as (None, col, val) and (row, None, val) or a 2 element tuple thing? What would compressed formats do? How would the squished scheme show it was horizontal or vertical? I'm not sure there is a reasonable way to do this for many formats. I think we would just have to invent something special that would masquerade as the other formats whenever they are size zero. |
|
I think returning size-1 result from a slice that should give size 0 is incorrect. It's better to have it raise an exception instead. So I think you should remove all special-casing where 0-size result is converted to 1-size, and have the corresponding tests be knownfailures. The correct fix would then be to fix the system so that it allows for 0 x n and n x 0 sparse matrices. |
|
@pv That sounds like the right thing to do. Would this case be a |
|
I think either one will do. Hopefully we'll manage to fix this for 0.14 (or even 0.13 if someone is fast enough). |
Also add errors in this case.
Also removed tests that previously checked the size 0 workaround.
|
Okay, I reverted the workaround for size 0 matrices, and added the |
ENH: sparse: Fancy indexing for CSR and CSC matrices
Add initial fancy indexing support for CSR and CSC.
|
Merged + some additional fixes in 76bce8b. Thanks @cowlicks & @Daniel-B-Smith |
|
There's probably a lot to optimize in the CSR/CSC indexing business --- it's probably not the most efficient thing to have a Python loop that inserts elements one by one... Correctness trumps speed, though. |
Addresses comments by @pv: scipy#2689 (diff) scipy#2689 (diff) scipy#2689 (diff)
This introduces a failing test due to a bug in CSC's .nonzero() where the indices are not sorted C-style as described here: http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#boolean This addresses @pv comments: scipy#2689 (diff) scipy#2689 (diff)
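To illustrate what "sorted C-style" means here, a minimal sketch that reorders the coordinate pairs returned by `.nonzero()` into row-major order. The matrix and variable names are illustrative and not taken from this PR, and current SciPy versions may already return the indices in this order; the sort is then a no-op.

```python
import numpy as np
from scipy.sparse import csc_matrix

# Illustrative matrix; not taken from the PR's test suite.
dense = np.array([[0, 1, 2],
                  [3, 0, 0],
                  [0, 4, 0]])
M = csc_matrix(dense)

rows, cols = M.nonzero()

# np.lexsort treats the LAST key as the primary key, so this sorts by
# row first and by column within each row (C-style / row-major order).
order = np.lexsort((cols, rows))
rows_c, cols_c = rows[order], cols[order]

# NumPy's dense nonzero() already returns row-major order, so it
# serves as the reference.
ref_rows, ref_cols = dense.nonzero()
print(rows_c, cols_c)
```

With the indices in this order, values picked out through a boolean mask line up with what NumPy returns for the equivalent dense array.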
This is based off of @Daniel-B-Smith's work (thanks!).
I also add support for indexing with sparse boolean matrices, but it does not have tests yet. I will add tests for this tomorrow. For now I just wanted to get this out because it finishes what Daniel started.
Here's a demo of what I mean by indexing with sparse boolean matrices.
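A minimal sketch of the idea, assuming the boolean mask has the same shape as the matrix being indexed and is applied through its nonzero coordinates. The names `A` and `mask` are illustrative, and going through `mask.nonzero()` explicitly is an assumption about how such a mask can be applied, not necessarily the code path in this PR.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Illustrative data; not taken from the PR.
A = csr_matrix(np.array([[1, 0, 2],
                         [0, 3, 0],
                         [4, 0, 5]]))

# Sparse boolean mask with the same shape as A.
mask = csr_matrix(np.array([[True, False, True],
                            [False, False, False],
                            [True, False, False]]))

# Turn the mask into coordinate arrays and use them as fancy indices.
rows, cols = mask.nonzero()
selected = np.asarray(A[rows, cols]).ravel()

# Dense NumPy equivalent, for comparison.
expected = A.toarray()[mask.toarray()]
print(selected, expected)
```

The selected values come back in the same order as NumPy's dense boolean indexing as long as the mask's nonzero coordinates are in row-major order.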