New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sparse: Add "order" and "out" arguments to todense() and toarray() #229
Conversation
This adds an 'order' keyword to select the memory layout of the resulting array when using toarray() to create a dense array from a sparse matrix type. The default remains a C-contiguous array. Tests are also updated to test both the default and explicit 'C' and 'F' modes. Generated output of SWIG will follow in a separate commit, so that this one can be easily cherry-picked if what I generated is unsatisfactory.
This adds an 'order' keyword to select the memory layout of the resulting object when using todense() to create a numpy matrix object from a sparse matrix type, similarly to the previous change to toarray(). The default remains a C-contiguous matrix. Tests are also updated to test both the default and explicit 'C' and 'F' modes. Generated output of SWIG will follow in a separate commit, so that this one can be easily cherry-picked if what I generated is unsatisfactory.
This commit contains SWIG-generated code only.
Useful feature, thanks! |
I've added docstrings for toarray() and todense() in the base class. It seems kind of silly to copy and paste the same documentation everywhere, but I can't recall whether we're supposed to support Python 2.4 still, and |
Looks good - this will be a nice feature. |
My worry with copying and pasting docstrings stems from something that Greg Wilson once told me; to paraphrase: "if the same thing is repeated in more than one place, it will eventually be wrong in at least one". I would much rather put a dummy "See: spmatrix.toarray" docstring and have it be overwritten by EDIT: always -> eventually. Corrected by GVW on Twitter. That makes more sense... |
Mutually exclusive to the previously added 'order=' argument, this should serve the same crowd, i.e. people who occasionally need a dense copy of their sparse matrix but *really* can't afford to be allocating an array that size over and over again. Refactored a bit into the base class so as to not duplicate code between lil.py and coo.py. Tests included.
This matches the new argument in the array version, toarray(). out is not required to be a numpy.matrix but is wrapped in one upon return.
I've added some more functionality (that should be of use to certain people, maybe scikit-learn included: the ability to specify an 'out' argument for todense and toarray. Anything currently holding this back from merge that I should address? |
Python 2.4 is indeed still supported. The solution to simply refer to another docstring is the right one. Did anyone happen to test this on 2.4 or 3.2? |
Not tested, but thoroughly audited for 2.4 problems (we get a lot of them in Theano). I don't have very much 3.x experience. I wasn't sure but based on the preprocessor directives, looks like the SWIG bindings do indeed support Python 2.4 through 3.x, without any special futzing on my part. |
@dwf - very nice. Very intuitive interface, and the tests pass for me on python 2.6.4. This will be a really helpful feature for memory-heavy applications! |
All tests pass with python 2.4. |
M,N = self.shape | ||
coo_todense(M, N, self.nnz, self.row, self.col, self.data, B.ravel()) | ||
coo_todense(M, N, self.nnz, self.row, self.col, self.data, | ||
B.ravel(order='A'), fortran) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's a problem here: you never check whether B is contiguous. If it's not, then B.ravel() will return a copy and the output won't be written to the original array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On Tue, Jun 05, 2012 at 11:07:34AM -0700, Jacob Vanderplas wrote:
M,N = self.shape
coo_todense(M, N, self.nnz, self.row, self.col, self.data, B.ravel())
coo_todense(M, N, self.nnz, self.row, self.col, self.data,
B.ravel(order='A'), fortran)
I think there's a problem here: you never check whether B is contiguous. If it's not, then B.ravel() will return a copy and the output won't be written to the original array.
Good catch. f2py has this annoying problem as well for non-Fortran
contiguous inputs, I wouldn't want to accidentally introduce more of that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most sparse types go through coo_matrix as an intermediate step, but some
(mostly just lil) do not. I'll update the docstring to say "most sparse
types" require a C- or F- contiguous out
array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer a value error if the "out" array is not contiguous. This feature is for users who know exactly what they want in terms of memory management: if they're doing something wrong, they need to know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I added that too. I just thought the docs should reflect this reality as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the changes in 06cc844.
@dlaxalde testing with 3.x would be wise because regenerating these SWIG wrappers doesn't happen often. I also don't know what the procedure is or what can go wrong. |
I'll do that testing now. |
@rgommers I believe I did "swig -c++ -python coo.i" for reference. |
Works fine on 3.2. The last commit looks fine too, so I guess this can be merged. |
3.2 testing did turn up another issue with sparse.csgraph - it can't be imported at all (2to3 issue). I'd swear that I tested it on 3.x before. |
That's not good. Any idea what's causing the problem? |
sparse: Add "order" and "out" arguments to todense() and toarray()
Thanks David! Nice work. |
No problem, thank you for catching the contiguity issue. :) |
@jakevdp yes, the Cython source files should be using absolute imports everywhere. Or adding the |
Actually that's not enough - 2to3 doesn't seem to recognize compiled extension names starting with an underscore. The csgraph/init.py`` file looks like:
Note the missing |
Thanks Ralph. |
cool. |
Import issue fixed by #243. |
Adds an order argument so that one can go directly from sparse types to Fortran-contiguous arrays/matrix objects.
Also adds an "out" argument for re-using an already-allocated output buffer. Potentially useful when working close to the memory ceiling.
This involved re-generating the SWIG wrappers, I'm not sure exactly what the procedure is for that, so I included the re-generated code in a separate commit.