sparse: Add "order" and "out" arguments to todense() and toarray() #229

dwf · 2012-05-28T06:51:19Z

Adds an order argument so that one can go directly from sparse types to Fortran-contiguous arrays/matrix objects.

Also adds an "out" argument for re-using an already-allocated output buffer. Potentially useful when working close to the memory ceiling.

This involved re-generating the SWIG wrappers. I'm not sure exactly what the procedure is for that, so I included the re-generated code in a separate commit.

This adds an 'order' keyword to select the memory layout of the resulting array when using toarray() to create a dense array from a sparse matrix type. The default remains a C-contiguous array. Tests are also updated to test both the default and explicit 'C' and 'F' modes. Generated output of SWIG will follow in a separate commit, so that this one can be easily cherry-picked if what I generated is unsatisfactory.

This adds an 'order' keyword to select the memory layout of the resulting object when using todense() to create a numpy matrix object from a sparse matrix type, similarly to the previous change to toarray(). The default remains a C-contiguous matrix. Tests are also updated to test both the default and explicit 'C' and 'F' modes. Generated output of SWIG will follow in a separate commit, so that this one can be easily cherry-picked if what I generated is unsatisfactory.
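As a rough illustration of the 'order' keyword described in the two commit messages above: this is a sketch that assumes a SciPy build containing this change, and the example matrix is made up.

```python
import numpy as np
from scipy import sparse

# Small example matrix (values are arbitrary, for illustration only).
A = sparse.coo_matrix(np.array([[1.0, 0.0, 2.0],
                                [0.0, 3.0, 0.0]]))

dense_default = A.toarray()       # default layout stays C-contiguous
dense_c = A.toarray(order='C')    # explicit row-major layout
dense_f = A.toarray(order='F')    # column-major (Fortran) layout

# The values are identical; only the memory layout differs.
same_values = np.array_equal(dense_c, dense_f)

# todense() takes the same keyword and returns a numpy.matrix.
mat_f = A.todense(order='F')
```

Checking `dense_f.flags.f_contiguous` and `mat_f.flags.f_contiguous` is a quick way to confirm the requested layout.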

This commit contains SWIG-generated code only.

GaelVaroquaux · 2012-05-28T07:11:02Z

Useful feature, thanks!

dwf · 2012-05-28T16:17:08Z

I've added docstrings for toarray() and todense() in the base class. It seems kind of silly to copy and paste the same documentation everywhere, but I can't recall whether we're supposed to support Python 2.4 still, and functools.wraps is 2.5+.

jakevdp · 2012-05-29T00:31:05Z

Looks good - this will be a nice feature.
There was some talk on the scipy-dev email list lately regarding python version. I seem to remember someone claiming that several tests fail under python 2.4... I think we're still nominally supposed to be compatible with 2.4 though.
That being said, I think explicit copied-and-pasted docstrings lead to more understandable code.

dwf · 2012-05-29T00:56:11Z

My worry with copying and pasting docstrings stems from something that Greg Wilson once told me; to paraphrase: "if the same thing is repeated in more than one place, it will eventually be wrong in at least one". I would much rather put a dummy "See: spmatrix.toarray" docstring and have it be overwritten by functools.wraps (or a custom, equivalent decorator that does the same thing) so that IDEs and IPython sessions have access to the right thing.

EDIT: always -> eventually. Corrected by GVW on Twitter. That makes more sense...
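The "custom, equivalent decorator" idea could look roughly like the sketch below. The names `copy_doc`, `SparseBase` and `CooLike` are hypothetical and are not taken from this pull request. Unlike functools.wraps, this version copies only `__doc__`, and it relies on nothing newer than decorator syntax.

```python
def copy_doc(source):
    """Return a decorator that copies source.__doc__ onto the decorated function."""
    def decorator(func):
        func.__doc__ = source.__doc__
        return func
    return decorator


class SparseBase(object):
    # Hypothetical stand-in for the sparse base class.
    def toarray(self):
        """Return a dense ndarray representation of this matrix."""
        raise NotImplementedError


class CooLike(SparseBase):
    # Hypothetical subclass; the placeholder docstring is replaced at class
    # creation time, so help() and IPython show the base-class text.
    @copy_doc(SparseBase.toarray)
    def toarray(self):
        """See: SparseBase.toarray"""
        return [[0.0]]
```

With this in place the documentation lives in one spot, which addresses the worry about repeated text drifting out of sync.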

Mutually exclusive to the previously added 'order=' argument, this should serve the same crowd, i.e. people who occasionally need a dense copy of their sparse matrix but *really* can't afford to be allocating an array that size over and over again. Refactored a bit into the base class so as to not duplicate code between lil.py and coo.py. Tests included.

This matches the new argument in the array version, toarray(). out is not required to be a numpy.matrix but is wrapped in one upon return.
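A minimal sketch of how the 'out' argument from these two commits might be used, assuming a SciPy build that contains them. The example matrix is made up, and the ValueError for combining 'order' with 'out' reflects current SciPy behaviour rather than anything stated in this thread.

```python
import numpy as np
from scipy import sparse

A = sparse.coo_matrix(np.array([[1.0, 0.0, 2.0],
                                [0.0, 3.0, 0.0]]))

# Allocate the dense buffer once; shape and dtype must match the sparse matrix.
buf = np.empty(A.shape, dtype=A.dtype)

# The conversion writes into buf instead of allocating a new array.
result = A.toarray(out=buf)
reuses_buffer = np.shares_memory(result, buf)

# 'order' and 'out' are mutually exclusive: the buffer already fixes the layout.
try:
    A.toarray(order='F', out=buf)
    both_rejected = False
except ValueError:
    both_rejected = True

# todense() accepts a plain ndarray as 'out' and wraps it in a numpy.matrix.
mat = A.todense(out=buf)
```

In a loop that densifies many matrices of the same shape, the same `buf` can be passed on every iteration.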

dwf · 2012-06-04T19:55:34Z

I've added some more functionality that should be of use to certain people (maybe scikit-learn included): the ability to specify an 'out' argument for todense and toarray. Anything currently holding this back from merge that I should address?

rgommers · 2012-06-04T20:59:16Z

Python 2.4 is indeed still supported. The solution to simply refer to another docstring is the right one.

Did anyone happen to test this on 2.4 or 3.2?

dwf · 2012-06-04T23:12:52Z

Not tested, but thoroughly audited for 2.4 problems (we get a lot of them in Theano). I don't have very much 3.x experience.

I wasn't sure, but based on the preprocessor directives, it looks like the SWIG bindings do indeed support Python 2.4 through 3.x, without any special futzing on my part.

jakevdp · 2012-06-05T15:07:10Z

@dwf - very nice. Very intuitive interface, and the tests pass for me on python 2.6.4. This will be a really helpful feature for memory-heavy applications!
If things are OK in 2.4, then all we need is confirmation that this works in 3.2 and then we can merge. I don't currently have 3.2 installed - is there anybody set up to quickly test this?

dlax · 2012-06-05T16:22:21Z

All tests pass with python 2.4.
Why do you need to test with 3.2?

jakevdp · 2012-06-05T18:07:34Z

scipy/sparse/coo.py

        M,N = self.shape
-        coo_todense(M, N, self.nnz, self.row, self.col, self.data, B.ravel())
+        coo_todense(M, N, self.nnz, self.row, self.col, self.data,
+                    B.ravel(order='A'), fortran)


I think there's a problem here: you never check whether B is contiguous. If it's not, then B.ravel() will return a copy and the output won't be written to the original array.
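The copy-versus-view behaviour behind this concern can be reproduced with NumPy alone. The arrays below are made up for illustration and are not taken from the pull request.

```python
import numpy as np

# A C-contiguous buffer: ravel() can return a view, so writes reach the buffer.
contig = np.zeros((2, 3))
flat_view = contig.ravel(order='A')
flat_view[:] = 1.0
view_shares = np.shares_memory(flat_view, contig)

# Every other column of a larger array: contiguous in neither C nor F order.
strided = np.zeros((2, 6))[:, ::2]
flat_copy = strided.ravel(order='A')
flat_copy[:] = 1.0  # modifies the temporary copy only
copy_shares = np.shares_memory(flat_copy, strided)
```

Here `contig` ends up filled with ones while `strided` stays all zeros, which is the silent failure mode being described.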

On Tue, Jun 05, 2012 at 11:07:34AM -0700, Jacob Vanderplas wrote:

> I think there's a problem here: you never check whether B is contiguous. If it's not, then B.ravel() will return a copy and the output won't be written to the original array.

Good catch. f2py has this annoying problem as well for non-Fortran-contiguous inputs; I wouldn't want to accidentally introduce more of that.

Most sparse types go through coo_matrix as an intermediate step, but some (mostly just lil) do not. I'll update the docstring to say "most sparse types" require a C- or F-contiguous out array.

I'd prefer a ValueError if the "out" array is not contiguous. This feature is for users who know exactly what they want in terms of memory management: if they're doing something wrong, they need to know.

Oh, I added that too. I just thought the docs should reflect this reality as well.

See the changes in 06cc844.
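For readers without the commit at hand, the kind of check being discussed might look like the sketch below. This is not the code from 06cc844; `check_out_buffer` is a hypothetical helper written only to illustrate rejecting a non-contiguous buffer with a ValueError.

```python
import numpy as np


def check_out_buffer(out, shape, dtype):
    """Hypothetical helper: validate a caller-supplied dense output buffer.

    Returns True if the buffer is Fortran-contiguous, False otherwise.
    """
    if out.shape != tuple(shape) or out.dtype != np.dtype(dtype):
        raise ValueError("out array must have the same shape and dtype "
                         "as the sparse matrix")
    if not (out.flags.c_contiguous or out.flags.f_contiguous):
        # Refuse rather than silently filling a temporary copy.
        raise ValueError("out array must be C- or F-contiguous")
    # The caller can pass this flag on to the low-level fill routine.
    return bool(out.flags.f_contiguous)
```

Failing loudly matches the preference stated above: a user who passes 'out' is managing memory deliberately and should be told when the buffer cannot be written in place.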

rgommers · 2012-06-05T19:24:43Z

@dlaxalde testing with 3.x would be wise because regenerating these SWIG wrappers doesn't happen often. I also don't know what the procedure is or what can go wrong.

rgommers · 2012-06-05T19:25:10Z

I'll do that testing now.

dwf · 2012-06-05T19:28:08Z

@rgommers I believe I did "swig -c++ -python coo.i" for reference.

rgommers · 2012-06-05T20:05:11Z

Works fine on 3.2. The last commit looks fine too, so I guess this can be merged.

rgommers · 2012-06-05T20:06:00Z

3.2 testing did turn up another issue with sparse.csgraph - it can't be imported at all (2to3 issue). I'd swear that I tested it on 3.x before.

jakevdp · 2012-06-05T20:11:50Z

> 3.2 testing did turn up another issue with sparse.csgraph - it can't be imported at all (2to3 issue). I'd swear that I tested it on 3.x before.

That's not good. Any idea what's causing the problem?


jakevdp · 2012-06-05T20:12:51Z

Thanks David! Nice work.

dwf · 2012-06-05T20:17:24Z

No problem, thank you for catching the contiguity issue. :)

rgommers · 2012-06-05T20:25:39Z

@jakevdp yes, the Cython source files should be using absolute imports everywhere. Or adding the . for relative imports, but that doesn't work with Python 2.4

rgommers · 2012-06-05T21:12:53Z

Actually that's not enough - 2to3 doesn't seem to recognize compiled extension names starting with an underscore. The csgraph/__init__.py file looks like:

from ._components import cs_graph_components
from ._laplacian import laplacian
from _shortest_path import shortest_path, ...

Note the missing . on the last line. Will look into this more tomorrow.

jakevdp · 2012-06-05T22:28:46Z

Thanks Ralph.

samueljohn · 2012-06-05T23:06:16Z

cool.

rgommers · 2012-06-06T19:16:29Z

Import issue fixed by #243.

dwf added 3 commits May 28, 2012 02:38

New SWIG wrapper for toarray()/todense() addition.

5febad9

This commit contains SWIG-generated code only.

DOC: toarray()/todense() base class docstrings.

ed8f27f

dwf added 3 commits June 4, 2012 15:25

ENH: Added an out= argument to sparse todense().

34c5eaa

This matches the new argument in the array version, toarray(). out is not required to be a numpy.matrix but is wrapped in one upon return.

DOC: added pointers to superclass docstring.

0ceeefa

jakevdp reviewed Jun 5, 2012

BUG: reject non-contig out arg in toarray()

06cc844

jakevdp added a commit that referenced this pull request Jun 5, 2012

Merge pull request #229 from dwf/f_ordered_from_sparse

8afcd37

sparse: Add "order" and "out" arguments to todense() and toarray()

jakevdp merged commit 8afcd37 into scipy:master Jun 5, 2012

rgommers mentioned this pull request Jun 9, 2012

TST: sparse: mark dok matrix tests as knownfail with python 2.4. #237

Closed

scipy-gitbot mentioned this pull request Apr 25, 2013

ndarray + dok sparse array doesn't work on Python 2.4 (Trac #1559) #2084

Closed
