Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sparse: Add "order" and "out" arguments to todense() and toarray() #229

Merged
merged 8 commits into from Jun 5, 2012
Merged

sparse: Add "order" and "out" arguments to todense() and toarray() #229

merged 8 commits into from Jun 5, 2012

Conversation

dwf
Copy link
Contributor

@dwf dwf commented May 28, 2012

Adds an order argument so that one can go directly from sparse types to Fortran-contiguous arrays/matrix objects.

Also adds an "out" argument for re-using an already-allocated output buffer. Potentially useful when working close to the memory ceiling.

This involved re-generating the SWIG wrappers, I'm not sure exactly what the procedure is for that, so I included the re-generated code in a separate commit.

dwf added 3 commits May 28, 2012 02:38
This adds an 'order' keyword to select the memory layout of the
resulting array when using toarray() to create a dense array
from a sparse matrix type. The default remains a C-contiguous
array. Tests are also updated to test both the default and
explicit 'C' and 'F' modes.

Generated output of SWIG will follow in a separate commit, so
that this one can be easily cherry-picked if what I generated
is unsatisfactory.
This adds an 'order' keyword to select the memory layout of the
resulting object when using todense() to create a numpy matrix
object from a sparse matrix type, similarly to the previous
change to toarray(). The default remains a C-contiguous
matrix. Tests are also updated to test both the default and
explicit 'C' and 'F' modes.

Generated output of SWIG will follow in a separate commit, so
that this one can be easily cherry-picked if what I generated
is unsatisfactory.
This commit contains SWIG-generated code only.
@GaelVaroquaux
Copy link
Member

Useful feature, thanks!

@dwf
Copy link
Contributor Author

dwf commented May 28, 2012

I've added docstrings for toarray() and todense() in the base class. It seems kind of silly to copy and paste the same documentation everywhere, but I can't recall whether we're supposed to support Python 2.4 still, and functools.wraps is 2.5+.

@jakevdp
Copy link
Member

jakevdp commented May 29, 2012

Looks good - this will be a nice feature.
There was some talk on the scipy-dev email list lately regarding python version. I seem to remember someone claiming that several tests fail under python 2.4... I think we're still nominally supposed to be compatible with 2.4 though.
That being said, I think explicit copied-and-pasted docstrings leads to more understandable code.

@dwf
Copy link
Contributor Author

dwf commented May 29, 2012

My worry with copying and pasting docstrings stems from something that Greg Wilson once told me; to paraphrase: "if the same thing is repeated in more than one place, it will eventually be wrong in at least one". I would much rather put a dummy "See: spmatrix.toarray" docstring and have it be overwritten by functools.wraps (or a custom, equivalent decorator that does the same thing) so that IDEs and IPython sessions have access to the right thing.

EDIT: always -> eventually. Corrected by GVW on Twitter. That makes more sense...

dwf added 3 commits June 4, 2012 15:25
Mutually exclusive to the previously added 'order=' argument, this
should serve the same crowd, i.e. people who occasionally need a dense
copy of their sparse matrix but *really* can't afford to be allocating
an array that size over and over again. Refactored a bit into the base
class so as to not duplicate code between lil.py and coo.py.

Tests included.
This matches the new argument in the array version, toarray(). out is
not required to be a numpy.matrix but is wrapped in one upon return.
@dwf
Copy link
Contributor Author

dwf commented Jun 4, 2012

I've added some more functionality (that should be of use to certain people, maybe scikit-learn included: the ability to specify an 'out' argument for todense and toarray. Anything currently holding this back from merge that I should address?

@rgommers
Copy link
Member

rgommers commented Jun 4, 2012

Python 2.4 is indeed still supported. The solution to simply refer to another docstring is the right one.

Did anyone happen to test this on 2.4 or 3.2?

@dwf
Copy link
Contributor Author

dwf commented Jun 4, 2012

Not tested, but thoroughly audited for 2.4 problems (we get a lot of them in Theano). I don't have very much 3.x experience.

I wasn't sure but based on the preprocessor directives, looks like the SWIG bindings do indeed support Python 2.4 through 3.x, without any special futzing on my part.

@jakevdp
Copy link
Member

jakevdp commented Jun 5, 2012

@dwf - very nice. Very intuitive interface, and the tests pass for me on python 2.6.4. This will be a really helpful feature for memory-heavy applications!
If things are OK in 2.4, then all we need is confirmation that this works in 3.2 and then we can merge. I don't currently have 3.2 installed - is there anybody set up to quickly test this?

@dlax
Copy link
Member

dlax commented Jun 5, 2012

All tests pass with python 2.4.
Why do you need to test with 3.2?

M,N = self.shape
coo_todense(M, N, self.nnz, self.row, self.col, self.data, B.ravel())
coo_todense(M, N, self.nnz, self.row, self.col, self.data,
B.ravel(order='A'), fortran)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a problem here: you never check whether B is contiguous. If it's not, then B.ravel() will return a copy and the output won't be written to the original array.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Tue, Jun 05, 2012 at 11:07:34AM -0700, Jacob Vanderplas wrote:

     M,N = self.shape
  •    coo_todense(M, N, self.nnz, self.row, self.col, self.data, B.ravel())
    
  •    coo_todense(M, N, self.nnz, self.row, self.col, self.data,
    
  •                B.ravel(order='A'), fortran)
    

I think there's a problem here: you never check whether B is contiguous. If it's not, then B.ravel() will return a copy and the output won't be written to the original array.

Good catch. f2py has this annoying problem as well for non-Fortran
contiguous inputs, I wouldn't want to accidentally introduce more of that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most sparse types go through coo_matrix as an intermediate step, but some
(mostly just lil) do not. I'll update the docstring to say "most sparse
types" require a C- or F- contiguous out array.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer a value error if the "out" array is not contiguous. This feature is for users who know exactly what they want in terms of memory management: if they're doing something wrong, they need to know.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I added that too. I just thought the docs should reflect this reality as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the changes in 06cc844.

@rgommers
Copy link
Member

rgommers commented Jun 5, 2012

@dlaxalde testing with 3.x would be wise because regenerating these SWIG wrappers doesn't happen often. I also don't know what the procedure is or what can go wrong.

@rgommers
Copy link
Member

rgommers commented Jun 5, 2012

I'll do that testing now.

@dwf
Copy link
Contributor Author

dwf commented Jun 5, 2012

@rgommers I believe I did "swig -c++ -python coo.i" for reference.

@rgommers
Copy link
Member

rgommers commented Jun 5, 2012

Works fine on 3.2. The last commit looks fine too, so I guess this can be merged.

@rgommers
Copy link
Member

rgommers commented Jun 5, 2012

3.2 testing did turn up another issue with sparse.csgraph - it can't be imported at all (2to3 issue). I'd swear that I tested it on 3.x before.

@jakevdp
Copy link
Member

jakevdp commented Jun 5, 2012

3.2 testing did turn up another issue with sparse.csgraph - it can't be imported at all (2to3 issue). I'd swear that I tested it on 3.x before.

That's not good. Any idea what's causing the problem?

jakevdp added a commit that referenced this pull request Jun 5, 2012
sparse: Add "order" and "out" arguments to todense() and toarray()
@jakevdp jakevdp merged commit 8afcd37 into scipy:master Jun 5, 2012
@jakevdp
Copy link
Member

jakevdp commented Jun 5, 2012

Thanks David! Nice work.

@dwf
Copy link
Contributor Author

dwf commented Jun 5, 2012

No problem, thank you for catching the contiguity issue. :)

@rgommers
Copy link
Member

rgommers commented Jun 5, 2012

@jakevdp yes, the Cython source files should be using absolute imports everywhere. Or adding the . for relative imports, but that doesn't work with Python 2.4

@rgommers
Copy link
Member

rgommers commented Jun 5, 2012

Actually that's not enough - 2to3 doesn't seem to recognize compiled extension names starting with an underscore. The csgraph/init.py`` file looks like:

from ._components import cs_graph_components
from ._laplacian import laplacian
from _shortest_path import shortest_path, ...

Note the missing . on the last line. Will look into this more tomorrow.

@jakevdp
Copy link
Member

jakevdp commented Jun 5, 2012

Thanks Ralph.

@samueljohn
Copy link

cool.

@rgommers
Copy link
Member

rgommers commented Jun 6, 2012

Import issue fixed by #243.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants