Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disallow np.array(COO) #218

Closed
hameerabbasi opened this issue Dec 12, 2018 · 9 comments
Closed

Disallow np.array(COO) #218

hameerabbasi opened this issue Dec 12, 2018 · 9 comments

Comments

@hameerabbasi
Copy link
Collaborator

Currently, we have a mix of when to allow dense inputs and when not to allow them. Currently, we have the __array__ protocol which converts the sparse array into a dense one when using np.array(COO) or similar.

This would be fine if we knew that we were densifying, but in practice, we really don't. Think of something that calls np.asarray internally, this would densify, and in many cases fill up memory and raise a MemoryError.

In practice, what we want most of the time is to disallow np.[as]array(COO) and provide an explicit escape hatch in the form of COO.todense().

Recently, I added an environment variable to control this, so this discussion is mostly about changing the default behaviour.

Thoughts, @rgommers, @shoyer, @mrocklin, @perimosocordiae?

@asmeurer
Copy link
Member

Having worked with the environment variable enabled for the past few weeks, I can confirm that it often densifies in unexpected places. For instance, an algorithm that uses einsum will densify because einsum isn't implemented for sparse.

@mrocklin
Copy link
Collaborator

CuPy explicitly doesn't implement the __array__ protocol for this reason. Dask.array still does. Both seem defensible. No strong thoughts either way.

@scopatz
Copy link

scopatz commented Dec 12, 2018

I agree with @asmeurer here. The purpose of sparse seems to be to create, manage, and operate on sparse arrays. I don't expect that default behavior is to densify as frequently as it does.

@shoyer
Copy link
Member

shoyer commented Dec 12, 2018

__array_function__ will help here, since NumPy functions won't automatically densify anymore.

I agree that either choice is defensible.

In the long term, it would be nice to have a protocol for coercion to NumPy arrays even if the operation could be very expensive.

@asmeurer
Copy link
Member

I think @hameerabbasi didn't want to add the flag to the Python level because of thread safety (right now it only exists as an environment variable that must be set before importing sparse). See the discussion at #210.

@perimosocordiae
Copy link

I'm +1 for disallowing implicit densification. That's the approach taken by scipy.sparse, and I think it has been effective in guiding users away from unexpected performance traps.

Users often test their code using small inputs, so problems caused by densification have a bad habit of only showing up in production.

@hameerabbasi
Copy link
Collaborator Author

It seems all comments are either in favor or neutral. I've implemented this in #220.

In the long term, it would be nice to have a protocol for coercion to NumPy arrays even if the operation could be very expensive.

This already exists as a method rather than a protocol, .todense(), the problem with __array__ seems to be that it makes densification implicit, with a lot of NumPy and other library functions calling it.

__array_function__ will help here, since NumPy functions won't automatically densify anymore.

__array_function__ will help, but only internal NumPy functions are guaranteed to use it. So unless we allow overriding np.asarray, it isn't perfect.

@shoyer
Copy link
Member

shoyer commented Dec 13, 2018

This already exists as a method rather than a protocol, .todense()

Ideally the method would also be available on non-sparse arrays

@asmeurer
Copy link
Member

I agree with @shoyer. At the very least it would be nice to have sparse.todense() as a function so that we can call it universally on sparse and dense inputs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants