Disallow np.array(COO) #218
Comments
Having worked with the environment variable enabled for the past few weeks, I can confirm that it often densifies in unexpected places. For instance, an algorithm that uses |
CuPy explicitly doesn't implement the |
I agree with @asmeurer here. The purpose of sparse seems to be to create, manage, and operate on sparse arrays. I don't expect that default behavior is to densify as frequently as it does. |
I agree that either choice is defensible. In the long term, it would be nice to have a protocol for coercion to NumPy arrays even if the operation could be very expensive. |
I think @hameerabbasi didn't want to add the flag to the Python level because of thread safety (right now it only exists as an environment variable that must be set before importing |
I'm +1 for disallowing implicit densification. That's the approach taken by scipy.sparse, and I think it has been effective in guiding users away from unexpected performance traps. Users often test their code using small inputs, so problems caused by densification have a bad habit of only showing up in production. |
It seems all comments are either in favor or neutral. I've implemented this in #220.
This already exists as a method rather than a protocol,
|
Ideally the method would also be available on non-sparse arrays |
I agree with @shoyer. At the very least it would be nice to have |
We currently have a mix of when to allow dense inputs and when not to allow them. We have the __array__ protocol, which converts the sparse array into a dense one when using np.array(COO) or similar.
This would be fine if we knew that we were densifying, but in practice, we really don't. Think of something that calls np.asarray internally: this would densify and, in many cases, fill up memory and raise a MemoryError.
In practice, what we want most of the time is to disallow np.[as]array(COO) and provide an explicit escape hatch in the form of COO.todense().
Recently, I added an environment variable to control this, so this discussion is mostly about changing the default behaviour.
Thoughts, @rgommers, @shoyer, @mrocklin, @perimosocordiae?
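To make the proposal concrete, here is a minimal sketch of the behaviour being discussed. It uses a toy class, TinyCOO, which is hypothetical and is not the library's actual COO implementation; the choice of RuntimeError and the error message are also assumptions made for illustration. The idea is that __array__ refuses implicit conversion, while todense() stays available as the explicit escape hatch.

```python
import numpy as np


class TinyCOO:
    """Toy coordinate-format sparse array (hypothetical, for illustration only)."""

    def __init__(self, coords, data, shape):
        self.coords = np.asarray(coords)  # shape (ndim, nnz): one row per dimension
        self.data = np.asarray(data)      # shape (nnz,): the stored nonzero values
        self.shape = tuple(shape)

    def todense(self):
        """Explicit escape hatch: allocate and fill the full dense array."""
        out = np.zeros(self.shape, dtype=self.data.dtype)
        out[tuple(self.coords)] = self.data
        return out

    def __array__(self, dtype=None, copy=None):
        # Refuse implicit conversion so np.array / np.asarray cannot
        # densify silently somewhere deep inside another library.
        raise RuntimeError(
            "Implicit densification is disabled; call .todense() explicitly."
        )


x = TinyCOO(coords=[[0, 2], [1, 0]], data=[5.0, 7.0], shape=(3, 2))

message = None
try:
    np.asarray(x)          # implicit conversion is rejected
except RuntimeError as err:
    message = str(err)

dense = x.todense()        # explicit conversion still works
```

With this design, code that calls np.asarray internally fails early with a clear message instead of quietly allocating a dense array, and callers who really do want the dense form have to say so.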