
Lazy arrays for asymptotically better performance #326

Open
hameerabbasi opened this issue Feb 17, 2020 · 9 comments

@hameerabbasi (Collaborator) commented Feb 17, 2020

I've already brought this up in another thread (dask/dask#5879), but I was planning to make sparse collections within this library lazy for asymptotically better performance in certain situations. See the following research papers for details:

And the following talks:

With this in mind, would it make sense to make sparse collections lazy, with the caveat of an API break? These would have an API similar to Dask, having to do arr.compute() for the final result. As discussed in dask/dask#5879, it would also follow the protocols for dask custom collections.

If we manage to do this right, adding GPU support shouldn't be difficult either. But the question arises: is it worth breaking API compatibility to do this?
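For concreteness, a Dask-style lazy collection of the kind proposed above might look like the following minimal sketch. The `LazyArray` class, its operator overloads, and `.compute()` are hypothetical illustrations of deferred evaluation, not the actual `sparse` API:

```python
# Hypothetical sketch of a lazy array: operations build an expression
# graph, and work only happens on .compute(), Dask-style.
import numpy as np


class LazyArray:
    """Defers an operation graph; evaluates only on .compute()."""

    def __init__(self, func, *parents):
        self.func = func
        self.parents = parents

    @classmethod
    def from_numpy(cls, data):
        # A leaf node: calling its func just returns the data.
        return cls(lambda: data)

    def __add__(self, other):
        return LazyArray(np.add, self, other)

    def __mul__(self, other):
        return LazyArray(np.multiply, self, other)

    def compute(self):
        # A real implementation would optimize/fuse the whole graph here;
        # this is where TACO-style kernel generation would plug in.
        args = [p.compute() for p in self.parents]
        return self.func(*args)


x = LazyArray.from_numpy(np.array([1, 2, 3]))
y = LazyArray.from_numpy(np.array([4, 5, 6]))
expr = (x + y) * x        # builds the graph; no work done yet
result = expr.compute()   # evaluates: array([ 5, 14, 27])
```

The asymptotic wins come from the `compute()` step seeing the whole expression at once, so it can fuse loops over only the nonzero structure instead of materializing each intermediate.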

@hameerabbasi (Collaborator, Author) commented Feb 17, 2020

Here's my proposal: We declare 0.x a backwards-compatible API, and consider 1.x to be based on TACO. We will emit a FutureWarning in 0.* about the API change.

@mrocklin (Collaborator) commented Feb 17, 2020 (comment minimized; body hidden)

@hameerabbasi (Collaborator, Author) commented Feb 17, 2020

> Perhaps it would make sense to have a different sparse_lazy library that depended strongly on this one?

I think if we were to do this, it'd be the other way around: immediate operations would just do .compute() at the end. How about I leave the main namespace API as-is and move the lazy part to sparse.perf?

@mrocklin (Collaborator) commented Feb 17, 2020

That sounds nicer to me from an integration perspective. Also, to be clear, I'm very excited about seeing what this can do in terms of performance. My original hesitation is mostly around thinking about other libraries depending on pydata/sparse. My guess is that folks will be more hesitant to depend on anything that has any sort of non-traditional behavior.

@rgommers commented Feb 17, 2020

I think there's also a good amount of evidence that eager is better as a default; see e.g. PyTorch's eager execution and TensorFlow moving to eager by default.

So +1 for eager by default, and advertising lazy mode prominently as a potential speedup method.

@hameerabbasi (Collaborator, Author) commented Feb 26, 2020

So I think the final decision is:

  1. Build the eager method on top of the lazy.
  2. Expose the lazy method as well for potential speedups, as a separate submodule.
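Decision (1) could be sketched roughly as follows: the eager class is a thin facade that routes each operation through the lazy layer and calls `.compute()` immediately, so users of the existing API never see a lazy object. All names here (`LazyNode`, `EagerArray`) are hypothetical illustrations, not the actual `sparse` classes:

```python
# Hypothetical sketch: eager semantics built on top of a lazy core.
import numpy as np


class LazyNode:
    """Minimal lazy node standing in for the proposed lazy submodule."""

    def __init__(self, thunk):
        self._thunk = thunk

    def __add__(self, other):
        return LazyNode(lambda: self._thunk() + other._thunk())

    def compute(self):
        return self._thunk()


class EagerArray:
    """Eager facade: each operation builds a (here trivial) lazy
    expression, then evaluates it immediately, so the user-visible
    semantics are unchanged from a plain eager array."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __add__(self, other):
        expr = LazyNode(lambda: self.data) + LazyNode(lambda: other.data)
        return EagerArray(expr.compute())  # evaluate right away


a = EagerArray([1, 2]) + EagerArray([3, 4])
print(a.data)  # [4 6]
```

With this layering, the eager path pays only the overhead of a one-step graph, while performance-sensitive users can opt into the lazy submodule and defer `.compute()` across whole expressions.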

@dhirschfeld commented Feb 26, 2020

Potentially off-topic, but I'm curious how this relates to uarray: isn't that lazily evaluated? And if so, couldn't you just create a sparse backend?

@hameerabbasi (Collaborator, Author) commented Feb 26, 2020

@dhirschfeld An early version of uarray, since renamed to metadsl, was lazy. The current iteration is not.

@saulshanabrook commented Mar 11, 2020

> Build the eager method on top of the lazy.

That makes sense!
