
add section about very large arrays #9

Open
wants to merge 1 commit into
base: array-best-practices
from

Conversation

rabernat

This is my attempt to distill my experience with very large arrays. I got my numbers from Matt's blog post on dask scaling limits.

I have probably said some inaccurate things that need to be corrected. This is just a first draft.

@mrocklin
Owner

Thanks @rabernat !

Two thoughts:

  1. A lot of this seems general beyond dask arrays. We might want a general core best practices section that talks a bit about overhead and such.
  2. This makes claims that people shouldn't use Dask for datasets over several TB, and that they should use for loops instead. This was definitely true in your case (and I'm hoping that other Dask maintainers can help you resolve those problems in the near future), but it may not be true in general. I hesitate to tell people not to use Dask at this scale. Other groups do use Dask at this scale quite happily; they just have different parameters than you do. For example, they might use much larger chunk sizes, or they may not care about interactivity and may instead be submitting batch jobs.

Instead, I wonder if we might pull out some parts of this, like encouraging people to use larger chunk sizes as graphs get large, to help users get to these larger scales more smoothly.
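To make the chunk-size point concrete, here is a rough back-of-the-envelope sketch in plain Python (no Dask required). It only counts chunks for a regular chunking; the array shape, the chunk shapes, and the helper name `chunk_stats` are all hypothetical and chosen for illustration. The assumption is that each chunk becomes at least one task per operation, so the chunk count is a lower bound on graph size.

```python
import math

def chunk_stats(shape, chunks, itemsize=8):
    """Return (number of chunks, bytes per full chunk) for a regular chunking.

    `itemsize` defaults to 8 bytes, i.e. float64 elements (an assumption).
    """
    # Number of chunks along each axis, rounding up for a partial last chunk.
    n_chunks = math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))
    chunk_bytes = math.prod(chunks) * itemsize
    return n_chunks, chunk_bytes

# Hypothetical 8 TB float64 array.
shape = (1_000_000, 1_000_000)

small = chunk_stats(shape, (1_000, 1_000))   # 8 MB chunks
large = chunk_stats(shape, (4_000, 4_000))   # 128 MB chunks

print(small)
print(large)
```

In this made-up case, going from 8 MB to 128 MB chunks cuts the chunk count by a factor of 16, which is the kind of reduction that may keep scheduler overhead manageable as arrays grow. Whether chunks that large fit comfortably in worker memory depends on the workload.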

@rabernat
Author

Both your comments make sense.

I understand that you might want to revise and moderate the way I describe dask's scaling limitations. I am totally fine with that. You have a much broader view of the landscape than I do.


2 participants