wip add array best practices
mrocklin committed Feb 28, 2018
1 parent 31f0f8f commit 7017aa6
Showing 1 changed file with 22 additions and 0 deletions.
22 changes: 22 additions & 0 deletions docs/source/array-best-practices.rst
@@ -0,0 +1,22 @@
Best Practices
==============

This page contains a list of best practices for Dask arrays.

- When choosing chunk sizes, be aware that the scheduler may impose overhead
  of up to 1ms per operation per chunk. Make your chunks large enough that
  operations on them take 100ms or so, which keeps that overhead small
  relative to the work being done.

  You also want chunks to be small enough that several of them fit in
  memory at once, probably more than twice the number of threads you're
  using, even for simple computations.

  Aiming for chunks of 50MB-500MB is usually a good rule of thumb, but you
  should experiment.

- When loading data from chunked data sources (like HDF5), arrange your
  chunks to align with the chunks of the underlying data source. Otherwise
  you may read through all of your data many more times than necessary.

- It is difficult to make the HDF5 library is not
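
As a rough illustration of the first two points, the sketch below picks a chunk shape that is a whole multiple of the on-disk chunk shape and stays under a byte budget. The helper names ``chunk_nbytes`` and ``aligned_chunk_shape`` are hypothetical (they are not part of Dask), the 128 MiB default is just one value inside the 50MB-500MB range above, and greedy doubling is only one possible heuristic.

```python
import math


def chunk_nbytes(chunks, itemsize):
    """Size in bytes of one chunk with the given shape and element size."""
    return itemsize * math.prod(chunks)


def aligned_chunk_shape(shape, storage_chunks, itemsize,
                        target_bytes=128 * 2**20):
    """Hypothetical helper: grow the storage chunk shape by doubling.

    Every returned extent is either a whole multiple of the storage chunk
    along that axis or the full axis length, so chunk boundaries stay
    aligned with the underlying data source.
    """
    chunks = [min(c, s) for c, s in zip(storage_chunks, shape)]
    # Grow the last axis first, then work towards the first axis.
    for axis in reversed(range(len(shape))):
        while chunks[axis] < shape[axis]:
            trial = list(chunks)
            trial[axis] = min(2 * chunks[axis], shape[axis])
            if chunk_nbytes(trial, itemsize) > target_bytes:
                break  # doubling again would exceed the byte budget
            chunks = trial
    return tuple(chunks)


# Assumed example: a 10000 x 10000 float64 dataset stored in 100 x 100 chunks.
chunks = aligned_chunk_shape((10000, 10000), (100, 100), itemsize=8)
print(chunks, chunk_nbytes(chunks, 8) / 2**20, "MiB")
```

The resulting tuple could then be passed as the ``chunks=`` argument of ``dask.array.from_array``. Treat the result as a starting point and measure; the right size still depends on the computation and on available memory.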
