wip add array best practices

mrocklin · Feb 28, 2018 · 7017aa6
1 parent 31f0f8f
Showing 1 changed file with 22 additions and 0 deletions.
diff --git a/docs/source/array-best-practices.rst b/docs/source/array-best-practices.rst
@@ -0,0 +1,22 @@
+Best Practices
+==============
+
+This page contains a list of best practices for Dask arrays.
+
+-  When choosing chunk sizes, be aware that the scheduler may add an overhead
+   of up to 1ms per operation per chunk.  Make your chunks large enough that
+   each operation on a chunk takes 100ms or so, keeping that overhead near 1%.
+
+   Chunks should also be small enough that many of them fit in memory at once,
+   probably more than twice the number of threads you are using, even for
+   simple computations.
+
+   Chunks of 50MB to 500MB are usually a good rule of thumb, but you should
+   still experiment with your own workload.
+
+-  When loading data from chunked data sources (like HDF5), arrange your
+   chunks to align with the chunks of the underlying data source.
+   Otherwise, you may read through your data many more times than is
+   necessary.
+
+-  It is difficult to make the HDF5 library is not
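The chunk-size advice in the first item above comes down to some simple arithmetic. Below is a minimal, stdlib-only Python sketch of those checks. The helper names (`chunk_nbytes`, `scheduler_overhead_fraction`, `chunk_size_report`) are hypothetical and not part of Dask, and the 1ms overhead and 50MB-500MB range are taken from the text as assumptions, not measured values.

```python
import math

MB = 10**6  # assumption: decimal megabytes


def chunk_nbytes(chunk_shape, itemsize):
    """Bytes used by one chunk, e.g. itemsize=8 for float64."""
    return math.prod(chunk_shape) * itemsize


def scheduler_overhead_fraction(task_seconds, overhead_seconds=0.001):
    """Share of wall time spent on per-task scheduler overhead.

    overhead_seconds defaults to the ~1ms upper bound quoted in the text.
    """
    return overhead_seconds / (task_seconds + overhead_seconds)


def chunk_size_report(chunk_shape, itemsize, n_threads, memory_bytes,
                      low=50 * MB, high=500 * MB):
    """Check a chunk shape against the rules of thumb (hypothetical helper)."""
    nbytes = chunk_nbytes(chunk_shape, itemsize)
    # Lower bound: room for twice as many chunks as threads.
    # The text suggests "more than" this, so leave extra headroom in practice.
    resident_chunks = 2 * n_threads
    return {
        "nbytes": nbytes,
        "in_rule_of_thumb_range": low <= nbytes <= high,
        "fits_in_memory": resident_chunks * nbytes <= memory_bytes,
    }
```

For example, a (4000, 4000) chunk of float64 values (8 bytes each) is 128MB, inside the suggested range, and 16 such chunks (twice 8 threads) need about 2GB. In Dask itself such a shape would be passed through the `chunks=` argument, e.g. `da.ones((40000, 40000), chunks=(4000, 4000))` after `import dask.array as da`.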
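The alignment advice in the second item can be illustrated with a simplified cost model. This Python sketch assumes, hypothetically, that every Dask chunk reads each storage chunk it overlaps in full and independently of other tasks. It ignores HDF5's chunk cache, so it shows a worst-case read count rather than what any particular library actually does. The function names are illustrative, not Dask or h5py APIs.

```python
import math


def reads_along_axis(length, storage_chunk, dask_chunk):
    """Storage chunks touched along one axis, summed over all Dask chunks."""
    reads = 0
    for start in range(0, length, dask_chunk):
        stop = min(start + dask_chunk, length)
        first = start // storage_chunk       # first storage chunk overlapped
        last = (stop - 1) // storage_chunk   # last storage chunk overlapped
        reads += last - first + 1
    return reads


def total_storage_reads(shape, storage_chunks, dask_chunks):
    """Storage-chunk reads for the whole array under the simple model.

    Overlap counts are independent per axis, so the total over the
    N-dimensional chunk grid is the product of the per-axis totals.
    """
    return math.prod(
        reads_along_axis(n, s, d)
        for n, s, d in zip(shape, storage_chunks, dask_chunks)
    )


def n_storage_chunks(shape, storage_chunks):
    """Number of chunks held by the underlying data source."""
    return math.prod(math.ceil(n / s) for n, s in zip(shape, storage_chunks))


def is_aligned(shape, storage_chunks, dask_chunks):
    """True if each Dask chunk edge lands on a storage chunk boundary."""
    return all(
        d % s == 0 or d >= n
        for n, s, d in zip(shape, storage_chunks, dask_chunks)
    )
```

With a 1000 x 1000 array stored in (1000, 10) strips, reading it as (10, 1000) Dask chunks touches storage chunks 10,000 times against only 100 stored chunks, while aligned (1000, 100) chunks touch each stored chunk exactly once. With h5py, the on-disk chunk shape is available as `Dataset.chunks`, which can guide the `chunks=` argument passed to `dask.array.from_array`.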