### Raison d'être

This document collects best practices, tips, and tricks for single-node Dask computing clusters. If you learn something valuable, please add it here. It is likely to keep evolving.

In [3]:
import dask
import pandas as pd
import numpy as np
import dask.array as da
import dask.dataframe as dd

### Scheduler

In [6]:
# TODO (rav): document tradeoffs between process and thread pool based local scheduler
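
Until the tradeoffs are written up, here is a minimal sketch of how a local scheduler can be selected per call. The array shape and chunk sizes are arbitrary illustration values, not recommendations. It only exercises the thread pool and the single-threaded scheduler; the process pool is selected the same way with `scheduler="processes"`.

```python
import dask
import dask.array as da

# Arbitrary example workload: 16 chunks of 250 x 250 ones.
x = da.ones((1000, 1000), chunks=(250, 250))
total = (x + 1).sum()

# Thread pool: workers share memory, so no data is serialized between tasks.
# Works well when the heavy lifting releases the GIL (NumPy, pandas).
threaded = total.compute(scheduler="threads")

# Single-threaded scheduler: no parallelism, useful for debugging and profiling.
serial = total.compute(scheduler="synchronous")

# The choice can also be made for a whole block of code via the config.
with dask.config.set(scheduler="threads"):
    via_config = total.compute()
```

As a general rule of thumb, threads avoid inter-process data transfer but are limited by the GIL for pure-Python code, while processes sidestep the GIL at the cost of serializing data between workers.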

### Graph optimizations/changes

`optimization.fuse.ave-width` controls how aggressively task fusion trades task parallelism for processing locality. In some cases this reduces communication overhead between workers, which can be particularly important on a distributed client/cluster.

In [10]:
with dask.config.set({"optimization.fuse.ave-width": ...}):
    ...

Official doc:
> Upper limit for width, where width = num_nodes / height, a good measure of parallelizability

from [here](https://docs.dask.org/en/latest/configuration-reference.html#dask.optimization.fuse.ave-width) and [here](https://docs.dask.org/en/latest/optimize.html#dask.optimization.fuse)


Example [notebook](https://github.com/dask/dask-examples/blob/4affee9d31bccd327205af90dd495347c8f2f7f7/applications/array-optimization.ipynb).
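
A minimal sketch of the setting applied to a small array computation. The value `4` and the array shape are illustrative assumptions only; a suitable width depends on the shape of your graph. The computation runs inside the `with` block because graph optimization happens at compute time.

```python
import dask
import dask.array as da

# Hypothetical value, chosen only for illustration.
AVE_WIDTH = 4

x = da.ones((8, 8), chunks=(2, 2))
y = ((x + 1) * 2).sum()

with dask.config.set({"optimization.fuse.ave-width": AVE_WIDTH}):
    # The setting is active only inside this block.
    active_width = dask.config.get("optimization.fuse.ave-width")
    # Compute here so the optimizer sees the overridden setting.
    result = y.compute(scheduler="synchronous")
```

The setting changes how tasks are grouped, not the numerical result, so it is safe to compare timings for a few values on a representative workload.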

### Data organization

#### client.rebalance

Assuming you have a process-pool/distributed cluster, it is worth calling `persist` on the data (if possible) before a computationally expensive operation. Keep in mind that on a distributed cluster `persist` returns immediately with a collection backed by futures rather than computed results. In some cases the chunks/partitions end up unevenly distributed among workers, which can lead to communication overhead in the steps that follow `persist`. In those cases it is worth calling `rebalance` to spread the data more evenly among workers, for example:

In [None]:
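from dask.distributed import Client

client = Client()  # starts a local cluster; pass a scheduler address to connect to an existing one
a = da.from_delayed(...)
a = a.persist()  # asynchronous: returns immediately while the work runs in the background
client.rebalance(a)  # blocking: returns only after persist and rebalance are done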
# next operation is computationally expensive and embarrassingly parallel
...