# Dask SLURMClusters

The MLeRP notebook environment uses Dask SLURMClusters to strike a
middle ground between the interactivity of a notebook and the power of
HPC. You will be provisioned with a CPU-based notebook session for basic
analysis and code development. Then, when you’re ready to run tests, you
use Dask to submit your Python functions to the SLURM queue.

This enables:

- Flexibility to experiment with your dataset interactively
- The ability to change compute requirements, such as RAM, GPU size and
  number of processes, without ever leaving the notebook environment
- Elastic scaling of compute
- Efficient utilisation of the hardware
- Release of resources when they’re not in use

In [None]:
from dask_jobqueue import SLURMCluster
from distributed import Client
import dask

# Use SLURM as Dask's backend: each SLURM job runs one Dask worker
# (processes=1) with 8 cores and 64 GB of memory
cluster = SLURMCluster(
    memory="64g", processes=1, cores=8
)

# Scale out to 4 workers; with processes=1 this submits 4 SLURM jobs
num_nodes = 4
cluster.scale(num_nodes)
client = Client(cluster)

Dask now submits the SLURM jobs and starts the workers, up to the scale
you specified, in anticipation of work.

You can check on these jobs with `squeue`, just as you would with any
other SLURM job.

In [2]:
!squeue

             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
              1637     batch Jupyter  mhar0048  R      42:23      1 mlerp-node05
              1638     batch dask-wor mhar0048  R       0:16      1 mlerp-node05
              1639     batch dask-wor mhar0048  R       0:16      1 mlerp-node05
              1640     batch dask-wor mhar0048  R       0:16      1 mlerp-node05
              1641     batch dask-wor mhar0048  R       0:16      1 mlerp-node05

Alternatively, we can use the `adapt` method, which scales out when we
need the compute and scales back in when we’re idle, letting others use
the cluster.

We recommend using `adapt` while you’re actively developing your code,
so that you don’t need to worry about cleaning up after yourself. Switch
to `scale` when you’re ready to run longer tests with higher
utilisation.

In [3]:
cluster.adapt(minimum=0, maximum=num_nodes)

<distributed.deploy.adaptive.Adaptive at 0x7f85ea9ec820>

In [7]:
# You may need to run this cell a few times while waiting for Dask to clean up
!squeue


             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
              1637     batch Jupyter  mhar0048  R      43:01      1 mlerp-node05
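The same `adapt` call works on any Dask cluster, so one way to watch adaptive scaling without queueing anything on SLURM is to try it on a `LocalCluster`. This is a sketch rather than MLeRP’s setup: `LocalCluster` stands in for `SLURMCluster`, and the names `local`, `local_client` and `future` are illustrative.

```python
from distributed import Client, LocalCluster

# Start with zero workers, like adapt(minimum=0) on the SLURM cluster
with LocalCluster(n_workers=0, threads_per_worker=1, processes=False,
                  dashboard_address=":0") as local:
    # Allow Dask to add up to 2 workers when tasks arrive, and to remove
    # them again once the cluster has been idle for a while
    local.adapt(minimum=0, maximum=2)
    with Client(local) as local_client:
        # Submitting work makes the adaptive policy scale up from zero
        future = local_client.submit(sum, range(10))
        total = future.result(timeout=60)

print(total)  # 45
```

On a `SLURMCluster`, dask_jobqueue’s `adapt` also accepts `minimum_jobs` and `maximum_jobs` if you would rather bound the number of SLURM jobs than the number of workers; with `processes=1` the two are the same.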

Dask has a dashboard that shows how your tasks are being computed. You
can’t open it directly in your web browser, but VS Code and Jupyter have
extensions that can connect to it.

Use the loopback address http://127.0.0.1:8787, adjusting the port if
the client reports a different one when you create it.
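If you’re unsure which port to use, the client can tell you: `client.dashboard_link` holds the full dashboard URL. The sketch below uses a `LocalCluster` as a stand-in so it runs without SLURM, and asks for any free port (`":0"`) to show a case where 8787 would be wrong. The names `local` and `local_client` are illustrative.

```python
from distributed import Client, LocalCluster

# ":0" lets the OS pick a free port for the dashboard
with LocalCluster(n_workers=1, processes=False,
                  dashboard_address=":0") as local:
    with Client(local) as local_client:
        # Full URL of the dashboard, including the port actually used
        # (an empty string if the dashboard could not be started)
        link = local_client.dashboard_link
        print(link)
```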

Now let’s define a Dask array and perform some computation. Dask arrays
are split into chunks that are processed in parallel across your
workers, so an array can be larger than a single worker’s memory. Dask
evaluates lazily: operations on an array are recorded as tasks in a task
graph, and nothing is calculated until you call `.compute()` to get the
value.
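A small example makes the lazy evaluation visible. The shapes and chunk sizes are arbitrary choices for illustration; without a `Client` this runs on Dask’s local threaded scheduler, while in the notebook the same `.compute()` call sends the work to your SLURM workers.

```python
import dask.array as da
import numpy as np

# A 500x500 array of ones, split into four 250x250 chunks
a = da.ones((500, 500), chunks=(250, 250))

# These operations only extend the task graph; nothing is computed yet
b = (a + 1).sum(axis=0)
print(type(b).__name__, b.npartitions)  # Array 2

# compute() executes the graph and returns an ordinary NumPy array
result = b.compute()
print(type(result).__name__, result[:3])  # ndarray [1000. 1000. 1000.]
```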

Dask also has parallelised implementations of dataframes and of generic
collections of Python objects (called bags). Their APIs are designed to
be as similar as possible to familiar libraries like NumPy, pandas and
PySpark. You can read more about
[arrays](https://docs.dask.org/en/stable/array.html),
[dataframes](https://docs.dask.org/en/stable/dataframe.html) and
[bags](https://docs.dask.org/en/stable/bag.html) in Dask’s
documentation.
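As a taste of the bag API, the sketch below filters and squares a handful of integers. It passes `scheduler="sync"` so it runs single-threaded as a standalone script; drop that argument in the notebook so the work goes to your workers instead. The variable names are illustrative.

```python
import dask.bag as db

# A bag of ten integers split across two partitions
numbers = db.from_sequence(range(10), npartitions=2)

# filter/map/sum are lazy, in the functional style of PySpark RDDs
even_squares = numbers.filter(lambda n: n % 2 == 0).map(lambda n: n * n)

# Single-threaded execution for a quick local check
total = even_squares.sum().compute(scheduler="sync")
print(total)  # 120
```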

In [4]:
import dask.array as da
x = da.random.random((1000, 1000, 1000))
x  # Note how the value of the array hasn't been computed yet
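
The cell’s output (an HTML summary of the array, not reproduced here) shows how big `x` is and how Dask has split it into chunks. The same information is available from the array’s attributes, and reading them is cheap because no random numbers are generated until `.compute()` is called. A standalone sketch:

```python
import dask.array as da

# Same shape as above: 1000**3 float64 values, about 8 GB in total
x = da.random.random((1000, 1000, 1000))

# Inspecting metadata does not generate any random numbers
print(x.dtype)         # float64
print(x.nbytes / 1e9)  # 8.0 (GB)
print(x.chunksize)     # shape of one chunk, chosen automatically by Dask
print(x.npartitions)   # number of chunks; operations create tasks per chunk
```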


If you run `squeue` while the next cell is computing, you can see the
jobs spin up dynamically to perform the computation.

In [7]:
x[0, 0, :10].compute()


array([0.9527929 , 0.93675059, 0.11717679, 0.47114357, 0.73693508,
       0.01302143, 0.86360879, 0.12592881, 0.52676823, 0.99186392])

We can also accelerate Dask arrays on GPUs using CuPy, and there is
similar support for accelerating Dask dataframes with cuDF.

In [8]:
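# Make CuPy the backend for newly created Dask arrays (this setting is global)
dask.config.set({"array.backend": "cupy"})
y = da.random.random((1000, 1000, 1000))
# Only the 10 requested values are returned to the notebook; calling
# y.compute() would try to pull the whole ~8 GB array back instead
y[0, 0, :10].compute()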

array([0.02380941, 0.62371184, 0.88393467, 0.8604588 , 0.16488854,
       0.11214214, 0.86582312, 0.01384666, 0.79636323, 0.58940477])

Finally, we can shut down the SLURMCluster now that we’re done with it.

In [11]:
# Shut down the cluster
client.shutdown()
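
To make cleanup automatic, even when a computation raises an error, clusters and clients can also be used as context managers. A minimal sketch with `LocalCluster` standing in for `SLURMCluster` so it runs anywhere; `local`, `local_client` and `z` are illustrative names.

```python
from distributed import Client, LocalCluster
import dask.array as da

# Both the client and the cluster are closed when the with-blocks exit
with LocalCluster(n_workers=2, threads_per_worker=1, processes=False,
                  dashboard_address=":0") as local:
    with Client(local) as local_client:
        z = da.ones((1000, 1000), chunks=(500, 500))
        total = z.sum().compute()

# The result was computed before the cluster shut down
print(total)  # 1000000.0
```

The same pattern should apply to the real cluster, for example `with SLURMCluster(memory="64g", processes=1, cores=8) as cluster, Client(cluster) as client:`, after which the workers are shut down when the block exits.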
