# 04. Deploying Dask

## Overview

In this notebook, you will learn how to:

 - Configure remote Dask.distributed deployment.
 - Deploy Dask.distributed scheduler and workers on compute nodes.
 - Access scheduler and worker dashboards.

## Import idact

Add `idact` to the Python path if it is not already installed, for instance when this notebook is executed from a cloned repo.

In [1]:
import bitmath
import sys

sys.path.append('../')

We will use a wildcard import for convenience:

In [2]:
from idact import *

## Load the cluster

Let's load the environment and the cluster. Make sure to use your cluster name.

In [3]:
load_environment()
cluster = show_cluster("hpc")
cluster

Cluster(pro.cyfronet.pl, 22, plggarstka, auth=AuthMethod.PUBLIC_KEY, key='C:\\Users\\Maciej/.ssh\\id_rsa_6p', install_key=False, disable_sshd=False)

In [4]:
access_node = cluster.get_access_node()
access_node.connect()

## Configure remote Dask deployment

### Install Dask.distributed on the cluster

Make sure the `dask`, `distributed` and `bokeh` packages are installed in the Python 3.5+ distribution you intend to use on the cluster. See the [Dask documentation](http://distributed.dask.org/en/latest/install.html) for installation instructions.

If the deployment fails, incompatible library versions may be the cause. Try installing the frozen versions included with the *idact* repo in `envs/dask_jupyter_tornado.txt`, as described in the previous tutorial `03. Deploying Jupyter`.

### Specify setup actions

As with the Jupyter deployment, *idact* needs setup steps, specified as a list of Bash script lines, in order to find and execute the proper binaries.

These are often exactly the same instructions as for the Jupyter deployment, in which case they can be reused. If they differ, replace `cluster.config.setup_actions.jupyter` with the correct list of instructions.

In [5]:
cluster.config.setup_actions.dask = cluster.config.setup_actions.jupyter
save_environment()
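
As a rough illustration of what such a list might contain, the sketch below builds a setup-action list in plain Python and joins it into the script text it represents. The module name and the virtual environment path are hypothetical placeholders, and `render_setup_script` is an illustrative helper, not part of *idact*.

```python
# Hypothetical setup actions: each entry is one line of Bash.
# The module name and the virtualenv path are placeholders; use whatever
# commands make the Dask binaries available on your cluster.
setup_actions = [
    "module load python/3.6",
    "source $HOME/venvs/dask/bin/activate",
]


def render_setup_script(lines):
    """Join setup lines into a single Bash script body (illustrative helper)."""
    return "\n".join(lines) + "\n"


script = render_setup_script(setup_actions)
print(script)
```

A list of this shape is what would be assigned to `cluster.config.setup_actions.dask` in place of the Jupyter list.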

### Choose the scratch directory

Dask requires a directory for spilling data to disk when memory starts to fill up. Faster storage is usually better. See `--local-directory` in the [dask-worker documentation](http://distributed.dask.org/en/latest/worker.html#spill-data-to-disk).

You can pass an absolute path, or a cluster environment variable.

In [6]:
cluster.config.scratch = '$SCRATCH'
save_environment()
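
The value is stored as a literal string, so a variable such as `$SCRATCH` only acquires a meaning where it is expanded. Assuming the expansion happens in a shell on the cluster, the sketch below imitates it locally with `os.path.expandvars`. The value assigned to `SCRATCH` here is a made-up path used only for illustration.

```python
import os

# Made-up value standing in for whatever the cluster defines.
os.environ["SCRATCH"] = "/scratch/example_user"

configured = "$SCRATCH"                      # the string kept in the config
expanded = os.path.expandvars(configured)    # what the variable resolves to
absolute = os.path.expandvars("/tmp/dask")   # an absolute path passes through unchanged

print(expanded)
print(absolute)
```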

## Allocate nodes for the Dask deployment

We will deploy Dask on three nodes: a scheduler on the first node, and one worker on each node. Make sure to adjust the `--account` parameter, as in the previous notebooks.

In [7]:
nodes = cluster.allocate_nodes(nodes=3,
                               cores=2,
                               memory_per_node=bitmath.GiB(10),
                               walltime=Walltime(minutes=10),
                               native_args={
                                   '--account': 'intdata'
                               })
nodes

2018-11-24 01:40:34 INFO: Creating the ssh directory.


Nodes([Node(NotAllocated),Node(NotAllocated),Node(NotAllocated)], SlurmAllocation(job_id=14334588))

In [8]:
nodes.wait()
nodes

Nodes([Node(p0289:60904, 2018-11-24 00:50:41.868727+00:00),Node(p0290:43482, 2018-11-24 00:50:41.868727+00:00),Node(p0291:54153, 2018-11-24 00:50:41.868727+00:00)], SlurmAllocation(job_id=14334588))

## Deploy Dask

After the initial setup, Dask can be deployed with a single command:

In [9]:
dd = deploy_dask(nodes)
dd

2018-11-24 01:40:46 INFO: Deploying Dask on 3 nodes.
2018-11-24 01:40:46 INFO: Connecting to p0289:60904 (1/3).
2018-11-24 01:40:48 INFO: Connecting to p0290:43482 (2/3).
2018-11-24 01:40:50 INFO: Connecting to p0291:54153 (3/3).
2018-11-24 01:40:52 INFO: Deploying scheduler on the first node: p0289.
2018-11-24 01:41:20 INFO: Checking scheduler connectivity from p0289 (1/3).
2018-11-24 01:41:20 INFO: Checking scheduler connectivity from p0290 (2/3).
2018-11-24 01:41:20 INFO: Checking scheduler connectivity from p0291 (3/3).
2018-11-24 01:41:21 INFO: Deploying workers.
2018-11-24 01:41:21 INFO: Deploying worker 1/3.
2018-11-24 01:41:42 INFO: Deploying worker 2/3.
2018-11-24 01:42:05 INFO: Deploying worker 3/3.
2018-11-24 01:42:46 INFO: Retried and failed: config.retries[Retry.OPEN_TUNNEL].{count=3, seconds_between=5}
2018-11-24 01:42:46 ERROR: Failure: Adding last hop.
2018-11-24 01:43:08 INFO: Validating worker 1/3.
2018-11-24 01:43:08 INFO: Validating worker 2/3.
2018-11-24 01:43:08 I

DaskDeployment(scheduler=tcp://localhost:41056/tcp://172.20.65.34:41056, workers=3)

If the deployment fails, take a look at `idact.log` to find out why.
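
A quick way to narrow down a failure is to filter the log for error lines. The sketch below assumes `idact.log` uses the same `<date> <time> <LEVEL>: <message>` layout as the notebook output above, which may not hold exactly. `find_problems` is an illustrative helper, not part of *idact*.

```python
def find_problems(lines, levels=("ERROR", "WARNING")):
    """Return log lines whose level field is one of `levels`."""
    problems = []
    for line in lines:
        # Assumed layout: "<date> <time> <LEVEL>: <message>"
        parts = line.split(" ", 3)
        if len(parts) == 4 and parts[2].rstrip(":") in levels:
            problems.append(line.rstrip("\n"))
    return problems


# Sample lines copied from the deployment output above.
sample = [
    "2018-11-24 01:42:05 INFO: Deploying worker 3/3.",
    "2018-11-24 01:42:46 ERROR: Failure: Adding last hop.",
    "2018-11-24 01:43:08 INFO: Validating worker 1/3.",
]
print(find_problems(sample))
```

To scan the real file, the same function can be given an open file object, since iterating over a file yields its lines.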

You can get a Dask client for the deployment:

In [10]:
client = dd.get_client()
client

| Client | Cluster |
|---|---|
| Scheduler: tcp://localhost:41056, Dashboard: http://localhost:34652/status | Workers: 3, Cores: 6, Memory: 32.21 GB |
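
The totals reported by the client are consistent with the allocation request, reading `cores=2` as a per-node value. The short calculation below reproduces them. It uses plain arithmetic rather than `bitmath`, and the only conversion involved is from GiB (2^30 bytes) to GB (10^9 bytes).

```python
nodes = 3
cores_per_node = 2
memory_per_node_gib = 10

total_cores = nodes * cores_per_node
total_bytes = nodes * memory_per_node_gib * 2**30   # 1 GiB = 2**30 bytes
total_gb = total_bytes / 10**9                       # 1 GB = 10**9 bytes

print(total_cores)           # 6
print(round(total_gb, 2))    # 32.21
```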


You shouldn't perform computations with Dask.distributed from your local computer, due to likely Python and library version mismatches.

Even if your Python environment matched the cluster exactly, the volume of data transferred to your local computer could prove overwhelming.

We will address this issue in one of the next notebooks, by making the Dask deployment accessible from a notebook deployed on the cluster.

## Browse Dask dashboards

Dask provides dashboards for the scheduler and each worker:

In [11]:
dd.diagnostics.addresses

['http://localhost:34652/status',
 'http://localhost:50707/main',
 'http://localhost:43578/main',
 'http://localhost:49392/main']
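
In this output, the first address ends in `/status` and matches the scheduler dashboard reported by the client, while the remaining three end in `/main`, one per worker. The sketch below splits the list on that basis with `urllib.parse`. The path-based distinction is an observation about this particular output, not a documented guarantee.

```python
from urllib.parse import urlparse

# Addresses copied from the output above.
addresses = [
    "http://localhost:34652/status",
    "http://localhost:50707/main",
    "http://localhost:43578/main",
    "http://localhost:49392/main",
]

# Group local tunnel ports by the dashboard path they serve.
scheduler_ports = [urlparse(a).port for a in addresses if urlparse(a).path == "/status"]
worker_ports = [urlparse(a).port for a in addresses if urlparse(a).path == "/main"]

print(scheduler_ports, worker_ports)
```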

To open all dashboards, execute the line below. You can also click the scheduler dashboard link in the `get_client` output above.

In [12]:
dd.diagnostics.open_all()

## Cancel Dask deployment

After you're done, you can cancel the deployment by calling `cancel`, though it will be killed anyway when the node allocation ends.

In [13]:
dd.cancel()

2018-11-24 01:43:48 INFO: Cancelling worker deployment on p0291.
2018-11-24 01:43:54 INFO: Cancelling worker deployment on p0290.
2018-11-24 01:44:02 INFO: Cancelling worker deployment on p0289.
2018-11-24 01:44:08 INFO: Cancelling scheduler deployment on p0289.


Alternatively, the following will only close the tunnels, without attempting to kill the Dask scheduler and workers:

In [14]:
dd.cancel_local()

The Dask client is multithreaded, so it needs to be closed explicitly.

In [15]:
client.close()

## Cancel the allocation

It's important to cancel an allocation if you're done with it early, in order to minimize the CPU time you are charged for.

In [16]:
nodes.running()

True

In [17]:
nodes.cancel()

2018-11-24 01:44:17 INFO: Cancelling job 14334588.


In [18]:
nodes.running()

False
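
It is easy to forget the cancellation when an error interrupts a session. One possible safeguard, sketched below, is a small context manager that always calls `cancel()` on exit. `cancelled_on_exit` and `FakeAllocation` are hypothetical names introduced here for illustration; the fake class only mimics the `running()` and `cancel()` calls shown above so that the example runs without a cluster.

```python
from contextlib import contextmanager


@contextmanager
def cancelled_on_exit(allocation):
    """Yield the allocation and always call its cancel() afterwards."""
    try:
        yield allocation
    finally:
        allocation.cancel()


class FakeAllocation:
    """Hypothetical stand-in exposing only running() and cancel()."""

    def __init__(self):
        self._running = True

    def running(self):
        return self._running

    def cancel(self):
        self._running = False


fake = FakeAllocation()
try:
    with cancelled_on_exit(fake) as allocation:
        raise RuntimeError("simulated failure during the session")
except RuntimeError:
    pass

print(fake.running())
```

With a real allocation, the object returned by `cluster.allocate_nodes` would take the place of `fake`.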

## Next notebook

In the next notebook, we will install *idact* on the cluster and configure it from a deployed Jupyter Notebook.