# Dask Workflow - Remote

This notebook is intended to be executed on the cluster as a continuation of notebook `02a-DaskWorkflow-Local.ipynb`.

## Initial setup

Install `idact` with pip, for example:

```
module load plgrid/tools/python-intel/3.6.2
python3 -m pip install --user git+https://github.com/garstka/eng-project.git
```

## Load cluster

If you pushed the environment from the local notebook, this will load it:

In [1]:
from idact import *

load_environment()

Alternatively, use `add_cluster`, as described in notebook 01.

Then, show the cluster:

In [2]:
cluster = show_cluster("pro")  # replace with your cluster name if necessary
cluster

Cluster(pro.cyfronet.pl, 22, plggarstka, auth=AuthMethod.PUBLIC_KEY, key=None, install_key=True, disable_sshd=False)

## Setup cluster

The following is not necessary if you pushed the environment:

In [3]:
cluster.config.setup_actions.dask = ['module load plgrid/tools/python-intel/3.6.2']
cluster.config.scratch = '$SCRATCH'

save_environment()

`idact` still needs to connect to the login node from the compute node where this notebook is most likely running.

This connection requires an SSH key listed in `authorized_keys`. You can either provide the path to an existing key, or do nothing and let `idact` generate and install one.

In [4]:
# Uncomment to use an existing key instead of generating a new one:
# import os
# cluster.config.key = os.path.expanduser('~/.ssh/id_rsa')
# cluster.config.install_key = False
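
To decide between the two options, it can help to check whether an existing public key is already listed in `authorized_keys`. The sketch below is a minimal illustration using only the standard library. `key_is_authorized` and the sample key strings are hypothetical and not part of `idact`.

```python
def key_is_authorized(public_key_line, authorized_keys_text):
    """Return True if the key in `public_key_line` appears in the file text.

    Hypothetical helper, not part of idact. It compares the key type and
    the base64 key data, ignoring comments and any leading options.
    """
    fields = public_key_line.split()
    if len(fields) < 2:
        return False
    key_type, key_data = fields[0], fields[1]

    for line in authorized_keys_text.splitlines():
        tokens = line.split()
        if not tokens or tokens[0].startswith('#'):
            continue
        # Options may precede the key type, so scan adjacent token pairs.
        if (key_type, key_data) in zip(tokens, tokens[1:]):
            return True
    return False


# Made-up sample data for illustration only.
sample_authorized_keys = """\
# keys for the cluster account
ssh-rsa AAAAB3FAKEKEY1 user@laptop
no-pty ssh-ed25519 AAAAC3FAKEKEY2 user@workstation
"""

print(key_is_authorized('ssh-ed25519 AAAAC3FAKEKEY2 other-comment',
                        sample_authorized_keys))
print(key_is_authorized('ssh-rsa AAAAB3UNKNOWN user@laptop',
                        sample_authorized_keys))
```

In practice, the two arguments would be read from a public key file such as `~/.ssh/id_rsa.pub` and from `~/.ssh/authorized_keys` on the cluster.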

## Test connection

In [5]:
node = cluster.get_access_node()
node

Node(pro.cyfronet.pl:22, None)

The first time you connect, you may be asked for a password so that the SSH key can be installed.

In [6]:
node.connect()
save_environment()  # Save the config so the key is not installed again.

2018-11-15 00:30:41 INFO: Installing key using password authentication.
Password for plggarstka@pro.cyfronet.pl:22: 
2018-11-15 00:30:46 INFO: Private key not specified.


In [7]:
node.run('whoami')

'plggarstka'

In [8]:
node.run('hostname')

'login01.pro.cyfronet.pl'

## Pull nodes deployment

To deploy Dask, you will need the allocation from the local notebook:

In [9]:
deployments = cluster.pull_deployments()
deployments

2018-11-15 00:31:06 INFO: Pulling deployments.
2018-11-15 00:31:08 INFO: Creating the ssh directory.
2018-11-15 00:31:14 INFO: Pulled allocation deployment: Nodes([Node(p0097:57204, 2018-11-14 23:49:33.998070+00:00),Node(p0100:44128, 2018-11-14 23:49:33.998070+00:00)], SlurmAllocation(job_id=14194946))


SynchronizedDeployments(nodes=1, jupyter_deployments=0)

In [10]:
nodes = deployments.nodes[0]
nodes

Nodes([Node(p0097:57204, 2018-11-14 23:49:33.998070+00:00),Node(p0100:44128, 2018-11-14 23:49:33.998070+00:00)], SlurmAllocation(job_id=14194946))

## Deploy Dask

In [11]:
dd = deploy_dask(nodes)
dd

2018-11-15 00:31:18 INFO: Deploying Dask on 2 nodes.
2018-11-15 00:31:18 INFO: Connecting to p0097:57204 (1/2).
2018-11-15 00:31:19 INFO: Connecting to p0100:44128 (2/2).
2018-11-15 00:31:20 INFO: Deploying scheduler on the first node: p0097.
2018-11-15 00:31:45 INFO: Retried and failed: config.retries[Retry.OPEN_TUNNEL].{count=3, seconds_between=5}
2018-11-15 00:31:45 ERROR: Failure: Adding last hop.
2018-11-15 00:31:46 INFO: Bound to port 42460 instead.
2018-11-15 00:31:46 INFO: Checking scheduler connectivity from p0097 (1/2).
2018-11-15 00:31:46 INFO: Checking scheduler connectivity from p0100 (2/2).
2018-11-15 00:31:46 INFO: Deploying workers.
2018-11-15 00:31:46 INFO: Deploying worker 1/2.
2018-11-15 00:32:00 INFO: Deploying worker 2/2.
2018-11-15 00:32:20 INFO: Validating worker 1/2.
2018-11-15 00:32:20 INFO: Validating worker 2/2.


DaskDeployment(scheduler=tcp://localhost:44628/tcp://172.20.64.97:51347, workers=2)

Get Dask client:

In [12]:
client = dd.get_client()
client

| Client | Cluster |
| --- | --- |
| Scheduler: tcp://localhost:44628 | Workers: 2 |
| Dashboard: http://localhost:55995/status | Cores: 4 |
| | Memory: 21.47 GB |


Perform a sample computation:

In [13]:
x = client.submit(lambda value: value + 1, 10)

In [14]:
x.result() == 11

True
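
The same submit-then-collect pattern extends to several tasks. The sketch below is a hedged illustration: `increment` and `run_batch` are hypothetical names, and a `ThreadPoolExecutor` stands in for the Dask client so the example runs without a cluster. It relies only on `submit(fn, *args)` returning a future with `.result()`, which both the Dask client and `concurrent.futures` executors provide.

```python
from concurrent.futures import ThreadPoolExecutor


def increment(value):
    """Same computation as the cell above, written as a named function."""
    return value + 1


def run_batch(client, values):
    """Submit one task per value, then block until every result arrives."""
    futures = [client.submit(increment, value) for value in values]
    return [future.result() for future in futures]


# Local stand-in so the sketch runs anywhere; this is an assumption
# made for illustration, not how the notebook obtains its client.
with ThreadPoolExecutor(max_workers=2) as executor:
    batch_results = run_batch(executor, range(5))

print(batch_results)
```

On the cluster, passing the `client` obtained from `dd.get_client()` instead of the executor should distribute the same tasks across the Dask workers.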

Diagnostics servers are tunnelled:

In [15]:
dd.diagnostics.addresses

['http://localhost:42460/status',
 'http://localhost:35992/main',
 'http://localhost:45989/main']

These addresses cannot be opened directly in your browser, because `localhost` here refers to the cluster node running this notebook.

See the instructions in the local notebook for accessing diagnostics from your local computer.
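
Each address identifies a diagnostics server by its port on this node. As a small illustration, the sketch below extracts those port numbers with the standard library. `diagnostics_ports` is a hypothetical helper, and the address list is copied from the output above rather than read from `dd`.

```python
from urllib.parse import urlparse

# Copied from the `dd.diagnostics.addresses` output above.
addresses = [
    'http://localhost:42460/status',
    'http://localhost:35992/main',
    'http://localhost:45989/main',
]


def diagnostics_ports(addresses):
    """Return the port number of each diagnostics address, in order."""
    return [urlparse(address).port for address in addresses]


ports = diagnostics_ports(addresses)
print(ports)  # [42460, 35992, 45989]
```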

## Continue with local notebook

Follow the remaining instructions in the local notebook.

## Cancel Dask deployment (optional)

In [17]:
client.close()

In [18]:
dd.cancel()  # Optional: the deployment is killed anyway when the allocation is cancelled.

2018-11-15 00:33:29 INFO: Cancelling worker deployment on p0100.
2018-11-15 00:33:35 INFO: Cancelling worker deployment on p0097.
2018-11-15 00:33:41 INFO: Cancelling scheduler deployment on p0097.
