# Dask Workflow - Remote

This notebook is intended to be executed on the cluster as a continuation of notebook `02a-DaskWorkflow-Local.ipynb`.

## Initial setup

`idact` should be installed using pip, e.g.:

```
module load plgrid/tools/python-intel/3.6.2
python3 -m pip install --user git+https://github.com/garstka/eng-project.git
```
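
To confirm that the package is visible to the interpreter running this notebook, a quick check like the one below may help. It is a minimal sketch that uses only the standard library; the helper name `is_installed` is hypothetical and not part of `idact`.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located on the import path."""
    return importlib.util.find_spec(module_name) is not None


# Reports whether `idact` can be imported by this interpreter.
print("idact installed:", is_installed("idact"))
```

If this reports `False`, the notebook kernel may be using a different Python than the one `pip` installed into.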

In [1]:
import logging

## Load cluster

If you pushed the environment from the local notebook, this will load it:

In [2]:
from idact import *

load_environment()

Alternatively, use `add_cluster`, as described in notebook 01.

Then, show the cluster:

In [3]:
cluster = show_cluster("pro")  # replace with your cluster name if necessary
cluster

Cluster(pro.cyfronet.pl, 22, plggarstka, auth=AuthMethod.PUBLIC_KEY, key=None, install_key=True, disable_sshd=False)

The following is not necessary if you pushed the environment:

In [4]:
set_log_level(logging.INFO)
#set_log_level(logging.DEBUG)
cluster.config.setup_actions.dask = ['module load plgrid/tools/python-intel/3.6.2']
cluster.config.scratch = '$SCRATCH'

save_environment()
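
The choice between `logging.INFO` and `logging.DEBUG` controls how much output you see. The sketch below shows how level thresholds behave in the standard `logging` module. It assumes that `set_log_level` applies a comparable threshold to the `idact` loggers, which the notebook does not confirm. The logger name is hypothetical.

```python
import logging

# A throwaway logger used only for this illustration (hypothetical name).
logger = logging.getLogger("log_level_demo")
logger.setLevel(logging.INFO)

# At INFO, messages of level INFO and above pass; DEBUG messages are dropped.
info_enabled = logger.isEnabledFor(logging.INFO)
debug_enabled = logger.isEnabledFor(logging.DEBUG)

print("INFO enabled:", info_enabled)
print("DEBUG enabled:", debug_enabled)
```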

In [5]:
node = cluster.get_access_node()
node

Node(pro.cyfronet.pl:22, None)

On your first action, you will be asked for a password to install the key.
You can connect explicitly (optional) to do this right now:

In [6]:
node.connect()

2018-11-02 22:35:31 INFO: Installing key using password authentication.
Password for plggarstka@pro.cyfronet.pl:22: 
2018-11-02 22:35:36 INFO: Private key not specified.


In [7]:
node.run('whoami')

'plggarstka'

In [8]:
node.run('hostname')

'login01.pro.cyfronet.pl'

## Pull nodes deployment

To deploy Dask, you will need the allocation from the local notebook:

In [9]:
deployments = cluster.pull_deployments()
deployments

2018-11-02 22:35:51 INFO: Pulling deployments.
2018-11-02 22:35:54 INFO: Creating the ssh directory.
2018-11-02 22:36:00 INFO: Pulled allocation deployment: Nodes([Node(p0653:51136, 2018-11-02 21:54:16.293426+00:00),Node(p0657:56621, 2018-11-02 21:54:16.293426+00:00)], SlurmAllocation(job_id=13969005))


SynchronizedDeployments(nodes=1, jupyter_deployments=0)

In [10]:
nodes = deployments.nodes[0]
nodes

Nodes([Node(p0653:51136, 2018-11-02 21:54:16.293426+00:00),Node(p0657:56621, 2018-11-02 21:54:16.293426+00:00)], SlurmAllocation(job_id=13969005))

## Deploy Dask

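Deploy the Dask scheduler and workers on the pulled nodes. This relies on the cluster-specific `setup_actions.dask` configuration from the environment: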

In [11]:
dd = deploy_dask(nodes)
dd

2018-11-02 22:36:08 INFO: Deploying Dask on 2 nodes.
2018-11-02 22:36:08 INFO: Connecting to p0653:51136 (1/2).
2018-11-02 22:36:09 INFO: Connecting to p0657:56621 (2/2).
2018-11-02 22:36:10 INFO: Deploying scheduler on the first node: p0653.


2018-11-02 22:36:20,216| ERROR   | Problem setting SSH Forwarder up: Couldn't open tunnel :48170 <> 127.0.0.1:48170 might be in use or destination not reachable
2018-11-02 22:36:25,382| ERROR   | Problem setting SSH Forwarder up: Couldn't open tunnel :48170 <> 127.0.0.1:48170 might be in use or destination not reachable
2018-11-02 22:36:30,547| ERROR   | Problem setting SSH Forwarder up: Couldn't open tunnel :48170 <> 127.0.0.1:48170 might be in use or destination not reachable
2018-11-02 22:36:35,700| ERROR   | Problem setting SSH Forwarder up: Couldn't open tunnel :48170 <> 127.0.0.1:48170 might be in use or destination not reachable
2018-11-02 22:36:35 ERROR: Failure: Adding last hop.
2018-11-02 22:36:36,874| ERROR   | Could not establish connection from ('127.0.0.1', 35261) to remote side of the tunnel
2018-11-02 22:36:36,877| ERROR   | Exception: Error reading SSH protocol banner
2018-11-02 22:36:36,880| ERROR   | Traceback (most recent call last):
2018-11-02 22:36:36,881| ERROR   |   File "/net/people/plggarstka/.local/lib/python3.6/site-packages/paramiko/transport.py", line 2138, in _check_banner
2018-11-02 22:36:36,882| ERROR   |     buf = self.packetizer.readline(timeout)
2018-11-02 22:36:36,884| ERROR   |   File "/net/people/plggarstka/.local/lib/python3.6/site-packages/paramiko/packet.py", line 367, in readline
2018-11-02 22:36:36,885| ERROR   |     buf += self._read_timeout(timeout)
2018-11-02 22:36:36,886| ERROR   |   File "/net/people/plggarstka/.local/lib/python3.6/site-packages/paramiko/packet.py", line 563, in _read_timeout
2018-11-02 22:36:36,887| ERROR   |     raise EOFError()
2018-11-02 22:36:36,888| ERROR   | EOFError
2018-11-02 22:36:36,889| ERROR

2018-11-02 22:36:42 INFO: Bound to port 36969 instead.
2018-11-02 22:36:42 INFO: Checking scheduler connectivity from p0653 (1/2).
2018-11-02 22:36:42 INFO: Checking scheduler connectivity from p0657 (2/2).
2018-11-02 22:36:42 INFO: Deploying workers.
2018-11-02 22:36:42 INFO: Deploying worker 1/2.
2018-11-02 22:36:50 INFO: Deploying worker 2/2.
2018-11-02 22:36:58 INFO: Validating worker 1/2.
2018-11-02 22:36:58 INFO: Validating worker 2/2.


DaskDeployment(scheduler=tcp://localhost:35492/tcp://172.20.66.143:37209, workers=2)
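
The errors in the log above appear to come from repeated attempts to open a tunnel on local port 48170. The deployment then continues after the line `Bound to port 36969 instead`. The general technique of falling back to another port can be sketched with plain sockets as follows. This is an illustration under stated assumptions, not how `idact` implements its tunnels; the function name `bind_with_fallback` is hypothetical.

```python
import socket


def bind_with_fallback(preferred_port: int, host: str = "127.0.0.1") -> socket.socket:
    """Listen on preferred_port if it is free, otherwise on an OS-assigned port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, preferred_port))
    except OSError:
        # The preferred port is unavailable: start over with a fresh socket
        # and let the operating system pick any free port (port 0).
        sock.close()
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.bind((host, 0))
    sock.listen(1)
    return sock


# Occupy a port, then ask for the same port again to trigger the fallback.
first = bind_with_fallback(0)
taken_port = first.getsockname()[1]
second = bind_with_fallback(taken_port)
print("requested:", taken_port, "bound to:", second.getsockname()[1])
first.close()
second.close()
```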

Get the Dask client:

In [12]:
client = dd.get_client()
client

Client: Scheduler: tcp://localhost:35492, Dashboard: http://localhost:48170/status
Cluster: Workers: 2, Cores: 4, Memory: 21.47 GB


Perform a sample computation:

In [13]:
x = client.submit(lambda value: value + 1, 10)

In [14]:
x.result() == 11

True
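
The `submit` and `result` calls above follow the same futures pattern as Python's standard `concurrent.futures` module. The sketch below reproduces that pattern with a local thread pool, so it can be tried without a cluster. It is a stand-in for illustration only and does not use Dask; the function name `increment` is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def increment(value: int) -> int:
    """Same computation as the lambda submitted to the Dask client."""
    return value + 1


with ThreadPoolExecutor(max_workers=2) as executor:
    # submit() returns a future immediately; result() blocks until it is done.
    future = executor.submit(increment, 10)
    single = future.result()

    # map() applies the function to every input and preserves input order.
    many = list(executor.map(increment, range(5)))

print(single, many)
```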

Diagnostics servers are tunnelled:

In [15]:
dd.diagnostics.addresses

['http://localhost:36969', 'http://localhost:36552', 'http://localhost:46206']

These addresses point to `localhost` on the cluster, where this notebook runs, so they can't be opened in a browser on your local computer.

See the instructions in the local notebook on how to access diagnostics from your local computer.
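
Even without a browser, you can check from this notebook whether something answers at a diagnostics address. The following is a minimal sketch using only the standard library. The helper `is_reachable` is hypothetical and not part of `idact`, and the address list is copied from the output above, so it is only meaningful while that deployment is running.

```python
import http.client
import urllib.error
import urllib.request

# Bypass any proxy settings from the environment, since the targets are local.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at url, whatever the status code."""
    try:
        with _opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False


if __name__ == "__main__":
    addresses = [
        "http://localhost:36969",
        "http://localhost:36552",
        "http://localhost:46206",
    ]
    for address in addresses:
        print(address, "reachable:", is_reachable(address))
```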

## Continue with local notebook

Follow the remaining instructions in the local notebook.

## Cancel Dask deployment

In [16]:
client.shutdown()

In [17]:
dd.cancel()  # Optional, will be killed when allocation is cancelled.

2018-11-02 22:38:51 INFO: Cancelling worker deployment on p0657.
2018-11-02 22:38:57 INFO: Cancelling worker deployment on p0653.
2018-11-02 22:39:03 INFO: Cancelling scheduler deployment on p0653.
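
If the whole computation ran in a single script rather than across two notebooks, the same shutdown sequence could be wrapped so that it runs even when an error occurs. The sketch below assumes only the three calls used in this notebook (`get_client`, `shutdown`, `cancel`); the context manager `dask_session` is hypothetical and not part of `idact`.

```python
from contextlib import contextmanager


@contextmanager
def dask_session(deployment):
    """Yield a client for the deployment and always clean up afterwards."""
    client = deployment.get_client()
    try:
        yield client
    finally:
        try:
            # Same order as the cells above: client first, then deployment.
            client.shutdown()
        finally:
            deployment.cancel()
```

With a deployment such as `dd`, this would be used as `with dask_session(dd) as client: ...`.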
