# Dask Workflow - Remote

This notebook is intended to be executed on the cluster as a continuation of notebook `02a-DaskWorkflow-Local.ipynb`.

## Initial setup

`idact` should be installed using pip, e.g.:

```
module load plgrid/tools/python-intel/3.6.2
python3 -m pip install --user git+https://github.com/garstka/eng-project.git
```
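
To confirm that the package is visible to the interpreter running this notebook, a quick check can be sketched with the standard library alone. The helper name `is_installed` is hypothetical and not part of `idact`.

```python
import importlib.util


def is_installed(module_name):
    """Return True if `module_name` can be located by the current interpreter."""
    # find_spec returns None when a top-level module cannot be found.
    return importlib.util.find_spec(module_name) is not None


# `idact` is the package installed above. A False result suggests the notebook
# kernel uses a different Python than the one pip installed into.
print("idact available:", is_installed("idact"))
```

If this prints `False`, it may help to check that the kernel was started after loading the same Python module that was used for the installation.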

## Load cluster

If you pushed the environment from the local notebook, this will load it:

In [1]:
from idact import *

load_environment()

Alternatively, use `add_cluster`, as described in notebook 01.

Then, show the cluster:

In [2]:
cluster = show_cluster("pro")  # replace with your cluster name if necessary
cluster

Cluster(pro.cyfronet.pl, 22, plggarstka, auth=AuthMethod.PUBLIC_KEY, key='/net/people/plggarstka/.ssh/id_rsa_ip', install_key=False, disable_sshd=False)

## Set up cluster

The following is not necessary if you pushed the environment:

In [3]:
cluster.config.setup_actions.dask = ['module load plgrid/tools/python-intel/3.6.2']
cluster.config.scratch = '$SCRATCH'

save_environment()

`idact` still needs to connect to the login node from the compute node that this notebook is most likely running on.

To do so, it needs an SSH key that is listed in `authorized_keys`. You can either provide the path to an existing key, or do nothing and let `idact` generate and install one.

In [4]:
# import os
# cluster.config.key = os.path.expanduser('~/.ssh/id_rsa')
# cluster.config.install_key = False
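
If you are unsure which private key is present on the compute node, the following sketch picks the first candidate file that exists. The helper `first_existing_key` and the candidate paths are assumptions for illustration. Finding a key file does not guarantee that it is listed in `authorized_keys`.

```python
import os


def first_existing_key(candidates):
    """Return the first candidate path that exists as a file, else None."""
    for path in candidates:
        expanded = os.path.expanduser(path)
        if os.path.isfile(expanded):
            return expanded
    return None


# Hypothetical candidate paths; adjust them to the keys you actually use.
key_path = first_existing_key(['~/.ssh/id_rsa', '~/.ssh/id_ed25519'])
print(key_path)

# If a suitable key was found, it could be set as in the commented cell above:
# cluster.config.key = key_path
# cluster.config.install_key = False
```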

## Test connection

In [5]:
node = cluster.get_access_node()
node

Node(pro.cyfronet.pl:22, None)

On your first action, you may be asked for a password to install the SSH key.

In [6]:
node.connect()
save_environment()  # Save the environment so the key is not installed again.

In [7]:
node.run('whoami')

'plggarstka'

In [8]:
node.run('hostname')

'login01.pro.cyfronet.pl'
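
The two checks above can be combined into one call. This is a minimal sketch that relies only on `node.run`, as used in the cells above. The helper name `run_checks` is hypothetical.

```python
def run_checks(node, commands=('whoami', 'hostname')):
    """Run each command on the node and return a dict of command -> output."""
    return {command: node.run(command) for command in commands}


# With the `node` object from above:
# run_checks(node)
```

The function runs the commands in order and stops at the first one that raises an exception, which is usually the desired behaviour for a connection test.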

## Pull deployments

You can now access any deployments you pushed in the first notebook:

In [9]:
deployments = cluster.pull_deployments()
deployments

2018-11-17 21:33:43 INFO: Pulling deployments.
2018-11-17 21:33:45 INFO: Creating the ssh directory.
2018-11-17 21:34:02 INFO: Desired local tunnel port 33867 is taken. Binding to random port instead.
2018-11-17 21:34:03 INFO: Desired local tunnel port 48132 is taken. Binding to random port instead.
2018-11-17 21:34:04 INFO: Desired local tunnel port 42937 is taken. Binding to random port instead.
2018-11-17 21:34:16 INFO: Pulled allocation deployment: Nodes([Node(p0207:59354, 2018-11-17 20:40:07.927042+00:00),Node(p0213:50495, 2018-11-17 20:40:07.927042+00:00)], SlurmAllocation(job_id=14230445))
2018-11-17 21:34:16 INFO: Pulled Jupyter deployment: JupyterDeployment(8080 -> Node(p0207:59354, 2018-11-17 20:40:07.927042+00:00)
2018-11-17 21:34:16 INFO: Pulled Dask deployment: DaskDeployment(scheduler=tcp://localhost:33573/tcp://172.20.64.207:33867, workers=2)


SynchronizedDeployments(nodes=1, jupyter_deployments=1, dask_deployments=1)

In [10]:
nodes = deployments.nodes[-1] if deployments.nodes else None
nodes

Nodes([Node(p0207:59354, 2018-11-17 20:40:07.927042+00:00),Node(p0213:50495, 2018-11-17 20:40:07.927042+00:00)], SlurmAllocation(job_id=14230445))

In [11]:
nb = (deployments.jupyter_deployments[-1]
      if deployments.jupyter_deployments else None)
nb

JupyterDeployment(8080 -> Node(p0207:59354, 2018-11-17 20:40:07.927042+00:00)

We're most interested in the Dask deployment:

In [12]:
dd = deployments.dask_deployments[-1]
dd

DaskDeployment(scheduler=tcp://localhost:33573/tcp://172.20.64.207:33867, workers=2)
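
The cells above repeat a "last item, or `None` if empty" pattern, while the Dask cell indexes directly and would raise `IndexError` if no Dask deployment had been pushed. A small helper can make the guard uniform. The name `latest` is hypothetical, and the sketch assumes that the deployment collections support truth testing and negative indexing, as they do in the cells above.

```python
def latest(items):
    """Return the last element of a sequence, or None if it is empty."""
    return items[-1] if items else None


# With the `deployments` object pulled above:
# nodes = latest(deployments.nodes)
# nb = latest(deployments.jupyter_deployments)
# dd = latest(deployments.dask_deployments)
```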

## Working with Dask

Get a Dask client:

In [13]:
client = dd.get_client()
client

Client: Scheduler: tcp://localhost:33573, Dashboard: http://localhost:48132/status
Cluster: Workers: 2, Cores: 4, Memory: 21.47 GB


Perform a simple computation:

In [14]:
x = client.submit(lambda value: value + 1, 10)

In [15]:
x.result() == 11

True
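
To spread work across both workers, many tasks can be submitted at once. The sketch below uses `Client.map` and `Client.gather` from `dask.distributed`. The function names `square` and `sum_of_squares` are illustrative and not part of the original notebook.

```python
def square(value):
    """Compute the square of a single value; runs on a worker."""
    return value * value


def sum_of_squares(client, values):
    """Square each value on the workers and add up the results locally."""
    futures = client.map(square, values)  # one future per input value
    results = client.gather(futures)      # block until all results arrive
    return sum(results)


# With the `client` from above:
# sum_of_squares(client, range(10))
```

For `range(10)` the call should return 285. A named function is used instead of a lambda to keep the tasks easy to identify on the dashboard.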

## Continue with the local notebook

Follow the remaining instructions in the local notebook.