# Parallelizing Boutiques execution on a Slurm Cluster

The following cells show how to distribute the execution of a Python function across a Slurm cluster. This notebook must be run on the cluster's head node. After starting Jupyter on the head node, open an SSH tunnel from your computer (`my_computer`) by running the following `sshuttle` command:

```
[name@my_computer ~]$ sshuttle --dns -Nr [USERNAME]@narval.computecanada.ca
```

## Setting up the cluster

In [3]:
# Required pip packages: dask[complete], dask-jobqueue

from dask.distributed import Client
from dask_jobqueue import SLURMCluster
import os

In [7]:
# Each worker is one Slurm job with n_cores cores, so up to
# n_worker x n_cores tasks can run concurrently
n_worker = 2
n_cores = 2
hostname = os.environ["HOSTNAME"]
# See documentation at https://jobqueue.dask.org/en/latest/generated/dask_jobqueue.SLURMCluster.html
cluster = SLURMCluster(
    scheduler_options={"host": hostname},
    account="rrg-glatard",
    cores=n_cores,
    memory="4GB",
)
client = Client(cluster)
cluster.scale(jobs=n_worker)

Perhaps you already have a cluster running?
Hosting the HTTP server on port 36791 instead
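
As a rough illustration of the sizing comment in the cell above, the sketch below estimates how long a batch of equal-length tasks would take with this configuration. The batch size and task duration are hypothetical values chosen for the example, and the estimate ignores Slurm queue wait and Dask scheduling overhead.

```python
import math

n_worker = 2       # Slurm jobs requested with cluster.scale(jobs=n_worker)
n_cores = 2        # cores per Slurm job
n_tasks = 100      # hypothetical batch size
task_seconds = 10  # hypothetical duration of one task

# Number of tasks that can run at the same time
slots = n_worker * n_cores
# Tasks run in "waves" of at most `slots` tasks each
waves = math.ceil(n_tasks / slots)
estimated_seconds = waves * task_seconds

print(f"{slots} concurrent tasks, about {estimated_seconds} s in total")
```

Increasing `n_worker` or `n_cores` shortens the estimate proportionally, as long as the Slurm allocation grants the requested jobs.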


In [24]:
client # Check the dashboard to monitor your tasks

Connection method: Cluster object
Cluster type: dask_jobqueue.SLURMCluster
Dashboard: http://10.80.49.3:36791/status
Comm: tcp://10.80.49.3:45225
Started: 23 hours ago
Workers: 0
Total threads: 0
Total memory: 0 B
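
The summary above reports 0 workers, which may mean that the Slurm jobs backing the workers are still pending in the queue or have already ended. Tasks submitted in that state wait until a worker joins. A minimal sketch of a helper that blocks until workers are available is shown below; `wait_until_ready` is a hypothetical name, and it relies on the `wait_for_workers` and `nthreads` methods of the `dask.distributed` client.

```python
def wait_until_ready(client, n_workers, timeout=600):
    """Block until `n_workers` Dask workers have joined the scheduler.

    `client` is expected to be a dask.distributed.Client, such as the one
    created in the setup cell. Returns the number of connected workers.
    """
    # Raises a timeout error if the workers do not show up in time
    client.wait_for_workers(n_workers=n_workers, timeout=timeout)
    # nthreads() maps each worker address to its thread count
    return len(client.nthreads())
```

In this notebook it could be called as `wait_until_ready(client, n_worker)` before submitting any tasks.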


## Running applications on the cluster

In [16]:
import dask
import time

# Change this function so that it runs the Boutiques task
def f(x):
    time.sleep(10)
    return 5 + x

fd = dask.delayed(f)
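
The comment in the cell above asks for `f` to be replaced by a function that runs the Boutiques task. One possible sketch, not necessarily the intended one, is to call the `bosh exec launch` command line through `subprocess`. The function names and the idea of passing file paths as arguments are assumptions made for this example; it also assumes that `bosh` is installed on the compute nodes and that the descriptor and invocation files sit on a filesystem shared with them.

```python
import subprocess


def build_bosh_command(descriptor, invocation):
    """Return the argument list for `bosh exec launch <descriptor> <invocation>`."""
    return ["bosh", "exec", "launch", descriptor, invocation]


def run_boutiques_task(descriptor, invocation):
    """Run one Boutiques invocation and return its exit code.

    Assumption: `descriptor` and `invocation` are paths to JSON files that
    are readable from the node where the Dask worker runs.
    """
    completed = subprocess.run(
        build_bosh_command(descriptor, invocation),
        capture_output=True,  # keep stdout/stderr out of the worker logs
        text=True,
    )
    return completed.returncode
```

Wrapped with `dask.delayed(run_boutiques_task)`, such a function can be submitted in the same way as `f`, for example with one invocation file per task.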

In [17]:
fd(3)

Delayed('f-cf829ae5-ef72-480c-94c9-297aac7ec41a')

In [18]:
fd(3).compute()

8

In [22]:
a = [fd(x) for x in range(100)]
dask.compute(*a)

(5, 6, 7, ..., 103, 104)

Task exception was never retrieved
future: <Task finished name='Task-32341' coro=<Client._gather.<locals>.wait() done, defined at /home/glatard/venvs/livingpark/lib/python3.10/site-packages/distributed/client.py:2054> exception=AllExit()>
Traceback (most recent call last):
  File "/home/glatard/venvs/livingpark/lib/python3.10/site-packages/distributed/client.py", line 2063, in wait
    raise AllExit()
distributed.client.AllExit
