## Dask on SLURM

### Preliminaries (Arnes):

- Existing Lmod Anaconda3 module (Python 3.6): `module load Anaconda3`
- Anaconda environment: `conda create -n bd39 python=3.9 anaconda xarray streamz`
- SLURM cluster: `(bd39) pip install dask_jobqueue`
- Jupyter kernel: `(bd39) python -m ipykernel install --user --name bd39 --display-name "Python (bd39)"`
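
Once the `bd39` environment is active, a quick check like the one below can confirm that the packages from the list above are importable. This is only a minimal sketch: the helper name `check_imports` is made up for illustration, and the import names are assumed to match the installed packages.

```python
from importlib.util import find_spec

def check_imports(names):
    """Return a dict mapping each top-level module name to True if it can be found."""
    return {name: find_spec(name) is not None for name in names}

# Import names assumed from the install commands above.
status = check_imports(["dask", "dask_jobqueue", "xarray", "streamz", "ipykernel"])
for name, ok in status.items():
    print(f"{name:15s} {'ok' if ok else 'MISSING'}")
```

Any `MISSING` entry points to a package that still needs installing in the active environment.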

### Login

Beware: notebooks live in `~/notebooks`. Use the same login node throughout the session!
- SSH: `ssh mkukar@hpc-login2.arnes.si`
- Anaconda env: `module load Anaconda3 && conda activate bd39`
- Jupyter: `jupyter notebook --no-browser --port=8080 --notebook-dir ~/notebooks`
- SSH tunnel: `ssh -N -o ServerAliveInterval=30 -L localhost:8080:localhost:8080 -L localhost:8087:localhost:8087 mkukar@hpc-login2.arnes.si`
- Notebook: [`http://localhost:8080`](http://localhost:8080)
- Dashboard: [http://localhost:8087](http://localhost:8087) 
- SSH tunnel (notebook only, no dashboard): `ssh -N -L 8080:localhost:8080 mkukar@hpc-login2.arnes.si`

### SLURM 

- SLURM info: `sinfo`
- Show all my jobs: `squeue --me`
- Cancel all my jobs: `squeue -u $USER | awk '{print $1}' | tail -n+2 | xargs scancel`
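
The cancel pipeline in the list above takes the first column of the `squeue` output, drops the header line, and hands the remaining job IDs to `scancel`. The same extraction step can be sketched in Python as follows. The function name `job_ids` and the sample text are made up for illustration; the sample only imitates the default `squeue` column layout.

```python
def job_ids(squeue_output):
    """Extract job IDs: the first column of every line after the header."""
    lines = squeue_output.strip().splitlines()
    return [line.split()[0] for line in lines[1:] if line.strip()]

# Made-up sample in the default squeue layout, for illustration only.
sample = """\
  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
1000001       all dask-wor   mkukar  R       0:42      1 node001
1000002       all dask-wor   mkukar PD       0:00      1 (Priority)
"""
print(job_ids(sample))
```

On the command line, `scancel -u $USER` should achieve the same result without the intermediate parsing.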

### References
1. SLURMCluster: http://jobqueue.dask.org/en/latest/generated/dask_jobqueue.SLURMCluster.html
2. Dask on Summit: https://blog.dask.org/2019/08/28/dask-on-summit
3. Dask on HPC: https://www.youtube.com/watch?v=FXsgmwpRExM&t=872s
4. Chunk sizes: https://blog.dask.org/2021/11/02/choosing-dask-chunk-sizes

In [1]:
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

import platform
from contextlib import suppress

import dask_jobqueue
from distributed import Client

print(platform.python_version())

3.9.7


In [1]:
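# Shut down a client left over from a previous run. On the first run `client`
# is undefined, and the resulting NameError is suppressed.
with suppress(Exception):
    client.shutdown()

cluster = dask_jobqueue.SLURMCluster(
    queue='all',        # SLURM partition
    processes=2,        # worker processes per job
    cores=16,           # total cores per job
    memory='32GB',      # total memory per job
    scheduler_options={'dashboard_address': ':8087'},  # port forwarded by the SSH tunnel
    death_timeout=120,  # seconds
)

client = Client(cluster, timeout="120s")
display(client.cluster)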

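
In the `SLURMCluster` call above, `cores` and `memory` describe one SLURM job, while `processes` sets how many worker processes share that job. The arithmetic below sketches the resulting per-worker resources, assuming the job's cores and memory are divided evenly among its processes. `per_worker` is an illustrative helper, not part of dask_jobqueue.

```python
def per_worker(cores, memory_gb, processes):
    """Split one job's resources evenly across its worker processes."""
    return {"threads": cores // processes, "memory_gb": memory_gb / processes}

# Values taken from the SLURMCluster call above.
print(per_worker(cores=16, memory_gb=32, processes=2))
```

Under that assumption each worker gets 8 threads and 16 GB of memory.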

In [29]:
# Request 16 workers; with processes=2 per job this corresponds to 8 SLURM jobs.
client.cluster.scale(16)
display(client.cluster)

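
`scale(16)` counts workers, not SLURM jobs. Since the cluster was created with `processes=2`, the number of jobs that appear in `squeue` should be smaller than 16. A minimal sketch of that relationship, assuming each job hosts exactly `processes` workers (`jobs_needed` is a hypothetical helper):

```python
import math

def jobs_needed(workers, processes_per_job):
    """Number of SLURM jobs required to host the requested number of workers."""
    return math.ceil(workers / processes_per_job)

print(jobs_needed(16, 2))
```

To size the cluster in jobs directly, dask_jobqueue also accepts `cluster.scale(jobs=8)`.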

## Test with bag

In [35]:
import dask.bag as db

def bag_map(x):
    # Report the IP address and Python version of the worker that ran this task.
    import platform, socket
    version = platform.python_version()
    ip = socket.gethostbyname(socket.gethostname())
    return (ip, version)

data = list(range(1,16+1))
b = db.from_sequence(data, npartitions=len(data))
print(b)

bm = b.map(bag_map)
bmc = bm.compute()
print(set(bmc), bmc)


dask.bag<from_sequence, npartitions=16>
{('153.5.72.252', '3.9.7')} [('153.5.72.252', '3.9.7'), ('153.5.72.252', '3.9.7'), ('153.5.72.252', '3.9.7'), ('153.5.72.252', '3.9.7'), ('153.5.72.252', '3.9.7'), ('153.5.72.252', '3.9.7'), ('153.5.72.252', '3.9.7'), ('153.5.72.252', '3.9.7'), ('153.5.72.252', '3.9.7'), ('153.5.72.252', '3.9.7'), ('153.5.72.252', '3.9.7'), ('153.5.72.252', '3.9.7'), ('153.5.72.252', '3.9.7'), ('153.5.72.252', '3.9.7'), ('153.5.72.252', '3.9.7'), ('153.5.72.252', '3.9.7')]
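
The raw list above is hard to scan. A short summary of tasks per host and the Python versions seen makes the same check easier to read. The sketch below uses a stand-in for `bmc` that mirrors the output above, so it runs without a cluster; in the notebook the first assignment would be dropped.

```python
from collections import Counter

# Stand-in for the `bmc` list computed above: 16 (ip, version) tuples.
bmc = [("153.5.72.252", "3.9.7")] * 16

tasks_per_host = Counter(ip for ip, _ in bmc)   # how many tasks each address ran
versions = {version for _, version in bmc}      # distinct Python versions seen
print(tasks_per_host)
print(versions)
```

In the run shown above every task reported the same address. That may mean all workers shared one node, or only that hostname resolution returned the same address everywhere; the output alone does not say which.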


## Test with array

In [None]:
import dask.array as da
x = da.random.random((10000, 10000), chunks=(1000, 1000))
display(x)
y = x + x.T
z = y[::2, 5000:].mean(axis=1)
display(z)
print(client.cluster.job_script())
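
The chunking chosen for `x` determines how much data each task handles (see reference 4 on chunk sizes). The arithmetic below sketches the block count and sizes for the array above, assuming 8-byte `float64` elements, which is what `da.random.random` produces by default.

```python
import math

shape, chunks, itemsize = (10000, 10000), (1000, 1000), 8  # float64 = 8 bytes

blocks = tuple(math.ceil(s / c) for s, c in zip(shape, chunks))  # blocks per axis
n_chunks = math.prod(blocks)                                      # total number of chunks
chunk_mb = math.prod(chunks) * itemsize / 1e6                     # size of one chunk in MB
total_mb = math.prod(shape) * itemsize / 1e6                      # size of the whole array in MB
print(blocks, n_chunks, chunk_mb, total_mb)
```

That gives 100 chunks of 8 MB each, 800 MB in total.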

In [None]:
zz = z.compute()
print(len(zz))
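
The printed length can be predicted without the cluster: `y[::2, 5000:]` keeps every second row and the last 5000 columns, and `mean(axis=1)` then collapses the columns to one value per kept row. A small sketch using plain `range` slicing to count the surviving indices:

```python
n = 10000
rows = range(n)[::2]      # row indices kept by y[::2, ...]
cols = range(n)[5000:]    # column indices kept by y[..., 5000:]
print(len(rows), len(cols))
```

Both counts are 5000, so `zz` should have 5000 elements.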

## Shutdown

In [None]:
client.shutdown()