<h1>Dask workers - the cluster</h1>
Dask uses a cluster of workers to perform operations. 

<h4>Cluster types</h4>
<ul><li>By default, if you don't specify, Dask creates a local (same pc) cluster of multiple threads</li></ul>
Threads run inside a single process and share its memory space, so they are fast to create and manage. They are good for general tasks, especially I/O-bound ones.
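To make the shared-memory point concrete, here is a minimal sketch using only the standard library (not Dask itself). The names `shared` and `record` are made up for illustration. Every task appends to the same list and reports the same process id, because all the threads live in one process.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor

shared = []  # a single list, visible to every thread in this process


def record(i):
    # Each task notes which process and thread it ran in, then returns i squared.
    shared.append((os.getpid(), threading.current_thread().name))
    return i * i


with ThreadPoolExecutor(max_workers=4) as pool:
    squares = list(pool.map(record, range(8)))

pids = {pid for pid, _ in shared}
print(squares)
print(len(shared), "entries written from", len(pids), "process")
```

Separate processes would each get their own copy of `shared`, which is why multiprocess clusters have to move data between workers.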
<ul><li><h4>Dask distributed</h4> is an enhanced cluster manager that can create threaded and multiprocess clusters on your localhost, a cluster of ssh-connected servers, a cluster on a cloud provider, or a cluster on an HPC system like Orion.</li></ul>
<b>Use distributed when scaling up a cluster from the default.</b>

In [None]:
from dask.distributed import Client, progress
client = Client()


In [None]:
# or, more explicitly
from dask.distributed import Client, LocalCluster
cluster = LocalCluster()
client = Client(cluster)

This creates a multiprocess local cluster, which suits general use without any tuning.<br>Always clean up by explicitly shutting down the cluster when you are done.

In [None]:
client.shutdown()
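One way to make the cleanup hard to forget is to use the cluster and client as context managers, so both are closed when the block exits, even on an error. This is a minimal sketch rather than the only approach. As assumptions to keep it cheap to run, it uses `processes=False` (workers are threads in the current process, unlike the multiprocess default described above) and `dashboard_address=":0"` (pick any free port).

```python
from dask.distributed import Client, LocalCluster

# Assumptions for this sketch: an in-process cluster (processes=False) and a
# dashboard on any free port (":0"), so it does not clash with port 8787.
with LocalCluster(n_workers=2, threads_per_worker=1, processes=False,
                  dashboard_address=":0") as cluster:
    with Client(cluster) as client:
        future = client.submit(sum, [1, 2, 3])  # run sum([1, 2, 3]) on a worker
        result = future.result()                # block until the answer is back

# Leaving the with-blocks has closed the client and the cluster.
print(result)
```

In a long-lived notebook session you will usually keep the client open across cells, and the explicit `client.shutdown()` above is the simpler choice.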

Distributed clusters create a dashboard to visualize activity. It is served on port 8787 by default, or on a random port if 8787 is already taken. Displaying the client shows the address in use.

In [None]:
client

The scheduler port is used by workers to coordinate activity. The dashboard port is on the machine <b>where the code is run from</b>.<br><br>
If <b>running on your laptop</b>, you can just open url http://localhost:[port]<br><br>
If you have ssh'd into <b>GML servers</b> you will need to use <b>port redirection</b> to access the dashboard on the remote server:<br>
ssh [server] -L (local redirect) [localport]:[remote host]:[remote port]

ssh nimbus3 -L 8787:localhost:8787

and then in a browser: http://localhost:8787/status
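If the dashboard came up on a different port, the same command can be assembled from its parts. The helpers `tunnel_command` and `dashboard_url` below are hypothetical, written only for this sketch. They build the strings and do not run ssh.

```python
import shlex


def tunnel_command(server, remote_port, local_port=None, remote_host="localhost"):
    """Build: ssh [server] -L [localport]:[remote host]:[remote port]."""
    if local_port is None:
        local_port = remote_port  # reuse the same port number locally
    forward = f"{local_port}:{remote_host}:{remote_port}"
    return shlex.join(["ssh", server, "-L", forward])


def dashboard_url(local_port):
    """URL to open in a browser on your own machine once the tunnel is up."""
    return f"http://localhost:{local_port}/status"


print(tunnel_command("nimbus3", 8787))
print(dashboard_url(8787))
```

Pass a different `local_port` if 8787 is already in use on your laptop, and open the matching URL.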


<h4>You can also use Jupyter Lab to create the cluster and hook it in for you</h4>

In [None]:
client

You can tune the local cluster by specifying the number of workers, the threads per worker, and the memory limit per worker.

In [None]:
from dask.distributed import Client, progress
client = Client(processes=True, threads_per_worker=1,
                n_workers=4, memory_limit='1GB')
client

<h4>SSHCluster</h4> creates workers on ssh-connected machines, forming a cluster of local servers. This requires all servers to have ssh key exchange set up and a consistent Python environment. Our GML servers can do this easily with NFS mounts and a one-time key setup.

In [None]:
# The nimbi cluster:
from dask.distributed import Client, progress, SSHCluster
cluster = SSHCluster(
    ['nimbus4', 'nimbus', 'nimbus2', 'nimbus3'],        # First server is the scheduler
    scheduler_options={"dashboard_address": ":8884"},   # Fixed dashboard port so you can set up the ssh redirect
    # Worker processes and threads per process on each host machine.
    # Note: newer distributed releases call 'nprocs' 'n_workers' instead.
    worker_options={'nprocs': 2, 'nthreads': 1})
client = Client(cluster)
client

<h2>With great power comes great responsibility!</h2>This can easily overwhelm the servers, bringing the wrath of IT down upon you.<br>Use these sparingly, in off hours.  Coordinate with IT for large jobs.<br>
<b>Limit to no more than 2 processes & 2 threads per host.</b>
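A small guard can catch an oversized request before the cluster is created. This is only a sketch. `check_worker_options` and the two constants are hypothetical names, and it assumes the limit applies separately to the `nprocs` and `nthreads` values passed in `worker_options`. It requires both keys to be set explicitly rather than relying on defaults.

```python
MAX_PROCS_PER_HOST = 2  # assumed reading of the limit stated above
MAX_THREADS = 2


def check_worker_options(worker_options):
    """Return worker_options unchanged, or raise if it exceeds the limits."""
    for key, limit in (("nprocs", MAX_PROCS_PER_HOST), ("nthreads", MAX_THREADS)):
        if key not in worker_options:
            raise ValueError(f"set {key!r} explicitly (limit is {limit})")
        if worker_options[key] > limit:
            raise ValueError(f"{key}={worker_options[key]} exceeds the limit of {limit}")
    return worker_options


options = check_worker_options({'nprocs': 2, 'nthreads': 1})
print(options)
```

The checked dictionary can then be passed as `worker_options=options` when building the SSHCluster.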

<h4>Make sure to shut down cleanly</h4>

In [None]:
client.shutdown()

<h3>dask_jobqueue</h3>Lets you create and distribute jobs on the nodes of an HPC system.<br>
SLURMCluster can be used to submit jobs on the Orion HPC, which uses the SLURM workload manager.

In [None]:
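from dask.distributed import Client, progress
from dask_jobqueue import SLURMCluster
# Extra sbatch options: partition, QOS, notification address, wall time
slurmextra = ['-p orion', '-q batch', '--mail-user=john.mund@noaa.gov', '--time=00:05:00']
# cores, processes and memory are per job. Note: newer dask_jobqueue releases
# rename project to account and job_extra to job_extra_directives.
cluster = SLURMCluster(project='co2', cores=6, processes=6, memory='1GB',
                       log_directory='./logs', job_extra=slurmextra)
cluster.scale(jobs=3)            # Submit 3 SLURM jobs
client = Client(cluster)
print(cluster.job_script())      # Show the generated batch script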
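It helps to work out what those numbers ask for. In dask_jobqueue, `cores`, `processes` and `memory` describe a single job, and each job's cores and memory are split across its worker processes. The helper `slurm_layout` below is hypothetical and just does that arithmetic, assuming '1GB' means 10^9 bytes.

```python
def slurm_layout(cores, processes, jobs, memory_bytes):
    """Rough totals for `jobs` identical jobs (a sketch, not dask_jobqueue code)."""
    threads_per_process = cores // processes
    return {
        "worker_processes": processes * jobs,
        "threads_per_process": threads_per_process,
        "total_threads": processes * threads_per_process * jobs,
        "memory_per_process_bytes": memory_bytes / processes,
    }


# The request above: cores=6, processes=6, memory='1GB', scaled to 3 jobs
layout = slurm_layout(cores=6, processes=6, jobs=3, memory_bytes=1_000_000_000)
for name, value in layout.items():
    print(name, value)
```

Under these assumptions the request gives 18 single-threaded worker processes with roughly 167 MB each, which is tight for anything but small tasks.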

This allows you to seamlessly scale code from a local cluster on your laptop, to a multiprocess cluster, to an SSH cluster, to HPC or cloud-based hosts.
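The pattern that makes this work is to keep the analysis code dependent only on the `client`, never on the kind of cluster behind it. Here is a minimal sketch. `square` and `sum_of_squares` are made-up names, and the small in-process LocalCluster (`processes=False`, dashboard on any free port) is an assumption chosen so the example is cheap to run.

```python
from dask.distributed import Client, LocalCluster


def square(x):
    return x * x


def sum_of_squares(client, n):
    """Cluster-agnostic work: it only needs a client to submit tasks to."""
    futures = client.map(square, range(n))  # one task per input value
    return sum(client.gather(futures))      # collect the results and add them up


# Swap this cluster for an SSHCluster or SLURMCluster; sum_of_squares stays the same.
with LocalCluster(n_workers=1, threads_per_worker=2, processes=False,
                  dashboard_address=":0") as cluster:
    with Client(cluster) as client:
        total = sum_of_squares(client, 10)

print(total)
```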