# Dask clusters

(the material is based on the notebook https://github.com/jrbourbeau/hacking-dask)

This notebook covers Dask's distributed clusters in detail.

## Cluster overview

In this section we'll discuss:

1. The different components which make up a Dask cluster
2. The different ways to launch a cluster

<img src="img/dask-system.png" width="600">

## Dask configuration file

Taking full advantage of Dask sometimes requires user configuration. This might be to control logging verbosity, specify cluster configuration, provide credentials for security, or any of several other options that arise in production.

Configuration is specified in one of the following ways:

* YAML files in `~/.config/dask/`, `/etc/dask/`, or `$DASK_CONFIG`

* Environment variables like `DASK_DISTRIBUTED__SCHEDULER__WORK_STEALING=True`

* Default settings within other Dask sub-libraries
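
To make the environment-variable form concrete, the sketch below mimics the documented naming rule: the `DASK_` prefix is dropped, the name is lower-cased, and each double underscore marks one level of nesting. It is a simplified standard-library illustration, not Dask's own implementation. The helper name `env_to_nested` is hypothetical, and writing single underscores as hyphens is a simplification of Dask treating the two as interchangeable in key names.

```python
import ast


def env_to_nested(name, value):
    """Hypothetical helper: turn one DASK_* variable into a nested dict."""
    if not name.startswith("DASK_"):
        raise ValueError(f"not a Dask variable: {name}")
    # Drop the prefix, lower-case, and split on double underscores.
    parts = name[len("DASK_"):].lower().split("__")
    # Simplification: write single underscores as hyphens.
    parts = [part.replace("_", "-") for part in parts]
    # Parse Python literals such as True or 10; keep other values as strings.
    try:
        parsed = ast.literal_eval(value)
    except (ValueError, SyntaxError):
        parsed = value
    # Wrap the value in one dict per key level, innermost first.
    nested = parsed
    for part in reversed(parts):
        nested = {part: nested}
    return nested


print(env_to_nested("DASK_DISTRIBUTED__SCHEDULER__WORK_STEALING", "True"))
# {'distributed': {'scheduler': {'work-stealing': True}}}
```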

On the coffea-casa facility we have already preconfigured everything for you, so please change the configuration with care :)

In [1]:
import dask

In [2]:
dask.config.config

{'distributed': {'scheduler': {'allowed-failures': 10,
   'bandwidth': 1000000000,
   'work-stealing': False},
  'worker': {'memory': {'target': False,
    'spill': False,
    'pause': 0.8,
    'terminate': 0.95},
   'profile': {'interval': '1d', 'cycle': '2d', 'low-level': False}},
  'diagnostics': {'nvml': False},
  'version': 2,
  'dashboard': {'link': '/user/{JUPYTERHUB_USER}/proxy/{port}/status'},
  'admin': {'system-monitor': {'gil': {'enabled': True, 'interval': '10us'}},
   'tick': {'limit': '5s'}},
  'comm': {'require-encryption': True,
   'tls': {'ciphers': None,
    'ca-file': '/etc/cmsaf-secrets/ca.pem',
    'scheduler': {'cert': '/etc/cmsaf-secrets/hostcert.pem',
     'key': '/etc/cmsaf-secrets/hostcert.pem'},
    'worker': {'key': None, 'cert': None},
    'client': {'key': '/etc/cmsaf-secrets/hostcert.pem',
     'cert': '/etc/cmsaf-secrets/hostcert.pem'}}}},
 'jobqueue': {'coffea-casa': {'name': 'dask-worker',
   'cores': 1,
   'memory': '3GiB',
   'processes': 1,
   'wor

In [3]:
dask.config.get("distributed.comm.require-encryption")

True

In [4]:
#dask.config.set({'distributed.comm.require-encryption': False})

<dask.config.set at 0x7fe124d5fdf0>

In [None]:
#dask.config.get("distributed.comm.require-encryption")
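
A temporary override is often safer than a permanent one. As a minimal sketch, assuming only that the `dask` package is importable, `dask.config.set` can be used as a context manager so that the previous value comes back when the block exits. The key is the one queried above; the printed values depend on your installation.

```python
import dask

key = "distributed.comm.require-encryption"

# Remember the current value; default=None avoids a KeyError if the key is unset.
before = dask.config.get(key, default=None)

# Inside the block the override is active.
with dask.config.set({key: False}):
    inside = dask.config.get(key)

# After the block the earlier value is restored.
after = dask.config.get(key, default=None)

print(before, inside, after)
```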

### Components of a cluster

A Dask cluster is composed of three different types of objects:

1. **Scheduler**: A single, centralized scheduler process which responds to requests for computations, maintains relevant state about tasks and workers, and sends tasks to workers to be computed.
2. **Workers**: One or more worker processes which compute tasks and store/serve their results.
3. **Clients**: One or more client objects which are the user-facing entry point to interact with the cluster.

A couple of notes about workers:

- Each worker runs in its own Python process, and each worker process has its own `concurrent.futures.ThreadPoolExecutor`, which it uses to compute tasks in parallel.
- There is actually a fourth cluster object that is often not discussed: the **Nanny**. By default, Dask workers are launched and managed by a separate nanny process. This separate process makes it possible to restart workers with the `Client.restart` method, and to restart them automatically if they exceed a memory limit threshold.
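
The thread pool mentioned in the first note is the standard-library `concurrent.futures.ThreadPoolExecutor`. The sketch below is not Dask code; it only illustrates the pattern of one process handing several tasks to a pool of threads. The task function `slow_square` and the pool size are made up for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(x):
    """Stand-in for a task; the sleep releases the GIL, so threads overlap."""
    time.sleep(0.1)
    return x * x


# One pool per process, similar in spirit to a worker with four threads.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(slow_square, i) for i in range(4)]
    # Collect results in submission order.
    results = [future.result() for future in futures]

print(results)  # [0, 1, 4, 9]
```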

<img src="img/dask-cluster.svg" width="600">

#### Related Documentation

- [Cluster architecture](https://distributed.dask.org/en/latest/#architecture)
- [Journey of a task](https://distributed.dask.org/en/latest/journey.html)

## Deploying Dask clusters

Deploying a Dask cluster means launching scheduler, worker, and client processes and setting up the network connections these processes need to communicate with one another. Dask clusters can be launched in several ways, which are discussed below.

## Cluster managers

Dask has the notion of cluster manager objects. Cluster managers offer a consistent interface for common activities such as adding workers to or removing workers from a cluster, retrieving logs, and so on.

<img src="img/dask-cluster-manager.svg" width="600">

### Dask LocalCluster cluster manager

LocalCluster creates a "cluster" of a scheduler and workers running on the local machine.

Creating a cluster object will create a Dask scheduler and a number of Dask workers. If no arguments are specified, it autodetects the number of CPU cores and the amount of memory your system has and creates workers to fill them appropriately. You can also specify these arguments yourself.
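
As a minimal sketch of passing those arguments explicitly, assuming `dask.distributed` is installed: the values below (two single-threaded workers, threads instead of separate processes, no dashboard) are arbitrary choices that keep the example light, not recommended settings.

```python
from dask.distributed import Client, LocalCluster

# Two single-threaded workers; processes=False runs them as threads inside
# this process, and dashboard_address=None turns the dashboard off.
with LocalCluster(
    n_workers=2,
    threads_per_worker=1,
    processes=False,
    dashboard_address=None,
) as cluster, Client(cluster) as client:
    # Block until both workers have registered with the scheduler.
    client.wait_for_workers(2)
    n_workers = len(client.scheduler_info()["workers"])
    # Run a trivial task on the cluster and fetch its result.
    total = client.submit(sum, [1, 2, 3]).result()

print(n_workers, total)
```

With the default `processes=True` the workers run in separate processes, and a standalone script should then create the cluster under an `if __name__ == "__main__":` guard.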

Alternatively, creating a `Client` with no arguments starts a local `dask.distributed` scheduler and workers for you. The new client becomes the default scheduler, overriding whatever default was previously set.

In [None]:
#from dask.distributed import Client
#client = Client()

In [None]:
#client.close()

Dask works well at many scales, ranging from a single machine to clusters of many machines. In our case, each user is provided with a preconfigured resource that is ready to be scaled.

### Dask-jobqueue library

The Dask-jobqueue project makes it easy to deploy Dask on common job queuing systems typically found in high performance supercomputers, academic research institutions, and other clusters. It provides a convenient interface that is accessible from interactive systems like Jupyter notebooks, or batch jobs.

Launching clusters follows a pattern similar to using Dask's built-in `LocalCluster`:

```python
# Launch a Dask cluster on an HTCondor job queueing system
# (requires HTCondor-related configuration)
from dask_jobqueue import HTCondorCluster
cluster = HTCondorCluster(...)

# Launch a Dask cluster on a SLURM job queueing system
# (requires SLURM-related configuration)
from dask_jobqueue import SLURMCluster
cluster = SLURMCluster(...)

# Launch a Dask cluster on a PBS job queueing system
# (requires PBS-related configuration)
from dask_jobqueue import PBSCluster
cluster = PBSCluster(...)

# Launch a Dask cluster on a Kubernetes cluster
# (requires Kubernetes-related configuration; note that KubeCluster
# comes from the separate dask-kubernetes package)
from dask_kubernetes import KubeCluster
cluster = KubeCluster(...)
```

### CoffeaCasaCluster cluster manager

The "scale out" process at the Coffea-Casa Analysis Facility is accomplished with a custom dask-jobqueue class that makes it easy to deploy Dask workers on the UNL Tier-2 HTCondor batch queue or on the Kubernetes cluster available at UNL.

The Dask `Client` is the primary entry point for users of `dask.distributed`.

We have already configured a Dask cluster for you, so you only need to initialize a `Client` by pointing it to the address of the scheduler (in coffea-casa it is always `tls://localhost:8786`):

In [None]:
from dask.distributed import Client

client = Client("tls://localhost:8786")
client