# Running Clusters

*Cascadia City's CTO hasn't decided if the cloud makes sense, or on-prem Kubernetes, or something else entirely. But that hasn't added any slack to our schedule: we need to demo a Dask cluster and explain it in simple terms. The mayor has offered us a server to install on...*

We'll start with the simplest, least automated approach, which lets us see the machinery "under the hood," although this manual approach is unlikely to be the best choice for real deployments.

After we're familiar with the core components, we'll review a series of tools that let us deploy and manage clusters in a more scalable and devops-friendly way.

<img src='images/dask.svg' width=800>

## Dask's Cast of Characters

### The Scheduler

Let's start a scheduler.

1. To make it simple to see what's happening, kill any running Jupyter kernels.
2. Open a new terminal in Jupyter
3. Type `dask-scheduler`

After a few seconds, you should see output that looks something like this:

```
distributed.scheduler - INFO - -----------------------------------------------
distributed.http.proxy - INFO - To route to workers diagnostics web server please install jupyter-server-proxy: python -m pip install jupyter-server-proxy
distributed.scheduler - INFO - -----------------------------------------------
distributed.scheduler - INFO - Clear task state
distributed.scheduler - INFO -   Scheduler at:    tcp://192.168.1.5:8786
distributed.scheduler - INFO -   dashboard at:                     :8787
```

This tells us how workers will talk to the scheduler (the `tcp://` URI) and how we can view the dashboards (the port underneath). Notice that the scheduler process is serving the main dashboard. That's why, when the scheduler is "bogged down" with work, the dashboards are less responsive.

__What does this scheduler do?__

The scheduler is a Python process which serves as the brains of the Dask cluster. The scheduler ...
* is the component that users communicate with to schedule work
* decides which work to send where on the cluster
* optimizes or rearranges task graphs for better throughput

Just to be extra clear on what's happened, open another terminal in Jupyter and run

`ps -efww | grep python`

You should be able to identify your dask-scheduler process. Note its PPID (parent process ID). Now run

`ps -efww | grep <ppid>` where you replace `<ppid>` with the PPID you found above. That should be the shell you used to start the scheduler.
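
The same parent/child relationship can also be observed from Python. The snippet below is a small sketch using only the standard library (it is not part of Dask): it launches a child Python process and checks that the child's PPID is the PID of the process that started it, just as the scheduler's PPID is the PID of your shell.

```python
import os
import subprocess
import sys

# Start a child Python process that reports its parent's PID.
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.getppid())"],
    capture_output=True,
    text=True,
    check=True,
)
child_ppid = int(child.stdout.strip())

# The child's PPID should be the PID of this (parent) process.
print("parent PID:", os.getpid())
print("child PPID:", child_ppid)
```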

> __Tip:__ Command-line options for the scheduler are at https://docs.dask.org/en/latest/setup/cli.html

### The Worker

We could try to talk to the scheduler and run some work ... but so far there's no one to actually do the work.

Remember: the scheduler plans the work, mediates communication, and can even help with managing metadata ... but it doesn't actually run your real Python workload tasks.

For those, we need at least one worker.

1. Open a new terminal in Jupyter
2. Type `dask-worker --no-nanny <tcp://...>` where `<tcp://...>` is replaced with the `tcp://` URI from the scheduler output

You should see output something like this:
```
distributed.worker - INFO -       Start worker at:    tcp://192.168.1.5:60278
distributed.worker - INFO -          Listening to:    tcp://192.168.1.5:60278
distributed.worker - INFO -          dashboard at:          192.168.1.5:60279
distributed.worker - INFO - Waiting to connect to:     tcp://192.168.1.5:8786
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO -               Threads:                          8
distributed.worker - INFO -                Memory:                   17.18 GB
distributed.worker - INFO -       Local Directory: /foo/bar/baz/dask-worker-space/worker-hratgouz
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO -         Registered to:     tcp://192.168.1.5:8786
distributed.worker - INFO - -------------------------------------------------
distributed.core - INFO - Starting established connection

```

Choose the Jupyter tab you used earlier to run `ps` and try it again (filtering for python).

You should see, aside from your Jupyter process, the scheduler and the worker and nothing else.

> Command line options for the worker process are also at https://docs.dask.org/en/latest/setup/cli.html

__What does this worker do?__

The worker will execute tasks -- Python functions -- for the end user, once we have a way to send work to it via the scheduler.

The worker also has its own dashboard, which you can access at the URL indicated.
* Add `/status` to the URL
* If you do not have direct routing to the indicated host and port, you won't be able to load the dashboard directly.
    * You may be able to get to it by identifying your Jupyter URL and using the /proxy/<port> functionality

Flip back to the terminal tab hosting the scheduler. You should see additional output like this
    
```
distributed.scheduler - INFO - Register worker <Worker 'tcp://192.168.1.5:60278', name: tcp://192.168.1.5:60278, memory: 0, processing: 0>
distributed.scheduler - INFO - Starting worker compute stream, tcp://192.168.1.5:60278
```
  
This indicates that the scheduler has registered the worker and can now send it tasks.

### The Client

Time to get to work!

Let's create a `Client` object and use it to run some code.

1. Open another terminal tab
2. Run `ipython`
3. Enter the following code, substituting your scheduler's URI for `<scheduler>`

```python
from distributed import Client

c = Client('<scheduler>')
```

You should see output like
`<Client: 'tcp://192.168.1.5:8786' processes=1 threads=8, memory=17.18 GB>`

Meanwhile, the scheduler's output should add

`distributed.scheduler - INFO - Receive client connection: Client-c56e5898-f5ee-11ea-a83b-3c15c2cadbf2`

and `ps` should show your `ipython` process __but no additional process for the Client, since the Client is just a Python object in-process with the code that uses it (in this case, IPython shell)__

Now let's try running some work:

```python
def add_numbers(to):
    return sum(range(to))

f = c.submit(add_numbers, 1000)

f.result()
```

You should see the result (499500). If you're not totally convinced that you've used the scheduler to run a task on the worker, you can check:

Your main dashboard should show the `add_numbers` task in the Task Stream, Memory by Key, and Task Graph views.

How can we see where this task ended up? Click the `Info` tab, then the URL link for your worker. Drilling down, you should get to a screen like this, indicating your worker's details and the `add_numbers` task details.

<img src='images/worker.png' width=800>
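
The same question can be answered from code. The sketch below uses `Client.who_has` to ask the scheduler which worker holds a task's result. To keep it self-contained, it assumes an in-process `LocalCluster` as a stand-in for the scheduler and worker we started by hand; in the walkthrough you would call `c.who_has([f])` on your existing client instead.

```python
from dask.distributed import Client, LocalCluster


def add_numbers(to):
    return sum(range(to))


# Assumption: an in-process LocalCluster stands in for the manually
# started scheduler and worker, so this snippet runs on its own.
with LocalCluster(n_workers=1, threads_per_worker=1, processes=False,
                  dashboard_address=None) as cluster, Client(cluster) as c:
    f = c.submit(add_numbers, 1000)
    result = f.result()
    key = f.key
    # who_has maps each task key to the addresses of the workers holding its result.
    locations = {k: list(workers) for k, workers in c.who_has([f]).items()}

print(result)
print(locations)
```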

### The Nanny

__Motivating the Nanny__

In the terminal tab you've been using for `ps`, run that `ps` command again and identify the PID of the worker.

Kill it: `kill <PID>`

The scheduler now emits

```
distributed.scheduler - INFO - Remove worker <Worker 'tcp://192.168.1.5:60278', name: tcp://192.168.1.5:60278, memory: 1, processing: 0>
distributed.core - INFO - Removing comms to tcp://192.168.1.5:60278
distributed.scheduler - INFO - Lost all workers
```

Let's wait a second and see if things improve.

Spoiler alert: not really. 

Try running the IPython code to submit a task again.

Nothing... go ahead and hit CTRL-C

__What do we do about our worker?__

We *could* manually start a new one. 

If our worker dies for an unknown/unwanted reason, like in the example above, we probably want a new worker. We could write a script that does this for us. We might want that script to keep track of other issues in the worker, like if it's running out of resources.
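
Such a script could look roughly like the sketch below. It is an illustration of the idea only, not how Dask implements it: the `supervise` function and its `max_restarts` parameter are hypothetical, and the "worker" is a throwaway Python command that exits with an error.

```python
import subprocess
import sys


def supervise(cmd, max_restarts=3):
    """Run cmd, restarting it each time it exits abnormally.

    Returns the list of exit codes observed, one per launch.
    """
    exit_codes = []
    for _ in range(max_restarts + 1):
        proc = subprocess.Popen(cmd)
        code = proc.wait()  # block until the child exits
        exit_codes.append(code)
        if code == 0:
            break  # clean exit: nothing to recover from
    return exit_codes


# A stand-in "worker" that always crashes with exit code 1.
crashing_child = [sys.executable, "-c", "import sys; sys.exit(1)"]
codes = supervise(crashing_child, max_restarts=2)
print(codes)
```

A real supervisor would also need to watch resource usage and decide when to give up, which is part of why it is useful to have this built in.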

These common use cases motivate the __Nanny__, a process that serves *in loco parentis* of our worker, to keep an eye on it.

In the terminal where we started our worker earlier, go "up" a line in the history and remove `--no-nanny` and run the command like this:

`dask-worker <tcp://...>` where `<tcp://...>` is the scheduler address.

You should see output like earlier, except, this time, the first line should read

`distributed.nanny - INFO -         Start Nanny at: 'tcp://192.168.1.5:62444'`

__Check that "it works"__

1. In the terminal tab where your scheduler lives, check that a worker is registered
2. In your IPython tab, run the code again -- you should get the result
3. In your `ps` tab, run `ps` again...
    * You should see a `dask-worker` -- that's actually the nanny ("parent") process
    * You should see two entries that look like `python -c from multiprocessing...`
        * Ignore the one that says `semaphore_tracker` for now; that's a resource-tracking helper
        * The one that says `from multiprocessing.spawn...` is the worker
4. Kill the PID corresponding to the worker

Look at the worker's terminal tab. You should see two messages from the nanny:
```
distributed.nanny - INFO - Worker process 10511 was killed by signal 15
distributed.nanny - WARNING - Restarting worker
```

before the worker's greeting

`distributed.worker - INFO -       Start worker at:    tcp://192.168.1.5:62767`

Just to wrap up, run your IPython code block again. All should be well.

> __What if the nanny dies?__
>
> Go ahead and try it
>
> You'll notice that, as the parent process, if the nanny dies then it's game-over for that worker
>
> Doesn't that mean the nanny is pointless, and just "kicks the can down the road" in terms of failure?
>
> Not really: if the goal were to prevent chaos-monkey style random process termination, then yes.
>
> But the true purpose of the nanny is to recover when the worker dies "from inside" -- i.e., some user code causes a fatal fault, the worker runs out of memory, etc.
>
> *In other words, the nanny is not there to keep your cluster running from an outside, devops perspective, but rather from an inside, Dask+user perspective*

### Guest star: the cluster resource manager

By cluster resource manager, we mean an external, pre-existing piece of architecture that manages the pieces of your Dask cluster, and makes it easy to run multiple Dask clusters side-by-side.

Examples include
* Kubernetes
* YARN
* Coiled Cloud or other providers-as-a-service

These players are __not__ part of Dask itself.

They are "guest stars" in that they will usually be part of your Dask show, because they add a lot of capability and usability, save a lot of time and money, and will end up in your Dask operations plan.

The purpose of the cluster resource manager is to 
* save you from ever having to start and configure the scheduler, workers, etc. on your own
* provide a uniform, single location to specify configuration
    * including, usually, a container spec that simplifies provisioning dependencies to workers
* allow for a simple cluster scaling API, so that end users can programmatically change their cluster's size without direct access to any underlying services or machines

We usually encounter the cluster resource manager when creating a cluster. We use a helper library that provides an implementation of `Cluster`

Examples include
* `SSHCluster` (built in)
* dask-jobqueue or dask-drmaa and their associated cluster classes like `PBSCluster`
* dask-kubernetes (`KubeCluster`, `HelmCluster`)
* dask-yarn (`YarnCluster`)
* dask-cloudprovider (`FargateCluster`, `AzureMLCluster`)
* coiled (`coiled.Cluster`)

## Managing Clusters

In this section, we'll discuss creating and managing clusters.

As an example, we'll create a cluster using the managed Coiled Cloud service.

The goal here is not to advertise Coiled but rather to 
* Show the Cluster API
* Discuss the aspects of cluster management that Coiled provides, and which large projects, groups, or institutions need to attend to *regardless* of how they choose to deploy Dask

```python
import coiled
from dask.distributed import Client

cluster = coiled.Cluster(name="training-cluster")
cluster
```

### What sort of cluster do we get? How can we control it?

There are two main categories of configuration, which are often at least somewhat independent

__Cluster configuration__
* number of workers
* cores per worker
* memory per worker
* use of nanny process
* certificate info for TLS
* potentially separate config for the scheduler vs. the workers
* and others, typically less important
    
__Software environment configuration__
* software packages
* package version requirements
* resource files
* anything else you might want on your cluster
    
In Coiled Cloud, these are configurable through
* Web GUI
    * https://cloud.coiled.io/{username}/software    
* CLI
    * https://docs.coiled.io/user_guide/cluster.html
    * https://docs.coiled.io/user_guide/software_environment.html
    
__Other common environments__

For the `LocalCluster`s we've created throughout the class,
* we passed cluster info to the `LocalCluster` constructor
* software environment was inherited from the Python environment where we were already working
* naturally, this is simple to operate, but doesn't scale, since it's __local__ to our machine only
    
A common distributed configuration is the Kubernetes-hosted `KubeCluster`
* cluster spec information is provided through any of...
    * the `KubeCluster` constructor https://kubernetes.dask.org/en/latest/api.html#dask_kubernetes.KubeCluster
    * `from_dict` or `from_yaml` class methods on `KubeCluster` https://kubernetes.dask.org/en/latest/kubecluster.html
    * the config/spec can also be placed in the filesystem instead of supplied programmatically
* software environment is implicit in the *container image* used for the workers and/or scheduler
    * container image is part of a *pod template*
    * the pod templates are part of the same info through which the cluster is configured
        
Similar, though slightly different, patterns are available for, e.g., `YarnCluster`

```python
import pprint

pprint.pprint(cluster.scheduler_info)
```

To run work, we can create a client from the cluster instance

```python
client = Client(cluster)
client
```

```python
client.submit(lambda x: x * x, 3).result()
```

The Cluster object API depends on which implementation you are using, but most implementations allow for scaling.

We can request to scale to a particular number of workers (all based on our current worker and software config)

```python
cluster.scale(4)  # this can take some time if we're provisioning machines/containers in the cloud
```

Scaling down is usually much quicker

```python
cluster.scale(2)
```
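
Because `scale` only requests workers, code that depends on the new size may want to wait until the workers have actually registered. The sketch below shows one way to do that with `Client.wait_for_workers`. It assumes an in-process `LocalCluster` as a stand-in for the Coiled cluster so that it runs anywhere; the scaling calls themselves are the same.

```python
from dask.distributed import Client, LocalCluster

# Assumption: LocalCluster stands in for a remote cluster here.
with LocalCluster(n_workers=1, threads_per_worker=1, processes=False,
                  dashboard_address=None) as cluster, Client(cluster) as client:
    cluster.scale(3)  # request three workers in total
    client.wait_for_workers(3, timeout=30)  # block until they have registered
    # nthreads() returns one entry per worker known to the scheduler.
    workers_after_scale_up = len(client.nthreads())

print(workers_after_scale_up)
```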

### Adaptive Scaling

Most clusters also support the `adapt` method, assuming they are either local, or on a distributed system which is able to allocate and deallocate workers dynamically.

The fundamental principles of this auto-scaling mechanism are at https://docs.dask.org/en/latest/setup/adaptive.html

The key pieces are summarized here:
Dask...
* has timing data on all previous tasks via its builtin profilers
* uses that info to estimate future task durations
* combines that info, along with the resources available on the workers, to estimate future time per task
* and finally creates a scaling target with the goal of matching the total estimated runtime to a `target_duration` parameter (default: 5 seconds)
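
To make the last step concrete, here is a deliberately simplified model of the target calculation. It is an illustration, not Dask's actual implementation: the function name `estimate_target_workers` and its parameters are hypothetical, and the model ignores threads per worker, memory, and the number of pending tasks.

```python
import math


def estimate_target_workers(total_estimated_seconds, target_duration=5.0,
                            minimum=0, maximum=None):
    """Simplified model: enough workers to finish the queued work in target_duration."""
    raw = math.ceil(total_estimated_seconds / target_duration)
    if maximum is not None:
        raw = min(raw, maximum)  # respect the upper bound, if any
    return max(raw, minimum)  # never go below the lower bound


# 60 seconds of estimated work with a 5 second target suggests 12 workers.
print(estimate_target_workers(60))
# The same workload capped at 8 workers.
print(estimate_target_workers(60, maximum=8))
```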

The `adapt` API offers several sets of optional limits that we can supply
* min/max number of workers
* min/max total cores
* min/max total memory
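
As a minimal sketch, the worker-count limits can be supplied like this. The example assumes an in-process `LocalCluster` so that it runs without any external infrastructure; the bounds of 1 and 4 are arbitrary values chosen for illustration.

```python
from dask.distributed import LocalCluster

# Assumption: LocalCluster stands in for whichever Cluster implementation you use.
with LocalCluster(n_workers=1, threads_per_worker=1, processes=False,
                  dashboard_address=None) as cluster:
    # Keep the cluster between 1 and 4 workers, depending on load.
    adaptive = cluster.adapt(minimum=1, maximum=4)
    bounds = (adaptive.minimum, adaptive.maximum)

print(bounds)
```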

`adapt` can also accept an `Adaptive` object with a few more parameters https://distributed.dask.org/en/latest/api.html?highlight=scale#adaptive

> __Bonus:__ as we will see in a subsequent module on resilience, the `adapt` call will be useful to ensure that the number of workers stays within desired bounds even if workers fail.

#### Optional Scaling Demo Mini-Lab

Scaling -- both explicit and adaptive -- is easier to see (if a bit less realistic) when using `LocalCluster`. Since launching a worker is just starting a process in the `LocalCluster`, it happens nearly instantly. For a fun exercise, 
1. Create a `LocalCluster` with code like this

```python
from dask.distributed import Client, LocalCluster

cluster = LocalCluster(n_workers=2, threads_per_worker=1, memory_limit='512MiB')
client = Client(cluster)
client
```

2. Start the cluster
3. Open the `Cluster Map` dashboard widget
4. Try some manual scaling and see what happens
5. Now turn on adaptive scaling
6. Kick off a workload and watch the cluster

## Administrative concerns

While the APIs make the user-facing features fairly straightforward, these configurations -- and the different options available via different cluster managers (e.g., k8s, YARN, etc.) -- raise certain concerns around administration and manageability.

### Seeing and specifying config

Although programmatic APIs to configure individual Dask behaviors are great in our coder or analyst role, from an admin point of view it is often useful (or critical) to inspect the current config and to declaratively specify config.

Current runtime config is exposed via

```python
import dask

dask.config.config
```

Config can be programmatically modified through `dask.config` APIs, and declarative config can be specified through files and/or environment variables, as described at

> https://docs.dask.org/en/latest/configuration.html

The complete config reference is at: https://docs.dask.org/en/latest/configuration-reference.html
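
As a small sketch of the programmatic side, the snippet below reads a value with `dask.config.get` and overrides it temporarily with `dask.config.set` used as a context manager. The key and the value `0.7` are chosen only for illustration.

```python
import dask

# Example key, chosen for illustration; any dotted config path works the same way.
key = "distributed.worker.memory.target"

before = dask.config.get(key, default=None)

# Used as a context manager, set() applies the override only inside the block.
with dask.config.set({key: 0.7}):
    inside = dask.config.get(key)

after = dask.config.get(key, default=None)

print(before, inside, after)
```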

### Managing software environments

Containerization is a step in the right direction, but even containers and a registry don't solve every problem. For example, sharing configurations, keeping dependency versions synchronized but also up-to-date (especially around internal, rapidly changing code), interacting with CI and build systems are all important concerns, though outside the scope of Dask itself.

Hosted services (like Coiled) and cloud-native patterns (like Kubernetes) provide the simplest approach.

Other systems, e.g., YARN, are more complex and will likely require additional tooling to keep everything automatically synchronized in an enterprise or large-institution setting (https://yarn.dask.org/en/latest/environments.html)

### Quotas, cost administration, and reporting

Dask itself does not have a fine-grained quota enforcement system nor cost-tracking mechanisms. 

Furthermore, even if there are no specific limits or tracking required, an organization may want or need to produce reports on activity, productivity, etc. at different levels of granularity.

While some of this information can be extracted from Dask's metrics (with some dedicated code), these are not high-level user-visible features of Dask. These concerns often motivate cloud-based solutions, third-party products, or custom development at large institutions.

### Security

Dask supports point-to-point encryption via TLS, and -- to a limited extent -- an organization's PKI and certificate infrastructure can assist in security by making it harder (or easier) for different individuals/units/groups to communicate with particular endpoints.

For example, in an unsecured environment, I can connect a `Client` to any scheduler I am able to route to. If the scheduler requires a particular certificate, or a cert issued from a particular CA, then that vulnerability may be limited.
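
A minimal sketch of the client side of such a setup is shown below, using the `Security` class from `distributed`. The certificate and key file names are hypothetical placeholders, and the snippet only builds the configuration object; it does not connect to anything.

```python
from distributed.security import Security

# Hypothetical file names: substitute the CA certificate and the client
# certificate/key issued by your organization's PKI.
security = Security(
    tls_ca_file="ca.pem",
    tls_client_cert="client-cert.pem",
    tls_client_key="client-key.pem",
    require_encryption=True,
)

# The object would then be passed when connecting to a TLS-enabled scheduler:
# client = Client("tls://<scheduler>", security=security)
print(security.require_encryption)
```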

On the whole, however, Dask is about computation more than fine-grained security. It allows users to run potentially untrusted code remotely, and to do so in shared environments whose state (and other users) they may not fully know or trust.