# Distributed

As we covered at the beginning, Dask can run work on multiple machines using the distributed scheduler.

Until now we have actually been using the distributed scheduler for our work, but only on a single machine.

When we instantiate a `Client()` object with no arguments, it will attempt to locate a Dask cluster. It checks your local Dask config and environment variables to see whether connection information has been specified. If not, it creates an instance of `LocalCluster` and uses that.

*Specifying connection information in config is useful for system administrators who want to give their users access to a cluster. We do this in the [Dask Helm Chart for Kubernetes](https://github.com/dask/helm-chart/blob/master/dask/templates/dask-jupyter-deployment.yaml#L46-L48): the chart installs a multi-node Dask cluster and a Jupyter server on a Kubernetes cluster, and Jupyter is preconfigured to discover the distributed cluster.*
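
Here is a minimal sketch of that discovery step. It assumes a recent `distributed` release, where `Client()` with no arguments reads the config key `scheduler-address` (the key the `DASK_SCHEDULER_ADDRESS` environment variable maps to). A throwaway in-process `LocalCluster` is started only so that there is a real address to put in the config. The names `demo_cluster` and `discovered_client` are illustrative and do not touch the tutorial's own objects.

```python
import dask
from dask.distributed import Client, LocalCluster

# A throwaway in-process cluster, started only so there is an address to find
demo_cluster = LocalCluster(n_workers=1, threads_per_worker=1, processes=False)

# With no arguments, Client() looks up the "scheduler-address" config value
# (the value DASK_SCHEDULER_ADDRESS sets) and connects there instead of
# creating a new LocalCluster
with dask.config.set({"scheduler-address": demo_cluster.scheduler_address}):
    discovered_client = Client()

# The client did not start a cluster of its own, yet it can run work
created_own_cluster = discovered_client.cluster is not None
result = discovered_client.submit(sum, [1, 2, 3]).result()
print(created_own_cluster, result)

discovered_client.close()
demo_cluster.close()
```

In a real deployment the address would point at a remote scheduler (for example `tcp://scheduler-host:8786`, a hypothetical hostname) rather than at a cluster started in the same process.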

## Local Cluster

Let's explore the `LocalCluster` object ourselves and see what it is doing.

```python
from dask.distributed import LocalCluster, Client

cluster = LocalCluster()
cluster
```

Creating a cluster object starts a Dask scheduler and a number of Dask workers. If no arguments are specified, it autodetects the number of CPU cores and the amount of memory on your system and creates workers that make use of them.

You can also specify these arguments yourself. Let's look at the docstring to see which options are available.

*These arguments can also be passed to `Client`; if it ends up creating a `LocalCluster`, they are passed through to it.*

```python
# IPython/Jupyter help syntax; in a plain Python session use help(LocalCluster)
LocalCluster?
```
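
For example, a minimal sketch that sizes the workers explicitly instead of relying on autodetection. The values are arbitrary illustrations, and `processes=False` (workers as threads inside the current process) is used only so the snippet runs cleanly as a standalone script. The second half shows the same keywords given to `Client`, which forwards them to the `LocalCluster` it creates. The `sized_*` and `auto_client` names are illustrative and leave the tutorial's `cluster` untouched.

```python
from dask.distributed import Client, LocalCluster

# Size the cluster explicitly instead of relying on autodetection
sized_cluster = LocalCluster(
    n_workers=2,           # number of workers to start
    threads_per_worker=1,  # threads in each worker's thread pool
    memory_limit="1GB",    # memory limit applied to each worker
    processes=False,       # workers run as threads in this process
)
sized_client = Client(sized_cluster)
sized_client.wait_for_workers(2)
threads_by_worker = sized_client.nthreads()  # {worker address: nthreads}
print(threads_by_worker)
sized_client.close()
sized_cluster.close()

# The same keywords can go straight to Client, which forwards them to the
# LocalCluster it creates when no existing cluster is configured
auto_client = Client(n_workers=2, threads_per_worker=1, processes=False)
forwarded = isinstance(auto_client.cluster, LocalCluster)
auto_client.close()  # also shuts down the cluster this client created
```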

Our cluster object has attributes and methods that we can use to access information about the cluster. For instance, we can get the log output from the scheduler and all the workers with the `get_logs()` method.

```python
cluster.get_logs()
```
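
As a small sketch of narrowing that output, assuming the `cluster`, `scheduler` and `workers` keyword arguments of `get_logs()` found in recent `distributed` releases: here we skip the worker logs and print only the start of the scheduler's log. The result behaves like a dictionary keyed by log source; the `"Scheduler"` key reflects current releases and may differ in older ones. A separate in-process cluster (`log_cluster`, an illustrative name) keeps the tutorial's `cluster` untouched.

```python
from dask.distributed import LocalCluster

log_cluster = LocalCluster(n_workers=1, processes=False)

# Ask only for the scheduler's log; the result is a dict-like object
# mapping each log source to its text
logs = log_cluster.get_logs(cluster=False, scheduler=True, workers=False)
for source, text in logs.items():
    print(f"--- {source} ---")
    print(str(text)[:300])  # just the first few hundred characters

log_cluster.close()
```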

We can also access the URL at which the Dask dashboard is hosted.

```python
cluster.dashboard_link
```

```
'http://127.0.0.1:8787/status'
```

For Dask to use our cluster, we still need to create a `Client` object. Since we have already created a cluster, we can pass it directly to the client.

```python
client = Client(cluster)
client
```

| Client | Cluster |
|---|---|
| Scheduler: `tcp://127.0.0.1:53816` | Workers: 4 |
| Dashboard: http://127.0.0.1:8787/status | Cores: 12 |
| | Memory: 17.18 GB |
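
Once the client is attached, work submitted through it runs on the cluster's workers. The sketch below uses a small in-process cluster (`small_cluster` and `small_client` are illustrative names) to map a hypothetical `square` function over a range and then sum the resulting futures. Dask resolves futures passed as arguments to their results before calling `sum`.

```python
from dask.distributed import Client, LocalCluster


def square(x):
    return x ** 2


small_cluster = LocalCluster(n_workers=2, threads_per_worker=1, processes=False)
small_client = Client(small_cluster)  # attach to the cluster we already made

# Work submitted through the client runs on the cluster's workers;
# futures passed as arguments are resolved to their results first
futures = small_client.map(square, range(5))
total = small_client.submit(sum, futures).result()
print(total)  # 30

small_client.close()
small_cluster.close()
```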


```python
# Drop our references to the client and cluster so they can be cleaned up
del client, cluster
```
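
`del` only removes the names. The scheduler and workers shut down once the objects are garbage collected, which in a notebook can happen later than you expect. A more explicit pattern, sketched here, relies on both `LocalCluster` and `Client` supporting the `with` statement, so everything is closed when the block exits. The `tmp_*` names are illustrative.

```python
from dask.distributed import Client, LocalCluster

# Leaving the with-block closes the client first, then the cluster
with LocalCluster(n_workers=1, processes=False) as tmp_cluster, Client(tmp_cluster) as tmp_client:
    answer = tmp_client.submit(lambda: 42).result()

print(answer, tmp_client.status)
```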

## Remote clusters via SSH

A common way to distribute your work onto multiple machines is via SSH. Dask provides a cluster manager called `SSHCluster` that handles creating the SSH connections for you.

```python
from dask.distributed import SSHCluster
```

When constructing this cluster manager, we need to pass a list of addresses (hostnames or IP addresses) that it will SSH into in order to start a Dask scheduler or workers: the first address is used for the scheduler and the remaining ones for workers. As this tutorial is designed for Binder, we will quickly set up the ability to SSH to `localhost`. While this isn't actually useful, it illustrates that you can SSH to other systems.
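
Once SSH access works, creating the cluster could look like the sketch below. It assumes key-based (passwordless) SSH to every host and that Dask is installed in the same Python environment on each machine. `start_ssh_cluster` is a hypothetical helper name. `connect_options` is passed through to the underlying SSH library, and `known_hosts=None` disables host-key checking, which is only reasonable for a throwaway setup like this one. The helper is defined but not called here, because calling it requires working SSH access.

```python
from dask.distributed import Client, SSHCluster


def start_ssh_cluster(hosts):
    """Hypothetical helper: start a scheduler on hosts[0] and one worker on
    each of hosts[1:], connecting to every host over SSH."""
    cluster = SSHCluster(
        hosts,
        # known_hosts=None skips host-key verification (fine for localhost,
        # not for real machines)
        connect_options={"known_hosts": None},
        worker_options={"nthreads": 2},  # threads per worker
    )
    return cluster, Client(cluster)


# Usage (requires SSH access to each host, so it is not run here):
# cluster, client = start_ssh_cluster(["localhost", "localhost", "localhost"])
```

With three `localhost` entries, this would start one scheduler and two workers, all on the same machine.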