# Working with a distributed cluster

In the [previous tutorial](002-intro-to-dask.ipynb) you saw how Dask figures out which parts of a computation can run in parallel. The complementary capability of the library is its ability to distribute that work across a cluster of workers.

In our setup, we'll use a [Kubernetes cluster](https://kubernetes.dask.org/en/latest/).

The cluster object launches workers on our scalable cluster, and Dask passes the work to them.

In [1]:
from dask.distributed import Client, progress
from dask_kubernetes import KubeCluster

We need two objects. The first is a cluster object. Here you select how many workers to provision, which determines how many CPUs and how much RAM are available for your task.

In [2]:
cluster = KubeCluster(n_workers=2)
cluster


To use the cluster, create a local client object that is connected to it. From then on, Dask operations are sent to the cluster instead of executing locally:

In [9]:
client = Client(cluster)
client

Client:  Scheduler: tcp://10.36.0.30:44001  Dashboard: /user/arokem/proxy/8787/status
Cluster:  Workers: 2  Cores: 4  Memory: 14.00 GB


Using the Dask bar on the left, you can now open windows that display the task stream and the task graph, with the progress of each task indicated by a color code.

In [10]:
data = [1, 2, 3, 4, 5, 6, 7, 8]

In [11]:
from dask import delayed

In [12]:
import time

def inc(x):
    # Simulate a slow computation: wait one second, then return x + 1
    time.sleep(1)
    return x + 1

In [19]:
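results = []
for x in data:
    # delayed() records the call in the task graph instead of running it now
    y = delayed(inc)(x)
    results.append(y)

# Summing delayed objects extends the graph; total is still lazy here
total = sum(results)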

In [20]:
total.compute()

44
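
To get a feel for why spreading these eight tasks over several cores pays off, here is a rough local analogy that uses only the Python standard library. It is not how Dask schedules work on the cluster: a thread pool stands in for the workers, `N_CORES = 4` is an assumption taken from the client summary above, and `DELAY` is a shortened stand-in for the one-second sleep so the sketch finishes quickly. The timing estimate is idealized and ignores scheduling and communication overhead.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor

DELAY = 0.1    # assumption: shortened stand-in for the tutorial's 1-second sleep
N_CORES = 4    # assumption: matches "Cores: 4" in the client summary

def inc(x, delay=DELAY):
    # Same logic as the tutorial's inc(), with a configurable delay
    time.sleep(delay)
    return x + 1

data = [1, 2, 3, 4, 5, 6, 7, 8]

# Idealized estimate: tasks run in waves of N_CORES, overhead ignored
waves = math.ceil(len(data) / N_CORES)
serial_estimate = len(data) * DELAY
parallel_estimate = waves * DELAY

# Run the eight calls on a local thread pool and time them
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=N_CORES) as pool:
    results = list(pool.map(inc, data))
elapsed = time.perf_counter() - start

total = sum(results)
print(total)  # 44, the same value as total.compute() above
print(f"serial estimate: {serial_estimate:.1f}s, "
      f"parallel estimate: {parallel_estimate:.1f}s, "
      f"measured: {elapsed:.2f}s")
```

With the original one-second sleep, the same reasoning suggests roughly two seconds on four cores instead of eight seconds when run serially. The task stream on the dashboard is the place to check how the real cluster compares.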