# Dask cluster

The Dask team describes Dask as a distributed pandas library. Lucky for us, they distribute more than just pandas: they even have distributed machine learning libraries. Let's kick the tires.

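#### Create a Docker image with all the necessary software installed

Save the following as `Dockerfile` in the notebook's working directory, since the build cell uses `.` as its context.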

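```Dockerfile
FROM continuumio/miniconda3
# ping and ip are handy for debugging the container network
RUN apt-get update && apt-get install -y --no-install-recommends iputils-ping iproute2 \
    && rm -rf /var/lib/apt/lists/*
RUN pip install "dask[complete]"
```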

In [None]:
%%time
!docker build -t test-dask-image .

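#### Create a simulated network

On a user-defined bridge network, containers can reach each other by container name, which is how the workers find `dask-scheduler:8786`. The `rm` cell cleans up a network left over from an earlier run.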

In [1]:
!docker network rm simulated-cluster

simulated-cluster


In [3]:
!docker network create simulated-cluster

85985a41b77ac26d1230cd3aea9bb569684319dfe120f4524acf7de6f15bd7ca


#### Start the scheduler container
The scheduler listens on port 8786 and serves its dashboard UI on port 8787; both ports are published to the host with `-p`.

In [6]:
!docker run -dit --network simulated-cluster -p 8786:8786 -p 8787:8787 --name dask-scheduler test-dask-image dask scheduler --host 0.0.0.0 --dashboard-address 0.0.0.0:8787

e49e35dece8298a97bbe3658450d1cb44f802cc452667b972ce31e687ca1c2a2


In [8]:
!docker logs dask-scheduler

2025-03-04 01:24:20,221 - distributed.scheduler - INFO - -----------------------------------------------
2025-03-04 01:24:20,535 - distributed.http.proxy - INFO - To route to workers diagnostics web server please install jupyter-server-proxy: python -m pip install jupyter-server-proxy
2025-03-04 01:24:20,554 - distributed.scheduler - INFO - State start
2025-03-04 01:24:20,558 - distributed.scheduler - INFO - -----------------------------------------------
2025-03-04 01:24:20,559 - distributed.scheduler - INFO -   Scheduler at:     tcp://172.18.0.2:8786
2025-03-04 01:24:20,559 - distributed.scheduler - INFO -   dashboard at:  http://172.18.0.2:8787/status
2025-03-04 01:24:20,559 - distributed.scheduler - INFO - Registering Worker plugin shuffle


In [10]:
!docker exec dask-scheduler ip addr

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
238: eth0@if239: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 02:42:ac:12:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.18.0.2/16 brd 172.18.255.255 scope global eth0
       valid_lft forever preferred_lft forever


In [12]:
SCHEDULER_IP = "172.18.0.2"
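Copying the IP by hand works, but Docker assigns bridge addresses dynamically, so it may change when the network or containers are recreated. A minimal sketch, assuming the `Scheduler at:` log line keeps the format shown above, pulls the address out of the scheduler's startup log instead. `scheduler_address` is a hypothetical helper, and the sample line is copied from the log output above.

```python
import re


def scheduler_address(log_text):
    """Extract the "tcp://host:port" address from `dask scheduler` startup logs.

    Hypothetical helper that relies on the "Scheduler at:" line format;
    returns None if that line is missing.
    """
    match = re.search(r"Scheduler at:\s+(tcp://\S+)", log_text)
    return match.group(1) if match else None


# In the notebook the text could come from
#   subprocess.run(["docker", "logs", "dask-scheduler"], capture_output=True, text=True)
# (concatenate .stdout and .stderr, since Dask's log output normally goes to stderr).
sample = "2025-03-04 01:24:20,559 - distributed.scheduler - INFO -   Scheduler at:     tcp://172.18.0.2:8786"
address = scheduler_address(sample)
host = address.split("://", 1)[1].rsplit(":", 1)[0]  # strip scheme and port
print(address, host)
```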

#### Start workers

In [15]:
!docker run -dit --network simulated-cluster --name dask-worker1 test-dask-image dask worker dask-scheduler:8786
!docker run -dit --network simulated-cluster --name dask-worker2 test-dask-image dask worker dask-scheduler:8786
!docker run -dit --network simulated-cluster --name dask-worker3 test-dask-image dask worker dask-scheduler:8786

8483cce7f3b7dc47b99246bf711d52ec6a7f03238ab56453ce5d1337b259f432
1c0974bb7249ac53b5ad65d83aceb8c158301b7406c691f743be3a4e72c6ecfd
272e8295f91a463211a46039f33fbea27eb6f7229cebcd10fe927f0cf1030219


In [17]:
!docker logs dask-worker1

2025-03-04 01:24:28,315 - distributed.nanny - INFO -         Start Nanny at: 'tcp://172.18.0.3:36955'
2025-03-04 01:24:28,981 - distributed.worker - INFO -       Start worker at:     tcp://172.18.0.3:45925
2025-03-04 01:24:28,981 - distributed.worker - INFO -          Listening to:     tcp://172.18.0.3:45925
2025-03-04 01:24:28,981 - distributed.worker - INFO -          dashboard at:           172.18.0.3:41579
2025-03-04 01:24:28,981 - distributed.worker - INFO - Waiting to connect to:  tcp://dask-scheduler:8786
2025-03-04 01:24:28,981 - distributed.worker - INFO - -------------------------------------------------
2025-03-04 01:24:28,981 - distributed.worker - INFO -               Threads:                         22
2025-03-04 01:24:28,981 - distributed.worker - INFO -                Memory:                  15.35 GiB
2025-03-04 01:24:28,981 - distributed.worker - INFO -       Local Directory: /tmp/dask-scratch-space/worker-nsdbc8iz
2025-03-04 01:24:28,981 - distributed.worker - INFO -
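The worker log reports 22 threads and 15.35 GiB of memory, which presumably is the whole Docker host, since nothing limits the container. Three workers on one machine would then each claim the full host and oversubscribe it. `dask worker` accepts `--nthreads` and `--memory-limit`, so a sketch that generates the three `docker run` commands from one place could cap each worker. The helper name `worker_commands` and the values `4` and `"4GiB"` are illustrative assumptions, not settings from the original run.

```python
import shlex


def worker_commands(n_workers, image="test-dask-image", network="simulated-cluster",
                    scheduler="dask-scheduler:8786", nthreads=None, memory_limit=None):
    """Build one `docker run` argv list per worker container (hypothetical helper)."""
    commands = []
    for i in range(1, n_workers + 1):
        argv = ["docker", "run", "-dit", "--network", network,
                "--name", f"dask-worker{i}", image, "dask", "worker", scheduler]
        # Optional per-worker caps so several workers can share one host
        if nthreads is not None:
            argv += ["--nthreads", str(nthreads)]
        if memory_limit is not None:
            argv += ["--memory-limit", memory_limit]
        commands.append(argv)
    return commands


for argv in worker_commands(3, nthreads=4, memory_limit="4GiB"):
    # subprocess.run(argv, check=True) would actually start the container
    print(shlex.join(argv))
```

Returning argv lists rather than strings keeps the commands safe to pass to `subprocess.run` without shell quoting issues.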

#### Now test the cluster from the host

In [20]:
# The host needs Dask too; its dask/distributed versions should match the image's
#!pip install "dask[complete]"

In [22]:
from dask.distributed import Client
import dask.array as da



In [23]:
# Connect to the Dask scheduler inside Docker, using the IP found above
client = Client(f"tcp://{SCHEDULER_IP}:8786")

# Verify connection
print(client)


OSError: Timed out trying to connect to tcp://172.18.0.2:8786 after 30 s
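The timeout is consistent with the host being unable to route to the container's bridge address. This is typical of Docker Desktop on macOS and Windows, where containers run inside a VM, although the original run does not confirm the platform. Because the scheduler was started with `-p 8786:8786`, it should also be reachable from the host at `localhost:8786`. A minimal sketch, using a hypothetical helper `first_reachable`, probes candidate addresses with a plain TCP connection and returns the first one that answers:

```python
import socket


def first_reachable(candidates, timeout=1.0):
    """Return the first "tcp://host:port" address that accepts a TCP connection.

    Hypothetical helper: it only checks that something is listening on the
    port, not that it is a Dask scheduler. Returns None if nothing answers.
    """
    for address in candidates:
        host, port = address.split("://", 1)[-1].rsplit(":", 1)
        try:
            # A completed TCP handshake means the host/port is routable from here
            with socket.create_connection((host, int(port)), timeout=timeout):
                return address
        except OSError:
            # Refused, unreachable, or timed out: try the next candidate
            continue
    return None
```

In the notebook this could be used as `client = Client(first_reachable([f"tcp://{SCHEDULER_IP}:8786", "tcp://localhost:8786"]))`. Once connected, `client.get_versions(check=True)` reports mismatched package versions between the host and the containers. Workers still advertise their 172.18.0.x addresses, so anything that makes the client talk to workers directly (for example `direct_to_workers=True`) would hit the same routing problem.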

In [None]:
# Run some distributed computation
arr = da.random.random((10000, 10000), chunks=(1000, 1000))
result = arr.mean().compute()  # Executes on the Dask workers
print(result)
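With `chunks=(1000, 1000)`, the (10000, 10000) array is split into 10 × 10 = 100 blocks, and the scheduler hands small per-block tasks to the workers. The NumPy sketch below mimics that reduction on a single machine: each block contributes only its sum and element count, and the partial results are combined at the end. It is a simplified illustration with a hypothetical helper `chunked_mean`; Dask itself combines the partials in a tree rather than in one final step.

```python
import numpy as np


def chunked_mean(arr, chunk_rows, chunk_cols):
    """Mean of a 2-D array computed from per-block (sum, count) pairs.

    Returns (mean, number_of_blocks). Edge blocks may be smaller than the
    chunk size, which is why counts are tracked per block.
    """
    partials = []
    for i in range(0, arr.shape[0], chunk_rows):
        for j in range(0, arr.shape[1], chunk_cols):
            block = arr[i:i + chunk_rows, j:j + chunk_cols]
            # One "task" per block: only two numbers travel back, not the block
            partials.append((block.sum(), block.size))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count, len(partials)


rng = np.random.default_rng(0)
a = rng.random((1000, 1000))
print(chunked_mean(a, 100, 100))
```

Since `da.random.random` draws values uniformly from [0, 1), the `result` printed by the cluster computation should be close to 0.5.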
