# **04_Dask: Distributed computing using `dask`**
----

- Runs on a `LocalCluster`, i.e., a single machine such as a laptop or desktop

### **Import modules**

In [1]:
import numpy as np
import dask
from dask.distributed import Client, LocalCluster
import dask.array as da

### **Setup local cluster**

In [2]:
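# Serial baseline: uncomment to run with one worker and one thread
#cluster = LocalCluster(n_workers=1, threads_per_worker=1)
# Default: the number of workers and threads is chosen from the available cores
cluster = LocalCluster()
client = Client(cluster)
client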

Port 8787 is already in use. 
Perhaps you already have a cluster running?
Hosting the diagnostics dashboard on a random port instead.


| Client | Cluster |
|---|---|
| Scheduler: tcp://127.0.0.1:42973 | Workers: 4 |
| Dashboard: http://127.0.0.1:37047/status | Cores: 8 |
| | Memory: 33.62 GB |
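The port warning above appears when another process already holds the default dashboard port. As a minimal sketch, the cluster can be asked for a free port explicitly and wrapped in context managers so that it is shut down automatically. The worker counts and the helper name `run_demo` are illustrative assumptions, and `processes=False` (in-process workers) is chosen only to keep the example lightweight; it is not the configuration used in this notebook.

```python
from dask.distributed import Client, LocalCluster


def run_demo():
    """Start a small in-process cluster, run one task, and shut it down."""
    # dashboard_address=":0" requests any free port instead of the default 8787.
    # n_workers, threads_per_worker and processes=False are illustrative choices.
    with LocalCluster(
        n_workers=2,
        threads_per_worker=1,
        processes=False,
        dashboard_address=":0",
    ) as cluster, Client(cluster) as client:
        n_workers = len(cluster.workers)
        # Submit a trivial task to confirm that the scheduler is reachable.
        total = client.submit(sum, [1, 2, 3]).result()
    return n_workers, total


if __name__ == "__main__":
    print(run_demo())
```

Leaving the `with` block closes the client and the cluster in the right order, so no explicit `close()` calls are needed in this variant.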


### **Comparison of `numpy` vs. `dask` performance**

**Test computation:**

*Define:*

\begin{equation}
\mathbf{X} \in \mathbb{R}^{n_x \times n_y}
\end{equation}

Let $y$ be the sum of the entries of the mean of $\mathbf{X}$ taken along one axis:

\begin{equation}
    y = \sum_i (\langle \mathbf{X} \rangle_y)_i
\end{equation}
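To make the test computation concrete, here is a small NumPy sketch of the same reduction, `mean(axis=0).sum()`, which is the form used in the cells that follow. The tiny matrix and the random-number seed are assumptions chosen for illustration. Because each entry is drawn uniformly from $[0, 1)$, every column mean is close to 0.5, so $y$ should be close to $n_y / 2$.

```python
import numpy as np

# Small fixed matrix so the reduction can be checked by hand
X = np.array([[1.0, 2.0, 3.0],
              [3.0, 4.0, 5.0]])

col_means = X.mean(axis=0)  # average over the first axis: [2., 3., 4.]
y = col_means.sum()         # 2 + 3 + 4 = 9.0

# Random case: the result should be close to n_y / 2
rng = np.random.default_rng(0)  # seed is an arbitrary illustrative choice
n_x, n_y = 500, 400
R = rng.uniform(low=0.0, high=1.0, size=(n_x, n_y))
y_random = R.mean(axis=0).sum()
expected = n_y / 2

print(y, y_random, expected)
```

For the problem size used in this notebook, $n_y = 40000$, so both results should land near 20000.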

#### **Define problem size:**

In [7]:
size = (40000,40000)

#### **Using numpy (single-threaded)**

In [8]:
%%time
x = np.random.uniform(low=0., high=1.0, size=size)
y = x.mean(axis=0).sum()
print(y)

20000.465727589726
CPU times: user 16.1 s, sys: 2.32 s, total: 18.4 s
Wall time: 18.2 s


#### **Using dask (distributed)**

In [9]:
%%time 
x = da.random.uniform(low=0.,high=1.0,size=size)
y = x.mean(axis=0).sum()
y_val = y.compute()
print(y_val)

19999.76150232441
CPU times: user 29 s, sys: 4.94 s, total: 34 s
Wall time: 5.68 s
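In the run above, wall time dropped from 18.2 s to 5.68 s while total CPU time rose from 18.4 s to 34 s: the work is spread over several workers rather than reduced. This works because a dask array is split into chunks and evaluated lazily. The sketch below shows both ideas on a smaller array; the array size and the explicit `chunks=(500, 500)` are illustrative assumptions, not the values dask picked in the cell above. It runs on whichever scheduler is active (the local threaded one if no `Client` exists).

```python
import dask.array as da

# Smaller problem than in the notebook so the example runs quickly
size = (2000, 2000)

# chunks=(500, 500) splits the array into a 4 x 4 grid of blocks
x = da.random.uniform(low=0.0, high=1.0, size=size, chunks=(500, 500))

# This only builds a task graph; no random numbers are generated yet
y = x.mean(axis=0).sum()

print(x.numblocks)  # number of blocks along each axis

# compute() executes the graph, processing the chunks in parallel
y_val = float(y.compute())
print(y_val)
```

Chunk size is a trade-off: many small chunks add scheduling overhead, while a few large chunks limit parallelism and increase the memory needed per task.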


**Closing the cluster:**

In [10]:
client.close()
cluster.close()