# Distributed computing using `dask`
----

- `LocalCluster` (i.e., a single machine such as a laptop or desktop)

### **Import modules**

In [1]:
import warnings
warnings.filterwarnings('ignore')
import numpy as np
import dask
from dask.distributed import Client, LocalCluster
import dask.array as da

### **Setup local cluster**

In [2]:
# cluster = LocalCluster(n_workers=1, threads_per_worker=1)  # serial: one worker with one thread
cluster = LocalCluster()
client = Client(cluster)
client
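`LocalCluster()` with no arguments sizes itself from the machine (here it chose 4 workers on 8 cores). To pin the layout, as the commented-out serial line does, pass `n_workers` and `threads_per_worker` explicitly. A minimal sketch; the 2 × 2 layout is an arbitrary assumption, and `processes=False` (also an assumption) runs the workers as threads inside the current Python process so the snippet starts quickly and runs as a plain script:

```python
from dask.distributed import Client, LocalCluster

# Assumed layout for illustration: 2 workers x 2 threads = 4 threads in total.
# processes=False keeps the workers inside this Python process.
cluster = LocalCluster(n_workers=2, threads_per_worker=2, processes=False)
client = Client(cluster)

# Client.nthreads() maps each worker address to its number of threads.
threads_by_worker = client.nthreads()
print(threads_by_worker)
print("total threads:", sum(threads_by_worker.values()))

# Release the workers and the scheduler when done.
client.close()
cluster.close()
```

Dropping `processes=False` gives separate worker processes, as in the default cluster above, which sidesteps Python's GIL for pure-Python tasks at the cost of a slower start-up.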

| Client | Cluster |
|---|---|
| Scheduler: tcp://127.0.0.1:44819 | Workers: 4 |
| Dashboard: http://127.0.0.1:38709/status | Cores: 8 |
| | Memory: 33.62 GB |


### **Visualize the tasks:**

- Open the dashboard using the `Dashboard` link shown in the client output above
- The URL has the form `http://<localhost or server IP>:<port>/status`
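The dashboard URL can also be read from code through `Client.dashboard_link`, which is handy when the port is assigned dynamically. A minimal sketch, assuming a throwaway in-process cluster (`processes=False`) so it runs on its own; in the notebook you would simply evaluate `client.dashboard_link` on the existing client. If the optional `bokeh` package is not installed, the dashboard does not start and the link may come back empty.

```python
from dask.distributed import Client, LocalCluster

# Throwaway cluster with thread-based workers, just for this sketch.
cluster = LocalCluster(processes=False)
client = Client(cluster)

# Same URL as the "Dashboard" entry in the client repr,
# of the form http://<host>:<port>/status.
link = client.dashboard_link
print(link)

client.close()
cluster.close()
```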

### **Comparison of `numpy` vs. `dask` performance**

**Test computation:**

*Define:*

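\begin{equation}
\mathbf{X} \in \mathbb{R}^{n_i \times n_j}
\end{equation}

where:
- $n_i = n_j = 40000$
- each entry $X_{ij}$ is drawn uniformly from $[0, 1)$

Let's compute $y$ by averaging $\mathbf{X}$ over the row index $i$ (`axis=0`) and summing the resulting column means over $j$:

\begin{equation}
    y = \sum_j \left(\langle \mathbf{X} \rangle_i\right)_j = \sum_{j=1}^{n_j} \frac{1}{n_i} \sum_{i=1}^{n_i} X_{ij}
\end{equation}

Since every column mean is close to $0.5$, we expect $y \approx n_j / 2 = 20000$.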
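To check that `x.mean(axis=0).sum()` implements the double sum above, the sketch below evaluates both forms on a tiny deterministic matrix (the 4 × 3 stand-in for $\mathbf{X}$ is an assumption chosen so the result can be verified by hand). It also shows that $y$ reduces to the grand total divided by $n_i$:

```python
import numpy as np

# Small, deterministic stand-in for X: n_i = 4 rows, n_j = 3 columns.
n_i, n_j = 4, 3
X = np.arange(n_i * n_j, dtype=float).reshape(n_i, n_j)

# Vectorised form used in the notebook: mean over i (axis=0), then sum over j.
y_vec = X.mean(axis=0).sum()

# Explicit double sum: sum_j (1/n_i) * sum_i X_ij.
y_loop = sum(sum(X[i, j] for i in range(n_i)) / n_i for j in range(n_j))

# Both collapse to the sum of all entries divided by n_i.
y_total = X.sum() / n_i

print(y_vec, y_loop, y_total)  # 16.5 16.5 16.5
```

For a square matrix ($n_i = n_j$) averaging over $j$ first and then summing over $i$ gives the same value, because both orders equal $\frac{1}{n}\sum_{i,j} X_{ij}$.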

#### **Define problem size:**

In [3]:
size = (40000,40000)
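Before running, it helps to estimate how much memory this problem needs. Both `np.random.uniform` and `da.random.uniform` produce `float64` values (8 bytes each), so a quick back-of-the-envelope calculation is:

```python
import math

import numpy as np

size = (40000, 40000)
itemsize = np.dtype(np.float64).itemsize  # 8 bytes per float64 value
n_elements = math.prod(size)
nbytes = n_elements * itemsize
print(f"{n_elements:,} values -> {nbytes / 1e9:.1f} GB")
# 1,600,000,000 values -> 12.8 GB
```

The numpy cell has to hold this whole array in RAM at once (the cluster above reports 33.62 GB in total), whereas dask splits it into chunks and typically keeps only the chunks currently being processed in memory.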

#### **Using numpy (single threaded)**

In [5]:
%%time
x = np.random.uniform(low=0., high=1.0, size=size)
y = x.mean(axis=0).sum()
print(y)

20000.060284011168
CPU times: user 16.5 s, sys: 2.69 s, total: 19.2 s
Wall time: 18.3 s
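`%%time` is an IPython magic. In a plain Python script the same two figures can be read with the standard library: `time.perf_counter()` for wall time and `time.process_time()` for the CPU time (user + system) of the current process. A minimal sketch, assuming a smaller problem size so it finishes quickly:

```python
import time

import numpy as np

size = (4000, 4000)  # assumed smaller size for a quick run

wall_start = time.perf_counter()
cpu_start = time.process_time()

x = np.random.uniform(low=0.0, high=1.0, size=size)
y = x.mean(axis=0).sum()

cpu_s = time.process_time() - cpu_start
wall_s = time.perf_counter() - wall_start
print(f"y = {y:.3f}, CPU time {cpu_s:.2f} s, wall time {wall_s:.2f} s")
```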


#### **Using dask (distributed)**

In [6]:
%%time
x = da.random.uniform(low=0., high=1.0, size=size)  # lazy; chunk shape chosen automatically (chunks='auto')
y = x.mean(axis=0).sum()
y_val = y.compute()
print(y_val)

20000.05861389838
CPU times: user 1.44 s, sys: 640 ms, total: 2.08 s
Wall time: 6.48 s
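Two remarks on these numbers. First, `%%time` only measures the notebook process: the default `LocalCluster` runs its workers in separate processes, so the dask "CPU times" leave out most of the real work and the wall times (18.3 s vs. 6.48 s) are the fair comparison. Second, the two values of $y$ differ slightly because each library draws its own random numbers. The sketch below inspects how dask chunks the lazy array and then feeds the same data to both libraries; the seed and the 2000 × 2000 / 500 × 500 sizes are assumptions chosen to keep it fast:

```python
import dask.array as da
import numpy as np

# 1) Creating the notebook's array computes nothing; dask only records
#    a grid of chunks (chunks="auto" by default).
x_lazy = da.random.uniform(low=0.0, high=1.0, size=(40000, 40000))
print("chunk shape:", x_lazy.chunksize, "blocks:", x_lazy.numblocks)

# 2) Same data through both libraries, on a small array.
rng = np.random.default_rng(0)  # assumed seed, for reproducibility only
x_np = rng.uniform(0.0, 1.0, size=(2000, 2000))
x_da = da.from_array(x_np, chunks=(500, 500))  # 4 x 4 blocks of 500 x 500

y_np = x_np.mean(axis=0).sum()
# With no Client active this uses dask's default threaded scheduler;
# in the notebook, where `client` is open, it runs on the cluster.
y_da = x_da.mean(axis=0).sum().compute()

print(y_np, y_da, np.isclose(y_np, y_da))
```

Any remaining difference between `y_np` and `y_da` is floating-point rounding from summing the blocks in a different order.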


**Closing the client and the cluster:**

In [7]:
client.close()
cluster.close()
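Both `LocalCluster` and `Client` also work as context managers, which guarantees they are closed even if a computation raises. A minimal sketch of that pattern; the worker layout, array size and chunk shape are assumptions for illustration, and `processes=False` keeps the workers in the current process:

```python
import dask.array as da
from dask.distributed import Client, LocalCluster

with LocalCluster(n_workers=2, threads_per_worker=1, processes=False) as cluster:
    with Client(cluster) as client:
        x = da.random.uniform(low=0.0, high=1.0, size=(1000, 1000), chunks=(250, 250))
        y_val = x.mean(axis=0).sum().compute()

# Leaving the blocks closes the client first, then the cluster.
print(y_val, client.status)
```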