# `dask.delayed`: processes vs threads

Here we use a (quite inefficient) pure-Python implementation of the Euclidean distance matrix to understand how `dask.delayed` behaves with Python code. Recall that, previously, the function we ran with `dask.delayed` was SciPy's `cdist`.

In [None]:
import dask
import numpy as np

In [None]:
def euclidean_distance_matrix(x, y):
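    """Pairwise Euclidean distances between the rows of x and the rows of y."""
    # One row per sample in x, one column per sample in y
    dist_matrix = np.empty((x.shape[0], y.shape[0]))
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            diff = xi - yj
            # Euclidean distance: square root of the sum of squared differences
            dist_matrix[i, j] = np.sqrt((diff ** 2).sum())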
    return dist_matrix
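
As a sanity check on a loop-based implementation like the one above, the result can be compared with a vectorized NumPy computation on a small input. The sketch below is only illustrative: the names `pairwise_distances_loop` and `pairwise_distances_vectorized` are hypothetical helpers introduced here, not part of the notebook, and the array sizes are arbitrary.

```python
import numpy as np


def pairwise_distances_loop(x, y):
    """Pure-Python double loop: one Euclidean distance per iteration."""
    out = np.empty((x.shape[0], y.shape[0]))
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            out[i, j] = np.sqrt(((xi - yj) ** 2).sum())
    return out


def pairwise_distances_vectorized(x, y):
    """Same quantity computed with broadcasting instead of Python loops."""
    diff = x[:, None, :] - y[None, :, :]  # shape (len(x), len(y), n_features)
    return np.sqrt((diff ** 2).sum(axis=-1))


# Small arbitrary inputs so the slow loop finishes quickly
rng = np.random.default_rng(0)
a = rng.random((20, 5))
b = rng.random((30, 5))

loop_result = pairwise_distances_loop(a, b)
vec_result = pairwise_distances_vectorized(a, b)
print(np.allclose(loop_result, vec_result))
```

The two versions compute the same matrix; the loop version spends almost all of its time in the Python interpreter, which is what makes it a useful test case for the schedulers below.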

In [None]:
x = np.random.random([1000, 50])

In [None]:
%%time
edm = euclidean_distance_matrix(x, x)

<mark>Question</mark>: The following Dask graph runs `euclidean_distance_matrix` twice using the same input data. From the time measured in the previous cell, estimate how long it will take to run the graph. Then run the cells and check your answer.

In [None]:
graph = [dask.delayed(euclidean_distance_matrix)(x, x),
         dask.delayed(euclidean_distance_matrix)(x, x)]

In [None]:
%%time
edm = dask.compute(graph, scheduler='threads')

<mark>Question</mark>: Estimate how long it will take to run the following cell. Run it and check your answer.

In [None]:
%%time
edm = dask.compute(graph, scheduler='processes')

<mark>Question</mark>: Could you explain the results?
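
One way to explore this question outside of Dask is a small experiment using only the standard library. The sketch below runs a pure-Python CPU-bound loop twice serially and then twice in a pool of two threads, so that the two timings can be compared. `busy_loop` and the value of `n` are hypothetical choices made for illustration, and the measured times will depend on the machine and the Python build.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def busy_loop(n):
    """CPU-bound work done entirely in Python bytecode."""
    total = 0
    for k in range(n):
        total += k * k
    return total


n = 200_000  # arbitrary size; increase it for more stable timings

# Two calls, one after the other
start = time.perf_counter()
serial = [busy_loop(n), busy_loop(n)]
serial_seconds = time.perf_counter() - start

# The same two calls submitted to two threads
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=2) as pool:
    threaded = list(pool.map(busy_loop, [n, n]))
threaded_seconds = time.perf_counter() - start

print(f"serial:   {serial_seconds:.3f} s")
print(f"threaded: {threaded_seconds:.3f} s")
```

Comparing the two printed times with the `%%time` results for `scheduler='threads'` and `scheduler='processes'` above may help in answering the question.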