In [1]:
# Modules needed: mkl, numpy, dask (dask.array is imported as da)

In [2]:
import mkl
import numpy as np
import dask.array as da

# Process an array with multiple threads

Multiple threads can simultaneously process different parts of the same array. `dask` provides this automatically once the `numpy` functions are replaced with their `dask` equivalents. The key concept is the chunk: each chunk of data is processed separately, possibly by a different thread. For a matrix, for example, we define a 2D block size; each block can then be processed independently, and the results are combined to obtain the final answer.
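To make the chunk idea concrete, here is a minimal sketch of block-wise processing written by hand with `numpy` and a thread pool from the standard library. It is not how `dask` is implemented; the helper names `f` and `process_in_blocks`, the block shape and the worker count are all hypothetical choices for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def f(block):
    """Element-wise operation applied independently to each block (illustrative)."""
    return block**2 + np.sin(block)


def process_in_blocks(a, block_shape, num_workers=4):
    """Split a 2D array into blocks, process them in threads, reassemble."""
    rows, cols = block_shape
    out = np.empty_like(a)
    # One (row-slice, column-slice) pair per block.
    slices = [
        (slice(i, i + rows), slice(j, j + cols))
        for i in range(0, a.shape[0], rows)
        for j in range(0, a.shape[1], cols)
    ]

    def work(s):
        # Each task writes to a disjoint region of `out`.
        out[s] = f(a[s])

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        list(pool.map(work, slices))  # consume the iterator to surface errors
    return out


small = np.random.rand(200, 100)
result = process_in_blocks(small, block_shape=(50, 25))
assert np.allclose(result, f(small))
```

Because the operation is element-wise, the blocks do not depend on each other, which is what allows them to be handed to different threads. `dask` automates the splitting, scheduling and reassembly shown here.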

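### Library Dependencies

This notebook needs `mkl`, `numpy` and `dask`. The `mkl` Python module used below is provided by the `mkl-service` package: ```pip install mkl-service```. Install numpy with pip: ```pip install numpy```. Install dask with array support with pip: ```pip install "dask[array]"```.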

In [3]:
# On some platforms numpy is already multithreaded thanks to Intel MKL;
# for this example we restrict MKL to a single thread
#import mkl
mkl.set_num_threads(1)

40

In [4]:
#import numpy as np
#import dask.array as da

In [5]:
A = np.random.rand(20000,4000)

`%whos` is a magic function provided by `IPython` that lists the defined variables with their type and, for arrays, their memory consumption.

In [6]:
%whos

Variable   Type       Data/Info
-------------------------------
A          ndarray    20000x4000: 80000000 elems, type `float64`, 640000000 bytes (610.3515625 Mb)
da         module     <module 'dask.array' from<...>/dask/array/__init__.py'>
mkl        module     <module 'mkl' from '/cm/s<...>ackages/mkl/__init__.py'>
np         module     <module 'numpy' from '/cm<...>kages/numpy/__init__.py'>


In [7]:
A

array([[0.75763818, 0.35085184, 0.98425502, ..., 0.01421718, 0.34103227,
        0.15428429],
       [0.7142614 , 0.45231674, 0.92682913, ..., 0.31946334, 0.49036212,
        0.58710489],
       [0.34087685, 0.34311714, 0.95780375, ..., 0.45139566, 0.6876615 ,
        0.64062166],
       ...,
       [0.24398239, 0.53520918, 0.02861823, ..., 0.24487501, 0.85070136,
        0.9740487 ],
       [0.30428205, 0.63153213, 0.59948973, ..., 0.82115549, 0.28211859,
        0.67236262],
       [0.23491844, 0.68300038, 0.05874588, ..., 0.17822768, 0.29224244,
        0.52757569]])

First, let's perform some operations on the matrix in pure `numpy`, using a single thread:

In [8]:
%time B = A**2 + np.sin(A) * A * np.log(A)

CPU times: user 1.38 s, sys: 356 ms, total: 1.73 s
Wall time: 1.74 s


## Processing with dask

First, create a chunked `dask` array from the `numpy` array:

In [9]:
A_dask = da.from_array(A, chunks=(2000, 1000))

In [10]:
A_dask.numblocks

(10, 4)
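The block count follows directly from the array shape and the chunk shape: 20000 / 2000 = 10 blocks along the rows and 4000 / 1000 = 4 along the columns. The sketch below shows the same relationship on a much smaller array, whose sizes are chosen only for illustration, and what happens when the chunk size does not divide the shape evenly.

```python
import dask.array as da
import numpy as np

# Same 10 x 4 block layout as above, on a tiny array.
x = da.from_array(np.zeros((20, 8)), chunks=(2, 2))
print(x.numblocks)   # (10, 4)
print(x.chunksize)   # (2, 2)

# When the chunk size does not divide the shape, the last block is smaller.
uneven = da.from_array(np.zeros((21, 8)), chunks=(2, 2))
print(uneven.numblocks)      # (11, 4)
print(uneven.chunks[0][-1])  # 1, the height of the last row block
```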

Then replace each `numpy` function with the equivalent provided by `dask`, which implements most of the `numpy` functions and operations.

In [11]:
compute_B = (A_dask**2 + da.sin(A_dask) * A_dask * da.log(A_dask))
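Note that building the expression does not run it: `dask` arrays are evaluated lazily, and the work happens only when `.compute()` is called. A minimal sketch on a small array follows; the names `a`, `a_dask`, `lazy` and the `0.1` offset (used only to keep the logarithm away from zero) are illustrative and not part of the notebook.

```python
import dask.array as da
import numpy as np

a = np.random.rand(8, 6) + 0.1  # offset keeps log() finite
a_dask = da.from_array(a, chunks=(4, 3))

# Building the expression only records the operations to perform.
lazy = a_dask**2 + da.sin(a_dask) * a_dask * da.log(a_dask)
print(type(lazy).__name__)    # Array

# .compute() executes the recorded operations and returns a numpy array.
result = lazy.compute()
print(type(result).__name__)  # ndarray

expected = a**2 + np.sin(a) * a * np.log(a)
assert np.allclose(result, expected)
```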

In [12]:
%time B_dask = compute_B.compute(num_workers=1)

CPU times: user 1.71 s, sys: 253 ms, total: 1.96 s
Wall time: 1.96 s


In [13]:
%time B_dask = compute_B.compute(num_workers=12)

CPU times: user 2.08 s, sys: 438 ms, total: 2.52 s
Wall time: 475 ms


In [14]:
#%time B_dask = compute_B.compute(num_workers=12)

In [15]:
%time B_dask = compute_B.compute(num_workers=24)

CPU times: user 2.16 s, sys: 632 ms, total: 2.8 s
Wall time: 442 ms
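The wall times above can be turned into a speedup (time with one worker divided by time with N workers) and a parallel efficiency (speedup divided by N). The snippet below simply does this arithmetic on the timings reported in this notebook; they will differ from machine to machine.

```python
# Wall times in seconds, copied from the %time outputs above.
wall_times = {1: 1.96, 12: 0.475, 24: 0.442}

baseline = wall_times[1]
speedups = {n: baseline / t for n, t in wall_times.items()}
efficiencies = {n: s / n for n, s in speedups.items()}

for n in wall_times:
    print(f"{n:2d} workers: speedup {speedups[n]:.2f}x, "
          f"efficiency {efficiencies[n]:.0%}")
#  1 workers: speedup 1.00x, efficiency 100%
# 12 workers: speedup 4.13x, efficiency 34%
# 24 workers: speedup 4.43x, efficiency 18%
```

Going from 12 to 24 workers barely changes the wall time, so the speedup is far from linear in this run. A plausible explanation, not verified here, is that these element-wise operations are limited by memory bandwidth and scheduling overhead rather than by the number of available cores.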


In [16]:
assert np.allclose(B, B_dask)