# Array Multicore

Instead of running trivially parallel, independent tasks, here we want to use multiple threads to process different parts of the same array simultaneously. `dask` provides this automatically once the `numpy` functions are replaced with their `dask.array` equivalents. The key concept is the chunk: each chunk of data is processed separately, possibly by a different thread. For a matrix, for example, we define a 2D block size; each block can be processed independently, and the results are then combined to produce the final answer. See <http://dask.pydata.org/>
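To make the chunk idea concrete, here is a minimal sketch that does by hand roughly what the paragraph above describes: split a matrix into 2D blocks, evaluate an element-wise function on each block in a thread pool, and write each result into its place in the output. This is not how `dask` is implemented internally; the helper name `blockwise_apply`, the `max_workers` value, and the small array size are all assumptions chosen for illustration.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def f(x):
    # Same element-wise expression used later in this notebook.
    return x**2 + np.sin(x) * x * np.log(x)


def blockwise_apply(func, arr, block_shape, max_workers=4):
    """Hypothetical helper: apply an element-wise func block by block."""
    rows, cols = arr.shape
    br, bc = block_shape
    out = np.empty_like(arr)

    # One pair of slices per 2D block; edge blocks may be smaller.
    slices = [
        (slice(i, min(i + br, rows)), slice(j, min(j + bc, cols)))
        for i in range(0, rows, br)
        for j in range(0, cols, bc)
    ]

    def work(sl):
        # Each block writes to a disjoint region of the output array.
        out[sl] = func(arr[sl])

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Consume the iterator so any exception raised in a worker surfaces.
        list(pool.map(work, slices))
    return out


rng = np.random.default_rng(0)
small = rng.random((200, 1000)) + 0.1  # shift away from 0 to keep log finite
result = blockwise_apply(f, small, block_shape=(100, 200))
assert np.allclose(result, f(small))
```

Because the function is element-wise, the blocks never need to exchange data, so the block-by-block result matches the whole-array result. Operations such as reductions would need an extra step to combine the per-block partial results.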

In [1]:
import numpy as np
import dask.array as da

In [2]:
A = np.random.rand(2000,10000)

In [3]:
A.size / 1e6

20.0

In [4]:
A

array([[ 0.71798204,  0.45692633,  0.10802123, ...,  0.06670243,
         0.9704418 ,  0.56210938],
       [ 0.76116472,  0.58888293,  0.85425679, ...,  0.4335984 ,
         0.13169012,  0.51282633],
       [ 0.60078063,  0.42791056,  0.79631019, ...,  0.51129926,
         0.08555533,  0.30999234],
       ..., 
       [ 0.90433695,  0.22143821,  0.90611972, ...,  0.94131951,
         0.08192971,  0.79178957],
       [ 0.58847847,  0.77443424,  0.2882611 , ...,  0.34719384,
         0.17307159,  0.0014827 ],
       [ 0.19874488,  0.70299009,  0.32151539, ...,  0.11121813,
         0.1464921 ,  0.64958825]])

In [5]:
%time B = A**2 + np.sin(A) * A * np.log(A)

CPU times: user 2.81 s, sys: 656 ms, total: 3.47 s
Wall time: 3.67 s


In [10]:
# Wrap the NumPy array in a dask array split into blocks of 1000 x 2000 elements
A_dask = da.from_array(A, chunks=(1000, 2000))

In [11]:
A_dask.numblocks

(2, 5)
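The block count follows directly from the array shape and the chunk shape: along each axis it is the axis length divided by the chunk length, rounded up. The short sketch below reproduces the `(2, 5)` result with plain Python; the variable names are illustrative only, and the per-block size assumes 8-byte `float64` elements, which is what `np.random.rand` returns.

```python
import math

shape = (2000, 10000)   # shape of A
chunks = (1000, 2000)   # chunk shape passed to da.from_array

# Number of blocks along each axis, rounding up for partial blocks.
numblocks = tuple(math.ceil(s / c) for s, c in zip(shape, chunks))
total_blocks = math.prod(numblocks)

# Size of one full block in megabytes, assuming float64 (8 bytes per element).
block_mb = chunks[0] * chunks[1] * 8 / 1e6

print(numblocks, total_blocks, block_mb)  # (2, 5) 10 16.0
```

So the computation is split into 10 independent blocks of about 16 MB each.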

In [12]:
# The dask expression is lazy; .compute() runs it chunk by chunk and returns a NumPy array
%time B_dask = (A_dask**2 + da.sin(A_dask) * A_dask * da.log(A_dask)).compute()

CPU times: user 3.14 s, sys: 169 ms, total: 3.31 s
Wall time: 2.11 s


In [9]:
assert np.allclose(B, B_dask)