# 2.6. Dask Array

![](dask-array-black-text.svg)

*From the Dask documentation:*
> Dask Array implements a subset of the NumPy ndarray interface using 
> blocked algorithms, cutting up the large array into many small arrays. 
> This lets us compute on arrays larger than memory using all of our cores. 
> We coordinate these blocked algorithms using dask graphs.

Dask Arrays provide "a parallel, larger-than-memory, n-dimensional array using blocked algorithms. Simply put: distributed Numpy.

- **Parallel:** Uses all of the cores on your computer
- **Larger-than-memory:** Lets you work on datasets that are larger than your available memory by breaking up your array into many small pieces, operating on those pieces in an order that minimizes the memory footprint of your computation, and effectively streaming data from disk.
- **Blocked Algorithms:** Perform large computations by performing many smaller computations"

*Stolen from the Dask tutorial: https://github.com/dask/dask-tutorial*
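To make "blocked algorithm" concrete, here is a minimal sketch in plain NumPy, with no Dask involved, that computes a mean one tile at a time. `blocked_mean` is a hypothetical helper for illustration only, not part of Dask. In this sketch the whole array is already in memory. If each tile were instead read from disk, only one tile would need to be resident at a time. Dask automates this pattern: it splits the array into chunks, creates one task per chunk, and combines the partial results.

```python
import numpy as np


def blocked_mean(x, block):
    """Hypothetical helper: mean of a 2-D array, computed one tile at a time."""
    total = 0.0
    count = 0
    for r in range(0, x.shape[0], block):
        for c in range(0, x.shape[1], block):
            tile = x[r:r + block, c:c + block]  # edge tiles may be smaller
            total += tile.sum(dtype=np.float64)  # partial result for this tile
            count += tile.size
    return total / count  # combine the partial results


x = np.random.default_rng(0).random((1000, 1000))
print(np.isclose(blocked_mean(x, 256), x.mean()))  # True
```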

In [None]:
from dask.distributed import LocalCluster, Client
cluster = LocalCluster(n_workers=4)
client = Client(cluster)
client

In [None]:
import dask.array as da
import netCDF4 as nc4
import numpy as np

## Creating Dask Arrays from NumPy Arrays

One of the easiest ways to create a Dask Array is directly from a NumPy array, using the `from_array` function of `dask.array`. This function accepts anything that is "array-like," such as:

- a netCDF4 variable from netCDF4-python,
- an HDF5 dataset using h5py,
- a NumPy array,

or any other object that can be indexed like a NumPy array.
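The `from_array` docstring asks only that its input have `.shape`, `.ndim`, and `.dtype` attributes and support NumPy-style slicing. The sketch below wraps a NumPy array in a hypothetical `LoggingArray` class, which is not part of Dask or netCDF4, and records every slice Dask requests. Dask may also request a zero-size slice to infer the array type, so the log can contain more than the four block reads.

```python
import numpy as np
import dask.array as da


class LoggingArray:
    """Hypothetical array-like wrapper (not part of Dask or netCDF4).

    It exposes the attributes from_array relies on (shape, ndim, dtype)
    and NumPy-style slicing, and records every index it is asked for.
    """

    def __init__(self, data):
        self._data = np.asarray(data)
        self.shape = self._data.shape
        self.ndim = self._data.ndim
        self.dtype = self._data.dtype
        self.requests = []  # every key passed to __getitem__

    def __getitem__(self, key):
        self.requests.append(key)
        return self._data[key]


source = LoggingArray(np.arange(16, dtype="f4").reshape(4, 4))
x = da.from_array(source, chunks=(2, 2))  # lazy: no 2 x 2 block has been read yet

total = x.sum().compute()  # each 2 x 2 block is now read through __getitem__
print(total)  # 120.0
print(source.requests)
```

This is how a netCDF4 variable or an h5py dataset can back a Dask Array: Dask slices the object one chunk at a time, and the object decides how to read that slice.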

`from_array` takes the "array-like" object and a `chunks` argument, usually a tuple, which specifies the size of each chunk along each axis of the array.
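The following small example shows how the `chunks` tuple maps onto the blocks of a 10 x 10 array. It uses a NumPy array as the source, so it runs without any files. When the chunk size does not divide the axis length evenly, the last chunk along that axis is smaller.

```python
import numpy as np
import dask.array as da

a = np.arange(100).reshape(10, 10)

# Chunks that divide the shape evenly
even = da.from_array(a, chunks=(5, 5))
print(even.chunks)       # ((5, 5), (5, 5))

# Chunks that don't divide evenly: the last chunk on each axis is smaller
uneven = da.from_array(a, chunks=(4, 4))
print(uneven.chunks)     # ((4, 4, 2), (4, 4, 2))
print(uneven.numblocks)  # (3, 3)

# A different chunk size per axis: one chunk along axis 0, two along axis 1
mixed = da.from_array(a, chunks=(10, 5))
print(mixed.chunks)      # ((10,), (5, 5))
```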

In this example, we'll create a distributed Dask Array from a NetCDF file, but first we have to create some NetCDF data.

In [None]:
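# Write four 5120 x 5120 tiles that together form a 2 x 2 grid (10240 x 10240 overall)
for n in range(4):
    i, j = divmod(n, 2)  # tile position: row i, column j
    ncds = nc4.Dataset('data-{}x{}.nc'.format(i, j), 'w')
    ncds.createDimension('x', 5120)
    ncds.createDimension('y', 5120)
    # 1-D float64 ('d') coordinate variables
    x = ncds.createVariable('x', 'd', ('x',))
    y = ncds.createVariable('y', 'd', ('y',))
    # 2-D float32 ('f') data variables
    v = ncds.createVariable('v', 'f', ('x', 'y'))
    u = ncds.createVariable('u', 'f', ('x', 'y'))
    # Global coordinates covered by this tile
    x[:] = np.arange(i*5120, (i+1)*5120)
    y[:] = np.arange(j*5120, (j+1)*5120)
    # Random data, 100 MiB per variable
    v[:] = np.random.random((5120, 5120)).astype('f')
    u[:] = np.random.random((5120, 5120)).astype('f')
    ncds.close()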

In [None]:
ls -lh data*.nc
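As a rough check on those sizes, the data payload of each file can be worked out from the shapes and dtypes used above. This sketch ignores netCDF/HDF5 header and metadata overhead, so the sizes reported by `ls` should be slightly larger.

```python
import numpy as np

n = 5120
itemsize_f = np.dtype("f").itemsize  # 4 bytes (float32), used for 'u' and 'v'
itemsize_d = np.dtype("d").itemsize  # 8 bytes (float64), used for 'x' and 'y'

per_variable = n * n * itemsize_f     # one 5120 x 5120 float32 variable
coords = 2 * n * itemsize_d           # the 1-D 'x' and 'y' coordinate variables
per_file = 2 * per_variable + coords  # 'u' + 'v' + coordinates, no header overhead

print(per_variable / 2**20)  # 100.0 (MiB)
print(per_file / 2**20)      # 200.078125 (MiB)
```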

In [None]:
ncds = nc4.Dataset('data-0x0.nc')

In [None]:
v_da = da.from_array(ncds.variables['v'], chunks=(2560, 2560))
v_da

In [None]:
u_da = da.from_array(ncds.variables['u'], chunks=(2560, 2560))
u_da

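**Now, take a look at the Dashboard again.** Look at the *Bytes stored* section of the Dashboard Status page. Because `from_array` only builds a task graph, none of the data in these arrays has been loaded onto the workers yet.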
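The laziness is easiest to see with a small self-contained sketch. Here random NumPy arrays stand in for the netCDF variables `v` and `u` (an assumption, so the example runs without the files), and the names `v_demo` and `u_demo` are chosen so they don't overwrite the notebook's `v_da` and `u_da`. Arithmetic and reductions on Dask Arrays only extend the task graph. Work happens when `.compute()` is called.

```python
import numpy as np
import dask.array as da

# Hypothetical stand-ins for the netCDF variables 'v' and 'u'
rng = np.random.default_rng(0)
v = rng.random((1024, 1024), dtype=np.float32)
u = rng.random((1024, 1024), dtype=np.float32)

v_demo = da.from_array(v, chunks=(512, 512))
u_demo = da.from_array(u, chunks=(512, 512))

speed = da.sqrt(u_demo**2 + v_demo**2)  # builds a task graph; nothing is computed yet
result = speed.mean()                   # still lazy: a 0-d Dask Array

value = result.compute()                # now the graph runs, chunk by chunk
print(np.isclose(value, np.sqrt(u**2 + v**2).mean()))  # True
```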

In [None]:
v_da.visualize()

In [None]:
v_da.npartitions

In [None]:
v_da.numblocks
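`numblocks` gives the number of blocks along each axis, and `npartitions` gives the total number of blocks. The sketch below builds an array with the same shape, dtype, and chunking as `v_da`, but backed by zeros so no file is needed. It also shows the memory footprint of the whole array and of a single chunk.

```python
import dask.array as da

# Same shape, dtype and chunking as v_da, but backed by zeros so no file is needed
z = da.zeros((5120, 5120), chunks=(2560, 2560), dtype="f4")

print(z.numblocks)    # (2, 2): blocks along each axis
print(z.npartitions)  # 4: total number of blocks (2 * 2)
print(z.chunksize)    # (2560, 2560)
print(z.nbytes / 2**20)                        # 100.0 MiB for the whole array
print(2560 * 2560 * z.dtype.itemsize / 2**20)  # 25.0 MiB per chunk
```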

## Creating Dask Arrays from Delayed Objects

In addition to reading array-like objects as Dask Arrays, you can construct an array from `Delayed` objects, which gives you more flexibility in where the array's data comes from.
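As a minimal sketch of this approach, the code below assembles four delayed tiles into one array, mirroring the 2 x 2 layout of the `data-{i}x{j}.nc` files. `load_tile` is a hypothetical loader that returns small constant tiles instead of reading a file. Because a `Delayed` object has no known shape, `da.from_delayed` needs the shape and dtype declared up front. `da.block` then arranges the tiles into a single array.

```python
import numpy as np
import dask
import dask.array as da

n = 4  # tile size for the demo; the notebook's files use 5120


@dask.delayed
def load_tile(i, j):
    """Hypothetical loader standing in for reading 'v' from data-{i}x{j}.nc."""
    return np.full((n, n), 10 * i + j, dtype=np.float32)


# Wrap each delayed tile as a Dask Array; shape and dtype must be declared up front
tiles = [[da.from_delayed(load_tile(i, j), shape=(n, n), dtype=np.float32)
          for j in range(2)]
         for i in range(2)]

# Arrange the 2 x 2 tiles into one (2n x 2n) array, mirroring the file layout
full = da.block(tiles)
print(full.shape)   # (8, 8)
print(full.chunks)  # ((4, 4), (4, 4))
print(full[n:, :n].max().compute())  # 10.0 (tile i=1, j=0)
```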