# Dask Arrays

Original: https://examples.dask.org/array.html
     
Dask arrays coordinate many NumPy arrays, arranged into chunks within a grid. They support a large subset of the NumPy API.

## Start Dask Cluster and client

We start by creating a worker template from our user and environment details.

In [None]:
import os
import distributed
import dask
from dask_kubernetes import KubeCluster
from dask import array as da

In [None]:
%conda env list

**Select the conda environment you wish to use from the list above and assign it to the variable below.
It should match the environment of the notebook.**

In [None]:
env = "datasci"

In [None]:
# Fill in the placeholders in the shared worker template and save a local copy.
with open('/etc/daskernetes/worker-template.yaml') as fp:
    template = fp.read()
template = template.replace('{CONDA_DEFAULT_ENV}', env)
template = template.replace('{JUPYTERHUB_USER}', os.environ['JUPYTERHUB_USER'])

template_file = f'./{env}-worker-template.yaml'
with open(template_file, 'w') as ofp:
    ofp.write(template)

## Start Cluster

In [None]:
cluster = KubeCluster.from_yaml(template_file)
cluster.adapt()  # auto-scaling
# cluster.scale(10)  # or scale manually

from IPython.display import Markdown, display
port = cluster.dashboard_link.split(':')[-1]
url = f"https://spaceapps.informaticslab.co.uk/user/{os.environ['JUPYTERHUB_USER']}/proxy/{port}"

display(cluster)
Markdown(f"**The dashboard is at: [{url}]({url})**")

## Connect a distributed client

In [None]:
client = distributed.Client(cluster)
client

## Create Random array

This creates a 3000x3000x3000 array of random numbers, represented as many NumPy arrays of size 500x500x500 (or smaller if the array cannot be divided evenly). In this case there are 216 (6x6x6) NumPy arrays of size 500x500x500.

In [None]:
import dask.array as da
x = da.random.random((3000, 3000, 3000), chunks=(500, 500, 500))
x
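
The chunk counts above follow from simple arithmetic, which can be sketched without Dask at all. The helper name `chunk_summary` is hypothetical, and the 8 bytes per element is an assumption (it matches `float64`, the usual dtype for random floats).

```python
import math

def chunk_summary(shape, chunks, itemsize=8):
    """Return (grid, n_chunks, chunk_bytes, total_bytes) for a regularly chunked array.

    itemsize=8 assumes float64 elements.
    """
    # Chunks per axis, rounding up so a smaller trailing chunk is counted.
    grid = tuple(-(-n // c) for n, c in zip(shape, chunks))
    n_chunks = math.prod(grid)
    chunk_bytes = math.prod(chunks) * itemsize   # size of one full chunk
    total_bytes = math.prod(shape) * itemsize    # size of the whole array
    return grid, n_chunks, chunk_bytes, total_bytes

grid, n_chunks, chunk_bytes, total_bytes = chunk_summary(
    (3000, 3000, 3000), (500, 500, 500)
)
print(grid, n_chunks)                       # (6, 6, 6) 216
print(chunk_bytes / 1e9, "GB per chunk")    # 1.0 GB per chunk
print(total_bytes / 1e9, "GB in total")     # 216.0 GB in total
```

Under these assumptions the full array would occupy about 216 GB, which is why it is only ever held as a collection of chunks spread across workers.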

Use NumPy syntax as usual.

Call `.compute()` when you want your result as a NumPy array.

You may want to watch the dashboard during the computation.

In [None]:
z = x.mean()
z

In [None]:
z.compute()
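
The same lazy pattern can be tried on a small scale, as in the minimal sketch below. The array `small`, its shape and its chunk sizes are illustrative choices rather than part of the original example, and the sketch assumes that Dask's default local scheduler is acceptable when no cluster is connected.

```python
import numpy as np
import dask.array as da

# A small array of ones split into a 4x4 grid of 250x250 chunks (illustrative sizes).
small = da.ones((1000, 1000), chunks=(250, 250))

# NumPy-style expressions only build a task graph; nothing is computed yet.
lazy = (small + small.T).sum(axis=0)

# .compute() runs the graph and returns an ordinary NumPy array.
result = lazy.compute()

print(type(lazy).__name__, small.numblocks)
print(type(result).__name__, result.shape)
```

Here `lazy` is still a Dask array, while `result` is a `numpy.ndarray` that fits in local memory. Reductions such as `sum` or `mean` are a good fit for `.compute()` because their output is small even when the input is not.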