# Scaling out: `xarray` & `dask`
<br>

<img src="https://github.com/pydata/xarray/raw/master/doc/_static/dataset-diagram.png" alt="xarray Logo" style="height: 150px;">
<img src="http://dask.readthedocs.io/en/latest/_images/dask_horizontal.svg" 
     width="30%" 
     align=right
     alt="Dask logo">
     
### *You create a Dask array by specifying chunks when reading data from a netCDF file with xarray*

## Parallel computing with Dask
Dask divides arrays into many small pieces, called chunks, each of which is presumed to be small enough to fit into memory.

![Dask Arrays](http://dask.pydata.org/en/latest/_images/dask-array-black-text.svg)

_source: [Dask Array Documentation](http://dask.pydata.org/en/latest/array-overview.html)_
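
A minimal sketch of chunking, using `dask.array` directly on synthetic data (the array shape and chunk sizes are arbitrary choices for illustration). With xarray, passing `chunks=` to `xr.open_dataset` plays the same role for data read from a file.

```python
import dask.array as da

# A 1000 x 1000 array of ones, split into 250 x 250 chunks (sizes chosen for illustration)
x = da.ones((1000, 1000), chunks=(250, 250))

print(x.numblocks)  # number of chunks along each dimension
print(x.chunks)     # chunk sizes along each dimension

# Operations only build a task graph; nothing is computed until .compute() is called
total = x.sum()
value = total.compute()
print(value)
```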

## Spinning up a cluster

### Step 1: Start a `dask` cluster

```python
from dask_jobqueue import PBSCluster
cluster = PBSCluster(cores=1, memory='10GB', processes=1, queue='share', 
                     walltime='01:00:00')
cluster.scale(4) # Ask for 4 workers
```
*^ This setup is Cheyenne-specific, because Cheyenne uses the PBS scheduler*

### Step 2: Connect a client to it

```python
from dask.distributed import Client
client = Client(cluster) # Connect this local process to remote workers
client
```



### Notes on terminology (it's inconsistent!)

The arguments to `PBSCluster` determine the *job* configuration:
- `processes`: the number of *workers* per job;
- `cores`: the number of cores per job, shared among the workers.

**Multiple single-threaded workers per job:**

```python
n_nodes = 2
cluster = PBSCluster(cores=18, processes=18, queue='regular')
cluster.scale(18 * n_nodes) # Ask for 18 x n_nodes workers
```
*^ Request the same number of cores and processes (one thread per worker) when creating many matplotlib plots*

**Multi-threaded workers:**

```python
n_nodes = 2
cluster = PBSCluster(cores=36, processes=9, queue='regular')
cluster.scale(9 * n_nodes) # Ask for 9 x n_nodes workers
```
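
The relationship between these arguments can be sketched with plain arithmetic. The helper below is hypothetical (it is not part of `dask_jobqueue`) and assumes that each worker gets `cores / processes` threads and that `cluster.scale(n)` counts workers, so the number of jobs submitted is `n / processes` rounded up.

```python
import math

def describe_layout(cores, processes, n_workers):
    """Hypothetical helper: summarize the layout implied by the job settings."""
    threads_per_worker = cores // processes      # cores are shared among the workers
    n_jobs = math.ceil(n_workers / processes)    # each job hosts `processes` workers
    return {"threads_per_worker": threads_per_worker, "jobs": n_jobs}

# Multiple single-threaded workers per job
single = describe_layout(cores=18, processes=18, n_workers=18 * 2)
# Multi-threaded workers
multi = describe_layout(cores=36, processes=9, n_workers=9 * 2)
print(single)
print(multi)
```

Under these assumptions, both configurations submit two jobs; they differ only in how each job's cores are split between workers and threads.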

## Distributed Clusters (http://distributed.dask.org/)

Dask can be deployed on distributed infrastructure, such as an HPC system or a cloud computing system.

- `LocalCluster` - Creates a `Cluster` that runs on the local machine. Each `Cluster` includes a `Scheduler` and `Worker`s. 
- `Client` - Connects to and drives computation on a distributed `Cluster`
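
A small sketch of how these two pieces fit together on a single machine. The worker counts are illustrative, and `processes=False` is used here only to keep the demo inside the current Python process.

```python
from dask.distributed import Client, LocalCluster

# Two single-threaded workers running as threads inside this process (illustrative settings);
# dashboard_address=None turns the dashboard off for this small demo
cluster = LocalCluster(n_workers=2, threads_per_worker=1,
                       processes=False, dashboard_address=None)
client = Client(cluster)

# Submit a function call to the cluster and wait for the result
future = client.submit(sum, [1, 2, 3])
answer = future.result()
print(answer)

# Shut down the client and the cluster when done
client.close()
cluster.close()
```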

### Dask Jobqueue (http://jobqueue.dask.org/)

- `PBSCluster` - Cheyenne
- `SLURMCluster` - DAV
- `LSFCluster`
- etc.

### Dask Kubernetes (http://kubernetes.dask.org/)

- `KubeCluster` - Cloud systems


***

## NCAR deployment details

*You cannot use a DAV cluster from Cheyenne or vice versa.*

### ncar-jobqueue enables interoperable notebooks:  
```python
from ncar_jobqueue import NCARCluster
from distributed import Client
cluster = NCARCluster(cores=1, processes=1, memory='10GB')
cluster.scale(4)
client = Client(cluster)
```
*^ If you don't want to specify which cluster class to use, `NCARCluster` picks one for you:*
- calls `PBSCluster` on Cheyenne
- calls `SLURMCluster` on DAV


### Omitted arguments use default settings:
`~/.config/dask/jobqueue.yaml`  
*^ Get familiar with this file: you have to set your own project number in it*
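
Settings from that file are loaded into Dask's configuration, which can be inspected from Python. A small sketch, assuming the `jobqueue.pbs.queue` key; the value `'share'` is only an illustration, and the override below is temporary and does not touch the YAML file.

```python
import dask

# Whatever is configured under the top-level "jobqueue" key
# (an empty dict if nothing has been loaded)
jobqueue_defaults = dask.config.get("jobqueue", default={})
print(sorted(jobqueue_defaults))

# Temporarily override one setting; the previous state is restored on exit
with dask.config.set({"jobqueue.pbs.queue": "share"}):
    queue_inside = dask.config.get("jobqueue.pbs.queue")
    print(queue_inside)
```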

## Connecting to the dashboard  

![](./img/client-report.png)

* To get from the client report above to the dashboard view below, replace what looks like an IP address in the dashboard link with the web address of your JupyterLab or Jupyter Notebook session

![](./img/dask-labextension.png)

## Understanding the dashboard

Tour of the dashboard.

*You can do this on your laptop, because it does not require requesting multiple nodes*

## Chunking and performance

The `chunks` parameter has critical performance implications when using Dask arrays. 
- Chunks too small: slow, because every chunk carries a fixed overhead and many operations have to be queued;
- Chunks too big: wasted computational resources and a risk of running out of memory.
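
To make the trade-off concrete, the sketch below chunks the same synthetic array in two ways and reports how many chunks result and how large each one is. The array shape, the chunk sizes, and the `chunk_summary` helper are all illustrative assumptions.

```python
import dask.array as da

def chunk_summary(arr):
    """Return (number of chunks, bytes per full chunk) for a Dask array."""
    n_bytes = arr.dtype.itemsize
    for n in arr.chunksize:
        n_bytes *= n
    return arr.npartitions, n_bytes

# Same 4000 x 4000 float64 array, chunked two ways (sizes chosen for illustration)
small = da.zeros((4000, 4000), chunks=(100, 100))
large = small.rechunk((2000, 2000))

print(chunk_summary(small))  # many tiny chunks: scheduling overhead dominates
print(chunk_summary(large))  # few big chunks: each must fit in worker memory
```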


## Lazy execution

![](img/lazy-compute.png)

## Dask Delayed

Imagine a function:
```python
def my_function(x):
    # do computation, make a plot, etc.
    ...
```    

Use `dask.delayed` to make `my_function` lazy and apply it to a large number (`N`) of inputs:
```python
import dask

results = []
for i in range(N):
    result_i = dask.delayed(my_function)(x[i])
    results.append(result_i)

# trigger computation of the results
results = dask.persist(*results)
```

*`x` above should not be a Dask array. If it is, the Dask array is already parallelized.  
The advantage comes when you have a very slow for-loop (as above): running its iterations in parallel with Dask can make it much faster.   
A good use case is making a GIF or movie from many plots*
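
A self-contained sketch of the same pattern. `slow_square` is a hypothetical stand-in for an expensive, independent step, and `dask.compute` is used here (rather than `dask.persist`) so that concrete values come back.

```python
import dask

def slow_square(value):
    """Stand-in for an expensive, independent step (hypothetical)."""
    return value ** 2

inputs = list(range(8))

# Wrap each call; this only records the work to be done
tasks = [dask.delayed(slow_square)(value) for value in inputs]

# Run all tasks and gather the results as a tuple
squares = dask.compute(*tasks)
print(squares)
```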

## Hands-on examples

```bash 
git clone https://github.com/ncar-hackathons/hands-on-examples
cd hands-on-examples/scientific-computing
```

Notebooks:
- xarray.ipynb
- dask.ipynb
- cesm-le/cesm-le-seaice-example.ipynb
