# Parallelization with Dask

To the user, [Dask](https://docs.dask.org/en/latest/array.html) looks a lot like NumPy, with the key difference that it breaks up your array into smaller chunks and then coordinates across the CPUs on your machine to process those chunks in parallel. Most of this is done without you ever having to know about it! That being said, there are still some tips and tricks for getting Dask to work as efficiently as possible for your project.

The key advantage of Dask is that your data still looks like a single array to you (the developer), but it can be split up behind the scenes. This allows you to use your typical data analysis workflow on arrays that are larger than what you can store in your computer's memory. It also allows you to take advantage of all of the CPUs available to you. A laptop almost certainly has more than 1 core, maybe as many as 12 or 16. A server or supercomputer could have hundreds. This makes it much more tractable to process datasets that are hundreds of GB or even several TB.

<img src="_static/dask.png" width="500">

Dask arrays are also _lazy_, which means that the values are not actually computed until they are needed. You can trigger the computation with `.compute()`, `.load()`, or by generating a plot. This is nice because it allows Dask to decide the most efficient way to compute your entire analysis chain and do it all at once (as opposed to in many little sub-steps). It also means that if you never actually need a certain intermediate variable, it won't get computed.
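
To make the lazy behavior concrete, here is a minimal sketch that uses `dask.array` directly on a small synthetic array (the array size and chunk shape are arbitrary choices for illustration, not taken from the tutorial data). Building `total` only records the work to be done; nothing is evaluated until `.compute()` is called.

```python
import dask.array as da

# Synthetic example: a 1000 x 1000 array of ones split into 250 x 250 chunks.
x = da.ones((1000, 1000), chunks=(250, 250))

# This line only builds a task graph; no arithmetic has happened yet.
total = (x * 2).sum()
print(type(total))

# .compute() executes the graph and returns a concrete NumPy value.
result = total.compute()
print(result)
```

Printing `type(total)` before the call to `.compute()` shows that it is still a Dask array rather than a number.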

Finally, Dask integrates _directly_ with Xarray. Xarray DataArrays can be turned into Dask-backed arrays by specifying that they should be `chunked`. Other than that, you do not need to change the rest of your code. You may decide to insert `.compute()` or `.load()` calls at particular junctions as you figure out more about your project and workflow. If your overarching goal, however, is to generate plots, then you don't have to change anything else.

## Using Dask

There are two common ways to use Dask. One is to turn an Xarray DataArray into a Dask array with `.chunk()` and to never import Dask directly (it comes with Xarray). You can chunk by any dimension you want (and by multiple dimensions). You can also have Dask decide automatically how to chunk by passing `'auto'`.
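
Before applying this to the real dataset, the chunking options can be sketched on a small synthetic `DataArray`. The variable name, dimension names, and sizes below are made up for illustration; only `.chunk()` and the `'auto'` option come from the text above.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for a model variable (names and sizes are illustrative only).
data = xr.DataArray(
    np.zeros((40, 6, 30), dtype="float32"),
    dims=("time", "Z", "XC"),
    name="example",
)

# Chunk along a single dimension: 10 time steps per chunk.
by_time = data.chunk({"time": 10})

# Chunk along several dimensions at once.
by_time_and_x = data.chunk({"time": 10, "XC": 15})

# Let Dask pick the chunk length along time.
auto = data.chunk({"time": "auto"})

# .chunks lists the chunk lengths along each dimension.
print(by_time.chunks)
print(by_time_and_x.chunks)
print(auto.chunks)
```

Dimensions that are not named in the dictionary are left as a single chunk.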

In [1]:
import xarray as xr
import warnings
warnings.filterwarnings("ignore")

ds = xr.open_dataset('TPOSE6_Daily_2012.nc',decode_timedelta=True).chunk({'time': 'auto'})

If we inspect the dataset, we can see that the underlying type of the variables is now `dask.array` and that each variable has a particular `chunksize`.

In [2]:
ds

Output (abridged): array and chunk summary for each distinct variable shape in the dataset.

| Array shape | Data type | Array size | Chunk shape | Chunk size | Chunks |
|---|---|---|---|---|---|
| (366,) | int64 | 2.86 kiB | (366,) | 2.86 kiB | 1 |
| (22,) | float32 | 88 B | (22,) | 88 B | 1 |
| (84, 241) | float32 | 79.08 kiB | (84, 241) | 79.08 kiB | 1 |
| (85, 240) | float32 | 79.69 kiB | (85, 240) | 79.69 kiB | 1 |
| (84, 240) | float32 | 78.75 kiB | (84, 240) | 78.75 kiB | 1 |
| (22, 84, 241) | float32 | 1.70 MiB | (22, 84, 241) | 1.70 MiB | 1 |
| (22, 84, 241) | bool | 434.93 kiB | (22, 84, 241) | 434.93 kiB | 1 |
| (22, 85, 240) | float32 | 1.71 MiB | (22, 85, 240) | 1.71 MiB | 1 |
| (22, 85, 240) | bool | 438.28 kiB | (22, 85, 240) | 438.28 kiB | 1 |
| (22, 84, 240) | float32 | 1.69 MiB | (22, 84, 240) | 1.69 MiB | 1 |
| (22, 84, 240) | bool | 433.12 kiB | (22, 84, 240) | 433.12 kiB | 1 |
| (366, 22, 84, 241) | float32 | 621.81 MiB | (75, 22, 84, 241) | 127.42 MiB | 5 |
| (366, 22, 85, 240) | float32 | 626.61 MiB | (74, 22, 85, 240) | 126.69 MiB | 5 |
| (366, 22, 84, 240) | float32 | 619.23 MiB | (75, 22, 84, 240) | 126.89 MiB | 5 |

If we inspect a single variable, like zonal velocity (`UVEL`), we can actually get a visualization of the chunks (cool!).

In [3]:
ds.UVEL

Output (abridged): array and chunk summary for `UVEL`. Its coordinate and grid variables are each stored in a single chunk.

| | Array | Chunk |
|---|---|---|
| Bytes | 621.81 MiB | 127.42 MiB |
| Shape | (366, 22, 84, 241) | (75, 22, 84, 241) |
| Dask graph | 5 chunks in 2 graph layers | |
| Data type | float32 numpy.ndarray | |

The info above tells us how big each chunk is. You can see that the automatically generated chunks are roughly 127 MiB each. We wouldn't want to go much smaller than this, because we eventually lose efficiency when trying to process many tiny pieces of data that are much smaller than what we can fit in memory.
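
The reported chunk size can be checked by hand from the chunk shape and the data type. The sketch below uses only the numbers shown in the `UVEL` output; the helper name `chunk_size_mib` is made up for this example.

```python
import math


def chunk_size_mib(shape, itemsize):
    """Size in MiB of one chunk with the given shape and bytes per element."""
    return math.prod(shape) * itemsize / 2**20


# Chunk shape reported for UVEL; float32 uses 4 bytes per element.
uvel_chunk = (75, 22, 84, 241)
size = chunk_size_mib(uvel_chunk, itemsize=4)
print(f"{size:.2f} MiB")  # 127.42 MiB

# Chunks needed to cover 366 time steps at 75 steps per chunk.
n_chunks = math.ceil(366 / 75)
print(n_chunks)  # 5
```

The same arithmetic is a quick way to estimate chunk sizes before choosing a chunk length for your own data.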

## Exercise, Xarray and XGCM with Dask

First we will redo the divergence calculation from the Xarray tutorial, without any Dask chunks. Let's delete our original dataset to be safe and reload the data.

In [26]:
del ds

In [27]:
import xarray as xr
import warnings
warnings.filterwarnings("ignore")

ds = xr.open_dataset('TPOSE6_Daily_2012.nc',decode_timedelta=True)

We can inspect a variable and check that the Dask chunks are gone.

In [6]:
ds.THETA

Now let's do our divergence computation from the last lesson.

In [7]:
import xgcm 
import cmocean.cm as cmo

# create the grid object from our dataset
grid = xgcm.Grid(ds, periodic=['X','Y'])
grid

<xgcm.Grid>
Z Axis (not periodic, boundary=None):
  * center   Z
X Axis (periodic, boundary=None):
  * center   XC --> outer
  * outer    XG --> center
Y Axis (periodic, boundary=None):
  * center   YC --> outer
  * outer    YG --> center
T Axis (not periodic, boundary=None):
  * center   time

In [8]:
%%time
u_transport = ds.UVEL * ds.dyG * ds.hFacW * ds.drF
v_transport = ds.VVEL * ds.dxG * ds.hFacS * ds.drF
div_uv = (grid.diff(u_transport, 'X') + grid.diff(v_transport, 'Y')) / ds.rA  # calculate the divergence of the flow

div_uv

CPU times: user 346 ms, sys: 667 ms, total: 1.01 s
Wall time: 1.01 s


Our timer tells us that this took about 1 second. Let's clear our intermediate variables and do it again with Dask.

In [9]:
del div_uv, u_transport, v_transport

### Xarray automatically sped up with Dask

This time, let's specify chunks of 10 timesteps.

In [28]:
ds = ds.chunk({'time': 10})

In [29]:
ds.THETA

Output (abridged): array and chunk summary for `THETA`. The `time` coordinate is also split into 37 chunks of 10 steps; the remaining coordinate and grid variables are each stored in a single chunk.

| | Array | Chunk |
|---|---|---|
| Bytes | 619.23 MiB | 16.92 MiB |
| Shape | (366, 22, 84, 240) | (10, 22, 84, 240) |
| Dask graph | 37 chunks in 2 graph layers | |
| Data type | float32 numpy.ndarray | |

Fantastic. That is exactly what we would hope to see. Now, can we do our divergence calculation any faster?

In [30]:
# create the grid object from our dataset
grid = xgcm.Grid(ds, periodic=['X','Y'])
grid

<xgcm.Grid>
Z Axis (not periodic, boundary=None):
  * center   Z
X Axis (periodic, boundary=None):
  * center   XC --> outer
  * outer    XG --> center
Y Axis (periodic, boundary=None):
  * center   YC --> outer
  * outer    YG --> center
T Axis (not periodic, boundary=None):
  * center   time

In [31]:
%%time
u_transport = ds.UVEL * ds.dyG * ds.hFacW * ds.drF
v_transport = ds.VVEL * ds.dxG * ds.hFacS * ds.drF
div_uv = (grid.diff(u_transport, 'X') + grid.diff(v_transport, 'Y')) / ds.rA  # calculate the divergence of the flow

div_uv.compute()

CPU times: user 894 ms, sys: 1.05 s, total: 1.95 s
Wall time: 498 ms


We can see that without Dask, this computation took a little over 1 second. With Dask it took only about 500 ms. That is roughly a 2x speedup! When you take into account that this subset of the model output is less than 2% of the full model domain and only 20% of the model time series, that speedup starts to look pretty nice! If you are clever about when and how you chunk your data, you can get much more than a 2x speedup.

**NOTE** Some reasons we only see a 2x speedup here are that 1) these are relatively small chunks of data, 2) I am running this test on a fairly powerful laptop, and 3) there are many other processes running on this laptop (the CPUs are not dedicated to a particular task). There is some overhead to parallelization (your computer has to do some logistics in the background). Small chunks are inefficient because the overhead and the computation itself may take similar amounts of time. You can get much better performance if the CPUs and I/O are dedicated to the data analysis task (like on a supercomputer). You will have to figure out what works best for your project/data/resources (run some tests!).
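
One way to run such a test is sketched below. It is a hypothetical benchmark on synthetic random data: the array shape, the chunk lengths tried, and the `time_mean` helper are assumptions made for illustration, not part of the tutorial dataset. The measured time includes generating the random data, and the numbers will differ from machine to machine.

```python
import time

import dask.array as da

# Synthetic data standing in for a model variable (sizes are illustrative).
shape = (100, 20, 50, 50)


def time_mean(time_chunk):
    """Wall time in seconds to compute the mean with a given chunk length along axis 0."""
    x = da.random.random(shape, chunks=(time_chunk, 20, 50, 50))
    start = time.perf_counter()
    x.mean().compute()
    return time.perf_counter() - start


# Try a range of chunk lengths, from many tiny chunks to one big chunk.
timings = {n: time_mean(n) for n in (1, 10, 50, 100)}
for n, seconds in timings.items():
    print(f"time chunk = {n:3d}: {seconds:.3f} s")
```

Repeating each measurement a few times, and using your real data and computation instead of a synthetic mean, gives a more reliable picture.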


More resources for Xarray and Dask: [1](https://docs.xarray.dev/en/v2023.01.0/user-guide/dask.html), [2](https://examples.dask.org/xarray.html), [3](https://tutorial.xarray.dev/intermediate/xarray_and_dask.html)  
See [this page](https://docs.xarray.dev/en/stable/user-guide/dask.html#best-practices) for a more detailed discussion of best practices with Xarray and Dask.