# A gentle introduction

`map_blocks` is inspired by the `dask.array` function of the same name and lets
you map a function over the blocks of an xarray object (including Datasets!).

At _compute_ time, your function will receive a chunk of an xarray object with concrete
(computed) values along with appropriate metadata. This function should return
an xarray object.
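Here is a minimal sketch of that contract on a small synthetic `DataArray`. The values, the dimension names `x`/`y` and the helper `add_one` are hypothetical, and it assumes dask is installed so that `.chunk()` works. Each call to `add_one` receives one block as an ordinary xarray object and returns an xarray object. `xr.map_blocks` stitches those blocks back together lazily.

```python
import numpy as np
import xarray as xr

# Hypothetical 4 x 3 array, split into two blocks along "x" (needs dask).
da = xr.DataArray(
    np.arange(12.0).reshape(4, 3),
    dims=("x", "y"),
    name="demo",
).chunk({"x": 2})


def add_one(block: xr.DataArray) -> xr.DataArray:
    # `block` is a regular xarray object with concrete NumPy values
    # (or a 0-sized placeholder while map_blocks infers the output).
    return block + 1


lazy = xr.map_blocks(add_one, da)  # only builds a dask graph
result = lazy.compute()            # now add_one runs once per block
print(lazy.chunks)   # ((2, 2), (3,))
print(result.values)
```

The output keeps the input's chunk structure because `add_one` does not change any dimension sizes.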


## Setup

In [1]:
import dask
import numpy as np
import xarray as xr

First, let's set up a `LocalCluster` using [dask.distributed](https://distributed.dask.org/).

You can use any kind of dask cluster. This step is completely independent of
xarray. While not strictly necessary, the dashboard provides a nice learning
tool.
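Because the cluster is independent of xarray, the same code also runs without one. The sketch below uses dask's built-in `"synchronous"` scheduler via `dask.config.set`. That setting runs every task in the current thread, so `print` output stays in order and a debugger can step into your function. The small array and the doubling lambda are hypothetical examples.

```python
import dask
import numpy as np
import xarray as xr

# Hypothetical 6 x 4 array of ones, split into two blocks along "x".
da = xr.DataArray(np.ones((6, 4)), dims=("time", "x")).chunk({"x": 2})

# Inside this context, compute() uses the single-threaded scheduler
# instead of a distributed Client (or dask's default thread pool).
with dask.config.set(scheduler="synchronous"):
    result = da.map_blocks(lambda block: block * 2).compute()

print(float(result.sum()))  # 48.0
```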


In [2]:
from dask.distributed import Client

client = Client()
client

Connection method: Cluster object, Cluster type: distributed.LocalCluster
Dashboard: http://127.0.0.1:8787/status
Workers: 4 (4 threads and 3.37 GiB each), Total threads: 16, Total memory: 13.47 GiB
Status: running, Using processes: True


👆 Click the Dashboard link above, or click the "Search" button in the dashboard.

Let's test that the dashboard is working.


In [3]:
import dask.array

dask.array.ones((1000, 4), chunks=(2, 1)).compute()  # should see activity in dashboard

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       ...,
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]], shape=(1000, 4))

Let's open a dataset. We specify `chunks` so that each DataArray is backed by a dask array.
The right chunking depends on the function you want to apply to each chunk, so it is vital to set the chunks correctly. Our goal is to compute the mean along the time dimension, so we do not chunk the time dimension at all (indicated by `"time": -1`).
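To see how these chunk sizes split the data, here is a sketch on a synthetic dataset with the same shape as the tutorial's `air` variable (zeros instead of real temperatures; it assumes dask is installed). `.chunk()` takes the same mapping as the `chunks=` argument to `open_dataset`. The last `lon` block holds only the 3 leftover columns, which is why one block below is printed with shape `(2920, 5, 3)`.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in: same sizes as "air", but filled with zeros.
ds = xr.Dataset(
    {"air": (("time", "lat", "lon"), np.zeros((2920, 25, 53)))}
).chunk({"time": -1, "lat": 5, "lon": 10})

print(ds.chunks["time"])  # (2920,)  -> one block spans all time steps
print(ds.chunks["lat"])   # (5, 5, 5, 5, 5)
print(ds.chunks["lon"])   # (10, 10, 10, 10, 10, 3)

# 5 blocks along lat times 6 blocks along lon = 30 blocks in total.
n_blocks = len(ds.chunks["lat"]) * len(ds.chunks["lon"])
print(n_blocks)  # 30
```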

In [4]:
ds = xr.tutorial.open_dataset("air_temperature", chunks={"time": -1, "lat": 5, "lon": 10})
ds

| | Array | Chunk |
|---|---|---|
| Bytes | 29.52 MiB | 1.11 MiB |
| Shape | (2920, 25, 53) | (2920, 5, 10) |
| Dask graph | 30 chunks in 2 graph layers | 30 chunks in 2 graph layers |
| Data type | float64 numpy.ndarray | float64 numpy.ndarray |


## Simple example

Here is an example function. It prints what it receives and returns the mean along `time`:

In [5]:
def time_mean(obj: xr.Dataset):
    # use xarray's convenient API here
    # you could convert to a pandas dataframe and use pandas' extensive API
    # or use .plot() and plt.savefig to save visualizations to disk in parallel.
    print(f"received obj of type {type(obj)}")
    print("It contains the following data variables:")
    for data_var in obj.data_vars:
        print(f"'{data_var}' with shape {obj[data_var].shape}")

    return obj.mean("time")


ds.map_blocks(time_mean)  # this is lazy!

received obj of type <class 'xarray.core.dataset.Dataset'>
It contains the following data variables:
'air' with shape (0, 0, 0)


| | Array | Chunk |
|---|---|---|
| Bytes | 10.35 kiB | 400 B |
| Shape | (25, 53) | (5, 10) |
| Dask graph | 30 chunks in 4 graph layers | 30 chunks in 4 graph layers |
| Data type | float64 numpy.ndarray | float64 numpy.ndarray |


In [6]:
# this triggers the actual computation
ds.map_blocks(time_mean).compute()

received obj of type <class 'xarray.core.dataset.Dataset'>
It contains the following data variables:
'air' with shape (0, 0, 0)
received obj of type <class 'xarray.core.dataset.Dataset'>
It contains the following data variables:
'air' with shape (2920, 5, 10)
received obj of type <class 'xarray.core.dataset.Dataset'>
It contains the following data variables:
'air' with shape (2920, 5, 10)
received obj of type <class 'xarray.core.dataset.Dataset'>
It contains the following data variables:
'air' with shape (2920, 5, 10)
received obj of type <class 'xarray.core.dataset.Dataset'>
It contains the following data variables:
'air' with shape (2920, 5, 10)
received obj of type <class 'xarray.core.dataset.Dataset'>
It contains the following data variables:
'air' with shape (2920, 5, 3)
received obj of type <class 'xarray.core.dataset.Dataset'>
It contains the following data variables:
'air' with shape (2920, 5, 10)
received obj of type <class 'xarray.core.dataset.Dataset'>
It contains the follow

In [7]:
# this will calculate values and will return True if the computation works as expected
ds.map_blocks(time_mean).equals(ds.mean("time"))

received obj of type <class 'xarray.core.dataset.Dataset'>
It contains the following data variables:
'air' with shape (0, 0, 0)
received obj of type <class 'xarray.core.dataset.Dataset'>
It contains the following data variables:
'air' with shape (2920, 5, 10)
received obj of type <class 'xarray.core.dataset.Dataset'>
It contains the following data variables:
'air' with shape (2920, 5, 10)
received obj of type <class 'xarray.core.dataset.Dataset'>
It contains the following data variables:
'air' with shape (2920, 5, 10)
received obj of type <class 'xarray.core.dataset.Dataset'>
It contains the following data variables:
'air' with shape (2920, 5, 10)
received obj of type <class 'xarray.core.dataset.Dataset'>received obj of type <class 'xarray.core.dataset.Dataset'>
received obj of type <class 'xarray.core.dataset.Dataset'>
It contains the following data variables:
'air' with shape (2920, 5, 10)

It contains the following data variables:
'air' with shape (2920, 5, 10)
It contains the follo

True
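`equals` requires exact agreement. A block-wise mean and a dask reduction can sum values in a different order, so for floating-point results a tolerance-based comparison is often the safer check. The sketch below uses `xr.testing.assert_allclose` on small random data (hypothetical sizes and values; requires dask). It raises an `AssertionError` describing the mismatch if the two results differ beyond the default tolerances.

```python
import numpy as np
import xarray as xr

# Hypothetical data: time unchunked, lat/lon split into blocks.
rng = np.random.default_rng(0)
ds = xr.Dataset(
    {"air": (("time", "lat", "lon"), rng.normal(280.0, 10.0, size=(50, 6, 8)))}
).chunk({"time": -1, "lat": 3, "lon": 4})


def time_mean(obj):
    return obj.mean("time")


blockwise = ds.map_blocks(time_mean).compute()
reference = ds.mean("time").compute()

# Passes silently when the results agree within tolerance.
xr.testing.assert_allclose(blockwise, reference)
print("results match")
```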

### Exercises


#### Exercise 1

When opening the dataset, set the chunk size along the time dimension to anything smaller than its length (< 2920), e.g. `"time": 100`:

```python
ds = xr.tutorial.open_dataset("air_temperature", chunks={"time": 100, "lat": 25, "lon": 53})
```

The result of `ds.map_blocks(time_mean)` is no longer equivalent to `ds.mean("time")`. Why does `ds.map_blocks(time_mean)` return a different result this time? Examine the shape of the chunks passed to `time_mean`.
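As a hint, recall that each call to `time_mean` only sees one block. The plain NumPy arithmetic below uses a hypothetical six-step time series split into blocks of two. Each block's mean describes only its own slice of time and differs from the mean over the whole series.

```python
import numpy as np

# Hypothetical time series with 6 steps, split into 3 blocks of 2 steps.
series = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 12.0])
blocks = np.split(series, 3)  # [1, 2], [3, 4], [5, 12]

per_block_means = [float(b.mean()) for b in blocks]
print(per_block_means)        # [1.5, 3.5, 8.5]
print(float(series.mean()))   # 4.5
```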

#### Exercise 2

Try applying the following function with `map_blocks`. Pass `scale` as a
positional argument and `offset` as a keyword argument.

The docstring should help:
https://docs.xarray.dev/en/stable/generated/xarray.map_blocks.html

```python
def time_mean_scaled(obj, scale, offset):
    # mean over the unchunked time dimension, then rescale and shift
    return obj.mean("time") * scale + offset
```
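For reference, here is the general pattern with a different, hypothetical function `shift_and_multiply` on a small synthetic array (requires dask). Positional extras go in the `args` sequence and keyword extras in the `kwargs` dict. `map_blocks` forwards both to every call of the function.

```python
import numpy as np
import xarray as xr

# Hypothetical 2 x 4 array, split into two blocks along "lon".
da = xr.DataArray(np.arange(8.0).reshape(2, 4), dims=("lat", "lon")).chunk({"lon": 2})


def shift_and_multiply(obj, factor, shift=0.0):
    # Hypothetical helper: (obj + shift) * factor, applied per block.
    return (obj + shift) * factor


result = xr.map_blocks(
    shift_and_multiply,
    da,
    args=[10.0],              # -> factor
    kwargs={"shift": 1.0},    # -> shift
)
print(result.compute().values)
# [[10. 20. 30. 40.]
#  [50. 60. 70. 80.]]
```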

### More advanced functions

`map_blocks` needs to know _exactly_ what the returned object looks like. It
finds out by passing a 0-shaped xarray object to the function and examining the
result; this is why the first message printed above reports a shape of `(0, 0, 0)`.
This approach cannot work in all cases. For such advanced use cases,
`map_blocks` accepts a `template` kwarg. See
https://docs.xarray.dev/en/stable/user-guide/dask.html#map-blocks for more details.
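Below is a sketch of `template` on small synthetic data (hypothetical sizes; requires dask). The lazy `ds.mean("time")` already has the dimensions, dtype and chunks that the block-wise result should have, so it serves as the template. The function records the shape of every block it receives. The compute step runs under the `"synchronous"` scheduler so that the list is filled in the current session even if a distributed `Client` is active. As we read the `map_blocks` docstring, supplying a template means the function is not run on 0-sized mock data, and a template is required when the function changes the size of existing dimensions.

```python
import dask
import numpy as np
import xarray as xr

# Hypothetical data: 20 time steps, lat/lon split into 2 x 2 blocks.
rng = np.random.default_rng(0)
ds = xr.Dataset(
    {"air": (("time", "lat", "lon"), rng.random((20, 4, 6)))}
).chunk({"time": -1, "lat": 2, "lon": 3})

shapes_seen = []


def time_mean(obj):
    shapes_seen.append(obj["air"].shape)  # record what each call receives
    return obj.mean("time")


# The lazy reduction describes the expected output: dims, dtype and chunks.
template = ds.mean("time")

with dask.config.set(scheduler="synchronous"):
    result = ds.map_blocks(time_mean, template=template).compute()

print(sorted(set(shapes_seen)))  # [(20, 2, 3)], no (0, 0, 0) probe call
```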


In [8]:
client.close()