# A gentle introduction

`map_blocks` is inspired by the `dask.array` function of the same name and lets
you map a function over the blocks of an xarray object (including Datasets!).

At _compute_ time, your function will receive a chunk of an xarray object with concrete
(computed) values along with appropriate metadata. This function should return
an xarray object.
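
As a rough illustration of the paragraph above, the sketch below checks that the function sees in-memory NumPy data while the result of `map_blocks` stays lazy until `.compute()` is called. The dataset `demo` and the function `block_mean` are made-up names on a small synthetic array, not part of this tutorial's data.

```python
import dask.array
import numpy as np
import xarray as xr

# Small synthetic dataset (a hypothetical stand-in for real data), split into 4 blocks.
values = np.arange(4 * 6 * 8, dtype="float64").reshape(4, 6, 8)
demo = xr.Dataset(
    {"air": (("time", "lat", "lon"), values)},
    coords={"time": np.arange(4), "lat": np.arange(6), "lon": np.arange(8)},
).chunk({"time": -1, "lat": 3, "lon": 4})


def block_mean(obj):
    # Whenever the function runs, the block holds in-memory NumPy data, not dask.
    assert isinstance(obj["air"].data, np.ndarray)
    return obj.mean("time")


lazy = demo.map_blocks(block_mean)  # builds a task graph; block values are not computed yet
lazy_is_dask = isinstance(lazy["air"].data, dask.array.Array)

result = lazy.compute()  # block_mean now runs on each block
matches_numpy = np.allclose(result["air"].values, values.mean(axis=0))
print(lazy_is_dask, matches_numpy)
```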


## Setup

In [1]:
import dask
import numpy as np
import xarray as xr

First, let's set up a `LocalCluster` using [dask.distributed](https://distributed.dask.org/).

You can use any kind of dask cluster. This step is completely independent of
xarray. While not strictly necessary, the dashboard provides a nice learning
tool.


In [2]:
from dask.distributed import Client

client = Client()
client

Connection method: Cluster object
Cluster type: distributed.LocalCluster
Dashboard: http://127.0.0.1:8787/status
Workers: 4
Total threads: 16
Total memory: 31.09 GiB
Status: running
Using processes: True
Per worker: Total threads: 4, Memory: 7.77 GiB


👆 Click the Dashboard link above. Or click the "Search" button in the dashboard.

Let's test that the dashboard is working.


In [3]:
import dask.array

dask.array.ones((1000, 4), chunks=(2, 1)).compute()  # should see activity in dashboard

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       ...,
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]], shape=(1000, 4))

Let's open a dataset. We specify `chunks` so that the DataArrays are backed by dask arrays.

In [10]:
ds = xr.tutorial.open_dataset("air_temperature", chunks={"time": -1, "lat": 25, "lon": 53})
ds

              Array             Chunk
Bytes         29.52 MiB         29.52 MiB
Shape         (2920, 25, 53)    (2920, 25, 53)
Dask graph    1 chunks in 2 graph layers
Data type     float64 numpy.ndarray
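
To see how a dataset ended up chunked, you can inspect `.chunks` and the type of the underlying array. The sketch below uses a small synthetic dataset (named `small` here purely for illustration) so that it runs without downloading the tutorial data.

```python
import dask.array
import numpy as np
import xarray as xr

# Hypothetical toy dataset; -1 means "one chunk spanning the whole dimension".
small = xr.Dataset(
    {"air": (("time", "lat", "lon"), np.zeros((4, 6, 8)))}
).chunk({"time": -1, "lat": 3, "lon": 4})

# Mapping from dimension name to the tuple of chunk sizes along it.
chunk_layout = dict(small.chunks)

# After chunking, the variable wraps a dask array instead of a NumPy array.
is_dask = isinstance(small["air"].data, dask.array.Array)

print(chunk_layout)
print(is_dask)
```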


## Simple example

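Here is an example that averages each block over the `time` dimension. The `print` calls show what `map_blocks` passes to the function.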

In [12]:
def time_mean(obj: xr.Dataset):
    # use xarray's convenient API here
    # you could convert to a pandas dataframe and use pandas' extensive API
    # or use .plot() and plt.savefig to save visualizations to disk in parallel.
    print(f"received obj of type {type(obj)}")
    print("It contains the following data variables:")
    for data_var in obj.data_vars:
        print(f"'{data_var}' with shape {obj[data_var].shape}")

    return obj.mean("time")


ds.map_blocks(time_mean)  # this is lazy!

received obj of type <class 'xarray.core.dataset.Dataset'>
It contains the following data variables:
'air' with shape (0, 0, 0)


              Array        Chunk
Bytes         10.35 kiB    10.35 kiB
Shape         (25, 53)     (25, 53)
Dask graph    1 chunks in 4 graph layers
Data type     float64 numpy.ndarray


In [13]:
ds.map_blocks(time_mean).compute()

received obj of type <class 'xarray.core.dataset.Dataset'>
It contains the following data variables:
'air' with shape (0, 0, 0)
received obj of type <class 'xarray.core.dataset.Dataset'>
It contains the following data variables:
'air' with shape (2920, 25, 53)


In [15]:
# this will calculate values and will return True if the computation works as expected
ds.map_blocks(time_mean).identical(ds.mean("time"))

received obj of type <class 'xarray.core.dataset.Dataset'>
It contains the following data variables:
'air' with shape (0, 0, 0)
received obj of type <class 'xarray.core.dataset.Dataset'>
It contains the following data variables:
'air' with shape (2920, 25, 53)


True

### Exercises


#### Exercise 1

Change the chunk size along the `time` dimension to anything smaller than the length of that dimension (< 2920).

```python
ds = xr.tutorial.open_dataset("air_temperature", chunks={"time": -1, "lat": 25, "lon": 53})
```

#### Exercise 2

Try applying the following function with `map_blocks`. Specify `scale` as an
argument and `offset` as a kwarg.

The docstring should help:
https://docs.xarray.dev/en/stable/generated/xarray.map_blocks.html

```python
def time_mean_scaled(obj, scale, offset):
    return obj.mean("lat") * scale + offset
```
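
One possible way to approach this exercise is sketched below: positional arguments go in `args` and keyword arguments go in `kwargs`. The synthetic dataset `demo` is an assumption made so that the snippet runs on its own. Its `lat` dimension is kept in a single chunk because the function reduces over `lat`.

```python
import numpy as np
import xarray as xr

# Hypothetical toy data standing in for the tutorial dataset.
values = np.arange(4 * 6 * 8, dtype="float64").reshape(4, 6, 8)
demo = xr.Dataset(
    {"air": (("time", "lat", "lon"), values)},
    coords={"time": np.arange(4), "lat": np.arange(6), "lon": np.arange(8)},
).chunk({"time": 2, "lat": -1, "lon": 4})


def time_mean_scaled(obj, scale, offset):
    return obj.mean("lat") * scale + offset


# scale is passed positionally through args, offset by name through kwargs.
scaled = demo.map_blocks(time_mean_scaled, args=[2], kwargs={"offset": 1}).compute()

# Compare against the same calculation done directly with NumPy.
expected = values.mean(axis=1) * 2 + 1
print(np.allclose(scaled["air"].values, expected))
```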

### More advanced functions

`map_blocks` needs to know _exactly_ what the returned object looks like. It
does so by passing a 0-shaped xarray object to the function and examining the
result. This approach cannot work in all cases. For such advanced use cases,
`map_blocks` accepts a `template` kwarg. See
https://docs.xarray.dev/en/stable/user-guide/dask.html#map-blocks for more details.
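
A minimal sketch of the `template` kwarg follows, assuming a function that cannot handle the 0-shaped probe object. The names `demo` and `time_mean_strict` are hypothetical. The template here is simply the lazy result of the equivalent xarray reduction, which already has the dimensions, coordinates, dtype, and chunks expected of the output.

```python
import numpy as np
import xarray as xr

# Hypothetical toy data, chunked into 4 blocks along lat and lon.
values = np.arange(4 * 6 * 8, dtype="float64").reshape(4, 6, 8)
demo = xr.Dataset(
    {"air": (("time", "lat", "lon"), values)},
    coords={"time": np.arange(4), "lat": np.arange(6), "lon": np.arange(8)},
).chunk({"time": -1, "lat": 3, "lon": 4})


def time_mean_strict(obj):
    # Rejects empty input, so automatic inference with a 0-shaped object would fail.
    if obj.sizes["time"] == 0:
        raise ValueError("cannot process an empty block")
    return obj.mean("time")


# Lazy, dask-backed description of what the result should look like.
template = demo.mean("time")

# With a template supplied, the function is only called on real blocks.
result = demo.map_blocks(time_mean_strict, template=template).compute()
print(np.allclose(result["air"].values, values.mean(axis=0)))
```

The template is never computed in this sketch; only its metadata and chunk structure are used to describe the output.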


In [8]:
client.close()