# ERASTAR: Opening the datacube with Dask

In the previous part, we created a datacube. Here, using a few simple operations, we look at how to use and configure this datacube so that it works efficiently with Dask.

In [1]:
import os
import dask
import intake
import xarray as xr
import dask_jobqueue
import dask.distributed

from pathlib import Path

## Dask cluster

In [2]:
# Create the Dask cluster
cluster = dask_jobqueue.PBSCluster(queue='mpi', 
                                   cores=28,
                                   memory="115GB",
                                   walltime="04:00:00",
                                   interface='ib0',
                                   local_directory='/tmp',
                                   log_directory='/home1/scratch/gcaer/dask-logs',
                                   #processes=,
                                  )
cluster.scale(jobs=2)

Perhaps you already have a cluster running?
Hosting the HTTP server on port 50395 instead


In [3]:
client = dask.distributed.Client(cluster, timeout=600)
client

| Field | Value |
|---|---|
| Connection method | Cluster object |
| Cluster type | dask_jobqueue.PBSCluster |
| Dashboard | http://10.148.1.123:50395/status |
| Workers | 0 |
| Total threads | 0 |
| Total memory | 0 B |
| Scheduler comm | tcp://10.148.1.123:35518 |
| Started | Just now |


In [4]:
port = client.scheduler_info()["services"]["dashboard"]
ssh_command = f'ssh -N -L {port}:{os.environ["HOSTNAME"]}:{port} {os.environ["USER"]}@datarmor.ifremer.fr'

print(ssh_command)
print(f"open browser at address of the type: http://localhost:{port}")

ssh -N -L 50395:r2i2n25:50395 gcaer@datarmor.ifremer.fr
open browser at address of the type: http://localhost:50395


## Opening the datacube

In [5]:
# Path to the datacube
wrk = Path('/home/datawork-data-terra/odatis/data')
name = 'erastar'
version = 'datacube-year'
output = wrk / name / version

In [6]:
def open_datacube(output, chunks, start, end):
    """Open the yearly datacubes from `start` to `end` (inclusive) and concatenate them along time."""
    # Variables dropped from each yearly dataset (named so as not to shadow the built-in `vars`)
    drop_names = ["es_u10s", "es_v10s", "e5_u10s", "e5_v10s", "count", "quality_flag"]
    cat = intake.open_catalog(output / "reference.yaml")
    datacubes = [
        cat[str(year)](chunks=chunks).to_dask().drop_vars(drop_names)
        for year in range(start, end + 1)
    ]
    datacube = xr.concat(datacubes, dim="time")
    return datacube

We start by opening the datacube without specifying a chunk size.

In [22]:
%%time
chunks = {}
start, end = 2015, 2017
datacube = open_datacube(output, chunks, start, end)



CPU times: user 1.38 s, sys: 112 ms, total: 1.5 s
Wall time: 1.49 s




In [23]:
datacube

Output (identical for each of the 4 data variables):

| | Array | Chunk |
|---|---|---|
| Bytes | 406.38 GiB | 15.82 MiB |
| Shape | (26304, 1440, 2880) | (1, 1440, 2880) |
| Dask graph | 26304 chunks in 7 graph layers | |
| Data type | float32 numpy.ndarray | |

By default the chunks have the size `{"time": 1, "latitude": 1440, "longitude": 2880}`, which matches the native chunking of the netCDF files.
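The 15.82 MiB chunk size reported above follows directly from the chunk shape and the `float32` dtype. Here is a minimal sketch of that arithmetic in plain Python; the helper name `chunk_nbytes` is ours and is not part of Dask or xarray.

```python
from math import prod

FLOAT32_BYTES = 4  # one float32 value occupies 4 bytes


def chunk_nbytes(shape, itemsize=FLOAT32_BYTES):
    """Return the in-memory size, in bytes, of an array block of the given shape."""
    return prod(shape) * itemsize


native_chunk = (1, 1440, 2880)     # (time, latitude, longitude)
array_shape = (26304, 1440, 2880)  # 26304 time steps for 2015-2017

chunk_mib = chunk_nbytes(native_chunk) / 2**20
array_gib = chunk_nbytes(array_shape) / 2**30
n_chunks = array_shape[0] // native_chunk[0]

print(f"{chunk_mib:.2f} MiB per chunk")     # 15.82 MiB per chunk
print(f"{array_gib:.2f} GiB per variable")  # 406.38 GiB per variable
print(f"{n_chunks} chunks per variable")    # 26304 chunks per variable
```

With one time step per chunk, each variable is therefore split into as many chunks as there are time steps.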

## Task Graph

We now look at the task graph that corresponds to loading this datacube.

In [13]:
dask.delayed(datacube).dask

Output: 30 layers, listed in order for `es_tauv`, `es_tauu`, `e5_tauv` and `e5_tauu`. In the table, `<var>` stands for the variable name and `<hash>` for the per-dataset token (`92e794d8904c6e84282faa9323fefa8c`, `4b5e865f4febfe5e853935953e0e2236`, `583b1913ebefb688a679d9134093bbdc`). All array layers have dtype float32, type dask.array.core.Array and chunk_type numpy.ndarray.

| Layers | layer_type | is_materialized | number of outputs | shape | chunksize | depends on |
|---|---|---|---|---|---|---|
| 12 (one per variable and year) | MaterializedLayer | True | 1 | | | |
| 12 (one per variable and year) | Blockwise | False | 8760, 8784, 8760 | (8760 or 8784, 1440, 2880) | (1, 1440, 2880) | `original-open_dataset-<var>-<hash>` |
| 4 (one per variable) | MaterializedLayer | True | 26304 | (26304, 1440, 2880) | (1, 1440, 2880) | the three `open_dataset-<var>-<hash>` layers |
| 1 | MaterializedLayer | True | 1 | | | `concatenate-e67c2c119a4c2b85dcadc7afd637dcc8`, `concatenate-3986947ad47c4c87a23f1971f2a8c3b4`, `concatenate-c1e00fef91f9fd1d96c191ab50ea1d0e`, `concatenate-5dcbd234a7139d921079546801344330` |
| 1 | MaterializedLayer | True | 1 | | | `finalize-e179663c-d2a7-4d62-906f-aa812f5555ea` |

Loading the full datacube requires **30 layers** and **210 446 tasks**. This is a large task graph, so it is worth checking how Dask behaves when we select only a slice of the datacube.

In [24]:
%%time
result = datacube.isel(time=slice(0,12)).compute()

CPU times: user 1.12 s, sys: 1.06 s, total: 2.18 s
Wall time: 3.26 s


We obtain the following task graph:
  
<img src="./images/datacube-chunks-natif.png">

- `original`: there are 4 `original` tasks, which correspond to accessing the datasets. There are 4 of them because the datacube holds 4 variables (`e5_tauu`, `e5_tauv`, `es_tauu`, `es_tauv`), and Dask processes each variable in parallel.
- `getitem`: there are 48 `getitem` tasks. The variable `e5_tauu` spans **12 chunks** over the selection `time=slice(0, 12)`, and with 4 variables that makes 4 * 12 = 48 chunks opened, each of which is read in full.

```Python
datacube.isel(time=slice(0,12)).e5_tauu.data
```

<img src="./images/datacube-isel-chunks.png">

The first thing to note is that this operation did not execute all **210 446 tasks**: the number of tasks actually run dropped drastically, because Dask only touches the chunks involved in the operation.

The second is that the number of tasks can be reduced further. To do so, we redefine the chunk size when reading the datacube (these are virtual chunks, but they let Dask work more efficiently).

## Optimizing the chunk size

In [26]:
%%time
chunks = {"time": 6, "latitude": -1, "longitude": -1}
datacube = open_datacube(output, chunks, start, end)



CPU times: user 1.34 s, sys: 116 ms, total: 1.45 s
Wall time: 1.45 s




In [27]:
dask.delayed(datacube).dask

Output: 30 layers, listed in order for `es_tauv`, `es_tauu`, `e5_tauv` and `e5_tauu`. In the table, `<var>` stands for the variable name and `<hash>` for the per-dataset token (`6e4468c15ef36bfca0863863d6ca9d99`, `7408dc9a6965ba35a2bb4084eedf4038`, `95b6265ae7ebcd29481bfa094b60f7f7`). All array layers have dtype float32, type dask.array.core.Array and chunk_type numpy.ndarray.

| Layers | layer_type | is_materialized | number of outputs | shape | chunksize | depends on |
|---|---|---|---|---|---|---|
| 12 (one per variable and year) | MaterializedLayer | True | 1 | | | |
| 12 (one per variable and year) | Blockwise | False | 1460, 1464, 1460 | (8760 or 8784, 1440, 2880) | (6, 1440, 2880) | `original-open_dataset-<var>-<hash>` |
| 4 (one per variable) | MaterializedLayer | True | 4384 | (26304, 1440, 2880) | (6, 1440, 2880) | the three `open_dataset-<var>-<hash>` layers |
| 1 | MaterializedLayer | True | 1 | | | `concatenate-47f0fa9ffd17f692b618af0a846d2a83`, `concatenate-1bec79754867c80f645aeb2f0c3e4fa8`, `concatenate-6627f53439e93b61e5ca4d0435263839`, `concatenate-c663271aa38736da2d3a3fa6047fdd32` |
| 1 | MaterializedLayer | True | 1 | | | `finalize-63ba5ddd-8de4-4775-b470-b5dbe655e417` |

The task graph for loading the full datacube went from 30 layers with 210 446 tasks to **30 layers** with **35 086 tasks**, a considerable reduction. Selecting a slice again gives the following results:
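Both task counts can be reproduced from the layer listings above. The sketch below assumes the structure visible in those listings: for each of the 4 variables, one single-task layer and one layer of per-chunk tasks for each year, then one `concatenate` layer, and finally two single-task layers that close the graph. The function name `graph_size` is illustrative, not a Dask API.

```python
from math import ceil

TIME_STEPS_PER_YEAR = [8760, 8784, 8760]  # 2015, 2016 (leap year), 2017
N_VARIABLES = 4                           # e5_tauu, e5_tauv, es_tauu, es_tauv


def graph_size(time_chunk):
    """Return (layers, tasks) for loading the whole datacube."""
    chunks_total = sum(ceil(n / time_chunk) for n in TIME_STEPS_PER_YEAR)
    n_years = len(TIME_STEPS_PER_YEAR)
    per_variable_tasks = (
        n_years         # one `original` task per yearly dataset
        + chunks_total  # one read task per chunk
        + chunks_total  # one `concatenate` output per chunk
    )
    per_variable_layers = 2 * n_years + 1
    # The last two layers of the listing each hold a single task.
    layers = N_VARIABLES * per_variable_layers + 2
    tasks = N_VARIABLES * per_variable_tasks + 2
    return layers, tasks


for time_chunk in (1, 6):
    layers, tasks = graph_size(time_chunk)
    print(f"time chunk = {time_chunk}: {layers} layers, {tasks} tasks")
```

The count is dominated by the two per-chunk layers, which is why grouping 6 time steps per chunk divides the graph size by roughly 6 while the number of layers stays the same.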

In [29]:
%%time
result = datacube.isel(time=slice(0,12)).compute()

CPU times: user 964 ms, sys: 1.59 s, total: 2.56 s
Wall time: 2.83 s


We obtain the following task graph:

<img src="./images/datacube-chunks-opti.png">

- `original`: there are still 4 `original` tasks, as expected, since the datacube still holds 4 variables.
- `getitem`: there are now only 8 `getitem` tasks, because only 4 * 2 = 8 chunks are opened, each of which is read in full.
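The number of `getitem` tasks in both runs can be checked by counting the chunks that overlap the selected time range. Here is a minimal sketch, assuming the chunks are aligned on index 0 of the `time` axis; `chunks_touched` is an illustrative helper, not a Dask function.

```python
def chunks_touched(start, stop, chunk_size):
    """Count the chunks overlapping the half-open index range [start, stop)."""
    if stop <= start:
        return 0
    first = start // chunk_size       # index of the chunk holding `start`
    last = (stop - 1) // chunk_size   # index of the chunk holding the last element
    return last - first + 1


N_VARIABLES = 4  # e5_tauu, e5_tauv, es_tauu, es_tauv

for chunk_size in (1, 6):
    n_getitem = N_VARIABLES * chunks_touched(0, 12, chunk_size)
    print(f"time chunk = {chunk_size}: {n_getitem} getitem tasks")
```

This gives 48 tasks for the native chunks and 8 for chunks of 6 time steps. A selection that is not aligned on chunk boundaries, for example `time=slice(3, 9)` with chunks of 6, still touches 2 chunks.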

This shows how much it matters to choose a sensible chunk size: beyond the number of tasks, chunk size also affects execution time (3.26 s versus 2.83 s wall time for the same selection in these runs), so it should not be overlooked.