# AVISO: Opening the datacube with Dask

In the previous part we created a datacube. Here we use a few simple operations to see how to use and configure this datacube so that it works efficiently with Dask.

In [1]:
import os
import dask
import intake
import xarray as xr
import dask_jobqueue
import dask.distributed

from pathlib import Path

## Dask cluster

In [2]:
# Create the Dask cluster
cluster = dask_jobqueue.PBSCluster(queue='mpi', 
                                   cores=28,
                                   memory="115GB",
                                   walltime="04:00:00",
                                   interface='ib0',
                                   local_directory='/tmp',
                                   log_directory='/home1/scratch/gcaer/dask-logs',
                                   #processes=,
                                  )
cluster.scale(jobs=1)

Perhaps you already have a cluster running?
Hosting the HTTP server on port 52595 instead


In [3]:
client = dask.distributed.Client(cluster, timeout=600)
client

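| Section | Field | Value |
|---|---|---|
| Client | Connection method | Cluster object |
| Client | Cluster type | dask_jobqueue.PBSCluster |
| Client | Dashboard | http://10.148.1.134:52595/status |
| Cluster | Dashboard | http://10.148.1.134:52595/status |
| Cluster | Workers | 0 |
| Cluster | Total threads | 0 |
| Cluster | Total memory | 0 B |
| Scheduler | Comm | tcp://10.148.1.134:40204 |
| Scheduler | Dashboard | http://10.148.1.134:52595/status |
| Scheduler | Workers | 0 |
| Scheduler | Total threads | 0 |
| Scheduler | Total memory | 0 B |
| Scheduler | Started | Just now |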


In [4]:
port = client.scheduler_info()["services"]["dashboard"]
ssh_command = f'ssh -N -L {port}:{os.environ["HOSTNAME"]}:{port} {os.environ["USER"]}@datarmor.ifremer.fr'

print(f"{ssh_command}")
print(f"open browser at address of the type: http://localhost:{port}")

ssh -N -L 52595:r2i2n31:52595 gcaer@datarmor.ifremer.fr
open browser at address of the type: http://localhost:52595


## Opening the datacube

In [5]:
# Path to the datacube
wrk = Path('/home/datawork-data-terra/odatis/data')
name = 'aviso'
version = 'datacube-year'
output = wrk / name / version

In [6]:
def open_datacube(output, chunks, start, end):
    """Open the yearly datacubes from `start` to `end` (inclusive) and concatenate them along time."""
    # Variables that are not needed here; they are dropped right after opening
    drop_variables = [
        "crs", "lat_bnds", "lon_bnds", "ugosa", "err_ugosa", "vgosa",
        "err_vgosa", "ugos", "vgos", "flag_ice", "tpa_correction", "nv",
    ]
    cat = intake.open_catalog(output / "reference.yaml")
    datacubes = [
        cat[str(year)](chunks=chunks).to_dask().drop_vars(drop_variables)
        for year in range(start, end + 1)
    ]
    return xr.concat(datacubes, dim="time")

We start by opening the single datacube (method 1) without specifying the chunk size.

In [7]:
%%time
chunks = {}
start, end = 2015, 2017
datacube = open_datacube(output, chunks, start, end)

CPU times: user 20.2 s, sys: 1.96 s, total: 22.2 s
Wall time: 29.9 s


In [8]:
datacube

The same summary is displayed for each of the three data variables:

| | Array | Chunk |
|---|---|---|
| Bytes | 8.47 GiB | 19.53 kiB |
| Shape | (1096, 720, 1440) | (1, 50, 50) |
| Dask graph | 476760 chunks in 7 graph layers | |
| Data type | float64 numpy.ndarray | |


The default chunks have the size `{"time": 1, "latitude": 50, "longitude": 50}`. These are the native chunks of the netCDF files.
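The chunk count and chunk size reported in the summary follow directly from the array shape and the chunk shape. Here is a minimal sketch of that arithmetic in plain Python; the helper name `chunk_grid` is hypothetical and is not part of xarray or Dask.

```python
import math

def chunk_grid(shape, chunk_shape):
    """Number of chunks along each dimension (ceiling division)."""
    return tuple(math.ceil(n / c) for n, c in zip(shape, chunk_shape))

shape = (1096, 720, 1440)    # (time, latitude, longitude), as displayed above
native_chunks = (1, 50, 50)  # default chunking reported by xarray
itemsize = 8                 # bytes per float64 value

grid = chunk_grid(shape, native_chunks)
n_chunks = math.prod(grid)
chunk_bytes = math.prod(native_chunks) * itemsize

print(grid)                             # (1096, 15, 29)
print(n_chunks)                         # 476760
print(f"{chunk_bytes / 1024:.2f} KiB")  # 19.53 KiB
```

Since 720 and 1440 are not multiples of 50, the last chunk along latitude and longitude is smaller than the others, which is why a ceiling division is used.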

## Task Graph

We now look at the task graph corresponding to loading this datacube.

In [9]:
dask.delayed(datacube).dask

HighLevelGraph with 23 layers. The first seven rows below are repeated for each of the three variables (`<var>` is `sla`, `err_sla` or `adt`), followed by two closing layers. All array layers have dtype float64, type dask.array.core.Array and chunk_type numpy.ndarray.

| Layer | layer_type | is_materialized | number of outputs | shape | chunksize |
|---|---|---|---|---|---|
| `original-open_dataset-<var>-297df2173a269d3c807601d77152b2e4` | MaterializedLayer | True | 1 | | |
| `open_dataset-<var>-297df2173a269d3c807601d77152b2e4` | Blockwise | False | 158775 | (365, 720, 1440) | (1, 50, 50) |
| `original-open_dataset-<var>-561de9066c14787b37644cd368fd2b2f` | MaterializedLayer | True | 1 | | |
| `open_dataset-<var>-561de9066c14787b37644cd368fd2b2f` | Blockwise | False | 159210 | (366, 720, 1440) | (1, 50, 50) |
| `original-open_dataset-<var>-ee0ed287cdc557ac5db25b974fe11157` | MaterializedLayer | True | 1 | | |
| `open_dataset-<var>-ee0ed287cdc557ac5db25b974fe11157` | Blockwise | False | 158775 | (365, 720, 1440) | (1, 50, 50) |
| concatenate layer (depends on the three `open_dataset-<var>-...` layers) | MaterializedLayer | True | 476760 | (1096, 720, 1440) | (1, 50, 50) |
| layer depending on `concatenate-d8dab8059d63060ba86b1c84d7b360cc`, `concatenate-05384e74d478e5a08b5f527557b51da0` and `concatenate-ce7490627f2c360ea65897cda0266736` | MaterializedLayer | True | 1 | | |
| layer depending on `finalize-c21ba912-ae6b-452b-a372-c55161f19d96` | MaterializedLayer | True | 1 | | |


Fully loading this datacube requires **23 layers** and **2,860,571 tasks**. That is a large task graph, so it is worth seeing how Dask behaves when only a slice of the datacube is selected.
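The totals can be cross-checked by summing the "number of outputs" of every layer in the graph output. This is a minimal sketch with the counts copied from that output; grouping the layers per variable plus two closing layers is one reading of the listing, not something Dask reports directly.

```python
# "number of outputs" of the three Blockwise layers of one variable
blockwise_outputs = [158775, 159210, 158775]
n_variables = 3  # sla, err_sla, adt

# The concatenate layer has one output per chunk of the full array
concatenate_outputs = sum(blockwise_outputs)

# Per variable: one single-output layer per Blockwise layer,
# the Blockwise layers themselves, and the concatenate layer
per_variable_tasks = len(blockwise_outputs) + sum(blockwise_outputs) + concatenate_outputs
per_variable_layers = 2 * len(blockwise_outputs) + 1

# Two single-output layers close the graph
total_layers = n_variables * per_variable_layers + 2
total_tasks = n_variables * per_variable_tasks + 2

print(total_layers)  # 23
print(total_tasks)   # 2860571
```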

In [10]:
%%time
result = datacube.isel(time=slice(0,2)).compute()

CPU times: user 9.25 s, sys: 872 ms, total: 10.1 s
Wall time: 23.4 s


We obtain the following task graph:
  
<img src="./images/datacube-chunks-natif.png">

- `original`: There are 3 `original` tasks, which correspond to accessing the datasets. There are 3 of them because the datacube has 3 variables (`adt`, `sla` and `err_sla`), and Dask processes each variable in parallel.
- `getitem`: There are 2,610 `getitem` tasks. Over the period `time=slice(0,2)`, the variable `sla` has **870 chunks**. With 3 variables, that makes 3 * 870 chunks opened, and each of them is read in full.

```Python
datacube.isel(time=slice(0,2)).sla.data
```

<img src="./images/datacube-isel-chunks.png">
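The figure of 870 chunks can be reproduced by counting the chunks that overlap the selection along each dimension. The sketch below assumes the native chunking `(1, 50, 50)`; the helper `chunks_touched` is hypothetical and only mimics the bookkeeping, it does not call Dask.

```python
import math

def chunks_touched(selection, dim_length, chunk_length):
    """Count the chunks along one dimension that overlap a slice selection."""
    start, stop, step = selection.indices(dim_length)
    return len({i // chunk_length for i in range(start, stop, step)})

# time is sliced, latitude and longitude are read entirely
time_chunks = chunks_touched(slice(0, 2), 1096, 1)
lat_chunks = math.ceil(720 / 50)
lon_chunks = math.ceil(1440 / 50)

per_variable = time_chunks * lat_chunks * lon_chunks
print(per_variable)      # 870
print(3 * per_variable)  # 2610, one getitem task per chunk and per variable
```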

The first thing to note is that this operation did not require executing all **2,860,571 tasks** of the full graph. The number of tasks executed dropped drastically, because Dask only looks at the chunks involved in the operation.

The second is that the number of tasks can be reduced further by redefining the chunk size when reading the datacube (these are virtual chunks, but they let Dask work more efficiently).

## Optimizing the chunk size

In [1]:
%%time
chunks = {"time": 1, "latitude": -1, "longitude": -1}
datacube = open_datacube(output, chunks, start, end)


In [16]:
dask.delayed(datacube).dask

HighLevelGraph with 23 layers. The first seven rows below are repeated for each of the three variables (`<var>` is `sla`, `err_sla` or `adt`), followed by two closing layers. All array layers have dtype float64, type dask.array.core.Array and chunk_type numpy.ndarray.

| Layer | layer_type | is_materialized | number of outputs | shape | chunksize |
|---|---|---|---|---|---|
| `original-open_dataset-<var>-4d304e977be29b8ff62402ec65261126` | MaterializedLayer | True | 1 | | |
| `open_dataset-<var>-4d304e977be29b8ff62402ec65261126` | Blockwise | False | 365 | (365, 720, 1440) | (1, 720, 1440) |
| `original-open_dataset-<var>-357b44b9b5b9c90d62fc0eb6d35d43ea` | MaterializedLayer | True | 1 | | |
| `open_dataset-<var>-357b44b9b5b9c90d62fc0eb6d35d43ea` | Blockwise | False | 366 | (366, 720, 1440) | (1, 720, 1440) |
| `original-open_dataset-<var>-88fe4ef2d01b58b02b911dc509e06d0f` | MaterializedLayer | True | 1 | | |
| `open_dataset-<var>-88fe4ef2d01b58b02b911dc509e06d0f` | Blockwise | False | 365 | (365, 720, 1440) | (1, 720, 1440) |
| concatenate layer (depends on the three `open_dataset-<var>-...` layers) | MaterializedLayer | True | 1096 | (1096, 720, 1440) | (1, 720, 1440) |
| layer depending on `concatenate-f174948f6d08c8e15a8d21610037af87`, `concatenate-c5bffab1e74f7e06d893a51297a9aeff` and `concatenate-05c07ec942da0d423756a778dc278606` | MaterializedLayer | True | 1 | | |
| layer depending on `finalize-474590a2-8ea5-43e1-8ea8-73576511c243` | MaterializedLayer | True | 1 | | |


The task graph for fully loading this datacube went from 23 layers with 2,860,571 tasks to **23 layers** with **6,587 tasks**, a considerable reduction in the size of the task graph. Selecting a slice again gives the following results:

In [19]:
%%time
result = datacube.isel(time=slice(0,2)).compute()

CPU times: user 60 ms, sys: 40 ms, total: 100 ms
Wall time: 176 ms


We obtain the following task graph:

<img src="./images/datacube-chunks-opti.png">

- `original`: There are still 3 `original` tasks, which is expected since the datacube still has 3 variables.
- `getitem`: There are now only 6 `getitem` tasks, because only 3 * 2 chunks are opened, and each of them is read in full.

This shows how important it is to define chunks of a suitable size. Beyond the number of tasks, the chunk size also affects the execution time, so it should not be overlooked.
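To put numbers on this trade-off, the sketch below compares the two layouts used in this notebook. It assumes that `-1` in a chunk specification stands for the full length of the dimension, as in the `chunks` dictionary passed to `open_datacube`. The helper `resolve_chunks` and the layout labels are hypothetical, and the computation is plain arithmetic rather than a call to Dask.

```python
import math

def resolve_chunks(shape, chunks):
    """Replace -1 by the full dimension length."""
    return tuple(n if c == -1 else c for n, c in zip(shape, chunks))

shape = (1096, 720, 1440)  # (time, latitude, longitude)
itemsize = 8               # bytes per float64 value
n_variables = 3
layouts = {"native": (1, 50, 50), "one chunk per time step": (1, -1, -1)}

summary = {}
for name, chunks in layouts.items():
    chunk_shape = resolve_chunks(shape, chunks)
    # Chunks needed to cover one time step (both layouts use time chunks of 1)
    per_step = math.prod(
        math.ceil(n / c) for n, c in zip(shape[1:], chunk_shape[1:])
    )
    summary[name] = {
        "chunk_MiB": math.prod(chunk_shape) * itemsize / 2**20,
        "chunks_per_variable": shape[0] * per_step,
        "getitem_tasks_for_2_steps": n_variables * 2 * per_step,
    }

for name, row in summary.items():
    print(name, row)
```

With one chunk per time step, each chunk holds about 7.9 MiB instead of about 20 kB, and the two-step selection needs 6 `getitem` tasks instead of 2,610. Larger chunks mean fewer tasks but more memory per task, so the right size depends on the access pattern and on the memory available to each worker.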