## xcube Data Store Framework - Multi-Level Datasets

*Please check out the general introduction to xcube data stores in the Jupyter Notebook [Getting Started](./1_getting_started.ipynb) before jumping into this notebook :)*

This notebook explains how to generate spatial multi-level / multi-resolution datasets (image pyramids) using the `file` or `s3` data stores.

Multi-level datasets can be used with the xcube server (`xcube serve` CLI) to speed up visualisation of large data cubes.

In [1]:
from xcube.core.new import new_cube
from xcube.core.store import find_data_store_extensions
from xcube.core.store import get_data_store_params_schema
from xcube.core.store import new_data_store
from xcube.core.store import new_data_writer

### Getting prepared

In [2]:
root = "testdata"  # Directory for test data

In [3]:
def new_test_cube():
    """Generate some test data"""
    return new_cube(width=3600, height=1800, x_res=0.1, variables=dict(A=1, B=2))

In [4]:
import shutil

shutil.rmtree(root, ignore_errors=True)  # Remove existing test data

Get a data store instance. We use the local filesystem here ("file"), but you can also use AWS S3 ("s3") if you have a writable bucket (which then serves as `root`).

In [5]:
data_store = new_data_store("file", root=root)  # Could also use "s3"

Get available data openers and writers. 
Data opener and writer identifiers use the format `{data_type}:{format_name}:{storage_type}`.

In [6]:
data_store.get_data_opener_ids()

('dataset:netcdf:file',
 'dataset:zarr:file',
 'dataset:levels:file',
 'mldataset:levels:file',
 'geodataframe:shapefile:file',
 'geodataframe:geojson:file')

In [7]:
data_store.get_data_writer_ids()

('dataset:netcdf:file',
 'dataset:zarr:file',
 'dataset:levels:file',
 'mldataset:levels:file',
 'geodataframe:shapefile:file',
 'geodataframe:geojson:file')

The data openers and writers that support the `levels` format can read or write multi-level datasets.
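Because the identifiers follow a fixed three-part pattern, they are easy to filter. The following is a minimal sketch in plain Python; the helper `parse_accessor_id` and the `AccessorId` tuple are our own illustrative names, not part of xcube. It splits the writer identifiers listed above and keeps those that use the `levels` format:

```python
from typing import NamedTuple


class AccessorId(NamedTuple):
    """Parts of an identifier '{data_type}:{format_name}:{storage_type}'."""

    data_type: str
    format_name: str
    storage_type: str


def parse_accessor_id(accessor_id: str) -> AccessorId:
    """Split an opener/writer identifier into its three parts."""
    parts = accessor_id.split(":")
    if len(parts) != 3:
        raise ValueError(f"unexpected identifier: {accessor_id!r}")
    return AccessorId(*parts)


# Identifiers copied from the output of get_data_writer_ids() above
writer_ids = (
    "dataset:netcdf:file",
    "dataset:zarr:file",
    "dataset:levels:file",
    "mldataset:levels:file",
    "geodataframe:shapefile:file",
    "geodataframe:geojson:file",
)

# Keep only the writers whose format is "levels"
levels_writer_ids = [
    writer_id
    for writer_id in writer_ids
    if parse_accessor_id(writer_id).format_name == "levels"
]
print(levels_writer_ids)
```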

Get the parameter schema for writing datasets using the Zarr format (\*.zarr):

In [8]:
data_store.get_write_data_params_schema(writer_id="dataset:zarr:file")

<xcube.util.jsonschema.JsonObjectSchema at 0x17fe4ab9af0>

Get the parameter schema for writing multi-level datasets using the Levels format (\*.levels):

In [9]:
data_store.get_write_data_params_schema(writer_id="dataset:levels:file")

<xcube.util.jsonschema.JsonObjectSchema at 0x17fe4aed400>
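The default representation of a schema object is not very informative. As a rough sketch, assuming the schema can be converted to a plain JSON-schema dictionary (to our knowledge, xcube's schema objects offer a `to_dict()` method for this), the properties can be summarised as below. The helper `summarize_schema` and the `example_schema` dictionary are hypothetical stand-ins; a real schema will list different properties and types.

```python
def summarize_schema(schema_dict: dict) -> list:
    """Return (name, type, required) for each property of an object schema."""
    required = set(schema_dict.get("required", []))
    properties = schema_dict.get("properties", {})
    return [
        (name, str(prop.get("type", "any")), name in required)
        for name, prop in sorted(properties.items())
    ]


# Hypothetical stand-in for the dictionary form of a write-parameter schema.
# The property names are the write parameters used in this notebook;
# their types are assumed here for illustration only.
example_schema = {
    "type": "object",
    "properties": {
        "replace": {"type": "boolean"},
        "tile_size": {"type": "integer"},
        "base_dataset_id": {"type": "string"},
    },
    "required": [],
}

for name, type_name, is_required in summarize_schema(example_schema):
    suffix = " (required)" if is_required else ""
    print(f"{name}: {type_name}{suffix}")
```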

### Write multi-level dataset whose level 0 is a link

In [10]:
tile_size = 180
base_dataset_id = f"base_dataset_{tile_size}.zarr"
ml_dataset_id = f"ml_dataset_{tile_size}.levels"

In [11]:
dataset = new_test_cube()

Explicitly define the spatial chunks (= tiles):

In [12]:
dataset = dataset.chunk(dict(lon=tile_size, lat=tile_size))

Write the base dataset that we'll turn into a multi-level dataset:

In [13]:
data_store.write_data(dataset, data_id=base_dataset_id, replace=True)

'base_dataset_180.zarr'

Write levels 1 and above of a multi-level dataset. The output format `levels` is derived from the filename extension `.levels`.
Level 0 remains the original dataset: instead of copying it, a link to `base_dataset_id` is created:

In [14]:
data_store.write_data(
    dataset, ml_dataset_id, replace=True, base_dataset_id=base_dataset_id
)

'ml_dataset_180.levels'

In [15]:
data_store.list_data_ids()

['base_dataset_180.zarr', 'ml_dataset_180.levels']
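With the "file" store, the effect of the link can be checked directly on disk. The sketch below uses only the standard library and the `root` and `ml_dataset_id` values from this notebook; `list_entries` is our own helper name. We would expect one entry per level, with level 0 represented by a small link entry rather than a full Zarr directory, but the exact entry names depend on the xcube version, so none are assumed here.

```python
from pathlib import Path


def list_entries(directory) -> list:
    """Return the sorted names of the entries in `directory` (empty if missing)."""
    path = Path(directory)
    if not path.is_dir():
        return []
    return sorted(entry.name for entry in path.iterdir())


# Path built from root = "testdata" and ml_dataset_id = "ml_dataset_180.levels"
print(list_entries(Path("testdata") / "ml_dataset_180.levels"))
```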

Open the new multi-level dataset:

In [16]:
ml_dataset = data_store.open_data(ml_dataset_id)
ml_dataset

<xcube.core.store.fs.impl.mldataset.FsMultiLevelDataset at 0x17ffa146fd0>

Inspect the new multi-level dataset. Check spatial chunking or `tile_size`:

In [17]:
ml_dataset.grid_mapping

class: **Coords1DGridMapping**
* is_regular: True
* is_j_axis_up: True
* is_lon_360: False
* crs: epsg:4326
* xy_res: (0.1, 0.1)
* xy_bbox: (-180, -90, 180, 90)
* ij_bbox: (0, 0, 3600, 1800)
* xy_dim_names: ('lon', 'lat')
* xy_var_names: ('lon', 'lat')
* size: (3600, 1800)
* tile_size: (180, 180)

In [18]:
ml_dataset.num_levels

5
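The level count can be reproduced with a simple halving rule. This is a sketch that is consistent with the numbers reported in this notebook, not necessarily how xcube computes them internally: starting from the full size, each coarser level halves width and height (rounding up), and levels are added as long as the image still spans more than one tile in both directions. The function name `level_sizes` is our own.

```python
import math


def level_sizes(size, tile_size):
    """Return the (width, height) of each pyramid level, starting at level 0."""
    width, height = size
    tile_width, tile_height = tile_size
    sizes = [(width, height)]
    # Add a coarser level while the image is larger than one tile in both directions
    while width > tile_width and height > tile_height:
        width, height = math.ceil(width / 2), math.ceil(height / 2)
        sizes.append((width, height))
    return sizes


# Size and tile size as reported by ml_dataset.grid_mapping
sizes = level_sizes(size=(3600, 1800), tile_size=(180, 180))
for level, (width, height) in enumerate(sizes):
    print(f"level {level}: {width} x {height}")
print("number of levels:", len(sizes))
```

The same rule gives three levels for a tile size of 450, because the height of 1800 pixels fits into a single 450-pixel tile after two halvings.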

In [19]:
for level in range(ml_dataset.num_levels):
    dataset_i = ml_dataset.get_dataset(level)
    display(dataset_i)

Dask arrays per level (both data variables of a level share the same layout):

| Level | Array | Shape | Chunk shape | Size | Chunk size | Tasks | Chunks | Type |
|---|---|---|---|---|---|---|---|---|
| 0 | latitude bounds | (1800, 2) | (180, 2) | 28.12 kiB | 2.81 kiB | 11 | 10 | float64 |
| 0 | longitude bounds | (3600, 2) | (180, 2) | 56.25 kiB | 2.81 kiB | 21 | 20 | float64 |
| 0 | time bounds | (5, 2) | (5, 2) | 80 B | 80 B | 2 | 1 | datetime64[ns] |
| 0 | data variables (2) | (5, 1800, 3600) | (5, 180, 180) | 123.60 MiB | 632.81 kiB | 201 | 200 | int32 |
| 1 | data variables (2) | (5, 900, 1800) | (5, 180, 180) | 30.90 MiB | 632.81 kiB | 51 | 50 | int32 |
| 2 | data variables (2) | (5, 450, 900) | (5, 180, 180) | 7.72 MiB | 632.81 kiB | 16 | 15 | int32 |
| 3 | data variables (2) | (5, 225, 450) | (5, 180, 180) | 1.93 MiB | 632.81 kiB | 7 | 6 | int32 |
| 4 | data variables (2) | (5, 113, 225) | (5, 113, 180) | 496.58 kiB | 397.27 kiB | 3 | 2 | int32 |


### Write multi-level dataset with specified tile size

In [20]:
tile_size = 450
base_dataset_id = f"base_dataset_{tile_size}.zarr"
ml_dataset_id = f"ml_dataset_{tile_size}.levels"

In [21]:
dataset = new_test_cube()

Write base dataset. No chunking specified, so we use the Zarr package's default chunking here.

In [22]:
data_store.write_data(dataset, data_id=base_dataset_id, replace=True)
dataset = data_store.open_data(base_dataset_id)
dataset

Dask arrays of the reopened base dataset (both data variables share the same layout):

| Array | Shape | Chunk shape | Size | Chunk size | Tasks | Chunks | Type |
|---|---|---|---|---|---|---|---|
| latitude bounds | (1800, 2) | (1800, 2) | 28.12 kiB | 28.12 kiB | 2 | 1 | float64 |
| longitude bounds | (3600, 2) | (3600, 2) | 56.25 kiB | 56.25 kiB | 2 | 1 | float64 |
| time bounds | (5, 2) | (5, 2) | 80 B | 80 B | 2 | 1 | datetime64[ns] |
| data variables (2) | (5, 1800, 3600) | (1, 450, 900) | 123.60 MiB | 1.54 MiB | 81 | 80 | int32 |


Write multi-level dataset and force spatial chunking to follow `tile_size`: 

In [23]:
data_store.write_data(dataset, data_id=ml_dataset_id, replace=True, tile_size=tile_size)

'ml_dataset_450.levels'
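Forcing square 450 x 450 tiles changes how many chunks each variable is split into. As a small worked example (the helper `num_chunks` is our own, not an xcube or Dask function), the chunk count is the product of the per-dimension chunk counts, each rounded up:

```python
import math


def num_chunks(shape, chunks):
    """Number of chunks needed to cover `shape` with blocks of size `chunks`."""
    return math.prod(math.ceil(n / c) for n, c in zip(shape, chunks))


shape = (5, 1800, 3600)  # (time, lat, lon) of the test cube

# Default Zarr chunking reported for the base dataset
print(num_chunks(shape, (1, 450, 900)))  # 80

# Spatial chunks forced to tile_size = 450
print(num_chunks(shape, (1, 450, 450)))  # 160
```

Halving the chunk width from 900 to 450 doubles the number of chunks per variable at level 0.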

Open the new multi-level dataset:

In [24]:
ml_dataset = data_store.open_data(ml_dataset_id)
ml_dataset

<xcube.core.store.fs.impl.mldataset.FsMultiLevelDataset at 0x17ff9ef4c10>

Inspect the new multi-level dataset. Check spatial chunking or `tile_size`:

In [25]:
ml_dataset.grid_mapping

class: **Coords1DGridMapping**
* is_regular: True
* is_j_axis_up: True
* is_lon_360: False
* crs: epsg:4326
* xy_res: (0.1, 0.1)
* xy_bbox: (-180, -90, 180, 90)
* ij_bbox: (0, 0, 3600, 1800)
* xy_dim_names: ('lon', 'lat')
* xy_var_names: ('lon', 'lat')
* size: (3600, 1800)
* tile_size: (450, 450)

The larger tile size reduces the number of levels from 5 to 3:

In [26]:
ml_dataset.num_levels

3

In [27]:
for level in range(ml_dataset.num_levels):
    dataset_i = ml_dataset.get_dataset(level)
    display(dataset_i)

Dask arrays per level (both data variables of a level share the same layout):

| Level | Array | Shape | Chunk shape | Size | Chunk size | Tasks | Chunks | Type |
|---|---|---|---|---|---|---|---|---|
| 0 | latitude bounds | (1800, 2) | (1800, 2) | 28.12 kiB | 28.12 kiB | 2 | 1 | float64 |
| 0 | longitude bounds | (3600, 2) | (3600, 2) | 56.25 kiB | 56.25 kiB | 2 | 1 | float64 |
| 0 | time bounds | (5, 2) | (5, 2) | 80 B | 80 B | 2 | 1 | datetime64[ns] |
| 0 | data variables (2) | (5, 1800, 3600) | (1, 450, 450) | 123.60 MiB | 791.02 kiB | 161 | 160 | int32 |
| 1 | data variables (2) | (5, 900, 1800) | (1, 450, 450) | 30.90 MiB | 791.02 kiB | 41 | 40 | int32 |
| 2 | data variables (2) | (5, 450, 900) | (1, 450, 450) | 7.72 MiB | 791.02 kiB | 11 | 10 | int32 |
