<div>
<img src="https://notebooks.dtcglaciers.org/_images/ESA_logo.svg" width="160" align='right'/>
</div>

# Creating DTC-Glaciers User-DT-Enhanced Data cubes (L3)

A central aspect that distinguishes DTC Glaciers as a **Digital Twin Component** is that users are not limited to downloading pre-computed products. Instead, the system is designed to support **active interaction**, allowing users to integrate their **own data** into the workflow.

This notebook introduces **Level 3 (L3) data cubes**, which are **user-driven, model–data cubes** generated by combining user-provided datasets with the DTC Glaciers modelling and assimilation framework. L3 data cubes extend the pre-computed L1 and L2 products by enabling users to explore alternative data sources, assumptions, or scenarios within the same Digital Twin architecture.

In this notebook, we demonstrate how users can provide their own observational or derived data, generate corresponding **L3 data cubes**, and subsequently **validate** these against existing DTC Glaciers products.  
This includes direct comparison with pre-computed L2 data cubes, as well as validation using user-supplied observations.

The workflow illustrated here highlights how DTC Glaciers supports experimentation, hypothesis testing, and application-specific analyses, while maintaining consistency with the underlying DTC framework.

If required, install the DTCG API using the following command:

```
!pip install 'dtcg[jupyter] @ git+https://github.com/DTC-Glaciers/dtcg'
```

Run this command in a notebook cell.

In [None]:
# Imports
import dtcg.integration.oggm_bindings as oggm_bindings
import dtcg.integration.calibration as calibration
from dtcg import DEFAULT_L2_DATACUBE_URL

from oggm.core import massbalance

import matplotlib.pyplot as plt
import numpy as np

## Preparing the scene

In this example, we focus on **Iceland**, where we have example **in-situ mass balance data** from the **National Power Company of Iceland, [Landsvirkjun](https://www.landsvirkjun.com/)**, a key stakeholder of DTC-Glaciers.

We concentrate on **Brúarjökull**, an outlet glacier of **[Vatnajökull](https://en.wikipedia.org/wiki/Vatnaj%C3%B6kull)**, the largest ice cap in Iceland. In the following cells, we open the example data and reproject it onto the same grid as our data cubes.
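
The `add_iceland_smb` task below hands this step to salem (`transform(..., interp='linear')`), which also deals with differing map projections. The following minimal NumPy sketch illustrates only the interpolation part: bilinear regridding between two rectilinear grids. It assumes both grids already share the same projection. The helper `regrid_bilinear` is hypothetical and is not part of salem, OGGM or DTCG:

```python
import numpy as np


def regrid_bilinear(src_x, src_y, src_field, dst_x, dst_y):
    """Bilinearly interpolate src_field (shape: len(src_y) x len(src_x)) onto the
    target grid defined by dst_x and dst_y. Both grids must use the same map
    projection and src_x must be increasing; target points outside the source
    grid are set to NaN."""
    src_x = np.asarray(src_x, dtype=float)
    src_y = np.asarray(src_y, dtype=float)
    src_field = np.asarray(src_field, dtype=float)
    # np.interp needs increasing coordinates; north-up grids often store y decreasing
    if src_y[0] > src_y[-1]:
        src_y, src_field = src_y[::-1], src_field[::-1, :]
    # Step 1: linear interpolation of every source row along x
    along_x = np.array([np.interp(dst_x, src_x, row, left=np.nan, right=np.nan)
                        for row in src_field])          # (len(src_y), len(dst_x))
    # Step 2: linear interpolation of every resulting column along y
    return np.array([np.interp(dst_y, src_y, col, left=np.nan, right=np.nan)
                     for col in along_x.T]).T           # (len(dst_y), len(dst_x))


# Usage: bilinear interpolation reproduces a linear field exactly
src_x = np.arange(0.0, 1000.0, 100.0)      # coarse 100 m source grid
src_y = np.arange(0.0, 1000.0, 100.0)
X, Y = np.meshgrid(src_x, src_y)
field = 2.0 * X + 3.0 * Y
dst_x = np.arange(50.0, 900.0, 50.0)       # finer 50 m target grid inside the source extent
dst_y = np.arange(50.0, 900.0, 50.0)
regridded = regrid_bilinear(src_x, src_y, field, dst_x, dst_y)
```

On rectilinear grids, bilinear interpolation is separable. Two passes of `np.interp` (first along x, then along y) are therefore enough, and a linear test field makes a convenient sanity check.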

In [None]:
import logging
import xarray as xr
import salem

from oggm import utils, cfg

# Module logger
log = logging.getLogger(__name__)

if 'iceland_smb' not in cfg.BASENAMES:
    cfg.BASENAMES['iceland_smb'] = ('iceland_smb.nc', 'SMB for Iceland (DTCG)')

file_url = 'https://cluster.klima.uni-bremen.de/~dtcg/test_files/case_study_regions/iceland/vatnajokull_SMB_maps_1996-2024_a&s_v1.nc'


@utils.entity_task(log, writes=['iceland_smb'])
def add_iceland_smb(gdir):
    """Adds SMB to the glacier directory and reprojects the data onto the L1 grid."""

    # Template dataset
    with xr.open_dataset(gdir.get_filepath('gridded_data')) as dsg:
        dsg = dsg[['x', 'y']].copy()

    # Transform SMB to glacier map
    with salem.open_xr_dataset(utils.file_downloader(file_url)) as ds_smb:
        dsg = dsg.salem.transform(ds_smb[['ba', 'b']], interp='linear')

    # Write it up
    dsg.to_netcdf(gdir.get_filepath('iceland_smb'))

In [None]:
rgi_id_ice = "RGI60-06.00377"  # Brúarjökull
dtcg_oggm_ice = oggm_bindings.BindingsOggmModel(rgi_id=rgi_id_ice)

def get_l2_data_tree(rgi_id):
    return xr.open_datatree(
            f"{DEFAULT_L2_DATACUBE_URL}{rgi_id}.zarr",
            chunks={},
            engine="zarr",
            consolidated=True,
            decode_cf=True,
        )

data_tree_ice = get_l2_data_tree(rgi_id_ice)

In [None]:
# add user data
from oggm import workflow
workflow.execute_entity_task(add_iceland_smb, dtcg_oggm_ice.gdir);

# open user data and load it into memory, so it remains usable after the file is closed
with xr.open_dataset(dtcg_oggm_ice.gdir.get_filepath('iceland_smb')) as ds_user:
    ds_user = ds_user.load()
ds_user

Let’s take a first look at the **gridded data** and the **mean annual mass balance**:

In [None]:
annual_smb = ds_user['ba'].where(dtcg_oggm_ice.l1_datacube.glacier_mask)
avg_smb = annual_smb.mean(dim='time_a', keep_attrs=True)
avg_smb.plot(cmap='RdBu', vmin=-10, vmax=10)
plt.title('Mean annual mass balance 1996 to 2024')
plt.show()

annual_smb.mean(dim=('y','x')).plot()
plt.grid(True)
plt.title('Glacier-wide mean annual mass balance')
plt.show()

## Create L3 data cubes

The main idea of **DTC-Glaciers** is to **make it easy** for users to **interact with the system** and generate **custom model output**, rather than relying only on preprocessed, static results. Here, we demonstrate a first use case in which a user can provide **their own observations**, generate **L3 data cubes**, and further use their own data for **validation** (shown in one of the next sections).
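
Before we prepare the reference value, here is a quick worked example of the arithmetic. `ref_mb` in the next code cell is the difference of two cumulative sums of glacier-wide annual balances. This difference equals the plain sum of the entries after the start label, up to and including the end label. Multiplying by 1000 then converts from m w.e. to kg m-2. The sketch below uses made-up numbers and a hypothetical helper, `period_mb_from_annual`:

```python
from itertools import accumulate


def period_mb_from_annual(years, annual_mb_m_we, start_year, end_year):
    """Cumulative mass balance in kg m-2 between the entries labelled start_year
    and end_year, computed like `ref_mb`: the difference of two cumulative sums."""
    cumulative = dict(zip(years, accumulate(annual_mb_m_we)))
    # The start_year entry cancels out: the result sums entries start_year+1 .. end_year
    return (cumulative[end_year] - cumulative[start_year]) * 1000.0  # m w.e. -> kg m-2


# Made-up annual balances in m w.e., for illustration only
years = [2008, 2009, 2010, 2011, 2012]
values = [-0.5, -1.0, -0.8, -1.2, -0.3]
print(period_mb_from_annual(years, values, 2009, 2011))  # ≈ -2000, i.e. 1000 * (-0.8 - 1.2)
```

Note that the entry labelled with the start date cancels out. Whether this matches the intended balance years depends on how each `time_a` label is assigned to a balance year in the user data.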

Let’s prepare the user data as **cumulative mass balance** for the period **2010–2020**:

In [None]:
def get_date_string(date):
    return np.datetime_as_string(date, unit="D").item()

start_date = annual_smb.time_a[14].values  # 2010
end_date = annual_smb.time_a[24].values  # 2020

# the period of the reference mass balance, e.g. '2010-01-01_2020-01-01'
ref_mb_period = (f"{get_date_string(start_date)}_"
                 f"{get_date_string(end_date)}")

# the actual observation value calculated as the cumulative mass balance over the period of interest
annual_cumsum_smb = annual_smb.mean(dim=('y','x')).cumsum()
ref_mb = (annual_cumsum_smb.sel(time_a=end_date).values -
          annual_cumsum_smb.sel(time_a=start_date).values
         ) * 1000  # convert to mm w.e., which is equal to kg m-2

# the unit of the provided observation
ref_mb_unit = 'kg m-2'

# an associated uncertainty, in this example we just set a typical order of magnitude
ref_mb_err = 500

# this description is added to the attributes of the resulting data cubes for reference
calibration_strategy = ("OGGM model DailyTIModel calibrated with user data "
                        f"from Landsvirkjun over the period {ref_mb_period}.")

# this is the name which will be used when adding the datacube to a datatree
l3_datacube_name = (f"L3_Daily_Landsvirkjun_"
                    f"{get_date_string(start_date).split('-')[0]}_"
                    f"{get_date_string(end_date).split('-')[0]}")

Currently, DTC-Glaciers supports calibration of the mass balance only over a **complete reference period**. However, in **OGGM** it is also possible to calibrate using an **incomplete set of mass balance observations**, for example by considering only selected years. This functionality could potentially be added to DTC-Glaciers in the future.
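
To make the calibration step less abstract, here is a deliberately simplified sketch of what calibrating against a complete reference period means. It is **not** the DTC-Glaciers or OGGM implementation. A hypothetical single-point temperature-index model has one free parameter, the melt factor, which is tuned by bisection until the model's cumulative mass balance over the whole period matches a reference value. The forcing is synthetic. The actual call below uses OGGM's `DailyTIModel` and additionally runs a Monte Carlo ensemble, configured via `mcs_sampling_settings`.

```python
import math


def toy_period_mb(melt_f, daily_temp, daily_prcp, temp_melt=0.0):
    """Cumulative mass balance (kg m-2) of a hypothetical single-point
    temperature-index model: all precipitation accumulates, and ablation is
    melt_f times the positive temperature excess above temp_melt."""
    total = 0.0
    for t, p in zip(daily_temp, daily_prcp):
        total += p - melt_f * max(t - temp_melt, 0.0)
    return total


def calibrate_melt_f(ref_mb, daily_temp, daily_prcp, lo=0.1, hi=20.0, tol=1e-6):
    """Find the toy melt factor (kg m-2 day-1 K-1) for which toy_period_mb
    matches ref_mb. The modelled MB decreases monotonically with melt_f,
    so bisection on [lo, hi] converges."""
    f_lo = toy_period_mb(lo, daily_temp, daily_prcp) - ref_mb
    f_hi = toy_period_mb(hi, daily_temp, daily_prcp) - ref_mb
    if f_lo * f_hi > 0:
        raise ValueError("ref_mb is not reachable within [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = toy_period_mb(mid, daily_temp, daily_prcp) - ref_mb
        if f_lo * f_mid <= 0:
            hi = mid              # sign change between lo and mid
        else:
            lo, f_lo = mid, f_mid  # sign change between mid and hi
    return 0.5 * (lo + hi)


# Synthetic forcing: ten 365-day years with a seasonal temperature cycle
days = range(3650)
temp = [-5.0 + 8.0 * math.sin(2 * math.pi * d / 365.0) for d in days]  # degC
prcp = [3.0] * 3650                                                      # kg m-2 per day
ref = -5000.0  # cumulative reference mass balance over the full period, kg m-2
melt_f = calibrate_melt_f(ref, temp, prcp)
print(f"calibrated toy melt factor: {melt_f:.2f}")
```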

Now we provide the user data to the calibrator and create **L3 data cubes**. For this, we use `calibrator.calibrate_mb_and_create_datacubes`, which takes as input the **mass balance model** to be calibrated and the **reference mass balance values**:

In [None]:
calibrator = calibration.CalibratorCryotempo(l1_datacube=data_tree_ice['L1'].ds,
                                             gdir=dtcg_oggm_ice.gdir)

l3_datacubes = calibrator.calibrate_mb_and_create_datacubes(
    mb_model_class=massbalance.DailyTIModel,
    ref_mb=ref_mb,
    ref_mb_err=ref_mb_err,
    ref_mb_unit=ref_mb_unit,
    ref_mb_period=ref_mb_period,
    calibration_strategy=calibration_strategy,
    datacubes_requested=['monthly', 'annual_hydro', 'daily_smb'],
    show_log=True,  # to see what is happening
    # for demonstration, we use a small number of Monte Carlo samples to reduce computing time;
    # the precomputed data cubes use 'nr_samples': 2**4 (~100 ensemble members)
    mcs_sampling_settings={'nr_samples': 2**1},
)

The resulting **L3 data cube** can be added to the existing `data_tree` in the same way as the **L2 data cubes**:

In [None]:
from dtcg.datacube.geozarr import GeoZarrHandler

datacube_handler = GeoZarrHandler(data_tree=data_tree_ice)

datacube_handler.add_layer(datacubes=l3_datacubes,
                           datacube_name=l3_datacube_name)

list(datacube_handler.data_tree.keys())

[![Zarr logo](https://avatars.githubusercontent.com/u/35050297?s=96&v=4)](https://github.com/zarr-developers/geozarr-spec) The extended data tree, including the new L3 layer, can also be stored locally as GeoZarr (uncomment the line below):

In [None]:
# datacube_handler.export('datacube_including_L3.zarr')

## Validate L3 data cubes

<div class="alert alert-danger">
    <b>IMPORTANT</b>: Please note that the current <b>L3 data cube</b> was created using a <b>smaller ensemble size</b> for the Monte Carlo simulation. Therefore, the resulting uncertainties are <b>not comparable</b> to those of the preprocessed <b>L2 data cubes</b>.
</div>
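
Why does the ensemble size matter? Uncertainty estimates derived from a small ensemble are themselves highly uncertain. The minimal sketch below illustrates this. It assumes that ensemble members are independent draws from a standard normal distribution, which simplifies the sampling DTC-Glaciers actually uses. We repeatedly estimate the ensemble standard deviation with 2 and with 16 members and check how much that estimate scatters. The member counts are illustrative only:

```python
import random
import statistics


def spread_of_std_estimates(n_members, n_repeats=2000, seed=42):
    """Draw n_repeats ensembles of n_members from a standard normal
    distribution and return how much the estimated ensemble standard
    deviation itself scatters between repeats."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_repeats):
        ensemble = [rng.gauss(0.0, 1.0) for _ in range(n_members)]
        estimates.append(statistics.stdev(ensemble))
    return statistics.stdev(estimates)


small = spread_of_std_estimates(2)
large = spread_of_std_estimates(16)
print(f"scatter of the std estimate: n=2 -> {small:.2f}, n=16 -> {large:.2f}")
```

With only 2 members, the estimated spread scatters several times more than with 16 members. Uncertainties from the small demonstration ensemble should therefore be read as indicative only.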


The general validation of **L3 data cubes** works in the same way as for **L2**, as explained in  
[this notebook](03_validation.ipynb). Let’s have a look:

In [None]:
# select the datacubes which should be evaluated
validation_name_list = ['L2_Daily_Cryosat_2011_2020',
                        'L2_Daily_Hugonnet_2010_2020',
                        'L3_Daily_Landsvirkjun_2010_2020'
                       ]

In [None]:
from dtcg.validation.validator import DatacubeValidator

# conduct the actual validation
validator = DatacubeValidator(datacube_handler.data_tree)
validation_data = validator.get_validation_for_layers(l2_name_list=validation_name_list)

In [None]:
validation_data['WGMS']

In [None]:
validation_data['CryoSat2']

We see that the **L3 data cube** lies somewhere between the two **L2 data cubes** calibrated with **[Hugonnet et al. (2021)](https://doi.org/10.1038/s41586-021-03436-z)** and **CryoSat-2** data, both for the WGMS and for the CryoSat-2 validation. A visual comparison of the results in the plots confirms this:

In [None]:
validator.get_validation_plot_for_layers(l2_name_list=validation_name_list, obs_name='WGMS')
plt.show()

In [None]:
validator.get_validation_plot_for_layers(l2_name_list=validation_name_list, obs_name='CryoSat2')
plt.show()

## Validation with user data

Another level of **interaction** between the **user** and **DTC-Glaciers** is the ability to bring **your own data** and use the tools of the **validation framework** to evaluate the supported validation metrics or to create plots for visually inspecting differences.
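
The exact metrics are defined by `DatacubeValidator` and explained in [this notebook](03_validation.ipynb). Purely to illustrate the kind of comparison involved, here is a sketch of three common metrics for paired annual values: mean bias, root-mean-square error (RMSE), and the fraction of years in which model and observation agree within their combined uncertainty. The helper `compare_series` is hypothetical and may differ from what the validator computes:

```python
import math


def compare_series(model, obs, obs_err, model_err=None):
    """Hypothetical helper: basic model-vs-observation metrics for paired
    annual values (all in kg m-2). Not the DTCG validator's implementation."""
    if len(model) != len(obs) or len(obs) != len(obs_err):
        raise ValueError("model, obs and obs_err must have the same length")
    if model_err is None:
        model_err = [0.0] * len(model)
    diffs = [m - o for m, o in zip(model, obs)]
    n = len(diffs)
    bias = sum(diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    # Count years where |model - obs| is within the root-sum-square of both uncertainties
    within = sum(abs(d) <= math.hypot(e_o, e_m)
                 for d, e_o, e_m in zip(diffs, obs_err, model_err))
    return {"bias": bias, "rmse": rmse, "frac_within_unc": within / n}


# Made-up values in kg m-2, for illustration only
metrics = compare_series(model=[-800, -1200, -500, -1600],
                         obs=[-1000, -1100, -700, -1300],
                         obs_err=[200, 200, 200, 200])
print(metrics)
```

Combining the two uncertainties in quadrature (`math.hypot`) assumes they are independent.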

To use the validation framework with your own observations, provide them in the following format:

In [None]:
annual_mb = annual_smb.mean(dim=('y','x'))

user_observation = {
    # Let the validation framework know which type of observation is provided
    'obs_type': 'annual_mb',
    # The observed values, converted from m w.e. to kg m-2
    'values': annual_mb.values * 1000,
    # Corresponding uncertainties in kg m-2 (here a constant value for all years)
    'uncertainty': np.full(annual_mb.values.shape, 200),
    # The observation years
    'years': annual_mb.time_a.dt.year.values,
    # Name of the data source (displayed in the validation tables)
    'name': "Landsvirkjun provided annual mass balance",
}

After preparing the data, you can provide it via `user_observation` and interact with the **validation framework** in the same way as before:

In [None]:
validation_data, bootstrap_args = validator.get_validation_for_layers(
    user_observation=user_observation, l2_name_list=validation_name_list, return_bootstrap_args=True)

In [None]:
validation_data['annual_mb']

In this case, we see that the newly generated **L3 data cube** performs **slightly better** than the other two options. This is **expected**, as parts of the user-provided data were also used for calibration, and the validation is therefore **not fully independent**.
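
One simple way to get a fairer comparison is to evaluate only the observations outside the calibration period. Below is a minimal sketch using a hypothetical helper, `split_by_period`. Treating the whole 2010–2020 window as used for calibration is the conservative choice. The values are placeholders, not the Landsvirkjun data:

```python
def split_by_period(years, values, cal_start, cal_end):
    """Split paired (year, value) observations into those inside the
    calibration window [cal_start, cal_end] and those outside it."""
    inside, outside = [], []
    for y, v in zip(years, values):
        (inside if cal_start <= y <= cal_end else outside).append((y, v))
    return inside, outside


years = list(range(1996, 2025))     # 1996-2024, the period covered by the example data
values = [-1000.0] * len(years)     # placeholder values in kg m-2, not real observations
inside, outside = split_by_period(years, values, 2010, 2020)
print(len(inside), len(outside))
```

Metrics computed on the `outside` subset only (1996–2009 and 2021–2024) use observations that were not used for calibration.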

You can also provide the `user_observation` to the plotting functions to **visually compare** different data cubes with the provided observations:

In [None]:
validator.get_validation_plot_for_layers(user_observation=user_observation, l2_name_list=validation_name_list)
plt.show()