# Tutorial: **$\delta$ HBV 1.0**

---

This notebook demonstrates, in detail, how to train and forward the $\delta$ HBV 1.0 model developed by [Dapeng Feng et al. (2022)](https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2022WR032404). A pre-trained model is provided for those who only wish to run the model forward. For an explanation of the model structure, methodology, data, and performance metrics, please refer to Feng's publication [below](#publication). If you find this code useful in your own work, please include the aforementioned citation.

<br>

*We include some discussion of differentiable modeling methodology and recommend this notebook as a starting point for an operational understanding of the dMG framework.*

<br>

### Before Running:
- **Environment**: A minimal Python environment for running this code can be set up from `env/` (see `docs/getting_started.md` for more details).
    - Conda -- `deltamodel_env.yaml`
    - Pip -- `requirements.txt`


- **Model**: The trained $\delta$ HBV 1.0 model can be downloaded from [AWS](https://mhpi-spatial.s3.us-east-2.amazonaws.com/mhpi-release/models/dHBV_1_0_trained.zip). After downloading...

    - Update the `trained_model` key in the model config `example/conf/config_dhbv_1_0.yaml` with the path to your directory containing the trained model `dHBV_1_0_Ep50.pt` (or *Ep100*) AND normalization `test1989-1999_Ep50/normalization_statistics.json`.

- **Data**: The CAMELS data extraction used in model training and evaluation can be downloaded from [AWS](https://mhpi-spatial.s3.us-east-2.amazonaws.com/mhpi-release/camels/camels_data.zip). After downloading, in the data configs `example/conf/observations/camels_531.yaml` and `camels_671.yaml` update...
    1. `train_path` key with your path to `training_file`,
    2. `test_path` with your path to `validation file`,
    3. `gage_info` with your path to `gage_ids.npy`,
    4. (CAMELS 531 only) `subset_path` with your path to `531_subset.txt`.

- **Hardware**: The LSTMs used in this model require CUDA support, which is only available with Nvidia GPUs. For those without access, T4 GPUs can be used when running this notebook with dMG on [Google Colab](https://colab.research.google.com/).



### Publication:

*Dapeng Feng, Jiangtao Liu, Kathryn Lawson, Chaopeng Shen. "Differentiable, Learnable, Regionalized Process‐Based Models With Multiphysical Outputs can Approach State‐Of‐The‐Art Hydrologic Prediction Accuracy." Water Resources Research 58, no. 10 (2022): e2022WR032404. https://doi.org/10.1029/2022WR032404.*

<br>

### Issues:
For questions, concerns, bugs, etc., please reach out by posting an issue on the [dMG repo](https://github.com/mhpi/generic_deltaModel/issues).

---

<br>

## Differentiable Modeling

In general, differentiable modeling represents the coupling of a neural network and a (differentiable) physical model. This enables several capabilities, like introducing missing processes and bias corrections. In the applications of these notebooks, we demonstrate parameter learning.

Physics models include parameters for which true values are seldom known but can be approximated with various methods. By coupling a neural network, we can learn a set of a physics model's parameters (static or dynamic in, e.g., time), which can then be passed alongside other input variables to the physical model for making predictions.

For $\delta$ HBV 1.0, we use an LSTM neural network (NN) in concert with the physical model HBV (Beck 2020; Seibert 2005). HBV uses input forcing (time-varying) variables precipitation, temperature, and potential evapotranspiration (PET) across a collection of hydrologic basins, with physical parameters learned at the same spatiotemporal resolution [timesteps, basins], to make predictions about hydrologic states and fluxes (e.g., streamflow in our case) in both space and time. In general, this differentiable model takes the form

$$
\delta \text{HBV} =
\begin{cases}
P = \text{LSTM}(X, A) \\
Q, \mu = \text{HBV}(X, P)
\end{cases}
$$

where
- $P$ represents the physical parameters used in HBV's equations, learned by the NN;
- $X$ represents input weather forcings;
- $A$ is the set of attributes belonging to each basin (this could be any other data related to understanding $P$);
- $Q, \mu$ are the fluxes and states that the physical model can output.
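
The two-stage structure above can be sketched in plain Python. This is only an illustration of the data flow, assuming a single linear reservoir as a stand-in for HBV and a one-weight sigmoid function as a stand-in for the LSTM. The names `toy_nn` and `toy_bucket` and all values are hypothetical and are not part of dMG or HydroDL2.

```python
import math


def toy_nn(attribute, weight=0.5):
    """Stand-in for the LSTM: map a basin attribute A to a parameter P.

    A sigmoid keeps the recession coefficient k inside (0, 1).
    """
    return 1.0 / (1.0 + math.exp(-weight * attribute))


def toy_bucket(precip, k, storage=0.0):
    """Stand-in for HBV: a single linear reservoir.

    Each day, precipitation fills the storage (a state) and a
    fraction k of the storage drains as streamflow (a flux).
    """
    flows, storages = [], []
    for p in precip:
        storage += p
        q = k * storage
        storage -= q
        flows.append(q)
        storages.append(storage)
    return flows, storages


# X: daily precipitation forcing; A: one static basin attribute (made up).
X = [10.0, 0.0, 5.0, 0.0, 0.0]
A = 0.8

P = toy_nn(A)             # P = NN(X, A); this toy NN only uses A.
Q, mu = toy_bucket(X, P)  # Q, mu = physical_model(X, P)

print(f"k = {P:.3f}")
print("Q  =", [round(q, 2) for q in Q])
print("mu =", [round(s, 2) for s in mu])
```

In the real model both stages are differentiable PyTorch operations, so the gradient of a loss on $Q$ can flow back through the physical model into the NN's weights.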

Currently, $\delta$ HBV is set up to train and make hydrologic predictions on **streamflow**, but it can
also be reconfigured without much effort to predict percolation, recharge, and groundwater flow, among others.

After showing an example implementation, we'll demonstrate how to train the model and expose critical details of the process.

---

<br>

## 1. Create and Forward $\delta$ HBV 1.0

After completing [these](#before-running) steps (model file not needed), $\delta$ HBV 1.0 can be built with the code cells [below](#13-demonstration), where we illustrate the model creation process in detail.

See section [4](#4-forward--hbv-10) for a high-level demonstration of the model forward.

<br>

### 1.1 Background

To create $\delta$ HBV 1.0 with dMG, we interface with the repository [HydroDL2](https://github.com/mhpi/hydroDL2) containing hydrologic models including those used in these tutorials (see `docs/getting_started.md` for properly setting up this connection). 

<br>

- #### Set Model, Experiment, Dataset Configurations

    Two flexible YAML configuration files exist for adjusting the behavior of the dMG framework: one defines model settings and training/testing parameters, and the other defines parameters for your dataset (observations). For this tutorial, two such configuration files have been created and require minimal preparation to use in this notebook:
    
    1. `/example/conf/config_dhbv_1_0.yaml` -- Model/experiment settings
    2. `/example/conf/observations/camels_531.yaml` and `camels_671.yaml` -- CAMELS 531- and 671-basin dataset parameters.

    With these, all aspects of dMG model creation, training, testing, etc. can be controlled. As it is, the model/experiment config is set up to reproduce benchmark results for [$\delta$ HBV 1.0](#publication) (see [here](https://mhpi.github.io/benchmarks/#10-year-training-comparison)) using the CAMELS 531-basin subset of weather forcings and static basin attributes. Full 671-basin benchmark models can also be trained/tested, and both can be configured by setting the following options in the model config:

    - For CAMELS 531-basin, 10-year benchmark (Default):
        - `observations: camels_531`
        - `train:` 1999/10/01 to 2008/09/30 (`start_time` to `end_time`)
        - `test:` 1989/10/01 to 1999/09/30

    - For CAMELS 671-basin, 15-year benchmark:
        - `observations: camels_671`
        - `train:` 1980/10/01 to 1995/09/30 (`start_time` to `end_time`)
        - `test:` 1995/10/01 to 2010/09/30

<br>

- #### Building the Model

    There are two ways to build a differentiable model in dMG:
    - **Implicit**: Best for small-scale experiments and distribution of final products.

        Add/change modules in dMG to create your own differentiable model, and tailor model and dataset configuration files like `deltaModel/conf/config.yaml` and `deltaModel/conf/observations/{dataset}.yaml` to reflect desired model and experiment behaviors. (Modules like trainers, physics models, neural networks, loss functions, and data loaders/samplers are designed to be hot-swappable per user needs. The differentiable model modality will also be made more flexible to meet diverse modeling needs.)

        With this done, dMG can be run with
        ```shell
        cd ./deltaModel
        python __main__.py
        ```

    - **Explicit**: Best for exploratory research and prototyping (illustrated in the code block below).

        This approach is similar in that we still use config files to hold settings (though
        a dictionary object could also be used). The difference is that we are able to expose the fundamental steps in the
        modeling process: data preprocessing, model building, and experimentation/forwarding. In doing so, we make it
        quicker to develop model and data pipelines, and easier to follow internal processes.

<br>

### 1.2 Walkthrough

The following is an explicit implementation of dMG to create and forward $\delta$ HBV 1.0:

1.  **Load a configuration file**: Using Hydra and OmegaConf packages, we can convert the model/dataset configs into a dictionary object `config`. For example, if your config file contains `mode: train`, the dictionary yields `config['mode'] == 'train'`. However, the config can also contain sub-dictionaries. For instance, 
    
    ```yaml
    train:
        start_time: 1999/10/01
    ```

    which is accessed like `config['train']['start_time'] == '1999/10/01'`.

2.  **Initialize sub-models**: Next, we initialize the physics model and neural network our differentiable model will use, in this case an LSTM from `deltaModel/models/neural_networks/lstm_models.py` and HBV 1.0 from [HydroDL2](https://github.com/mhpi/hydroDL2).

3.  **Load in data**: At this step, we load and process our data as a dictionary of variable and attribute datasets that are used by the neural network and physics model. This `dataset_dict` is created by a data loader and should meet the minimum requirements of the base class `deltaModel/core/data/data_loaders/base.py`.

    For this example, we take a small, arbitrarily selected sample of the data to illustrate the modeling process.

4.  **Create a differentiable model**: Now, the sub-models are linked together by a differentiable model wrapper (`DeltaModel`). This has the effect of interfacing both models to achieve the desired modality, e.g., having the LSTM generate parameters for HBV. 

5.  **Forward/Experiment**: With the differentiable model created, it can be forwarded (as demonstrated
    below), or trained/tested/applied in any user-defined experiments.
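
As a minimal sketch of step 1, the nested access pattern can be reproduced with a plain dictionary. The dictionary below is a hypothetical, heavily reduced stand-in for the real config (in this notebook the dictionary is built by `load_config`); only the `mode`, `train`, `start_time`, and `end_time` names and the dates are taken from the text above.

```python
# Hypothetical, heavily reduced config for illustration only.
config = {
    'mode': 'train',
    'train': {
        'start_time': '1999/10/01',
        'end_time': '2008/09/30',
    },
}

# Top-level and nested keys are read with standard dictionary indexing.
print(config['mode'])
print(config['train']['start_time'])

# Settings can be overridden after loading, before any model is built.
config['mode'] = 'predict'
print(config['mode'])
```

Overriding a key in this way is the same pattern the code cells in this notebook use to set `config['mode']` after loading.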

<br>

### 1.3 Demonstration

The above steps are demonstrated below...

In [None]:
import sys

sys.path.append('../../')
sys.path.append('../../dMG')  # Add the dMG root directory

from core.data.loaders.hydro_loader import HydroLoader
from core.utils import print_config
from example import load_config, take_data_sample
from hydroDL2.models.hbv.hbv import HBV as hbv
from models.differentiable_model import DeltaModel as dHBV
from models.neural_networks import init_nn_model

#------------------------------------------#
# Define model settings here.
CONFIG_PATH = '../example/conf/config_dhbv_1_0.yaml'
#------------------------------------------#



# 1. Load configuration dictionary of model parameters and options.
config = load_config(CONFIG_PATH)
config['mode'] = 'predict'  # <-- Ensure forward (predict) mode in case it is not set in the config file.
print_config(config)

# 2. Initialize physical model and NN.
device = config['device']
phy_model = hbv(config['dpl_model']['phy_model'])
nn = init_nn_model(phy_model, config['dpl_model'])

# 3. Load and initialize a dataset dictionary of NN and HBV model inputs.
# Take a sample to reduce size on GPU.
dataset_dict = HydroLoader(config).dataset
dataset_sample = take_data_sample(config, dataset_dict, days=730, basins=100)

# 4. Create the differentiable model dHBV 1.0: a torch.nn.Module describing how
# the NN is linked to the physical model HBV.
dpl_model = dHBV(phy_model=phy_model, nn_model=nn).to(device)

## From here, forward or train dpl_model just as any torch.nn.Module model.

# 5. For example, to forward:
output = dpl_model.forward(dataset_sample)


print("-------------\n")
print(
    f"Streamflow predictions for {output['flow_sim'].shape[0]} days and "
    f"{output['flow_sim'].shape[1]} basins:\nShowing the first 5 days for "
    f"5 basins \n {output['flow_sim'][:5, :5]}"
)

<br>

## 2. Training $\delta$ HBV 1.0 -- Walkthrough

Now that we can build $\delta$ HBV 1.0, we proceed to train the model and expose critical steps in the process below.

See [here](#31-training--hbv-10----abbreviated) to skip the walkthrough.

<br>

### 2.1 Load the Config and Dataset

In [None]:
import sys

sys.path.append('../../')
sys.path.append('../../dMG')  # Add the dMG root directory

from core.data.loaders.hydro_loader import HydroLoader
from example import load_config

#------------------------------------------#
# Define model settings here.
CONFIG_PATH = '../example/conf/config_dhbv_1_0.yaml'
#------------------------------------------#



# Load configuration dictionary of model parameters and options.
config = load_config(CONFIG_PATH)
config['mode'] = 'train'  # <-- Ensure training mode in case it is not set in the config file.

# Get training dataset
train_dataset = HydroLoader(config, test_split=True).train_dataset

### 2.2 Initialize a $\delta$ HBV 1.0, Optimizer, and Loss Function

These are the auxiliary tasks completed by the Trainer before beginning the training loop.


<!-- We use the Adadelta optimizer from PyTorch, feeding it both learnable model
parameters and a learning rate from the config file.


Dynamically load the loss function identified in the config (RMSE for
dHBV 1.0 and NSE for dHBV 1.1p). -->

In [None]:
import torch
from core.utils.factory import load_loss_func
from hydroDL2.models.hbv.hbv import HBV as hbv
from models.differentiable_model import DeltaModel as dHBV
from models.neural_networks import load_nn_model

# Initialize physical model and neural network
phy_model = hbv(config['dpl_model']['phy_model'])
nn = load_nn_model(phy_model, config['dpl_model'])

# Create the differentiable model dHBV:
device = config['device']
dpl_model = dHBV(phy_model=phy_model, nn_model=nn).to(device)
print(f"Here is our dHBV framework: \n ----- \n {dpl_model}")

# Init an Adadelta optimizer
optimizer = torch.optim.Adadelta(
    dpl_model.parameters(),
    lr=config['dpl_model']['nn_model']['learning_rate'],
)

# init a loss function
loss_func = load_loss_func(train_dataset['target'], config['loss_function'], device=device)


### 2.3 Train the Model

Below we use a basic training loop to train the LSTM in $\delta$ HBV 1.0 to optimize HBV's parameters and streamflow predictions.

#### Key Steps in the Training Loop
1. **Calculate Training Parameters**  
   The `create_training_grid` function calculates the training settings:
   - `n_samples`: The number of unique locations/sites in the dataset.
   - `n_minibatch`: The number of minibatches to process per epoch.
   - `n_timesteps`: The number of timesteps per sample.

2. **Epoch Loop**  
   Each epoch represents one full cycle through the training data. For each epoch:
   - `total_loss` is reset to track the total error across all batches within the epoch.

3. **Batch Loop**  
   Within each epoch, the code processes data in smaller chunks (minibatches) to improve training efficiency and avoid
   exhausting GPU VRAM.
   
   For each batch:
   - **Sample Data**: `HydroSampler` randomly selects a sample of training data for the batch.
   - **Forward Pass**: The model processes the input data to produce predictions.
   - **Calculate Loss**: `loss_func` compares predictions to observed values to calculate the error for the batch.
   - **Backward Pass and Optimization**: 
     - `loss.backward()` computes gradients to adjust the model’s parameters.
     - `optimizer.step()` updates the LSTM parameters.
     - `optimizer.zero_grad()` resets gradients for the next batch.
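
The loop structure above can also be sketched without dMG or a GPU. The following is a toy example, assuming a one-parameter linear model, a mean-squared-error loss, and a hand-derived gradient in place of `loss.backward()`. None of the names or values below come from dMG.

```python
import random

random.seed(0)

# Toy training data: y = 2 * x, so the ideal weight is 2.0.
xs = [i / 10 for i in range(1, 101)]
ys = [2.0 * x for x in xs]

weight = 0.0          # The single learnable parameter.
learning_rate = 0.01
epochs = 20
n_minibatch = 10      # Minibatches per epoch.
batch_size = 10

for epoch in range(1, epochs + 1):
    total_loss = 0.0  # Reset the epoch loss.

    for _ in range(n_minibatch):
        # Sample data: draw a random minibatch of indices.
        idx = random.sample(range(len(xs)), batch_size)

        # Forward pass.
        preds = [weight * xs[i] for i in idx]

        # Calculate loss (mean squared error).
        errors = [p - ys[i] for p, i in zip(preds, idx)]
        loss = sum(e * e for e in errors) / batch_size

        # Backward pass: analytic d(loss)/d(weight), standing in for autograd.
        grad = sum(2 * e * xs[i] for e, i in zip(errors, idx)) / batch_size

        # Optimization step (plain gradient descent).
        weight -= learning_rate * grad

        total_loss += loss

    avg_loss = total_loss / n_minibatch

print(f"Final weight: {weight:.3f}, avg loss in last epoch: {avg_loss:.5f}")
```

With PyTorch, the gradient and update lines are replaced by `loss.backward()`, `optimizer.step()`, and `optimizer.zero_grad()`.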


In [None]:
import tqdm
from core.data import create_training_grid
from core.data.samplers.hydro_sampler import HydroSampler
from core.utils import save_model

# Initialize training sampler.
sampler = HydroSampler(config)

# Get target variable for training.
target = config['train']['target'][0]

# Number of samples (sites), minibatches per epoch, and timesteps per sample.
n_samples, n_minibatch, n_timesteps = create_training_grid(
    train_dataset['xc_nn_norm'],
    config,
)

# Start of training.
for epoch in range(1, config['train']['epochs'] + 1):
    total_loss = 0.0  # Initialize epoch loss to zero.

    prog_str = f"Epoch {epoch}/{config['train']['epochs']}"

    # Work through training data in batches.
    for _ in tqdm.tqdm(range(1, n_minibatch + 1), desc=prog_str,
                       leave=False, dynamic_ncols=True):

        # Take a sample of the training data for the batch.
        dataset_sample = sampler.get_training_sample(
            train_dataset,
            n_samples,
            n_timesteps,
        )

        # Forward pass through dPL model.
        predictions = dpl_model.forward(dataset_sample)

        # Calculate loss.
        loss = loss_func(
            predictions[target],
            dataset_sample['target'],
            n_samples=dataset_sample['batch_sample'],
        )

        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        total_loss += loss.item()

    avg_loss = total_loss / n_minibatch
    print(f"Avg model loss after epoch {epoch}: {avg_loss}")

    # Save the model every save_epoch (set in the config).
    model_name = config['dpl_model']['phy_model']['model']
    if epoch % config['train']['save_epoch'] == 0:
        save_model(config, dpl_model, model_name, epoch)

## 3.1 Training $\delta$ HBV 1.0 -- Abbreviated

Now we demonstrate the high-level training loop for $\delta$ HBV 1.0 in the code block below.

--> For default settings with 50 training epochs, expect train times of ~8 hours with an Nvidia RTX 3090.

**Note**
- The settings defined in the config `../example/conf/config_dhbv_1_0.yaml` are set to replicate benchmark performance.
- For model training, set `mode: train` in the config, or modify after the config dict has been created (see below).
- An `example/results/` directory will be generated to store experiment and model files. This location can be adjusted by changing the `save_path` key in your config.
- The default training window from 1 October 1999 to 30 September 2008 with `batch_size=100` should use ~2.8GB of VRAM.

In [None]:
import sys

sys.path.append('../../')
sys.path.append('../../dMG')  # Add the dMG root directory.

from core.utils import print_config
from core.utils.factory import import_data_loader, import_trainer
from example import load_config
from models.model_handler import ModelHandler as dHBV

#------------------------------------------#
# Define model settings here.
CONFIG_PATH = '../example/conf/config_dhbv_1_0.yaml'
#------------------------------------------#



# 1. Load configuration dictionary of model parameters and options.
config = load_config(CONFIG_PATH)
config['mode'] = 'train'  # <-- Confirm that we are doing training if not set in the config file.
print_config(config)

# 2. Initialize the differentiable HBV 1.0 model (LSTM + HBV 1.0).
model = dHBV(config, verbose=True)

# 3. Load and initialize a dataset dictionary of NN and HBV model inputs.
data_loader_cls = import_data_loader(config['data_loader'])
data_loader = data_loader_cls(config, test_split=True, overwrite=False)


# 4. Initialize trainer to handle model training.
trainer_cls = import_trainer(config['trainer'])
trainer = trainer_cls(
    config,
    model,
    train_dataset=data_loader.train_dataset,
    verbose=True,
)

# 5. Start model training.
trainer.train()

## 3.2 Evaluate Model Performance

After completing the training in [3.1](#31-training--hbv-10----abbreviated), or with the trained model provided, test $\delta$ HBV 1.0 below on the evaluation data.

--> For default settings expect evaluation time of ~5 minutes with an Nvidia RTX 3090.

**Note**
- For model evaluation, set `mode: test` in `example/conf/config_dhbv_1_0.yaml`, or modify after the config dict has been created (see below).
- When evaluating provided models, confirm that `test: test_epoch` in the config corresponds to your desired model (50 or 100 epochs).
- The default evaluation window from 1 October 1989 to 30 September 1999 with `batch_size=25` should use ~2.7GB of VRAM.

In [None]:
import sys

sys.path.append('../../')
sys.path.append('../../dMG')  # Add the dMG root directory.

from core.utils import print_config
from core.utils.factory import import_data_loader, import_trainer
from example import load_config
from models.model_handler import ModelHandler as dHBV

#------------------------------------------#
# Define model settings here.
CONFIG_PATH = '../example/conf/config_dhbv_1_0.yaml'
#------------------------------------------#



# 1. Load configuration dictionary of model parameters and options.
config = load_config(CONFIG_PATH)
config['mode'] = 'test'  # <-- Confirm that we are doing testing if not set in the config file.
print_config(config)

# 2. Initialize the differentiable HBV 1.0 model (LSTM + HBV 1.0).
model = dHBV(config, verbose=True)

# 3. Load and initialize a dataset dictionary of NN and HBV model inputs.
data_loader_cls = import_data_loader(config['data_loader'])
data_loader = data_loader_cls(config, test_split=True, overwrite=False)

# 4. Initialize trainer to handle model evaluation.
trainer_cls = import_trainer(config['trainer'])
trainer = trainer_cls(
    config,
    model,
    eval_dataset=data_loader.eval_dataset,
    verbose=True,
)

# 5. Start testing the model.
print('Evaluating model...')
trainer.evaluate()

### Visualizing Trained Model Performance

After running evaluation on the model, a new directory will be created in the same directory containing the model files (e.g., `test1989-1999_Ep50/` for a model trained for 50 epochs and tested on the years 1989-1999). This directory will be populated with...

1. All model outputs (fluxes, states), including the target variable, *streamflow* (`flow_sim.npy`),

2. `flow_sim_obs`, streamflow observation data for comparison against model predictions,

3. `metrics.json`, containing evaluation metrics across the test time range for every gage in the dataset,

4. `metrics_agg.json`, containing evaluation metric statistics across all gages (mean, median, standard deviation),

5. `normalization_statistics.json`, containing statistics used for normalizing the testing data.
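
To make the relationship between per-gage metrics and their aggregated statistics concrete, the sketch below computes a Nash-Sutcliffe efficiency (NSE) per basin for toy data and then aggregates it across basins. It uses the standard NSE definition and made-up numbers. The helper `nse`, the basin names, and the dictionary layout are assumptions for illustration, not dMG's implementation or file schema.

```python
import statistics


def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 - (squared error / spread of obs)."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((s - o) ** 2 for s, o in zip(sim, obs))
    spread = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / spread


# Toy observed and simulated streamflow for three basins (made-up values).
obs = {
    'basin_a': [1.0, 3.0, 2.0, 4.0],
    'basin_b': [5.0, 5.5, 7.0, 6.5],
    'basin_c': [0.5, 0.2, 0.9, 0.4],
}
sim = {
    'basin_a': [1.0, 3.0, 2.0, 4.0],  # Perfect match: NSE of 1.
    'basin_b': [5.2, 5.4, 6.8, 6.9],
    'basin_c': [0.5, 0.5, 0.5, 0.5],  # Constant at the obs mean: NSE of 0.
}

# Per-basin metrics (one value per gage).
per_basin = [nse(sim[b], obs[b]) for b in obs]

# Basin-aggregated statistics (sample standard deviation is an assumption).
aggregated = {
    'mean': statistics.mean(per_basin),
    'median': statistics.median(per_basin),
    'std': statistics.stdev(per_basin),
}
print(per_basin)
print(aggregated)
```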


We can use these outputs to visualize $\delta$ HBV 1.0's performance with a 
1. Cumulative distribution function (CDF) plot, 

2. CONUS map of gage locations and metric (e.g., NSE) performance.

<br>

But first, let's check the (basin-)aggregated metrics for NSE, KGE, bias, RMSE, and, for both high and low flow regimes, RMSE and absolute percent bias...

In [None]:
import os

from core.data import load_json
from core.post import print_metrics

print(f"Evaluation output files saved to: {config['out_path']} \n")


# 1. Load the basin-aggregated evaluation results.
metrics_path = os.path.join(config['out_path'], 'metrics_agg.json')
metrics = load_json(metrics_path)
print(f"Available metrics: {metrics.keys()} \n")

# 2. Print the evaluation results.
metric_names =  [
    # Choose metrics to show.
    'nse', 'kge', 'bias', 'rmse', 'rmse_low', 'rmse_high', 'flv_abs', 'fhv_abs',
]
print_metrics(metrics, metric_names, mode='median', precision=3)

#### Generate the CDF Plot

The cumulative distribution function (CDF) plot tells us what fraction of basins (CDF on the y-axis) scored at or below a given metric value (x-axis) on the evaluation data. For a metric like NSE, where higher is better, a curve shifted further to the right indicates better performance.
We give an example of such a plot below for NSE, but you can adjust this to your preferred metric. See the output from the previous cell to see what metrics are available. (Note some may require changing `xbounds` in `plot_cdf`.)
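
For intuition about what such a plot shows, an empirical CDF can be computed by hand. The sketch below uses made-up NSE values and a hypothetical helper `empirical_cdf`. It is not how `plot_cdf` is implemented.

```python
def empirical_cdf(values):
    """Return the sorted values and the fraction of samples at or below each."""
    ordered = sorted(values)
    n = len(ordered)
    fractions = [(i + 1) / n for i in range(n)]
    return ordered, fractions


# Made-up per-basin NSE values.
nse_values = [0.81, 0.42, 0.67, 0.90, 0.73]

x, y = empirical_cdf(nse_values)
for value, fraction in zip(x, y):
    print(f"{fraction:.0%} of basins have NSE <= {value:.2f}")
```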

In [None]:
### Plot CDF of the evaluation results.
from core.post.plot_cdf import plot_cdf

#------------------------------------------#
# Choose the metric to plot. (See available metrics printed above, or in the metrics_agg.json file).
METRIC = 'nse'
#------------------------------------------#



# 1. Load the evaluation metrics.
metrics_path = os.path.join(config['out_path'], 'metrics.json')
metrics = load_json(metrics_path)

# 2. Plot the CDF for the chosen metric.
plot_cdf(
    metrics=[metrics],
    metric_names=[METRIC],
    model_labels=['dHBV 1.0'],
    title=rf"CDF of {METRIC.upper()} for $\delta$HBV 1.0",
    xlabel=METRIC.upper(),
    figsize=(8, 6),
    xbounds=(0, 1),
    ybounds=(0, 1),
    show_arrow=True
)

#### Generate the Spatial Plot

This plot shows the location of each basin in the evaluation data, color-coded by performance on a metric. Here we give a plot for NSE, but as before, this metric can be changed to your preference. (See above for what is available; for metrics not valued between 0 and 1, you will need to set `dynamic_colorbar=True` in `geoplot_single_metric` to ensure proper color coding.)

In [None]:
### Plot the evaluation results spatially.
import geopandas as gpd
import numpy as np
import pandas as pd
from core.data import txt_to_array
from core.post.plot_geo import geoplot_single_metric

#------------------------------------------#
# Choose the metric to plot. (See available metrics printed above, or in the metrics_agg.json file).
METRIC = 'nse'

# Set the paths to the gage id lists and shapefiles...
GAGE_ID_PATH = 'your/path/to/gageid.npy'
GAGE_ID_531_PATH = 'your/path/to/Sub531ID.txt'
SHAPEFILE_PATH = 'your/path/to/camels_671_loc.shp'
#------------------------------------------#



# 2. Load gage ids + basin shapefile with geocoordinates (lat, long) for every gage.
gage_ids = np.load(GAGE_ID_PATH, allow_pickle=True)
gage_ids_531 = txt_to_array(GAGE_ID_531_PATH)
coords = gpd.read_file(SHAPEFILE_PATH)

# 3. Format geocoords for 531- and 671-basin CAMELS sets.
coords_531 = coords[coords['gage_id'].isin(list(gage_ids_531))].copy()

coords['gage_id'] = pd.Categorical(coords['gage_id'], categories=list(gage_ids), ordered=True)
coords_531['gage_id'] = pd.Categorical(coords_531['gage_id'], categories=list(gage_ids_531), ordered=True)

coords = coords.sort_values('gage_id')  # Sort to match order of metrics.
coords_531 = coords_531.sort_values('gage_id')  # Sorted copy is the one used below.

# 4. Load the evaluation metrics.
metrics_path = os.path.join(config['out_path'], 'metrics.json')
metrics = load_json(metrics_path)

# 5. Add the evaluation metrics to the basin shapefile.
if config['observations']['name'] == 'camels_671':
    coords[METRIC] = metrics[METRIC]
    full_data = coords
elif config['observations']['name'] == 'camels_531':
    coords_531[METRIC] = metrics[METRIC]
    full_data = coords_531
else:
    raise ValueError(f"Observation data supported: 'camels_671' or 'camels_531'. Got: {config['observations']['name']}")

# 6. Plot the evaluation results spatially.
geoplot_single_metric(
    full_data,
    METRIC,
    rf"Spatial Map of {METRIC.upper()} for $\delta$HBV 1.0 on CAMELS " \
        f"{config['observations']['name'].split('_')[-1]}",
    dynamic_colorbar=False,
)

## 4. Forward $\delta$ HBV 1.0

After completing [these](#before-running) steps, forward the $\delta$ HBV 1.0 model with the code block below. This is intended to be an abbreviation of the forward demonstration in [1.3](#13-demonstration).

Note:
- The settings defined in `../example/conf/config_dhbv_1_0.yaml` are set to replicate benchmark performance.
- The default inference window is set from 1 October 2012 to 30 September 2014, which should use ~2.7GB of VRAM.
- The first year (`warm_up` in the config, 365 days is default) of the inference period is used for initializing HBV's internal states (water storages) and is, therefore, excluded from the model's prediction output.
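
The effect of the warm-up period on the output dates can be checked with the standard library. This sketch assumes the default inference window and the default `warm_up` of 365 days quoted above. The variable names are illustrative and not taken from dMG.

```python
from datetime import date, timedelta

start = date(2012, 10, 1)
end = date(2014, 9, 30)
warm_up = 365  # Days used to initialize HBV's internal states.

# Daily timesteps of the full inference window (inclusive of both ends).
n_days = (end - start).days + 1
timesteps = [start + timedelta(days=i) for i in range(n_days)]

# Predictions are only returned for the days after warm-up.
output_timesteps = timesteps[warm_up:]

print(f"Days in window: {len(timesteps)}")
print(f"Days with predictions: {len(output_timesteps)}")
print(f"First predicted day: {output_timesteps[0]}")
```

Under these assumptions, only the second year of the window carries predictions, so any observations used for comparison need the same slice applied.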

In [None]:
import sys

sys.path.append('../../')
sys.path.append('../../dMG')  # Add the dMG root directory.

from core.utils import print_config
from core.utils.factory import import_data_loader
from example import load_config
from models.model_handler import ModelHandler as dHBV

#------------------------------------------#
# Define model settings here.
CONFIG_PATH = '../example/conf/config_dhbv_1_0.yaml'
#------------------------------------------#



# 1. Load configuration dictionary of model parameters and options.
config = load_config(CONFIG_PATH)
config['mode'] = 'predict'  # <-- Ensure prediction mode in case it is not set in the config file.
print_config(config)

# 2. Initialize the differentiable HBV 1.0 model (LSTM + HBV 1.0).
model = dHBV(config, verbose=True)

# 3. Load and initialize a dataset dictionary of NN and HBV model inputs.
data_loader_cls = import_data_loader(config['data_loader'])
data_loader = data_loader_cls(config, test_split=False, overwrite=False)

# 4. Forward the model to get the predictions.
predictions = model.forward(
    data_loader.dataset,
    eval=True
)

### Visualizing Model Predictions

After running model inference we can, e.g., view the hydrograph for one of the basins to see we are getting expected outputs.

We can do this with our target variable, streamflow, for instance... (though, there are many other states and fluxes we can output as shown in the output cell below.)

In [None]:
import numpy as np
from core.data import txt_to_array
from core.post.plot_hydrograph import plot_hydrograph
from core.utils.dates import Dates

#------------------------------------------#
# Choose a basin by USGS gage ID to plot.
GAGE_ID = 1022500
TARGET = 'flow_sim'

# Resample to 3-day prediction. Options: 'D', 'W', 'M', 'Y'.
RESAMPLE = '3D'

# Set the paths to the gage ID lists...
GAGE_ID_PATH = 'your/path/to/gageid.npy'
GAGE_ID_531_PATH = 'your/path/to/Sub531ID.txt'
#------------------------------------------#



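# Key of the physical model in the predictions dictionary. The first entry is
# used here (assumption: only one physical model was run).
model_key = next(iter(predictions))
print(f"HBV states and fluxes: {predictions[model_key].keys()} \n")


# 1. Get the streamflow predictions and daily timesteps of the prediction window.
pred = predictions[model_key][TARGET]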
timesteps = Dates(config['predict'], config['dpl_model']['rho']).batch_daily_time_range

# Remove warm-up period to match model output (see Note above.)
timesteps = timesteps[config['dpl_model']['phy_model']['warm_up']:]


# 2. Load the gage ID lists and get the basin index.
gage_ids = np.load(GAGE_ID_PATH, allow_pickle=True)
gage_ids_531 = txt_to_array(GAGE_ID_531_PATH)

print(f"First 20 available gage IDs: \n {gage_ids[:20]} \n")
print(f"First 20 available gage IDs (531 subset): \n {gage_ids_531[:20]} \n")

if config['observations']['name'] == 'camels_671':
    if GAGE_ID in gage_ids:
        basin_idx = list(gage_ids).index(GAGE_ID)
    else:
        raise ValueError(f"Basin with gage ID {GAGE_ID} not found in the CAMELS 671 dataset.")

elif config['observations']['name'] == 'camels_531':
    if GAGE_ID in gage_ids_531:
        basin_idx = list(gage_ids_531).index(GAGE_ID)
    else:
        raise ValueError(f"Basin with gage ID {GAGE_ID} not found in the CAMELS 531 dataset.")
else:
    raise ValueError(f"Observation data supported: 'camels_671' or 'camels_531'. Got: {config['observations']['name']}")


# 3. Get the data for the chosen basin and plot.
streamflow_pred_basin = pred[:, basin_idx].squeeze()

plot_hydrograph(
    timesteps,
    streamflow_pred_basin,
    streamflow_pred_basin,
    resample=RESAMPLE,
    title=f"Hydrograph for Gage ID {GAGE_ID}",
    ylabel='Streamflow (ft$^3$/s)',
)