# Tutorial: **HydroDL LSTM**

---

This notebook is a faithful implementation of the original [HydroDL](https://github.com/mhpi/hydroDL) LSTM model developed by [Dapeng Feng et al. (2020)](https://doi.org/10.1029/2019WR026793), and demonstrates both training and forward simulation in δMG. A pre-trained model is provided for those who only wish to run the model forward.

For an explanation of the model structure, methodology, data, and performance metrics, please refer to Feng's publications [below](#publications). If you find this code useful in your own work, please include the aforementioned citation.

**Note**: If you are new to the δMG framework, we suggest first looking at our [δHBV 1.0 tutorial](./../hydrology/example_dhbv.ipynb).

<br>

### Before Running:
- **Environment**: See [setup.md](./../../docs/setup.md) for ENV setup. δMG must be installed to run this notebook.

- **Model**: Download pretrained LSTM model weights from [AWS](https://mhpi-spatial.s3.us-east-2.amazonaws.com/mhpi-release/models/lstm_trained.zip). Then update the model config:

    - In [`./generic_deltamodel/example/conf/config_lstm.yaml`](./../conf/config_lstm.yaml), set *model_dir* to the directory that contains both the trained model weights `cudnnlstmmodel_ep300.pt` **and** the normalization file `normalization_statistics.json`.
    - **Note**: make sure this path ends with a trailing forward slash, e.g., `./your/path/to/model/`.

- **Data**: Download the CAMELS data extraction from [AWS](https://mhpi-spatial.s3.us-east-2.amazonaws.com/mhpi-release/camels/camels_data.zip). Then, update the data configs:

    - In [`./generic_deltamodel/example/conf/observations/camels_531.yaml`](./../conf/observations/camels_531.yaml) and [`camels_671.yaml`](./../conf/observations/camels_671.yaml), update...
        1. *data_path* with `camels_dataset` path,
        2. *gage_info* with `gage_ids.npy` path,
        3. *subset_path* with `531_subset.txt` path (camels_531 only).

    - The full 671-basin or 531-basin CAMELS datasets can be selected by setting `observations: camels_671` or `camels_531` in the model config, respectively.

- **Hardware**: The HydroDL LSTM requires CUDA, which is only available on NVIDIA GPUs. If you do not have one, you can run this notebook with δMG on [Google Colab](https://colab.research.google.com/) using a T4 GPU.
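
Before running anything, it can save time to confirm that the paths you just edited point at real files. The sketch below is a hypothetical helper (`preflight` is not part of δMG) that flags a missing trailing slash on *model_dir*, missing model files, and missing data paths. We assume, without having confirmed it, that the trailing slash matters because *model_dir* is joined to file names by plain string concatenation: `'./model' + 'cudnnlstmmodel_ep300.pt'` gives `'./modelcudnnlstmmodel_ep300.pt'`.

```python
import tempfile
from pathlib import Path

# File names listed in the setup steps; adjust if your download differs.
REQUIRED_MODEL_FILES = ('cudnnlstmmodel_ep300.pt', 'normalization_statistics.json')


def preflight(model_dir: str, data_paths: dict[str, str]) -> list[str]:
    """Return a list of setup problems (an empty list means everything was found)."""
    problems = []
    # Assumption: model_dir is concatenated with file names as a string,
    # so a missing trailing '/' would produce a wrong path.
    if not model_dir.endswith('/'):
        problems.append(f"model_dir should end with '/': {model_dir!r}")
    for name in REQUIRED_MODEL_FILES:
        if not (Path(model_dir) / name).is_file():
            problems.append(f"missing {name} in {model_dir!r}")
    # data_path, gage_info, subset_path, etc. should all exist on disk.
    for key, path in data_paths.items():
        if not Path(path).exists():
            problems.append(f"{key} not found: {path!r}")
    return problems


# Usage with a throwaway directory standing in for the real download.
with tempfile.TemporaryDirectory() as tmp:
    for name in REQUIRED_MODEL_FILES:
        (Path(tmp) / name).touch()
    ok = preflight(tmp + '/', {'data_path': tmp})
    no_slash = preflight(tmp, {'data_path': tmp})
    missing = preflight(tmp + '/', {'gage_info': tmp + '/gage_ids.npy'})

print(ok)        # []
print(no_slash)  # one problem: missing trailing '/'
print(missing)   # one problem: gage_info not found
```

To use it on your setup, copy *model_dir* from the model config and *data_path*, *gage_info*, and (for camels_531) *subset_path* from the observation config into the call by hand.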

<br>

### Publications:

*Dapeng Feng, Kathryn Lawson, Chaopeng Shen. "Mitigating prediction error of deep learning streamflow models in large data-sparse regions with ensemble modeling and soft data." Geophysical Research Letters (2021). https://doi.org/10.1029/2021GL092999.*

*Dapeng Feng, Kuai Fang, Chaopeng Shen. "Enhancing Streamflow Forecast and Extracting Insights Using Long-Short Term Memory Networks With Data Integration at Continental Scales." Water Resources Research (2020). https://doi.org/10.1029/2019WR026793.*

<br>

### Issues:
For questions, concerns, bugs, etc., please reach out by posting an [issue](https://github.com/mhpi/generic_deltamodel/issues).

---


<br>

## 1. Forward LSTM

After completing [these](#before-running) steps, forward the LSTM with the code block below.

Note:
- The settings defined in the config `./generic_deltamodel/example/conf/config_lstm.yaml` are set to replicate benchmark performance on 531 CAMELS basins.
- While published results are an average of 6 models using different random seeds, we only use one model and seed here for demonstration.
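
If you later train several models with different `seed` values in the config, you may want to combine their outputs the way the published numbers do. Below is a minimal sketch of that averaging step, assuming a plain arithmetic mean of the denormalized predictions; consult the publications for the exact ensembling procedure. `ensemble_mean` is a hypothetical helper, and the arrays are synthetic placeholders, not real model output.

```python
import numpy as np


def ensemble_mean(predictions: list[np.ndarray]) -> np.ndarray:
    """Average per-seed predictions, each shaped [days, basins], into one array."""
    stacked = np.stack(predictions, axis=0)  # [n_models, days, basins]
    return stacked.mean(axis=0)


# Synthetic stand-ins for denormalized runoff from 6 models trained with different seeds.
rng = np.random.default_rng(0)
per_seed = [rng.random((730, 531)) for _ in range(6)]
ensemble = ensemble_mean(per_seed)
print(ensemble.shape)  # (730, 531)
```

If the published figures instead average a metric across models, the same mean over axis 0 applies to an array of per-model metric values.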

### 1.1 Demonstration

In [None]:
import sys

sys.path.append('../../')

from dmg import ModelHandler
from dmg.core.utils import import_data_loader, print_config, set_randomseed
from example.example_utils import load_config

# ------------------------------------------#
# Define model settings here.
CONFIG_PATH = '../example/conf/config_lstm.yaml'
# ------------------------------------------#


# 1. Load configuration dictionary of model parameters and options.
config = load_config(CONFIG_PATH)
config['mode'] = 'sim'
print_config(config)

# Set random seed for reproducibility.
set_randomseed(config['seed'])

# 2. Initialize the LSTM.
model = ModelHandler(config, verbose=True)

# 3. Load and initialize a dataset dictionary of normalized NN model inputs.
data_loader_cls = import_data_loader(config['data_loader'])
data_loader = data_loader_cls(config, test_split=True, overwrite=False)

# 4. Forward the model to get the predictions.
output = model(
    data_loader.eval_dataset,
    eval=True,
)

# Denormalize the runoff predictions.
runoff = output['CudnnLstmModel']['runoff']
runoff = data_loader.from_norm(
    runoff.cpu().detach().numpy(),
    vars='runoff',
)

print("-------------\n")
print(
    f"Streamflow predictions for {runoff.shape[0]} days and "
    f"{runoff.shape[1]} basins ~ \nShowing the first 5 days for "
    f"the first basin: \n {runoff[:5, :1]}"
)
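
`from_norm` maps the network's normalized output back to physical streamflow units. The sketch below shows one plausible scheme, not necessarily what δMG implements: to our understanding, HydroDL-style preprocessing of skewed variables such as streamflow applies `log10(sqrt(x) + 0.1)` before z-scoring, so the inverse first undoes the z-score and then that transform. The function names and statistics are hypothetical.

```python
import numpy as np


def norm_flow(x, mean, std):
    """Hypothetical forward transform: log10(sqrt(x) + 0.1), then z-score."""
    return (np.log10(np.sqrt(x) + 0.1) - mean) / std


def denorm_flow(x_norm, mean, std):
    """Inverse of norm_flow: undo the z-score, then undo the log-sqrt transform."""
    t = x_norm * std + mean
    return (np.power(10.0, t) - 0.1) ** 2


# Placeholder statistics; the real ones are presumably stored in normalization_statistics.json.
mean, std = 0.2, 0.5
flow = np.array([0.0, 1.0, 4.0, 25.0])
recovered = denorm_flow(norm_flow(flow, mean, std), mean, std)
print(np.allclose(recovered, flow))  # True
```

The round trip recovering the input is the property to check if you ever reimplement the inverse yourself.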

## 2. Creating a Model
Here, we demonstrate how to set up a minimal LSTM example with DeltaModel.

In [None]:
data_loader._denormalize(output)

In [None]:
import sys

sys.path.append('../../')
sys.path.append('../../deltaModel')  # Add the root directory of deltaModel

from example.example_utils import load_config
from deltaModel.models.neural_networks import CudnnLstmModel as LSTM
from deltaModel.core.data.dataset_loading import get_dataset_dict
from deltaModel.core.data import take_sample


CONFIG_PATH = '../example/conf/config_lstm.yaml'


# 1. Load configuration dictionary of model parameters and options.
config = load_config(CONFIG_PATH)
device = config['device']

# 2. Set up a dataset dict of NN model inputs.
# Take a sample (730 days x 100 basins) to reduce memory use on the GPU.
dataset = get_dataset_dict(config, train=True)
dataset_sample = take_sample(config, dataset, days=730, basins=100)

nx = dataset_sample['x'].shape[-1]       # Number of input features.
ny = dataset_sample['target'].shape[-1]  # Number of target variables.
hidden_size = config['nn_model']['hidden_size']
dr = config['nn_model']['dr']


# 3. Initialize an LSTM and move it to the configured (CUDA) device.
lstm = LSTM(nx=nx, ny=ny, hiddenSize=hidden_size, dr=dr).to(device)

# From here, forward or train the LSTM just as any torch.nn.Module model.

# 4. For example, to forward. CudnnLstmModel expects an input tensor
# shaped [days, basins, nx] and returns a (normalized) tensor [days, basins, ny].
output = lstm(dataset_sample['x'])

# TODO: Add a visualization here.
# Index 0 of the last axis is assumed to be streamflow.
print(
    f"Streamflow predictions for {output.shape[0]} days and {output.shape[1]} basins; "
    f"showing the first 5 days for 5 basins:\n {output[:5, :5, 0]}"
)
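
The cell above notes that the LSTM can be trained like any `torch.nn.Module`. Because `CudnnLstmModel` needs a CUDA GPU, the sketch below uses a hypothetical CPU stand-in (`TinyLstm`, built from standard PyTorch layers as a linear input layer, an LSTM, and a linear output layer) on synthetic data to show one training loop with random basin/time-window minibatches and a NaN-masked MSE loss. The optimizer (Adam), the loss, the window length `rho`, and all helper names are our assumptions, not the HydroDL training settings in `config_lstm.yaml`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyLstm(nn.Module):
    """Hypothetical CPU stand-in for CudnnLstmModel: linear -> ReLU -> LSTM -> linear.

    Tensors are time-major, [time, basins, features]. Dropout here is a plain
    output dropout, not the weight dropout used by the cuDNN LSTM.
    """

    def __init__(self, nx: int, ny: int, hidden_size: int, dr: float = 0.5):
        super().__init__()
        self.linear_in = nn.Linear(nx, hidden_size)
        self.lstm = nn.LSTM(hidden_size, hidden_size)  # batch_first=False -> time-major
        self.dropout = nn.Dropout(dr)
        self.linear_out = nn.Linear(hidden_size, ny)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = F.relu(self.linear_in(x))
        h, _ = self.lstm(h)
        return self.linear_out(self.dropout(h))


def sample_batch(x, y, n_basins: int, rho: int):
    """Draw one random window of rho days for n_basins random basins."""
    n_time, n_total = x.shape[0], x.shape[1]
    basins = torch.randperm(n_total)[:n_basins]
    start = int(torch.randint(0, n_time - rho + 1, (1,)))
    return x[start:start + rho, basins], y[start:start + rho, basins]


def train(model, x, y, iterations=5, n_basins=8, rho=30, lr=1e-3):
    """Plain training loop; the loss ignores NaN (missing) observations."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    losses = []
    for _ in range(iterations):
        xb, yb = sample_batch(x, y, n_basins, rho)
        pred = model(xb)
        mask = ~torch.isnan(yb)
        loss = F.mse_loss(pred[mask], yb[mask])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return losses


# Synthetic data: 365 days, 20 basins, 5 input features, 1 target.
torch.manual_seed(0)
x = torch.randn(365, 20, 5)
y = torch.randn(365, 20, 1)
y[::10] = float('nan')  # Pretend every 10th day is unobserved.
model = TinyLstm(nx=5, ny=1, hidden_size=16)
losses = train(model, x, y)
print(losses)
```

With a GPU, the same loop should in principle accept the `lstm` instance and the `x`/`target` tensors from `dataset_sample` in place of the stand-ins, but we have not verified this against δMG's own trainer.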