In [1]:
# Imports
from pathlib import Path

import pandas as pd
import torch
from neuralhydrology.nh_run import start_run, eval_run, finetune

In [2]:
# by default we assume that you have at least one CUDA-capable NVIDIA GPU
if torch.cuda.is_available():
    
    start_run(config_file=Path("kz_basins.yml"))

# fall back to CPU-only mode
else:
    start_run(config_file=Path("kz_basins.yml"), gpu=-1)

2024-06-28 13:08:43,062: Logging to /home/spectre/Projects/ISSAI/Internship/aqua_rate/train/runs/cudalstm_531_basins_2806_130843/output.log initialized.
2024-06-28 13:08:43,063: ### Folder structure created at /home/spectre/Projects/ISSAI/Internship/aqua_rate/train/runs/cudalstm_531_basins_2806_130843
2024-06-28 13:08:43,063: ### Run configurations for cudalstm_531_basins
2024-06-28 13:08:43,063: experiment_name: cudalstm_531_basins
2024-06-28 13:08:43,063: run_dir: /home/spectre/Projects/ISSAI/Internship/aqua_rate/train/runs/cudalstm_531_basins_2806_130843
2024-06-28 13:08:43,064: train_basin_file: basins.txt
2024-06-28 13:08:43,064: validation_basin_file: basins.txt
2024-06-28 13:08:43,064: test_basin_file: basins.txt
2024-06-28 13:08:43,065: train_start_date: 2000-01-01 00:00:00
2024-06-28 13:08:43,065: train_end_date: 2014-12-31 00:00:00
2024-06-28 13:08:43,065: validation_start_date: 2015-01-01 00:00:00
2024-06-28 13:08:43,065: validation_end_date: 2016-12-31 00:00:00
2024-06-28 1

KeyboardInterrupt: 

In [5]:
# Load validation results from the last epoch
import os


run_dir = Path(f"./runs/{sorted(os.listdir('./runs'))[-1]}")
df = pd.read_csv(run_dir / "validation" / "model_epoch028" / "validation_metrics.csv", dtype={'basin': str})
df = df.set_index('basin')

# Compute the median NSE from all basins, where discharge observations are available for that period
print(f"Median NSE of the validation period {df['NSE'].median():.3f}")

# Select a random basins from the lower 50% of the NSE distribution
basin = df.loc[df["NSE"] < df["NSE"].median()].sample(n=1).index[0]

print(f"Selected basin: {basin} with an NSE of {df.loc[df.index == basin, 'NSE'].values[0]:.3f}")

Median NSE of the validation period 0.372
Selected basin: 77819 with an NSE of -0.346


## Finetuning

Next, we will show how to perform finetuning for the basin selected above, based on the model we just trained. The function to use is `finetune` from `neuralhydrology.nh_run` if you want to train from within a script or notebook. If you want to start finetuning from the command line, you can also call the `nh-run` utility with the `finetune` argument, instead of e.g. `train` or `evaluate`.

The only thing required, similar to the model training itself, is a config file. This config however has slightly different requirements to a normal model config and works slightly different:
- The config has to contain the following two arguments:
    - `base_run_dir`: The path to the directory of the pre-trained model.
    - `finetune_modules`: Which parts of the pre-trained model you want to finetune. Check the documentation of each model class for a list of all possible parts. Often only parts, e.g. the output layer, are trained during finetuning and the rest is kept fixed. There is no general rule of thumb and most likely you will have to try both.
- Any additional argument contained in this config will overwrite the config argument of the pre-trained model. Everything _not_ specified will be taken from the pre-trained model. That is, you can e.g. specify a new basin file in the finetuning config (by `train_basin_file`) to finetune the pre-trained model on a different set of basins, or even just a single basin as we will do in this notebook. You can also change the learning rate, loss function, evaluation metrics and so on. The only thing you can not change are arguments that change the model architecture (e.g. `model`, `hidden_size` etc.), because this leads to errors when you try to load the pre-trained weights into the initialized model.

Let's have a look at the `finetune.yml` config that we prepared for this tutorial (you can find the file in the same directory as this notebook).

In [6]:
!cat finetune.yml

# --- Experiment configurations --------------------------------------------------------------------

# experiment name, used as folder name
experiment_name: cudalstm_531_basins_finetuned

# files to specify training, validation and test basins (relative to code root or absolute path)
train_basin_file: finetune_basin.txt
validation_basin_file: finetune_basin.txt
test_basin_file: finetune_basin.txt

# --- Training configuration -----------------------------------------------------------------------

# specify learning rates to use starting at specific epochs (0 is the initial learning rate)
learning_rate:
    0: 5e-4
    2: 5e-5	

# Number of training epochs
epochs: 10

finetune_modules:
- head
- lstm

base_run_dir: /home/spectre/Projects/ISSAI/Internship/aqua_rate/test_train_25_06_24/runs/cudalstm_531_basins_2506_183200

So out of the two arguments that are required, `base_run_dir` is still missing. We will add the argument from here and point at the directory of the model we just trained. Furthermore, we point to a new file for training, validation and testing, called `finetune_basin.txt`, which does not yet exist. We will create this file and add the basin we selected above as the only basin we want to use here. The rest are some changes to the learning rate and the number of training epochs as well as a new name. Also note that here, we train the full model, by selecting all model parts available for the `CudaLSTM` under `finetune_modules`.

In [20]:
# Add the path to the pre-trained model to the finetune config
with open("finetune.yml", "a") as fp:
    fp.write(f"\nbase_run_dir: {run_dir.absolute()}")
    
# Create a basin file with the basin we selected above
with open("finetune_basin.txt", "w") as fp:
    fp.write(basin)

With that, we are ready to start the finetuning. As mentioned above, we have two options to start finetuning:
1. Call the `finetune()` function from a different Python script or a Jupyter Notebook with the path to the config.
2. Start the finetuning from the command line by calling

```bash
nh-run finetune --config-file /path/to/config.yml
```

Here, we will use the first option.

In [24]:
finetune(Path("finetune.yml"))

2024-06-28 13:02:46,258: Logging to /home/spectre/Projects/ISSAI/Internship/aqua_rate/test_train_25_06_24/runs/cudalstm_531_basins_finetuned_2806_130246/output.log initialized.
2024-06-28 13:02:46,259: ### Folder structure created at /home/spectre/Projects/ISSAI/Internship/aqua_rate/test_train_25_06_24/runs/cudalstm_531_basins_finetuned_2806_130246
2024-06-28 13:02:46,259: ### Start finetuning with pretrained model stored in /home/spectre/Projects/ISSAI/Internship/aqua_rate/test_train_25_06_24/runs/cudalstm_531_basins_2806_124648
2024-06-28 13:02:46,260: ### Run configurations for cudalstm_531_basins_finetuned
2024-06-28 13:02:46,260: batch_size: 256
2024-06-28 13:02:46,261: clip_gradient_norm: 1
2024-06-28 13:02:46,261: commit_hash: 44bd92d
2024-06-28 13:02:46,262: data_dir: ../data/CAMELS_KZ
2024-06-28 13:02:46,262: dataset: generic
2024-06-28 13:02:46,263: device: cuda:0
2024-06-28 13:02:46,263: dynamic_inputs: ['prcp', 'srad', 't_max', 't_min', 'pp_mean']
2024-06-28 13:02:46,263: e

KeyboardInterrupt: 

Looking at the validation result, we can see an increase of roughly 0.05 NSE.

Last but not least, we will compare the pre-trained and the finetuned model on the test period. For this, we will make use of the `eval_run` function from `neuralhydrolgy.nh_run`. Alternatively, you could evaluate both runs from the command line by calling

```bash
nh-run evaluate --run-dir /path/to/run_directory/
```

In [17]:
eval_run(run_dir, period="test", epoch=28)

2024-06-28 13:00:25,455: Using the model weights from runs/cudalstm_531_basins_2806_124648/model_epoch028.pt


# Evaluation:  10%|▉         | 4/42 [00:00<00:04,  9.25it/s]2024-06-28 13:00:25,924: The following basins had not enough valid target values to calculate a standard deviation: 11163. NSE loss values for this basin will be NaN.
# Evaluation:  45%|████▌     | 19/42 [00:02<00:02,  9.36it/s]2024-06-28 13:00:27,803: The following basins had not enough valid target values to calculate a standard deviation: 13035. NSE loss values for this basin will be NaN.
# Evaluation:  67%|██████▋   | 28/42 [00:03<00:01,  9.14it/s]2024-06-28 13:00:28,888: The following basins had not enough valid target values to calculate a standard deviation: 19013. NSE loss values for this basin will be NaN.
# Evaluation: 100%|██████████| 42/42 [00:05<00:00,  8.28it/s]
2024-06-28 13:00:30,540: Stored metrics at runs/cudalstm_531_basins_2806_124648/test/model_epoch028/test_metrics.csv
2024-06-28 13:00:30,546: Stored results at runs/cudalstm_531_basins_2806_124648/test/model_epoch028/test_results.p


Now we can call the `eval_run()` function as above, but pointing to the directory of the finetuned run. By default, this function evaluates the last checkpoint, which can be changed with the `epoch` argument. Here however, we use the default. Again, if you want to run this notebook locally, make sure to adapt the folder name of the finetune run.

In [18]:
finetune_dir = Path(f"./runs/{sorted(os.listdir('./runs'))[-1]}")
eval_run(finetune_dir, period="test")

2024-06-28 13:00:32,694: Using the model weights from runs/cudalstm_531_basins_finetuned_2806_125858/model_epoch010.pt
# Evaluation: 100%|██████████| 1/1 [00:00<00:00,  7.99it/s]
2024-06-28 13:00:32,823: Stored metrics at runs/cudalstm_531_basins_finetuned_2806_125858/test/model_epoch010/test_metrics.csv
2024-06-28 13:00:32,824: Stored results at runs/cudalstm_531_basins_finetuned_2806_125858/test/model_epoch010/test_results.p


Now let's look at the test period results of the pre-trained base model and the finetuned model for the basin that we chose above.

In [19]:
# load test results of the base run
df_pretrained = pd.read_csv(run_dir / "test/model_epoch030/test_metrics.csv", dtype={'basin': str})
df_pretrained = df_pretrained.set_index("basin")
    
# load test results of the finetuned model
df_finetuned = pd.read_csv(finetune_dir / "test/model_epoch010/test_metrics.csv", dtype={'basin': str})
df_finetuned = df_finetuned.set_index("basin")
    
# extract basin performance
base_model_nse = df_pretrained.loc[df_pretrained.index == basin, "NSE"].values[0]
finetune_nse = df_finetuned.loc[df_finetuned.index == basin, "NSE"].values[0]
print(f"Basin {basin} base model performance: {base_model_nse:.3f}")
print(f"Performance after finetuning: {finetune_nse:.3f}")

Basin 77819 base model performance: 0.100
Performance after finetuning: 0.098


So we see roughly the same performance increase in the test period (slightly higher), which is great. However, note that a) our base model was not optimally trained (we stopped quite early) but also b) the finetuning settings were chosen rather randomly. From our experience so far, you can almost always get performance increases for individual basins with finetuning, but it is difficult to find settings that are universally applicable. However, this tutorial was just a showcase of how easy it actually is to finetune models with the NeuralHydrology library. Now it is up to you to experiment with it.