In [1]:
# Imports
from pathlib import Path

import pandas as pd
import torch
from neuralhydrology.nh_run import start_run, eval_run, finetune

In [2]:
# By default we assume that you have at least one CUDA-capable NVIDIA GPU
if torch.cuda.is_available():
    start_run(config_file=Path("./conf/kz_basins.yml"))

# Fall back to CPU-only mode
else:
    start_run(config_file=Path("./conf/kz_basins.yml"), gpu=-1)

2024-07-01 16:37:57,686: Logging to /home/spectre/Projects/ISSAI/Internship/aqua_rate/train/runs/experiment_1_0107_163757/output.log initialized.
2024-07-01 16:37:57,687: ### Folder structure created at /home/spectre/Projects/ISSAI/Internship/aqua_rate/train/runs/experiment_1_0107_163757
2024-07-01 16:37:57,687: ### Run configurations for experiment_1
2024-07-01 16:37:57,687: experiment_name: experiment_1
2024-07-01 16:37:57,688: run_dir: /home/spectre/Projects/ISSAI/Internship/aqua_rate/train/runs/experiment_1_0107_163757
2024-07-01 16:37:57,688: train_basin_file: basins.txt
2024-07-01 16:37:57,688: validation_basin_file: basins.txt
2024-07-01 16:37:57,688: test_basin_file: basins.txt
2024-07-01 16:37:57,688: train_start_date: 2012-01-01 00:00:00
2024-07-01 16:37:57,689: train_end_date: 2016-12-31 00:00:00
2024-07-01 16:37:57,689: validation_start_date: 2017-01-01 00:00:00
2024-07-01 16:37:57,689: validation_end_date: 2019-12-31 00:00:00
2024-07-01 16:37:57,689: test_start_date: 2019-

KeyboardInterrupt: 

In [46]:
# Load validation results from the last epoch
import os


run_dir = Path(f"./runs/{sorted(os.listdir('./runs'))[-1]}")
df = pd.read_csv(run_dir / "validation" / "model_epoch019" / "validation_metrics.csv", dtype={'basin': str})
df = df.set_index('basin')

# Compute the median NSE from all basins, where discharge observations are available for that period
print(f"Median NSE of the validation period {df['NSE'].median():.3f}")
print(f"Mean NSE of the validation period {df['NSE'].mean():.3f}")
print(df)
# Select a random basin from the lower 50% of the NSE distribution.
# The later cells rely on the variable `basin` being defined here.
basin = df.loc[df["NSE"] < df["NSE"].median()].sample(n=1).index[0]

Median NSE of the validation period 0.352
Mean NSE of the validation period 0.361
            NSE
basin          
11001  0.585831
11068 -0.384398
11126  0.854831
11129  0.764054
11163       NaN
11164  0.628861
11275  0.008243
11293  0.445322
11397  0.074017
11433  0.163400
11469  0.737924
12002  0.328793
12008 -0.214038
12032 -0.014399
12072  0.308092
12075  0.554264
12564  0.351611
13002  0.343412
13005  0.310955
13035       NaN
13038  0.538745
13048  0.325846
13064  0.195148
13090  0.760546
13091  0.494310
13115  0.540329
13128  0.364835
13221  0.357655
19013       NaN
19022  0.079550
19195  0.224465
19196  0.485065
19205  0.395473
19208  0.168145
19218  0.329480
19243  0.229327
19289  0.247906
19300  0.547841
19302  0.711471
19462  0.466795
19463  0.461266
77819  0.309240
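
For orientation when reading the table above: NSE compares the squared model error with the variance of the observations. A value of 1 is a perfect fit, 0 matches a model that always predicts the observed mean, and negative values are worse than that. The following is a minimal pure-Python sketch of the standard definition, not the implementation NeuralHydrology uses; the function name `nse` and the sample values are made up for illustration.

```python
import math


def nse(obs, sim):
    """Nash-Sutcliffe efficiency for two equally long sequences of floats."""
    # Drop time steps where the observation is missing (NaN)
    pairs = [(o, s) for o, s in zip(obs, sim) if not math.isnan(o)]
    if not pairs:
        return math.nan
    mean_obs = sum(o for o, _ in pairs) / len(pairs)
    denominator = sum((o - mean_obs) ** 2 for o, _ in pairs)
    if denominator == 0:
        # No variance in the observations: NSE is undefined
        return math.nan
    numerator = sum((o - s) ** 2 for o, s in pairs)
    return 1 - numerator / denominator


obs = [1.0, 2.0, 3.0, 4.0]
print(nse(obs, obs))                    # perfect simulation
print(nse(obs, [2.5, 2.5, 2.5, 2.5]))   # always predicting the observed mean
print(nse(obs, [4.0, 3.0, 2.0, 1.0]))   # worse than the mean, so negative
```

Basins without usable observations in the period end up as `NaN`, as in the table. The pandas `median()` and `mean()` calls skip `NaN` values by default, so those basins do not affect the summary numbers.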


## Finetuning

Next, we will show how to perform finetuning for the basin selected above, based on the model we just trained. The function to use is `finetune` from `neuralhydrology.nh_run` if you want to train from within a script or notebook. If you want to start finetuning from the command line, you can also call the `nh-run` utility with the `finetune` argument, instead of e.g. `train` or `evaluate`.

The only thing required, similar to the model training itself, is a config file. This config, however, has slightly different requirements from a normal model config and works slightly differently:
- The config has to contain the following two arguments:
    - `base_run_dir`: The path to the directory of the pre-trained model.
    - `finetune_modules`: Which parts of the pre-trained model you want to finetune. Check the documentation of each model class for a list of all possible parts. Often only some parts, e.g. the output layer, are trained during finetuning and the rest is kept fixed. There is no general rule of thumb, and most likely you will have to try both options: finetuning only some parts and finetuning the full model.
- Any additional argument contained in this config will overwrite the corresponding argument of the pre-trained model. Everything _not_ specified will be taken from the pre-trained model. That is, you can e.g. specify a new basin file in the finetuning config (via `train_basin_file`) to finetune the pre-trained model on a different set of basins, or even just a single basin, as we will do in this notebook. You can also change the learning rate, loss function, evaluation metrics and so on. The only arguments you cannot change are those that alter the model architecture (e.g. `model`, `hidden_size` etc.), because this leads to errors when you try to load the pre-trained weights into the initialized model.
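
The override rule in the last bullet can be pictured as a dictionary merge. The sketch below only illustrates that rule and is not NeuralHydrology's actual config handling. The helper `merge_finetune_config` and the `ARCHITECTURE_KEYS` set are hypothetical, the set lists only the two architecture arguments named above, and the `hidden_size` and base `epochs` values are invented for the example.

```python
# Hypothetical set; only the two architecture arguments named in the text
ARCHITECTURE_KEYS = {"model", "hidden_size"}


def merge_finetune_config(base_cfg: dict, finetune_cfg: dict) -> dict:
    """Return base_cfg with every key from finetune_cfg overriding it."""
    changed = {
        key for key in ARCHITECTURE_KEYS
        if key in finetune_cfg and finetune_cfg[key] != base_cfg.get(key)
    }
    if changed:
        raise ValueError(f"Cannot change architecture arguments: {sorted(changed)}")
    # Later entries win, so finetune values replace base values
    return {**base_cfg, **finetune_cfg}


# Illustrative values, not taken from a real run
base_cfg = {"model": "cudalstm", "hidden_size": 64, "epochs": 50,
            "train_basin_file": "basins.txt"}
finetune_cfg = {"epochs": 20, "train_basin_file": "finetune_basin.txt",
                "finetune_modules": ["head", "lstm"]}

merged = merge_finetune_config(base_cfg, finetune_cfg)
print(merged["epochs"], merged["train_basin_file"], merged["hidden_size"])
```

Arguments that appear only in the base config (here `hidden_size`) carry over unchanged, while `epochs` and `train_basin_file` take the finetuning values.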

Let's have a look at the `finetune.yml` config that we prepared for this tutorial (the cells below expect it in the `conf` directory next to this notebook).

In [None]:
!cat ./conf/finetune.yml

# --- Experiment configurations --------------------------------------------------------------------

# experiment name, used as folder name
experiment_name: cudalstm_531_basins_finetuned

# files to specify training, validation and test basins (relative to code root or absolute path)
train_basin_file: finetune_basin.txt
validation_basin_file: finetune_basin.txt
test_basin_file: finetune_basin.txt

# --- Training configuration -----------------------------------------------------------------------

# specify learning rates to use starting at specific epochs (0 is the initial learning rate)
learning_rate:
    0: 1e-4
    5: 5e-6

# Number of training epochs
epochs: 20

finetune_modules:
- head
- lstm


Of the two required arguments, `base_run_dir` is still missing. We will add it from within this notebook and point it at the directory of the model we just trained. Furthermore, the config points to a new file for training, validation and testing, called `finetune_basin.txt`, which does not exist yet. We will create this file and add the basin we selected above as the only basin we want to use here. The remaining entries change the learning rate, the number of training epochs and the experiment name. Also note that here we train the full model by selecting all model parts available for the `CudaLSTM` under `finetune_modules`.

In [13]:
# Add the path to the pre-trained model to the finetune config
with open("./conf/finetune.yml", "a") as fp:
    fp.write(f"\nbase_run_dir: {run_dir.absolute()}")
    
# Create a basin file with the basin we selected above
with open("./basins/finetune_basin.txt", "w") as fp:
    fp.write(basin)
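
Because the cell above opens the config in append mode, running it a second time adds a second `base_run_dir` line to the file. One way to avoid that, sketched here with a hypothetical helper `set_base_run_dir` that is not part of NeuralHydrology, is to drop any existing `base_run_dir` line before writing the new one. The sketch assumes the key sits on a single top-level line, and the paths in the demonstration are placeholders.

```python
import tempfile
from pathlib import Path


def set_base_run_dir(config_path: Path, run_dir: Path) -> None:
    """Write `base_run_dir` to the config, replacing an earlier entry if present."""
    lines = config_path.read_text().splitlines()
    # Keep everything except a previously written base_run_dir line
    kept = [line for line in lines if not line.startswith("base_run_dir:")]
    kept.append(f"base_run_dir: {run_dir}")
    config_path.write_text("\n".join(kept) + "\n")


# Demonstration on a throwaway file instead of the real config
with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "finetune.yml"
    cfg.write_text("epochs: 20\n")
    set_base_run_dir(cfg, Path("/path/to/first_run"))
    set_base_run_dir(cfg, Path("/path/to/second_run"))
    result = cfg.read_text()

print(result)
```

After two calls the file still contains a single `base_run_dir` entry, pointing at the directory passed last.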

With that, we are ready to start the finetuning. As mentioned above, we have two options to start finetuning:
1. Call the `finetune()` function from a different Python script or a Jupyter Notebook with the path to the config.
2. Start the finetuning from the command line by calling

```bash
nh-run finetune --config-file /path/to/config.yml
```

Here, we will use the first option.

In [14]:
finetune(Path("./conf/finetune.yml"))

2024-07-01 12:40:28,921: Logging to /home/spectre/Projects/ISSAI/Internship/aqua_rate/train/runs/experiment_1_0107_124028/output.log initialized.
2024-07-01 12:40:28,921: ### Folder structure created at /home/spectre/Projects/ISSAI/Internship/aqua_rate/train/runs/experiment_1_0107_124028
2024-07-01 12:40:28,922: ### Start finetuning with pretrained model stored in /home/spectre/Projects/ISSAI/Internship/aqua_rate/train/runs/experiment_1_0107_122035
2024-07-01 12:40:28,922: ### Run configurations for experiment_1
2024-07-01 12:40:28,922: batch_size: 256
2024-07-01 12:40:28,923: clip_gradient_norm: 1
2024-07-01 12:40:28,923: commit_hash: 21f2e4a
2024-07-01 12:40:28,924: data_dir: ../data/CAMELS_KZ
2024-07-01 12:40:28,924: dataset: generic
2024-07-01 12:40:28,924: device: cuda:0
2024-07-01 12:40:28,925: dynamic_inputs: ['prcp', 'srad', 't_max', 't_min', 'pp_mean']
2024-07-01 12:40:28,925: dynamics_embedding: {'type': 'fc', 'hiddens': [24, 32, 64], 'activation': 'tanh', 'dropout': 0.0}
202



# Epoch 1: 100%|██████████| 4/4 [00:00<00:00,  8.82it/s, Loss: 0.0016]
2024-07-01 12:40:29,499: Epoch 1 average loss: avg_loss: 0.00243, avg_total_loss: 0.00243
# Validation: 100%|██████████| 1/1 [00:00<00:00,  4.91it/s]
2024-07-01 12:40:29,711: Stored metrics at /home/spectre/Projects/ISSAI/Internship/aqua_rate/train/runs/experiment_1_0107_124028/validation/model_epoch001/validation_metrics.csv
2024-07-01 12:40:29,712: Stored results at /home/spectre/Projects/ISSAI/Internship/aqua_rate/train/runs/experiment_1_0107_124028/validation/model_epoch001/validation_results.p
2024-07-01 12:40:29,713: Epoch 1 average validation loss: 0.00053 -- Median validation metrics: avg_loss: 0.00053, NSE: -34.77277
# Epoch 2: 100%|██████████| 4/4 [00:00<00:00,  8.64it/s, Loss: 0.0018]
2024-07-01 12:40:30,178: Epoch 2 average loss: avg_loss: 0.00162, avg_total_loss: 0.00162
# Validation: 100%|██████████| 1/1 [00:00<00:00,  6.14it/s]
2024-07-01 12:40:30,350: Stored metrics at /home/spectre/Projects/ISSAI/In

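The log above is truncated, so the final validation NSE of the finetuned model is not visible here. After the first finetuning epoch, the reported median validation NSE is -34.773. To judge the effect of finetuning, compare the value of the last epoch in your own log with the validation NSE of the pre-trained model for the same basin.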
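
To follow the validation NSE across epochs without reading the log by eye, the per-epoch summary lines can be parsed. This is a minimal sketch that assumes the line format shown in the log above; `parse_validation_nse` is a hypothetical helper, and the format may differ between NeuralHydrology versions.

```python
import re

# Matches the per-epoch summary line, e.g.
# "... Epoch 1 average validation loss: 0.00053 -- Median validation metrics: avg_loss: 0.00053, NSE: -34.77277"
PATTERN = re.compile(r"Epoch (\d+) average validation loss: .*\bNSE: (-?\d+(?:\.\d+)?)")


def parse_validation_nse(log_text: str) -> dict:
    """Map epoch number to the median validation NSE found in the log text."""
    return {int(epoch): float(value) for epoch, value in PATTERN.findall(log_text)}


# Two lines copied from the log output above
sample = (
    "2024-07-01 12:40:29,499: Epoch 1 average loss: avg_loss: 0.00243, avg_total_loss: 0.00243\n"
    "2024-07-01 12:40:29,713: Epoch 1 average validation loss: 0.00053 -- "
    "Median validation metrics: avg_loss: 0.00053, NSE: -34.77277\n"
)
print(parse_validation_nse(sample))
```

For a real run, the text would come from the `output.log` file in the run directory. Lines that report a non-numeric NSE are skipped by this pattern.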

Last but not least, we will compare the pre-trained and the finetuned model on the test period. For this, we will make use of the `eval_run` function from `neuralhydrology.nh_run`. Alternatively, you could evaluate both runs from the command line by calling

```bash
nh-run evaluate --run-dir /path/to/run_directory/
```

In [15]:
eval_run(run_dir, period="test")

2024-07-01 12:41:08,843: Using the model weights from runs/experiment_1_0107_122035/model_epoch010.pt




# Evaluation:  10%|▉         | 4/42 [00:01<00:17,  2.19it/s]2024-07-01 12:41:10,731: The following basins had not enough valid target values to calculate a standard deviation: 11163. NSE loss values for this basin will be NaN.
# Evaluation:  45%|████▌     | 19/42 [00:08<00:09,  2.34it/s]2024-07-01 12:41:17,319: The following basins had not enough valid target values to calculate a standard deviation: 13035. NSE loss values for this basin will be NaN.
# Evaluation:  67%|██████▋   | 28/42 [00:12<00:06,  2.33it/s]2024-07-01 12:41:21,324: The following basins had not enough valid target values to calculate a standard deviation: 19013. NSE loss values for this basin will be NaN.
# Evaluation: 100%|██████████| 42/42 [00:18<00:00,  2.24it/s]
2024-07-01 12:41:27,567: Stored metrics at runs/experiment_1_0107_122035/test/model_epoch010/test_metrics.csv
2024-07-01 12:41:27,572: Stored results at runs/experiment_1_0107_122035/test/model_epoch010/test_results.p


Now we can call the `eval_run()` function as above, but pointing to the directory of the finetuned run. By default, this function evaluates the last checkpoint, which can be changed with the `epoch` argument. Here, however, we use the default. If you run this notebook locally, make sure that `finetune_dir` points to the folder of your finetune run.

In [16]:
finetune_dir = Path(f"./runs/{sorted(os.listdir('./runs'))[-1]}")
eval_run(finetune_dir, period="test")

2024-07-01 12:41:27,591: Using the model weights from runs/experiment_1_0107_124028/model_epoch050.pt




# Evaluation:   0%|          | 0/1 [00:00<?, ?it/s]

# Evaluation: 100%|██████████| 1/1 [00:00<00:00,  2.28it/s]
2024-07-01 12:41:28,039: Stored metrics at runs/experiment_1_0107_124028/test/model_epoch050/test_metrics.csv
2024-07-01 12:41:28,039: Stored results at runs/experiment_1_0107_124028/test/model_epoch050/test_results.p
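
One caveat about `sorted(os.listdir('./runs'))[-1]`: judging from the logs above, the run folders end in a day-month timestamp (for example `experiment_1_0107_124028` for 1 July), so sorting by name is only chronological within a single month. A hedged alternative is to sort by the parsed month, day and time instead. `run_sort_key` below is a hypothetical helper, not a NeuralHydrology function, and the folder name format is an assumption inferred from the logs.

```python
import re

# Assumed folder name format, inferred from the logs above: <experiment_name>_DDMM_HHMMSS
RUN_SUFFIX = re.compile(r"_(\d{2})(\d{2})_(\d{6})$")


def run_sort_key(name: str) -> tuple:
    """Sort key (month, day, time) parsed from a run folder name."""
    match = RUN_SUFFIX.search(name)
    if match is None:
        raise ValueError(f"Unexpected run folder name: {name}")
    day, month, time_of_day = match.groups()
    return (int(month), int(day), time_of_day)


# The first name is invented to show a run from the previous month
names = ["experiment_1_3006_235900", "experiment_1_0107_122035", "experiment_1_0107_124028"]
print(sorted(names)[-1])                    # plain name sorting
print(sorted(names, key=run_sort_key)[-1])  # sorting by parsed month, day and time
```

The timestamp carries no year, so this key still fails across a year boundary. The most robust option is to note the run directory printed in the log and pass it to `eval_run` explicitly.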


Now let's look at the test period results of the pre-trained base model and the finetuned model for the basin that we chose above.

In [18]:
# load test results of the base run
df_pretrained = pd.read_csv(run_dir / "test/model_epoch010/test_metrics.csv", dtype={'basin': str})
df_pretrained = df_pretrained.set_index("basin")
    
# load test results of the finetuned model
df_finetuned = pd.read_csv(finetune_dir / "test/model_epoch050/test_metrics.csv", dtype={'basin': str})
df_finetuned = df_finetuned.set_index("basin")
    
# extract basin performance
base_model_nse = df_pretrained.loc[df_pretrained.index == basin, "NSE"].values[0]
finetune_nse = df_finetuned.loc[df_finetuned.index == basin, "NSE"].values[0]
print(f"Basin {basin} base model performance: {base_model_nse:.3f}")
print(f"Performance after finetuning: {finetune_nse:.3f}")

Basin 11068 base model performance: -478.009
Performance after finetuning: -7.258


For this basin, finetuning raises the test-period NSE from -478.009 to -7.258. That is a large improvement, but both values are still far below zero, so even the finetuned model performs worse than simply predicting the mean observed discharge for this basin in the test period. Note that a) our base model was not optimally trained (we stopped quite early) and b) the finetuning settings were chosen rather arbitrarily. From our experience so far, you can almost always get performance increases for individual basins with finetuning, but it is difficult to find settings that are universally applicable. This tutorial was just a showcase of how easy it is to finetune models with the NeuralHydrology library. Now it is up to you to experiment with it.