Running the model
================


This note briefly explains how the data should be formatted and which preprocessing steps are needed to run the GAMCR model.

# 1. Check that your dataset has the right format

Your dataset should be a `.txt` file with the following columns:

- `timeyear`: time expressed as a decimal year (e.g. 2022.45)
- `p`: precipitation
- `pet`: potential evapotranspiration
- `q`: streamflow
- `date`: the date, as a datetime
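
As a quick sanity check before preprocessing, the sketch below verifies that a header line contains the five required columns and shows one common way to derive a decimal year from a date. The helper names `missing_columns` and `decimal_year` are hypothetical and not part of GAMCR, the tab delimiter is an assumption, and the exact convention GAMCR expects for `timeyear` may differ.

```python
from datetime import datetime

# Column names required by the format described above
REQUIRED_COLUMNS = ["timeyear", "p", "pet", "q", "date"]


def missing_columns(header_line, sep="\t"):
    """Return the required column names absent from a header line.

    The tab separator is an assumption; adjust `sep` to match your file.
    """
    present = [name.strip() for name in header_line.split(sep)]
    return [name for name in REQUIRED_COLUMNS if name not in present]


def decimal_year(dt):
    """Convert a datetime to a decimal year (one common convention)."""
    start = datetime(dt.year, 1, 1)
    end = datetime(dt.year + 1, 1, 1)
    # Fraction of the year elapsed at `dt`, which accounts for leap years
    return dt.year + (dt - start) / (end - start)


print(missing_columns("timeyear\tp\tpet\tq\tdate"))  # []
print(missing_columns("timeyear\tp\tq"))             # ['pet', 'date']
print(decimal_year(datetime(2022, 7, 2, 12)))        # 2022.5
```

If `missing_columns` returns a non-empty list, rename or add the listed columns before moving on to preprocessing.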

# 2. Data preprocessing

Training GAMCR is much more efficient when some computations are done offline (before launching training). To process the data, use the following script:

In [None]:
import pandas as pd
import numpy as np
import sys
sys.path.append('YOUR PATH TO GAMCR')
import GAMCR

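# Preprocess each site; add more site names to the list as needed
for site in ['Basel_notflashy']:
    model = GAMCR.model.GAMCR(features={'date': True})
    # Folder passed to save_batch for the preprocessed data
    save_folder = './{0}/data/'.format(site)
    # Input data file in the format described in section 1
    datafile = './{0}/data_{0}.txt'.format(site)
    model.save_batch(save_folder, datafile)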
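
To make the expected file layout explicit, the following sketch expands the two format strings used in the script for a given site and checks that the input file exists. The helper `expected_paths` is hypothetical and not part of GAMCR; whether `save_batch` creates the output folder itself is not stated here, so the sketch only reports what it finds.

```python
import os


def expected_paths(site):
    """Return (save_folder, datafile) as built by the preprocessing script."""
    save_folder = './{0}/data/'.format(site)
    datafile = './{0}/data_{0}.txt'.format(site)
    return save_folder, datafile


save_folder, datafile = expected_paths('Basel_notflashy')
print(save_folder)  # ./Basel_notflashy/data/
print(datafile)     # ./Basel_notflashy/data_Basel_notflashy.txt

# Report a missing input file before starting the preprocessing
if not os.path.isfile(datafile):
    print('Input file not found:', datafile)
```

Note that the folder name `Basel_notflashy` matches the `'{0}_{1}'` pattern built from `station` and `mode` in the training code of section 3.A.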

# 3.A Training a model with predefined hyperparameters

As explained in our paper, GAMCR uses two regularization parameters. If you decide to set the values of these parameters yourself, you can use the code below to train the model.


In [None]:
import pandas as pd
import numpy as np
import sys
sys.path.append('YOUR PATH TO GAMCR')
import GAMCR

station = 'Basel'
mode = 'notflashy'
model_ghost = GAMCR.model.GAMCR(lam=0.1)
save_folder = './{0}_{1}/data/'.format(station, mode)
X, matJ, y, timeyear, dates = model_ghost.load_data(save_folder, max_files=96)


# Hyperparameter 1: controls the smoothness of the coefficients with respect to the different features
lam = 0.0005

# Hyperparameter 2: controls the smoothness of the transfer functions
global_lam = 0.1

model = GAMCR.model.GAMCR(lam=lam)
model.load_model('./{0}_{1}/data/params.pkl'.format(station, mode), lam=lam)
save_folder = './{0}_{1}/'.format(station, mode)
name_model = '{0}_{1}_best_model'.format(station, mode)
loss = model.train(
    X, matJ, y,
    dates=dates,
    lr=1e-1,
    max_iter=10000,
    warm_start=False,
    save_folder=save_folder,
    name_model=name_model,
    normalization_loss=1,
    lam_global=global_lam,
)


# 3.B Training a model with hyperparameters selected by cross-validation

If you would like to optimize the choice of hyperparameters, you can launch the script `CV_model.py`, which trains the model for different values of the two hyperparameters (located on a 2D grid).
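
To illustrate what a 2D grid over the two hyperparameters looks like, here is a minimal sketch. The candidate values are illustrative only; the grid actually used by `CV_model.py` is not shown in this note and may differ.

```python
import itertools

# Illustrative candidate values (assumption, not taken from CV_model.py)
lam_values = [0.0001, 0.0005, 0.001]
global_lam_values = [0.01, 0.1, 1.0]

# Every (lam, global_lam) pair on the 2D grid
grid = list(itertools.product(lam_values, global_lam_values))

for lam, global_lam in grid:
    # One model would be trained per pair, e.g. with the code of section 3.A
    print('lam={0}, global_lam={1}'.format(lam, global_lam))

print(len(grid))  # 9
```

The number of models to train grows as the product of the two list lengths, so a coarse grid is a reasonable starting point.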

Once all models are trained, you can inspect the results yourself to find the best one, or use the script `find_best_model_CV.py` to select the best model automatically.
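
If you inspect the results by hand, one simple selection rule is to keep the hyperparameter pair with the lowest mean validation loss across folds. The sketch below shows that rule only; the criterion used by `find_best_model_CV.py` is not described in this note and may differ, the helper `best_hyperparameters` is hypothetical, and the loss values are made-up placeholders, not GAMCR results.

```python
from statistics import mean

# Placeholder validation losses, one value per fold, keyed by (lam, global_lam).
# These numbers are invented for illustration only.
cv_losses = {
    (0.0005, 0.1): [0.30, 0.34],
    (0.001, 0.1): [0.40, 0.38],
    (0.0005, 1.0): [0.35, 0.37],
}


def best_hyperparameters(losses):
    """Return the key whose fold losses have the lowest mean."""
    return min(losses, key=lambda pair: mean(losses[pair]))


best_lam, best_global_lam = best_hyperparameters(cv_losses)
print(best_lam, best_global_lam)  # 0.0005 0.1
```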