Running the model
================


This note briefly explains how to format your data and which preprocessing steps are needed to run the WATRES model.

# 1. Check That Your Dataset Has the Right Format


Your dataset should be a file named `{site_name}.txt` with the following required column names:

- **`t`**: Time expressed as a decimal year (e.g., 2022.45).  
- **`p`**: Precipitation.  
- **`pet`**: Potential evapotranspiration.  
- **`q`**: Streamflow.  
- **`Cp`**: Input tracer data.  
- **`Cq`**: Output tracer data.  
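
Before training, it can help to check that the file header contains all required columns and to see how a timestamp maps to a decimal year. The following is a minimal sketch, not part of the WATRES package: the helper names `missing_columns` and `decimal_year` are hypothetical, the comma delimiter is an assumption (adapt it to your file), and the decimal-year formula is one common convention that may differ from the one used to build your dataset.

```python
import csv
from datetime import datetime

# Column names required by WATRES, as listed above
REQUIRED_COLUMNS = ["t", "p", "pet", "q", "Cp", "Cq"]


def missing_columns(path, delimiter=","):
    """Return the required columns absent from the header row of `path`.

    The delimiter is an assumption; change it to match your file.
    """
    with open(path, newline="") as f:
        header = next(csv.reader(f, delimiter=delimiter), [])
    present = {name.strip() for name in header}
    return [col for col in REQUIRED_COLUMNS if col not in present]


def decimal_year(dt):
    """Convert a datetime to a decimal year (fraction of the year elapsed)."""
    start = datetime(dt.year, 1, 1)
    end = datetime(dt.year + 1, 1, 1)
    return dt.year + (dt - start) / (end - start)
```

An empty list returned by `missing_columns` means the header has every required column.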

---

### Folder Structure for Using WATRES

To use the WATRES package, create a folder for your site named `{site_name}` containing two subfolders:

- **`data/`**:  
  This subfolder stores the data file `{site_name}.txt`.

- **`save/`**:  
  WATRES saves the models you train for this site in this subfolder. It also stores the statistics on the results of a trained model, which are created when running the `compute_results.py` script (see the `scripts` folder).
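
The layout described above can be created with a few lines of Python. This is a sketch only: `make_site_folders` is a hypothetical helper, not a WATRES function, and `root` stands for whatever parent directory holds your site folders.

```python
from pathlib import Path


def make_site_folders(root, site_name):
    """Create `{site_name}/data/` and `{site_name}/save/` under `root`."""
    site_dir = Path(root) / site_name
    for sub in ("data", "save"):
        # exist_ok=True makes the call safe to repeat
        (site_dir / sub).mkdir(parents=True, exist_ok=True)
    return site_dir
```

After running it, copy `{site_name}.txt` into the `data/` subfolder.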



# 2. Training a Model

### ➡️ Run the Script: `train_models.py`

This script looks like this:

In [None]:
import os
import sys

# NOTE: `path_root` must be set to the repository root (the folder containing `WATRES/` and `data/`).

def trainf(x):
    # Make the WATRES package importable, then train one model with the settings in `x`
    sys.path.append(path_root)
    from WATRES import WATRES
    model_bert = WATRES(pathsite=x['pathsite'], site=x['site'], algo=x['algo'], site_name2save=x['site_name2save'])
    model_bert.train(BATCH_SIZE=4000, Tmax=43200, n_validation=365*24*2, n_train=365*24*10, seed=x['seed'], nb_epochs=400, std_input_noise=x['input_std'], std_output_noise=x['output_std'])
    return 1

if __name__ == "__main__":
    os.chdir(os.path.join(path_root, 'WATRES'))

    # Define the sites and algorithms
    
    input_std = 0.1
    output_std = 0.1
    sites = ['Pully_small_storage', 'Pully_large_storage', 'Lugano_small_storage','Lugano_large_storage','Basel_small_storage','Basel_large_storage'] 


    algos = ['WATRES']

    
    settings_algos = []
    for site in sites:
        pathsite = os.path.join(path_root, f'data/{site}/')
        
        for algo in algos:
            site_name2save = 'input_std_' + str(input_std) + '-output_std_' + str(output_std)

            settings_algos.append({
                'site': site,
                'site_name2save':site_name2save,
                'pathsite': pathsite,
                'algo': algo,
                'seed': 0,
                'input_std':input_std,
                'output_std':output_std
            })

    for sett in settings_algos:
        trainf(sett)

The parameters you can define are explained below:

- `input_std`: standard deviation of the Gaussian white noise added to the input tracer data before training the model.

- `output_std`: standard deviation of the Gaussian white noise added to the output tracer data before training the model.

- `sites`: list of the names of the sites on which you want to train the model.

- `algos`: list of the names of the models you want to use.

- `site_name2save`: a name of your choice for the model that will be trained. This name is used to save the checkpoint and the results.


- `model_bert.train(BATCH_SIZE=4000, Tmax=43200, n_validation=365*24*2, n_train=365*24*10, seed=x['seed'], nb_epochs=400, std_input_noise=x['input_std'], std_output_noise=x['output_std'])`
    - `BATCH_SIZE`: size of the training set. These points are sampled equally across the four quartiles of the discharge range.
    - `Tmax`: age horizon used to model the transit time distributions.
    - `n_validation`: number of most recent time points to be withheld from training (e.g., for forecasting).
    - `n_train`: number of time steps directly before the validation period to be used for sampling the training data.
    - `seed`: random seed.
    - `nb_epochs`: number of epochs to train the model.
    - `std_input_noise`, `std_output_noise`: noise standard deviations, set from `input_std` and `output_std` above.
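
To make the roles of `n_train`, `n_validation`, and the noise parameters concrete, here is a minimal sketch of the behaviour described above. It is an illustration under stated assumptions, not the WATRES implementation: `split_indices` and `add_gaussian_noise` are hypothetical helpers, and the actual package may handle edge cases differently.

```python
import random


def split_indices(n_total, n_train, n_validation):
    """Return (train, validation) index ranges for a series of length n_total.

    The validation period is the last `n_validation` points; the training
    window is the `n_train` points directly before it (clipped at index 0).
    """
    if n_validation > n_total:
        raise ValueError("n_validation exceeds the length of the series")
    val_start = n_total - n_validation
    train_start = max(0, val_start - n_train)
    return range(train_start, val_start), range(val_start, n_total)


def add_gaussian_noise(values, std, seed=0):
    """Return a copy of `values` with zero-mean Gaussian white noise added."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, std) for v in values]


# Example with the values from the script: a 15-year series of 365*24 steps per year
train_idx, val_idx = split_indices(365 * 24 * 15, 365 * 24 * 10, 365 * 24 * 2)
```

If the data are hourly (an assumption suggested by the `365*24` factors), the settings in the script correspond to ten years for sampling training points and two years of validation.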

# 3. Inference

This section shows how to run inference with a trained model.

In [None]:
import os

import numpy as np

# Assumes `path_root` (the repository root, as in `train_models.py`) is defined
# and on `sys.path`, so that the package can be imported.
from WATRES import WATRES

site = 'Pully_small_storage'

# Name that you gave to your trained model (TO BE MODIFIED)
site_name2save = 'input_std_0.1-output_std_0.1'

algo = 'WATRES'


# Select the dates for which you want the predicted transit time distributions.
def filter_dates(dates):
    # Return the indices of all dates from 2020 onward
    return np.where(dates >= 2020)[0]


x = {
    'pathsite': os.path.join(path_root, f"data/{site}/"),
    'path_model': os.path.join(path_root, f"data/{site}/save/save_{site_name2save}_{algo}.pth.tar"),
    'site': site,
    'algo': algo
}

# Loading the pretrained model
model = WATRES(pathsite=x['pathsite'], site=x['site'], algo=x['algo'], path_model=x['path_model'])

# Getting the results for the dates defined by the filter
results = model.model_estimate(filter_dates, BATCH_SIZE=400)

# Showing prediction on output tracer data
dates = results['timeyear']

# Observed output tracer
Cout = results['Cout']

# Predicted output tracer
Chat = results['Chat']
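
Once the observed and predicted output tracer series are available, a simple error metric gives a first impression of the fit. The sketch below computes the root-mean-square error; `rmse` is a hypothetical helper, not part of WATRES, and skipping NaN entries is an assumption about how missing observations are encoded in your data.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error over pairs where both values are not NaN."""
    pairs = [
        (float(o), float(p))
        for o, p in zip(observed, predicted)
        if not (math.isnan(float(o)) or math.isnan(float(p)))
    ]
    if not pairs:
        raise ValueError("no valid (observed, predicted) pairs")
    return math.sqrt(sum((o - p) ** 2 for o, p in pairs) / len(pairs))
```

With the variables above, this would be called as `rmse(Cout, Chat)`.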

# 4. Automatically Computing Relevant Statistics

You can also run the script `compute_results.py`, which precomputes many relevant quantities automatically.

You can then use the notebook `reproducing_figures_paper.ipynb` in the folder `notebook` to produce visualizations similar to the ones in our paper.
