# Tutorial
This tutorial explains how to:
- Set up experiment files for two models that use different pre-processing (with and without PCA)
- Train the models
- Compare the model results with TensorBoard
- Generate validation data and perform the unscaling process

## Review of two convolutional networks
First, open src\model\components\conv1d_surr.py and read the model component definition.

Then compare the conv1d_surr model with conv1d_surr_nopca, defined in src\model\components\conv1d_surr_nopca.py. The conv1d_surr_nopca model is meant to be trained without PCA during pre-processing.

Now open the files configs\model_net\conv1d_surr_nopca.yaml and configs\model_net\conv1d_surr.yaml.

They define the user parameters used to instantiate the models.
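To see what differs between the two model configuration files, it can help to parse both and list the keys whose values differ. A minimal sketch, assuming both files load as plain YAML mappings (Hydra interpolations such as `${...}` stay as raw strings); `diff_configs` is a hypothetical helper, not part of the project.

```python
def diff_configs(a, b, prefix=""):
    """Return {dotted_key: (value_in_a, value_in_b)} for every key whose value differs.

    A key missing from one mapping is reported with None on that side.
    """
    diffs = {}
    for key in sorted(set(a) | set(b), key=str):
        path = f"{prefix}{key}"
        va, vb = a.get(key), b.get(key)
        if isinstance(va, dict) and isinstance(vb, dict):
            # Recurse into nested sections and report dotted keys
            diffs.update(diff_configs(va, vb, prefix=path + "."))
        elif va != vb:
            diffs[path] = (va, vb)
    return diffs
```

From the notebook folder, load each file with `yaml.safe_load` (for example `../configs/model_net/conv1d_surr.yaml` and `../configs/model_net/conv1d_surr_nopca.yaml`) and pass the two dictionaries to `diff_configs`.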

## Create new experiment files
Run the code below to create two new experiment files.
When you run an experiment with train.py, the default parameters defined in train.yaml are used, and any parameter specified in the experiment file overrides its default.
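The override behaviour can be pictured as a recursive dictionary merge in which values from the experiment file win over the defaults. This is a simplified illustration only: Hydra actually composes configs through its defaults list and OmegaConf, which handle more cases (config groups, interpolation, `override` entries). The `merge_overrides` helper and the default values below are hypothetical.

```python
import copy


def merge_overrides(defaults, overrides):
    """Return a new dict: `defaults` with `overrides` applied recursively."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections key by key instead of replacing them wholesale
            merged[key] = merge_overrides(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical defaults, standing in for values from train.yaml
train_defaults = {"task_name": "train", "preprocessing": {"perform_decomp": True, "scaler": "standard"}}
# Values taken from an experiment file such as tuto_conv_nopca.yaml
experiment = {"task_name": "tutorial", "preprocessing": {"perform_decomp": False}}

cfg = merge_overrides(train_defaults, experiment)
print(cfg)
# {'task_name': 'tutorial', 'preprocessing': {'perform_decomp': False, 'scaler': 'standard'}}
```

Keys absent from the experiment file (here the hypothetical `scaler`) keep their default value.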

In [2]:
import yaml

# Use the fast C loader when PyYAML was built with libyaml, otherwise fall back to the pure-Python one
try:
    from yaml import CLoader as Loader
except ImportError:
    from yaml import SafeLoader as Loader

# Hydra directive: apply this experiment file at the root of the config
header = '# @package _global_'

# Experiment parameters, written as YAML and parsed into a dictionary below
Document = """
defaults:
   - override /model_net: conv1d_surr.yaml
task_name: "tutorial"
preprocessing:
   perform_decomp: True

tags: ["surrogate", "conv1d", "PCA"]
"""

# The above experiment will use the 'conv1d_surr' model.
# It will write the results to the folder 'outputs\tutorial'.
# The default decomposition, a PCA, will be performed during pre-processing.
# The experiment tags, visible in TensorBoard, will be "surrogate", "conv1d", "PCA".

yaml_doc = yaml.load(Document, Loader=Loader)

# Write the header and the dictionary to a YAML file
with open('../configs/experiment/tuto_conv_pca.yaml', 'w') as f:
    f.write(header + '\n')
    yaml.dump(yaml_doc, f, default_flow_style=False)

In [3]:
# Same structure as above, for the model without PCA
Document = """
defaults:
   - override /model_net: conv1d_surr_nopca.yaml
task_name: "tutorial"
preprocessing:
   perform_decomp: False

tags: ["surrogate", "conv1d", "NoPCA"]
"""
yaml_doc = yaml.load(Document, Loader=Loader)

with open('../configs/experiment/tuto_conv_nopca.yaml', 'w') as f:
    f.write(header + '\n')
    yaml.dump(yaml_doc, f, default_flow_style=False)
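Hydra applies an experiment file at the root of the configuration only when it starts with the `# @package _global_` directive, so reading the generated files back is a cheap sanity check. A minimal sketch; `check_experiment_file` is a hypothetical helper and the required keys simply mirror the ones written above.

```python
import yaml


def check_experiment_file(path, required=("defaults", "task_name", "preprocessing", "tags")):
    """Parse a generated experiment file and check its header and top-level keys."""
    with open(path) as f:
        text = f.read()
    first_line = text.splitlines()[0] if text else ""
    if first_line.strip() != "# @package _global_":
        raise ValueError(f"{path}: missing '# @package _global_' header")
    cfg = yaml.safe_load(text) or {}  # the header is a YAML comment, so it is ignored here
    missing = [key for key in required if key not in cfg]
    if missing:
        raise ValueError(f"{path}: missing keys {missing}")
    return cfg
```

For example, `check_experiment_file('../configs/experiment/tuto_conv_nopca.yaml')["preprocessing"]` should return `{'perform_decomp': False}`.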

## Train both models
Now train both models, with and without PCA, by running the train.py script.

First, define a helper that runs a command in a subprocess from the Jupyter notebook and prints its output, emulating a terminal call.

In [3]:
# Change the working directory to the root of the project.
# The check prevents moving one level too high if this cell is run twice
# (the project root is the folder that contains src).
import os
if not os.path.isdir('src'):
    os.chdir('..')

import subprocess

def notebook_subprocess(command):
    # Execute the command within a subprocess, merging stderr into stdout
    process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, universal_newlines=True)

    # Read and print the output as it is produced
    for line in process.stdout:
        print(line, end='')

    # Wait for the process to finish and return its exit code
    return process.wait()

In [2]:
# run the model with PCA
command = ["python", "src/train.py", "experiment=tuto_conv_pca.yaml" ]
notebook_subprocess(command)


[2023-05-16 09:19:40,033][tutorial][INFO] - Instantiating Preprocessing <Preprocessing.Preprocessing>
[2023-05-16 09:19:43,454][tutorial][INFO] -  run <get_cos_sin_decomposition> method
[2023-05-16 09:19:43,454][tutorial][INFO] - ###
[2023-05-16 09:19:43,454][tutorial][INFO] - get cos and sin decomposition of the data {'mag10': 'theta10', 'hs': 'dp'}
[2023-05-16 09:19:43,455][tutorial][INFO] - get cos and sin decomposition of the data mag10 and theta10
[2023-05-16 09:19:43,460][tutorial][INFO] - get cos and sin decomposition of the data hs and dp
[2023-05-16 09:19:43,471][tutorial][INFO] - ###
[2023-05-16 09:19:43,471][tutorial][INFO] - Splitting data into training and test sets with method <find_test_set_in_model_validity_domain>
[2023-05-16 09:19:43,472][tutorial][INFO] - #####
[2023-05-16 09:19:43,472][tutorial][INFO] - start guessing valid training / test set with the following environmental bin :
[2023-05-16 09:19:43,472][tutorial][INFO] - {'hs': 2, 'tp': 2, 'dp': 45, 'mag10': 2, 

The logs are printed in the notebook output above.
The full Hydra output files are saved to a new folder in outputs\tutorial\runs.
The tutorial folder name comes from task_name: "tutorial" in the experiment YAML file.

**Organizing your experiments by task_name and model type makes it easier to compare the models in TensorBoard.**
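Since task_name sets the output folder, one way to keep variants apart without writing a new experiment file is to override it on the command line with Hydra's `key=value` syntax. A small sketch; `build_train_command` is a hypothetical helper and `tutorial_v2` an example name.

```python
def build_train_command(experiment, **overrides):
    """Build the argument list for src/train.py with extra Hydra `key=value` overrides."""
    command = ["python", "src/train.py", f"experiment={experiment}"]
    # Each keyword becomes a Hydra command-line override, e.g. task_name=tutorial_v2
    command += [f"{key}={value}" for key, value in overrides.items()]
    return command


command = build_train_command("tuto_conv_pca.yaml", task_name="tutorial_v2")
print(command)
# ['python', 'src/train.py', 'experiment=tuto_conv_pca.yaml', 'task_name=tutorial_v2']
```

Passing this list to `notebook_subprocess` should write the run under outputs/tutorial_v2/runs, assuming the output path is built from task_name the same way as for the tutorial folder.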


In [None]:
# run the model without PCA
command = ["python", "src/train.py", "experiment=tuto_conv_nopca.yaml" ]
notebook_subprocess(command)

## Compare models with TensorBoard

When training is over, launch TensorBoard from the project root directory with the code below to view the results.

In [2]:
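# Run TensorBoard on the tutorial runs.
# This cell keeps running while TensorBoard serves the dashboard:
# interrupt the kernel to regain control, then call process.terminate() to stop the server.
import subprocess
command = ["tensorboard", "--logdir", "outputs/tutorial/runs"]
process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, universal_newlines=True)
for line in process.stdout:
    print(line, end='')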

Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.11.0 at http://localhost:6006/ (Press CTRL+C to quit)


Open a browser and go to http://localhost:6006/.

Open the Time Series tab to compare the results of the two models: the train card shows the training results and the validation card shows the validation results.
Then open the HPARAMS tab to compare the hyperparameters of the two models.

Scroll down the hyperparameter list and enable tags and hp_metric.
You can then identify which model uses PCA and compare the metrics of both models.

## Perform inference and unscaling
Now generate validation data and perform the unscaling process.
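The scalers actually used are set up by the project's Preprocessing class, which this tutorial does not detail. As an illustration only, assume the outputs were standardised and, for the PCA model, projected onto principal components. Unscaling then runs those steps in reverse: map the predicted PCA coefficients back to the full output space, then undo the standardisation. All names below are hypothetical.

```python
import numpy as np


def fit_standard_pca(y, n_components):
    """Fit standardisation + PCA on training outputs y of shape (n_samples, n_features)."""
    mean, std = y.mean(axis=0), y.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # avoid division by zero for constant features
    y_scaled = (y - mean) / std
    # Principal axes from the SVD of the scaled data (already centred by the mean subtraction)
    _, _, vt = np.linalg.svd(y_scaled, full_matrices=False)
    components = vt[:n_components]  # shape (n_components, n_features)
    return mean, std, components


def scale(y, mean, std, components):
    """Forward pre-processing: standardise, then project onto the principal axes."""
    return ((y - mean) / std) @ components.T


def unscale(z, mean, std, components):
    """Inverse pre-processing: back-project PCA coefficients, then undo standardisation."""
    return (z @ components) * std + mean
```

With as many components as features the round trip is exact; with fewer it is only approximate, which is one reason to compare the PCA and no-PCA models on unscaled data.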


In [4]:
import os

# Folder containing the tutorial runs
folder_path = r"outputs/tutorial/runs"

# Get a list of all items (files and folders) within the folder
items = os.listdir(folder_path)

# Keep only the folders
folders = [item for item in items if os.path.isdir(os.path.join(folder_path, item))]

# Sort the folders by ctime, most recent first
# (ctime is the creation time on Windows but the last metadata change on Unix)
sorted_folders = sorted(folders, key=lambda x: os.path.getctime(os.path.join(folder_path, x)), reverse=True)

# Keep the two most recent folders: [0] is the last model trained, [1] the one trained before it
last_two_folders = sorted_folders[:2]

# Print the two most recent folders
for folder in last_two_folders:
    print(folder)

2023-05-15_13-34-48
2023-05-15_11-38-05
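Because the run folders are named with a `YYYY-MM-DD_HH-MM-SS` timestamp, as the output above shows, sorting their names also gives chronological order, without relying on `os.path.getctime`. A minimal sketch, assuming every run folder follows that naming pattern; `latest_runs` is a hypothetical helper.

```python
import os
import re

# Run folder names such as 2023-05-15_13-34-48
RUN_NAME = re.compile(r"\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2}")


def latest_runs(folder_path, n=2):
    """Return the n most recent run folder names, newest first."""
    runs = [
        name for name in os.listdir(folder_path)
        if RUN_NAME.fullmatch(name) and os.path.isdir(os.path.join(folder_path, name))
    ]
    # Zero-padded timestamps sort lexicographically in chronological order
    return sorted(runs, reverse=True)[:n]
```

Usage: `last_two_folders = latest_runs("outputs/tutorial/runs")`.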


In [6]:
# Run inference with the most recently trained model (last_two_folders[0])
command = ["python", "src/surrogate_inference.py", f"experiment_folder=outputs/tutorial/runs/{last_two_folders[0]}" ]
notebook_subprocess(command)

[2023-06-05 16:28:33,149][utils.utils][INFO] - Enforcing tags! <cfg.extras.enforce_tags=True>
[2023-06-05 16:28:33,157][utils.utils][INFO] - Printing config tree with Rich! <cfg.extras.print_config=True>
CONFIG
├── paths
│   └── root_dir: C:\Users\romain.ribault\Documents\GitHub\torchydra           
│       data_dir: C:\Users\romain.ribault\Documents\GitHub\torchydra/data/     
│       log_dir: C:\Users\romain.ribault\Documents\GitHub\torchydra/outputs/   
│       output_dir: c:\Users\romain.ribault\Documents\GitHub\torchydra\outputs\
│       work_dir: c:\Users\romain.ribault\Documents\GitHub\torchydra           
│       dataset: C:\Users\romain.ribault\Documents\GitHub\torchydra/data//netcd
│       training_env_dataset: c:\Users\romain.ribault\Documents\GitHub\torchydr
│                                                                              
├── extras
│       enforce_tags: true                                                     
│       print_config: true                      

In [1]:
# Run inference with the model trained before it (last_two_folders[1])
command = ["python", "src/surrogate_inference.py", f"experiment_folder=outputs/tutorial/runs/{last_two_folders[1]}" ]
notebook_subprocess(command)


You can now inspect the results of the two models on the validation data.
They are saved under the save_path defined in infer.yaml, which by default is:
```yaml
save_path: ${paths.data_dir}/for_1RRI_tutorial
```

In [7]:
import os
import xarray as xr

# Result written by surrogate_inference.py for the run 2023-05-15_13-34-48 on 2022-12-02
run_name = 'surrogate_2023-05-15_13-34-48'
result_name = f'{run_name}_2022-12-02.nc'
save_path = os.path.join('data', 'for_1RRI_tutorial', '2022', '12', '02', 'ANN', run_name, result_name)

# Load the result
result_ann = xr.load_dataset(save_path)
result_ann
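To compare the two models, one option is to load both result files and compute a root-mean-square difference per shared variable (between the two models, or against a reference dataset if you have one). The variables stored in the result files are not listed in this tutorial, so the sketch below works on any mapping of variable name to array and assumes both results share the same grid; `rmse_by_variable` is a hypothetical helper.

```python
import numpy as np


def rmse_by_variable(a, b):
    """Root-mean-square difference for every variable present in both mappings (NaNs ignored)."""
    scores = {}
    for name in sorted(set(a) & set(b)):
        diff = np.asarray(a[name], dtype=float) - np.asarray(b[name], dtype=float)
        scores[name] = float(np.sqrt(np.nanmean(diff ** 2)))
    return scores
```

With two loaded xarray datasets, pass `{k: ds[k].values for k in ds.data_vars}` for each one.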