## Pipeline template

The Jupyter Notebook and the Python script of a pipeline should contain the same code; they are kept in sync with the Jupytext library.

### Preamble

This preamble is required by every pipeline, whether it is executed as a Jupyter Notebook or as a Python script. It adds the repository root directory to the Python module search path (`sys.path`, not the system `PATH`) so that the project's modules can be imported during execution.

```python
# Reload edited modules automatically before each cell is executed
%load_ext autoreload
%autoreload 2

from pathlib import Path
import sys

# The pipeline is assumed to run from a directory two levels below the repository root
current_working_directory = Path.cwd()
root_directory = current_working_directory.parent.parent
# Make packages in the root directory (such as `snapflow`) importable
sys.path.append(str(root_directory))
```
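The preamble above assumes that the notebook always sits exactly two levels below the repository root (`Path.cwd().parent.parent`). A more tolerant alternative walks up the parent directories until it finds a marker file that identifies the root. The sketch below is a minimal version of that idea. The marker name `pyproject.toml` and the helpers `find_root` and `add_root_to_sys_path` are hypothetical and not part of this template.

```python
import sys
import tempfile
from pathlib import Path


def find_root(start: Path, marker: str = "pyproject.toml") -> Path:
    """Return the first directory at or above `start` that contains `marker`.

    `marker` is a hypothetical convention; use whatever file actually
    identifies the repository root in your project.
    """
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"No {marker!r} found at or above {start}")


def add_root_to_sys_path(start: Path, marker: str = "pyproject.toml") -> Path:
    """Prepend the detected root to sys.path (only once) and return it."""
    root = find_root(start, marker)
    if str(root) not in sys.path:
        sys.path.insert(0, str(root))
    return root


if __name__ == "__main__":
    # Throwaway tree: <tmp>/pyproject.toml and <tmp>/pipelines/template/
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "pyproject.toml").touch()
        pipeline_dir = root / "pipelines" / "template"
        pipeline_dir.mkdir(parents=True)
        print(find_root(pipeline_dir) == root.resolve())  # True
```

Using `sys.path.insert(0, ...)` instead of `append` gives the local checkout precedence over any installed copy of the same package. This is a design choice; the original preamble appends instead.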


### Loading the modules
Modules should be loaded as follows:

```python
from yaml import safe_load, YAMLError
from snapflow.utils import setup_output_folder, timing_decorator
from snapflow.snapshots import snapshots_assembly, data_normalization
from snapflow.linear_reduction import SVD
from snapflow.nonlinear_reduction import AutoEncoder
from snapflow.data_split import DataSplitter
from snapflow.postprocessing import compute_errors, save_paraview_visualization
```

### Reading the parameters YAML file

Every pipeline should have its own parameters YAML file, structured like the one provided with this template. Read it with the following block of code:

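```python
with open("parameters.yaml", "r") as stream:
    try:
        params = safe_load(stream)
    except YAMLError as exc:
        # Stop here: without valid parameters, every later cell would fail
        # with a confusing NameError on `params`
        raise RuntimeError("Could not parse parameters.yaml") from exc
```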
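Every later step indexes into `params`, so it can help to check the top-level keys right after loading. The sketch below is a minimal version of that check. It assumes that at least the top-level sections visible in the template's printed parameters are required. `REQUIRED_KEYS` and `check_required_keys` are hypothetical names, not part of `snapflow`.

```python
from collections.abc import Mapping

# Top-level sections visible in the template's parameters (assumed required)
REQUIRED_KEYS = (
    "origin_experiment_name",
    "experiment_name",
    "random_state",
    "snapshots",
    "splitting",
    "normalization",
    "svd",
    "auto_encoder",
)


def check_required_keys(params, required=REQUIRED_KEYS) -> None:
    """Raise if `params` is not a mapping or lacks any required key."""
    if not isinstance(params, Mapping):
        # safe_load returns None for an empty file
        raise TypeError(f"Expected a mapping, got {type(params).__name__}")
    missing = [key for key in required if key not in params]
    if missing:
        raise KeyError(f"parameters.yaml is missing: {', '.join(missing)}")
```

In the pipeline, this would run as `check_required_keys(params)` immediately after the YAML file is read.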

### Experiment name
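A pipeline can run multiple experiments, and each experiment should have its own name so that its outputs are saved separately. If the `origin_experiment_name` key in the parameters file is set to `input` (especially useful for debugging), the user is prompted for the experiment name at runtime. Otherwise, the `experiment_name` value from the file is used (e.g. `template_pipeline` when `origin_experiment_name` is `yaml`).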

```python
if params["origin_experiment_name"] == "input":
    params["experiment_name"] = input("Experiment name: ")
```
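The prompt above accepts any string, including an empty one. The sketch below is a stricter version that rejects blank names and asks again, assuming that behaviour is wanted. `resolve_experiment_name` is a hypothetical helper, and its `prompt` argument exists only so the function can be exercised without a terminal.

```python
from typing import Callable


def resolve_experiment_name(
    params: dict,
    prompt: Callable[[str], str] = input,
    max_attempts: int = 3,
) -> str:
    """Fill params["experiment_name"] interactively when requested.

    If origin_experiment_name is "input", ask until a non-blank name is
    given (up to max_attempts); otherwise keep the value from the file.
    """
    if params["origin_experiment_name"] == "input":
        for _ in range(max_attempts):
            name = prompt("Experiment name: ").strip()
            if name:
                params["experiment_name"] = name
                break
        else:
            # Loop finished without a valid name
            raise ValueError("No experiment name provided")
    return params["experiment_name"]
```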

```python
params
```

```
{'origin_experiment_name': 'yaml',
 'experiment_name': 'template_pipeline',
 'random_state': 42,
 'snapshots': {'file_type_str': 'numpy',
  'folder': 'data/input',
  'file_name_contains': ['snapshots'],
  'dataset': None},
 'splitting': {'strategy': 'train_val',
  'number_of_folds_or_splits': 5,
  'train_size': 0.8,
  'validation_size': 0.1,
  'test_size': 0.1,
  'gap': 0},
 'normalization': {'snapshots': None,
  'svd': None,
  'auto_encoder': 'min_max',
  'surrogate': 'min_max'},
 'svd': {'trunc_basis': 30,
  'normalization': 'min_max',
  'svd_type': 'randomized_svd',
  'power_iterations': 1,
  'oversampling': 20},
 'auto_encoder': {'data_loader': 'data_loader',
  'data_loader_parameters': {'active': True,
   'parameters': {'batch_size': 30, 'num_workers': 2}},
  'num_epochs': 1,
  'initializer': 'kaiming_normal',
  'initializer_parameters': {'active': True,
   'parameters': {'mode': 'fan_in', 'nonlinearity': 'leaky_relu'}},
  'optimizer': 'adam',
  'optimizer_parameters': {'active': 
```
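In the `splitting` section, `train_size`, `validation_size` and `test_size` appear to be fractions of the available snapshots (0.8, 0.1 and 0.1 in the template), so they would be expected to add up to one. The sketch below is a minimal sanity check that converts them into snapshot counts under that interpretation. It is not taken from `DataSplitter`: `split_counts` is hypothetical, and giving leftover snapshots to the training set is an arbitrary rounding choice.

```python
import math


def split_counts(splitting: dict, n_snapshots: int) -> dict:
    """Turn train/validation/test fractions into snapshot counts.

    Assumes the three fractions should sum to 1; snapshots lost to
    flooring are added to the training set (a hypothetical convention).
    """
    fractions = {
        "train": splitting["train_size"],
        "validation": splitting["validation_size"],
        "test": splitting["test_size"],
    }
    total = sum(fractions.values())
    if not math.isclose(total, 1.0, rel_tol=1e-9):
        raise ValueError(f"Split fractions sum to {total}, expected 1.0")
    counts = {name: math.floor(f * n_snapshots) for name, f in fractions.items()}
    counts["train"] += n_snapshots - sum(counts.values())
    return counts


splitting = {"train_size": 0.8, "validation_size": 0.1, "test_size": 0.1}
print(split_counts(splitting, 100))  # {'train': 80, 'validation': 10, 'test': 10}
```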

### Loading the data
Snapshots can be loaded directly from simulation output files in the `fenics_h5` and `libmesh_h5` formats, as well as from NumPy `.npy` files. Data stored in any other file type can still be used, as long as the snapshots are stacked into an $n \times m$ matrix, where $n$ is the spatial discretization (the length of each snapshot vector) and $m$ is the number of snapshots.

```python
filenames, snapshots = snapshots_assembly(params["snapshots"])
```
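For data in another file type, the snapshots only need to become the columns of an $n \times m$ array. The NumPy sketch below assumes each snapshot has already been read into an array of $n$ values. `stack_snapshots` is a hypothetical helper. The commented-out `.npy` path is also hypothetical; it is chosen to match the template's `folder` and `file_name_contains` values, on the assumption that `snapshots_assembly` selects files by those settings.

```python
import numpy as np


def stack_snapshots(snapshot_vectors) -> np.ndarray:
    """Stack snapshot vectors as the columns of an n x m matrix.

    Each snapshot is flattened to 1-D; n is its length (spatial
    discretization) and m is the number of snapshots.
    """
    vectors = [np.asarray(v, dtype=float).ravel() for v in snapshot_vectors]
    if not vectors:
        raise ValueError("No snapshots given")
    n = vectors[0].size
    if any(v.size != n for v in vectors):
        raise ValueError("All snapshots must have the same spatial size")
    return np.column_stack(vectors)  # shape (n, m)


# Example: three snapshots of a field sampled at 4 points
snapshots = stack_snapshots([np.linspace(0.0, 1.0, 4) * t for t in (1, 2, 3)])
print(snapshots.shape)  # (4, 3)
# np.save("data/input/my_snapshots.npy", snapshots)  # hypothetical file name
```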


```python
# TODO: put this in Google Docs
# TODO: save and load the model
# TODO: should plotters and error calculations be classes?
# TODO: error plots are wrong
# TODO: surrogate
# TODO: create_pipeline script -> create folders, gitkeeps, headers, notebooks and scripts with preambles
```