# Multirun experiment pipeline (HydroShoot)

This notebook establishes a general pipeline for evaluating a computing reservoir on a given task, using multiple experimental runs of the same reservoir.


In [3]:
import os
import sys

import pandas as pd
import numpy as np

from matplotlib import pyplot as plt
sys.path.insert(1, os.path.join(sys.path[0], '../../'))  # for importing local packages from src

### Loading the datasets

Currently we are loading the HydroShoot dataset generated during the first semester.

In [4]:
DATASET_PATH = '../datasets/hydroshoot_large_trimmed.csv'

In [5]:
from src.model.rc_dataset import ExperimentDataset

dataset = ExperimentDataset(csv_path=DATASET_PATH)
print(dataset)

Dataset properties:
	n_runs:      84
	n_steps:    168
	state_size: 360

Available targets: 
	input_Tac, input_u, input_hs, input_Rg, output_Rg, output_An, output_E, output_Tleaf

Available state variables: 
	state_An, state_E, state_Eabs, state_Ei, state_Flux, state_FluxC, state_Tlc, state_gb, state_gs, state_psi_head, state_u



### Defining targets and observed state variables

In [7]:
%reload_ext autoreload
%autoreload 2 

from model_config import targets, state_variables

print('Targets:')
for target in targets:
  print(f'\t- {target}')

print('\nState variables:')
for state_var in state_variables:
  print(f'\t- {state_var}')

Targets:
	- input_Tac
	- input_u
	- input_hs
	- input_Rg
	- output_Rg
	- output_An
	- output_E
	- output_Tleaf

State variables:
	- state_An
	- state_E
	- state_Eabs
	- state_Ei
	- state_Flux
	- state_FluxC
	- state_Tlc
	- state_gb
	- state_gs
	- state_psi_head
	- state_u


### Data preprocessing, grouping and train-test splitting

Preprocessing performed:

In [9]:
from preprocessing import preprocess_data

print(preprocess_data.__doc__)


    Preprocessing performed: 

    1. The target signal for each run is computed.
        - Target and reservoir are cast into a ndarray.
    2. Target and reservoir signals are trimmed.
        - A warmup mask is applied to target and reservoir.
        - A night-time mask is applied to target and reservoir.
    3. Target and reservoir are rescaled to zero-mean and unit variance
        - Normalizing transform is fitted on the entire dataset of included experiment runs.
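
The three steps listed above can be sketched in plain NumPy. This is only an illustration under stated assumptions, not the actual `preprocess_data` implementation: the function name `preprocess_sketch`, its arguments, the hourly time step, and the assumption that step 0 falls on hour 0 are all hypothetical.

```python
import numpy as np


def preprocess_sketch(X, y, warmup_steps, day_mask):
    """Hypothetical stand-in for the preprocessing steps described above.

    X:        reservoir state, shape (runs, time_steps, nodes)
    y:        target signal, shape (runs, time_steps)
    day_mask: boolean array of length 24, True for the hours of day to keep
    """
    day_mask = np.asarray(day_mask, dtype=bool)
    n_steps = X.shape[1]
    steps = np.arange(n_steps)

    # Assumption: one step per hour, and step 0 corresponds to hour 0.
    hours = steps % 24

    # Keep a step only if it is past the warmup period AND in a daytime hour.
    time_mask = (steps >= warmup_steps) & day_mask[hours]
    X_trim = X[:, time_mask, :]
    y_trim = y[:, time_mask]

    # Fit the normalization on all included runs together:
    # per-node statistics for the reservoir, scalar statistics for the target.
    X_norm = (X_trim - X_trim.mean(axis=(0, 1))) / X_trim.std(axis=(0, 1))
    y_norm = (y_trim - y_trim.mean()) / y_trim.std()
    return X_norm, y_norm
```

Note that fitting the normalizing transform on the entire dataset, as the docstring states, means that test-fold statistics also influence the scaling; the sketch mirrors that choice rather than fitting per training fold.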
    


In [11]:
from src.learning.preprocessing import generate_mask


STATE_SIZE = 16                         # Sixteen random nodes are selected as reservoir readouts
RUN_IDS = np.arange(dataset.n_runs())   # All runs are used
WARMUP_STEPS = 4 * 24                   # First 4 days of each simulation are discarded
DAY_MASK = generate_mask(5, 21)         # Only data between 5am and 9pm (inclusive) is kept; nighttime data is discarded
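
To make the configuration above more concrete, here is a minimal sketch of what an hour-of-day mask and a random readout selection could look like. The helper `hour_mask_sketch`, the fixed seed, and the assumption of 24 hourly steps per day are hypothetical; the actual behaviour of `generate_mask` and of the node selection may differ.

```python
import numpy as np


def hour_mask_sketch(start_hour, end_hour, steps_per_day=24):
    """Hypothetical hour-of-day mask: True for start_hour <= hour <= end_hour."""
    hours = np.arange(steps_per_day)
    return (hours >= start_hour) & (hours <= end_hour)


# Daytime mask covering 5am through 9pm, both bounds inclusive.
day_mask = hour_mask_sketch(5, 21)

# Pick 16 distinct readout nodes out of the 360 available state entries.
# The seed is an arbitrary choice made here for reproducibility.
rng = np.random.default_rng(42)
readout_nodes = rng.choice(360, size=16, replace=False)
```

With inclusive bounds the mask keeps 17 hourly steps per day, so the 3 days remaining after a 4-day warmup in a 168-step run would contribute 51 samples per run.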

Grouping strategy:

In [12]:
from learning import group_by_day

print(group_by_day.__doc__)

Simulation state from the same calendar day of simulation inputs, 
    across all runs, are grouped together per day. Shape of X is assumed to be (runs, time_steps, nodes)

    ```
    GROUP 1 | GROUP 2 | GROUP 3 | GROUP 4 | ...
    --------+---------+---------+---------+----
    sim1/d1  sim1/d2   sim1/d3   /         /
    /        sim2/d2   sim2/d3   sim2/d4   /       ...
    /        /         sim3/d3   sim3/d4   sim3/d5 
                                ...                ...
    ```
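
The grouping in the diagram above can be sketched as an array of group labels, one per (run, time step). This is an illustrative assumption rather than the actual `group_by_day` code: the name `group_by_day_sketch`, the `start_days` argument (the calendar day on which each run begins), and the 24 steps per day are hypothetical.

```python
import numpy as np


def group_by_day_sketch(X, start_days, steps_per_day=24):
    """Hypothetical grouping: label each sample with its calendar day.

    X:          reservoir state, shape (runs, time_steps, nodes)
    start_days: calendar day index on which each run starts, shape (runs,)
    Returns an integer array of shape (runs, time_steps); samples that share
    a label come from the same calendar day, regardless of the run.
    """
    n_runs, n_steps, _ = X.shape
    start_days = np.asarray(start_days)
    assert start_days.shape == (n_runs,)

    # Day offset of each step within its own run: 0, 0, ..., 1, 1, ...
    day_in_run = np.arange(n_steps) // steps_per_day

    # Calendar day = run start day + offset within the run (broadcast).
    return start_days[:, None] + day_in_run[None, :]


# Three runs of three days each, starting on consecutive calendar days.
X_demo = np.zeros((3, 72, 2))
groups_demo = group_by_day_sketch(X_demo, start_days=[0, 1, 2])
```

Labels of this kind, once flattened to match the sample axis, could be passed as the `groups` argument of a group-aware splitter such as scikit-learn's `GroupKFold`, so that data from the same calendar day never appears in both the training and the test fold.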
    
