## Path variables and names
- WD: the working directory that holds the input data and where the results will be stored.
- name: a name for this run, so that its results can be distinguished from those of other runs.
- data: the name of the zipped file containing the input data. If the file already exists, give its current name; otherwise the file will be created under this name.

## Training parameters
Hyperparameters to set the model dimensions:
- length: int or None. Length of the input time-series. If the input is multivariate, each channel will have the specified length. Setting it to a smaller value than the actual length of the trajectories can be used for data augmentation (see RandomCrop). If None, automatically detects the longest common length across all trajectories.
- nclass: int or None. Number of output classes. If None, automatically detects the number of unique values in the class column of the dataset.
- nfeatures: int, size of the input representation before the output layer. This also corresponds to the number of filters in the last convolution layer. Usually between 5 and 15.
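
As an illustration of what the `None` defaults for `length` and `nclass` mean, the sketch below detects both values from a toy dataset. The data layout (a list of `(class_label, trajectory)` pairs) and the helper name `detect_dimensions` are assumptions made for this example, not the pipeline's actual loader. "Longest common length" is read here as the length of the shortest trajectory.

```python
def detect_dimensions(samples):
    """Return (length, nclass) for a list of (class_label, trajectory) pairs."""
    # The longest length shared by all trajectories is the shortest trajectory's length.
    length = min(len(trajectory) for _, trajectory in samples)
    # The number of classes is the number of distinct labels.
    nclass = len({label for label, _ in samples})
    return length, nclass


# Toy data (hypothetical): three univariate trajectories from two classes.
toy_samples = [
    (0, [0.1, 0.2, 0.3, 0.4, 0.5]),
    (1, [0.3, 0.1, 0.4, 0.2]),
    (0, [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]),
]
length, nclass = detect_dimensions(toy_samples)
print(length, nclass)  # 4 2
```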

Hyperparameters for training:
- nepochs: int, number of training epochs. Increase this if the loss curves are still decreasing at the end of the training. The appropriate value varies widely and can reach several hundred epochs.
- batch: int, number of samples per batch. Conventionally, this is set to a power of 2.
- L2_reg: float, L2 regularization factor. This helps to prevent overfitting by penalizing large weights in the model parameters. As a rule of thumb, always keep a mild regularization, e.g. 1e-3. Increase it if you face overfitting issues, decrease it if you face underfitting.
- lr: float, initial learning rate. This is the most important parameter to tweak to have a smooth learning curve. 1e-2 is usually a good starting value.

By default, the learning rate is scheduled to decrease by a factor gamma at fixed epochs.
- lr_decrease_schedule: list of integers or None, epochs at which to decrease the learning rate. If None, the number of epochs will be evenly divided into bins of the same length.
- lr_decrease_factor: float, factor by which the lr is multiplied at the specified epochs. Usually between 0.1 (divide by 10) and 0.5 (divide by 2).
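
The step schedule described above can be sketched in plain Python. This is an illustration only: it assumes the decrease takes effect from the listed epoch onwards (as in a step scheduler such as PyTorch's `MultiStepLR`), and the helper names `lr_at_epoch` and `even_schedule`, as well as the choice of 3 bins for the `None` case, are hypothetical rather than taken from the pipeline.

```python
def lr_at_epoch(epoch, lr, schedule, factor):
    """Learning rate in effect at `epoch` (0-based) for a step schedule."""
    # Count how many scheduled decreases have already happened.
    n_drops = sum(1 for milestone in schedule if epoch >= milestone)
    return lr * factor ** n_drops


def even_schedule(nepochs, n_bins):
    """Hypothetical fallback: boundaries of n_bins equal-length bins."""
    bin_len = nepochs // n_bins
    return [bin_len * i for i in range(1, n_bins)]


schedule = even_schedule(nepochs=30, n_bins=3)
print(schedule)  # [10, 20]

# The learning rate is multiplied by `factor` each time a milestone is reached.
for epoch in (0, 10, 20):
    print(epoch, lr_at_epoch(epoch, lr=1e-2, schedule=schedule, factor=0.1))
```

With a factor of 0.1, the learning rate is divided by 10 at each milestone, so a run starting at 1e-2 ends at roughly 1e-4 after two decreases.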

Other parameters:
- ngpu: int, number of GPUs used for training. -1 means use all available GPUs; 0 means use the CPU only.
- ncpu_LoadData: int, number of CPU cores used to load and preprocess the data that is passed to the network. This parameter is unrelated to `ngpu`: the model can be trained on GPU, while the data are loaded with CPU. If set too low, this parameter can become a bottleneck and slow down the training process.

## Prototypes and PCA parameters
- batch: number of trajectories loaded at once. Set it as high as memory allows.
- n_prototypes: number of prototypes for each group.
- threshold_confidence: the lowest class probability that the least correlated trajectories can have.
- perc_selected_ids: the percentage of points that should go into the PCAs of the subsets.


## Motifs extraction parameters
- n_series_perclass: int, maximum number of series, per class, on which motif extraction is attempted.
- n_pattern_perseries: int, maximum number of motifs to extract out of a single trajectory.
- mode_series_selection: str, one of ['top_confidence', 'least_correlated']. Mode to select the trajectories from which to extract the motifs (see Prototype analysis). With 'top_confidence', the motifs might be heavily biased towards a representative subpopulation of the class. Hence, the output might not reflect the whole diversity of motifs induced by the class.
- extend_patt: int, number of time points by which to extend motifs. After binarization into 'relevant' and 'non-relevant' time points, the motifs are usually fragmented because a few points in their middle are improperly classified as 'non-relevant'. This parameter allows each fragment to be extended by a number of time points (in both time directions) before the actual patterns are extracted.
- min_len_patt/max_len_patt: int, set minimum/maximum size of a motif. **/!\ The size is given in number of time-points. This means that if the input has more than one channel, the actual length of the motifs will be divided across them.** For example, a motif that spans 2 channels for 10 time points will be considered of length 20.
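
To make the effect of `extend_patt` concrete, here is a minimal sketch on a toy binarized trajectory. It is not the pipeline's implementation: the helper names `extend_fragments` and `run_lengths` and the boolean-list representation of the mask are assumptions made for this example.

```python
def extend_fragments(mask, extend_patt):
    """Extend every relevant point by `extend_patt` points in both directions."""
    n = len(mask)
    extended = [False] * n
    for i, relevant in enumerate(mask):
        if relevant:
            start = max(0, i - extend_patt)
            stop = min(n, i + extend_patt + 1)
            for j in range(start, stop):
                extended[j] = True
    return extended


def run_lengths(mask):
    """Lengths of the consecutive runs of relevant (True) points."""
    lengths, current = [], 0
    for relevant in mask:
        if relevant:
            current += 1
        elif current:
            lengths.append(current)
            current = 0
    if current:
        lengths.append(current)
    return lengths


# 1 marks a 'relevant' time point; the single 0 at index 5 splits one motif in two.
mask = [bool(v) for v in [0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0]]
print(run_lengths(mask))                       # [3, 2]
print(run_lengths(extend_fragments(mask, 1)))  # [8]
```

Extending by one point bridges the gap and merges the two fragments, but it also lengthens the motif at both ends, which is worth keeping in mind when choosing `min_len_patt` and `max_len_patt`.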


In [None]:
import yaml
import os


config_path = './source'

# Load the template config shipped with the pipeline scripts
with open(os.path.join(config_path, 'config.yaml'), 'r') as file:
    config = yaml.safe_load(file)
    
    
# Set working directory/where data is
WD = './AKTH'
    
# Update values as needed
config['data'] = 'new_value_for_data'
config['name'] = 'new_value_for_name'
config['scripts'] = './source'

# Update values in the 'prep' section
config['prep']['training'] = 0.6
config['prep']['validation'] = 0.25
config['prep']['test'] = 0.15
config['prep']['seed'] = 2

# Update values in the 'training' section
config['training']['nclass'] = ''
config['training']['length'] = ''
config['training']['nfeatures'] = 10
config['training']['batch'] = 50
config['training']['lr'] = 0.01
config['training']['schedule'] = ''
config['training']['gamma'] = 0.01
config['training']['penalty'] = 0.001
config['training']['measurement'] = ''
config['training']['startTime'] = ''
config['training']['endTime'] = ''
config['training']['nepochs'] = 30
config['training']['ngpu'] = 0
config['training']['ncpuLoad'] = 8
config['training']['seed'] = 7

# Update values in the 'prototypes' section
config['prototypes']['seed'] = 7
config['prototypes']['batch'] = 2048
config['prototypes']['n_prototypes'] = 5
config['prototypes']['threshold_confidence'] = 0.75

# Update values in the 'pca' section
config['pca']['seed'] = 7
config['pca']['batch'] = 2048
config['pca']['perc_selected_ids'] = 0.1
config['pca']['threshold_confidence'] = 0.75

# Update values in the 'motif analysis' section
config['motif analysis']['seed'] = 7
config['motif analysis']['n_series_perclass'] = 10
config['motif analysis']['n_pattern_perseries'] = 10
config['motif analysis']['mode_series_selection'] = 'least_correlated'
config['motif analysis']['thresh_confidence'] = 0.75
config['motif analysis']['extend_patt'] = 0
config['motif analysis']['min_len_patt'] = 0
config['motif analysis']['max_len_patt'] = 200

    

# Save the updated config into the working directory
with open(os.path.join(WD, 'config.yaml'), 'w') as file:
    yaml.dump(config, file, default_flow_style=False)

print('Configurations successfully updated!')

In [None]:
# Run the Snakemake workflow in WD with the updated config (10 cores; wait up to 120 s for output files)
os.system(f"snakemake -s './source/Snakefile.txt' -d {WD} --configfile {WD}/config.yaml -c10 --latency-wait 120")