# Execute a workflow

A workflow quantitatively evaluates an algorithm on a large set of time series. Here we show how to start a workflow from code. All configurations are dictionaries, so instead of a dictionary it is also possible to pass the path to a `.json` file that contains the configuration. More information can be found in the [documentation](https://u0143709.pages.gitlab.kuleuven.be/dtaianomaly/getting_started/large_scale_experiments.html).

In [1]:
from dtaianomaly.workflow import execute_algorithms
from dtaianomaly.data_management import DataManager
from dtaianomaly.anomaly_detection import PyODAnomalyDetector, Windowing

First, we need to specify which time series to use. It is possible to select time series with specific properties (e.g., at least 5 attributes). Here, we select all datasets from the Demo collection. We also need a `DataManager` to read the data.

In [2]:
data_manager = DataManager('../data/datasets.csv')
data_configuration = {
    'select': [
        {'collection_name': 'Demo'}
    ]
}
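Because the configuration is a plain dictionary, it can be serialized with the standard `json` module. The sketch below writes the data configuration to a file and reads it back. The file name `data_configuration.json` is a hypothetical choice; the resulting path is what would be passed in place of the dictionary.

```python
import json
import tempfile
from pathlib import Path

# Same data configuration as above, repeated so the sketch is self-contained.
data_configuration = {
    'select': [
        {'collection_name': 'Demo'}
    ]
}

# Hypothetical file name; a temporary directory keeps the working directory clean.
config_dir = Path(tempfile.mkdtemp())
config_path = config_dir / 'data_configuration.json'

# Write the dictionary as JSON.
with open(config_path, 'w') as f:
    json.dump(data_configuration, f, indent=4)

# Reading the file back yields an identical dictionary.
with open(config_path) as f:
    loaded_configuration = json.load(f)

print(loaded_configuration == data_configuration)
```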

An algorithm configuration is either a dictionary, which is passed to the `load()` function of the corresponding anomaly detector, or a `TimeSeriesAnomalyDetector` object. Here we use an object, together with a name that identifies it in the results.

In [3]:
anomaly_detector = PyODAnomalyDetector('IForest', Windowing(64))
anomaly_detector_name = 'IForest-window-size-64'

The metric configuration dictates which metrics are computed. If a metric has parameters, these can be passed under the `"metric_parameters"` key. If a metric cannot cope with real-valued anomaly scores, the scores must first be thresholded into binary labels. This is done through the `"thresholding_strategy"` and `"thresholding_parameters"` properties. To compute the same metric with different parameters, use distinct keys and name the actual metric under `"metric_name"`.

In [4]:
metric_configuration = {
    "roc_vus": { },
    "pr_vus": { },
    "fbeta": {
        # We do not need to provide the 'metric_parameters', because the default value for beta is 1
        "thresholding_strategy": "contamination",
        "thresholding_parameters": {
            "contamination": 0.1
        }
    },
    "fbeta_05": {
        "metric_name": "fbeta",
        "metric_parameters": {
            "beta": 0.5
        },
        "thresholding_strategy": "contamination",
        "thresholding_parameters": {
            "contamination": 0.1
        }
    },
    "fbeta_2": {
        "metric_name": "fbeta",
        "metric_parameters": {
            "beta": 2.0
        },
        "thresholding_strategy": "contamination",
        "thresholding_parameters": {
            "contamination": 0.1
        }
    }
}
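To make the thresholding step concrete, here is a minimal pure-Python sketch of what the configuration above expresses. It rests on an assumption: that the `"contamination"` strategy labels the given fraction of highest-scoring observations as anomalies. The functions `contamination_threshold` and `fbeta` are illustrative names, not part of the `dtaianomaly` API, and the scores and labels are made up for the example.

```python
def contamination_threshold(scores, contamination):
    """Label the `contamination` fraction of highest scores as anomalies (1).

    Assumed behavior, for illustration only. Scores tied with the cutoff
    are all labeled anomalous, so slightly more points may be flagged.
    """
    n_anomalies = int(round(contamination * len(scores)))
    if n_anomalies == 0:
        return [0] * len(scores)
    cutoff = sorted(scores, reverse=True)[n_anomalies - 1]
    return [1 if s >= cutoff else 0 for s in scores]


def fbeta(y_true, y_pred, beta=1.0):
    """F-beta score: (1 + b^2) * TP / ((1 + b^2) * TP + b^2 * FN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denominator = (1 + beta ** 2) * tp + beta ** 2 * fn + fp
    return (1 + beta ** 2) * tp / denominator if denominator > 0 else 0.0


# Toy data: one true anomaly at index 2.
scores = [0.1, 0.2, 0.9, 0.3, 0.8, 0.1, 0.2, 0.1, 0.3, 0.2]
y_true = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]

# A contamination of 0.2 flags the two highest scores (indices 2 and 4).
y_pred = contamination_threshold(scores, contamination=0.2)

for beta in (0.5, 1.0, 2.0):
    print(beta, fbeta(y_true, y_pred, beta))
```

In this toy case recall is perfect but precision is 0.5, so the score grows with beta: a larger beta weights recall more heavily, a smaller beta weights precision more heavily.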

Lastly, an output configuration is required. It does not affect the execution of the algorithm itself, but controls what is reported while the workflow runs and which results are written to disk.

In [5]:
output_configuration = {
  "directory_path": "test_workflow",
  "verbose": True,

  "trace_time": True,
  "trace_memory": True,

  "print_results": False,
  "save_results": False,
  "results_file": "results.csv",

  "save_anomaly_scores_plot": True,
  "anomaly_scores_directory": "anomaly_score_plots",
  "anomaly_scores_file_format": "svg",
  "show_anomaly_scores": "overlay",
  "show_ground_truth": None,

  "invalid_train_type_raise_error": True
}
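The log printed by the workflow reports plot paths such as `test_workflow/IForest-window-size-64/anomaly_score_plots/demo_demo1.svg`. The sketch below rebuilds that pattern from the output configuration. The helper `anomaly_score_plot_path` is hypothetical, and the naming scheme is inferred from the log rather than taken from the library's documentation.

```python
from pathlib import PurePosixPath


def anomaly_score_plot_path(output_configuration, algorithm_name,
                            collection_name, dataset_name):
    """Hypothetical helper: rebuild the plot path pattern seen in the workflow log."""
    # Inferred pattern: lower-cased '<collection>_<dataset>.<format>'.
    file_name = (
        f"{collection_name}_{dataset_name}".lower()
        + "."
        + output_configuration["anomaly_scores_file_format"]
    )
    return PurePosixPath(
        output_configuration["directory_path"],
        algorithm_name,
        output_configuration["anomaly_scores_directory"],
        file_name,
    )


# Only the keys needed for the path, copied from the configuration above.
output_configuration = {
    "directory_path": "test_workflow",
    "anomaly_scores_directory": "anomaly_score_plots",
    "anomaly_scores_file_format": "svg",
}

plot_path = anomaly_score_plot_path(
    output_configuration, "IForest-window-size-64", "Demo", "Demo1"
)
print(plot_path)
# → test_workflow/IForest-window-size-64/anomaly_score_plots/demo_demo1.svg
```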

Now, we can execute the workflow simply as follows. 

In [6]:
execute_algorithms(
    data_manager,
    data_configuration, 
    (anomaly_detector, anomaly_detector_name),
    metric_configuration,
    output_configuration
)

>>> Starting the workflow for IForest-window-size-64
>>> Iterating over the datasets
Total number of datasets: 4
>>> Handling dataset '('Demo', 'Demo1')'
>> Checking  algorithm-dataset compatibility
>> Loading the train data
Using **test** data but no labels for unsupervised algorithm
>> Loading the test data
>> Setting the seed to '0'
>> Fitting the algorithm
>> Predicting the decision scores on the test data
>> Storing the results
Computing the evaluation metrics metrics
Computing the evaluation metric 'roc_vus'
Evaluation: '0.9917555354175803'
Computing the evaluation metric 'pr_vus'
Evaluation: '0.9867305202782413'
Computing the evaluation metric 'fbeta'
Evaluation: '0.4946236559139785'
Computing the evaluation metric 'fbeta_05'
Evaluation: '0.3795379537953795'
Computing the evaluation metric 'fbeta_2'
Evaluation: '0.7098765432098766'
Saving the timing information
Saving the memory usage
>> Saving the anomaly score plot
path: test_workflow/IForest-window-size-64/anomaly_score_plots/demo_demo1.svg
format: svg
show_anomaly_scores: overlay
show_ground_truth: None
>>> Handling dataset '('Demo', 'Demo2')'
>> Checking  algorithm-dataset compatibility
>> Loading the train data
Using **test** data but no labels for unsupervised algorithm
>> Loading the test data
>> Setting the seed to '0'
>> Fitting the algorithm
>> Predicting the decision scores on the test data
>> Storing the results
Computing the evaluation metrics metrics
Computing the evaluation metric 'roc_vus'
Evaluation: '0.9884359557389997'
Computing the evaluation metric 'pr_vus'
Evaluation: '0.9242428800732871'
Computing the evaluation metric 'fbeta'
Evaluation: '0.3216783216783217'
Computing the evaluation metric 'fbeta_05'
Evaluation: '0.22862823061630222'
Computing the evaluation metric 'fbeta_2'
Evaluation: '0.5424528301886793'
Saving the timing information
Saving the memory usage
>> Saving the anomaly score plot
path: test_workflow/IForest-window-size-64/anomaly_score_plots/demo_demo2.svg
format: svg
show_anomaly_scores: overlay
show_ground_truth: None
>>> Handling dataset '('Demo', 'Demo3')'
>> Checking  algorithm-dataset compatibility
>> Loading the train data
Using **test** data but no labels for unsupervised algorithm
>> Loading the test data
>> Setting the seed to '0'
>> Fitting the algorithm
>> Predicting the decision scores on the test data
>> Storing the results
Computin



Evaluation: '0.9898335903419229'
Computing the evaluation metric 'pr_vus'
Evaluation: '0.9546554542183916'
Computing the evaluation metric 'fbeta'
Evaluation: '0.7267904509283819'
Computing the evaluation metric 'fbeta_05'
Evaluation: '0.6244302643573383'
Computing the evaluation metric 'fbeta_2'
Evaluation: '0.8692893401015228'
Saving the timing information
Saving the memory usage
>> Saving the anomaly score plot
path: test_workflow/IForest-window-size-64/anomaly_score_plots/demo_demo3.svg
format: svg
show_anomaly_scores: overlay
show_ground_truth: None
>>> Handling dataset '('Demo', 'Demo4')'
>> Checking  algorithm-dataset compatibility
>> Loading the train data
Using **test** data but no labels for unsupervised algorithm
>> Loading the test data
>> Setting the seed to '0'
>> Fitting the algorithm
>> Predicting the decision scores on the test data
>> Storing the results
Computing the evaluation metrics metrics
Computing the evaluation metric 'roc_vus'
Evaluation: '0.933479342893215'
Computing the evaluation metric 'pr_vus'
Evaluation: '0.7095242887367348'
Computing the evaluation metric 'fbeta'
Evaluation: '0.5061425061425061'
Computing the evaluation metric 'fbeta_05'
Evaluation: '0.45696539485359355'
Computing the evaluation metric 'fbeta_2'
Evaluation: '0.567180616740088'
Saving the timing information
Saving the memory usage
>> Saving the anomaly score plot
path: test_workflow/IForest-window-size-64/anomaly_score_plots/demo_demo4.svg
format: svg
show_anomaly_scores: overlay
show_ground_truth: None
>>> Formatting the results of the individual datasets


{'IForest-window-size-64':                              Seed   roc_vus    pr_vus     fbeta  fbeta_05  \
 collection_name dataset_name                                                
 Demo            Demo1         0.0  0.991756  0.986731  0.494624  0.379538   
                 Demo2         0.0  0.988436  0.924243  0.321678  0.228628   
                 Demo3         0.0  0.989834  0.954655   0.72679   0.62443   
                 Demo4         0.0  0.933479  0.709524  0.506143  0.456965   
 
                                fbeta_2 Time fit (s) Time predict (s)  \
 collection_name dataset_name                                           
 Demo            Demo1         0.709877      0.72391          0.16528   
                 Demo2         0.542453      2.33693          0.64618   
                 Demo3         0.869289      0.88207          0.23113   
                 Demo4         0.567181      0.80929          0.26188   
 
                              Peak memory fit (KiB) Peak memory 

The workflow has written its output (such as the anomaly score plots) to the `test_workflow` directory. For now, we remove that directory again to clean up.

In [7]:
import shutil
shutil.rmtree(output_configuration['directory_path'])