# Quantitative evaluation with a workflow

It is crucial to quantitatively evaluate the performance of anomaly detectors to know their capabilities. For this, ``dtaianomaly`` offers the ``Workflow``, which detects anomalies in a large set of time series using various detectors and measures their performance using multiple evaluation criteria. The ``Workflow`` facilitates the validation of anomaly detectors, because you only need to define the different components.

There are two ways to run a ``Workflow``: from Python or from a configuration file.

> You can also evaluate [custom components](https://dtaianomaly.readthedocs.io/en/stable/getting_started/custom_models.html) in ``dtaianomaly`` via a ``Workflow`` in Python. However, this is not possible via a configuration file without extending the functionality of the ``workflow_from_config`` function.

In [1]:
from dtaianomaly.workflow import Workflow
from dtaianomaly.data import UCRLoader
from dtaianomaly.evaluation import AreaUnderROC, AreaUnderPR, Precision
from dtaianomaly.thresholding import TopN, FixedCutoff
from dtaianomaly.preprocessing import Identity, StandardScaler, ChainedPreprocessor, MovingAverage, ExponentialMovingAverage
from dtaianomaly.anomaly_detection import LocalOutlierFactor, IsolationForest

## Run a workflow from Python

Here we will initialize the different components to evaluate in the ``Workflow``. We start by creating a list of ``LazyDataLoader`` objects. We manually selected two time series to use for evaluation, but alternatively you can use all datasets in some directory using the ``from_directory()`` method in the data module. 

In [2]:
dataloaders = [
    UCRLoader('../data/UCR-time-series-anomaly-archive/001_UCR_Anomaly_DISTORTED1sddb40_35000_52000_52620.txt'),
    UCRLoader('../data/UCR-time-series-anomaly-archive/002_UCR_Anomaly_DISTORTED2sddb40_35000_56600_56900.txt')
]

Next, we initialize a number of ``Preprocessor``s. Below, we create four preprocessors to analyze the effect of Z-normalization combined with smoothing. One of them is the ``Identity`` preprocessor, which lets us analyze what happens if no preprocessing is done.

In [3]:
preprocessors = [
    Identity(),
    StandardScaler(),
    ChainedPreprocessor([MovingAverage(10), StandardScaler()]),
    ChainedPreprocessor([ExponentialMovingAverage(0.8), StandardScaler()])
]

We will now initialize our anomaly detectors. Each anomaly detector will be combined with each preprocessor, and applied to each time series. 

In [4]:
detectors = [LocalOutlierFactor(50), IsolationForest(50)]
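
To get a feel for how many runs this produces, the combinations can be enumerated with ``itertools.product``. The snippet below is only a minimal sketch: it uses plain strings as stand-ins for the actual objects, and it illustrates the cross product described above rather than how ``Workflow`` schedules the runs internally.

```python
from itertools import product

# Stand-in names for the components defined above
# (plain strings, not dtaianomaly objects).
dataloader_names = ["001_UCR_Anomaly", "002_UCR_Anomaly"]
preprocessor_names = [
    "Identity",
    "StandardScaler",
    "MovingAverage->StandardScaler",
    "ExponentialMovingAverage->StandardScaler",
]
detector_names = ["LocalOutlierFactor", "IsolationForest"]

# Every time series is paired with every preprocessor and every detector.
combinations = list(product(dataloader_names, preprocessor_names, detector_names))

print(len(combinations))  # 2 * 4 * 2 = 16
```

With two time series, four preprocessors and two detectors, one would therefore expect 16 combinations in total.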

Finally, we need to define the ``Metric``s used to evaluate the models. Both ``BinaryMetric`` and ``ProbaMetric`` can be provided. However, the workflow evaluates the scores obtained by the ``predict_proba`` method of the ``BaseDetector``. To evaluate a ``BinaryMetric``, a number of thresholding strategies must be provided to convert the continuous anomaly probabilities to discrete anomaly labels. Each thresholding strategy is combined with each ``BinaryMetric``. The thresholds have no effect on the ``ProbaMetric``s.

> To save on computational resources, the anomaly detector is used once to detect anomalies in a time series, and the predicted anomaly scores are reused to evaluate all metrics. This means that there is no computational overhead in providing more metrics, besides the resources required to compute the metric itself.

In [5]:
thresholds = [TopN(20), FixedCutoff(0.5)]
metrics = [Precision(), AreaUnderPR(), AreaUnderROC()]
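
The interplay between thresholds and binary metrics can be illustrated with a small, library-independent sketch. The functions ``top_n_labels``, ``fixed_cutoff_labels`` and ``precision`` below are hypothetical helpers written for this illustration, not the implementations in ``dtaianomaly``. In particular, the tie-breaking rule and the use of ``>=`` for the cutoff are assumptions.

```python
def top_n_labels(scores, n):
    """Label the n highest-scoring observations as anomalous (1), the rest as normal (0)."""
    # Indices sorted by descending score; ties keep their original order
    # (an assumption of this sketch).
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = set(ranked[:n])
    return [1 if i in top else 0 for i in range(len(scores))]


def fixed_cutoff_labels(scores, cutoff):
    """Label every observation whose score reaches the cutoff as anomalous (1)."""
    # Using >= rather than > is an assumption of this sketch.
    return [1 if s >= cutoff else 0 for s in scores]


def precision(y_true, y_pred):
    """Fraction of predicted anomalies that are true anomalies."""
    true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    predicted_positives = sum(y_pred)
    return true_positives / predicted_positives if predicted_positives else 0.0


# Toy anomaly scores, computed once and reused for every threshold.
scores = [0.1, 0.9, 0.6, 0.2, 0.7]
y_true = [0, 1, 0, 0, 1]

top2 = top_n_labels(scores, 2)
cut = fixed_cutoff_labels(scores, 0.5)

print(top2, precision(y_true, top2))
print(cut, precision(y_true, cut))
```

In this toy example the two thresholds turn the same scores into different labels, so the same binary metric yields one value per threshold. This mirrors why the results contain one column for each threshold and binary metric pair.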

Once all components are defined, we initialize the ``Workflow``. We also define additional parameters, such as ``n_jobs``, to allow multiple anomaly detectors to detect anomalies in parallel. Then, we can execute the workflow by calling the ``run()`` method, which returns a dataframe with the results.

In [6]:
workflow = Workflow(
    dataloaders=dataloaders,
    metrics=metrics,
    thresholds=thresholds,
    preprocessors=preprocessors,
    detectors=detectors,
    n_jobs=4
)
workflow.run()

| | Dataset | Detector | Preprocessor | Runtime Fit [s] | Runtime Predict [s] | Runtime [s] | TopN(n=20)->Precision() | FixedCutoff(cutoff=0.5)->Precision() | AreaUnderPR() | AreaUnderROC() |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | UCRLoader(path='../data/UCR-time-series-anomal... | LocalOutlierFactor(window_size=50) | Identity() | 3.49525 | 5.416694 | 8.911944 | 1.0 | 0.520376 | 0.352438 | 0.818397 |
| 1 | UCRLoader(path='../data/UCR-time-series-anomal... | IsolationForest(window_size=50) | Identity() | 1.508298 | 1.311756 | 2.820054 | 1.0 | 0.178232 | 0.320528 | 0.727579 |
| 2 | UCRLoader(path='../data/UCR-time-series-anomal... | LocalOutlierFactor(window_size=50) | StandardScaler() | 3.659286 | 5.280546 | 8.939832 | 1.0 | 0.520376 | 0.352438 | 0.818397 |
| 3 | UCRLoader(path='../data/UCR-time-series-anomal... | IsolationForest(window_size=50) | StandardScaler() | 1.673083 | 1.71311 | 3.386193 | 1.0 | 0.157045 | 0.309004 | 0.729511 |
| 4 | UCRLoader(path='../data/UCR-time-series-anomal... | LocalOutlierFactor(window_size=50) | MovingAverage(window_size=10)->StandardScaler() | 6.280157 | 7.82056 | 14.100716 | 0.1 | 0.505415 | 0.296962 | 0.824632 |
| 5 | UCRLoader(path='../data/UCR-time-series-anomal... | IsolationForest(window_size=50) | MovingAverage(window_size=10)->StandardScaler() | 4.113696 | 3.303394 | 7.41709 | 1.0 | 0.172599 | 0.320814 | 0.735145 |
| 6 | UCRLoader(path='../data/UCR-time-series-anomal... | LocalOutlierFactor(window_size=50) | ExponentialMovingAverage(alpha=0.8)->StandardS... | 2.389394 | 5.630403 | 8.019797 | 0.0 | 0.336207 | 0.204116 | 0.822224 |
| 7 | UCRLoader(path='../data/UCR-time-series-anomal... | IsolationForest(window_size=50) | ExponentialMovingAverage(alpha=0.8)->StandardS... | 1.280379 | 1.097336 | 2.377714 | 1.0 | 0.196994 | 0.316824 | 0.737015 |
| 8 | UCRLoader(path='../data/UCR-time-series-anomal... | LocalOutlierFactor(window_size=50) | Identity() | 4.431303 | 3.844468 | 8.27577 | 0.0 | 0.0 | 0.009798 | 0.473974 |
| 9 | UCRLoader(path='../data/UCR-time-series-anomal... | IsolationForest(window_size=50) | Identity() | 1.512294 | 1.126129 | 2.638423 | 0.0 | 0.0 | 0.003503 | 0.110995 |


## Run a workflow from a configuration file

Alternatively, you can define a workflow using JSON configuration files. The file [Config.json]() illustrates how the workflow defined above can be written as a configuration file. More details regarding the syntax are provided below. Using the ``workflow_from_config`` function, you can pass the path to a configuration file to create the corresponding ``Workflow``, as shown in the example below. Then, you can run the ``Workflow`` via the ``run()`` method.
 

In [7]:
from dtaianomaly.workflow import workflow_from_config
workflow = workflow_from_config("Config.json")
workflow.run()

| | Dataset | Detector | Preprocessor | Runtime Fit [s] | Runtime Predict [s] | Runtime [s] | TopN(n=20)->Precision() | FixedCutoff(cutoff=0.5)->Precision() | AreaUnderPR() | AreaUnderROC() |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | UCRLoader(path='../data/UCR-time-series-anomal... | LocalOutlierFactor(window_size=50) | Identity() | 3.085908 | 7.11976 | 10.205668 | 1.0 | 0.520376 | 0.352438 | 0.818397 |
| 1 | UCRLoader(path='../data/UCR-time-series-anomal... | IsolationForest(window_size=50) | Identity() | 1.43326 | 1.283798 | 2.717058 | 1.0 | 0.173035 | 0.300732 | 0.720613 |
| 2 | UCRLoader(path='../data/UCR-time-series-anomal... | LocalOutlierFactor(window_size=50) | StandardScaler() | 3.10317 | 7.654244 | 10.757415 | 1.0 | 0.520376 | 0.352438 | 0.818397 |
| 3 | UCRLoader(path='../data/UCR-time-series-anomal... | IsolationForest(window_size=50) | StandardScaler() | 1.440609 | 1.300194 | 2.740803 | 1.0 | 0.186493 | 0.309487 | 0.72876 |
| 4 | UCRLoader(path='../data/UCR-time-series-anomal... | LocalOutlierFactor(window_size=50) | MovingAverage(window_size=10)->StandardScaler() | 8.44237 | 14.4951 | 22.937471 | 0.1 | 0.505415 | 0.296962 | 0.824632 |
| 5 | UCRLoader(path='../data/UCR-time-series-anomal... | IsolationForest(window_size=50) | MovingAverage(window_size=10)->StandardScaler() | 4.782272 | 6.292908 | 11.075181 | 1.0 | 0.179104 | 0.306882 | 0.731224 |
| 6 | UCRLoader(path='../data/UCR-time-series-anomal... | LocalOutlierFactor(window_size=50) | ExponentialMovingAverage(alpha=0.8)->StandardS... | 4.540011 | 10.203604 | 14.743615 | 0.0 | 0.336207 | 0.204116 | 0.822224 |
| 7 | UCRLoader(path='../data/UCR-time-series-anomal... | IsolationForest(window_size=50) | ExponentialMovingAverage(alpha=0.8)->StandardS... | 2.102705 | 1.98676 | 4.089464 | 1.0 | 0.174528 | 0.328237 | 0.732513 |
| 8 | UCRLoader(path='../data/UCR-time-series-anomal... | LocalOutlierFactor(window_size=50) | Identity() | 7.950846 | 8.083394 | 16.03424 | 0.0 | 0.0 | 0.009798 | 0.473974 |
| 9 | UCRLoader(path='../data/UCR-time-series-anomal... | IsolationForest(window_size=50) | Identity() | 3.366484 | 2.010452 | 5.376936 | 0.0 | 0.0 | 0.003464 | 0.09725 |


A configuration file is built from different entries, with each entry representing a component of the ``Workflow``. These entries are built as follows:
```json
    { "type": <name-of-component>, "optional-param": <value-optional-parameter> }
```
The ``"type"`` equals the name of the component, for example ``"LocalOutlierFactor"`` or ``"StandardScaler"``. This string must exactly match the object name of the component you want to add to the workflow. In addition, it is possible to define hyperparameters of the component. For example, for ``"LocalOutlierFactor"`` you must define a ``"window_size"``, but you can optionally also define a ``"stride"``. An error will be raised if the entry is missing required parameters or contains unknown parameters.
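
The following sketch shows one way such an entry could be turned into an object, including the checks for missing and unknown parameters. It is a hypothetical illustration: ``LocalOutlierFactorStub``, ``REGISTRY`` and ``build_component`` are names invented here, and the actual logic and error types inside ``workflow_from_config`` may differ.

```python
import inspect


class LocalOutlierFactorStub:
    """Hypothetical stand-in with one required and one optional parameter."""

    def __init__(self, window_size, stride=1):
        self.window_size = window_size
        self.stride = stride


# Hypothetical registry mapping the 'type' string to a class.
REGISTRY = {"LocalOutlierFactor": LocalOutlierFactorStub}


def build_component(entry):
    """Instantiate the component described by a configuration entry."""
    params = dict(entry)  # copy, so the caller's entry is not modified
    if "type" not in params:
        raise ValueError("Entry is missing the 'type' key")
    type_name = params.pop("type")
    if type_name not in REGISTRY:
        raise ValueError(f"Unknown component type: {type_name!r}")
    cls = REGISTRY[type_name]

    # Inspect the constructor to find accepted and required parameters.
    signature = inspect.signature(cls)
    accepted = list(signature.parameters)
    required = [
        name for name, parameter in signature.parameters.items()
        if parameter.default is inspect.Parameter.empty
    ]

    unknown = sorted(set(params) - set(accepted))
    missing = [name for name in required if name not in params]
    if unknown:
        raise TypeError(f"Unknown parameters for {type_name}: {unknown}")
    if missing:
        raise TypeError(f"Missing required parameters for {type_name}: {missing}")
    return cls(**params)


detector = build_component({"type": "LocalOutlierFactor", "window_size": 50})
print(detector.window_size, detector.stride)
```

Looking up the class by its exact name is why the ``type`` string must match the component name character for character.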

The configuration file itself is also a dictionary, in JSON format. The keys of this dictionary correspond to the parameters of the ``Workflow``. The corresponding values can be either a single entry (if one component is requested) or a list of entries (if multiple components are requested).

Below, we show a simplified version of the configuration in [Config.json](). 

```json
{
  "dataloaders": {
    "type": "UCRLoader",
    "path":"../data/UCR-time-series-anomaly-archive/001_UCR_Anomaly_DISTORTED1sddb40_35000_52000_52620.txt"
  },
  "metrics": [{"type": "Precision"}, {"type": "AreaUnderPR"}],
  "thresholds": {"type": "FixedCutoff", "cutoff": 0.5},
  "preprocessors": {"type": "StandardScaler"},
  "detectors": [
    {"type": "LocalOutlierFactor", "window_size": 50},
    {"type": "IsolationForest", "window_size": 50}
  ],
  "n_jobs": 4
}
```
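
If you prefer to generate such a file programmatically, Python's standard ``json`` module is sufficient. The sketch below writes the simplified configuration to a temporary file and reads it back. The file name ``my_config.json`` and the helper ``as_list`` are assumptions made for this illustration, and the call to ``workflow_from_config`` is left as a comment because it requires ``dtaianomaly`` and the data file to be available.

```python
import json
import tempfile
from pathlib import Path

# The simplified configuration from above, written as a Python dictionary.
config = {
    "dataloaders": {
        "type": "UCRLoader",
        "path": "../data/UCR-time-series-anomaly-archive/001_UCR_Anomaly_DISTORTED1sddb40_35000_52000_52620.txt",
    },
    "metrics": [{"type": "Precision"}, {"type": "AreaUnderPR"}],
    "thresholds": {"type": "FixedCutoff", "cutoff": 0.5},
    "preprocessors": {"type": "StandardScaler"},
    "detectors": [
        {"type": "LocalOutlierFactor", "window_size": 50},
        {"type": "IsolationForest", "window_size": 50},
    ],
    "n_jobs": 4,
}

# Write the configuration to a (hypothetical) file in a temporary directory.
config_path = Path(tempfile.mkdtemp()) / "my_config.json"
with open(config_path, "w") as f:
    json.dump(config, f, indent=2)

# Read it back to check that the file contains valid JSON.
with open(config_path) as f:
    loaded = json.load(f)


def as_list(value):
    """Treat a single entry and a list of entries uniformly."""
    return value if isinstance(value, list) else [value]


for key in ["dataloaders", "metrics", "thresholds", "preprocessors", "detectors"]:
    print(key, [entry["type"] for entry in as_list(loaded[key])])

# With dtaianomaly installed and the data available, the file could then be used as:
# workflow = workflow_from_config(str(config_path))
```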