# Getting Started with the `pyCLAD` library

In [1]:
import pyclad

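If this cell fails with `ModuleNotFoundError: No module named 'pyclad'`, install the library first (for example with `pip install pyclad`) and restart the kernel.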

# Example usage
This example shows how to use the `pyCLAD` library on a simple synthetic dataset.

## Dataset creation
We start by creating a synthetic dataset with three concepts, each with its own training and test data. Each concept is represented by the `Concept` class: training concepts hold only data, while test concepts also carry labels.
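The cells below fill each concept with uniform noise and draw the test labels at random, so the labels carry no information about the data. If you want labels that actually mark anomalies, the numpy-only sketch below is one option: normal points come from a Gaussian, and test anomalies come from the same Gaussian shifted away from it. The helper name `make_concept_arrays` is hypothetical, and the convention that label 1 marks an anomaly (as in `pyod`) is an assumption.

```python
import numpy as np


def make_concept_arrays(rng, n_train=100, n_test_normal=90, n_test_anomalies=10, n_features=10, shift=0.0):
    """Return (train_data, test_data, test_labels) for one synthetic concept.

    Hypothetical helper. Normal points are drawn from a Gaussian centred at `shift`;
    test anomalies come from the same Gaussian moved 4 units along every feature.
    Labels follow the assumed convention 1 = anomaly, 0 = normal.
    """
    train_data = rng.normal(loc=shift, scale=1.0, size=(n_train, n_features))
    normal_test = rng.normal(loc=shift, scale=1.0, size=(n_test_normal, n_features))
    anomalous_test = rng.normal(loc=shift + 4.0, scale=1.0, size=(n_test_anomalies, n_features))
    test_data = np.vstack([normal_test, anomalous_test])
    test_labels = np.concatenate(
        [np.zeros(n_test_normal, dtype=int), np.ones(n_test_anomalies, dtype=int)]
    )
    return train_data, test_data, test_labels


rng = np.random.default_rng(seed=0)  # fixed seed so the data are reproducible
# Three concepts whose normal data sit at different locations.
concept_arrays = {f"concept{i + 1}": make_concept_arrays(rng, shift=2.0 * i) for i in range(3)}

for name, (train_data, test_data, test_labels) in concept_arrays.items():
    print(name, train_data.shape, test_data.shape, int(test_labels.sum()))
```

To use these arrays with `pyCLAD`, wrap them the same way as in the cells below: `Concept(name, data=train_data)` for training and `Concept(name, data=test_data, labels=test_labels)` for testing.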

In [None]:
import numpy as np
from pyclad.data.concept import Concept

concept1_train = Concept("concept1", data=np.random.rand(100, 10))
concept1_test = Concept("concept1", data=np.random.rand(100, 10), labels=np.random.randint(0, 2, 100))

concept2_train = Concept("concept2", data=np.random.rand(100, 10))
concept2_test = Concept("concept2", data=np.random.rand(100, 10), labels=np.random.randint(0, 2, 100))

concept3_train = Concept("concept3", data=np.random.rand(100, 10))
concept3_test = Concept("concept3", data=np.random.rand(100, 10), labels=np.random.randint(0, 2, 100))


In [None]:
from pyclad.data.datasets.concepts_dataset import ConceptsDataset

dataset = ConceptsDataset(
    name="GeneratedDataset",
    train_concepts=[concept1_train, concept2_train, concept3_train],
    test_concepts=[concept1_test, concept2_test, concept3_test],
)


## Model
The next step is to define the model used for anomaly detection. In this example, we use the One-Class SVM detector from the `pyod` library through `OneClassSVMAdapter`, which wraps it so that it works within `pyCLAD`. `pyCLAD` provides adapters for various models, so they can be used seamlessly within the library's framework. You can also implement your own model if needed.
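If you want to implement your own model, the sketch below shows a toy detector that scores each point by its Euclidean distance to the mean of the training data. It assumes a model interface made of `fit(data)`, `predict(data)` returning `(labels, scores)`, and `name()`; that interface and the class name `CentroidDistanceDetector` are assumptions for illustration, so check the pyCLAD documentation for the actual base class to subclass.

```python
import numpy as np


class CentroidDistanceDetector:
    """Toy anomaly detector: score = Euclidean distance to the training mean.

    Hypothetical example; the method names mirror an assumed interface,
    not a verified pyCLAD base class.
    """

    def __init__(self, contamination=0.1):
        self.contamination = contamination
        self.center_ = None
        self.threshold_ = None

    def fit(self, data):
        self.center_ = data.mean(axis=0)
        train_scores = np.linalg.norm(data - self.center_, axis=1)
        # Treat the top `contamination` fraction of training scores as anomalous.
        self.threshold_ = np.quantile(train_scores, 1.0 - self.contamination)

    def predict(self, data):
        scores = np.linalg.norm(data - self.center_, axis=1)
        labels = (scores > self.threshold_).astype(int)  # 1 = anomaly, 0 = normal
        return labels, scores

    def name(self):
        return "CentroidDistanceDetector"


rng = np.random.default_rng(0)
detector = CentroidDistanceDetector()
detector.fit(rng.normal(size=(200, 5)))
# The origin lies near the training mean; the second point is far away from it.
labels, scores = detector.predict(np.vstack([np.zeros((1, 5)), np.full((1, 5), 10.0)]))
print(labels, scores)
```

A distance-to-mean score is deliberately simple; any detector that can be fitted on unlabeled data and produce a score per sample fits the same shape.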

In [None]:
from pyclad.models.adapters.pyod_adapters import OneClassSVMAdapter

model = OneClassSVMAdapter()

## Strategy

Next, we define a continual learning strategy. The strategy manages learning across successive concepts, aiming to let the model adapt to new data without forgetting what it learned from earlier concepts. In this example, we use a simple cumulative strategy, which trains the model on the data from all concepts seen so far.
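The idea behind a cumulative strategy fits in a few lines. The sketch below is a hypothetical illustration, not pyCLAD's `CumulativeStrategy`: it stores the training data of every concept and refits the wrapped model on all of it whenever a new concept arrives. `CumulativeSketch` and `ShapeRecorder` are made-up names; `ShapeRecorder` is a stand-in model that only records how many rows it was fitted on.

```python
import numpy as np


class CumulativeSketch:
    """Hypothetical cumulative strategy: keep all past training data and refit on it."""

    def __init__(self, model):
        self.model = model
        self._seen = []  # training arrays from every concept so far

    def learn(self, data):
        self._seen.append(data)
        # A naive strategy would call self.model.fit(data) on the newest concept only.
        self.model.fit(np.concatenate(self._seen, axis=0))

    def predict(self, data):
        return self.model.predict(data)


class ShapeRecorder:
    """Stand-in model that remembers how many rows each fit call received."""

    def __init__(self):
        self.fitted_rows = []

    def fit(self, data):
        self.fitted_rows.append(len(data))

    def predict(self, data):
        return np.zeros(len(data), dtype=int), np.zeros(len(data))


recorder = ShapeRecorder()
strategy = CumulativeSketch(recorder)
for _ in range(3):
    strategy.learn(np.random.rand(100, 10))
print(recorder.fitted_rows)  # [100, 200, 300]
```

The trade-off is that memory and training time grow with every concept; replay strategies bound this by keeping only a sample of past data.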

In [None]:
from pyclad.strategies.baselines.cumulative import CumulativeStrategy

strategy = CumulativeStrategy(model)

## Callbacks definition

`pyCLAD` follows a callback-based architecture: callbacks are notified during training and evaluation, so they can monitor and evaluate the model's performance. You can write custom callbacks or use built-in ones to track metrics, log information, and visualize results. In this example, `ConceptMetricCallback` uses `RocAuc` as the base metric to evaluate the model on each concept and summarizes the results with the continual metrics `ContinualAverage`, `BackwardTransfer`, and `ForwardTransfer`. We also define a `TimeEvaluationCallback` to measure the execution time.
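The callback mechanism can be illustrated with a minimal, hypothetical sketch: a driver loop calls hook methods on every registered callback at fixed points, and each callback records what it needs. The hook names `before_training`/`after_training`, the class `TimingCallbackSketch`, and the function `run_with_callbacks` are illustrative assumptions, not pyCLAD's callback API.

```python
import time
from collections import defaultdict


class TimingCallbackSketch:
    """Hypothetical callback that times each training step."""

    def __init__(self):
        self.time_by_concept = defaultdict(dict)
        self._start = None

    def before_training(self, concept_name):
        self._start = time.perf_counter()

    def after_training(self, concept_name):
        self.time_by_concept[concept_name]["train_time"] = time.perf_counter() - self._start

    def info(self):
        return {"timing_sketch": dict(self.time_by_concept)}


def run_with_callbacks(concept_names, train_fn, callbacks):
    """Minimal driver: the loop, not the callbacks, decides when each hook fires."""
    for name in concept_names:
        for cb in callbacks:
            cb.before_training(name)
        train_fn(name)
        for cb in callbacks:
            cb.after_training(name)


timer = TimingCallbackSketch()
run_with_callbacks(["concept1", "concept2"], lambda name: sum(range(10_000)), [timer])
print(timer.info())
```

Keeping measurement in callbacks means the training loop stays unchanged when you add or remove a metric.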

In [None]:
from pyclad.metrics.continual.backward_transfer import BackwardTransfer
from pyclad.metrics.continual.average_continual import ContinualAverage
from pyclad.metrics.base.roc_auc import RocAuc
from pyclad.metrics.continual.forward_transfer import ForwardTransfer
from pyclad.callbacks.evaluation.concept_metric_evaluation import ConceptMetricCallback
from pyclad.callbacks.evaluation.time_evaluation import TimeEvaluationCallback

time_callback = TimeEvaluationCallback()
metric_callback = ConceptMetricCallback(
    base_metric=RocAuc(), metrics=[ContinualAverage(), BackwardTransfer(), ForwardTransfer()]
)


## Scenario creation & execution

The scenario class orchestrates the entire process, including dataset loading, model training, and evaluation. You create a scenario from a dataset, a strategy, and a list of callbacks; the exact execution flow depends on the scenario type. In this example, we use `ConceptAgnosticScenario`, in which the model is not aware of the concept boundaries.
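The run order can be sketched as a loop: after training on each concept, every test concept is evaluated and the score is stored in a matrix indexed by [trained concept][evaluated concept]. This matches the shape of the `metric_matrix` returned by `metric_callback.info()` and the training/evaluation order visible in the UNSW log further down. The function `run_scenario_sketch` is a hypothetical outline, not pyCLAD's code; passing the concept name to `learn` is roughly what a concept-aware scenario adds, whereas a concept-agnostic one would presumably withhold it.

```python
from collections import defaultdict


def run_scenario_sketch(train_concepts, test_concepts, learn, evaluate):
    """Hypothetical outline of a scenario loop.

    train_concepts / test_concepts: lists of (name, data) pairs in concept order.
    learn(name, data): trains the strategy on one concept.
    evaluate(name, data): returns a score (e.g. ROC-AUC) for one test concept.
    Returns metric_matrix[trained_on][evaluated_on].
    """
    metric_matrix = defaultdict(dict)
    for trained_name, train_data in train_concepts:
        learn(trained_name, train_data)
        # Every test concept is evaluated after each training step, including
        # concepts not trained on yet (these entries feed forward transfer).
        for test_name, test_data in test_concepts:
            metric_matrix[trained_name][test_name] = evaluate(test_name, test_data)
    return metric_matrix


seen = []
matrix = run_scenario_sketch(
    [("c1", None), ("c2", None)],
    [("c1", None), ("c2", None)],
    learn=lambda name, data: seen.append(name),
    evaluate=lambda name, data: 1.0 if name in seen else 0.5,
)
print(dict(matrix))  # {'c1': {'c1': 1.0, 'c2': 0.5}, 'c2': {'c1': 1.0, 'c2': 1.0}}
```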

In [None]:
from pyclad.scenarios.concept_agnostic import ConceptAgnosticScenario

scenario = ConceptAgnosticScenario(dataset=dataset, strategy=strategy, callbacks=[metric_callback, time_callback])
scenario.run()

## See the results from callbacks
We can inspect the callbacks to see how the model performed on each concept and how the metrics evolved over time. Every callback provides an `info()` method that returns its results as a dictionary. Because the synthetic labels are drawn at random, independently of the data, ROC-AUC values close to 0.5 are expected here.

In [None]:
metric_callback.info()

{'concept_metric_callback_ROC-AUC': {'base_metric_name': 'ROC-AUC',
  'metrics': {'ContinualAverage': np.float64(0.48260726512827357),
   'BackwardTransfer': np.float64(-0.001481481481481417),
   'ForwardTransfer': np.float64(0.49327286470143616)},
  'concepts_order': ['concept1', 'concept2', 'concept3'],
  'metric_matrix': defaultdict(dict,
              {'concept1': {'concept1': 0.5268686868686868,
                'concept2': 0.3777777777777778,
                'concept3': 0.5474189675870349},
               'concept2': {'concept1': 0.5252525252525253,
                'concept2': 0.38383838383838376,
                'concept3': 0.5546218487394958},
               'concept3': {'concept1': 0.5216161616161616,
                'concept2': 0.3846464646464647,
                'concept3': 0.5534213685474189}})}}
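The summary metrics can be recomputed from `metric_matrix`, which makes their meaning concrete. In the sketch below, rows are the concept the model was just trained on and columns are the concept evaluated. With that reading, `ContinualAverage` matches the mean of the entries on and below the diagonal, `ForwardTransfer` the mean of the entries above it, and `BackwardTransfer` the mean of `R[last][c] - R[c][c]` over all concepts (the term for the last concept is zero). These formulas were inferred from the numbers above rather than taken from the pyCLAD source, so treat them as an approximation of its definitions.

```python
# Values copied from the metric_callback.info() output above.
metric_matrix = {
    "concept1": {"concept1": 0.5268686868686868, "concept2": 0.3777777777777778, "concept3": 0.5474189675870349},
    "concept2": {"concept1": 0.5252525252525253, "concept2": 0.38383838383838376, "concept3": 0.5546218487394958},
    "concept3": {"concept1": 0.5216161616161616, "concept2": 0.3846464646464647, "concept3": 0.5534213685474189},
}
order = ["concept1", "concept2", "concept3"]
n = len(order)

# Rows: concept just trained on; columns: concept evaluated.
# Entries on and below the diagonal: performance on concepts already seen.
continual_average = sum(
    metric_matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1)
) / (n * (n + 1) / 2)

# Entries above the diagonal: performance on concepts not trained on yet.
forward_transfer = sum(
    metric_matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)
) / (n * (n - 1) / 2)

# Change on each concept between "just learned" and "after the final concept".
last = order[-1]
backward_transfer = sum(metric_matrix[last][c] - metric_matrix[c][c] for c in order) / n

print(round(continual_average, 4), round(forward_transfer, 4), round(backward_transfer, 5))
# 0.4826 0.4933 -0.00148
```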

In [None]:
time_callback.info()

{'time_evaluation_callback': {'time_by_concept': defaultdict(<function pyclad.callbacks.evaluation.time_evaluation.TimeEvaluationCallback.__init__.<locals>.<lambda>()>,
              {'concept1': {'train_time': 0.0013430118560791016,
                'eval_time': 0.003213167190551758},
               'concept2': {'train_time': 0.0010581016540527344,
                'eval_time': 0.0028150081634521484},
               'concept3': {'train_time': 0.0019419193267822266,
                'eval_time': 0.0026607513427734375}}),
  'train_time_total': 0.0043430328369140625,
  'eval_time_total': 0.008688926696777344}}

## Write results to file
`JsonOutputWriter` saves the results collected by the callbacks, together with additional information about the model, dataset, and strategy, to a JSON file. This makes the results easy to share and reproduce.

In [None]:
import pathlib
from pyclad.output.json_writer import JsonOutputWriter

# Save the results
output_writer = JsonOutputWriter(pathlib.Path("output.json"))
output_writer.write([model, dataset, strategy, metric_callback, time_callback])


# UNSW Dataset Example
We can also use real-world datasets with `pyCLAD`. In this example, we use the UNSW dataset, a well-known anomaly detection dataset, adapted to continual anomaly detection. `pyCLAD` provides several datasets out of the box, including UNSW; you can find them in the `pyclad.data.datasets` module. You can also implement your own dataset by following the structure of the existing ones (see the [docs](https://pyclad.readthedocs.io/en/latest/) for more details).
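A custom dataset ultimately has to provide per-concept training and test data, as in the `ConceptsDataset` built for the synthetic example. The hypothetical helper `split_into_concepts` below sketches one way to get there from a labeled array plus a concept id per row (for instance, a cluster assignment). Keeping only normal rows (label 0) in the training sets is a design choice of this sketch, not necessarily what `pyCLAD`'s built-in datasets do.

```python
import numpy as np


def split_into_concepts(data, labels, concept_ids, test_fraction=0.3, seed=0):
    """Hypothetical helper: group rows by concept id, then split each group into train/test.

    Training sets keep only normal rows (label 0); test sets keep both classes.
    """
    rng = np.random.default_rng(seed)
    concepts = []
    for cid in np.unique(concept_ids):
        idx = rng.permutation(np.flatnonzero(concept_ids == cid))
        n_test = int(len(idx) * test_fraction)
        test_idx, train_idx = idx[:n_test], idx[n_test:]
        train_idx = train_idx[labels[train_idx] == 0]  # drop anomalies from training
        concepts.append(
            {
                "name": f"concept_{cid}",
                "train_data": data[train_idx],
                "test_data": data[test_idx],
                "test_labels": labels[test_idx],
            }
        )
    return concepts


rng = np.random.default_rng(1)
X = rng.normal(size=(60, 4))
y = (rng.random(60) < 0.1).astype(int)  # roughly 10% anomalies
groups = np.repeat([0, 1, 2], 20)       # three concepts of 20 rows each
concepts = split_into_concepts(X, y, groups)
print([c["name"] for c in concepts])  # ['concept_0', 'concept_1', 'concept_2']
```

Each entry can then be wrapped as `Concept(c["name"], data=c["train_data"])` and `Concept(c["name"], data=c["test_data"], labels=c["test_labels"])` and passed to `ConceptsDataset`, as in the synthetic example.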

In [None]:
import logging
import pathlib

from pyod.models.vae import VAE

from pyclad.callbacks.evaluation.concept_metric_evaluation import ConceptMetricCallback
from pyclad.callbacks.evaluation.memory_usage import MemoryUsageCallback
from pyclad.callbacks.evaluation.time_evaluation import TimeEvaluationCallback
from pyclad.data.datasets.unsw_dataset import UnswDataset
from pyclad.metrics.base.roc_auc import RocAuc
from pyclad.metrics.continual.average_continual import ContinualAverage
from pyclad.metrics.continual.backward_transfer import BackwardTransfer
from pyclad.metrics.continual.forward_transfer import ForwardTransfer
from pyclad.models.adapters.pyod_adapters import PyODAdapter
from pyclad.output.json_writer import JsonOutputWriter
from pyclad.scenarios.concept_aware import ConceptAwareScenario
from pyclad.strategies.replay.buffers.adaptive_balanced import (
    AdaptiveBalancedReplayBuffer,
)
from pyclad.strategies.replay.replay import ReplayEnhancedStrategy
from pyclad.strategies.replay.selection.random import RandomSelection

logging.basicConfig(level=logging.INFO, handlers=[logging.FileHandler("debug.log"), logging.StreamHandler()])

"""
This example showcases how to run a concept-aware scenario using the UNSW dataset adapted to continual anomaly
detection with the method proposed here: <https://github.com/lifelonglab/lifelong-anomaly-detection-scenarios>
"""
dataset = UnswDataset(dataset_type="random_anomalies")
model = PyODAdapter(
    VAE(
        encoder_neuron_list=[32, 24, 16],
        decoder_neuron_list=[16, 24, 32],
        latent_dim=8,
        epoch_num=20,
        preprocessing=False,
    ),
    model_name="VAE",
)
replay_buffer = AdaptiveBalancedReplayBuffer(selection_method=RandomSelection(), max_size=1000)
strategy = ReplayEnhancedStrategy(model, replay_buffer)
callbacks = [
    ConceptMetricCallback(
        base_metric=RocAuc(),
        metrics=[ContinualAverage(), BackwardTransfer(), ForwardTransfer()],
    ),
    TimeEvaluationCallback(),
    MemoryUsageCallback(),
]
scenario = ConceptAwareScenario(dataset, strategy=strategy, callbacks=callbacks)
scenario.run()

output_writer = JsonOutputWriter(pathlib.Path("output-unsw.json"))
output_writer.write([model, dataset, strategy, *callbacks])


INFO:pyclad.scenarios.concept_aware:Starting training on concept Cluster_0
Training: 100%|██████████| 20/20 [00:17<00:00,  1.14it/s]
INFO:pyclad.scenarios.concept_aware:Starting evaluation of concept Cluster_0
INFO:pyclad.scenarios.concept_aware:Starting evaluation of concept Cluster_1
INFO:pyclad.scenarios.concept_aware:Starting evaluation of concept Cluster_2
INFO:pyclad.scenarios.concept_aware:Starting evaluation of concept Cluster_3
INFO:pyclad.scenarios.concept_aware:Starting evaluation of concept Cluster_4
INFO:pyclad.scenarios.concept_aware:Starting evaluation of concept Cluster_5
INFO:pyclad.scenarios.concept_aware:Starting evaluation of concept Cluster_6
INFO:pyclad.scenarios.concept_aware:Starting evaluation of concept Cluster_7
INFO:pyclad.scenarios.concept_aware:Starting evaluation of concept Cluster_8
INFO:pyclad.scenarios.concept_aware:Starting evaluation of concept Cluster_9

# Try it yourself!
In this section, you will run your own experiments with the datasets, models, and strategies available in `pyCLAD`. You will run two strategies on the same dataset and compare their results, which shows how the choice of strategy affects the model's performance.

In [None]:
# Imports

import pathlib

# Datasets
from pyclad.data.datasets.unsw_dataset import UnswDataset
from pyclad.data.datasets.nsl_kdd_dataset import NslKddDataset
from pyclad.data.datasets.wind_energy_dataset import WindEnergyDataset
from pyclad.data.datasets.energy_plants_dataset import EnergyPlantsDataset


# Scenarios
from pyclad.scenarios.concept_aware import ConceptAwareScenario

# Models
from pyclad.models.adapters.pyod_adapters import (
    IsolationForestAdapter,
    LocalOutlierFactorAdapter,
    OneClassSVMAdapter,
    PyODAdapter,
)

# Strategies
from pyclad.strategies.baselines.cumulative import CumulativeStrategy
from pyclad.strategies.baselines.naive import NaiveStrategy
from pyclad.strategies.replay.replay import ReplayEnhancedStrategy

# Additional imports for replay strategies
from pyclad.strategies.replay.buffers.adaptive_balanced import (
    AdaptiveBalancedReplayBuffer,
)
from pyclad.strategies.replay.selection.random import RandomSelection


# Callback and metrics
from pyclad.callbacks.evaluation.concept_metric_evaluation import ConceptMetricCallback
from pyclad.callbacks.evaluation.memory_usage import MemoryUsageCallback
from pyclad.callbacks.evaluation.time_evaluation import TimeEvaluationCallback
from pyclad.metrics.base.roc_auc import RocAuc
from pyclad.metrics.continual.average_continual import ContinualAverage
from pyclad.metrics.continual.backward_transfer import BackwardTransfer
from pyclad.metrics.continual.forward_transfer import ForwardTransfer
from pyclad.output.json_writer import JsonOutputWriter


Select the dataset that you want to use.

In [None]:
dataset = ...  # Replace with your dataset, e.g., `UnswDataset(dataset_type="random_anomalies")`

Select the base model that you want to use. You can choose from the available models in `pyCLAD` or implement your own model.

In [None]:
model = ...  # Replace with your model, e.g., `IsolationForestAdapter()`

Select the strategy that you want to use. You can choose from the available strategies in `pyCLAD` or implement your own strategy.
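`ReplayEnhancedStrategy`, used in the UNSW example with an `AdaptiveBalancedReplayBuffer` and `RandomSelection`, keeps a bounded buffer of samples from earlier concepts. The class below is a hypothetical sketch of a balanced buffer with random selection, assuming the budget `max_size` is split evenly across the concepts seen so far; it is not pyCLAD's implementation, and `BalancedRandomReplayBuffer` is a made-up name.

```python
import numpy as np


class BalancedRandomReplayBuffer:
    """Hypothetical balanced replay buffer with random selection.

    The total budget `max_size` is split evenly across all concepts seen so far;
    when a new concept arrives, older concepts are randomly subsampled to make room.
    """

    def __init__(self, max_size, seed=0):
        self.max_size = max_size
        self.rng = np.random.default_rng(seed)
        self.per_concept = []  # one array of stored samples per concept

    def _random_subset(self, data, k):
        if len(data) <= k:
            return data
        idx = self.rng.choice(len(data), size=k, replace=False)
        return data[idx]

    def update(self, data):
        self.per_concept.append(data)
        quota = self.max_size // len(self.per_concept)
        self.per_concept = [self._random_subset(d, quota) for d in self.per_concept]

    def data(self):
        return np.concatenate(self.per_concept, axis=0)


buffer = BalancedRandomReplayBuffer(max_size=1000)
for _ in range(3):
    buffer.update(np.random.rand(2000, 10))
print([len(d) for d in buffer.per_concept], len(buffer.data()))  # [333, 333, 333] 999
```

Compared with the cumulative strategy, memory stays bounded by `max_size`, at the cost of seeing only a sample of each past concept.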

In [None]:
strategy = ...  # Replace with your strategy, e.g., `CumulativeStrategy(model)`

Let's run the experiment with the selected dataset, model, and strategy. `ConceptMetricCallback` evaluates the model's performance on each concept, `TimeEvaluationCallback` measures the execution time, and `MemoryUsageCallback` tracks memory usage. The results are saved to a JSON file (`output-strategy1.json`) using `JsonOutputWriter`.

In [None]:
# Run the experiment
callbacks = [
    ConceptMetricCallback(
        base_metric=RocAuc(),
        metrics=[ContinualAverage(), BackwardTransfer(), ForwardTransfer()],
    ),
    TimeEvaluationCallback(),
    MemoryUsageCallback(),
]
scenario = ConceptAwareScenario(dataset, strategy=strategy, callbacks=callbacks)
scenario.run()

output_writer = JsonOutputWriter(pathlib.Path("output-strategy1.json"))
output_writer.write([model, dataset, strategy, *callbacks])



If you run this cell without replacing the `...` placeholders above, it fails with `AttributeError: 'ellipsis' object has no attribute 'train_concepts'`.

Next, create a fresh instance of the model and select another strategy, so that the second experiment does not start from the model already trained in the first one.

In [None]:
model = ...  # Create a fresh instance of the same model as before, e.g., `IsolationForestAdapter()`

Select a different strategy to compare the results with the previous experiment.

In [None]:
strategy = ...  # Replace with a different strategy, e.g., `NaiveStrategy(model)`

Now, run the experiment again with the new strategy. The results are saved to `output-strategy2.json`.

In [None]:
# Run the experiment
callbacks = [
    ConceptMetricCallback(
        base_metric=RocAuc(),
        metrics=[ContinualAverage(), BackwardTransfer(), ForwardTransfer()],
    ),
    TimeEvaluationCallback(),
    MemoryUsageCallback(),
]
scenario = ConceptAwareScenario(dataset, strategy=strategy, callbacks=callbacks)
scenario.run()

output_writer = JsonOutputWriter(pathlib.Path("output-strategy2.json"))
output_writer.write([model, dataset, strategy, *callbacks])

Compare the two experiments by inspecting the output files (`output-strategy1.json` and `output-strategy2.json`). Focus on the summary metrics `ContinualAverage`, `BackwardTransfer`, and `ForwardTransfer`, which you can find under `concept_metric_callback_ROC-AUC` -> `metrics` in each file.
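The comparison can also be scripted. The sketch below reads the `metrics` entry from both files and prints each shared metric side by side. It assumes the `concept_metric_callback_ROC-AUC` entry sits at the top level of the JSON document, mirroring the dictionary returned by `metric_callback.info()`; the function names are hypothetical.

```python
import json
import pathlib


def load_summary_metrics(path, key="concept_metric_callback_ROC-AUC"):
    """Return the summary metrics stored under `key` -> "metrics" in an output file.

    Assumes the callback entry is a top-level key of the JSON document.
    """
    with pathlib.Path(path).open() as f:
        return json.load(f)[key]["metrics"]


def compare_runs(path_a, path_b):
    """Print each shared summary metric side by side and return the differences (b - a)."""
    a, b = load_summary_metrics(path_a), load_summary_metrics(path_b)
    shared = sorted(set(a) & set(b))
    for name in shared:
        print(f"{name:>18}: {a[name]:.4f} vs {b[name]:.4f} (diff {b[name] - a[name]:+.4f})")
    return {name: b[name] - a[name] for name in shared}


paths = ["output-strategy1.json", "output-strategy2.json"]
if all(pathlib.Path(p).exists() for p in paths):  # only compare once both runs have finished
    compare_runs(*paths)
```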

# Extend the `pyCLAD` library :)
You can try the `pyCLAD` library with your own datasets and models. The library is designed to be flexible and extensible, allowing you to adapt it to your specific needs. You can also contribute to the library by implementing new models, strategies, or datasets. Check out the [documentation](https://pyclad.readthedocs.io/en/latest/) for more information on how to get started.