Only xStream could detect anomalous cases in the example #5

Closed

dangmanhtruong1995 opened this issue Jun 20, 2022 · 1 comment

@dangmanhtruong1995

Hi, I tried different models based on example_usage.py, but only xStream could detect anomalous cases; the other models either fail to run or do not predict any anomalous cases. Here is the code:

# Import modules.
from sklearn.utils import shuffle
from pysad.evaluation import AUROCMetric
from pysad.models import xStream, ExactStorm, HalfSpaceTrees, IForestASD, KitNet, KNNCAD, LODA, LocalOutlierProbability, MedianAbsoluteDeviation, RelativeEntropy, RobustRandomCutForest, RSHash
from pysad.utils import ArrayStreamer
from pysad.transform.postprocessing import RunningAveragePostprocessor
from pysad.transform.preprocessing import InstanceUnitNormScaler
from pysad.transform.probability_calibration import ConformalProbabilityCalibrator, GaussianTailProbabilityCalibrator
from pysad.utils import Data
from tqdm import tqdm
import numpy as np
from pdb import set_trace

# This example demonstrates the usage of most modules in the PySAD framework.
if __name__ == "__main__":
    np.random.seed(61)  # Fix random seed.

    # Get data to stream.
    data = Data("data")
    X_all, y_all = data.get_data("arrhythmia.mat")
    X_all, y_all = shuffle(X_all, y_all)

    iterator = ArrayStreamer(shuffle=False)  # Init streamer to simulate streaming data.
    # set_trace()
    model = xStream()  # Init xStream anomaly detection model.
    # model = ExactStorm(window_size=25)
    # model = HalfSpaceTrees(feature_mins=np.zeros(X_all.shape[1]), feature_maxes=np.ones(X_all.shape[1]))
    # model = IForestASD()
    # model = KitNet(grace_feature_mapping=100, max_size_ae=100)
    # model = KNNCAD(probationary_period=10)
    # model = LODA()
    # model = LocalOutlierProbability()
    # model = MedianAbsoluteDeviation()
    # model = RelativeEntropy(min_val=0, max_val=1)
    # model = RobustRandomCutForest(num_trees=200)
    # model = RSHash(feature_mins=0, feature_maxes=1)
    
    preprocessor = InstanceUnitNormScaler()  # Init normalizer.
    postprocessor = RunningAveragePostprocessor(window_size=5)  # Init running average postprocessor.
    auroc = AUROCMetric()  # Init area under receiver operating characteristic (ROC) curve metric.

    calibrator = GaussianTailProbabilityCalibrator(window_size=100)  # Init probability calibrator.
    idx = 0
    for X, y in tqdm(iterator.iter(X_all[100:], y_all[100:])):  # Stream data.
        X = preprocessor.fit_transform_partial(X)  # Fit preprocessor to and transform the instance.

        score = model.fit_score_partial(X)  # Fit model to and score the instance.        
        score = postprocessor.fit_transform_partial(score)  # Apply running averaging to the score.
        
        # print(score)
        auroc.update(y, score)  # Update AUROC metric.
        try:
            # set_trace()
            calibrated_score = calibrator.fit_transform(score)  # Fit & calibrate score.
        except Exception:
            calibrated_score = 0
            # set_trace()
        # set_trace()
        # Output if the instance is anomalous.
        if calibrated_score > 0.95:  # If probability of being normal is less than 5%.
            print(f"Alert: {idx}th data point is anomalous.")
            
        idx += 1

    # Output resulting AUROCS metric.
    # print("AUROC: ", auroc.get())

Does anyone know how to fix this problem? Thank you very much.

@selimfirat (Owner)

Sorry for the very late reply. I was able to reproduce the issue, but it seems to be due to the chosen hyperparameters and model combination. For example, when I changed the algorithm to ExactStorm, set the calibrator's window size to 10, and removed the postprocessor, it found anomalies.
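
Roughly the change I tried, as a drop-in replacement for the model/postprocessor/calibrator part of the __main__ block in your script (same imports, streamer, and data loading assumed; only the model, the calibrator window size, and the dropped postprocessor differ):

    model = ExactStorm(window_size=25)  # Switch the model to ExactStorm.
    preprocessor = InstanceUnitNormScaler()  # Init normalizer.
    auroc = AUROCMetric()  # Init AUROC metric.
    calibrator = GaussianTailProbabilityCalibrator(window_size=10)  # Smaller calibrator window.

    idx = 0
    for X, y in tqdm(iterator.iter(X_all[100:], y_all[100:])):  # Stream data.
        X = preprocessor.fit_transform_partial(X)  # Fit preprocessor to and transform the instance.
        score = model.fit_score_partial(X)  # Use the raw score; no running-average postprocessing.
        auroc.update(y, score)  # Update AUROC metric.
        try:
            calibrated_score = calibrator.fit_transform(score)  # Fit & calibrate the raw score.
        except Exception:
            calibrated_score = 0
        if calibrated_score > 0.95:  # If probability of being normal is less than 5%.
            print(f"Alert: {idx}th data point is anomalous.")
        idx += 1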

RunningAveragePostprocessor probably averages out the anomaly scores, so spikes are no longer seen as anomalies by the other algorithms. Besides, arrhythmia is a very small dataset; that's another reason why the methods do not work very well, especially with this parameter setting.
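
To see the averaging effect in isolation, here is a toy sequence (hypothetical scores, not taken from the arrhythmia data):

from pysad.transform.postprocessing import RunningAveragePostprocessor

postprocessor = RunningAveragePostprocessor(window_size=5)
for raw_score in [1.0, 1.0, 1.0, 1.0, 10.0]:  # Four ordinary scores followed by a single spike.
    print(postprocessor.fit_transform_partial(raw_score))
# The spike of 10 is damped to (1 + 1 + 1 + 1 + 10) / 5 = 2.8, so it stands out
# far less when the calibrator later estimates its tail probability.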

Please feel free to reopen the issue if you have any further questions.
