Only xStream could detect anomalous cases in the example #5

Closed

dangmanhtruong1995 opened this issue Jun 20, 2022 · 1 comment

@dangmanhtruong1995

Hi, I tried different models based on example_usage.py, but only xStream could detect anomalous cases; the other models either fail to run or do not predict any anomalous cases. Here is the code:

# Import modules.
from sklearn.utils import shuffle
from pysad.evaluation import AUROCMetric
from pysad.models import xStream, ExactStorm, HalfSpaceTrees, IForestASD, KitNet, KNNCAD, LODA, LocalOutlierProbability, MedianAbsoluteDeviation, RelativeEntropy, RobustRandomCutForest, RSHash
from pysad.utils import ArrayStreamer
from pysad.transform.postprocessing import RunningAveragePostprocessor
from pysad.transform.preprocessing import InstanceUnitNormScaler
from pysad.transform.probability_calibration import ConformalProbabilityCalibrator, GaussianTailProbabilityCalibrator
from pysad.utils import Data
from tqdm import tqdm
import numpy as np
from pdb import set_trace

# This example demonstrates the usage of most modules in the PySAD framework.
if __name__ == "__main__":
    np.random.seed(61)  # Fix random seed.

    # Get data to stream.
    data = Data("data")
    X_all, y_all = data.get_data("arrhythmia.mat")
    X_all, y_all = shuffle(X_all, y_all)

    iterator = ArrayStreamer(shuffle=False)  # Init streamer to simulate streaming data.
    # set_trace()
    model = xStream()  # Init xStream anomaly detection model.
    # model = ExactStorm(window_size=25)
    # model = HalfSpaceTrees(feature_mins=np.zeros(X_all.shape[1]), feature_maxes=np.ones(X_all.shape[1]))
    # model = IForestASD()
    # model = KitNet(grace_feature_mapping=100, max_size_ae=100)
    # model = KNNCAD(probationary_period=10)
    # model = LODA()
    # model = LocalOutlierProbability()
    # model = MedianAbsoluteDeviation()
    # model = RelativeEntropy(min_val=0, max_val=1)
    # model = RobustRandomCutForest(num_trees=200)
    # model = RSHash(feature_mins=0, feature_maxes=1)
    
    preprocessor = InstanceUnitNormScaler()  # Init normalizer.
    postprocessor = RunningAveragePostprocessor(window_size=5)  # Init running average postprocessor.
    auroc = AUROCMetric()  # Init area under receiver operating characteristic (ROC) curve metric.

    calibrator = GaussianTailProbabilityCalibrator(window_size=100)  # Init probability calibrator.
    idx = 0
    for X, y in tqdm(iterator.iter(X_all[100:], y_all[100:])):  # Stream data.
        X = preprocessor.fit_transform_partial(X)  # Fit preprocessor to and transform the instance.

        score = model.fit_score_partial(X)  # Fit model to and score the instance.        
        score = postprocessor.fit_transform_partial(score)  # Apply running averaging to the score.
        
        # print(score)
        auroc.update(y, score)  # Update AUROC metric.
        try:
            # set_trace()
            calibrated_score = calibrator.fit_transform(score)  # Fit & calibrate score.
        except Exception:
            calibrated_score = 0
            # set_trace()
        # set_trace()
        # Output if the instance is anomalous.
        if calibrated_score > 0.95:  # If probability of being normal is less than 5%.
            print(f"Alert: {idx}th data point is anomalous.")
            
        idx += 1

    # Output resulting AUROCS metric.
    # print("AUROC: ", auroc.get())

Does anyone know how to fix this problem? Thank you very much.

@selimfirat (Owner)

Sorry for the very late reply. I was able to reproduce the issue, but it seems to be due to the chosen hyperparameters and model combination. For example, when I changed the algorithm to ExactStorm, set the calibrator's window size to 10, and removed the postprocessor, it found anomalies.
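
Roughly the change I tried, as a drop-in replacement for the model/postprocessor/calibrator part of the __main__ block in your script (same imports, streamer, and data loading assumed; only the model, the calibrator window size, and the dropped postprocessor differ):

    model = ExactStorm(window_size=25)  # Switch the model to ExactStorm.
    preprocessor = InstanceUnitNormScaler()  # Init normalizer.
    auroc = AUROCMetric()  # Init AUROC metric.
    calibrator = GaussianTailProbabilityCalibrator(window_size=10)  # Smaller calibrator window.

    idx = 0
    for X, y in tqdm(iterator.iter(X_all[100:], y_all[100:])):  # Stream data.
        X = preprocessor.fit_transform_partial(X)  # Fit preprocessor to and transform the instance.
        score = model.fit_score_partial(X)  # Use the raw score; no running-average postprocessing.
        auroc.update(y, score)  # Update AUROC metric.
        try:
            calibrated_score = calibrator.fit_transform(score)  # Fit & calibrate the raw score.
        except Exception:
            calibrated_score = 0
        if calibrated_score > 0.95:  # If probability of being normal is less than 5%.
            print(f"Alert: {idx}th data point is anomalous.")
        idx += 1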

RunningAveragePostprocessor probably averages out the anomaly scores, so spikes are no longer seen as anomalies by the other algorithms. Besides, arrhythmia is a very small dataset; that's another reason why the methods do not work very well, especially with this parameter setting.
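
To see the averaging effect in isolation, here is a toy sequence (hypothetical scores, not taken from the arrhythmia data):

from pysad.transform.postprocessing import RunningAveragePostprocessor

postprocessor = RunningAveragePostprocessor(window_size=5)
for raw_score in [1.0, 1.0, 1.0, 1.0, 10.0]:  # Four ordinary scores followed by a single spike.
    print(postprocessor.fit_transform_partial(raw_score))
# The spike of 10 is damped to (1 + 1 + 1 + 1 + 10) / 5 = 2.8, so it stands out
# far less when the calibrator later estimates its tail probability.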

Please feel free to reopen the issue if you have any further questions.
