Benchmarking data stream outlier detection methods

Main Contributions

Data stream datasets have characteristics that depend on the underlying domain and context. Because data streams are close to time series, a stream can be characterized by the presence of seasonality, trend, and cycle. Because of the way the data arrive, we also consider concept drift, a non-negligible phenomenon that frequently occurs in data streams.

This work:

✅ Compares several data stream anomaly detection methods on their latency and detection performance

✅ Focuses on the characteristics present in the datasets (seasonality, trend, cycle, concept drift)

Interested in my work?

Feel free to contact me at: anne.ngobibinbe@gmail.com

The final version of our paper (in French) on the benchmark of data stream outlier detection methods is being submitted to the 2022 French Speaking Conference on the Extraction and Management of Knowledge (EGC).

README Structure

Methods compared: Presentation of methods we compared
Datasets and their characteristics: Brief Description of datasets and characteristics identified
Description of the experimental protocol: Description of the experimental protocol
Results: Presentation of results obtained
Reproducibility: Details on how to reproduce our tests
References

Methods compared

As is the case for most anomaly detection methods, the following methods produce an anomaly score for each incoming instance, indicating how likely the instance is to be an anomaly. A user-defined threshold then makes the decision: instances whose anomaly score is higher than the threshold are considered anomalies. In the literature, data stream anomaly detection methods are mostly divided into statistical-based, tree-based, proximity-based, and deep-learning-based approaches. We chose widely used and recommended approaches in each of those categories.
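The score-then-threshold step can be illustrated with a minimal sketch. The function name `flag_anomalies`, the scores, and the threshold value are hypothetical and chosen only for illustration; they are not taken from the benchmark code.

```python
from typing import Iterable, List


def flag_anomalies(scores: Iterable[float], threshold: float) -> List[int]:
    """Return the positions of the instances whose score exceeds the threshold."""
    return [i for i, score in enumerate(scores) if score > threshold]


# Hypothetical anomaly scores for six consecutive instances of a stream.
scores = [0.10, 0.20, 0.95, 0.15, 0.80, 0.05]
print(flag_anomalies(scores, threshold=0.7))  # prints [2, 4]
```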

Methods:

Online ARIMA : Statistical method that computes the anomaly score as the distance between the value forecast from past instances and the actual value of the instance.
HStree : Tree-based approach that computes the anomaly score according to how well an instance is isolated from other instances in an ensemble of pre-constructed trees.
IforestASD : Tree-based approach, similar to HStree.
KitNet : Deep-learning-based method that computes the anomaly score as the reconstruction error of an instance (autoencoder).
MILOF : Proximity-based approach that computes the anomaly score according to how locally reachable an instance is compared to its nearest neighbours.
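To make the forecast-based scoring idea concrete, here is a minimal sketch of a residual score computed one instance at a time. It is not Online ARIMA: the forecast is a plain moving average over the last `window` values, used as a simplified stand-in, and the function name and sample stream are hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, Iterator


def forecast_residual_scores(stream: Iterable[float], window: int = 3) -> Iterator[float]:
    """Yield |observed - forecast| for each incoming instance.

    Assumption: the forecast is the mean of the last `window` observations,
    a simplification standing in for an online ARIMA forecast.
    """
    history: Deque[float] = deque(maxlen=window)
    for value in stream:
        if history:
            forecast = sum(history) / len(history)
            yield abs(value - forecast)
        else:
            yield 0.0  # no past data yet, so nothing to compare against
        history.append(value)


# Hypothetical stream with a spike at position 3.
stream = [10.0, 10.0, 10.0, 25.0, 10.0]
scores = list(forecast_residual_scores(stream))
print(scores)  # prints [0.0, 0.0, 0.0, 15.0, 5.0]
```

Note that the spike also raises the score of the following instance, because it contaminates the history used for the next forecast.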

Datasets and their characteristics

We selected datasets mostly from the IOT domain whose anomaly causes are known, to avoid errors due to human or tool labeling. In the tables, "no" in the Trend column means the dataset has a constant trend. These characteristics were identified by visualizing the datasets and are supported by STL decompositions for trend and seasonality.

🔗 Anchor Links:

Univariate datasets
Multivariate datasets

Univariate datasets

Here we used the "real known cause" group of datasets from the NAB benchmark.

Dataset	Domain	Dataset length	Number of anomalies	Concept drift	Seasonality	Trend	Cycle
ambient temperature system failure	industry	7267	2	yes	yes	yes	no
cpu utilization asg misconfiguration	IOT	18050	1	yes	yes	yes	yes
ec2 request latency system failure	IOT	4032	3	no	no	yes	no
machine temperature system failure	industry	22695	4	no	no	no	no
new york taxi	real life scenario	10320	5	no	yes	yes	yes
rogue agent keyhold	IOT	1882	2	yes	no	yes	no
rogue agent key up down	IOT	5315	2	yes	no	no	no

Multivariate datasets

From the SKAB benchmark, we selected datasets exhibiting many of the characteristics listed above. All of these datasets have 7 dimensions.

Dataset	Domain	Dataset length	Number of anomalies	Concept drift	Seasonality	Trend	Cycle
other 9: Closing the valve at the flow inlet to the pump	Industrial IOT	751	2	no	no	yes	yes
other 11: Closing the valve at the flow inlet to the pump	Industrial IOT	665	4	no	yes	no	no
other 13: Sharply behavior of rotor imbalance	Industrial IOT	7267	2	yes	yes	yes	no
other 14: Linear behavior of rotor imbalance	Industrial IOT	1153	2	yes	yes	yes	yes
other 15: Step behavior of rotor imbalance	Industrial IOT	1147	2	yes	yes	yes	no
other 17: Exponential behavior of rotor imbalance	Industrial IOT	1147	4	no	yes	no	yes
other 20: Draining water from the tank until cavitation	Industrial IOT	1191	4	yes	yes	yes	no
other 22: Water supply of increased temperature	Industrial IOT	1079	4	yes	yes	yes	yes

Description of the experimental protocol

For each dataset, a Bayesian optimization is performed to find the best hyperparameters (the hyperparameter search space of each method is detailed in the implementation details section (page 8) of the summary_of_the_experiment file). We then test the method with the best hyperparameters and record the execution time and the F1-score. Finally, we compute the latency, or response time, i.e. the average time needed to process one instance (latency = execution time on the dataset / length of the dataset). To compute the F1-score, we consider that a method has found an anomaly if it raises a detection within 1% of the length of the dataset around the position of the anomaly. The reason is that an anomaly generally occurs over a short period, and the point given as the position of the anomaly is a point inside that period.
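The two measures can be sketched as follows. This is an illustrative reading of the protocol, not the repository's scoring code. It assumes that the 1% tolerance is applied on each side of the labeled position, that a labeled anomaly counts as found when at least one detection falls inside its window, and that detections outside every window count as false positives. The function names and sample values are hypothetical.

```python
from typing import Sequence


def latency(execution_time: float, n_instances: int) -> float:
    """Average time spent per instance: execution time / dataset length."""
    return execution_time / n_instances


def windowed_f1(detected: Sequence[int], anomalies: Sequence[int],
                n_instances: int, tolerance: float = 0.01) -> float:
    """F1-score where a detection within tolerance * n_instances of a labeled
    anomaly position counts as finding that anomaly (assumed interpretation)."""
    radius = tolerance * n_instances
    # A labeled anomaly is found if any detection lies inside its window.
    tp = sum(1 for a in anomalies if any(abs(d - a) <= radius for d in detected))
    fn = len(anomalies) - tp
    # A detection that lies outside every window is a false positive.
    fp = sum(1 for d in detected if all(abs(d - a) > radius for a in anomalies))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical run: 1000 instances, anomalies labeled at 200 and 700,
# detections raised at 205 (inside a window) and 500 (outside every window).
print(windowed_f1([205, 500], [200, 700], n_instances=1000))  # prints 0.5
print(latency(100.0, 1000))  # prints 0.1
```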

Results

Due to design restrictions, KitNet could not be applied to univariate datasets and Online ARIMA could not be applied to multivariate datasets.

🔗 Anchor Links:

Results on univariate datasets
Results on multivariate datasets

Results on univariate datasets

F1-score

Dataset	MILOF	IforestASD	HStree	Online ARIMA
ambient temperature system failure	0.4	0.67	0.3	0.67
cpu utilization asg misconfiguration	0.5	0.42	0.45	1
ec2 request latency system failure	0.5	0.343	0.94	0.8
machine temperature system failure	0.15	0.7825	0.88	0.66
new york taxi	0.25	0.31	0.5	0.6
rogue agent keyhold	0.136	0.33	0.079	0.1
rogue agent key up down	0.4	0.67	0.15	0.11

Here we summarize the number of datasets on which each method obtained the best score and, among those, the number exhibiting concept drift, seasonality, trend, and cycle (a dataset can have more than one of these characteristics).

Method	Number of best scores	Concept drift	Seasonality	Trend	Cycle
MILOF	0	0	0	0	0
HStree	2	0	0	1	0
IforestASD	3	2	1	2	0
Online ARIMA	3	2	3	3	2
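As an illustration, the "Number of best scores" column can be recomputed from the F1-score table above with a short sketch. It assumes that tied best scores are credited to every tied method; the function name `count_best_scores` is hypothetical.

```python
from typing import Dict

# F1-scores copied from the univariate table above.
f1_scores: Dict[str, Dict[str, float]] = {
    "ambient temperature system failure": {"MILOF": 0.4, "IforestASD": 0.67, "HStree": 0.3, "Online ARIMA": 0.67},
    "cpu utilization asg misconfiguration": {"MILOF": 0.5, "IforestASD": 0.42, "HStree": 0.45, "Online ARIMA": 1.0},
    "ec2 request latency system failure": {"MILOF": 0.5, "IforestASD": 0.343, "HStree": 0.94, "Online ARIMA": 0.8},
    "machine temperature system failure": {"MILOF": 0.15, "IforestASD": 0.7825, "HStree": 0.88, "Online ARIMA": 0.66},
    "new york taxi": {"MILOF": 0.25, "IforestASD": 0.31, "HStree": 0.5, "Online ARIMA": 0.6},
    "rogue agent keyhold": {"MILOF": 0.136, "IforestASD": 0.33, "HStree": 0.079, "Online ARIMA": 0.1},
    "rogue agent key up down": {"MILOF": 0.4, "IforestASD": 0.67, "HStree": 0.15, "Online ARIMA": 0.11},
}


def count_best_scores(table: Dict[str, Dict[str, float]]) -> Dict[str, int]:
    """Count, per method, the datasets on which it reaches the best F1-score.

    Assumption: when several methods tie for the best score, each is credited.
    """
    counts = {method: 0 for row in table.values() for method in row}
    for row in table.values():
        best = max(row.values())
        for method, score in row.items():
            if score == best:
                counts[method] += 1
    return counts


print(count_best_scores(f1_scores))
# prints {'MILOF': 0, 'IforestASD': 3, 'HStree': 2, 'Online ARIMA': 3}
```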

Execution time (ms). Execution times are rounded.

Dataset	MILOF	IforestASD	HStree	Online ARIMA
ambient temperature system failure	172	200	212	50
cpu utilization asg misconfiguration	430	438	738	129
ec2 request latency system failure	51	167	125	38
machine temperature system failure	560	580	9752	109
new york taxi	275	269	4776	391
rogue agent keyhold	31	76	16	17
rogue agent key up down	26	203	8	37

Here we summarize the average latency on univariate datasets.

	MILOF	IforestASD	HStree	Online ARIMA
Univariate (ms)	22.2	27.8	222.8	11.06

Results on multivariate datasets

F1-score

Dataset	MILOF	IforestASD	HStree	KitNet
other 9: Closing the valve at the flow inlet to the pump	0.67	0.25	0.248	0.285
other 11: Closing the valve at the flow inlet to the pump	0.21	0.5	0.6	0.46
other 13: Sharply behavior of rotor imbalance	0.167	0.4	0.69	0.6
other 14: Linear behavior of rotor imbalance	0.14	0.8	0.5	1
other 15: Step behavior of rotor imbalance	0.167	0.5	0.292	0.52
other 17: Exponential behavior of rotor imbalance	0.102	0.122	0.121	0.125
other 20: Draining water from the tank until cavitation	0.15	0.29	0.278	0.67
other 22: Water supply of increased temperature	0.32	0.295	0.286	0.37

Here we summarize the number of datasets on which each method obtained the best score and, among those, the number exhibiting concept drift, seasonality, trend, and cycle (a dataset can have more than one of these characteristics).

Method	Number of best scores	Concept drift	Seasonality	Trend	Cycle
MILOF	1	0	0	1	1
HStree	5	5	4	4	3
IforestASD	0	0	0	0	0
KitNet	2	0	2	0	0

Execution time (ms). Execution times are rounded, except for KitNet because its execution time is very low.

Dataset	MILOF	IforestASD	KitNet	HStree
other 9: Closing the valve at the flow inlet to the pump	9	27	0.25	27
other 11: Closing the valve at the flow inlet to the pump	7	31	0.17	2.8
other 13: Sharply behavior of rotor imbalance	10.3	38	0.53	153.7
other 14: Linear behavior of rotor imbalance	22	37	0.48	189
other 15: Step behavior of rotor imbalance	7	32	0.39	7
other 17: Exponential behavior of rotor imbalance	12	32	0.4	48
other 20: Draining water from the tank until cavitation	6	31	0.23	206
other 22: Water supply of increased temperature	5	31	0.17	3

Here we summarize the average latency on multivariate datasets.

	MILOF	IforestASD	HStree	KitNet
Multivariate (ms)	9.5	31.9	80.7	0.32

Reproducibility

🔗 Anchor Links:

Dependencies
Launch test

Dependencies:

Make sure you have at least Python 3.6.

To install the requirements, run: pip install -r requirements.txt

Launch test:

On univariate datasets: python test_univariate.py name-of-the-method-to-test

On multivariate datasets: python test_multivariate.py name-of-the-method-to-test

The method names are the following: MILOF for MILOF, ARIMAFD for Online ARIMA, HS-tree for HStree, iforestASD for IforestASD, KitNet for KitNet.
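To run every applicable method in one go, the commands above can be generated with a small helper. This is a convenience sketch, not a script shipped with the repository: `build_commands` is a hypothetical name, and the method lists leave out KitNet for univariate datasets and ARIMAFD for multivariate datasets, following the restriction stated in the Results section.

```python
import sys
from typing import List

# Method names as accepted by the test scripts (see the list above).
UNIVARIATE_METHODS = ["MILOF", "ARIMAFD", "HS-tree", "iforestASD"]
MULTIVARIATE_METHODS = ["MILOF", "HS-tree", "iforestASD", "KitNet"]


def build_commands(python: str = sys.executable) -> List[List[str]]:
    """Return one argument list per (script, method) pair to launch."""
    commands = [[python, "test_univariate.py", m] for m in UNIVARIATE_METHODS]
    commands += [[python, "test_multivariate.py", m] for m in MULTIVARIATE_METHODS]
    return commands


# Print the commands; nothing is executed here.
for command in build_commands("python"):
    print(" ".join(command))
```

Each argument list can then be passed to `subprocess.run(command, check=True)` from the repository root to launch the corresponding test.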

The results of the test are written to the result folder. The result file contains:

The execution time on the dataset
The F1-score of each method
The best hyperparameters for each dataset and each method

Notices: The score used for the experiment can be changed. By default the MERLIN score (1% around the anomaly) is used; the NAB score is also available. Details on the characteristics of the datasets and the hyperparameters we found are summarized in the file summary_of_the_experiment.pdf. IforestASD, KitNet, and HStree were tested using their pysad implementations.

References:

1. Methods:

Togbe, M. U., Y. Chabchoub, A. Boly, M. Barry, R. Chiky, and M. Bahri (2021). Anomalies Detection Using Isolation in Concept-Drifting Data Streams. Computers 10(1).

Ding, Z. and M. Fei (2013). An anomaly detection approach based on isolation forest algorithm for streaming data using sliding window. IFAC Proceedings Volumes 46(20), 12–17. 3rd IFAC Conference on Intelligent Control and Automation Science ICONS 2013.

Tan, S. C., K. M. Ting, and T. F. Liu (2011). Fast anomaly detection for streaming data. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, Volume Two, IJCAI’11, pp. 1511–1516. AAAI Press.

Salehi, M., C. Leckie, J. C. Bezdek, T. Vaithianathan, and X. Zhang (2016). Fast memory efficient local outlier detection in data streams. IEEE Transactions on Knowledge and Data Engineering 28, 3246–3260.

Mirsky, Y., T. Doitshman, Y. Elovici, and A. Shabtai (2018). Kitsune: An ensemble of autoencoders for online network intrusion detection. arXiv:1802.09089 [cs], version 2.

Liu, C., S. C. H. Hoi, P. Zhao, and J. Sun (2016). Online ARIMA algorithms for time series prediction. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, AAAI’16, pp. 1867–1873. AAAI Press.

2. Datasets:

Lavin, A. and S. Ahmad (2015). Evaluating real-time anomaly detection algorithms - the Numenta Anomaly Benchmark. CoRR abs/1510.03336.

Iurii Katser, Viacheslav Kozitsin, V. L. and I. Maksimov (2021). Unsupervised offline change point detection ensembles. Applied Sciences 11, 4280.

3. Comparative studies:

Togbe, M., Y. Chabchoub, A. Boly, R. Chiky, C. Etude, and M. U. Togbe (2020). Etude comparative des méthodes de détection d’anomalies. Revue des Nouvelles Technologies de l’Information, Extraction et Gestion des Connaissances, RNTI-E-36, 109–120.

Salehi, M. and L. Rashidi (2018). A Survey on Anomaly Detection in Evolving Data. ACM SIGKDD Explorations Newsletter 20(1), 13–23.

Nakamura, T., M. Imamura, R. Mercer, and E. Keogh (2020). MERLIN: Parameter-free discovery of arbitrary length anomalies in massive time series archives. In 2020 IEEE International Conference on Data Mining (ICDM), pp. 1190–1195.

Chandola, V., A. Banerjee, and V. Kumar (2009). Anomaly detection: A survey. ACM Comput. Surv. 41(3).
