Dimensionality-Aware Outlier Detection (DAO)

Repository of the paper:

Dimensionality-Aware Outlier Detection
Alastair Anderberg, James Bailey, Ricardo J. G. B. Campello,
Michael E. Houle, Henrique O. Marques, Miloš Radovanović, Arthur Zimek
SDM24

In this paper, we present a nonparametric method for outlier detection that takes full account of local variations in intrinsic dimensionality within the dataset. Using the theory of Local Intrinsic Dimensionality (LID), our 'dimensionality-aware' outlier detection method, DAO, is derived as an estimator of an asymptotic local expected density ratio involving the query point and a close neighbor drawn at random. The dimensionality-aware behavior of DAO is due to its use of local estimation of LID values in a theoretically-justified way.
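The LID values that DAO relies on have to be estimated from neighbor distances. As an illustration, the sketch below implements one common choice, the maximum-likelihood (Hill-type) estimator, which is presumably what the `DAO_MLE` label in this README refers to. It is written for illustration only and is not taken from `estimators.py`; the function name `lid_mle` and its input convention (the distances from one point to its k nearest neighbors, all positive) are assumptions.

```python
import math


def lid_mle(knn_distances):
    """Hill-type MLE of local intrinsic dimensionality (illustrative sketch).

    knn_distances: distances from one query point to its k nearest
    neighbors (hypothetical input convention, all values > 0).
    """
    k = len(knn_distances)
    if k < 2:
        raise ValueError("need at least two neighbor distances")
    r_k = max(knn_distances)  # distance to the k-th (farthest) neighbor
    if min(knn_distances) <= 0:
        raise ValueError("distances must be strictly positive")

    # Sum of log(r_i / r_k); the farthest neighbor contributes log(1) = 0.
    log_ratio_sum = sum(math.log(r / r_k) for r in knn_distances)
    if log_ratio_sum == 0:
        # All neighbors are equidistant, so the estimate diverges.
        return math.inf
    return -k / log_ratio_sum
```

When the neighbor distances grow like `(i / k) ** (1 / d)`, as they would for points spread uniformly over a d-dimensional ball, the estimate lands close to `d`.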

Through comprehensive experimentation on more than 800 synthetic and real datasets, we show that DAO significantly outperforms three popular and important benchmark outlier detection methods: Local Outlier Factor (LOF), Simplified LOF, and kNN.
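For orientation, two of the baselines named above can be sketched in a few lines. The code below is a brute-force illustration using the standard definitions (kNN score = distance to the k-th nearest neighbor; Simplified LOF = mean ratio of a point's k-distance to the k-distances of its neighbors). It is not the implementation in `algorithms.py`, and the helper names `k_nearest`, `knn_score`, and `simplified_lof_score` are hypothetical. It assumes no duplicate points, so every k-distance is positive.

```python
import math


def k_nearest(points, k):
    """For each point, return its k nearest neighbors as (distance, index) pairs."""
    neighbors = []
    for i, p in enumerate(points):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbors.append(dists[:k])
    return neighbors


def knn_score(neighbors):
    """kNN outlier score: distance to the k-th nearest neighbor."""
    return [nbrs[-1][0] for nbrs in neighbors]


def simplified_lof_score(neighbors):
    """Simplified LOF: average of kdist(q) / kdist(o) over the neighbors o of q."""
    kdist = [nbrs[-1][0] for nbrs in neighbors]
    return [
        sum(kdist[i] / kdist[j] for _, j in nbrs) / len(nbrs)
        for i, nbrs in enumerate(neighbors)
    ]


# Usage: a 5x5 grid of inliers plus one distant point.
points = [(float(x), float(y)) for x in range(5) for y in range(5)]
points.append((20.0, 20.0))
nbrs = k_nearest(points, k=3)
knn = knn_score(nbrs)
slof = simplified_lof_score(nbrs)
```

In both scores, larger values indicate stronger outlierness, so the distant point receives the highest score here.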

Detailed numbers for all experiments are given in tables in the Supplementary Material.

Repository setup

pip install -r requirements.txt

Downloading real datasets

Rscript R/downloadRealDatasets.r
Rscript R/preprocessing.r

Summary of real datasets

Rscript R/compileResults.r 'summaryRealDatasets'

Experimental Results

Evaluation of LID Estimation on DAO Performance

python run_synthetic.py
Rscript R/compileResults.r 'summaryResultsSyntheticDatasets'

Fig. 1. ROC AUC values for outlier detection performance over 480 synthetic datasets containing 2 clusters. One of the clusters (c₁) has intrinsic dimension fixed at 8. The intrinsic dimension of the other cluster (c₂) varies across the datasets (x-axis). The dashed vertical line indicates the reference set where both clusters lie on manifolds with the same intrinsic dimension (8). The results shown are averages over 30 datasets with the same characteristics. Bars indicate standard deviation.
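The quantities plotted in the figure (a mean ROC AUC with a standard deviation bar per group of datasets) can be reproduced in spirit with the sketch below. It is a plain-Python illustration, not the repository's evaluation code: `roc_auc` and `summarize` are hypothetical names, labels are assumed to be 1 for outliers and 0 for inliers, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def roc_auc(labels, scores):
    """ROC AUC as the probability that an outlier outscores an inlier.

    Ties between an outlier and an inlier count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one outlier and one inlier")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize(auc_values):
    """Mean and sample standard deviation over datasets with the same setup."""
    return mean(auc_values), stdev(auc_values)
```

Calling `summarize` on the per-dataset AUC values of one configuration yields one point and one error bar of such a plot.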

Comparative Evaluation on Synthetic Datasets

Rscript R/compileResults.r 'lrSyntheticDatasets'

Comparative Evaluation on Real Datasets

python run_real.py
python stats.py

Simple linear regression

Rscript R/compileResults.r 'lrRealDatasets'

Visualizing Outlier Detection Performance

Rscript R/compileResults.r 'plot_R_MoransI'

Fig. 2. Differences in ROC AUC performance between DAO_MLE and the dimensionality-unaware methods over 393 real datasets. Blue dots indicate datasets where DAO outperforms its competitor, whereas red dots indicate the opposite. The 'Oracle' method indicates the best-performing competitor for each individual dataset. Color intensity is proportional to the ROC AUC difference. The x- and y-axes show Moran's I autocorrelation and dispersion R of log-LID estimates, respectively.
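For readers unfamiliar with Moran's I, the sketch below computes it from its textbook definition for a vector of values (for example, log-LID estimates) and a matrix of pairwise weights. How the weights are built in `R/compileResults.r` is not stated in this README, so the weight matrix here is a placeholder supplied by the caller, and the function name `morans_i` is hypothetical.

```python
def morans_i(values, weights):
    """Moran's I autocorrelation (textbook definition).

    values:  list of n numbers, e.g. log-LID estimates.
    weights: n x n nested list; weights[i][j] is the (assumed) proximity
             weight between items i and j, with zeros on the diagonal.
    """
    n = len(values)
    center = sum(values) / n
    dev = [v - center for v in values]

    total_weight = sum(sum(row) for row in weights)
    variance_sum = sum(d * d for d in dev)
    if total_weight == 0 or variance_sum == 0:
        raise ValueError("weights and values must both be non-constant")

    # Weighted cross-products of deviations from the mean.
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    return (n / total_weight) * (cross / variance_sum)
```

Values near 1 mean that neighboring items have similar values, values near -1 mean that neighbors alternate, and values near 0 indicate no spatial structure.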

Critical Difference Diagram

Rscript R/compileResults.r 'plotCDRealDatasets'

Fig. 3. Critical difference diagram (significance level α = 1e-16) of average ranks of the methods on 393 real datasets: DAO_MLE vs. baseline competitors.
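The average ranks shown in such a diagram can be obtained as in the sketch below: each method is ranked per dataset (rank 1 for the highest ROC AUC, tied methods share the mean of their ranks), and the ranks are then averaged over datasets. This is an illustration only and not the code behind `plotCDRealDatasets`; the function names and the table layout (one row per dataset, one column per method) are assumptions, and the significance test itself is not reproduced.

```python
def rank_descending(values):
    """Rank values so the largest gets rank 1; ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared_rank = (pos + end) / 2 + 1
        for i in order[pos:end + 1]:
            ranks[i] = shared_rank
        pos = end + 1
    return ranks


def average_ranks(score_table):
    """Average rank of each method over all datasets (rows of score_table)."""
    per_dataset = [rank_descending(row) for row in score_table]
    n_datasets = len(score_table)
    n_methods = len(score_table[0])
    return [
        sum(ranks[m] for ranks in per_dataset) / n_datasets
        for m in range(n_methods)
    ]
```

A lower average rank means a method performed better across the datasets.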

Runtime Performance and Computational Complexity

python runtime.py
Rscript R/compileResults.r 'printRuntime'
