The ldbod package provides flexible functions for computing local density-based outlier scores. Both exact and approximate nearest neighbor searches are available through an efficient k-d tree method, and the functions accommodate multiple neighborhood sizes and four local density-based methods: LOF, LDF, RKOF, and LPDF. Input data can be subsampled, or a user-specified reference data set can be supplied to compute outlier scores against, so both unsupervised and semi-supervised outlier detection are supported.
Two functions are included:
ldbod(X,k,...) computes outlier scores referencing a random subsample of the input data, X.
ldbod.ref(X,Y,k,...) computes outlier scores for X based on a reference data set, Y. Y can be a set of "normal" data points for semi-supervised outlier detection. Note: the outlier score lpdr is designed only for unsupervised outlier detection and should not be used in the semi-supervised setting.
Both functions can return nine outlier scores based on the methods LOF, LDF, RKOF, and LPDF. Each method returns both densities and relative densities.
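The two call patterns above can be sketched as follows. This is a minimal illustration, not taken from the package documentation: the simulated data, the choice k = 10, and the reliance on default values for every other argument are assumptions. The result is inspected with str() rather than by component name, since the exact names of the returned scores are not listed here.

```r
library(ldbod)

set.seed(1)

# Input data: 200 points from a standard bivariate normal,
# plus 5 points shifted far from the bulk (illustrative data only).
X <- rbind(matrix(rnorm(400), ncol = 2),
           matrix(rnorm(10, mean = 6), ncol = 2))

# Unsupervised: score X against (a subsample of) itself.
scores <- ldbod(X, k = 10)

# Semi-supervised: score X against a reference set Y of "normal" points.
Y <- matrix(rnorm(400), ncol = 2)
scores_ref <- ldbod.ref(X, Y, k = 10)

# Inspect the returned objects instead of assuming component names.
str(scores)
str(scores_ref)
```

Because lpdr is not meant for the semi-supervised setting, any lpdr component returned by the ldbod.ref call should be ignored.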
All kNN computations are carried out using the nn2 function from the RANN package. For method LPDF, multivariate t densities are computed using the dmt function from the mnormt package. Refer to those packages for more details. Note: all neighborhoods are strictly of size k; therefore, the algorithms for LOF, LDF, and RKOF are not exact implementations, but they are similar in most situations and equivalent when the distance to the k-th nearest neighbor is unique. If there are many duplicate data points, this implementation could give dramatically different results from implementations that allow neighborhood sizes larger than k, especially if k is relatively small. Removing duplicates is recommended before computing outlier scores unless there is good reason to keep them.
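The effect of duplicates on neighborhood size can be seen in a small base R sketch. It does not use the package or RANN; it uses a brute-force distance matrix on hypothetical toy data, and the helper tie_inclusive_size is an illustrative name, not part of ldbod. It counts how many points fall within the k-th nearest neighbor distance, which is the neighborhood a tie-inclusive definition would use, and compares that count with k before and after removing duplicate rows.

```r
# Toy data: four exact copies of (0, 0) plus four distinct points.
X <- rbind(
  c(0, 0), c(0, 0), c(0, 0), c(0, 0),
  c(1, 0), c(0, 2), c(3, 3), c(5, 1)
)
k <- 2

# Number of points within the k-th nearest neighbor distance of each row.
# A value above k means ties at the k-distance enlarge the neighborhood.
tie_inclusive_size <- function(X, k) {
  D <- as.matrix(dist(X))   # brute-force Euclidean distances
  diag(D) <- Inf            # a point is not its own neighbor
  kdist <- apply(D, 1, function(d) sort(d)[k])
  unname(rowSums(D <= kdist))  # row i is compared against kdist[i]
}

sizes_raw <- tie_inclusive_size(X, k)

# Drop duplicate rows, then recompute.
X_unique <- unique(X)
sizes_unique <- tie_inclusive_size(X_unique, k)

print(sizes_raw)
print(sizes_unique)
```

In this toy case, several tie-inclusive neighborhoods are larger than k while the duplicates are present, and all of them shrink to exactly k once the duplicates are removed. A strict size-k neighborhood would instead keep only k of the tied points, which is where the two approaches diverge.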
The main motivation for this package is the need for more flexible implementations of local density-based outlier detection methods that can be used to create ensemble outlier scores. The package is based on the PhD dissertation work by K. T. Williams (2016).
To install the most up-to-date version in R, use the following commands:
or using CRAN
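For the CRAN route, a minimal sketch using the standard base R installer, assuming the package is published on CRAN under the name ldbod:

```r
# Install the released version from CRAN, then load it.
install.packages("ldbod")
library(ldbod)
```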