[MRG+2] LOF algorithm (Anomaly Detection) #5279
Local Outlier Factor implementation.
Motivated by previous discussions
I would like to see some more extensive unit tests, particularly in cases where the algorithm should fail (wrong dimensions or other incorrect types of data passed in). I'll be able to look more at the performance of the code once you merge the mixin with the other class, and change the API to always take in an X matrix.
If you have a dataset X and want to remove outliers from it, you don't want to do
because then each sample is considered in its own neighbourhood: in predict(X), X is treated as 'new observations'.
What the user wants is:
which is allowed by
It is like looking for k-nearest-neighbors of points in a dataset X: you can do:
which is different from
I can make
and allows taking X=None as an argument... Is that allowed?
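To make the k-nearest-neighbours analogy concrete, here is a minimal sketch. It assumes the thread is about scikit-learn and uses its existing `NearestNeighbors.kneighbors` method, which accepts `X=None`; the toy data is made up for illustration.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy 1-D dataset (illustrative values only).
X = np.array([[0.0], [1.0], [3.0], [10.0]])
nn = NearestNeighbors(n_neighbors=1).fit(X)

# X passed explicitly: every row is treated as a new observation,
# so its nearest neighbour is itself, at distance 0.
dist_new, idx_new = nn.kneighbors(X)

# No argument (X=None): the training points are queried against each
# other, and a point is not counted as its own neighbour.
dist_train, idx_train = nn.kneighbors()

print(idx_new.ravel(), dist_new.ravel())
print(idx_train.ravel(), dist_train.ravel())
```

The second call is the behaviour one would want when scoring the training set for outliers: each sample is compared with the other samples, not with itself.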
I think caching the LRD on the training set would be good (and actually make the code easier to follow). I think either
decision_function should both be private or neither. I kinda lean towards making both private, as making something public later is easier than hiding it.
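For what it's worth, a rough sketch of what caching the LRD on the training set could look like, and why scoring the training data as 'new observations' gives different results. `TinyLOF` and its methods are hypothetical names for illustration only, not the class in this PR; it is pure Python, brute force, and assumes no duplicate points.

```python
import math


def _knn(point, data, k, skip=None):
    """Return the k nearest (distance, index) pairs of `point` in `data`.

    `skip` is the index of `point` in `data` when it is a training sample,
    so that a sample is never counted as its own neighbour.
    """
    dists = [(math.dist(point, q), j) for j, q in enumerate(data) if j != skip]
    return sorted(dists)[:k]


class TinyLOF:
    """Hypothetical, illustration-only LOF (not the estimator from this PR)."""

    def __init__(self, k=2):
        self.k = k

    def fit(self, X):
        self.X_ = [tuple(x) for x in X]
        self._neighbors = [
            _knn(x, self.X_, self.k, skip=i) for i, x in enumerate(self.X_)
        ]
        # k-distance: distance from each training sample to its k-th neighbour.
        self._k_dist = [nb[-1][0] for nb in self._neighbors]
        # Cache the local reachability density (LRD) of the training set once.
        self.lrd_ = [self._lrd(nb) for nb in self._neighbors]
        return self

    def _lrd(self, neighbors):
        # Reachability distance of p from o is max(k_dist(o), d(p, o)).
        reach = [max(self._k_dist[j], d) for d, j in neighbors]
        return len(reach) / sum(reach)

    def _lof(self, neighbors, lrd_p):
        # Mean cached LRD of the neighbours divided by the LRD of the point.
        return sum(self.lrd_[j] for _, j in neighbors) / (len(neighbors) * lrd_p)

    def training_scores(self):
        """LOF of the training samples, each one excluded from its own neighbourhood."""
        return [self._lof(nb, self.lrd_[i]) for i, nb in enumerate(self._neighbors)]

    def score_new(self, X):
        """LOF of new observations, reusing the cached training LRD."""
        scores = []
        for x in X:
            nb = _knn(tuple(x), self.X_, self.k)
            scores.append(self._lof(nb, self._lrd(nb)))
        return scores


X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
lof = TinyLOF(k=2).fit(X)
train_scores = lof.training_scores()  # (5, 5) compared with the other points
as_new_scores = lof.score_new(X)      # (5, 5) finds itself as nearest neighbour
print(train_scores[4], as_new_scores[4])
```

With this toy data the isolated point (5, 5) gets a clearly lower score when the training set is passed back in as new data, because it ends up in its own neighbourhood. That is the behaviour the earlier comment warns about.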
The rest is mostly minor, though how to tune n_neighbors seems pretty important.