[MRG+2] LOF algorithm (Anomaly Detection) #5279

Merged
merged 18 commits on Oct 25, 2016

alex review + rebase

ngoix committed Aug 29, 2016
commit c9f0eacee23d169edbcc775636a5ca3864f269f8
@@ -191,7 +191,7 @@ lower density than their neighbors. These are considered to be outliers.
In practice the local density is obtained from the k-nearest neighbors.
The LOF score of an observation is equal to the ratio of the
-average local density of his k-nearest neighbors, and his own local density:
+average local density of his k-nearest neighbors, and its own local density:
a normal instance is expected to have a local density similar to that of its
neighbors, while abnormal data are expected to have much smaller local density.
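The ratio described above can be sketched directly in NumPy. This is a from-scratch illustration of the LOF definition (k-distance, reachability distance, local reachability density), not scikit-learn's implementation; `lof_scores` is a hypothetical helper name:

```python
import numpy as np

def lof_scores(X, k=3):
    # Brute-force pairwise Euclidean distances.
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    n = len(X)
    # Indices of the k nearest neighbors, excluding the point itself.
    knn = np.argsort(D, axis=1)[:, 1:k + 1]
    # k-distance of each point = distance to its k-th nearest neighbor.
    k_dist = D[np.arange(n), knn[:, -1]]
    # Reachability distance: reach(p, o) = max(k_dist(o), d(p, o)).
    reach = np.maximum(k_dist[knn], D[np.arange(n)[:, None], knn])
    # Local reachability density = inverse of the mean reachability distance.
    lrd = 1.0 / reach.mean(axis=1)
    # LOF = average lrd of the neighbors divided by the point's own lrd;
    # values well above 1 indicate a point less dense than its neighborhood.
    return lrd[knn].mean(axis=1) / lrd

rng = np.random.RandomState(0)
X = np.r_[rng.randn(20, 2), [[6.0, 6.0]]]  # last row is a clear outlier
scores = lof_scores(X, k=3)
```

Inliers in the Gaussian blob get scores close to 1, while the isolated point gets a much larger score.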
@@ -203,15 +203,15 @@ with respect to the surrounding neighborhood.
This strategy is illustrated below.

@amueller

amueller Oct 20, 2016

Member

I don't feel that the example illustrates the point that was just made about the different densities. I'm fine to leave it as-is but I don't get a good idea of the global vs local. It would be nice to also illustrate a failure mode maybe?

@ngoix

ngoix Oct 22, 2016

Contributor

No global vs local anymore!

-.. figure:: ../auto_examples/neighbors/images/plot_lof_001.png
+.. figure:: ../auto_examples/neighbors/images/sphx_glr_plot_lof_001.png
:target: ../auto_examples/neighbors/plot_lof.html
:align: center
:scale: 75%
.. topic:: Examples:
* See :ref:`example_neighbors_plot_lof.py` for

@tguillemot

tguillemot Sep 13, 2016

Contributor

The link is broken. I think it's:

See :ref:`sphx_glr_auto_examples_neighbors_plot_lof.py` for
-an illustration of the use of IsolationForest.
+an illustration of the use of LocalOutlierFactor.
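For readers following along, a minimal usage sketch of the estimator this example documents. It uses `fit_predict` and the `negative_outlier_factor_` attribute as exposed by released scikit-learn; the exact public surface in this PR (e.g. `predict`/`decision_function`) may differ:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.RandomState(42)
X = np.r_[rng.randn(20, 2), [[8.0, 8.0]]]  # last row is an obvious outlier

clf = LocalOutlierFactor(n_neighbors=5)
labels = clf.fit_predict(X)              # +1 for inliers, -1 for outliers
scores = clf.negative_outlier_factor_    # more negative = more abnormal
```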

@amueller

amueller Oct 20, 2016

Member

:class:LocalOutlierFactor?

* See :ref:`example_covariance_plot_outlier_detection.py` for a

@tguillemot

tguillemot Sep 13, 2016

Contributor

The link is broken. I think it's:

See :ref:`sphx_glr_auto_examples_covariance_plot_outlier_detection.py` for
comparison with other anomaly detection methods.
View
@@ -62,6 +62,9 @@ class LocalOutlierFactor(NeighborsBase, KNeighborsMixin, UnsupervisedMixin):
metric to use for distance computation. Any metric from scikit-learn

@tguillemot

tguillemot Sep 13, 2016

Contributor

The metric used for the distance computation.

or scipy.spatial.distance can be used.

@agramfort

agramfort Aug 19, 2016

Member

Precomputed works? I see it in fit docstring

@ngoix

ngoix Aug 29, 2016

Contributor

Yes it works. I'll add a test for it btw.

If 'precomputed', the training input X is expected to be a distance
matrix.
If metric is a callable function, it is called on each
pair of instances (rows) and the resulting value recorded. The callable
should take two arrays as input and return one value indicating the
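The callable-metric path described in this docstring can be exercised as follows. A small sketch (the `manhattan` helper is illustrative) comparing a user-supplied callable against the equivalent built-in string metric; the two fits should agree:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

def manhattan(a, b):
    # Callable metric: takes two 1-D arrays, returns one distance value.
    return np.abs(a - b).sum()

rng = np.random.RandomState(0)
X = rng.random_sample((30, 4))

s_callable = LocalOutlierFactor(
    n_neighbors=5, metric=manhattan).fit(X).negative_outlier_factor_
s_builtin = LocalOutlierFactor(
    n_neighbors=5, metric='manhattan').fit(X).negative_outlier_factor_
```

The callable route is much slower than the built-in metrics, since the Python function is called per pair of samples.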
@@ -1,10 +1,16 @@
# Authors: Nicolas Goix <nicolas.goix@telecom-paristech.fr>
# Alexandre Gramfort <alexandre.gramfort@telecom-paristech.fr>
# License: BSD 3 clause
from math import sqrt
import numpy as np
from sklearn import neighbors
from numpy.testing import assert_array_equal
from sklearn import metrics
from sklearn.metrics import roc_auc_score
from sklearn.utils import check_random_state
from sklearn.utils.testing import assert_greater
from sklearn.utils.testing import assert_array_almost_equal
@@ -61,3 +67,28 @@ def test_lof_values():
    assert_array_almost_equal(-clf.decision_function([[2., 2.]]), [s_0])
    # check predict(one sample already in train)
    assert_array_almost_equal(-clf.decision_function([[1., 1.]]), [s_1])


def test_lof_precomputed(random_state=42):
    """Tests LOF with a distance matrix."""
    # Note: smaller samples may result in spurious test success
    rng = np.random.RandomState(random_state)
    X = rng.random_sample((10, 4))
    Y = rng.random_sample((3, 4))
    DXX = metrics.pairwise_distances(X, metric='euclidean')
    DYX = metrics.pairwise_distances(Y, X, metric='euclidean')

    # As a feature matrix (n_samples by n_features)
    lof_X = neighbors.LocalOutlierFactor(n_neighbors=3)
    lof_X.fit(X)
    pred_X_X = lof_X.predict()
    pred_X_Y = lof_X.predict(Y)

    # As a dense distance matrix (n_samples by n_samples)
    lof_D = neighbors.LocalOutlierFactor(n_neighbors=3, algorithm='brute',
                                         metric='precomputed')
    lof_D.fit(DXX)
    pred_D_X = lof_D.predict()
    pred_D_Y = lof_D.predict(DYX)

    assert_array_almost_equal(pred_X_X, pred_D_X)
    assert_array_almost_equal(pred_X_Y, pred_D_Y)
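The same feature-matrix-vs-precomputed check can be reproduced against the released scikit-learn API, where `fit_predict` and `negative_outlier_factor_` are the public names (the `predict` calls in this PR's test were later reworked):

```python
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.RandomState(42)
X = rng.random_sample((10, 4))
DXX = pairwise_distances(X, metric='euclidean')  # square distance matrix

# Fit on raw features...
lof_X = LocalOutlierFactor(n_neighbors=3)
labels_X = lof_X.fit_predict(X)

# ...and on the precomputed distance matrix; results should match.
lof_D = LocalOutlierFactor(n_neighbors=3, metric='precomputed')
labels_D = lof_D.fit_predict(DXX)
```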