DOC Some see alsos between dbscan and optics (#11616)
jnothman authored and qinhanmin2014 committed Jul 22, 2018
1 parent b448b3a commit 6e11386
Showing 3 changed files with 34 additions and 8 deletions.
7 changes: 4 additions & 3 deletions doc/modules/clustering.rst
@@ -802,9 +802,10 @@ by black points below.
be used (e.g. with sparse matrices). This matrix will consume n^2 floats.
A couple of mechanisms for getting around this are:

- Use OPTICS clustering in conjunction with the `extract_dbscan` method. OPTICS
clustering also calculates the full pairwise matrix, but only keeps one row in
memory at a time (memory complexity n).
- Use :ref:`OPTICS <optics>` clustering in conjunction with the
`extract_dbscan` method. OPTICS clustering also calculates the full
pairwise matrix, but only keeps one row in memory at a time (memory
complexity n).

- A sparse radius neighborhood graph (where missing entries are presumed to
  be out of eps) can be precomputed in a memory-efficient way and dbscan
  can be run over this with ``metric='precomputed'``.
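The second mechanism above can be sketched with the released neighbors and
clustering APIs; the synthetic data, the ``eps`` value and the use of
``radius_neighbors_graph`` are illustrative and not part of this commit::

    import numpy as np
    from sklearn.cluster import DBSCAN
    from sklearn.datasets import make_blobs
    from sklearn.neighbors import radius_neighbors_graph

    X, _ = make_blobs(n_samples=1000, centers=3, random_state=0)
    eps = 0.5

    # Sparse matrix of pairwise distances, kept only for pairs within `eps`;
    # absent entries are implicitly "further than eps", so memory scales with
    # the number of neighbor pairs rather than n^2.
    D = radius_neighbors_graph(X, radius=eps, mode='distance')

    # DBSCAN accepts the precomputed sparse graph directly.
    labels = DBSCAN(eps=eps, metric='precomputed').fit_predict(D)
    print(np.unique(labels))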
20 changes: 20 additions & 0 deletions sklearn/cluster/dbscan_.py
@@ -87,6 +87,14 @@ def dbscan(X, eps=0.5, min_samples=5, metric='minkowski', metric_params=None,
labels : array [n_samples]
Cluster labels for each point. Noisy samples are given the label -1.
See also
--------
DBSCAN
An estimator interface for this clustering algorithm.
optics
A similar clustering at multiple values of eps. Our implementation
is optimized for memory usage.
Notes
-----
For an example, see :ref:`examples/cluster/plot_dbscan.py
@@ -107,6 +115,9 @@ def dbscan(X, eps=0.5, min_samples=5, metric='minkowski', metric_params=None,
Another way to reduce memory and computation time is to remove
(near-)duplicate points and use ``sample_weight`` instead.
:func:`cluster.optics` provides a similar clustering with lower memory
usage.
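The (near-)duplicate trick mentioned above can be sketched with the released
``dbscan`` function and ``numpy.unique``; the toy data and parameter values
are illustrative, not part of this change::

    import numpy as np
    from sklearn.cluster import dbscan

    # Collapse exact duplicates and carry their multiplicity as a weight, so
    # the density computation sees the same mass while the distance matrix
    # shrinks. Near-duplicates would first need rounding or binning.
    X = np.array([[0.0, 0.0], [0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.1]])
    X_unique, counts = np.unique(X, axis=0, return_counts=True)

    core_samples, labels = dbscan(X_unique, eps=0.5, min_samples=2,
                                  sample_weight=counts)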
References
----------
Ester, M., H. P. Kriegel, J. Sander, and X. Xu, "A Density-Based
@@ -233,6 +244,12 @@ class DBSCAN(BaseEstimator, ClusterMixin):
Cluster labels for each point in the dataset given to fit().
Noisy samples are given the label -1.
See also
--------
OPTICS
A similar clustering at multiple values of eps. Our implementation
is optimized for memory usage.
Notes
-----
For an example, see :ref:`examples/cluster/plot_dbscan.py
@@ -253,6 +270,9 @@ class DBSCAN(BaseEstimator, ClusterMixin):
Another way to reduce memory and computation time is to remove
(near-)duplicate points and use ``sample_weight`` instead.
:class:`cluster.OPTICS` provides a similar clustering with lower memory
usage.
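For context, a minimal sketch of the estimator interface this class provides,
using synthetic data (the data and parameter values are illustrative)::

    from sklearn.cluster import DBSCAN
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=500, centers=3, random_state=42)

    db = DBSCAN(eps=0.5, min_samples=5).fit(X)
    labels = db.labels_             # -1 marks noisy samples
    core = db.core_sample_indices_  # indices of core samples
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)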
References
----------
Ester, M., H. P. Kriegel, J. Sander, and X. Xu, "A Density-Based
15 changes: 10 additions & 5 deletions sklearn/cluster/optics_.py
@@ -127,6 +127,14 @@ def optics(X, min_samples=5, max_bound=np.inf, metric='euclidean',
labels_ : array, shape (n_samples,)
The estimated labels.
See also
--------
OPTICS
An estimator interface for this clustering algorithm.
dbscan
A similar clustering for a specified neighborhood radius (eps).
Our implementation is optimized for runtime.
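The contrast drawn here can be made concrete: with dbscan, each neighborhood
radius requires a separate run, whereas OPTICS orders the data once for all
eps values up to its bound. A sketch using only the released ``dbscan``
function (data and radii are illustrative)::

    import numpy as np
    from sklearn.cluster import dbscan
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

    # Exploring several radii with dbscan means one full clustering per eps;
    # OPTICS computes a single reachability ordering covering all of them.
    for eps in (0.3, 0.5, 1.0):
        _, labels = dbscan(X, eps=eps, min_samples=5)
        n_clusters = len(set(labels)) - (1 if -1 in set(labels) else 0)
        print(eps, n_clusters)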
References
----------
Ankerst, Mihael, Markus M. Breunig, Hans-Peter Kriegel, and Jörg Sander.
@@ -256,11 +264,8 @@ class OPTICS(BaseEstimator, ClusterMixin):
--------
DBSCAN
CPU optimized algorithm that clusters at specified neighborhood
radius (eps).
HDBSCAN
Related clustering algorithm that calculates the minimum spanning tree
across mutual reachability space.
A similar clustering for a specified neighborhood radius (eps).
Our implementation is optimized for runtime.
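A heavily hedged sketch of the relationship described above, assuming the
OPTICS estimator in this branch mirrors the ``optics`` function signature
shown earlier in this diff (``min_samples``, ``max_bound``); released
versions later renamed the bound parameter (``max_eps``)::

    from sklearn.cluster import DBSCAN, OPTICS
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

    # DBSCAN: one clustering at a fixed radius, optimized for runtime.
    db_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

    # OPTICS: orders points by reachability for all radii up to the bound,
    # keeping only one row of pairwise distances in memory at a time.
    # NOTE: `max_bound` is an assumption carried over from the function
    # signature shown in this diff, not a confirmed class parameter.
    opt = OPTICS(min_samples=5, max_bound=2.0).fit(X)
    opt_labels = opt.labels_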
References
----------
