Scikit-learn style estimator for Minimum Spanning Tree Clustering in Python

jakevdp/mst_clustering

Minimum Spanning Tree Clustering

This package implements a simple scikit-learn style estimator for clustering with a minimum spanning tree.

Motivation

Automated clustering can be an important means of identifying structure in data, but many of the more popular clustering algorithms do not perform well in the presence of background noise. The clustering algorithm implemented here, based on a trimmed Euclidean Minimum Spanning Tree, can be useful in this case.

Example

The API of the mst_clustering code is designed for compatibility with the scikit-learn project.

from mst_clustering import MSTClustering
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt

# create some data with four clusters
X, y = make_blobs(200, centers=4, random_state=42)

# predict the labels with the MST algorithm
model = MSTClustering(cutoff_scale=2)
labels = model.fit_predict(X)

# plot the results
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='rainbow')
plt.show()

[Figure: Simple Clustering Plot]

For a detailed explanation of the algorithm and a more interesting example of it in action, see the MST Clustering Notebook.
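
As a rough, dependency-free illustration of the idea behind a trimmed Euclidean minimum spanning tree, the sketch below builds the tree with Kruskal's algorithm, drops every edge longer than a cutoff, and labels the connected pieces that remain. It is not the package's implementation: the function name `trimmed_mst_labels` is hypothetical, and the brute-force pairwise distances only suit small inputs. The parameter name `cutoff_scale` is borrowed from the example above on the assumption that it plays a similar role as an edge-length threshold.

```python
import math  # math.dist requires Python 3.8+
from itertools import combinations


def trimmed_mst_labels(points, cutoff_scale):
    """Label points by cutting MST edges longer than cutoff_scale.

    Hypothetical helper for illustration only; not part of mst_clustering.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise Euclidean distances, shortest first (Kruskal's order).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    # Kruskal's algorithm: keep an edge only if it joins two components.
    mst = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            mst.append((length, i, j))

    # Trim: rebuild components using only MST edges within the cutoff.
    parent = list(range(n))
    for length, i, j in mst:
        if length <= cutoff_scale:
            parent[find(i)] = find(j)

    # Relabel component roots as consecutive integers 0, 1, 2, ...
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (20, 0)]
print(trimmed_mst_labels(points, cutoff_scale=2))  # [0, 0, 0, 1, 1, 2]
```

In this toy input the two tight groups survive the trimming, while the isolated point at (20, 0) ends up in a cluster of its own, which is how a trimmed tree can separate sparse background points from dense structure.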

Installation & Requirements

The mst_clustering package itself is fairly lightweight. It is tested on Python 2.7 and 3.4-3.5, and depends on numpy, scipy, and scikit-learn.

Using the cross-platform conda package manager, these requirements can be installed as follows:

$ conda install numpy scipy scikit-learn

Finally, the current release of mst_clustering can be installed using pip:

$ conda install pip  # if using conda
$ pip install mst_clustering

To install mst_clustering from source, first download the source repository and then run

$ python setup.py install

Contributing & Reporting Issues

Bug reports, questions, suggestions, and contributions are welcome. For these, please make use of the Issues or Pull Requests associated with this repository.

Citing

If you use this code in an academic publication, please consider citing this JOSS Paper.
