Algorithms

protruser edited this page Dec 16, 2024 · 7 revisions

# Affinity Propagation Algorithm

## Overview

Affinity Propagation is a clustering algorithm that selects a set of "exemplars" (actual data points that best represent their neighbors) and forms clusters around them. Unlike methods such as K-Means, it does not require the number of clusters to be specified beforehand. Instead, data points exchange messages until a stable set of exemplars, and therefore clusters, emerges.
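To make the message exchange concrete, here is a minimal NumPy sketch of the two messages usually described for this algorithm: responsibilities `r(i, k)` and availabilities `a(i, k)`. The function name `affinity_propagation_sketch`, the median-based preference, and the fixed iteration count are assumptions made for illustration; this is not how `scikit-learn` structures its implementation, and it omits any convergence check.

```python
import numpy as np


def affinity_propagation_sketch(X, damping=0.9, n_iter=200):
    """Toy message-passing loop (hypothetical helper); returns (exemplar_indices, labels)."""
    n = X.shape[0]
    rows = np.arange(n)

    # Similarity s(i, k): negative squared Euclidean distance between points i and k.
    S = -((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    # Preference s(k, k): median of the off-diagonal similarities (an assumption).
    S[rows, rows] = np.median(S[~np.eye(n, dtype=bool)])

    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)

    for _ in range(n_iter):
        # r(i, k) = s(i, k) - max over k' != k of [a(i, k') + s(i, k')]
        AS = A + S
        best = AS.argmax(axis=1)
        best_val = AS[rows, best]
        AS[rows, best] = -np.inf
        second_val = AS.max(axis=1)
        competitor = np.repeat(best_val[:, None], n, axis=1)
        competitor[rows, best] = second_val
        R = damping * R + (1 - damping) * (S - competitor)

        # a(i, k) = min(0, r(k, k) + sum over i' not in {i, k} of max(0, r(i', k)))
        # a(k, k) = sum over i' != k of max(0, r(i', k))
        Rp = np.maximum(R, 0)
        Rp[rows, rows] = R[rows, rows]
        A_new = Rp.sum(axis=0)[None, :] - Rp
        self_avail = A_new[rows, rows].copy()
        A_new = np.minimum(A_new, 0)
        A_new[rows, rows] = self_avail
        A = damping * A + (1 - damping) * A_new

    # A point is an exemplar when its self-responsibility plus self-availability is positive.
    exemplars = np.flatnonzero(np.diag(A + R) > 0)
    if exemplars.size == 0:
        return exemplars, np.full(n, -1)

    # Every point joins its most similar exemplar; exemplars label themselves.
    labels = S[:, exemplars].argmax(axis=1)
    labels[exemplars] = np.arange(exemplars.size)
    return exemplars, labels


# Two tight, well-separated groups of 15 points each (illustrative data).
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(0, 0.3, (15, 2)), rng.normal(5, 0.3, (15, 2))])
exemplars, labels = affinity_propagation_sketch(X_demo)
print("exemplar rows:", exemplars)
print("labels:", labels)
```

The damping factor blends each new message with the previous one, which is why a value such as `0.9` makes updates slower but less prone to oscillation.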

The code below performs Affinity Propagation clustering with the `scikit-learn` library.


## Code

```python
# Import required libraries
from numpy import unique
from numpy import where
from sklearn.datasets import make_classification
from sklearn.cluster import AffinityPropagation
from matplotlib import pyplot

# Define the dataset
X, _ = make_classification(
    n_samples=1000,
    n_features=2,
    n_informative=2,
    n_redundant=0,
    n_clusters_per_class=1,
    random_state=4
)

# Define the model
model = AffinityPropagation(damping=0.9)

# Fit the model
model.fit(X)

# Assign a cluster to each example
yhat = model.predict(X)

# Retrieve unique clusters
clusters = unique(yhat)

# Create scatter plot for samples from each cluster
for cluster in clusters:
    # Get row indexes for samples with this cluster
    row_ix = where(yhat == cluster)
    # Create scatter plot of these samples
    pyplot.scatter(X[row_ix, 0], X[row_ix, 1])

# Show the plot
pyplot.show()
```
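Because the number of clusters is not chosen up front, it can help to inspect what a fitted model actually found. The sketch below reads the exemplars back from the fitted estimator. The small `make_blobs` dataset, its centers, and the variable names are illustrative assumptions, not part of the example above.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs

# Small synthetic dataset with three well-separated groups (illustrative values).
X_small, _ = make_blobs(
    n_samples=150,
    centers=[[0, 0], [6, 6], [0, 6]],
    cluster_std=0.5,
    random_state=0,
)

ap = AffinityPropagation(damping=0.9, random_state=0).fit(X_small)

exemplar_idx = ap.cluster_centers_indices_  # row indexes of the exemplars in X_small
exemplars = ap.cluster_centers_             # coordinates of those exemplars
n_found = len(exemplar_idx)

print(f"clusters found: {n_found}")
print(f"iterations run: {ap.n_iter_}")
print(f"exemplars are rows of the input: {np.allclose(exemplars, X_small[exemplar_idx])}")
```

If the count is much higher or lower than expected, the `preference` parameter is the usual knob to try, since it influences how many points become exemplars.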



# Birch Clustering

## Overview
Birch (Balanced Iterative Reducing and Clustering using Hierarchies) is a clustering algorithm designed for large datasets. It incrementally builds a clustering feature tree (CF tree) whose nodes hold compact summaries of nearby samples, then clusters those summaries hierarchically instead of the raw data. Because only the summaries are kept in memory, it stays efficient as the number of samples grows.
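As a rough illustration of why the CF tree is memory-efficient, a clustering feature is commonly described as the triple (N, LS, SS): the number of points, their linear sum, and their sum of squared norms. Two such summaries can be merged by simple addition, without revisiting the original points. The `ClusteringFeature` class below is a hypothetical helper written for this page, not part of `scikit-learn`.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(eq=False)
class ClusteringFeature:
    """Hypothetical helper: summary of a subcluster as (N, LS, SS)."""

    n: int          # number of points summarized
    ls: np.ndarray  # linear sum of the points
    ss: float       # sum of squared norms of the points

    @classmethod
    def from_points(cls, points):
        points = np.asarray(points, dtype=float)
        return cls(len(points), points.sum(axis=0), float((points ** 2).sum()))

    def merge(self, other):
        # Additivity: the summary of a union is the component-wise sum.
        return ClusteringFeature(self.n + other.n, self.ls + other.ls, self.ss + other.ss)

    @property
    def centroid(self):
        return self.ls / self.n

    @property
    def radius(self):
        # Root-mean-square distance of the summarized points to the centroid.
        spread = self.ss / self.n - float(self.centroid @ self.centroid)
        return float(np.sqrt(max(spread, 0.0)))


batch_a = np.array([[0.0, 0.0], [1.0, 0.0]])
batch_b = np.array([[0.0, 1.0], [1.0, 1.0]])

# Summarize each batch separately, then merge the summaries.
merged = ClusteringFeature.from_points(batch_a).merge(ClusteringFeature.from_points(batch_b))
# Summarize all points at once for comparison.
direct = ClusteringFeature.from_points(np.vstack([batch_a, batch_b]))

print("merged centroid:", merged.centroid, "radius:", merged.radius)
print("direct centroid:", direct.centroid, "radius:", direct.radius)
```

In `scikit-learn`, the `threshold` parameter of `Birch` bounds the radius of a subcluster after a new sample is merged into it, so a smaller threshold produces more, tighter subclusters.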

The code below performs Birch clustering with the `scikit-learn` library.

---

## Code

```python
# Import required libraries
from numpy import unique
from numpy import where
from sklearn.datasets import make_classification
from sklearn.cluster import Birch
from matplotlib import pyplot

# Define the dataset
X, _ = make_classification(
    n_samples=1000, 
    n_features=2, 
    n_informative=2, 
    n_redundant=0, 
    n_clusters_per_class=1, 
    random_state=4
)

# Define the model
model = Birch(threshold=0.01, n_clusters=2)

# Fit the model
model.fit(X)

# Assign a cluster to each example
yhat = model.predict(X)

# Retrieve unique clusters
clusters = unique(yhat)

# Create scatter plot for samples from each cluster
for cluster in clusters:
    # Get row indexes for samples with this cluster
    row_ix = where(yhat == cluster)
    # Create scatter plot of these samples
    pyplot.scatter(X[row_ix, 0], X[row_ix, 1])

# Show the plot
pyplot.show()
```
