# Non-Convex Shapes: Ear and Swirls

This notebook demonstrates spectral clustering on two challenging non-convex shape datasets from:

> **Automatic Determination of the Number of Clusters using Spectral Algorithms**  
> Sanguinetti, G., Laidler, J., Lawrence, N.D. (2005)  
> IEEE Workshop on Machine Learning for Signal Processing (MLSP 2005)

## The Challenge

These datasets showcase **complex non-convex shapes** that centroid-based methods such as k-means cannot separate, because those methods partition the plane into convex regions:

1. **Ear**: An outline of an ear with intricate internal structure
2. **Swirls**: Interleaved spiral patterns

Both datasets are binary images where we cluster the (x,y) coordinates of black pixels.
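
To make the difficulty concrete, here is a minimal sketch on a toy dataset of two concentric rings (not one of the paper's images). The `make_rings` and `kmeans` helpers are hypothetical, written in plain NumPy for illustration. Because 2-means can only split the plane with a straight line, it cannot recover an inner and an outer ring.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_rings(n_per_ring=200, radii=(1.0, 3.0), noise=0.05):
    """Two concentric noisy rings; returns points and true ring labels."""
    points, labels = [], []
    for label, radius in enumerate(radii):
        theta = rng.uniform(0.0, 2.0 * np.pi, n_per_ring)
        r = radius + noise * rng.standard_normal(n_per_ring)
        points.append(np.column_stack([r * np.cos(theta), r * np.sin(theta)]))
        labels.append(np.full(n_per_ring, label))
    return np.vstack(points), np.concatenate(labels)

def kmeans(X, k, n_iter=100):
    """Plain Lloyd's algorithm, centres initialised at random data points."""
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # Keep the old centre if a cluster happens to be empty
        new_centres = np.array([
            X[assign == j].mean(axis=0) if np.any(assign == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return assign

X_rings, y_rings = make_rings()
pred = kmeans(X_rings, k=2)

# Agreement with the true rings, allowing for swapped label names
agreement = max(np.mean(pred == y_rings), np.mean(pred != y_rings))
print(f"k-means agreement with true rings: {agreement:.2f}")
```

An agreement well below 1.0 reflects the straight-line split: each k-means cluster contains part of both rings.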

## What We'll Demonstrate

1. Load and visualize the ear and swirls images
2. Extract pixel coordinates as features  
3. Apply automatic spectral clustering
4. Visualize results in both original space and eigenspace
5. Reproduce paper Figures 4a and 4b

In [None]:
# Install spectral-cluster package if needed
import subprocess
import sys
from pathlib import Path

try:
    import spectral
    print(f"✓ spectral package already installed (version {getattr(spectral, '__version__', 'unknown')})")
except ImportError:
    print("📦 Installing spectral-cluster package...")

    # Prefer an editable install from the parent directory when this notebook
    # is run from inside a checkout of the repository
    parent = Path.cwd().resolve().parent

    if (parent / "pyproject.toml").exists() and (parent / "spectral").is_dir():
        print(f"  → Installing from local directory: {parent}")
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", "-e", str(parent)],
            stdout=subprocess.DEVNULL
        )
    else:
        print("  → Installing from GitHub...")
        subprocess.check_call([
            sys.executable, "-m", "pip", "install",
            "git+https://github.com/lawrennd/spectral.git"
        ])

    import spectral
    print(f"✓ spectral package installed successfully (version {getattr(spectral, '__version__', 'unknown')})")

In [None]:
# Import required packages
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from spectral import SpectralCluster

# Set random seed for reproducibility
np.random.seed(1)

# Configure matplotlib for better plots
plt.rcParams['figure.figsize'] = (12, 5)
plt.rcParams['font.size'] = 11

print('✓ All packages loaded successfully')

## 1. Load and Visualize the Datasets

Both images are binary (black and white). We'll extract the (x, y) coordinates of black pixels as our data points.
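
As a small illustration of this extraction step, the sketch below uses a made-up 3×4 array in place of a real image. It shows that `np.where` returns row indices first, so rows map to y and columns map to x.

```python
import numpy as np

# Tiny 3x4 "image": 0 = black (foreground), 255 = white (background)
toy = np.array([
    [255,   0, 255, 255],
    [255, 255, 255,   0],
    [  0, 255, 255, 255],
], dtype=np.uint8)

# np.where returns (row indices, column indices); rows are y, columns are x
rows, cols = np.where(toy < 128)

# Stack as (x, y) so that column 0 is the horizontal position
coords = np.column_stack([cols, rows])
print(coords)
# [[1 0]
#  [3 1]
#  [0 2]]
```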

In [None]:
def ensure_data_files():
    """Download data files if they don't exist locally."""
    from pathlib import Path
    import urllib.request
    
    data_dir = Path('data')
    data_dir.mkdir(exist_ok=True)
    
    base_url = 'https://raw.githubusercontent.com/lawrennd/spectral/main/examples/data/'
    files = ['ear.bmp', 'swirls.bmp']
    
    for filename in files:
        filepath = data_dir / filename
        if not filepath.exists():
            print(f'📥 Downloading {filename}...')
            urllib.request.urlretrieve(base_url + filename, filepath)
            print(f'  ✓ Downloaded to {filepath}')

# Ensure data files are available
ensure_data_files()

def load_image_data(filename):
    """Load a binary image and return (x, y) coordinates of its black pixels."""
    # Convert to 8-bit grayscale so that RGB, palette and 1-bit BMPs are all
    # thresholded on the same 0-255 scale
    img = Image.open(f'data/{filename}').convert('L')
    img_array = np.array(img)

    # Black pixels have value < 128; np.where returns (row, column) = (y, x)
    y_coords, x_coords = np.where(img_array < 128)

    # Return as (x, y) coordinates
    X = np.column_stack([x_coords, y_coords])

    return X, img_array

# Load both datasets
X_ear, img_ear = load_image_data('ear.bmp')
X_swirls, img_swirls = load_image_data('swirls.bmp')

print(f"Ear dataset: {X_ear.shape[0]} points")
print(f"Swirls dataset: {X_swirls.shape[0]} points")

In [None]:
# Visualize the original images
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

axes[0].imshow(img_ear, cmap='gray')
axes[0].set_title('Ear Image', fontsize=13, fontweight='bold')
axes[0].axis('off')

axes[1].imshow(img_swirls, cmap='gray')
axes[1].set_title('Swirls Image', fontsize=13, fontweight='bold')
axes[1].axis('off')

plt.tight_layout()
plt.show()

print("\nChallenge: How many clusters are in each image?")
print("Answer: Not obvious by inspection! Let the algorithm decide.")

## 2. Cluster the Ear Dataset (Paper Figure 4a)

The ear has a complex outline with internal structure. We use:
- Features: (x, y) coordinates of black pixels
- Sigma: $\sigma \approx 0.707$ (converted from MATLAB sigma²=1)

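**Note on sigma conversion**: The MATLAB code uses $\exp(-d^2/\sigma^2)$, whereas we use $\exp(-d^2/(2\sigma^2))$. Equating the two exponents gives $2\sigma^2_{\text{python}} = \sigma^2_{\text{matlab}}$, so $\sigma_{\text{python}} = \sqrt{\sigma^2_{\text{matlab}}/2}$.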
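
The conversion can be checked numerically. The helper name `matlab_sigma2_to_sigma` below is hypothetical and is not part of the `spectral` package. The sketch confirms that both kernel conventions give identical affinities for the two MATLAB settings used in this notebook (sigma² = 1 for the ear, sigma² = 0.5 for the swirls).

```python
import numpy as np

def matlab_sigma2_to_sigma(sigma2_matlab):
    """Convert sigma^2 in exp(-d^2 / sigma^2) to sigma in exp(-d^2 / (2 sigma^2))."""
    return np.sqrt(sigma2_matlab / 2.0)

d = np.linspace(0.0, 3.0, 7)  # a few sample distances

for sigma2_matlab in (1.0, 0.5):
    sigma = matlab_sigma2_to_sigma(sigma2_matlab)
    k_matlab = np.exp(-d**2 / sigma2_matlab)
    k_python = np.exp(-d**2 / (2.0 * sigma**2))
    # Both conventions must yield the same affinities
    assert np.allclose(k_matlab, k_python)
    print(f"sigma2_matlab={sigma2_matlab} -> sigma_python={sigma:.3f}")
```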

In [None]:
# Cluster the ear dataset
# MATLAB uses sigma2=1, so Python sigma = sqrt(1/2) ≈ 0.707
clf_ear = SpectralCluster(sigma=0.707, random_state=1)
clf_ear.fit(X_ear)

print(f"\n{'='*60}")
print("EAR CLUSTERING RESULT")
print(f"{'='*60}")
print(f"Number of clusters detected: {clf_ear.n_clusters_}")
print(f"Algorithm used {clf_ear.eigenvectors_.shape[1]} eigenvectors")
print(f"Total data points: {X_ear.shape[0]}")
print(f"{'='*60}\n")

In [None]:
# Visualize ear clustering
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Original space
scatter = axes[0].scatter(X_ear[:, 0], X_ear[:, 1], 
                          c=clf_ear.labels_, cmap='tab10',
                          s=20, alpha=0.7, edgecolors='none')
axes[0].set_xlabel('x', fontsize=12)
axes[0].set_ylabel('y', fontsize=12)
axes[0].set_title(f'Ear Clustering ({clf_ear.n_clusters_} clusters)', 
                  fontsize=13, fontweight='bold')
axes[0].invert_yaxis()  # Match image coordinates
axes[0].set_aspect('equal')
plt.colorbar(scatter, ax=axes[0], label='Cluster')

# Eigenspace (first 2 dimensions)
eigenvecs_ear = clf_ear.eigenvectors_
scatter2 = axes[1].scatter(eigenvecs_ear[:, 0], eigenvecs_ear[:, 1],
                           c=clf_ear.labels_, cmap='tab10',
                           s=20, alpha=0.7, edgecolors='none')
axes[1].scatter(clf_ear.centers_[:, 0], clf_ear.centers_[:, 1],
                c='red', s=200, marker='d', edgecolors='k', linewidths=2,
                label='Centers', zorder=5)
axes[1].scatter([0], [0], c='black', s=200, marker='X',
                edgecolors='red', linewidths=2, label='Origin', zorder=5)
axes[1].set_xlabel('Eigenvector 1', fontsize=12)
axes[1].set_ylabel('Eigenvector 2', fontsize=12)
axes[1].set_title('Eigenspace Visualization', fontsize=13, fontweight='bold')
axes[1].set_aspect('equal')
axes[1].legend()
axes[1].grid(True, alpha=0.3)
plt.colorbar(scatter2, ax=axes[1], label='Cluster')

plt.tight_layout()
plt.show()

print("Compare this plot with paper Figure 4a to check the reproduction.")
print(f"The algorithm identified {clf_ear.n_clusters_} distinct structures in the ear outline.")

## 3. Cluster the Swirls Dataset (Paper Figure 4b)

The swirls are interleaved spirals, a classic test for non-convex clustering. We use:
- Features: (x, y) coordinates of black pixels
- Sigma: $\sigma = 0.5$ (converted from MATLAB sigma²=0.5)

In [None]:
# Cluster the swirls dataset
# MATLAB uses sigma2=0.5, so Python sigma = sqrt(0.5/2) = 0.5
clf_swirls = SpectralCluster(sigma=0.5, random_state=1)
clf_swirls.fit(X_swirls)

print(f"\n{'='*60}")
print("SWIRLS CLUSTERING RESULT")
print(f"{'='*60}")
print(f"Number of clusters detected: {clf_swirls.n_clusters_}")
print(f"Algorithm used {clf_swirls.eigenvectors_.shape[1]} eigenvectors")
print(f"Total data points: {X_swirls.shape[0]}")
print(f"{'='*60}\n")

In [None]:
# Visualize swirls clustering
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Original space
scatter = axes[0].scatter(X_swirls[:, 0], X_swirls[:, 1], 
                          c=clf_swirls.labels_, cmap='tab10',
                          s=20, alpha=0.7, edgecolors='none')
axes[0].set_xlabel('x', fontsize=12)
axes[0].set_ylabel('y', fontsize=12)
axes[0].set_title(f'Swirls Clustering ({clf_swirls.n_clusters_} clusters)', 
                  fontsize=13, fontweight='bold')
axes[0].invert_yaxis()  # Match image coordinates
axes[0].set_aspect('equal')
plt.colorbar(scatter, ax=axes[0], label='Cluster')

# Eigenspace (first 2 dimensions)
eigenvecs_swirls = clf_swirls.eigenvectors_
scatter2 = axes[1].scatter(eigenvecs_swirls[:, 0], eigenvecs_swirls[:, 1],
                           c=clf_swirls.labels_, cmap='tab10',
                           s=20, alpha=0.7, edgecolors='none')
axes[1].scatter(clf_swirls.centers_[:, 0], clf_swirls.centers_[:, 1],
                c='red', s=200, marker='d', edgecolors='k', linewidths=2,
                label='Centers', zorder=5)
axes[1].scatter([0], [0], c='black', s=200, marker='X',
                edgecolors='red', linewidths=2, label='Origin', zorder=5)
axes[1].set_xlabel('Eigenvector 1', fontsize=12)
axes[1].set_ylabel('Eigenvector 2', fontsize=12)
axes[1].set_title('Eigenspace Visualization', fontsize=13, fontweight='bold')
axes[1].set_aspect('equal')
axes[1].legend()
axes[1].grid(True, alpha=0.3)
plt.colorbar(scatter2, ax=axes[1], label='Cluster')

plt.tight_layout()
plt.show()

print("Compare this plot with paper Figure 4b to check the reproduction.")
print(f"The algorithm separated {clf_swirls.n_clusters_} interleaved spiral patterns.")

## 4. 3D Eigenspace Visualization

The paper shows 3D eigenspace plots. Let's visualize the first 3 eigenvectors to see the radial structure more clearly.
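
To see where the radial structure comes from, here is a minimal NumPy sketch on two toy blobs rather than the image data. It assumes the symmetric normalisation $D^{-1/2} A D^{-1/2}$ of the RBF affinity matrix; whether the `spectral` package normalises in exactly this way is an assumption. Rows of the top eigenvectors that belong to the same blob should point in the same direction from the origin, and rows from different blobs should point in orthogonal directions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two tight, well-separated blobs (toy stand-ins for two image structures)
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.3, size=(30, 2))
blob_b = rng.normal(loc=(20.0, 0.0), scale=0.3, size=(30, 2))
X_toy = np.vstack([blob_a, blob_b])

# RBF affinity exp(-d^2 / (2 sigma^2)) with sigma = 1
sq_dists = ((X_toy[:, None, :] - X_toy[None, :, :]) ** 2).sum(axis=2)
A = np.exp(-sq_dists / 2.0)

# Symmetric normalisation D^{-1/2} A D^{-1/2}
deg = A.sum(axis=1)
L = A / np.sqrt(np.outer(deg, deg))

# eigh returns eigenvalues in ascending order; keep the top two eigenvectors
_, vecs = np.linalg.eigh(L)
Y = vecs[:, -2:]

# Cosine similarity between rows: 1 means same ray, 0 means orthogonal rays
Y_unit = Y / np.linalg.norm(Y, axis=1, keepdims=True)
cos = Y_unit @ Y_unit.T
within = cos[:30, :30].min()
across = np.abs(cos[:30, 30:]).max()
print(f"min cosine within a blob:  {within:.4f}")
print(f"max |cosine| across blobs: {across:.4f}")
```

Within a blob, the rows differ only in their distance from the origin, which is why a clustering step that follows rays rather than compact balls suits this space.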

In [None]:
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers the '3d' projection on older Matplotlib)

fig = plt.figure(figsize=(16, 6))

# Ear in 3D eigenspace
ax1 = fig.add_subplot(121, projection='3d')
if clf_ear.eigenvectors_.shape[1] >= 3:
    scatter = ax1.scatter(eigenvecs_ear[:, 0], eigenvecs_ear[:, 1], eigenvecs_ear[:, 2],
                          c=clf_ear.labels_, cmap='tab10', s=10, alpha=0.6)
    ax1.scatter(clf_ear.centers_[:, 0], clf_ear.centers_[:, 1], clf_ear.centers_[:, 2],
                c='red', s=100, marker='d', edgecolors='k', linewidths=2, label='Centers')
    ax1.scatter([0], [0], [0], c='black', s=100, marker='X',
                edgecolors='red', linewidths=2, label='Origin')
    ax1.set_xlabel('Eigenvector 1')
    ax1.set_ylabel('Eigenvector 2')
    ax1.set_zlabel('Eigenvector 3')
    ax1.set_title(f'Ear: 3D Eigenspace ({clf_ear.n_clusters_} clusters)', fontweight='bold')
    ax1.legend()

# Swirls in 3D eigenspace
ax2 = fig.add_subplot(122, projection='3d')
if clf_swirls.eigenvectors_.shape[1] >= 3:
    scatter = ax2.scatter(eigenvecs_swirls[:, 0], eigenvecs_swirls[:, 1], eigenvecs_swirls[:, 2],
                          c=clf_swirls.labels_, cmap='tab10', s=10, alpha=0.6)
    ax2.scatter(clf_swirls.centers_[:, 0], clf_swirls.centers_[:, 1], clf_swirls.centers_[:, 2],
                c='red', s=100, marker='d', edgecolors='k', linewidths=2, label='Centers')
    ax2.scatter([0], [0], [0], c='black', s=100, marker='X',
                edgecolors='red', linewidths=2, label='Origin')
    ax2.set_xlabel('Eigenvector 1')
    ax2.set_ylabel('Eigenvector 2')
    ax2.set_zlabel('Eigenvector 3')
    ax2.set_title(f'Swirls: 3D Eigenspace ({clf_swirls.n_clusters_} clusters)', fontweight='bold')
    ax2.legend()

plt.tight_layout()
plt.show()

print("\nKey observation: Points cluster along RAYS from the origin in eigenspace!")
print("This radial structure is what enables automatic cluster detection.")

## 5. Summary and Insights

### Results

The algorithm successfully separated complex non-convex shapes:
- **Ear**: Identified distinct anatomical structures
- **Swirls**: Separated interleaved spiral patterns

### Why Spectral Clustering Works Here

1. **Affinity-based**: Uses an RBF kernel to capture local proximity, not global convexity
2. **Eigenspace transformation**: Converts complex shapes into radial structure
3. **Elongated k-means**: Naturally separates rays from the origin
4. **Automatic detection**: Iteratively finds the correct dimensionality
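
The elongated k-means idea in point 3 can be sketched as a distance that is cheap along a centre's ray and expensive across it. The function below is modelled on the description in the paper and is an assumption for illustration: the name `elongated_sq_dist`, the default `lam=0.2`, and the exact form are not taken from `spectral/cluster.py`, which may differ. It is only defined for centres away from the origin.

```python
import numpy as np

def elongated_sq_dist(x, centre, lam=0.2):
    """Squared distance that is small along the centre's ray and large across it.

    With lam < 1, radial offsets are weighted by lam and tangential
    offsets by 1 / lam. The centre must not be the origin.
    """
    x = np.asarray(x, dtype=float)
    centre = np.asarray(centre, dtype=float)
    radial = np.outer(centre, centre) / (centre @ centre)  # projector onto the ray
    tangential = np.eye(len(centre)) - radial              # projector across the ray
    M = lam * radial + tangential / lam
    diff = x - centre
    return float(diff @ M @ diff)

centre = np.array([1.0, 0.0])      # hypothetical cluster centre on the x-axis ray
along_ray = np.array([2.0, 0.0])   # 1 unit further out on the same ray
across_ray = np.array([1.0, 1.0])  # 1 unit sideways, off the ray

print(elongated_sq_dist(along_ray, centre))   # radial offset: lam * 1
print(elongated_sq_dist(across_ray, centre))  # tangential offset: 1 / lam
```

Both test points are the same Euclidean distance from the centre, yet the point on the ray is far closer under this distance, so points strung out along a ray stay in one cluster.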

### Parameter Selection: Sigma

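Sigma controls the locality of connections:
- **Too small**: Affinities between neighbouring pixels vanish, so every pixel tends to become its own cluster
- **Too large**: Distant pixels remain strongly connected, so everything merges into one cluster
- **Just right**: Connects nearby points, separates distant structures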
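
A quick numerical sketch of these three regimes follows. The sigma values 0.1 and 5.0 are illustrative choices, and 0.707 is the ear setting used in this notebook. It evaluates the RBF affinity for adjacent pixels (distance 1) and for pixels five apart.

```python
import numpy as np

def rbf_affinity(distance, sigma):
    """RBF affinity exp(-d^2 / (2 sigma^2)) for a distance measured in pixels."""
    return np.exp(-np.asarray(distance, dtype=float) ** 2 / (2.0 * sigma**2))

distances = np.array([1.0, 5.0])  # adjacent pixels vs. pixels five apart

for sigma in (0.1, 0.707, 5.0):
    near, far = rbf_affinity(distances, sigma)
    print(f"sigma={sigma:<5}  d=1: {near:.3e}   d=5: {far:.3e}")
```

With the smallest sigma even adjacent pixels are effectively disconnected, while with the largest sigma pixels five apart still have a substantial affinity.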

For these datasets:
- Ear: $\sigma \approx 0.707$ (captures local curvature)
- Swirls: $\sigma = 0.5$ (tighter, for finer spirals)

See notebook 5 for systematic parameter exploration.

## Conclusion

This notebook demonstrated:
- Loading and processing binary images
- Clustering non-convex shapes automatically
- Visualizing results in original and eigenspace
- Reproducing paper Figures 4a and 4b

## References

- Paper: Sanguinetti et al. (2005), Figures 4a and 4b
- MATLAB implementation: `matlab/demoEar.m`, `matlab/demoSwirls.m`
- Python implementation: `spectral/cluster.py`