# VERUS Clustering

This example demonstrates the clustering process within VERUS, following the steps below:

1. Load the data
2. Extract initial centers using OPTICS
3. Cluster the data with KMeans, using the vulnerability index (VI) as sample weights
4. Visualize the clusters

In [2]:
from verus.clustering import GeOPTICS, KMeansHaversine
import pandas as pd

## 1. Load Data

For this example, the points of interest (POIs) are loaded from a CSV file into a pandas DataFrame.

In [3]:
poi_data = pd.read_csv("../../data/poti/Porto_dataset_buffered.csv")

## 2. Run OPTICS clustering to obtain initial centers


In [4]:
optics = GeOPTICS(min_samples=5, xi=0.05, min_cluster_size=5, verbose=True)
optics_results = optics.run(data_source=poi_data)

2025-03-13 17:55:44 [INFO] Using provided DataFrame
2025-03-13 17:55:44 [INFO] Loaded 486 points of interest
2025-03-13 17:55:44 [INFO] Running OPTICS clustering on 486 points
2025-03-13 17:55:44 [INFO] Using epsilon: 0.00018835321652671395 radians
2025-03-13 17:55:44 [INFO] Assigning noise points to nearest clusters using KNN
2025-03-13 17:55:44 [INFO] Found 31 clusters
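
The log above reports epsilon in radians, presumably because haversine distances are measured as angles on a unit sphere. As a minimal sketch, such an angle can be converted to kilometers by multiplying it by the Earth's radius. The constant `EARTH_RADIUS_KM` and both helper functions are illustrative assumptions, not part of the VERUS API, and VERUS may use a slightly different radius.

```python
import math

# Assumption: mean Earth radius in km; VERUS may use a different constant.
EARTH_RADIUS_KM = 6371.0088


def radians_to_km(angle_rad: float, radius_km: float = EARTH_RADIUS_KM) -> float:
    """Convert a great-circle angle (radians) to an arc length in km."""
    return angle_rad * radius_km


def km_to_radians(distance_km: float, radius_km: float = EARTH_RADIUS_KM) -> float:
    """Convert an arc length in km to a great-circle angle (radians)."""
    return distance_km / radius_km


# Epsilon value copied from the OPTICS log output.
eps_rad = 0.00018835321652671395
eps_km = radians_to_km(eps_rad)
print(f"epsilon = {eps_km:.3f} km")
```

With this radius, the logged epsilon corresponds to roughly 1.2 km.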


### 2.1 Visualize the OPTICS clustering results

In [6]:
optics.view(cluster_df=optics_results["clusters"], centroids_df=optics_results["centroids"])

2025-03-13 17:55:44 [INFO] Creating interactive map
2025-03-13 17:55:44 [INFO] Adding cluster points to map


## 3. Use OPTICS centers to initialize KMeans

In [7]:
if optics_results["centroids"] is not None and len(optics_results["centroids"]) > 1:
    centers = optics_results["centroids"]
    print(f"Running KMeans with {len(centers)} OPTICS centers")

    kmeans = KMeansHaversine(
        n_clusters=len(centers),
        init="predefined",
        random_state=42,
        predefined_centers=centers,
    )

    kmeans_results = kmeans.run(data_source=poi_data, centers_input=centers)

    # Access clustering results
    clusters = kmeans_results["clusters"]
    centroids = kmeans_results["centroids"]

    print(f"Found {len(centroids)} clusters")
    print(f"Cluster distribution:\n{clusters['cluster'].value_counts()}")

Running KMeans with 31 OPTICS centers
2025-03-13 17:55:44 [INFO] Using provided DataFrame
2025-03-13 17:55:44 [INFO] Loaded 486 points of interest
2025-03-13 17:55:44 [INFO] Using provided DataFrame with 31 centers
2025-03-13 17:55:44 [INFO] Using vulnerability indices as sample weights
2025-03-13 17:55:44 [INFO] Starting K-means clustering with 31 clusters
2025-03-13 17:55:44 [INFO] Using predefined initialization method
2025-03-13 17:55:44 [INFO] Using predefined centroids
2025-03-13 17:55:44 [INFO] K-means iteration 1/300
2025-03-13 17:55:44 [INFO] Maximum centroid shift: 0.244308 km
2025-03-13 17:55:44 [INFO] K-means iteration 2/300
2025-03-13 17:55:44 [INFO] Maximum centroid shift: 0.133866 km
2025-03-13 17:55:44 [INFO] K-means iteration 3/300
2025-03-13 17:55:44 [INFO] Maximum centroid shift: 0.148268 km
2025-03-13 17:55:44 [INFO] K-means iteration 4/300
2025-03-13 17:55:44 [INFO] Maximum centroid shift: 0.112348 km
2025-03-13 17:55:44 [INFO] K-means iteration 5/300
...
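
The log shows KMeans using the vulnerability indices as sample weights and reporting the maximum centroid shift in km after each iteration. The sketch below illustrates what one such weighted iteration could look like; it is not the `KMeansHaversine` implementation. The functions `haversine_km` and `weighted_kmeans_step`, the Earth radius constant, the sample coordinates and weights, and the use of a weighted arithmetic mean of latitude and longitude (a reasonable approximation only at city scale) are all assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius in km


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def weighted_kmeans_step(points, weights, centers):
    """Run one assignment + update step; return (labels, new_centers, max_shift_km)."""
    # Assignment: each point goes to its nearest center by haversine distance.
    labels = [
        min(range(len(centers)), key=lambda k: haversine_km(p, centers[k]))
        for p in points
    ]
    # Update: each center moves to the weighted mean of its members.
    new_centers = []
    for k, old in enumerate(centers):
        members = [(p, w) for p, w, lab in zip(points, weights, labels) if lab == k]
        total = sum(w for _, w in members)
        if total == 0:
            new_centers.append(old)  # keep empty or zero-weight clusters in place
            continue
        lat = sum(p[0] * w for p, w in members) / total
        lon = sum(p[1] * w for p, w in members) / total
        new_centers.append((lat, lon))
    max_shift = max(haversine_km(o, n) for o, n in zip(centers, new_centers))
    return labels, new_centers, max_shift


# Hypothetical points, weights, and initial centers, for illustration only.
points = [(41.15, -8.61), (41.16, -8.60), (41.20, -8.55), (41.21, -8.56)]
weights = [1.0, 3.0, 1.0, 1.0]
centers = [(41.15, -8.61), (41.20, -8.55)]

labels, new_centers, max_shift = weighted_kmeans_step(points, weights, centers)
print(labels, new_centers, f"max shift: {max_shift:.6f} km")
```

In this toy example the first center is pulled toward the second point because that point carries three times the weight. Repeating the step until `max_shift` falls below a tolerance gives the kind of convergence loop suggested by the log.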

### 3.1 Visualize the clusters

In [8]:
kmeans.view(clusters, centroids)

2025-03-13 17:55:44 [INFO] Creating interactive map
2025-03-13 17:55:44 [INFO] Creating map with 31 clusters
