We model the dataset as a graph in which nodes represent data points and an edge connects two points whenever their distance falls within a fixed threshold (the eps parameter used below). Communities are the connected components of this graph, so two points share a community whenever a chain of such edges links them.
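To make the construction concrete, here is a minimal NumPy sketch of threshold-graph components on a toy dataset. It is not the rice_ml implementation: the function name `threshold_components`, the use of Euclidean distance, the inclusive `<=` comparison, and the label ordering are all assumptions made for illustration.

```python
import numpy as np

def threshold_components(X, eps):
    """Label points by connected component of the eps-threshold graph.

    Illustrative sketch only; assumes Euclidean distance and an
    inclusive threshold (distance <= eps creates an edge).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Dense pairwise distances: O(n^2) memory, fine for small datasets.
    diffs = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    adjacent = dist <= eps

    labels = np.full(n, -1, dtype=int)  # -1 marks "not yet visited"
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Depth-first traversal from an unvisited point.
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(adjacent[i] & (labels == -1)):
                labels[j] = current
                stack.append(j)
        current += 1
    return labels

# Toy data: a chain of three points, a close pair, and an isolated point.
toy = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.0],
                [1.0, 1.0], [1.05, 1.0], [3.0, 3.0]])
toy_labels = threshold_components(toy, eps=0.15)
print(toy_labels)  # [0 0 0 1 1 2]
```

The first and third toy points are 0.2 apart, which exceeds eps, yet they share a label because the middle point links them. This chaining is why one component can grow to cover most of a dataset.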

In [1]:
import numpy as np
import pandas as pd

from rice_ml.unsupervised_learning import GraphCommunityDetection

df = pd.read_csv("../data/lesions_processed.csv")

# Feature matrix built from the normalized x, y, and slice columns
X = df[["x_norm", "y_norm", "slice_norm"]].to_numpy()


In [2]:
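# Fit the model; eps is the distance threshold for connecting two points
model = GraphCommunityDetection(eps=0.15)
model.fit(X)

# One community label per row of X
labels = model.labels_
df["community"] = labels
labels[:10]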


array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [3]:
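# Number of communities (stored, since only the last expression in a cell is displayed)
n_communities = np.unique(labels).size

# Community sizes, ordered by label
df["community"].value_counts().sort_index()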



community
0    136
1      1
2      2
3      1
4      1
5      3
6      1
7      2
8     21
9      1
Name: count, dtype: int64

The graph-based community detection finds one large connected component (community 0, 136 points), indicating that most lesions lie close together in the chosen feature space. A second, smaller component (community 8, 21 points) forms a separate group, while the remaining eight communities contain one to three points each and correspond to sparse regions or outliers. This is expected with a fixed distance threshold: connected components reveal global connectivity structure rather than balanced clusters.
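As a quick check on how unbalanced the partition is, the sketch below recomputes a few summary figures from the community sizes reported above. The sizes are copied by hand from the `value_counts()` output, and the variable names are illustrative.

```python
import numpy as np

# Community sizes copied from the value_counts() output (labels 0 through 9).
sizes = np.array([136, 1, 2, 1, 1, 3, 1, 2, 21, 1])

total = int(sizes.sum())                       # number of points overall
largest_share = sizes.max() / total            # fraction in the biggest component
n_singletons = int((sizes == 1).sum())         # isolated nodes

print(f"{total} points, largest community holds {largest_share:.1%}, "
      f"{n_singletons} singletons")
# 169 points, largest community holds 80.5%, 5 singletons
```

In the notebook itself, the same quantities could be derived from `df["community"].value_counts()` instead of a hand-copied array.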