How can I get the cluster for each feature vector, as displayed on the README page?

I'm currently using this function as my clustering algorithm, but it is not fast enough:

    import numpy as np
    from annoy import AnnoyIndex

    def annoy_clustering(data, num_trees=10, num_neighbors=10):
        n_samples, n_features = data.shape

        # Step 1: Build the Annoy index
        annoy_index = AnnoyIndex(n_features, metric='euclidean')
        for i in range(n_samples):
            annoy_index.add_item(i, data[i])
        annoy_index.build(num_trees)

        # Step 2: Assign clusters based on nearest neighbors
        labels = np.full(n_samples, -1)  # Initialize all labels as -1
        cluster_id = 0
        for i in range(n_samples):
            if labels[i] == -1:  # If the point is not yet labeled
                # Get nearest neighbors
                neighbors = annoy_index.get_nns_by_item(i, num_neighbors)
                # Assign the same cluster ID to the point and its neighbors
                # (this may overwrite labels assigned by an earlier seed)
                labels[neighbors] = cluster_id
                cluster_id += 1
        return labels

Is it even possible to get clusters directly from Annoy, without calling get_nns_by_item for each point, which drives up the computational cost?
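
For reference, the labeling loop above is greedy: a later seed point can overwrite labels that an earlier seed already assigned. A minimal sketch of one alternative labeling step is below, assuming the neighbor lists have already been collected (for example from `get_nns_by_item`). The function name `labels_from_neighbors` and its input format are hypothetical, not part of Annoy. It groups points by connected components of the neighbor graph using union-find. It does not remove the neighbor queries themselves, so it is not a speed-up on its own.

```python
def labels_from_neighbors(neighbors):
    """Label points by connected components of the neighbor graph.

    `neighbors[i]` is an iterable of indices considered close to point i
    (hypothetical input format, e.g. one list per get_nns_by_item call).
    """
    n = len(neighbors)
    parent = list(range(n))  # union-find forest: each point starts as its own root

    def find(x):
        # Walk to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge every point with each of its listed neighbors.
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[rj] = ri

    # Relabel roots as consecutive cluster ids in order of first appearance.
    root_to_label = {}
    labels = []
    for i in range(n):
        root = find(i)
        labels.append(root_to_label.setdefault(root, len(root_to_label)))
    return labels
```

With this approach a point keeps a single label no matter the order in which points are visited, at the cost of possibly chaining distant points into one cluster when `num_neighbors` is large.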