Cluster analysis

How can I get a cluster assignment for each feature vector, like the partitioning displayed on the README page?
![image](https://github.com/user-attachments/assets/b61c3c97-2520-4002-8a51-f0e0db50d2d3)

I'm currently using this function to implement a clustering algorithm, but it is not fast enough:
```python

import numpy as np
from annoy import AnnoyIndex


def annoy_clustering(data, num_trees=10, num_neighbors=10):
    """Greedy labeling: each unlabeled point seeds a cluster with its neighbors."""
    n_samples, n_features = data.shape

    # Step 1: Build the Annoy index
    annoy_index = AnnoyIndex(n_features, metric='euclidean')
    for i in range(n_samples):
        annoy_index.add_item(i, data[i])
    annoy_index.build(num_trees)

    # Step 2: Assign clusters based on nearest neighbors
    labels = np.full(n_samples, -1)  # Initialize all labels as -1
    cluster_id = 0

    for i in range(n_samples):
        if labels[i] == -1:  # If the point is not yet labeled
            # Get nearest neighbors (the result normally includes i itself)
            neighbors = annoy_index.get_nns_by_item(i, num_neighbors)
            # Assign the same cluster ID to the point and its neighbors.
            # Note: this overwrites labels that earlier clusters already set.
            labels[neighbors] = cluster_id
            cluster_id += 1

    return labels
```
Is it even possible with Annoy to get clusters directly, without calling `get_nns_by_item` for every point, which inflates the computational cost?
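
To make the question concrete: the README picture shows space being split recursively by hyperplanes, each chosen as the plane equidistant from two randomly sampled points. I'm not aware of a public Annoy method that returns the leaf each item ends up in, so below is a minimal sketch that mimics that splitting in plain Python, outside Annoy, and uses the leaf id as the cluster label. The function name `rp_tree_labels` and the parameters `max_leaf_size` and `seed` are my own, hypothetical names, and this builds a single tree, not Annoy's forest.

```python
import random


def rp_tree_labels(points, max_leaf_size=10, seed=0):
    """Label each point with the id of the random-projection tree leaf it falls in.

    Hypothetical helper, not part of Annoy: it only imitates the
    hyperplane splitting that the README describes.
    """
    rng = random.Random(seed)
    labels = [-1] * len(points)
    next_label = 0
    stack = [list(range(len(points)))]  # index subsets still to be processed

    while stack:
        idx = stack.pop()
        left, right = [], []
        if len(idx) > max_leaf_size:
            # Sample two points; split by the hyperplane equidistant from them.
            a, b = (points[i] for i in rng.sample(idx, 2))
            normal = [x - y for x, y in zip(a, b)]
            offset = sum(n * (x + y) / 2 for n, x, y in zip(normal, a, b))
            for i in idx:
                side = sum(n * x for n, x in zip(normal, points[i])) - offset
                (left if side > 0 else right).append(i)
        if left and right:
            stack += [left, right]
        else:
            # Small enough, or a degenerate split (e.g. duplicate points):
            # treat the whole subset as one leaf, i.e. one cluster.
            for i in idx:
                labels[i] = next_label
            next_label += 1
    return labels
```

Calling `rp_tree_labels(data.tolist(), max_leaf_size=10)` would give one label per row with no nearest-neighbor queries at all. The trade-off is that points lying close together but on opposite sides of a split get different labels, which is presumably why Annoy builds many trees for search.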

Cluster analysis #674
