# 🤖 Clustering User Feedback with LLM Embeddings

This notebook shows how to cluster short user-feedback texts with **LLM sentence embeddings** (here, the pretrained `all-MiniLM-L6-v2` model), for tasks such as grouping feedback by theme or by sentiment.

In [None]:
from sentence_transformers import SentenceTransformer
from sklearn.decomposition import PCA
from src.clustering import run_hdbscan, run_kmeans
from src.visualization import plot_embedding, plot_clusters


# === 1. Load model and sample data ===
model = SentenceTransformer('all-MiniLM-L6-v2')
texts = [
    'The app crashes when I try to log in',
    'Payment process was smooth and fast',
    'Customer support was very helpful',
    'Delivery was late by 3 days',
    'I love the new design',
    'Checkout keeps failing with error code 500',
    'App is slow and unresponsive sometimes',
    'Great shopping experience overall',
]

# === 2. Encode into embeddings ===
embeddings = model.encode(texts)
print('Embeddings shape:', embeddings.shape)

# === 3. Dimensionality reduction for visualization ===
X_pca = PCA(n_components=2, random_state=42).fit_transform(embeddings)
plot_embedding(X_pca, labels=None, title='PCA-reduced embeddings').show()
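
Projecting to two components is convenient for plotting, but it can discard much of the structure in the original vectors. One way to check is to look at how much variance the two components retain. The following is a minimal sketch using random stand-in data rather than the notebook's real `embeddings`; the names `demo_embeddings`, `demo_2d`, and `retained` are illustrative, not part of the notebook.

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for `embeddings`: random vectors with the 384 dimensions that
# all-MiniLM-L6-v2 produces. Swap in the real array when running the notebook.
rng = np.random.default_rng(42)
demo_embeddings = rng.normal(size=(8, 384))

pca = PCA(n_components=2, random_state=42)
demo_2d = pca.fit_transform(demo_embeddings)

# Fraction of the total variance kept by the two plotted components.
retained = float(pca.explained_variance_ratio_.sum())
print(f'Variance retained by 2 components: {retained:.1%}')
```

A low value suggests the 2-D plot is only a rough view of the embedding space, which matters if later steps operate on the projection.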

In [None]:
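# === 4. Apply clustering ===
# Both algorithms run on the 2-D PCA projection (X_pca), not the full embeddings,
# so the clusters line up with the plots but ignore variance outside the first two components.
labels_hdb = run_hdbscan(X_pca, min_cluster_size=2)
# The second return value of run_kmeans is not used here.
labels_kmeans, _ = run_kmeans(X_pca, n_clusters=3)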
plot_clusters(X_pca, labels_hdb, title='HDBSCAN clusters on feedback').show()
plot_clusters(X_pca, labels_kmeans, title='KMeans clusters on feedback').show()

# === 5. Inspect clustering results (HDBSCAN labels; -1 conventionally marks noise) ===
for text, label in zip(texts, labels_hdb):
    print(f'[{label}] {text}')
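
A flat listing like the one above gets harder to scan as the number of texts grows. As a rough sketch, the texts can be grouped by label first. This assumes the labels follow the usual HDBSCAN convention where `-1` marks noise. The helper `group_by_label` and the demo labels are hypothetical and are not output from this notebook.

```python
from collections import defaultdict


def group_by_label(texts, labels):
    """Map each cluster label to the list of texts assigned to it."""
    groups = defaultdict(list)
    for text, label in zip(texts, labels):
        groups[int(label)].append(text)
    return dict(groups)


# Hypothetical texts and labels for illustration only.
demo_texts = ['app crashes on login', 'checkout fails', 'love the design', 'late delivery']
demo_labels = [0, 0, 1, -1]

for label, members in sorted(group_by_label(demo_texts, demo_labels).items()):
    name = 'noise' if label == -1 else f'cluster {label}'
    print(f'{name}: {members}')
```

In the notebook itself, the same helper could be called as `group_by_label(texts, labels_hdb)`.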

# ✅ Conclusion
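- HDBSCAN infers the number of clusters from density and can leave outliers unassigned, so it can surface themes (e.g., crashes, payment, design) without a preset count.
- KMeans needs the number of clusters up front and assigns every text to one of them, which suits a fixed set of groups (e.g., sentiment categories).
- This approach is a reasonable starting point for **sentiment clustering** and **feedback grouping**; results depend on the embedding model and the clustering parameters.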
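
Because KMeans needs the number of clusters in advance, one common heuristic (not used in this notebook) is to try several values and keep the one with the highest mean silhouette score. The sketch below uses scikit-learn's `KMeans` and `silhouette_score` on synthetic 2-D blobs rather than real feedback. The helper `pick_k_by_silhouette` and the variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def pick_k_by_silhouette(X, candidates):
    """Return (best_k, scores), where scores maps each k to its mean silhouette."""
    scores = {}
    for k in candidates:
        labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
        scores[k] = float(silhouette_score(X, labels))
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Synthetic stand-in: three tight, well-separated 2-D blobs (not real feedback).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X_demo = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

best_k, scores = pick_k_by_silhouette(X_demo, candidates=range(2, 6))
print('Best k:', best_k)
```

The silhouette score is only defined when the number of clusters is at least 2 and less than the number of samples, so with the eight texts here the candidates would have to stay within 2 to 7.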