## **8. Algorithm: KMeans Clustering**  
### **Type: Unsupervised**

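In unsupervised learning, the model identifies patterns in the data without predefined labels. KMeans partitions the data into a chosen number of clusters (k) by assigning each point to its nearest cluster center (centroid) and repeatedly updating the centroids until the assignments stabilize. It is commonly used for clustering tasks like customer segmentation, where we want the model to discover natural groupings in the data.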
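
As a rough illustration of what KMeans does internally, the sketch below implements the basic assign-and-update loop (often called Lloyd's algorithm) with NumPy. The function name `lloyd_kmeans`, the hand-picked starting centroids, and the iteration cap are assumptions made for this example; scikit-learn's `KMeans` uses a smarter default initialization (k-means++) and is not implemented this way line for line.

```python
import numpy as np

def lloyd_kmeans(points, init_centroids, n_iter=10):
    """Hypothetical helper: a bare-bones assign-and-update KMeans loop."""
    points = np.asarray(points, dtype=float)
    centroids = np.asarray(init_centroids, dtype=float).copy()
    for _ in range(n_iter):
        # Assignment step: Euclidean distance from every point to every centroid
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = centroids.copy()
        for j in range(len(centroids)):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving, so assignments are stable
        centroids = new_centroids
    return labels, centroids

# Same (monthly_spend, visits_per_month) values as the training data in this section
points = [[100, 1], [150, 2], [300, 3], [1200, 12], [1100, 10]]
# Assumption: start from the first and fourth customers to keep the run deterministic
labels, centroids = lloyd_kmeans(points, init_centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

The three low-spend customers end up in one cluster and the two high-spend customers in the other, which is the same grouping the scikit-learn model in the following cells is expected to find.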


In [0]:
# Upgrade threadpoolctl to avoid Databricks internal warnings
%pip install --upgrade threadpoolctl
dbutils.library.restartPython()

In [0]:
# Prepare training dataset
import pandas as pd

df = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5],                     # Unique customer ID
    "monthly_spend": [100, 150, 300, 1200, 1100],       # Customer's monthly spend
    "visits_per_month": [1, 2, 3, 12, 10]               # Number of visits per month
})

In [0]:
# Train the KMeans clustering model
from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)  # 2 groups; fixed seed so cluster labels are reproducible across runs
kmeans.fit(df[["monthly_spend", "visits_per_month"]])  # Train only on features

df['cluster'] = kmeans.labels_  # Assign cluster labels to customers
df
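
KMeans relies on Euclidean distance, so a feature with a large numeric range (here `monthly_spend`, in the hundreds) can dominate one with a small range (`visits_per_month`). One common way to balance them is to standardize the features first. The sketch below is an optional variation, not part of the original workflow; the names `customers` and `scaled_kmeans` and the `random_state` value are assumptions for illustration.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Same training values as in the cells above
customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5],
    "monthly_spend": [100, 150, 300, 1200, 1100],
    "visits_per_month": [1, 2, 3, 12, 10],
})
features = ["monthly_spend", "visits_per_month"]

# Standardize each feature to zero mean and unit variance, then cluster
scaled_kmeans = make_pipeline(
    StandardScaler(),
    KMeans(n_clusters=2, n_init=10, random_state=42),  # random_state is an assumed seed
)
scaled_labels = scaled_kmeans.fit_predict(customers[features])

print(customers.assign(cluster=scaled_labels))
```

On this small dataset the grouping should match the unscaled model, because both features separate the customers the same way. Scaling matters more when features disagree or use very different units.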

In [0]:
# Predict the cluster for a new, unseen customer
new_data = pd.DataFrame({
    "customer_id": [6],          # Unique customer ID
    "monthly_spend": [700],      # Customer's monthly spend
    "visits_per_month": [8]      # Number of visits per month
})
# Use the same feature columns, in the same order, as during training
new_data['cluster'] = kmeans.predict(new_data[["monthly_spend", "visits_per_month"]])
print(f"New customer {new_data['customer_id'].iloc[0]} belongs to cluster: {new_data['cluster'].iloc[0]}")
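
To make the prediction less of a black box, the sketch below recomputes it by hand: `predict` assigns a point to the cluster whose centroid (stored in `cluster_centers_`) is closest. The block refits a model so that it runs on its own; the variable names and the `random_state` value are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

features = ["monthly_spend", "visits_per_month"]
train = pd.DataFrame({
    "monthly_spend": [100, 150, 300, 1200, 1100],
    "visits_per_month": [1, 2, 3, 12, 10],
})
# random_state is an assumed seed so the cluster numbering is repeatable
model = KMeans(n_clusters=2, n_init=10, random_state=42).fit(train[features])

new_customer = pd.DataFrame({"monthly_spend": [700], "visits_per_month": [8]})

# Euclidean distance from the new customer to each learned centroid
distances = np.linalg.norm(
    model.cluster_centers_ - new_customer[features].to_numpy(), axis=1
)
nearest = int(distances.argmin())
predicted = int(model.predict(new_customer[features])[0])

print("Centroids:\n", model.cluster_centers_)
print("Distances to centroids:", distances)
print("Nearest centroid:", nearest, "| predict():", predicted)
```

With these values the new customer sits fairly close to the boundary between the two groups and lands in the high-spend cluster, since the spend difference dominates the unscaled distance.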
