# 🛒 P2.1.1.8 – Machine Learning Foundations

## Topic: Customer Segmentation & Unsupervised Clustering Example


## 🎯 Learning Objectives

By the end of this notebook, you will be able to:

- Understand the concept of customer segmentation
- Apply K-Means clustering to group customers
- Prepare and vectorize categorical data for clustering
- Interpret and visualize clustering results


## 📝 Problem Statement

A retail store wants to group its customers based on their purchase category (e.g., Electronics, Groceries, Clothing) and city (e.g., Mumbai, Delhi, Bangalore) to personalize marketing and improve sales.

**Why is this important?**
- Helps target promotions
- Improves customer experience
- Increases sales and loyalty

## 🔍 Why Unsupervised Learning?

- We do not have labels (no predefined groups like 'Electronics Lovers', 'Groceries Shoppers', etc.)
- The goal is to discover natural groupings in the data based on text features (category and city)

**Why not supervised?**
- Supervised learning requires labeled data, which we don't have for this segmentation task

## 🤖 Choosing the Model & Why

We use **K-Means Clustering** because:
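- It is simple, fast, and widely used
- It groups customers by similarity: customers whose feature vectors are close together (in Euclidean distance) end up in the same cluster
- It operates on numeric vectors, so it can be applied once the text features (purchase category and city) have been vectorized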
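To make the idea of "grouping by similarity" concrete, here is a minimal NumPy sketch of the two steps K-Means alternates between: assign each point to its nearest center, then move each center to the mean of its points. It is only an illustration, not scikit-learn's implementation; the helper name `simple_kmeans`, the toy points, and the random initialization are all assumptions made for this example.

```python
import numpy as np

def simple_kmeans(X, k, n_iter=10, seed=0):
    """Illustrative K-Means loop (hypothetical helper, not the scikit-learn implementation)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: squared Euclidean distance from every point to every center
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of the points assigned to it
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return labels, centers

# Toy data (made up for illustration): two well-separated groups
points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [9.0, 9.0], [9.0, 10.0], [10.0, 9.0]])
labels, centers = simple_kmeans(points, k=2)
print("Labels:", labels)
print("Centers:\n", centers)
```

The first three points should share one label and the last three the other. Which group is called 0 and which is called 1 depends on the random starting centers.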

**Why not other models?**
- Alternatives such as hierarchical clustering or DBSCAN could also be used here; K-Means is chosen because it is a simple, common baseline for customer segmentation
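For comparison, the sketch below runs one of those alternatives, hierarchical (agglomerative) clustering, on customer text in the same "category city" format. It assumes scikit-learn's `AgglomerativeClustering` with its default Ward linkage and the same number of clusters; the variable names `customers` and `hier_labels` are illustrative only.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.text import CountVectorizer

# Sample customers (purchase category + city), written as short text strings
customers = [
    "Electronics Mumbai", "Groceries Delhi", "Clothing Mumbai",
    "Electronics Delhi", "Groceries Mumbai", "Clothing Delhi",
    "Electronics Bangalore", "Groceries Bangalore",
]
X = CountVectorizer().fit_transform(customers).toarray()

# Hierarchical clustering merges the closest groups step by step until 3 remain
hierarchical = AgglomerativeClustering(n_clusters=3)
hier_labels = hierarchical.fit_predict(X)

for customer, label in zip(customers, hier_labels):
    print(label, customer)
```

The groupings may differ from those K-Means produces, because the two algorithms define "close" groups in different ways.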

## 🛠️ Example: Customer Segmentation Pipeline

This example shows the steps:
1. Prepare customer features (text: purchase category + city)
2. Vectorize text features
3. Apply K-Means clustering
4. Interpret clusters
```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import KMeans

# Each customer is described by a purchase category and a city (as text)
categories = [
    "Electronics Mumbai",
    "Groceries Delhi",
    "Clothing Mumbai",
    "Electronics Delhi",
    "Groceries Mumbai",
    "Clothing Delhi",
    "Electronics Bangalore",
    "Groceries Bangalore"
]

# Vectorize text features: one column per word, one row per customer
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(categories).toarray()

# Apply K-Means clustering
# n_init is set explicitly so the result does not depend on the library's default
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
kmeans.fit(X)

print("Cluster Centers:\n", kmeans.cluster_centers_)
print("Labels for each customer:", kmeans.labels_)
print("Feature Names:", vectorizer.get_feature_names_out())
```
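The example uses `n_clusters=3` without checking whether three groups fit the data. One common way to compare candidate values is the silhouette score, where higher means tighter, better-separated clusters. The sketch below is only a rough check under that assumption; the range of `k` values and the variable names are illustrative, and with just eight customers the scores should not be over-interpreted.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import silhouette_score

customers = [
    "Electronics Mumbai", "Groceries Delhi", "Clothing Mumbai",
    "Electronics Delhi", "Groceries Mumbai", "Clothing Delhi",
    "Electronics Bangalore", "Groceries Bangalore",
]
X = CountVectorizer().fit_transform(customers).toarray()

# Fit K-Means for several cluster counts and record the silhouette score of each
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = silhouette_score(X, labels)
    print(f"k={k}: silhouette={scores[k]:.3f}")
```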

## 📊 Understanding Clusters & Interpretation

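- **Cluster Centers:** One row per cluster. Each value is the average count of a word across the customers in that cluster; since every word appears at most once per customer here, it is the share of the cluster's customers with that category or city
- **Labels:** The cluster index (0, 1, or 2) assigned to each customer, in the same order as the input list
- **Feature Names:** The vocabulary words, in the same order as the columns of the cluster centers. They do not rank importance by themselves; they let you match each center value to a category or city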

**Why do we need these?**
- To understand customer segments
- To personalize marketing and offers
- To make data-driven business decisions
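As one way segments can feed into personalized offers, the sketch below assigns new customers to an existing cluster by reusing the fitted vectorizer and model. The new customer strings and the variable names are hypothetical, and the model is refit here only to keep the example self-contained.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer

customers = [
    "Electronics Mumbai", "Groceries Delhi", "Clothing Mumbai",
    "Electronics Delhi", "Groceries Mumbai", "Clothing Delhi",
    "Electronics Bangalore", "Groceries Bangalore",
]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(customers).toarray()
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

# New customers (hypothetical); use transform, not fit_transform,
# so the columns match the vocabulary learned during training
new_customers = ["Clothing Bangalore", "Electronics Mumbai"]
new_X = vectorizer.transform(new_customers).toarray()
predicted = kmeans.predict(new_X)

for customer, cluster_id in zip(new_customers, predicted):
    print(f"{customer} -> cluster {cluster_id}")
```

Words that were not in the training vocabulary are ignored by `transform`, so a customer from an unseen city would be placed using the purchase category alone.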