https://gamma.app/docs/Customer-Segmentation-Clustering-Business-Insights-Report-xswkoh6gg32h24o

# 📊 Key Business Insights from Customer Segmentation

---

## 1. Customer Segmentation Overview

Customers were segmented using **RFM (Recency, Frequency, Monetary)** metrics.

Three clustering methods were applied:
- KMeans
- Hierarchical
- DBSCAN

✅ **KMeans with 4 clusters** was selected: it had the highest silhouette score (0.4255, versus 0.3975 for hierarchical clustering).

---

## 2. Customer Groups Identified (via KMeans)

| Cluster | Size | Description |
|--------:|-----:|-------------|
| **3**   | 1669 | 🏆 **Best Customers**: Recent purchases, high frequency, and high monetary value. |
| **1**   | 952  | 😊 **Loyal Regulars**: Moderate frequency and spending, fairly recent activity. |
| **0**   | 904  | ⚠️ **At-Risk Customers**: Long time since last purchase, low spending. |
| **2**   | 386  | 💰 **Big Spenders but Infrequent**: High spenders but low purchase frequency. |

✅ **Actionable Insight**: Focus marketing efforts on **retaining Cluster 3** and **reactivating Cluster 0**.
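
The segment descriptions above come from comparing average RFM values per cluster. A minimal sketch of that profiling step, assuming a labeled table with `Cluster`, `Recency`, `Frequency`, and `Monetary` columns (the column names and all sample values here are made up for illustration, not taken from the project data):

```python
import pandas as pd

# Hypothetical labeled RFM rows; real values would come from the clustered data.
clustered = pd.DataFrame({
    "Cluster":   [3, 3, 0, 0, 2, 1],
    "Recency":   [5, 9, 210, 250, 80, 40],
    "Frequency": [9, 7, 1, 1, 1, 4],
    "Monetary":  [900.0, 700.0, 40.0, 60.0, 950.0, 300.0],
})

# One row per cluster: size plus mean R, F, M, ordered by average spend.
profile = (
    clustered.groupby("Cluster")
    .agg(
        Size=("Recency", "size"),
        Recency=("Recency", "mean"),
        Frequency=("Frequency", "mean"),
        Monetary=("Monetary", "mean"),
    )
    .sort_values("Monetary", ascending=False)
)
print(profile)
```

Reading the profile row by row (low recency with high frequency and spend, and so on) is what turns a numeric cluster ID into a business label.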

---

### 📉 Cluster Distribution Plot (KMeans Output)

![Cluster Distribution](plots/kmeans_clusters.png)


---

## 3. RFM Metric Observations

- **Recency**: A significant portion of customers haven’t purchased recently, which suggests room for re-engagement campaigns.
- **Frequency**: A small subset buys frequently and is an ideal target for loyalty programs.
- **Monetary**: The top 10% of customers account for a large share of revenue and warrant VIP-level service or incentives.
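
The revenue concentration in the Monetary observation can be quantified directly. The sketch below is one way to compute the share of total spend held by the top fraction of customers; the function name `top_share` and the sample values are illustrative assumptions, not project results.

```python
import numpy as np

def top_share(monetary, top_fraction=0.10):
    """Share of total spend contributed by the highest-spending customers."""
    values = np.sort(np.asarray(monetary, dtype=float))[::-1]  # descending
    n_top = max(1, int(np.ceil(top_fraction * len(values))))
    return values[:n_top].sum() / values.sum()

# Made-up spend per customer: one large account among ten.
spend = [1000, 50, 50, 50, 50, 50, 50, 50, 50, 100]
share = top_share(spend)
print(f"Top 10% of customers hold {share:.1%} of revenue")
```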

---

## 4. Data Quality & Cleaning Impact

🔍 **Removed**:
- Missing `CustomerID`s (~25% of rows)
- Cancellations and rows with zero or negative `Quantity` or `UnitPrice`

📉 Dataset reduced from **541,909 → 397,884 rows** (valid customers only)

✅ This ensures the insights are based on **reliable, active customer behavior**.

---

## 5. Business Opportunities

🎯 **Personalized Marketing**: Target each cluster with tailored promotions:
- **Best Customers**: Early access to products, loyalty rewards
- **At-Risk Customers**: Win-back discounts or emails
- **Infrequent Big Spenders**: Reminders & upsell opportunities

📦 **Inventory Planning**: Align stock with preferences of frequent and high-spending segments.

🔄 **Customer Retention Strategy**: Cluster analysis helps identify early signs of churn based on recency and frequency.

---

## 6. Strategic Recommendations

✅ Implement **cluster-specific email marketing** and **loyalty programs**  
✅ Use **RFM scores** to automate customer lifecycle marketing  
✅ Track **movement between clusters** over time to measure customer journey effectiveness
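
As an illustration of the RFM-score recommendation, the sketch below assigns each customer a 1 to 3 score per metric using rank-based quantile bins. The number of bins, the helper name `score`, and the sample data are assumptions made for this example; the report does not specify a scoring scheme.

```python
import pandas as pd

# Made-up RFM values for six customers.
rfm = pd.DataFrame(
    {
        "Recency":   [5, 40, 200, 90, 12, 300],
        "Frequency": [8, 4, 1, 1, 6, 2],
        "Monetary":  [800, 300, 50, 900, 650, 120],
    },
    index=list("abcdef"),
)

def score(series, higher_is_better, bins=3):
    """Quantile score from 1 (worst) to `bins` (best)."""
    # Ranking first avoids duplicate bin edges when many values are tied.
    ranks = series.rank(method="first", ascending=higher_is_better)
    return pd.qcut(ranks, q=bins, labels=list(range(1, bins + 1))).astype(int)

rfm["R"] = score(rfm["Recency"], higher_is_better=False)   # recent = better
rfm["F"] = score(rfm["Frequency"], higher_is_better=True)
rfm["M"] = score(rfm["Monetary"], higher_is_better=True)
rfm["RFM_Score"] = rfm["R"].astype(str) + rfm["F"].astype(str) + rfm["M"].astype(str)
print(rfm)
```

A lifecycle rule can then key off the score string, for example routing low-recency scores to a win-back flow.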


# 🧠 Customer Segmentation & Clustering – Business Insights Report

---

## 📌 Project Title:
**Customer Segmentation and Clustering using RFM Analysis and Unsupervised Learning**

---

## 🎯 Objective:
To segment customers based on their purchasing behavior using **RFM (Recency, Frequency, Monetary)** metrics and apply clustering algorithms to derive **actionable business insights**.

---

## 1. 📊 Summary of Data and Preprocessing

- **Raw Data Shape**: (541,909 rows × 8 columns)  
- **Cleaned Data Shape**: (397,884 rows)

### Issues Resolved:
- Missing `CustomerID` (~25% of rows)
- Negative/zero `Quantity` or `UnitPrice`
- Canceled transactions (`InvoiceNo` starts with `'C'`)

✅ Clean dataset ensured reliable analysis and insights.
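
The three cleaning rules listed above can be sketched with pandas as follows. The column names come from this report; the function name `clean_transactions` and the sample rows are hypothetical.

```python
import pandas as pd

def clean_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only rows tied to a known customer and a completed purchase."""
    out = df.dropna(subset=["CustomerID"])
    # Canceled transactions have an InvoiceNo that starts with 'C'.
    is_cancel = out["InvoiceNo"].astype(str).str.startswith("C")
    out = out[~is_cancel]
    # Drop zero or negative quantities and prices.
    out = out[(out["Quantity"] > 0) & (out["UnitPrice"] > 0)]
    return out.reset_index(drop=True)

# Made-up sample: only the first row passes every rule.
raw = pd.DataFrame({
    "InvoiceNo":  ["536365", "C536379", "536380", "536381", "536382"],
    "CustomerID": [17850.0, 14527.0, None, 15311.0, 13047.0],
    "Quantity":   [6, -1, 2, 3, 0],
    "UnitPrice":  [2.55, 27.50, 1.25, 0.0, 4.95],
})
cleaned = clean_transactions(raw)
print(len(raw), "->", len(cleaned))
```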

---

## 2. 🔍 RFM Feature Engineering

- **Customers analyzed**: 4,338 unique customers

### Features:
- **Recency** – Days since last purchase  
- **Frequency** – Number of purchases  
- **Monetary** – Total spend
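
A minimal sketch of how these three features could be derived per customer. It assumes an `InvoiceDate` column and counts distinct invoices as "purchases"; both are assumptions, as are the function name `build_rfm`, the snapshot date, and the sample rows.

```python
import pandas as pd

def build_rfm(df: pd.DataFrame, snapshot_date: pd.Timestamp) -> pd.DataFrame:
    """Aggregate transactions into one Recency/Frequency/Monetary row per customer."""
    df = df.assign(LineTotal=df["Quantity"] * df["UnitPrice"])
    rfm = df.groupby("CustomerID").agg(
        last_purchase=("InvoiceDate", "max"),
        Frequency=("InvoiceNo", "nunique"),   # assumption: one invoice = one purchase
        Monetary=("LineTotal", "sum"),
    )
    rfm["Recency"] = (snapshot_date - rfm["last_purchase"]).dt.days
    return rfm[["Recency", "Frequency", "Monetary"]]

# Made-up transactions for two customers.
tx = pd.DataFrame({
    "InvoiceNo":   ["A1", "A1", "A2", "B1"],
    "CustomerID":  [1, 1, 1, 2],
    "InvoiceDate": pd.to_datetime(["2011-12-01", "2011-12-01", "2011-12-05", "2011-11-10"]),
    "Quantity":    [2, 1, 3, 5],
    "UnitPrice":   [10.0, 5.0, 2.0, 1.0],
})
rfm = build_rfm(tx, snapshot_date=pd.Timestamp("2011-12-10"))
print(rfm)
```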

### RFM Outlier Handling:
- **Frequency, before**
![Frequency distribution before outlier handling](plots/Frequency_distribution_Before.png)
- **Frequency, after**
![Frequency distribution after outlier handling](plots/Frequency_distribution_After.png)
- **Monetary, before**
![Monetary distribution before outlier handling](plots/Monetary_distribution_Before.png)
- **Monetary, after**
![Monetary distribution after outlier handling](plots/Monetary_distribution_After.png)
- **Recency, before**
![Recency distribution before outlier handling](plots/Recency_distribution_Before.png)
- **Recency, after**
![Recency distribution after outlier handling](plots/Recency_distribution_After.png)

Filtered for outliers → **3,911 customers used for clustering**
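
The report does not state which outlier rule was applied. As one common option, the sketch below uses the 1.5 × IQR rule on each RFM column; the rule, the function name `filter_iqr`, and the sample values are assumptions for illustration only.

```python
import pandas as pd

def filter_iqr(rfm, cols=("Recency", "Frequency", "Monetary"), k=1.5):
    """Keep rows that fall within [Q1 - k*IQR, Q3 + k*IQR] on every column."""
    mask = pd.Series(True, index=rfm.index)
    for col in cols:
        q1, q3 = rfm[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        mask &= rfm[col].between(q1 - k * iqr, q3 + k * iqr)
    return rfm[mask]

# Made-up data: the last customer has an extreme Monetary value.
rfm = pd.DataFrame({
    "Recency":   [10, 20, 15, 30, 25, 12, 18, 22],
    "Frequency": [2, 3, 2, 4, 3, 2, 3, 3],
    "Monetary":  [100, 120, 110, 130, 90, 105, 115, 5000],
})
filtered = filter_iqr(rfm)
print(len(rfm), "->", len(filtered))
```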

---

## 3. 🔗 Clustering Results & Interpretation

### A. KMeans Clustering (Best Performance)
- **Optimal Clusters**: 4  
- **Silhouette Score**: 0.4255  
- **Davies-Bouldin Index**: 0.8735  

| Cluster | Count | Characteristics |
|--------:|------:|------------------|
| 3 | 1669 | 🏆 High frequency & spend, recent buyers – **Top Customers** |
| 1 | 952  | 😊 Good spenders, moderately recent – **Loyal Regulars** |
| 0 | 904  | ⚠️ Inactive customers, low spend – **At Risk** |
| 2 | 386  | 💰 High spend, low frequency – **Big Spenders** |
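
A sketch of how the cluster count and the two internal metrics could be obtained with scikit-learn. It runs on synthetic data, and `StandardScaler`, the range of `k`, and the random seeds are assumptions; the saved `scaler.pkl` indicates that some scaling was used, but not which kind.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score, silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the RFM table: four groups in (R, F, M) space.
rng = np.random.default_rng(0)
centers = np.array([[10, 8, 800], [40, 4, 300], [200, 1, 50], [90, 1, 900]])
X = np.vstack([c + rng.normal(scale=[3, 0.5, 20], size=(50, 3)) for c in centers])
X_scaled = StandardScaler().fit_transform(X)  # put R, F, M on a common scale

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X_scaled)
    scores[k] = (
        silhouette_score(X_scaled, labels),      # higher is better
        davies_bouldin_score(X_scaled, labels),  # lower is better
    )

best_k = max(scores, key=lambda k: scores[k][0])  # pick k by silhouette
for k, (sil, dbi) in scores.items():
    print(f"k={k}: silhouette={sil:.4f}, Davies-Bouldin={dbi:.4f}")
print("best k by silhouette:", best_k)
```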

---

### B. Hierarchical Clustering
- **Silhouette Score**: 0.3975  
- **Davies-Bouldin Index**: 0.7988  
- **Formed 3 Clusters**:
  - Large base of moderate customers
  - A few premium or outlier groups
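
For comparison, a hierarchical run with three clusters can be sketched with scikit-learn's `AgglomerativeClustering`. Ward linkage and the synthetic data are assumptions; the report does not say which linkage was used.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score, silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic three-feature data standing in for scaled RFM values.
X, _ = make_blobs(n_samples=300, centers=3, n_features=3, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X_scaled)
sil = silhouette_score(X_scaled, labels)      # higher is better
dbi = davies_bouldin_score(X_scaled, labels)  # lower is better
print(f"silhouette={sil:.4f}, Davies-Bouldin={dbi:.4f}")
```

Because the two metrics point in opposite directions (silhouette rewards high values, Davies-Bouldin rewards low ones), it helps to report both when comparing methods.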

---

### C. DBSCAN Clustering
- **Cluster 0 (Core)**: 3,869 customers  
- **Noise (Outliers)**: 42 customers
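
DBSCAN marks points it cannot attach to any dense region with the label `-1`, which is how the core/noise split above is counted. The sketch below shows that counting on synthetic data; `eps` and `min_samples` are placeholder values, since the report does not list the parameters used.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Synthetic data: one dense group plus three far-away points.
rng = np.random.default_rng(1)
core = rng.normal(0.0, 0.3, size=(200, 3))
far_points = np.array([[5.0, 5.0, 5.0], [-6.0, 4.0, 0.0], [0.0, -7.0, 6.0]])
X = np.vstack([core, far_points])

labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(X)  # placeholder parameters

n_noise = int((labels == -1).sum())     # label -1 means noise
n_clusters = len(set(labels) - {-1})    # real clusters exclude the noise label
print(f"clusters={n_clusters}, noise points={n_noise}")
```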

---
![Cluster Distribution](plots/kmeans_clusters.png)

## 4. 💡 Business Insights

### 🧍‍♂️ Customer Behavior:
- **Cluster 3 (Best Customers)**: Generate highest revenue → prioritize with **loyalty programs** and **early product access**
- **Cluster 0 (At Risk)**: Signs of churn → run **reactivation campaigns**
- **Cluster 2 (Big Spenders, Low Frequency)**: Profitable → incentivize **regular purchases**
- **Cluster 1 (Loyal Regulars)**: Stable → engage with **personalized offers**

### 📊 RFM Metric Learnings:
- Many customers haven’t purchased recently
- **Monetary** values are **highly skewed** — a small group drives most of the revenue
- **Frequency** analysis shows a sharp drop-off after 3 purchases

---

## 5. 📦 Strategic Recommendations

- 🔁 **Retention Programs**: Design for **Clusters 3** and **2** to maintain loyalty
- ✉️ **Targeted Email Campaigns**:
  - Win-back strategies for **Cluster 0**
  - Promotions for **Cluster 1 (Loyal Regulars)**
- 🛍️ **VIP & Premium Segments**: Target based on high-Monetary clusters
- 📉 **Churn Monitoring**: Use **Recency** as an early churn signal
- 📊 **Segment Tracking**: Recalculate clusters **monthly** to monitor behavior shifts
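
One way to monitor behavior shifts between monthly runs is a transition table showing where each cluster's customers ended up. The customer IDs and labels below are made up. The sketch also assumes that cluster IDs mean the same thing in both months, which holds when new data is assigned with the saved model but not necessarily after a fresh refit, because KMeans label numbers are arbitrary.

```python
import pandas as pd

# Hypothetical cluster labels per customer ID for two consecutive months.
previous = pd.Series({101: 3, 102: 3, 103: 1, 104: 0, 105: 2}, name="previous")
current = pd.Series({101: 3, 102: 1, 103: 1, 104: 0, 105: 0}, name="current")

# Keep customers present in both snapshots, then tabulate row-wise shares.
both = pd.concat([previous, current], axis=1, join="inner")
transitions = pd.crosstab(both["previous"], both["current"], normalize="index")
print(transitions)
```

Each row sums to 1, so a cell reads as "the share of last month's cluster (row) that is now in cluster (column)".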

---

## 6. 🧾 Final Artifacts and Storage

- **Cleaned Data**: `data/cleaned_data.csv`  
- **RFM Data**: `data/rfm_data_filtered.csv`  
- **Final Labeled Data**: `data/clustered_rfm.csv`

### Models Saved:
- `models/kmeans_model.pkl`  
- `models/hierarchical_model.pkl`  
- `models/dbscan_model.pkl`  
- `models/scaler.pkl`
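
A sketch of how the saved scaler and KMeans model might be reused to assign a new customer to a segment. It uses the standard-library `pickle` module and a temporary directory rather than the project's `models/` folder; the training data, `StandardScaler`, and the new customer's values are all made up.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Tiny made-up RFM matrix: two recent high spenders, two lapsed low spenders.
X = np.array([[5, 8, 800], [7, 9, 850], [200, 1, 50], [220, 1, 40]], dtype=float)
scaler = StandardScaler().fit(X)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42).fit(scaler.transform(X))

with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp)
    # Save both objects: the model is only valid with the scaler it was fit on.
    for name, obj in {"scaler.pkl": scaler, "kmeans_model.pkl": kmeans}.items():
        with open(model_dir / name, "wb") as fh:
            pickle.dump(obj, fh)
    with open(model_dir / "scaler.pkl", "rb") as fh:
        loaded_scaler = pickle.load(fh)
    with open(model_dir / "kmeans_model.pkl", "rb") as fh:
        loaded_kmeans = pickle.load(fh)

new_customer = np.array([[6, 7, 780]], dtype=float)
label = int(loaded_kmeans.predict(loaded_scaler.transform(new_customer))[0])
print("assigned cluster:", label)
```

Note that scikit-learn's `AgglomerativeClustering` and `DBSCAN` do not provide a `predict` method, so only the KMeans model can label unseen customers this way. Pickle files should only be loaded from trusted sources.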

### 📈 Visualizations:
- All plots stored in `plots/`

![3D RFM Scatter](plots/3d_rfm_scatter.png)

![Cluster Distribution](plots/kmeans_clusters.png)

---

## ✅ Conclusion:
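**KMeans clustering** achieved the highest silhouette score (0.4255), while hierarchical clustering had the lower (better) Davies-Bouldin index (0.7988 vs. 0.8735). The four KMeans clusters offer **meaningful customer segments** for marketing and operational strategy.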
