# Demo 5: Customer Segmentation with RFM Scoring & Persona Mapping


In this notebook, we segment customers using RFM analysis, a technique that describes each customer by Recency (days since the last purchase), Frequency (how often they buy), and Monetary value (how much they spend).

Rather than relying on clustering, we define customer personas from the summed RFM scores and suggest an action for each persona. A bonus section at the end applies DBSCAN to the same customers for comparison.
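
As a quick illustration of the three metrics, here is a minimal sketch on a hypothetical toy table (the rows, the `tx` and `toy_rfm` names, and the dates are made up for illustration and are not part of the dataset used below). It assumes Frequency means the number of distinct invoices per customer.

```python
import pandas as pd

# Hypothetical toy transactions, invented for illustration only
tx = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 3, 3],
    "InvoiceNo": ["A1", "A2", "B1", "C1", "C2", "C3"],
    "InvoiceDate": pd.to_datetime([
        "2024-01-05", "2024-03-01", "2024-01-20",
        "2024-02-10", "2024-02-25", "2024-03-09",
    ]),
    "TotalPrice": [20.0, 35.0, 15.0, 50.0, 10.0, 40.0],
})

# Reference date: one day after the last recorded purchase
snapshot = tx["InvoiceDate"].max() + pd.Timedelta(days=1)

toy_rfm = tx.groupby("CustomerID").agg(
    Recency=("InvoiceDate", lambda d: (snapshot - d.max()).days),  # days since last purchase
    Frequency=("InvoiceNo", "nunique"),                            # distinct invoices
    Monetary=("TotalPrice", "sum"),                                # total spend
)
print(toy_rfm)
```

Customer 3 bought most recently, most often, and spent the most, so it would rank highest on all three metrics.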


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns


## Load and preview dataset

In [None]:
# Replace with actual data path
df = pd.read_csv("data/online_retail.csv", encoding='ISO-8859-1')
df.head()


## Data Cleaning

In [None]:
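# Drop rows with missing values (this includes rows without a CustomerID)
df = df.dropna()
# Keep only positive quantities and prices; .copy() avoids SettingWithCopyWarning below
df = df[(df['Quantity'] > 0) & (df['UnitPrice'] > 0)].copy()
df['TotalPrice'] = df['Quantity'] * df['UnitPrice']
df['InvoiceDate'] = pd.to_datetime(df['InvoiceDate'])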


## RFM Metric Calculation

In [None]:
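# Reference date: one day after the last invoice, so every Recency value is at least 1
snapshot_date = df['InvoiceDate'].max() + pd.Timedelta(days=1)

rfm = df.groupby('CustomerID').agg({
    'InvoiceDate': lambda x: (snapshot_date - x.max()).days,  # days since last purchase
    'InvoiceNo': 'nunique',  # distinct invoices; 'count' would count line items instead
    'TotalPrice': 'sum'  # total spend
})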
rfm.rename(columns={'InvoiceDate': 'Recency', 'InvoiceNo': 'Frequency', 'TotalPrice': 'Monetary'}, inplace=True)
rfm = rfm[rfm.Monetary > 0]
rfm.head()


## RFM Scoring and Segmentation

In [None]:
# Score each RFM metric from 1 (worst) to 5 (best).
# Recency labels are reversed because fewer days since the last purchase is better.
rfm['R_Score'] = pd.qcut(rfm['Recency'], 5, labels=[5,4,3,2,1]).astype(int)
rfm['F_Score'] = pd.qcut(rfm['Frequency'].rank(method='first'), 5, labels=[1,2,3,4,5]).astype(int)
rfm['M_Score'] = pd.qcut(rfm['Monetary'], 5, labels=[1,2,3,4,5]).astype(int)

rfm['RFM_Score'] = rfm[['R_Score', 'F_Score', 'M_Score']].sum(axis=1)
rfm.head()
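
The `rank(method='first')` step on Frequency matters because purchase counts are usually heavily tied. The following sketch uses a hypothetical series of counts (the `freq` values are invented) to show what can happen when `pd.qcut` is applied directly to tied data, and how ranking first avoids it.

```python
import pandas as pd

# Hypothetical frequency counts with many ties (most customers bought once)
freq = pd.Series([1, 1, 1, 1, 1, 1, 2, 2, 3, 8])

try:
    pd.qcut(freq, 5, labels=[1, 2, 3, 4, 5])
    direct_ok = True
except ValueError:
    # Several quantile edges are equal, so qcut cannot build five distinct bins
    direct_ok = False

# Ranking with method="first" gives every row a unique rank, so five bins always exist
ranked_scores = pd.qcut(
    freq.rank(method="first"), 5, labels=[1, 2, 3, 4, 5]
).astype(int)

print(direct_ok)
print(ranked_scores.tolist())
```

One consequence of this design choice is that customers with identical Frequency can receive different scores, with ties broken by row order. The same duplicate-edge error could in principle occur for Recency or Monetary if those columns are heavily tied; applying the same ranking step there is one possible workaround.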


## Define Customer Personas

In [None]:
def segment_customer(row):
    if row['RFM_Score'] >= 13:
        return 'VIP'
    elif row['RFM_Score'] >= 10:
        return 'Loyal'
    elif row['RFM_Score'] >= 6:
        return 'Potential'
    else:
        return 'At Risk'

rfm['Segment'] = rfm.apply(segment_customer, axis=1)
rfm.head()
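
The thresholds in `segment_customer` can also be written as bins. The sketch below is an alternative formulation, not the method used in this notebook: it applies `pd.cut` to a hypothetical series of scores (the `scores` values are invented) and should give the same labels as the row-wise function for integer scores from 3 to 15.

```python
import pandas as pd

# Hypothetical RFM_Score values spanning the possible range 3..15
scores = pd.Series([3, 5, 6, 9, 10, 12, 13, 15], name="RFM_Score")

# Right-closed bins: (2, 5], (5, 9], (9, 12], (12, 15]
segments = pd.cut(
    scores,
    bins=[2, 5, 9, 12, 15],
    labels=["At Risk", "Potential", "Loyal", "VIP"],
)
print(segments.tolist())
```

Because `pd.cut` is vectorized, it avoids calling a Python function once per row, which may matter on large customer tables.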


## Segment Analysis

In [None]:
segment_stats = rfm.groupby('Segment').agg({
    'Recency': 'mean',
    'Frequency': 'mean',
    'Monetary': ['mean', 'count']
}).round(1)
segment_stats.columns = ['Recency_Mean', 'Frequency_Mean', 'Monetary_Mean', 'Count']
segment_stats.sort_values('Monetary_Mean', ascending=False)


## Visualization

In [None]:
plt.figure(figsize=(8,5))
sns.countplot(x='Segment', data=rfm, order=rfm['Segment'].value_counts().index)
plt.title('Customer Counts by Segment')
plt.show()


## Conclusion


This RFM segmentation provides a human-readable, business-friendly way to understand customer behavior:

- **VIPs** (RFM score 13 to 15): recent, frequent, high-spend customers; target with loyalty programs.
- **Loyal** (RFM score 10 to 12): regular buyers; maintain engagement.
- **Potential** (RFM score 6 to 9): moderate metrics; room to grow.
- **At Risk** (RFM score 3 to 5): low on recency, frequency, and spend; consider win-back campaigns.

This approach is fast and interpretable, and it forms a foundation for retention, upsell, or re-engagement strategies.


## Bonus: DBSCAN Clustering on RFM Metrics

In [None]:
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import DBSCAN
from sklearn.decomposition import PCA

# Prepare scaled features
scaler = StandardScaler()
rfm_scaled = scaler.fit_transform(rfm[['Recency', 'Frequency', 'Monetary']])

# Reduce to 2D; DBSCAN below runs on these two components, which are also used for plotting
pca = PCA(n_components=2)
rfm_pca = pca.fit_transform(rfm_scaled)

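# Apply DBSCAN; eps and min_samples are starting values and usually need tuning
db = DBSCAN(eps=0.5, min_samples=5)
clusters = db.fit_predict(rfm_pca)

# Add cluster labels to the DataFrame (-1 marks noise points)
rfm['Cluster_DBSCAN'] = clusters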


In [None]:
# Visualize clusters
rfm_pca_df = pd.DataFrame(rfm_pca, columns=["PC1", "PC2"])
rfm_pca_df['Cluster'] = clusters

plt.figure(figsize=(10,6))
sns.scatterplot(data=rfm_pca_df, x="PC1", y="PC2", hue="Cluster", palette="tab10")
plt.title("DBSCAN Clusters on PCA-Reduced RFM Metrics")
plt.show()



Adding DBSCAN provides a **clustering-based view** of the same customer population: it groups customers by density and labels points that fit no dense region as noise (cluster `-1`), which flags potential outliers. This complements the earlier rule-based segmentation.
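
To make the noise label concrete, here is a minimal sketch on synthetic data (the two blobs, the far-away point, and the random seed are assumptions chosen for illustration, not customer data). It reuses the `eps=0.5` and `min_samples=5` values from above and counts clusters and noise points.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)

# Two tight hypothetical blobs plus one isolated point
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.05, size=(30, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.05, size=(30, 2))
outlier = np.array([[20.0, 20.0]])
X = np.vstack([blob_a, blob_b, outlier])

labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

# DBSCAN assigns -1 to points that belong to no dense region
n_noise = int((labels == -1).sum())
n_clusters = len(set(labels) - {-1})
print(n_clusters, n_noise)
```

The same two counts can be computed from `rfm['Cluster_DBSCAN']` to check how many customers DBSCAN treats as outliers. If most points end up in one cluster or as noise, that is usually a sign `eps` needs adjusting for the data's scale.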
