# Customer Segmentation with RFM Analysis

**Dataset**: Synthetic Online Retail data (same column structure as the UCI Online Retail dataset).  
**Stack**: pandas, numpy, matplotlib (seaborn optional).  
**Flow**: Load → Clean/Validate → RFM → Score → Segment → Visualize → Insights.

## 1. Setup

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
try:
    import seaborn as sns  # optional; the plots below use matplotlib only
except ImportError:
    sns = None
pd.set_option('display.max_columns', 50)

## 2. Load Data

In [None]:
df = pd.read_csv(r"/mnt/data/online_retail_synthetic.csv", parse_dates=["InvoiceDate"])
df.head()
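
The flow lists a Clean/Validate step, but the notebook has no cell for it. The following is a minimal sketch of what that step could look like, assuming the UCI conventions carry over (cancellation invoices prefixed with "C", some rows without a `CustomerID`). The helper name `clean_transactions` and the toy rows are hypothetical, and whether the synthetic file actually contains such rows is not stated here.

```python
import pandas as pd

def clean_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop rows that would distort the RFM metrics."""
    out = df.dropna(subset=["CustomerID", "InvoiceDate"]).copy()
    # Assumption: cancellations carry an InvoiceNo starting with "C", as in the UCI data.
    is_cancel = out["InvoiceNo"].astype(str).str.startswith("C")
    # Keep only real purchases with a positive quantity and price.
    out = out[~is_cancel & (out["Quantity"] > 0) & (out["UnitPrice"] > 0)]
    return out.drop_duplicates().reset_index(drop=True)

# Toy rows for illustration only.
demo = pd.DataFrame({
    "InvoiceNo": ["1001", "C1002", "1003", "1004"],
    "CustomerID": [1.0, 1.0, None, 2.0],
    "InvoiceDate": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"]),
    "Quantity": [2, -2, 1, 3],
    "UnitPrice": [5.0, 5.0, 4.0, 0.0],
})
cleaned = clean_transactions(demo)
print(cleaned)
```

In the notebook this would be applied as `df = clean_transactions(df)` before computing `Amount`. Whether to drop returns or net them against purchases is a modelling choice; this sketch drops them.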

## 3. Feature Engineering (Amount) & Reference Date

In [None]:
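# Line-level revenue for each transaction row.
df['Amount'] = df['Quantity'] * df['UnitPrice']
# Reference date: one day after the last transaction, so the minimum recency is 1 day.
ref_date = df['InvoiceDate'].max() + pd.Timedelta(days=1)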
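
To see why the reference date is placed one day after the last transaction, here is a small worked example on toy data (the three rows are made up for illustration). The most recent customer ends up with a recency of 1 rather than 0.

```python
import pandas as pd

# Toy transactions for illustration only.
tx = pd.DataFrame({
    "CustomerID": [1, 1, 2],
    "InvoiceDate": pd.to_datetime(["2024-03-01", "2024-03-10", "2024-03-05"]),
})
ref_date = tx["InvoiceDate"].max() + pd.Timedelta(days=1)  # 2024-03-11
last_purchase = tx.groupby("CustomerID")["InvoiceDate"].max()
recency = (ref_date - last_purchase).dt.days
print(recency.to_dict())  # {1: 1, 2: 6}
```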

## 4. Compute RFM

In [None]:
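def most_common(x):
    """Most frequent value; falls back to the first value if the mode is empty."""
    m = x.mode()
    return m.iat[0] if not m.empty else x.iloc[0]

rfm = (
    df.groupby('CustomerID').agg(
        Recency=('InvoiceDate', lambda x: (ref_date - x.max()).days),  # days since last purchase
        Frequency=('InvoiceNo', 'nunique'),                            # number of distinct invoices
        Monetary=('Amount', 'sum'),                                    # total spend
        Country=('Country', most_common),
    )
).reset_index()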
rfm.head()
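
Before scoring, it can help to check that the RFM table has the shape the later cells expect. The sketch below is one possible set of checks. The function name `validate_rfm` is hypothetical, and the toy frame only mimics the columns produced above.

```python
import pandas as pd

def validate_rfm(rfm: pd.DataFrame) -> None:
    """Hypothetical checks; raises AssertionError on the first failure."""
    assert rfm["CustomerID"].is_unique, "expected one row per customer"
    # With ref_date set one day after the last transaction, recency is at least 1.
    assert (rfm["Recency"] >= 1).all(), "recency should be at least 1 day"
    assert (rfm["Frequency"] >= 1).all(), "every customer needs at least one invoice"
    assert rfm[["Recency", "Frequency", "Monetary"]].notna().all().all(), "unexpected NaN"

# Toy RFM rows for illustration only.
demo_rfm = pd.DataFrame({
    "CustomerID": [1, 2],
    "Recency": [1, 6],
    "Frequency": [2, 1],
    "Monetary": [30.0, 12.5],
})
validate_rfm(demo_rfm)  # passes silently
```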

## 5. Scoring (1–5)

In [None]:
def score_by_quantiles(s, q=5, reverse=False):
    """Score a series from 1 to q by quantile; reverse=True gives low values the top score."""
    labels = list(range(1, q + 1))
    if reverse:
        labels = labels[::-1]
    try:
        # Ranking with method="first" breaks ties, so qcut gets unique bin edges.
        return pd.qcut(s.rank(method="first"), q, labels=labels)
    except ValueError:
        # Fallback: equal-width bins over the value range, with the same labels.
        bins = np.linspace(s.min() - 1e-9, s.max() + 1e-9, q + 1)
        return pd.cut(s, bins=bins, labels=labels, include_lowest=True)

rfm['R_Score'] = score_by_quantiles(rfm['Recency'], q=5, reverse=True).astype(int)
rfm['F_Score'] = score_by_quantiles(rfm['Frequency'], q=5, reverse=False).astype(int)
rfm['M_Score'] = score_by_quantiles(rfm['Monetary'], q=5, reverse=False).astype(int)
rfm.head()
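
One consequence of ranking with `method="first"` before binning is that customers with identical values can receive different scores. The toy series below (made-up frequencies) illustrates this: the four customers with a frequency of 1 are split between scores 1 and 2.

```python
import pandas as pd

# Toy frequencies for illustration only; note the ties.
freq = pd.Series([1, 1, 1, 1, 2, 2, 3, 3, 5, 8])
ranks = freq.rank(method="first")  # ties broken by position: 1.0, 2.0, ..., 10.0
scores = pd.qcut(ranks, 5, labels=[1, 2, 3, 4, 5]).astype(int)
print(pd.DataFrame({"Frequency": freq, "F_Score": scores}))
# F_Score column: 1, 1, 2, 2, 3, 3, 4, 4, 5, 5
```

The trade-off is that this approach gives equal-sized score groups at the cost of splitting ties, and the split depends on row order.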

## 6. Segmentation

In [None]:
def assign_segment(row):
    R, F, M = row['R_Score'], row['F_Score'], row['M_Score']
    if R >= 4 and F >= 4 and M >= 4:
        return 'Champions'
    if R >= 4 and F >= 3:
        return 'Loyal Customers'
    if R >= 4 and F >= 2 and M >= 2:
        return 'Potential Loyalists'
    if R == 5 and F == 1:
        return 'Recent Customers'
    if R == 4 and F == 1:
        return 'Promising'
    if R == 3 and F >= 2:
        return 'Needs Attention'
    if R <= 2 and F >= 4:
        return "Can't Lose Them"
    if R <= 2 and F >= 3:
        return 'At Risk'
    if R == 1 and F == 1:
        return 'Lost'
    if R <= 2 and F <= 2 and M <= 2:
        return 'Hibernating'
    return 'Others'

rfm['Segment'] = rfm.apply(assign_segment, axis=1)
rfm.head()
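
A per-segment summary makes the segment rules easier to sanity-check before plotting. The sketch below shows one way to build it. The helper name `summarize_segments`, the output column names, and the toy rows are all hypothetical.

```python
import pandas as pd

def summarize_segments(rfm: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: size, average R/F, and revenue share per segment."""
    summary = rfm.groupby("Segment").agg(
        Customers=("CustomerID", "count"),
        Avg_Recency=("Recency", "mean"),
        Avg_Frequency=("Frequency", "mean"),
        Total_Monetary=("Monetary", "sum"),
    )
    summary["Revenue_Share"] = summary["Total_Monetary"] / summary["Total_Monetary"].sum()
    return summary.sort_values("Total_Monetary", ascending=False)

# Toy rows for illustration only.
demo = pd.DataFrame({
    "CustomerID": [1, 2, 3],
    "Recency": [2, 4, 90],
    "Frequency": [10, 8, 1],
    "Monetary": [600.0, 300.0, 100.0],
    "Segment": ["Champions", "Champions", "Lost"],
})
segment_summary = summarize_segments(demo)
print(segment_summary)
```

In the notebook, `summarize_segments(rfm)` would give the same table for the real segments.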

## 7. Visualizations

In [None]:
seg_counts = rfm['Segment'].value_counts()  # already sorted by count, descending
plt.figure(figsize=(8,5))
seg_counts.plot(kind='bar')
plt.title('Customer Segments (Count)')
plt.xlabel('Segment')
plt.ylabel('Number of Customers')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()

In [None]:
pivot_m = rfm.pivot_table(index='R_Score', columns='F_Score', values='Monetary', aggfunc='mean')
plt.figure(figsize=(6,5))
plt.imshow(pivot_m.values, aspect='auto')
plt.title('Avg Monetary by R and F')
plt.xlabel('F_Score')
plt.ylabel('R_Score')
plt.colorbar()
plt.xticks(ticks=range(pivot_m.shape[1]), labels=list(pivot_m.columns))
plt.yticks(ticks=range(pivot_m.shape[0]), labels=list(pivot_m.index))
plt.tight_layout()
plt.show()

In [None]:
df_with_seg = df.merge(rfm[['CustomerID','Segment']], on='CustomerID', how='left')
rev_by_seg = df_with_seg.groupby('Segment')['Amount'].sum().sort_values(ascending=False)
plt.figure(figsize=(8,5))
rev_by_seg.plot(kind='bar')
plt.title('Revenue by Segment')
plt.xlabel('Segment')
plt.ylabel('Total Revenue')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()

## 8. Insights

- Champions: VIP care and exclusives.
- Loyal Customers: reward and referral programs.
- Potential Loyalists: targeted cross-sell and upsell.
- At Risk / Can't Lose Them: win-back incentives.
- Hibernating / Lost: low-cost touches and feedback requests.
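
The actions above can be attached to each customer with a plain dictionary lookup. This is a sketch only: the name `SEGMENT_ACTIONS` and the toy frame are hypothetical, the action texts are taken from the list above, and segments the list does not cover (for example `Promising`) are left without an action rather than guessed.

```python
import pandas as pd

# Actions taken from the insights list; uncovered segments map to NaN.
SEGMENT_ACTIONS = {
    "Champions": "VIP care and exclusives",
    "Loyal Customers": "reward and referral programs",
    "Potential Loyalists": "targeted cross-sell and upsell",
    "At Risk": "win-back incentives",
    "Can't Lose Them": "win-back incentives",
    "Hibernating": "low-cost touches and feedback",
    "Lost": "low-cost touches and feedback",
}

# Toy rows for illustration only.
demo = pd.DataFrame({
    "CustomerID": [1, 2, 3],
    "Segment": ["Champions", "Lost", "Promising"],
})
demo["Action"] = demo["Segment"].map(SEGMENT_ACTIONS)
print(demo)
```

In the notebook, the same lookup would be `rfm['Action'] = rfm['Segment'].map(SEGMENT_ACTIONS)`.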