# 📊 Week 2: Customer Segmentation & RFM Analysis

**Goal:** Group customers by their behavior patterns

## What we'll do:
1. Calculate RFM scores (Recency, Frequency, Monetary)
2. Segment customers into groups
3. Identify churned customers
4. Visualize customer segments

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime

# Load data
customers = pd.read_csv('../data/customers.csv')
transactions = pd.read_csv('../data/transactions.csv')

# Convert date column
transactions['transaction_date'] = pd.to_datetime(transactions['transaction_date'])

print(f"✅ Loaded {len(customers):,} customers")
print(f"✅ Loaded {len(transactions):,} transactions")

In [None]:
# Set analysis date: the most recent transaction in the data stands in for "today"
analysis_date = transactions['transaction_date'].max()
print(f"Analysis date: {analysis_date.date()}")

# Calculate RFM for each customer
rfm = transactions.groupby('customer_id').agg({
    'transaction_date': lambda x: (analysis_date - x.max()).days,  # Recency
    'transaction_id': 'count',                                      # Frequency
    'total_amount': 'sum'                                           # Monetary
}).reset_index()

# Rename columns
rfm.columns = ['customer_id', 'recency', 'frequency', 'monetary']

print(f"\n✅ RFM calculated for {len(rfm):,} customers")
rfm.head(10)
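The cell above produces raw recency, frequency, and monetary values rather than scores. One common way to turn them into 1-5 scores is quintile binning with `pd.qcut`. The following is a minimal sketch under that assumption; the helper name `add_rfm_scores`, the score column names, and the demo data are hypothetical and not part of the original notebook.

```python
import pandas as pd

def add_rfm_scores(rfm: pd.DataFrame, bins: int = 5) -> pd.DataFrame:
    """Return a copy of rfm with r_score, f_score, m_score (1..bins) and their sum."""
    scored = rfm.copy()
    labels = list(range(1, bins + 1))
    # Rank before binning so that ties (e.g. many customers with frequency == 1)
    # cannot produce duplicate bin edges in qcut.
    # Low recency is good, so rank it descending: the most recent buyer gets the top score.
    r_rank = scored['recency'].rank(method='first', ascending=False)
    f_rank = scored['frequency'].rank(method='first')
    m_rank = scored['monetary'].rank(method='first')
    scored['r_score'] = pd.qcut(r_rank, bins, labels=labels).astype(int)
    scored['f_score'] = pd.qcut(f_rank, bins, labels=labels).astype(int)
    scored['m_score'] = pd.qcut(m_rank, bins, labels=labels).astype(int)
    scored['rfm_score'] = scored[['r_score', 'f_score', 'm_score']].sum(axis=1)
    return scored

# Made-up demo data: customer 1 is the most recent, most frequent, biggest spender
demo_rfm = pd.DataFrame({
    'customer_id': range(1, 11),
    'recency':   [1, 5, 10, 20, 40, 80, 120, 200, 300, 365],
    'frequency': [12, 9, 7, 5, 4, 3, 2, 1, 1, 1],
    'monetary':  [1500, 900, 700, 500, 400, 300, 200, 100, 50, 20],
})
scored = add_rfm_scores(demo_rfm)
print(scored[['customer_id', 'r_score', 'f_score', 'm_score', 'rfm_score']])
```

Because tied values are ranked by row order, customers with identical raw values can land in neighboring bins; that is a trade-off of this sketch, not a property of RFM itself.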

In [None]:
# Define customer segments based on simple rules
def segment_customer(row):
    if row['recency'] > 180:
        return 'Churned'
    elif row['frequency'] >= 10 and row['monetary'] >= 1000:
        return 'Champion'
    elif row['frequency'] >= 5:
        return 'Loyal'
    elif row['frequency'] >= 2:
        return 'Occasional'
    else:
        return 'One-time'

rfm['segment'] = rfm.apply(segment_customer, axis=1)

# Count customers in each segment
segment_counts = rfm['segment'].value_counts()
print("\n📊 Customer Segments:")
print(segment_counts)
print("\nPercentage:")
print((segment_counts / len(rfm) * 100).round(1))
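Row-wise `apply` can be slow on large customer tables. The same rules can be expressed in vectorized form with `np.select`, which takes the first matching condition and so mirrors the `if`/`elif` order above. This is a sketch, not a replacement the notebook requires; the function name `segment_customers` and the demo rows are hypothetical.

```python
import numpy as np
import pandas as pd

def segment_customers(rfm: pd.DataFrame) -> pd.Series:
    """Vectorized version of the rule-based segmentation (same thresholds)."""
    # Order matters: np.select returns the label of the FIRST condition that is True
    conditions = [
        rfm['recency'] > 180,
        (rfm['frequency'] >= 10) & (rfm['monetary'] >= 1000),
        rfm['frequency'] >= 5,
        rfm['frequency'] >= 2,
    ]
    labels = ['Churned', 'Champion', 'Loyal', 'Occasional']
    segments = np.select(conditions, labels, default='One-time')
    return pd.Series(segments, index=rfm.index, name='segment')

# Made-up demo rows, one per segment
demo_rfm = pd.DataFrame({
    'recency':   [200, 10, 30, 60, 90],
    'frequency': [15, 12, 6, 3, 1],
    'monetary':  [5000, 1500, 400, 120, 25],
})
demo_segments = segment_customers(demo_rfm)
print(demo_segments.tolist())
```

Note that the first demo row has high frequency and spend but is still labeled `Churned`, because the recency rule is checked first, exactly as in `segment_customer`.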

In [None]:
# Revenue by segment
revenue_by_segment = rfm.groupby('segment')['monetary'].sum().sort_values(ascending=False)

plt.figure(figsize=(10, 6))
revenue_by_segment.plot(kind='bar', color='steelblue', edgecolor='black')
plt.title('Total Revenue by Customer Segment', fontsize=16, fontweight='bold')
plt.xlabel('Segment')
plt.ylabel('Revenue (€)')
plt.xticks(rotation=45)
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()

print("\n💰 Revenue by Segment:")
print(revenue_by_segment)
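Total revenue alone does not show how large each segment is or how much an average customer in it is worth. A small summary table with customer share, revenue share, and average revenue per customer can help with the questions that follow. This is a sketch; the function name `summarize_segments`, the output column names, and the demo data are hypothetical.

```python
import pandas as pd

def summarize_segments(rfm: pd.DataFrame) -> pd.DataFrame:
    """Per-segment customer count, revenue, percentage shares, and average revenue."""
    summary = rfm.groupby('segment').agg(
        customers=('customer_id', 'count'),
        revenue=('monetary', 'sum'),
    )
    summary['customer_pct'] = (summary['customers'] / summary['customers'].sum() * 100).round(1)
    summary['revenue_pct'] = (summary['revenue'] / summary['revenue'].sum() * 100).round(1)
    summary['avg_revenue'] = (summary['revenue'] / summary['customers']).round(2)
    return summary.sort_values('revenue', ascending=False)

# Made-up demo data
demo_rfm = pd.DataFrame({
    'customer_id': [1, 2, 3, 4],
    'monetary': [600.0, 200.0, 150.0, 50.0],
    'segment': ['Champion', 'Loyal', 'Churned', 'Churned'],
})
summary = summarize_segments(demo_rfm)
print(summary)
```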

## 🎯 Key Insights

**Fill in after running the analysis:**

1. Segment distribution:
   - Champions: ____%
   - Loyal: ____%
   - Churned: ____%

2. Revenue insights:
   - Which segment generates most revenue? _______
   - What percentage of total revenue comes from Churned customers? _______

3. Actions to take:
   - How to retain Champions? _______
   - How to win back Churned customers? _______

---

## 📝 Next Steps (Week 3):
- Build a machine learning model to predict churn
- Feature engineering for ML
- Model evaluation and interpretation

## 💡 Pro Tip: Using Reusable Functions

Instead of copying code between notebooks, you can use functions from `src/common.py`:

```python
import sys
sys.path.append('../')
from src.common import load_data, calculate_rfm

# Use them:
customers, transactions = load_data()
rfm = calculate_rfm(transactions)
```

This keeps the loading and RFM logic in one place, so a fix made there applies to every notebook that imports it.
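The contents of `src/common.py` are not shown here. As a rough idea of what the two helpers might look like, the sketch below reuses the file paths and column names from this notebook; the `data_dir` and `analysis_date` parameters are assumptions for illustration, and the real module may differ.

```python
from pathlib import Path

import pandas as pd

def load_data(data_dir='../data'):
    """Load customers and transactions; data_dir is an assumed parameter."""
    data_dir = Path(data_dir)
    customers = pd.read_csv(data_dir / 'customers.csv')
    # Parse the date column while loading instead of converting it afterwards
    transactions = pd.read_csv(
        data_dir / 'transactions.csv', parse_dates=['transaction_date']
    )
    return customers, transactions

def calculate_rfm(transactions, analysis_date=None):
    """Return one row per customer with recency (days), frequency, monetary."""
    if analysis_date is None:
        # Same convention as the notebook: latest transaction acts as "today"
        analysis_date = transactions['transaction_date'].max()
    rfm = transactions.groupby('customer_id').agg(
        last_purchase=('transaction_date', 'max'),
        frequency=('transaction_id', 'count'),
        monetary=('total_amount', 'sum'),
    ).reset_index()
    rfm['recency'] = (analysis_date - rfm['last_purchase']).dt.days
    return rfm[['customer_id', 'recency', 'frequency', 'monetary']]
```

Accepting `analysis_date` as an argument would let every notebook score customers against the same reference date, even if the transaction file grows between runs.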