# ISSE Module 1: D2C Customer Lifetime Value (LTV) Model

This notebook builds the first component of our Customer Intelligence Engine. We will:
1. Load the cleaned transactional data.
2. Transform it into the RFM (Recency, Frequency, Monetary) format.
3. Fit the BG/NBD model to understand purchase and churn behavior.
4. Fit the Gamma-Gamma model to estimate average transaction values.
5. Combine both models to forecast each customer's 12-month Customer Lifetime Value (CLV; "LTV" in the title and output file name refers to the same quantity).

### Step 1: Load Processed Data

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from lifetimes.utils import summary_data_from_transaction_data
from lifetimes import BetaGeoFitter, GammaGammaFitter
from lifetimes.plotting import plot_frequency_recency_matrix, plot_probability_alive_matrix

sns.set_style('whitegrid')

# Load the cleaned data from our processing pipeline
try:
    orders_df = pd.read_csv('../data/processed/transactional_data.csv')
except FileNotFoundError:
    print("ERROR: Processed data not found. Please run 'make process-data' first.")
    raise  # Stop here: every later cell depends on orders_df
else:
    # Parse dates so the RFM step can compute durations between orders
    orders_df['order_date'] = pd.to_datetime(orders_df['order_date'])
    print("Processed transactional data loaded successfully.")
    display(orders_df.head())
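
Before the RFM step it can help to confirm that the file has the columns the rest of the notebook relies on. The following is a minimal sketch, not part of the pipeline: `validate_orders` is a hypothetical helper, and the toy frame stands in for `orders_df`. The column names are the ones passed to `summary_data_from_transaction_data` in Step 2.

```python
import pandas as pd

# Columns used later in the notebook (see Step 2)
REQUIRED_COLUMNS = ("customer_id", "order_date", "order_value")

def validate_orders(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: fail early if the transaction table is unusable."""
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    if not pd.api.types.is_datetime64_any_dtype(df["order_date"]):
        raise TypeError("order_date must be parsed to datetime first")
    if df[list(REQUIRED_COLUMNS)].isna().any().any():
        raise ValueError("Null values found in required columns")
    return df

# Toy data standing in for orders_df (illustrative values only)
toy_orders = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "order_date": pd.to_datetime(["2024-01-05", "2024-02-10", "2024-01-20"]),
    "order_value": [40.0, 55.0, 20.0],
})
validate_orders(toy_orders)  # returns the frame unchanged when all checks pass
```

In the notebook this would be called as `validate_orders(orders_df)` right after loading.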

### Step 2: Create RFM Data

In [None]:
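# One row per customer with frequency, recency, T and monetary_value.
# With the default freq='D', recency and T are measured in days.
rfm_df = summary_data_from_transaction_data(
    orders_df,
    customer_id_col='customer_id',
    datetime_col='order_date',
    monetary_value_col='order_value'
)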

print("RFM data created:")
display(rfm_df.head())
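
To make the four RFM columns concrete, here is a hand-rolled sketch for a single customer. It follows the `lifetimes` conventions as I understand them (frequency counts repeat purchase days, and monetary value averages repeat purchases only); `rfm_for_customer` and the order data are hypothetical and only illustrate the definitions.

```python
from datetime import date

def rfm_for_customer(orders, observation_end):
    """Compute (frequency, recency, T, monetary_value) for one customer.

    orders: list of (date, value) tuples; durations are returned in days.
    """
    # Orders placed on the same day are summed and count as one purchase
    by_day = {}
    for day, value in orders:
        by_day[day] = by_day.get(day, 0.0) + value
    days = sorted(by_day)
    first, last = days[0], days[-1]

    frequency = len(days) - 1               # repeat purchases only
    recency = (last - first).days           # first purchase to last purchase
    T = (observation_end - first).days      # first purchase to end of observation
    repeat_values = [by_day[d] for d in days[1:]]  # first purchase is excluded
    monetary_value = sum(repeat_values) / len(repeat_values) if repeat_values else 0.0
    return frequency, recency, T, monetary_value

orders = [(date(2024, 1, 1), 30.0), (date(2024, 1, 15), 50.0), (date(2024, 3, 1), 70.0)]
print(rfm_for_customer(orders, observation_end=date(2024, 3, 31)))
# -> (2, 60, 90, 60.0)
```

A customer with a single order gets frequency 0, recency 0 and monetary value 0, which is why Step 4 fits the Gamma-Gamma model on repeat customers only.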

### Step 3: Fit BG/NBD Model and Visualize Behavior

In [None]:
bgf = BetaGeoFitter(penalizer_coef=0.001) # Small penalizer for stability
bgf.fit(rfm_df['frequency'], rfm_df['recency'], rfm_df['T'])

print("BG/NBD Model Summary:")
display(bgf.summary)

plt.figure(figsize=(12, 8))
plot_frequency_recency_matrix(bgf)
plt.title('Expected Number of Future Purchases')
plt.show()

plt.figure(figsize=(12, 8))
plot_probability_alive_matrix(bgf)
plt.title('Probability Customer is Still Active')
plt.show()
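
The probability-alive matrix can be read more easily with the underlying formula in hand. The sketch below implements the BG/NBD expression for P(alive) as given in Fader, Hardie and Lee's note on the model; the parameter values are illustrative placeholders, not the ones fitted in this notebook (those appear in `bgf.summary`).

```python
def probability_alive(frequency, recency, T, r, alpha, a, b):
    """BG/NBD probability that a customer is still active.

    frequency, recency, T come from the RFM table; r, alpha, a, b are
    the four fitted model parameters.
    """
    if frequency == 0:
        # Under BG/NBD a customer can only drop out right after a repeat
        # purchase, so customers with none are treated as alive.
        return 1.0
    ratio = (alpha + T) / (alpha + recency)
    return 1.0 / (1.0 + (a / (b + frequency - 1)) * ratio ** (r + frequency))

# Illustrative parameter values (assumed, not fitted here)
params = dict(r=0.25, alpha=4.5, a=0.8, b=2.5)

recent_buyer = probability_alive(frequency=5, recency=80, T=90, **params)
lapsed_buyer = probability_alive(frequency=5, recency=20, T=90, **params)
print(f"recent: {recent_buyer:.3f}, lapsed: {lapsed_buyer:.3f}")
```

Two customers with the same purchase count can therefore score very differently: the longer the silence since the last purchase relative to the customer's age `T`, the lower the estimate.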

### Step 4: Fit Gamma-Gamma Model and Forecast CLV
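
Before fitting, it may help to see what the Gamma-Gamma model estimates for each customer: a weighted average of the customer's own observed mean order value and the population mean, where the weight on the customer's own history grows with their number of repeat purchases. The sketch below follows the conditional expectation from Fader and Hardie's Gamma-Gamma note; `p`, `q` and `v` are the model's three parameters, and the values used here are illustrative assumptions rather than fitted results.

```python
def expected_average_order_value(frequency, monetary_value, p, q, v):
    """Gamma-Gamma conditional expectation of a customer's mean order value."""
    population_mean = v * p / (q - 1)
    # Weight on the customer's own history; zero when there are no repeat purchases
    individual_weight = p * frequency / (p * frequency + q - 1)
    return (1 - individual_weight) * population_mean + individual_weight * monetary_value

# Illustrative parameter values (assumed, not fitted here)
params = dict(p=6.0, q=4.0, v=15.0)

print(expected_average_order_value(0, 0.0, **params))    # no history: population mean, 30.0
print(expected_average_order_value(3, 50.0, **params))   # pulled most of the way toward 50
```

This shrinkage is why customers with no repeat purchases still receive a non-zero spend estimate in the CLV forecast.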

In [None]:
# The Gamma-Gamma model is fit on repeat customers only (frequency > 0)
returning_customers = rfm_df[rfm_df['frequency'] > 0]

# Check assumption: the model treats monetary value as independent of frequency,
# so the correlation between the two should be close to zero
print("Correlation Matrix:")
display(returning_customers[['monetary_value', 'frequency']].corr())

ggf = GammaGammaFitter(penalizer_coef=0.001)
ggf.fit(returning_customers['frequency'], returning_customers['monetary_value'])

print("Gamma-Gamma Model Summary:")
display(ggf.summary)

# Forecast CLV for the next 12 months
clv_forecast = ggf.customer_lifetime_value(
    bgf, # The trained BG/NBD model
    rfm_df['frequency'],
    rfm_df['recency'],
    rfm_df['T'],
    rfm_df['monetary_value'],
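    time=12,  # Forecast horizon in months
    discount_rate=0.01,  # Monthly discount rate (about 12.7% per year when compounded)
    freq='D'  # Time unit of recency and T; must match the RFM step (days by default)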
)

rfm_df['predicted_clv_12_months'] = clv_forecast

print("\nTop 10 Most Valuable Customers (Forecasted 12-Month CLV):")
display(rfm_df.sort_values(by='predicted_clv_12_months', ascending=False).head(10))
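
The forecast is a discounted cash flow: expected purchases in each future month, multiplied by the expected order value, discounted back at the monthly rate. The sketch below shows that arithmetic along with the conversion between monthly and annual discount rates. The function names and the input numbers are hypothetical; in the notebook the per-month purchase expectations come from the BG/NBD model and the order value from the Gamma-Gamma model.

```python
def monthly_to_annual(monthly_rate):
    """Effective annual rate implied by a compounding monthly rate."""
    return (1 + monthly_rate) ** 12 - 1

def annual_to_monthly(annual_rate):
    """Monthly rate that compounds to the given annual rate."""
    return (1 + annual_rate) ** (1 / 12) - 1

def discounted_clv(purchases_per_month, avg_order_value, monthly_rate):
    """Sum of monthly expected revenue, each discounted to today.

    purchases_per_month[i] is the expected number of purchases in month i + 1.
    """
    return sum(
        n * avg_order_value / (1 + monthly_rate) ** month
        for month, n in enumerate(purchases_per_month, start=1)
    )

print(f"{monthly_to_annual(0.01):.4f}")   # 1% per month compounds to about 12.68% per year
print(f"{annual_to_monthly(0.12):.5f}")   # a true 12% annual rate is about 0.949% per month

# Hypothetical customer: 0.5 expected purchases per month for 12 months at 40 per order
print(round(discounted_clv([0.5] * 12, 40.0, 0.01), 2))
```

If the business quotes its cost of capital as an annual figure, `annual_to_monthly` gives the value to pass as `discount_rate`.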

### Step 5: Save Results

In [None]:
output_path = '../data/processed/customer_ltv_predictions.csv'
rfm_df.to_csv(output_path)
print(f"LTV predictions saved to {output_path}")
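
Because `rfm_df` is indexed by `customer_id`, `to_csv` writes the ID as the first column of the file. A downstream consumer needs to restore it as the index when reading the predictions back. The sketch below shows the round trip on a toy frame, using an in-memory buffer in place of the real output path; the values are made up.

```python
import io
import pandas as pd

# Toy frame shaped like the saved predictions (values are made up)
toy_predictions = pd.DataFrame(
    {"frequency": [2.0, 0.0], "predicted_clv_12_months": [181.5, 42.25]},
    index=pd.Index([101, 102], name="customer_id"),
)

buffer = io.StringIO()           # stands in for the CSV file on disk
toy_predictions.to_csv(buffer)   # the index is written as the first column
buffer.seek(0)

# index_col restores customer_id as the index instead of a regular column
restored = pd.read_csv(buffer, index_col="customer_id")
print(restored.loc[101, "predicted_clv_12_months"])
```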