Failure Modes, Sensitivity Analysis & Research Write-up

In [78]:
# CLV 4.0 — Phase 8
# Failure Modes, Sensitivity Analysis & Research Write-up
#
# This notebook finalizes the CLV 4.0 system by:
# 1. Stress-testing modeling assumptions
# 2. Analyzing failure modes
# 3. Evaluating robustness of decisions
# 4. Preparing research-ready outputs


In [79]:
# STEP 8.1 — Load Final Artifacts

In [80]:
import os

import numpy as np
import pandas as pd

clv_df = pd.read_parquet("phase5_expected_clv.parquet")

# Phase 6 and Phase 7 artifacts are optional: load them only if the files exist.
decision_path = "phase6_decision_df.parquet"
uplift_path = "phase7_uplift_df.parquet"
decision_df = pd.read_parquet(decision_path) if os.path.exists(decision_path) else None
uplift_df = pd.read_parquet(uplift_path) if os.path.exists(uplift_path) else None
person_period_df = pd.read_parquet(
    "phase4_person_period_dataset.parquet"
)

person_period_df.columns


Index(['Customer ID', 'time_bin', 'event', 'recency_days', 'frequency',
       'monetary_avg', 'delta_revenue', 'delta_recency'],
      dtype='object')

In [81]:
# STEP 8.1.1 — Recompute hazard → survival

In [82]:
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression

features = [
    "recency_days",
    "frequency",
    "monetary_avg",
    "delta_revenue",
    "delta_recency",
    "time_bin"
]

X = person_period_df[features]
y = person_period_df["event"]

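# Median-impute missing feature values before fitting.
imputer = SimpleImputer(strategy="median")
X_imputed = imputer.fit_transform(X)

# Discrete-time hazard model: P(event in this period | features).
hazard_model = LogisticRegression(max_iter=1000)
hazard_model.fit(X_imputed, y)

# In-sample predicted hazard for every person-period row.
person_period_df["hazard"] = hazard_model.predict_proba(X_imputed)[:, 1]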


In [83]:
# STEP 8.1.2 — Survival probability

In [84]:
person_period_df = person_period_df.sort_values(
    ["Customer ID", "time_bin"]
)

person_period_df["survival_prob"] = (
    person_period_df
    .groupby("Customer ID")["hazard"]
    .transform(lambda x: (1 - x).cumprod())
)
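
As a minimal illustration of the cumulative-product step above, the sketch below uses made-up hazard values for a single hypothetical customer. Survival through each period is the running product of (1 - hazard) up to and including that period; the numbers are assumptions for illustration, not values from the dataset.

```python
from itertools import accumulate
from operator import mul

# Hypothetical per-period hazards for one customer (illustrative values only).
hazards = [0.10, 0.20, 0.50]

# Survival through period t is the running product of (1 - hazard).
survival = list(accumulate((1 - h for h in hazards), mul))

print([round(s, 4) for s in survival])  # prints [0.9, 0.72, 0.36]
```

This mirrors what `(1 - x).cumprod()` does within each `Customer ID` group, provided the rows are already sorted by `time_bin`.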


In [85]:
person_period_df.columns


Index(['Customer ID', 'time_bin', 'event', 'recency_days', 'frequency',
       'monetary_avg', 'delta_revenue', 'delta_recency', 'hazard',
       'survival_prob'],
      dtype='object')

In [86]:
# STEP 8.2 — Failure Mode 1
# CLV Inflation due to Optimistic Survival Assumptions

In [87]:
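# Recompute total CLV under a given discount factor and horizon.
def compute_clv_with_params(df, discount, horizon):
    """Sum survival-weighted, discounted expected revenue for time_bin < horizon.

    `discount` is a per-period discount factor (e.g. 0.95), not a rate.
    """
    temp = df[df["time_bin"] < horizon].copy()
    temp["discount"] = discount ** temp["time_bin"]
    return (
        temp["survival_prob"] * temp["expected_revenue"] * temp["discount"]
    ).sum()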
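
Written out, the function above appears to compute the following aggregate over all person-period rows with `time_bin` below the horizon. The symbols are notation introduced here for illustration: $S_i(t)$ stands for `survival_prob`, $R_i(t)$ for `expected_revenue`, $h_i(s)$ for `hazard`, $\delta$ for the `discount` argument, and $H$ for `horizon`. The formula assumes `time_bin` takes non-negative integer values.

```latex
\mathrm{CLV}_{\text{total}}(\delta, H)
  = \sum_{i} \sum_{t=0}^{H-1} S_i(t)\, R_i(t)\, \delta^{t},
\qquad
S_i(t) = \prod_{s=0}^{t} \bigl(1 - h_i(s)\bigr)
```

Because $\delta$ enters as a multiplicative factor raised to the period index, values closer to 1 weight later periods more heavily.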


In [88]:
# Re-define expected conditional revenue (same assumption as Phase 5)
person_period_df["expected_revenue"] = person_period_df["monetary_avg"]


In [89]:
person_period_df[
    ["time_bin", "survival_prob", "expected_revenue"]
].head()


    time_bin  survival_prob  expected_revenue
0          0        0.99437              45.0
11         0       0.989507             33.75
22         0       0.984638              30.0
33         0       0.979727            28.125
44         0       0.974817              22.7


In [90]:
# Test sensitivity:
scenarios = []

for d in [0.90, 0.95, 0.99]:
    for h in [6, 12, 24]:
        scenarios.append({
            "discount": d,
            "horizon": h,
            "total_clv": compute_clv_with_params(person_period_df, d, h)
        })

pd.DataFrame(scenarios)



   discount  horizon   total_clv
0      0.90        6  28746100.0
1      0.90       12  31089610.0
2      0.90       24  31312930.0
3      0.95        6  30409140.0
4      0.95       12  33949450.0
5      0.95       24  34406880.0
6      0.99        6  31877470.0
7      0.99       12  36750960.0
8      0.99       24  37545280.0


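CLV estimates are sensitive to the discount factor and horizon. Across the nine
scenarios, total CLV ranges from about 28.7M (discount factor 0.90, horizon 6)
to about 37.5M (0.99, horizon 24), a spread of roughly 31%. Most of the horizon
effect occurs between 6 and 12 periods; extending from 12 to 24 adds only about
0.7% to 2.2%. Overly optimistic assumptions can therefore inflate CLV and distort
downstream decisions, which is why sensitivity analysis matters in CLV-based
optimization.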
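
The size of the effect can be rechecked from the scenario table with a few lines of arithmetic. The sketch below copies the nine reported totals by hand (it does not rerun the model) and uses a hypothetical helper, `pct_change`, that is not part of the notebook.

```python
# Totals copied from the scenario table: (discount, horizon) -> total CLV.
totals = {
    (0.90, 6): 28746100.0, (0.90, 12): 31089610.0, (0.90, 24): 31312930.0,
    (0.95, 6): 30409140.0, (0.95, 12): 33949450.0, (0.95, 24): 34406880.0,
    (0.99, 6): 31877470.0, (0.99, 12): 36750960.0, (0.99, 24): 37545280.0,
}


def pct_change(new, old):
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


# Spread between the smallest and largest scenario totals.
overall = pct_change(max(totals.values()), min(totals.values()))

# Effect of extending the horizon from 6 to 24 periods at each discount factor.
horizon_effect = {
    d: pct_change(totals[(d, 24)], totals[(d, 6)]) for d in (0.90, 0.95, 0.99)
}

print(f"overall spread: {overall:.1f}%")
for d, pct in horizon_effect.items():
    print(f"discount={d:.2f}: horizon 6 -> 24 changes total CLV by {pct:+.1f}%")
```

The horizon effect grows as the discount factor approaches 1, which is expected because later periods are discounted less.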


In [91]:
# STEP 8.3 — Failure Mode 2
# Homogeneous Treatment Effects

In [92]:
# Assume every customer responds with the same 15% uplift in expected CLV.
uniform_uplift = 0.15
clv_df["uniform_incremental"] = uniform_uplift * clv_df["expected_clv"]


When treatment effects are homogeneous, incremental value is a constant multiple
of expected CLV (15% here), so uplift estimates cannot reorder customers and
simple heuristics such as frequency-based targeting can perform comparably to
CLV-based optimization. This suggests that the value of CLV 4.0 emerges primarily
under heterogeneous treatment effects.
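
To illustrate the contrast between homogeneous and heterogeneous effects, the sketch below ranks four hypothetical customers by incremental value under both assumptions. All CLV and uplift values are invented for illustration, and `rank_by_incremental` is a helper introduced here, not a function from the notebook.

```python
# Hypothetical expected CLV per customer (illustrative values only).
expected_clv = {"A": 1000.0, "B": 600.0, "C": 300.0, "D": 100.0}


def rank_by_incremental(clv, uplift):
    """Rank customers by incremental value = uplift * expected CLV, highest first."""
    incremental = {c: uplift[c] * v for c, v in clv.items()}
    return sorted(incremental, key=incremental.get, reverse=True)


# Homogeneous case: every customer responds with the same 15% uplift.
uniform = {c: 0.15 for c in expected_clv}
# Heterogeneous case: assumed uplifts that fall as CLV rises.
varied = {"A": 0.02, "B": 0.10, "C": 0.30, "D": 0.40}

clv_ranking = sorted(expected_clv, key=expected_clv.get, reverse=True)

print(rank_by_incremental(expected_clv, uniform) == clv_ranking)  # prints True
print(rank_by_incremental(expected_clv, varied))  # prints ['C', 'B', 'D', 'A']
```

Under the uniform uplift the incremental ranking is identical to the CLV ranking, so an uplift model adds no ordering information; only the heterogeneous case changes who should be targeted first.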


In [93]:
# STEP 8.4 — Failure Mode 3
# Selection Bias in Targeted Interventions

Targeted interventions introduce selection bias in observed outcomes, as treated
customers differ systematically from untreated ones. Without counterfactual
modeling, naive post-treatment analysis can significantly overestimate CLV gains.
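
A small deterministic example can make the bias concrete. The sketch below builds a hypothetical population in which only high-value customers are treated and the true uplift is assumed to be 10% of baseline value; every number is synthetic and chosen for illustration only.

```python
from statistics import mean

# Hypothetical population: baseline value rises with customer index.
baseline = [100.0 + 10.0 * i for i in range(100)]
true_uplift = 0.10  # assumed: treatment adds 10% of baseline value

# Targeted policy: treat only the top half by baseline value.
cutoff = sorted(baseline)[len(baseline) // 2]
treated = [b >= cutoff for b in baseline]

# Observed outcomes: treated customers receive the uplift, others do not.
observed = [b * (1 + true_uplift) if t else b for b, t in zip(baseline, treated)]

# Naive estimate: difference in mean observed outcome between the two groups.
naive_gain = mean(o for o, t in zip(observed, treated) if t) - mean(
    o for o, t in zip(observed, treated) if not t
)
# True average gain among the treated, known here because the data are synthetic.
true_gain = mean(b * true_uplift for b, t in zip(baseline, treated) if t)

print(f"naive: {naive_gain:.1f}, true: {true_gain:.1f}")
```

The naive comparison attributes the pre-existing gap between high-value and low-value customers to the treatment, so it overstates the gain several times over in this toy setup.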


In [94]:
# PART B — STABILITY & ROBUSTNESS

In [95]:
# STEP 8.5 — Ranking Stability Over Time

In [96]:
# Profile the current top decile of customers by expected CLV.
n_top = int(0.1 * len(clv_df))
top_10pct = clv_df.sort_values("expected_clv", ascending=False).head(n_top)
top_10pct.describe()


        Customer ID   expected_clv  uniform_incremental
count         588.0          588.0                588.0
mean   14993.557823   27562.280755          4134.342113
std     1737.822393   34423.414073          5163.512111
min         12346.0   11795.350519          1769.302578
25%         13379.0   14213.591743          2132.038761
50%         14945.0   17945.120966          2691.768145
75%        16339.75   26200.760385          3930.114058
max         18260.0  401760.565363         60264.084804
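
The cell above profiles a single snapshot of the top decile. One way to quantify ranking stability over time, sketched here under the assumption that CLV scores are available for two periods, is the Jaccard overlap between the two top-decile sets. The snapshots and the helpers `top_fraction` and `top_overlap` are hypothetical and not part of the notebook.

```python
def top_fraction(scores, fraction=0.10):
    """Return the set of IDs in the top `fraction` of customers by score."""
    k = max(1, int(fraction * len(scores)))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def top_overlap(scores_a, scores_b, fraction=0.10):
    """Jaccard overlap between the top-`fraction` sets of two score snapshots."""
    top_a = top_fraction(scores_a, fraction)
    top_b = top_fraction(scores_b, fraction)
    return len(top_a & top_b) / len(top_a | top_b)


# Hypothetical CLV snapshots for 20 customers at two points in time.
period_1 = {cid: float(100 - cid) for cid in range(20)}
period_2 = dict(period_1)
period_2[1] = 0.0    # customer 1 drops out of the top decile
period_2[5] = 500.0  # customer 5 jumps to the top

print(round(top_overlap(period_1, period_2), 3))  # prints 0.333
```

An overlap near 1 would indicate a stable top decile; values well below 1 suggest that targeting lists built from one period may not carry over to the next.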


In [97]:
# PART C — FINAL RESEARCH WRITE-UP

In [98]:
# STEP 8.6 — Research Contributions

### Research Contributions

1. We propose a decision-centric CLV framework that models customer value as a
   function of survival dynamics, expected value, and business actions.

2. We demonstrate empirically that optimizing expected CLV does not guarantee
   optimal decisions under budget constraints, motivating counterfactual uplift
   modeling.

3. We show that CLV estimates are highly sensitive to survival, discounting, and
   horizon assumptions, and that the benefit of CLV-based targeting depends on
   treatment heterogeneity, providing practical guidance for real-world deployment.
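
The second contribution can be illustrated with a toy budget problem. The sketch below compares greedy selection by expected CLV against greedy selection by incremental gain per unit cost. The customers, gains, costs, and budget are all hypothetical, and `select` is a helper introduced here rather than the decision procedure used in earlier phases.

```python
# Hypothetical customers: (expected CLV, assumed incremental gain if treated, cost).
customers = {
    "A": (1000.0, 10.0, 50.0),
    "B": (800.0, 20.0, 50.0),
    "C": (300.0, 90.0, 50.0),
    "D": (200.0, 70.0, 50.0),
}
budget = 100.0


def select(order):
    """Greedily treat customers in the given order while budget remains."""
    spent, gain, chosen = 0.0, 0.0, []
    for cid in order:
        _, inc, cost = customers[cid]
        if spent + cost <= budget:
            spent += cost
            gain += inc
            chosen.append(cid)
    return chosen, gain


by_clv = sorted(customers, key=lambda c: customers[c][0], reverse=True)
by_gain_per_cost = sorted(
    customers, key=lambda c: customers[c][1] / customers[c][2], reverse=True
)

print(select(by_clv))            # prints (['A', 'B'], 30.0)
print(select(by_gain_per_cost))  # prints (['C', 'D'], 160.0)
```

With the same budget, targeting by expected CLV captures far less incremental value than targeting by gain per cost in this example, because the highest-CLV customers are assumed to respond least to treatment.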


In [99]:
# STEP 8.7 — Limitations

### Limitations

This study relies on synthetic treatment effects due to the absence of real
intervention data. While this allows controlled evaluation, real-world uplift
may exhibit additional complexities. Furthermore, the hazard and value models
assume stationarity within evaluation windows, which may not hold under extreme
behavioral regime shifts.

