
# 📈 Simulating an Uplift Modeling Experiment for Subscription Renewal

This notebook simulates a randomized controlled experiment for uplift modeling in the context of a **subscription renewal offer**.

- Users are randomly assigned to **treatment** (renewal discount) or **control**.
- We observe whether each user **renews** their subscription.
- We simulate **heterogeneous treatment effects** based on user features.
- We estimate uplift using S-, T-, and X-learners.
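
For reference, the quantity all three learners target can be written out as below. The notation ($\tau$, $\mu$, $e$) is introduced here for illustration and does not appear in the notebook's code: $Y$ is the renewal indicator, $T$ the treatment flag, $X$ the user features, and $e(x)$ the probability of treatment. Because assignment is randomized, the difference in conditional renewal rates can be read as the causal effect of the offer.

$$
\begin{aligned}
\tau(x) &= \mathbb{E}[Y \mid X = x,\, T = 1] - \mathbb{E}[Y \mid X = x,\, T = 0] \\
\hat{\tau}_S(x) &= \hat{\mu}(x, 1) - \hat{\mu}(x, 0) && \text{(one model with } T \text{ as a feature)} \\
\hat{\tau}_T(x) &= \hat{\mu}_1(x) - \hat{\mu}_0(x) && \text{(one model per arm)} \\
\hat{\tau}_X(x) &= e(x)\,\hat{\tau}_0(x) + \bigl(1 - e(x)\bigr)\,\hat{\tau}_1(x) && \text{(blend of two imputed-effect models)}
\end{aligned}
$$

In the X-learner line, $\hat{\tau}_1$ is fit on the treated users' imputed effects $Y - \hat{\mu}_0(X)$ and $\hat{\tau}_0$ on the control users' imputed effects $\hat{\mu}_1(X) - Y$.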


In [None]:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from econml.metalearners import SLearner, TLearner, XLearner

np.random.seed(42)


In [None]:

# Simulate user features
n = 10000
X = np.random.normal(0, 1, (n, 5))
df = pd.DataFrame(X, columns=[f"x{i}" for i in range(1, 6)])

# Simulate treatment assignment (randomized experiment)
df['treatment'] = np.random.binomial(1, 0.3, size=n)

# Simulate baseline renewal probability and a heterogeneous treatment effect
# Baseline: 0.4, +0.1 if x1 > 0, -0.1 if x2 < 0
# Effect of the offer: +0.15 if x3 > 0, -0.05 if x4 < 0 (x5 is pure noise)
base_renewal_prob = 0.4 + 0.1 * (df['x1'] > 0) - 0.1 * (df['x2'] < 0)
treatment_effect = 0.15 * (df['x3'] > 0) - 0.05 * (df['x4'] < 0)

# Simulate outcome
p_renewal = base_renewal_prob + df['treatment'] * treatment_effect
df['renewed'] = np.random.binomial(1, p_renewal.clip(0, 1))

df.head()
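
As a quick sanity check on the simulation, the average treatment effect implied by the formulas above is 0.15 × 0.5 − 0.05 × 0.5 = 0.05, since each sign indicator is true for about half of the users. The standalone sketch below regenerates data with the same formulas and compares that value with a plain difference in renewal rates. Its random generator, seed, and larger sample size are assumptions made here to keep the check stable; they are not the notebook's values.

```python
import numpy as np

# Standalone sketch: own RNG and sample size (assumptions, not the notebook's values)
rng = np.random.default_rng(0)
n = 200_000
X = rng.normal(0, 1, (n, 5))
T = rng.binomial(1, 0.3, size=n)

# Same outcome model as the simulation cell (x1..x5 are columns 0..4)
base = 0.4 + 0.1 * (X[:, 0] > 0) - 0.1 * (X[:, 1] < 0)
tau_true = 0.15 * (X[:, 2] > 0) - 0.05 * (X[:, 3] < 0)
Y = rng.binomial(1, base + T * tau_true)

# Under randomization, the difference in renewal rates estimates the ATE
ate_hat = Y[T == 1].mean() - Y[T == 0].mean()
ate_true = tau_true.mean()
print(f"true ATE: {ate_true:.3f}, difference in means: {ate_hat:.3f}")
```

If the two numbers disagree by much more than sampling noise, the outcome formula or the treatment assignment is the first place to look.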


In [None]:

# Prepare data
X = df[[f"x{i}" for i in range(1, 6)]].values
T = df['treatment'].values
Y = df['renewed'].values


## Train and Compare Metalearners

In [None]:

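# Use gradient boosting regressors as base models. EconML's metalearners call
# .predict() on the base models, so a classifier would return hard 0/1 labels;
# a regressor fit on the binary outcome estimates the renewal probability instead.
from sklearn.ensemble import GradientBoostingRegressor

base_model = GradientBoostingRegressor(n_estimators=100, max_depth=3)

s_learner = SLearner(overall_model=base_model)
t_learner = TLearner(models=GradientBoostingRegressor())
x_learner = XLearner(models=GradientBoostingRegressor())

# Fit (X is passed by keyword, as recent EconML releases require)
s_learner.fit(Y, T, X=X)
t_learner.fit(Y, T, X=X)
x_learner.fit(Y, T, X=X)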

# Predict uplift (in-sample: for the same users the models were fit on)
tau_s = s_learner.effect(X)
tau_t = t_learner.effect(X)
tau_x = x_learner.effect(X)
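
Because the data are simulated, the true per-user effect is known, so any estimator can be scored against it. The sketch below is a standalone illustration, not EconML's implementation: it builds a T-learner by hand with two least-squares fits on sign-indicator features (a choice made here because the simulated effect depends only on feature signs), then reports the RMSE against the true effect. The names `fit_linear`, `F`, and `tau_hat`, along with the seed and sample size, are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
X = rng.normal(0, 1, (n, 5))
T = rng.binomial(1, 0.3, size=n)

# Same outcome model as the simulation cell
base = 0.4 + 0.1 * (X[:, 0] > 0) - 0.1 * (X[:, 1] < 0)
tau_true = 0.15 * (X[:, 2] > 0) - 0.05 * (X[:, 3] < 0)
Y = rng.binomial(1, base + T * tau_true).astype(float)

# Features: intercept plus one sign indicator per column
F = np.column_stack([np.ones(n), X > 0])

def fit_linear(features, outcome):
    """Ordinary least squares coefficients for a linear probability model."""
    return np.linalg.lstsq(features, outcome, rcond=None)[0]

# T-learner: one outcome model per arm, effect = difference of predictions
beta_treated = fit_linear(F[T == 1], Y[T == 1])
beta_control = fit_linear(F[T == 0], Y[T == 0])
tau_hat = F @ (beta_treated - beta_control)

rmse = np.sqrt(np.mean((tau_hat - tau_true) ** 2))
corr = np.corrcoef(tau_hat, tau_true)[0, 1]
print(f"RMSE vs. true effect: {rmse:.4f}, correlation: {corr:.3f}")
```

The same RMSE computation could be applied to `tau_s`, `tau_t`, and `tau_x`, using `treatment_effect` from the simulation cell as the reference.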


In [None]:

# Plot distribution of estimated uplift
plt.figure(figsize=(8,5))
plt.hist(tau_s, bins=30, alpha=0.5, label="S-Learner")
plt.hist(tau_t, bins=30, alpha=0.5, label="T-Learner")
plt.hist(tau_x, bins=30, alpha=0.5, label="X-Learner")
plt.title("Estimated Uplift Distribution")
plt.xlabel("Estimated Treatment Effect")
plt.ylabel("Number of Users")
plt.legend()
plt.show()


## Summary

In [None]:

# Show top users to target (highest estimated uplift)
df['tau_x'] = tau_x
df.sort_values(by='tau_x', ascending=False).head(10)
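
A ranked list is only useful if the users at the top really respond more to the offer. One way to check this, sketched below under stated assumptions, is to compare the difference in renewal rates among the highest-scored users with the difference across everyone. The helper `uplift_in_top_fraction` is hypothetical, and the score used here is the true effect plus noise, a stand-in for a model's estimate so that the example runs on its own.

```python
import numpy as np

def uplift_in_top_fraction(score, T, Y, frac):
    """Treated-minus-control renewal rate among the top `frac` of users by score."""
    k = int(len(score) * frac)
    top = np.argsort(-score)[:k]          # indices of the k highest scores
    t, y = T[top], Y[top]
    return y[t == 1].mean() - y[t == 0].mean()

rng = np.random.default_rng(0)
n = 100_000
X = rng.normal(0, 1, (n, 5))
T = rng.binomial(1, 0.3, size=n)

# Same outcome model as the simulation cell
base = 0.4 + 0.1 * (X[:, 0] > 0) - 0.1 * (X[:, 1] < 0)
tau_true = 0.15 * (X[:, 2] > 0) - 0.05 * (X[:, 3] < 0)
Y = rng.binomial(1, base + T * tau_true)

# Stand-in for an estimated uplift score (assumption: true effect plus noise)
score = tau_true + rng.normal(0, 0.05, size=n)

top_uplift = uplift_in_top_fraction(score, T, Y, 0.2)
overall_uplift = uplift_in_top_fraction(score, T, Y, 1.0)
print(f"top 20%: {top_uplift:.3f}, all users: {overall_uplift:.3f}")
```

In the notebook itself, `df['tau_x']` could take the place of `score`, with `df['treatment']` and `df['renewed']` as `T` and `Y`.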
