<a href="https://colab.research.google.com/github/julie-dfx/causal-decision-analytics/blob/main/00_reboot_05_heterogeneous_effects.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Heterogeneous Effects: Compensation and Retention

## Core question
Does the effect of compensation on retention vary across customer or order characteristics?

## Why this matters
Average effects can hide important differences between customer segments, but subgroup analysis introduces additional identification and inference risks.

## Results

- Heterogeneous effects were estimated with interaction terms rather than by repeatedly conditioning on subgroups, which keeps a single model and reduces false-discovery risk.
- Predictive targeting, based on expected outcomes, differs from uplift-based targeting, which focuses on the incremental response to treatment. Interaction models provide a simple, interpretable approximation of uplift under causal assumptions.

- Targeting decisions require estimating incremental effects rather than predicting outcomes; uplift-based approaches differ fundamentally from predictive models.

## Limitations

- Interpreting heterogeneous effects assumes that identification holds within each subgroup and that subgroup definitions are pre-treatment.

- Uplift estimates from observational data rely on strong assumptions and should be validated through experimentation before deployment.

In [None]:
# Simulate data with true heterogeneity

import numpy as np
import pandas as pd
import statsmodels.api as sm

np.random.seed(8)
n = 6000

# Segment variable (observed)
high_value = np.random.binomial(1, 0.4, n)

# Severity (observed)
severity = np.random.normal(0, 1, n)

# Unobserved sentiment
sentiment = np.random.normal(0, 1, n)

# Compensation decision: driven by severity and by unobserved sentiment (confounding)
comp = (
    1.2 * severity
    + 0.8 * sentiment
    + np.random.normal(0, 1, n) # noise
) > 0
comp = comp.astype(int)

# True heterogeneous effect: 2.5 for high-value customers, 0.5 for everyone else
true_effect = 2.5 * high_value + 0.5 * (1 - high_value)

# Retention outcome (sentiment raises both compensation and retention)
retention = (
    true_effect * comp
    - 2.0 * severity
    + sentiment
    + np.random.normal(0, 1, n) # noise
)

df = pd.DataFrame({
    "retention": retention,
    "high_value": high_value,
    "severity": severity,
    "comp": comp
})

# Truth:
## compensation helps high-value customers much more (2.5 vs 0.5)
## the population-average effect (about 1.3) hides this

In [None]:
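# Naive approach: fit a separate regression within each segment (don't trust)

for v in [0, 1]:
    sub = df[df["high_value"] == v]
    res = sm.OLS(
        sub["retention"],
        sm.add_constant(sub[["comp", "severity"]])
    ).fit()
    print(f"high_value={v}", res.params)

# looks reasonable, but it is fragile: each fit sees only part of the sample, every coefficient
# (including the severity adjustment) is free to differ by segment, and each extra split adds another test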

high_value=0 const      -0.571778
comp        1.526292
severity   -2.301062
dtype: float64
high_value=1 const      -0.457211
comp        3.453030
severity   -2.246760
dtype: float64


In [None]:
# Interaction model (preferred baseline)

df["interaction"] = df["high_value"] * df["comp"]

het_res = sm.OLS(
    df["retention"],
    sm.add_constant(df[["comp", "severity", "high_value", "interaction"]])
).fit()

het_res.params

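# comp is the effect for low-value customers (about 1.5); interaction is the additional effect for high-value customers (about +2.0)
# caveat: the simulated true effect for low-value customers is 0.5, so comp is biased upward by roughly 1.0 because sentiment is unobserved;
# the interaction lands near the true difference of 2.0 only because compensation is assigned the same way in both segments, so the bias cancels
# this is our baseline heterogeneous estimate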

const         -0.559114
comp           1.501646
severity      -2.279446
high_value     0.083257
interaction    1.987315
dtype: float64


## Results
Compensation has a positive effect on retention, with a substantially larger effect for high-value customers. Heterogeneity is estimated with an interaction model to avoid repeated subgroup conditioning.

# Targeting vs. average effects

The goal is not to maximise outcomes among treated users, but to maximise the incremental effect caused by treatment. This requires estimating uplift rather than predicting outcomes.

Key reframing:
- Prediction: who will have a high outcome (Y)?
- Targeting: who will gain the most from treatment?
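The reframing can be written in potential-outcomes notation (notation introduced here, not in the original): prediction targets the expected outcome given features $X$, while targeting needs the conditional average treatment effect for treatment $T$, which equals a difference of observable conditional means only if treatment is as good as random given $X$.

$$
\begin{aligned}
\text{Prediction:}\quad & \mu(x) = \mathbb{E}[\,Y \mid X = x\,] \\
\text{Uplift:}\quad & \tau(x) = \mathbb{E}[\,Y(1) - Y(0) \mid X = x\,] \\
\text{if } \big(Y(0), Y(1)\big) \perp T \mid X:\quad & \tau(x) = \mathbb{E}[\,Y \mid T = 1, X = x\,] - \mathbb{E}[\,Y \mid T = 0, X = x\,]
\end{aligned}
$$

In the simulation that follows, $\tau(x) = 2.5$ for high-value customers and $0.5$ otherwise regardless of severity, while the notebook's `predicted_retention` falls as severity rises, so the two rankings can disagree. The identifying condition fails there because sentiment is unobserved.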

In [10]:
# simulate a world where prediction fails
import numpy as np
import pandas as pd
import statsmodels.api as sm

np.random.seed(9)
n = 6000


# Segment variable (observed)
high_value = np.random.binomial(1, 0.4, n)

# Severity (observed)
severity = np.random.normal(0, 1, n)

# Unobserved sentiment
sentiment = np.random.normal(0, 1, n)

# Compensation decision = treatment assignment (confounded by unobserved sentiment)
comp = (
    1.2 * severity
    + 0.8 * sentiment
    + np.random.normal(0, 1, n) # noise
) > 0
comp = comp.astype(int)

# Baseline outcome without treatment: high-value customers retain better
baseline_retention = (
    3.0 * high_value
    - 2.0 * severity
    + sentiment
)

# True uplift: effect of compensation, larger for high-value customers
uplift = 2.5 * high_value + 0.5 * (1 - high_value)

# Observed outcome
retention = (
    baseline_retention
    + uplift * comp
    + np.random.normal(0, 1, n) # noise
)

df = pd.DataFrame({
    "retention": retention,
    "high_value": high_value,
    "severity": severity,
    "comp": comp,
    "baseline_retention": baseline_retention,
    "uplift": uplift
})

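# Truth:
## high-value customers already retain well (baseline +3.0)
## they also benefit more from compensation (uplift 2.5 vs 0.5)
## severity lowers baseline retention but does not change uplift, so prediction and uplift are not the same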

In [11]:
# Naive targeting
## ranks customers by how good they look, not by how much they would change if treated
## it answers the wrong question: "who is good?" instead of "who benefits from the treatment?"

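# stand-in for a fitted outcome model: true baseline coefficients on the observed features (sentiment is unobserved)
df["predicted_retention"] = (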
    3.0 * df["high_value"]
    - 2.0 * df["severity"]
)

df.sort_values("predicted_retention", ascending=False).head()

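# these users already retain well and may not even need treatment
# the top rows happen to be high-value (uplift 2.5), but further down the ranking low-severity, low-value customers (uplift 0.5)
# outrank high-severity, high-value customers (uplift 2.5): ranking by predicted outcome is inefficient targeting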

| | retention | high_value | severity | comp | baseline_retention | uplift | predicted_retention |
|---|---|---|---|---|---|---|---|
| 1309 | 11.717981 | 1 | -3.630473 | 0 | 11.271441 | 2.5 | 10.260947 |
| 2648 | 10.368366 | 1 | -3.551892 | 0 | 10.149091 | 2.5 | 10.103783 |
| 4420 | 9.512876 | 1 | -3.270712 | 0 | 9.025514 | 2.5 | 9.541425 |
| 532 | 9.362632 | 1 | -3.02515 | 0 | 9.75558 | 2.5 | 9.050301 |
| 2847 | 6.415346 | 1 | -2.875031 | 0 | 7.311555 | 2.5 | 8.750061 |


In [12]:
# True uplift-based targeting

df.sort_values("uplift", ascending=False).head()

# these are the users whose behaviour changes most because of treatment --> that is what we wish to approximate
# true uplift is only known in a simulation; the order among the many ties at 2.5 is arbitrary


| | retention | high_value | severity | comp | baseline_retention | uplift | predicted_retention |
|---|---|---|---|---|---|---|---|
| 5968 | 3.665467 | 1 | 0.806026 | 1 | 1.17383 | 2.5 | 1.387948 |
| 5969 | 10.418067 | 1 | -1.151086 | 1 | 6.188287 | 2.5 | 5.302173 |
| 5970 | 6.590679 | 1 | 0.192801 | 1 | 4.361886 | 2.5 | 2.614398 |
| 5971 | 5.992872 | 1 | -1.77411 | 0 | 5.268965 | 2.5 | 6.54822 |
| 5974 | 3.728973 | 1 | -0.49624 | 0 | 2.813452 | 2.5 | 3.99248 |
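A rough way to quantify the gap between the two rankings, assuming a hypothetical budget that covers the top 20% of customers: compare the average true uplift captured by each policy. This only works in simulation, where `uplift` is known; the seed and draw order match the simulation cell above, so `high_value` and `severity` are identical to the notebook's.

```python
import numpy as np
import pandas as pd

# Re-create the first draws of the "prediction fails" simulation above
# (same seed and draw order, so high_value and severity match the notebook)
np.random.seed(9)
n = 6000
high_value = np.random.binomial(1, 0.4, n)
severity = np.random.normal(0, 1, n)
uplift = 2.5 * high_value + 0.5 * (1 - high_value)

df = pd.DataFrame({"high_value": high_value, "severity": severity, "uplift": uplift})
df["predicted_retention"] = 3.0 * df["high_value"] - 2.0 * df["severity"]

# Hypothetical budget: compensate the top 20% of customers under each ranking
k = n // 5
by_prediction = df.nlargest(k, "predicted_retention")
by_uplift = df.nlargest(k, "uplift")

# Average true uplift per compensated customer under each policy
print(f"rank by predicted retention: mean uplift {by_prediction['uplift'].mean():.2f}, "
      f"low-value share {1 - by_prediction['high_value'].mean():.1%}")
print(f"rank by true uplift:         mean uplift {by_uplift['uplift'].mean():.2f}")
```

The prediction-based list spends part of the budget on low-severity, low-value customers whose uplift is only 0.5.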


In [14]:
# Practical uplift - proxy via interactions

df["interaction"] = df["high_value"] * df["comp"]

uplift_res = sm.OLS(
    df["retention"],
    sm.add_constant(df[["comp", "severity", "high_value", "interaction"]])
).fit()

uplift_res.params

# why not 2 separate models, one per segment? each would see only its own segment, with its own confounder adjustment
# (severity slope), constant and residual variance, so the two comp coefficients come from different specifications
# and populations, and comparing them needs extra work (no direct estimate or standard error for the difference)

# the interaction model forces: one population, one identifying story, one comparison

# the interaction term splits the data into a 2x2 grid (comp 0/1 x segment 0/1); it is only "on" (equal to 1) for treated high-value customers
# this encodes the question: does treatment behave differently for high-value customers?

# the coefficient on interaction is uplift heterogeneity, not prediction
# it is separate from the comp coefficient (uplift for low-value customers) and from the high_value coefficient,
# which absorbs "VIP customers retain better anyway"; without that main effect, heterogeneity is confounded with baseline retention

# this assumes:
## - linearity
## - the same confounding structure within segments
## - that segmentation is pre-treatment
## - no interaction with unobserved confounders
# so this is a directional uplift signal, not a deployable targeting model

const         -0.520050
comp           1.573484
severity      -2.286075
high_value     2.973809
interaction    1.959257
dtype: float64
