<a href="https://colab.research.google.com/github/julie-dfx/causal-decision-analytics/blob/main/00_reboot_05_heterogeneous_effects.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Heterogeneous Effects: Compensation and Retention

## Core question
Does the effect of compensation on retention vary across customer or order characteristics?

## Why this matters
Average effects can hide important differences between groups, but subgroup analysis introduces additional identification and inference risks.

## Results

- Heterogeneous effects were estimated using interaction terms in a single model, which avoids repeated subgroup conditioning and reduces false discovery risk.
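
For reference, the interaction specification can be written as the following equation. The coefficient symbols $\beta_0, \dots, \beta_4$ and the error term $\varepsilon_i$ are notation introduced here for illustration; the variable names follow the columns used in the notebook.

$$
\text{retention}_i = \beta_0 + \beta_1\,\text{comp}_i + \beta_2\,\text{severity}_i + \beta_3\,\text{high\_value}_i + \beta_4\,(\text{comp}_i \times \text{high\_value}_i) + \varepsilon_i
$$

Under this specification, the estimated effect of compensation is $\beta_1$ for low-value customers and $\beta_1 + \beta_4$ for high-value customers, so $\beta_4$ directly measures the difference between the two groups.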



## Limitations

- Interpretation of heterogeneous effects assumes that identification holds within each subgroup and that subgroup definitions are pre-treatment (fixed before compensation is decided).
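
To illustrate what "identification holds within subgroups" means here, the minimal sketch below generates data with the same structure as this notebook's simulation and fits the interaction model twice with `numpy.linalg.lstsq`: once without the unobserved `sentiment` variable, and once with it (an oracle fit that is only possible in a simulation). The seed, the `fit` helper, and the use of `default_rng` are assumptions for illustration, so the numbers will not match the notebook's output exactly.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, not the notebook's
n = 6000

# Same structure as the notebook's simulation
high_value = rng.binomial(1, 0.4, n)
severity = rng.normal(0, 1, n)
sentiment = rng.normal(0, 1, n)  # unobserved in practice
comp = ((1.2 * severity + 0.8 * sentiment + rng.normal(0, 1, n)) > 0).astype(int)
true_effect = 2.5 * high_value + 0.5 * (1 - high_value)
retention = true_effect * comp - 2.0 * severity + sentiment + rng.normal(0, 1, n)


def fit(columns):
    """Least-squares fit of retention on a constant plus the given columns."""
    X = np.column_stack([np.ones(n)] + columns)
    beta, *_ = np.linalg.lstsq(X, retention, rcond=None)
    return beta


# Column order after the constant: comp, severity, high_value, interaction
base = [comp, severity, high_value, comp * high_value]
without_sentiment = fit(base)
with_sentiment = fit(base + [sentiment])  # oracle: controls for the confounder

print("comp coefficient, sentiment omitted: ", round(without_sentiment[1], 2))
print("comp coefficient, sentiment included:", round(with_sentiment[1], 2))
print("interaction, sentiment omitted:      ", round(without_sentiment[4], 2))
print("interaction, sentiment included:     ", round(with_sentiment[4], 2))
```

If the oracle fit recovers a `comp` coefficient near 0.5 while the fit without `sentiment` does not, the gap is the confounding bias that the limitation above warns about.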

In [1]:
# Simulate data with true heterogeneity

import numpy as np
import pandas as pd
import statsmodels.api as sm

np.random.seed(8)
n = 6000

# Segment variable (observed)
high_value = np.random.binomial(1, 0.4, n)

# Severity (observed)
severity = np.random.normal(0, 1, n)

# Unobserved sentiment
sentiment = np.random.normal(0, 1, n)

# Compensation decision (depends on severity and on unobserved sentiment)
comp = (
    1.2 * severity
    + 0.8 * sentiment
    + np.random.normal(0, 1, n) # noise
) > 0
comp = comp.astype(int)

# True heterogeneous effect: 2.5 for high-value customers, 0.5 otherwise
true_effect = 2.5 * high_value + 0.5 * (1 - high_value)

# Retention outcome
retention = (
    true_effect * comp
    - 2.0 * severity
    + sentiment
    + np.random.normal(0, 1, n) # noise
)

df = pd.DataFrame({
    "retention": retention,
    "high_value": high_value,
    "severity": severity,
    "comp": comp
})

# Ground truth:
# - compensation helps high-value customers much more (2.5 vs 0.5)
# - the average effect hides this difference
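
As a small worked example of how the average hides the difference, the sketch below computes the population-average effect implied by the simulation parameters (segment share 0.4, effects 2.5 and 0.5). The variable names are introduced here for illustration only.

```python
# Parameters taken from the simulation cell above
p_high = 0.4        # share of high-value customers
effect_high = 2.5   # true effect of compensation for high-value customers
effect_low = 0.5    # true effect of compensation for low-value customers

# Population-average effect is the share-weighted mean of the segment effects
average_effect = p_high * effect_high + (1 - p_high) * effect_low

print(f"Average effect: {average_effect:.2f}")
print(f"High-value effect is {effect_high / effect_low:.0f}x the low-value effect")
```

A single average of about 1.3 describes neither segment well: it overstates the effect for low-value customers and understates it for high-value customers.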

In [3]:
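# Naive approach (don't trust): a separate regression for each subgroup

for v in [0, 1]:
    subset = df[df["high_value"] == v]
    res = sm.OLS(
        subset["retention"],
        sm.add_constant(subset[["comp", "severity"]])
    ).fit()
    print(f"high_value={v}", res.params)

# Looks reasonable but is fragile: every extra subgroup split is another
# comparison, and split regressions give no direct test of the difference.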

high_value=0 const      -0.571778
comp        1.526292
severity   -2.301062
dtype: float64
high_value=1 const      -0.457211
comp        3.453030
severity   -2.246760
dtype: float64
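
One reason repeated subgroup conditioning is fragile is the multiple-comparisons problem. The sketch below computes the chance of at least one false positive across `k` subgroup tests, under the simplifying assumptions that the tests are independent, each uses a 5% significance level, and no true heterogeneity exists. The function name `familywise_error_rate` is hypothetical.

```python
# Probability of at least one false positive across k independent tests,
# each run at significance level alpha, when there is no true heterogeneity.
ALPHA = 0.05


def familywise_error_rate(k, alpha=ALPHA):
    return 1 - (1 - alpha) ** k


for k in [1, 2, 5, 10, 20]:
    rate = familywise_error_rate(k)
    print(f"{k:>2} subgroup tests -> P(at least one false positive) = {rate:.3f}")
```

With 20 independent subgroup tests, the chance of at least one spurious "significant" difference is roughly 64%, which is why a single pre-specified interaction model is the safer baseline.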


In [4]:
# Interaction model (preferred baseline)

df["interaction"] = df["high_value"] * df["comp"]

het_res = sm.OLS(
    df["retention"],
    sm.add_constant(df[["comp", "severity", "high_value", "interaction"]])
).fit()

het_res.params

# In these results, the coefficient on comp is the estimated effect for low-value
# customers (about 1.5); the coefficient on interaction is the additional effect
# for high-value customers (about +2.0).
# This is our baseline heterogeneous estimate.

const         -0.559114
comp           1.501646
severity      -2.279446
high_value     0.083257
interaction    1.987315
dtype: float64
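
To make the reading of the coefficients concrete, the sketch below combines the estimates from the output above into per-segment effects and compares them with the true effects used in the simulation (0.5 and 2.5). The coefficient values are copied from the output; the variable names are introduced here for illustration.

```python
# Coefficients copied from the interaction-model output above
coef_comp = 1.501646         # estimated effect for low-value customers
coef_interaction = 1.987315  # additional effect for high-value customers

effect_low = coef_comp
effect_high = coef_comp + coef_interaction

# True effects from the simulation's data-generating process
true_low, true_high = 0.5, 2.5

print(f"low-value:  estimated {effect_low:.2f}, true {true_low:.2f}")
print(f"high-value: estimated {effect_high:.2f}, true {true_high:.2f}")
print(f"difference: estimated {coef_interaction:.2f}, true {true_high - true_low:.2f}")
```

The estimated difference between segments is close to the true difference of 2.0, while both estimated levels exceed the true effects by about 1.0. This is consistent with the unobserved `sentiment` variable shifting both subgroup estimates by a similar amount.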


## Results
Compensation has a positive estimated effect on retention, with a substantially larger effect for high-value customers (about 3.5, versus about 1.5 for low-value customers). Heterogeneity is estimated using an interaction model to avoid repeated subgroup conditioning.