# Conversion Model in Banks

## Understanding the Multiple Filters

You have three main layers that determine whether someone ends up as a “final sign-up”:

### Texting Filter – Who gets contacted?

1. Historically, you have a “propensity to answer” (or “likely to respond”) model. Anyone above a threshold gets texted; others do not.  
2. However, you also maintain a random subset that bypasses your propensity threshold (they get texted at random).  

This yields two subpopulations:  
$$
T = 1 \quad (\text{texted: selected by your old model or drawn from the random group})  
$$
$$
T = 0 \quad (\text{not texted}).  
$$

---

### Response Filter – Who actually responds?

Even among those texted ($T=1$), not everyone answers or applies. We observe “responded” or “applied” as a further selection step.

---

### Bank Acceptance Filter – Who is accepted by the bank?

For those who do apply, the bank runs its own credit model. Only those who pass the bank’s credit threshold get truly onboarded, i.e., show up as “final sign-ups.”

You said you can turn off your credit filter, so we’ll ignore your side’s credit cutoff. But the bank’s filter is not under your control, so it’s effectively the last gate.

Thus, you only observe “final sign-up” for those who:

- Are texted ($T=1$)  
- Decide to respond/apply  
- Pass the bank’s filter  

Everyone else is either unobserved (not texted) or observed with an outcome of “did not sign up,” but that “did not sign up” might be due to not responding or due to being rejected by the bank.
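
The observability pattern above can be made explicit by labeling each record with the stage at which it left the funnel. The snippet below is only a sketch: the function name `funnel_state` and the four state labels are illustrative assumptions, not part of any existing pipeline.

```python
def funnel_state(texted: int, responded: int, accepted: int) -> str:
    """Return the stage at which a lead left the funnel (hypothetical labels)."""
    if not texted:
        # Outcome unobserved: we never learn what this person would have done.
        return "not_texted"
    if not responded:
        # Observed non-sign-up caused by the response filter.
        return "no_response"
    if not accepted:
        # Observed non-sign-up caused by the bank's filter.
        return "bank_rejected"
    # The only path that produces a final sign-up.
    return "signed_up"


# One example lead per possible path: (texted, responded, accepted)
leads = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
states = [funnel_state(*lead) for lead in leads]
print(states)
```

Keeping `no_response` and `bank_rejected` apart matters because both show up as "did not sign up" in the final outcome, yet they are governed by different filters.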

---

## A Conceptual Multi-Stage Modeling Approach

One way to tackle this is to model each stage separately, then combine. In essence:

1. **Stage A:** Model the probability of being texted ($T=1$)  
   This corrects for the fact that your older model (plus the random subset) determined who was texted.

2. **Stage B:** Among those texted, model the probability of responding / applying.

3. **Stage C:** Among those who respond, model the probability of being accepted by the bank.

Finally, you can combine these probabilities to get an overall estimate of $P(\text{final sign-up} \mid X)$ under the historical texting policy:

$$
P(\text{final sign-up} \mid X) = \underbrace{P(\text{Texted} = 1 \mid X)}_{\text{Stage A}} \times \underbrace{P(\text{Respond} = 1 \mid \text{Texted} = 1, X)}_{\text{Stage B}} \times \underbrace{P(\text{BankAccept} = 1 \mid \text{Respond} = 1, X)}_{\text{Stage C}}.
$$

However, in many marketing contexts, you have control over **Stage A** going forward (you can choose to text or not). So your real question might be:

> “If I do text this person, what is the probability they end up a final sign-up?”

Then, the formula simplifies to:

$$
P(\text{final sign-up} \mid \text{Texted} = 1, X) = P(\text{Respond} = 1 \mid \text{Texted} = 1, X) \times P(\text{BankAccept} = 1 \mid \text{Respond} = 1, X).
$$

Where:

- $P(\text{Respond} = 1 \mid \text{Texted} = 1, X)$ (Stage B) is the “likelihood they answer/apply if texted.”  
- $P(\text{BankAccept} = 1 \mid \text{Respond} = 1, X)$ (Stage C) is the “likelihood the bank accepts them if they do apply.”
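
As a quick worked example of the product of the Stage B and Stage C probabilities, suppose a lead has a 40% chance of responding if texted and a 50% chance of being accepted by the bank if they apply. The helper name `p_final_if_texted` and the two input values are made up for illustration.

```python
def p_final_if_texted(p_respond: float, p_accept: float) -> float:
    """Chain Stage B and Stage C: P(final | texted) = P(respond) * P(accept)."""
    for p in (p_respond, p_accept):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_respond * p_accept


# Hypothetical lead: 40% respond if texted, 50% accepted if they apply.
example = p_final_if_texted(0.40, 0.50)
print(round(example, 2))  # 0.2
```

The product can never exceed the smaller of the two stage probabilities, so a lead who is very likely to respond but unlikely to pass the bank's filter still ranks low.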

---

### Accounting for the Random Subset and Non-Random Subset

Because your historical texting was mostly guided by a propensity model, you only see responses for the subset that got texted. This is selection bias.

Fortunately, you also have a random group that was texted purely at random, bypassing your old model. **This random group is crucial** to estimate the true relationship between $X$ and “likelihood of responding,” free from the old model’s bias.

Similarly, for bank acceptance, you only observe acceptance decisions for those who responded, which is a second selection step. Because you do see accept/reject outcomes within that subset, you can approximate the bank’s credit filter from that data.

---

## Causal / Double-Robust Strategies

To correct for these multiple selection steps, you can use a combination of:

- **Inverse Probability Weighting (IPW)**
  - Estimate the probability of each filtering step. Weight or “rebalance” the data so that it reflects the entire population.

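- **Heckman Two-Stage Correction or Doubly Robust (DR) Learners**
  - Heckman's correction is a classic econometric tool for sample selection; DR learners are available in libraries like `econml`.  
  - DR learners model both the outcome and the selection (treatment) process, and remain consistent if either of the two models is correctly specified. This helps with partial observability (some people not texted, some texted but no response, etc.).  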

**Key**: Your random subset helps you build or validate these models, because randomization ensures at least some fraction of every type of $X$ is texted, letting you estimate the true patterns.
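
To make the reweighting idea concrete, here is a minimal sketch of a normalized IPW estimate on a toy population. Everything in it is an assumption for illustration: the function name `naive_and_ipw_rate`, the two segments, and the texting propensities, which are taken as known rather than estimated.

```python
def naive_and_ipw_rate(records):
    """records: (responded, propensity) pairs for TEXTED people only.

    Returns the unweighted response rate and the normalized IPW rate.
    """
    records = list(records)
    naive = sum(r for r, _ in records) / len(records)
    weights = [1.0 / p for _, p in records]
    ipw = sum(w * r for w, (r, _) in zip(weights, records)) / sum(weights)
    return naive, ipw


# Toy population: two equally sized segments of 10 people each.
# Segment A: texting propensity 0.8 -> 8 texted, 4 of them respond.
# Segment B: texting propensity 0.2 -> 2 texted, none respond.
texted_records = [(1, 0.8)] * 4 + [(0, 0.8)] * 4 + [(0, 0.2)] * 2
naive_rate, ipw_rate = naive_and_ipw_rate(texted_records)
print(naive_rate, ipw_rate)
```

In this toy setup the response rate if everyone were texted is 0.25 (half the population at 0.5, half at 0.0). The unweighted rate among texted people is pulled toward the over-texted segment, while the weighted rate gives each under-texted person more influence.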

---

## Detailed Explanation of the Multi-Layer Strategy

Below is how you might implement a three-model approach:

1. **Model the Bank’s Acceptance (Stage C)**  
   - **Data**: Among those who responded, you observe who got accepted vs. rejected by the bank.  
   - **Features**: Credit-related info, demographics, and bureau data if you have it.  
   - Train a classifier $\hat{m}_{\text{accept}}(X)$ that predicts acceptance $\in \{0,1\}$. This approximates the bank’s “proprietary” threshold.

2. **Model the Probability of Responding (Stage B)**  
   - **Data**: Among those who were texted, label “1” if they responded, “0” otherwise.  
   - Because your old texting strategy was not purely random, incorporate the randomly texted subset to correct bias. Specifically:  
     - Either do a propensity-score weighting for “who got texted” (to mimic a random scenario),  
     - Or incorporate an indicator for random vs. non-random into the model.  
   - The result: $\hat{m}_{\text{respond}}(X) = P(\text{Respond}=1 \mid \text{Texted}=1, X)$.

3. **Combine for “If Texted” Probability**  
   $$
   P(\text{final sign-up} \mid \text{Texted}=1, X) = \hat{m}_{\text{respond}}(X) \times \hat{m}_{\text{accept}}(X).
   $$

   This is your key final metric: if I text this person, what’s the chance they end up fully signed up?

4. **Rank or Threshold**  
   Once you have an estimate of $P(\text{final sign-up} \mid \text{Texted}=1, X)$ for each potential lead, you can sort them in descending order and choose how many to text (depending on your budget or capacity).
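
The rank-or-threshold step can be sketched as follows. The function name `select_leads`, the `min_probability` parameter, and the lead identifiers are hypothetical; the scores stand in for the predicted final sign-up probability if texted.

```python
def select_leads(scores: dict, budget: int, min_probability: float = 0.0) -> list:
    """Return up to `budget` lead ids, highest predicted probability first.

    Leads scoring below `min_probability` are dropped before ranking.
    """
    eligible = [lead for lead, p in scores.items() if p >= min_probability]
    ranked = sorted(eligible, key=lambda lead: scores[lead], reverse=True)
    return ranked[:max(budget, 0)]


# Hypothetical predicted P(final sign-up | texted, X) per lead
scores = {"lead_a": 0.12, "lead_b": 0.31, "lead_c": 0.05, "lead_d": 0.22}
chosen = select_leads(scores, budget=2)
print(chosen)
```

A pure budget cut always fills the quota, while a probability floor can leave budget unspent when few leads look promising. Which behavior is preferable depends on the cost of a text relative to the value of a sign-up.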

---

### Where Does Causal “Uplift” Fit In?

If you also want the incremental effect, i.e., the difference in outcome between texting and not texting, then you need a method that models both scenarios:

$$
\tau(X) = P(\text{final}=1 \mid T=1, X) \;-\; P(\text{final}=1 \mid T=0, X).
$$

For $P(\text{final}=1 \mid T=0, X)$, you might have a random “no text” group for comparison.

If your random subset includes both “text” and “no text” for a portion of your population, you can do an uplift model or a DR Learner from `econml` to estimate $\tau(X)$.

However, in practice, many marketing teams focus on “What is the predicted final sign-up probability if texted?” to decide who to contact.
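
When a randomized subset contains both texted and non-texted people, the simplest uplift estimate is a difference in sign-up rates between the two arms. The sketch below computes that average effect for one group of rows; applying it separately per segment gives a coarse approximation of $\tau(X)$. The function name `randomized_uplift` and the sample rows are assumptions for illustration.

```python
def randomized_uplift(rows):
    """rows: (texted, signed_up) pairs drawn from the RANDOMIZED subset only.

    Returns the sign-up rate among texted minus the rate among non-texted.
    """
    rows = list(rows)
    treated = [y for t, y in rows if t == 1]
    control = [y for t, y in rows if t == 0]
    if not treated or not control:
        raise ValueError("need both texted and non-texted rows")
    return sum(treated) / len(treated) - sum(control) / len(control)


# Hypothetical randomized rows: 4 texted (2 sign-ups), 4 not texted (1 sign-up)
rows = [(1, 1), (1, 0), (1, 0), (1, 1), (0, 0), (0, 0), (0, 1), (0, 0)]
uplift = randomized_uplift(rows)
print(uplift)
```

This only works on rows where texting was assigned at random. On model-selected rows the same difference mixes the effect of the text with the old model's selection.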

---

## Putting It All Together in Python

Below is a more detailed code snippet illustrating a two-step approach (respond + accept) while also correcting for non-random texting. It spells out the logic rather than giving a minimal example, so that you can see how each filter is modeled.

**Important**: This is not a copy-paste final solution. It’s a template showing how you might structure a multi-stage approach with Python libraries like scikit-learn. You will adjust data columns, hyperparameters, etc.

---

### Example Data Assumptions

`df` has columns:

- `texted`: 1 if the individual was texted, 0 if not.
- `responded`: 1 if the individual applied or responded, 0 if not (only makes sense if `texted=1`).
- `accepted`: 1 if the bank accepted them, 0 if rejected (only makes sense if `responded=1`).
- `features...`: your input features known before deciding to text.
- A special column `random_group` (1 if in the random texting subset, 0 otherwise) may exist. This can help you correct for your old texting model’s bias.
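
Before modeling, it can help to confirm that the funnel columns respect the nesting described above (nobody responds without being texted, nobody is accepted without responding). The checker below is a sketch under that assumption; the function name `check_funnel_columns` and the message strings are illustrative.

```python
import pandas as pd


def check_funnel_columns(df: pd.DataFrame) -> list:
    """Return a list of human-readable problems; an empty list means the data looks consistent."""
    problems = [f"missing column: {col}"
                for col in ("texted", "responded", "accepted")
                if col not in df.columns]
    if problems:
        return problems
    # A response is only possible after a text.
    if ((df["responded"] == 1) & (df["texted"] == 0)).any():
        problems.append("responded=1 with texted=0")
    # A bank decision is only possible after a response.
    if ((df["accepted"] == 1) & (df["responded"] == 0)).any():
        problems.append("accepted=1 with responded=0")
    return problems


good = pd.DataFrame({"texted": [1, 1, 0], "responded": [1, 0, 0], "accepted": [1, 0, 0]})
bad = pd.DataFrame({"texted": [0], "responded": [1], "accepted": [0]})
print(check_funnel_columns(good))
print(check_funnel_columns(bad))
```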


```python
import numpy as np
import pandas as pd

from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

# For reproducibility
np.random.seed(42)

###############################################################################
# 1. SIMULATE SYNTHETIC DATA
###############################################################################
N = 5000  # number of individuals

# Generate 5 numeric features (X1 through X5)
X1 = np.random.normal(0, 1, N)
X2 = np.random.normal(2, 1.5, N)
X3 = np.random.normal(-1, 2, N)
X4 = np.random.uniform(-2, 2, N)
X5 = np.random.exponential(1, N)

# Create a DataFrame with these features
df = pd.DataFrame({
    'X1': X1,
    'X2': X2,
    'X3': X3,
    'X4': X4,
    'X5': X5
})

###############################################################################
# 1.1 Create a "random_group" indicator (20% of individuals)
###############################################################################
df['random_group'] = (np.random.rand(N) < 0.20).astype(int)

###############################################################################
# 2. SIMULATE TEXTING (Stage A)
# -----------------------------------------------------------------------------
# For individuals NOT in the random group, an "old model" decides who to text.
# For those in the random group, we text them with a fixed 50% probability.
###############################################################################
def old_model_prob(x1, x2):
    # A simple logistic function based on X1 and X2
    return 1 / (1 + np.exp(- (0.5 * x1 + 0.3 * x2)))

# Compute the old-model probability for each individual
old_probs = old_model_prob(df['X1'], df['X2'])

# Decide who is texted:
texted = []
for i in range(N):
    if df.loc[i, 'random_group'] == 1:
        # For random group: 50% chance
        t = (np.random.rand() < 0.50)
    else:
        # Otherwise, follow the old model's probability
        t = (np.random.rand() < old_probs[i])
    texted.append(int(t))

df['texted'] = texted

###############################################################################
# 3. SIMULATE RESPONSE (Stage B)
# -----------------------------------------------------------------------------
# Only those who are texted can respond.
# We simulate the response probability using features X3 and X4.
###############################################################################
def prob_respond(x3, x4):
    return 1 / (1 + np.exp(- (0.4 * x3 - 0.2 * x4)))

# Initialize an array for responses
responded = np.zeros(N, dtype=int)
for i in range(N):
    if df.loc[i, 'texted'] == 1:
        p = prob_respond(df.loc[i, 'X3'], df.loc[i, 'X4'])
        # Convert the boolean outcome to int (1 if responded, 0 otherwise)
        responded[i] = int(np.random.rand() < p)
    else:
        responded[i] = 0  # Cannot respond if not texted

df['responded'] = responded

###############################################################################
# 4. SIMULATE BANK ACCEPTANCE (Stage C)
# -----------------------------------------------------------------------------
# Among those who responded, simulate bank acceptance based on X2 and X5.
###############################################################################
def prob_accept(x2, x5):
    return 1 / (1 + np.exp(- (0.3 * x2 - 0.5 * x5)))

accepted = np.zeros(N, dtype=int)
for i in range(N):
    if df.loc[i, 'responded'] == 1:
        p = prob_accept(df.loc[i, 'X2'], df.loc[i, 'X5'])
        accepted[i] = int(np.random.rand() < p)
    else:
        accepted[i] = 0  # Cannot be accepted if not responded

df['accepted'] = accepted

# Print overall counts for a quick check
print("Number texted:", df['texted'].sum())
print("Number responded:", df['responded'].sum())
print("Number accepted:", df['accepted'].sum())

###############################################################################
# 5. MULTI-STAGE MODELING TO CORRECT FOR SELECTION BIAS
###############################################################################
# Our goal: estimate P(final sign-up | texted, X) = P(respond | texted, X) * P(accept | responded, X)
#
# To correct for the fact that historical texting was non-random,
# we first model the probability of being texted using all data,
# then use IPW to reweight the response model.

# 5.1 Define the feature set (include the random_group indicator)
feature_cols = ['X1', 'X2', 'X3', 'X4', 'X5', 'random_group']
X_all = df[feature_cols]

# ---------------------------------------------------------------------------
# Model the "texted" decision to compute propensity scores.
# This approximates P(texted=1 | X) based on historical data and supplies
# the IPW weights used in step 5.2.
# ---------------------------------------------------------------------------
model_texted = LogisticRegression(max_iter=1000)
model_texted.fit(X_all, df['texted'])
df['propensity_texted'] = model_texted.predict_proba(X_all)[:, 1]

# 5.2 Model "responded" among those texted, using IPW to correct for non-random texting.
df_texted = df[df['texted'] == 1].copy()
X_texted = df_texted[feature_cols]
y_respond = df_texted['responded']

# IPW weights: w = 1 / P(texted=1 | X)
df_texted['weight_ipw'] = 1.0 / df_texted['propensity_texted']

model_respond = RandomForestClassifier(n_estimators=100, random_state=0)
model_respond.fit(X_texted, y_respond, sample_weight=df_texted['weight_ipw'])

def predict_respond_if_texted(X_new):
    return model_respond.predict_proba(X_new)[:, 1]

# 5.3 Model "accepted" among those who responded.
df_responded = df_texted[df_texted['responded'] == 1].copy()
X_responded = df_responded[feature_cols]
y_accepted = df_responded['accepted']

model_accept = RandomForestClassifier(n_estimators=100, random_state=0)
model_accept.fit(X_responded, y_accepted)

def predict_bank_accept(X_new):
    return model_accept.predict_proba(X_new)[:, 1]

# 5.4 Combine the two probabilities:
# Final sign-up probability if texted = P(respond | texted, X) * P(accept | responded, X)
def predict_final_if_texted(X_new):
    p_resp = predict_respond_if_texted(X_new)
    p_acc = predict_bank_accept(X_new)
    return p_resp * p_acc

df['pred_final_if_texted'] = predict_final_if_texted(df[feature_cols])

###############################################################################
# 6. RANK INDIVIDUALS BY FINAL PREDICTION
###############################################################################
df_ranked = df.sort_values('pred_final_if_texted', ascending=False)

# Display the top 10 individuals by predicted final sign-up probability
print("\nTop 10 individuals (texted, responded, accepted, predicted final probability):")
print(df_ranked[['texted','responded','accepted','pred_final_if_texted']].head(10))

###############################################################################
# 7. QUICK EVALUATION: Compare Acceptance Rates in Top vs. Bottom Deciles
# -----------------------------------------------------------------------------
# NOTE: these rates are computed on the same rows the models were trained on,
# and over all individuals (texted or not), so they overstate how well the
# ranking would perform on new data. Treat them as a smoke test only.
###############################################################################
decile = int(0.1 * len(df_ranked))
top_decile = df_ranked.head(decile)
bottom_decile = df_ranked.tail(decile)

actual_conv_top = top_decile['accepted'].mean()
actual_conv_bottom = bottom_decile['accepted'].mean()

print("\nTop decile actual acceptance rate: {:.3f}".format(actual_conv_top))
print("Bottom decile actual acceptance rate: {:.3f}".format(actual_conv_bottom))
```


```text
Number texted: 3040
Number responded: 1278
Number accepted: 678

Top 10 individuals (texted, responded, accepted, predicted final probability):
      texted  responded  accepted  pred_final_if_texted
1475       1          1         1                0.9114
1155       1          1         1                0.9000
145        1          1         1                0.8827
3187       1          1         1                0.8640
4458       1          1         1                0.8633
4828       1          1         1                0.8556
4055       1          1         1                0.8556
770        1          1         1                0.8554
2390       1          1         1                0.8554
4605       1          1         1                0.8550

Top decile actual acceptance rate: 0.996
Bottom decile actual acceptance rate: 0.000
```


### What This Code Does

- **Stage A**: Learns how your old model decided who to text. We then compute each individual’s “propensity to be texted,” which supplies the IPW weights used in Stage B.  
- **Stage B**: Restricts to historically texted people, trains a response model (`model_respond`), and uses IPW to partially correct for the non-random texting.  
- **Stage C**: Among responders, trains a bank acceptance model (`model_accept`), approximating the bank’s filter.

**Combine**: For any new individual, we estimate

$$
P(\text{final} \mid \text{texted}=1, X) = P(\text{respond}=1 \mid \text{texted}=1, X) \times P(\text{accept}=1 \mid \text{respond}=1, X).
$$

We then sort or rank by that final probability to see who is truly most likely to end up signed up if they receive a text.

---

### Why This Corrects the Multiple Filters

- We only observe “responded=1” for `texted=1` people, so **Stage B** is conditioned on “texted=1.” We use IPW to handle that texting was not fully random.  
- We only observe “accepted=1” for `responded=1` people, so **Stage C** is conditioned on “responded=1.” We fit a bank acceptance model on that subset.  

By chaining these models, we produce an estimate of the final sign-up probability for anyone under the scenario “If we text them.”

This approach, in effect, deals with each selection layer. If you had further layers (e.g., your own credit filter, or further user actions), you could add more stages similarly.
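
One caveat when checking whether the chained models work: the top-versus-bottom decile comparison in the code above scores the same rows the models were fit on, so it mostly shows that the random forests memorized their training data. A more informative check applies the same comparison to texted rows that were held out from training. The helper below is a sketch of that comparison only; the name `top_bottom_decile_rates` and the toy arrays are assumptions, and splitting off the held-out rows is left to your own pipeline.

```python
import numpy as np


def top_bottom_decile_rates(y_true, scores):
    """Observed outcome rate in the top and bottom 10% of rows ranked by score.

    Pass only rows that were NOT used to fit the models (and only texted rows,
    since non-texted rows have no observed outcome).
    """
    y_true = np.asarray(y_true, dtype=float)
    scores = np.asarray(scores, dtype=float)
    if len(y_true) != len(scores) or len(y_true) < 10:
        raise ValueError("need at least 10 aligned rows")
    order = np.argsort(-scores, kind="stable")  # highest score first
    k = len(order) // 10
    return float(y_true[order[:k]].mean()), float(y_true[order[-k:]].mean())


# Toy held-out set: 20 rows where higher scores coincide with sign-ups
holdout_scores = np.arange(20) / 20.0
holdout_outcomes = np.array([0] * 10 + [1] * 10)
top_rate, bottom_rate = top_bottom_decile_rates(holdout_outcomes, holdout_scores)
print(top_rate, bottom_rate)
```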

---

### Important Nuances

#### IPW Reliance
In Stage B, we used $1 / \hat{P}(\text{Texted}=1 \mid X)$ as sample weights to account for who historically got texted. If your random subset is large enough, you can fit a reliable propensity model (`texted=1` vs. `texted=0`) that generalizes to the entire population.  
You can refine weighting strategies (e.g., trim or cap extreme weights).
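
A minimal sketch of weight trimming, assuming NumPy is available. The function name `clipped_ipw_weights` and the default floor of 0.05 on the propensity are arbitrary illustrative choices, not recommended values.

```python
import numpy as np


def clipped_ipw_weights(propensity, min_p=0.05, max_weight=None):
    """Inverse-propensity weights with the propensity floored at `min_p`.

    Flooring the propensity caps each weight at 1 / min_p; `max_weight`
    optionally applies a tighter cap on top of that.
    """
    p = np.clip(np.asarray(propensity, dtype=float), min_p, 1.0)
    weights = 1.0 / p
    if max_weight is not None:
        weights = np.minimum(weights, max_weight)
    return weights


# The third person has a near-zero propensity; unclipped, their weight would be 1000.
weights = clipped_ipw_weights([0.5, 0.2, 0.001], min_p=0.05)
print(weights)
```

Clipping trades a small amount of bias for lower variance: a handful of rarely texted profiles no longer dominates the fitted response model.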

#### Bank Acceptance
We assume the bank acceptance model can be approximated from the data of people who responded. You do not see the bank’s internal process, but you observe who got accepted vs. not. This is typically enough to train a decent classifier.  
If the bank changes its rules, you must retrain periodically.

#### Response vs. Application
“Response” might be “clicked link,” “started application,” or something similar. The key is that only a subset of texted individuals respond. That’s your second filter.  
Make sure your labeling is consistent with your actual funnel steps.

#### Random Subset
The random subset is crucial for building an accurate stage B model. It ensures that you observe a variety of $X$-profiles who were texted, even if your old model would have excluded them.  
If your random subset is too small, you might face high variance, but it’s still better than having no randomization.

#### (Optional) One-Stage vs. Multi-Stage
Instead of building separate models for response and acceptance, you could fit a single “final outcome” model. You would still have to correct for the fact that final outcomes are only observed for texted individuals, and a single model cannot tell whether a non-sign-up came from non-response or from bank rejection.  
Some data scientists prefer Heckman selection or Double-Robust meta-learners (like `DRLearner` from `econml`) that attempt to unify everything. But conceptually, you still need to handle the multi-filter problem.

---

### Conclusion (Putting It Into Practice)

By explicitly modeling each layer—especially the response stage and the bank acceptance stage—you correct the missing-data problem where you only observe final sign-ups for those who got texted and responded. The random subset is the anchor that lets you handle your older non-random texting rules. Then you can produce an accurate measure:

> **“If I text this person, what’s the probability they end up fully signed up (passed the bank’s filter, etc.)?”**

Finally, you can rank potential leads by that probability and target the top group—thereby maximizing your conversions and dealing with the multi-layer selection bias.


## Benefits of the Random Texted Group

Beyond simply serving as a way to randomly determine who gets texted (i.e., a 50% chance for those in the random group), the **random group provides several key benefits**:

### Unbiased Baseline for Causal Inference
- The random group acts as a **gold standard** because it is **not subject to the selection biases** inherent in your older targeting model.  
- This **unbiased sample** helps you accurately estimate the **true treatment effect (uplift)** by providing a direct comparison between texted and non-texted individuals **without the confounding influence of historical selection rules**.

### Improved Propensity Score Estimation
- When you build models to estimate the **propensity of being texted** or the **likelihood of responding**, having a randomized subgroup allows you to **calibrate these models more reliably**.  
- The random group **captures the natural variability** in responses across all segments of your population, making your **propensity score estimates (and any IPW corrections) more robust**.

### Validation and Model Evaluation
- The **random group can be used as a validation set** to check whether your models (for response, acceptance, or the overall uplift) are performing as expected.  
- If you compare outcomes in the random group against those **predicted by your models**, you get a clearer signal of how well your models generalize to the broader population.
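
One way to run that comparison is a simple calibration table on the randomly texted rows: bin the predicted probabilities and compare the mean prediction with the observed rate in each bin. This is a sketch assuming NumPy; the function name `calibration_table`, the choice of five equal-width bins, and the toy inputs are illustrative.

```python
import numpy as np


def calibration_table(y_true, y_pred, n_bins=5):
    """Return (bin, count, mean predicted, observed rate) for each non-empty bin."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Interior edges of equal-width bins on [0, 1]; digitize maps each
    # prediction to a bin index between 0 and n_bins - 1.
    inner_edges = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    bin_idx = np.digitize(y_pred, inner_edges)
    table = []
    for b in range(n_bins):
        mask = bin_idx == b
        if mask.any():
            table.append((b, int(mask.sum()),
                          float(y_pred[mask].mean()),
                          float(y_true[mask].mean())))
    return table


# Toy inputs standing in for outcomes and predictions on randomly texted rows
observed = [0, 0, 1, 1]
predicted = [0.1, 0.1, 0.9, 0.9]
table = calibration_table(observed, predicted)
for b, n, mean_pred, obs_rate in table:
    print(f"bin {b}: n={n}, mean predicted={mean_pred:.2f}, observed={obs_rate:.2f}")
```

Large gaps between the mean prediction and the observed rate in a bin suggest the model does not transfer well from the historically targeted population to the randomly texted one.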

### Enhanced Uplift Estimation
- In **uplift modeling** (estimating the incremental effect of texting), having a **truly random assignment** in part of your dataset helps you distinguish between:
  - The **natural propensity to convert**  
  - The **conversion that is actually caused by receiving a text**
- This leads to **more precise targeting**, as you can better isolate the **incremental benefit of texting** for different customer segments.

### Reduced Confounding
- Because the random group is **free from the biases** introduced by the old propensity model, it helps **reduce confounding** when you combine the data to train your causal or double robust models.  
- This makes your estimates of **treatment effect less susceptible to bias** from unobserved factors that might influence both the **decision to text** and the **outcome**.

---

### Summary

The **random group is not just a mechanism** to decide who gets texted randomly—it also serves as an **essential tool** for:

- **Establishing a causal baseline**  
- **Improving the accuracy and robustness of your modeling efforts**  
- **Validating your estimated treatment effects**  
- **Reducing confounding bias in selection**  
- **Ensuring that targeting is based on the true incremental impact of texting, rather than artifacts of previous selection rules**  

These benefits are **critical** for making sure that when you **target customers**, you’re **truly focusing on those who will benefit** from receiving a text—thereby improving both **conversion rates** and the **efficiency of your marketing spend**.
