
In [1]:
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.ensemble import (
    GradientBoostingClassifier,
    GradientBoostingRegressor
)

# Data

In [2]:
data_df = pd.read_csv("hillstrom_clean.csv")

In [3]:
data_df.sample(5)

,recency,history,mens,womens,newbie,visit,conversion,spend,zip_code__rural,zip_code__surburban,zip_code__urban,channel__multichannel,channel__phone,channel__web,treatment
32805,2,345.96,1,0,0,0,0,0.0,0,1,0,0,1,0,2
46928,10,29.99,0,1,0,0,0,0.0,0,1,0,0,1,0,1
28338,8,115.33,0,1,1,0,0,0.0,0,1,0,0,0,1,1
47298,1,218.87,0,1,0,0,0,0.0,1,0,0,1,0,0,2
1236,1,29.99,1,0,1,0,0,0.0,1,0,0,0,1,0,0


Historical customer attributes at your disposal include:
- Recency: Months since last purchase.
- History_Segment: Categorization of dollars spent in the past year.
- History: Actual dollar value spent in the past year.
- Mens: 1/0 indicator, 1 = customer purchased Mens merchandise in the past year.
- Womens: 1/0 indicator, 1 = customer purchased Womens merchandise in the past year.
- Zip_Code: Classifies zip code as Urban, Suburban, or Rural.
- Newbie: 1/0 indicator, 1 = New customer in the past twelve months.
- Channel: Describes the channels the customer purchased from in the past year.
- Treatment: Mens E-Mail, Womens E-Mail, No E-Mail

Finally, we have a series of variables describing activity in the two weeks following delivery of the e-mail campaign:
- Visit: 1/0 indicator, 1 = Customer visited website in the following two weeks.
- Conversion: 1/0 indicator, 1 = Customer purchased merchandise in the following two weeks.
- Spend: Actual dollars spent in the following two weeks.

In [4]:
data_df.visit.describe()

count    64000.000000
mean         0.146781
std          0.353890
min          0.000000
25%          0.000000
50%          0.000000
75%          0.000000
max          1.000000
Name: visit, dtype: float64

In [5]:
data_df.conversion.describe()

count    64000.000000
mean         0.009031
std          0.094604
min          0.000000
25%          0.000000
50%          0.000000
75%          0.000000
max          1.000000
Name: conversion, dtype: float64

# Overall Impact

In [6]:
treatment_map = {
    0: 'control',
    1: 'womens_email',
    2: 'mens_email'
}

In [7]:
# Men's emailer
(
    data_df.query("(treatment == 0 | treatment == 2)")
           .groupby('treatment')
           .agg({'visit': 'mean', 'conversion': 'mean', 'spend': 'mean'})
)

treatment,visit,conversion,spend
0,0.106167,0.005726,0.652789
2,0.182757,0.012531,1.422617


In [8]:
# Women's emailer
(
    data_df.query("(treatment == 0 | treatment == 1)")
           .groupby('treatment')
           .agg({'visit': 'mean', 'conversion': 'mean', 'spend': 'mean'})
)

treatment,visit,conversion,spend
0,0.106167,0.005726,0.652789
1,0.1514,0.008837,1.077202
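
Assuming the emails were assigned at random, as in the original Hillstrom challenge, the difference in group means is a simple estimate of the average effect of each email. The sketch below recomputes that lift from the group means printed above. The `group_means` frame is typed in by hand from those tables for illustration; it is not recomputed from `data_df`.

```python
import pandas as pd

# Group means copied from the two tables above
# (0 = control, 1 = women's email, 2 = men's email).
group_means = pd.DataFrame(
    {
        "visit": [0.106167, 0.151400, 0.182757],
        "conversion": [0.005726, 0.008837, 0.012531],
        "spend": [0.652789, 1.077202, 1.422617],
    },
    index=pd.Index([0, 1, 2], name="treatment"),
)

# Lift of each email over control: treated mean minus control mean.
lift = group_means.loc[[1, 2]] - group_means.loc[0]
lift.index = lift.index.map({1: "womens_email", 2: "mens_email"})
print(lift.round(6))
```

These raw differences give a reference point for the averaged CATE estimates in the sections that follow.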


# CATE

## Base Learners

We use gradient boosted regressors and classifiers as base learners. Their hyperparameters are tuned with a randomized search that evaluates `NUM_ITERATIONS` randomly sampled hyperparameter combinations using 3-fold cross-validation.

In [9]:
NUM_ITERATIONS = 25
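
The same search setup is repeated for every base learner in this notebook, so it can be wrapped in a small helper. The following is a minimal sketch, not the code the notebook actually runs: `tune_base_learner`, the toy data, and `toy_grid` are hypothetical names introduced only for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV


def tune_base_learner(estimator, param_distributions, X, y, scoring,
                      n_iter=25, cv=3, random_state=42, n_jobs=-1):
    """Return the best estimator found by a randomized hyperparameter search."""
    search = RandomizedSearchCV(
        estimator,
        param_distributions,
        scoring=scoring,
        n_iter=n_iter,
        cv=cv,
        random_state=random_state,
        n_jobs=n_jobs,
    )
    search.fit(X, y)
    return search.best_estimator_


# Toy data, only to show the call; the notebook itself uses the Hillstrom features.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(300, 4))
y_toy = (X_toy[:, 0] + rng.normal(scale=0.5, size=300) > 0).astype(int)

# Deliberately tiny grid so the example runs in a few seconds.
toy_grid = {
    "n_estimators": [10, 20],
    "max_depth": [2, 3],
    "learning_rate": [0.05, 0.1],
}

best = tune_base_learner(
    GradientBoostingClassifier(random_state=0), toy_grid, X_toy, y_toy,
    scoring="accuracy", n_iter=4, n_jobs=1,
)
print(best.get_params()["max_depth"])
```

The helper returns the refitted `best_estimator_`, which is the object the cells below store as `slearner_visit`, `tlearner_0`, and so on.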

## S-Learner

Estimated CATE:

$$
\hat{\tau}(x) = E[Y|X=x, T=1]-E[Y|X=x, T=0]=\hat{\mu}(x, 1) - \hat{\mu}(x, 0)
$$

where $\hat{\mu}=M(Y\sim(X, T))$ is any machine learning algorithm that is estimated on training data.
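
To make the definition concrete, here is a minimal sketch of an S-learner on synthetic data. The data-generating process, with a constant treatment effect of 2.0, and the names `toy`, `mu`, and `tau_hat` are assumptions made for illustration; they are not part of the Hillstrom analysis.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 2000

# Synthetic covariates and a randomly assigned binary treatment.
toy = pd.DataFrame({"x0": rng.normal(size=n), "x1": rng.normal(size=n)})
toy["treatment"] = rng.integers(0, 2, size=n)

# Assumed ground truth: the treatment adds exactly 2.0 to the outcome.
y = toy["x0"] + 2.0 * toy["treatment"] + rng.normal(scale=0.1, size=n)

# One model mu(x, t) fitted on covariates and treatment together.
mu = GradientBoostingRegressor(random_state=0).fit(toy, y)

# CATE estimate: prediction with T forced to 1 minus prediction with T forced to 0.
tau_hat = mu.predict(toy.assign(treatment=1)) - mu.predict(toy.assign(treatment=0))
print(round(tau_hat.mean(), 2))
```

Because the treatment is just one more feature, the estimate depends on how much the fitted model actually relies on that column.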

In [10]:
X = data_df.drop(columns=['visit', 'conversion', 'spend'])
y_visit = data_df['visit']
y_spend = data_df['spend']

*Visits*

In [11]:
random_grid_params = {
    "n_estimators": [15, 25, 50, 100, 200, 300, 400],
    "max_depth": [2, 4, 6, 10, 12, 14, 16],
    "learning_rate": [0.001, 0.005, 0.01, 0.03, 0.1, 0.2, 0.3]
}

In [12]:
classifier_random_grid = RandomizedSearchCV(
    GradientBoostingClassifier(),
    random_grid_params,
    scoring="accuracy",
    n_iter=NUM_ITERATIONS,
    cv=3,
    verbose=1,
    random_state=42,
    n_jobs=-1
)

In [13]:
X_train, X_test, y_visit_train, y_visit_test = train_test_split(
    X, y_visit, test_size=0.3, random_state=42
)

In [14]:
classifier_random_grid.fit(X_train, y_visit_train)

Fitting 3 folds for each of 25 candidates, totalling 75 fits


In [15]:
slearner_visit = classifier_random_grid.best_estimator_

In [16]:
slearner_visit

In [17]:
# Calculate the difference in predictions when T=1 (womens emailer) vs T=0

slearner_te_womens = (
    slearner_visit.predict_proba(X_test.assign(**{'treatment': 1}))[:, 1] -
    slearner_visit.predict_proba(X_test.assign(**{'treatment': 0}))[:, 1]
)

In [18]:
slearner_te_womens.mean()

0.010825286750673563

In [19]:
# Calculate the difference in predictions when T=2 (mens emailer) vs T=0

slearner_te_mens = (
    slearner_visit.predict_proba(X_test.assign(**{'treatment': 2}))[:, 1] -
    slearner_visit.predict_proba(X_test.assign(**{'treatment': 0}))[:, 1]
)

In [20]:
slearner_te_mens.mean()

0.015093346540360708

*Spends*

In [21]:
X_train, X_test, y_spend_train, y_spend_test = train_test_split(
    X, y_spend, test_size=0.3, random_state=42
)

In [22]:
random_grid_params = {
    "n_estimators": [5, 10, 15, 25, 50, 100, 200],
    "max_depth": [6, 10, 12, 14, 16, 18, 20],
    "learning_rate": [0.0005, 0.001, 0.005, 0.01, 0.03, 0.1]
}

In [23]:
regressor_random_grid = RandomizedSearchCV(
    GradientBoostingRegressor(),
    random_grid_params,
    scoring="neg_mean_squared_error",
    n_iter=NUM_ITERATIONS,
    cv=3,
    verbose=1,
    random_state=42,
    n_jobs=-1
)

In [24]:
regressor_random_grid.fit(X_train, y_spend_train)

Fitting 3 folds for each of 25 candidates, totalling 75 fits


In [25]:
slearner_spend = regressor_random_grid.best_estimator_

In [26]:
slearner_spend

In [27]:
# Calculate the difference in predictions when T=1 (womens emailer) vs T=0

slearner_te_womens = (
    slearner_spend.predict(X_test.assign(**{'treatment': 1})) -
    slearner_spend.predict(X_test.assign(**{'treatment': 0}))
)

In [28]:
slearner_te_womens.mean()

0.0012442702151642946

In [29]:
# Calculate the difference in predictions when T=2 (mens emailer) vs T=0

slearner_te_mens = (
    slearner_spend.predict(X_test.assign(**{'treatment': 2})) -
    slearner_spend.predict(X_test.assign(**{'treatment': 0}))
)

In [30]:
slearner_te_mens.mean()

0.0020744018369889764

## T-Learner

Estimated CATE:

$$
\hat{\tau}(x) = E[Y|X=x, T=1]-E[Y|X=x, T=0]=\hat{\mu}_1(x) - \hat{\mu}_0(x)
$$

where $\hat{\mu}_0=M_0(Y^0 \sim X^0)$, $\hat{\mu}_1=M_1(Y^1 \sim X^1)$ are any machine learning algorithms that are estimated on control and treatment subsets of training data respectively.
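
A minimal sketch of the T-learner on synthetic data follows. The heterogeneous effect (1 when `x0 <= 0`, 3 when `x0 > 0`) and all variable names are assumptions chosen for illustration, not taken from the Hillstrom data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
n = 4000
X = rng.normal(size=(n, 2))
t = rng.integers(0, 2, size=n)

# Assumed ground truth: effect is 1 when x0 <= 0 and 3 when x0 > 0.
true_tau = np.where(X[:, 0] > 0, 3.0, 1.0)
y = X[:, 1] + true_tau * t + rng.normal(scale=0.1, size=n)

# Separate outcome models for the control and the treated subsets.
mu_0 = GradientBoostingRegressor(random_state=0).fit(X[t == 0], y[t == 0])
mu_1 = GradientBoostingRegressor(random_state=0).fit(X[t == 1], y[t == 1])

# CATE estimate for every row: treated prediction minus control prediction.
tau_hat = mu_1.predict(X) - mu_0.predict(X)

# Average estimate within each segment of the assumed ground truth.
print(round(tau_hat[X[:, 0] > 0].mean(), 2), round(tau_hat[X[:, 0] <= 0].mean(), 2))
```

Each model sees only its own subset, so neither needs the treatment indicator as a feature.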

In [31]:
train_df, test_df = train_test_split(
    data_df, test_size=0.3, random_state=42
)

In [32]:
train_df.shape, test_df.shape

((44800, 15), (19200, 15))

*Visits*

In [33]:
target = 'visit'

In [34]:
# Split data into treated and untreated
train_0_df = train_df[train_df['treatment'] == 0]
train_1_df = train_df[train_df['treatment'] == 1]
train_2_df = train_df[train_df['treatment'] == 2]

In [35]:
random_grid_params = {
    "n_estimators": [15, 25, 50, 100, 200, 300, 400],
    "max_depth": [2, 4, 6, 10, 12, 14, 16],
    "learning_rate": [0.001, 0.005, 0.01, 0.03, 0.1, 0.2, 0.3]
}

In [36]:
# Fit the models on each sample
classifier_random_grid_0 = RandomizedSearchCV(
    GradientBoostingClassifier(),
    random_grid_params,
    scoring="accuracy",
    n_iter=NUM_ITERATIONS,
    cv=3,
    verbose=1,
    random_state=42,
    n_jobs=-1
)

classifier_random_grid_0.fit(train_0_df.drop(columns=['visit', 'conversion', 'spend']), train_0_df[target])

Fitting 3 folds for each of 25 candidates, totalling 75 fits


In [37]:
tlearner_0 = classifier_random_grid_0.best_estimator_

In [38]:
tlearner_0

In [39]:
classifier_random_grid_1 = RandomizedSearchCV(
    GradientBoostingClassifier(),
    random_grid_params,
    scoring="accuracy",
    n_iter=NUM_ITERATIONS,
    cv=3,
    verbose=1,
    random_state=42,
    n_jobs=-1
)

classifier_random_grid_1.fit(train_1_df.drop(columns=['visit', 'conversion', 'spend']), train_1_df[target])

Fitting 3 folds for each of 25 candidates, totalling 75 fits


In [40]:
tlearner_1 = classifier_random_grid_1.best_estimator_

In [41]:
tlearner_1

In [42]:
classifier_random_grid_2 = RandomizedSearchCV(
    GradientBoostingClassifier(),
    random_grid_params,
    scoring="accuracy",
    n_iter=NUM_ITERATIONS,
    cv=3,
    verbose=1,
    random_state=42,
    n_jobs=-1
)

classifier_random_grid_2.fit(train_2_df.drop(columns=['visit', 'conversion', 'spend']), train_2_df[target])

Fitting 3 folds for each of 25 candidates, totalling 75 fits


In [43]:
tlearner_2 = classifier_random_grid_2.best_estimator_

In [44]:
tlearner_2

In [45]:
# Calculate the difference in predictions for womens campaign
tlearner_te_womens = (
    tlearner_1.predict_proba(test_df.drop(columns=['visit', 'conversion', 'spend']))[:, 1] -
    tlearner_0.predict_proba(test_df.drop(columns=['visit', 'conversion', 'spend']))[:, 1]
)

In [46]:
tlearner_te_womens.mean()

0.043275721321136496

In [47]:
# Calculate the difference in predictions for mens campaign
tlearner_te_mens = (
    tlearner_2.predict_proba(test_df.drop(columns=['visit', 'conversion', 'spend']))[:, 1] -
    tlearner_0.predict_proba(test_df.drop(columns=['visit', 'conversion', 'spend']))[:, 1]
)

In [48]:
tlearner_te_mens.mean()

0.07389059749764053

*Spends*

In [49]:
target = 'spend'

In [50]:
random_grid_params = {
    "n_estimators": [5, 10, 15, 25, 50, 100, 200, 300, 400],
    "max_depth": [6, 10, 12, 14, 16, 18, 20, 22],
    "learning_rate": [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.03, 0.1, 0.2]
}

In [51]:
# Fit the models on each sample
regressor_random_grid_0 = RandomizedSearchCV(
    GradientBoostingRegressor(),
    random_grid_params,
    scoring="neg_mean_squared_error",
    n_iter=NUM_ITERATIONS,
    cv=3,
    verbose=1,
    random_state=42,
    n_jobs=-1
)

regressor_random_grid_0.fit(train_0_df.drop(columns=['visit', 'conversion', 'spend']), train_0_df[target])

Fitting 3 folds for each of 25 candidates, totalling 75 fits


In [52]:
tlearner_0 = regressor_random_grid_0.best_estimator_

In [53]:
tlearner_0

In [54]:
regressor_random_grid_1 = RandomizedSearchCV(
    GradientBoostingRegressor(),
    random_grid_params,
    scoring="neg_mean_squared_error",
    n_iter=NUM_ITERATIONS,
    cv=3,
    verbose=1,
    random_state=42,
    n_jobs=-1
)

regressor_random_grid_1.fit(train_1_df.drop(columns=['visit', 'conversion', 'spend']), train_1_df[target])

Fitting 3 folds for each of 25 candidates, totalling 75 fits


In [55]:
tlearner_1 = regressor_random_grid_1.best_estimator_

In [56]:
tlearner_1

In [57]:
regressor_random_grid_2 = RandomizedSearchCV(
    GradientBoostingRegressor(),
    random_grid_params,
    scoring="neg_mean_squared_error",
    n_iter=NUM_ITERATIONS,
    cv=3,
    verbose=1,
    random_state=42,
    n_jobs=-1
)

regressor_random_grid_2.fit(train_2_df.drop(columns=['visit', 'conversion', 'spend']), train_2_df[target])

Fitting 3 folds for each of 25 candidates, totalling 75 fits


In [58]:
tlearner_2 = regressor_random_grid_2.best_estimator_

In [59]:
tlearner_2

In [60]:
# Calculate the difference in predictions for womens campaign
tlearner_te_womens = (
    tlearner_1.predict(test_df.drop(columns=['visit', 'conversion', 'spend'])) -
    tlearner_0.predict(test_df.drop(columns=['visit', 'conversion', 'spend']))
)

In [61]:
tlearner_te_womens.mean()

0.4296231263216593

In [62]:
# Calculate the difference in predictions for mens campaign
tlearner_te_mens = (
    tlearner_2.predict(test_df.drop(columns=['visit', 'conversion', 'spend'])) -
    tlearner_0.predict(test_df.drop(columns=['visit', 'conversion', 'spend']))
)

In [63]:
tlearner_te_mens.mean()

0.708434312827969

## X-Learner

Estimated CATE:

$\hat{\mu}_0=M_0(Y^0 \sim X^0), \hat{\mu}_1=M_1(Y^1 \sim X^1)$

$\hat{D}^1 = Y^1 - \hat{\mu}_0(X^1), \hat{D}^0 = \hat{\mu}_1(X^0) - Y^0$

$\hat{\tau}_0 = M_3(\hat{D}^0 \sim X^0), \hat{\tau}_1 = M_4(\hat{D}^1 \sim X^1)$

$\hat{\tau}(x) = g(x)\hat{\tau}_0(x) + (1-g(x))\hat{\tau}_1(x)$

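where $M_0, M_1$ are any machine learning models that estimate the control and treatment outcomes, and $M_3, M_4$ are any machine learning models fitted to the imputed treatment effects $\hat{D}^0$ and $\hat{D}^1$. $g(x)$ is a propensity model used to weight the CATC estimate $\hat{\tau}_0$ and the CATT estimate $\hat{\tau}_1$. The cells below use a simplified variant: instead of weighting by $g(x)$, a single regressor is fitted to the stacked imputed effects of the control and treated groups.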
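
The three stages and the propensity-weighted combination can be sketched on synthetic data as follows. This is an illustration under stated assumptions, not the code this notebook runs: the data-generating process is invented, $g(x)$ is taken to be an estimate of $P(T=1 \mid X=x)$ from a logistic regression, and all variable names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 4000
X = rng.normal(size=(n, 2))
t = rng.integers(0, 2, size=n)

# Assumed ground truth: effect is 1 when x0 <= 0 and 2 when x0 > 0.
true_tau = 1.0 + (X[:, 0] > 0)
y = X[:, 1] + true_tau * t + rng.normal(scale=0.1, size=n)

X0, y0 = X[t == 0], y[t == 0]  # control rows
X1, y1 = X[t == 1], y[t == 1]  # treated rows

# Stage 1: outcome models fitted on control and treated rows separately.
mu_0 = GradientBoostingRegressor(random_state=0).fit(X0, y0)
mu_1 = GradientBoostingRegressor(random_state=0).fit(X1, y1)

# Stage 2: imputed individual treatment effects.
d_1 = y1 - mu_0.predict(X1)  # treated: observed outcome minus predicted control outcome
d_0 = mu_1.predict(X0) - y0  # control: predicted treated outcome minus observed outcome

# Stage 3: effect models fitted to the imputed effects.
tau_0 = GradientBoostingRegressor(random_state=0).fit(X0, d_0)
tau_1 = GradientBoostingRegressor(random_state=0).fit(X1, d_1)

# Propensity model g(x), assumed here to estimate P(T=1 | X=x).
g = LogisticRegression().fit(X, t).predict_proba(X)[:, 1]

# Final CATE estimate: propensity-weighted blend of the two effect models.
tau_hat = g * tau_0.predict(X) + (1 - g) * tau_1.predict(X)
print(round(tau_hat.mean(), 2), round(true_tau.mean(), 2))
```

With random assignment the estimated propensity is close to constant, so the weighting mainly matters when treatment groups are unbalanced or assignment depends on the covariates.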



*Visits*

In [64]:
target = 'visit'

In [65]:
train_df, test_df = train_test_split(
    data_df, test_size=0.3, random_state=42
)

In [66]:
# Split data into treated and untreated
train_0_df = train_df[train_df['treatment'] == 0]
train_1_df = train_df[train_df['treatment'] == 1]
train_2_df = train_df[train_df['treatment'] == 2]

In [67]:
random_grid_params = {
    "n_estimators": [15, 25, 50, 100, 200, 300, 400],
    "max_depth": [2, 4, 6, 10, 12, 14, 16],
    "learning_rate": [0.001, 0.005, 0.01, 0.03, 0.1, 0.2, 0.3]
}

In [68]:
# Fit the models on each sample
classifier_random_grid_0 = RandomizedSearchCV(
    GradientBoostingClassifier(),
    random_grid_params,
    scoring="accuracy",
    n_iter=NUM_ITERATIONS,
    cv=3,
    verbose=1,
    random_state=42,
    n_jobs=-1
)

classifier_random_grid_0.fit(train_0_df.drop(columns=['visit', 'conversion', 'spend']), train_0_df[target])

Fitting 3 folds for each of 25 candidates, totalling 75 fits


In [69]:
xlearner_0 = classifier_random_grid_0.best_estimator_

In [70]:
xlearner_0

In [71]:
classifier_random_grid_1 = RandomizedSearchCV(
    GradientBoostingClassifier(),
    random_grid_params,
    scoring="accuracy",
    n_iter=NUM_ITERATIONS,
    cv=3,
    verbose=1,
    random_state=42,
    n_jobs=-1
)

classifier_random_grid_1.fit(train_1_df.drop(columns=['visit', 'conversion', 'spend']), train_1_df[target])

Fitting 3 folds for each of 25 candidates, totalling 75 fits


In [72]:
xlearner_1 = classifier_random_grid_1.best_estimator_

In [73]:
xlearner_1

In [74]:
classifier_random_grid_2 = RandomizedSearchCV(
    GradientBoostingClassifier(),
    random_grid_params,
    scoring="accuracy",
    n_iter=NUM_ITERATIONS,
    cv=3,
    verbose=1,
    random_state=42,
    n_jobs=-1
)

classifier_random_grid_2.fit(train_2_df.drop(columns=['visit', 'conversion', 'spend']), train_2_df[target])

Fitting 3 folds for each of 25 candidates, totalling 75 fits


In [75]:
xlearner_2 = classifier_random_grid_2.best_estimator_

In [76]:
xlearner_2

For the women's campaign

In [77]:
target_columns = ['visit', 'conversion', 'spend']

In [78]:
# Calculate the difference between actual outcomes and predictions
xlearner_te_0 = xlearner_1.predict_proba(train_0_df.drop(columns=target_columns))[:, 1] - train_0_df[target]
xlearner_te_1 = train_1_df[target] - xlearner_0.predict_proba(train_1_df.drop(columns=target_columns))[:, 1]

In [79]:
random_grid_params = {
    "n_estimators": [5, 10, 15, 25, 50, 100, 200, 300, 400],
    "max_depth": [6, 10, 12, 14, 16, 18, 20, 22],
    "learning_rate": [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.03, 0.1, 0.2]
}

In [80]:
regressor_random_grid_combined = RandomizedSearchCV(
    GradientBoostingRegressor(),
    random_grid_params,
    scoring="neg_mean_squared_error",
    n_iter=NUM_ITERATIONS,
    cv=3,
    verbose=1,
    random_state=42,
    n_jobs=-1
)

In [81]:
# Fit the combined model
regressor_random_grid_combined.fit(
  # Stack the X variables for the treated and untreated users
  pd.concat([train_0_df.drop(columns=target_columns), train_1_df.drop(columns=target_columns)]),
  # Stack the X-learner treatment effects for treated and untreated users
  pd.concat([xlearner_te_0, xlearner_te_1])
)

Fitting 3 folds for each of 25 candidates, totalling 75 fits


In [82]:
xlearner_combined = regressor_random_grid_combined.best_estimator_

In [83]:
xlearner_combined

In [84]:
xlearner_simple_te = xlearner_combined.predict(test_df.drop(columns=target_columns))

In [85]:
xlearner_simple_te.mean()

0.04305290386017842

For the men's campaign

In [86]:
target_columns = ['visit', 'conversion', 'spend']

In [87]:
# Calculate the difference between actual outcomes and predictions
xlearner_te_0 = xlearner_2.predict_proba(train_0_df.drop(columns=target_columns))[:, 1] - train_0_df[target]
xlearner_te_2 = train_2_df[target] - xlearner_0.predict_proba(train_2_df.drop(columns=target_columns))[:, 1]

In [88]:
random_grid_params = {
    "n_estimators": [5, 10, 15, 25, 50, 100, 200, 300, 400],
    "max_depth": [6, 10, 12, 14, 16, 18, 20, 22],
    "learning_rate": [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.03, 0.1, 0.2]
}

In [89]:
regressor_random_grid_combined = RandomizedSearchCV(
    GradientBoostingRegressor(),
    random_grid_params,
    scoring="neg_mean_squared_error",
    n_iter=NUM_ITERATIONS,
    cv=3,
    verbose=1,
    random_state=42,
    n_jobs=-1
)

In [90]:
# Fit the combined model
regressor_random_grid_combined.fit(
  # Stack the X variables for the treated and untreated users
  pd.concat([train_0_df.drop(columns=target_columns), train_2_df.drop(columns=target_columns)]),
  # Stack the X-learner treatment effects for treated and untreated users
  pd.concat([xlearner_te_0, xlearner_te_2])
)

Fitting 3 folds for each of 25 candidates, totalling 75 fits


In [91]:
xlearner_combined = regressor_random_grid_combined.best_estimator_

In [92]:
xlearner_combined

In [93]:
xlearner_simple_te = xlearner_combined.predict(test_df.drop(columns=target_columns))

In [94]:
xlearner_simple_te.mean()

0.07394333184422394

*Spends*

In [95]:
target = 'spend'

In [96]:
train_df, test_df = train_test_split(
    data_df, test_size=0.3, random_state=42
)

In [97]:
# Split data into treated and untreated
train_0_df = train_df[train_df['treatment'] == 0]
train_1_df = train_df[train_df['treatment'] == 1]
train_2_df = train_df[train_df['treatment'] == 2]

In [98]:
random_grid_params = {
    "n_estimators": [5, 10, 15, 25, 50, 100, 200, 300, 400],
    "max_depth": [6, 10, 12, 14, 16, 18, 20, 22],
    "learning_rate": [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.03, 0.1, 0.2]
}

In [99]:
# Fit the models on each sample
regressor_random_grid_0 = RandomizedSearchCV(
    GradientBoostingRegressor(),
    random_grid_params,
    scoring="neg_mean_squared_error",
    n_iter=NUM_ITERATIONS,
    cv=3,
    verbose=1,
    random_state=42,
    n_jobs=-1
)

regressor_random_grid_0.fit(train_0_df.drop(columns=['visit', 'conversion', 'spend']), train_0_df[target])

Fitting 3 folds for each of 25 candidates, totalling 75 fits


In [100]:
xlearner_0 = regressor_random_grid_0.best_estimator_

In [101]:
xlearner_0

In [102]:
regressor_random_grid_1 = RandomizedSearchCV(
    GradientBoostingRegressor(),
    random_grid_params,
    scoring="neg_mean_squared_error",
    n_iter=NUM_ITERATIONS,
    cv=3,
    verbose=1,
    random_state=42,
    n_jobs=-1
)

regressor_random_grid_1.fit(train_1_df.drop(columns=['visit', 'conversion', 'spend']), train_1_df[target])

Fitting 3 folds for each of 25 candidates, totalling 75 fits


In [103]:
xlearner_1 = regressor_random_grid_1.best_estimator_

In [104]:
xlearner_1

In [105]:
regressor_random_grid_2 = RandomizedSearchCV(
    GradientBoostingRegressor(),
    random_grid_params,
    scoring="neg_mean_squared_error",
    n_iter=NUM_ITERATIONS,
    cv=3,
    verbose=1,
    random_state=42,
    n_jobs=-1
)

regressor_random_grid_2.fit(train_2_df.drop(columns=['visit', 'conversion', 'spend']), train_2_df[target])

Fitting 3 folds for each of 25 candidates, totalling 75 fits


In [106]:
xlearner_2 = regressor_random_grid_2.best_estimator_

In [107]:
xlearner_2

For the women's campaign

In [108]:
target_columns = ['visit', 'conversion', 'spend']

In [109]:
# Calculate the difference between actual outcomes and predictions
xlearner_te_0 = xlearner_1.predict(train_0_df.drop(columns=target_columns)) - train_0_df[target]
xlearner_te_1 = train_1_df[target] - xlearner_0.predict(train_1_df.drop(columns=target_columns))

In [110]:
random_grid_params = {
    "n_estimators": [5, 10, 15, 25, 50, 100, 200, 300, 400],
    "max_depth": [6, 10, 12, 14, 16, 18, 20, 22],
    "learning_rate": [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.03, 0.1, 0.2]
}

In [111]:
regressor_random_grid_combined = RandomizedSearchCV(
    GradientBoostingRegressor(),
    random_grid_params,
    scoring="neg_mean_squared_error",
    n_iter=NUM_ITERATIONS,
    cv=3,
    verbose=1,
    random_state=42,
    n_jobs=-1
)

In [112]:
# Fit the combined model
regressor_random_grid_combined.fit(
  # Stack the X variables for the treated and untreated users
  pd.concat([train_0_df.drop(columns=target_columns), train_1_df.drop(columns=target_columns)]),
  # Stack the X-learner treatment effects for treated and untreated users
  pd.concat([xlearner_te_0, xlearner_te_1])
)

Fitting 3 folds for each of 25 candidates, totalling 75 fits


In [113]:
xlearner_combined = regressor_random_grid_combined.best_estimator_

In [114]:
xlearner_combined

In [115]:
xlearner_simple_te = xlearner_combined.predict(test_df.drop(columns=target_columns))

In [116]:
xlearner_simple_te.mean()

0.42885769776930827

For the men's campaign

In [117]:
target_columns = ['visit', 'conversion', 'spend']

In [118]:
# Calculate the difference between actual outcomes and predictions
xlearner_te_0 = xlearner_2.predict(train_0_df.drop(columns=target_columns)) - train_0_df[target]
xlearner_te_2 = train_2_df[target] - xlearner_0.predict(train_2_df.drop(columns=target_columns))

In [119]:
random_grid_params = {
    "n_estimators": [5, 10, 15, 25, 50, 100, 200, 300, 400],
    "max_depth": [6, 10, 12, 14, 16, 18, 20, 22],
    "learning_rate": [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.03, 0.1, 0.2]
}

In [120]:
regressor_random_grid_combined = RandomizedSearchCV(
    GradientBoostingRegressor(),
    random_grid_params,
    scoring="neg_mean_squared_error",
    n_iter=NUM_ITERATIONS,
    cv=3,
    verbose=1,
    random_state=42,
    n_jobs=-1
)

In [121]:
# Fit the combined model
regressor_random_grid_combined.fit(
  # Stack the X variables for the treated and untreated users
  pd.concat([train_0_df.drop(columns=target_columns), train_2_df.drop(columns=target_columns)]),
  # Stack the X-learner treatment effects for treated and untreated users
  pd.concat([xlearner_te_0, xlearner_te_2])
)

Fitting 3 folds for each of 25 candidates, totalling 75 fits


In [122]:
xlearner_combined = regressor_random_grid_combined.best_estimator_

In [123]:
xlearner_combined

In [124]:
xlearner_simple_te = xlearner_combined.predict(test_df.drop(columns=target_columns))

In [125]:
xlearner_simple_te.mean()

0.708255968769414