
In [1]:
import pandas as pd

from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.ensemble import (
    GradientBoostingClassifier,
    GradientBoostingRegressor
)

# Data

In [2]:
data_df = pd.read_csv("hillstrom_clean.csv")

In [3]:
data_df.sample(5)

Unnamed: 0,recency,history,mens,womens,newbie,visit,conversion,spend,zip_code__rural,zip_code__surburban,zip_code__urban,channel__multichannel,channel__phone,channel__web,treatment
2911,2,90.86,1,0,0,0,0,0.0,0,1,0,0,0,1,0
11273,10,711.59,1,1,1,0,0,0.0,0,0,1,1,0,0,1
55494,7,646.91,1,1,1,0,0,0.0,0,0,1,1,0,0,2
49952,3,179.45,0,1,1,0,0,0.0,0,1,0,0,1,0,0
53127,11,374.5,1,1,0,0,0,0.0,0,0,1,0,1,0,0


Historical customer attributes at your disposal include:
- Recency: Months since last purchase.
- History_Segment: Categorization of dollars spent in the past year.
- History: Actual dollar value spent in the past year.
- Mens: 1/0 indicator, 1 = customer purchased Mens merchandise in the past year.
- Womens: 1/0 indicator, 1 = customer purchased Womens merchandise in the past year.
- Zip_Code: Classifies zip code as Urban, Suburban, or Rural.
- Newbie: 1/0 indicator, 1 = New customer in the past twelve months.
- Channel: Describes the channels the customer purchased from in the past year.
- Treatment: Mens E-Mail, Womens E-Mail, No E-Mail

Finally, we have a series of variables describing activity in the two weeks following delivery of the e-mail campaign:
- Visit: 1/0 indicator, 1 = Customer visited website in the following two weeks.
- Conversion: 1/0 indicator, 1 = Customer purchased merchandise in the following two weeks.
- Spend: Actual dollars spent in the following two weeks.

In [4]:
data_df.visit.describe()

count    64000.000000
mean         0.146781
std          0.353890
min          0.000000
25%          0.000000
50%          0.000000
75%          0.000000
max          1.000000
Name: visit, dtype: float64

In [5]:
data_df.conversion.describe()

count    64000.000000
mean         0.009031
std          0.094604
min          0.000000
25%          0.000000
50%          0.000000
75%          0.000000
max          1.000000
Name: conversion, dtype: float64

# Overall Impact

In [6]:
treatment_map = {
    0: 'control',
    1: 'womens_email',
    2: 'mens_email'
}

In [8]:
# Women's emailer
(
    data_df.query("(treatment == 0 | treatment == 1)")
           .groupby('treatment')
           .agg({'visit': 'mean', 'conversion': 'mean', 'spend': 'mean'})
)

treatment,visit,conversion,spend
0,0.106167,0.005726,0.652789
1,0.1514,0.008837,1.077202


In [28]:
.151400 - .106167

0.04523300000000001

In [7]:
# Men's emailer
(
    data_df.query("(treatment == 0 | treatment == 2)")
           .groupby('treatment')
           .agg({'visit': 'mean', 'conversion': 'mean', 'spend': 'mean'})
)

treatment,visit,conversion,spend
0,0.106167,0.005726,0.652789
2,0.182757,0.012531,1.422617


In [29]:
.182757 - .106167

0.07659
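
The two subtractions above were typed by hand from the printed group means. A minimal sketch of computing the same lifts directly from a data frame follows. The helper name `lift_vs_control` and the toy frame are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

def lift_vs_control(df, outcome="visit", control=0):
    """Difference in mean outcome between each treatment arm and control."""
    means = df.groupby("treatment")[outcome].mean()
    # Subtract the control mean from every arm, then drop the control row itself
    return (means - means[control]).drop(index=control)

# Toy data (hypothetical values): 4 customers per arm
toy_df = pd.DataFrame({
    "treatment": [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
    "visit":     [0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1],
})

lifts = lift_vs_control(toy_df)
print(lifts)
```

On the notebook's data, `lift_vs_control(data_df)` should reproduce the two hand-computed differences up to rounding. Because the e-mails were assigned as part of an experiment, these differences in means can be read as estimates of the average treatment effect on visits.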

# T-Learner

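Estimated CATE:

$$
\hat{\tau}(x) = \hat{\mu}_1(x) - \hat{\mu}_0(x), \qquad \hat{\mu}_t(x) \approx E[Y|X=x, T=t]
$$

where $\hat{\mu}_0=M_0(Y^0 \sim X^0)$ and $\hat{\mu}_1=M_1(Y^1 \sim X^1)$ are any machine learning models fitted on the control and treatment subsets of the training data, respectively. This dataset has two treatment arms (women's and men's e-mail), so one treatment model is fitted per arm and each is compared against the same control model.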
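
To make the definition concrete, here is a minimal sketch of a T-learner on synthetic data. The data-generating process, sample size, and hyperparameters are assumptions chosen for illustration and are unrelated to the Hillstrom data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n = 4000

# Synthetic covariates and a randomized binary treatment (assumed setup)
X = rng.uniform(size=(n, 2))
t = rng.integers(0, 2, size=n)

# True effect is heterogeneous: +0.3 when x0 > 0.5, zero otherwise
p = 0.1 + 0.3 * t * (X[:, 0] > 0.5)
y = rng.binomial(1, p)

# Fit one model per arm: mu_0 on control rows, mu_1 on treated rows
mu_0 = GradientBoostingClassifier(n_estimators=50, max_depth=2, random_state=0)
mu_1 = GradientBoostingClassifier(n_estimators=50, max_depth=2, random_state=0)
mu_0.fit(X[t == 0], y[t == 0])
mu_1.fit(X[t == 1], y[t == 1])

# CATE estimate: difference of the two predicted probabilities
cate = mu_1.predict_proba(X)[:, 1] - mu_0.predict_proba(X)[:, 1]

high = cate[X[:, 0] > 0.5].mean()
low = cate[X[:, 0] <= 0.5].mean()
print(f"mean CATE, x0 > 0.5: {high:.3f}; x0 <= 0.5: {low:.3f}")
```

The estimated effect should be clearly larger where `x0 > 0.5`, which is where the simulated treatment actually changes the outcome.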

## Base Learners

We use gradient-boosted models as base learners: classifiers here, since the target `visit` is binary. Their hyperparameters are tuned with a randomized search that evaluates `NUM_ITERATIONS` randomly sampled hyperparameter combinations.

In [9]:
NUM_ITERATIONS = 5

In [10]:
train_df, test_df = train_test_split(
    data_df, test_size=0.3, random_state=42
)

In [11]:
train_df.shape, test_df.shape

((44800, 15), (19200, 15))

In [12]:
target = 'visit'

In [13]:
# Split the training data by arm: control (0), women's e-mail (1), men's e-mail (2)
train_0_df = train_df[train_df['treatment'] == 0]
train_1_df = train_df[train_df['treatment'] == 1]
train_2_df = train_df[train_df['treatment'] == 2]

In [14]:
random_grid_params = {
    "n_estimators": [15, 25, 50, 100, 200, 300, 400],
    "max_depth": [2, 4, 6, 10, 12, 14, 16],
    "learning_rate": [0.001, 0.005, 0.01, 0.03, 0.1, 0.2, 0.3]
}
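
The grid above has 7 × 7 × 7 = 343 combinations, of which a randomized search with `n_iter=5` tries only five. As a rough illustration of what such a sample looks like, scikit-learn's `ParameterSampler` can draw candidates from the same grid. Whether these exact draws coincide with the ones evaluated in the searches is not verified here.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

random_grid_params = {
    "n_estimators": [15, 25, 50, 100, 200, 300, 400],
    "max_depth": [2, 4, 6, 10, 12, 14, 16],
    "learning_rate": [0.001, 0.005, 0.01, 0.03, 0.1, 0.2, 0.3],
}

# Size of the full grid versus the number of sampled candidates
n_total = len(ParameterGrid(random_grid_params))
candidates = list(ParameterSampler(random_grid_params, n_iter=5, random_state=42))

print(f"{len(candidates)} of {n_total} combinations sampled")
for params in candidates:
    print(params)
```

With so few draws, large parts of the grid go unexplored, so the selected model is best read as "good among five candidates" rather than as the optimum of the grid.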

In [15]:
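# Tune and fit the control-arm model (treatment == 0); the two e-mail arms follow.
# Note: `treatment` stays among the features, but it is constant within each arm,
# so the trees have nothing to split on and it does not affect predictions.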
classifier_random_grid_0 = RandomizedSearchCV(
    GradientBoostingClassifier(),
    random_grid_params,
    scoring="accuracy",
    n_iter=NUM_ITERATIONS,
    cv=3,
    verbose=1,
    random_state=42,
    n_jobs=-1
)

classifier_random_grid_0.fit(train_0_df.drop(columns=['visit', 'conversion', 'spend']), train_0_df[target])

Fitting 3 folds for each of 5 candidates, totalling 15 fits


In [16]:
tlearner_0 = classifier_random_grid_0.best_estimator_

In [17]:
tlearner_0

In [18]:
classifier_random_grid_1 = RandomizedSearchCV(
    GradientBoostingClassifier(),
    random_grid_params,
    scoring="accuracy",
    n_iter=NUM_ITERATIONS,
    cv=3,
    verbose=1,
    random_state=42,
    n_jobs=-1
)

classifier_random_grid_1.fit(train_1_df.drop(columns=['visit', 'conversion', 'spend']), train_1_df[target])

Fitting 3 folds for each of 5 candidates, totalling 15 fits


In [19]:
tlearner_1 = classifier_random_grid_1.best_estimator_

In [20]:
tlearner_1

In [21]:
classifier_random_grid_2 = RandomizedSearchCV(
    GradientBoostingClassifier(),
    random_grid_params,
    scoring="accuracy",
    n_iter=NUM_ITERATIONS,
    cv=3,
    verbose=1,
    random_state=42,
    n_jobs=-1
)

classifier_random_grid_2.fit(train_2_df.drop(columns=['visit', 'conversion', 'spend']), train_2_df[target])

Fitting 3 folds for each of 5 candidates, totalling 15 fits


In [22]:
tlearner_2 = classifier_random_grid_2.best_estimator_

In [23]:
tlearner_2
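
The three searches above differ only in the training subset. One way to avoid the repetition is a small helper, sketched below on synthetic data. The helper name `fit_arm_model`, the reduced grid, and the toy data are assumptions made for the example. Unlike the cells above, the sketch also drops the `treatment` column from the features.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

# Outcome columns plus the arm indicator are excluded from the features
NON_FEATURES = ["visit", "conversion", "spend", "treatment"]

def fit_arm_model(arm_df, target, param_grid, n_iter=2):
    """Tune a gradient-boosted classifier on the rows of one treatment arm."""
    search = RandomizedSearchCV(
        GradientBoostingClassifier(random_state=42),
        param_grid,
        scoring="accuracy",
        n_iter=n_iter,
        cv=3,
        random_state=42,
    )
    search.fit(arm_df.drop(columns=NON_FEATURES), arm_df[target])
    return search.best_estimator_

# Toy stand-in for train_df (hypothetical values)
rng = np.random.default_rng(1)
n = 600
toy_train_df = pd.DataFrame({
    "recency": rng.integers(1, 13, size=n),
    "history": rng.uniform(30, 500, size=n),
    "treatment": rng.integers(0, 3, size=n),
    "visit": rng.binomial(1, 0.2, size=n),
    "conversion": 0,
    "spend": 0.0,
})

small_grid = {
    "n_estimators": [15, 25],
    "max_depth": [2, 4],
    "learning_rate": [0.03, 0.1],
}

# One tuned model per arm, keyed by the treatment code
models = {
    arm: fit_arm_model(arm_df, "visit", small_grid)
    for arm, arm_df in toy_train_df.groupby("treatment")
}
print(sorted(models))
```

On the notebook's data the equivalent call would be `fit_arm_model(train_0_df, target, random_grid_params, n_iter=NUM_ITERATIONS)`. Predictions would then need the same `NON_FEATURES` dropped from `test_df`.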

In [24]:
# Per-customer CATE estimate for the women's e-mail:
# P(visit | women's e-mail) - P(visit | control), predicted on the test set
tlearner_te_womens = (
    tlearner_1.predict_proba(test_df.drop(columns=['visit', 'conversion', 'spend']))[:, 1] -
    tlearner_0.predict_proba(test_df.drop(columns=['visit', 'conversion', 'spend']))[:, 1]
)

In [25]:
tlearner_te_womens.mean()

0.04327459736299011

In [26]:
# Per-customer CATE estimate for the men's e-mail:
# P(visit | men's e-mail) - P(visit | control), predicted on the test set
tlearner_te_mens = (
    tlearner_2.predict_proba(test_df.drop(columns=['visit', 'conversion', 'spend']))[:, 1] -
    tlearner_0.predict_proba(test_df.drop(columns=['visit', 'conversion', 'spend']))[:, 1]
)

In [27]:
tlearner_te_mens.mean()

0.07389043625268067
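
The mean of the per-customer estimates can be set against the raw difference in visit rates on the same rows, as a coarse sanity check that the T-learner's average agrees with the experiment. The sketch below assumes a randomized treatment. The function name `mean_cate_vs_raw_lift` and the toy inputs are illustrative.

```python
import numpy as np

def mean_cate_vs_raw_lift(cate, y, t, arm, control=0):
    """Return (mean predicted CATE, observed difference in mean outcome)."""
    cate, y, t = map(np.asarray, (cate, y, t))
    # Observed lift: mean outcome in the chosen arm minus mean outcome in control
    raw_lift = y[t == arm].mean() - y[t == control].mean()
    return cate.mean(), raw_lift

# Toy inputs (hypothetical): 3 control rows followed by 3 rows in arm 1
y = [0, 0, 1, 1, 1, 0]
t = [0, 0, 0, 1, 1, 1]
cate = [0.2, 0.4, 0.3, 0.3, 0.5, 0.1]

est, raw = mean_cate_vs_raw_lift(cate, y, t, arm=1)
print(est, raw)
```

With the notebook's objects this would be called as `mean_cate_vs_raw_lift(tlearner_te_womens, test_df['visit'], test_df['treatment'], arm=1)`, and with `tlearner_te_mens` and `arm=2` for the men's e-mail.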