## ERUPT under simulated random assignment

In [1]:
%load_ext autoreload
%autoreload 2
import os, sys
import warnings
warnings.filterwarnings('ignore') # suppress sklearn deprecation warnings for now..

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

# the below checks for whether we run dowhy, causaltune, and FLAML from source
root_path = root_path = os.path.realpath('../..')
try:
    import causaltune
except ModuleNotFoundError:
    sys.path.append(os.path.join(root_path, "auto-causality"))

try:
    import dowhy
except ModuleNotFoundError:
    sys.path.append(os.path.join(root_path, "dowhy"))

try:
    import flaml
except ModuleNotFoundError:
    sys.path.append(os.path.join(root_path, "FLAML"))

from causaltune import CausalTune
from causaltune.datasets import generate_non_random_dataset
from causaltune.erupt import DummyPropensity, ERUPT


In [2]:
# this makes the notebook expand to full width of the browser window
from IPython.core.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))

In [3]:
%%javascript

// turn off scrollable windows for large output
IPython.OutputArea.prototype._should_scroll = function(lines) {
    return false;
}

<IPython.core.display.Javascript object>

## Loading data and model training

In [4]:
# load toy dataset with non-random assignment and apply standard pre-processing
cd = generate_non_random_dataset()
cd.preprocess_dataset()

In [5]:
display(cd.data.head())

Unnamed: 0,T,Y,random,X1,X2,X3,X4,X5,propensity
0,0,-0.072668,1.0,0.270103,-0.263476,1.330273,-1.518261,0.239436,0.309511
1,0,0.041597,0.0,-0.083365,-1.025096,-0.826468,-1.075895,0.870816,0.285757
2,1,2.386202,0.0,-0.193434,0.930571,0.853347,-1.515284,-1.44595,0.558055
3,0,-1.466616,1.0,0.088129,0.567949,0.932358,-0.456103,-0.576629,0.546158
4,0,1.272867,1.0,0.805338,0.405812,0.537663,-1.292221,0.709316,0.558887


## Random ERUPT

Below we demonstrate how to use Estimated Response Under Proposed Treatment (ERUPT) to estimate the average treatment effect had the treatment been assigned randomly. Recall that the dataset used in this example is constructed in a way that the treatment propensity is a function of a unit's covariates.

In [32]:
# computing mean ERUPT over 10 bootstrapped samples

scores_list = []

for i in range(10):

    bootstrap_df = use_df.sample(frac=1, replace=True)
    propensities = bootstrap_df['propensity']
    actual_treatment = bootstrap_df['T']
    outcome = bootstrap_df['Y']

    # define the random assignment policy
    random_policy = np.random.randint(0,2, size=len(bootstrap_df))

    # define a propensity model that will simply return the propensities when calling predict_proba
    propensity_model = DummyPropensity(p=propensities, treatment=actual_treatment)

    # obtain ERUPT under random policy
    e = ERUPT(treatment_name='T', propensity_model=propensity_model)
    scores_list.append(e.score(df=use_df,outcome=outcome,policy=random_policy))

erupt_mean = np.mean(scores_list)
erupt_sd = np.std(scores_list)

In [None]:
# compute naive ate as difference in means
naive_ate, naive_sd, _ = ct.scorer.naive_ate(ct.test_df['T'], ct.test_df['Y'])

In [36]:
# comparison of naive ate to mean random erupt over 10 bootstrap runs
erupt_df = pd.DataFrame([[naive_ate,naive_sd],[erupt_mean,erupt_sd]], columns=['estimated_effect', 'sd'], index=['naive_ate','random_erupt'])
display(erupt_df)

Unnamed: 0,estimated_effect,standard_error
naive_ate,2.027285,0.128181
random_erupt,0.67927,0.215763


For more details on the ERUPT implementation, consult [Hitsch and Misra (2018)](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3111957). Note also that we assume that treatment takes integer values from 0 to n.