## CausalityDataset Usage

This notebook demonstrates how to configure and use `CausalityDataset` with an arbitrary `pd.DataFrame`.

In [1]:
%load_ext autoreload
%autoreload 2
import os, sys
import warnings
warnings.filterwarnings('ignore')  # suppresses all warnings; used here to hide sklearn deprecation warnings

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

# check whether dowhy, causaltune, and FLAML are importable; if not, fall back to local source checkouts
root_path = os.path.realpath('../..')
try:
    import causaltune
except ModuleNotFoundError:
    sys.path.append(os.path.join(root_path, "auto-causality"))

try:
    import dowhy
except ModuleNotFoundError:
    sys.path.append(os.path.join(root_path, "dowhy"))

try:
    import flaml
except ModuleNotFoundError:
    sys.path.append(os.path.join(root_path, "FLAML"))


In [2]:
# this makes the notebook expand to full width of the browser window
from IPython.display import display, HTML  # IPython.core.display is deprecated for these imports
display(HTML("<style>.container { width:100% !important; }</style>"))

In [3]:
%%javascript

// turn off scrollable windows for large output
IPython.OutputArea.prototype._should_scroll = function(lines) {
    return false;
}


In [6]:
from causaltune import CausalTune
from causaltune.datasets import synth_ihdp
from causaltune.data_utils import CausalityDataset

In [5]:
df = synth_ihdp(return_df=True)
display(df.head())

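| | treatment | y_factual | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | ... | x16 | x17 | x18 | x19 | x20 | x21 | x22 | x23 | x24 | x25 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 5.599916 | -0.528603 | -0.343455 | 1.128554 | 0.161703 | -0.316603 | 1.295216 | 1 | 0 | ... | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 6.875856 | -1.736945 | -1.802002 | 0.383828 | 2.24432 | -0.629189 | 1.295216 | 0 | 0 | ... | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 2.996273 | -0.807451 | -0.202946 | -0.360898 | -0.879606 | 0.808706 | -0.526556 | 0 | 0 | ... | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 1.366206 | 0.390083 | 0.596582 | -1.85035 | -0.879606 | -0.004017 | -0.857787 | 0 | 0 | ... | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 1.963538 | -1.045229 | -0.60271 | 0.011465 | 0.161703 | 0.683672 | -0.36094 | 1 | 0 | ... | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |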


`CausalityDataset` needs at least three arguments:
- `data`: the input dataframe
- `treatment`: the name of the treatment column
- `outcomes`: a list of outcome column names; pass a list even if there is only one outcome of interest

In addition, if the propensities to treat are known, pass the corresponding column name(s) via `propensity_modifiers`.
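Before constructing the dataset, it can help to check that these arguments match the dataframe. The sketch below is only an illustration: `check_dataset_columns` is a hypothetical helper written for this notebook, not part of `causaltune`, and it works on plain column names (for example `list(df.columns)`) so that it runs without any third-party packages. It does not describe what `CausalityDataset` does internally with the remaining columns.

```python
from typing import Iterable, List, Optional


def check_dataset_columns(
    columns: Iterable[str],
    treatment: str,
    outcomes: List[str],
    propensity_modifiers: Optional[List[str]] = None,
) -> List[str]:
    """Hypothetical helper: validate column roles and return the leftover columns."""
    columns = list(columns)
    # A bare string would be iterated character by character, so reject it early.
    if isinstance(outcomes, str):
        raise TypeError("outcomes must be a list, e.g. ['y_factual']")
    requested = [treatment, *outcomes, *(propensity_modifiers or [])]
    missing = [name for name in requested if name not in columns]
    if missing:
        raise KeyError(f"columns not found in dataframe: {missing}")
    if treatment in outcomes:
        raise ValueError("the treatment column cannot also be an outcome")
    # Everything that is neither the treatment nor an outcome is left over.
    return [c for c in columns if c != treatment and c not in outcomes]


# Illustrative column names; with a real dataframe use list(df.columns).
cols = ["treatment", "y_factual", "x1", "x2"]
features = check_dataset_columns(cols, treatment="treatment", outcomes=["y_factual"])
print(features)
```

Passing `outcomes='y_factual'` instead of `outcomes=['y_factual']` raises a `TypeError` in this sketch, which mirrors the advice above to always supply a list.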

In [None]:
cd = CausalityDataset(data=df, treatment='treatment', outcomes=['y_factual'])

In [None]:
cd.preprocess_dataset()

Subsequently, use the preprocessed `CausalityDataset` object for training as follows: `CausalTune.fit(cd, outcome='y_factual')`.