## Random assignment, binary CATE example

This is a fully worked-out notebook showing how you would apply auto-causality to a dataset.

In [1]:
%load_ext autoreload
%autoreload 2
import os, sys
import warnings
warnings.filterwarnings('ignore')  # suppress sklearn deprecation warnings for now

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

# check whether dowhy, auto-causality and FLAML are installed; if one is not,
# add its source checkout under root_path to sys.path instead
root_path = os.path.realpath('../..')
try:
    import auto_causality
except ModuleNotFoundError:
    sys.path.append(os.path.join(root_path, "auto-causality"))

try:
    import dowhy
except ModuleNotFoundError:
    sys.path.append(os.path.join(root_path, "dowhy"))

try:
    import flaml
except ModuleNotFoundError:
    sys.path.append(os.path.join(root_path, "FLAML"))


In [2]:
# this makes the notebook expand to full width of the browser window
from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))

In [3]:
%%javascript

// turn off scrollable windows for large output
IPython.OutputArea.prototype._should_scroll = function(lines) {
    return false;
}


In [4]:
from auto_causality import AutoCausality
from auto_causality.datasets import synth_ihdp
from auto_causality.scoring import Scorer

### Model fitting & scoring
Here we fit one or more models to the data and score them on held-out data. The fit below selects models by `energy_distance`; the reported scores also include the normalized ERUPT metric (`norm_erupt`), which is chosen specifically to look for differences in impact across customers.
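To make the ERUPT idea concrete, here is a minimal sketch of plain (un-normalized) ERUPT, "expected response under proposed treatments", for a binary treatment. It assumes the usual inverse-propensity definition: units whose observed treatment matches the policy's recommendation are weighted by one over the probability of receiving that treatment, and all others get weight zero. This is consistent with the `weights` column in the training log further down (4.933884 ≈ 1/0.20268 for treated rows under a treat-all policy), but it is not auto_causality's own code, and the function name `erupt` is hypothetical; the normalization behind `norm_erupt` is not shown.

```python
import numpy as np


def erupt(y, treated, propensity, policy):
    """Hypothetical sketch: inverse-propensity-weighted mean outcome of the
    units whose observed treatment agrees with the policy (binary treatment)."""
    y = np.asarray(y, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    propensity = np.asarray(propensity, dtype=float)
    policy = np.asarray(policy, dtype=bool)
    # probability of receiving the treatment that the policy recommends
    p_match = np.where(policy, propensity, 1.0 - propensity)
    # only units whose actual treatment matches the recommendation carry weight
    weights = np.where(treated == policy, 1.0 / p_match, 0.0)
    return float(np.sum(weights * y) / np.sum(weights))


y = [1.0, 2.0, 3.0, 4.0]
treated = [1, 0, 1, 0]
p = [0.5] * 4
print(erupt(y, treated, p, [True, True, True, True]))    # 2.0 (treat everyone)
print(erupt(y, treated, p, [True, False, False, True]))  # 1.5
```

A policy that targets the units who actually benefit from treatment gets a higher ERUPT than one that does not, which is why a metric of this kind is sensitive to differences in impact across customers.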

We import an example dataset and pre-process it. The pre-processing fills in the NaNs and one-hot-encodes all categorical and int variables.

If you don't want an int variable to be one-hot-encoded, please cast it to float before preprocessing.
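The effect of the float cast can be illustrated with plain pandas. The sketch below is a hypothetical stand-in for `preprocess_dataset()` that follows the rule described above (fill NaNs, one-hot-encode int and categorical columns); the function name `toy_preprocess` and the mean-imputation choice are assumptions, not the library's implementation.

```python
import numpy as np
import pandas as pd


def toy_preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in: fill NaNs in numeric columns with the column mean,
    then one-hot-encode every int, object or categorical column."""
    df = df.copy()
    cat_cols = [
        c for c in df.columns
        if pd.api.types.is_integer_dtype(df[c])
        or pd.api.types.is_object_dtype(df[c])
        or isinstance(df[c].dtype, pd.CategoricalDtype)
    ]
    num_cols = [c for c in df.columns if c not in cat_cols]
    df[num_cols] = df[num_cols].fillna(df[num_cols].mean())
    if not cat_cols:
        return df
    return pd.get_dummies(df, columns=cat_cols, dtype=float)


raw = pd.DataFrame({
    "n_kids": [0, 1, 2, 1],             # int column: gets one-hot-encoded
    "age": [31.0, np.nan, 45.0, 52.0],  # float column: kept, NaN filled
})
encoded = toy_preprocess(raw)
print(sorted(encoded.columns))  # ['age', 'n_kids_0', 'n_kids_1', 'n_kids_2']

# casting to float keeps n_kids as a single numeric column
raw["n_kids"] = raw["n_kids"].astype(float)
kept = toy_preprocess(raw)
print(sorted(kept.columns))  # ['age', 'n_kids']
```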

In [5]:
cd = synth_ihdp()
cd.preprocess_dataset()

In [6]:
print(cd.data.head())

   treatment  y_factual  random        x1        x2        x3        x4  \
0          1   5.599916     1.0 -0.528603 -0.343455  1.128554  0.161703   
1          0   6.875856     1.0 -1.736945 -1.802002  0.383828  2.244319   
2          0   2.996273     1.0 -0.807451 -0.202946 -0.360898 -0.879606   
3          0   1.366206     1.0  0.390083  0.596582 -1.850350 -0.879606   
4          0   1.963538     0.0 -1.045228 -0.602710  0.011465  0.161703   

         x5        x6   x7  ...  x16  x17  x18  x19  x20  x21  x22  x23  x24  \
0 -0.316603  1.295216  1.0  ...  1.0  1.0  1.0  1.0  0.0  0.0  0.0  0.0  0.0   
1 -0.629189  1.295216  0.0  ...  1.0  1.0  1.0  1.0  0.0  0.0  0.0  0.0  0.0   
2  0.808706 -0.526556  0.0  ...  1.0  0.0  1.0  1.0  0.0  0.0  0.0  0.0  0.0   
3 -0.004017 -0.857787  0.0  ...  1.0  0.0  1.0  1.0  0.0  0.0  0.0  0.0  0.0   
4  0.683672 -0.360940  1.0  ...  1.0  1.0  1.0  1.0  0.0  0.0  0.0  0.0  0.0   

   x25  
0  0.0  
1  0.0  
2  0.0  
3  0.0  
4  0.0  

[5 rows x 28 columns]

Fitting the model is as simple as calling `AutoCausality.fit()`. Apart from the data and the outcome column, the only setting you need to choose is how much time to give the optimizer, which is passed to the `AutoCausality` constructor: either for the whole run (`time_budget`), per FLAML component model (`components_time_budget`), or both.

If you want to use specific estimators, uncomment entries in `estimator_list` below: every estimator whose full name contains any of the strings in `estimator_list` is included.

Instead of a list, `estimator_list` also accepts `"all"` or `"auto"`; the default is `"auto"`.
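The substring rule for `estimator_list` can be sketched in a few lines. The first five names below are the ones that appear in the training log further down; the last two are assumed, following the same `backdoor.<package>.<module>.<Class>` pattern. The helper `select_estimators` is hypothetical and only illustrates the matching rule, not the library's code. It shows why the commented-out entry is written as `".LinearDML"`: without the leading dot it would also match `SparseLinearDML`.

```python
def select_estimators(all_names, patterns):
    """Hypothetical helper: keep every estimator whose full name
    contains at least one of the given substrings."""
    return [name for name in all_names if any(p in name for p in patterns)]


names = [
    "backdoor.auto_causality.models.NaiveDummy",
    "backdoor.auto_causality.models.Dummy",
    "backdoor.econml.metalearners.SLearner",
    "backdoor.econml.dr.ForestDRLearner",
    "backdoor.econml.dml.SparseLinearDML",
    "backdoor.econml.dml.LinearDML",        # assumed name, for illustration
    "backdoor.econml.dml.CausalForestDML",  # assumed name, for illustration
]

# "Dummy" matches both NaiveDummy and Dummy
print(select_estimators(names, ["Dummy", "SparseLinearDML", "ForestDRLearner", "SLearner"]))
# the leading dot restricts the match to LinearDML only
print(select_estimators(names, [".LinearDML"]))
# without the dot, SparseLinearDML matches too
print(select_estimators(names, ["LinearDML"]))
```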


In [7]:
ct = AutoCausality(
    # it's best to specify either time_budget or components_time_budget, and let the other one be inferred
    # time_budget=600,
    estimator_list=[
        "Dummy",
        "SparseLinearDML",
        "ForestDRLearner",
        # "TransformedOutcome",
        # "CausalForestDML",
        # ".LinearDML",
        # "DomainAdaptationLearner",
        "SLearner",
        # "XLearner",
        # "TLearner",
        # "Ortho",
    ],
    metric="energy_distance",
    verbose=3,
    components_verbose=2,
    components_time_budget=10,
)

# the outcome column to model; later cells refer to it as `outcome`
outcome = cd.outcomes[0]

# run autocausality
ct.fit(data=cd, outcome=outcome)
# best estimator:
print(f"Best estimator: {ct.best_estimator}")
# config of best estimator:
print(f"best config: {ct.best_config}")
# best score:
print(f"best score: {ct.best_score}")


[flaml.tune.tune: 05-12 15:46:39] {636} INFO - trial 1 config: {'estimator': {'estimator_name': 'backdoor.auto_causality.models.NaiveDummy'}}


Fitting a Propensity-Weighted scoring estimator to be used in scoring tasks
Initial configs: [{'estimator': {'estimator_name': 'backdoor.auto_causality.models.NaiveDummy'}}, {'estimator': {'estimator_name': 'backdoor.auto_causality.models.Dummy'}}, {'estimator': {'estimator_name': 'backdoor.econml.metalearners.SLearner'}}, {'estimator': {'estimator_name': 'backdoor.econml.dr.ForestDRLearner', 'min_propensity': 1e-06, 'n_estimators': 100, 'min_samples_split': 5, 'min_samples_leaf': 5, 'min_weight_fraction_leaf': 0.0, 'max_features': 'auto', 'min_impurity_decrease': 0.0, 'max_samples': 0.45, 'min_balancedness_tol': 0.45, 'honest': True, 'subforest_size': 4}}, {'estimator': {'estimator_name': 'backdoor.econml.dml.SparseLinearDML', 'fit_cate_intercept': True, 'n_alphas': 100, 'n_alphas_cov': 10, 'tol': 0.0001, 'max_iter': 10000, 'mc_agg': 'mean'}}]


[flaml.tune.tune: 05-12 15:46:39] {198} INFO - result: {'energy_distance': 0.3409869959994314, 'estimator_name': 'backdoor.auto_causality.models.NaiveDummy', 'scores': {'train': {'ate': 4.029953761597362, 'ate_std': 0.038202260766478265, 'erupt': 6.428079560416515, 'norm_erupt': 2.3474874846745637, 'qini': -9.28235433124249, 'auc': 0.49154811806308435, 'values':      treated  y_factual        p  policy  norm_policy   weights
0          0   7.045186  0.20268    True        False  0.000000
1          0   3.182114  0.20268    True         True  0.000000
2          1   7.647942  0.20268    True        False  4.933884
3          0   4.498734  0.20268    True        False  0.000000
4          1   6.554958  0.20268    True         True  4.933884
..       ...        ...      ...     ...          ...       ...
592        1   8.025306  0.20268    True        False  4.933884
593        0   2.532334  0.20268    True        False  0.000000
594        0   2.874851  0.20268    True         True  0.00

After running a fit, you can resume it without losing past results, for example if you want to search over extra estimators.
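The bookkeeping behind resuming can be illustrated with a toy search loop: results from earlier runs are kept, and only configurations not yet evaluated are run. This is a hypothetical sketch of the idea (`run_search` and `fake_score` are made up for illustration), not how AutoCausality or FLAML store their results internally.

```python
def run_search(configs, evaluate, results=None):
    """Hypothetical sketch: evaluate each config at most once. Passing the
    `results` dict from an earlier call resumes the search, skipping configs
    that already have a score."""
    results = {} if results is None else results
    for cfg in configs:
        if cfg not in results:
            results[cfg] = evaluate(cfg)
    return results


calls = []


def fake_score(name):
    calls.append(name)
    return float(len(name))  # placeholder score, for illustration only


first = run_search(["Dummy", "SLearner"], fake_score)
# later: widen the search; only the new estimator is evaluated
resumed = run_search(["Dummy", "SLearner", "XLearner"], fake_score, results=first)
print(calls)  # ['Dummy', 'SLearner', 'XLearner']
```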

In [None]:
# we can now resume the fit to continue with the init_cfgs which we haven't tried yet
# ct.fit(data=cd, outcome=cd.outcomes[0], resume=True)
# # return best estimator
# print(f"Best estimator: {ct.best_estimator}")
# # config of best estimator:
# print(f"best config: {ct.best_config}")
# # best score:
# print(f"best score: {ct.best_score}")

In [None]:
# ct.results.results

In [None]:
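# score all estimators on the test set, which we've kept aside up till now
# (`test_df` is assumed to be a held-out DataFrame with the same columns as cd.data, never passed to ct.fit)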
for est_name, scr in ct.scores.items():
    causal_estimate = scr['estimator']
    scr['scores']['test'] = ct.scorer.make_scores(causal_estimate, test_df, problem=ct.problem, metrics_to_report=ct.metrics_to_report)

In [None]:
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick

colors = ([matplotlib.colors.CSS4_COLORS['black']] +
    list(matplotlib.colors.TABLEAU_COLORS) + [
    matplotlib.colors.CSS4_COLORS['lime'],
    matplotlib.colors.CSS4_COLORS['yellow'],
    matplotlib.colors.CSS4_COLORS['pink']
])


plt.figure(figsize = (7,5))
plt.title(outcome)

m1 = "ate"
m2 = "norm_erupt"

for (est, scr), col in zip(ct.scores.items(),colors):
    try:
        sc = [scr["scores"]['train'][m1], scr["scores"]['validation'][m1], scr["scores"]['test'][m1]]
        crv = [scr["scores"]['train'][m2], scr["scores"]['validation'][m2], scr["scores"]['test'][m2]]
        plt.plot(sc, crv, color=col, marker="o", label=est)
        plt.scatter(sc[1:2],crv[1:2], c=col, s=70, label="_nolegend_" )
        plt.scatter(sc[2:],crv[2:], c=col, s=120, label="_nolegend_" )

    except Exception:
        # skip estimators for which one of the train/validation/test scores is missing
        pass
plt.xlabel(m1)
plt.ylabel(m2)

plt.legend(bbox_to_anchor=(1.04,1), borderaxespad=0)

plt.grid()
plt.show()


In [None]:
scr = ct.scores[ct.best_estimator]
intrp = scr["scores"]['validation']['intrp']
plt.figure(figsize=(15, 7))
intrp.plot(feature_names=intrp.feature_names, fontsize=10)
plt.title(f"{ct.best_estimator}_{outcome}")
plt.show()


In [None]:
import matplotlib.pyplot as plt
import shap

# and now let's visualize feature importances!
from auto_causality.shap import shap_values

# Shapley values calculation can be slow so let's subsample
this_df = test_df.sample(100)

if "Dummy" not in ct.best_estimator:
    scr = ct.scores[ct.best_estimator]
    print(outcome, ct.best_estimator)
    est = ct.model
    shaps = shap_values(est, this_df)

    plt.title(outcome + '_' + ct.best_estimator.split('.')[-1])
    shap.summary_plot(shaps, this_df[est.estimator._effect_modifier_names])
    plt.show()
else:
    print(f"The best performing model is {ct.best_estimator}, which doesn't depend on features")


In [None]:
# plot the out-of-sample difference in outcomes between treated and untreated units,
# separately for the points where the model predicts a positive vs a negative impact
my_est = ct.best_estimator

v = ct.scores[my_est]['scores']['test']['values']

sts = ct.scorer.group_ate(test_df.reset_index(), v['norm_policy'])

display(sts)


colors = (matplotlib.colors.CSS4_COLORS['black'],
    matplotlib.colors.CSS4_COLORS['red'],
    matplotlib.colors.CSS4_COLORS['blue'])

grp = sts["policy"].unique()

for i,(p,c) in enumerate(zip(grp, colors)):
    st = sts[sts["policy"] == p]
    plt.errorbar(np.array(range(len(st))) +0.1*i, st["mean"].values[0],  yerr = st["std"].values[0], color=c)
plt.legend(grp)
plt.grid(True)
plt.title(my_est.split('.')[-1])
plt.show()
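For readers who want to see what a per-group comparison like `group_ate` involves, here is a minimal pandas sketch: within each policy group it takes the difference in mean outcome between treated and untreated units, plus the standard error of that difference. The function `group_ate_sketch`, its column defaults (taken from the dataset printed earlier), and the use of a standard error in the `std` column are assumptions; the scorer's own method may compute its statistics differently. The policy is attached by position with `np.asarray`, so the result does not depend on the DataFrame's index.

```python
import numpy as np
import pandas as pd


def group_ate_sketch(df, policy, treatment_col="treatment", outcome_col="y_factual"):
    """Hypothetical sketch: treated-minus-untreated mean outcome within each
    policy group, with the standard error of that difference (NaN if a group
    has fewer than two treated or untreated units)."""
    df = df.assign(policy=np.asarray(policy))  # attach by position, not by index
    rows = []
    for p, grp in df.groupby("policy"):
        y1 = grp.loc[grp[treatment_col] == 1, outcome_col]
        y0 = grp.loc[grp[treatment_col] == 0, outcome_col]
        diff = y1.mean() - y0.mean()
        se = np.sqrt(y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))
        rows.append({"policy": p, "mean": diff, "std": se, "count": len(grp)})
    return pd.DataFrame(rows)


toy = pd.DataFrame({
    "treatment": [1, 0, 1, 0, 1, 0, 1, 0],
    "y_factual": [5.0, 3.0, 6.0, 2.0, 2.0, 2.0, 3.0, 3.0],
})
toy_policy = [True, True, True, True, False, False, False, False]
r = group_ate_sketch(toy, toy_policy)
print(r)  # policy=False: mean 0.0; policy=True: mean 3.0
```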
