# Propensity Score Weighting in CausalTune 


In [1]:
import os
import sys
import pandas as pd
import numpy as np
import warnings

from scipy.stats import betabinom

from sklearn.ensemble import RandomForestClassifier

root_path = os.path.realpath('../..')
try:
    import causaltune
except ModuleNotFoundError:
    sys.path.append(os.path.join(root_path, "causaltune"))

from causaltune import CausalTune
from causaltune.data_utils import CausalityDataset
from causaltune.datasets import generate_non_random_dataset

warnings.filterwarnings("ignore")

%load_ext autoreload
%autoreload 2




CausalTune effect estimation involves several models that can or must be fitted:

1. Propensity model to estimate treatment propensities from features $\mathbb{E}[T|X,W]$.
2. Outcome model to estimate outcomes from features $\mathbb{E}[Y|X,W]$.
3. The final causal inference estimator, which requires additional hyperparameter tuning.

In this notebook, we focus on the propensity model used for propensity score weighting (item 1).
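
To make the role of the propensity model concrete, here is a minimal sketch of inverse propensity weighting on toy data. It is a generic illustration and not CausalTune's internal implementation: the data-generating process, the variable names, and the choice of logistic regression are all assumptions made for this example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20_000
true_effect = 0.2  # assumed constant effect for this toy example

# One confounder X drives both treatment assignment and outcome.
X = rng.normal(size=(n, 1))
true_propensity = 1.0 / (1.0 + np.exp(-X[:, 0]))
T = rng.binomial(1, true_propensity)
Y = true_effect * T + X[:, 0] + rng.normal(size=n)

# Naive estimate: difference in means, biased by the confounder.
naive_ate = Y[T == 1].mean() - Y[T == 0].mean()

# Step 1: fit a propensity model (here a plain logistic regression).
propensity = LogisticRegression().fit(X, T).predict_proba(X)[:, 1]

# Step 2: weight each unit by the inverse probability of the treatment
# it actually received (normalised weights, i.e. the Hajek form).
w_treated = T / propensity
w_control = (1 - T) / (1 - propensity)
ipw_ate = (
    np.sum(w_treated * Y) / np.sum(w_treated)
    - np.sum(w_control * Y) / np.sum(w_control)
)

print(f"naive: {naive_ate:.3f}, IPW: {ipw_ate:.3f}, true: {true_effect}")
```

Because treatment depends on `X` here, the naive difference in means overstates the effect, while the weighted estimate should land much closer to the assumed true value.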

There are four options for specifying a propensity model:

1. **[Default] Use a dummy estimator.**
   - The natural choice when a computationally cheap model is wanted or when the treatment is perfectly randomised.


2. **Let AutoML fit the propensity model.**
   - Brings the advantages of a more elaborate propensity weighting model.


3. **Supply a custom sklearn-compatible prediction model.**
   - Gives more flexibility in the choice of propensity prediction model.


4. **Supply an array of custom propensities to treat.**
   - Useful, e.g., when the propensities to treat come from an optimisation procedure such as Thompson sampling, where treating some subjects with higher propensity than others is expected to be beneficial.

### Generate Data

In [2]:
cd = generate_non_random_dataset()
cd.preprocess_dataset()

In this synthetic dataset, the **true (constant) treatment effect** is $0.2$.

In [3]:
# CausalTune configuration
components_time_budget = 40
train_size = 0.7

target = cd.outcomes[0]

### 1. DEFAULT: Dummy propensity model


The dummy propensity model assigns every unit the same constant propensity to treat, given by $\frac{\text{Treatment Group Size}}{\text{Total Sample Size}}$.
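
As a small sanity check of this formula, the sketch below fits scikit-learn's `DummyClassifier` on toy data and compares its predicted probability with the treated fraction. The toy arrays and the explicit `strategy="prior"` are assumptions for this illustration; the notebook itself only shows that the fitted model is a `DummyClassifier()`.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))       # features are ignored by the dummy model
T = rng.binomial(1, 0.3, size=1000)  # toy binary treatment indicator

# strategy="prior" predicts the empirical class frequencies for every row.
dummy = DummyClassifier(strategy="prior").fit(X, T)
constant_propensity = dummy.predict_proba(X)[:, 1]

treated_fraction = T.sum() / len(T)  # treatment group size / total sample size
print(np.allclose(constant_propensity, treated_fraction))
```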

In [4]:
ct = CausalTune(
    propensity_model='dummy',
    components_time_budget=components_time_budget,
    metric="energy_distance",
    train_size=train_size,
    verbose=0
)   
ct.fit(data=cd, outcome=target)

Fitting a Propensity-Weighted scoring estimator to be used in scoring tasks
Initial configs: [{'estimator': {'estimator_name': 'backdoor.causaltune.models.NaiveDummy'}}, {'estimator': {'estimator_name': 'backdoor.causaltune.models.Dummy'}}, {'estimator': {'estimator_name': 'backdoor.econml.metalearners.SLearner'}}, {'estimator': {'estimator_name': 'backdoor.econml.metalearners.DomainAdaptationLearner'}}, {'estimator': {'estimator_name': 'backdoor.econml.dr.ForestDRLearner', 'min_propensity': 1e-06, 'n_estimators': 100, 'min_samples_split': 5, 'min_samples_leaf': 5, 'min_weight_fraction_leaf': 0.0, 'max_features': 'auto', 'min_impurity_decrease': 0.0, 'max_samples': 0.45, 'min_balancedness_tol': 0.45, 'honest': True, 'subforest_size': 4}}, {'estimator': {'estimator_name': 'backdoor.econml.dml.CausalForestDML', 'drate': True, 'n_estimators': 100, 'criterion': 'mse', 'min_samples_split': 10, 'min_samples_leaf': 5, 'min_weight_fraction_leaf': 0.0, 'max_features': 'auto', 'min_impurity_decr

In [5]:
print(f'Propensity model: {ct.propensity_model}')

Propensity model: DummyClassifier()


Difference in means estimate (naive ATE):

In [6]:
print(f'{ct.scorer.naive_ate(cd.data[cd.treatment], cd.data[target])[0]:.5f}')

0.30755


CausalTune ATE estimate:

In [7]:
print(f'{ct.effect(ct.test_df).mean():.5f}')

0.26557


### 2. Propensity model estimation via AutoML

Estimating the propensity score weights via AutoML is as simple as setting `propensity_model='auto'`.

The computational effort can then be adjusted by supplying a `components_time_budget`.


In [8]:
ct = CausalTune(
    propensity_model='auto',
    components_time_budget=components_time_budget,
    metric="energy_distance",
    train_size=train_size,
    verbose=0
)   
ct.fit(data=cd, outcome=target)
print(f'Estimated ATE: {ct.effect(ct.test_df).mean():.5f}')

Fitting a Propensity-Weighted scoring estimator to be used in scoring tasks
Initial configs: [{'estimator': {'estimator_name': 'backdoor.causaltune.models.NaiveDummy'}}, {'estimator': {'estimator_name': 'backdoor.causaltune.models.Dummy'}}, {'estimator': {'estimator_name': 'backdoor.econml.metalearners.SLearner'}}, {'estimator': {'estimator_name': 'backdoor.econml.metalearners.DomainAdaptationLearner'}}, {'estimator': {'estimator_name': 'backdoor.econml.dr.ForestDRLearner', 'min_propensity': 1e-06, 'n_estimators': 100, 'min_samples_split': 5, 'min_samples_leaf': 5, 'min_weight_fraction_leaf': 0.0, 'max_features': 'auto', 'min_impurity_decrease': 0.0, 'max_samples': 0.45, 'min_balancedness_tol': 0.45, 'honest': True, 'subforest_size': 4}}, {'estimator': {'estimator_name': 'backdoor.econml.dml.CausalForestDML', 'drate': True, 'n_estimators': 100, 'criterion': 'mse', 'min_samples_split': 10, 'min_samples_leaf': 5, 'min_weight_fraction_leaf': 0.0, 'max_features': 'auto', 'min_impurity_decr

In [9]:
print(f'Propensity model: {ct.propensity_model}')

Propensity model: AutoML(append_log=False, auto_augment=True, custom_hp={},
       cv_score_agg_func=None, early_stop=False, ensemble=False,
       estimator_list='auto', eval_method='holdout', fit_kwargs_by_estimator={},
       hpo_method='auto', keep_search_state=False, learner_selector='sample',
       log_file_name='', log_training_metric=False, log_type='better',
       max_iter=None, mem_thres=4294967296, metric='auto',
       metric_constraints=[('pred_time', '<=', 1e-05)], min_sample_size=10000,
       model_history=False, n_concurrent_trials=1, n_jobs=-1, n_splits=5,
       pred_time_limit=1e-05, preserve_checkpoint=True, retrain_full=True,
       sample=True, skip_transform=False, split_ratio=0.30000000000000004, ...)


### 3. Propensity model estimation with a custom model

Any custom model that has sklearn-style `fit` and `predict_proba` methods can be supplied as the propensity model.
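
The interface requirement can be checked before handing a model over. The sketch below builds a hypothetical alternative, a scaled logistic regression pipeline, and verifies on toy data that it exposes both methods. The pipeline, the variable names, and the toy arrays are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical custom propensity model: standardise features, then logistic regression.
custom_propensity_model = make_pipeline(
    StandardScaler(), LogisticRegression(max_iter=1000)
)

# The two methods the text above asks for.
has_required_methods = all(
    hasattr(custom_propensity_model, name) for name in ("fit", "predict_proba")
)

# Quick smoke test on toy data: one probability column per treatment arm.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 3))
T_toy = rng.binomial(1, 0.4, size=200)
proba = custom_propensity_model.fit(X_toy, T_toy).predict_proba(X_toy)

print(has_required_methods, proba.shape)
```

Such an object could then be passed as `propensity_model=` in the same way as the `RandomForestClassifier` used in this section.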

In [10]:
propensity_model = RandomForestClassifier()

ct = CausalTune(
    propensity_model=propensity_model,
    components_time_budget=components_time_budget,
    metric="energy_distance",
    train_size=train_size,
)   
ct.fit(data=cd, outcome=target)
print(f'Estimated ATE: {ct.effect(ct.test_df).mean():.5f}')

Fitting a Propensity-Weighted scoring estimator to be used in scoring tasks
Initial configs: [{'estimator': {'estimator_name': 'backdoor.causaltune.models.NaiveDummy'}}, {'estimator': {'estimator_name': 'backdoor.causaltune.models.Dummy'}}, {'estimator': {'estimator_name': 'backdoor.econml.metalearners.SLearner'}}, {'estimator': {'estimator_name': 'backdoor.econml.metalearners.DomainAdaptationLearner'}}, {'estimator': {'estimator_name': 'backdoor.econml.dr.ForestDRLearner', 'min_propensity': 1e-06, 'n_estimators': 100, 'min_samples_split': 5, 'min_samples_leaf': 5, 'min_weight_fraction_leaf': 0.0, 'max_features': 'auto', 'min_impurity_decrease': 0.0, 'max_samples': 0.45, 'min_balancedness_tol': 0.45, 'honest': True, 'subforest_size': 4}}, {'estimator': {'estimator_name': 'backdoor.econml.dml.CausalForestDML', 'drate': True, 'n_estimators': 100, 'criterion': 'mse', 'min_samples_split': 10, 'min_samples_leaf': 5, 'min_weight_fraction_leaf': 0.0, 'max_features': 'auto', 'min_impurity_decr

In [11]:
print(f'Propensity model: {ct.propensity_model}')

Propensity model: RandomForestClassifier()


### 4. Supplying individual treatment propensities

In some settings, such as uplift modelling, the experiment or study is based on heterogeneous treatment propensities that are known to the experimenter. An array of treatment propensities can be supplied directly to CausalTune when instantiating the `CausalityDataset`, e.g. via
```
cd = CausalityDataset(
    ...
    propensity_modifiers=[<individual_treatment_propensity_column_name>],
    ...
)
```
and then using the `passthrough_model` as follows

In [12]:
from causaltune.models.passthrough import passthrough_model

print(cd.data.head())
print(f'True propensities to treat: {cd.propensity_modifiers}')

propensity_model = passthrough_model(
    cd.propensity_modifiers, include_control=False
)

ct = CausalTune(
    propensity_model=propensity_model,
    components_time_budget=components_time_budget,
    metric="energy_distance",
    train_size=train_size,
    verbose=0
)   
ct.fit(data=cd, outcome=target)
print(f'Estimated ATE: {ct.effect(ct.test_df).mean():.5f}')

   T         Y  random        X1        X2        X3        X4        X5  \
0  0 -1.000312     0.0  0.259595 -0.994360  0.122632 -0.308056  2.110752   
1  0  2.342408     1.0 -0.357165 -1.626471  0.768395  0.239236  0.874304   
2  0 -1.087664     0.0 -0.780095 -1.917028 -0.156848  0.437076  0.516383   
3  1  0.398676     1.0 -0.951582 -0.433123  1.299038  0.193750  1.311885   
4  0  0.897118     1.0 -0.341460 -1.668032 -0.340667  0.548328  1.646835   

   propensity  
0    0.273852  
1    0.148065  
2    0.136952  
3    0.213419  
4    0.188777  
True propensities to treat: ['propensity']
Fitting a Propensity-Weighted scoring estimator to be used in scoring tasks
Initial configs: [{'estimator': {'estimator_name': 'backdoor.causaltune.models.NaiveDummy'}}, {'estimator': {'estimator_name': 'backdoor.causaltune.models.Dummy'}}, {'estimator': {'estimator_name': 'backdoor.econml.metalearners.SLearner'}}, {'estimator': {'estimator_name': 'backdoor.econml.metalearners.DomainAdaptationLearner'
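
Conceptually, passing known propensities through means the "model" has nothing to learn and simply returns the supplied column. The class below is a hypothetical stand-in written for this illustration. It is not CausalTune's `passthrough_model`, and its name, constructor, and behaviour are assumptions.

```python
import numpy as np
import pandas as pd


class KnownPropensityModel:
    """Hypothetical stand-in that returns pre-computed propensities."""

    def __init__(self, propensity_column):
        self.propensity_column = propensity_column

    def fit(self, X, y=None):
        # Nothing to learn: the propensities are already known.
        return self

    def predict_proba(self, X):
        p = np.asarray(X[self.propensity_column], dtype=float)
        # Columns follow the sklearn convention: [P(T=0), P(T=1)].
        return np.column_stack([1.0 - p, p])


# Toy frame with a known propensity column (values are made up).
toy = pd.DataFrame({"X1": [0.3, -1.2, 0.8], "propensity": [0.27, 0.15, 0.21]})
model = KnownPropensityModel("propensity").fit(toy)
proba = model.predict_proba(toy)
print(proba)
```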