## FLAML for hyperparameter optimisation and model selection
We use FLAML twice: first to find the best component model for each estimator, and then to optimise the estimators themselves and choose the best one. This notebook shows how it is done.
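
The two-stage idea can be illustrated without FLAML or EconML. The following is a minimal sketch on synthetic data, assuming scikit-learn is available: cross-validation stands in for the FLAML component search, and two hand-written learners (`s_learner_effect`, `t_learner_effect`, both hypothetical helpers) stand in for the estimator list. It is not how auto_causality implements the search, and it scores against a known true effect, which is only possible on synthetic data.

```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 600
X = rng.normal(size=(n, 3))
t = rng.integers(0, 2, size=n)
y = X[:, 0] + 2.0 * t + rng.normal(scale=0.1, size=n)  # true effect is 2.0
train = np.arange(n) < n // 2
test = ~train

# Stage 1: pick the component (outcome) model by cross-validated R^2.
candidates = {
    "linear": LinearRegression(),
    "forest": RandomForestRegressor(n_estimators=20, random_state=0),
}
Xt = np.column_stack([X, t])
cv_r2 = {
    name: cross_val_score(model, Xt[train], y[train], cv=3, scoring="r2").mean()
    for name, model in candidates.items()
}
best_component = max(cv_r2, key=cv_r2.get)
component = candidates[best_component]


# Stage 2: plug the chosen component into each estimator and compare them.
def s_learner_effect(model, X_tr, t_tr, y_tr, X_te):
    """One model on (X, t); effect = prediction at t=1 minus prediction at t=0."""
    m = clone(model).fit(np.column_stack([X_tr, t_tr]), y_tr)
    ones, zeros = np.ones(len(X_te)), np.zeros(len(X_te))
    return m.predict(np.column_stack([X_te, ones])) - m.predict(
        np.column_stack([X_te, zeros])
    )


def t_learner_effect(model, X_tr, t_tr, y_tr, X_te):
    """Separate models for treated and control units; effect = their difference."""
    m1 = clone(model).fit(X_tr[t_tr == 1], y_tr[t_tr == 1])
    m0 = clone(model).fit(X_tr[t_tr == 0], y_tr[t_tr == 0])
    return m1.predict(X_te) - m0.predict(X_te)


estimators = {"SLearner": s_learner_effect, "TLearner": t_learner_effect}
# Score each estimator by the absolute error of its average effect on held-out rows.
errors = {
    name: abs(fn(component, X[train], t[train], y[train], X[test]).mean() - 2.0)
    for name, fn in estimators.items()
}
best_estimator = min(errors, key=errors.get)
print(best_component, best_estimator, errors)
```

In the notebook itself the second stage is scored with ERUPT on held-out data rather than against a known effect.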

In [1]:
%load_ext autoreload
%autoreload 2
import os, sys
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

root_path = os.path.realpath('../..')
data_dir = os.path.realpath(os.path.join(root_path, "auto-causality/data"))
# create the data directory (and any missing parents) if it does not exist yet
os.makedirs(data_dir, exist_ok=True)

sys.path.append(os.path.join(root_path, "auto-causality"))

In [3]:
from auto_causality.utils import featurize
from auto_causality import AutoCausality



In [4]:
# set all the control parameters here
train_size = 0.5
test_size = None
time_budget = 300
# leave one core free, but never go below one (os.cpu_count() may return None)
num_cores = max(1, (os.cpu_count() or 2) - 1)
conf_intervals = False


In [5]:
# load raw data
data = pd.read_csv(
    "https://raw.githubusercontent.com/AMLab-Amsterdam/CEVAE/master/datasets/IHDP/csv/ihdp_npci_1.csv",
    header=None,
)
col = [
    "treatment",
    "y_factual",
    "y_cfactual",
    "mu0",
    "mu1",
]
for i in range(1, 26):
    col.append("x" + str(i))
data.columns = col
# drop the columns we don't care about
ignore_patterns = ["y_cfactual", "mu"]
ignore_cols = [c for c in data.columns if any([s in c for s in ignore_patterns])]
data = data.drop(columns=ignore_cols)


# prepare the data

treatment = "treatment"
targets = ["y_factual"]  # it's good to allow multiple ones
features = [c for c in data.columns if c not in [treatment] + targets]

data[treatment] = data[treatment].astype(int)
# workaround for some DoWhy/EconML bugs: add a random binary column,
# which is used below as the only entry in features_W
data["random"] = np.random.randint(0, 2, size=len(data))

used_df = featurize(
    data, features=features, exclude_cols=[treatment] + targets, drop_first=False,
)
used_features = [
    c for c in used_df.columns if c not in ignore_cols + [treatment] + targets
]


# Treat all real features as effect modifiers (X); only the dummy "random" column is left for W
features_X = [f for f in used_features if f != "random"]
features_W = [f for f in used_features if f not in features_X]


train_df, test_df = train_test_split(used_df, train_size=train_size)
if test_size is not None:
    test_df = test_df.sample(test_size)

test_df.to_csv(os.path.join(data_dir, f"test_{time_budget}.csv"))
train_df.to_csv(os.path.join(data_dir, f"train_{time_budget}.csv"))
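
The column handling above can be checked on a toy frame. The sketch below assumes only pandas and NumPy and skips the `featurize` call, so it shows the substring-based column dropping and the X/W split, not the encoding step. The frame `toy` is a hypothetical stand-in for the IHDP data.

```python
import numpy as np
import pandas as pd

# Same column layout as the IHDP file: 5 named columns plus x1..x25.
cols = ["treatment", "y_factual", "y_cfactual", "mu0", "mu1"]
cols += [f"x{i}" for i in range(1, 26)]
toy = pd.DataFrame(np.zeros((4, len(cols))), columns=cols)

# Patterns are matched as substrings, so "mu" removes both mu0 and mu1.
ignore_patterns = ["y_cfactual", "mu"]
ignore_cols = [c for c in toy.columns if any(s in c for s in ignore_patterns)]
toy = toy.drop(columns=ignore_cols)
toy["random"] = np.random.randint(0, 2, size=len(toy))

treatment, targets = "treatment", ["y_factual"]
used_features = [c for c in toy.columns if c not in [treatment] + targets]
features_X = [f for f in used_features if f != "random"]
features_W = [f for f in used_features if f not in features_X]

print(ignore_cols)      # ['y_cfactual', 'mu0', 'mu1']
print(len(features_X))  # 25
print(features_W)       # ['random']
```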


### Model fitting & scoring
Here we fit a selection of models to the data and score them with the ERUPT metric on held-out data.
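
For readers unfamiliar with ERUPT (Expected Response Under Proposed Treatments), here is a minimal sketch assuming the common inverse-propensity formulation: keep only the held-out units whose observed treatment matches the treatment the model's policy would propose, weight them by the inverse probability of the treatment they received, and average their outcomes. The function `erupt` and its signature are hypothetical, and the implementation inside auto_causality may differ.

```python
import numpy as np


def erupt(actual_treatment, outcome, proposed_treatment, propensity=None):
    """Self-normalised inverse-propensity estimate of the mean outcome under a policy.

    propensity[i] is the probability that unit i received the treatment it
    actually received; if omitted, the empirical arm frequencies are used
    (an assumption that is reasonable only for randomised assignment).
    """
    actual = np.asarray(actual_treatment)
    y = np.asarray(outcome, dtype=float)
    proposed = np.asarray(proposed_treatment)
    if propensity is None:
        arms, counts = np.unique(actual, return_counts=True)
        freq = dict(zip(arms.tolist(), (counts / len(actual)).tolist()))
        propensity = np.array([freq[a] for a in actual.tolist()])
    # Units whose observed treatment disagrees with the policy get weight zero.
    weights = (actual == proposed) / np.asarray(propensity, dtype=float)
    if weights.sum() == 0:
        return float("nan")
    return float(np.sum(weights * y) / np.sum(weights))


# Policy agrees with the observed treatment for units 1 and 2 only.
value = erupt([0, 1, 0, 1], [1.0, 3.0, 2.0, 5.0], [1, 1, 0, 0])
print(value)  # 2.5

# A treat-everyone policy reduces to the mean outcome of the treated units.
treat_all = erupt([0, 1, 0, 1], [1.0, 3.0, 2.0, 5.0], [1, 1, 1, 1])
print(treat_all)  # 4.0
```

Under this formulation a higher ERUPT means the proposed policy is expected to yield a higher average outcome.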

In [39]:
params = {
    "flaml": {
        "component_params": {
            "time_budget": 10,
            "verbose": 0,
            "task": "regression",
            "n_jobs": num_cores,
            "pred_time_limit": 10 / 1e6,
        },
        "estimator_params": {
            "time_budget_s": 10,
            "num_samples": 10,
            "verbose": 0,
            "use_ray": False,
        },
    },
    "estimator_list": [
        "backdoor.econml.dml.LinearDML",
        "backdoor.econml.dr.LinearDRLearner",
        "backdoor.econml.orf.DMLOrthoForest",
        "backdoor.econml.orf.DROrthoForest",
        # # "backdoor.auto_causality.dowhy_wrapper.\
        #     # direct_uplift.DirectUpliftDoWhyWrapper", # Not working
        # "backdoor.econml.dml.CausalForestDML", # Takes too long
        "backdoor.econml.dml.SparseLinearDML",
        # "backdoor.econml.dr.SparseLinearDRLearner", # Not working due to 'precompute' parameter
        "backdoor.econml.dr.ForestDRLearner",
        "backdoor.econml.metalearners.DomainAdaptationLearner",
        "backdoor.econml.metalearners.XLearner",
        "backdoor.econml.metalearners.TLearner",
        "backdoor.econml.metalearners.SLearner",
    ],
    "metric": "ERUPT",
}


outcome = targets[0]
auto_causality = AutoCausality(params)

auto_causality.fit(train_df, test_df, treatment, outcome, features_W, features_X)

print(f"Best estimator: {auto_causality.best_estimator}")


Function get_feature_names is deprecated; get_feature_names is deprecated in 1.0 and will be removed in 1.2. Please use get_feature_names_out instead.
Criterion 'mse' was deprecated in v1.0 and will be removed in version 1.2. Use `criterion='squared_error'` which is equivalent.
Criterion 'mse' was deprecated in v1.0 and will be removed in version 1.2. Use `criterion='squared_error'` which is equivalent.


Estimator: backdoor.econml.dml.LinearDML
... ERUPT: 6.315092881520579




Estimator: backdoor.econml.dr.LinearDRLearner
... ERUPT: 6.315092881520579



Estimator: backdoor.econml.orf.DMLOrthoForest
... ERUPT: 6.315092881520579



Estimator: backdoor.econml.orf.DROrthoForest
... ERUPT: 6.315092881520579




Estimator: backdoor.econml.dml.SparseLinearDML
... ERUPT: 6.315092881520579


A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().


Estimator: backdoor.econml.dr.ForestDRLearner
... ERUPT: 6.315092881520579




Estimator: backdoor.econml.metalearners.DomainAdaptationLearner
... ERUPT: 6.29965634157214




Estimator: backdoor.econml.metalearners.XLearner
... ERUPT: 6.315092881520579




Estimator: backdoor.econml.metalearners.TLearner
... ERUPT: 6.327245929120119
Estimator: backdoor.econml.metalearners.SLearner
... ERUPT: 6.2895537775915935
Best estimator: backdoor.econml.metalearners.TLearner
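
To compare the estimators at a glance, the scores printed above can be ranked directly. This sketch simply copies the ERUPT values from the output into a dictionary (estimator names shortened for readability); it does not query the `AutoCausality` object, whose attributes for per-estimator scores are not shown in this notebook.

```python
# ERUPT values copied from the run above (higher is better).
scores = {
    "LinearDML": 6.315092881520579,
    "LinearDRLearner": 6.315092881520579,
    "DMLOrthoForest": 6.315092881520579,
    "DROrthoForest": 6.315092881520579,
    "SparseLinearDML": 6.315092881520579,
    "ForestDRLearner": 6.315092881520579,
    "DomainAdaptationLearner": 6.29965634157214,
    "XLearner": 6.315092881520579,
    "TLearner": 6.327245929120119,
    "SLearner": 6.2895537775915935,
}

ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
best_name, best_score = ranked[0]
spread = best_score - ranked[-1][1]

# Estimators that share exactly the same score as LinearDML.
tied = [name for name, s in scores.items() if s == scores["LinearDML"]]

print(best_name)         # TLearner
print(round(spread, 4))  # 0.0377
print(len(tied))         # 7
```

Seven of the ten estimators share exactly the same ERUPT value. One possible explanation, not verified here, is that they all propose the same treatment for every held-out unit, in which case ERUPT cannot distinguish between them.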


