#  Stacking for the estimation of Treatment Effects

# 1. Generate data

We generate data according to the same process as Nie X. and Wager S. (2018), 'Quasi-Oracle Estimation of Heterogeneous Treatment Effects'. An implementation is provided by `causalml` from Uber (https://github.com/uber/causalml).

The goal of simulating data is to provide examples of different data generating processes, so that we can assess the effectiveness of stacking for treatment effects in each situation.

In an experimental setup, or any situation in which we have treated and untreated data, we need to estimate three things: the underlying distribution of the 'nuisance variables', the propensity score (the likelihood, given an observation's characteristics, of being treated), and the underlying treatment effect. We therefore simulate different X, propensity, and treatment functions.

`causalml` provides an implementation of each data generating function from Nie & Wager, accessible through five possible modes passed to `synthetic_data()`:

- 1 for difficult nuisance components and an easy treatment effect.
- 2 for a randomized trial.
- 3 for an easy propensity and a difficult baseline.
- 4 for unrelated treatment and control groups.
- 5 for a hidden confounder biasing treatment.

Each call returns a dataset including:

- y ((n,)-array): outcome variable.
- X ((n,p)-ndarray): independent variables.
- w ((n,)-array): treatment flag with value 0 or 1.
- tau ((n,)-array): individual treatment effect.
- b ((n,)-array): expected outcome.
- e ((n,)-array): propensity of receiving treatment.
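To make the returned quantities concrete, here is a minimal numpy sketch of a randomized-trial process in the spirit of mode 2. It is not the `causalml` source: the function name `simulate_randomized_trial_sketch` is hypothetical, and the specific forms chosen for `b` and `tau` are assumptions made for illustration. The outcome is built as `y = b + (w - 0.5) * tau + sigma * noise`, with a constant propensity of 0.5.

```python
import numpy as np


def simulate_randomized_trial_sketch(n=1000, p=5, sigma=1.0, seed=0):
    """Illustrative randomized-trial simulation (hypothetical helper, needs p >= 5)."""
    rng = np.random.default_rng(seed)
    # Independent standard normal covariates.
    X = rng.normal(size=(n, p))
    # Baseline outcome: an assumed non-linear function of the covariates.
    b = (np.maximum(0.0, np.maximum(X[:, 0] + X[:, 1], X[:, 2]))
         + np.maximum(0.0, X[:, 3] + X[:, 4]))
    # Randomized trial: every unit has the same propensity.
    e = np.full(n, 0.5)
    # Heterogeneous treatment effect: an assumed smooth function of two covariates.
    tau = X[:, 0] + np.log1p(np.exp(X[:, 1]))
    # Treatment flag drawn from the propensity.
    w = rng.binomial(1, e)
    # Observed outcome.
    y = b + (w - 0.5) * tau + sigma * rng.normal(size=n)
    return y, X, w, tau, b, e


y, X, w, tau, b, e = simulate_randomized_trial_sketch(n=1000, p=5)
print(y.shape, X.shape, w.mean())
```

Because `tau` is returned alongside the observed data, any estimator can later be scored against the true effect, which is not possible with real data.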


# 2. Feed in different models

We follow the methodology of Nie and Wager (2018) by creating R-learners that use various methods to estimate the underlying functions of the X variables.
(For the moment we always use `ElasticNetPropensityModel` to estimate propensity scores.)
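As a reminder of what an R-learner does, the sketch below fits a linear treatment effect by regressing outcome residuals on treatment residuals. It is a simplification under stated assumptions, not the `BaseRRegressor` implementation: the nuisance estimates `m_hat` (conditional mean outcome) and `e_hat` (propensity) are taken as given rather than cross-fitted, `tau(x)` is assumed linear, and the helper name `r_learner_linear` is hypothetical.

```python
import numpy as np


def r_learner_linear(X, y, w, m_hat, e_hat):
    """Fit tau(x) = [1, x] @ beta by minimizing the R-loss (hypothetical helper)."""
    y_res = y - m_hat          # outcome residual
    w_res = w - e_hat          # treatment residual
    Z = np.column_stack([np.ones(len(X)), X])
    # R-loss: sum((y_res - w_res * (Z @ beta)) ** 2),
    # which is ordinary least squares of y_res on the rescaled design w_res * Z.
    design = w_res[:, None] * Z
    beta, *_ = np.linalg.lstsq(design, y_res, rcond=None)
    return beta


# Toy check on simulated data with a known linear effect tau = 1 + 2 * x0.
rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 2))
e = np.full(n, 0.5)
w = rng.binomial(1, e)
tau = 1.0 + 2.0 * X[:, 0]
b = X[:, 1]
y = b + (w - 0.5) * tau + 0.1 * rng.normal(size=n)
m = b + (e - 0.5) * tau        # oracle E[y | X], used here in place of an estimate
beta = r_learner_linear(X, y, w, m, e)
print(beta)                    # intercept, coefficient on x0, coefficient on x1
```

In practice the nuisance functions are estimated with cross-fitting and the effect model can be any regressor, which is where the choice of base learner matters.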

How many models can we feed in? 

models_dict = {'XGB': BaseRRegressor(learner=XGBRegressor()),
               'LinearRegression':BaseRRegressor(learner=LinearRegression()),
               'DecisionTree':BaseRRegressor(learner=DecisionTreeRegressor())}
               
               # Other candidates to try: CausalTreeRegressor, KNeighborsRegressor, SVR

In [2]:
'''
estimators = {'learner_xgb': BaseRRegressor(learner=XGBRegressor()),
              'learner_lr': BaseRRegressor(learner=LinearRegression()),
              'learner_dtr': BaseRRegressor(learner=DecisionTreeRegressor()),
              'learner_ctr': BaseRRegressor(learner=CausalTreeRegressor()),
              'learner_knr': BaseRRegressor(learner=KNeighborsRegressor()),
              'learner_svr': BaseRRegressor(learner=SVR())}
              '''

from causalml.dataset import (get_synthetic_preds,
                              get_synthetic_summary_holdout,
                              simulate_nuisance_and_easy_treatment)
from causalml.inference.meta import BaseRRegressor
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from xgboost import XGBRegressor


# Build the R-learners. Each wraps a base regressor (e.g. XGB) that fits the nuisance model;
# the propensity score is estimated by ElasticNetPropensityModel.
learner_xgb = BaseRRegressor(learner=XGBRegressor())
learner_lr = BaseRRegressor(learner=LinearRegression())
learner_dtr = BaseRRegressor(learner=DecisionTreeRegressor())

# TODO: find other working base learners and tune the parameters of each.
# learner_knr = BaseRRegressor(learner=KNeighborsRegressor())
# learner_svr = BaseRRegressor(learner=SVR())
# learner_ctr = BaseRRegressor(learner=CausalTreeRegressor())

estimators = {'learner_xgb': BaseRRegressor(learner=XGBRegressor()),
              'learner_lr': BaseRRegressor(learner=LinearRegression()),
              'learner_dtr': BaseRRegressor(learner=DecisionTreeRegressor())}

predictions = get_synthetic_preds(simulate_nuisance_and_easy_treatment,
                                  n=50000,
                                  estimators=estimators)



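Since the simulations expose the true effect `tau`, the predictions of each learner can be scored directly. The helper below is a sketch with hypothetical names (`cate_mse`, `toy_preds`); it takes the true effect and a plain dict of per-learner predictions, and makes no assumption about the exact structure of the object returned by `get_synthetic_preds`.

```python
import numpy as np


def cate_mse(tau_true, tau_preds):
    """Mean squared error of each learner's predicted effect against the true effect."""
    tau_true = np.asarray(tau_true, dtype=float)
    return {
        name: float(np.mean((np.asarray(pred, dtype=float) - tau_true) ** 2))
        for name, pred in tau_preds.items()
    }


# Toy values for illustration only, not results from the simulation above.
tau_true = np.array([1.0, 2.0, 3.0])
toy_preds = {
    "learner_a": [1.0, 2.0, 3.0],
    "learner_b": [0.0, 2.0, 4.0],
}
scores = cate_mse(tau_true, toy_preds)
best = min(scores, key=scores.get)
print(scores, best)
```

A per-learner score like this gives a baseline against which a stacked combination of the learners can later be compared.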


In [None]:
## validating accuracy:
"""Generate a summary for predictions on synthetic data for train and holdout using specified function

    Args:
        synthetic_data_func (function): synthetic data generation function
        n (int, optional): number of samples per simulation
        valid_size (float, optional): validation/hold out data size
        k (int, optional): number of simulations

    Returns:
        (tuple): summary evaluation metrics of predictions for train and validation:

          - summary_train (pandas.DataFrame): training data evaluation summary
          - summary_validation (pandas.DataFrame): validation data evaluation summary
    """

# Note: the source of get_synthetic_summary_holdout could be adapted to produce a different validation summary.

train_summary, validation_summary = get_synthetic_summary_holdout(simulate_nuisance_and_easy_treatment,
                                                                  n=10000,
                                                                  valid_size=0.2,
                                                                  k=10)
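The idea behind a holdout summary (simulate `k` datasets, split each into train and validation, score on both, then average) can be sketched in plain numpy. This is not the `causalml` implementation: the data process, the linear T-learner stand-in, and the names `simulate`, `fit_t_learner` and `holdout_summary` are all assumptions made for illustration.

```python
import numpy as np


def simulate(n, rng):
    """Toy randomized trial with a linear treatment effect (assumed process)."""
    X = rng.normal(size=(n, 3))
    w = rng.binomial(1, 0.5, size=n)
    tau = 1.0 + X[:, 0]
    y = X[:, 1] + (w - 0.5) * tau + rng.normal(size=n)
    return X, w, y, tau


def fit_t_learner(X, w, y):
    """Stand-in estimator: one linear model per arm, effect = difference of coefficients."""
    Z = np.column_stack([np.ones(len(X)), X])
    beta1, *_ = np.linalg.lstsq(Z[w == 1], y[w == 1], rcond=None)
    beta0, *_ = np.linalg.lstsq(Z[w == 0], y[w == 0], rcond=None)
    return beta1 - beta0


def holdout_summary(n=2000, valid_size=0.2, k=5, seed=0):
    """Average train and validation MSE of the estimated effect over k simulations."""
    rng = np.random.default_rng(seed)
    rows = []
    for _ in range(k):
        X, w, y, tau = simulate(n, rng)
        idx = rng.permutation(n)
        n_valid = int(n * valid_size)
        valid, train = idx[:n_valid], idx[n_valid:]
        # Fit on the training split only, then predict for every unit.
        coef = fit_t_learner(X[train], w[train], y[train])
        tau_hat = np.column_stack([np.ones(n), X]) @ coef
        rows.append((np.mean((tau_hat[train] - tau[train]) ** 2),
                     np.mean((tau_hat[valid] - tau[valid]) ** 2)))
    return np.mean(rows, axis=0)


train_mse, valid_mse = holdout_summary()
print(train_mse, valid_mse)
```

A large gap between the train and validation scores would point to overfitting in the effect model, which is the comparison a holdout summary is meant to surface.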