A simple implementation of the two-stage ridge regression model described in Hahn et al. (2018), with a scikit-learn compatible API. This model allows us to use many, potentially highly correlated, control variables for observational causal inference.
Hahn, P. R., Carvalho, C. M., Puelz, D., and He, J. (2018). Regularization and Confounding in Linear Regression for Treatment Effect Estimation. Bayesian Analysis, 13. https://doi.org/10.1214/16-BA1044
We have implemented maximum a posteriori (MAP) models rather than the fully Bayesian treatment of the regression weights described in Hahn et al. (2018). The model implemented is:
- Selection model: Z = Xβ_c + ε,
- Response model: Y = α(Z - Xβ_c) + Xβ_d + ν.
Here X, Y and Z are random variables: X are the controls, Z is the treatment, and Y is the outcome. β_c are the first-stage linear regression weights, β_d are the second-stage linear regression weights on the control variables, α is the average treatment effect (ATE), and ε ~ N(0, σ²_ε), ν ~ N(0, σ²_ν).
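To make the notation concrete, data can be simulated from this two-stage process directly. The following is a minimal sketch; the dimensions, noise scales and the true α are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 1000, 5     # sample size and number of controls (illustrative)
alpha = 2.0        # true average treatment effect (assumed for the example)

X = rng.normal(size=(N, D))         # controls
beta_c = rng.normal(size=D)         # first-stage weights
beta_d = rng.normal(size=D)         # second-stage weights

Z = X @ beta_c + rng.normal(scale=0.5, size=N)                             # selection model
Y = alpha * (Z - X @ beta_c) + X @ beta_d + rng.normal(scale=0.5, size=N)  # response model
```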
We place ℓ2 regularisers on the regression weights in the regression objective functions,
- Selection model: λ_c·||β_c||₂²,
- Response model: λ_d·||β_d||₂².
No regularisation is applied to α. This formulation leads to a less biased estimate of α than alternative ridge regression formulations. We can get an intuition for this from the following graphical model,
Here round nodes with letters inside are random variables, square nodes are deterministic functions, and ⊕ is addition/subtraction. Arrows denote the direction of flow. This can be interpreted like a typical triangle graphical model denoting the causal relationships X 🠖 Y, X 🠖 Z and Z 🠖 Y, but with the addition of the stage 1 and stage 2 modelling influences from the equations above. Here r = Z - Xβ_c.
We can see that the influence of the control variables, X, on the treatment, Z, has been explicitly removed when predicting Y on the path Z 🠖 Y. That is, only the "residual" signal from the treatment variables, r, that is not explained by the control variables is allowed to influence Y through α. This results in the estimation bias of α being a function of the residual r (Equation 7; Hahn et al., 2018), instead of the treatments Z (Equation 3; Hahn et al., 2018). Since r is close to zero (depending on the strength of the regularisation) we end up with a low-bias, but higher-variance, estimator of α. The estimator has higher variance since Var(α|r) = σ²_ν / Σᵢ rᵢ² instead of Var(α|Z) = σ²_ν / Σᵢ zᵢ².
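As a concrete illustration of the two stages, here is a minimal numpy sketch of the MAP estimator described above (not the library's implementation): ridge-regress Z on X, form the residual r, then regress Y on [r, X] with the ℓ2 penalty applied to β_d only, leaving α unpenalised.

```python
import numpy as np

def two_stage_ridge_ate(X, Z, Y, lambda_c=1.0, lambda_d=1.0):
    """Sketch of the two-stage MAP estimator; lambda values are illustrative."""
    N, D = X.shape

    # Stage 1: ridge regression of the treatment on the controls.
    beta_c = np.linalg.solve(X.T @ X + lambda_c * np.eye(D), X.T @ Z)
    r = Z - X @ beta_c  # residual treatment signal

    # Stage 2: regress Y on [r, X], penalising beta_d but not alpha.
    A = np.hstack((r[:, None], X))
    penalty = lambda_d * np.eye(D + 1)
    penalty[0, 0] = 0.0  # no regularisation on alpha
    w = np.linalg.solve(A.T @ A + penalty, A.T @ Y)
    alpha_hat, beta_d = w[0], w[1:]

    # Standard error from Var(alpha|r) = sigma^2_nu / sum_i r_i^2, with
    # sigma^2_nu estimated from the stage-2 residuals.
    sigma2_nu = np.sum((Y - A @ w) ** 2) / (N - D - 1)
    return alpha_hat, np.sqrt(sigma2_nu / np.sum(r ** 2))
```

Run on data simulated from the model above, this should recover something close to the true α when the regularisers are well chosen.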
This repository can be installed directly from GitHub, e.g.

$ pip install git+https://github.com/gradientinstitute/twostageridge.git#egg=twostageridge
`TwoStageRidge` uses a scikit-learn interface. In order to retain compatibility with all of the pipelines and model selection tools, we have to treat the inputs to the model specially. That is, we have to concatenate the control variables, `X`, and the treatment variables, `Z`, into one input array, e.g. `W = np.hstack((Z, X))`. For example,
import numpy as np
from twostageridge import TwoStageRidge

X, Y, Z = load_data()  # placeholder for your own data-loading function
# Where:
# - X.shape -> (N, D)
# - Y.shape -> (N,)
# - Z.shape -> (N,)
W = np.hstack((Z[:, np.newaxis], X))
ts = TwoStageRidge(treatment_index=0) # Column index of the treatment variable
ts.fit(W, Y) # estimate causal effect, alpha
print(ts.model_statistics())
This will print the estimated average treatment effect, its standard error, the t-statistic, p-value and degrees of freedom of a two-sided t-test against the null hypothesis α = 0. For more information on how to use this model, and how to perform model selection for the model parameters, see the notebooks.
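For reference, the reported t-statistic and p-value follow the usual relationships; the sketch below is hypothetical (the real computation lives in `model_statistics`):

```python
import numpy as np
from scipy import stats

def two_sided_t_test(alpha_hat, std_error, dof):
    """t-statistic and p-value for the null hypothesis alpha = 0 (illustrative)."""
    t_stat = alpha_hat / std_error
    p_value = 2 * stats.t.sf(np.abs(t_stat), dof)  # two-sided tail probability
    return t_stat, p_value
```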
Vector treatments, `Z`, can also be inferred; you just have to specify the column indices of all treatment variables in `W`. For this you can use a numpy array or a slice, as in the sketch below.
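For instance, with two treatment columns stacked in front of the controls (following the `W` convention above; the data shapes here are assumptions for illustration):

```python
import numpy as np
from twostageridge import TwoStageRidge

# Z now has two columns: Z.shape -> (N, 2)
W = np.hstack((Z, X))

# Either of these selects the first two columns of W as treatments:
ts = TwoStageRidge(treatment_index=slice(0, 2))
# ts = TwoStageRidge(treatment_index=np.array([0, 1]))

ts.fit(W, Y)
print(ts.model_statistics())
```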
Class/Function | Description |
---|---|
`estimators.TwoStageRidge` | Two stage ridge regression for causal response surface estimation. |
`estimators.StatisticalResults` | Statistical results object. |
`estimators.ridge_weights` | Compute ridge regression weights. |
`metrics.make_first_stage_scorer` | Make a scorer for the first stage of a two stage ridge estimator. |
`metrics.make_combined_stage_scorer` | Make a scorer for both stages of a two stage ridge estimator. |
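As a hedged sketch of how these scorers might plug into scikit-learn model selection: the `regulariser1` parameter name in the grid and the `make_first_stage_scorer(r2_score)` factory signature are assumptions for illustration, not confirmed API; see the `model_selection` notebook for canonical usage.

```python
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV
from twostageridge import TwoStageRidge
from twostageridge.metrics import make_first_stage_scorer

ts = TwoStageRidge(treatment_index=0)

# Hyperparameter search over the (assumed) first-stage regulariser name.
search = GridSearchCV(
    ts,
    param_grid={"regulariser1": [0.1, 1.0, 10.0]},  # assumed parameter name
    scoring=make_first_stage_scorer(r2_score),      # assumed factory signature
)
search.fit(W, Y)  # W, Y as constructed in the usage example above
print(search.best_params_)
```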
Notebook | Description |
---|---|
`model_selection` | A demonstration of how to perform model selection using scikit-learn tools. |
`regularisation_bias_exploration` | Experiments exploring how regularisation impacts ATE estimation. |
`dimensionality_effect` | A demonstration of how dimensionality and co-linearity influence effect estimation. |
Copyright 2021 Gradient Institute
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.