EconML is a Python package for estimating heterogeneous treatment effects from observational data via machine learning. This package was designed and built as part of the ALICE project at Microsoft Research with the goal of combining state-of-the-art machine learning techniques with econometrics to bring automation to complex causal inference problems. The promise of EconML:
- Implement recent techniques in the literature at the intersection of econometrics and machine learning
- Maintain flexibility in modeling the effect heterogeneity (via techniques such as random forests, boosting, lasso and neural nets), while preserving the causal interpretation of the learned model and often offering valid confidence intervals
- Use a unified API
- Build on standard Python packages for Machine Learning and Data Analysis
In a nutshell, this toolkit is designed to measure the causal effect of some treatment variable(s) T on an outcome Y, controlling for a set of features X. For detailed information about the package, consult the documentation at https://econml.azurewebsites.net/.
About Treatment Effect Estimation
One of the biggest promises of machine learning is to automate decision making in a multitude of domains. At the core of many data-driven personalized decision scenarios is the estimation of heterogeneous treatment effects: what is the causal effect of an intervention on an outcome of interest for a sample with a particular set of features?
Such questions arise frequently in customer segmentation (what is the effect of placing a customer in one tier rather than another), dynamic pricing (what is the effect of a pricing policy on demand) and medical studies (what is the effect of a treatment on a patient). In many such settings we have an abundance of observational data, where the treatment was chosen via some unknown policy, but the ability to run controlled A/B tests is limited.
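To make the problem concrete, here is a minimal simulation sketch in plain Python. It is not part of EconML: the data-generating process, the names `simulate` and `diff_in_means`, and all effect sizes are illustrative assumptions. A binary feature drives both the (unknown) treatment policy and the size of the effect, so a naive comparison of treated and untreated samples mixes the two.

```python
import random


def simulate(n=20000, seed=0):
    """Simulate observational data (hypothetical setup, for illustration only).

    A binary feature x influences how likely a unit is to be treated,
    its baseline outcome, and the size of its treatment effect.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        p_treat = 0.8 if x else 0.2          # unknown policy: x=1 is treated more often
        t = 1 if rng.random() < p_treat else 0
        effect = 2.0 if x else 0.5           # heterogeneous treatment effect
        baseline = 3.0 if x else 0.0         # x also shifts the outcome (confounding)
        y = baseline + effect * t + rng.gauss(0, 1)
        rows.append((x, t, y))
    return rows


def diff_in_means(rows):
    """Mean outcome of treated rows minus mean outcome of untreated rows."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


rows = simulate()
# Naive comparison that ignores the feature and the treatment policy.
naive = diff_in_means(rows)
# Comparison within each feature value recovers the per-group effect.
effect_by_x = {x: diff_in_means([r for r in rows if r[0] == x]) for x in (0, 1)}

print("naive difference in means:", naive)
print("effect for x=0:", effect_by_x[0])
print("effect for x=1:", effect_by_x[1])
```

In this simulated setup the naive difference lands far from either group's true effect (0.5 and 2.0), while conditioning on the feature recovers both. With many or continuous features, simple stratification like this stops being practical, which is where flexible estimators come in.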
06/03/2019: Release v0.4, see release notes here.
05/03/2019: Release v0.3, see release notes here.
04/10/2019: Release v0.2, see release notes here.
03/06/2019: Release v0.1. Please try it out and provide feedback.
Install the latest release from PyPI:
pip install econml
To install from source, see the For Developers section below.
    from econml.dml import DMLCateEstimator
    from sklearn.linear_model import LassoCV

    est = DMLCateEstimator(model_y=LassoCV(), model_t=LassoCV())
    est.fit(Y, T, X, W)  # W -> high-dimensional confounders, X -> features
    treatment_effects = est.effect(X_test)
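For intuition about what a double machine learning estimator does with `Y`, `T` and `W`, the following is a simplified sketch of partialling-out (residual-on-residual) regression with two-fold cross-fitting. It is not EconML's implementation: it assumes a single scalar confounder, a constant treatment effect, and plain least squares in place of `LassoCV`, and the helper names `fit_line` and `cross_fit_residuals` are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def cross_fit_residuals(w, v, folds):
    """Residuals of v after predicting it from w.

    Each fold is predicted by a model fit only on the other folds.
    """
    res = [0.0] * len(w)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(w)) if i not in held]
        a, b = fit_line([w[i] for i in train], [v[i] for i in train])
        for i in held_out:
            res[i] = v[i] - (a + b * w[i])
    return res


# Synthetic data (assumed for illustration): W confounds both T and Y.
rng = random.Random(0)
n = 4000
true_theta = 2.0
W = [rng.gauss(0, 1) for _ in range(n)]
T = [0.8 * w + rng.gauss(0, 1) for w in W]
Y = [true_theta * t + 1.5 * w + rng.gauss(0, 1) for t, w in zip(T, W)]

folds = [list(range(0, n, 2)), list(range(1, n, 2))]
t_res = cross_fit_residuals(W, T, folds)  # part of T not explained by W
y_res = cross_fit_residuals(W, Y, folds)  # part of Y not explained by W

# Final stage: regress outcome residuals on treatment residuals (no intercept).
theta_hat = sum(a * b for a, b in zip(t_res, y_res)) / sum(a * a for a in t_res)

# For comparison: regressing Y on T directly absorbs the confounding from W.
_, naive_slope = fit_line(T, Y)

print("partialling-out estimate:", theta_hat)
print("naive slope of Y on T:", naive_slope)
```

The estimate from the residuals should sit close to the simulated effect of 2.0, while the direct regression of `Y` on `T` is pulled upward by the confounder.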
    from econml.ortho_forest import ContinuousTreatmentOrthoForest
    from sklearn.linear_model import LassoCV

    # Use defaults
    est = ContinuousTreatmentOrthoForest()
    # Or specify hyperparameters
    est = ContinuousTreatmentOrthoForest(n_trees=500, min_leaf_size=10, max_depth=10,
                                         subsample_ratio=0.7, lambda_reg=0.01,
                                         model_T=LassoCV(cv=3), model_Y=LassoCV(cv=3))
    est.fit(Y, T, X, W)
    treatment_effects = est.effect(X_test)
    import keras
    from econml.deepiv import DeepIVEstimator

    treatment_model = keras.Sequential([keras.layers.Dense(128, activation='relu', input_shape=(2,)),
                                        keras.layers.Dropout(0.17),
                                        keras.layers.Dense(64, activation='relu'),
                                        keras.layers.Dropout(0.17),
                                        keras.layers.Dense(32, activation='relu'),
                                        keras.layers.Dropout(0.17)])
    response_model = keras.Sequential([keras.layers.Dense(128, activation='relu', input_shape=(2,)),
                                       keras.layers.Dropout(0.17),
                                       keras.layers.Dense(64, activation='relu'),
                                       keras.layers.Dropout(0.17),
                                       keras.layers.Dense(32, activation='relu'),
                                       keras.layers.Dropout(0.17),
                                       keras.layers.Dense(1)])
    est = DeepIVEstimator(n_components=10,  # Number of gaussians in the mixture density networks
                          m=lambda z, x: treatment_model(keras.layers.concatenate([z, x])),  # Treatment model
                          h=lambda t, x: response_model(keras.layers.concatenate([t, x])),  # Response model
                          n_samples=1  # Number of samples used to estimate the response
                          )
    est.fit(Y, T, X, Z)  # Z -> instrumental variables
    treatment_effects = est.effect(T0, T1, X_test)
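To illustrate why an instrument `Z` helps when a confounder is unobserved, here is a small sketch of the classical linear instrumental-variable (Wald) estimator in plain Python. This is a deliberately simpler technique than Deep IV, which replaces both stages with neural networks. The simulated data, the unobserved variable `U`, and the `covariance` helper are assumptions made for the example.

```python
import random


def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)


# Synthetic data (assumed for illustration):
# U is an unobserved confounder, Z affects Y only through T.
rng = random.Random(1)
n = 5000
true_effect = 1.5
U = [rng.gauss(0, 1) for _ in range(n)]
Z = [rng.gauss(0, 1) for _ in range(n)]
T = [z + u + rng.gauss(0, 1) for z, u in zip(Z, U)]
Y = [true_effect * t + 2.0 * u + rng.gauss(0, 1) for t, u in zip(T, U)]

# Regressing Y on T directly is biased because U moves both.
ols_estimate = covariance(T, Y) / covariance(T, T)

# The IV estimator only uses the variation in T that is induced by Z.
iv_estimate = covariance(Z, Y) / covariance(Z, T)

print("OLS estimate:", ols_estimate)
print("IV estimate:", iv_estimate)
```

In this linear setting the IV estimate should land near the simulated effect of 1.5, whereas the direct regression overstates it.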
Bootstrap Confidence Intervals
    from econml.dml import DMLCateEstimator
    from sklearn.linear_model import LassoCV

    est = DMLCateEstimator(model_y=LassoCV(), model_t=LassoCV(), inference='bootstrap')
    est.fit(Y, T, X, W)
    treatment_effect_interval = est.effect_interval(X_test, lower=1, upper=99)
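As a rough picture of what a bootstrap interval with `lower=1` and `upper=99` means, the sketch below computes a percentile bootstrap interval for a simple regression slope in plain Python. It illustrates the general resampling idea only and makes no claim about how EconML performs its bootstrap internally. The data, the helpers `slope` and `percentile`, and the number of resamples are assumptions.

```python
import random


def slope(ts, ys):
    """Least-squares slope of ys on ts."""
    n = len(ts)
    mt, my = sum(ts) / n, sum(ys) / n
    stt = sum((t - mt) ** 2 for t in ts)
    sty = sum((t - mt) * (y - my) for t, y in zip(ts, ys))
    return sty / stt


def percentile(sorted_vals, q):
    """Nearest-rank percentile of an already sorted list, q in [0, 100]."""
    idx = round(q / 100 * (len(sorted_vals) - 1))
    return sorted_vals[idx]


# Synthetic data (assumed for illustration): constant effect of 1.0.
rng = random.Random(2)
n = 500
T = [rng.gauss(0, 1) for _ in range(n)]
Y = [1.0 * t + rng.gauss(0, 1) for t in T]

point_estimate = slope(T, Y)

# Refit the estimator on resamples drawn with replacement.
estimates = []
for _ in range(500):
    idx = [rng.randrange(n) for _ in range(n)]
    estimates.append(slope([T[i] for i in idx], [Y[i] for i in idx]))
estimates.sort()

# 1st and 99th percentiles of the bootstrap estimates form the interval.
lower, upper = percentile(estimates, 1), percentile(estimates, 99)

print("point estimate:", point_estimate)
print("bootstrap interval:", (lower, upper))
```

Refitting the estimator many times is what makes bootstrap intervals expensive for heavier models, so the number of resamples is a trade-off between runtime and the stability of the interval endpoints.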
For Developers
You can get started by cloning this repository. We use setuptools for building and distributing our package. We rely on some recent features of setuptools, so make sure to upgrade to a recent version with pip install setuptools --upgrade. Then from your local copy of the repository you can run python setup.py develop to get started.
Running the tests
This project uses pytest for testing. To run tests locally after installing the package, you can use python setup.py pytest.
Generating the documentation
This project's documentation is generated via Sphinx. To generate a local copy of the documentation from a clone of this repository, just run python setup.py build_sphinx, which will build the documentation and place it
The reStructuredText files that make up the documentation are stored in the docs directory; module documentation is automatically generated by the Sphinx build process.
Blogs and Publications
June 2019: Treatment Effects with Instruments paper
May 2019: Open Data Science Conference Workshop
2017: DeepIV paper
Contributing and Feedback
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.
When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
References
V. Syrgkanis, V. Lei, M. Oprescu, M. Hei, K. Battocchi, G. Lewis. Machine Learning Estimation of Heterogeneous Treatment Effects with Instruments. ArXiv preprint arXiv:1905.10176, 2019.
D. Foster, V. Syrgkanis. Orthogonal Statistical Learning. Proceedings of the 32nd Annual Conference on Learning Theory (COLT), 2019 (Best Paper Award).
M. Oprescu, V. Syrgkanis and Z. S. Wu. Orthogonal Random Forest for Causal Inference. Proceedings of the 36th International Conference on Machine Learning, ICML'19, 2019.
V. Chernozhukov, D. Nekipelov, V. Semenova, V. Syrgkanis. Plug-in Regularized Estimation of High-Dimensional Parameters in Nonlinear Semiparametric Models. ArXiv preprint arXiv:1806.04823, 2018.
J. Hartford, G. Lewis, K. Leyton-Brown, and M. Taddy. Deep IV: A flexible approach for counterfactual prediction. Proceedings of the 34th International Conference on Machine Learning, ICML'17, 2017.
V. Chernozhukov, D. Chetverikov, M. Demirer, E. Duflo, C. Hansen, and W. Newey. Double Machine Learning for Treatment and Causal Parameters. ArXiv preprint arXiv:1608.00060, 2016.