Skip to content

google-research/sloe-logistic

Repository files navigation

Code to run experiments in SLOE: A Faster Method for Statistical Inference in High-Dimensional Logistic Regression.

Not an official Google product.

Method Introduction

This library provides statistical inference for high dimensional logistic regression maximum likelihood, based largely on the breakthrough results from Sur and Candès (PNAS, 2019). The challenge with applying their results is that they depend on an unobserved signal strength quantity. Our method estimates this quantity via a leave-one-out approach, which we outline in our paper [1].

By high-dimensions, we mean that the ratio of the number of covariates p to the sample size n is strictly between 0 and 0.5. When the number of covariates is too large, the data is separable, and our method will not help to recover from such a case. When the number of covariates is small (say, p <= 0.05 * n), the high dimensional adjustment is a bit numerically unstable, and adds little value over the standard large-sample theory.

The setting studied is complementary to sparse high dimensional regimes. We assume that there are a relatively large number of covariates that are weakly correlated with the binary outcome. If one expects only a very small number of the many candidate covariates to have a nonzero coefficient in the model, sparse model selection and post-selective inference is probably a better approach than the one taken here.

Installation and tests

Run run.sh to install requirements and package, and run tests.

Usage

The main approach proposed in our work is implemented in the UnbiasedLogisticRegression class in unbiased_logistic_regression.py. This has an sklearn-like interface, with a fit, decision_function and predict_proba API. Additionally, for inference, we've added a prediction_intervals method. See the inline documentation for more details of usage.

Citation

[1] S. Yadlowsky, T. Yun, C. McLean, A. D'Amour (2021). "SLOE: A Faster Method for Statistical Inference in High-Dimensional Logistic Regression". arXiv:2103.12725 [stat.ML].

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published