finite-sample/tworeg

Two Regressions and a Bootstrap

Regression Calibration for ML-Generated Covariates and the Nonlinear Boundary


When ML predictions are used as regressors in downstream models, the prediction error is non-classical and a growing literature proposes purpose-built corrections: GMM (Fong and Tyler 2021), prediction-powered inference (Angelopoulos et al. 2023), instrumental variables (Yang et al. 2022), joint MLE (Battaglia et al. 2025). This note observes that for linear downstream models, regression calibration — replacing the ML prediction with E[X | X̂, Z] estimated on a calibration sample — already eliminates the non-classical error structure under an exogeneity condition these methods also rely on. Two OLS regressions and a two-sample bootstrap give you consistent estimates with valid confidence intervals. For nonlinear downstream models (logistic, Poisson, any GLM), Jensen's inequality breaks the argument and heavier methods are genuinely needed. The linear/nonlinear boundary is the main result.

Replication

# Generate tables, figures, and compile (full: ~30 min, 500 sims)
./build.sh

# Quick mode (~5 min, 100 sims) for verification
./build.sh --quick

Requires Python 3.8+ with numpy, scikit-learn, scipy, matplotlib, and a LaTeX distribution with pdflatex and bibtex.

What the simulations cover

Experiment  Downstream model  Prediction quality                   Learner        Sims
1           Linear            Heteroskedastic (known + estimated)  Ridge          500
1b          Linear            Heteroskedastic (known)              Random Forest  200
2           Linear            Homoskedastic                        Ridge          500
3           Linear            Heteroskedastic, varying n_cal       Ridge          500
4           Logistic          Heteroskedastic (known)              Ridge          500

Every method reports 95% CIs: a two-sample bootstrap for regression calibration and the moment-based errors-in-variables (EIV) estimator, posterior credible intervals for the Gibbs sampler, Wald intervals for the oracle and naive estimators, and sum-of-variances intervals for PPI.

Key findings

Linear case. All correction methods are approximately unbiased. Regression calibration with known reliability groups has the best RMSE after the latent-variable oracle. The two-sample bootstrap (resample C and U separately) gives near-nominal coverage. Under homoskedasticity, the ordering reverses: moment EIV beats regression calibration.

Nonlinear case. Regression calibration has a Jensen's-inequality bias of ≈ −0.05 in logistic regression (about a quarter of the naive bias). The Metropolis-within-Gibbs latent-variable model removes it. PPI is also unbiased but has high variance.
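The Jensen gap is easy to see numerically. A minimal sketch, assuming a hypothetical conditional distribution X | X̂, Z ~ N(1, 0.5²): plugging the conditional mean into the logistic link is not the same as averaging the link over the conditional distribution, which is what the GLM likelihood requires.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

# Hypothetical conditional distribution of the latent X given (Xhat, Z)
mu, sd = 1.0, 0.5
x = rng.normal(mu, sd, size=1_000_000)

plug_in = sigmoid(mu)          # link at E[X | Xhat, Z] -- what regression calibration uses
averaged = sigmoid(x).mean()   # E[sigmoid(X) | Xhat, Z] -- what the logistic model needs

print(plug_in - averaged)      # nonzero: Jensen's inequality
```

The gap scales with the conditional variance of X given (X̂, Z) and vanishes as predictions become perfect, which is why regression calibration's logistic bias is small but not zero.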

Estimated heteroskedasticity. Regression calibration remains approximately unbiased when the variance is estimated from calibration residuals. The Gibbs sampler is more sensitive: precision weighting amplifies variance-model errors.

Files

paper.tex          LaTeX source
references.bib     Bibliography (11 entries, validated)
make_tables.py     Replication script → tables/*.tex + figures/*.pdf
build.sh           One-command build
tables/            Generated LaTeX tables (\input from paper.tex)
figures/           Generated PDF figures (\includegraphics from paper.tex)
paper.pdf          Compiled paper

The recipe

For a linear downstream model Y = β_D D + β_x X + β_z Z + ε, where the covariate X is observed only through an ML prediction X̂ and β_D is the coefficient of interest:

  1. Calibration sample C: Regress X on (X̂, Z) — possibly per reliability group
  2. Target sample U: Predict X̃ from the calibration model, regress Y on (D, X̃, Z)
  3. Inference: Resample C and U independently with replacement, re-run both stages, take percentile CIs

That's it. Consistent for β_D under E[ε | X̂, Z] = 0. No GMM, no IV, no MCMC.
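The three steps can be sketched in a few lines of numpy on simulated data. Everything below is illustrative: the data-generating process, the coefficient values (β_D = 1), the sample sizes, and the bootstrap replication count are assumptions, not the paper's simulation design.

```python
import numpy as np

rng = np.random.default_rng(0)

def ols(W, y):
    """Least-squares coefficients; W already includes an intercept column."""
    return np.linalg.lstsq(W, y, rcond=None)[0]

# Illustrative data-generating process (all parameter values are assumptions)
beta_D, beta_x, beta_z = 1.0, 0.5, -0.3
n_cal, n_tgt = 500, 2000

def make_sample(n):
    Z = rng.normal(size=n)
    X = 0.8 * Z + rng.normal(size=n)              # latent covariate
    Xhat = X + rng.normal(scale=0.7, size=n)      # noisy ML prediction of X
    D = rng.normal(size=n)
    Y = beta_D * D + beta_x * X + beta_z * Z + rng.normal(size=n)
    return D, X, Xhat, Z, Y

D_c, X_c, Xhat_c, Z_c, Y_c = make_sample(n_cal)   # calibration sample C: X observed
D_u, _, Xhat_u, Z_u, Y_u = make_sample(n_tgt)     # target sample U: X unobserved

def two_stage(idx_c, idx_u):
    # Step 1 (on C): regress X on (1, Xhat, Z) to estimate E[X | Xhat, Z]
    gamma = ols(np.column_stack([np.ones(len(idx_c)), Xhat_c[idx_c], Z_c[idx_c]]),
                X_c[idx_c])
    # Step 2 (on U): impute X_tilde, regress Y on (1, D, X_tilde, Z)
    X_tilde = np.column_stack([np.ones(len(idx_u)), Xhat_u[idx_u], Z_u[idx_u]]) @ gamma
    M = np.column_stack([np.ones(len(idx_u)), D_u[idx_u], X_tilde, Z_u[idx_u]])
    return ols(M, Y_u[idx_u])[1]                  # coefficient on D

beta_hat = two_stage(np.arange(n_cal), np.arange(n_tgt))

# Step 3: two-sample bootstrap -- resample C and U independently, percentile CI
draws = [two_stage(rng.integers(0, n_cal, n_cal), rng.integers(0, n_tgt, n_tgt))
         for _ in range(200)]
lo, hi = np.percentile(draws, [2.5, 97.5])
print(f"beta_D hat = {beta_hat:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```

Here E[X | X̂, Z] is exactly linear because the simulated variables are jointly Gaussian; with real data, Stage 1 fits the best linear approximation (possibly within reliability groups, as in step 1).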

When this breaks: logistic regression, Poisson, any GLM with a nonlinear link. Then you need joint estimation or PPI.
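For intuition on what the heavier machinery does, here is a sketch of prediction-powered inference in its simplest form, mean estimation (Angelopoulos et al. 2023): average the predictions on the unlabeled sample, then add a bias correction (the "rectifier") estimated on the labeled sample. The predictor f, the data-generating process, and the sample sizes are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: estimate theta = E[Y] when labels Y exist only on a small sample
theta = 2.0
n_lab, n_unlab = 300, 10_000
f = lambda x: 0.9 * x + 0.3                     # an imperfect, slightly biased predictor of Y

X_lab = rng.normal(theta, 1.0, n_lab)
Y_lab = X_lab + rng.normal(0.0, 0.2, n_lab)     # labeled sample: (X, Y)
X_unlab = rng.normal(theta, 1.0, n_unlab)       # unlabeled sample: X only

# PPI point estimate: mean prediction on U, plus the rectifier estimated on C
rectifier = (Y_lab - f(X_lab)).mean()
theta_pp = f(X_unlab).mean() + rectifier

# Sum-of-variances standard error, the interval style mentioned above for PPI
se = np.sqrt(f(X_unlab).var(ddof=1) / n_unlab + (Y_lab - f(X_lab)).var(ddof=1) / n_lab)
lo_pp, hi_pp = theta_pp - 1.96 * se, theta_pp + 1.96 * se
```

The rectifier is what makes PPI unbiased even under a nonlinear downstream model, at the cost of the extra variance noted in the logistic results.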

References

  • Angelopoulos et al. (2023). Prediction-powered inference. Science, 382(6671), 669–674.
  • Battaglia et al. (2025). Inference for regression with variables generated by AI or ML. arXiv:2402.15585v5.
  • Boonstra, Little, and Mitani (2021). Bias due to Berkson error. Biostatistics, 23(4), 1063–1078.
  • Carroll et al. (2006). Measurement Error in Nonlinear Models. Chapman and Hall/CRC.
  • Fong and Tyler (2021). ML predictions as regression covariates. Political Analysis, 29(4), 467–484.
  • Wang, McCormick, and Leek (2020). Correcting inference based on predicted outcomes. PNAS, 117(48), 30266–30275.
  • Yang et al. (2022). Causal inference with data-mined variables. INFORMS JDS, 1(2), 138–155.
  • Zrnic and Candès (2024). Cross-prediction-powered inference. PNAS, 121(15), e2322083121.
