Sketched Ridgeless Linear Regression
This repository presents numerical simulations of the empirical risks of the sketched ridgeless estimator, with the aim of improving generalization performance. The simulations focus on determining the optimal sketching sizes that minimize out-of-sample prediction risks. The results show that the optimally sketched estimator has stable risk curves, eliminating the peaks observed in the full-sample estimator. We also introduce a practical procedure for identifying the optimal sketching size empirically.
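Suppose we observe data vectors $(x_i, y_i)$ that follow a linear model $y_i = x_i^\top \beta + \varepsilon_i$, $i = 1, \ldots, n$, where $y_i$ is a univariate response, $x_i$ is a $d$-dimensional predictor, $\beta$ denotes the vector of regression coefficients, and $\varepsilon_i$ is a random error. We consider the ridgeless least squares estimator $\hat{\beta} = (X^\top X)^{+} X^\top Y$, where $X$ is the $n \times d$ matrix with rows $x_i^\top$, $Y = (y_1, \ldots, y_n)^\top$, and $(\cdot)^{+}$ denotes the Moore-Penrose pseudoinverse.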
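For orientation, the estimator above can be sketched in plain NumPy. This is a minimal illustration and not the SRLR package API: the function names, the uniform row-subsampling sketch, the simulated Gaussian data, and the hold-out sweep over sketching sizes are all assumptions made for this example, and the sweep is not necessarily the procedure used in the paper.

```python
import numpy as np


def ridgeless(X, Y):
    """Minimum-norm least squares fit.

    Uses the identity (X^T X)^+ X^T = X^+, so pinv(X) @ Y equals
    (X^T X)^+ X^T Y while avoiding the explicit Gram matrix.
    """
    return np.linalg.pinv(X) @ Y


def sketched_ridgeless(X, Y, m, rng):
    """Ridgeless fit on m rows drawn uniformly without replacement.

    Row subsampling is only one possible sketch (an assumption here).
    """
    idx = rng.choice(X.shape[0], size=m, replace=False)
    return ridgeless(X[idx], Y[idx])


def prediction_risk(beta_hat, X_new, Y_new):
    """Mean squared prediction error on held-out data."""
    return float(np.mean((Y_new - X_new @ beta_hat) ** 2))


# Hypothetical simulation settings, chosen only for illustration.
rng = np.random.default_rng(0)
n, d, n_val = 200, 180, 1000
beta = rng.normal(size=d) / np.sqrt(d)

X = rng.normal(size=(n, d))
Y = X @ beta + rng.normal(size=n)
X_val = rng.normal(size=(n_val, d))
Y_val = X_val @ beta + rng.normal(size=n_val)

# Sweep candidate sketching sizes and keep the one with the lowest
# hold-out risk; m = n corresponds to the full-sample estimator.
grid = list(range(20, n + 1, 20))
risks = {m: prediction_risk(sketched_ridgeless(X, Y, m, rng), X_val, Y_val)
         for m in grid}
best_m = min(risks, key=risks.get)
print("selected sketching size:", best_m)
```

Here $d/n = 0.9$ places the full-sample problem close to the interpolation threshold, which is the regime where the choice of sketching size is expected to matter most.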
With this package, the simulation results in the paper can be reproduced.
Install SRLR_python from PyPI:
pip install SRLR
Please refer to tutorial.ipynb for a comprehensive example and step-by-step guide.
Chen, X., Zeng, Y., Yang, S. and Sun, Q. Sketched Ridgeless Linear Regression: The Role of Downsampling.