Fast approximation of biglasso? #12
You probably want to start with the STRONG rules, which eliminate regressors that don't project well onto the response at the given penalty. You can then apply the lasso to the remaining regressors. If you still have too much data, you can trade estimator accuracy for computational complexity, either by reducing the numerical accuracy of the slope coefficients in the standard implementation or by using ADMM. I personally prefer the former; it is probably faster when the data are randomly distributed over concurrent partitions.
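For concreteness, here is a minimal sketch of the basic strong rule screen, assuming the lasso objective `(1/(2n)) * ||y - X beta||^2 + lam * ||beta||_1` with standardized columns of `X`. The function name and arguments are illustrative, not from any package mentioned here, and a real pipeline would use the sequential rule along a lambda path:

```python
import numpy as np

def strong_rule_keep(X, y, lam, lam_max=None):
    """Basic strong rule screen for the lasso at penalty lam.

    Heuristically discards predictor j when |x_j' y| / n < 2*lam - lam_max,
    i.e. when its marginal correlation with the response is too small for
    it to plausibly enter the active set. Assumes standardized columns.
    Returns a boolean mask of the predictors to keep.
    """
    n = X.shape[0]
    c = np.abs(X.T @ y) / n        # marginal correlations with the response
    if lam_max is None:
        lam_max = c.max()          # smallest penalty giving an empty model
    return c >= 2 * lam - lam_max
```

Because the rule is heuristic, any discarded predictor still has to be verified against the KKT conditions after the fit, which is exactly the check discussed below.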
Thanks for the tips. I don't have time to test this right now, but hopefully I will someday.
I need to implement this for a book I'm writing. If you can wait a few weeks, I can provide a reference implementation.
Strong rules are implemented in @YaohuiZeng's biglasso package, and I'm using the code from this package as well. I'm looking forward to seeing your implementation.
The crux of STRONG is checking the KKT conditions. Below is reference code, similar to what I have in the book chapter, to do this. Note that if you were going to optimize for performance, you'd probably want the vector of slope coefficients […]. Also, note that you'd probably want to change […]
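A minimal sketch of such a KKT check (not the book's code; `kkt_violations` and its arguments are illustrative names), again assuming the objective `(1/(2n)) * ||y - X beta||^2 + lam * ||beta||_1`:

```python
import numpy as np

def kkt_violations(X, y, beta, lam, tol=1e-4):
    """Return indices of coordinates violating the lasso KKT conditions.

    At a solution of (1/(2n))||y - X beta||^2 + lam * ||beta||_1, the
    gradient correlations g = X' (y - X beta) / n must satisfy
    |g_j| <= lam wherever beta_j == 0, and g_j == lam * sign(beta_j)
    wherever beta_j != 0.
    """
    n = X.shape[0]
    r = y - X @ beta                       # residual at the candidate fit
    g = X.T @ r / n                        # per-coordinate correlations
    active = beta != 0
    bad_inactive = ~active & (np.abs(g) > lam + tol)
    bad_active = active & (np.abs(g - lam * np.sign(beta)) > tol)
    return np.flatnonzero(bad_inactive | bad_active)
```

If this returns a non-empty set after fitting on the screened predictors, the violators are added back and the model refit; the loop terminates because the eligible set only grows.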
Now much faster with #14
Find out whether there is a fast, near-optimal rule approximation for computing multivariate linear/logistic regression on biobank-scale datasets in a few hours (or minutes).