correcting information criterion calculation in least_angle.py #6080
Conversation
The information criterion calculation is not compatible with the original paper: Zou, Hui, Trevor Hastie, and Robert Tibshirani. "On the 'degrees of freedom' of the lasso." The Annals of Statistics 35.5 (2007): 2173-2192.
@mehmetbasbug sorry for the slow reply. @agramfort @GaelVaroquaux do you have any input on this?
Sorry @mehmetbasbug for the slow reaction. Can you point me to the equation you refer to?
Eq. 2.15 and Eq. 2.16
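For readers following along, Eq. 2.15 and 2.16 of the paper define AIC and BIC in their per-sample form. A minimal sketch of those formulas (illustrative only, not scikit-learn code; `aic_bic_lasso` is a hypothetical helper name, and the noise variance `sigma2` is assumed known or estimated separately):

```python
import numpy as np

def aic_bic_lasso(y, y_hat, df, sigma2):
    """AIC/BIC as in Zou, Hastie & Tibshirani (2007), Eq. 2.15-2.16.

    df is the estimated degrees of freedom (for the lasso, the number
    of nonzero coefficients); sigma2 is the noise variance.
    """
    n = len(y)
    rss = np.sum((y - y_hat) ** 2)
    aic = rss / (n * sigma2) + (2.0 / n) * df
    bic = rss / (n * sigma2) + (np.log(n) / n) * df
    return aic, bic

# toy usage: null fit, df=0, so AIC and BIC coincide
rng = np.random.default_rng(0)
y = rng.normal(size=50)
aic, bic = aic_bic_lasso(y, np.zeros(50), df=0, sigma2=1.0)
```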
@@ -1424,6 +1424,7 @@ def fit(self, X, y, copy_X=True):
        R = y[:, np.newaxis] - np.dot(X, coef_path_)  # residuals
        mean_squared_error = np.mean(R ** 2, axis=0)
+       sigma = np.var(y)
call it sigma2
@@ -1437,7 +1438,7 @@ def fit(self, X, y, copy_X=True):
        self.alphas_ = alphas_
        with np.errstate(divide='ignore'):
-           self.criterion_ = n_samples * np.log(mean_squared_error) + K * df
+           self.criterion_ = n_samples * mean_squared_error / sigma + K * df
To me it should be

mean_squared_error / sigma2 + K * df

given that K is 2 or log(n). Ideally everything should be divided by n.
Did you check how it affects the example using BIC and AIC?
If you want to divide everything by n, then we should have

mean_squared_error / sigma2 + K * df / n_samples
It was a long while ago but I remember getting the same results as the R package from the authors when I used the corrected version.
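As a side note, the "total" and "per-sample" forms being discussed differ only by a factor of n, so both select the same model along the path. A quick sanity check with toy values (the arrays below are illustrative, not scikit-learn internals):

```python
import numpy as np

n_samples = 100
K = 2.0  # 2 for AIC, log(n_samples) for BIC
sigma2 = 1.0
# toy path: shrinking error as degrees of freedom grow
mean_squared_error = np.array([1.50, 1.10, 0.95, 0.94])
df = np.array([0, 1, 2, 3])

total = n_samples * mean_squared_error / sigma2 + K * df
per_sample = mean_squared_error / sigma2 + K * df / n_samples

# the two forms differ by the constant factor n_samples,
# so they pick the same model
assert np.allclose(total, n_samples * per_sample)
assert np.argmin(total) == np.argmin(per_sample)
```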
OK, agreed. Can you see why Travis is not happy, and add a test that compares with the R solution? Thanks.
Thanks for your reply to #7692. In general, it is hard to estimate sigma2 with a single formula for the lasso. I looked at the R implementation of selectiveInference (page 12 of its manual):

    Estimate of error standard deviation. If NULL (default), this is estimated using
    the mean squared residual of the full least squares fit when n >= 2p, and using
    the standard deviation of y when n < 2p.

So if n >= 2p, it should be mean_squared_error * n_samples / (n_samples - df - 1); if n < 2p, it should be var(y).
For the diabetes data in the Travis test, n >= 2p, so the first formula should be used. This is probably why Travis is not happy.
Also, sigma2 should be moved to the second term as a multiplicative factor; dividing by sigma2 is numerically unstable.
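The rule quoted from the selectiveInference manual could be sketched as follows (an assumption-laden sketch; `estimate_sigma2` is a hypothetical helper, not part of scikit-learn or selectiveInference, and the residual degrees of freedom n - p - 1 follow the formula above with df = p for the full fit):

```python
import numpy as np

def estimate_sigma2(X, y):
    """Noise-variance estimate per the selectiveInference rule:
    mean squared residual of the full least-squares fit when
    n >= 2p, the variance of y otherwise."""
    n, p = X.shape
    if n >= 2 * p:
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ coef
        # divide by the residual degrees of freedom, n - p - 1
        return np.sum(resid ** 2) / (n - p - 1)
    return np.var(y)
```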
Also, it breaks the tests. Please see the Travis errors.
@mehmetbasbug @yuachen follow up on #9022 please review if you have time, thanks