BUG Fix instability issue of ARDRegression (with speedup) #16849
Conversation
# With the inputs above, ARDRegression prunes both of the two coefficients
# in the first iteration. Hence, the expected shape of `sigma_` is (0, 0).
assert clf.sigma_.shape == (0, 0)
If the target is constant, it makes sense to me that both coefficients should be set to zero. So I think this is a bug fix.
By setting `clf.sigma_ = np.array([])`, the later call to `clf.predict(X, return_std=True)` would return a different result for the std when compared to master. Is this okay?
Yes, it's a bug fix: `sigma_` should be empty when `keep_lambda` is all False.
You might want to slightly delay your review @thomasjpfan , I'm going to update this for the n_features > n_samples case
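To make the corner case concrete, here is a minimal hedged sketch (the data below is made up and is not the input used in the actual test): with a constant target, all coefficients end up pruned, `sigma_` becomes empty, and `predict(..., return_std=True)` should still work.

```python
import numpy as np
from sklearn.linear_model import ARDRegression

# Hypothetical inputs, not the ones from the test suite.
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.full(3, 7.0)  # constant target: the intercept explains everything

clf = ARDRegression().fit(X, y)
# If both coefficients are pruned, sigma_ is an empty (0, 0) matrix after this PR.
print(clf.sigma_.shape)
# predict with return_std=True should still work; with no kept features the
# predictive std reduces to the noise std sqrt(1 / alpha_).
mean, std = clf.predict(X, return_std=True)
```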
@pytest.mark.parametrize('seed', range(100))
def test_ard_accuracy_on_easy_problem(seed):
The test should now run properly on many different seeds (there was a 7% failure rate before), and with a much higher precision
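For context, here is a sketch of what such a seeded accuracy test can look like (the exact data, sizes, and tolerance used in the PR may differ): the target is exactly one of the features, so ARD should recover that coefficient almost perfectly for every seed.

```python
import numpy as np
import pytest
from sklearn.linear_model import ARDRegression


@pytest.mark.parametrize('seed', range(100))
def test_ard_accuracy_on_easy_problem(seed):
    # Easy problem: the target is exactly the second of three features.
    X = np.random.RandomState(seed).normal(size=(250, 3))
    y = X[:, 1]
    regressor = ARDRegression()
    regressor.fit(X, y)
    # The coefficient of the informative feature should be recovered very precisely.
    assert np.abs(1 - regressor.coef_[1]) < 1e-10
```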
sklearn/linear_model/_bayes.py
Outdated
gram = np.dot(X_keep.T, X_keep)
eye = np.eye(gram.shape[0])
sigma_inv = lambda_[keep_lambda] * eye + alpha_ * gram
sigma_ = pinvh(sigma_inv)
So that's basically the main fix. Instead of relying on the Woodbury formula, we just directly invert the matrix (called S_N in the ref). This seems to be much more stable and also much faster, because here we invert a matrix that is n_features x n_features, whereas in master the inversion is on an n_samples x n_samples matrix.
Strangely enough, this somewhat reverts a commit from 10 years ago (b2d0a77), whose goal was to make things faster, but I'm not sure it did. @vmichel, if you're still around, your input would be greatly appreciated!
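To spell out the equivalence, here is a small hedged numpy sketch (made-up shapes and precision values) of the two routes to the posterior covariance S_N = (diag(lambda) + alpha * X.T @ X)^-1: the direct n_features x n_features inversion used in this PR versus the Woodbury identity used on master, which goes through an n_samples x n_samples inverse.

```python
import numpy as np
from scipy.linalg import pinvh

rng = np.random.RandomState(0)
n_samples, n_features = 200, 5                     # made-up sizes
X = rng.normal(size=(n_samples, n_features))
alpha_ = 2.0                                       # noise precision (hypothetical)
lambda_ = rng.uniform(0.5, 2.0, size=n_features)   # weight precisions (hypothetical)

# Direct route (this PR): invert the n_features x n_features matrix S_N^-1.
sigma_direct = pinvh(np.diag(lambda_) + alpha_ * X.T @ X)

# Woodbury route (master): the inversion happens on an n_samples x n_samples matrix.
A_inv = np.diag(1.0 / lambda_)
inner = pinvh(np.eye(n_samples) / alpha_ + X @ A_inv @ X.T)
sigma_woodbury = A_inv - A_inv @ X.T @ inner @ X @ A_inv

# Both routes compute the same covariance (up to numerical error).
np.testing.assert_allclose(sigma_direct, sigma_woodbury, rtol=1e-6, atol=1e-8)
```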
could it be that numpy operations are now much faster?
I doubt it, the bottleneck is the matrix inversion here
@@ -12,7 +12,7 @@
 from ._base import LinearModel, _rescale_data
 from ..base import RegressorMixin
 from ..utils.extmath import fast_logdet
-from ..utils.fixes import pinvh
+from scipy.linalg import pinvh
We don't need scipy's backport anymore, so we can remove it now.
Pinging @rth @jnothman @thomasjpfan @glemaitre since you took a look at the previous PR. Thanks!
It'd be nice to get this one in. It looks good to me, I think.
doc/whats_new/v0.23.rst
Outdated
@@ -290,6 +290,10 @@ Changelog
   of strictly inferior for maximum of `absgrad` and `tol` in `utils.optimize._newton_cg`.
   :pr:`16266` by :user:`Rushabh Vasani <rushabh-v>`.

+- |Fix| |Efficiency| :class:`linear_model.ARDRegression` is more stable and
+  much faster. It can now scale to hundreds of thousands of samples.
you could also mention https://github.com/scikit-learn/scikit-learn/pull/16849/files#r403760030 here maybe?
I've updated the PR so that the Woodbury formula is still used when n_samples < n_features. CC @amueller
Since the results of the two methods are not the same, should it be exposed as kind of an option?
They are the same, we have a test for that.
Here's where I'm confused: you changed the solver block, and added the
The only thing I added about
Yes (it's not strictly speaking a solver, it's more of a way of computing the inverse of a specific matrix). In the end, the same result is computed either way.
It's not just a performance improvement, it's also a stability improvement. It's more stable to invert the n_features x n_features matrix directly.
I'm happy with this one, another review maybe? @thomasjpfan wanna have another look?
Sorry to bother if I am wrong, but I have noticed that the plot in the ARD documentation is different in sklearn 0.23.X compared to version 0.22.X, and I wonder if this commit introduced the issue (maybe it is the expected behavior). Cf. #20740, or directly the last plot shown in this example in the documentation.
I opened a new issue so that this problem can be investigated.
This should fix #15186.
Closes #16102.
This comes with a significant speed-up: on master, the estimator can only scale to a few hundred samples. This PR can comfortably fit 100k samples in about 1 second.
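A rough, hedged way to sanity-check that claim on a synthetic problem (timings obviously depend on hardware and BLAS, and the data below is made up):

```python
import time

import numpy as np
from sklearn.linear_model import ARDRegression

rng = np.random.RandomState(0)
n_samples, n_features = 100_000, 20
X = rng.normal(size=(n_samples, n_features))
y = X @ rng.normal(size=n_features) + 0.1 * rng.normal(size=n_samples)

tic = time.perf_counter()
ARDRegression().fit(X, y)
print(f"fit took {time.perf_counter() - tic:.2f} s")  # roughly a second with this PR
```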