-
-
Notifications
You must be signed in to change notification settings - Fork 26.3k
FIX add attributes X_offset_ and X_scale_ in ARDRegression needed in predict #18607
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FIX add attributes X_offset_ and X_scale_ in ARDRegression needed in predict #18607
Conversation
…in fit method of ARDRegression
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @fhaselbeck , looks good.
We would need a non-regression test for this. Also, we'll need to document the attributes in the Attributes
section of the docstring.
FYI I posted a non-regression test in the associated issue. |
…and ARDRegression
Just added the comments in the I saw that there is already a test for def test_return_std():
# Test return_std option for both Bayesian regressors
def f(X):
return np.dot(X, w) + b
def f_noise(X, noise_mult):
return f(X) + np.random.randn(X.shape[0]) * noise_mult
d = 5
n_train = 50
n_test = 10
w = np.array([1.0, 0.0, 1.0, -1.0, 0.0])
b = 1.0
X = np.random.random((n_train, d))
X_test = np.random.random((n_test, d))
for decimal, noise_mult in enumerate([1, 0.1, 0.01]):
y = f_noise(X, noise_mult)
m1 = BayesianRidge()
m1.fit(X, y)
y_mean1, y_std1 = m1.predict(X_test, return_std=True)
assert_array_almost_equal(y_std1, noise_mult, decimal=decimal)
m2 = ARDRegression()
m2.fit(X, y)
y_mean2, y_std2 = m2.predict(X_test, return_std=True)
assert_array_almost_equal(y_std2, noise_mult, decimal=decimal) |
There seem to be some linting errors (probably white spaces). Could you please check them? We'd still need a non-regression test, for example as proposed by @glemaitre in the original issue. Alternatively you mayy parametrize the test you mentionned with @pytest.mark.parametrize('normalize', (True, False)) (We can even parametrize it with the Estimator class) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM assuming the comments by @NicolasHug are addressed.
sklearn/linear_model/_bayes.py
Outdated
if normalized, offset subtracted for centering data to mean zero | ||
X_scale_ : float | ||
if normalized, scaling parameter for division |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to add the .. versionadded:: 0.24
directive for those attributes? It's a bugfix, they should always had been there so I am not sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say we don't want them since it's indeed a bugfix
BTW @fhaselbeck we'll also need a bugfix entry in doc/whats_new/v0.24.rst
;)
Hi @fhaselbeck, thanks for your work so far. Will you be able to address comments? Please let us know if you need any help. |
Sorry. I am currently very busy with the preparation of a paper. I will finalize this stuff afterwards, but it may take some time. |
@fhaselbeck @ogrisel 1) test proposedWe looked at the request for a test. Is this what is needed? def test_bayesian_ard_object():
# Test BayesianRegression ARD classifier
X = np.array([[1], [2], [6], [8], [10]])
y = np.array([1, 2, 6, 8, 10])
clf = ARDRegression(normalize=True)
clf.fit([[0,0], [1, 1], [2, 2]], [0, 1, 2])
clf.predict([[1, 1]], return_std=True)
# Check that the model could approximately learn the identity function
test = [[1], [3], [4]]
assert_array_almost_equal(clf.predict(test), [1, 3, 4], 2) 2) bug fix entry proposedin file: doc/whats_new/v0.24.rst Here is a proposed entry:
|
@reshamas @Mariam-ke regarding the test, a simpler one is available on the original issue as mentioned above. Yours is good but we don't need this part: # Check that the model could approximately learn the identity function
test = [[1], [3], [4]]
assert_array_almost_equal(clf.predict(test), [1, 3, 4], 2) because the failure wasn't about prediction quality. Also a more explicit name and comment would be something like: def test_ard_reg_predict_normalize_true():
# Make sure predict doesn't raise any error when normalize=True and return_std=True
# Non regression test for https://github.com/scikit-learn/scikit-learn/issues/18605 Note the mention that this is a non-regression test, with reference to the original issue Re whatsnew message: we usually don't explain the details of the fix (this is often too complex, and irrelevant to users):
|
I added the missing test and the what's new entry. When the CIs are green I will merge. |
Thanks @NicolasHug for the explanation. |
Thanks a lot for finishing up and sorry that I had no time to do it on my own |
Fixes #18605
ARDRegression is crashing during prediction when normalize is set true as in
predict
self.X_offset_
andself.X_scale_
are used to transform but are not set in fit method (in contrast to BayesianRidge).Added this part to fit just like in
BayesianRidge
: