Skip to content

Conversation

fhaselbeck
Copy link
Contributor

Fixes #18605

ARDRegression is crashing during prediction when normalize is set true as in predict self.X_offset_ and self.X_scale_ are used to transform but are not set in fit method (in contrast to BayesianRidge).

Added this part to fit just like in BayesianRidge:

self.X_offset_ = X_offset_
self.X_scale_ = X_scale_

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @fhaselbeck , looks good.

We would need a non-regression test for this. Also, we'll need to document the attributes in the Attributes section of the docstring.

@glemaitre
Copy link
Member

FYI I posted a non-regression test in the associated issue.

@fhaselbeck
Copy link
Contributor Author

Just added the comments in the Attributes section.

I saw that there is already a test for return_std in test_bayes.py, so I did not add another one.

def test_return_std():
    # Test return_std option for both Bayesian regressors
    def f(X):
        return np.dot(X, w) + b

    def f_noise(X, noise_mult):
        return f(X) + np.random.randn(X.shape[0]) * noise_mult

    d = 5
    n_train = 50
    n_test = 10

    w = np.array([1.0, 0.0, 1.0, -1.0, 0.0])
    b = 1.0

    X = np.random.random((n_train, d))
    X_test = np.random.random((n_test, d))

    for decimal, noise_mult in enumerate([1, 0.1, 0.01]):
        y = f_noise(X, noise_mult)

        m1 = BayesianRidge()
        m1.fit(X, y)
        y_mean1, y_std1 = m1.predict(X_test, return_std=True)
        assert_array_almost_equal(y_std1, noise_mult, decimal=decimal)

        m2 = ARDRegression()
        m2.fit(X, y)
        y_mean2, y_std2 = m2.predict(X_test, return_std=True)
        assert_array_almost_equal(y_std2, noise_mult, decimal=decimal)

@NicolasHug
Copy link
Member

There seem to be some linting errors (probably white spaces). Could you please check them?

We'd still need a non-regression test, for example as proposed by @glemaitre in the original issue. Alternatively you mayy parametrize the test you mentionned with

@pytest.mark.parametrize('normalize', (True, False))

(We can even parametrize it with the Estimator class)

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM assuming the comments by @NicolasHug are addressed.

if normalized, offset subtracted for centering data to mean zero
X_scale_ : float
if normalized, scaling parameter for division
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to add the .. versionadded:: 0.24 directive for those attributes? It's a bugfix, they should always had been there so I am not sure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say we don't want them since it's indeed a bugfix

BTW @fhaselbeck we'll also need a bugfix entry in doc/whats_new/v0.24.rst ;)

@NicolasHug NicolasHug added this to the 0.24 milestone Oct 15, 2020
@cmarmo
Copy link
Contributor

cmarmo commented Oct 27, 2020

Hi @fhaselbeck, thanks for your work so far. Will you be able to address comments? Please let us know if you need any help.

@fhaselbeck
Copy link
Contributor Author

Sorry. I am currently very busy with the preparation of a paper. I will finalize this stuff afterwards, but it may take some time.

@reshamas
Copy link
Member

@fhaselbeck @ogrisel
@Mariam-ke and I are reviewing this PR for learning purposes.

1) test proposed

We looked at the request for a test. Is this what is needed?
In file: sklearn/linear_model/tests/test_bayes.py, here is a proposed test:

def test_bayesian_ard_object(): 
    # Test BayesianRegression ARD classifier 
    X = np.array([[1], [2], [6], [8], [10]]) 
    y = np.array([1, 2, 6, 8, 10]) 
    clf = ARDRegression(normalize=True) 
    clf.fit([[0,0], [1, 1], [2, 2]], [0, 1, 2]) 
    clf.predict([[1, 1]], return_std=True) 
    
    # Check that the model could approximately learn the identity function 
    test = [[1], [3], [4]] 
    assert_array_almost_equal(clf.predict(test), [1, 3, 4], 2)

2) bug fix entry proposed

in file: doc/whats_new/v0.24.rst

Here is a proposed entry:

:mod:`sklearn.linear_model`

- |Fix| Fixes bug in :class:`linear_model.ARDRegression` where ARDRegression is crashing during prediction when `normalize=True`. `self.X_offset_` and `self.X_scale_` are used to transform but are not set in fit method. :pr:`18607` by :user:`fhaselbeck`.

@NicolasHug
Copy link
Member

@reshamas @Mariam-ke regarding the test, a simpler one is available on the original issue as mentioned above. Yours is good but we don't need this part:

     # Check that the model could approximately learn the identity function 
    test = [[1], [3], [4]] 
    assert_array_almost_equal(clf.predict(test), [1, 3, 4], 2)

because the failure wasn't about prediction quality.

Also a more explicit name and comment would be something like:

def test_ard_reg_predict_normalize_true(): 
    # Make sure predict doesn't raise any error when normalize=True and return_std=True
	# Non regression test for https://github.com/scikit-learn/scikit-learn/issues/18605

Note the mention that this is a non-regression test, with reference to the original issue

Re whatsnew message: we usually don't explain the details of the fix (this is often too complex, and irrelevant to users):

- |Fix| Fixed a bug that would result in an error when calling `predict(return_std=True)` on :class:`linear_model.ARDRegression`, when `normalize=True`. :pr:`18607` by :user:`fhaselbeck`.

@glemaitre glemaitre self-assigned this Oct 27, 2020
@glemaitre glemaitre changed the title fixed issue #18605, set self.X_offset_ and self.X_scale_ in fit metho… FIX add attributes X_offset_ and X_scale_ in ARDRegression needed in predict Oct 27, 2020
@glemaitre
Copy link
Member

I added the missing test and the what's new entry. When the CIs are green I will merge.

@reshamas
Copy link
Member

Thanks @NicolasHug for the explanation.

@glemaitre glemaitre merged commit bb9a8f0 into scikit-learn:master Oct 28, 2020
@fhaselbeck
Copy link
Contributor Author

Thanks a lot for finishing up and sorry that I had no time to do it on my own

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ARDRegression crashed during prediction
6 participants