Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MNT Use check_scalar in AdaBoostRegressor #21605

Merged

Conversation

genvalen
Copy link
Contributor

@genvalen genvalen commented Nov 9, 2021

Reference Issues/PRs

Addresses #20724
#DataUmbrella

What does this implement/fix? Explain your changes.

Summary of changes to AdaBoostRegressor:

  • Add tests to ensure estimator raises proper errors when invalid arguments are passed in.
  • Use the helper function check_scalar from sklearn.utils to validate the scalar parameters.

Test and validation progress:

  • n_estimators
  • learning_rate

References

  1. docs
  2. PR #20723

Any other comments?

@genvalen
Copy link
Contributor Author

genvalen commented Nov 11, 2021

I noticed the BaseWeightBoosting class' fit method has one parameter validation starting on line 114.

        # Check parameters
        if self.learning_rate <= 0:
            raise ValueError("learning_rate must be greater than zero")

Should I delete this one?
Or maybe update it to be consistent with the check_scalar output message?:

f"learning_rate == {self.learning_rate}, must be > 0."

Comment on lines 1082 to 1096
check_scalar(
self.n_estimators,
"n_estimators",
target_type=numbers.Integral,
min_val=1,
include_boundaries="left",
)

check_scalar(
self.learning_rate,
"learning_rate",
target_type=numbers.Real,
min_val=0,
include_boundaries="neither",
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I delete this one?

We can move both of these checks into BaseWeightBoosting, and replace the check for self.learning_rate:

if self.learning_rate <= 0:
raise ValueError("learning_rate must be greater than zero")

Copy link
Contributor Author

@genvalen genvalen Dec 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. In that case I will remove the checks I added to AdaBoostClassifier too. They are the exact same checks as what I am about to move into BaseWeightBoosting.

@reshamas
Copy link
Member

@genvalen After discussion with @glemaitre, this PR should be prefixed with "MAINT" (and yes, confirming it does mean for "maintenance")

@genvalen
Copy link
Contributor Author

Okay! So just to be clear, the proper abbreviation is "MAINT" and not "MNT"? Or is either one fine?

@reshamas
Copy link
Member

I am seeing both here:

Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All is good for me. LGTM.
Thanks @genvalen

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for following up!

sklearn/ensemble/tests/test_weight_boosting.py Outdated Show resolved Hide resolved
sklearn/ensemble/tests/test_weight_boosting.py Outdated Show resolved Hide resolved
sklearn/ensemble/tests/test_weight_boosting.py Outdated Show resolved Hide resolved
genvalen and others added 3 commits December 14, 2021 17:27
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@thomasjpfan thomasjpfan merged commit f21f1d7 into scikit-learn:main Dec 16, 2021
@genvalen genvalen deleted the AdaBoostRegressor_add_check_scaler branch December 17, 2021 03:56
@reshamas
Copy link
Member

reshamas commented Jan 7, 2022

@glemaitre @thomasjpfan @jjerphan @ogrisel
I am looking at review comments on my PR and comparing with @genvalen.
I would like to document some consistency points for the check_scalar PRs:

  1. PR prefix should be MAINT (not MNT)
  2. check_scalar call should include explicitly include name (Ex: name="n_estimators", (not "n_estimators", ))
  3. Interval ranges should use the text must be (not should be)
  4. any others?

@jjerphan
Copy link
Member

jjerphan commented Jan 7, 2022

@glemaitre @thomasjpfan @jjerphan @ogrisel I am looking at review comments on my PR and comparing with @genvalen. I would like to document some consistency points for the check_scalar PRs:

1. PR prefix should be `MAINT` (not `MNT`)

2. `check_scalar` call should include explicitly include `name="n_estimators",`  (not `"n_estimators", `)

3. Interval ranges should use the text `must be` (not `should be`)

4. any others?

Thanks for following up, @reshamas.

1., 2., 3. look good to me. I would also make sure error messages in tests are present.

@reshamas
Copy link
Member

reshamas commented Jan 7, 2022

@jjerphan
I am not sure I understand the below comment. Aren't error messages needed when a test is defined?

I would also make sure error messages in tests are present.

Comment on lines +276 to +282
reg = AdaBoostRegressor(loss="foo")
with pytest.raises(ValueError):
reg.fit(X, y_class)

clf = AdaBoostClassifier(algorithm="foo")
with pytest.raises(ValueError):
clf.fit(X, y_class)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@reshamas: I should have been clearer.

Here for instance, there's a check for ValueErrors being raised but their error messages aren't checked.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we re-open this one? Error messages are important and helpful, correct?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we go: #22144

jjerphan added a commit to jjerphan/scikit-learn that referenced this pull request Jan 7, 2022
@reshamas
Copy link
Member

reshamas commented Jan 7, 2022

FYI, for reference, @genvalen
PR #22144

@genvalen
Copy link
Contributor Author

genvalen commented Jan 7, 2022

@reshamas Noted. Thanks! I'll incorporate these consistency points moving forward.

mathijs02 pushed a commit to mathijs02/scikit-learn that referenced this pull request Dec 27, 2022
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants