DOC custom scoring usage GridSearchCV and RandomizedSearchCV #28694

Open
wants to merge 5 commits into
base: main

siddu1324
Reference Issues/PRs

References #28671

What does this implement/fix? Explain your changes.

This PR adds documentation examples for using custom scoring functions with GridSearchCV and RandomizedSearchCV, specifically illustrating how to use make_scorer for metrics requiring additional parameters, like d2_pinball_score. This enhancement addresses user requests for clearer guidance on applying custom scorers in model selection.

github-actions bot commented Mar 25, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 039d1b1. Link to the linter CI: here

@jeremiedbb
Member

Thanks for the PR @siddu1324. I'm not sure the Notes section is the right place for this added doc. I don't think it would have helped users figure out how to correctly use the scoring parameter.

I think improving the description of the scoring parameter would have a bigger impact. Ping @ogrisel, who originally answered in the linked issue, wdyt?

Member

@ogrisel ogrisel left a comment

In addition to @jeremiedbb's remark above about the location of the snippet in the docstring, there are several problems:

  • the linear regression model does not accept an alpha parameter. Calling fit on a dataset generated by make_regression would raise:
ValueError: Invalid parameter 'alpha' for estimator LinearRegression(). Valid parameters are: ['copy_X', 'fit_intercept', 'n_jobs', 'positive'].
  • I would also rather not tune a hyperparameter that has the same name as the metric parameter to avoid introducing any confusion;
  • furthermore, it's weird to tune a linear regression model, which estimates the expected value of the target variable conditionally on the features, against a metric that assesses its ability to estimate a 0.95 quantile. I would instead use a quantile estimator for this loss, or alternatively use another parametrized metric such as fbeta_score on a simple classifier such as LogisticRegression.
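
The alternative suggested in the last bullet, a parametrized metric such as fbeta_score on a simple classifier, might look like the sketch below. It is only an illustration, not the snippet proposed in this PR: the synthetic dataset, the grid of C values, beta=2, and cv=3 are all assumptions chosen for the example. The tuned hyperparameter (C) deliberately has a different name from the metric parameter (beta).

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import fbeta_score, make_scorer
from sklearn.model_selection import GridSearchCV

# Synthetic binary classification data (assumed for illustration only).
X, y = make_classification(n_samples=200, random_state=0)

# beta is a parameter of the metric; it is fixed once when the scorer is built.
f2_scorer = make_scorer(fbeta_score, beta=2)

# C is a hyperparameter of the estimator; its name does not clash with beta.
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(LogisticRegression(), param_grid, scoring=f2_scorer, cv=3)
search.fit(X, y)

print(search.best_params_, search.best_score_)
```

Extra keyword arguments passed to make_scorer are forwarded to the metric on every call, which is how a metric that needs an additional parameter can be used through the scoring argument.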

>>> from scipy.stats import expon
>>> param_dist = {'alpha': expon()}
>>> rnd_search = RandomizedSearchCV(LinearRegression(),
param_distributions=param_dist, scoring=custom_scorer)
Member

Also, this is the docstring of the GridSearchCV class, but this code snippet shows how to use RandomizedSearchCV instead.

A similar snippet can be added in the inline examples section of each of those classes, but it should be adapted accordingly.

3 participants