Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+1] API Change default multioutput in RegressorMixin.score to keep consistent with metrics.r2_score #13157

Merged
merged 11 commits into from Mar 15, 2019

Conversation

Projects
None yet
4 participants
@qinhanmin2014
Copy link
Member

commented Feb 13, 2019

Closes #12772
Wondering if someone has a better way :)
In the original issue, I tried to ask why we prefer uniform_average, but received no reply. I guess we choose uniform_average to keep consistent with other regression metrics.

qinhanmin2014 added some commits Feb 13, 2019

@qinhanmin2014 qinhanmin2014 added this to the 0.21 milestone Feb 13, 2019

qinhanmin2014 added some commits Feb 13, 2019

@jnothman

This comment has been minimized.

Copy link
Member

commented Feb 13, 2019

@qinhanmin2014

This comment has been minimized.

Copy link
Member Author

commented Feb 14, 2019

An alternative might be too require multioutput to be explicit specified in r2_score

You mean adding a multioutput parameter in RegressorMixin.score. Does this violate our API design?

@jnothman

This comment has been minimized.

Copy link
Member

commented Feb 14, 2019

No, I mean that r2_score would no longer accept being called on multioutput without an explicit choice.

@qinhanmin2014

This comment has been minimized.

Copy link
Member Author

commented Feb 14, 2019

No, I mean that r2_score would no longer accept being called on multioutput without an explicit choice.

You mean we'll raise an error for things like r2_score(y_true, y_pred) when y_type is continuous-multioutput? I doubt whether it's a good idea and it's not related to this PR, right?
(1) If we do so, we'll need to update other regression metrics
(2) If we do so, we'll need to intruduce a multioutput parameter in RegressorMixin.score

@jnothman

This comment has been minimized.

Copy link
Member

commented Feb 14, 2019

@qinhanmin2014

This comment has been minimized.

Copy link
Member Author

commented Feb 14, 2019

Well, it solves this problem since users of r2_score would become aware that the equivalence to .score cannot be taken for granted...?

So with your proposal:
(1) We'll need to update other regression metrics.
(2) Users can't use RegressorMixin.score to evaluate multioutput problems?

And this is not backward compatible, I doubt whether it's worthwhile.

@qinhanmin2014

This comment has been minimized.

Copy link
Member Author

commented Feb 14, 2019

(2) Users can't use RegressorMixin.score to evaluate multioutput problems?

Maybe you mean that we should set multioutput explicitly in RegressorMixin.score?

@jnothman

This comment has been minimized.

Copy link
Member

commented Feb 14, 2019

@qinhanmin2014

This comment has been minimized.

Copy link
Member Author

commented Feb 14, 2019

We already do that??? But we don't document it...

But this is actually a bug introduced in #5143 right?

I still doubt whether it's worthwhile to let all the regression metrics which supports multioutput to raise an error when parameter multioutput is not set explicitly (y_type is multioutput). Is this your final decision?

@jnothman

This comment has been minimized.

Copy link
Member

commented Feb 14, 2019

@qinhanmin2014

This comment has been minimized.

Copy link
Member Author

commented Feb 14, 2019

No I suppose it doesn't really solve the problem... Documentation does.

@jnothman Could you please summarize your proposal here? I'm starting to get confused. Thanks.

@qinhanmin2014 qinhanmin2014 added this to To do in Sprint Paris 2019 via automation Feb 27, 2019

@qinhanmin2014

This comment has been minimized.

Copy link
Member Author

commented Mar 11, 2019

I'm surprised to find that we implement the score method in MultiOutputRegressor and are using multioutput='uniform_average' there :)
I prefer to include this is 0.21. @jnothman Is it possible to regard it as a bug fix and change it without a deprecation cycle? If not, what's your proposal here?

@jnothman

This comment has been minimized.

Copy link
Member

commented Mar 12, 2019

Can we stay by documenting the current behaviour?

@jnothman jnothman closed this Mar 12, 2019

@jnothman jnothman reopened this Mar 12, 2019

@qinhanmin2014

This comment has been minimized.

Copy link
Member Author

commented Mar 12, 2019

@jnothman current behavior:
(1)r2_score default multioutput="uniform_average"
(2)MultiOutputRegressor default multioutput="uniform_average"
(3)RegressorMixin (i.e., all other regressors) default multioutput="variance_weighted"
Do you think that's acceptable? (without (2), maybe acceptable)
I think we can regard it as a bug fix because it's erroneously introduced when changing the default value of a parameter.

@jnothman
Copy link
Member

left a comment

Okay. I'm persuaded. I think this is an okay solution, albeit raising lots of warnings for the next while.

Show resolved Hide resolved doc/whats_new/v0.21.rst Outdated
Show resolved Hide resolved doc/whats_new/v0.21.rst Outdated
@qinhanmin2014

This comment has been minimized.

Copy link
Member Author

commented Mar 12, 2019

ready for review @jnothman

Show resolved Hide resolved sklearn/base.py Outdated
Show resolved Hide resolved sklearn/base.py Outdated
Show resolved Hide resolved sklearn/base.py Outdated
@qinhanmin2014

This comment has been minimized.

Copy link
Member Author

commented Mar 13, 2019

ping related people here: @amueller @agramfort @ogrisel
and maybe @adrinjalali

@agramfort agramfort changed the title API Change default multioutput in RegressorMixin.score to keep consistent with metrics.r2_score [MRG+1] API Change default multioutput in RegressorMixin.score to keep consistent with metrics.r2_score Mar 15, 2019

@qinhanmin2014 qinhanmin2014 merged commit 73e2ecf into scikit-learn:master Mar 15, 2019

9 checks passed

LGTM analysis: C/C++ No code changes detected
Details
LGTM analysis: JavaScript No code changes detected
Details
LGTM analysis: Python No new or fixed alerts
Details
ci/circleci: deploy Your tests passed on CircleCI!
Details
ci/circleci: doc Your tests passed on CircleCI!
Details
ci/circleci: doc-min-dependencies Your tests passed on CircleCI!
Details
ci/circleci: lint Your tests passed on CircleCI!
Details
codecov/patch 100% of diff hit (target 92.37%)
Details
codecov/project 92.37% (+<.01%) compared to 8544097
Details

Sprint Paris 2019 automation moved this from To do to Done Mar 15, 2019

@qinhanmin2014 qinhanmin2014 removed this from Done in Sprint Paris 2019 Mar 15, 2019

@qinhanmin2014 qinhanmin2014 deleted the qinhanmin2014:RegressorMixin branch Mar 17, 2019

@amueller

This comment has been minimized.

Copy link
Member

commented Mar 18, 2019

Thanks! Maybe we should tell the user how to avoid this message? It's a bit ugly unfortunately. They can specify it directly when using cv or grid-search.
If they use the score method they would need to import r2 directly.

@qinhanmin2014

This comment has been minimized.

Copy link
Member Author

commented Mar 19, 2019

Thanks! Maybe we should tell the user how to avoid this message? It's a bit ugly unfortunately. They can specify it directly when using cv or grid-search. If they use the score method they would need to import r2 directly.

I've opened #13477 @amueller

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.