[MRG + 2] add sample_weight support to Pipeline.score #7723

Merged: 6 commits merged into scikit-learn:master on Oct 29, 2016

Conversation (4 participants)
kmike (Contributor) commented Oct 21, 2016

Many estimators support a sample_weight argument in their .score method; for example, the default .score implementation in ClassifierMixin provides this feature. This PR adds kwargs support to Pipeline.score, similar to **fit_params in Pipeline.fit, which allows passing sample_weight to pipeline.score(...).
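As a sketch of what this enables (illustrative names only, not the scikit-learn source), a minimal pipeline that forwards score kwargs to its final step might look like:

```python
class Identity:
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        return X

class WeightedCount:
    # mimics a sample_weight-aware score(), as in ClassifierMixin
    def fit(self, X, y=None):
        return self
    def score(self, X, y=None, sample_weight=None):
        if sample_weight is None:
            return len(X)
        return sum(sample_weight)

class TinyPipeline:
    def __init__(self, steps):
        self.steps = steps  # list of (name, estimator) pairs
    def score(self, X, y=None, **score_params):
        Xt = X
        for name, transform in self.steps[:-1]:
            Xt = transform.transform(Xt)
        # forward any score kwargs (e.g. sample_weight) to the final step
        return self.steps[-1][-1].score(Xt, y, **score_params)

pipe = TinyPipeline([('id', Identity()), ('clf', WeightedCount())])
print(pipe.score([1, 2, 3]))                           # 3
print(pipe.score([1, 2, 3], sample_weight=[2, 3, 3]))  # 8
```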

sklearn/tests/test_pipeline.py
pipe.fit(X, y=None)
assert_equal(pipe.score(X), 3)
assert_equal(pipe.score(X, y=None), 3)
assert_equal(pipe.score(X, sample_weight=np.array([2, 3])), 8)

raghavrv (Member) commented Oct 23, 2016

Could you also check that it raises a sensible error message if the underlying estimator does not support sample_weight in its scoring?

kmike (Contributor) commented Oct 23, 2016

Good call; I've added a check for TypeError.

I haven't added an error message check because the error is a standard TypeError for an unexpected keyword argument, and its wording may vary across Python versions. Error messages for similar TypeErrors are not checked in the other test_pipeline.py tests.

raghavrv (Member) commented Oct 23, 2016

That makes sense, but are there snippets of the error message that are common across versions? Something like unexpected.*sample_weight that would surely be present in all of them?

(assert_raises_regexp?)

@raghavrv

Otherwise this looks okay to me. @GaelVaroquaux @amueller @jnothman Do you approve this change?

sklearn/tests/test_pipeline.py
@@ -227,6 +227,7 @@ def test_pipeline_score_params():
assert_equal(pipe.score(X), 3)
assert_equal(pipe.score(X, y=None), 3)
assert_equal(pipe.score(X, sample_weight=np.array([2, 3])), 8)
assert_raises(TypeError, pipe.score, X, foo='bar')

raghavrv (Member) commented Oct 23, 2016

I think testing for an explicit error message would be better... (assert_raise_message)

sklearn/tests/test_pipeline.py
assert_equal(pipe.score(X), 3)
assert_equal(pipe.score(X, y=None), 3)
assert_equal(pipe.score(X, sample_weight=np.array([2, 3])), 8)
assert_raises(TypeError, pipe.score, X, foo='bar')

amueller (Member) commented Oct 24, 2016

Please use assert_raise_message. Otherwise LGTM.

@amueller amueller changed the title from add sample_weight support to Pipeline.score to [MRG + 1] add sample_weight support to Pipeline.score Oct 24, 2016

@amueller amueller added this to the 0.19 milestone Oct 24, 2016

amueller (Member) commented Oct 24, 2016

LGTM

kmike (Contributor) commented Oct 24, 2016

Done! I've also added a similar test for pipe.fit(X, y, **fit_params); errors were previously untested there. Is it OK to have it in the same PR?

raghavrv (Member) commented Oct 24, 2016

> Is it OK to have it in the same PR?

Thanks for that. Please do!

assert_raise_message(
    TypeError,
    "fit() got an unexpected keyword argument 'bad'",
    pipe.fit, None, None, clf__bad=True
)

raghavrv (Member) commented Oct 24, 2016

+1 thanks!

raghavrv (Member) commented Oct 24, 2016

I think this could be mentioned in what's new.

sklearn/pipeline.py
@@ -481,7 +481,7 @@ def _inverse_transform(self, X):
return Xt
@if_delegate_has_method(delegate='_final_estimator')
def score(self, X, y=None):
def score(self, X, y=None, **score_params):

jnothman (Member) commented Oct 26, 2016

I'd be tempted to be more conservative and make this only about sample_weight. Is there no prospect of allowing other kwargs with the __ magic to be passed to the transformers along the way?

amueller (Member) commented Oct 26, 2016

Sorry, I'm not sure I'm following.

jnothman (Member) commented Oct 26, 2016

What I'm saying is that, despite the PR description referring to fit_params, this behaves quite differently. fit_params is routed to each step according to __-delimited namespaces. By introducing **score_params that are all routed to the score function only, we are promising that we will never do anything more sophisticated with any named arguments to score. I'm not convinced we have a use case that motivates such a general implementation, when we could just accept sample_weight=None.
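The contrast drawn here can be sketched directly (assumed names, not the actual Pipeline code): **fit_params keys are split on the first __ and routed to the named step, whereas the proposed **score_params would all go to the final estimator's score:

```python
def route_fit_params(step_names, fit_params):
    # Keys like 'clf__sample_weight' are split on '__' and delivered
    # to the named step; **score_params would have no such routing.
    routed = {name: {} for name in step_names}
    for key, value in fit_params.items():
        step, _, param = key.partition('__')
        routed[step][param] = value
    return routed

print(route_fit_params(
    ['scaler', 'clf'],
    {'clf__sample_weight': [1, 2], 'scaler__copy': False}))
# {'scaler': {'copy': False}, 'clf': {'sample_weight': [1, 2]}}
```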

amueller (Member) commented Oct 27, 2016

Makes sense. I agree.

kmike (Contributor) commented Oct 27, 2016

The difference from **fit_params is that every transformer can have a .fit method, but there is only a single .score method; that's why I went with **score_params without __ handling. But I see the problem now.

So did I get it right: add a sample_weight=None argument, don't pass sample_weight to .score if it is None (as not all score methods support sample_weight), and in the future scikit-learn can add support for __-delimited keyword arguments which are passed to .transform methods?

The score(X, y=None, sample_weight=None) signature has a small issue: it is not possible to distinguish between passing sample_weight=None and not passing sample_weight at all; the distinction matters if estimator.score doesn't support a sample_weight argument. Should I handle this case by making the signature .score(X, y=None, **kwargs), but documenting it as sample_weight=None?
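The ambiguity described above can be demonstrated with toy functions (not scikit-learn code): with an explicit default, score() cannot tell an omitted sample_weight from an explicit None, while a **kwargs signature can:

```python
def score_explicit(X, y=None, sample_weight=None):
    # score_explicit(X) and score_explicit(X, sample_weight=None)
    # are indistinguishable here: both see sample_weight is None.
    return sample_weight is None

def score_kwargs(X, y=None, **kwargs):
    # a **kwargs signature can tell the two calls apart
    return 'sample_weight' in kwargs

print(score_explicit([1]))                      # True
print(score_explicit([1], sample_weight=None))  # True (same result)
print(score_kwargs([1]))                        # False
print(score_kwargs([1], sample_weight=None))    # True
```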

raghavrv (Member) commented Oct 27, 2016

Or we could have the signature as ..., sample_weight=None, pass **score_params to the score method, and document that it is applicable only if the underlying estimator supports it...

kmike (Contributor) commented Oct 27, 2016

If we pass **score_params to the score function, people will start using it even if it is not documented; why should we do that?

raghavrv (Member) commented Oct 27, 2016

> If we pass **score_params to the score function people

Ah wait, we are talking about different score functions ;)

I suggested:

  • passing **score_params to self.steps[-1][-1].score(Xt, y, **score_params), so it doesn't raise an error if sample_weight is not supported by the underlying estimator's score.
  • passing sample_weight=None to Pipeline.score(..., sample_weight=None), to make it clear that we do not support passing any other parameters except sample_weight.
  • documenting at Pipeline.score that sample_weight can only be passed if the underlying estimator supports it...

I think this is exactly what you suggested already. Sorry for the noise :)

amueller (Member) commented Oct 27, 2016

I think using a sample_weight argument and passing it on only if it is not None is a good solution.

kmike (Contributor) commented Oct 28, 2016

I've implemented the simplest solution: don't pass sample_weight if it is None. Does it look OK?
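A sketch of that behaviour (illustrative, not the merged scikit-learn source): forward sample_weight only when it is given, so a final estimator whose score() lacks the parameter keeps working:

```python
def pipeline_score(final_estimator, Xt, y=None, sample_weight=None):
    score_params = {}
    if sample_weight is not None:
        # only forward the kwarg when the caller supplied weights
        score_params['sample_weight'] = sample_weight
    return final_estimator.score(Xt, y, **score_params)

class NoWeightScorer:
    def score(self, X, y=None):  # no sample_weight parameter
        return 1.0

# Works because sample_weight=None is never forwarded:
print(pipeline_score(NoWeightScorer(), [[0]]))  # 1.0 (no TypeError)
```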

jnothman (Member) commented Oct 29, 2016

Could we bother you for a what's new entry? LGTM

@jnothman jnothman changed the title from [MRG + 1] add sample_weight support to Pipeline.score to [MRG + 2] add sample_weight support to Pipeline.score Oct 29, 2016

kmike (Contributor) commented Oct 29, 2016

@jnothman yep!

@jnothman jnothman merged commit 10323f5 into scikit-learn:master Oct 29, 2016

2 of 3 checks passed:

continuous-integration/appveyor/pr: Waiting for AppVeyor build to complete
ci/circleci: Your tests passed on CircleCI!
continuous-integration/travis-ci/pr: The Travis CI build passed
jnothman (Member) commented Oct 29, 2016

thanks!

sergeyf added a commit to sergeyf/scikit-learn that referenced this pull request Feb 28, 2017

[MRG + 2] ENH add sample_weight support to Pipeline.score (#7723)
* add sample_weight support to Pipeline.score

* TST check that pipe.score raises TypeError for unsupported arguments

* TST check error message for invalid score_params in pipeline.score

* TST check that pipe.fit raises TypeError when argument is not supported

* only support sample_weight argument

* whats_new: mention sample_weight in Pipeline.score

afiodorov added a commit to unravelin/scikit-learn that referenced this pull request Apr 25, 2017

[MRG + 2] ENH add sample_weight support to Pipeline.score (#7723)

Sundrique added a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017

[MRG + 2] ENH add sample_weight support to Pipeline.score (#7723)

paulha added a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017

[MRG + 2] ENH add sample_weight support to Pipeline.score (#7723)

maskani-moh added a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017

[MRG + 2] ENH add sample_weight support to Pipeline.score (#7723)