Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG + 2] add sample_weight support to Pipeline.score #7723

Merged
merged 6 commits into from Oct 29, 2016

Conversation

@kmike
Copy link
Contributor

@kmike kmike commented Oct 21, 2016

Many estimators support sample_weight argument in their .score function - e.g. default .score implementation in ClassifierMixin provides this feature. In this PR kwargs support is added to Pipeline.score, similar to **fit_params in Pipeline.fit; this allows to pass sample_weight to pipeline.score(...).

pipe.fit(X, y=None)
assert_equal(pipe.score(X), 3)
assert_equal(pipe.score(X, y=None), 3)
assert_equal(pipe.score(X, sample_weight=np.array([2, 3])), 8)

This comment has been minimized.

@raghavrv

raghavrv Oct 23, 2016
Member

Could you also check if it raises a sensible error message if the underlying estimator does not support sample_weight in it's scoring?

This comment has been minimized.

@kmike

kmike Oct 23, 2016
Author Contributor

A good call, I've added a check for TypeError.

I haven't added error message check because the error is a standard TypeError for missing keyword arguments - it may potentially vary across Python versions. Error messages for similar TypeErrors are not checked in other test_pipeline.py tests.

This comment has been minimized.

@raghavrv

raghavrv Oct 23, 2016
Member

That makes sense but are there snippets of the error message that would be common across the versions? Something like unexpected.*sample_weight that would surely be present in all the versions?

(assert_raises_regexp)?

Copy link
Member

@raghavrv raghavrv left a comment

Besides, looks okay to me. @GaelVaroquaux @amueller @jnothman Do you approve this change?

@@ -227,6 +227,7 @@ def test_pipeline_score_params():
assert_equal(pipe.score(X), 3)
assert_equal(pipe.score(X, y=None), 3)
assert_equal(pipe.score(X, sample_weight=np.array([2, 3])), 8)
assert_raises(TypeError, pipe.score, X, foo='bar')

This comment has been minimized.

@raghavrv

raghavrv Oct 23, 2016
Member

I think testing for an explicit error message would be better... (assert_raise_message)

assert_equal(pipe.score(X), 3)
assert_equal(pipe.score(X, y=None), 3)
assert_equal(pipe.score(X, sample_weight=np.array([2, 3])), 8)
assert_raises(TypeError, pipe.score, X, foo='bar')

This comment has been minimized.

@amueller

amueller Oct 24, 2016
Member

please use assert_raise_message. Otherwise LGTM

@amueller amueller changed the title add sample_weight support to Pipeline.score [MRG + 1] add sample_weight support to Pipeline.score Oct 24, 2016
@amueller amueller added this to the 0.19 milestone Oct 24, 2016
@amueller
Copy link
Member

@amueller amueller commented Oct 24, 2016

LGTM

@kmike
Copy link
Contributor Author

@kmike kmike commented Oct 24, 2016

Done! I've also added a similar test for pipe.fit(X,y, **fit_params), errors were untested for them. Is it OK to have it in the same PR?

@raghavrv
Copy link
Member

@raghavrv raghavrv commented Oct 24, 2016

Is it OK to have it in the same PR?

Thanks for that. Please do!

assert_raise_message(
TypeError,
"fit() got an unexpected keyword argument 'bad'",
pipe.fit, None, None, clf__bad=True

This comment has been minimized.

@raghavrv

raghavrv Oct 24, 2016
Member

+1 thanks!

@raghavrv
Copy link
Member

@raghavrv raghavrv commented Oct 24, 2016

I think this could be mentioned in whatsnew.

@@ -481,7 +481,7 @@ def _inverse_transform(self, X):
return Xt

@if_delegate_has_method(delegate='_final_estimator')
def score(self, X, y=None):
def score(self, X, y=None, **score_params):

This comment has been minimized.

@jnothman

jnothman Oct 26, 2016
Member

I'd be tempted to be more conservative and make this only about sample_weight. Is there no prospect of allowing other kwargs with the __ magic to be passed to the transformers along the way?

This comment has been minimized.

@amueller

amueller Oct 26, 2016
Member

Sorry, I'm not sure I'm following.

This comment has been minimized.

@jnothman

jnothman Oct 26, 2016
Member

What I'm saying is that, despite the PR description referring to fit_params, this behaves quite differently. fit_params is routed to each step according to __-delimited namespaces. By introducing **score_params that are all routed to the score function only, we are promising that we will never do anything more sophisticated with any named arguments to score. I'm not convinced we have a use case that motivates such a general implementation, when we could just accept sample_weight=None.

This comment has been minimized.

@amueller

amueller Oct 27, 2016
Member

Makes sense. I agree.

This comment has been minimized.

@kmike

kmike Oct 27, 2016
Author Contributor

The difference with **fit_params is that transformers can have .fit method, but there is only a single .score method - that's why I went with **score_params without __ handling. But I see the problem now.

So did I get it right: add sample_weight=None argument, don't pass sample_weight to .score if it is None (as not all score methods suport sample_weight), and in the future scikit-learn can add support for __-delimeted keyword arguments which are passed to .transform methods?

score(X, y=None, sample_weight=None) signature has a small issue: it is not possible to distinguish between passing sample_weight=None and not passing sample_weight; it is technically different if estimator.score doesn't support sample_weight argument. Should I handle this case by making signature .score(X, y=None, **kwargs), but documenting it as sample_weight=None?

This comment has been minimized.

@raghavrv

raghavrv Oct 27, 2016
Member

Or we could have the signature as ..., sample_weight=None, pass **score_params to score method and have it documented that it is applicable only if the underlying estimator supports it...

This comment has been minimized.

@kmike

kmike Oct 27, 2016
Author Contributor

If we pass **score_params to score function people will start using it even if it is not documented, why should we do that?

This comment has been minimized.

@raghavrv

raghavrv Oct 27, 2016
Member

If we pass **score_params to score function people

Ah wait we are talking about different score functions ;)

I suggested

  • passing **score_params to self.steps[-1][-1].score(Xt, y, **score_params) so it doesn't raise an error if sample_weights is not supported by the underlying estimator's score.
  • passing sample_weights=None to Pipeline.score(..., sample_weights=None) to make it clear that we do not support passing any other parameters, except sample_weights.
  • documenting at Pipeline.score that sample_weights can only be passed if the underlying estimator supports it...

I think this is exactly what you suggested already. Sorry for the noise :)

@amueller
Copy link
Member

@amueller amueller commented Oct 27, 2016

I think using sample_weight and also passing it if it is not None is a good solution.

@kmike
Copy link
Contributor Author

@kmike kmike commented Oct 28, 2016

I've implemented the simplest solution - don't pass sample_weight if it is None. Does it look ok?

@jnothman
Copy link
Member

@jnothman jnothman commented Oct 29, 2016

Could we bother you for a what's new entry? LGTM

@jnothman jnothman changed the title [MRG + 1] add sample_weight support to Pipeline.score [MRG + 2] add sample_weight support to Pipeline.score Oct 29, 2016
@kmike
Copy link
Contributor Author

@kmike kmike commented Oct 29, 2016

@jnothman yep!

@kmike kmike force-pushed the kmike:pipeline-score-sample-weight branch from fd72352 to 71d486d Oct 29, 2016
@jnothman jnothman merged commit 10323f5 into scikit-learn:master Oct 29, 2016
2 of 3 checks passed
2 of 3 checks passed
continuous-integration/appveyor/pr Waiting for AppVeyor build to complete
Details
ci/circleci Your tests passed on CircleCI!
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@jnothman
Copy link
Member

@jnothman jnothman commented Oct 29, 2016

thanks!

sergeyf added a commit to sergeyf/scikit-learn that referenced this pull request Feb 28, 2017
…rn#7723)

* add sample_weight support to Pipeline.score

* TST check that pipe.score raises TypeError for unsorrported arguments

* TST check error message for invalid score_params in pipeline.score

* TST check that pipe.fit raises TypeError when argument is not supported

* only support sample_weight argument

* whats_new: mention sample_weight in Pipeline.score
afiodorov added a commit to unravelin/scikit-learn that referenced this pull request Apr 25, 2017
…rn#7723)

* add sample_weight support to Pipeline.score

* TST check that pipe.score raises TypeError for unsorrported arguments

* TST check error message for invalid score_params in pipeline.score

* TST check that pipe.fit raises TypeError when argument is not supported

* only support sample_weight argument

* whats_new: mention sample_weight in Pipeline.score
Sundrique added a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017
…rn#7723)

* add sample_weight support to Pipeline.score

* TST check that pipe.score raises TypeError for unsorrported arguments

* TST check error message for invalid score_params in pipeline.score

* TST check that pipe.fit raises TypeError when argument is not supported

* only support sample_weight argument

* whats_new: mention sample_weight in Pipeline.score
paulha added a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
…rn#7723)

* add sample_weight support to Pipeline.score

* TST check that pipe.score raises TypeError for unsorrported arguments

* TST check error message for invalid score_params in pipeline.score

* TST check that pipe.fit raises TypeError when argument is not supported

* only support sample_weight argument

* whats_new: mention sample_weight in Pipeline.score
maskani-moh added a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
…rn#7723)

* add sample_weight support to Pipeline.score

* TST check that pipe.score raises TypeError for unsorrported arguments

* TST check error message for invalid score_params in pipeline.score

* TST check that pipe.fit raises TypeError when argument is not supported

* only support sample_weight argument

* whats_new: mention sample_weight in Pipeline.score
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

4 participants