
[WIP] scorer_params and sample_weight support #3524

Closed
wants to merge 5 commits into scikit-learn:master from vene:weighted_score_params

Conversation

@vene (Member) commented Aug 1, 2014

This is me picking up the remaining commits from #1574, rebasing on master, and adding a different API:

Since I want to write a sample_group-aware scorer, I needed an API for passing arbitrary params to scorers in cross-validation, in the same way as fit_params. This removes the need for an explicit sample_weight parameter, at the cost of some duplication in the function call (fit_params=dict(sample_weight=sw), scorer_params=dict(sample_weight=sw)). Internally, though, this saves us the need for special treatment and allows scorers to be more powerful.

  • Consider API change: fit_params and scorer_params entries whose names start with sample_ will be indexed by the train/test splits.
  • Support fit_params and scorer_params in learning_curve and RFECV.
  • Test that scorer_params are getting appropriately indexed.
  • Proof of concept: learning-to-rank grid search using this API.
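The indexing rule in the first task item can be sketched as follows. This is an illustrative helper, not the PR's actual code; the name `_index_params` and the toy data are assumptions.

```python
import numpy as np

def _index_params(params, indices):
    """Slice every parameter whose name starts with 'sample_' by the given
    CV indices; pass everything else through unchanged."""
    return {key: (np.asarray(value)[indices] if key.startswith("sample_")
                  else value)
            for key, value in params.items()}

# One CV split: fit_params are sliced with the train indices,
# scorer_params with the test indices.
sw = np.array([1.0, 2.0, 1.0, 0.5, 3.0, 1.5])
train, test = np.array([0, 1, 2, 3]), np.array([4, 5])

fit_params = _index_params({"sample_weight": sw, "verbose": True}, train)
scorer_params = _index_params({"sample_weight": sw}, test)
```

Under this rule a non-sample-aligned entry like `verbose` is forwarded untouched, while `sample_weight` is automatically restricted to each fold.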

@coveralls commented

Coverage Status: coverage decreased (-0.03%) when pulling f1f6a3c on vene:weighted_score_params into 07560e4 on scikit-learn:master.

@@ -17,7 +17,8 @@
 from .utils.fixes import astype


-def learning_curve(estimator, X, y, train_sizes=np.linspace(0.1, 1.0, 5),
+def learning_curve(estimator, X, y, sample_weight=None,
Review comment (Member):

this changes the function signature :( can we live with it?

@vene (Member, Author) replied:

This is one of the changes from @ndawe's original changeset. I would suggest making these functions (learning_curve, validation_curve, RFECV) support fit_params and scorer_params, and adding them at the end so that the API doesn't change, in the same way I did in grid search.

I didn't do this yet because I wanted to get the API proposal out there so we can discuss, and also because I only really need GridSearchCV for what I'm doing now.
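The backward-compatible variant vene suggests would keep the existing positional parameters and append the new ones at the end. A sketch, assuming the 2014-era learning_curve keywords (which may differ from the real signature) and with a stub body of my own:

```python
import numpy as np

def learning_curve(estimator, X, y, train_sizes=np.linspace(0.1, 1.0, 5),
                   cv=None, scoring=None, exploit_incremental_learning=False,
                   n_jobs=1, pre_dispatch="all", verbose=0,
                   fit_params=None, scorer_params=None):
    # Stub body: a real implementation would slice these dicts per CV
    # split and forward them to fit() and the scorer, respectively.
    fit_params = {} if fit_params is None else fit_params
    scorer_params = {} if scorer_params is None else scorer_params
    return fit_params, scorer_params
```

Because the new keywords come last with None defaults, existing positional calls keep working unchanged.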

@jnothman (Member) commented Aug 2, 2014

I don't dispute the utility of scorer_params in other cases, but I think we are best off supporting a sample_weight parameter directly in GridSearchCV et al. This is consistent with the standard fit interface across scikit-learn, and makes things like nested CV straightforward.

@vene (Member, Author) commented Aug 2, 2014

@jnothman I am not convinced either. But there are two parameters that need this behaviour, sample_weight and sample_group, which made me think it would be cleaner to implement them together.

This is clearly a generalizable approach, which is tempting, but I can't come up with other examples of arguments that should be passed like this. Nor can I think of other scorer params that are data-dependent and so couldn't simply be set when instantiating a Scorer object.

So I'm torn, but I do need sample_group.

@jnothman (Member) commented Aug 2, 2014

As far as I'm concerned, scikit-learn is almost entirely concerned with data where samples may be assumed iid. sample_group falls outside (or at least on the periphery) of that space, while sample_weight is something that most estimators, metrics, etc. could/should support. However, it is useful that the more generic components in scikit-learn also be compatible with learning tasks where additional information like sample_group is needed. I think this is sufficient for arguing that sample_weight and sample_group be considered different classes of citizen.

@jnothman jnothman closed this Aug 2, 2014
@jnothman jnothman reopened this Aug 2, 2014
@jnothman (Member) commented Aug 2, 2014

(Sorry, slipped.)

On a related note, split_params may also be needed in the context of #3340 and nested cross-validation, in order to handle LeaveOneLabelOut. Or is that getting too crazy?

On a completely different note, I doubt that scorer_params should be an attribute of GridSearchCV et al., mostly because it will add confusion to the metrics-scorer interfaces. Basically, scorers should be designed such that all parameters other than estimator are data-aligned, and hence scorer_params should only be provided as a kwarg to fit.

Another way of looking at it is that arbitrary kwargs could be provided to fit, and the GridSearchCV constructor could have a parameter that defines their routing: fit_param_routing={'sample_weight': ['fit', 'score'], 'sample_group': ['score'], 'labels': ['split']} (or the inverse mapping). (NB: In nested CV, all parameters should be routed to 'fit'.)
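The routing idea could be sketched roughly as below; `route_params` is a hypothetical helper illustrating jnothman's mapping, not anything that exists in scikit-learn.

```python
def route_params(routing, **kwargs):
    """Split fit kwargs into per-destination dicts according to a routing
    map such as {'sample_weight': ['fit', 'score'], 'labels': ['split']}."""
    routed = {"fit": {}, "score": {}, "split": {}}
    for name, value in kwargs.items():
        # Unlisted parameters default to 'fit' only, matching the legacy
        # fit_params behaviour (an assumption, not a stated rule).
        for dest in routing.get(name, ["fit"]):
            routed[dest][name] = value
    return routed

routing = {"sample_weight": ["fit", "score"],
           "sample_group": ["score"],
           "labels": ["split"]}
routed = route_params(routing, sample_weight=[1, 2], labels=["a", "b"])
```

A nested cross-validator would then, per the NB above, declare a routing that sends everything to 'fit' so the inner search can re-route it.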

@vene (Member, Author) commented Aug 2, 2014

I see your point and I agree. I was thinking the same: something like sample_group should definitely not be defined in scikit-learn itself, but the cross-val and search API should allow for something like this to be implemented easily on top, without having to copy-paste and hack the code.

One thing that collides with your suggestion: fit_params is already a constructor argument, even though by design whatever goes into fit_params is data-dependent, which is a bit odd. I think it would be cleaner to route arbitrary **kwargs passed to the grid search's fit method.

@jnothman (Member) commented Aug 2, 2014

fit_params has a long legacy, I think, but I don't believe it would be harmful to deprecate it. My only worry is that the routing approach is a bit too abstract. Then again, perhaps the argument that the base estimator's fit should generally not receive sample_group, while a nested cross-validator's fit should, suffices to show that a mechanism of this power is necessary.

for search_cls in (GridSearchCV, RandomizedSearchCV):
    params = dict(sample_weight=sample_weight)
    grid_search = search_cls(MockClassifier(), est_parameters, cv=cv,
                             fit_params=params, scorer_params=params)
Review comment (Member):

It could be useful to support scorer_params="fit_params" to avoid the double declaration. Now, should the default be scorer_params=None or scorer_params="fit_params"?
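The suggested convenience could be resolved with a small sentinel check. A sketch: the string sentinel is the reviewer's suggestion, while the helper name and its None handling are my assumptions.

```python
def resolve_scorer_params(fit_params, scorer_params="fit_params"):
    # The string "fit_params" acts as a sentinel meaning "reuse fit_params
    # for scoring"; None means "pass nothing to the scorer".
    if scorer_params == "fit_params":
        return dict(fit_params)
    return scorer_params if scorer_params is not None else {}

shared = resolve_scorer_params({"sample_weight": [1, 2]})
```

With "fit_params" as the default, the common weighted-scoring case needs only one declaration, while scorer_params=None opts out explicitly.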

@adrinjalali (Member) commented:
Also closing this one since SLEP006 provides the API and it's already implemented for scorers.
