
sample_weights array can't be used with GridSearchCV #2879

Closed
shenberg opened this issue Feb 20, 2014 · 20 comments

@shenberg

@shenberg shenberg commented Feb 20, 2014

The internal cross-validation isn't aware of sample weights, so an exception is thrown if a sample_weights sequence is passed to the grid search, because fit_grid_point does not split the weights into training and test sets.
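What fit_grid_point was missing amounts to indexing the weight array with the same train/test indices used for X and y. A minimal sketch with made-up toy data (the array names are illustrative, not from the report):

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy data: 12 samples, 2 features, one weight per sample.
X = np.arange(24, dtype=float).reshape(12, 2)
y = np.array([0, 1] * 6)
sample_weight = np.linspace(0.5, 2.0, num=12)

fold_shapes = []
for train_idx, test_idx in KFold(n_splits=3).split(X):
    # The same index arrays are applied to X, y and the weights,
    # so every per-sample array stays aligned within each fold.
    X_tr, y_tr, w_tr = X[train_idx], y[train_idx], sample_weight[train_idx]
    X_te, y_te, w_te = X[test_idx], y[test_idx], sample_weight[test_idx]
    fold_shapes.append((len(X_tr), len(w_tr), len(X_te), len(w_te)))

print(fold_shapes)  # [(8, 8, 4, 4), (8, 8, 4, 4), (8, 8, 4, 4)]
```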

@ndawe
Member

@ndawe ndawe commented Feb 20, 2014

I have added support for sample_weight in GridSearchCV in #1574. Hopefully I can get that merged soon.

@shenberg
Author

@shenberg shenberg commented Feb 23, 2014

Thanks, that was extremely fast turn-around.

As a meta-question, is sampling weighted? E.g., is it better for stratified folds to try to maintain the ratio of total weight per class rather than the number of samples?
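For reference, StratifiedKFold stratifies on label counts only; its split() does not take weights. The sketch below uses made-up data and a hypothetical helper, class1_share, to compare each test fold's class-1 share by count with its share of total weight:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)
y = np.array([0] * 80 + [1] * 20)
# Made-up weights: class-1 samples get widely varying weights, class 0 gets 1.0.
w = np.where(y == 1, rng.uniform(0.1, 5.0, size=y.size), 1.0)

def class1_share(labels, weights):
    """Hypothetical helper: (class-1 share by count, class-1 share of total weight)."""
    count_share = np.mean(labels == 1)
    weight_share = weights[labels == 1].sum() / weights.sum()
    return count_share, weight_share

overall = class1_share(y, w)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
per_fold = [class1_share(y[test], w[test]) for _, test in skf.split(np.zeros(len(y)), y)]

print("overall:", overall)
for count_share, weight_share in per_fold:
    # Count shares follow the overall ratio; weight shares are not controlled.
    print(f"count share {count_share:.2f}, weight share {weight_share:.2f}")
```

With these settings every test fold has the same count share (0.2), while the weight share depends on which heavily weighted samples land in it; weight-aware stratification would need a custom splitter.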

@amueller
Member

@amueller amueller commented Oct 27, 2016

sample_weight support was merged long ago

@amueller amueller closed this Oct 27, 2016
@jnothman
Member

@jnothman jnothman commented Oct 29, 2016

@amueller, sample_weight is still not supported as an argument to fit in *SearchCV

@jnothman jnothman reopened this Oct 29, 2016
@amueller
Member

@amueller amueller commented Nov 1, 2016

There is fit_params

@stephen-hoover
Contributor

@stephen-hoover stephen-hoover commented Jan 31, 2017

@amueller , I'm running into a similar issue. It's possible to set a sample_weights array as an instance attribute via the GridSearchCV.fit_params dictionary, and then everything works fine. But this fails in nested cross-validation with model_selection.cross_val_predict, because the GridSearchCV.fit (and RandomizedSearchCV.fit) method doesn't accept fit parameters.

Is there a reason why the BaseSearchCV subclasses can't accept fit parameters? Having to set data-dependent fit parameters as instance attributes appears to contradict http://scikit-learn.org/stable/developers/contributing.html#fitting .

I would be willing to make that change (adding fit parameters) if you'd accept that PR.
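Assuming a GridSearchCV whose fit() forwards keyword arguments such as sample_weight to the estimator (as later recommended in this thread), nested cross-validation can be written as an explicit outer loop, so the weights never have to be stored on the estimator as an attribute. Data and grid are made up for illustration:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
w = np.random.default_rng(0).uniform(0.5, 2.0, size=len(y))

outer_pred = np.empty_like(y)
outer_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
for train_idx, test_idx in outer_cv.split(X, y):
    inner = GridSearchCV(LogisticRegression(), {"C": [0.1, 1.0, 10.0]}, cv=3)
    # Weights go to fit() as data, so each outer training split passes only its
    # own slice; the inner search slices it again for its own folds.
    inner.fit(X[train_idx], y[train_idx], sample_weight=w[train_idx])
    outer_pred[test_idx] = inner.predict(X[test_idx])

print("weighted outer accuracy:", accuracy_score(y, outer_pred, sample_weight=w))
```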

@jnothman
Member

@jnothman jnothman commented Feb 2, 2017

Yes, it's broken. One issue is that we need to be clear whether sample_weight is being passed only to scoring, only to fit, or both. Feel free to propose and champion a solution.
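To make the distinction concrete, this sketch (toy data, a single train/test split, LogisticRegression chosen only for illustration) fits with and without weights and scores with and without weights, giving the four combinations a search would have to choose between:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=5, random_state=1)
w = np.random.default_rng(1).uniform(0.1, 3.0, size=len(y))
X_tr, X_te, y_tr, y_te, w_tr, w_te = train_test_split(X, y, w, random_state=1)

results = {}
for weight_fit in (False, True):
    model = LogisticRegression().fit(X_tr, y_tr, sample_weight=w_tr if weight_fit else None)
    pred = model.predict(X_te)
    for weight_score in (False, True):
        # Weighting the metric is a separate choice from weighting the fit.
        score = accuracy_score(y_te, pred, sample_weight=w_te if weight_score else None)
        results[(weight_fit, weight_score)] = score

for (weight_fit, weight_score), score in results.items():
    print(f"weights in fit={weight_fit!s:5}  in score={weight_score!s:5}  accuracy={score:.3f}")
```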

@ManasHardas

@ManasHardas ManasHardas commented Dec 13, 2017

The fit_params constructor parameter has been deprecated since 0.19 and will be removed in 0.21.
How else can sample_weights be made part of cross-validation?

@amueller
Member

@amueller amueller commented Dec 13, 2017

@ManasHardas pass them to fit.
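A minimal sketch of that advice, with made-up data and an SVC chosen only for illustration. Array-like fit parameters with one entry per sample are sliced to each training fold; in the versions discussed in this thread the default scorer does not see the weights:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=150, n_features=4, random_state=0)
sample_weight = np.random.default_rng(0).uniform(0.5, 2.0, size=len(y))

search = GridSearchCV(SVC(), {"C": [0.1, 1.0, 10.0]}, cv=3)
# sample_weight is forwarded to SVC.fit and sliced to each training fold;
# the default scorer (SVC.score, i.e. accuracy) is still unweighted.
search.fit(X, y, sample_weight=sample_weight)
print(search.best_params_)
```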

@farfan92

@farfan92 farfan92 commented Feb 5, 2018

@amueller I'm trying your solution with RandomizedSearchCV and a RandomForestClassifier.

Passing fit_params={'sample_weight': s_weights}
to the .fit method of RandomizedSearchCV results in

TypeError: fit() got an unexpected keyword argument 'fit_params'

It still runs, with the deprecation warning, when fit_params is passed to the constructor instead.
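The TypeError most likely arises because fit() expects fit parameters as individual keyword arguments: a dict passed as fit_params=... is forwarded as-is to the estimator's fit(), which has no such argument. A sketch of the corrected call, with synthetic data standing in for the reporter's and a made-up search space:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=150, n_features=4, random_state=0)
s_weights = np.random.default_rng(0).uniform(0.5, 2.0, size=len(y))
fit_params = {"sample_weight": s_weights}

rs = RandomizedSearchCV(
    RandomForestClassifier(n_estimators=20, random_state=0),
    param_distributions={"max_depth": [2, 4, None]},
    n_iter=3,
    cv=3,
    random_state=0,
)
# Unpack the dict with ** so sample_weight arrives as a keyword argument,
# rather than passing fit_params=... as in the report above.
rs.fit(X, y, **fit_params)
print(rs.best_params_)
```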

@jnothman
Member

@jnothman jnothman commented Feb 5, 2018

@farfan92

@farfan92 farfan92 commented Feb 5, 2018

I see, I should have looked more closely at the docs.

@rishabhgit

@rishabhgit rishabhgit commented Apr 15, 2019

Hi @amueller , @jnothman ,

Has this issue been fixed? I'm trying to run RandomizedSearchCV with sample weights as both a scoring parameter and a fit parameter. Here is a code snippet:

scorer = make_scorer(r2_score, sample_weight=weights)
rs = RandomizedSearchCV(estimator=rf, param_distributions=param_grid, n_iter=3,
                        scoring=scorer,
                        n_jobs=-1, cv=3)

rs = rs.fit(X_train, y_train)

But, RandomizedSearchCV does not split the sample weights column when splitting the data into train and test sets. Here's some output about the array shapes and the error:

X_train shape  (349978, 367)
y_train shape  (349978,)
Sample weight shape  (349978,)

ValueError: Found input variables with inconsistent numbers of samples: [116660, 116660, 349978]
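The mismatch comes from make_scorer: the full-length weights array is bound to the scorer and passed unchanged to r2_score on every fold, while each test fold holds only about a third of the 349978 samples. One version-independent workaround is to run the search loop by hand and slice the weights per fold. A sketch on synthetic data (names mirror the snippet; the grid and data are made up):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold, ParameterGrid

X, y = make_regression(n_samples=120, n_features=5, noise=10.0, random_state=0)
weights = np.random.default_rng(0).uniform(0.5, 2.0, size=len(y))
param_grid = {"max_depth": [3, None], "min_samples_leaf": [1, 5]}

mean_scores = {}
for params in ParameterGrid(param_grid):
    fold_scores = []
    for train_idx, test_idx in KFold(n_splits=3, shuffle=True, random_state=0).split(X):
        rf = RandomForestRegressor(n_estimators=20, random_state=0, **params)
        rf.fit(X[train_idx], y[train_idx], sample_weight=weights[train_idx])
        # Slice the weights to the test fold so their length matches y_true/y_pred.
        fold_scores.append(r2_score(y[test_idx], rf.predict(X[test_idx]),
                                    sample_weight=weights[test_idx]))
    mean_scores[tuple(sorted(params.items()))] = np.mean(fold_scores)

best = max(mean_scores, key=mean_scores.get)
print(dict(best), mean_scores[best])
```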

@jnothman
Member

@jnothman jnothman commented Apr 15, 2019

> Has this issue been fixed?

No, we don't currently support weighted scoring in cross-validation. Sorry. :'( Soon?

@doriang102

@doriang102 doriang102 commented Apr 30, 2019

Is this still unresolved? GridSearchCV and RandomizedSearchCV do not throw an error when sample_weight is passed to the fit method, yet the cross-validation seems to ignore the weights entirely.

@jnothman
Member

@jnothman jnothman commented Apr 30, 2019

@doriang102

@doriang102 doriang102 commented Apr 30, 2019

Thanks. Is there any current workaround other than oversampling the class in a way proportional to the weights?
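The oversampling workaround mentioned here can be sketched as drawing rows with replacement, with probability proportional to weight, and then running an ordinary unweighted search on the result. resample_by_weight is a hypothetical helper; note that duplicated rows can land in both the training and test folds, which tends to inflate cross-validation scores.

```python
import numpy as np

def resample_by_weight(X, y, sample_weight, n_samples=None, seed=0):
    """Hypothetical helper: draw rows with replacement, probability proportional to weight."""
    rng = np.random.default_rng(seed)
    n = len(y) if n_samples is None else n_samples
    p = np.asarray(sample_weight, dtype=float)
    p = p / p.sum()  # normalise weights into a probability distribution
    idx = rng.choice(len(y), size=n, replace=True, p=p)
    return X[idx], y[idx]

# Toy data: row 3 has zero weight, so it should never be drawn.
X = np.arange(10, dtype=float).reshape(5, 2)
y = np.array([0, 0, 0, 1, 1])
w = np.array([1.0, 1.0, 1.0, 0.0, 4.0])
X_rs, y_rs = resample_by_weight(X, y, w, n_samples=1000)
print(X_rs.shape, np.bincount(y_rs, minlength=2))
```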

@jnothman
Member

@jnothman jnothman commented Apr 30, 2019

@david-r-wasserman

@david-r-wasserman david-r-wasserman commented Jan 23, 2020

Now that this issue has been fixed, the new feature should be documented at https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html and https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html.

Also, this only works for sample_weights, right? If the estimator's fit() function has other params that are sample-specific, those still won't work, or will they? This should be made clear in the documentation.
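One way to check which fit parameters get split is to record what the estimator actually receives. The sketch assumes the search slices any array-like fit parameter with one entry per sample of X (and passes other values through unchanged); MajorityClassifier and its aux argument are hypothetical:

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.model_selection import GridSearchCV

seen_lengths = []  # length of `aux` on every fit() call (sequential run, same process)

class MajorityClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical toy classifier whose fit() takes a per-sample `aux` array."""

    def __init__(self, dummy=0):
        self.dummy = dummy  # unused; only gives the grid something to vary

    def fit(self, X, y, aux=None):
        seen_lengths.append(len(aux))
        values, counts = np.unique(y, return_counts=True)
        self.classes_ = values
        self.majority_ = values[np.argmax(counts)]
        return self

    def predict(self, X):
        return np.full(len(X), self.majority_)

X = np.random.default_rng(0).normal(size=(30, 3))
y = np.array([0, 1] * 15)
aux = np.arange(30)

search = GridSearchCV(MajorityClassifier(), {"dummy": [0, 1]}, cv=3, error_score="raise")
search.fit(X, y, aux=aux)
print(sorted(set(seen_lengths)))  # [20, 30]
```

The six fits on 20-sample training folds (2 candidates, 3 folds each) and the final refit on all 30 samples show that aux was sliced just like sample_weight would be.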

@lpossberg

@lpossberg lpossberg commented Mar 29, 2020

> Has this issue been fixed?
>
> No, we don't currently support weighted scoring in cross-validation. Sorry. :'( Soon?

Hi @jnothman,
is there any progress on weighted scoring in cross-validation?
