Possible bug when combining SVC + class_weights='balanced' + LeaveOneOut #10233

Open
m-guggenmos opened this issue Nov 30, 2017 · 23 comments

@m-guggenmos

This piece of code yields perfect classification accuracy for random data:

import numpy as np
from sklearn.model_selection import cross_val_score, LeaveOneOut
from sklearn.svm import SVC

scores = cross_val_score(SVC(kernel='linear', class_weight='balanced', C=1e-08), 
                         np.random.rand(79, 100), 
                         y=np.hstack((np.ones(20), np.zeros(59))), 
                         cv=LeaveOneOut())
print(scores)

The problem disappears when using class_weight=None or a different CV splitter.

Is it a bug or am I missing something?

Tested with version 0.19.1 of scikit-learn on Ubuntu Linux.

@m-guggenmos
Author

Upon reflection this can probably be closed. What happens is that, when a low value of C is chosen, the classifier basically learns the class weights. As a consequence, it always predicts the class of the sample that was left out of the training data, essentially independently of the test sample's features. This is also why it only 'works' when the test set contains a single class (as in LeaveOneOut).

Still, until I boiled it down to this minimal example, this gave me a major headache in a larger piece of code. Maybe one could provide a warning in the docs?

@jnothman
Member

jnothman commented Nov 30, 2017 via email

@m-guggenmos
Author

m-guggenmos commented Dec 1, 2017

Possibly it is the custom scikit-learn code around class_weight='balanced' that causes problems. With this option, the class weights are computed anew for each cross-validation fold. Apparently, what can happen is that if a class 1 sample is left out for testing, the balance between class 1 and class -1 in training is exactly such that the sklearn-computed class weights make it more likely for the classifier to predict class 1, and vice versa.

It's tricky, because setting class_weight='balanced' appears to be a good and innocent thing to do, yet it can lead to 100% classification accuracy on random data.

Sorry, this is not very constructive, but I hope these caveats could at least be informative for other users. Without a better understanding it's also a bit difficult to come up with an authoritative warning. The best I could come up with is: "Note that setting class weights can lead to biased results in certain cross-validation procedures (e.g. leave-one-sample-out). One workaround is to ensure an equal number of samples of each class in the test set."
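
As a quick sanity check of that workaround (illustrative only, in the spirit of keeping both classes in every test fold rather than literally equal counts): with a stratified splitter the same degenerate classifier should no longer score perfectly on random data.

import numpy as np
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.svm import SVC

y = np.hstack((np.ones(20), np.zeros(59)))
# Same degenerate classifier as above, but every test fold now contains both classes.
scores = cross_val_score(SVC(kernel='linear', class_weight='balanced', C=1e-08),
                         np.random.rand(79, 100), y=y,
                         cv=StratifiedKFold(n_splits=5))
print(scores.mean())  # no longer the spurious perfect score obtained with LeaveOneOut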

((On a side note, without a better understanding I am also a bit skeptical about setting fixed values for class_weight based on the frequency of classes in the entire dataset, because strictly speaking this destroys the independence of training and testing. For instance, if I tell libsvm that class 1 is exactly 10x more frequent than class -1 (amounting to the libsvm options '-w1 10 -w-1 1'), then the classifier is not blind to the test data in a leave-one-out cross-validated procedure, because the class of the test sample could, in theory, be inferred from the class frequencies in the training set. However, as opposed to the sklearn 'balanced' option, I don't have a demo example for this possible caveat and I don't know whether it's really an issue.))

@jnothman
Member

jnothman commented Dec 3, 2017

It took me a while to be convinced by this. I'm going to conclude that it comes down to numerical imprecision: you can either draw 58 0s and 20 1s, in which case the class weights are [0.67241379, 1.95], or you can draw 59 0s and 19 1s, yielding [0.66101695, 2.05263158]. Due to numerical imprecision, we get 0.67241379 * 58 < 1.95 * 20 but 0.66101695 * 59 > 2.05263158 * 19.
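
To make that arithmetic concrete, a small check (illustrative; the exact comparison results may depend on platform rounding):

import numpy as np
from sklearn.utils.class_weight import compute_class_weight

for n0, n1 in [(58, 20), (59, 19)]:  # the two possible LOO training splits
    y_train = np.hstack((np.zeros(n0), np.ones(n1)))
    w0, w1 = compute_class_weight('balanced', classes=np.array([0.0, 1.0]), y=y_train)
    # Mathematically w0 * n0 == w1 * n1 == 39, but the floating-point products
    # need not compare equal.
    print(n0, n1, w0, w1, w0 * n0 < w1 * n1)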

I suppose we could document the 'balanced' compute_class_weight option to say that it is brittle to numerical precision issues... but so is everything when there is no clear signal to learn.

An alternative might be to incorporate a minor perturbation in compute_class_weight so as to ensure that there is a very marginal preference for the classes' true distribution, since (0.67241379 + 1e-8) * 58 > 1.95 * 20. But I hope you'll agree it is a fairly marginal case (no signal, no regularisation, an appropriate estimator, small number of samples, and perfect precision on the less frequent class but imperfect precision on the more frequent class) in which this appears to be needed.
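
For illustration only, a literal reading of that suggestion might look like the sketch below; it is not the actual patch (see the commit referenced below), just the 'balanced' formula plus a flat epsilon so that weight * count is no longer an exact tie and the tie-break follows the training class frequencies rather than rounding noise.

import numpy as np

def perturbed_balanced_weights(y, classes, eps=1e-8):
    # Hypothetical sketch, not scikit-learn code: the standard 'balanced'
    # weights plus a tiny constant. The bonus eps * n_c grows with the class
    # count, so weight * count now marginally favours the training majority.
    counts = np.array([np.sum(y == c) for c in classes])
    return len(y) / (len(classes) * counts) + eps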

I don't understand your final concern in the sklearn context, as our automated class weighting is performed only for a given training set, not the whole dataset.

@jnothman
Member

jnothman commented Dec 3, 2017

For an example patch, see b07172d

@m-guggenmos
Author

Thanks a lot for taking on this issue! Using the following code

import numpy as np
from sklearn.model_selection import cross_val_predict, LeaveOneOut
from sklearn.svm import SVC
from sklearn.metrics import balanced_accuracy_score

labels = np.hstack((np.ones(20), np.zeros(59)))
pred = cross_val_predict(SVC(kernel='linear', class_weight='balanced', C=1e-08),
                         np.random.rand(79, 100), y=labels,
                         cv=LeaveOneOut())

print(balanced_accuracy_score(labels, pred))

I verified that your proposed solution works in my case - the balanced accuracy is at chance (0.5).

Regarding my final concern: I put it in double brackets because I was referring to the case of setting fixed class weights (i.e. not using the 'balanced' option), which is a bit off topic. Here the problem is indeed that class weights are not separately computed for each training data set, but rather the weighting is often determined from the class frequency of the entire data set (at least that is what I have seen recommended in various CrossValidated comments).

@jnothman
Member

jnothman commented Dec 4, 2017

Right. Now I understand. But it only destroys the strict independence of training and testing if you derive your fixed weights from examining the data distribution, which I agree is very possible, but it's certainly not an error on the part of the software. I suppose you would rather have a class_weight that can be set as a function of the training data in a manner that isn't 'balanced'? I would certainly consider a PR that does that if there's evidence that users have other useful weighting schemes that are a function of the training distribution.
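
For what it's worth, a user-side version of that idea could look roughly like the sketch below (the wrapper name and the weighting function are illustrative, not scikit-learn API): the weights are derived from the labels of each training fold by an arbitrary function, so nothing leaks from the test fold.

import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.svm import SVC

class FoldWeightedClassifier(BaseEstimator, ClassifierMixin):
    """Set the wrapped estimator's class_weight from the training labels via weight_fn."""

    def __init__(self, estimator, weight_fn):
        self.estimator = estimator
        self.weight_fn = weight_fn

    def fit(self, X, y):
        classes, counts = np.unique(y, return_counts=True)
        weights = self.weight_fn(classes, counts)  # sees the training fold only
        self.estimator_ = clone(self.estimator).set_params(
            class_weight=dict(zip(classes, weights)))
        self.estimator_.fit(X, y)
        return self

    def predict(self, X):
        return self.estimator_.predict(X)

# Example: a softer, square-root-balanced weighting scheme.
clf = FoldWeightedClassifier(
    SVC(kernel='linear'),
    weight_fn=lambda classes, counts: np.sqrt(counts.sum() / (len(classes) * counts)))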

So is this problem something you came across organically, or something you found in a contrived situation? Do you think others would end up with a problem caused by this numerical imprecision on real ML problems? Do you think we should try to fix it, or is it such a weird edge case that it's not worth breaking backwards compatibility for existing models learnt with class_weight='balanced'?

@m-guggenmos
Author

m-guggenmos commented Dec 4, 2017

I'm not aware of class weighting procedures other than 'balanced', which of course does not mean they don't exist. In my opinion, the way this is handled with the 'balanced' option in sklearn is exemplary, precisely because the weights are computed on the training data only.

I would say that I came across the problem relatively organically. To elaborate, I was using GridSearchCV on the C parameter of SVC and after setting class_weight='balanced' I suddenly got amazing accuracies on a real-world data set (i.e., not artificial/random data). I then realized that GridSearchCV was selecting very low values of C, i.e. no regularization at all, which at first was even weirder.

Based on this experience I'm inclined to recommend inclusion of your patch, because I'm sure many people will not investigate further when accuracies are good and 'publishable'. The effect of changing class weights on the order of 1e-8 should be negligible in almost all cases, and if it isn't, it is likely because of this very issue. I see the trade-off with exact backwards compatibility, though.

@jnothman
Member

jnothman commented Dec 4, 2017 via email

@m-guggenmos
Author

It has signal - around 60-65% classification accuracy. With class_weight='balanced' the accuracy became around 75%, which would have been a huge gain in this case.

@amueller
Member

amueller commented Dec 4, 2017

I'm confused, but is the example not using enormous regularization and just fitting the intercept?

@amueller
Member

amueller commented Dec 4, 2017

What's the class balance on your dataset? 60-65% classification accuracy seems like no signal in an imbalanced setting. I think this is more an issue of using accuracy and LOO on an imbalanced dataset.

@jnothman
Member

jnothman commented Dec 4, 2017 via email

@m-guggenmos
Author

@amueller you're of course right: it's an instance of high regularization; I mixed it up with λ = 1/C.

The accuracies mentioned in my last post refer to balanced accuracies though, so class imbalance is taken into account.

@amueller
Member

amueller commented Dec 5, 2017

What version of balanced accuracy? ;) [Less relevant to this issue maybe, but part of my quest to find out what people mean when they say that.] Though depending on the definition you're using, chance performance could be anything, as a function of the imbalance.

@m-guggenmos
Author

m-guggenmos commented Dec 5, 2017

Hmm, it is the version from a pip install git+https://github.com/scikit-learn/scikit-learn.git around a week ago. And then

from sklearn.metrics import balanced_accuracy_score

Does that help?

@amueller
Member

amueller commented Dec 5, 2017

Ah, makes sense. That always has a chance performance of 0.5, right (if we always predict one class, the recall will be 1 for that class and 0 for the other)? Are we saying that in the docs anywhere?
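
For instance, with the class balance from this issue (illustrative numbers):

import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score

y_true = np.hstack((np.ones(20), np.zeros(59)))
y_majority = np.zeros_like(y_true)  # always predict the majority class
print(accuracy_score(y_true, y_majority))           # ~0.75, inflated by the imbalance
print(balanced_accuracy_score(y_true, y_majority))  # 0.5: recall is 1.0 for class 0 and 0.0 for class 1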

@jnothman
Member

jnothman commented Dec 5, 2017 via email

@mlplyler

mlplyler commented Jul 19, 2018

I believe I have run into almost the exact same issue. I have a small dataset. I am testing many different feature extraction methods, and some have effectively no signal. I am using LOOCV. Is the proposed fix to add the perturbation to sklearn/utils/class_weight.py?

EDIT:
I think this code illustrates the discussion? I'm still kind of confused.

import numpy as np
from sklearn.metrics import accuracy_score

labels = np.hstack((np.ones(20), np.zeros(59))).astype(int)

# No features are used at all below: the "prediction" is derived purely from the
# training labels of each leave-one-out split, via 'balanced'-style weights
# computed in reduced (float16) precision to exaggerate the rounding effect.
ydumbs = []
for i in range(len(labels)):
    ytrain = np.delete(labels, i, axis=0)

    w1 = np.float16(len(ytrain) / (2 * len(ytrain[ytrain == 1])))
    w0 = np.float16(len(ytrain) / (2 * len(ytrain[ytrain == 0])))

    # Mathematically the two weighted counts are equal (both n/2); whether they
    # still compare equal after rounding is exactly the imprecision discussed above.
    if w1 * len(ytrain[ytrain == 1]) > w0 * len(ytrain[ytrain == 0]):
        ydumb = 0
    else:
        ydumb = 1
    ydumbs.append(ydumb)

print(accuracy_score(labels, ydumbs))

@wjxts

wjxts commented Jan 31, 2020

I came across the same issue too. The SOTA accuracy on my problem is 70%. When I set C=0.001, I got 100% accuracy, which is impossible. I found the answer here. Thank you!

@cmarmo added the "Needs Decision - Close" label Dec 23, 2021
@ogrisel
Member

ogrisel commented Jan 14, 2022

> I came across the same issue too. The SOTA accuracy on my problem is 70%. When I set C=0.001, I got 100% accuracy, which is impossible. I found the answer here. Thank you!

Was this also with class_weight="balanced" and LOO?

@ogrisel
Member

ogrisel commented Jan 14, 2022

Maybe we could just document this pitfall in an example and add a short note in the relevant docstrings for class_weight="balanced" and the LOO doc?

@thomasjpfan added the "Hard" label and removed the "Moderate" label Jan 14, 2022
@jbschiratti

@jnothman I'm following up on a previous discussion. Unless I am mistaken, if class_weight='balanced' is passed to LogisticRegressionCV, the class weights are computed from the labels of the entire dataset. This breaks the independence of training and test data. Is there a specific reason for that?
