Possible bug when combining SVC + class_weights='balanced' + LeaveOneOut #10233
Comments
Upon reflection this can probably be closed. What happens is that the classifier basically learns the class weights when a low value of C is chosen. As a result, the classifier always predicts the class that is under-represented in the training data (i.e. the class of the held-out sample), essentially independently of the test sample's features. This is also why it only 'works' when only a single class is in the test set (as in LeaveOneOut). Still, until I boiled it down to this minimal example, this gave me a major headache in a larger piece of code. Maybe one could provide a warning in the docs? |
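For illustration, here is a minimal sketch (my own code, with data shapes mirroring the reproduction snippet further down; all variable names are assumptions) of why the features stop mattering at such a small C: the fitted weight vector is negligible, so the decision is effectively made by the intercept, i.e. by the class weighting alone.

```python
# Sketch: with C=1e-8 the SVC is so heavily regularised that the learned
# weight vector is tiny, so predictions barely depend on the test features.
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.rand(78, 100)                        # one LeaveOneOut training fold
y = np.hstack((np.ones(20), np.zeros(58)))   # a '0' sample was held out

clf = SVC(kernel='linear', class_weight='balanced', C=1e-8).fit(X, y)
print(np.abs(clf.coef_).max())               # on the order of 1e-6 or less
print(clf.intercept_)                        # the intercept does the deciding
```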
I haven't really understood your explanation. What would the warning say? I wouldn't generally use LOO unless I really needed to; repeated k-fold might be a better alternative (a sketch follows below).
|
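A minimal sketch of that alternative (my own code; the estimator settings are copied from the reproduction snippet further down, everything else is illustrative):

```python
# Score with repeated stratified k-fold instead of LeaveOneOut, so every test
# fold contains both classes and the 'balanced' weights of each training fold
# are not systematically skewed against the held-out sample.
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.svm import SVC

X = np.random.rand(79, 100)
y = np.hstack((np.ones(20), np.zeros(59)))

cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)
scores = cross_val_score(
    SVC(kernel='linear', class_weight='balanced', C=1e-08),
    X, y, cv=cv, scoring='balanced_accuracy')
print(scores.mean())   # should hover around chance (0.5) on random data
```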
Possibly it is the custom scikit-learn code around the class weighting that is responsible. It's tricky, because setting class weights interacts with the cross-validation splits in a non-obvious way. Sorry, this is not very constructive, but I hope these caveats could at least be informative for other users. Without a better understanding it's also a bit difficult to come up with an authoritative warning. The best I could come up with is: "Note that setting class weights can lead to biased results in certain cross-validation procedures (e.g. leave-one-sample-out). One workaround is to ensure an equal number of samples of each class in the test set." ((On a side note, without a better understanding I am also a bit skeptical about setting fixed values for class_weight.)) |
It took me a while to be convinced by this. I'm going to conclude that it comes down to numerical imprecision: in each split you either draw 58 0s and 20 1s or 59 0s and 19 1s as the training set, and in both cases the 'balanced' class weights give every class exactly the same total weight in exact arithmetic, so only floating-point rounding error is left to break the tie (see the worked numbers below). I suppose we could document the 'balanced' heuristic more explicitly. An alternative might be to incorporate minor perturbation in compute_class_weight. I don't understand your final concern in the sklearn context, as our automated class weighting is performed only for a given training set, not the whole dataset. |
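To spell out that arithmetic (my own illustration, using the class counts from this issue):

```python
# The two possible LeaveOneOut training sets and their 'balanced' weights.
# In exact arithmetic each class would get a total weight of 78 / 2 = 39, so
# the classes are perfectly tied; floating-point rounding can leave a tiny
# imbalance, which is all the heavily regularised SVC has to go on.
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

for n0, n1 in [(58, 20), (59, 19)]:          # a '0' or a '1' was held out
    y_train = np.hstack((np.zeros(n0), np.ones(n1)))
    w = compute_class_weight('balanced', classes=np.array([0., 1.]), y=y_train)
    print(w, w[0] * n0, w[1] * n1)           # both totals should be 39
```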
For an example patch see b07172d. |
Thanks a lot for taking on this issue! Using the following code:

```python
import numpy as np
from sklearn.model_selection import cross_val_predict, LeaveOneOut
from sklearn.svm import SVC
from sklearn.metrics import balanced_accuracy_score

labels = np.hstack((np.ones(20), np.zeros(59)))
pred = cross_val_predict(SVC(kernel='linear', class_weight='balanced', C=1e-08),
                         np.random.rand(79, 100), y=labels,
                         cv=LeaveOneOut())
print(balanced_accuracy_score(labels, pred))
```

I verified that your proposed solution works in my case - the balanced accuracy is at chance (0.5). Regarding my final concern: I put it in double brackets because I was referring to the case of setting fixed class weights (i.e. not using the 'balanced' option), which is a bit off topic. Here the problem is indeed that class weights are not separately computed for each training data set, but rather the weighting is often determined from the class frequency of the entire data set (at least that is what I have seen recommended in various CrossValidated comments). |
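To make that concern concrete, a small sketch of my own (not anything from the scikit-learn docs): weights derived once from the whole dataset carry information about every test fold, whereas 'balanced' only ever sees the training labels.

```python
# Contrast: class weights computed once from all labels (and then passed to
# SVC as a fixed dict) versus weights recomputed per training fold, as the
# 'balanced' option does internally.
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y = np.hstack((np.ones(20), np.zeros(59)))
classes = np.array([0., 1.])

w_full = compute_class_weight('balanced', classes=classes, y=y)      # all 79 labels
w_fold = compute_class_weight('balanced', classes=classes, y=y[1:])  # one '1' held out
print(dict(zip(classes, w_full)))
print(dict(zip(classes, w_fold)))
# Passing w_full as SVC(class_weight={0.: w_full[0], 1.: w_full[1]}) bakes
# knowledge of the test fold's label into every training run; 'balanced'
# avoids this by recomputing the weights from the training fold alone.
```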
Right. Now I understand. But it only destroys the strict independence of training and testing if you derive your fixed weights from examining the data distribution, which I agree is very possible, but it's certainly not an error on the part of the software. I suppose you would rather a class_weight that can be set as a function of the training data in a manner that isn't 'balanced'? I would certainly consider a PR that does that if there's evidence that users have other useful weighting schemes that are a function of the training distribution. So is this problem something you came across organically, or something you found in a contrived situation? Do you think others would end up with a problem caused by this numerical imprecision in real ML problems? Do you think we should try to fix it, or is it a very weird case for which it's not worth breaking backwards compatibility for existing models learnt with class_weight='balanced'? |
I'm not aware of class weighting procedures other than 'balanced', which of course does not mean they don't exist. In my opinion, the way this is handled with the 'balanced' option in sklearn is exemplary, precisely because the weights are computed on the training data only. I would say that I came across the problem relatively organically. To elaborate, I was using a very similar setup on a real dataset. Based on this experience I'm inclined to recommend inclusion of your patch, because I'm sure many people will not investigate further when accuracies are good and 'publishable'. The effect of changing class weights on the order of 1e-8 should be negligible in almost all cases, and if not, it's likely because of this very issue. I see the trade-off with exact backwards compatibility though. |
Does your real world dataset have no signal?
|
It has signal - around 60-65% classification accuracy. With class_weight='balanced' the accuracy became around 75%, which would have been a huge gain in this case. |
I'm confused, but is the example not using enormous regularization and just fitting the intercept? |
What's the class balance on your dataset? 60-65% classification accuracy seems like no signal in an imbalanced setting. I think this is more an issue of using accuracy and LOO on an imbalanced dataset. |
I think it's an unfortunate mix of a few things. It would not happen if
compute_class_weight returned rationals instead of fixed-width floats.
|
@amueller you're of course right, it's an instance of high regularization; I mixed up C with λ = 1/C. The accuracies mentioned in my last post refer to balanced accuracies though, so class imbalance is taken into account. |
what version of balanced accuracy? ;) [less relevant to this issue maybe but part of my quest to find out what people mean when they say that]. Though depending on the definition you're using, chance performance could be anything depending on the imbalance. |
Hmm, it is the version you get from `from sklearn.metrics import balanced_accuracy_score`. Does that help? |
ah, makes sense. That always has chance performance of .5, right (if we always predict one class the recall will be 1 for that and 0 for the other)? Are we saying that in the docs anywhere? |
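For what it's worth, a quick check of that claim with the balanced_accuracy_score now in sklearn.metrics (my own snippet):

```python
# A constant predictor gets recall 1.0 on one class and 0.0 on the other, so
# its balanced accuracy is 0.5 regardless of how imbalanced the classes are.
import numpy as np
from sklearn.metrics import balanced_accuracy_score

y_true = np.hstack((np.ones(20), np.zeros(59)))
y_pred = np.zeros(79)                            # always predict the majority class
print(balanced_accuracy_score(y_true, y_pred))   # 0.5
```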
It's irrelevant what metric: the system got perfect scores for unexpected and unrealistic reasons.
|
I believe I have run into almost the exact same issue. I have a small dataset. I am testing many different feature extraction methods, and some have effectively no signal. I am using LOOCV. Is the proposed fix to add the perturbations to sklearn/utils/class_weight.py? EDIT:
|
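Not the actual patch from b07172d, but a rough user-side sketch of the perturbation idea (the 1e-8 magnitude, the random seed and all names are my own choices):

```python
# Recompute the 'balanced' weights on each training fold, jitter them very
# slightly, and pass them to SVC as an explicit dict, so that an exact tie in
# the weighted class totals can no longer decide the prediction.
import numpy as np
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import LeaveOneOut
from sklearn.svm import SVC
from sklearn.utils.class_weight import compute_class_weight

rng = np.random.RandomState(0)
X = np.random.rand(79, 100)
y = np.hstack((np.ones(20), np.zeros(59)))

pred = np.empty_like(y)
for train, test in LeaveOneOut().split(X):
    w = compute_class_weight('balanced', classes=np.array([0., 1.]), y=y[train])
    w = w * (1 + 1e-8 * rng.standard_normal(w.shape))   # tiny perturbation
    clf = SVC(kernel='linear', C=1e-08, class_weight={0.: w[0], 1.: w[1]})
    pred[test] = clf.fit(X[train], y[train]).predict(X[test])

print(balanced_accuracy_score(y, pred))   # should be near chance, not 1.0
```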
I came across the same issue, too. The SOTA accuracy for my problem is 70%. When I set C=0.001, I got an accuracy of 100%, which is impossible. I found the answer here. Thank you! |
Was this also with class_weight='balanced'? |
Maybe we could just document this pitfall in an example and add a short note in the relevant docstrings for class_weight. |
@jnothman I'm following up on a previous discussion. Unless I am mistaken, if |
This piece of code yields perfect classification accuracy for random data:
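(A reproduction, assumed to be the same code as quoted in the comments above:)

```python
import numpy as np
from sklearn.model_selection import cross_val_predict, LeaveOneOut
from sklearn.svm import SVC
from sklearn.metrics import balanced_accuracy_score

labels = np.hstack((np.ones(20), np.zeros(59)))
pred = cross_val_predict(SVC(kernel='linear', class_weight='balanced', C=1e-08),
                         np.random.rand(79, 100), y=labels,
                         cv=LeaveOneOut())
print(balanced_accuracy_score(labels, pred))   # reported to print 1.0
```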
The problem disappears when using class_weight=None or another CV. Is it a bug or am I missing something?
Tested with version 0.19.1 of scikit-learn on Ubuntu Linux.