
Dense svm and zeroed weight for samples of entire class #5150

Open · olologin opened this issue Aug 24, 2015 · 19 comments · May be fixed by #27763

@olologin
Contributor

This bug appears in current master, for any dense SVM class.

import numpy as np
from sklearn.svm import SVC
X = np.array([[0, 0, 0],
              [0, 0, 1],
              [0, 1, 0],
              [0, 1, 1],
              [1, 0, 0],
              [1, 0, 1],
              [1, 1, 0],
              [1, 1, 1]])
y = np.array([0, 0, 0, 1, 1, 1, 2, 2])
w = np.array([1, 1, 1, 1, 1, 1, 0, 0])


f = SVC(kernel='linear', probability=True, random_state=1)
f.fit(X, y, w)
print(f.classes_)
print(f.predict_proba(X))

Output:

[0 1 2]
warning: class label 2 specified in weight is not found
[[ 0.28963492  0.71036508]
 [ 0.39180833  0.60819167]
 [ 0.28963492  0.71036508]
 [ 0.39180833  0.60819167]
 [ 0.57544014  0.42455986]
 [ 0.68293573  0.31706427]
 [ 0.57544014  0.42455986]
 [ 0.68293573  0.31706427]]

Here we see that libsvm has internally lost the 2nd class, while sklearn's wrapper class keeps all class labels; that's why predict_proba returns a matrix of shape (n_samples, 2) instead of (n_samples, 3), which is what the bagging classifier implementation expects. I understand that this usage of weights is insane by itself, but together with bagging and a dataset with many labels, bagging randomly zeroes out complete classes and this bug shows itself, because bagging expects the SVMs to return probabilities for all the classes they hold (i.e. all classes).

I investigated this a little bit and can try to fix it, if someone confirms that this usage with bagging makes sense (because I'm not really sure about it).
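To make the interaction concrete, here is a minimal sketch (a hypothetical repro, not taken from the issue) of how bagging can hit this: with a small max_samples, some bootstrap draws zero out every sample of one class, and averaging the per-estimator predict_proba outputs then fails on mismatched shapes:

import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.rand(30, 3)
y = np.repeat([0, 1, 2], 10)

# With bootstrap=True and a base estimator that supports sample_weight,
# bagging encodes each bootstrap draw as integer sample weights, so a
# draw that misses a class gives that whole class weight 0.
bag = BaggingClassifier(SVC(kernel='linear', probability=True),
                        n_estimators=20, max_samples=8, random_state=0)
bag.fit(X, y)
# Each SVC reports classes_ == [0, 1, 2] but may return only two
# probability columns, so combining the estimators' outputs breaks.
bag.predict_proba(X)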

@amueller amueller added the Bug label Aug 24, 2015
@amueller
Member

Yeah that looks like a bug. I wonder if this shows in any other models or just the svm. If you are interested, could you maybe add a test to the common tests in utils/estimator_checks.py to see if this happens elsewhere?

(or you could just loop over all_estimators()).

@amueller
Member

We could just fix it by handing all points to libsvm but that's probably not what we want to do, right? Does libsvm handle them efficiently?
Could you maybe do a benchmark?
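One quick way to benchmark that (a sketch; dataset sizes and timings are made up and will vary):

import numpy as np
from time import perf_counter
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.rand(2000, 10)
y = rng.randint(0, 3, 2000)
w = rng.randint(0, 2, 2000).astype(float)  # roughly half the weights are zero

# Compare fitting with zero-weight points kept versus dropped up front.
for label, (Xb, yb, wb) in {
        'all points (zero weights kept)': (X, y, w),
        'zero-weight points dropped': (X[w > 0], y[w > 0], w[w > 0])}.items():
    tic = perf_counter()
    SVC(kernel='linear').fit(Xb, yb, sample_weight=wb)
    print(label, perf_counter() - tic)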

@olologin
Contributor Author

olologin commented Sep 4, 2015

I've tested all classifiers that have a sample_weight parameter in fit and a predict_proba method.
It looks like only SVC is affected. NuSVC uses the same implementation, but it always throws "ValueError: specified nu is infeasible" if the total sample weight of some class equals zero.

@amueller
Member

amueller commented Sep 8, 2015

Thanks for checking.

@amueller amueller added this to the 0.17 milestone Sep 8, 2015
@giorgiop
Contributor

giorgiop commented Oct 2, 2015

> Yeah that looks like a bug. I wonder if this shows in any other models or just the svm. If you are interested, could you maybe add a test to the common tests in utils/estimator_checks.py to see if this happens elsewhere?
> (or you could just loop over all_estimators()).

Here is a script. I have included regressors as well, but I am not sure whether that makes sense here / is relevant to the scope of this bug.

import inspect
import numpy as np
from sklearn.utils.testing import all_estimators

# This breaks because of too few samples
exclude_CV_ests = ['CalibratedClassifierCV']
X = np.array([[0, 0, 0],
              [0, 0, 1],
              [0, 1, 0],
              [0, 1, 1],
              [1, 0, 0],
              [1, 0, 1],
              [1, 1, 0],
              [1, 1, 1]])
y = np.array([0, 0, 0, 1, 1, 1, 2, 2])
w = np.array([1, 1, 1, 1, 1, 1, 0, 0])

for (name, est) in all_estimators():
    insp = inspect.getargspec(est.fit)
    if name not in exclude_CV_ests and 'sample_weight' in insp.args:
        print("\n" + name)
        try:
            est().fit(X, y, w)
            print("OK")
        except ValueError as e:
            print("ValueError: " + str(e))

Output

[some deprecation warnings from importing  ...]

AdaBoostClassifier
OK

AdaBoostRegressor
OK

BaggingClassifier
OK

BaggingRegressor
OK

BernoulliNB
RuntimeWarning: divide by zero encountered in log
  self.class_log_prior_ = (np.log(self.class_count_)
OK

DBSCAN
OK

DecisionTreeClassifier
OK

DecisionTreeRegressor
OK

ExtraTreeClassifier
OK

ExtraTreeRegressor
OK

ExtraTreesClassifier
OK

ExtraTreesRegressor
OK

GaussianNB
RuntimeWarning: invalid value encountered in true_divide
  new_mu = np.average(X, axis=0, weights=sample_weight / n_new)
RuntimeWarning: invalid value encountered in true_divide
  weights=sample_weight / n_new)
OK

GradientBoostingClassifier
OK

GradientBoostingRegressor
OK

KernelRidge
OK

LinearRegression
OK

LogisticRegression
ValueError: Solver liblinear does not support sample weights.

LogisticRegressionCV
Warning: The least populated class in y has only 2 members, which is too few.
The minimum number of labels for any class cannot be less than n_folds=3.
  % (min_labels, self.n_folds)), Warning)
OK

MultinomialNB
RuntimeWarning: divide by zero encountered in log
  self.class_log_prior_ = (np.log(self.class_count_)
OK

NuSVC
ValueError: specified nu is infeasible

NuSVR
OK

OneClassSVM
OK

Perceptron
ValueError: Provided ``coef_`` does not match dataset. 

RandomForestClassifier
OK

RandomForestRegressor
OK

Ridge
OK

RidgeCV
OK

RidgeClassifier
OK

RidgeClassifierCV
OK

SGDClassifier
ValueError: Provided ``coef_`` does not match dataset. 

SGDRegressor
ValueError: Provided coef_init does not match dataset.

SVC
warning: class label 2 specified in weight is not found
OK

SVR
OK

_BaseRidgeCV
OK

_RidgeGCV
OK

@amueller
Member

Thanks for that. You should use sample_weight=w, though. Using a positional argument is what caused the SGD errors, right?
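(For reference: SGDClassifier.fit's signature is fit(X, y, coef_init=None, intercept_init=None, sample_weight=None), so the positional w in the script above lands in coef_init. The corrected loop body would be:)

# the third positional argument is coef_init for SGD estimators;
# pass the weights by keyword instead:
est().fit(X, y, sample_weight=w)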

@olologin
Contributor Author

  1. Should we raise a warning for such input? Or maybe throw an exception? If we throw exceptions, meta-classifiers that tend to zero out the weights of entire classes may not work.
  2. What is the expected output from predict_proba?
    Should it return probabilities for the "correct" classes only, or probabilities for all classes in the input, with 0's for the entirely "incorrect" classes?

Right now all classifiers with a predict_proba method (except NuSVC, which throws a ValueError, and SVC of course) treat such input silently, and predict_proba returns columns for all classes in the dataset.

We could remove all "incorrect" classes inside BaseSVC's fit method, initialize the internal classes_ attribute from this fixed dataset, and feed the dataset into the underlying implementation. At least then classes_ and the predict_proba output would be consistent.
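A rough sketch of that idea (a hypothetical helper, not actual BaseSVC code): drop the classes whose total weight is zero before calling into libsvm, while recording which labels survive:

import numpy as np

def drop_zero_weight_classes(X, y, sample_weight):
    # Classes whose weights sum to zero are invisible to libsvm anyway;
    # remove their samples explicitly and report the surviving labels.
    classes = np.unique(y)
    total = np.array([sample_weight[y == c].sum() for c in classes])
    kept = classes[total > 0]
    mask = np.isin(y, kept)
    return X[mask], y[mask], sample_weight[mask], kept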

@amueller
Member

predict_proba should return probabilities for all classes in the classes_ attribute, which should be the same as np.unique(y_train).
And I don't think we should raise an error.
This is valid input, and I don't think these are "incorrect" classes. It's just a bug in SVC.
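For the example from the first post, that contract could be written as a quick check (a sketch; this currently fails on master):

import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 1],
              [1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]])
y = np.array([0, 0, 0, 1, 1, 1, 2, 2])
w = np.array([1, 1, 1, 1, 1, 1, 0, 0])

f = SVC(kernel='linear', probability=True, random_state=1)
f.fit(X, y, sample_weight=w)
# classes_ must match np.unique(y), and predict_proba must return one
# column per entry of classes_.
assert list(f.classes_) == list(np.unique(y))
assert f.predict_proba(X).shape == (len(X), len(f.classes_))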

@olologin
Contributor Author

Ok, I'll try to fix this.

@olologin
Contributor Author

How about these results?

import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0, 0],
              [0, 0, 1],
              [0, 1, 0],
              [0, 1, 1],
              [1, 0, 0],
              [1, 0, 1],
              [1, 1, 0],
              [1, 1, 1]])
y = np.array([0, 0, 0, 1, 1, 1, 2, 2])
w = np.array([1, 1, 1, 1, 1, 1, 0, 0])

f = SVC(probability=True, class_weight={0: 1, 1: 1, 2: 1}, random_state=1)
f.fit(X, y, w)
print('f.classes_:')
print(f.classes_)
print('f.support_:')
print(f.support_)
print('f.dual_coef_:')
print(f.dual_coef_)
print('f.intercept_:')
print(f.intercept_)
print('f.predict_proba(X):')
print(f.predict_proba(X))

produces

f.classes_:
[0 1 2]
f.support_:
[0 1 2 3 4 5]
f.dual_coef_:
[[ 1.  1.  1. -1. -1. -1.]
 [ 0.  0.  0.  0.  0.  0.]]
f.intercept_:
[-0.02878827         inf         inf]
f.predict_proba(X):
[[  3.31225602e-14   3.30356882e-14   1.00000000e+00]
 [  3.63993489e-14   3.24543926e-14   1.00000000e+00]
 [  3.36042808e-14   3.30595561e-14   1.00000000e+00]
 [  3.77307873e-14   3.12880128e-14   1.00000000e+00]
 [  3.86736989e-14   2.78250291e-14   1.00000000e+00]
 [  3.82381936e-14   2.50595908e-14   1.00000000e+00]
 [  3.84822371e-14   2.96383203e-14   1.00000000e+00]
 [  3.84242527e-14   2.58035393e-14   1.00000000e+00]]

Confusing, but at least it looks mathematically correct. And current master already produces the same output if you run this code:

f = SVC(probability=True, class_weight={0:1,1:1,2:0}, random_state=1)
f.fit(X, y)

Which is the same thing, because here we zero out the class_weight of class 2 instead of doing it through sample_weight.

@amueller
Member

so it is just using "balanced"?
Why do you say this is correct, though? Shouldn't the probability for class 2 be zero?

@olologin
Contributor Author

@amueller,

> so it is just using "balanced"?

No, I forced SVC to use all sample_weights. To obtain these results I removed this call https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/svm/src/libsvm/svm.cpp#L2342 and all the corresponding memory allocation/deallocation. The function call at that line removes from the dataset the samples whose sample_weight equals zero. If it deletes all samples of an entire class, SVM training starts on this 'truncated' dataset without knowledge of all possible classes. The output of this 'fixed' code is now consistent with the case where you specify some class_weight=0. And as I found out, original libsvm also produces the same results.
But with this 'fix' a new bug appears: on some small inputs training freezes (maybe an infinite loop).

> Why do you say this is correct, though?

Never mind, I tried to explain these outputs, but I made a mistake in my reasoning.

> shouldn't the probability for class 2 be zero?

Of course it should, but even original libsvm returns the same probability estimates.
If someone wants to test it on original libsvm:

dataset.txt:

0 1:0 2:0 3:0
0 1:0 2:0 3:1
0 1:0 2:1 3:0
1 1:0 2:1 3:1
1 1:1 2:0 3:0
1 1:1 2:0 3:1
2 1:1 2:1 3:0
2 1:1 2:1 3:1

code:

libsvm-3.20$ ./svm-train -b 1 -w0 1 -w1 1 -w2 0 dataset.txt model
libsvm-3.20$ ./svm-predict -b 1 dataset.txt model predictions.out

It writes the following to predictions.out:

labels 0 1 2
2 3.31221e-14 3.30357e-14 1
2 3.63995e-14 3.24543e-14 1
2 3.36039e-14 3.30595e-14 1
2 3.77311e-14 3.12876e-14 1
2 3.86737e-14 2.78238e-14 1
2 3.82377e-14 2.50579e-14 1
2 3.84825e-14 2.96375e-14 1
2 3.84239e-14 2.58019e-14 1

@olologin
Contributor Author

Maybe it's easier to just fix the meta-estimators so that they don't pass zero-weight samples into estimators, and to throw an error for any such input. Bagging, for example, can choose the samples for estimator training either by setting weights (the default, if the estimator supports sample_weight) or by subsampling from the dataset; see the sketch below.

Because with this bug, if you want to return 0 probability for every 'incorrect' class, you must take into account that the SVM classifier also contains a bunch of other attributes, like support_, n_support_, dual_coef_ and coef_; what values would they have in this case? It would look ugly.
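A sketch of the subsampling alternative (a hypothetical helper; not what BaggingClassifier currently does):

import numpy as np

def fit_base_estimator(estimator, X, y, sample_weight):
    # Subsample instead of passing zero weights, so the base estimator
    # never sees classes that carry no weight. The meta-estimator must
    # then map estimator.classes_ back onto the full class set when
    # combining predict_proba outputs.
    mask = sample_weight > 0
    return estimator.fit(X[mask], y[mask], sample_weight=sample_weight[mask])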

@amueller
Member

I think this should be fixed in libsvm. Do you have an explanation for the libsvm behavior? It seems highly odd to me. I guess it is a combination of how the class weights change the loss combined with the OVR approach. I don't have time to go through the math right now.

This seems to be separate from the "balanced"/"auto" issue, though, right?

@olologin
Contributor Author

> Do you have an explanation for the libsvm behavior?

No.

> This seems to be separate from the "balanced"/"auto" issue

Can you point me to that issue? I don't know which one you mean.

@amueller
Member

this one.

@olologin
Contributor Author

Ah, sorry, now I understand. Yep, it isn't related to balanced/auto. I've updated the code listing in the first post.

@amueller amueller modified the milestones: 0.17, 0.18 Nov 2, 2015
@olologin
Contributor Author

olologin commented Nov 2, 2015

So as I suspected, it's not a bug; I asked about it in cjlin1/libsvm#50 (comment).

In case the explanation there is unclear:
the alphas of the j-th class are bounded from above by the C value. If the C of some class (or that class's entire slice of the sample_weight vector) equals 0, the constraint y.T * alphas = 0 forces the other alphas to be 0 as well. Thus the solution doesn't even have any support vectors (the entire alpha vector is 0).

In terms of the original optimization problem, C = 0 means that you cannot penalize the xi values of that class, and the minimum is achieved when you classify every point in the dataset with the label of that strange class whose C = 0 (because the xi's can be arbitrarily large positive numbers and C = 0 will not penalize them).
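For reference, the standard C-SVC dual for each binary subproblem (as in the libsvm formulation) makes this concrete:

\max_{\alpha}\ \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j K(x_i, x_j)
\quad \text{s.t.} \quad 0 \le \alpha_i \le C_i, \qquad \sum_i y_i \alpha_i = 0,

with per-sample C_i = C * w_i. Setting C_i = 0 for every sample of one class pins those alphas at 0, and the equality constraint then drags the other class's alphas to 0 as well, so the subproblem ends up with no support vectors.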

@amueller
Member

amueller commented Nov 2, 2015

Basically: using class weights with OVR is not a great idea without calibration. Or is there another conclusion?
That should maybe be mentioned in the docs.
However, the original issue still persists.

@amueller amueller modified the milestones: 0.18, 0.19 Sep 22, 2016
@amueller amueller removed this from the 0.19 milestone Jun 12, 2017