
Dense svm and zeroed weight for samples of entire class #5150

Open · olologin opened this issue Aug 24, 2015 · 19 comments · May be fixed by #27763

@olologin
Contributor

This bug appears in current master, for any dense SVM class.

import numpy as np
from sklearn.svm import SVC
X = np.array([[0, 0, 0],
              [0, 0, 1],
              [0, 1, 0],
              [0, 1, 1],
              [1, 0, 0],
              [1, 0, 1],
              [1, 1, 0],
              [1, 1, 1]])
y = np.array([0, 0, 0, 1, 1, 1, 2, 2])
w = np.array([1, 1, 1, 1, 1, 1, 0, 0])


f = SVC(kernel='linear', probability=True, random_state=1)
f.fit(X, y, w)
print(f.classes_)
print(f.predict_proba(X))

Output:

[0 1 2]
warning: class label 2 specified in weight is not found
[[ 0.28963492  0.71036508]
 [ 0.39180833  0.60819167]
 [ 0.28963492  0.71036508]
 [ 0.39180833  0.60819167]
 [ 0.57544014  0.42455986]
 [ 0.68293573  0.31706427]
 [ 0.57544014  0.42455986]
 [ 0.68293573  0.31706427]]

Here we see that libsvm has internally lost the 2nd class, while sklearn's wrapper class keeps all class labels; that's why predict_proba returns a matrix of shape (n_samples, 2) instead of (n_samples, 3), which is what the bagging classifier implementation expects. I understand that this usage of weights is insane by itself, but together with bagging and a dataset with many labels, bagging randomly zeroes out complete classes and this bug shows itself, because bagging expects the SVMs to return probabilities for all the classes they hold (i.e. all classes).

I investigated this a little bit and can try to fix it, if someone confirms that this usage with bagging makes sense (because I'm not really sure about it).
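To make the interaction concrete, here is a minimal sketch (a hypothetical repro, not taken from the issue) of how bagging can hit this: with a small max_samples, some bootstrap draws zero out every sample of one class, and averaging the per-estimator predict_proba outputs then fails on mismatched shapes:

import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.rand(30, 3)
y = np.repeat([0, 1, 2], 10)

# With bootstrap=True and a base estimator that supports sample_weight,
# bagging encodes each bootstrap draw as integer sample weights, so a
# draw that misses a class gives that whole class weight 0.
bag = BaggingClassifier(SVC(kernel='linear', probability=True),
                        n_estimators=20, max_samples=8, random_state=0)
bag.fit(X, y)
# Each SVC reports classes_ == [0, 1, 2] but may return only two
# probability columns, so combining the estimators' outputs breaks.
bag.predict_proba(X)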

@amueller amueller added the Bug label Aug 24, 2015
@amueller
Member

Yeah that looks like a bug. I wonder if this shows in any other models or just the svm. If you are interested, could you maybe add a test to the common tests in utils/estimator_checks.py to see if this happens elsewhere?

(or you could just loop over all_estimators()).

@amueller
Member

We could just fix it by handing all points to libsvm but that's probably not what we want to do, right? Does libsvm handle them efficiently?
Could you maybe do a benchmark?
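One quick way to benchmark that (a sketch; dataset sizes and timings are made up and will vary):

import numpy as np
from time import perf_counter
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.rand(2000, 10)
y = rng.randint(0, 3, 2000)
w = rng.randint(0, 2, 2000).astype(float)  # roughly half the weights are zero

# Compare fitting with zero-weight points kept versus dropped up front.
for label, (Xb, yb, wb) in {
        'all points (zero weights kept)': (X, y, w),
        'zero-weight points dropped': (X[w > 0], y[w > 0], w[w > 0])}.items():
    tic = perf_counter()
    SVC(kernel='linear').fit(Xb, yb, sample_weight=wb)
    print(label, perf_counter() - tic)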

@olologin
Contributor Author

olologin commented Sep 4, 2015

I've tested all classifiers that have a sample_weight parameter in fit and a predict_proba method.
It looks like only SVC is affected. NuSVC uses the same implementation, but it always throws "ValueError: specified nu is infeasible" if the total sample weight of some class equals zero.

@amueller
Member

amueller commented Sep 8, 2015

Thanks for checking.

@amueller amueller added this to the 0.17 milestone Sep 8, 2015
@giorgiop
Contributor

giorgiop commented Oct 2, 2015

> Yeah that looks like a bug. I wonder if this shows in any other models or just the svm. If you are interested, could you maybe add a test to the common tests in utils/estimator_checks.py to see if this happens elsewhere?
> (or you could just loop over all_estimators()).

Here is a script. I have included regressors as well, but I am not sure whether that makes sense here / is relevant to the scope of this bug.

import inspect
import numpy as np
from sklearn.utils.testing import all_estimators

# This breaks because of too few samples
exclude_CV_ests = ['CalibratedClassifierCV']
X = np.array([[0, 0, 0],
              [0, 0, 1],
              [0, 1, 0],
              [0, 1, 1],
              [1, 0, 0],
              [1, 0, 1],
              [1, 1, 0],
              [1, 1, 1]])
y = np.array([0, 0, 0, 1, 1, 1, 2, 2])
w = np.array([1, 1, 1, 1, 1, 1, 0, 0])

for (name, est) in all_estimators():
    insp = inspect.getargspec(est.fit)
    if name not in exclude_CV_ests and 'sample_weight' in insp.args:
        print("\n" + name)
        try:
            est().fit(X, y, w)
            print("OK")
        except ValueError as e:
            print("ValueError: " + str(e))

Output

[some deprecation warnings from importing  ...]

AdaBoostClassifier
OK

AdaBoostRegressor
OK

BaggingClassifier
OK

BaggingRegressor
OK

BernoulliNB
RuntimeWarning: divide by zero encountered in log
  self.class_log_prior_ = (np.log(self.class_count_)
OK

DBSCAN
OK

DecisionTreeClassifier
OK

DecisionTreeRegressor
OK

ExtraTreeClassifier
OK

ExtraTreeRegressor
OK

ExtraTreesClassifier
OK

ExtraTreesRegressor
OK

GaussianNB
RuntimeWarning: invalid value encountered in true_divide
  new_mu = np.average(X, axis=0, weights=sample_weight / n_new)
RuntimeWarning: invalid value encountered in true_divide
  weights=sample_weight / n_new)
OK

GradientBoostingClassifier
OK

GradientBoostingRegressor
OK

KernelRidge
OK

LinearRegression
OK

LogisticRegression
ValueError: Solver liblinear does not support sample weights.

LogisticRegressionCV
Warning: The least populated class in y has only 2 members, which is too few.
The minimum number of labels for any class cannot be less than n_folds=3.
  % (min_labels, self.n_folds)), Warning)
OK

MultinomialNB
RuntimeWarning: divide by zero encountered in log
  self.class_log_prior_ = (np.log(self.class_count_)
OK

NuSVC
ValueError: specified nu is infeasible

NuSVR
OK

OneClassSVM
OK

Perceptron
ValueError: Provided ``coef_`` does not match dataset. 

RandomForestClassifier
OK

RandomForestRegressor
OK

Ridge
OK

RidgeCV
OK

RidgeClassifier
OK

RidgeClassifierCV
OK

SGDClassifier
ValueError: Provided ``coef_`` does not match dataset. 

SGDRegressor
ValueError: Provided coef_init does not match dataset.

SVC
warning: class label 2 specified in weight is not found
OK

SVR
OK

_BaseRidgeCV
OK

_RidgeGCV
OK

@amueller
Member

Thanks for that. You should use sample_weight=w, though. Using a positional argument is what caused the SGD errors, right?
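(For reference: SGDClassifier.fit's signature is fit(X, y, coef_init=None, intercept_init=None, sample_weight=None), so the positional w in the script above lands in coef_init. The corrected loop body would be:)

# the third positional argument is coef_init for SGD estimators;
# pass the weights by keyword instead:
est().fit(X, y, sample_weight=w)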

@olologin
Contributor Author

  1. Should we raise a warning for such input? Or maybe throw an exception? If we throw exceptions, meta-classifiers that tend to zero out the weights of entire classes may not work.
  2. What is the expected output from predict_proba?
    Should it return probabilities for the "correct" classes only, or probabilities for all classes in the input, with 0's for the entirely "incorrect" classes?

Right now all classifiers with a predict_proba method (except NuSVC, which throws a ValueError, and SVC of course) treat such input silently, and predict_proba returns columns for all classes in the dataset.

We could remove all "incorrect" classes inside BaseSVC's fit method, initialize the internal classes_ attribute from this fixed dataset, and feed the dataset into the underlying implementation. At least then classes_ and the predict_proba output would be consistent.
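A rough sketch of that idea (a hypothetical helper, not actual BaseSVC code): drop the classes whose total weight is zero before calling into libsvm, while recording which labels survive:

import numpy as np

def drop_zero_weight_classes(X, y, sample_weight):
    # Classes whose weights sum to zero are invisible to libsvm anyway;
    # remove their samples explicitly and report the surviving labels.
    classes = np.unique(y)
    total = np.array([sample_weight[y == c].sum() for c in classes])
    kept = classes[total > 0]
    mask = np.isin(y, kept)
    return X[mask], y[mask], sample_weight[mask], kept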

@amueller
Member

predict_proba should return probabilities for all classes in the classes_ attribute, which should be the same as np.unique(y_train).
And I don't think we should raise an error.
This is valid input, and I don't think these are "incorrect" classes. It's just a bug in SVC.
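For the example from the first post, that contract could be written as a quick check (a sketch; this currently fails on master):

import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 1],
              [1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]])
y = np.array([0, 0, 0, 1, 1, 1, 2, 2])
w = np.array([1, 1, 1, 1, 1, 1, 0, 0])

f = SVC(kernel='linear', probability=True, random_state=1)
f.fit(X, y, sample_weight=w)
# classes_ must match np.unique(y), and predict_proba must return one
# column per entry of classes_.
assert list(f.classes_) == list(np.unique(y))
assert f.predict_proba(X).shape == (len(X), len(f.classes_))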

@olologin
Contributor Author

Ok, I'll try to fix this.

@olologin
Contributor Author

How about these results?

import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0, 0],
              [0, 0, 1],
              [0, 1, 0],
              [0, 1, 1],
              [1, 0, 0],
              [1, 0, 1],
              [1, 1, 0],
              [1, 1, 1]])
y = np.array([0, 0, 0, 1, 1, 1, 2, 2])
w = np.array([1, 1, 1, 1, 1, 1, 0, 0])

f = SVC(probability=True, class_weight={0: 1, 1: 1, 2: 1}, random_state=1)
f.fit(X, y, w)
print('f.classes_:')
print(f.classes_)
print('f.support_:')
print(f.support_)
print('f.dual_coef_:')
print(f.dual_coef_)
print('f.intercept_:')
print(f.intercept_)
print('f.predict_proba(X):')
print(f.predict_proba(X))

produces

f.classes_:
[0 1 2]
f.support_:
[0 1 2 3 4 5]
f.dual_coef_:
[[ 1.  1.  1. -1. -1. -1.]
 [ 0.  0.  0.  0.  0.  0.]]
f.intercept_:
[-0.02878827         inf         inf]
f.predict_proba(X):
[[  3.31225602e-14   3.30356882e-14   1.00000000e+00]
 [  3.63993489e-14   3.24543926e-14   1.00000000e+00]
 [  3.36042808e-14   3.30595561e-14   1.00000000e+00]
 [  3.77307873e-14   3.12880128e-14   1.00000000e+00]
 [  3.86736989e-14   2.78250291e-14   1.00000000e+00]
 [  3.82381936e-14   2.50595908e-14   1.00000000e+00]
 [  3.84822371e-14   2.96383203e-14   1.00000000e+00]
 [  3.84242527e-14   2.58035393e-14   1.00000000e+00]]

Confusing, but at least it looks mathematically correct. And current master already produces the same output if you run this code:

f = SVC(probability=True, class_weight={0:1,1:1,2:0}, random_state=1)
f.fit(X, y)

Which is the same thing, because here we zero out the class_weight of class 2 instead of doing it through sample_weight.

@amueller
Member

so it is just using "balanced"?
Why do you say this is correct, though? Shouldn't the probability for class 2 be zero?

@olologin
Contributor Author

@amueller,

> so it is just using "balanced"?

No, I forced SVC to use all sample_weights. To obtain these results I removed this call https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/svm/src/libsvm/svm.cpp#L2342 and all the corresponding memory allocation/deallocation. The function call at that line removes from the dataset the samples whose sample_weight equals zero. If it deletes all samples of an entire class, SVM training starts on this 'truncated' dataset without knowledge of all possible classes. The output of this 'fixed' code is now consistent with the case where you specify some class_weight=0. And as I found out, original libsvm also produces the same results.
But with this 'fix' a new bug appears: on some small inputs training freezes (maybe an infinite loop).

> Why do you say this is correct, though?

Never mind, I tried to explain these outputs, but I made a mistake in my reasoning.

> shouldn't the probability for class 2 be zero?

Of course it should, but even original libsvm returns the same probability estimates.
If someone wants to test it on original libsvm:

dataset.txt:

0 1:0 2:0 3:0
0 1:0 2:0 3:1
0 1:0 2:1 3:0
1 1:0 2:1 3:1
1 1:1 2:0 3:0
1 1:1 2:0 3:1
2 1:1 2:1 3:0
2 1:1 2:1 3:1

code:

libsvm-3.20$ ./svm-train -b 1 -w0 1 -w1 1 -w2 0 dataset.txt model
libsvm-3.20$ ./svm-predict -b 1 dataset.txt model predictions.out

It writes the following to predictions.out:

labels 0 1 2
2 3.31221e-14 3.30357e-14 1
2 3.63995e-14 3.24543e-14 1
2 3.36039e-14 3.30595e-14 1
2 3.77311e-14 3.12876e-14 1
2 3.86737e-14 2.78238e-14 1
2 3.82377e-14 2.50579e-14 1
2 3.84825e-14 2.96375e-14 1
2 3.84239e-14 2.58019e-14 1

@olologin
Contributor Author

Maybe it's easier to just fix the meta-estimators so that they don't pass zero-weight samples into estimators, and to throw an error for any such input. Bagging, for example, can choose the samples for estimator training either by setting weights (the default, if the estimator supports sample_weight) or by subsampling from the dataset; see the sketch below.

Because with this bug, if you want to return 0 probability for every 'incorrect' class, you must take into account that the SVM classifier also contains a bunch of other attributes, like support_, n_support_, dual_coef_ and coef_; what values would they have in this case? It would look ugly.
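A sketch of the subsampling alternative (a hypothetical helper; not what BaggingClassifier currently does):

import numpy as np

def fit_base_estimator(estimator, X, y, sample_weight):
    # Subsample instead of passing zero weights, so the base estimator
    # never sees classes that carry no weight. The meta-estimator must
    # then map estimator.classes_ back onto the full class set when
    # combining predict_proba outputs.
    mask = sample_weight > 0
    return estimator.fit(X[mask], y[mask], sample_weight=sample_weight[mask])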

@amueller
Member

I think this should be fixed in libsvm. Do you have an explanation for the libsvm behavior? It seems highly odd to me. I guess it is a combination of how the class weights change the loss combined with the OVR approach. I don't have time to go through the math right now.

This seems to be separate from the "balanced"/"auto" issue, though, right?

@olologin
Contributor Author

> Do you have an explanation for the libsvm behavior?

No.

> This seems to be separate from the "balanced"/"auto" issue

Can you point me to that issue? I don't know which one you mean.

@amueller
Member

this one.

@olologin
Contributor Author

Ah, sorry, now I understand. Yep, it isn't related to balanced/auto. I've updated the code listing in the first post.

@amueller amueller modified the milestones: 0.17, 0.18 Nov 2, 2015
@olologin
Contributor Author

olologin commented Nov 2, 2015

So as I suspected, it's not a bug; I asked about it in cjlin1/libsvm#50 (comment).

In case the explanation there is unclear:
the alphas of the j-th class are bounded from above by the C value. If the C of some class (or that class's entire slice of the sample_weight vector) equals 0, the constraint y.T * alphas = 0 forces the other alphas to be 0 as well. Thus the solution doesn't even have any support vectors (the entire alpha vector is 0).

In terms of the original optimization problem, C = 0 means that you cannot penalize the xi values of that class, and the minimum is achieved when you classify every point in the dataset with the label of that strange class whose C = 0 (because the xi's can be arbitrarily large positive numbers and C = 0 will not penalize them).
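For reference, the standard C-SVC dual for each binary subproblem (as in the libsvm formulation) makes this concrete:

\max_{\alpha}\ \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j K(x_i, x_j)
\quad \text{s.t.} \quad 0 \le \alpha_i \le C_i, \qquad \sum_i y_i \alpha_i = 0,

with per-sample C_i = C * w_i. Setting C_i = 0 for every sample of one class pins those alphas at 0, and the equality constraint then drags the other class's alphas to 0 as well, so the subproblem ends up with no support vectors.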

@amueller
Member

amueller commented Nov 2, 2015

Basically: using class weights with OVR is not a great idea without calibration. Or is there another conclusion?
That should maybe be mentioned in the docs.
However, the original issue still persists.

@amueller amueller modified the milestones: 0.18, 0.19 Sep 22, 2016
@amueller amueller removed this from the 0.19 milestone Jun 12, 2017