Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+1] FIX Negative or null sample_weights in SVM #14286

Merged
merged 40 commits into from Sep 24, 2019

Conversation

alexshacked
Copy link
Contributor

@alexshacked alexshacked commented Jul 7, 2019

closes #9674
closes #9494

Issue #9494 surfaced when training svm.SVC classifier with negative sample weights. There is a PR that attempted to fix this issue - PR #9674. SVC algorithm is part of five (5) svm algorithms that are implemented in svm.cpp: SVC, NuSVC, SVR, NuSVR and OneClassSVM. Function PREFIX(train) in svm.cpp implements all 5 algorithms. This function handles the samples with negative/zero weights by removing them from the training set.This behavior can result in an invalid model after fit() returns if too many samples are removed. The problem occurs in algorithms: SVC, SVR, NuSVR and OneClassSVM. NuSVC is the only class that currently detects this and fails fit() with an exception.This PR resolves this problem by detecting this situation in all classes and behaving like NuSVC- throwing an Exception during fit(). Please see a more detailed analysis at the bottom of issue #9494

@rth
Copy link
Member

@rth rth commented Jul 8, 2019

Thanks for tackling this, @alexshacked ! Wouldn't enforcing that all sample_weights are positive on the Python side (and raising an error if they are not) be a simpler solution?

@alexshacked
Copy link
Contributor Author

@alexshacked alexshacked commented Jul 8, 2019

Thanks for tackling this, @alexshacked ! Wouldn't enforcing that all sample_weights are positive on the Python side (and raising an error if they are not) be a simpler solution?

Yes, @rth. enforcing that all sample_weights are positive on the python side, in BaseLibSVM.fit() for example would be simpler. This was my original intention.
Until I saw that in the algorithm implementation in svm.cpp:2342, the negative weights are actually handled.
PREFIX(model) *PREFIX(train)(const PREFIX(problem) *prob, const svm_parameter *param, int *status) { PREFIX(problem) newprob; remove_zero_weight(&newprob, prob);
This is the current implementation in the code base. There was a concious decision by the SVM implementor to handle input with negative/zero weights, by removing this samples from the input, and training the model on the remaining samples. The SVM implementor was not shy about rejecting invalid input and in function check_parameter() it does just that for many scenarios. But in the case of negative weights the decision was not to reject the input but to handle it by removing samples with negative weights.
The only thing is that while doing so the SVM implementor ommited some border cases that arise, like the one discovered by bug #9494.
My thought was that a less intrusive fix would be to teach the implementation to handle the border cases without changing current behavior of the SVM algorithm which handles input with negative weights.
Hence, the present solution.
If on the other hand, the decision for SVM algorithms would be not to handle input where part of the samples are zero/negative anymore, I would be happy to implement. On the python side naturally:-)

@rth rth changed the title [MRG] #9494 FIX Fail to train SVM ... [MRG] #9494 FIX Negative or null sample_weights in SVM Jul 8, 2019
@rth rth changed the title [MRG] #9494 FIX Negative or null sample_weights in SVM [MRG] FIX Negative or null sample_weights in SVM Jul 8, 2019
@glemaitre glemaitre self-requested a review Jul 8, 2019
@alexshacked
Copy link
Contributor Author

@alexshacked alexshacked commented Jul 8, 2019

Hi @glemaitre. I don't know what self-request a review means. But, I"ll be glad if you will review my PR.
I see that you also have worked with sample weights lately
If you need me to do something to enable you to review please tell me.

Copy link
Contributor

@glemaitre glemaitre left a comment

I am proposing some changes in the tests:

  • use pytest
  • improve some previous tests which were using sample_weight

The tests will fail because the ValueError raised does not match the correct error message (which is hidden when using assert_raises).

Even if I agree with the changes in the cpp file, I think that we should implement what was proposed by @rth. Validate and failing early will be better. So I would add the checking in the cpp and change slightly the tests that I propose.

sklearn/svm/tests/test_svm.py Outdated Show resolved Hide resolved
sklearn/svm/tests/test_svm.py Outdated Show resolved Hide resolved
sklearn/svm/tests/test_svm.py Outdated Show resolved Hide resolved
@glemaitre
Copy link
Contributor

@glemaitre glemaitre commented Jul 8, 2019

self-request a review means

@alexshacked it is just for me to track the PR which I am going to review :)

Copy link
Contributor

@glemaitre glemaitre left a comment

Style change in the CPP file

sklearn/svm/src/libsvm/svm.cpp Outdated Show resolved Hide resolved
sklearn/svm/src/libsvm/svm.cpp Outdated Show resolved Hide resolved
sklearn/svm/src/libsvm/svm.cpp Outdated Show resolved Hide resolved
sklearn/svm/src/libsvm/svm.cpp Outdated Show resolved Hide resolved
sklearn/svm/src/libsvm/svm.cpp Outdated Show resolved Hide resolved
sklearn/svm/src/libsvm/svm.cpp Outdated Show resolved Hide resolved
sklearn/svm/src/libsvm/svm.cpp Outdated Show resolved Hide resolved
sklearn/svm/src/libsvm/svm.cpp Outdated Show resolved Hide resolved
sklearn/svm/src/libsvm/svm.cpp Outdated Show resolved Hide resolved
sklearn/svm/src/libsvm/svm.cpp Show resolved Hide resolved
alexshacked and others added 14 commits Jul 9, 2019
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
@glemaitre glemaitre moved this from REVIEWED AND WAITING FOR CHANGES to MERGED in Guillaume's pet Oct 1, 2019
ivannz added a commit to ivannz/scikit-learn that referenced this issue May 30, 2020
ivannz added a commit to ivannz/scikit-learn that referenced this issue Jul 5, 2020
ivannz added a commit to ivannz/scikit-learn that referenced this issue Jul 5, 2020
ivannz added a commit to ivannz/scikit-learn that referenced this issue Aug 5, 2020
ivannz added a commit to ivannz/scikit-learn that referenced this issue Oct 15, 2020
ivannz added a commit to ivannz/scikit-learn that referenced this issue Nov 28, 2020
ivannz added a commit to ivannz/scikit-learn that referenced this issue Dec 12, 2020
ivannz added a commit to ivannz/scikit-learn that referenced this issue Feb 25, 2021
ivannz added a commit to ivannz/scikit-learn that referenced this issue Feb 25, 2021
ivannz added a commit to ivannz/scikit-learn that referenced this issue Mar 27, 2021
ivannz added a commit to ivannz/scikit-learn that referenced this issue Jun 15, 2021
ivannz added a commit to ivannz/scikit-learn that referenced this issue Jul 23, 2021
ivannz added a commit to ivannz/scikit-learn that referenced this issue Nov 10, 2021
ivannz added a commit to ivannz/scikit-learn that referenced this issue May 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging this pull request may close these issues.

6 participants