Liblinear Sample Weights #2784

Closed
Wants to merge 6 commits into scikit-learn:master from HapeMask:liblinear_weights

Conversation

@HapeMask

As discussed in #409, I have patched the included liblinear sources with the update to scale C for each sample individually.

I mostly just copied the libsvm Python wrapper to expose the new parameter. The existing tests didn't fully cover the LinearSVC + sample_weight case, so I modified them.

I'm admittedly not familiar with the liblinear internals, but the patching was straightforward and preserved the existing sklearn modifications. The previous SVM tests still pass as well.
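
For reference, here is a minimal usage sketch of what this enables. It assumes the patched LinearSVC.fit() accepts a sample_weight array mirroring the existing SVC interface; the data and weight values are illustrative only.

```python
# Hypothetical usage sketch: assumes the patched LinearSVC.fit() accepts
# sample_weight the same way SVC.fit() already does.
import numpy as np
from sklearn.svm import LinearSVC

X = np.array([[0.0, 0.0], [0.5, 0.5], [2.0, 2.0], [2.5, 2.5]])
y = np.array([0, 0, 1, 1])

# Per-sample weights effectively rescale C for each sample: a weight of 2.0
# penalizes a margin violation on that sample twice as heavily as a weight
# of 1.0, and a weight of 0.0 removes the sample from training entirely.
sample_weight = np.array([1.0, 2.0, 1.0, 0.5])

clf = LinearSVC(C=1.0)
clf.fit(X, y, sample_weight=sample_weight)
print(clf.coef_, clf.intercept_)
```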

@HapeMask commented Jan 21, 2014

I see that one of the sklearn.ensemble tests is failing with this PR. I didn't change any files in that package, but they pass on the master branch. Sorry about that; I'll reopen the PR once I figure out what I broke.

@HapeMask closed this Jan 21, 2014

@jnothman (Owner) commented Jan 21, 2014

So are you not working from the liblinear fork that supports sample weights? Maybe you should compare your implementation to it.

@HapeMask commented Jan 21, 2014

I'm using the version with weights as provided here: http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/#weights_for_data_instances

I'm looking through the changes to linear.cpp now to see what's going on with it; my guess is that I undid one of the sklearn patches by accident.

@jnothman (Owner) commented Jan 21, 2014

Right. You said LibSVM in your original post, but I think you meant liblinear.

Thanks for working on this!

@jnothman (Owner) commented Jan 21, 2014

Btw, once you get this working on LinearSVC, you should patch it into (and test) sklearn.linear_model.LogisticRegression.

@HapeMask reopened this Jan 22, 2014

@HapeMask commented Jan 22, 2014

I found the bug that was making tests fail: it turns out that LogisticRegression was already using the same liblinear code as LinearSVC, because it inherited from BaseLibLinear at some point.

The BaggingClassifier tests with LogisticRegression exposed a case where, if every sample of some class has sample_weight=0, the internal LibLinear model and the sklearn class disagree on how many classes the estimator was trained on. This is because the patched LibLinear/LibSVM both remove any training samples with zero weight.

I believe I've fixed the issue, and I added new test cases to cover both the new features and the old bug (it affected SVC as well, though no test cases covered it). I reopened the PR since the tests pass on my machine in a virtualenv with numpy 1.7.1, scipy 0.13.2, and MKL BLAS.
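
To make the class-count mismatch concrete, here is an illustrative sketch (independent of any particular backend; the data and weights are made up):

```python
# Classes are taken from y on the Python side, but a weight-aware
# liblinear/libsvm drops zero-weight samples before training, so it only ever
# sees the remaining classes. Before the fix, e.g. predict_proba could then
# return an array whose last dimension doesn't match the wrapper's class count.
import numpy as np

y = np.array([0, 0, 1, 1, 2, 2])
sample_weight = np.array([1.0, 1.0, 1.0, 1.0, 0.0, 0.0])  # class 2 zeroed out

classes_seen_by_sklearn = np.unique(y)
classes_seen_by_backend = np.unique(y[sample_weight > 0])

print(classes_seen_by_sklearn)   # [0 1 2]
print(classes_seen_by_backend)   # [0 1] -> the backend trains a 2-class model
```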

@coveralls commented Jan 22, 2014

Coverage remained the same when pulling 7258ecf on HapeMask:liblinear_weights into f8e7cf1 on scikit-learn:master.

@coveralls commented Jan 22, 2014

Coverage remained the same when pulling 79124ec on HapeMask:liblinear_weights into f8e7cf1 on scikit-learn:master.

@coveralls commented Jan 24, 2014

Coverage remained the same when pulling bc233c6 on HapeMask:liblinear_weights into f8e7cf1 on scikit-learn:master.

+
+ Modified 2014:
+
+ - Patched to add support for instance weights, Gabriel Schwartz.

@jnothman (Owner) commented on the diff, Jan 25, 2014

It might be worth citing the liblinear fork.

@jnothman (Owner) commented Jan 25, 2014

Thanks for this! It looks good to me, but I'd like to hear from someone who's worked with liblinear code before: @amueller, @fabianp?

I'd also be interested to see a quick benchmark of the performance change with uniform weights.

I think it might be nice if there were a page in the narrative docs that lists estimators (and metrics?) that support sample weight, but that should probably be a separate PR.

Please add an entry to doc/whats_new.rst.

@jnothman (Owner) commented Jan 25, 2014

And perhaps examples/svm/plot_weighted_samples.py should use LinearSVC rather than SVC now.

Also, my quick benchmarks of the PR didn't show a discernible difference.

@HapeMask commented Feb 7, 2014

> And perhaps examples/svm/plot_weighted_samples.py should use LinearSVC rather than SVC now.

This is certainly subjective, but I think the plot with SVC's default RBF kernel looks more interesting. Sample weights can't change the shape of the LinearSVC boundary (since it's linear), so they just wind up shifting the colors a bit.

@coveralls commented Feb 14, 2014

Coverage remained the same when pulling 366d4d5 on HapeMask:liblinear_weights into e20aebe on scikit-learn:master.

@jnothman (Owner) commented Mar 17, 2014

Here are three runs of bench_covtype with and without this PR:

Classifier   train-time test-time error-rate
--------------------------------------------
at master
liblinear     6.4890s    0.0297s     0.2305
liblinear     6.1431s    0.0290s     0.2305
liblinear     7.9698s    0.0288s     0.2305
with this PR
liblinear     6.9556s    0.0315s     0.2305
liblinear     7.2022s    0.0318s     0.2305
liblinear     6.2474s    0.0351s     0.2305

PS: the test failure is a joblib heisenbug.
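
For anyone wanting to reproduce this kind of check, a rough timing sketch (not the bench_covtype script itself; the subsample size and solver settings are arbitrary, and it assumes LinearSVC.fit() accepts sample_weight):

```python
# Compare fit time with no weights vs. explicit uniform weights, which is
# where any overhead from this change would show up.
import time
import numpy as np
from sklearn.datasets import fetch_covtype
from sklearn.svm import LinearSVC

data = fetch_covtype()
X, y = data.data[:50000], data.target[:50000]  # subsample to keep runtime modest

for label, sw in [("no weights", None), ("uniform weights", np.ones(y.shape[0]))]:
    clf = LinearSVC(dual=False, C=1.0)
    t0 = time.time()
    if sw is None:
        clf.fit(X, y)
    else:
        clf.fit(X, y, sample_weight=sw)
    print("%s: %.3fs" % (label, time.time() - t0))
```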

@HapeMask commented Apr 4, 2014

I was looking through this so I can keep it up to date with master, and I noticed that while the tests do pass, a new warning is generated that isn't generated on master:

WARNING: class label 1 specified in weight is not found

This comes from sklearn/ensemble/tests/test_bagging.py:164 when testing LogisticRegression (which uses liblinear) with missing classes.

I'm not properly familiar with the C internals of liblinear, so I'd second the idea that it would be good for someone else to take a look if they have time.

@MechCoder (Owner) commented Jul 29, 2014

@jnothman Can you please tell me what is left to do in this PR?

@jnothman (Owner) commented Jul 29, 2014

I can't recall there being anything substantial to do. As I said above, I was hoping for another review, if not mere confirmation that this is a desired feature.

@MechCoder (Owner) commented Jul 29, 2014

@jnothman Yes, it would be useful. More than anything else, it would help to remove this error raise, https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/linear_model/logistic.py#L355, i.e. allow class_weight of type dict for multiclass problems.
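
For context, a hedged sketch of the usage that check blocks at the time of this thread (the weight values are illustrative only):

```python
# A dict-valued class_weight on a multiclass problem, which the
# liblinear-backed LogisticRegression of the time rejected with a ValueError.
# Once per-class (and per-sample) weights are supported in the liblinear path,
# this should just work.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(class_weight={0: 1.0, 1: 2.0, 2: 0.5})
clf.fit(X, y)
print(clf.score(X, y))
```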

@MechCoder (Owner) commented Jul 29, 2014

@jnothman If you want me to have a look, I can, but I doubt I would be able to do much more over your reviews. :)

HapeMask added some commits Jul 29, 2014

- BUG: SVC fit() w/ class w/ all 0 sample weights.
  Versions of LibSVM and LibLinear that support sample/instance weights remove all samples with 0 weight before training. If all samples of some classes have 0 weight, the internal LibSVM/LibLinear model will think it has that many fewer classes than the number provided to sklearn. This caused issues with, for example, predict_proba returning probabilities with the wrong shape in the last dimension.

- TST: Improve SVC tests for sample_weight.
  Tests now cover cases involving the wrong sample_weight shape and classes whose samples all have 0 weight.
@HapeMask commented Jul 29, 2014

Since it's been a while, I rebased against master. It looks like something broke between my last update and now; I'm going to take a look at it.

@MechCoder (Owner) commented Jul 29, 2014

I think it must have something to do with the changes due to the new LogisticRegressionCV.

@HapeMask commented Jul 29, 2014

LogisticRegression and LogisticRegressionCV look like they expect the classes_ attribute to exist before fit() is called.

(It didn't exist before fit() previously either, but you could set an enc object whose classes attribute would be returned when you requested classes_ from BaseLibLinear.)

HapeMask added some commits Jan 22, 2014

- ENH: Update LibLinear sources w/instance weights.
  Patches the bundled LibLinear sources to liblinear-1.94 with support for instance weights.

- ENH: Generalize more tests for LinearSVC.
  Scaled sample weights on a linearly-separable dataset cannot cause an SVM with a linear kernel to misclassify the down-weighted points. The modified test case therefore changes the linearly-separable dataset to contain a non-separable outlier; this outlier should be misclassified unless sample weights are in fact taken into account (a test sketch follows this list).

- FIX: Behavior of classes_ attr w/new BaseLibLinear.
  BaseLibLinear and derived classes no longer have a "classes_" attribute until fit() has been called.
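
Following the "Generalize more tests" commit message above, a hypothetical test sketch of the idea; it assumes LinearSVC.fit() accepts sample_weight, and the data, labels, and weight values are illustrative rather than taken from the PR:

```python
# A non-separable outlier is misclassified under uniform weights, but wins
# once its sample_weight is large enough to dominate the hinge loss.
import numpy as np
from sklearn.svm import LinearSVC

X = np.array([[-2.0, -1.0], [-1.0, -2.0], [-1.5, -1.5],   # class 0 cluster
              [2.0, 1.0], [1.0, 2.0], [1.5, 1.5],          # class 1 cluster
              [-3.0, -3.0]])                                # outlier labelled 1
y = np.array([0, 0, 0, 1, 1, 1, 1])

clf = LinearSVC(C=1.0).fit(X, y)
assert clf.predict([[-3.0, -3.0]])[0] == 0   # outlier lost under uniform weights

sw = np.ones(len(y))
sw[-1] = 100.0                                # heavily up-weight the outlier
clf = LinearSVC(C=1.0).fit(X, y, sample_weight=sw)
assert clf.predict([[-3.0, -3.0]])[0] == 1   # now the outlier is respected
```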
@coveralls commented Jul 29, 2014

Coverage increased (+0.0%) when pulling 3818dfc on HapeMask:liblinear_weights into cbcae04 on scikit-learn:master.

@HapeMask commented Jul 29, 2014

I also just saw that a new CI build was started every time I updated my local branch. Sorry about that... I'll make sure to only push to my branch at important intervals in the future.

@coveralls commented Jul 29, 2014

Coverage increased (+0.0%) when pulling 3818dfc on HapeMask:liblinear_weights into cbcae04 on scikit-learn:master.

@calmez commented Oct 17, 2014

@HapeMask @jnothman What is the current status on this?

@HapeMask commented Oct 17, 2014

I'd need to rebase my branch to fix the merge conflicts, but perhaps it would be better to close this PR for the time being until there is more interest in LibLinear + sample weights? I won't have a chance to make the required updates until a few weeks from now either way.

@calmez commented Oct 20, 2014

I started working on this, but it's true that this merge is absolutely not straightforward. I don't want to open a new PR right now since I still have 90 tests failing (probably just 4-5 root causes) that I need to fix first. You can find my WIP at https://github.com/liquidm/scikit-learn/tree/liblinear_weights.

I wondered why there is no base class for liblinear anymore in svm/base.py on current master, as there is for libsvm. Was there any particular reason to move the behavior to functions?

@larsmans (Owner) commented Oct 20, 2014

LogisticRegression was decoupled from Liblinear because it now has multiple solvers. It doesn't really make sense to make it a Liblinear base class anymore.

@jnothman (Owner) commented Oct 20, 2014

The model is now that LinearSVC and LogisticRegression call a shared fit function.

@seanv507 commented Dec 4, 2014

LogisticRegression and sample_weights

I am trying to work on this and have come to a problem. See HapeMask above:

> The BaggingClassifier tests w/LogisticRegression exposed a case where if you have a class with all instances' sample_weight=0, the internal LibLinear model and the sklearn class will disagree on how many classes the estimator was trained on. This is because the patched LibLinear/LibSVM both remove any training samples with 0 weight.

To summarise, liblinear/libsvm and the wrapping scikit-learn class end up with different numbers of target classes, causing e.g. probability calculations to fail. We have two potential solutions:

A) make the scikit-learn objects consistent with liblinear (by removing such empty classes), or
B) somehow expand the output of liblinear/libsvm to add these dummy classes back in.

HapeMask's solution for SVC was A) (HapeMask/scikit-learn@7f497ef), namely remove any classes with sample weight 0 from LinearSVC.

This doesn't work so well for LogisticRegression (and LogisticRegressionCV). Firstly, the liblinear fit is now in a function (so it doesn't have access to class members); secondly, there are multiple solvers. So what should one aim to do?

a) just fix liblinear logistic regression, by removing any such dummy classes in the LogisticRegression class;
b) remove classes with sample_weight=0 (in the sklearn class) for ALL solvers (not just liblinear);
c) change the liblinear raw output to re-add the empty class [haven't investigated this].

I have been pursuing option a), but wanted to make sure this would be the accepted solution. Basically, I am keen to have liblinear + sample weights in official scikit-learn. (A sketch of the class-filtering approach follows below.)
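
A minimal sketch of the filtering behind options a)/b): drop classes whose total sample weight is zero before handing the data to the solver, so the sklearn wrapper and the liblinear/libsvm backend agree on the class count. The function name and return signature are illustrative only.

```python
import numpy as np

def drop_zero_weight_classes(X, y, sample_weight):
    """Restrict X, y, sample_weight to classes with positive total weight."""
    X = np.asarray(X)
    y = np.asarray(y)
    sample_weight = np.asarray(sample_weight, dtype=float)

    classes = np.unique(y)
    # total weight per class
    totals = np.array([sample_weight[y == c].sum() for c in classes])
    kept_classes = classes[totals > 0]

    mask = np.isin(y, kept_classes)
    return X[mask], y[mask], sample_weight[mask], kept_classes

# Example: class 2 disappears because all of its samples have zero weight.
X = np.arange(12).reshape(6, 2)
y = np.array([0, 0, 1, 1, 2, 2])
sw = np.array([1.0, 1.0, 1.0, 1.0, 0.0, 0.0])
Xf, yf, swf, kept = drop_zero_weight_classes(X, y, sw)
print(kept)  # [0 1]
```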

@charlesmartin14 commented Mar 22, 2015

Can you tell me the status of this? Has there been any progress on updating the existing LinearSVC API to add sample weights?

@sean-violante-lendico commented Mar 23, 2015

Hi Charles, I am no longer actively working on this (I changed company), but I would be happy to complete it. As I mentioned in my previous message, HapeMask's solution works for LinearSVC. The problem is that, AFAIK, the way the liblinear code has been turned into a function (from a class) breaks the 'design' of sklearn (where do all the supplementary functions go for converting labels to classes etc.?). So if I got some feedback from the principal developers on the approved solution, I would be happy to finish.

Essentially the code works in a standard setting, but the bagging test throws up a special case: when the number of classes specified via the scikit-learn class differs from the number of 'effective' classes in liblinear (classes whose instances have nonzero sample weight).

@MechCoder (Owner) commented Aug 30, 2015

Do you mind if I pick this up?

cc @GaelVaroquaux

@sean-violante-lendico commented Aug 30, 2015

@MechCoder, fine by me.

@jnothman (Owner) commented Jun 23, 2016

Thanks @HapeMask for your effort. C support for sample weights was provided in #5274, and while the public interface is still being patched, I think this can be closed. Sorry it didn't end up getting merged directly.

@jnothman closed this Jun 23, 2016
