# scikit-learn/scikit-learn


# [WIP] Label power set multilabel classification strategy #2461

Open
wants to merge 1 commit into from
+195 −13

### 8 participants

Owner

Add one of the simplest and most common multi-label classification strategies, which uses
a multi-class classifier as a base estimator.

The core code is functional, but there are still things to do:

• Write narrative doc about LP
Owner

This PR is ready for review.

doc/modules/multiclass.rst
```diff
@@ -269,3 +268,42 @@ Below is an example of multiclass learning using Output-Codes::
 .. [3] "The Elements of Statistical Learning", Hastie T., Tibshirani R., Friedman J., page 606 (second-edition) 2008.
+
+
+Label power set
+===============
+
+:class:`LabelPowerSetClassifier` is problem transformation method and
```

**arjoly** added a note (Sep 26, 2013): a problem ...

**amueller** added a note (Nov 4, 2013): If you explain this problem transformation here, maybe we should also be more explicit about the transformation performed by OvR and in particular OvO for multi-label? Actually this remark applies to the whole module ^^
doc/modules/multiclass.rst
```diff
@@ -269,3 +268,42 @@ Below is an example of multiclass learning using Output-Codes::
 .. [3] "The Elements of Statistical Learning", Hastie T., Tibshirani R., Friedman J., page 606 (second-edition) 2008.
+
+
+Label power set
+===============
+
+:class:`LabelPowerSetClassifier` is problem transformation method and
+constructs one classifier on a multi-class problem, where each class is a label
+set. At prediction time, the classifier predict the most relevant
+class which is translated to the corresponding label set.
+Since the number of generated class is equal to O(min(2^n_labels), n_samples),
```

**arjoly** added a note (Sep 26, 2013): classes
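The transformation described in this doc section can be sketched in a few lines (an illustrative example with made-up data, not code from the PR): each distinct label set is mapped to one multi-class target.

```python
import numpy as np

# made-up multilabel indicator matrix: 4 samples, 3 labels
Y = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 0, 1],
              [1, 1, 0]])

# encode each label set as an integer (weight 2**j for label j),
# then map each distinct code to a class id
codes = Y.dot(np.exp2(np.arange(Y.shape[1])))          # [5., 2., 5., 3.]
label_set_codes, y_multiclass = np.unique(codes, return_inverse=True)

print(y_multiclass)          # one class per distinct label set: [2 0 2 1]
print(len(label_set_codes))  # 3 generated classes
```

Samples sharing a label set get the same class, so the multi-class problem has at most min(2^n_labels, n_samples) classes, as the text states.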
sklearn/multiclass.py
```diff
@@ -2,10 +2,11 @@
 Multiclass and multilabel classification strategies
 ===================================================
-This module implements multiclass learning algorithms:
+This module implements multiclass / multilabel learning algorithms:
     - one-vs-the-rest / one-vs-all
```

**arjoly** added a note (Sep 26, 2013): / binary relevance

**amueller** added a note (Nov 4, 2013): I would replace the `/` by "and".

**vene** added a note (Jul 16, 2014): missing space after multiclass
sklearn/multiclass.py
```diff
@@ -603,3 +607,94 @@ def predict(self, X):
         return predict_ecoc(self.estimators_, self.classes_,
                             self.code_book_, X)
+
+
+class LabelPowerSetClassifier(BaseEstimator, ClassifierMixin,
+                              MetaEstimatorMixin):
+    """Label power set multi-label classification strategy
+
+    Label power set is problem transformation method. The multi-label
```

**arjoly** added a note (Sep 26, 2013): a problem
sklearn/multiclass.py
((57 lines not shown))
```diff
+        Returns
+        -------
+        self
+        """
+        # Binarize y
+        self.label_binarizer_ = LabelBinarizer()
+        y_binary = self.label_binarizer_.fit_transform(y)
+
+        # Code in the label power set
+        encoding_matrix = np.exp2(np.arange(y_binary.shape[1])).T
+        y_coded = safe_sparse_dot(y_binary, encoding_matrix, dense_output=True)
+
+        self.estimator.fit(X, y_coded)
+
+    def predict(self, X):
+        """Predict multi-class targets using underlying estimators.
```

**arjoly** added a note (Sep 26, 2013): Predict the classification using the underlying estimators
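The fit body in the hunk above encodes each binary label row as a single integer via a dot product with powers of two. A minimal standalone sketch of that encoding (variable names simplified; an illustration, not the PR's exact code):

```python
import numpy as np

# binarized multilabel target: 3 samples, 3 labels
Y = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 0, 1]])

# weight 2**j for label j, so each label set maps to a unique integer
encoding = np.exp2(np.arange(Y.shape[1]))  # [1., 2., 4.]
y_coded = Y.dot(encoding)

print(y_coded)  # [5. 2. 5.] -- identical label sets share a code
```

The integer code is then what the underlying multi-class estimator is fit on.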
sklearn/multiclass.py
((66 lines not shown))
```diff
+        encoding_matrix = np.exp2(np.arange(y_binary.shape[1])).T
+        y_coded = safe_sparse_dot(y_binary, encoding_matrix, dense_output=True)
+
+        self.estimator.fit(X, y_coded)
+
+    def predict(self, X):
+        """Predict multi-class targets using underlying estimators.
+
+        Parameters
+        ----------
+        X : {array-like, sparse matrix}, shape = [n_samples, n_features]
+            Input data.
+
+        Returns
+        -------
+        y : array-like, shape = [n_samples, n_outputs]
```

**arjoly** added a note (Sep 26, 2013): [n_samples] or [n_samples, n_outputs]
sklearn/multiclass.py
((73 lines not shown))
```diff
+
+        Parameters
+        ----------
+        X : {array-like, sparse matrix}, shape = [n_samples, n_features]
+            Input data.
+
+        Returns
+        -------
+        y : array-like, shape = [n_samples, n_outputs]
+            Predicted multilabel target.
+        """
+        y_coded = self.estimator.predict(X)
+        n_classes = len(self.label_binarizer_.classes_)
+        n_samples = X.shape[0]
+
+        y_decoded = np.empty((X.shape[0], n_classes), dtype=np.int)
```

**arjoly** added a note (Sep 26, 2013): X.shape[0] => n_samples
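For the reverse direction, the integer codes can be expanded back into indicator rows. One vectorized way to do it (an illustrative sketch using bit shifts, not the PR's loop-based code):

```python
import numpy as np

y_coded = np.array([5, 2, 4])  # hypothetical predicted codes
n_classes = 3

# bit j of each code tells whether label j is in the predicted set
y_decoded = (y_coded[:, None] >> np.arange(n_classes)) & 1

print(y_decoded)
# [[1 0 1]
#  [0 1 0]
#  [0 0 1]]
```

Since 5 = 2^0 + 2^2, 2 = 2^1, and 4 = 2^2, each row recovers exactly the labels whose powers of two sum to the code.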
Owner

This is so funny, @arjoly are you reviewing your own pull request? :)

Owner

Yep ;-)

Owner

@rsivapr ENH means enhancement(s)

@arjoly That was quick! :) I literally deleted that post within a second when I realized it must mean that.

Owner

I receive an email whenever you post a message on my pull request. ;-)

Owner

(ping @glouppe )

doc/modules/multiclass.rst
((11 lines not shown))
```diff
+Once the output is tranformed, the :class:`LabelPowerSetClassifier
+constructs one classifier on the multi-class task. At prediction time, the
+classifier predicts the most relevant
+class which is translated to the corresponding label set.
+Since the number of generated classes is equal to
+O(min(2^n_labels), n_samples),
+this method suffers from the combinatorial explosion of possible label sets.
+However, this allows to take into account the label correlation contrarily
+to One-Vs-The-Rest, also called binary relevance.
+
+
+Multiclass learning
+-------------------
+
+Label power set can be used for multi-class classification, but this is
+equivalent to a nop.
```

**amueller** added a note (Nov 4, 2013): not sure everybody knows what you mean by nop ;)
sklearn/multiclass.py
((8 lines not shown))
```diff
+    """Label power set multi-label classification strategy
+
+    Label power set is a problem transformation method. The multi-label
+    classification task is transformed into a multi-class classification
+    task: each label set presents in the training set
+    is associated to a class. The underlying estimator will learn to predict
+    the class associated to each label set.
+
+    The maximum number of class is bounded by the number of samples and
+    the number of possible label sets in the training set. This strategy
+    allows to take into account the correlation between the labels contrarily
+    to one-vs-the-rest, also called binary relevance.
+
+    Parameters
+    ----------
+    estimator: classifier estimator object
```

**amueller** added a note (Nov 4, 2013): space before colon
sklearn/multiclass.py
```diff
@@ -603,3 +607,94 @@ def predict(self, X):
         return predict_ecoc(self.estimators_, self.classes_,
                             self.code_book_, X)
+
+
+class LabelPowerSetClassifier(BaseEstimator, ClassifierMixin,
+                              MetaEstimatorMixin):
+    """Label power set multi-label classification strategy
+
+    Label power set is a problem transformation method. The multi-label
+    classification task is transformed into a multi-class classification
+    task: each label set presents in the training set
+    is associated to a class. The underlying estimator will learn to predict
+    the class associated to each label set.
+
+    The maximum number of class is bounded by the number of samples and
```

**amueller** added a note (Nov 4, 2013): "classes" I believe.
sklearn/multiclass.py
((74 lines not shown))
```diff
+        Parameters
+        ----------
+        X : {array-like, sparse matrix}, shape = [n_samples, n_features]
+            Input data.
+
+        Returns
+        -------
+        y : array-like, shape = [n_samples] or [n_samples, n_outputs]
+            Predicted multilabel target.
+        """
+        y_coded = self.estimator.predict(X)
+        n_classes = len(self.label_binarizer_.classes_)
+        n_samples = X.shape[0]
+
+        y_decoded = np.empty((n_samples, n_classes), dtype=np.int)
+        for i in range(n_samples):
```

**amueller** added a note (Nov 4, 2013): That might be a stupid question but have you tried vectorizing? Also, I haven't really understood why you need the second loop below. And isn't `label` a string?

**jnothman** added a note (Dec 8, 2013): If we limit this to < 32 initial classes (which is very reasonable!), you can use:

```python
def _uint_bits():
    y_decoded = np.unpackbits(y_coded.astype('>u4').view('u1')).reshape((-1, 32))[:, -n_classes:][:, ::-1]
```

though that's a bit obfuscated. Where sparse is necessary, can do something like:

```python
y_coded = y_coded.astype(np.uint32)
indices = array.array('i')
indptr = array.array('i', [0])
mask = np.array(1, dtype=np.uint32)
for i in range(n_classes):
    indices.extend(np.flatnonzero(mask & y_coded))
    indptr.append(len(indices))
    mask *= 2
data = np.empty(len(indices), dtype=np.uint8)
data.fill(1)
y_decoded = sp.csc_matrix((data, indices, indptr),
                          shape=(n_samples, n_classes))
```

(or for `n_samples >> 2**n_classes`, you could binary encode `arange(2**n_classes)`, transform to csr, then extract rows `y_coded`.)

**arjoly** added a note (Dec 10, 2013): Finally, I have found a simple way to perform the decoding with only numpy operations and without bounding the maximal number of classes.

**jnothman** added a note (Dec 10, 2013): Nice.

**arjoly** added a note (Dec 10, 2013): FIX second condition. It needs better tests.

**vene** added a note (Jul 16, 2014): What is this for, the case where it's really binary classification? I'd add a comment.
sklearn/tests/test_multiclass.py
```diff
@@ -338,3 +342,42 @@ def test_ecoc_gridsearch():
     cv.fit(iris.data, iris.target)
     best_C = cv.best_estimator_.estimators_[0].C
     assert_true(best_C in Cs)
+
+
+def test_lps_binary():
```

**amueller** added a note (Nov 4, 2013): I would add "shape" to the test name as it only tests for shapes... But you could also test for results, right? Fitting an SVM and a wrapped SVM?

**arjoly** added a note (Dec 10, 2013): The test is now stronger. Thanks!
Owner

My first question would be: does this ever work? And can we get an example of this vs OVR with one dataset where OVR works and one where this works?
Also, I think we should warn the user that this can only produce label combinations that actually exist in the training set (making your remark about overfitting a bit more explicit maybe?)

Otherwise this looks good, good job :)

I am not entirely happy with the testing as the real use case is only tested via a hard-coded result. I think I would like it best if the transformation would be done by hand there for a small problem and in an obvious way and compare against the estimator. But maybe that is overkill. wdyt?
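The hand-transformation test suggested above could look roughly like this (a sketch with made-up toy data; `LabelPowerSetClassifier` itself is not used since it only exists in this PR, so the transformation is done by hand around a plain estimator):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# toy problem: 4 samples, 2 labels
X = np.array([[0.0], [1.0], [2.0], [3.0]])
Y = np.array([[1, 0], [1, 0], [0, 1], [1, 1]])

# transform by hand: each label set -> integer class via powers of two
y_manual = Y.dot(np.array([1, 2]))          # [1, 1, 2, 3]

clf = DecisionTreeClassifier(random_state=0).fit(X, y_manual)
pred = clf.predict(X)

# decode predictions back to indicator rows and compare with Y;
# a tree memorizes this tiny separable set, so they should match
decoded = (pred[:, None] >> np.arange(2)) & 1
print((decoded == Y).all())
```

A real test would fit the meta-estimator on `(X, Y)` and assert its predictions equal `decoded`, making the transformation obvious in the test body.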

sklearn/multiclass.py
```diff
@@ -603,3 +607,94 @@ def predict(self, X):
         return predict_ecoc(self.estimators_, self.classes_,
                             self.code_book_, X)
+
+
+class LabelPowerSetClassifier(BaseEstimator, ClassifierMixin,
+                              MetaEstimatorMixin):
+    """Label power set multi-label classification strategy
+
+    Label power set is a problem transformation method. The multi-label
+    classification task is transformed into a multi-class classification
+    task: each label set presents in the training set
+    is associated to a class. The underlying estimator will learn to predict
+    the class associated to each label set.
+
+    The maximum number of class is bounded by the number of samples and
+    the number of possible label sets in the training set. This strategy
+    allows to take into account the correlation between the labels contrarily
```

**jnothman** added a note (Dec 8, 2013): "allows to" -> "may". "contrarily to one-vs-the-rest" -> "unlike one-vs-rest". Also, please add a warning that complexity blows out exponentially with the number of classes, restricting its use to ?<=10.
Owner

@amueller @jnothman Thanks for the review !!! I will try to find some time to work on all your comments.

Owner

> My first question would be: does this ever work? And can we get an example of this vs OVR with one dataset where OVR works and one where this works?

Yes, this works. For instance on the yeast dataset, the LPS meta-estimator shines on several metrics compared to OvR:

```python
{'hamming_loss': {'dummy': 0.23298021498675806,
                  'lps svm': 0.25775042841564105,
                  'ova svm': 0.23298021498675806},
 'jaccard': {'dummy': 0.33653822852296145,
             'lps svm': 0.43881512586528965,
             'ova svm': 0.33653822852296145},
 'macro-f1': {'dummy': 0.12221934801958166,
              'lps svm': 0.23575486447032259,
              'ova svm': 0.12221934801958166},
 'micro-f1': {'dummy': 0.47828362114076395,
              'lps svm': 0.56270648870093831,
              'ova svm': 0.47828362114076395},
 'samples-f1': {'dummy': 0.45689163011954953,
                'lps svm': 0.547173739867307,
                'ova svm': 0.45689163011954953},
 'subset_accuracy': {'dummy': 0.017448200654307525,
                     'lps svm': 0.14612868047982552,
                     'ova svm': 0.017448200654307525},
 'weighted-f1': {'dummy': 0.30083303670803041,
                 'lps svm': 0.43848139536413128,
                 'ova svm': 0.30083303670803041}}
```

Should I add the script to the examples?

Owner

> I am not entirely happy with the testing as the real use case is only tested via a hard-coded result. I think I would like it best if the transformation would be done by hand there for a small problem and in an obvious way and compare against the estimator. But maybe that is overkill. wdyt?

Do you suggest to create a `LabelPowerSetTransformer`?

Owner

Hm strange, common tests are failing and do not detect that this is a meta-estimator.

referenced this pull request
Open

### data-independent CV iterators #2904

Owner

I made the common tests pass in my branch https://github.com/vene/scikit-learn/tree/labelpowerset
GitHub won't let me send you a PR, though.

Owner

> Do you suggest to create a LabelPowerSetTransformer?

I think the suggestion was to make a toy problem and compare the (score? model coef?) with what a manually encoded transformed `y` would bring.

doc/modules/multiclass.rst
((10 lines not shown))
```diff
+label set of the training is associated to one class.
+Once the output is tranformed, the :class:`LabelPowerSetClassifier
+constructs one classifier on the multi-class task. At prediction time, the
+classifier predicts the most relevant
+class which is translated to the corresponding label set.
+Since the number of generated classes is equal to
+O(min(2^n_labels), n_samples),
+this method suffers from the combinatorial explosion of possible label sets.
+However, this allows to take into account the label correlation contrarily
+to One-Vs-The-Rest, also called binary relevance.
+
+
+Multiclass learning
+-------------------
+
+Label power set can be used for multi-class classification, but this have
```

**vene** added a note (Jul 16, 2014): s/this have/this would have/ or /this has/ What's up with this section, is it really necessary? Is it for consistency with OvR?

**arjoly** added a note (Jul 17, 2014): Consistency and warning user, but I can remove it.
doc/modules/multiclass.rst
((16 lines not shown))
```diff
+O(min(2^n_labels), n_samples),
+this method suffers from the combinatorial explosion of possible label sets.
+However, this allows to take into account the label correlation contrarily
+to One-Vs-The-Rest, also called binary relevance.
+
+
+Multiclass learning
+-------------------
+
+Label power set can be used for multi-class classification, but this have
+no effect.
+
+Multilabel learning
+-------------------
+
+Below is an example of multi-class learning using
```

**vene** added a note (Jul 16, 2014): I guess you mean multi-label
sklearn/multiclass.py
((6 lines not shown))
```diff
+class LabelPowerSetClassifier(BaseEstimator, ClassifierMixin,
+                              MetaEstimatorMixin):
+    """Label power set multi-label classification strategy
+
+    Label power set is a problem transformation method. The multi-label
+    classification task is transformed into a multi-class classification
+    task: each label set presents in the training set
+    is associated to a class. The underlying estimator will learn to predict
+    the class associated to each label set.
+
+    The maximum number of classes is bounded by the number of samples and
+    the number of possible label sets in the training set. Thus leading
+    to a maximum of O(min(2^n_labels, n_samples)) generated classes.
+    This method suffers from the combinatorial explosion of possible label sets.
+    However, this strategy may take into account the correlation between the
+    labels unlike one-vs-the-rest, also called binary relevance.
```

**vene** added a note (Jul 16, 2014): the "also called..." part breaks the flow of this sentence.
sklearn/multiclass.py
((30 lines not shown))
```diff
+    ----------
+    `label_binarizer_` : LabelBinarizer object
+        Object used to transform the classification task into a multilabel
+        classification task.
+
+    References
+    ----------
+
+    .. [1] Tsoumakas, G., & Katakis, I. (2007). "Multi-label classification:
+        An overview." International Journal of Data Warehousing and Mining
+        (IJDWM), 3(3), 1-13.
+
+    """
+    def __init__(self, estimator):
+        self.estimator = estimator
+
```

**vene** added a note (Jul 16, 2014): why the blank line?
Owner

If the multiclass estimator used is OvR, the explosion of states can lead to very slow training time. I think there should be a way to warn the user in this case. E.g. raise a warning if the number of generated classes seems big.
Or maybe have a verbose mode that always announces how many classes are generated.
Or add a one-liner in the documentation that the users can apply to count how many classes this method would generate.

WDYT?
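The documentation one-liner suggested above, letting users count how many classes this method would generate before fitting, might look like this (a hypothetical snippet, not part of the PR):

```python
import numpy as np

# made-up multilabel indicator matrix: 4 samples, 3 labels
Y = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 0, 1],
              [1, 1, 0]])

# classes generated by label power set = number of distinct label sets
n_generated = len(np.unique(Y.dot(np.exp2(np.arange(Y.shape[1])))))
print(n_generated)  # 3
```

If `n_generated` is close to `n_samples` (or huge because of many labels), the user knows in advance that the transformed multi-class problem will be expensive, which addresses the slow-OvR concern raised here.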

Owner

Should I create a multilabel module instead of multiclass?

Owner

> Should I create a multilabel module instead of multiclass?

I have the impression that these two should go in the same subpackage (not sure how to name it, maybe simply multiclass) and in different files in this subpackage.

I am suggesting this to fight the increase in breadth of our package
tree.

It would also be nice to do that for other modules :-)

Owner

Rebased and squashed everything

 arjoly `ENH add label power set meta-estimator for multilabel classification` `b5f1a9c`
Owner

rebase on top of master

Coverage increased (+0.01%) when pulling b5f1a9c on arjoly:labelpowerset into 0807e19 on scikit-learn:master.

Owner

@amueller and @vene Is it good for you?

Owner

Writeup of our discussion:

• add a test for the zero-label class being handled correctly in predict_proba
• marginalize to get p(label | x) (btw how would this relate to what OvR gets?)

Apart from this, LGTM.

Owner

> marginalize to get p(label | x) (btw how would this relate to what OvR gets?)

There is an API discrepancy for those classes.

Owner

Isn't there some way of renormalizing the output of OvR to be comparable?

Owner

> Isn't there some way of renormalizing the output of OvR to be comparable?

Apparently I replied to you in the wrong place. Yes, there is a way: the one we discussed today. I am working on it.

I need to add tests for the case where some label sets are missing, i.e. the label sets presented at fit time are not the same as those seen at predict time.

Owner

`LabelPowerSetClassifier` doesn't have a `classes_` attribute. Should it have one?

Owner

> LabelPowerSetClassifier doesn't have a classes_ attribute. Should it have one?

Yes, it should have one.

Switching from MRG to WIP since I am progressing slowly on this.

changed the title from [MRG] Label power set multilabel classification strategy to [WIP] Label power set multilabel classification strategy
referenced this pull request
Open

### [WIP] Classifier Chain for multi-label problems #3727

Commits on Jul 19, 2014
1. arjoly authored