[WIP] Label power set multilabel classification strategy #2461

Open
wants to merge 1 commit into scikit-learn:master from arjoly:labelpowerset

8 participants

@arjoly
Owner

Add one of the simplest and most common multi-label classification strategies, which uses
a multi-class classifier as a base estimator (a rough sketch of the idea follows
the to-do list below).

The core code is functional, but there are still things to do:

  • Add some words about binary relevance in the OvR narrative doc
  • Write narrative doc about LP
  • Add some references
  • Add some regression tests
  • "making your remark about overfitting a bit more explicit maybe?"
@arjoly
Owner

This PR is ready for review.

doc/modules/multiclass.rst
@@ -269,3 +268,42 @@ Below is an example of multiclass learning using Output-Codes::
.. [3] "The Elements of Statistical Learning",
Hastie T., Tibshirani R., Friedman J., page 606 (second-edition)
2008.
+
+
+Label power set
+===============
+
+:class:`LabelPowerSetClassifier` is problem transformation method and
@arjoly Owner
arjoly added a note

a problem ...

@amueller Owner
amueller added a note

If you explain this problem transformation here, maybe we should also be more explicit about the transformation performed by OvR, and in particular by OvO for multi-label? Actually this remark applies to the whole module ^^

doc/modules/multiclass.rst
@@ -269,3 +268,42 @@ Below is an example of multiclass learning using Output-Codes::
.. [3] "The Elements of Statistical Learning",
Hastie T., Tibshirani R., Friedman J., page 606 (second-edition)
2008.
+
+
+Label power set
+===============
+
+:class:`LabelPowerSetClassifier` is problem transformation method and
+constructs one classifier on a multi-class problem, where each class is a label
+set. At prediction time, the classifier predict the most relevant
+class which is translated to the corresponding label set.
+Since the number of generated class is equal to O(min(2^n_labels), n_samples),
@arjoly Owner
arjoly added a note

classes

sklearn/multiclass.py
@@ -2,10 +2,11 @@
Multiclass and multilabel classification strategies
===================================================
-This module implements multiclass learning algorithms:
+This module implements multiclass / multilabel learning algorithms:
- one-vs-the-rest / one-vs-all
@arjoly Owner
arjoly added a note

/ binary relevance

@amueller Owner
amueller added a note

I would replace the / by "and".

@vene Owner
vene added a note

missing space after multiclass

sklearn/multiclass.py
@@ -603,3 +607,94 @@ def predict(self, X):
return predict_ecoc(self.estimators_, self.classes_,
self.code_book_, X)
+
+
+class LabelPowerSetClassifier(BaseEstimator, ClassifierMixin,
+ MetaEstimatorMixin):
+ """Label power set multi-label classification strategy
+
+ Label power set is problem transformation method. The multi-label
@arjoly Owner
arjoly added a note

a problem

doc/modules/multiclass.rst
@@ -269,3 +268,42 @@ Below is an example of multiclass learning using Output-Codes::
.. [3] "The Elements of Statistical Learning",
Hastie T., Tibshirani R., Friedman J., page 606 (second-edition)
2008.
+
+
+Label power set
+===============
+
+:class:`LabelPowerSetClassifier` is problem transformation method and
+constructs one classifier on a multi-class problem, where each class is a label
@arjoly Owner
arjoly added a note

multi-class task

sklearn/multiclass.py
((57 lines not shown))
+ Returns
+ -------
+ self
+ """
+ # Binarize y
+ self.label_binarizer_ = LabelBinarizer()
+ y_binary = self.label_binarizer_.fit_transform(y)
+
+ # Code in the label power set
+ encoding_matrix = np.exp2(np.arange(y_binary.shape[1])).T
+ y_coded = safe_sparse_dot(y_binary, encoding_matrix, dense_output=True)
+
+ self.estimator.fit(X, y_coded)
+
+ def predict(self, X):
+ """Predict multi-class targets using underlying estimators.
@arjoly Owner
arjoly added a note

Predict the classification using the underlying estimators

sklearn/multiclass.py
((66 lines not shown))
+ encoding_matrix = np.exp2(np.arange(y_binary.shape[1])).T
+ y_coded = safe_sparse_dot(y_binary, encoding_matrix, dense_output=True)
+
+ self.estimator.fit(X, y_coded)
+
+ def predict(self, X):
+ """Predict multi-class targets using underlying estimators.
+
+ Parameters
+ ----------
+ X : {array-like, sparse matrix}, shape = [n_samples, n_features]
+ Input data.
+
+ Returns
+ -------
+ y : array-like, shape = [n_samples, n_outputs]
@arjoly Owner
arjoly added a note

[n_samples] or [n_samples, n_outputs]

sklearn/multiclass.py
((73 lines not shown))
+
+ Parameters
+ ----------
+ X : {array-like, sparse matrix}, shape = [n_samples, n_features]
+ Input data.
+
+ Returns
+ -------
+ y : array-like, shape = [n_samples, n_outputs]
+ Predicted multilabel target.
+ """
+ y_coded = self.estimator.predict(X)
+ n_classes = len(self.label_binarizer_.classes_)
+ n_samples = X.shape[0]
+
+ y_decoded = np.empty((X.shape[0], n_classes), dtype=np.int)
@arjoly Owner
arjoly added a note

X.shape[0] => n_samples

@glouppe
Owner

This is so funny, @arjoly are you reviewing your own pull request? :)

@arjoly
Owner

Yep ;-)

@arjoly
Owner

@rsivapr ENH means enhancement(s)

@rsivapr

@arjoly That was quick! :) I literally deleted that post within a second when I realized it must mean that.

@arjoly
Owner

I received an email when you posted a message on my pull request. ;-)

@arjoly
Owner

(ping @glouppe )

doc/modules/multiclass.rst
((11 lines not shown))
+Once the output is tranformed, the :class:`LabelPowerSetClassifier
+constructs one classifier on the multi-class task. At prediction time, the
+classifier predicts the most relevant
+class which is translated to the corresponding label set.
+Since the number of generated classes is equal to
+O(min(2^n_labels), n_samples),
+this method suffers from the combinatorial explosion of possible label sets.
+However, this allows to take into account the label correlation contrarily
+to One-Vs-The-Rest, also called binary relevance.
+
+
+Multiclass learning
+-------------------
+
+Label power set can be used for multi-class classification, but this is
+equivalent to a nop.
@amueller Owner
amueller added a note

not sure everybody knows what you mean by nop ;)

sklearn/multiclass.py
((8 lines not shown))
+ """Label power set multi-label classification strategy
+
+ Label power set is a problem transformation method. The multi-label
+ classification task is transformed into a multi-class classification
+ task: each label set presents in the training set
+ is associated to a class. The underlying estimator will learn to predict
+ the class associated to each label set.
+
+ The maximum number of class is bounded by the number of samples and
+ the number of possible label sets in the training set. This strategy
+ allows to take into account the correlation between the labels contrarily
+ to one-vs-the-rest, also called binary relevance.
+
+ Parameters
+ ----------
+ estimator: classifier estimator object
@amueller Owner
amueller added a note

space before colon

sklearn/multiclass.py
@@ -603,3 +607,94 @@ def predict(self, X):
return predict_ecoc(self.estimators_, self.classes_,
self.code_book_, X)
+
+
+class LabelPowerSetClassifier(BaseEstimator, ClassifierMixin,
+ MetaEstimatorMixin):
+ """Label power set multi-label classification strategy
+
+ Label power set is a problem transformation method. The multi-label
+ classification task is transformed into a multi-class classification
+ task: each label set presents in the training set
+ is associated to a class. The underlying estimator will learn to predict
+ the class associated to each label set.
+
+ The maximum number of class is bounded by the number of samples and
@amueller Owner
amueller added a note

"classes" I believe.

sklearn/multiclass.py
((74 lines not shown))
+ Parameters
+ ----------
+ X : {array-like, sparse matrix}, shape = [n_samples, n_features]
+ Input data.
+
+ Returns
+ -------
+ y : array-like, shape = [n_samples] or [n_samples, n_outputs]
+ Predicted multilabel target.
+ """
+ y_coded = self.estimator.predict(X)
+ n_classes = len(self.label_binarizer_.classes_)
+ n_samples = X.shape[0]
+
+ y_decoded = np.empty((n_samples, n_classes), dtype=np.int)
+ for i in range(n_samples):
@amueller Owner
amueller added a note

That might be a stupid question but have you tried vectorizing? Also, I haven't really understood why you need the second loop below. And isn't label a string?

@jnothman Owner
jnothman added a note

If we limit this to < 32 initial classes (which is very reasonable!), you can use:

y_decoded = np.unpackbits(y_coded.astype('>u4').view('u1')).reshape((-1, 32))[:, -n_classes:][:, ::-1]

though that's a bit obfuscated.

Where sparse output is necessary, you can do something like:

import array

import numpy as np
import scipy.sparse as sp

y_coded = y_coded.astype(np.uint32)
indices = array.array('i')
indptr = array.array('i', [0])
mask = np.array(1, dtype=np.uint32)
for i in range(n_classes):
    indices.extend(np.flatnonzero(mask & y_coded))
    indptr.append(len(indices))
    mask *= 2

data = np.empty(len(indices), dtype=np.uint8)
data.fill(1)
y_decoded = sp.csc_matrix((data, indices, indptr), shape=(n_samples, n_classes))

(or for n_samples >> 2**n_classes, you could binary encode arange(2**n_classes), transform to csr, then extract rows y_coded.)

@arjoly Owner
arjoly added a note

Finally, I have found a simple way to perform the decoding with only numpy operations and without bounding the maximal number of classes.

@jnothman Owner
@arjoly Owner
arjoly added a note

FIX second condition. It needs better tests.

@vene Owner
vene added a note

What is this for, the case where it's really binary classification? I'd add a comment.

sklearn/tests/test_multiclass.py
@@ -338,3 +342,42 @@ def test_ecoc_gridsearch():
cv.fit(iris.data, iris.target)
best_C = cv.best_estimator_.estimators_[0].C
assert_true(best_C in Cs)
+
+
+def test_lps_binary():
@amueller Owner
amueller added a note

I would add "shape" to the test name as it only tests for shapes.... But you could also test for results, right? fitting an SVM and a wrapped SVM?

@arjoly Owner
arjoly added a note

The test is now stronger. Thanks !

@amueller
Owner

My first question would be: does this ever work? And can we get an example of this vs OVR with one dataset where OVR works and one where this works?
Also, I think we should warn the user that this can only produce label combinations that actually exist in the training set (making your remark about overfitting a bit more explicit maybe?)

Otherwise this looks good, good job :)

I am not entirely happy with the testing, as the real use case is only tested via a hard-coded result. I think I would like it best if the transformation were done by hand for a small problem, in an obvious way, and compared against the estimator. But maybe that is overkill. wdyt?
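For instance, such a by-hand test might look like the sketch below; the data is illustrative, LabelPowerSetClassifier is the class added in this PR, and the dense conversion guards against a possibly sparse prediction:

import numpy as np
from sklearn.svm import LinearSVC
from sklearn.multiclass import LabelPowerSetClassifier  # added in this PR

X = np.array([[0., 1.], [1., 0.], [1., 1.], [0., 0.]])
Y = np.array([[1, 0], [0, 1], [1, 1], [1, 0]])           # indicator matrix

# Transformation done by hand: encode each label set as one class.
y_by_hand = Y.dot(np.array([1, 2]))                      # -> [1, 2, 3, 1]

lps = LabelPowerSetClassifier(LinearSVC(random_state=0))
lps.fit(X, Y)
svc = LinearSVC(random_state=0).fit(X, y_by_hand)

# Decoding the hand-transformed predictions should match the wrapper.
codes = svc.predict(X).astype(int)
decoded_by_hand = (codes[:, np.newaxis] // np.array([1, 2])) % 2

lps_pred = lps.predict(X)
if hasattr(lps_pred, "toarray"):                         # may be sparse
    lps_pred = lps_pred.toarray()
assert np.array_equal(lps_pred, decoded_by_hand)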

sklearn/multiclass.py
@@ -603,3 +607,94 @@ def predict(self, X):
return predict_ecoc(self.estimators_, self.classes_,
self.code_book_, X)
+
+
+class LabelPowerSetClassifier(BaseEstimator, ClassifierMixin,
+ MetaEstimatorMixin):
+ """Label power set multi-label classification strategy
+
+ Label power set is a problem transformation method. The multi-label
+ classification task is transformed into a multi-class classification
+ task: each label set presents in the training set
+ is associated to a class. The underlying estimator will learn to predict
+ the class associated to each label set.
+
+ The maximum number of class is bounded by the number of samples and
+ the number of possible label sets in the training set. This strategy
+ allows to take into account the correlation between the labels contrarily
@jnothman Owner
jnothman added a note

"allows to" -> "may".
"contrarily to one-vs-the-rest" -> "unlike one-vs-rest".

Also, please add a warning that complexity blows out exponentially with the number of classes, restricting its use to ?<=10.

@arjoly
Owner

@amueller @jnothman Thanks for the review !!! I will try to find some time to work on all your comments.

@arjoly
Owner

My first question would be: does this ever work? And can we get an example of this vs OVR with one dataset where OVR works and one where this works?

Yes, this works. For instance, on the yeast dataset the lps meta-estimator shines on several metrics compared to ovr:

{'hamming_loss': {'dummy': 0.23298021498675806,
                  'lps svm': 0.25775042841564105,
                  'ova svm': 0.23298021498675806},
 'jaccard': {'dummy': 0.33653822852296145,
             'lps svm': 0.43881512586528965,
             'ova svm': 0.33653822852296145},
 'macro-f1': {'dummy': 0.12221934801958166,
              'lps svm': 0.23575486447032259,
              'ova svm': 0.12221934801958166},
 'micro-f1': {'dummy': 0.47828362114076395,
              'lps svm': 0.56270648870093831,
              'ova svm': 0.47828362114076395},
 'samples-f1': {'dummy': 0.45689163011954953,
                'lps svm': 0.547173739867307,
                'ova svm': 0.45689163011954953},
 'subset_accuracy': {'dummy': 0.017448200654307525,
                     'lps svm': 0.14612868047982552,
                     'ova svm': 0.017448200654307525},
 'weighted-f1': {'dummy': 0.30083303670803041,
                 'lps svm': 0.43848139536413128,
                 'ova svm': 0.30083303670803041}}

Should I add the script to the examples?
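Roughly, the comparison script could look like the sketch below; a synthetic dataset stands in for yeast, LabelPowerSetClassifier refers to the class added in this PR, the imports follow the doctest added in this PR, and the metric values will of course differ from the table above:

from sklearn.cross_validation import train_test_split
from sklearn.datasets import make_multilabel_classification
from sklearn.metrics import f1_score, hamming_loss, jaccard_similarity_score
from sklearn.multiclass import LabelPowerSetClassifier, OneVsRestClassifier
from sklearn.svm import LinearSVC

# Synthetic stand-in for the yeast dataset (indicator target matrix).
X, Y = make_multilabel_classification(n_samples=500, return_indicator=True,
                                      random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)

estimators = {"ova svm": OneVsRestClassifier(LinearSVC(random_state=0)),
              "lps svm": LabelPowerSetClassifier(LinearSVC(random_state=0))}

for name, clf in sorted(estimators.items()):
    clf.fit(X_train, Y_train)
    Y_pred = clf.predict(X_test)
    if hasattr(Y_pred, "toarray"):      # predictions might come back sparse
        Y_pred = Y_pred.toarray()
    print("%s  hamming=%.3f  jaccard=%.3f  micro-f1=%.3f"
          % (name, hamming_loss(Y_test, Y_pred),
             jaccard_similarity_score(Y_test, Y_pred),
             f1_score(Y_test, Y_pred, average="micro")))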

@arjoly
Owner

I am not entirely happy with the testing, as the real use case is only tested via a hard-coded result. I think I would like it best if the transformation were done by hand for a small problem, in an obvious way, and compared against the estimator. But maybe that is overkill. wdyt?

Do you suggest creating a LabelPowerSetTransformer?

@arjoly
Owner

Hmm, strange: the common tests are failing and do not detect that this is a meta-estimator.

@vene
Owner

I made the common tests pass in my branch https://github.com/vene/scikit-learn/tree/labelpowerset
Github won't let me send you a PR though.

@vene
Owner

Do you suggest creating a LabelPowerSetTransformer?

I think the suggestion was to make a toy problem and compare the (score? model coef?) with what a manually encoded transformed y would bring.

doc/modules/multiclass.rst
((10 lines not shown))
+label set of the training is associated to one class.
+Once the output is tranformed, the :class:`LabelPowerSetClassifier
+constructs one classifier on the multi-class task. At prediction time, the
+classifier predicts the most relevant
+class which is translated to the corresponding label set.
+Since the number of generated classes is equal to
+O(min(2^n_labels), n_samples),
+this method suffers from the combinatorial explosion of possible label sets.
+However, this allows to take into account the label correlation contrarily
+to One-Vs-The-Rest, also called binary relevance.
+
+
+Multiclass learning
+-------------------
+
+Label power set can be used for multi-class classification, but this have
@vene Owner
vene added a note

s/this have/this would have/ or /this has/
What's up with this section, is it really necessary? Is it for consistency with OvR?

@arjoly Owner
arjoly added a note

Consistency and warning the user, but I can remove it.

doc/modules/multiclass.rst
((16 lines not shown))
+O(min(2^n_labels), n_samples),
+this method suffers from the combinatorial explosion of possible label sets.
+However, this allows to take into account the label correlation contrarily
+to One-Vs-The-Rest, also called binary relevance.
+
+
+Multiclass learning
+-------------------
+
+Label power set can be used for multi-class classification, but this have
+no effect.
+
+Multilabel learning
+-------------------
+
+Below is an example of multi-class learning using
@vene Owner
vene added a note

I guess you mean multi-label

sklearn/multiclass.py
((6 lines not shown))
+class LabelPowerSetClassifier(BaseEstimator, ClassifierMixin,
+ MetaEstimatorMixin):
+ """Label power set multi-label classification strategy
+
+ Label power set is a problem transformation method. The multi-label
+ classification task is transformed into a multi-class classification
+ task: each label set presents in the training set
+ is associated to a class. The underlying estimator will learn to predict
+ the class associated to each label set.
+
+ The maximum number of classes is bounded by the number of samples and
+ the number of possible label sets in the training set. Thus leading
+ to a maximum of O(min(2^n_labels, n_samples)) generated classes.
+ This method suffers from the combinatorial explosion of possible label sets.
+ However, this strategy may take into account the correlation between the
+ labels unlike one-vs-the-rest, also called binary relevance.
@vene Owner
vene added a note

the "also called..." part breaks the flow of this sentence.

sklearn/multiclass.py
((30 lines not shown))
+ ----------
+ `label_binarizer_` : LabelBinarizer object
+ Object used to transform the classification task into a multilabel
+ classification task.
+
+ References
+ ----------
+
+ .. [1] Tsoumakas, G., & Katakis, I. (2007). "Multi-label classification:
+ An overview." International Journal of Data Warehousing and Mining
+ (IJDWM), 3(3), 1-13.
+
+ """
+ def __init__(self, estimator):
+ self.estimator = estimator
+
@vene Owner
vene added a note

why the blank line?

@vene
Owner

If the multiclass estimator used is OvR, the explosion of states can lead to very slow training time. I think there should be a way to warn the user in this case. E.g. raise a warning if the number of generated classes seems big.
Or maybe have a verbose mode that always announces how many classes are generated.
Or add a one-liner in the documentation that the users can apply to count how many classes this method would generate.

WDYT?
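For the documentation one-liner, something along these lines might do, assuming Y is the binary label indicator matrix of the training set:

import numpy as np

# Number of classes the label power set transform would generate
# (equivalently: len(set(map(tuple, Y))) for a dense indicator matrix).
n_generated_classes = len(np.unique(Y.dot(2.0 ** np.arange(Y.shape[1]))))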

@arjoly
Owner

Should I create a multilabel module instead of multiclass?

@GaelVaroquaux
@arjoly
Owner

Should I create a multilabel module instead of multiclass?

I have the impression that these two should go in the same subpackage
(not sure how to name it, maybe simply multiclass) and in different files
in this subpackage.

I am suggesting this to fight the increase in breadth of our package
tree.

It would also be nice to do that for other modules :-)

@arjoly
Owner

Rebased and squashed everything

@arjoly
Owner

Rebased on top of master

@coveralls

Coverage Status

Coverage increased (+0.01%) when pulling b5f1a9c on arjoly:labelpowerset into 0807e19 on scikit-learn:master.

@arjoly
Owner

@amueller and @vene Is it good for you?

@vene
Owner

Writeup of our discussion:

  • add a test for the zero-label class being handled correctly in predict_proba
  • marginalize to get p(label | x) (btw, how would this relate to what OvR gets?); a rough sketch follows below

Apart from this lgtm, :+1:
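For the marginalization point, a rough sketch of the idea with illustrative names: proba stands for the base estimator's predict_proba over the power set classes, and sets is the binary matrix saying which labels each of those classes contains.

import numpy as np

def marginal_label_proba(proba, sets):
    # p(label_j | x) = sum over label sets S containing j of p(S | x)
    return proba.dot(sets)

# Two label sets, {0} and {0, 1}, with probabilities 0.25 and 0.75:
proba = np.array([[0.25, 0.75]])
sets = np.array([[1, 0],
                 [1, 1]])
print(marginal_label_proba(proba, sets))   # -> [[ 1.    0.75]]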

@arjoly
Owner

marginalize to get p(label | x) (btw, how would this relate to what OvR gets?)

There is an API discrepancy for those classes.

@arjoly
Owner

see #2451 for more information

@vene
Owner

Isn't there some way of renormalizing the output of OvR to be comparable?

@arjoly
Owner

Isn't there some way of renormalizing the output of OvR to be comparable?

Apparently I answered you completely off-topic. Yes, there is a way: the one we discussed today. I am working on it.

I need to add tests for when some label sets are missing, i.e. the label sets present at fit time are not the same as those seen at predict time.

@vene
Owner

LabelPowerSetClassifier doesn't have a classes_ attribute. Should it have one?

@arjoly
Owner

LabelPowerSetClassifier doesn't have a classes_ attribute. Should it have one?

Yes, it should have one.

Switching from MRG to WIP since I am progressing slowly on this.

@arjoly arjoly changed the title from [MRG] Label power set multilabel classification strategy to [WIP] Label power set multilabel classification strategy
doc/modules/classes.rst
@@ -896,6 +896,7 @@ Pairwise metrics
multiclass.OneVsRestClassifier
multiclass.OneVsOneClassifier
multiclass.OutputCodeClassifier
+ multiclass.LabelPowerSetClassifier
.. _naive_bayes_ref:
doc/modules/multiclass.rst
@@ -33,7 +33,7 @@ by decomposing such problems into binary classification problems.
several joint classification tasks. This is a generalization
of the multi-label classification task, where the set of classification
problem is restricted to binary classification, and of the multi-class
- classification task. *The output format is a 2d numpy array or sparse
+ classification task. *The output format is a 2d numpy array or sparse
matrix.*
The set of labels can be different for each output variable.
@@ -106,8 +106,9 @@ format.
One-Vs-The-Rest
===============
-This strategy, also known as **one-vs-all**, is implemented in
-:class:`OneVsRestClassifier`. The strategy consists in fitting one classifier
+This strategy, also known as **one-vs-all** or as **binary relevance**, is
+implemented in :class:`OneVsRestClassifier`. The strategy consists in fitting
+one classifier
per class. For each classifier, the class is fitted against all the other
classes. In addition to its computational efficiency (only `n_classes`
classifiers are needed), one advantage of this approach is its
@@ -139,9 +140,8 @@ Multilabel learning
-------------------
:class:`OneVsRestClassifier` also supports multilabel classification.
-To use this feature, feed the classifier an indicator matrix, in which cell
-[i, j] indicates the presence of label j in sample i.
-
+To use this feature, feed the classifier with a binary indicator matrix.
+In this context, one-versus-rest is also called binary relevance.
.. figure:: ../auto_examples/images/plot_multilabel_001.png
:target: ../auto_examples/plot_multilabel.html
@@ -259,3 +259,34 @@ Below is an example of multiclass learning using Output-Codes::
.. [3] "The Elements of Statistical Learning",
Hastie T., Tibshirani R., Friedman J., page 606 (second-edition)
2008.
+
+
+Label power set
+===============
+
+:class:`LabelPowerSetClassifier` is a multilabel problem transformation method:
+each label set of the training set is associated with one class. Once the
+output is transformed, the :class:`LabelPowerSetClassifier` fits one classifier
+on the multi-class task. At prediction time, the classifier predicts the most
+relevant class, which is translated back to the corresponding label set. Since
+the number of generated classes is ``O(min(2^n_labels, n_samples))``, this
+method suffers from the combinatorial explosion of possible label sets and may
+overfit the data. Nevertheless, it may take into account label correlation,
+unlike One-Vs-The-Rest.
+
+Below is an example of multi-label learning using
+:class:`LabelPowerSetClassifier`:
+
+ >>> from sklearn.datasets import make_multilabel_classification
+ >>> from sklearn.multiclass import LabelPowerSetClassifier
+ >>> from sklearn.svm import LinearSVC
+ >>> from sklearn.cross_validation import train_test_split
+ >>> from sklearn.metrics import jaccard_similarity_score
+ >>> X, y = make_multilabel_classification(return_indicator=True,
+ ... n_samples=200, random_state=0)
+ >>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
+ >>> classifier = LabelPowerSetClassifier(LinearSVC(random_state=1))
+ >>> classifier.fit(X_train, y_train)
+ >>> y_pred = classifier.predict(X_test)
+ >>> jaccard_similarity_score(y_test, y_pred) # doctest: +ELLIPSIS
+ 0.45...
sklearn/multiclass.py
@@ -2,10 +2,11 @@
Multiclass and multilabel classification strategies
===================================================
-This module implements multiclass learning algorithms:
- - one-vs-the-rest / one-vs-all
+This module implements multiclass and multilabel learning algorithms:
+ - one-vs-the-rest / one-vs-all / binary relevance
- one-vs-one
- error correcting output codes
+ - label power set
The estimators provided in this module are meta-estimators: they require a base
estimator to be provided in their constructor. For example, it is possible to
@@ -29,7 +30,8 @@
"""
# Author: Mathieu Blondel <mathieu@mblondel.org>
-# Author: Hamzeh Alsalhi <93hamsal@gmail.com>
+# Hamzeh Alsalhi <93hamsal@gmail.com>
+# Arnaud Joly <arnaud.v.joly@gmail.com>
#
# License: BSD 3 clause
@@ -45,6 +47,7 @@
from .utils import check_random_state
from .utils.validation import _num_samples
from .utils import deprecated
+from .utils.extmath import safe_sparse_dot
from .externals.joblib import Parallel
from .externals.joblib import delayed
@@ -228,9 +231,11 @@ class OneVsRestClassifier(BaseEstimator, ClassifierMixin, MetaEstimatorMixin):
`classes_` : array, shape = [`n_classes`]
Class labels.
+
`label_binarizer_` : LabelBinarizer object
Object used to transform multiclass labels to binary labels and
vice-versa.
+
`multilabel_` : boolean
Whether a OneVsRestClassifier is a multilabel classifier.
"""
@@ -728,3 +733,100 @@ def predict(self, X):
Y = np.array([_predict_binary(e, X) for e in self.estimators_]).T
pred = euclidean_distances(Y, self.code_book_).argmin(axis=1)
return self.classes_[pred]
+
+
+class LabelPowerSetClassifier(BaseEstimator, ClassifierMixin,
+ MetaEstimatorMixin):
+ """Label power set multi-label classification strategy
+
+ Label power set is a problem transformation method. The multi-label
+ classification task is transformed into a multi-class classification
+ task: each label set present in the training set
+ is associated with a class. The underlying estimator will learn to predict
+ the class associated with each label set.
+
+ The maximum number of classes is bounded by the number of samples and
+ the number of possible label sets in the training set, leading
+ to a maximum of O(min(2^n_labels, n_samples)) generated classes.
+ This method suffers from the combinatorial explosion of possible label sets.
+ However, this strategy may take into account the correlation between the
+ labels, unlike one-vs-the-rest (binary relevance).
+
+ Parameters
+ ----------
+ estimator : classifier estimator object
+ A multi-class estimator object implementing a `fit` and a `predict`
+ method.
+
+ Attributes
+ ----------
+ `label_binarizer_` : LabelBinarizer object
+ Object used to transform the multilabel targets into a label indicator
+ matrix and vice-versa.
+
+ References
+ ----------
+
+ .. [1] Tsoumakas, G., & Katakis, I. (2007). "Multi-label classification:
+ An overview." International Journal of Data Warehousing and Mining
+ (IJDWM), 3(3), 1-13.
+
+ """
+ def __init__(self, estimator):
+ self.estimator = estimator
+ self.label_binarizer_ = None
+
+ def fit(self, X, y):
+ """Fit underlying estimators.
+
+ Parameters
+ ----------
+ X : {array-like, sparse matrix}, shape = [n_samples, n_features]
+ Input training data.
+
+ y : array-like, shape = [n_samples] or [n_samples, n_outputs]
+ Output training target in label indicator format
+
+ Returns
+ -------
+ self
+ """
+ # Binarize y
+ self.label_binarizer_ = LabelBinarizer(sparse_output=True)
+ y_binary = self.label_binarizer_.fit_transform(y)
+
+ # Code in the label power set
+ encoding_matrix = np.exp2(np.arange(y_binary.shape[1])).T
+ y_coded = safe_sparse_dot(y_binary, encoding_matrix, dense_output=True)
+
+ self.estimator.fit(X, y_coded)
+
+ def predict(self, X):
+ """Predict targets using the underlying estimator.
+
+ Parameters
+ ----------
+ X : {array-like, sparse matrix}, shape = [n_samples, n_features]
+ Input data.
+
+ Returns
+ -------
+ y : (sparse) array-like, shape = [n_samples] or [n_samples, n_outputs]
+ Predicted multilabel target.
+ """
+ y_coded = self.estimator.predict(X)
+ binary_code_size = len(self.label_binarizer_.classes_)
+
+ if self.label_binarizer_.y_type_ == "binary":
+ binary_code_size = 1
+
+ shifting_vector = 2 ** np.arange(binary_code_size)
+
+ # Shift the binary representation of a class
+ y_shifted = y_coded.reshape((-1, 1)) // shifting_vector
+ y_shifted = y_shifted.astype(np.int)
+
+ # Decode y by checking the appropriate bit
+ y_decoded = np.bitwise_and(0x1, y_shifted)
+
+ return self.label_binarizer_.inverse_transform(y_decoded)
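As a quick sanity check of the scheme above, a minimal round-trip sketch of the power-of-two encoding and bit-mask decoding with plain numpy and illustrative data (not part of the PR):

import numpy as np

Y = np.array([[1, 0, 1],
              [0, 1, 1],
              [0, 0, 0]])
codes = Y.dot(np.exp2(np.arange(Y.shape[1])))                 # encode: [5., 6., 0.]
shift = 2 ** np.arange(Y.shape[1])
bits = np.bitwise_and(1, (codes.reshape((-1, 1)) // shift).astype(int))
assert np.array_equal(bits, Y)                                # decoding recovers Y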
sklearn/tests/test_multiclass.py
@@ -1,18 +1,21 @@
import numpy as np
import scipy.sparse as sp
+from sklearn.cross_validation import train_test_split
+from sklearn.utils.testing import assert_almost_equal
from sklearn.utils.testing import assert_array_equal
from sklearn.utils.testing import assert_equal
-from sklearn.utils.testing import assert_almost_equal
-from sklearn.utils.testing import assert_true
from sklearn.utils.testing import assert_false
+from sklearn.utils.testing import assert_greater
from sklearn.utils.testing import assert_raises
+from sklearn.utils.testing import assert_true
from sklearn.utils.testing import assert_warns
from sklearn.utils.testing import ignore_warnings
-from sklearn.utils.testing import assert_greater
+
from sklearn.multiclass import OneVsRestClassifier
from sklearn.multiclass import OneVsOneClassifier
from sklearn.multiclass import OutputCodeClassifier
+from sklearn.multiclass import LabelPowerSetClassifier
from sklearn.multiclass import fit_ovr
from sklearn.multiclass import fit_ovo
@@ -540,6 +543,50 @@ def test_deprecated():
meta_est.predict(X_test))
+def test_lps_binary():
+ X, Y = datasets.make_classification(n_samples=50,
+ n_features=20,
+ random_state=0)
+ X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)
+
+ lps = LabelPowerSetClassifier(LinearSVC(random_state=0))
+ lps.fit(X_train, Y_train)
+ out_lps = lps.predict(X_test)
+
+ svc = LinearSVC(random_state=0)
+ svc.fit(X_train, Y_train)
+ out_svc = svc.predict(X_test)
+
+ assert_equal(out_lps.shape, Y_test.shape)
+ assert_array_equal(out_lps, out_svc)
+
+
+def test_lps_multiclass():
+ lp = LabelPowerSetClassifier(LinearSVC(random_state=0))
+ lp.fit(iris.data, iris.target)
+ out_lp = lp.predict(iris.data)
+
+ svc = LinearSVC(random_state=0)
+ svc.fit(iris.data, iris.target)
+ out_svc = svc.predict(iris.data)
+
+ assert_array_equal(out_lp, out_svc)
+
+
+def test_lps_multilabel():
+ X, Y = datasets.make_multilabel_classification(n_samples=50,
+ n_features=20,
+ random_state=0,
+ return_indicator=True)
+
+ X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)
+
+ lps = LabelPowerSetClassifier(LinearSVC(random_state=0))
+ lps.fit(X_train, Y_train)
+ out_lps = lps.predict(X_test)
+ assert_equal(out_lps.shape, Y_test.shape)
+
+
if __name__ == "__main__":
import nose
nose.runmodule()
sklearn/utils/testing.py
@@ -456,7 +456,8 @@ def uninstall_mldata_mock():
# Meta estimators need another estimator to be instantiated.
META_ESTIMATORS = ["OneVsOneClassifier",
"OutputCodeClassifier", "OneVsRestClassifier", "RFE",
- "RFECV", "BaseEnsemble"]
+ "RFECV", "BaseEnsemble", "LabelPowerSetClassifier"]
+
# estimators that there is no way to default-construct sensibly
OTHER = ["Pipeline", "FeatureUnion", "GridSearchCV", "RandomizedSearchCV"]