Extend ClassifierChain to multi-output problems #9245

jnothman · 2017-06-29T10:28:07Z

ClassifierChain currently supports multilabel classification. It should be straightforward to extend it to multi-output (as long as it only chains on predict) except for implementing ClassifierChain.{predict_proba,decision_function} which will take some care.


siebenHeaven · 2017-07-02T14:39:29Z

@jnothman I am new to contributing to this project and would like to start here. As I understand it, ClassifierChain currently predicts whether a given instance belongs to a class, and then goes on (by passing this prediction to the next estimator in the chain) to check the other classes. So would extending it to support multi-output involve adding a new parameter which, if set, tells it not to check further classes?

jnothman · 2017-07-02T21:39:05Z

No, it's basically the same structure. At the moment it is multilabel, which is equivalent to multi-output for binary problems. The chain consists of a collection of binary classifiers. Extending to multi-output multiclass (which I should have stated more explicitly above) just means each classifier may be multiclass.
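To make the structure concrete, here is a minimal hand-rolled sketch of a chain in which every link is a multiclass-capable classifier. It is not the ClassifierChain implementation; `fit_chain` and `predict_chain` are hypothetical helper names, the data is synthetic, and feeding raw class labels forward as features is an assumption made for illustration.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression


def fit_chain(base, X, Y):
    """Fit one estimator per output; link k also sees the true outputs 0..k-1."""
    estimators = []
    for k in range(Y.shape[1]):
        X_aug = np.hstack([X, Y[:, :k]])
        estimators.append(clone(base).fit(X_aug, Y[:, k]))
    return estimators


def predict_chain(estimators, X):
    """Predict outputs in order, feeding earlier predictions to later links."""
    Y_pred = np.zeros((X.shape[0], len(estimators)), dtype=int)
    for k, est in enumerate(estimators):
        X_aug = np.hstack([X, Y_pred[:, :k]])
        Y_pred[:, k] = est.predict(X_aug)
    return Y_pred


# Synthetic multi-output multiclass target: output 0 has 3 classes, output 1 has 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(90, 4))
Y = np.column_stack([np.arange(90) % 3, np.arange(90) % 2])

estimators = fit_chain(LogisticRegression(max_iter=1000), X, Y)
Y_pred = predict_chain(estimators, X)
print(Y_pred.shape)
```

Chaining on `predict` like this works for any number of classes per output; the harder part is deciding what `predict_proba` and `decision_function` should return.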

siebenHeaven · 2017-07-20T04:31:59Z

OK! So right now, the base estimator that is passed to ClassifierChain is a binary classifier. In order to make multiclass classifiers work, what changes will be needed? Would it need a separate class, or would the changes be made in the current ClassifierChain class?
Also, what dataset can this multi-output multiclass support be tested on?
(Sorry for the late reply, I had some work at university :) )

jnothman · 2017-07-20T08:16:42Z

No, use the same class. Basically, you just need to ensure that the predict_proba and decision_function output conform to what you get from a multi-output DecisionTreeClassifier. It doesn't look like we have any standard datasets here. You could take a look at sklearn/tree/tests/test_tree.py:test_multioutput.
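For reference, a small sketch of what a multi-output DecisionTreeClassifier returns from `predict_proba`. The data here is synthetic and only meant to show the output structure; note that DecisionTreeClassifier has no `decision_function`, so only the `predict_proba` side is shown.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic data: output 0 has 3 classes, output 1 has 2 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
Y = np.column_stack([np.arange(30) % 3, np.arange(30) % 2])

tree = DecisionTreeClassifier(random_state=0).fit(X, Y)

# For multi-output targets, predict_proba returns a list with one array per
# output, each of shape (n_samples, n_classes_for_that_output).
proba = tree.predict_proba(X)
proba_shapes = [p.shape for p in proba]
print(type(proba).__name__, proba_shapes)

# classes_ is likewise a list with one array of class labels per output.
print([c.tolist() for c in tree.classes_])
```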

Johayon · 2018-02-09T10:31:31Z

I could take it up, if no one is currently working on it.

jnothman · 2018-02-20T09:39:08Z

You may, @Johayon.

agamemnonc · 2018-10-31T16:22:47Z

@jnothman just to clarify, by multi-output you mean multi-class, right? Because I feel that the convention that is currently used is that multi-output == multi-label.

Anyway, I confirm that this is currently causing issues, especially given that the predict_proba outputs of MultiOutputClassifier and ClassifierChain are not compatible: the former returns a list of length n_outputs where each element has shape (n_samples, n_classes), whereas the latter returns an array of shape (n_samples, n_outputs).

For multi-output binary problems, this is OK, as it is assumed that the method returns the probability of the positive class for each output. However, in the multioutput-multiclass case, what predict_proba returns makes no sense, since there is no way to know which class the probabilities correspond to. And there are no warning messages to let the user know that the results may not make sense. Therefore, I would personally classify this issue as a bug.
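A short sketch of the mismatch on a binary multilabel problem, using synthetic data. The shapes shown are those described in this comment; they may differ in other scikit-learn versions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain, MultiOutputClassifier

# Synthetic binary multilabel data: 3 outputs, each 0/1.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
Y = (X[:, :3] > 0).astype(int)

base = LogisticRegression()
moc = MultiOutputClassifier(base).fit(X, Y)
chain = ClassifierChain(base).fit(X, Y)

# MultiOutputClassifier: list of n_outputs arrays, each (n_samples, n_classes).
moc_proba = moc.predict_proba(X)
print(len(moc_proba), moc_proba[0].shape)

# ClassifierChain: a single array of shape (n_samples, n_outputs), holding
# the positive-class probability for each output.
chain_proba = chain.predict_proba(X)
print(chain_proba.shape)
```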

@Johayon if you have decided not to work on this any more, please let me know and I would be happy to take it up.

jnothman · 2018-11-03T13:47:41Z

Yes, I mean multioutput multiclass. And we know that there are inconsistencies in predict_proba for multilabel, although that's better described with respect to decision_function: http://scikit-learn.org/0.20/glossary.html#term-decision-function

Yes, I think we should ideally be working towards a more consistent probability representation for multilabel, beginning with deprecating OneVsRestClassifier for multilabel... But other core devs may disagree.

henrif94 · 2021-01-28T11:26:17Z

I am still getting a "No Loop Matching" error for .predict_proba().
Any updates on this issue?

agamemnonc · 2021-01-28T13:00:30Z

There is a PR underway to address this issue (#14654).

It is currently stalled due to conflicts with master, among other things, but I hope to be able to address this in the next few weeks.

lucyleeow · 2024-03-28T04:05:05Z

Looking deeper into this, Y is currently of shape (n_samples, n_classes) (multi-label binarized). Other estimators that support multi-label, multi-output have Y of shape (n_samples, n_outputs). I could not find an example where Y is a list of arrays of shape (n_samples, n_classes).

If we want to implement this, do we ask Y to be:

(n_samples, n_outputs) (not backwards compatible)
A list of (n_samples, n_classes) arrays (complex?)
Support both to be backwards compatible - difficult to maintain?
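To illustrate the two array layouts being compared, here is a small sketch using `type_of_target` from `sklearn.utils.multiclass`. The arrays are made up for illustration; this only shows how scikit-learn labels each target format, not how ClassifierChain would handle them.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Multi-label binarized: one 0/1 column per class, shape (n_samples, n_classes).
Y_multilabel = np.array([[1, 0, 1],
                         [0, 1, 0],
                         [1, 1, 0]])

# Multiclass-multioutput: one column per output, each holding a class label,
# shape (n_samples, n_outputs).
Y_multioutput = np.array([[0, 2],
                          [1, 0],
                          [2, 1]])

multilabel_kind = type_of_target(Y_multilabel)
multioutput_kind = type_of_target(Y_multioutput)
print(multilabel_kind, multioutput_kind)
```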

cc @ogrisel since you reviewed #21942

jnothman added Enhancement, Moderate and Need Contributor labels Jun 29, 2017

lesteve added help wanted and removed Need Contributor labels Oct 18, 2017

This was referenced Feb 28, 2019

need "multilabel_only" tag. #13338

Open

Bad error messages in ClassifierChain on multioutput multiclass #13339

Open

agamemnonc mentioned this issue Aug 14, 2019

[MRG] ENH add support for multiclass-multioutput to ClassifierChain #14654

Closed

cmarmo removed the help wanted label Aug 24, 2020

rana-akkumar linked a pull request Dec 10, 2021 that will close this issue

Extend ClassifierChain to multi-output problems (ClassifierChain.decision_function) #21942

Open

cmarmo added the module:multioutput label Jan 16, 2022
