Test support for sparse multilabel format #7886

jnothman · 2016-11-16T06:19:41Z

Multilabel data is indicated by y being a 2d array consisting of 0s and 1s, which may also be represented in a sparse matrix format. However, it seems we are lacking tests of sparse representations of multilabel formats.
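
To make the two representations concrete, here is a minimal sketch (the label names are made up for illustration) that builds the same multilabel y both as a dense 0/1 array and as a SciPy CSR matrix via `MultiLabelBinarizer(sparse_output=True)`, then checks that `type_of_target` classifies both as multilabel-indicator.

```python
import numpy as np
from scipy.sparse import issparse
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.utils.multiclass import type_of_target

# Illustrative label sets for four samples; the third sample has no labels.
label_sets = [{"sports"}, {"politics", "tech"}, set(), {"sports", "tech"}]

# Dense indicator matrix: one row per sample, one column per (sorted) label.
dense_mlb = MultiLabelBinarizer()
Y_dense = dense_mlb.fit_transform(label_sets)

# The same information as a CSR matrix, which only stores the 1s.
sparse_mlb = MultiLabelBinarizer(sparse_output=True)
Y_sparse = sparse_mlb.fit_transform(label_sets)

print(dense_mlb.classes_)
print(Y_dense)
print(issparse(Y_sparse), Y_sparse.nnz)
print(type_of_target(Y_dense), type_of_target(Y_sparse))
```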

We'd like to see one or both of:

- specific tests for each multilabel estimator (or confirmation that they already exist)
- a general test that multilabel estimators support sparse y and behave identically with it as with dense y
- clarity in the estimator docstrings about where sparse y is supported.
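
A general invariance test along the lines of the second point could look roughly like the sketch below. The helper name `check_sparse_y_matches_dense_y` is hypothetical, not an existing scikit-learn check; it assumes the estimator is deterministic, and uses OneVsRestClassifier as the example because, as noted later in this thread, it already has a sparse-y test.

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse
from sklearn.base import clone
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier


def check_sparse_y_matches_dense_y(estimator, X, Y):
    """Hypothetical check: fitting on sparse Y must give the same predictions as dense Y."""
    dense_est = clone(estimator).fit(X, Y)
    sparse_est = clone(estimator).fit(X, csr_matrix(Y))

    pred_dense = dense_est.predict(X)
    pred_sparse = sparse_est.predict(X)
    # An estimator fit on sparse Y may return sparse predictions; densify before comparing.
    if issparse(pred_dense):
        pred_dense = pred_dense.toarray()
    if issparse(pred_sparse):
        pred_sparse = pred_sparse.toarray()
    np.testing.assert_array_equal(pred_dense, pred_sparse)


X, Y = make_multilabel_classification(n_samples=100, n_features=20, n_classes=5, random_state=0)
check_sparse_y_matches_dense_y(OneVsRestClassifier(LogisticRegression(max_iter=1000)), X, Y)
```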


lesteve · 2016-11-16T08:02:18Z

For completeness, the issue that prompted this: #7786 (comment).

dalmia · 2016-11-17T07:30:06Z

Can I take this up?

lesteve · 2016-11-17T07:41:13Z

Sure, go ahead! If an issue has the "Need Contributor" label and, judging by the comments, nobody is working on it, you don't need to ask for permission; just add a comment saying you will start working on it.

Note there is no "Easy" label on this one. Not sure whether that was intentional, but in any case you have been warned ;-).

dalmia · 2016-11-17T07:44:10Z

Thanks for the heads up @lesteve. The more challenging the issue, the better it feels to solve it :)

lesteve · 2016-11-17T08:16:05Z

I think the bullet points outlined by @jnothman give you a good idea of how to start on this issue. Feel free to ping us if you get stuck.

dalmia · 2016-11-17T08:21:44Z

@lesteve Will do, sir. I just have one question before I start working on this: since not many estimators support multilabel prediction, is there any way I could find all the multilabel estimators so that I can get started?

amueller · 2016-11-22T19:32:36Z

@dalmia give all_estimators a try.

dalmia · 2016-12-03T15:03:03Z

This test (https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tree/tests/test_tree.py#L1598) says that we don't support sparse y. As per the issue, do we need to change this? Please correct me if I am wrong.

jnothman · 2016-12-03T20:22:54Z

It's okay not to support it yet in some estimators. That can be a separate issue.

jnothman · 2016-12-03T20:51:09Z

Basically, I'd think sparse support should be mandatory first for those classifiers that handle multilabel but not general multioutput.
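
The distinction can be seen with `sklearn.utils.multiclass.type_of_target`: a 2d array of 0s and 1s (dense or sparse) is treated as multilabel-indicator, while a 2d array whose columns hold other class values is general multioutput. A minimal illustration with made-up arrays:

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.utils.multiclass import type_of_target

# Multilabel: each column is a binary "does this label apply?" indicator.
Y_multilabel = np.array([[1, 0, 1], [0, 1, 1]])
# General multioutput: each column is its own (possibly multiclass) target.
Y_multioutput = np.array([[0, 1, 2], [1, 0, 2]])

print(type_of_target(Y_multilabel))              # multilabel-indicator
print(type_of_target(csr_matrix(Y_multilabel)))  # multilabel-indicator
print(type_of_target(Y_multioutput))             # multiclass-multioutput
```

One consequence is that any 2d 0/1 array is read as multilabel, even if it was meant as several binary outputs.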


dalmia · 2016-12-04T01:33:16Z

It seems we already have this tested for OneVsRestClassifier here. I am going to check the same for DecisionTrees, RandomForests and NearestNeighbors; I suppose these are the multilabel estimators. Please let me know if I am on the right track.
Thanks.

dalmia · 2016-12-04T01:39:22Z

But since DecisionTreeClassifier is a multilabel estimator and the tests should indicate support for sparse y, we need to change the code I linked to earlier, right?

jnothman · 2016-12-04T01:53:24Z

All of those are generalised multioutput classifiers. We could choose to implement multilabel sparse support, but we haven't yet.


dalmia · 2016-12-04T05:15:03Z

@jnothman Could you please give me a brief overview of what I need to implement for this issue? I'm getting a bit confused about what should be implemented and what isn't supported.

jnothman · 2016-12-04T08:04:01Z

Fair enough. I wrote it up because there was some sense that the idea of sparse multilabel y wasn't clearly enough documented, and someone suggested some kind of invariance testing, but that may not be necessary. What we've done with sparse X is to test that an estimator either supports it or gives a consistent error. I think that would be appropriate for multilabel/multioutput y, with eventual implementation of support where possible.
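
The "supports it or gives a consistent error" idea could be sketched as below. `sparse_y_status` is a hypothetical helper; the exception types it reports are whatever the installed scikit-learn version raises, so the printed statuses are something to check rather than a fixed contract.

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse
from sklearn.base import clone
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier


def _to_dense(a):
    """Densify predictions, which may be sparse when an estimator was fit on sparse y."""
    return a.toarray() if issparse(a) else np.asarray(a)


def sparse_y_status(estimator, X, Y):
    """Hypothetical helper returning 'supported', 'mismatch' or 'raises <ErrorType>'."""
    try:
        sparse_est = clone(estimator).fit(X, csr_matrix(Y))
    except (TypeError, ValueError) as exc:
        # Treated as a clean rejection; any other exception type propagates.
        return f"raises {type(exc).__name__}"
    dense_est = clone(estimator).fit(X, Y)
    same = np.array_equal(_to_dense(dense_est.predict(X)), _to_dense(sparse_est.predict(X)))
    return "supported" if same else "mismatch"


# make_multilabel_classification returns non-negative count features, as MultinomialNB needs.
X, Y = make_multilabel_classification(n_samples=60, n_classes=3, random_state=0)
candidates = {
    "OneVsRestClassifier": OneVsRestClassifier(MultinomialNB()),
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=0),
    "RandomForestClassifier": RandomForestClassifier(n_estimators=10, random_state=0),
}
statuses = {name: sparse_y_status(est, X, Y) for name, est in candidates.items()}
for name, status in statuses.items():
    print(f"{name}: {status}")
```

Only TypeError and ValueError count as a "consistent error" here; an unexpected exception type surfaces immediately, which is usually what you want from such a check.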

dalmia · 2016-12-06T07:12:24Z

So, as you said, I checked the errors that the multioutput/multilabel classifiers give for sparse multilabel y. As I linked before, DecisionTree gives a TypeError, which already has tests. However, RandomForest gives a ValueError, and there are no tests for that. So I need to write them, right? Also, RandomTreesEmbedding throws an AttributeError instead, so I plan to add a separate check for that.
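
A minimal sketch of such a test, assuming the ValueError behaviour described above still holds in the installed version. The test name is hypothetical, and scikit-learn's own suite would normally use `pytest.raises(ValueError)` rather than the plain try/except used here to keep the sketch dependency-free. ExtraTreesClassifier is included on the assumption that it shares the forest `fit` logic.

```python
from scipy.sparse import csr_matrix
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier


def test_forest_rejects_sparse_multilabel_y():
    """Hypothetical test: forests should reject sparse multilabel y with a ValueError."""
    X, Y = make_multilabel_classification(n_samples=50, n_classes=3, random_state=0)
    for Forest in (RandomForestClassifier, ExtraTreesClassifier):
        try:
            Forest(n_estimators=5, random_state=0).fit(X, csr_matrix(Y))
        except ValueError:
            continue  # expected: sparse y is rejected
        raise AssertionError(f"{Forest.__name__} unexpectedly accepted sparse y")


test_forest_rejects_sparse_multilabel_y()
```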

jnothman · 2016-12-06T23:01:23Z

Sounds reasonable.


dalmia · 2016-12-07T02:45:09Z

I made a mistake earlier. Since RandomTreesEmbedding only transforms the input and doesn't use y, it does support multilabel sparse y. Also, the fit and fit_transform docstrings don't mention anything about y or sample_weight. Although they are not needed here, shouldn't we add some sort of note about them?
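
One quick way to confirm that RandomTreesEmbedding ignores y is to fit it once with sparse y and once with no y at all and compare the embeddings. This sketch assumes, based on the comment above, that y is accepted only for API consistency; with the same `random_state`, both embeddings should then be identical.

```python
from scipy.sparse import csr_matrix, issparse
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomTreesEmbedding

X, Y = make_multilabel_classification(n_samples=40, n_classes=3, random_state=0)

# Same random_state for both fits, so any difference would have to come from y.
emb_sparse_y = RandomTreesEmbedding(n_estimators=5, random_state=0).fit_transform(X, csr_matrix(Y))
emb_no_y = RandomTreesEmbedding(n_estimators=5, random_state=0).fit_transform(X)

print(issparse(emb_sparse_y), emb_sparse_y.shape)
# For sparse matrices, != yields a sparse boolean matrix; no stored entries means identical.
print((emb_sparse_y != emb_no_y).nnz == 0)
```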

jnothman added the Need Contributor label Nov 16, 2016

amueller removed the Need Contributor label Nov 22, 2016

dalmia linked a pull request Dec 7, 2016 that will close this issue

[MRG] Tests for no sparse y support in RandomForests #7996

Open

cmarmo added help wanted module:ensemble labels Apr 21, 2021
