Test support for sparse multilabel format #7886
Comments
The issue that prompted this for completeness: #7786 (comment). |
Can I take this up? |
Sure, go ahead! If an issue has a "Need Contributor" label and the comments show that nobody is working on it, you don't need to ask for permission; just add a comment saying you will start working on it. Note there is no "Easy" label on this one. Not sure whether that was intentional, but in any case you have been warned ;-). |
Thanks for the heads up @lesteve. The more challenging the issue, the better it feels to solve it :) |
I think the bullet points outlined by @jnothman give you a good idea of how to start on this issue. Feel free to ping us if you get stuck. |
@lesteve Will do, sir. One question before I get started: since only a small number of estimators support multilabel prediction, is there a way to find all the multilabel estimators? |
@dalmia do a |
This says that we don't support sparse |
It's okay to not yet support in some things. That can be a separate issue.
On 4 December 2016 at 02:03, Aman Dalmia wrote:
This
<https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tree/tests/test_tree.py#L1598>
says that we don't support sparse y. As per the issue, do we need to
change this? Please correct me if I am wrong.
|
Basically, I'd think sparse support should be mandatory first for those
classifiers that handle multilabel but not general multioutput.
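As a rough illustration of what such support could look like in a test, here is a minimal sketch that fits the same classifier on a dense and a sparse multilabel `y` and compares the predictions. The choice of `OneVsRestClassifier` wrapping `LogisticRegression`, the toy data, and the helper name `fit_predict` are assumptions made for this example; they are not taken from the thread.

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Toy data (assumption): 40 samples, 5 features, 3 labels derived from X.
rng = np.random.RandomState(0)
X = rng.rand(40, 5)
y_dense = (X[:, :3] > 0.5).astype(int)   # dense multilabel indicator matrix
y_sparse = sparse.csr_matrix(y_dense)    # same labels, sparse representation


def fit_predict(y):
    """Hypothetical helper: fit on y and return predictions as a dense array."""
    clf = OneVsRestClassifier(LogisticRegression())
    pred = clf.fit(X, y).predict(X)
    # Predictions may come back sparse when y was sparse, so densify first.
    return pred.toarray() if sparse.issparse(pred) else np.asarray(pred)


pred_dense = fit_predict(y_dense)
pred_sparse = fit_predict(y_sparse)

# The property under test: sparse y behaves identically to dense y.
same = np.array_equal(pred_dense, pred_sparse)
print("identical predictions:", same)
```

A common test could loop a check like this over every classifier that accepts multilabel targets, skipping or separately tracking those that do not yet accept sparse `y`.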
|
It seems we have it tested for |
But since |
All of those are generalised multioutput classifiers. We could choose to
implement multilabel sparse support, but we haven't yet.
On 4 December 2016 at 12:39, Aman Dalmia wrote:
But since DecisionTreeClassifier is a multilabel estimator and the tests
should indicate support for sparse y, we need to change the code I
earlier linked to, right?
|
@jnothman Could you please give me a final brief overview of what I need to implement for this issue? I'm getting a bit confused about what should be implemented and what isn't supported. |
Fair enough. I wrote it up because there was some sense that the idea of sparse multilabel |
So as you said, I checked for the errors that the multioutput/multilabel classifiers give for sparse multilabel |
Sounds reasonable.
On 6 December 2016 at 18:12, Aman Dalmia wrote:
So as you said, I checked for the errors that the multioutput/multilabel
classifiers give for sparse multilabel y. As I had already linked before,
DecisionTree gives a TypeError, which has tests written. RandomForest,
however, gives a ValueError, which doesn't have tests. So I need to write
them, right? Also, RandomTreesEmbedding throws an AttributeError instead,
so I plan to add a separate check for that.
|
I made a mistake earlier. Since |
Multilabel data is indicated by having y be a 2d array consisting of 0s and 1s. This may be represented with a sparse matrix format. However, it seems we are lacking in tests of sparse representations of multilabel formats. We'd like to see one or both of:
[...] y and behave identically to dense y
[...] y is supported.
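For readers unfamiliar with the format, the sketch below shows one multilabel target in dense and sparse form and checks that scikit-learn's `type_of_target` classifies both the same way. The toy label matrix is an assumption made for illustration.

```python
import numpy as np
from scipy import sparse
from sklearn.utils.multiclass import type_of_target

# Toy multilabel target (assumption): 4 samples, 3 labels, entries are 0 or 1.
y_dense = np.array([
    [1, 0, 1],
    [0, 0, 1],
    [1, 1, 0],
    [0, 0, 0],
])

# The same target stored as a sparse matrix; only the 1s are kept explicitly.
y_sparse = sparse.csr_matrix(y_dense)

dense_kind = type_of_target(y_dense)
sparse_kind = type_of_target(y_sparse)
print(dense_kind, sparse_kind)

# Converting back to dense recovers the original indicator matrix.
round_trip_ok = np.array_equal(y_sparse.toarray(), y_dense)
```

Both representations carry the same labels, which is why a test can reasonably expect an estimator or metric to give the same result for either one.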