Add optional attribute target_names #2979

Closed
l00pen wants to merge 3 commits into scikit-learn:master from l00pen:add_value_name_attribute

@l00pen

Add an optional target_names attribute that replaces target values with string values of your choice, just like feature_names.

@coveralls

Coverage Status

Coverage remained the same when pulling 669699f on l00pen:add_value_name_attribute into 096070d on scikit-learn:master.

@larsmans
scikit-learn member

I don't get this. Why not use the tree's classes_ attribute?

@l00pen

Do you mean instead of passing it as a parameter?

I am new to scikit-learn, and I just wanted the option to see the decision as a string instead of an array that is all zeros except for the chosen label.

@larsmans
scikit-learn member

Yes, instead of a parameter. If the object has the info, then letting the user pass it is just confusing.
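
A minimal sketch of the point above, assuming a recent scikit-learn release where a classifier can be fitted directly on string labels. The toy data and the variable names are made up for illustration; the sketch only shows that the fitted tree already carries the label strings in classes_, so they can be recovered from the per-node value array without a separate parameter.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy data, invented for this illustration: one feature, two string classes.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array(["cat", "cat", "dog", "dog"])

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# classes_ holds the original labels (sorted), not integer codes.
class_labels = clf.classes_

# tree_.value has shape (node_count, n_outputs, max_n_classes).
# For a single-output tree, take output 0 and pick the majority class per node.
node_values = clf.tree_.value[:, 0, :]
node_labels = class_labels[np.argmax(node_values, axis=1)]

print(list(class_labels))
print(list(node_labels))
print(list(clf.predict([[0.5], [2.5]])))
```

Mapping argmax indices through classes_ is one way to label nodes; it is an illustrative choice here, not necessarily what the pull request implements.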

@l00pen

Yes, I agree!
However, I found that passing the category strings that correspond to integer targets as a parameter is quite common:

  1. When fitting a tree: "The target values (integers that correspond to classes in
    classification, real numbers in regression)" -> So classes_ would only ever hold integers, just like value does now.

  2. I have mimicked the many tutorials that use Bunch to handle data. Bunch supports integer targets with the corresponding categories passed as a parameter:
    "return Bunch(data=data, target=target, target_names=categories, DESCR="NONE", feature_names=feature_names)"

  3. The classification report also takes a target_names list as a parameter:
    "target_names : list of strings
    Optional display names matching the labels (same order)"

However:
I haven't thought this through for multilabel classification. Do you see an issue there? Does it even exist for trees?
Nor have I considered regression, or any use case other than a decision tree for classification, where each leaf node should correspond to exactly one category.
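
The pattern described in points 2 and 3 can be sketched as follows. This assumes a recent scikit-learn release where Bunch is importable from sklearn.utils (the import path has changed over time); the toy targets, predictions and category names are invented for illustration.

```python
from sklearn.metrics import classification_report
from sklearn.utils import Bunch

# Toy dataset, made up for this example: integer targets plus display names.
dataset = Bunch(
    data=[[0.1], [0.9], [0.8], [0.2]],
    target=[0, 1, 1, 0],
    target_names=["ham", "spam"],
)

# Decode integer targets by indexing into target_names.
decoded = [dataset.target_names[i] for i in dataset.target]

# Hypothetical predictions, only to have something to report on.
y_pred = [0, 1, 0, 0]

# target_names supplies display names matching the labels, in the same order.
report = classification_report(
    dataset.target, y_pred, target_names=dataset.target_names
)
print(decoded)
print(report)
```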

@larsmans
scikit-learn member
  1. The documentation is outdated. I just fixed it in cc7f7d6.

  2. Bunch was invented before we supported string class labels. I'd personally love to get rid of this legacy API because it's confusing, but that's almost impossible.

  3. Again, legacy.

As for multilabel, yes, trees support that. In fact they support something even fancier, multi-target multiclass output.
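
A small sketch of what multi-output looks like on a fitted tree, assuming a recent scikit-learn release. The data is invented; integer labels are used to keep the example simple. With a 2D target, classes_ becomes a list with one array of labels per output, which is why a single flat list of names does not cover this case.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy data, invented for illustration: two features, two target columns.
# Output 0 follows feature 0; output 1 follows feature 1.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
Y = np.array([[0, 10], [0, 20], [1, 10], [1, 20]])

clf = DecisionTreeClassifier(random_state=0).fit(X, Y)

# One array of class labels per output.
per_output_classes = [list(c) for c in clf.classes_]

# Predictions have one column per output.
pred = clf.predict(X)

print(clf.n_outputs_)
print(per_output_classes)
print(pred.shape)
```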

@l00pen

Okay then I understand perfectly!

I pushed a new commit; however, it does not yet support multi-target multiclass output.

Do you know where I can read more about that? Totally understand if you do not have time!

@larsmans
scikit-learn member

The docs have an explanation of multi-output; run git grep multioutput to find it.

l00pen added some commits Mar 20, 2014
@l00pen l00pen Add optional attribute target_names
Add the optional attribute to replace target values with string values of choice, just like feature_names.
7cda1cf
@l00pen l00pen Add target as value without passing as param 24f4df5
@l00pen l00pen Change corresponding test values for class labels
support multilabel output
fb23b34
@coveralls

Coverage Status

Coverage remained the same when pulling fb23b34 on l00pen:add_value_name_attribute into 39b859b on scikit-learn:master.

@amueller
scikit-learn member

Fixed by #3735.

@amueller amueller closed this May 10, 2015