Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX export_text and export_graphviz accepts feature and class names as array-like #26289

Merged
merged 13 commits into from
May 15, 2023

Conversation

Charlie-XIAO
Copy link
Contributor

Reference Issues/PRs

Fixes #26265.

What does this implement/fix? Explain your changes.

Make tree.export_text accept both feature names and class names as numpy arrays. Before modification, both tree.export_graphviz and tree.export_text accept can deal with class names which are numpy arrays, but accept only lists during parameter validation. Moreover, tree.export_graphviz can deal with feature names which are numpy arrays, but tree.export_text cannot, and both of them accept only lists during parameter validation.

This fix makes both export_text consistent with export_graphviz, i.e., being able to deal with feature names and class names which are array-like. I will change docstring and parameter validation for export_graphviz in #26034.

Any other comments?

Please let me know if I should add test cases for this.

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @Charlie-XIAO . This also needs a regression test. Otherwise I think LGTM.

@Charlie-XIAO
Copy link
Contributor Author

@adrinjalali Tests added, but I'm not sure if the way I add them is neat enough.

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR @Charlie-XIAO !

@Charlie-XIAO Charlie-XIAO changed the title FIX tree.export_text accepts feature and class names as array-like FIX tree.export_text accepts feature and class names as ndarray Apr 27, 2023
@adrinjalali
Copy link
Member

we could call check_array on the feature names and actually accept all array-like data.

@thomasjpfan
Copy link
Member

we could call check_array on the feature names and actually accept all array-like data.

Calling check_array ends up calling asarray on list of strings, which makes a copy:

from sklearn.utils.validation import check_array

a = ["first", "second"]
a_array = check_array(a, dtype=object, ensure_2d=False)
a_array[0] = "new_first"

print(a_array[0])
# new_first

print(a[0])
# first

I think list and ndarray covers most of the use case for export_text.

@adrinjalali
Copy link
Member

which makes a copy

In case of something like export_text, do we care?

@thomasjpfan
Copy link
Member

In case of something like export_text, do we care?

There are a few ways to interpret this. If you mean:

  1. "There should not be too many feature names" when using export_text, then I think it is a valid argument for making the copy.
  2. "export_text is not performance critical", then I still care. I prefer not to use more resources if we do not have to.

Logistically, I think most users will pass in ndarray, list, or df.columns. Supporting df.columns leans toward converting things to an array. I am now +0.5 on converting the input to an ndarray.

@Charlie-XIAO
Copy link
Contributor Author

@thomasjpfan @adrinjalali So now we are going for converting input to ndarray right? I also agree with that: this is consistent with what export_graphviz supports for now, and I will modify its docstring as well. Is there a suggestion on which function I should use to convert input to ndarray? I assume that there is something in sklearn.utils.validation but I did not find the specific one to use.

@adrinjalali
Copy link
Member

@Charlie-XIAO you can use utils.validation.check_array

@Charlie-XIAO
Copy link
Contributor Author

@Charlie-XIAO you can use utils.validation.check_array

I remembered that @thomasjpfan said this would make a copy of lists, which is not desirable?

@thomasjpfan
Copy link
Member

said this would make a copy of lists, which is not desirable?

From #26289 (comment), I think it's okay and make the copy.

@Charlie-XIAO
Copy link
Contributor Author

@thomasjpfan How would the docstring be then? We cannot write "array-like" because array-like objects do not necessarily support __len__ right?

@thomasjpfan
Copy link
Member

We cannot write "array-like" because array-like objects do not necessarily support len right?

If we convert inputs to ndarrays before using on them, then we can use "array-like".

@Charlie-XIAO Charlie-XIAO changed the title FIX tree.export_text accepts feature and class names as ndarray FIX tree.export_text and tree.export_graphviz accepts feature and class names as array-like Apr 28, 2023
@Charlie-XIAO
Copy link
Contributor Author

@thomasjpfan Changes made, please let me know if tests need to be improved or the way I use check_array need to be changed. I will updated parameter validation of export_graphviz in #26034 after this change is merged. By the way, should this be FIX or ENH or MAINT?

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise LGTM

@Charlie-XIAO
Copy link
Contributor Author

@adrinjalali @glemaitre Thanks for your review! I've decoupled the tests, please let me know if it is okay (it has some repeated code). Otherwise I can use your other suggestion (i.e., just run twice since they are fast).

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick iterations @Charlie-XIAO

@Charlie-XIAO
Copy link
Contributor Author

Charlie-XIAO commented May 5, 2023

Oh right I completely forgot about parametrization lol, will do soon.

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @Charlie-XIAO

@glemaitre glemaitre merged commit 6be774b into scikit-learn:main May 15, 2023
@glemaitre
Copy link
Member

Thanks @Charlie-XIAO LGTM.

@Charlie-XIAO Charlie-XIAO changed the title FIX tree.export_text and tree.export_graphviz accepts feature and class names as array-like FIX export_text and export_graphviz accepts feature and class names as array-like May 16, 2023
@Charlie-XIAO Charlie-XIAO deleted the tree-export-feat-names branch June 28, 2023 05:43
@Charlie-XIAO Charlie-XIAO restored the tree-export-feat-names branch September 23, 2023 13:39
@Charlie-XIAO Charlie-XIAO deleted the tree-export-feat-names branch September 23, 2023 13:39
@Charlie-XIAO Charlie-XIAO restored the tree-export-feat-names branch September 23, 2023 13:39
@Charlie-XIAO Charlie-XIAO deleted the tree-export-feat-names branch September 23, 2023 13:39
REDVM pushed a commit to REDVM/scikit-learn that referenced this pull request Nov 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sklearn.tree.export_text failing when feature_names supplied
4 participants