Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[python] more detailed docs for trees_to_dataframe(), create_tree_digraph(), plot_tree() #3618

Merged
merged 11 commits into from
Dec 7, 2020

Conversation

jameslamb
Copy link
Collaborator

I've been looking at the [lightgbm] tag on Stacck Overflow recently, and have seen a few questions about how to interpret the output of some visualization methods in the Python Package.

I think there is an opportunity to improve the documentation for those methods. This PR proposes more detailed docs on trees_to_dataframe(), create_tree_digraph(), and plot_tree() to help people understand them better. It's very possible that I've also misunderstood the meaning of some of the these things, so I appreciate your thorough reviews.

Notes for Reviewers

I opened this from a LightGBM branch so that we can hopefully enable RTD builds and see if I made any formatting mistakes. @StrikerRUS if you agree with the spirit of this PR, could you enable RTD builds for this branch?

@StrikerRUS
Copy link
Collaborator

@jameslamb

could you enable RTD builds for this branch?

Sure, done!
https://readthedocs.org/projects/lightgbm/builds/

@jameslamb
Copy link
Collaborator Author

There are some warnings, will fix them tonight

/basic.py:docstring of lightgbm.Booster.trees_to_dataframe:6: WARNING: Unexpected indentation.
/basic.py:docstring of lightgbm.Booster.trees_to_dataframe:8: WARNING: Block quote ends without a blank line; unexpected unindent.
/plotting.py:docstring of lightgbm.create_tree_digraph:26: WARNING: Unexpected indentation.
/plotting.py:docstring of lightgbm.create_tree_digraph:27: WARNING: Block quote ends without a blank line; unexpected unindent.
/plotting.py:docstring of lightgbm.plot_tree:31: WARNING: Unexpected indentation.
/plotting.py:docstring of lightgbm.plot_tree:32: WARNING: Block quote ends without a blank line; unexpected unindent.
looking for now-outdated files... none found

https://readthedocs.org/projects/lightgbm/builds/12463523/

Copy link
Collaborator

@StrikerRUS StrikerRUS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jameslamb Thanks for this great PR! I believe it will help a lot users to better understand returned results of these methods.

Please check my comments below.

python-package/lightgbm/basic.py Outdated Show resolved Hide resolved
Comment on lines 2219 to 2222
- ``split_feature``: string, identifier for the feature used for splitting.
This is of the form ``"Column_i"``, where ``i`` refers to the number of the
feature. For example, the first feature would be ``"Column_0"``. ``None``
for leaf nodes.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This description is not always true. Feature names can have custom values passed by user.

Suggested change
- ``split_feature``: string, identifier for the feature used for splitting.
This is of the form ``"Column_i"``, where ``i`` refers to the number of the
feature. For example, the first feature would be ``"Column_0"``. ``None``
for leaf nodes.
- ``split_feature``: string, name of the feature used for splitting.
``None`` for leaf nodes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I'll change this then. But I don't think just saying "name of the feature used for splitting" is sufficient. We should document both cases.

What do you think about this?

string, name of the feature used for splitting. If feature_name was not provided to the fit() call that produced this model, this will be of the form "Column_i", where i refers to the number of the feature. So for example, the first feature would be "Column_0".

I think it's important to specify that these Column_i names are 0-based (since that might not be obvious), and that they're based on the order of the input data.

Copy link
Collaborator

@StrikerRUS StrikerRUS Dec 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jameslamb

was not provided to the fit() call that produced this model

There are a lot of methods where feature names can be set, not only fit() from sklearn wrapper.

I believe it is enough to say that default feature names are in a form of "Column_i" (zero-based). But please read my next paragraph first.

We should document both cases.

Yeah, I support this idea! But I think that this can be documented in a more "general" place, because default feature names are common among all wrappers.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh great points! Ok I'll change it, I agree with you

python-package/lightgbm/basic.py Outdated Show resolved Hide resolved
python-package/lightgbm/basic.py Outdated Show resolved Hide resolved
python-package/lightgbm/basic.py Outdated Show resolved Hide resolved
python-package/lightgbm/basic.py Outdated Show resolved Hide resolved
python-package/lightgbm/plotting.py Outdated Show resolved Hide resolved
python-package/lightgbm/plotting.py Outdated Show resolved Hide resolved
python-package/lightgbm/plotting.py Outdated Show resolved Hide resolved
python-package/lightgbm/plotting.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@StrikerRUS StrikerRUS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jameslamb Great improvement! Very clean description!

@guolinke Could you please also check that field descriptions are correct?

python-package/lightgbm/plotting.py Show resolved Hide resolved
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Copy link
Collaborator

@guolinke guolinke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@StrikerRUS StrikerRUS merged commit eb03501 into master Dec 7, 2020
@StrikerRUS StrikerRUS deleted the docs/trees-to-dataframe branch December 7, 2020 12:06
@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 24, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants