-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[python] more detailed docs for trees_to_dataframe(), create_tree_digraph(), plot_tree() #3618
Conversation
…raph(), plot_tree()
Sure, done! |
There are some warnings, will fix them tonight
|
build is passing (https://readthedocs.org/projects/lightgbm/builds/12467879/) and I think the docs look ok, so this is ready for review.
Please check that my descriptions are correct, I'm not sure about all of them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jameslamb Thanks for this great PR! I believe it will help a lot users to better understand returned results of these methods.
Please check my comments below.
python-package/lightgbm/basic.py
Outdated
- ``split_feature``: string, identifier for the feature used for splitting. | ||
This is of the form ``"Column_i"``, where ``i`` refers to the number of the | ||
feature. For example, the first feature would be ``"Column_0"``. ``None`` | ||
for leaf nodes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This description is not always true. Feature names can have custom values passed by user.
- ``split_feature``: string, identifier for the feature used for splitting. | |
This is of the form ``"Column_i"``, where ``i`` refers to the number of the | |
feature. For example, the first feature would be ``"Column_0"``. ``None`` | |
for leaf nodes. | |
- ``split_feature``: string, name of the feature used for splitting. | |
``None`` for leaf nodes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I'll change this then. But I don't think just saying "name of the feature used for splitting" is sufficient. We should document both cases.
What do you think about this?
string, name of the feature used for splitting. If
feature_name
was not provided to thefit()
call that produced this model, this will be of the form"Column_i"
, wherei
refers to the number of the feature. So for example, the first feature would be"Column_0"
.
I think it's important to specify that these Column_i
names are 0-based (since that might not be obvious), and that they're based on the order of the input data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was not provided to the
fit()
call that produced this model
There are a lot of methods where feature names can be set, not only fit()
from sklearn wrapper.
I believe it is enough to say that default feature names are in a form of "Column_i"
(zero-based). But please read my next paragraph first.
We should document both cases.
Yeah, I support this idea! But I think that this can be documented in a more "general" place, because default feature names are common among all wrappers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh great points! Ok I'll change it, I agree with you
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
…BM into docs/trees-to-dataframe
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jameslamb Great improvement! Very clean description!
@guolinke Could you please also check that field descriptions are correct?
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this. |
I've been looking at the
[lightgbm]
tag on Stacck Overflow recently, and have seen a few questions about how to interpret the output of some visualization methods in the Python Package.I think there is an opportunity to improve the documentation for those methods. This PR proposes more detailed docs on
trees_to_dataframe()
,create_tree_digraph()
, andplot_tree()
to help people understand them better. It's very possible that I've also misunderstood the meaning of some of the these things, so I appreciate your thorough reviews.Notes for Reviewers
I opened this from a LightGBM branch so that we can hopefully enable RTD builds and see if I made any formatting mistakes. @StrikerRUS if you agree with the spirit of this PR, could you enable RTD builds for this branch?