plot tree with high cardinality feature #5687

Closed
moziada opened this issue Jan 27, 2023 · 16 comments · Fixed by #5818

@moziada
Contributor

moziada commented Jan 27, 2023

When I try to plot a tree that splits on a high-cardinality feature, the overall diagram gets messed up no matter how I adjust the width and height.
out
There should be a way to avoid writing every value used in the split inside the nodes.

@jmoralez
Collaborator

Hi @moziada, thanks for using LightGBM. I think a possible solution would be to check the number of categories in the split and if there are too many we should collapse them and show them in the tooltip instead, which would look like this:
Screenshot from 2023-02-12 16-30-37
The problem is that, at least for me (Firefox + JupyterLab), the graph isn't automatically shown as SVG, so I had to save it first, then load it and show it as HTML, which is cumbersome. We should find a way to display it as SVG by default. I can look into that later, or if you're interested you can investigate and contribute the change yourself. The logic for showing it in the tooltip is very simple; here's a sample diff:

diff --git a/python-package/lightgbm/plotting.py b/python-package/lightgbm/plotting.py
index d2a57209..87c287c0 100644
--- a/python-package/lightgbm/plotting.py
+++ b/python-package/lightgbm/plotting.py
@@ -493,7 +493,13 @@ def _to_graphviz(
                     direction = _determine_direction_for_numeric_split(
                         example_case[split_feature], root['threshold'], root['missing_type'], root['default_left']
                     )
-            label += f"<B>{_float2str(root['threshold'], precision)}</B>"
+            if is_categorical_split and len(root['threshold'].split('||')) > 5:
+                threshold = '...'
+                tooltip = root['threshold']
+            else:
+                threshold = root['threshold']
+                tooltip = None
+            label += f"<B>{_float2str(threshold, precision)}</B>"
             for info in ['split_gain', 'internal_value', 'internal_weight', "internal_count", "data_percentage"]:
                 if info in show_info:
                     output = info.split('_')[-1]
@@ -525,7 +531,8 @@ def _to_graphviz(
             if "data_percentage" in show_info:
                 label += f"<br/>{_float2str(root['leaf_count'] / total_count * 100, 2)}% of data"
             label = f"<{label}>"
-        graph.node(name, label=label, shape=shape, style=style, fillcolor=fillcolor, color=color, penwidth=penwidth)
+            tooltip = None
+        graph.node(name, label=label, shape=shape, style=style, fillcolor=fillcolor, color=color, penwidth=penwidth, tooltip=tooltip)
         if parent is not None:
             graph.edge(parent, name, decision, color=color, penwidth=penwidth)
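
The collapsing rule in the diff above can be tried in isolation. The following is a minimal sketch, not the code that ships in LightGBM: the function name `collapse_categories` and the `max_shown` parameter are hypothetical, while the `||` separator and the cutoff of 5 are taken from the diff.

```python
from typing import Optional, Tuple


def collapse_categories(threshold: str, max_shown: int = 5) -> Tuple[str, Optional[str]]:
    """Return (node_label_text, tooltip) for a categorical split threshold.

    `threshold` is the '||'-joined category string used in the diff above.
    `max_shown` is a hypothetical knob; the diff hard-codes 5.
    """
    categories = threshold.split("||")
    if len(categories) > max_shown:
        # Too many values to print in the node: show a placeholder and
        # move the full list into the tooltip.
        return "...", threshold
    # Few enough values: print them in the node, no tooltip needed.
    return threshold, None


print(collapse_categories("0||1||2"))
print(collapse_categories("||".join(str(i) for i in range(6))))
```

Keeping the full string in the tooltip means no information is lost; it is only moved out of the node label.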

@github-actions

This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one. Thank you for taking the time to improve LightGBM!

@jmoralez
Collaborator

I shouldn't have marked this as awaiting response. It's a valid issue and I think it should be fixed. I'm reopening.

@jmoralez jmoralez reopened this Mar 15, 2023
@github-actions

This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one. Thank you for taking the time to improve LightGBM!

@moziada
Contributor Author

moziada commented Mar 16, 2023

Hey @jameslamb, sorry, I did not see your response earlier. I will look into it and try to make a contribution.
I will keep you posted on any updates.

@moziada
Contributor Author

moziada commented Mar 26, 2023

After searching, I found that this is a common issue with JupyterLab, and people have opened issues about it (like this issue).
The workaround to display SVG images in JupyterLab is to wrap the SVG content with IPython.display.HTML; then the tooltip works properly.
JupyterLab

I have also tested the tooltip in Jupyter Notebook: it works fine and no workarounds are needed.
Jupyter Notebook
Both screenshots were taken in Firefox.
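
For readers who want to try the JupyterLab workaround, here is a minimal sketch. The helper names `inline_svg_markup` and `show_svg_inline` are hypothetical. Trimming the XML prolog is optional tidying and not something the thread requires. `IPython.display.HTML` is the wrapper named above.

```python
def inline_svg_markup(svg_source: str) -> str:
    """Return the SVG text starting at its <svg> element.

    Graphviz SVG output begins with an XML prolog and a DOCTYPE; dropping
    them leaves markup that can be embedded directly in an HTML output cell.
    """
    start = svg_source.find("<svg")
    if start == -1:
        raise ValueError("no <svg> element found")
    return svg_source[start:]


def show_svg_inline(svg_source: str) -> None:
    """Display SVG as HTML so that node tooltips work in JupyterLab."""
    # Imported lazily so inline_svg_markup() is usable without IPython.
    from IPython.display import HTML, display

    display(HTML(inline_svg_markup(svg_source)))


# Usage inside a notebook, assuming `booster` is an already trained model:
#   import lightgbm as lgb
#   graph = lgb.create_tree_digraph(booster)
#   show_svg_inline(graph.pipe(format="svg").decode("utf-8"))
```

Rendering through `graph.pipe(format="svg")` needs the Graphviz binaries installed, as with any other Graphviz output.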

@moziada
Contributor Author

moziada commented Mar 28, 2023

@jmoralez sorry, I forgot to mention you when I posted my results.

@jmoralez
Collaborator

I think this is a good feature. We could add an argument for the maximum number of thresholds to show, and in the description of that argument note that the user may need to use IPython.display.HTML in JupyterLab. Do you want to submit a PR with that?

@moziada
Contributor Author

moziada commented Mar 31, 2023

Sure I am working on a PR

@moziada
Contributor Author

moziada commented Apr 1, 2023

I am having difficulty testing the changes I have made. Is there a systematic way to make changes and install the new version into the current Python environment?

@jameslamb
Collaborator

jameslamb commented Apr 2, 2023

Since you're only touching Python code, compile the library just one time by running the following from the root of the repo.

rm -r ./build
mkdir ./build
cd ./build
cmake ..
make -j2

Then every time you change the Python package, run this from the root of the repo.

cd ./python-package
python setup.py install --precompile

Then every time you want to run the tests, point pytest at any of the directories and files in the tests directory, for example by running the following from the root of the repo.

pytest tests/python_package_test/test_plotting.py

@moziada
Contributor Author

moziada commented Apr 3, 2023

I have made the changes, but I noticed that the .rst doc pages under the Plotting section are not available on GitHub.

@jameslamb
Collaborator

That page is automatically generated from the docstrings in python-package/lightgbm/plotting.py.

@jmoralez
Collaborator

jmoralez commented Apr 4, 2023

You can also render the docs locally following this.

@moziada
Contributor Author

moziada commented Apr 4, 2023

Thanks, I have managed to render it correctly and it seems fine to me. I have opened a pull request here: #5818

@github-actions

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 13, 2023
3 participants