Show leaf values, i.e. leaf weights, for classification trees #239

Open
mepland opened this issue Jan 7, 2023 · 6 comments
Labels
enhancement New feature or request

Comments

@mepland
Collaborator

mepland commented Jan 7, 2023

Instead of printing the argmax predicted class name at each leaf for classification trees, allow the user to show the numeric value, i.e. weight, of the leaf as is done for regression trees. We may want to retain the current argmax class name behavior as an option for the user.

Somewhat related to #178

Current relevant code: trees.py

    prediction = node.prediction_name()

    if leaftype == 'pie':
        _draw_piechart(counts, size=size, colors=colors, filename=filename, label=f"n={nsamples}\n{prediction}",
                      graph_colors=graph_colors, fontname=fontname)
    elif leaftype == 'barh':
        _draw_barh_chart(counts, size=size, colors=colors, filename=filename, label=f"n={nsamples}\n{prediction}",
                      graph_colors=graph_colors, fontname=fontname)

For a get_prediction() example, see the sklearn_decision_trees.py implementation:

    def get_prediction(self, id):
        if self.is_classifier():
            counts = self.tree_model.tree_.value[id][0]
            return np.argmax(counts)
        else:
            return self.tree_model.tree_.value[id][0][0]
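
A minimal sketch of what a numeric leaf value for a classifier could look like, reusing the same `tree_.value[id][0]` lookup as `get_prediction()` above. The helper name `leaf_class_proportion` and the stand-in model object are assumptions made for illustration, not part of dtreeviz or sklearn. The per-class values are normalized so the result reads as a proportion whether the array holds raw counts or fractions.

```python
from types import SimpleNamespace

import numpy as np


def leaf_class_proportion(tree_model, node_id):
    """Hypothetical helper: (argmax class index, its proportion) at one node."""
    values = np.asarray(tree_model.tree_.value[node_id][0], dtype=float)
    # Normalize so the output is a proportion for counts and fractions alike.
    proportions = values / values.sum()
    best = int(np.argmax(proportions))
    return best, float(proportions[best])


# Stand-in for a fitted sklearn tree: 3 nodes, 1 output, 2 classes.
# (Assumption for illustration; a real model exposes the same attribute path.)
fake_model = SimpleNamespace(
    tree_=SimpleNamespace(
        value=np.array([[[50.0, 50.0]], [[45.0, 5.0]], [[5.0, 45.0]]])
    )
)

class_index, proportion = leaf_class_proportion(fake_model, 1)
print(f"class index={class_index}, proportion={proportion:.2f}")
```

Returning the class index together with its proportion would let the leaf label keep the class name and append the number, rather than replacing one with the other.
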

parrt added the enhancement label on Jan 8, 2023

@mepland
Collaborator Author

mepland commented Jan 9, 2023

Also discussed here.

@parrt
Owner

parrt commented Jan 14, 2023

Yeah, let's see what @tlapusan thinks about creating a special function for classifiers, depending on the decision tree library, that returns a value to display.

@tlapusan
Collaborator

The most important information to display for a leaf is the predicted class, followed by the probability of the prediction, which shows the confidence in the predicted class. So IMO, we can add an option to display the probability, but not make it the default one. Indirectly... the user can deduce the probability of the predicted class by looking at the leaf pie chart...

All the dtreeviz visualisations were created to interpret trees which are independent (not interconnected), like a tree from a random forest... Indeed, xgboost is a little different and we can make some adjustments for it.

I'm on vacation this week, but I will think about it while skiing ⛷️.

@mepland
Collaborator Author

mepland commented Jan 19, 2023

> So IMO, we can add an option to display the probability, but not make it the default one.

Totally happy to have the class name remain the default behavior. I would just like to extend it to also be able to show the leaf values if the user wants to enable them.

> Indirectly... the user can deduce the probability of the predicted class by looking at the leaf pie chart...

For most tree models yes, but the FIGS model of csinva/imodels does not use the leaf positive class fraction for its leaf values; instead they are the residuals of the other trees in the ensemble for the points in the leaf. Plus it is always good to have a quantitative display option, rather than trying to read the leaf graph by eye for the % positive.

@parrt
Owner

parrt commented Jan 22, 2023

Rather than a user having to specify a dictionary, I think it's better if we come up with a function that is generic across libraries that returns a value that makes sense for that library. Then there is an option to flip it to show that value.

Or, we allow lambda or function as an argument that gets applied to each leaf node to get a value.
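
A rough sketch of the second idea, where an optional callable is applied to each leaf to produce the displayed text. All names here (`leaf_label`, `leaf_value_fn`, `default_leaf_text`, `positive_fraction`) are hypothetical and only illustrate the shape of such an option; the default path keeps the argmax class name, and the label format mirrors the `n={nsamples}\n{prediction}` string in `trees.py`.

```python
from typing import Callable, Optional, Sequence


def default_leaf_text(counts: Sequence[float], class_names: Sequence[str]) -> str:
    """Default: the argmax class name, as leaves show today."""
    best = max(range(len(counts)), key=lambda i: counts[i])
    return class_names[best]


def leaf_label(
    counts: Sequence[float],
    class_names: Sequence[str],
    leaf_value_fn: Optional[Callable[[Sequence[float], Sequence[str]], str]] = None,
) -> str:
    """Build the leaf label; a user-supplied callable overrides the default text."""
    nsamples = int(sum(counts))
    text_fn = leaf_value_fn or default_leaf_text
    return f"n={nsamples}\n{text_fn(counts, class_names)}"


def positive_fraction(counts: Sequence[float], class_names: Sequence[str]) -> str:
    """Example user function: fraction of samples in the last (positive) class."""
    return f"{class_names[-1]}: {counts[-1] / sum(counts):.2f}"


print(leaf_label([45, 5], ["no", "yes"]))
print(leaf_label([45, 5], ["no", "yes"], leaf_value_fn=positive_fraction))
```

A library-specific default could be swapped in for `default_leaf_text` per tree library, which would cover the first idea with the same hook.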

@mepland
Collaborator Author

mepland commented Jan 22, 2023

Yeah, makes sense - that is the elegant solution. I will work on writing up an implementation for sklearn.
