
Score decision tree using only samples that propagate to leaves trained with >N datapoints #20537

Open
rohilverma opened this issue Jul 15, 2021 · 3 comments

Comments

@rohilverma

After training a particular model, there is typically a decent amount of threshold tuning to select an appropriate decision rule for the application. In general, the goal is to identify rules (based on a threshold, or on other properties of the algorithm) that prevent the model from making bad decisions while still remaining applicable to the majority of the dataset.

For decision trees, one such possible rule is to exclude predictions that fall into leaves lacking a sufficient number of datapoints, where sufficient (>N) is specified by the user. For example, a leaf may have been created during training from only 2 datapoints. During testing/scoring, we could skip samples that fall into this leaf to prevent low-confidence decision making.
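For illustration, here is a rough sketch of how this could be approximated today with the public tree attributes (the toy dataset and the threshold N are just placeholders, not a proposed API):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

N = 5  # user-chosen minimum number of training samples per leaf (placeholder)

# apply() maps each test sample to the index of the leaf it falls into,
# and tree_.n_node_samples holds the number of training samples per node.
leaf_ids = clf.apply(X_test)
leaf_support = clf.tree_.n_node_samples[leaf_ids]

# Only score samples whose leaf was trained on more than N datapoints.
confident = leaf_support > N
predictions = clf.predict(X_test[confident])
```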

Alternatives/nuance: rather than the raw number of samples, we may also want to consider the impurity within the leaf, or perhaps some metric that weighs impurity against the number of samples.
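Continuing the sketch above (reusing `clf`, `leaf_ids`, `leaf_support`, and `N`), the per-leaf impurity is also exposed, so an illustrative combined rule could look like the following; the impurity cutoff is arbitrary:

```python
# tree_.impurity holds the impurity of each node (e.g. Gini for classification).
leaf_impurity = clf.tree_.impurity[leaf_ids]

# Illustrative combined rule: require both enough support and low impurity.
max_impurity = 0.2  # placeholder cutoff
confident = (leaf_support > N) & (leaf_impurity < max_impurity)
```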

Feedback on whether this is something others have considered (I could not find any similar issues in the tracker) or whether there are clear drawbacks to this approach would be appreciated.

@jnothman
Member

This would be an alternative pruning strategy? How does it compare to cost-complexity pruning (https://scikit-learn.org/stable/auto_examples/tree/plot_cost_complexity_pruning.html)? There are indeed several issues related to pruning in our tracker.
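For reference, a minimal sketch of the existing cost-complexity pruning API (the choice of alpha below is arbitrary, just to show the mechanism):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# cost_complexity_pruning_path returns the effective alphas at which nodes are pruned.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)

# Refit with one of the candidate alphas; larger alphas prune more aggressively.
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=path.ccp_alphas[-2]).fit(X, y)
```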

@ayushi2019031

Hi, I want to work on this.

@thomasjpfan
Member

thomasjpfan commented Aug 28, 2021

@ayushi2019031 To move forward, we would need references to these alternative pruning strategies and to see whether they meet our inclusion criteria.
