
Allows CART classification trees to produce pure leaves #303

Merged: 3 commits merged into oracle:main from the tree-fix branch on Dec 8, 2022


Craigacp (Member) commented Dec 2, 2022

Description

Removes a check requiring the left and right child nodes of a candidate split to have non-zero impurity. Either or both impurities are zero when a child is pure (i.e., contains a single output label), and such splits should be allowed during tree construction. Removing the check may make trees overfit more when they are grown until every leaf is pure, but that is what the CART algorithm prescribes, and fully grown trees are expected in ensembles such as Random Forests.
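To illustrate why the removed check was harmful, here is a minimal sketch (not Tribuo's actual implementation) that chooses a split threshold by weighted Gini impurity. The class and method names (`PureSplitDemo`, `gini`, `bestThreshold`) and the `rejectPureChildren` flag are hypothetical; the flag is an assumed stand-in for the non-zero-impurity check described above. On noise-free data that a single threshold separates perfectly, every candidate split has at least one pure child, so the old check discards all of them.

```java
import java.util.Arrays;

public class PureSplitDemo {

    // Gini impurity of a label array: 1 - sum_k p_k^2. It is zero when the node is pure.
    static double gini(int[] labels, int numClasses) {
        if (labels.length == 0) {
            return 0.0;
        }
        int[] counts = new int[numClasses];
        for (int y : labels) {
            counts[y]++;
        }
        double sumSq = 0.0;
        for (int c : counts) {
            double p = (double) c / labels.length;
            sumSq += p * p;
        }
        return 1.0 - sumSq;
    }

    // Returns the threshold (x < t goes left) that minimises the weighted child impurity,
    // or Double.NaN if no split is accepted. Assumes xs is sorted in ascending order.
    static double bestThreshold(double[] xs, int[] ys, int numClasses, boolean rejectPureChildren) {
        int n = xs.length;
        double bestScore = Double.POSITIVE_INFINITY;
        double bestT = Double.NaN;
        for (int i = 1; i < n; i++) {
            if (xs[i] == xs[i - 1]) {
                continue; // no threshold fits between equal feature values
            }
            int[] left = Arrays.copyOfRange(ys, 0, i);
            int[] right = Arrays.copyOfRange(ys, i, n);
            double leftImp = gini(left, numClasses);
            double rightImp = gini(right, numClasses);
            // Hypothetical version of the removed check: skip splits with a pure child.
            if (rejectPureChildren && (leftImp == 0.0 || rightImp == 0.0)) {
                continue;
            }
            double score = (left.length * leftImp + right.length * rightImp) / n;
            if (score < bestScore) {
                bestScore = score;
                bestT = (xs[i - 1] + xs[i]) / 2.0;
            }
        }
        return bestT;
    }

    public static void main(String[] args) {
        double[] xs = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0};
        int[] ys = {0, 0, 0, 1, 1, 1};
        System.out.println("with check:    " + bestThreshold(xs, ys, 2, true));  // prints NaN
        System.out.println("without check: " + bestThreshold(xs, ys, 2, false)); // prints 3.5
    }
}
```

With the check enabled, no split survives and the node cannot be divided at all; without it, the perfect split at 3.5 (both children pure, weighted impurity 0) is chosen.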

Motivation

The original check was incorrect: it prevented the model from creating pure leaf nodes that contain only a single output value, which could cause poor performance. Fixes #298.

Craigacp added the squash-commits label (Squash the commits when merging this PR) Dec 2, 2022
oracle-contributor-agreement bot added the OCA Verified label (All contributors have signed the Oracle Contributor Agreement.) Dec 2, 2022
Craigacp merged commit 8bb0259 into oracle:main Dec 8, 2022
Craigacp deleted the tree-fix branch Dec 8, 2022, 17:20
Craigacp added a commit that referenced this pull request Dec 16, 2022
* Removing a check that prevents pure splits from being added.

* Making the leaf node detection more robust.

* Adding a test for pure leaf creation.
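The commit message does not say how leaf detection was made more robust. As an assumption-labelled sketch only, one common approach is to test purity directly, by checking that all labels in a node are identical, rather than comparing a floating-point impurity against zero, which may not come out exactly zero when computed from weighted counts. The `LeafRuleDemo` class, the `isLeaf` method and its `maxDepth`/`minExamples` parameters are hypothetical and not Tribuo's API.

```java
import java.util.Arrays;

public class LeafRuleDemo {

    // Hypothetical stopping rule: returns true if a node should become a leaf.
    static boolean isLeaf(int[] labels, int depth, int maxDepth, int minExamples) {
        // Too few examples, or the depth limit has been reached.
        if (labels.length < minExamples || depth >= maxDepth) {
            return true;
        }
        // Exact integer purity test: every example carries the same label.
        return Arrays.stream(labels).distinct().count() <= 1;
    }

    public static void main(String[] args) {
        System.out.println(isLeaf(new int[]{1, 1, 1}, 2, 10, 2)); // prints true (pure node)
        System.out.println(isLeaf(new int[]{0, 1, 1}, 2, 10, 2)); // prints false (mixed labels)
    }
}
```

A test for pure leaf creation, in this sketch's terms, would assert that a node whose labels are all identical is reported as a leaf.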
Development

Linked issue:

Poor decision tree classification performance on simple, noise-free data
2 participants