Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add min.node.size option to tree search #77

Merged
merged 7 commits into from
Mar 5, 2021
Merged

Add min.node.size option to tree search #77

merged 7 commits into from
Mar 5, 2021

Conversation

kanodiaayush
Copy link
Member

@kanodiaayush kanodiaayush commented Mar 3, 2021

Updates policy_tree with an optional argument min.node.size specifying the smallest permissible node size.

@erikcs erikcs changed the title min node size for splits Add min.node.size option to tree search Mar 3, 2021
@erikcs erikcs mentioned this pull request Mar 3, 2021
if (value == next_value) {
continue;
}
if (samples_counter < min_node_size || num_points - samples_counter < min_node_size) {
continue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just leaving a note here for future @kanodiaayush and @erikcs:

We could break instead of continue when the RHS of this condition is met, but leave it as this for simplicity.

@erikcs
Copy link
Member

erikcs commented Mar 5, 2021

A quick context for this feature:

In an empirical application @halflearned had optimal depth-2 policies were some terminal nodes were very small (like 5 observations) - naturally these did not perform very well on held out data. This feature alleviates this "overfitting" issue by enforcing a minimum terminal node size.

(originally we did not anticipate the need for this feature - as a depth 2 tree is partitioning covariates into just 4 regions, and who would have thought the global optimum would have "pathologically" tiny nodes - feels kind of reminiscent of the "end-cut preference" of classical greedy CART https://arxiv.org/pdf/1511.05741.pdf)

@erikcs erikcs merged commit b4791b9 into master Mar 5, 2021
@erikcs erikcs deleted the min_node_size branch March 5, 2021 07:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants