Monotonic variables for GBM #14868

Closed
exalate-issue-sync bot opened this issue May 13, 2023 · 7 comments

Comments

@exalate-issue-sync

Monotonic variables for GBM
Add the ability to designate a variable as monotonically increasing or decreasing. A split on such a variable would only be chosen if the prediction for the left side is smaller (or larger, respectively) than the prediction for the right side.
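
To make the requested split rule concrete, here is a minimal sketch in plain Python. It is not H2O's implementation: the function name `best_monotone_split` and the squared-error criterion are assumptions chosen for illustration. It scans the candidate thresholds of one numeric variable and discards any candidate whose left and right predictions violate the requested direction. As far as I know, real GBM implementations also have to keep the constraint consistent further down the tree; this sketch only checks a single split.

```python
from statistics import mean


def best_monotone_split(xs, ys, direction=1):
    """Return (threshold, left_pred, right_pred) for the best allowed split, or None.

    direction=+1: the prediction must not decrease as x grows (left <= right).
    direction=-1: the prediction must not increase as x grows (left >= right).
    Hypothetical helper for illustration only.
    """
    pairs = sorted(zip(xs, ys))
    best = None
    best_sse = float("inf")
    for i in range(1, len(pairs)):
        # Only consider thresholds between two distinct feature values.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        left_pred, right_pred = mean(left), mean(right)
        # Reject candidates whose predictions go the wrong way.
        if direction * (right_pred - left_pred) < 0:
            continue
        # Squared error of predicting each side by its mean.
        sse = sum((y - left_pred) ** 2 for y in left)
        sse += sum((y - right_pred) ** 2 for y in right)
        if sse < best_sse:
            best_sse = sse
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (threshold, left_pred, right_pred)
    return best


# Toy data in which the response rises with x.
increasing = best_monotone_split([1, 2, 3, 4], [1, 1, 3, 3], direction=1)
# Asking for a decreasing relationship leaves no admissible split.
decreasing = best_monotone_split([1, 2, 3, 4], [1, 1, 3, 3], direction=-1)
```

On the toy data the increasing constraint still allows the natural split, while the decreasing constraint rejects every candidate, which is the "suboptimal model" trade-off in miniature.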

@exalate-issue-sync
Author

Mark Landry commented: At first read, this looks like ordered factors, but it is about modeling behavior, and it is subtly quite complex. It would likely change how the entire algorithm operates, to the point where we'd have to implement the basic tree split differently. They want a suboptimal model that fits the supposed natural order of the data sets they have.
This seems difficult and is not often requested, though it does seem to match how some insurance companies prefer to model.
Additionally, we'd need a new column chooser to pick the sets of columns that should obey this constraint, so the API will get bulky.

@exalate-issue-sync
Author

Alon Gilmore commented: Insurance and banking companies sometimes have very large multi-dimensional rating factor tables that need to be optimized.
The dataset itself is not ordered, and one cannot assume that all factors are monotonically decreasing or increasing. Considering that we're talking about multiple (possibly many) factors in one dataset, this is a real obstacle for companies trying to use GBM effectively.

@exalate-issue-sync
Author

Jenna Yang commented: This feature has already been implemented in xgboost:
http://xgboost.readthedocs.io/en/latest/tutorials/monotonic.html

Not only will it alleviate overfitting, but it will also help eliminate relationships that do not make sense.
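
For reference, the XGBoost tutorial linked above sets a `monotone_constraints` parameter, written as a string such as `"(1,-1)"` with one entry per feature in column order (1 for increasing, -1 for decreasing, 0 for unconstrained). A small sketch of assembling that string from named columns follows. The helper `build_monotone_constraints` and the column names are hypothetical and not part of XGBoost; actually training with these parameters is left out.

```python
def build_monotone_constraints(feature_names, directions):
    """Build a constraint string such as "(1,0,-1)" in feature order.

    directions maps a feature name to +1 (increasing) or -1 (decreasing);
    features that are not listed are left unconstrained (0).
    Hypothetical helper, not part of XGBoost.
    """
    values = []
    for name in feature_names:
        d = directions.get(name, 0)
        if d not in (-1, 0, 1):
            raise ValueError(f"direction for {name!r} must be -1, 0 or 1")
        values.append(str(d))
    return "(" + ",".join(values) + ")"


# Hypothetical column names, in the order they appear in the training matrix.
features = ["age", "vehicle_value", "no_claims_years"]

params = {
    "objective": "reg:squarederror",
    "monotone_constraints": build_monotone_constraints(
        features, {"age": 1, "no_claims_years": -1}
    ),
}
```

A per-column mapping like this is also one way the "column chooser" concern raised earlier in the thread could be kept manageable when many factors are involved.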

@exalate-issue-sync
Author

Patrick Hall commented: See: https://0xdata.atlassian.net/browse/PUBDEV-3920

@exalate-issue-sync
Author

Javier Recasens commented: Will it be implemented for H2O GBM?

This is currently implemented in the standard gbm R package via the var.monotone argument.

@exalate-issue-sync
Author

Michal Kurka commented: Yes, the good news is that we will be adding this feature in one of the upcoming releases (1-2 months from now).

@DinukaH2O
Contributor

JIRA Issue Migration Info

Jira Issue: PUBDEV-1984
Assignee: Michal Kurka
Reporter: Mark Landry
State: Resolved
Fix Version: 3.22.0.3
Attachments: N/A
Development PRs: N/A
