New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PUBDEV-5865: Add monotonicity constraints to H2O GBM #3097
Conversation
954b687
to
d00687b
Compare
Remove forgotten FIXME
@@ -107,7 +107,6 @@ native | |||
|
|||
# Ignore Jupyter Notebook project files | |||
*.ipynb_checkpoints/ | |||
*.ipynb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@@ -84,6 +86,9 @@ | |||
@API(help="Column sample rate (from 0.0 to 1.0)", level = API.Level.critical, gridable = true) | |||
public double col_sample_rate; | |||
|
|||
@API(help = "A mapping representing monotonic constraints. Use +1 to enforce an increasing constraint and -1 to specify a decreasing constraint.", level = API.Level.secondary) | |||
public KeyValueV3[] monotone_constraints; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder how checkpointing will work.
Is it possible to continue "improving" a model with monotonic constraints ? Or even better, remove them selectively by setting them to zero ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. I think people can keep training a model with constraints. It would even be possible to remove the constraints (or define them in the opposite direction - not much sense though). The new trees will respect the new parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great.. Adding the option to set various constraints to zero is in vain and might even be confusing for some users. We should however explain this possibility in documentation (probably, just thinking out loud) (cc @angela0xdata ).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps, but there might be a good reason to explicitly point out columns that are not constrained. In that case 0 makes sense.
|
||
public class TreeUtils { | ||
|
||
public static void checkMonotoneConstraints(ModelBuilder<?, ?, ?> mb, Frame train, KeyValue[] constraints) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also check the value ? <-1,1> ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code will take any number, only the sign matters. We can restrict it more but IMHO not necessary.
} | ||
Constraint[] cs = new Constraint[f.numCols()]; | ||
for (KeyValue spec : _monotone_constraints) { | ||
if (spec.getValue() == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[-1,0,1] is the standard. But here we silently ignore 0. Shouldn't we check it ? Passing 0 does not make sense with H2O API, as checkpointing uses new constraints only (newly supplied constraints are used once continuing from a checkpoing).
I guess lots of data scientists might provide a set of monotonic constraints that equals to the set of features, and mark the ignored ones by putting in the value 0
.
To me, this behavior seems correct. Just thinking out loud.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it still makes sense to pass 0 as a constraint. The user might want to explicitly show that some column are not constrained. The user might also want to annotate every column to be sure all columns are handled. So I think 0 makes sense as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see two of my minor comments. Otherwise I could find nothing.
The way constraints are being set seems to be the standard anywhere else. Easy to use. I like the enhancements in debug outpout while the tree is built, this will come in handy. I bet many users don't know about that. We should mention this in a short blogpost somewhere :)
No description provided.