Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PUBDEV-5865: Add monotonicity constraints to H2O GBM #3097

Merged
merged 7 commits into from Dec 20, 2018

Conversation

michalkurka
Copy link
Contributor

No description provided.

Remove forgotten FIXME
@@ -107,7 +107,6 @@ native

# Ignore Jupyter Notebook project files
*.ipynb_checkpoints/
*.ipynb
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -84,6 +86,9 @@
@API(help="Column sample rate (from 0.0 to 1.0)", level = API.Level.critical, gridable = true)
public double col_sample_rate;

@API(help = "A mapping representing monotonic constraints. Use +1 to enforce an increasing constraint and -1 to specify a decreasing constraint.", level = API.Level.secondary)
public KeyValueV3[] monotone_constraints;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder how checkpointing will work.

Is it possible to continue "improving" a model with monotonic constraints ? Or even better, remove them selectively by setting them to zero ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I think people can keep training a model with constraints. It would even be possible to remove the constraints (or define them in the opposite direction - not much sense though). The new trees will respect the new parameters.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great.. Adding the option to set various constraints to zero is in vain and might even be confusing for some users. We should however explain this possibility in documentation (probably, just thinking out loud) (cc @angela0xdata ).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps, but there might be a good reason to explicitly point out columns that are not constrained. In that case 0 makes sense.


public class TreeUtils {

public static void checkMonotoneConstraints(ModelBuilder<?, ?, ?> mb, Frame train, KeyValue[] constraints) {
Copy link
Contributor

@Pscheidl Pscheidl Dec 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also check the value ? <-1,1> ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code will take any number, only the sign matters. We can restrict it more but IMHO not necessary.

}
Constraint[] cs = new Constraint[f.numCols()];
for (KeyValue spec : _monotone_constraints) {
if (spec.getValue() == 0)
Copy link
Contributor

@Pscheidl Pscheidl Dec 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[-1,0,1] is the standard. But here we silently ignore 0. Shouldn't we check it ? Passing 0 does not make sense with H2O API, as checkpointing uses new constraints only (newly supplied constraints are used once continuing from a checkpoing).

I guess lots of data scientists might provide a set of monotonic constraints that equals to the set of features, and mark the ignored ones by putting in the value 0.

To me, this behavior seems correct. Just thinking out loud.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it still makes sense to pass 0 as a constraint. The user might want to explicitly show that some column are not constrained. The user might also want to annotate every column to be sure all columns are handled. So I think 0 makes sense as well.

Copy link
Contributor

@Pscheidl Pscheidl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see two of my minor comments. Otherwise I could find nothing.

The way constraints are being set seems to be the standard anywhere else. Easy to use. I like the enhancements in debug outpout while the tree is built, this will come in handy. I bet many users don't know about that. We should mention this in a short blogpost somewhere :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants