Add ability to specify algo parameters (e.g. monotonicity constraints) in H2O AutoML #4005
stopping_criteria = new AutoMLStoppingCriteria();

// reasonable defaults:
stopping_criteria.set_max_models(0);
Is -1 or 0 more common as a replacement for None/NULL for an integer-valued parameter on the backend? I have seen both... (which I guess means we can choose whichever one is "better"?).
@ledell I didn't change that code, just moved it down: in Java, attributes are usually declared before the constructor, so I was just applying that "norm" here.
Regarding the -1/0 debate... there's no rule written in stone. -1 is commonly used for "unset" (although Java assigns 0 by default to int variables), and when there's no ambiguity like here, I tend to use 0 for "unlimited". Maybe it deserves a tiny comment next to it.
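To make the "0 means unlimited" convention concrete, here is a minimal sketch. `StoppingCriteriaSketch` and its methods are hypothetical stand-ins, not the actual `AutoMLStoppingCriteria` class; the point is only that the sentinel is documented next to the field and checked in one place.

```java
// Hypothetical, simplified stand-in for the stopping criteria discussed above;
// this is not the actual H2O class.
final class StoppingCriteriaSketch {
    /** Maximum number of models to build; 0 means "unlimited". */
    private int maxModels = 0;

    void setMaxModels(int maxModels) {
        // Negative values have no meaning here, so reject them early.
        if (maxModels < 0) {
            throw new IllegalArgumentException("max_models must be >= 0, got " + maxModels);
        }
        this.maxModels = maxModels;
    }

    int getMaxModels() {
        return maxModels;
    }

    /** True once the limit is reached; never true when the limit is 0 (unlimited). */
    boolean reached(int modelsBuilt) {
        return maxModels > 0 && modelsBuilt >= maxModels;
    }
}
```

Keeping the sentinel check inside a single method means callers never have to remember which value stands for "unset".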
@seb-h2o Can you update the description to remove My concern with
In general, GBMs require more (usually more shallow) trees than Random Forest, so I think it makes sense to allow the user to be precise in their configurations.
Thanks Seb, great feature!
if (!addParameter(param.algo, param.name, param.value))
  throw new H2OIllegalValueException(param.name, ROOT_PARAM, param.value);
}
return this;
@seb-h2o usually the builder pattern returns a different type from build(), so that we can't chain calls anymore (SmthBuilder.add().add().build() -> Smth). With the current approach, maybe we don't need to return this at all.
Yeah, I hesitated about turning this into a proper "builder" (that's also why I renamed the build method).
Mainly trying to keep it simple: not sure builders are justified most of the time.
When I see
AutoMLCustomParameters algo_parameters;
my reflex is:
algo_parameters = new AutoMLCustomParameters();
not: "damn, constructor is private, any factory method? AutoMLCustomParameters.newInstance(...)? damn, what then? Oh, found it: new AutoMLCustomParametersBuilder()....build()"
So trying to combine the best of both:
algo_parameters = new AutoMLCustomParameters().add(...)
                                              .add(...)
                                              .end();
It's not without flaws though, I'm conscious of this. Will reconsider and try to find sth more satisfying.
Ended up with AutoMLCustomParameters.create().add(...).build(): create returns an internal Builder instance.
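For readers following the thread, the shape settled on here (a static `create()` that hands back a nested builder) might look roughly like the sketch below. Everything in it is a simplified assumption: `CustomParametersSketch`, the two-argument `add` signature and the string-based storage are illustrative and do not reproduce the real `AutoMLCustomParameters`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified sketch; the real AutoMLCustomParameters differs.
final class CustomParametersSketch {
    private final List<String> entries;

    private CustomParametersSketch(List<String> entries) {
        // Defensive copy so that later changes to the builder cannot leak in.
        this.entries = Collections.unmodifiableList(new ArrayList<>(entries));
    }

    /** Entry point: callers never need to know the builder's class name. */
    static Builder create() {
        return new Builder();
    }

    List<String> entries() {
        return entries;
    }

    static final class Builder {
        private final List<String> pending = new ArrayList<>();

        private Builder() {}

        Builder add(String name, Object value) {
            pending.add(name + "=" + value);
            return this; // returning the builder is what enables chaining
        }

        CustomParametersSketch build() {
            return new CustomParametersSketch(pending);
        }
    }
}
```

A call then reads `CustomParametersSketch.create().add("a", 1).add("b", "x").build()`, which keeps discovery on the main class while `build()` still returns a different type, as raised in the review comment above.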
return algo_parameter_names.get(algo.name());
}

public Model.Parameters getCustomParameters(Algo algo) {
As we actually get both default and custom parameters here, the naming is slightly confusing. Maybe getCustomisedDefaults?
Not sure this should be public anymore actually, which would make the naming less important.
getCustomDefaults has something oxymoronic in it that makes me like it somehow :)
renamed, and lowered visibility (used in tests)
h2o-automl/src/main/java/water/automl/api/schemas3/AutoMLBuildSpecV99.java
@ledell you mean in the JIRA ticket? regarding the implementation, it is actually possible to set params for a specific algo using the following syntax (works both in R and Py):
Anyway, as soon as we're not sure about exposing
@seb-h2o No I mean in this PR description (it still says Thanks for clarifying about how to apply the parameter to a particular algorithm. I remember seeing this Can you list the strings that you accept for each algorithm? I want to make sure we are using consistent naming of the algorithms... I filed this JIRA recently because I noticed that H2O has several sets of algorithm names. It would be great to unify this across H2O moving forward. Regarding
It is the "standard" flat format used by scikit-learn to specify nested parameters, and I must say that I like it. When we can avoid an additional level of depth without any risk of confusion, it only brings clarity, I think.
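As an illustration of that flat format: scikit-learn addresses nested parameters with keys of the form `<component>__<parameter>`, using a double underscore as the separator. The sketch below splits such a key in Java. It is an assumption that this PR uses exactly the same separator, and `FlatParamKey` is a hypothetical helper, not part of H2O; "GBM" is used only as an example scope.

```java
// Sketch of splitting a flat "<scope>__<parameter>" key, following the
// double-underscore convention scikit-learn uses for nested parameters.
// Whether this PR uses exactly this separator is an assumption.
final class FlatParamKey {
    static final String SEPARATOR = "__";

    final String scope; // null when the key carries no scope prefix
    final String name;

    private FlatParamKey(String scope, String name) {
        this.scope = scope;
        this.name = name;
    }

    static FlatParamKey parse(String key) {
        int idx = key.indexOf(SEPARATOR);
        if (idx <= 0) {
            // No separator (or an empty prefix): the whole key is the parameter name.
            return new FlatParamKey(null, key);
        }
        return new FlatParamKey(key.substring(0, idx), key.substring(idx + SEPARATOR.length()));
    }
}
```

Because single underscores are left alone, a name such as `monotone_constraints` stays intact whether or not it carries a scope prefix.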
AutoML is using the algo names listed in https://github.com/h2oai/h2o-3/blob/master/h2o-automl/src/main/java/ai/h2o/automl/Algo.java everywhere.
For me But overall, I think the idea of full customization should be thought like DAI recipes, with a mini-language describing additional models, grids, feature engineering... This being said, after disabling
Co-Authored-By: Erin LeDell <erin@h2o.ai>
3cae0d3 to 7281e40
…) in H2O AutoML (h2oai#4005)
* Java API for custom params
* specs for custom params
* move some logic from schema to BuildSpec
* Py support for algo_parameters
* removed useless schema from AutoMLBuildSpec
* improved parsing logic for REST API polymorphic params
* R client support
* disabled ntrees, improved tests and addressed PR comments
https://0xdata.atlassian.net/browse/PUBDEV-6737
Current implementation restricts the parameters that can be assigned by the user to:
monotone_constraints
as required by the initial customer request.
Added a System property to allow overriding of any parameter: the plan is to use this for development/benchmarking.
Tuning algorithms, and running benchmarks against versions with different parameters, is currently a hassle (recompile, rename built jar, upload it, keep reference...).
Having those parameters configurable from the client will allow us to try more combinations faster (may also support hyperparameter overrides in the future, for benchmarking as well).
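The restriction described above can be pictured as an allow-list with a development-only escape hatch. The sketch below is an assumption about the general shape, not the PR's code: the class name and the property name `sketch.automl.algo_parameters.all.enabled` are hypothetical, and only `monotone_constraints` is taken from the description.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of an allow-list with a System property override;
// not the actual H2O implementation.
final class ParamAllowListSketch {
    // Hypothetical property name, for illustration only.
    static final String ALLOW_ANY_PROPERTY = "sketch.automl.algo_parameters.all.enabled";

    // Parameters that users may set by default.
    private static final Set<String> ALLOWED =
        Collections.unmodifiableSet(new HashSet<>(Arrays.asList("monotone_constraints")));

    static boolean isAllowed(String paramName) {
        // Boolean.getBoolean returns true only if the named system property
        // exists and equals "true" (ignoring case).
        return Boolean.getBoolean(ALLOW_ANY_PROPERTY) || ALLOWED.contains(paramName);
    }
}
```

With such a check, a benchmark run could be started with `-Dsketch.automl.algo_parameters.all.enabled=true` to unlock every parameter without recompiling, while regular users stay limited to the allow-list.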