[SW-2029] Add H2O GLRM to Algo API #2224
JObject(obj) <- ast
JField("model_summary", JObject(modelSummary)) <- obj
JField("columns", JArray(columns)) <- modelSummary
("final_objective_value", index) <- columns.arr.map(getColumnName).zipWithIndex
@wendycwong I'm extracting the metric from model_summary on the MOJO model. The name of the value is final_objective_value. Is this the value that we try to minimize during the model training?
@mn-mikke: that is correct.
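For readers following the thread, here is a minimal Python sketch of the lookup discussed above: find a column by name in model_summary and read its value. It is not the code from this PR. The JSON layout (a "columns" list of objects with a "name" field, plus a column-major "data" array) is an assumption, the helper name extract_summary_value is hypothetical, and the sample values are made up.

```python
import json


def extract_summary_value(model_json, column_name):
    """Return the first-row value of `column_name` from model_summary, or None.

    Assumes model_summary holds a "columns" list of {"name": ...} objects and
    a column-major "data" array (one inner list per column).
    """
    ast = json.loads(model_json)
    summary = ast.get("model_summary", {})
    columns = summary.get("columns", [])
    data = summary.get("data", [])
    for index, column in enumerate(columns):
        # Match the column by name, then read the value at the same index.
        if column.get("name") == column_name and index < len(data) and data[index]:
            return data[index][0]
    return None


# Made-up sample document, only to exercise the helper.
sample = json.dumps({
    "model_summary": {
        "columns": [
            {"name": "number_of_iterations"},
            {"name": "final_objective_value"},
        ],
        "data": [[10], [1.25]],
    }
})

value = extract_summary_value(sample, "final_objective_value")
missing = extract_summary_value(sample, "no_such_column")
```

Returning None for a missing column keeps the caller in charge of deciding whether an absent metric is an error.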
algo = getPreconfiguredAlgorithm()
algo.setLossByCol(["absolute", "huber"])
algo.setLossByColNames(["Murder", "Rape"])
I've replaced loss_by_col_idx with lossByColNames, which references columns by their names. Column indices wouldn't work well in SW. I also considered merging lossByCol and lossByColNames into one map/dictionary parameter, but I didn't do that since lossByCol can be used in gridSearch according to the documentation.
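To illustrate how the two parallel lists relate to the single map/dictionary that was considered, here is a small Python sketch. The helper pair_losses_with_columns is hypothetical and not part of the SW API. It only shows the positional pairing implied by the test snippet above, with basic validation added as an assumption about what a sensible check would look like.

```python
def pair_losses_with_columns(loss_by_col, loss_by_col_names):
    """Pair each loss with the column name at the same position."""
    if len(loss_by_col) != len(loss_by_col_names):
        raise ValueError(
            "lossByCol and lossByColNames must have the same length, got "
            f"{len(loss_by_col)} and {len(loss_by_col_names)}"
        )
    if len(set(loss_by_col_names)) != len(loss_by_col_names):
        raise ValueError("lossByColNames must not contain duplicate column names")
    return dict(zip(loss_by_col_names, loss_by_col))


# Same values as in the test snippet above.
losses = pair_losses_with_columns(["absolute", "huber"], ["Murder", "Rape"])
```

Keeping the two parameters separate, as the PR does, leaves lossByCol usable on its own, while a helper like this one shows the equivalent dictionary view.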
Good catch!
LGTM. Just a few small suggestions
ml/src/main/scala/ai/h2o/sparkling/ml/algos/H2OGridSearch.scala
Also just realized that it might be hard to obtain the reduced frame if we do not explicitly set the responseName, because H2O creates the name as […]. What do you think of generating some default name on the SW side and mentioning that in the docs?
Is a user able to get
We could generate some default value, but would it cause collisions in DKV if the algorithm is trained multiple times?
Good points on the last comments:
Don't know, maybe @wendycwong knows?
What I meant was generating a unique DKV name and setting it as the default value of the parameter (if that's possible), so a user could get the name via the algorithm parameter. But it's true that in some real edge cases even this key could be taken before we run the algo. Just thinking out loud.
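A rough Python sketch of the unique-default-name idea, under stated assumptions: the function name generate_default_frame_key and the prefix "GLRMRepresentation" are hypothetical, and nothing here talks to DKV. A random UUID suffix makes collisions across repeated training runs very unlikely, but it does not reserve the key, so the edge case mentioned above still applies.

```python
import uuid


def generate_default_frame_key(prefix="GLRMRepresentation"):
    """Build a key that is unique with very high probability.

    The prefix is a hypothetical placeholder; uuid4 supplies 122 random bits.
    """
    return f"{prefix}_{uuid.uuid4().hex}"


# Each call yields a new key, e.g. one per algorithm instance.
first_key = generate_default_frame_key()
second_key = generate_default_frame_key()
```

If the generated key were stored as the parameter's default, a user could read it back from the algorithm after training instead of guessing the name H2O picked.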