Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SW-2029] Add H2O GLRM to Algo API #2224

Merged
merged 14 commits into from Jul 20, 2020
Merged

[SW-2029] Add H2O GLRM to Algo API #2224

merged 14 commits into from Jul 20, 2020

Conversation

mn-mikke
Copy link
Collaborator

No description provided.

@mn-mikke mn-mikke added work in progress WIP next major release Goes into Major release labels Jul 13, 2020
JObject(obj) <- ast
JField("model_summary", JObject(modelSummary)) <- obj
JField("columns", JArray(columns)) <- modelSummary
("final_objective_value", index) <- columns.arr.map(getColumnName).zipWithIndex
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wendycwong I'm extracting the metric from model_summary on MOJO model. The name of the value is final_objective_value. Is this the value that we try to minimize during the model training?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mn-mikke: that is correct.


algo = getPreconfiguredAlgorithm()
algo.setLossByCol(["absolute", "huber"])
algo.setLossByColNames(["Murder", "Rape"])
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've replace loss_by_col_idx with lossByColNames which reference column by their names. Column indices wouldn't work well in SW. I also considered to merge lossByCol and lossByColNames into one map/dictionary parameter, but I didn't that since lossByCol can be used in gridSearch according to the documentation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

Copy link
Contributor

@jakubhava jakubhava left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Just a few small suggestions

doc/src/site/sphinx/ml/sw_glrm.rst Outdated Show resolved Hide resolved

algo = getPreconfiguredAlgorithm()
algo.setLossByCol(["absolute", "huber"])
algo.setLossByColNames(["Murder", "Rape"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

@jakubhava
Copy link
Contributor

Also just realized that it might be hard to obtain the reduced frame if we do not explicitly set the responseName because H2O creates the name as "GLRMLoading_" + Key.rand() if not specified.

What do you think of generating some default name on SW side and mentioning that in docs?

@mn-mikke
Copy link
Collaborator Author

@jakubhava

Also just realized that it might be hard to obtain the reduced frame if we do not explicitly set the responseName because H2O creates the name as "GLRMLoading_" + Key.rand() if not specified.

Is a user able to get representation_name through h2O-3 if it the value wasn't explicitly specified?

What do you think of generating some default name on SW side and mentioning that in docs?

We could generate some default value, but would it cause collisions in DKV if train algorithm multiple times?

@jakubhava
Copy link
Contributor

jakubhava commented Jul 16, 2020

Good points on the last comments:

Is a user able to get representation_name through h2O-3 if it the value wasn't explicitly specified?

Don't know, maybe @wendycwong knows?

We could generate some default value, but would it cause collisions in DKV if the train algorithm multiple times?

What I meant generating a unique DKV name and setting it to the default value of the parameter( if that's possible). So a user could get the name via the algorithm parameter. But it's true that in some real edge cases even this key could become taken before we run the algo. But just thinking out loud.

@mn-mikke mn-mikke merged commit 22a9822 into master Jul 20, 2020
@mn-mikke mn-mikke deleted the mn/SW-2029 branch July 20, 2020 18:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
next major release Goes into Major release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants