[SW-2029] Add H2O GLRM to Algo API #2224

mn-mikke · 2020-07-13T21:38:04Z

No description provided.

mn-mikke · 2020-07-15T14:16:21Z

ml/src/main/scala/ai/h2o/sparkling/ml/algos/H2OGridSearch.scala

+        JObject(obj) <- ast
+        JField("model_summary", JObject(modelSummary)) <- obj
+        JField("columns", JArray(columns)) <- modelSummary
+        ("final_objective_value", index) <- columns.arr.map(getColumnName).zipWithIndex


@wendycwong I'm extracting the metric from model_summary on the MOJO model. The name of the value is final_objective_value. Is this the value that we try to minimize during model training?

@mn-mikke: that is correct.
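The lookup in the hunk above can be sketched in Python for readers who don't use json4s. This is an illustration only, not the Sparkling Water code. It assumes the model JSON holds `model_summary` as a table with a `columns` list of metadata objects and a column-major `data` list; the sample values and every column name other than `final_objective_value` are made up.

```python
from typing import Any, Dict, Optional


def final_objective_value(model_json: Dict[str, Any]) -> Optional[float]:
    """Return the last recorded final_objective_value, or None if it is absent."""
    summary = model_json.get("model_summary", {})
    columns = summary.get("columns", [])
    data = summary.get("data", [])  # assumption: one list of values per column
    for index, column in enumerate(columns):
        # Find the position of the metric column by name, then read that column.
        if column.get("name") == "final_objective_value" and index < len(data):
            values = data[index]
            return float(values[-1]) if values else None
    return None


# Hand-written sample for illustration, not real H2O output.
sample = {
    "model_summary": {
        "columns": [
            {"name": "number_of_iterations"},
            {"name": "final_step_size"},
            {"name": "final_objective_value"},
        ],
        "data": [[27], [0.0001], [1234.5]],
    }
}

print(final_objective_value(sample))
```

Because the metric is something the training minimizes, a grid search built on this value would rank models in ascending order.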

mn-mikke · 2020-07-15T14:21:55Z

py/tests/unit/with_runtime_sparkling/test_glrm.py

+
+    algo = getPreconfiguredAlgorithm()
+    algo.setLossByCol(["absolute", "huber"])
+    algo.setLossByColNames(["Murder", "Rape"])


I've replaced loss_by_col_idx with lossByColNames, which references columns by their names. Column indices wouldn't work well in SW. I also considered merging lossByCol and lossByColNames into one map/dictionary parameter, but I didn't do that, since lossByCol can be used in grid search according to the documentation.

jakubhava

LGTM. Just a few small suggestions

doc/src/site/sphinx/ml/sw_glrm.rst

ml/src/main/scala/ai/h2o/sparkling/ml/algos/H2OGridSearch.scala

ml/src/test/scala/ai/h2o/sparkling/ml/algos/H2OGLRMTestSuite.scala

py/tests/unit/with_runtime_sparkling/test_glrm.py

jakubhava · 2020-07-16T09:59:01Z

py/tests/unit/with_runtime_sparkling/test_glrm.py

+
+    algo = getPreconfiguredAlgorithm()
+    algo.setLossByCol(["absolute", "huber"])
+    algo.setLossByColNames(["Murder", "Rape"])


Good catch!
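To illustrate why names are more robust than indices here, the sketch below pairs the two parallel lists and shows how the derived indices shift when column order changes. Both helper functions are hypothetical and are not part of the Sparkling Water API; the column list is an assumption modeled on the `Murder` and `Rape` columns used in the test.

```python
from typing import Dict, List, Sequence


def pair_losses_with_columns(
    losses: Sequence[str], names: Sequence[str], available: Sequence[str]
) -> Dict[str, str]:
    """Pair each column name with its loss, validating both parallel lists."""
    if len(losses) != len(names):
        raise ValueError("lossByCol and lossByColNames must have the same length")
    missing = [name for name in names if name not in available]
    if missing:
        raise ValueError(f"unknown columns: {missing}")
    return dict(zip(names, losses))


def column_indices(names: Sequence[str], available: Sequence[str]) -> List[int]:
    """Resolve names to positions in one particular column ordering."""
    order = list(available)
    return [order.index(name) for name in names]


# Assumed column layout, for illustration only.
columns = ["Murder", "Assault", "UrbanPop", "Rape"]
reordered = ["Rape", "UrbanPop", "Assault", "Murder"]

print(pair_losses_with_columns(["absolute", "huber"], ["Murder", "Rape"], columns))
print(column_indices(["Murder", "Rape"], columns))
print(column_indices(["Murder", "Rape"], reordered))
```

The name-to-loss pairing stays the same under both orderings, while the indices do not, which is the situation a Spark pipeline can run into when upstream stages add or reorder columns.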

jakubhava · 2020-07-16T10:01:58Z

Also just realized that it might be hard to obtain the reduced frame if we do not explicitly set the responseName because H2O creates the name as "GLRMLoading_" + Key.rand() if not specified.

What do you think of generating some default name on SW side and mentioning that in docs?

mn-mikke · 2020-07-16T15:02:00Z

@jakubhava

> Also just realized that it might be hard to obtain the reduced frame if we do not explicitly set the responseName because H2O creates the name as "GLRMLoading_" + Key.rand() if not specified.

Is a user able to get representation_name through H2O-3 if the value wasn't explicitly specified?

> What do you think of generating some default name on SW side and mentioning that in docs?

We could generate some default value, but would it cause collisions in DKV if the algorithm is trained multiple times?

jakubhava · 2020-07-16T16:26:50Z

Good points in the last comment:

> Is a user able to get representation_name through H2O-3 if the value wasn't explicitly specified?

Don't know, maybe @wendycwong knows?

> We could generate some default value, but would it cause collisions in DKV if the algorithm is trained multiple times?

What I meant was generating a unique DKV name and setting it as the default value of the parameter (if that's possible), so a user could get the name via the algorithm parameter. But it's true that in some real edge cases even this key could be taken before we run the algo. Just thinking out loud.
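The idea of a unique default can be sketched as follows. This is a hypothetical illustration, not what Sparkling Water or H2O does: the function name is invented, and a random UUID stands in for H2O's `Key.rand()`. A fresh name per training run makes a collision very unlikely, though, as noted above, it cannot rule out the key being taken before the algorithm runs.

```python
import uuid


def default_representation_name(prefix: str = "GLRMLoading_") -> str:
    """Build a fresh, practically unique name for the reduced-representation frame."""
    # uuid4 is random, so each call yields a different 32-character hex suffix.
    return prefix + uuid.uuid4().hex


# Each training run would get its own default that the user can read back.
first = default_representation_name()
second = default_representation_name()
print(first)
print(second)
```

Exposing the generated value through the parameter's getter is what would let a user locate the frame afterwards without setting the name explicitly.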

api-generation/src/main/scala/ai/h2o/sparkling/api/generation/common/IgnoredParameters.scala

mn-mikke added the "work in progress" and "next major release" labels Jul 13, 2020

mn-mikke added 2 commits July 14, 2020 11:09

[SW-2029] Add H2O GLRM to Algo API

f9d6b87

Fix tests

053f4a6

mn-mikke force-pushed the mn/SW-2029 branch from d154e3a to 053f4a6 July 14, 2020 09:11

mn-mikke added 6 commits July 14, 2020 15:45

Python tests

98b2a69

Add DimrReductionPredictionTestSuite

4b6b0ef

Add GLRM to GridSearch

b15b429

Update GridSearch doc

9b32f56

Explicit definition of HasLossbyColNames

bc5da94

Add documentation

28609a2

mn-mikke removed the "work in progress" label Jul 15, 2020

mn-mikke requested review from jakubhava and wendycwong July 15, 2020 14:11

jakubhava approved these changes Jul 16, 2020

mn-mikke added 3 commits July 16, 2020 12:21

Drop class column from exeamples

6edf53b

Adress Some of the review comments

ec8600a

Update formatting in doc

786250e

mn-mikke added 3 commits July 16, 2020 23:28

Add GLRM test

2e6fb14

Update documentation

abc0ee0

Update documentation

e247009

wendycwong approved these changes Jul 20, 2020

mn-mikke merged commit 22a9822 into master Jul 20, 2020

mn-mikke deleted the mn/SW-2029 branch July 20, 2020 18:39

DinukaH2O mentioned this pull request May 23, 2023

Add H2O GLRM to Algo API #3546

Closed
