Expose Number of Trees in SW MOJO #5593

exalate-issue-sync · 2023-05-22T20:15:44Z

Allow users to get the number of trees (ntrees) in a tree-based model (e.g. GBM, DRF) when early stopping is enabled.

Note: if early stopping is enabled, the algorithm can override the user's ntrees value. To access the model's actual ntrees value, a user currently has to use the mixed API (https://github.com/h2oai/h2o-droplets/blob/master/h2o-sw-mixed-api-droplet/src/main/scala/water/droplets/H2OSWMixedAPIDroplet.scala#L100).

exalate-issue-sync · 2023-05-22T20:15:46Z

Jakub Hava commented: I was thinking about this, and I think the right solution is to have H2OMOJOModel as the base and create a MOJOModel for each algo that can expose additional methods. That way we avoid having methods on algos where they do not make sense.

We need to make sure this specialized mojo model is exported in the pipelines.

exalate-issue-sync · 2023-05-22T20:15:47Z

Jakub Hava commented: [~accountid:5a32df017dcf343865c26fa5] just want to catch up → is the work on H2O side already done?

exalate-issue-sync · 2023-05-22T20:15:49Z

Pavel Pscheidl commented: [~accountid:557058:eeeb611c-665e-431d-b442-1f255171db6f] Every tree-based model in H2O-3 inherits from the SharedTreeModel class, which forces it to override the {{getNTrees}} method. This information is thus available in core.

The same goes for the model *Output classes. When a SharedTreeModelOutput is fetched (be it GBM, DRF, XGBoost…), the {{_ntrees}} property is serialized with it. Either way, you should be able to obtain this information about the model AFTER it has been trained; in other words, you get the actual number of trees trained, not a value from the parameters.

Is that sufficient for your needs?

exalate-issue-sync · 2023-05-22T20:15:51Z

Jakub Hava commented: Yup, perfect, thank you Pavel!

exalate-issue-sync · 2023-05-22T20:15:53Z

Jakub Hava commented: [~accountid:5c9943ec3a5542225fedb6b9] would you mind if I have a look at this? I would like to give you a free hand for the conversions.

exalate-issue-sync · 2023-05-22T20:15:54Z

Jakub Hava commented: After discussion with Pavel, the values are already in SharedTreeMojoModel, and the real number of trees can be computed as {{_ntree_groups * _ntrees_per_group}}.

We should probably put the value directly into SharedTreeMojoModel so SW can just consume the API
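The arithmetic mentioned above can be sketched in Java as follows. `TreeCountSketch` is a hypothetical stand-in whose two fields only mirror the `_ntree_groups` and `_ntrees_per_group` names from the comment; it is not the actual SharedTreeMojoModel code, and the `getNTrees` accessor is an assumed shape for the value SW could consume.

```java
// Hypothetical stand-in for a tree-based MOJO; NOT the real SharedTreeMojoModel.
final class TreeCountSketch {
    // Mirrors the _ntree_groups value named in the comment above.
    private final int ntreeGroups;
    // Mirrors the _ntrees_per_group value named in the comment above.
    private final int ntreesPerGroup;

    TreeCountSketch(int ntreeGroups, int ntreesPerGroup) {
        if (ntreeGroups < 0 || ntreesPerGroup < 0) {
            throw new IllegalArgumentException("tree counts must be non-negative");
        }
        this.ntreeGroups = ntreeGroups;
        this.ntreesPerGroup = ntreesPerGroup;
    }

    // Total number of trees stored in the model: groups times trees per group.
    int getNTrees() {
        return ntreeGroups * ntreesPerGroup;
    }
}
```

With this shape, a caller reads one precomputed value instead of multiplying two internal fields itself.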

exalate-issue-sync · 2023-05-22T20:15:56Z

Marek Novotny commented: [~accountid:557058:eeeb611c-665e-431d-b442-1f255171db6f] Go ahead if you want 😉

exalate-issue-sync · 2023-05-22T20:15:58Z

Jakub Hava commented: [~accountid:5c9943ec3a5542225fedb6b9] I was thinking about possible implementation.

The issue is that if we have multiple mojo types, we lose the specific type as we go, and I'm not sure we can actually solve that.

The solution I propose is to introduce an H2OMOJOModel for each algo (something like what you did for MojoSupervised & MojoUnsupervised).

They would just extend the classes you created, and GLMMojo, for example, would implement the GLM-specific functionality.

On the Mojo object we would have methods like mojo.getAsGLMMojoModel/getAsGBMMojoModel etc. (I want to avoid users calling asInstanceOf, and I want this API to be documented). The methods would internally check whether the instance is truly what the user is asking for, and either return the cast type or warn the user.

Basically I would just like to hide the casting from the user and expose access to particular mojo type. We do not have that many algos, I think this is fine to do.

What do you think?
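A minimal Java sketch of the typed-accessor idea proposed above. All class names here (`MojoModelSketch`, `GbmMojoSketch`, `GlmMojoSketch`) are hypothetical and are not the Sparkling Water API. Throwing an `IllegalStateException` on a mismatch is only one assumed reading of "warn the user"; logging and returning an empty result would be another.

```java
// Hypothetical base type; stands in for a generic MOJO model wrapper.
class MojoModelSketch {
    // Returns this model as a GBM MOJO, or fails with a descriptive message.
    GbmMojoSketch getAsGbmMojoModel() {
        if (this instanceof GbmMojoSketch) {
            return (GbmMojoSketch) this;
        }
        throw new IllegalStateException(
            "Model is a " + getClass().getSimpleName() + ", not a GBM MOJO");
    }

    // Returns this model as a GLM MOJO, or fails with a descriptive message.
    GlmMojoSketch getAsGlmMojoModel() {
        if (this instanceof GlmMojoSketch) {
            return (GlmMojoSketch) this;
        }
        throw new IllegalStateException(
            "Model is a " + getClass().getSimpleName() + ", not a GLM MOJO");
    }
}

// Hypothetical GBM-specific type exposing a tree count.
class GbmMojoSketch extends MojoModelSketch {
    private final int ntrees;

    GbmMojoSketch(int ntrees) {
        this.ntrees = ntrees;
    }

    int getNTrees() {
        return ntrees;
    }
}

// Hypothetical GLM-specific type; it has no tree-related methods.
class GlmMojoSketch extends MojoModelSketch {
}
```

The check lives in one documented place, so callers never write the cast themselves, and a wrong request fails with a message that names the actual model type.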

exalate-issue-sync · 2023-05-22T20:15:59Z

Marek Novotny commented: IMHO, it will be unavoidable to have a specific MOJO class per algorithm, but to fulfil this task we could just introduce a subclass of the supervised MOJO, something like TreeBasedMOJO, that would expose the number of trees.
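A rough Java sketch of the narrower suggestion above: a single tree-based subclass of the supervised MOJO that exposes the tree count. The names `SupervisedMojoSketch`, `TreeBasedMojoSketch` and `getNtrees` are assumptions made for illustration; the classes that were actually implemented may look different.

```java
// Hypothetical supervised MOJO base; carries only what every supervised model shares.
abstract class SupervisedMojoSketch {
    private final String algoName;

    SupervisedMojoSketch(String algoName) {
        this.algoName = algoName;
    }

    String getAlgoName() {
        return algoName;
    }
}

// Hypothetical subclass shared by tree-based algorithms (e.g. GBM, DRF).
final class TreeBasedMojoSketch extends SupervisedMojoSketch {
    // Number of trees actually present in the trained model.
    private final int ntrees;

    TreeBasedMojoSketch(String algoName, int ntrees) {
        super(algoName);
        this.ntrees = ntrees;
    }

    int getNtrees() {
        return ntrees;
    }
}
```

Only the tree-based subclass carries `getNtrees`, so non-tree supervised models do not gain a method that makes no sense for them.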

exalate-issue-sync · 2023-05-22T20:16:01Z

Jakub Hava commented: Yup, for this task we can start just with this, no need to do the mojo for each algo now

DinukaH2O · 2023-05-23T13:12:29Z

JIRA Issue Migration Info

Jira Issue: SW-1495
Assignee: Marek Novotny
Reporter: Lauren DiPerna
State: Resolved
Fix Version: 3.28.0.1-1
Attachments: N/A
Development PRs: Available

Linked PRs from JIRA

#1693

hasithjp · 2023-05-29T15:53:20Z

JIRA Issue Migration Info Cont'd

Jira Issue Created Date: 2019-08-07T15:52:42.086-0700

DinukaH2O assigned mn-mikke May 23, 2023

DinukaH2O closed this as completed May 23, 2023

DinukaH2O added the fixVersion/3.28.0.1-1 label May 23, 2023
