Expose Number of Trees in SW MOJO #5593

Closed
exalate-issue-sync bot opened this issue May 22, 2023 · 12 comments

@exalate-issue-sync

Allow users to get the number of trees (ntrees) in a tree-based model (e.g. GBM, DRF) when early stopping is enabled.

Note: if early stopping is enabled, the algorithm can override the user's ntrees value. To access the model's actual number of trees, a user currently has to use the mixed API (https://github.com/h2oai/h2o-droplets/blob/master/h2o-sw-mixed-api-droplet/src/main/scala/water/droplets/H2OSWMixedAPIDroplet.scala#L100).

@exalate-issue-sync
Author

Jakub Hava commented: I was thinking about this, and I think the right solution is to keep H2OMOJOModel as the base and create a MOJOModel for each algo. Each one can expose additional methods, and we avoid having methods on algos where they do not make sense.

We need to make sure this specialized mojo model is exported in the pipelines.

@exalate-issue-sync
Author

Jakub Hava commented: [~accountid:5a32df017dcf343865c26fa5] just want to catch up → is the work on H2O side already done?

@exalate-issue-sync
Author

Pavel Pscheidl commented: [~accountid:557058:eeeb611c-665e-431d-b442-1f255171db6f] Every tree-based model in H2O-3 inherits from the SharedTreeModel class, which forces it to override the {{getNTrees}} method. This information is thus available in core.

The same goes for the model *Output classes. When a SharedTreeModelOutput is fetched (be it GBM, DRF, XGBoost…), the {{_ntrees}} property is serialized with it. Either way, you should be able to obtain this information AFTER the model has been trained; in other words, you get the actual number of trees trained, not a value from the parameters.

Is that sufficient for your needs?

@exalate-issue-sync
Author

Jakub Hava commented: Yup, perfect, thank you Pavel!

@exalate-issue-sync
Author

Jakub Hava commented: [~accountid:5c9943ec3a5542225fedb6b9] would you mind if I have a look at this? I would like to leave you a free hand for the conversions.

@exalate-issue-sync
Author

Jakub Hava commented: After discussion with Pavel: the values are already in SharedTreeMojoModel, and the real number of trees can be computed as {{_ntree_groups * _ntrees_per_group}}.

We should probably put the value directly into SharedTreeMojoModel so SW can just consume the API.
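
A minimal Java sketch of the computation described in this comment. `TreeMojoSketch` is a hypothetical stand-in and not the real `SharedTreeMojoModel`. Only the two field names come from the comment; the accessor `getNTrees` and the interpretation of the fields in the code comments are assumptions made for illustration.

```java
/** Hypothetical stand-in for a tree-based MOJO; NOT the real H2O class. */
final class TreeMojoSketch {
    // Field names are taken from the comment above; what they count is an assumption.
    final int _ntree_groups;      // assumed: tree groups actually built (may be fewer than requested)
    final int _ntrees_per_group;  // assumed: trees in each group

    TreeMojoSketch(int ntreeGroups, int ntreesPerGroup) {
        this._ntree_groups = ntreeGroups;
        this._ntrees_per_group = ntreesPerGroup;
    }

    /** Hypothetical accessor: number of trees in the trained model, per the formula in the comment. */
    int getNTrees() {
        return _ntree_groups * _ntrees_per_group;
    }
}
```

With such an accessor in place, a caller would not need to know about groups at all and could read one value after training.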

@exalate-issue-sync
Author

Marek Novotny commented: [~accountid:557058:eeeb611c-665e-431d-b442-1f255171db6f] Go ahead if you want 😉

@exalate-issue-sync
Author

Jakub Hava commented: [~accountid:5c9943ec3a5542225fedb6b9] I was thinking about possible implementation.

The issue is that if we have multiple mojo types, we lose the specific type as we go, and I'm not sure we can actually solve that.

The solution I propose is to introduce an H2OMOJOModel for each algo (something like what you did for MojoSupervised & MojoUnsupervised).

They would just extend the classes you created, and GLMMojo would implement this functionality.

On the Mojo object we would have methods like mojo.getAsGLMMojoModel/getAsGBMMojoModel etc. (I want to avoid the user calling asInstanceOf, and I want this API to be documented). The methods would internally check whether the instance is truly what the user is asking for, and either give the user the cast type or warn the user.

Basically I would just like to hide the casting from the user and expose access to particular mojo type. We do not have that many algos, I think this is fine to do.

What do you think?
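
One possible shape of the checked-cast accessor proposed in this comment, sketched in Java. All class names here (`MojoModelSketch`, `GbmMojoModelSketch`, `GlmMojoModelSketch`) are hypothetical, and returning `Optional` instead of logging a warning is an assumption about how "warn the user" could be realized, not something the thread decides.

```java
import java.util.Optional;

/** Hypothetical generic MOJO wrapper; NOT the real H2OMOJOModel. */
class MojoModelSketch {
    /**
     * Returns this model as a GBM MOJO when it really is one.
     * An empty Optional signals a type mismatch, so callers never cast by hand.
     */
    Optional<GbmMojoModelSketch> getAsGBMMojoModel() {
        return (this instanceof GbmMojoModelSketch)
                ? Optional.of((GbmMojoModelSketch) this)
                : Optional.empty();
    }
}

/** Hypothetical GBM-specific MOJO exposing a tree count. */
class GbmMojoModelSketch extends MojoModelSketch {
    private final int ntrees;

    GbmMojoModelSketch(int ntrees) {
        this.ntrees = ntrees;
    }

    int getNTrees() {
        return ntrees;
    }
}

/** Hypothetical GLM-specific MOJO; it has no notion of trees. */
class GlmMojoModelSketch extends MojoModelSketch {
}
```

The type check lives in one documented method, which is the point of the proposal: user code asks for the specific type and handles the mismatch case explicitly.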

@exalate-issue-sync
Author

Marek Novotny commented: IMHO, it will be unavoidable to have a specific MOJO class per algorithm, but to fulfil this task, we could just introduce a subclass for the supervised MOJO. Something like TreeBasedMOJO that would have number of trees exposed.
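
A rough Java sketch of the narrower alternative suggested here: a single tree-based subclass of the supervised MOJO that carries the tree count. The names `SupervisedMojoSketch`, `TreeBasedMojoSketch` and `LinearMojoSketch` are hypothetical, as is the `algoName` method; none of them are taken from the Sparkling Water code base.

```java
/** Hypothetical supervised MOJO base; NOT the real Sparkling Water class. */
abstract class SupervisedMojoSketch {
    private final String algo;

    SupervisedMojoSketch(String algo) {
        this.algo = algo;
    }

    String algoName() {
        return algo;
    }
}

/** Hypothetical shared subclass for tree-based algorithms such as GBM and DRF. */
class TreeBasedMojoSketch extends SupervisedMojoSketch {
    private final int ntrees;

    TreeBasedMojoSketch(String algo, int ntrees) {
        super(algo);
        this.ntrees = ntrees;
    }

    /** Only tree-based models expose this; other supervised models do not have it. */
    int getNTrees() {
        return ntrees;
    }
}

/** Hypothetical non-tree supervised model, shown only to illustrate that it lacks getNTrees. */
class LinearMojoSketch extends SupervisedMojoSketch {
    LinearMojoSketch() {
        super("glm");
    }
}
```

Compared with one class per algorithm, this keeps the hierarchy small while still keeping `getNTrees` off models where it would be meaningless.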

@exalate-issue-sync
Author

Jakub Hava commented: Yup, for this task we can start just with this, no need to do the mojo for each algo now

@DinukaH2O

JIRA Issue Migration Info

Jira Issue: SW-1495
Assignee: Marek Novotny
Reporter: Lauren DiPerna
State: Resolved
Fix Version: 3.28.0.1-1
Attachments: N/A
Development PRs: Available

Linked PRs from JIRA

#1693

@hasithjp
Member

JIRA Issue Migration Info Cont'd

Jira Issue Created Date: 2019-08-07T15:52:42.086-0700
