Expose Number of Trees in SW MOJO #5593
Comments
Jakub Hava commented: I was thinking about this, and I think the right solution is to have H2OMOJOModel as the base and create a MOJOModel for each algo that can expose additional methods. That way we avoid having methods on algos where they do not make sense. We need to make sure this specialized MOJO model is exported in the pipelines.
Jakub Hava commented: [~accountid:5a32df017dcf343865c26fa5] just want to catch up: is the work on the H2O side already done?
Pavel Pscheidl commented: [~accountid:557058:eeeb611c-665e-431d-b442-1f255171db6f] Every tree-based model in H2O-3 inherits from the SharedTreeModel class, which forces it to override the {{getNTrees}} method. This information is thus available in core. The same goes for the model *Output classes. When a SharedTreeModelOutput is fetched (be it GBM, DRF, XGBoost…), the {{_ntrees}} property is serialized with it. Either way, you should be able to obtain this information about the model AFTER it has been trained, in other words to get the actual number of trees trained, not some value from the parameters. Is that sufficient for your needs?
Jakub Hava commented: Yup, perfect, thank you Pavel!
Jakub Hava commented: [~accountid:5c9943ec3a5542225fedb6b9] would you mind if I have a look at this? I would like to give you a free hand for the conversions.
Jakub Hava commented: After discussion with Pavel, the values are already in SharedTreeMojoModel, and the real number of trees can be computed as {{_ntree_groups * _ntrees_per_group}}. We should probably put the value directly into SharedTreeMojoModel so SW can just consume the API.
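A minimal sketch of the computation described in the comment above. The class `TreeMojoSketch` is a hypothetical stand-in written for illustration, not the real SharedTreeMojoModel; only the two field names and the product come from the thread, and the getter name is an assumption.

```java
// Hypothetical stand-in for a tree-based MOJO model; NOT the real SharedTreeMojoModel.
class TreeMojoSketch {
    // Field names mirror the ones quoted in the comment above.
    private final int _ntree_groups;
    private final int _ntrees_per_group;

    TreeMojoSketch(int ntreeGroups, int ntreesPerGroup) {
        this._ntree_groups = ntreeGroups;
        this._ntrees_per_group = ntreesPerGroup;
    }

    // Assumed accessor name: returns the number of trees actually stored in the
    // model, as opposed to the ntrees value requested in the training parameters.
    int getNTrees() {
        return _ntree_groups * _ntrees_per_group;
    }
}
```

With such an accessor in place, a caller would no longer need to know about the two underlying fields: for example, a model with 37 groups and 3 trees per group would report 111 trees.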
Marek Novotny commented: [~accountid:557058:eeeb611c-665e-431d-b442-1f255171db6f] Go ahead if you want 😉
Jakub Hava commented: [~accountid:5c9943ec3a5542225fedb6b9] I was thinking about a possible implementation. The issue is that if we have multiple MOJO types, we lose the specific type as we go, and I'm not sure we can actually solve that. The solution I propose is to introduce an H2OMOJOModel for each algo (something like what you did for MojoSupervised & MojoUnsupervised). They would just extend the classes you created, and GLMMojo, for example, would implement the GLM-specific functionality. On the Mojo object we would have methods like mojo.getAsGLMMojoModel/getAsGBMMojoModel etc. (I want to avoid users calling asInstanceOf, and I want this API to be documented). The methods would internally check whether the instance is truly what the user is asking for, and either return the cast type or warn the user. Basically, I would just like to hide the casting from the user and expose access to the particular MOJO type. We do not have that many algos, so I think this is fine to do. What do you think?
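The typed-accessor idea from the comment above could look roughly like the following sketch. All class names here (`MojoSketch`, `GbmMojoSketch`, `GlmMojoSketch`) are hypothetical stand-ins, not Sparkling Water classes, and throwing an exception on a type mismatch is just one possible reading of "warn the user".

```java
// Hypothetical base type standing in for a generic MOJO model wrapper.
class MojoSketch {
    // Checks the runtime type and returns the typed view, or fails with a
    // readable message, so callers never write the cast themselves.
    protected <T extends MojoSketch> T castOrFail(Class<T> type) {
        if (type.isInstance(this)) {
            return type.cast(this);
        }
        throw new IllegalStateException(
            "This MOJO is a " + getClass().getSimpleName()
                + ", not a " + type.getSimpleName());
    }

    // Documented, algo-specific accessors that hide the cast.
    GbmMojoSketch getAsGbmMojoModel() {
        return castOrFail(GbmMojoSketch.class);
    }

    GlmMojoSketch getAsGlmMojoModel() {
        return castOrFail(GlmMojoSketch.class);
    }
}

// Hypothetical tree-based subtype exposing an algo-specific method.
class GbmMojoSketch extends MojoSketch {
    private final int ntrees;

    GbmMojoSketch(int ntrees) {
        this.ntrees = ntrees;
    }

    int getNTrees() {
        return ntrees;
    }
}

// Hypothetical non-tree subtype; it deliberately has no getNTrees method.
class GlmMojoSketch extends MojoSketch {
}
```

In this sketch, calling `getAsGbmMojoModel()` on a GBM instance returns the typed model, while calling `getAsGlmMojoModel()` on the same instance fails with a message naming both types, which keeps tree-only methods off models where they make no sense.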
Marek Novotny commented: IMHO, it will be unavoidable to have a specific MOJO class per algorithm, but to fulfil this task, we could just introduce a subclass of the supervised MOJO. Something like TreeBasedMOJO that would have the number of trees exposed.
Jakub Hava commented: Yup, for this task we can start with just this, no need to do the MOJO for each algo now.
JIRA Issue Migration Info Jira Issue: SW-1495 Linked PRs from JIRA
JIRA Issue Migration Info Cont'd Jira Issue Created Date: 2019-08-07T15:52:42.086-0700
Allow users to get the number of trees (ntrees) in a tree-based model (e.g. GBM, DRF) when early stopping is enabled.
Note, if early stopping is enabled the algorithm can overwrite a user's ntree value. To access the model's actual ntree value, a user currently would have to use the mixed api (https://github.com/h2oai/h2o-droplets/blob/master/h2o-sw-mixed-api-droplet/src/main/scala/water/droplets/H2OSWMixedAPIDroplet.scala#L100).