Non-deterministic MOJO model_ids #7358

Closed
exalate-issue-sync bot opened this issue May 11, 2023 · 10 comments

@exalate-issue-sync

Hello,

I've been happy to see the progress in MOJO stability and robustness when encountering errors in recent versions! One of the things I noticed is that when MOJO models are loaded to an H2O cluster, they are assigned a unique model_id, but this ID differs from the 'true' model_id, isn't deterministic, and I've not found a great way to get the original from the MOJO itself.

I just implemented a workaround that reads the model_details JSON I've been saving and overwrites the object's generic model ID with the original one. The original ID is later used as a fingerprint for the MOJO, i.e. to ensure that a non-production model with an unapproved ID never 'slips through' into production by accident.

My request would be to implement this sort of "use the original ID" behavior by default, or at minimum include a field within the @allparameters attribute that retains the original ID.
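
The fingerprint check described above could be sketched roughly as follows. This is only an illustration, not H2O code: the function names are hypothetical, and the layout of the saved model_details JSON is an assumption (a top-level "model_id" key holding either a string or an object with a "name" field); the real file may be structured differently.

```python
import json


def original_model_id(details_json: str) -> str:
    """Return the model ID recorded in a saved model-details JSON document."""
    details = json.loads(details_json)
    model_id = details["model_id"]  # assumed key, not confirmed by the thread
    if isinstance(model_id, dict):
        # Assumed nested layout: {"model_id": {"name": "..."}}
        model_id = model_id["name"]
    return model_id


def assert_approved(model_id: str, approved_ids: set) -> str:
    """Raise if the ID is not on the list of production-approved IDs."""
    if model_id not in approved_ids:
        raise ValueError(f"model ID {model_id!r} is not approved for production")
    return model_id
```

Because the check runs on the saved JSON rather than on the loaded object, it does not depend on whatever generic ID the cluster assigns at import time.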

@exalate-issue-sync
Author

Kunal Mishra commented: I’ll add one bit of nuance in case anyone runs into this or tries it: overwriting the top-level `model_id` slot of the object makes it unusable/unfindable to the H2O cluster (e.g. for later prediction), so you need to add the ID to the `allparameters` slot or somewhere else. Those changes do not persist across downloads and imports, though, so keep that model_details JSON handy!

```r
# Doesn't work
models@model_id <- "StackedEnsemble_BestOfFamily_AutoML_20210806_010917"

# Works
models@allparameters$model_id <- "StackedEnsemble_BestOfFamily_AutoML_20210806_010917"
```

@exalate-issue-sync
Author

Michal Kurka commented: One approach would be to derive the model ID from the file name.

@exalate-issue-sync
Author

Kunal Mishra commented: So while that would work for files saved to their default file names, I’ve got a custom function that saves stuff locally then tosses it to S3 and saves it w/ a specific formatted file name. That said, functionality to preserve the original model_id still feels like a nice to have somewhere, regardless of the differing use cases available to folks like me

@exalate-issue-sync
Author

Veronika Maurerová commented: Hi Kunal. I've implemented:

1. The option to pass a custom model ID as a parameter to the import and upload methods in the R/Python API.

2. If you do not specify a custom model ID, the default model ID is taken from the model's file name in the MOJO path, which by default is the model ID from the H2O cluster DKV.

Does this solution work for you?
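
The two behaviours described in this comment (an explicit ID wins, otherwise the file name is used) can be illustrated with a small stand-alone sketch. It does not call H2O and is not the actual implementation; `resolve_model_id` and the example paths are hypothetical names chosen for illustration.

```python
from pathlib import PurePosixPath
from typing import Optional


def resolve_model_id(mojo_path: str, custom_id: Optional[str] = None) -> str:
    """Pick the ID a MOJO would be registered under (illustrative only)."""
    if custom_id:
        # An explicitly supplied ID always wins.
        return custom_id
    # Otherwise fall back to the file name without its extension.
    return PurePosixPath(mojo_path).stem


# Default file name: the stem matches the original model ID.
default_id = resolve_model_id(
    "/models/StackedEnsemble_BestOfFamily_AutoML_20210806_010917.zip"
)

# Custom file name: the stem no longer matches, so the ID is passed explicitly.
custom_id = resolve_model_id(
    "/models/prod-ensemble-v3.zip",
    custom_id="StackedEnsemble_BestOfFamily_AutoML_20210806_010917",
)
```

The second call shows why the file-name fallback alone is not enough for MOJOs stored under custom names, which is the case raised earlier in the thread.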

@exalate-issue-sync
Author

Kunal Mishra commented: Solution 1 sounds great!

For Solution 2, I’d warn that could be fragile as a default – if someone attempts to load the same model twice or two different models w/ the same filename, there should be an adequate amount of erroring/helpful verbosity that occurs – accidentally overwriting a model on the cluster silently would be bad, as would having two models w/ the same ID.

The ideal solution, however, remains saving the `model_id` within the MOJO object and having that load as the default, but I recognize that’s probably a more involved task than the two solutions implemented!

@exalate-issue-sync
Author

Veronika Maurerová commented: Kunal, thank you for your answer. Good points about the problems with solution 2. You can't save two models with identical IDs. Currently, a new model import silently overwrites the old one if it has the same ID, which is not ideal.

The ideal solution you describe is a more involved task, but I will investigate this possibility further.
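
Until the import itself reports collisions, the silent-overwrite risk discussed here could be guarded against on the client side. The sketch below is a hypothetical bookkeeping helper, not part of the H2O API: it only tracks IDs in the calling process and knows nothing about models other clients have put on the cluster.

```python
class DuplicateModelIdError(Exception):
    """Raised when a model ID is about to be reused."""


class ModelIdRegistry:
    """Remembers which MOJO path each model ID was loaded from."""

    def __init__(self):
        self._paths = {}

    def register(self, model_id: str, mojo_path: str) -> None:
        existing = self._paths.get(model_id)
        if existing is not None:
            # Fail loudly instead of letting a later import replace the model.
            raise DuplicateModelIdError(
                f"model ID {model_id!r} already loaded from {existing!r}; "
                f"refusing to load {mojo_path!r}"
            )
        self._paths[model_id] = mojo_path


registry = ModelIdRegistry()
registry.register("model_a", "/models/model_a.zip")
# Calling registry.register("model_a", "/other/model_a.zip") now raises
# DuplicateModelIdError, so the import call that would follow is never reached.
```

Calling `register` before each import turns an accidental reuse of an ID into an explicit error with both file paths in the message.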

@exalate-issue-sync
Author

Michal Kurka commented:

> The ideal solution you describe is a more involved task, but I will investigate this possibility further.

This is how it works for binary model import as well. IMHO doesn’t need addressing, especially not in this task.

@exalate-issue-sync
Author

Michal Kurka commented: FYI: This was implemented and released in 3.34.0.4; however, the feature proved to cause unexpected issues. It will likely be removed in future releases.

@h2o-ops-ro
Collaborator

JIRA Issue Details

Jira Issue: PUBDEV-8297
Assignee: Veronika Maurerová
Reporter: Kunal Mishra
State: Resolved
Fix Version: 3.34.0.4
Attachments: N/A
Development PRs: Available

@h2o-ops-ro
Collaborator

Linked PRs from JIRA

#5777
