Improve memory efficiency of H2OMOJOPipelineModel #4557
Comments
Nidhi Mehta commented: #94138 (https://support.h2o.ai/a/tickets/94138) - Re: Deploying MOJO on Spark
Jakub Hava commented: [~accountid:557058:389d9607-5bd8-4611-8c6a-755fe9295223] We have this code, but after testing it today I can verify it does not work as expected: the reader back end is created every time a prediction is done. Will fix; for a start:

```scala
import org.apache.spark.ml.h2o.models.H2OMOJOPipelineModel
0.until(100).foreach { _ =>
```

Jakub Hava commented: If we put a print statement into getOrCreateModel, we see it is being created over and over again. The first step is to create some sort of registry which is local to the executor and ensures the MOJO bytes do not have to be serialized and deserialized, and a new instance created, for each use.
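The executor-local registry described above could be sketched roughly as follows. This is an illustrative sketch only, not the actual Sparkling Water implementation; the names `ModelRegistry` and the `load` parameter are hypothetical. The idea is that each executor JVM deserializes the MOJO bytes at most once and reuses the resulting instance across rows.

```scala
// Hypothetical executor-local registry: caches one loaded model instance
// per key, so MOJO bytes are deserialized once per JVM (i.e. once per
// executor) instead of once per predicted row.
object ModelRegistry {
  private val models =
    new java.util.concurrent.ConcurrentHashMap[String, AnyRef]()

  // Returns the cached instance for `key`, invoking `load` at most once
  // per JVM even under concurrent access.
  def getOrCreate[T <: AnyRef](key: String, load: () => T): T =
    models.computeIfAbsent(key, _ => load()).asInstanceOf[T]
}
```

A scoring UDF would then call `ModelRegistry.getOrCreate(modelId, () => loadMojo(bytes))` instead of deserializing the MOJO inside the per-row code path, which is the duplication this issue is about.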
Jakub Hava commented: Created a first implementation which avoids serializing the MOJO and creating a new instance for each row. We should, however, investigate why this was happening in the first place. Putting this change into the release so the user can try it as soon as possible.
JIRA Issue Migration Info
Jira Issue: SW-1199
Jira Issue Created Date: 2019-04-12T13:50:18.384-0700
Linked PRs from JIRA
Can we cache loaded MOJO models in memory to avoid duplication if the H2OMOJOPipelineModel transformer is instantiated multiple times?
CC: [~accountid:5c9943ec3a5542225fedb6b9] [~accountid:557058:eeeb611c-665e-431d-b442-1f255171db6f]