Figure out better way of caching MOJO Pipelines in H2OMOJOPipelineModel transformer #5438

Closed
exalate-issue-sync bot opened this issue May 22, 2023 · 4 comments

@exalate-issue-sync

Problem

The code:

    @transient private lazy val mojoPipeline: MojoPipeline = {
      val reader = MojoPipelineReaderBackendFactory.createReaderBackend(new ByteArrayInputStream(getMojoData()))
      MojoPipeline.loadFrom(reader)
    }

loads the MOJO pipeline from the raw bytes. However, this happens for each thread running in an executor (i.e., each thread representing an executor core), which adds significant memory and time overhead for larger MOJO models.

Goal
Load the MOJO model only once per JVM and share it across multiple executor threads.

  • If we decide to cache the MOJO, we have to make sure we do not leave it in memory for too long,
  • and we should also expect that a Spark job can use multiple MOJOs.
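
One possible shape for such a cache, as a minimal sketch rather than the fix that was eventually merged: a per-JVM map keyed by some model identifier, where each value is loaded at most once and idle entries can be evicted. The names SharedCache, getOrLoad, evictExpired, ttlMillis and clock are hypothetical and are not part of Sparkling Water. The loader passed to getOrLoad is assumed to be something like the MojoPipeline.loadFrom(reader) call from the snippet above.

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.function.{Function => JFunction}

// Hypothetical helper (not a Sparkling Water API): keeps one shared value per key
// per JVM and lets the caller drop entries that have been idle for too long.
final class SharedCache[K, V](ttlMillis: Long,
                              clock: () => Long = () => System.currentTimeMillis()) {

  private final class Entry(val value: V) {
    @volatile var lastAccess: Long = clock()
  }

  private val entries = new ConcurrentHashMap[K, Entry]()

  // Runs `load` at most once per key; concurrent callers asking for the same key
  // wait for that single load and then share the resulting instance.
  def getOrLoad(key: K)(load: => V): V = {
    val entry = entries.computeIfAbsent(key, new JFunction[K, Entry] {
      override def apply(k: K): Entry = new Entry(load)
    })
    entry.lastAccess = clock()
    entry.value
  }

  // Removes entries not accessed within ttlMillis and returns how many were removed.
  // Best effort only: an entry touched concurrently may still be evicted.
  def evictExpired(): Int = {
    val now = clock()
    var removed = 0
    val it = entries.entrySet().iterator()
    while (it.hasNext) {
      val e = it.next()
      if (now - e.getValue.lastAccess >= ttlMillis) {
        it.remove()
        removed += 1
      }
    }
    removed
  }

  def size: Int = entries.size()
}
```

Keying by a model identifier covers the case where one Spark job uses several MOJOs, and calling evictExpired periodically covers the concern about holding models in memory for too long. A single instance of the cache would have to live in a Scala object so that all executor threads in the JVM see the same map.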
@exalate-issue-sync

Michal Malohlava commented: Should we broadcast the model? (Similar to the idea sketched here: https://stackoverflow.com/questions/40435741/object-cache-on-spark-executors)

@exalate-issue-sync

Jakub Hava commented: We already broadcast the MOJO bytes → the Spark driver registers them as a broadcast variable, and executors just fetch them when they need them.

We could also try creating an instance of the MOJO model on the driver and broadcasting it, but I am not sure whether we would hit serialization issues. That would definitely be a good improvement, though. I can have a look at it today and at least check whether the model can be serialized.
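
A quick way to run that check, as a rough sketch that assumes plain Java serialization is what matters (a Spark job configured with a different serializer would need a different test): try to write the object to an in-memory stream and report the class that fails. SerializationCheck and javaSerializationError are hypothetical names used only for illustration.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical helper: reports why an object cannot be written with Java serialization.
object SerializationCheck {

  // Returns None if the object serializes cleanly, otherwise the exception message,
  // which for NotSerializableException names the offending class.
  def javaSerializationError(obj: AnyRef): Option[String] = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try {
      out.writeObject(obj)
      None
    } catch {
      case e: NotSerializableException => Some(e.getMessage)
    } finally {
      out.close()
    }
  }
}
```

Passing a loaded pipeline instance to javaSerializationError on the driver would show whether broadcasting the instance itself is even an option before any Spark code is changed.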

@DinukaH2O

JIRA Issue Migration Info

Jira Issue: SW-1658
Assignee: Jakub Hava
Reporter: Michal Malohlava
State: Resolved
Fix Version: 3.26.7
Attachments: N/A
Development PRs: Available

Linked PRs from JIRA

#1558
#1568

@hasithjp

JIRA Issue Migration Info Cont'd

Jira Issue Created Date: 2019-10-01T21:14:09.067-0700
