This repository has been archived by the owner on Aug 30, 2024. It is now read-only.

Deserializing models can take a long time on some Android devices #36

Closed
joseph-o3h opened this issue Dec 6, 2022 · 10 comments
Labels: critical (Issues of utmost importance), enhancement (New feature or request)

Comments

@joseph-o3h commented Dec 6, 2022

These models are taking a long time to load on a Google Pixel 6 phone, even though they are embedded into the build with MLModelDataEmbed.

We have found by profiling that almost all of this time is spent in NatML.CreateModel.

On a Samsung Galaxy S21 Ultra the same models take only ~1.5 seconds to load in total.

Is there a format conversion or any expensive computation happening here? If so, is it possible to do this conversion ahead of time (such as when the app starts) without having to keep the model in memory for the entire lifetime of the app?
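
For reference, our loading path is roughly equivalent to the sketch below, with timing around each step (modelTag is a placeholder for one of our embedded model tags):

// Rough timing sketch around the two loading steps (modelTag is a placeholder)
var watch = System.Diagnostics.Stopwatch.StartNew();
var modelData = await MLModelData.FromHub(modelTag);   // fast — the data is embedded in the build
Debug.Log($"Fetched model data in {watch.ElapsedMilliseconds} ms");
watch.Restart();
var model = new MLEdgeModel(modelData);                // almost all of the time is spent here
Debug.Log($"Created model in {watch.ElapsedMilliseconds} ms");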

@olokobayusuf added the critical (Issues of utmost importance) and enhancement (New feature or request) labels on Dec 7, 2022
@olokobayusuf (Member)

@joseph-o3h, what happens when you add the following line?

var modelData = ...
modelData.computeTarget = MLModelData.ComputeTarget.CPUOnly; // <-- add this
var model = new MLEdgeModel(modelData);

I suspect the time is being spent creating an NNAPI representation of the model for execution.

@joseph-o3h (Author) commented Dec 7, 2022

> @joseph-o3h, what happens when you add the following line?

I cannot use computeTarget right now, as it was added in 1.0.18 and we are still on 1.0.13 because of #35.

@olokobayusuf (Member) commented Dec 8, 2022

@joseph-o3h, working on #35. We're batching a few things into the next update. Regarding this issue, we've changed the API to make model creation asynchronous:

// Fetch the model data
var modelData = await MLModelData.FromHub(tag);
// Create the model
var model = await MLEdgeModel.Create(modelData); // <-- this is offloaded to a native background worker
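
In practice (a sketch, assuming a Unity MonoBehaviour and a placeholder model tag; not necessarily the final API surface), you would simply await this during startup so the work stays off the main thread:

// Sketch: create the model from a MonoBehaviour without blocking the main thread
async void Start () {
    var modelData = await MLModelData.FromHub("@author/model"); // placeholder tag
    var model = await MLEdgeModel.Create(modelData);            // runs on a background worker
    Debug.Log("Model ready");
}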

I'll update this thread once we have an ETA.

@joseph-o3h (Author)

But even with asynchronous model creation it will take the same amount of time, won't it? It just won't block the main thread.

@olokobayusuf (Member)

> But even with asynchronous model creation it will take the same amount of time, won't it? It just won't block the main thread.

That's correct, though the time taken should be on the order of a few frames.

@joseph-o3h (Author) commented Dec 23, 2022

> That's correct, though the time taken should be on the order of a few frames.

Is this when inference is run on the CPU? That might improve the model load/compile time but could also impact run time performance.

NNAPI supports caching of compiled models (https://developer.android.com/ndk/reference/group/neural-networks#aneuralnetworkscompilation_setcaching). Is it possible to use this in NatML (if it is not already being used)?

@olokobayusuf (Member)

Hey @joseph-o3h, Happy New Year! I've got inline responses below:

> Is this when inference is run on the CPU? That might improve the model load/compile time but could also impact run time performance.

That's correct!

> NNAPI supports caching of compiled models (https://developer.android.com/ndk/reference/group/neural-networks#aneuralnetworkscompilation_setcaching). Is it possible to use this in NatML (if it is not already being used)?

NatML doesn't use NNAPI caching unfortunately. Adding support for IR caching is on the mid-to-longer term roadmap.

@olokobayusuf (Member)

We've had an engineering slowdown over the holidays, but we're picking back up now. ETA on the update with async model creation should be sometime next week.

@olokobayusuf (Member)

Okay, minor follow-up on the caching question: we're likely going to add support for this in the near term, starting with iOS, macOS, and Windows.

@olokobayusuf (Member)

Hey @joseph-o3h, we've updated model creation to be async in the NatML 1.1 update. For device-specific delays in creating the model, the culprit is likely building the NNAPI representation, so you can either keep the model on the CPU (we don't advise this) or hide the delay now that the process is async. I'm closing this issue; feel free to open another issue if you run into something similar.
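
Putting both options together as a sketch (this assumes the computeTarget field added in 1.0.18 is still available on MLModelData in 1.1; the CPU line is only for the not-advised path):

// Async creation hides the NNAPI compilation delay behind a background worker
var modelData = await MLModelData.FromHub(tag);
// Optional, not advised: force CPU execution so no NNAPI representation is built
modelData.computeTarget = MLModelData.ComputeTarget.CPUOnly;
var model = await MLEdgeModel.Create(modelData);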
