Status: Open
Labels: enhancement (New feature or request)
Description
Feature Summary
The ability to load models one by one (only load one model at a time when the calculation needs it) to reduce memory usage.
Detailed Description
I'm working on an Android library that leverages llama.cpp and stablediffusion.cpp for easy on-device inference, but I'm memory limited for some models where the text encoder, VAE, and UNet are all loaded at the same time. Would it be possible to add sequential model loading, where a model is loaded only when the current stage of the calculation needs it, so that at most one model is resident at a time? This would drastically reduce the memory footprint.
Alternatives you considered
No response
Additional context
No response