
Unload LRU models automatically  #6068

@mudler

Description


Is your feature request related to a problem? Please describe.

When loading multiple models, it is currently hard to avoid filling the available VRAM unless the watchdog or the single active backend option is used.

Describe the solution you'd like

Ideally, LocalAI could estimate the available VRAM and unload the least recently used model when no VRAM is left. Since it is hard to predict the VRAM a model will use across all backends, I'm thinking of a simple solution: monitor the current usage, and if loading a model fails, automatically unload one of the already loaded models.
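
As a rough illustration only (not LocalAI's actual internals; all type and function names here are hypothetical), a failure-triggered LRU eviction could look something like this in Go: track last-use times per loaded model, and when a load fails, evict the least recently used model and retry until the load succeeds or nothing is left to evict.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

type loadedModel struct {
	name     string
	lastUsed time.Time
	// backend handle, client connection, etc. would live here
}

type ModelLoader struct {
	mu     sync.Mutex
	models map[string]*loadedModel
}

func NewModelLoader() *ModelLoader {
	return &ModelLoader{models: map[string]*loadedModel{}}
}

// load stands in for the real backend load call; assume it returns an error
// when the backend cannot allocate enough VRAM.
func (ml *ModelLoader) load(name string) error {
	// ...start the backend / issue the load request here...
	return nil
}

// unload stands in for stopping the backend and freeing its VRAM.
func (ml *ModelLoader) unload(name string) {
	delete(ml.models, name)
}

// evictLRU unloads the least recently used model; returns false if none are loaded.
func (ml *ModelLoader) evictLRU() bool {
	var oldest *loadedModel
	for _, m := range ml.models {
		if oldest == nil || m.lastUsed.Before(oldest.lastUsed) {
			oldest = m
		}
	}
	if oldest == nil {
		return false
	}
	fmt.Println("evicting LRU model:", oldest.name)
	ml.unload(oldest.name)
	return true
}

// LoadModel tries to load a model; on failure it evicts LRU models one by
// one and retries until the load succeeds or there is nothing left to evict.
func (ml *ModelLoader) LoadModel(name string) error {
	ml.mu.Lock()
	defer ml.mu.Unlock()

	if m, ok := ml.models[name]; ok {
		m.lastUsed = time.Now()
		return nil
	}
	for {
		err := ml.load(name)
		if err == nil {
			ml.models[name] = &loadedModel{name: name, lastUsed: time.Now()}
			return nil
		}
		if !ml.evictLRU() {
			return errors.New("load failed and nothing left to evict: " + err.Error())
		}
	}
}

func main() {
	ml := NewModelLoader()
	_ = ml.LoadModel("model-a")
	_ = ml.LoadModel("model-b")
}
```

The nice property of reacting to the actual load failure is that no per-backend VRAM estimation is needed; the backend itself tells us when memory has run out.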

Describe alternatives you've considered

Another option is to specify a fixed number of backends that are allowed to run concurrently. This doesn't really scale, because it does not take into account the VRAM that any given model could consume. A rough sketch of this alternative follows below.
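
For comparison, a hypothetical variant building on the sketch above: instead of reacting to load failures, cap the number of concurrently loaded backends at a fixed limit and evict by LRU before loading a new one. As noted, it ignores how much VRAM each model actually needs.

```go
// LoadModelWithCap keeps at most maxLoaded backends loaded, evicting by LRU
// to make room. A hypothetical sketch reusing the ModelLoader type above.
func (ml *ModelLoader) LoadModelWithCap(name string, maxLoaded int) error {
	ml.mu.Lock()
	defer ml.mu.Unlock()

	if m, ok := ml.models[name]; ok {
		m.lastUsed = time.Now()
		return nil
	}
	// Make room until we are under the fixed cap.
	for len(ml.models) >= maxLoaded && ml.evictLRU() {
	}
	if err := ml.load(name); err != nil {
		return err
	}
	ml.models[name] = &loadedModel{name: name, lastUsed: time.Now()}
	return nil
}
```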

Additional context
