Description
Is your feature request related to a problem? Please describe.
When loading multiple models, it is currently hard to avoid filling the available VRAM unless the watchdog or the single active backend mode is used.
Describe the solution you'd like
Ideally, LocalAI could estimate the available VRAM and unload the least recently used model when there is not enough left. Since it is hard to predict the VRAM used across all backends, a simpler solution would be to monitor the current usage and, if loading a model fails, automatically unload one of the loaded models and retry. A rough sketch of that idea follows below.
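A minimal sketch of the evict-on-failure idea, in Go since that is what LocalAI is written in. Everything here is hypothetical: `tryLoad`, `unload`, `errOutOfVRAM`, and `loadWithEviction` are illustrative placeholders, not the actual LocalAI model loader API.

```go
// Sketch: when a model load fails due to VRAM exhaustion, evict the
// least-recently-used loaded model and retry, until the load succeeds
// or there is nothing left to evict.
package main

import (
	"errors"
	"fmt"
	"sort"
	"time"
)

// loadedModel tracks when a model was last used so we know which one to evict.
type loadedModel struct {
	name     string
	lastUsed time.Time
}

// errOutOfVRAM stands in for whatever error a backend surfaces when VRAM runs out.
var errOutOfVRAM = errors.New("out of VRAM")

// tryLoad is a placeholder for the real backend load call. A real implementation
// would start the backend and report an allocation failure (e.g. CUDA OOM) as an error.
func tryLoad(name string) error {
	return nil
}

// unload is a placeholder for stopping a backend and freeing its VRAM.
func unload(name string) {
	fmt.Println("unloading", name)
}

// loadWithEviction retries a failed load after evicting the LRU model.
func loadWithEviction(name string, loaded []loadedModel) ([]loadedModel, error) {
	for {
		err := tryLoad(name)
		if err == nil {
			return append(loaded, loadedModel{name: name, lastUsed: time.Now()}), nil
		}
		if !errors.Is(err, errOutOfVRAM) || len(loaded) == 0 {
			return loaded, err
		}
		// Sort by last use, evict the oldest, and try again.
		sort.Slice(loaded, func(i, j int) bool {
			return loaded[i].lastUsed.Before(loaded[j].lastUsed)
		})
		unload(loaded[0].name)
		loaded = loaded[1:]
	}
}

func main() {
	loaded := []loadedModel{
		{name: "llama-13b", lastUsed: time.Now().Add(-10 * time.Minute)},
		{name: "whisper", lastUsed: time.Now()},
	}
	loaded, err := loadWithEviction("mistral-7b", loaded)
	fmt.Println(len(loaded), err)
}
```

The upside of this approach is that it needs no per-backend VRAM accounting: the failure itself is the signal that memory ran out.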
Describe alternatives you've considered
Another option is to allow only a fixed number of concurrent backends. This does not really scale, because it does not take into account how much VRAM each model may consume.
Additional context