Is your feature request related to a problem? Please describe.
I would like more control over which models can load concurrently. At the moment we can use "Max Active Backends", but that only caps the count: a limit of 3 would still allow three large models to load at once, which may not fit in memory.
Describe the solution you'd like
Literally the matrix configuration from llama-swap: mostlygeek/llama-swap#643
This would allow specifying more complicated rules like "allow my zed prediction model to run alongside anything, but don't allow my two 120b models to run alongside each other".
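To make the idea concrete, here is a rough sketch of what such a configuration could look like. This is illustrative only: the key names (`concurrency`, `exclusions`) and model names are hypothetical and not llama-swap's actual syntax, which is discussed in the linked issue.

```yaml
# Hypothetical sketch, not real llama-swap syntax.
# Pairs listed under exclusions may never be resident at the same time;
# models with no entry (e.g. the small zed prediction model) can co-load freely.
concurrency:
  exclusions:
    - [large-model-a-120b, large-model-b-120b]
```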
Describe alternatives you've considered
Max Active Backends - but it has the problem described above: it limits how many models can load, not which combinations can load together.
Additional context