feat: add --single-active-backend to allow only one backend active at the time #925
Description
This PR fixes #909 by adding a simple mechanism to manage a single device. It adds a --single-active-backend (SINGLE_ACTIVE_BACKEND) CLI flag: when enabled, LocalAI keeps only one backend active at a time. If a new request targets a different backend, LocalAI automatically stops the running backend, but only if it is idle; otherwise the new request waits. This makes it possible, for instance, to generate an image on a GPU and then start chatting with an LLM on the same GPU right after. This is fundamental when two consecutive requests go to different backends targeting the same GPU; without it, LocalAI currently just crashes.
In scenarios with multiple GPUs, it is already possible to specify a CUDA device for Llama, which gives fine-grained control over the devices being used. Multi-GPU management, however, is out of scope for this PR, which focuses only on the single-device case.
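The behavior described above can be pictured with a minimal sketch. This is not LocalAI's actual implementation (LocalAI itself is written in Go); the class name `SingleActiveBackend`, the `use` method, and the `start`/`stop` callbacks are hypothetical and only illustrate the "stop the idle backend, otherwise wait" rule.

```python
import threading
from contextlib import contextmanager


class SingleActiveBackend:
    """Hypothetical sketch: at most one backend is loaded at a time."""

    def __init__(self, start, stop):
        self._start = start        # assumed callable(name) that loads a backend
        self._stop = stop          # assumed callable(name) that unloads a backend
        self._cond = threading.Condition()
        self._active = None        # name of the currently loaded backend, if any
        self._in_flight = 0        # requests currently using the active backend

    @contextmanager
    def use(self, name):
        with self._cond:
            # Wait while a *different* backend is still serving requests.
            while self._active not in (None, name) and self._in_flight > 0:
                self._cond.wait()
            if self._active != name:
                if self._active is not None:
                    self._stop(self._active)   # the old backend is idle here
                self._start(name)
                self._active = name
            self._in_flight += 1
        try:
            yield
        finally:
            with self._cond:
                self._in_flight -= 1
                if self._in_flight == 0:
                    self._cond.notify_all()    # wake requests waiting to switch


# Usage: an image request followed by a chat request on the same device.
log = []
manager = SingleActiveBackend(
    start=lambda n: log.append(("start", n)),
    stop=lambda n: log.append(("stop", n)),
)
with manager.use("diffusers"):
    pass  # generate an image
with manager.use("llama"):
    pass  # chat with an LLM; "diffusers" was idle, so it was stopped first
```

In this sketch the switch happens under the lock, so a slow backend start blocks other callers; a real implementation would likely need timeouts and error handling around start and stop.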
It also lowers the number of gRPC server workers for the Python backends to 1, so only one request is processed at a time (additional requests appear to be queued automatically), bringing back the old behavior. I tried parallel requests with diffusers, and they did not seem to work well at all.
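As a rough illustration of why a single worker serializes requests: a Python gRPC server is created with `grpc.server(...)` on top of a thread pool executor, and a pool with one worker runs submitted tasks one after another while the rest wait in its queue. The sketch below uses only the standard library; `max_concurrency` and `handle` are hypothetical names, and the sleep stands in for inference work.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def max_concurrency(workers, requests=4):
    """Run fake requests on a pool and return the peak number running at once."""
    lock = threading.Lock()
    running = 0
    peak = 0

    def handle(_):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.01)  # stand-in for inference work
        with lock:
            running -= 1

    # Tasks beyond `workers` wait in the executor's internal queue.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(handle, range(requests)))
    return peak


# With one worker, requests never overlap.
single = max_concurrency(workers=1)
```

With `workers=1` the peak is always 1, which matches the queued, one-at-a-time behavior described above.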
Notes for Reviewers
Signed commits