
feat: add --single-active-backend to allow only one backend active at the time #925

Merged
Merged 5 commits into master from one_backend on Aug 18, 2023

Conversation

mudler (Owner) commented on Aug 18, 2023

Description

This PR fixes #909 by adding a simple mechanism to manage single devices. It adds a single-active-backend (SINGLE_ACTIVE_BACKEND) CLI flag: when enabled, LocalAI makes sure only one backend is in use at a time, and automatically stops the currently loaded backend when a new request targets a different one, but only once it is idle (otherwise the new request waits). This allows, for instance, generating an image with one GPU and then immediately chatting with an LLM using the same GPU. This is fundamental when two consecutive requests to different backends target the same GPU; as things stand today, LocalAI would simply crash. A minimal sketch of this behaviour follows below.
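
As an illustration only, here is a minimal Python sketch of the queue-and-evict behaviour described above. The class name `SingleActiveBackend` and the `start`/`stop` callbacks are hypothetical and not part of LocalAI (whose implementation is in Go); the sketch only shows the idea of waiting for the active backend to go idle before replacing it.

```python
# Minimal sketch (hypothetical, not LocalAI code): only one backend may be
# loaded at a time; a request for a different backend waits until the active
# one is idle, stops it, and then starts the new one.
import threading


class SingleActiveBackend:
    def __init__(self):
        self._cond = threading.Condition()
        self._active = None   # name of the backend currently loaded
        self._busy = 0        # requests in flight on the active backend

    def acquire(self, backend, start, stop):
        """Ensure `backend` is the single active one before serving a request."""
        with self._cond:
            # Wait while a *different* backend is still serving requests.
            while self._active not in (None, backend) and self._busy > 0:
                self._cond.wait()
            if self._active != backend:
                if self._active is not None:
                    stop(self._active)  # free the GPU held by the previous backend
                start(backend)
                self._active = backend
            self._busy += 1

    def release(self):
        with self._cond:
            self._busy -= 1
            self._cond.notify_all()
```

In this sketch, an image-generation request followed by a chat request would acquire the image backend first and the LLM backend second; the second acquire blocks until the first backend has finished its in-flight work and has been stopped.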

In multi-GPU scenarios it is already possible to specify a CUDA device for llama, which allows fine-grained control over the devices being used; however, multi-GPU management is out of scope for this PR (it focuses only on the specific single-device case).
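
The exact LocalAI option for pinning a backend to a CUDA device is not shown in this PR; as a generic, hypothetical illustration, a similar effect can be obtained process-wide with the standard CUDA_VISIBLE_DEVICES environment variable:

```python
# Generic illustration (not a LocalAI API): restrict a backend worker process
# to GPU 1 by setting CUDA_VISIBLE_DEVICES before it initializes CUDA.
import os
import subprocess

env = dict(os.environ, CUDA_VISIBLE_DEVICES="1")
# "backend_server.py" is a hypothetical backend entry point used for illustration.
subprocess.Popen(["python", "backend_server.py"], env=env)
```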

It also lowers the gRPC server workers for the Python backends to 1, which allows only one request at a time (requests appear to be queued automatically), restoring the old behavior. I tried this with diffusers, and parallel requests did not work well there at all.
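
For context, a generic grpcio server limited to one worker thread looks like the snippet below (this is not the exact code changed in this PR): with max_workers=1, concurrent RPCs are queued on the single executor thread and handled one at a time.

```python
# Generic grpcio example: one worker thread means one request at a time;
# additional RPCs wait in the thread pool's queue until the worker is free.
from concurrent import futures
import grpc

server = grpc.server(futures.ThreadPoolExecutor(max_workers=1))
# Service registration omitted; any registered servicer is served serially.
server.add_insecure_port("[::]:50051")
server.start()
server.wait_for_termination()
```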

Notes for Reviewers

Signed commits

  • Yes, I signed my commits.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
mudler merged commit afdc0eb into master on Aug 18, 2023
14 checks passed
mudler deleted the one_backend branch on August 18, 2023 at 23:49
mudler mentioned this pull request on Aug 19, 2023
mudler added the enhancement (New feature or request) label on Aug 24, 2023
@gregoryca commented

This should improve how models are handled when idling! I'm going to test it this week to see how the interaction with different backends goes, and report back with more info.
