
epic: resource management and control #912

Open
mudler opened this issue Aug 17, 2023 · 2 comments
Assignees: mudler
Labels: enhancement (New feature or request), roadmap

Comments

mudler (Owner) commented Aug 17, 2023

Tracker for the issue.

Problem:

The gRPC servers are tied to resources local to the machine (CPU, GPU, etc.), which can be difficult to partition, reserve, or schedule in a large deployment. Currently LocalAI can query multiple backends, but it uses a shared lock that allows only one request at a time per backend (requests to different backends can run concurrently).
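To make the current behavior concrete, here is a minimal Go sketch of that kind of per-backend locking (hypothetical names, not LocalAI's actual implementation): one mutex per backend serializes requests to the same backend, while requests to different backends run in parallel.

```go
// Sketch only: per-backend locking as described above, not LocalAI's real code.
package backendlock

import "sync"

type BackendLocks struct {
	mu    sync.Mutex             // guards the locks map
	locks map[string]*sync.Mutex // one mutex per backend name
}

func NewBackendLocks() *BackendLocks {
	return &BackendLocks{locks: map[string]*sync.Mutex{}}
}

// lockFor returns (creating it if needed) the mutex for a given backend.
func (b *BackendLocks) lockFor(backend string) *sync.Mutex {
	b.mu.Lock()
	defer b.mu.Unlock()
	if l, ok := b.locks[backend]; ok {
		return l
	}
	l := &sync.Mutex{}
	b.locks[backend] = l
	return l
}

// WithBackend runs fn while holding that backend's lock: only one request
// per backend at a time, but different backends proceed concurrently.
func (b *BackendLocks) WithBackend(backend string, fn func() error) error {
	l := b.lockFor(backend)
	l.Lock()
	defer l.Unlock()
	return fn()
}
```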

This ticket is to discuss potential solutions and track all efforts to make management of hardware resources easier and more scalable.

There are of course solutions to consider that are tangential and out of scope - for example, LocalAI's goal is not to overlap with existing orchestration mechanisms such as Kubernetes in terms of functionality, but rather to be smart enough that it can be orchestrated and scaled out to large deployments by leveraging such mechanisms (the rationale is to keep it KISS rather than becoming complex and trying to "solve it all").

@mudler mudler added the enhancement New feature or request label Aug 17, 2023
@mudler mudler self-assigned this Aug 17, 2023
mudler (Owner, Author) commented Aug 17, 2023

related: #863 #909 #892

localai-bot (Contributor) commented

⚠️⚠️⚠️⚠️⚠️

Hi! I'm a bot running with LocalAI (a crazy experiment of @mudler) - please beware that I might hallucinate sometimes!

but... I can also be funny or helpful 😸 and I can generally provide good tips and point to places in the documentation or the code worth looking at, based on what you wrote in the issue.

Don't engage in conversation with me,
I don't (yet) support replying!

⚠️⚠️⚠️⚠️⚠️

One potential solution could be to add support for dynamic resource allocation, where the backend dynamically adjusts its resources based on the incoming request load. This could involve:

- a queueing system that lets requests wait in line until resources become available;
- load balancing across multiple backends;
- a scheduling mechanism that prioritizes requests by importance or urgency, allowing high-priority requests to access resources ahead of lower-priority ones;
- letting users specify minimum and maximum resource requirements for their requests, ensuring they are always processed within the specified constraints.

These are just some potential solutions that could help alleviate the issue of limited hardware resources in LocalAI deployments.
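As an illustration of the queueing idea above, here is a hedged Go sketch (hypothetical API, not part of LocalAI) that generalizes the single per-backend lock into a per-backend semaphore built from a buffered channel: a request waits for a free slot, up to its context deadline, and different backends never block each other.

```go
// Sketch only: a per-backend request queue with bounded concurrency.
package scheduler

import (
	"context"
	"sync"
)

type Scheduler struct {
	mu       sync.Mutex
	slots    map[string]chan struct{} // one semaphore (buffered channel) per backend
	perLimit int                      // max concurrent requests per backend
}

func New(perBackendLimit int) *Scheduler {
	return &Scheduler{slots: map[string]chan struct{}{}, perLimit: perBackendLimit}
}

// sem returns (creating it if needed) the semaphore for a backend.
func (s *Scheduler) sem(backend string) chan struct{} {
	s.mu.Lock()
	defer s.mu.Unlock()
	c, ok := s.slots[backend]
	if !ok {
		c = make(chan struct{}, s.perLimit)
		s.slots[backend] = c
	}
	return c
}

// Run queues the request until a slot for the backend frees up or the
// context is cancelled, then executes fn.
func (s *Scheduler) Run(ctx context.Context, backend string, fn func() error) error {
	sem := s.sem(backend)
	select {
	case sem <- struct{}{}: // acquire a slot
		defer func() { <-sem }() // release it when fn returns
		return fn()
	case <-ctx.Done():
		return ctx.Err() // cancelled or timed out while waiting in the queue
	}
}
```

Keeping the per-backend limit at 1 reproduces today's one-request-per-backend behavior, while raising it allows bounded concurrency per backend; priority scheduling or per-request resource constraints would need additional bookkeeping on top of this.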

Sources:
