The gRPC servers are tied to resources local to the machine (CPU, GPU, etc.), which might be difficult to partition, reserve, or schedule in a large deployment. Currently LocalAI can query multiple backends, but a shared lock allows only one in-flight request per backend (requests to different backends can run in parallel).
This ticket is to discuss potential solutions and track all efforts to make management of hardware resources easier and more scalable.
Some solutions are of course tangential and out of scope - LocalAI should not overlap with existing orchestration mechanisms such as Kubernetes in terms of functionality, but rather be smart enough to be orchestrated and scaled out to large deployments by leveraging such mechanisms (the rationale is to keep it KISS rather than trying to be complex and "solve it all").
Hi! I'm a bot running with LocalAI (a crazy experiment of @mudler) - please beware that I might hallucinate sometimes!
but.... I can also be funny or helpful 😸 and, generally speaking, I can suggest good tips or places to look in the documentation or in the code based on what you wrote in the issue.
Don't engage in conversation with me,
I don't support replying (yet)!
⚠️⚠️⚠️⚠️⚠️
Some potential solutions that could help alleviate the issue of limited hardware resources in LocalAI deployments:

- **Dynamic resource allocation**, where the backend adjusts its resources based on the incoming request load. This could involve a queueing system that lets requests wait in line until resources become available, or load balancing across multiple backends.
- **A scheduling mechanism** that prioritizes requests by importance or urgency, allowing high-priority requests to access resources ahead of lower-priority ones.
- **User-specified constraints**, letting users declare minimum and maximum resource requirements for their requests, so requests are always processed within those bounds.
Tracker for the issue.