Scaling/Concurrent Requests #1187
Yes, that's the current design as far as I understand it. All requests are currently handled sequentially. That allows the API to switch out the LLM it is using per request, and allows for better planning of the resources needed to run the service. When implementing my app that uses Ollama, I implemented a worker queue that handles all requests in the background.

It would be great to have this mechanism as a configuration parameter (on or off), since being able to handle only a single request at a time is a limitation.
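The worker-queue approach described above can be sketched as follows: callers enqueue prompts, and a single background worker forwards them to the backend one at a time. This is a minimal illustration, not Ollama's actual implementation; the handler is stubbed out, and in a real app it would make the HTTP call to Ollama.

```python
# Sketch of a client-side worker queue that serializes requests so only
# one reaches the backend at a time. The handler here is a stub; a real
# one would POST to Ollama's HTTP API.
import queue
import threading


class SequentialWorker:
    """Runs submitted jobs one at a time on a background thread."""

    def __init__(self, handler):
        self._jobs = queue.Queue()
        self._handler = handler  # e.g. a function that calls Ollama
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, prompt):
        """Enqueue a prompt; returns (event, box) the caller can wait on."""
        done = threading.Event()
        box = {}
        self._jobs.put((prompt, done, box))
        return done, box

    def _run(self):
        while True:
            prompt, done, box = self._jobs.get()
            try:
                box["result"] = self._handler(prompt)
            finally:
                done.set()
                self._jobs.task_done()


# Usage with a stub handler standing in for the real Ollama call:
worker = SequentialWorker(lambda p: f"echo: {p}")
done, box = worker.submit("hello")
done.wait(timeout=5)
print(box["result"])  # echo: hello
```

Even if two callers submit at once, the single worker thread guarantees the backend never sees concurrent requests, which is the behavior the comment describes.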
Hi @SMenigat, I'm the maintainer of LiteLLM. We provide an OpenAI-compatible endpoint + request queueing with workers for Ollama, if you're interested in using it (would love your feedback on this). Here's a quick start on using it. Compatible with ollama, GPT-4, (any LiteLLM-supported LLM).

Quick Start

Set your Redis connection details:

```
REDIS_HOST="my-redis-endpoint"
REDIS_PORT="my-redis-port"
REDIS_PASSWORD="my-redis-password" # [OPTIONAL] if self-hosted
REDIS_USERNAME="default"           # [OPTIONAL] if self-hosted
```

Start the proxy with queueing enabled:

```
$ litellm --config /path/to/config.yaml --use_queue
```

Here's an example config for config.yaml:

```yaml
model_list:
  - model_name: llama2
    litellm_params:
      model: ollama/llama2
      api_key:
  - model_name: code-llama
    litellm_params:
      model: ollama/code-llama # actual model name
```

Test it:

```
$ litellm --test_async --num_requests 100
```

Available Endpoints
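Once the proxy is running, clients talk to it through the standard OpenAI-style chat-completions route. A minimal sketch of building such a request is below; the host and port are assumptions (use whatever address `litellm` prints when it starts), and the `model` value must match a `model_name` from config.yaml.

```python
# Sketch of an OpenAI-style request to a LiteLLM proxy. The base URL
# below is an assumption for illustration, not a guaranteed default.
import json
import urllib.request


def build_request(prompt, model="llama2", base_url="http://localhost:8000"):
    """Build a chat-completions request for the proxy (not yet sent)."""
    payload = {
        "model": model,  # must match a model_name in config.yaml
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )


req = build_request("Why is the sky blue?")
print(req.full_url)  # http://localhost:8000/chat/completions
```

Sending it is then a single `urllib.request.urlopen(req)` (or the equivalent with any HTTP client) against the running proxy.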
Merging with #358
Hello again. Great project. This may not be an issue, but I noticed that placing a second request while another one is being processed makes the new request time out.

Is this by design? This is not the case when using HuggingFace UI >0.4.

Thanks.
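Until the server can queue requests itself, the timeout above can be worked around client-side by retrying with exponential backoff while the server is busy. A hypothetical sketch, with `send` standing in for the real HTTP call:

```python
# Client-side retry with exponential backoff for a backend that times
# out while busy with another request. `send` is a placeholder for the
# actual HTTP call.
import time


def call_with_retry(send, prompt, attempts=5, base_delay=0.5):
    """Retry send(prompt) on TimeoutError, doubling the delay each time."""
    for attempt in range(attempts):
        try:
            return send(prompt)
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of attempts; surface the error
            time.sleep(base_delay * (2 ** attempt))


# Usage with a fake backend that is busy for the first two calls:
calls = {"n": 0}

def fake_send(prompt):
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("server busy")
    return f"ok: {prompt}"

print(call_with_retry(fake_send, "hi", base_delay=0.01))  # ok: hi
```

This doesn't make the requests concurrent; it only smooths over the serialization by waiting instead of failing.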