Concurrency and multiple calls #1356
Comments
Plus one. I love Ollama but I'm failing to see how I could deploy it as a cloud server.
Right now you'd need to start multiple ollama servers on different ports and put them behind a reverse proxy. I don't have any inside knowledge, but I'd expect this to change since llama.cpp, which Ollama uses, has added support for batched requests, which is much more efficient than load balancing among separate instances.
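For illustration, here is a minimal Python sketch of that workaround, not an official recipe: it assumes three separate `ollama serve` processes are already listening on ports 11434-11436 (the ports and the `llama2` model name are placeholders) and simply round-robins non-streaming `/api/generate` requests across them.

```python
import itertools
import requests

# Assumed backends: each port is a separate `ollama serve` process,
# started with e.g. OLLAMA_HOST=127.0.0.1:11434 / 11435 / 11436.
BACKENDS = [
    "http://127.0.0.1:11434",
    "http://127.0.0.1:11435",
    "http://127.0.0.1:11436",
]
_round_robin = itertools.cycle(BACKENDS)

def generate(prompt: str, model: str = "llama2") -> str:
    """Send one non-streaming /api/generate request to the next backend."""
    backend = next(_round_robin)
    resp = requests.post(
        f"{backend}/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["response"]

if __name__ == "__main__":
    print(generate("Why is the sky blue?"))
```

A real deployment would more likely put nginx or another reverse proxy in front of the instances instead of balancing in the client, but the idea is the same: one request in flight per instance.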
Thanks. I worked around it with Docker and Docker Swarm, running 10 instances, but that is still very limited. I hope a new version can fix that limitation.
Going to close this as a dupe of #358
Hi, I would like to know whether it is possible to run Ollama and make multiple calls to it at the same time. I would love to set up a server and use it for my users.
However, when I tested it through the liteLLM proxy, each request waited until the previous one finished (as in the sketch below).
Is it possible?
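For context, a small sketch of the kind of test that shows this behavior, with the default single instance on `localhost:11434` and `llama2` as a placeholder model: a few requests are submitted concurrently and timed, and with one server their completion times stack up roughly one after another.

```python
import time
import requests
from concurrent.futures import ThreadPoolExecutor

# Single default Ollama instance; change the port if yours differs.
OLLAMA_URL = "http://127.0.0.1:11434/api/generate"

def timed_generate(i: int) -> float:
    """Send one non-streaming request and return how long it took."""
    start = time.time()
    resp = requests.post(
        OLLAMA_URL,
        json={"model": "llama2", "prompt": f"Request {i}: say hi", "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    return time.time() - start

if __name__ == "__main__":
    # Four requests submitted at the same moment; with a single server
    # the measured durations grow roughly linearly because the requests
    # are processed one by one rather than in parallel.
    with ThreadPoolExecutor(max_workers=4) as pool:
        for i, elapsed in enumerate(pool.map(timed_generate, range(4))):
            print(f"request {i} finished after {elapsed:.1f}s")
```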