Concurrency and multiple calls #1356

Closed

enriquesouza opened this issue Dec 3, 2023 · 4 comments
@enriquesouza

Hi, I would like to know whether it is possible to run Ollama and make multiple calls concurrently. I would love to put it behind a server and use it for my users.

However, when testing it through the LiteLLM proxy, I saw that each request waits until the previous one finishes.

Is it possible?
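
To make the behavior concrete, here is a minimal Go sketch of what I mean by "multiple calls" (assuming a default Ollama instance on port 11434; the model name and prompt are just placeholders): fire a few requests at `/api/generate` concurrently and time them.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"sync"
	"time"
)

func main() {
	// Assumes a local Ollama instance on the default port and an already
	// pulled model; "llama2" is only an example name.
	body := []byte(`{"model": "llama2", "prompt": "Say hi in one word.", "stream": false}`)

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			start := time.Now()
			resp, err := http.Post("http://127.0.0.1:11434/api/generate",
				"application/json", bytes.NewReader(body))
			if err != nil {
				fmt.Printf("request %d failed: %v\n", id, err)
				return
			}
			defer resp.Body.Close()
			// Drain the response; we only care about timing here.
			io.Copy(io.Discard, resp.Body)
			fmt.Printf("request %d finished after %v\n", id, time.Since(start))
		}(i)
	}
	wg.Wait()
}
```

With a single instance, the completion times I see suggest the requests are handled one after another rather than in parallel.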

@austin-starks

Plus one. I love Ollama but I'm failing to see how I could deploy it as a cloud server.

@easp
Contributor

easp commented Dec 4, 2023

Right now you'd need to start multiple ollama servers on different ports and put them behind a reverse proxy.

I don't have any inside knowledge, but I'd expect this to change since Llama.cpp, which Ollama uses, has added support for batched requests, which is much more efficient than load balancing among separate instances.
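
If anyone wants to try that workaround, here is a minimal sketch of the idea in Go: a bare round-robin reverse proxy in front of several Ollama instances. The ports below are just examples; each instance would be started on its own port, e.g. `OLLAMA_HOST=127.0.0.1:11435 ollama serve`.

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"sync/atomic"
)

func main() {
	// Example backends: separate Ollama instances listening on different ports.
	backends := []string{
		"http://127.0.0.1:11434",
		"http://127.0.0.1:11435",
		"http://127.0.0.1:11436",
	}

	proxies := make([]*httputil.ReverseProxy, len(backends))
	for i, b := range backends {
		u, err := url.Parse(b)
		if err != nil {
			log.Fatal(err)
		}
		proxies[i] = httputil.NewSingleHostReverseProxy(u)
	}

	var next uint64
	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Hand each incoming request to the next backend, round-robin.
		i := atomic.AddUint64(&next, 1) % uint64(len(proxies))
		proxies[i].ServeHTTP(w, r)
	})

	log.Println("load balancer listening on :8080, proxying to", backends)
	log.Fatal(http.ListenAndServe(":8080", handler))
}
```

In practice you'd probably use nginx or HAProxy instead, but the idea is the same: the load balancer only spreads requests across instances, it doesn't give you batching within one instance.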

@enriquesouza
Author

Thanks. I ran it with Docker and Docker Swarm, and it worked with 10 instances running, but that is still quite limited. I hope a new version can remove that limitation.

@pdevine
Contributor

pdevine commented Jan 26, 2024

Going to close this as a dupe of #358

pdevine closed this as completed on Jan 26, 2024