Concurrency and multiple calls #1356

Closed

enriquesouza opened this issue Dec 3, 2023 · 4 comments
@enriquesouza

Hi, I would like to know whether it is possible to run Ollama and make multiple calls concurrently. I would love to put it behind a server and use it for my users.

However, when testing it through the LiteLLM proxy, I saw that each request waits until the previous one finishes.

Is it possible?
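
To make the behavior concrete, here is a minimal Go sketch of what I mean by "multiple calls" (assuming a default Ollama instance on port 11434; the model name and prompt are just placeholders): fire a few requests at `/api/generate` concurrently and time them.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"sync"
	"time"
)

func main() {
	// Assumes a local Ollama instance on the default port and an already
	// pulled model; "llama2" is only an example name.
	body := []byte(`{"model": "llama2", "prompt": "Say hi in one word.", "stream": false}`)

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			start := time.Now()
			resp, err := http.Post("http://127.0.0.1:11434/api/generate",
				"application/json", bytes.NewReader(body))
			if err != nil {
				fmt.Printf("request %d failed: %v\n", id, err)
				return
			}
			defer resp.Body.Close()
			// Drain the response; we only care about timing here.
			io.Copy(io.Discard, resp.Body)
			fmt.Printf("request %d finished after %v\n", id, time.Since(start))
		}(i)
	}
	wg.Wait()
}
```

With a single instance, the completion times I see suggest the requests are handled one after another rather than in parallel.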

@austin-starks

Plus one. I love Ollama but I'm failing to see how I could deploy it as a cloud server.

@easp
Contributor

easp commented Dec 4, 2023

Right now you'd need to start multiple ollama servers on different ports and put them behind a reverse proxy.

I don't have any inside knowledge, but I'd expect this to change since Llama.cpp, which Ollama uses, has added support for batched requests, which is much more efficient than load balancing among separate instances.
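
If anyone wants to try that workaround, here is a minimal sketch of the idea in Go: a bare round-robin reverse proxy in front of several Ollama instances. The ports below are just examples; each instance would be started on its own port, e.g. `OLLAMA_HOST=127.0.0.1:11435 ollama serve`.

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"sync/atomic"
)

func main() {
	// Example backends: separate Ollama instances listening on different ports.
	backends := []string{
		"http://127.0.0.1:11434",
		"http://127.0.0.1:11435",
		"http://127.0.0.1:11436",
	}

	proxies := make([]*httputil.ReverseProxy, len(backends))
	for i, b := range backends {
		u, err := url.Parse(b)
		if err != nil {
			log.Fatal(err)
		}
		proxies[i] = httputil.NewSingleHostReverseProxy(u)
	}

	var next uint64
	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Hand each incoming request to the next backend, round-robin.
		i := atomic.AddUint64(&next, 1) % uint64(len(proxies))
		proxies[i].ServeHTTP(w, r)
	})

	log.Println("load balancer listening on :8080, proxying to", backends)
	log.Fatal(http.ListenAndServe(":8080", handler))
}
```

In practice you'd probably use nginx or HAProxy instead, but the idea is the same: the load balancer only spreads requests across instances, it doesn't give you batching within one instance.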

@enriquesouza
Author

Thanks. I ran it with Docker and Docker Swarm, and it worked with 10 instances running, but that is still quite limited. I hope a new version can remove that limitation.

@pdevine
Contributor

pdevine commented Jan 26, 2024

Going to close this as a dupe of #358

pdevine closed this as completed on Jan 26, 2024