feat: smarter ollama load balancing #1081

Open

tjbck opened this issue Mar 7, 2024 · 4 comments
Labels: enhancement, good first issue, help wanted

Comments

tjbck (Contributor) commented Mar 7, 2024

Right now it just uses `random.choices`; refer to my comment here:

# TODO: Implement a more intelligent load balancing mechanism for distributing requests among multiple backend instances.
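For anyone picking this up, here's a minimal sketch of what a round-robin alternative to `random.choices` could look like; the `OLLAMA_BASE_URLS` list and helper name are hypothetical, not the actual open-webui code:

```python
import itertools
import threading

# Hypothetical backend list; in open-webui this would come from configuration.
OLLAMA_BASE_URLS = [
    "http://ollama-1:11434",
    "http://ollama-2:11434",
]

# Thread-safe round-robin cursor over the configured backends.
_rr_lock = threading.Lock()
_rr_cycle = itertools.cycle(OLLAMA_BASE_URLS)

def get_next_url() -> str:
    """Rotate through backends in a fixed order instead of picking at random."""
    with _rr_lock:
        return next(_rr_cycle)
```

Round-robin spreads requests evenly, but a least-busy strategy (tracking in-flight requests per backend) would probably be the bigger win.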

tjbck added the enhancement, good first issue, and help wanted labels on Mar 7, 2024
asedmammad (Contributor) commented

@tjbck I can work on this one. I am considering implementing a configuration parameter that allows users to select from various strategies.
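
As a rough sketch of how such a parameter could work (the strategy names and registry below are made up for illustration, not an agreed-on design):

```python
import random
from typing import Callable

def pick_random(urls: list[str], state: dict) -> str:
    return random.choice(urls)

def pick_round_robin(urls: list[str], state: dict) -> str:
    state["i"] = (state.get("i", -1) + 1) % len(urls)
    return urls[state["i"]]

def pick_least_busy(urls: list[str], state: dict) -> str:
    # Assumes the proxy increments/decrements state["inflight"][url]
    # around each forwarded request.
    inflight = state.setdefault("inflight", {u: 0 for u in urls})
    return min(urls, key=lambda u: inflight[u])

# A config value (e.g. LOAD_BALANCING_STRATEGY=round_robin) would select the strategy.
STRATEGIES: dict[str, Callable[[list[str], dict], str]] = {
    "random": pick_random,
    "round_robin": pick_round_robin,
    "least_busy": pick_least_busy,
}

def select_backend(strategy: str, urls: list[str], state: dict) -> str:
    return STRATEGIES[strategy](urls, state)
```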

tjbck (Contributor, Author) commented Mar 17, 2024

@asedmammad Feel free to create a draft PR! I'll also get actively involved in this one!

lewismacnow commented Apr 12, 2024

> @tjbck I can work on this one. I am considering implementing a configuration parameter that allows users to select from various strategies.

I'd be very thankful for the ability to restrict models and/or users to a specific connection (i.e. to prevent load balancing in open-webui in some situations). I sketched some examples of these situations in #1527.

My reason for this is:

The loading/unloading process adds seconds to each response, so if multiple users are actively using multiple models that are shared between connections, we will potentially add time to responses (whilst a model is removed from and re-added to memory) instead of benefiting from load balancing over multiple connections.
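
For what it's worth, one way to get that kind of pinning (just a sketch, nothing that exists in open-webui today) would be to hash the model name onto a backend, so every request for a given model lands on the same connection and the model stays resident in memory there:

```python
import hashlib

def url_for_model(model: str, urls: list[str]) -> str:
    """Deterministically map a model name to one backend so the model
    stays loaded on that host instead of bouncing between connections."""
    digest = hashlib.sha256(model.encode("utf-8")).digest()
    idx = int.from_bytes(digest[:8], "big") % len(urls)
    return urls[idx]

# Example: every request for "llama3:8b" resolves to the same URL.
url_for_model("llama3:8b", ["http://a:11434", "http://b:11434"])
```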

longregen commented Apr 30, 2024

I'd like to bump this, since Ollama now supports multiple loaded models at the same time: ollama/ollama#3418

In my testing with the "rc5" of 0.33, it worked just fine on open-webui without any changes.
