feat: smarter ollama load balancing #1081
Comments
@tjbck I can work on this one. I am considering implementing a configuration parameter that allows users to select from various load-balancing strategies.
@asedmammad Feel free to create a draft PR! I'll also get actively involved in this one!
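A minimal sketch of what such a strategy-selection parameter could look like; the setting name (OLLAMA_LOAD_BALANCING_STRATEGY) and the strategy set are hypothetical here, not actual open-webui configuration:

```python
# Sketch of a config-selectable load-balancing strategy.
# OLLAMA_LOAD_BALANCING_STRATEGY and the strategy names are illustrative
# assumptions, not existing open-webui settings.
import itertools
import random

OLLAMA_BASE_URLS = ["http://ollama-1:11434", "http://ollama-2:11434"]

_round_robin = itertools.cycle(OLLAMA_BASE_URLS)

def pick_url(strategy: str = "random") -> str:
    """Return the next Ollama base URL according to the chosen strategy."""
    if strategy == "round_robin":
        # Deterministic rotation across the configured backends.
        return next(_round_robin)
    # Default: uniform random choice, mirroring the current behavior.
    return random.choice(OLLAMA_BASE_URLS)
```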
I'd be very thankful for the ability to restrict models and/or users to a specific connection (i.e. prevent/stop load balancing in open-webui in some situations). My reason: the loading/unloading process adds seconds to the response, so if multiple users are actively using multiple models that are shared between connections, we could add time to responses (while a model is removed from/loaded into memory) instead of benefiting from load balancing across multiple connections.
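A rough sketch of the kind of pinning described above, assuming a hypothetical MODEL_PINNING mapping rather than any existing open-webui config:

```python
# Hypothetical: pin certain models to specific connections so they are
# never load-balanced away from the host that already has them in memory.
MODEL_PINNING = {
    "llama2:70b": ["http://gpu-box:11434"],  # only served from this host
}

def urls_for_model(model: str, all_urls: list[str]) -> list[str]:
    """Candidate URLs for a model: its pinned subset if configured, else all."""
    return MODEL_PINNING.get(model, all_urls)
```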
I'd like to bump this, since Ollama now supports multiple loaded models at the same time: ollama/ollama#3418. In my testing with the "rc5" of 0.33, it worked just fine with open-webui, no changes needed.
Right now it just uses random.choices; refer to my comment here: open-webui/backend/apps/ollama/main.py, line 39 at commit 8ed5759.
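For illustration, a minimal reproduction of uniform random selection with random.choices over the configured base URLs; variable names are assumptions, not a copy of the code at that line:

```python
import random

# Each request picks a backend uniformly at random from the configured URLs.
# random.choices returns a list, so take the single element with [0].
OLLAMA_BASE_URLS = ["http://host-a:11434", "http://host-b:11434"]

url = random.choices(OLLAMA_BASE_URLS, k=1)[0]
```

Pure random selection is stateless and simple, but it ignores which backend already has a given model loaded, which is what motivates the smarter strategies discussed above.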