feat: smarter ollama load balancing #1081
Comments
@tjbck I can work on this one. I am considering implementing a configuration parameter that allows users to select from various load-balancing strategies.
@asedmammad Feel free to create a draft PR! I'll also get actively involved in this one!
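A minimal sketch of what such a strategy-selection parameter could look like; the setting name (OLLAMA_LOAD_BALANCING_STRATEGY) and the strategy set are hypothetical here, not actual open-webui configuration:

```python
# Sketch of a config-selectable load-balancing strategy.
# OLLAMA_LOAD_BALANCING_STRATEGY and the strategy names are illustrative
# assumptions, not existing open-webui settings.
import itertools
import random

OLLAMA_BASE_URLS = ["http://ollama-1:11434", "http://ollama-2:11434"]

_round_robin = itertools.cycle(OLLAMA_BASE_URLS)

def pick_url(strategy: str = "random") -> str:
    """Return the next Ollama base URL according to the chosen strategy."""
    if strategy == "round_robin":
        # Deterministic rotation across the configured backends.
        return next(_round_robin)
    # Default: uniform random choice, mirroring the current behavior.
    return random.choice(OLLAMA_BASE_URLS)
```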
I'd be very thankful for the ability to restrict models and/or users to a specific connection (i.e. prevent/stop load balancing in open-webui in some situations). My reason: the loading/unloading process adds seconds to the response, so if multiple users are actively using multiple models that are shared between connections, we could add time to responses (while a model is removed from/loaded into memory) instead of benefiting from load balancing across multiple connections.
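A rough sketch of the kind of pinning described above, assuming a hypothetical MODEL_PINNING mapping rather than any existing open-webui config:

```python
# Hypothetical: pin certain models to specific connections so they are
# never load-balanced away from the host that already has them in memory.
MODEL_PINNING = {
    "llama2:70b": ["http://gpu-box:11434"],  # only served from this host
}

def urls_for_model(model: str, all_urls: list[str]) -> list[str]:
    """Candidate URLs for a model: its pinned subset if configured, else all."""
    return MODEL_PINNING.get(model, all_urls)
```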
I'd like to bump this, since Ollama now supports multiple loaded models at the same time: ollama/ollama#3418. In my testing with the "rc5" of 0.33, it worked just fine with open-webui, no changes needed.
Right now it just uses random.choices; refer to my comment here: open-webui/backend/apps/ollama/main.py, line 39 at commit 8ed5759.
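For illustration, a minimal reproduction of uniform random selection with random.choices over the configured base URLs; variable names are assumptions, not a copy of the code at that line:

```python
import random

# Each request picks a backend uniformly at random from the configured URLs.
# random.choices returns a list, so take the single element with [0].
OLLAMA_BASE_URLS = ["http://host-a:11434", "http://host-b:11434"]

url = random.choices(OLLAMA_BASE_URLS, k=1)[0]
```

Pure random selection is stateless and simple, but it ignores which backend already has a given model loaded, which is what motivates the smarter strategies discussed above.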