issue: conversation abruptly stops across multiple models and backends with many tool calls (REPEATABLE) #24915

vektorprime · 2026-05-19T17:19:47Z

vektorprime
May 19, 2026

Check Existing Issues

I have searched for any existing and/or related issues.
I have searched for any existing and/or related discussions.
I have also searched in the CLOSED issues AND CLOSED discussions and found no related items (your issue might already be addressed on the development branch!).
I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

v0.9.5

Ollama Version (if applicable)

NA

Operating System

Ubuntu 24

Browser (if applicable)

Latest firefox

Confirmation

I have read and followed all instructions in README.md.
I am using the latest version of both Open WebUI and Ollama.
I have included the browser console logs.
I have included the Docker container logs.
I have provided every relevant configuration, setting, and environment variable used in my setup.
I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
Start with the initial platform/version/OS and dependencies used,
Specify exact install/launch/configure commands,
List URLs visited, user input (incl. example values/emails/passwords if needed),
Describe all options and toggles enabled or changed,
Include any files or environmental changes,
Identify the expected and actual result at each stage,
Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

The model should continue generating and tool calling, but it abruptly stops only when interfacing through open-webui.

Actual Behavior

Just stops. I have to prompt it to continue or something similar.
Here's an example of me prompting it to continue.

Steps to Reproduce

Quick summary:
I am using open-webui as the frontend to my locally hosted setup. I am consistently seeing conversations stopping even though the model is supposed to continue generating. This occurs when the backend is vLLM and llama-cpp. It also occurs with both Qwen3.6 and Gemma4 models.

System with ALL software up to date:
Ubuntu 24
Docker image of open-webui

How to reproduce:
Make sure native tool calling is enabled for your model
Disable web search and other tools for the conversation so they don't get in the way
Enable open-terminal (for file writing and access)
Use either llama-cpp or vLLM as the backend
Use any model. I first noticed this on Gemma 4 31B, and I mainly use Qwen3.6 27B Q8 (I tried many quants and chat templates)

Paste the following prompt, and you'll see the conversation just stop somewhere between tasks 10 and 18. For me it's almost always closer to the upper end of that range.

Here's how I paste my prompt:

The prompt:

Create these 3 files with these contents:
alpha.txt
apple:3
banana:5
cherry:2
date:7

beta.txt
red
blue
green
yellow


gamma.txt
status=draft
owner=lee
priority=medium



Next, complete the following tasks, do not group tool calls between tasks together:
1. Count the lines in alpha.txt. Print T1: followed by the count.
2. Append elderberry:4 to alpha.txt. Print the last line of alpha.txt.
3. Replace banana:5 with banana:6 in alpha.txt. Print the full banana line.
4. Sort the lines of beta.txt alphabetically. Print the full contents joined by commas.
5. Add a new line orange to the end of beta.txt. Print the line count of beta.txt.
6. Change status=draft to status=review in gamma.txt. Print the full status line.
7. Add reviewer=kim to gamma.txt. Print the full contents of gamma.txt joined by semicolons.
8. In alpha.txt, increase every numeric value by 1. Print the full contents joined by commas.
9. Move the line green from beta.txt to the end of gamma.txt as tag=green. Print whether green still appears in beta.txt: yes or no.
10. In beta.txt, replace yellow with gold. Print the full contents joined by commas.
11. Add a header line FRUITS to the top of alpha.txt. Print the first line.
12. Remove the line cherry:3 from alpha.txt. Print the line count of alpha.txt.
13. In gamma.txt, change priority=medium to priority=high. Print the full priority line.
14. Append silver to beta.txt, then sort beta.txt alphabetically. Print the full contents joined by pipes.
15. Add total_fruits=4 to gamma.txt, where 4 is the number of fruit entries in alpha.txt excluding the FRUITS header. Print the new line.
16. In alpha.txt, rename date to dragonfruit. Print the renamed line.
17. Create a summary line at the end of beta.txt in the format colors=N, where N is the number of color lines before the summary. Print the summary line.
18. In gamma.txt, alphabetize all lines by key name before the equals sign. Print the first line.
19. In alpha.txt, compute the sum of all numeric values. Print sum= followed by the result.
20. Print the final contents of all three files in this exact format: alpha=[...]; beta=[...]; gamma=[...], with each file's lines joined by commas.


At the end, cleanly list me the results from every step that you printed

The logs & screenshots section will show what it looks like.
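
As a cross-check for the prompt above, here is a minimal sketch that simulates the 20 tasks on in-memory lists and returns what each step should print. It touches no real files, and the function name `expected_results` is made up for illustration. It also assumes one reading of the prompt (for example, `T1: 4` with a space after the colon), so small formatting differences in a model's output are not necessarily errors.

```python
def expected_results():
    """Simulate the 20 tasks on in-memory lists and return the printed lines."""
    alpha = ["apple:3", "banana:5", "cherry:2", "date:7"]
    beta = ["red", "blue", "green", "yellow"]
    gamma = ["status=draft", "owner=lee", "priority=medium"]
    out = []

    out.append(f"T1: {len(alpha)}")                          # task 1
    alpha.append("elderberry:4")                             # task 2
    out.append(alpha[-1])
    alpha[alpha.index("banana:5")] = "banana:6"              # task 3
    out.append("banana:6")
    beta.sort()                                              # task 4
    out.append(",".join(beta))
    beta.append("orange")                                    # task 5
    out.append(str(len(beta)))
    gamma[gamma.index("status=draft")] = "status=review"     # task 6
    out.append("status=review")
    gamma.append("reviewer=kim")                             # task 7
    out.append(";".join(gamma))
    alpha = [f"{name}:{int(value) + 1}"                      # task 8
             for name, value in (line.split(":") for line in alpha)]
    out.append(",".join(alpha))
    beta.remove("green")                                     # task 9
    gamma.append("tag=green")
    out.append("yes" if "green" in beta else "no")
    beta[beta.index("yellow")] = "gold"                      # task 10
    out.append(",".join(beta))
    alpha.insert(0, "FRUITS")                                # task 11
    out.append(alpha[0])
    alpha.remove("cherry:3")                                 # task 12
    out.append(str(len(alpha)))
    gamma[gamma.index("priority=medium")] = "priority=high"  # task 13
    out.append("priority=high")
    beta.append("silver")                                    # task 14
    beta.sort()
    out.append("|".join(beta))
    gamma.append(f"total_fruits={len(alpha) - 1}")           # task 15
    out.append(gamma[-1])
    i = alpha.index("date:8")                                # task 16
    alpha[i] = "dragonfruit:8"
    out.append(alpha[i])
    beta.append(f"colors={len(beta)}")                       # task 17
    out.append(beta[-1])
    gamma.sort(key=lambda line: line.split("=")[0])          # task 18
    out.append(gamma[0])
    total = sum(int(line.split(":")[1]) for line in alpha[1:])  # task 19
    out.append(f"sum={total}")
    out.append("alpha=[{}]; beta=[{}]; gamma=[{}]".format(   # task 20
        ",".join(alpha), ",".join(beta), ",".join(gamma)))
    return out


if __name__ == "__main__":
    for number, line in enumerate(expected_results(), start=1):
        print(f"task {number}: {line}")
```

Comparing a run against this list shows which task was the last one to complete before the conversation stopped.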

If you try this with llama-cpp as the backend, it does the same thing. If you run that same model with the exact same settings and prompt but use the llama-server webui (with a similar MCP setup), it works just fine.

Logs & Screenshots

Here's what it looks like when it stops:

Here's what vLLM shows at the end:

(APIServer pid=1) INFO 05-19 16:58:45 [logger.py:92] Generated response chatcmpl-82807bd2f5345ab6 (streaming complete): output: '\n\n\n\nT9: no\n\nTask 10: In beta.txt, replace yellow with gold. Print the full contents joined by commas.\n\n<tool_call>\n<function=run_command>\n<parameter=command>\npython3 -c "\nlines = open('/home/user/beta.txt').read().strip().split('\n')\nlines = [l for l in lines if l.strip()]\nlines = [l.replace('yellow','gold') if l == 'yellow' else l for l in lines]\nopen('/home/user/beta.txt','w').write('\n'.join(lines) + '\n')\nprint(','.join(lines))\n"\n\n\n</tool_call>', output_token_ids: None, finish_reason: streaming_complete
(APIServer pid=1) INFO 05-19 16:58:45 [logger.py:63] Received request chatcmpl-8418f4846e0da28f: params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.6, top_p=0.95, top_k=20, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], thinking_token_budget=None, include_stop_str_in_output=False, ignore_eos=False, max_tokens=65536, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, structured_outputs=None, extra_args=None), lora_request: None.
(APIServer pid=1) INFO: 172.17.0.1:56966 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=1) INFO 05-19 16:58:45 [async_llm.py:415] Added request chatcmpl-8418f4846e0da28f-8cc2de91.
(APIServer pid=1) INFO 05-19 16:58:48 [logger.py:92] Generated response chatcmpl-8418f4846e0da28f (streaming complete): output: 'The task 10 command is running. Let me wait for it.\n\n\n<tool_call>\n<function=get_process_status>\n<parameter=process_id>\n20260519-165845-6531de\n\n<parameter=wait>\n3\n\n\n</tool_call>', output_token_ids: None, finish_reason: streaming_complete

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Here's ANOTHER run with a new conversation, same exact settings, model etc. In this one there's a function call that never seems to run or show up:
(APIServer pid=1) INFO 05-19 17:15:48 [logger.py:92] Generated response chatcmpl-883f6dde7c01e292 (streaming complete): output: 'beta.txt currently has 5 lines (blue, gold, orange, red, silver). So N=5.\n\n\n<tool_call>\n<function=get_process_status>\n<parameter=process_id>\n20260519-171546-21a6eb\n\n\n</tool_call>', output_token_ids: None, finish_reason: streaming_complete
(APIServer pid=1) INFO 05-19 17:15:49 [logger.py:63] Received request chatcmpl-a233d880ee7773ab: params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.6, top_p=0.95, top_k=20, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], thinking_token_budget=None, include_stop_str_in_output=False, ignore_eos=False, max_tokens=65536, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, structured_outputs=None, extra_args=None), lora_request: None.
(APIServer pid=1) INFO: 172.17.0.1:52996 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=1) INFO 05-19 17:15:49 [async_llm.py:415] Added request chatcmpl-a233d880ee7773ab-8930fb05.
(APIServer pid=1) INFO 05-19 17:15:51 [logger.py:92] Generated response chatcmpl-a233d880ee7773ab (streaming complete): output: 'beta.txt currently has 5 lines (blue, gold, orange, red, silver). So colors=5.\n\n\n<tool_call>\n<function=run_command>\n<parameter=command>\necho "colors=5" >> /home/user/beta.txt && tail -n 1 /home/user/beta.txt\n\n\n</tool_call>', output_token_ids: None, finish_reason: streaming_complete
(APIServer pid=1) INFO 05-19 17:15:51 [loggers.py:271] Engine 000: Avg prompt throughput: 112.2 tokens/s, Avg generation throughput: 35.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 79.5%
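
To compare what the backend emitted with what shows up in the chat, a small script can list the tool calls found in each "Generated response" line of logs like the ones above. This is only a sketch: it assumes the log layout shown here (a `chatcmpl-` id after "Generated response" and `<function=NAME>` tags inside the logged output), and `summarize_tool_calls` and `SAMPLE_LOG` are made-up names. The sample lines are shortened copies of the second run, with the output text elided.

```python
import re

# Patterns based on the log layout shown above (an assumption, not a vLLM API).
RESPONSE_RE = re.compile(r"Generated response (chatcmpl-[0-9a-f]+)")
FUNCTION_RE = re.compile(r"<function=([A-Za-z0-9_]+)>")


def summarize_tool_calls(log_text):
    """Return [(response_id, [function names])], one entry per generated response."""
    summary = []
    for line in log_text.splitlines():
        match = RESPONSE_RE.search(line)
        if match:
            # Only "Generated response" lines carry model output.
            summary.append((match.group(1), FUNCTION_RE.findall(line)))
    return summary


# Shortened sample lines; the "..." stands for elided output text.
SAMPLE_LOG = r"""
(APIServer pid=1) INFO 05-19 17:15:48 [logger.py:92] Generated response chatcmpl-883f6dde7c01e292 (streaming complete): output: '...<tool_call>\n<function=get_process_status>\n...</tool_call>', output_token_ids: None, finish_reason: streaming_complete
(APIServer pid=1) INFO 05-19 17:15:49 [logger.py:63] Received request chatcmpl-a233d880ee7773ab: params: SamplingParams(n=1, ...), lora_request: None.
(APIServer pid=1) INFO 05-19 17:15:51 [logger.py:92] Generated response chatcmpl-a233d880ee7773ab (streaming complete): output: '...<tool_call>\n<function=run_command>\n...</tool_call>', output_token_ids: None, finish_reason: streaming_complete
"""

if __name__ == "__main__":
    for response_id, functions in summarize_tool_calls(SAMPLE_LOG):
        print(response_id, functions)
```

If the last response in the log ends with a tool call and no further request arrives afterwards, the frontend stopped the loop rather than the model.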

And here's the screenshot for the second run:

Additional Information

We are not hitting a token generation limit, and the finish_reason in vLLM shows streaming_complete. There's supposed to be another

2026-05-19T17:20:01Z

owui-terminator[bot]
Bot May 19, 2026

🔍 Related Issues Found

I found some existing issues that might be related. Please check if any of these are duplicates or contain helpful solutions:

🟢 #20896 issue: Generation stops after tool call when routing Ollama through WebUI (GLM-4.7-Flash in OpenCode)
Very similar symptom: generation stops immediately after a tool call when using Open WebUI as the frontend, requiring manual continuation. It also involves local model backends and tool-calling behavior that halts mid-agent loop.
by HuysArthur · bug
🟣 #23466 issue: Random response stops after tool call
Matches the core failure mode of responses randomly stopping after a tool call in Open WebUI. Although this report is less deterministic, it points to the same class of post-tool-call continuation bug.
by trinhkvo · bug
🟣 #24607 issue: Incorrect tool parsing with several tool calls (specially provided with open-terminal)
Related because it describes problems once several tool calls have occurred, including raw tool output parsing and unexpected stopping. The new issue also appears after many sequential tool calls with open-terminal.
by N-point-N · bug
🟣 #21768 issue: OpenAI-compatible streaming: finish_reason incorrectly returned as "stop" after streaming tool_calls
Highly relevant if the new issue is actually caused by Open WebUI returning the wrong streaming finish_reason after tool-call chunks. That would make agent frameworks think generation is complete and stop the loop prematurely.
by Sechma · bug
🟣 #23863 issue: Tool calls with Gemma 4 requires default -> native -> default toggling of Function Calling
Relevant as another tool-calling regression with Gemma 4 in Open WebUI, specifically around native/default function-calling behavior. Since the new issue reproduces with Gemma models and tool calls, it may share the same underlying tool-calling path.
by gitfrederic · bug

💡 If your issue is a duplicate, please close it and add any additional details to the existing issue instead.

This comment was generated automatically. React with 👍 if helpful, 👎 if not.

0 replies

frenzybiscuit · 2026-05-19T17:26:49Z

frenzybiscuit
May 19, 2026

Are you hitting the context limit? OWUI doesn't really tell you if you are. It just stops, like you're describing.

The only way to know is if your backend records what context you're using. It won't show up under OWUI (even with usage enabled) on tool calls if it fails during it.

0 replies

frenzybiscuit · 2026-05-19T17:27:04Z

frenzybiscuit
May 19, 2026

For example, opening a single large file consumes 100k context for me.

0 replies

vektorprime · 2026-05-19T17:30:41Z

vektorprime
May 19, 2026
Author

🔍 Related Issues Found

I found some existing issues that might be related. Please check if any of these are duplicates or contain helpful solutions:

1. 🟢 [#20896](https://github.com/open-webui/open-webui/issues/20896) **issue: Generation stops after tool call when routing Ollama through WebUI (GLM-4.7-Flash in OpenCode)**
   _Very similar symptom: generation stops immediately after a tool call when using Open WebUI as the frontend, requiring manual continuation. It also involves local model backends and tool-calling behavior that halts mid-agent loop._
   _by HuysArthur · `bug`_

2. 🟣 [#23466](https://github.com/open-webui/open-webui/issues/23466) **issue: Random response stops after tool call**
   _Matches the core failure mode of responses randomly stopping after a tool call in Open WebUI. Although this report is less deterministic, it points to the same class of post-tool-call continuation bug._
   _by trinhkvo · `bug`_

3. 🟣 [#24607](https://github.com/open-webui/open-webui/issues/24607) **issue: Incorrect tool parsing with several tool calls (specially provided with open-terminal)**
   _Related because it describes problems once several tool calls have occurred, including raw tool output parsing and unexpected stopping. The new issue also appears after many sequential tool calls with open-terminal._
   _by N-point-N · `bug`_

4. 🟣 [#21768](https://github.com/open-webui/open-webui/issues/21768) **issue: OpenAI-compatible streaming: finish_reason incorrectly returned as "stop" after streaming tool_calls**
   _Highly relevant if the new issue is actually caused by Open WebUI returning the wrong streaming finish_reason after tool-call chunks. That would make agent frameworks think generation is complete and stop the loop prematurely._
   _by Sechma · `bug`_

5. 🟣 [#23863](https://github.com/open-webui/open-webui/issues/23863) **issue: Tool calls with Gemma 4 requires `default` -> `native` -> `default` toggling of `Function Calling`**
   _Relevant as another tool-calling regression with Gemma 4 in Open WebUI, specifically around native/default function-calling behavior. Since the new issue reproduces with Gemma models and tool calls, it may share the same underlying tool-calling path._
   _by gitfrederic · `bug`_

💡 If your issue is a duplicate, please close it and add any additional details to the existing issue instead.

This comment was generated automatically. React with 👍 if helpful, 👎 if not.

#23466 and #24607 - Not related, because my case doesn't show tool calls being printed; in my case the model just stops generating or won't continue.

#20896 - May be related, but their use case is a CLI coding agent that uses open-webui as the API backend, so their setup may make troubleshooting more difficult.

#21768 - May be related.

#23863 - Not related, switching to Default tool calling doesn't fix my issue.

0 replies

vektorprime · 2026-05-19T17:32:02Z

vektorprime
May 19, 2026
Author

Are you hitting the context limit? OWUI doesn't really tell you if you are. It just stops, like you're describing.

The only way to know is if your backend records what context you're using. It won't show up under OWUI (even with usage enabled) on tool calls if it fails during it.

No, I am not. The context here is only 11k to 15k when it stops, and my window size (KV cache size) is 160K+. I am also not hitting the per-generation limit, as confirmed by my vLLM logs.

I even tried setting a very high (65k) token generation limit to see if it helps, and it did not.

(APIServer pid=1) INFO 05-19 17:19:08 [logger.py:63] Received request chatcmpl-a8d4c651970416da: params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.6, top_p=0.95, top_k=20, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], thinking_token_budget=None, include_stop_str_in_output=False, ignore_eos=False, max_tokens=65536, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, structured_outputs=None, extra_args=None), lora_request: None.

0 replies

vektorprime · 2026-05-19T17:32:36Z

vektorprime
May 19, 2026
Author

For example, opening a single large file consumes 100k context for me.

These files I am working with are created by the prompt, they only contain like 10-30 characters each, and they are only modified by the steps, they don't get bigger.

0 replies

frenzybiscuit · 2026-05-19T17:53:19Z

frenzybiscuit
May 19, 2026

Okay... I can't replicate this.

Maybe someone else can?

1 reply

vektorprime May 19, 2026
Author

Same question to you: how did you try to reproduce this? Did you set llama-cpp's API or vLLM's API as the backend?

I know this isn't a big thing but I will sponsor 2 cups of coffee for your time to try to reproduce this again.

Classic298 · 2026-05-19T17:55:23Z

Classic298
May 19, 2026
Collaborator

I also cannot replicate this. It has been reported a few times in the past, and every time it was a provider/upstream issue at the inference layer. Sending to discussions for now, because it is absolutely not replicable here.

15 replies

vektorprime May 19, 2026
Author

can you try a larger more well known provider? deepInfra had massive issues in the last weeks

Yes, I will try that shortly (next few min).

I also uploaded a video I just took of the experience. In this video we get all the way to task 20 and the model stops just short of its last task.
https://youtu.be/EJmZKtdL_Sc

vektorprime May 19, 2026
Author

Same issue with MiniMax 2.7, and the provider was MiniMax. All I did was set the model, enable native function tool calling, and enabled the terminal tool.

vektorprime May 19, 2026
Author

Does open-webui have some sort of limit on the number of tool calls a model can make in one generation? I know llama-cpp has something like this. The model makes quite a few tool calls for this prompt.

Classic298 May 19, 2026
Collaborator

no it doesn't have such limit, and for minimax provider and minimax m2.7 i can personally say it works for me on openrouter with multiple tool calls. hm - i am still waiting on all the other details i need from you btw

vektorprime May 19, 2026
Author

no it doesn't have such limit, and for minimax provider and minimax m2.7 i can personally say it works for me on openrouter with multiple tool calls. hm - i am still waiting on all the other details i need from you btw

Sorry what other details are missing?

There are no env variables for open-webui as it's just the container, and the host's env vars are not passed to the container (I think).

I posted this part earlier (by editing the message):

containers on the same host:
vLLM latest in container (latest image)
open-webui (latest) <-- only special setting is enabled native function calling
open-terminal in containers (latest images)
No reverse proxy.

When I test with llama-server, it's also on the same host. The issue occurs with both Gemma 4 and Qwen3.6 27B (various quants and chat templates don't help).

If you don't have both native function tool calling enabled AND terminal enabled, it does not really generate one tool call per task, so the issue never occurs. You must have both on.

frenzybiscuit · 2026-05-19T19:10:13Z

frenzybiscuit
May 19, 2026

Actually I can confirm the bug.

That's unfortunate.

4 replies

frenzybiscuit May 19, 2026

Stops cold.

frenzybiscuit May 19, 2026

Only using 11k context as well.

Classic298 May 19, 2026
Collaborator

hmmmmmmmmm.

vektorprime May 19, 2026
Author

Wow, thank you for confirming I didn't waste hours on this, lol. I feel a little better. AND YES THE 11k context is so weird! Strangely it also happens at around 15k too. I think the context size matching for our tests is just due to the prompt leading to similar results of generation.

Here's what's interesting: it only seems to occur when there's a transition between tool calls and text. If you ask for 50+ tool calls, they work fine. But this request to print everything in between seems to introduce more opportunity for things to go wrong.

Are there any specific debugs I can enable on the open-webui side that are very verbose which can help?

Classic298 · 2026-05-19T19:27:57Z

Classic298
May 19, 2026
Collaborator

found something, potentially

0 replies

Classic298 · 2026-05-19T19:32:40Z

Classic298
May 19, 2026
Collaborator

@vektorprime set CHAT_RESPONSE_MAX_TOOL_CALL_RETRIES to 9999 as an env var for open webui. Open WebUI is not supposed to behave this way and I am still investigating, but this workaround is the fix for now.

9 replies

Classic298 May 19, 2026
Collaborator

@vektorprime we pushed a commit to the dev branch that 1) renames the env var, 2) raises an error in the chat if the max tool call limit is reached, and 3) raises the default to 256, which should suffice lol, with the option to turn off the limit by setting it to -1
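
To illustrate why a cap on tool calls can look like the model "just stopping", here is a minimal sketch of a tool-call loop. It is not Open WebUI's actual code: `run_agent`, `make_fake_model`, `ToolCallLimitReached`, and the turn dictionaries are all hypothetical, and the cap of 30 in the demo is an arbitrary value picked for illustration. Only the default of 256 and the -1 "no limit" convention come from the comment above.

```python
class ToolCallLimitReached(RuntimeError):
    """Raised when the loop hits its tool-call cap (hypothetical name)."""


def run_agent(next_turn, run_tool, max_tool_calls=256, raise_on_limit=True):
    """Drive a model until it answers without requesting a tool.

    next_turn(history) returns {"tool": name} or {"text": answer}.
    max_tool_calls=-1 disables the cap.
    """
    history = []
    calls = 0
    while True:
        turn = next_turn(history)
        history.append(turn)
        if "tool" not in turn:
            return history  # the model finished on its own
        if max_tool_calls != -1 and calls >= max_tool_calls:
            if raise_on_limit:
                raise ToolCallLimitReached(f"stopped after {calls} tool calls")
            return history  # silent stop: the pending tool call never runs
        calls += 1
        history.append({"result": run_tool(turn["tool"])})


def make_fake_model(tool_calls_wanted):
    """Fake model that asks for a tool N times, then writes a final answer."""
    def next_turn(history):
        done = sum(1 for item in history if "result" in item)
        if done < tool_calls_wanted:
            return {"tool": "run_command"}
        return {"text": "done"}
    return next_turn


if __name__ == "__main__":
    tool = lambda name: f"{name} ok"
    silent = run_agent(make_fake_model(40), tool, max_tool_calls=30, raise_on_limit=False)
    print(silent[-1])      # last entry is an unanswered tool request
    unlimited = run_agent(make_fake_model(40), tool, max_tool_calls=-1)
    print(unlimited[-1])   # last entry is the model's final text
```

With a silent cap the history ends on a tool request that never ran, which matches the symptom in this thread. Raising an error instead makes the cause visible in the chat.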

vektorprime May 20, 2026
Author

Awesome, thanks to both of you (@Classic298 and @frenzybiscuit (sorry if I missed others)) for looking at this!

vektorprime May 22, 2026
Author

@Classic298 can you convert this back to a bug instead of a discussion? I am noticing that some automated searches are finding the old bug page, but it doesn't reference the conversation we had here, so it's not clear that this is actually a problem with a known workaround for now.

Classic298 May 22, 2026
Collaborator

@vektorprime what do you mean by automated searches?

And no, won't convert back. New version will come out soon and it is fixed there. If anyone opens a new issue I will reference this discussion and they can see the workaround so it will be fine

vektorprime May 22, 2026
Author

@vektorprime what do you mean by automated searches?

And no, won't convert back. New version will come out soon and it is fixed there. If anyone opens a new issue I will reference this discussion and they can see the workaround so it will be fine

I meant the "Related Issues Found" comment. It found that old bug page, but the end of the post requires navigating to the discussion to see what actually occurred; otherwise it just shows "not repeatable".
