Problem
ToolCallTextFilter.ShouldSuppress() in OpenAiCompatibleChatClient re-scans the
entire accumulated text on every streaming delta, producing O(n²) complexity over
the life of a streamed response.
File: src/Netclaw.Providers/SelfHosted/OpenAiCompatibleChatClient.cs (lines 91-94, 636-664)
On each delta:
- accumulatedText.ToString(): copies the full StringBuilder to a new string (O(n))
- .Contains("<tool_call", StringComparison.Ordinal): linear scan of the full string (O(n))
For a 10K-token response delivered in ~1000 deltas, this adds up to tens of megabytes
of scanning and string allocation for only tens of kilobytes of actual content.
Once _suppressionActive is true, subsequent calls short-circuit — but all deltas
before and during tool call detection trigger full scans.
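The quadratic growth can be made concrete with a small cost model. This is only a sketch and not code from the repository: the class and method names are hypothetical, and it assumes evenly sized deltas of about 40 characters (roughly 4 characters per token, which is itself an assumption).

```csharp
using System;

// Hypothetical cost model for illustration; not part of OpenAiCompatibleChatClient.
public static class RescanCost
{
    // Characters touched by ONE full pass per delta over everything received so far.
    // The ToString() copy and the Contains() scan each make one such pass.
    public static long FullRescan(int deltaCount, int charsPerDelta)
    {
        long total = 0;
        long accumulated = 0;
        for (int i = 0; i < deltaCount; i++)
        {
            accumulated += charsPerDelta; // text grows by one delta
            total += accumulated;         // whole text is read again
        }
        return total;
    }

    // Characters touched when each delta is read once, plus a fixed overlap.
    public static long Incremental(int deltaCount, int charsPerDelta, int overlap)
    {
        return (long)deltaCount * (charsPerDelta + overlap);
    }
}
```

Under these assumptions, RescanCost.FullRescan(1000, 40) returns 20,020,000 characters per pass, while RescanCost.Incremental(1000, 40, 9) returns 49,000. Since .NET strings use 2 bytes per character, the copy alone would account for roughly 40MB of allocation. The absolute figures shift with the real delta sizes, but the ratio between the two approaches grows with response length.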
Impact
Affects the self-hosted provider path (vLLM, llama.cpp, Lemonade, Ollama via
OpenAI-compatible endpoint). It does not affect the OpenAI/OpenRouter/Anthropic paths, which
use their SDKs' native streaming.
Worst case is long text responses on slower self-hosted hardware where the
allocation pressure and scanning overhead compound with limited resources.
Proposed Fix
Replace full-text rescanning with incremental detection:
- Only scan the new text in each delta for the <tool_call prefix
- Keep a small overlap window (~10 chars) across delta boundaries to catch partial
  matches (e.g., delta N ends with <tool_ and delta N+1 starts with call>)
- Once _suppressionActive is true, the existing short-circuit is fine
This reduces per-delta work from O(accumulated_length) to O(delta_length + overlap),
making the total O(n) instead of O(n²).
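One way the incremental detection could look is sketched below. It is not the existing ToolCallTextFilter: the class name IncrementalToolCallDetector and the single-argument ShouldSuppress(string delta) signature are assumptions made for illustration, and the real filter may carry extra responsibilities that this sketch leaves out. The overlap is set to the marker length minus one (9 characters), which is the smallest tail that can still complete a match in the next delta.

```csharp
using System;

// Hypothetical sketch of incremental detection; names and signature are assumptions.
public sealed class IncrementalToolCallDetector
{
    private const string Marker = "<tool_call";

    // Last (Marker.Length - 1) characters seen so far, kept to catch split markers.
    private string _tail = string.Empty;
    private bool _suppressionActive;

    public bool ShouldSuppress(string delta)
    {
        // Existing behavior kept: once active, short-circuit.
        if (_suppressionActive) return true;
        if (string.IsNullOrEmpty(delta)) return false;

        // Scan only the carried-over tail plus the new delta: O(delta + overlap).
        string window = _tail + delta;
        if (window.Contains(Marker, StringComparison.Ordinal))
        {
            _suppressionActive = true;
            _tail = string.Empty;
            return true;
        }

        // Keep just enough trailing text to complete a marker in the next delta.
        int keep = Math.Min(Marker.Length - 1, window.Length);
        _tail = window.Substring(window.Length - keep);
        return false;
    }
}
```

With this shape, feeding "Hello <tool_" returns false and a following "call>" returns true, after which every later call returns true. Each call allocates at most one string of delta length plus 9 characters, so no copy of the full accumulated text is needed for detection.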
Key Files
src/Netclaw.Providers/SelfHosted/OpenAiCompatibleChatClient.cs — filter class (lines 636-664) and call site (lines 91-94)
src/Netclaw.Daemon.Tests/Configuration/OpenAiCompatibleChatClientTests.cs — existing filter tests (lines 328-392)