fix(gateway): keep dense stream updates incremental#87558
Conversation
|
Codex review: needs maintainer review before merge. Reviewed May 28, 2026, 6:45 AM ET / 10:45 UTC. Summary PR surface: Source +318, Tests +1116, Docs +8, Generated 0, Other +602. Total +2044 across 33 files. Reproducibility: Do we have a high-confidence way to reproduce the issue? Partly: the linked issue has concrete starvation logs, and this PR adds a live dense-stream harness, but this review did not reproduce current main or the exact Windows/WSL2 real-Ollama setup. Review metrics: 1 noteworthy metric.
Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Rank-up moves:
Risk before merge
Maintainer options:
Next step before merge
Security Review detailsBest possible solution: Land only after maintainers accept the stream-contract compatibility change and decide whether the remaining Windows/WSL2 real-Ollama proof gap must be closed before merge. Do we have a high-confidence way to reproduce the issue? Do we have a high-confidence way to reproduce the issue? Partly: the linked issue has concrete starvation logs, and this PR adds a live dense-stream harness, but this review did not reproduce current main or the exact Windows/WSL2 real-Ollama setup. Is this the best way to solve the issue? Is this the best way to solve the issue? Likely yes if maintainers accept the compatibility tradeoff: centralizing accumulation and making hot-path deltas cheap is cleaner than repeated full snapshots, but the SDK contract change needs explicit approval. AGENTS.md: found and applied where relevant. Codex review notes: model gpt-5.5, reasoning high; reviewed against 4bd711e1c424. Label changesLabel justifications:
Evidence reviewedPR surface: Source +318, Tests +1116, Docs +8, Generated 0, Other +602. Total +2044 across 33 files. View PR surface stats
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
|
ClawSweeper PR egg: ✨ hatched 🥚 common Sunspot Signal Puff. Rarity: 🥚 common. Trait: finds missing screenshots. DetailsShare on X: post this hatch
About:
|
|
Final implementation report for the dense Ollama/gateway stream fix. Summary:
Behavior addressed: dense Ollama-compatible streams no longer force repeated full partial snapshots through the gateway/subscriber hot path, reducing event-loop starvation risk while preserving full boundary/final messages.
|
|
Additional local live smoke after the final report: Real local Ollama daemon path was available on this machine and passed a direct provider-stream smoke. Command: node --import tsx --input-type=module <<'EOS'
import { createOllamaStreamFn } from './extensions/ollama/src/stream.ts';
const streamFn = createOllamaStreamFn('http://127.0.0.1:11434');
const stream = await streamFn(
{
id: 'qwen2.5:0.5b',
api: 'ollama',
provider: 'ollama',
name: 'qwen2.5:0.5b',
contextWindow: 32768,
maxTokens: 64,
input: ['text'],
cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
},
{ messages: [{ role: 'user', content: 'Reply with exactly: live smoke ok' }] },
{ maxTokens: 16, temperature: 0, timeoutMs: 30000 },
);
let text = '';
let textDeltas = 0;
let done = false;
for await (const event of stream) {
if (event.type === 'text_delta') {
textDeltas += 1;
text = event.replace ? event.delta : text + event.delta;
}
if (event.type === 'done') done = true;
if (event.type === 'error') throw new Error(event.error.errorMessage ?? 'ollama stream error');
}
const finalMessage = await stream.result();
console.log(JSON.stringify({ done, textDeltas, streamedText: text.trim(), stopReason: finalMessage.stopReason, finalText: finalMessage.content.map((part) => part.type === 'text' ? part.text : '').join('').trim() }, null, 2));
if (!done || textDeltas === 0) process.exitCode = 1;
EOSObserved result: {
"done": true,
"textDeltas": 3,
"streamedText": "live smoke ok",
"stopReason": "stop",
"finalText": "live smoke ok"
}This adds real local Ollama daemon coverage on macOS. It still does not replace exact Windows/WSL2 real-Ollama proof for issue #86599. |
Opened on behalf of Onur Solmaz (@osolmaz). Windows, WSL2, and real Ollama-daemon validation are still open proof gaps. Maintainer Crabbox changed-gate proof passed on AWS Linux.
AI-assisted PR. Dense local-provider streams could make the gateway slow to answer unrelated requests.
This change makes streaming delta-first for the hot path: providers emit cheap deltas, the shared accumulator owns the full assistant message, and full snapshots are reserved for boundaries.
It also keeps the live gateway repro so reviewers can prove the delay behavior instead of relying only on unit tests.
Refs #86599.
Related #86633.
Summary
Dense local-provider streams could starve the event loop even though the response was asynchronous.
The expensive part was CPU work repeated for every tiny chunk, especially parsing dense Ollama-compatible output and repeatedly carrying or inspecting full growing assistant messages.
This change keeps ordinary streaming work proportional to the new delta, not the whole answer so far.
What Changed
The fix now includes both the narrow Ollama safety valve and the broader production shape.
Ollama still yields cooperatively while parsing dense NDJSON, but the long-term fix is the shared accumulator plus delta-first subscribers.
scripts/repro-gateway-dense-stream-latency.tsas an opt-in live gateway harness.--assert-responsive, which checks dense-stream p99 health latency instead of max latency because a one-chunk control run showed unrelated max outliers.createAssistantStreamAccumulatorin core and re-exported it throughopenclaw/plugin-sdk/provider-stream-shared.partialpayloads ontext_deltaandthinking_delta;text_end,thinking_end, anddonestill carry full content.deltais the source of truth for incremental text/thinking/tool fragments;partialis compatibility state and may be lightweight on delta events.text_deltahandling uses the new delta and streaming directive accumulator instead of repeatedly extracting/tag-stripping the full partial or accumulated assistant text.Testing
The fast tests cover the new helper, the hot Ollama path, the proxy compatibility path, gateway coalescing, replacement clear propagation, and the subscriber delta path.
The live proof uses a real local OpenClaw gateway and a local Ollama-compatible dense NDJSON endpoint.
node scripts/run-vitest.mjs packages/agent-core/src/agent-loop.test.ts src/plugin-sdk/provider-stream-shared.test.ts src/llm/assistant-stream-accumulator.test.ts src/agents/embedded-agent-subscribe.handlers.messages.test.ts src/agents/embedded-agent-runner/run/attempt.model-diagnostic-events.test.ts src/agents/runtime/proxy.test.ts src/gateway/server-chat.agent-events.test.ts src/gateway/server-chat.stream-text-merge.test.ts ui/src/ui/controllers/chat.test.ts packages/sdk/src/index.test.ts- 13 files, 302 tests passed.node --max-old-space-size=8192 --import tsx scripts/generate-plugin-sdk-api-baseline.ts --check.git diff --check origin/codex/ollama-dense-gateway-repro..HEAD../.agents/skills/autoreview/scripts/autoreview --mode branch --base origin/main- clean, no accepted/actionable findings.node scripts/crabbox-wrapper.mjs run --timing-json --shell -- "corepack pnpm check:changed"- passed on AWS Crabbox, runrun_83ba0b846d17, leasecbx_dac8f161e0fe, provideraws, total13m38.652s, exit0.pnpm tsgo:core,pnpm tsgo:extensions,pnpm build, andnode --import tsx scripts/repro-gateway-dense-stream-latency.ts --assert-responsive.What was not tested locally: Windows, WSL2, and a real Ollama daemon producing naturally dense output.
Real behavior proof
Behavior addressed: dense local-provider stream gateway responsiveness delay related to #86599 and #86633.
Real environment tested: local macOS OpenClaw checkout on
codex/ollama-dense-gateway-repro, real OpenClaw gateway process, local Ollama-compatible dense NDJSON endpoint. This does not use a real Ollama daemon for the dense output.Exact steps or command run after this patch:
node --import tsx scripts/repro-gateway-dense-stream-latency.ts --assert-responsiveEvidence after fix:
{ "liveGateway": true, "providerProtocol": "ollama-native-compatible", "realProviderDaemon": false, "chunks": 10000, "chunkBytes": 64, "serverBatchSize": 100, "finalExpectedChars": 640000, "chatRequests": 1, "sendStatus": "started", "runStatus": "ok", "ackMs": 105, "totalMs": 36414, "baseline": { "count": 20, "min": 36, "p50": 41, "p95": 56, "p99": 56, "max": 79, "over100ms": 0, "over500ms": 0, "over1000ms": 0 }, "during": { "count": 465, "min": 30, "p50": 34, "p95": 279, "p99": 323, "max": 2436, "over100ms": 87, "over500ms": 3, "over1000ms": 1, "errors": 0 }, "chatEvents": 77, "finalChars": 500000 }Observed result after fix: the dense run completed with p99 health latency
323msand no health errors. Max latency still had outliers; a one-chunk control run also showed a >1s max outlier, so the fixed-branch assertion intentionally uses p99 rather than max.What was not tested: Windows, WSL2, and a real Ollama daemon producing naturally dense output.
Risks
The main compatibility risk is any consumer that incorrectly treated
partialon every delta as the canonical full message.The stream contract now says ordinary delta consumers should use
delta, while boundary/final events carry full content.text_deltaandthinking_deltaevents now carry lightweight partial content;text_end,thinking_end, anddonestill carry full content.