fix(azure): Drain split provider stream frames #80927
Codex review: needs maintainer review before merge.

Summary
Reproducibility: yes. Source inspection shows that current main can buffer split SSE/JSON chunks without emitting parser-visible output from a sanitizer pull, and the PR supplies before-timeout plus after-pass proof for the focused split-frame regressions.

Review details
Best possible solution: Land or fold the split-frame draining fix with focused regressions while consolidating the shared memory enum and Azure first-event timeout pieces with #81015.

Do we have a high-confidence way to reproduce the issue? Yes, for the reasons given under Reproducibility.

Is this the best way to solve the issue? Yes. Draining inside the existing provider response sanitizer until an SSE event, JSON body, or stream close is the narrowest maintainable fix; the shared memory schema and Azure timeout pieces should be coordinated with the related branch rather than treated as a correctness flaw here.
Codex review notes: model gpt-5.5, reasoning high; reviewed against 5681cfd83984.
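The draining approach discussed in the review can be illustrated with a minimal sketch. This is not the PR's implementation: `takeCompleteSseEvents` and `drainingSseStream` are hypothetical names, and the sketch assumes events are terminated by a blank line using `\n` or `\r\n` line endings. The point it shows is that `pull()` keeps reading upstream until it has enqueued at least one complete event or the upstream has closed, because a `pull()` that resolves without enqueuing anything is not guaranteed to be called again while a read is pending.

```typescript
// Hypothetical helper: split a text buffer into complete SSE events
// (each terminated by a blank line) and the incomplete remainder.
// Assumption: only "\n" and "\r\n" line endings are handled here.
function takeCompleteSseEvents(buffer: string): { events: string[]; rest: string } {
  const events: string[] = [];
  let rest = buffer;
  for (;;) {
    const match = /\r?\n\r?\n/.exec(rest);
    if (!match) break;
    const end = match.index + match[0].length;
    events.push(rest.slice(0, end));
    rest = rest.slice(end);
  }
  return { events, rest };
}

// Hypothetical wrapper: re-frames an upstream byte stream so that every
// chunk it emits contains only whole SSE events.
function drainingSseStream(upstream: ReadableStream<Uint8Array>): ReadableStream<Uint8Array> {
  const reader = upstream.getReader();
  const decoder = new TextDecoder();
  const encoder = new TextEncoder();
  let buffered = "";

  return new ReadableStream<Uint8Array>({
    async pull(controller) {
      // Loop instead of returning after one upstream read: returning
      // without enqueuing could leave the consumer's read() pending.
      for (;;) {
        const result = await reader.read();
        if (result.done) {
          buffered += decoder.decode(); // flush any partial code point
          if (buffered.length > 0) controller.enqueue(encoder.encode(buffered));
          controller.close();
          return;
        }
        buffered += decoder.decode(result.value, { stream: true });
        const { events, rest } = takeCompleteSseEvents(buffered);
        buffered = rest;
        if (events.length > 0) {
          controller.enqueue(encoder.encode(events.join("")));
          return;
        }
      }
    },
    cancel(reason) {
      return reader.cancel(reason);
    },
  });
}
```

A consumer would wrap a response body, for example `drainingSseStream(response.body)`, and read from the result as usual. Whatever trailing partial data remains at stream close is flushed as-is in this sketch; a real sanitizer may prefer to handle it differently.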
Updated branch proof after adding the provider transport fix.

Before / After Proof

Before
Using the rebased branch test with src/agents/provider-transport-fetch.ts restored to origin/main:

    git restore --source=origin/main --worktree src/agents/provider-transport-fetch.ts
    pnpm test src/agents/provider-transport-fetch.test.ts -- --reporter=verbose -t "continues reading until split SSE frames" --testTimeout=1500

Result: failed by timeout at 1500 ms. The test reproduced the stall: the response wrapper had read partial chunks but had not emitted a complete SSE event.

After
Same focused regression with the PR implementation restored:

    pnpm test src/agents/provider-transport-fetch.test.ts -- --reporter=verbose -t "continues reading until split SSE frames" --testTimeout=1500

Result: passed in 100 ms.

Companion split-JSON fallback regression:

    pnpm test src/agents/provider-transport-fetch.test.ts -- --reporter=verbose -t "continues reading split JSON bodies" --testTimeout=1500

Result: passed in 121 ms.

Full provider transport file:

    pnpm test src/agents/provider-transport-fetch.test.ts -- --reporter=verbose

Result: 33 tests passed.

Gateway Proof
I started an isolated dev gateway and pointed it at a local provider endpoint.

Observed proof from gateway logs:
Merged via squash.
Thanks @galiniliev!

Prepared head SHA: 03a7e1f
Co-authored-by: galiniliev <5711535+galiniliev@users.noreply.github.com>
Reviewed-by: @galiniliev
Summary
src/agents/provider-transport-fetch.test.ts.

This fixes the stalled-session path where the OpenAI SDK waited indefinitely because ReadableStream.pull() returned before the sanitizer emitted a complete event.

Real behavior proof
ReadableStream.pull() before emitting a complete parser-visible event.
/tmp/openclaw-stalled-proof.lin3kf, gateway port 19023, and provider endpoint http://127.0.0.1:19024/v1 served by scripts/e2e/mock-openai-server.mjs.
scripts/e2e/mock-openai-server.mjs on port 19024, started an isolated pnpm openclaw gateway on port 19023, configured openai/gpt-5.5 to use the local provider endpoint, then sent a direct callGateway RPC agent request.
callGateway RPC returned status: ok and assistant payload text gateway-ok.

Before the fix, I restored src/agents/provider-transport-fetch.ts to origin/main and ran the split-SSE regression with --testTimeout=1500; the test timed out at 1500 ms, reproducing the stalled stream behavior. After the fix, the same split-SSE regression passed in about 100 ms, and the split JSON fallback regression passed in about 121 ms.

Verification
pnpm test src/agents/provider-transport-fetch.test.ts -- --reporter=verbose - 33 passed
pnpm check:changed - passed
pnpm test:changed - passed, 2 shards, 5 files, 65 tests
pnpm build - passed
git diff --check - passed
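For readers who want to approximate the split-SSE reproduction outside the repository's test suite, the following is a rough sketch under stated assumptions. The helper names `chunkText`, `streamFromChunks`, and `readWithTimeout` are hypothetical and do not come from the PR; the actual regression lives in src/agents/provider-transport-fetch.test.ts and may be structured differently. The idea is to deliver one SSE frame in pieces so that no single upstream chunk holds a whole event, then bound the first read with a timer, mirroring the `--testTimeout=1500` check.

```typescript
// Hypothetical helper: cut text into fixed-size pieces so that a single
// SSE frame arrives across several upstream chunks.
// Assumption: ASCII input, so slicing never splits a surrogate pair.
function chunkText(text: string, size: number): string[] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += size) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks;
}

// Hypothetical helper: a byte stream that yields one prepared chunk per pull.
function streamFromChunks(chunks: string[]): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  let index = 0;
  return new ReadableStream<Uint8Array>({
    pull(controller) {
      if (index < chunks.length) {
        controller.enqueue(encoder.encode(chunks[index++]));
      } else {
        controller.close();
      }
    },
  });
}

// Hypothetical helper: race one read() against a timer so that a stalled
// wrapper shows up as "timeout" instead of hanging the caller.
async function readWithTimeout(
  reader: ReadableStreamDefaultReader<Uint8Array>,
  ms: number,
): Promise<Uint8Array | "closed" | "timeout"> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<"timeout">((resolve) => {
    timer = setTimeout(() => resolve("timeout"), ms);
  });
  try {
    const result = await Promise.race([reader.read(), timeout]);
    if (result === "timeout") return "timeout";
    return result.done ? "closed" : result.value;
  } finally {
    clearTimeout(timer);
  }
}
```

In use, `streamFromChunks(chunkText("data: hello\n\n", 4))` would be passed through the response wrapper under test, and `readWithTimeout(wrapped.getReader(), 1500)` would be expected to return bytes for a wrapper that drains until a complete event and `"timeout"` for one that stalls after a partial read.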