fix(agents): serialize new-session resolution per session key#85404
fix(agents): serialize new-session resolution per session key#85404openperf wants to merge 1 commit into
Conversation
|
Codex review: needs maintainer review before merge. Reviewed May 29, 2026, 10:06 AM ET / 14:06 UTC. Summary PR surface: Source +87, Tests +92. Total +179 across 3 files. Reproducibility: yes. Current main resolves a session id before the execution lane and Review metrics: none identified. Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Rank-up moves:
Risk before merge
Maintainer options:
Next step before merge
Security Review detailsBest possible solution: Land the focused reservation helper after maintainer review, current CI completion, and overlap refresh, then let the linked bug close from the merged fix. Do we have a high-confidence way to reproduce the issue? Yes. Current main resolves a session id before the execution lane and Is this the best way to solve the issue? Yes. The proposed layer is narrow because AGENTS.md: found and applied where relevant. Codex review notes: model gpt-5.5, reasoning high; reviewed against 4925f84219ff. Label changesLabel changes:
Label justifications:
Evidence reviewedPR surface: Source +87, Tests +92. Total +179 across 3 files. View PR surface stats
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
|
ClawSweeper PR egg ✨ Hatched: 🥚 common Clockwork Shellbean Hatch commandComment Hatchability rules:
Rarity: 🥚 common. What is this egg doing here?
|
3d89040 to
5d6b9ca
Compare
5d6b9ca to
8fde49f
Compare
This comment was marked as spam.
This comment was marked as spam.
|
Heads up: this PR needs to be updated against current |
8fde49f to
005ffbb
Compare
005ffbb to
cf81e99
Compare
cf81e99 to
60bce25
Compare
60bce25 to
573096e
Compare
Concurrent commands for the same session key minted separate sessionIds because resolveSession ran before the per-session execution lane, so a second OpenAI-compatible request that arrived mid-turn forked an isolated, memory-less session. Reserve the brand-new-session path per session key so concurrent commands adopt one sessionId. Fixes openclaw#84575
573096e to
3b1a3b2
Compare
Summary
x-openclaw-session-keyat the OpenAI-compatible endpoints (/v1/chat/completions,/v1/responses) can run in separate sessions when the second arrives while the first is still in flight (e.g. a follow-up message sent before the first turn finishes). The later request starts from an empty session —session_statusreportsContext: 0/...and memory tools (memory_search) return nothing — so the agent loses all conversation/recall scope for that turn. Multi-user / multi-tab setups and any client that fires a quick follow-up before the previous turn finishes are exposed. Reporter juergenvh added gateway-side evidence on 2026-05-21 confirming the same shape against an embedded runtime.resolveSession(src/agents/command/session.ts) before the per-session execution lane is entered. When a session key has no fresh stored entry,resolveSessionmints a brand-newsessionIdviacrypto.randomUUID(). Two commands for the same key that both reach this point before either persists the key→sessionId mapping each mint their own id. The per-session execution lane is keyed by session key, but it only serializes execution, not identity — so the second command runs against a different transcript and an empty memory scope. The WebChatchat.sendpath already guards concurrent sends; the agent-command path used by the OpenAI-compatible endpoints had no equivalent, leaving the resolve-then-mint step racy.resolveSessionis intentionally synchronous (used by many sync callers), so the lock has to live one layer up at the agent-command async caller.resolveSessionWithReservation(src/agents/command/session-resolution-reservation.ts) and call it from the agent-command pipeline (src/agents/agent-command.ts) in place ofresolveSession. It serializes only the brand-new-session path, per session key: the first command persists the key→sessionId mapping inside a short critical section; any concurrent command re-resolves inside the lock, sees the reserved entry, and adopts the samesessionIdinstead of forking. This enforces the invariant "same session key ⇒ one session id" at the layer that owns session identity (so all ingress surfaces routed through agent-command benefit, not just one endpoint). Established sessions and explicit-sessionIdruns skip the lock entirely (no added latency on the steady-state hot path); the critical section is independent of the per-session execution lane and completes before execution, so it cannot deadlock with it; the reserved entry is the same minimal mapping the command already persists moments later, so no new orphaning behavior; and internal handoffs (sessionEffects === "internal") also bypass the lock so they never write a visible store row, preserving thesuppressVisibleSessionEffectscontract.src/agents/command/session-resolution-reservation.ts: newresolveSessionWithReservationhelper plus a per-session-key serialization queue.ResolveSessionInputdeclaresclone?: boolean(forwarded toresolveSession) andsuppressVisibleSessionEffects?: boolean(internal-handoff opt-out).src/agents/command/session-resolution-reservation.test.ts: regression coverage — pre-fix race repro, concurrent-same-key convergence, follow-up reuse, explicit-sessionIdpassthrough, and the internal-handoff no-visible-row contract.src/agents/agent-command.ts: resolve throughresolveSessionWithReservation(one-line replace atprepareAgentCommandExecution); passclone: falseandsuppressVisibleSessionEffects: opts.sessionEffects === "internal".resolveSessionsemantics, the per-session execution lane, and the session-store schema are untouched. The helper composes existing primitives (resolveSession+persistSessionEntry) inside a per-key async mutex.sessionIdpath, ordinary single requests, established (already-persisted, fresh) sessions, and internal handoffs all bypass the reservation lock entirely.prepareAgentCommandExecutionregions touched by sibling PRs [Fix] Deliver restart recovery replies #86089 / fix(agents): persist user turn before attempt failures #86764 are in different regions of the function; this PR rebases cleanly on currentorigin/mainwith zero conflicts.CHANGELOG.mdedit (release-owned); nopackage.json/ lockfile / shrinkwrap edit (Dependency Guard not applicable).Reproduction
x-openclaw-session-key.x-openclaw-session-key.session_statusshowsContext: 0/...); it has no recall of the in-flight conversation. On-disk session store ends up with twoagent:main:<key>entries pointing at differentsessionIds, and the sessions directory has two transcript files.sessionIdand retains conversation/recall scope. On-disk session store has a singleagent:main:<key>entry; the sessions directory has exactly one transcript file.Real behavior proof
x-openclaw-session-keymust converge on onesessionIdand one transcript; once the first command persists the key→sessionId mapping, any concurrent same-key command re-resolves inside the lock and adopts the same id instead of forking an isolated, memory-less session.POST /v1/chat/completions, routed to a local OpenAI-compatible model endpoint with an isolated on-disk session store. Realcurlrequests, real on-disk session-store JSON, real session transcript files.openclaw gateway run(this branch) bound to the OpenAI-compatible endpoints; fired two concurrentcurlcalls to/v1/chat/completionscarrying an identicalx-openclaw-session-key: final-1779465436; inspected the on-disk session store and the per-session transcript files. Unit-level regression:node scripts/run-vitest.mjs src/agents/command/session-resolution-reservation.test.ts.sessionIdand one transcript file — the second request joined the in-flight session instead of forking an isolated, memory-less one. The on-disk session store recorded a single reservation-shaped entry{ sessionId, updatedAt, sessionStartedAt }written byresolveSessionWithReservation. The explicit-sessionIdpath and steady-state established-session path are visibly unchanged (no lock entered).node scripts/run-vitest.mjs src/agents/command/session-resolution-reservation.test.ts(10 passed across 2 files) covers the pre-fix race repro (tworesolveSessioncalls fork distinctsessionIds), concurrent same-key convergence, follow-up reuse of the reserved id, explicit-sessionIdpassthrough, and the new internal-handoff contract (suppressVisibleSessionEffects: truedoes not write a visible session-store row).origin/mainwith zero conflicts.Risk / Mitigation
sessionIdruns, and internal handoffs bypass it, so the steady-state hot path is unchanged. The critical section isresolveSession(sync) + one store write and completes before the execution lane is entered, so it cannot deadlock with it.sessionIdmapping the command persists moments later anyway; the orphan window matches the pre-fixcrypto.randomUUID()+persistSessionEntrywindow — no new orphaning shape.prepareAgentCommandExecution. Mitigation: codegraph reports function-level overlap; actual diff regions do not collide. This PR rebases cleanly onto currentorigin/main(zero conflict). Merge order is independent.ResolveSessionInputomittedclone, which the call site still passes; (2) reservation wrote a visible session-store row beforeagentCommandInternalderivedsuppressVisibleSessionEffectsfromopts.sessionEffects === "internal", breaking the documented internal-handoff contract. Mitigation: both addressed —clone?: booleanadded toResolveSessionInputand forwarded toresolveSession;suppressVisibleSessionEffects?: booleanadded and short-circuits the lock+persist for internal runs; the call site computes the flag inline fromopts.sessionEffects; a new test asserts internal handoffs do not write a visible row.Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
Fixes #84575