fix: harden agy-acp session binding #927
neonvision7724
left a comment
✅ Approved
Reviewer: neon (agent)
Summary
Correct fix for the stale-output bug in agy-acp session binding. The snapshot-diff approach is strictly safer than the previous global-latest heuristic.
What works well
- Snapshot-based binding: conversation_snapshot() + new_conversation_id() eliminates the race where shared-home or multi-session activity picks the wrong .pb file.
- Prefix-checked delta: strip_prefix fallback to full output is safer than raw byte-offset slicing.
- Multi-file guard: refuses to bind when multiple new conversations appear. Good fail-safe.
- Unit tests — Covers unbound, append-only, non-append fallback, and snapshot diff scenarios.
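The snapshot-diff binding and the multi-file guard can be sketched as a pure function over two sets of conversation IDs. This is a minimal illustration, not the PR's code: the name new_conversation_id comes from the review, but the signature, the use of BTreeSet, and the warning text are assumptions.

```rust
use std::collections::BTreeSet;

/// Returns the single conversation ID present in `after` but not in `before`.
/// Returns `None` when no new ID or more than one new ID appears, so the
/// caller can fall back to single-turn mode instead of guessing.
/// (Hypothetical signature; the real adapter may differ.)
fn new_conversation_id(before: &BTreeSet<String>, after: &BTreeSet<String>) -> Option<String> {
    let mut fresh = after.difference(before);
    match (fresh.next(), fresh.next()) {
        // Exactly one new conversation: safe to bind.
        (Some(id), None) => Some(id.clone()),
        // Nothing new was created: stay unbound.
        (None, _) => None,
        // Two or more new conversations: ambiguous, refuse to bind.
        (Some(_), Some(_)) => {
            eprintln!("agy-acp: multiple new conversations detected; refusing to bind");
            None
        }
    }
}
```

Keeping the decision separate from directory listing means the ambiguous case can be unit-tested without touching the filesystem.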
Minor notes (non-blocking)
- prev_output: String grows unbounded with conversation length. Consider capping or hash-based comparison in a follow-up.
- TOCTOU window between pre/post snapshot is acceptable given the multi-file guard.
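One way the hash-based comparison could look is to keep only the length and a hash of the previous output, then check whether the current output starts with bytes that hash to the same value. This is a sketch under assumptions: OutputFingerprint and hash_bytes are hypothetical names, and a 64-bit hash admits a small collision risk that a full prefix comparison does not.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Fixed-size stand-in for the full previous output (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct OutputFingerprint {
    len: usize,
    hash: u64,
}

/// Hashes a byte slice with the standard library's default hasher.
fn hash_bytes(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

impl OutputFingerprint {
    /// Records the length and hash of `output`.
    fn of(output: &str) -> Self {
        Self { len: output.len(), hash: hash_bytes(output.as_bytes()) }
    }

    /// Returns the appended suffix when `current` appears to extend the
    /// fingerprinted output, otherwise returns `current` unchanged.
    fn delta<'a>(&self, current: &'a str) -> &'a str {
        if current.len() >= self.len
            && current.is_char_boundary(self.len)
            && hash_bytes(&current.as_bytes()[..self.len]) == self.hash
        {
            &current[self.len..]
        } else {
            current
        }
    }
}
```

Memory per session stays constant, at the cost of rehashing the prefix on every turn.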
Verdict
Directly addresses the observed production bug. Logic is sound, tests pass. Plenty of firepower, ship it. 🔥
Runtime Verification Update
Per the requested pre-merge check, I tested this PR with a real OpenAB runtime deployment instead of only relying on unit tests.
Verification Spec
Target:
Procedure:
# Build patched image from this PR branch
docker build -f Dockerfile.antigravity -t openab-antigravity:pr-927 .
# Deploy only the Antigravity bot to the local OpenAB target
kubectl patch deployment openab-antigravity -n default --type=json -p='[
{"op":"replace","path":"/spec/template/spec/containers/0/image","value":"openab-antigravity:pr-927"},
{"op":"replace","path":"/spec/template/spec/containers/0/imagePullPolicy","value":"Never"}
]'
kubectl rollout status deployment/openab-antigravity -n default
Rollback command:
kubectl patch deployment openab-antigravity -n default --type=json -p='[
{"op":"replace","path":"/spec/template/spec/containers/0/image","value":"ghcr.io/openabdev/openab-antigravity:beta"},
{"op":"replace","path":"/spec/template/spec/containers/0/imagePullPolicy","value":"Always"}
]'
kubectl rollout status deployment/openab-antigravity -n default
Test Result
Observed pod state after rollout:
End-to-End Discord Result
A Discord mention reached OpenAB and was dispatched to the patched Antigravity pod, but full behavior verification was blocked by the underlying Antigravity CLI quota state, not by the deployment itself.
The Antigravity CLI log showed:
So the honest status is:
CHANGES REQUESTED
What This PR Does
Fixes stale output leaking in
How It Works
Findings
Finding Details
🔴 F1: Filesystem I/O test missing
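A filesystem-level test of the kind F1 asks for might look like the sketch below. It is only an illustration: the conversation_snapshot signature, the assumption that a conversation ID is the stem of a .pb file, and the test helper snapshot_sees_new_pb_file are all hypothetical.

```rust
use std::collections::BTreeSet;
use std::fs;
use std::io;
use std::path::Path;

/// Lists the stems of all `.pb` files in `dir`.
/// (Assumes the conversation ID is the file stem; not confirmed by the PR.)
fn conversation_snapshot(dir: &Path) -> io::Result<BTreeSet<String>> {
    let mut ids = BTreeSet::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.extension().and_then(|e| e.to_str()) == Some("pb") {
            if let Some(stem) = path.file_stem().and_then(|s| s.to_str()) {
                ids.insert(stem.to_string());
            }
        }
    }
    Ok(ids)
}

/// Exercises the snapshot against a real temporary directory and returns
/// the IDs that appeared between the two snapshots.
fn snapshot_sees_new_pb_file() -> io::Result<Vec<String>> {
    let dir = std::env::temp_dir().join(format!("agy-acp-snapshot-test-{}", std::process::id()));
    // Start from a clean directory in case an earlier run left files behind.
    let _ = fs::remove_dir_all(&dir);
    fs::create_dir_all(&dir)?;
    fs::write(dir.join("old.pb"), b"")?;
    let before = conversation_snapshot(&dir)?;
    fs::write(dir.join("new.pb"), b"")?;
    // A non-.pb file must not be counted as a conversation.
    fs::write(dir.join("notes.txt"), b"")?;
    let after = conversation_snapshot(&dir)?;
    let fresh: Vec<String> = after.difference(&before).cloned().collect();
    fs::remove_dir_all(&dir)?;
    Ok(fresh)
}
```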
| | Global Latest (current main) | Snapshot Diff (this PR) | State File (~/.openab/agy-acp/sessions.json) | agy CLI returns conv ID (upstream) |
|---|---|---|---|---|
| Binding method | Most recent mtime .pb | Pre/post dir diff | Snapshot diff on first bind, then persist | CLI outputs ID directly |
| TOCTOU risk | High | Medium | Low | None |
| Survives restart | ❌ | ❌ | ✅ | ✅ |
| Concurrent sessions | Unsafe | Degrades to single-turn | Independent (with file lock) | Independent |
| Memory overhead | Low | Medium (full prev_output) | Low (only ID in file) | Low |
| Stale output risk | High | Low | Lowest | None |
| Observability | Poor | Medium (warnings) | Good (inspect JSON) | Best |
The ideal solution is for agy CLI to natively return the conversation ID (e.g., in a structured envelope or on a dedicated fd). This eliminates all guessing. However, this requires an upstream change outside our control.
Recommended path: We suggest evolving this PR toward a state-file approach (~/.openab/agy-acp/sessions.json) — persist the session→conversation binding on first successful bind. This is the best solution achievable without upstream changes: it eliminates TOCTOU on subsequent turns, survives adapter restarts, and supports concurrent sessions with file locking.
The snapshot-diff mechanism in this PR is still needed for the initial binding moment, but once bound, the mapping should be persisted rather than held only in memory.
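The persistence step could be sketched as below. To stay dependency-free, this example swaps the proposed JSON file for a tab-separated file and omits file locking, so it shows only the load and atomic-replace pattern; load_bindings and save_bindings are hypothetical names, and IDs are assumed to contain no tabs or newlines.

```rust
use std::collections::BTreeMap;
use std::fs;
use std::io;
use std::path::Path;

/// Loads `session_id<TAB>conversation_id` lines; a missing file yields an empty map.
fn load_bindings(path: &Path) -> io::Result<BTreeMap<String, String>> {
    let text = match fs::read_to_string(path) {
        Ok(text) => text,
        Err(err) if err.kind() == io::ErrorKind::NotFound => return Ok(BTreeMap::new()),
        Err(err) => return Err(err),
    };
    Ok(text
        .lines()
        // Lines without a tab are skipped rather than treated as errors.
        .filter_map(|line| line.split_once('\t'))
        .map(|(session, conversation)| (session.to_string(), conversation.to_string()))
        .collect())
}

/// Writes all bindings to a temporary sibling file, then renames it over
/// `path` so readers never observe a half-written file.
fn save_bindings(path: &Path, bindings: &BTreeMap<String, String>) -> io::Result<()> {
    let mut text = String::new();
    for (session, conversation) in bindings {
        text.push_str(session);
        text.push('\t');
        text.push_str(conversation);
        text.push('\n');
    }
    let tmp = path.with_extension("tmp");
    fs::write(&tmp, text)?;
    fs::rename(&tmp, path)
}
```

A real implementation would still need the file lock mentioned above to make concurrent read-modify-write cycles safe.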
Baseline Check
- PR opened: 2026-05-26
- Main already has: --conversation <ID> support + byte-offset delta extraction (from fix(agy-acp): use --conversation ID + delta extraction for multi-turn #906)
- Net-new value: Snapshot-based binding (eliminates global-latest race) + prefix-checked delta (eliminates byte-slice corruption) + fail-soft single-turn fallback
What's Good (🟢)
- Snapshot-before-spawn correctly eliminates the latest_conversation_id() mtime race
- Multiple-files guard returns None + warning rather than guessing wrong: correct defensive design
- extract_delta as a static method is clean and easily unit-testable
- eprintln! warnings cover all failure paths: good observability
- Unit tests cover the core delta extraction logic well
close in favor of #928
Summary
Follow-up to #905 and #906.
We saw agy-acp still return stale Antigravity output in a live Discord/OpenAB deployment after the OpenAB session had been reset. A prompt like stat? could receive unrelated prior content such as an old transcription/auth-flow response. That makes Antigravity unsafe as a peer worker in multi-bot Discord loops because the adapter may send old conversation state as if it were the current assistant reply.
This PR tightens the adapter in two places:
- agy conversation ID by taking a pre/post snapshot of the conversation directory around the first prompt, instead of selecting the globally newest conversation file.
- agy conversation.
Before / After
Data Flow (ASCII)
Investigation
Related Issue: #905
- agy --continue -p emitted cumulative conversation output and agy-acp forwarded it verbatim.
Related PR: #906
- --conversation <ID> plus delta extraction.
- main implementation still uses a global latest_conversation_id() heuristic and byte-index slicing.
Discord Discussion
Live Operator Observation
- /reset removed the active OpenAB session.
- agy-acp session was then created for a new mention.
Key Takeaway
agy-acp should fail soft into single-turn mode when it cannot safely bind the exact conversation, rather than relying on global "latest conversation" state. Once bound, it should only emit deltas when the previous stdout is an actual prefix of the current stdout.
Changes
- agy-acp/src/main.rs
- agy-acp/src/main.rs
- agy-acp/src/main.rs: strip_prefix() instead of slicing by byte offset.
- agy-acp/src/main.rs
- agy-acp/src/main.rs
- agy-acp/src/main.rs
Testing
- cargo fmt --manifest-path agy-acp/Cargo.toml
- cargo test --manifest-path agy-acp/Cargo.toml
Test result:
Not yet verified:
Risks and Mitigations
Risk: If agy does not create a detectable new conversation file on first turn, the ACP session will not get multi-turn native agy context.
Risk: If agy --conversation <ID> stops returning append-only stdout, delta extraction will not strip anything.
strip_prefix() and sends full output rather than panicking or slicing incorrectly.
Risk: This does not fix all possible agy CLI output-quality issues.
Compatibility / Migration