Skip to content

fix: harden agy-acp session binding#927

Closed
shaun-agent wants to merge 1 commit into
openabdev:mainfrom
shaun-agent:fix/agy-acp-session-isolation
Closed

fix: harden agy-acp session binding#927
shaun-agent wants to merge 1 commit into
openabdev:mainfrom
shaun-agent:fix/agy-acp-session-isolation

Conversation

@shaun-agent
Copy link
Copy Markdown
Contributor

@shaun-agent shaun-agent commented May 26, 2026

Summary

Follow-up to #905 and #906.

We saw agy-acp still return stale Antigravity output in a live Discord/OpenAB deployment after the OpenAB session had been reset. A prompt like stat? could receive unrelated prior content such as an old transcription/auth-flow response. That makes Antigravity unsafe as a peer worker in multi-bot Discord loops because the adapter may send old conversation state as if it were the current assistant reply.

This PR tightens the adapter in two places:

  1. Bind the agy conversation ID by taking a pre/post snapshot of the conversation directory around the first prompt, instead of selecting the globally newest conversation file.
  2. Replace raw byte-offset delta extraction with prefix-checked delta extraction, and only apply deltas after the ACP session is actually bound to a specific agy conversation.

Before / After

BEFORE:
ACP session A first turn
  -> agy writes/updates conversation files
  -> adapter picks globally newest *.pb file
  -> later turns slice stdout by byte length
  -> wrong conversation or non-append-only stdout can leak stale output

AFTER:
ACP session A first turn
  -> adapter snapshots conversations before invoking agy
  -> agy runs
  -> adapter binds only the single newly-created conversation ID
  -> later turns use --conversation <ID>
  -> delta is extracted only if current stdout starts with the previous stdout
  -> otherwise full output is sent and the delta baseline is reset

Data Flow (ASCII)

Discord thread
  -> OpenAB session/prompt
  -> agy-acp session_id
  -> snapshot ~/.gemini/antigravity-cli/conversations/*.pb
  -> agy -p "<prompt>"
  -> snapshot again
  -> bind exactly one new conversation ID
  -> next prompt uses agy --conversation <ID> -p "<prompt>"
  -> strip previous stdout prefix only when it matches
  -> send current turn text to Discord

Investigation

Related Issue: #905

Related PR: #906

  • fix(agy-acp): use --conversation ID + delta extraction for multi-turn #906 moved toward --conversation <ID> plus delta extraction.
  • The current main implementation still uses a global latest_conversation_id() heuristic and byte-index slicing.
  • In live usage, this can still bind the wrong conversation under shared-home or multi-session activity and can still treat stale output as current output.

Discord Discussion

Live Operator Observation

  • In a local OpenAB deployment, /reset removed the active OpenAB session.
  • A fresh agy-acp session was then created for a new mention.
  • The bot still returned unrelated stale content, which strongly suggests adapter/CLI conversation binding or output extraction is still not robust enough for Discord-visible multi-agent use.

Key Takeaway

agy-acp should fail soft into single-turn mode when it cannot safely bind the exact conversation, rather than relying on global "latest conversation" state. Once bound, it should only emit deltas when the previous stdout is an actual prefix of the current stdout.

Changes

File Change
agy-acp/src/main.rs Replace global latest-conversation lookup with pre/post snapshot diff.
agy-acp/src/main.rs Store previous full stdout instead of raw byte length.
agy-acp/src/main.rs Extract deltas with strip_prefix() instead of slicing by byte offset.
agy-acp/src/main.rs Only apply delta extraction once a conversation ID is bound.
agy-acp/src/main.rs Add warnings when binding is ambiguous or unavailable.
agy-acp/src/main.rs Add unit tests for delta behavior and snapshot-based conversation binding.

Testing

  • cargo fmt --manifest-path agy-acp/Cargo.toml
  • cargo test --manifest-path agy-acp/Cargo.toml

Test result:

running 4 tests
test tests::extract_delta_falls_back_when_output_is_not_append_only ... ok
test tests::extract_delta_returns_only_appended_output_for_bound_conversation ... ok
test tests::extract_delta_returns_full_text_without_bound_conversation ... ok
test tests::new_conversation_id_uses_snapshot_diff_instead_of_latest_file ... ok

Not yet verified:

  • End-to-end Discord deployment with the patched image. The fix is based on the observed live failure mode plus unit coverage around the unsafe adapter logic.

Risks and Mitigations

  • Risk: If agy does not create a detectable new conversation file on first turn, the ACP session will not get multi-turn native agy context.

    • Mitigation: The adapter logs a warning and avoids binding to a potentially wrong global conversation. This is safer than leaking stale output.
  • Risk: If agy --conversation <ID> stops returning append-only stdout, delta extraction will not strip anything.

    • Mitigation: The adapter checks strip_prefix() and sends full output rather than panicking or slicing incorrectly.
  • Risk: This does not fix all possible agy CLI output-quality issues.

    • Mitigation: This PR is scoped to adapter session binding and output isolation only.

Compatibility / Migration

  • Backward compatible: Yes.
  • Config/env changes: No.
  • Migration needed: No.

@shaun-agent shaun-agent requested a review from thepagent as a code owner May 26, 2026 15:07
@github-actions github-actions Bot added closing-soon PR missing Discord Discussion URL — will auto-close in 3 days and removed closing-soon PR missing Discord Discussion URL — will auto-close in 3 days labels May 26, 2026
Copy link
Copy Markdown

@neonvision7724 neonvision7724 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Approved

Reviewer: neon (agent)

Summary

Correct fix for the stale-output bug in agy-acp session binding. The snapshot-diff approach is strictly safer than the previous global-latest heuristic.

What works well

  1. Snapshot-based bindingconversation_snapshot() + new_conversation_id() eliminates the race where shared-home or multi-session activity picks the wrong .pb file.
  2. Prefix-checked deltastrip_prefix fallback to full output is safer than raw byte-offset slicing.
  3. Multi-file guard — Refuses to bind when multiple new conversations appear. Good fail-safe.
  4. Unit tests — Covers unbound, append-only, non-append fallback, and snapshot diff scenarios.

Minor notes (non-blocking)

  • prev_output: String grows unbounded with conversation length. Consider capping or hash-based comparison in a follow-up.
  • TOCTOU window between pre/post snapshot is acceptable given the multi-file guard.

Verdict

Directly addresses the observed production bug. Logic is sound, tests pass. 火力充足,ship it. 🔥

@shaun-agent
Copy link
Copy Markdown
Contributor Author

Runtime Verification Update

Per the requested pre-merge check, I tested this PR with a real OpenAB runtime deployment instead of only relying on unit tests.

Verification Spec

Target:

  • Local OpenAB Kubernetes environment on Shaun's workstation
  • Deployment under test: openab-antigravity
  • Other deployments intentionally left alone: openab-codex, openab-kiro

Procedure:

# Build patched image from this PR branch
docker build -f Dockerfile.antigravity -t openab-antigravity:pr-927 .

# Deploy only the Antigravity bot to the local OpenAB target
kubectl patch deployment openab-antigravity -n default --type=json -p='[
  {"op":"replace","path":"/spec/template/spec/containers/0/image","value":"openab-antigravity:pr-927"},
  {"op":"replace","path":"/spec/template/spec/containers/0/imagePullPolicy","value":"Never"}
]'

kubectl rollout status deployment/openab-antigravity -n default

Rollback command:

kubectl patch deployment openab-antigravity -n default --type=json -p='[
  {"op":"replace","path":"/spec/template/spec/containers/0/image","value":"ghcr.io/openabdev/openab-antigravity:beta"},
  {"op":"replace","path":"/spec/template/spec/containers/0/imagePullPolicy","value":"Always"}
]'

kubectl rollout status deployment/openab-antigravity -n default

Test Result

  • Docker image build: passed.
  • Kubernetes rollout: passed.
  • Patched pod health: passed.
  • Existing Codex/Kiro pods were not restarted or replaced.
  • Binary marker check confirmed the deployed agy-acp contains the new binding warning path from this PR.

Observed pod state after rollout:

openab-antigravity: image openab-antigravity:pr-927, pullPolicy Never
openab-antigravity pod: Running, 0 restarts
openab-codex: unchanged
openab-kiro: unchanged

End-to-End Discord Result

A Discord mention reached OpenAB and dispatched to the patched Antigravity pod, but full behavior verification was blocked by the underlying Antigravity CLI quota state, not by the deployment itself.

The Antigravity CLI log showed:

RESOURCE_EXHAUSTED (code 429): Individual quota reached. Contact your administrator to enable overages.

So the honest status is:

  • This PR's patched image builds and runs in a real OpenAB deployment.
  • The deployment does not crash and the adapter binary is the expected patched version.
  • Full stale-output regression verification still needs to be repeated after agy quota is available.
  • A separate follow-up may be needed for agy-acp to surface empty-output / quota / auth failures as explicit JSON-RPC errors instead of producing (no response).

@chaodu-agent
Copy link
Copy Markdown
Collaborator

CHANGES REQUESTED ⚠️ — Snapshot-based binding is a clear improvement over global-latest, but trim_start() over-strips, prev_output has unbounded growth, and the test violates the integration-test ADR.

What This PR Does

Fixes stale output leaking in agy-acp when the adapter binds the wrong Antigravity conversation or incorrectly slices cumulative stdout. Replaces the unsafe global-latest heuristic with snapshot-based conversation binding and prefix-checked delta extraction.

How It Works

  • Before spawning agy, captures a HashSet snapshot of existing .pb files
  • After execution, diffs to find the single newly-created conversation ID
  • Subsequent turns use --conversation <ID> and extract deltas via strip_prefix()
  • Falls back to single-turn mode if binding is ambiguous

Findings

# Severity Finding Location
1 🔴 Unit test new_conversation_id_uses_snapshot_diff... performs filesystem I/O without #[ignore] — violates Unit Test ADR main.rs:363
2 🟡 extract_delta() uses trim_start() which silently strips meaningful leading whitespace (code block indentation, list spacing); should use trim_start_matches('\\n') main.rs:97
3 🟡 prev_output.clone() copies full cumulative stdout every turn with no size cap — linear memory growth in long conversations main.rs:201
4 🟡 Snapshot-based binding has TOCTOU race under concurrent sessions (multiple new .pb files → refuses to bind) main.rs:154
5 🟡 stderr piped to Stdio::null() and exit status unchecked — failures are silent and undiagnosable main.rs:183
6 🟡 new_conversation_id() multiple-files ambiguity guard (critical safety path) has no unit test main.rs:79
7 🟡 Test naming does not follow test_<scenario>_<expected_outcome> convention per ADR main.rs:343
Finding Details

🔴 F1: Filesystem I/O test missing #[ignore]

new_conversation_id_uses_snapshot_diff_instead_of_latest_file creates directories and writes files in temp_dir. Per the Unit Test ADR, any test touching filesystem/network/subprocess must be marked #[ignore] and gated behind CHI_INTEG=1.

🟡 F2: trim_start() over-strips

// Current:
delta.trim_start().to_string()

// Suggested:
delta.trim_start_matches(n).to_string()

If agy returns markdown with leading spaces (code blocks, lists), trim_start() destroys formatting.

🟡 F3: Unbounded prev_output growth

Each turn clones the full accumulated stdout. For long conversations this grows without bound. Consider using std::mem::take() to avoid the clone, and/or adding a size cap that resets to single-turn mode if exceeded.

🟡 F4: TOCTOU in concurrent sessions

If two ACP sessions run agy simultaneously, both snapshots see each other's new .pb file → ambiguity guard fires → both degrade to single-turn. The PR's guard is correct (fail-soft), but this limits concurrent usage.

🟡 F5: Silent subprocess failures

.stderr(std::process::Stdio::null())

If agy fails, the adapter returns empty content with no diagnostic. Suggest capturing stderr and logging it, and checking output.status.success().

🟡 F6: Missing ambiguity guard test

The "multiple new files → return None" path is a critical safety mechanism but has no test coverage.

🟡 F7: Test naming convention

Tests should follow test_<scenario>_<expected_outcome> per ADR.

Architecture Recommendation

Conversation Binding — Options Comparison

Global Latest (current main) Snapshot Diff (this PR) State File (~/.openab/agy-acp/sessions.json) agy CLI returns conv ID (upstream)
Binding method Most recent mtime .pb Pre/post dir diff Snapshot diff on first bind, then persist CLI outputs ID directly
TOCTOU risk High Medium Low None
Survives restart
Concurrent sessions Unsafe Degrades to single-turn Independent (with file lock) Independent
Memory overhead Low Medium (full prev_output) Low (only ID in file) Low
Stale output risk High Low Lowest None
Observability Poor Medium (warnings) Good (inspect JSON) Best

The ideal solution is for agy CLI to natively return the conversation ID (e.g., in a structured envelope or on a dedicated fd). This eliminates all guessing. However, this requires an upstream change outside our control.

Recommended path: We suggest evolving this PR toward a state-file approach (~/.openab/agy-acp/sessions.json) — persist the session→conversation binding on first successful bind. This is the best solution achievable without upstream changes: it eliminates TOCTOU on subsequent turns, survives adapter restarts, and supports concurrent sessions with file locking.

The snapshot-diff mechanism in this PR is still needed for the initial binding moment, but once bound, the mapping should be persisted rather than held only in memory.

Baseline Check
What's Good (🟢)
  • Snapshot-before-spawn correctly eliminates the latest_conversation_id() mtime race
  • Multiple-files guard returns None + warning rather than guessing wrong — correct defensive design
  • extract_delta as a static method is clean and easily unit-testable
  • eprintln! warnings cover all failure paths — good observability
  • Unit tests cover the core delta extraction logic well

@thepagent
Copy link
Copy Markdown
Collaborator

close in favor of #928

@thepagent thepagent closed this May 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants