fix: prune replay control messages #82242
Verification for landing:
Known proof gap: no live external-channel restart replay; covered with focused pending-delivery, heartbeat replay, restart-recovery, provider replay, and gateway sanitizer regression tests.
Landed via rebase onto main.
Thanks @steipete!
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 99cc7795b9
import { stripInternalMetadataForDisplay } from "./display-text-sanitize.js";

export function sanitizePendingFinalDeliveryText(text: string): string {
  let stripped = stripInternalMetadataForDisplay(text).trim();
Avoid stripping user-visible timestamp prefixes in pending replay text
Calling stripInternalMetadataForDisplay on every pending-final-delivery string can silently alter legitimate assistant output, because that sanitizer removes leading timestamp-like prefixes (via the inbound-metadata timestamp regex) even when the text is not OpenClaw metadata. If a reply starts with content like [Mon 2026-05-15 09:30 UTC] ... and delivery is retried/replayed, the recovered message no longer matches what the model produced, which corrupts user-visible replay behavior.
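The concern can be illustrated with a minimal sketch. Everything here is hypothetical: the regex, the function names, and the hasInboundMetadata flag are assumptions for illustration, not the patterns or signatures used by stripInternalMetadataForDisplay. The sketch only contrasts unconditional prefix stripping with stripping that is gated on knowing the text carries injected metadata.

```typescript
// Hypothetical timestamp-prefix pattern; the real inbound-metadata regex is not shown in this review.
const LEADING_TIMESTAMP =
  /^\[[A-Z][a-z]{2} \d{4}-\d{2}-\d{2} \d{2}:\d{2} [A-Z]{2,5}\]\s*/;

// Unconditional stripping: removes any leading timestamp-like prefix,
// including one the model intentionally wrote.
function stripAlways(text: string): string {
  return text.replace(LEADING_TIMESTAMP, "");
}

// Guarded stripping (assumed design): only strip when the caller knows
// the text carries injected inbound metadata.
function stripIfMetadata(text: string, hasInboundMetadata: boolean): string {
  return hasInboundMetadata ? text.replace(LEADING_TIMESTAMP, "") : text;
}

const reply = "[Mon 2026-05-15 09:30 UTC] Deploy finished.";

console.log(stripAlways(reply)); // the prefix is lost on replay
console.log(stripIfMetadata(reply, false)); // the reply is replayed verbatim
```

Whether a reliable metadata signal is available at the pending-delivery seam is not stated in the review, so the gating flag should be read as one possible mitigation rather than the intended fix.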
Summary
NO_REPLY, copied runtime context, or inbound metadata can change the next model turn and leak control text into user-visible recovery.
HEARTBEAT_OK decisions; provider/tool replay repair behavior outside display/control text pruning is unchanged.
Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
Real behavior proof
Behavior addressed: restart/heartbeat/provider replay no longer preserves internal runtime context, inbound metadata, or standalone silent control replies in model-visible replay text.
Real environment tested: local OpenClaw source checkout on macOS, Node/Vitest repo test lanes, plus a direct local OpenClaw source command invoking the landed sanitizer.
Exact steps or command run after this patch:
pnpm exec tsx -e ...sanitizePendingFinalDeliveryText(...); supplemental regression suite: pnpm test src/auto-reply/reply/pending-final-delivery.test.ts src/agents/pi-embedded-runner/replay-history.test.ts src/agents/main-session-restart-recovery.test.ts src/auto-reply/reply/get-reply.fast-path.test.ts src/gateway/chat-sanitize.test.ts -- --reporter=verbose
Evidence after fix: terminal output from local OpenClaw source command:
Supplemental regression output: 4 Vitest shards passed; 62 tests passed after rebase onto current origin/main.
Observed result after fix: direct terminal capture shows runtime-context text stripped to Visible reply, glued NO_REPLY stripped to user-visible text, and HEARTBEAT_OK short preserved for the ack-aware classifier; regression tests also passed for pending final-delivery recovery, provider replay, restart recovery, heartbeat replay, and gateway display sanitization.
What was not tested: live provider restart/heartbeat replay against an external channel; coverage is focused regression tests plus local sanitizer/replay seams.
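The observed results above can be sketched as a small pruning function. This is an illustrative approximation under stated assumptions, not the landed sanitizePendingFinalDeliveryText: the function name pruneSilentControlText is hypothetical, and runtime-context and inbound-metadata stripping are left out because their formats are not shown here.

```typescript
// Control tokens named in the PR description.
const SILENT_TOKEN = "NO_REPLY";

// Hypothetical helper: drops standalone, leading, trailing, and glued
// NO_REPLY tokens, and leaves everything else (including HEARTBEAT_OK) alone.
function pruneSilentControlText(text: string): string {
  let out = text.trim();
  // Strip leading tokens, e.g. "NO_REPLY Visible reply".
  while (out.startsWith(SILENT_TOKEN)) {
    out = out.slice(SILENT_TOKEN.length).trimStart();
  }
  // Strip trailing tokens, including glued ones such as "Visible replyNO_REPLY".
  while (out.endsWith(SILENT_TOKEN)) {
    out = out.slice(0, -SILENT_TOKEN.length).trimEnd();
  }
  return out;
}

console.log(pruneSilentControlText("Visible replyNO_REPLY"));
console.log(pruneSilentControlText("HEARTBEAT_OK short"));
```

A standalone NO_REPLY collapses to an empty string in this sketch, which a caller could treat as nothing to replay. Leaving HEARTBEAT_OK untouched mirrors the PR's stated choice to defer that decision to the ack-aware classifier.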
Root Cause
Regression Test Plan
src/auto-reply/reply/pending-final-delivery.test.ts, src/agents/pi-embedded-runner/replay-history.test.ts, src/auto-reply/reply/get-reply.fast-path.test.ts, src/agents/main-session-restart-recovery.test.ts, src/gateway/chat-sanitize.test.ts
User-visible / Behavior Changes
Restart and heartbeat recovery no longer feed OpenClaw internal runtime metadata or NO_REPLY control text back into the next model turn.
Diagram
Security Impact
Yes, explain risk + mitigation: N/A
Repro + Verification
Environment
Steps
NO_REPLY.
Expected
ackMaxChars classification.
Actual
Evidence
Human Verification
NO_REPLY, JSON NO_REPLY, mixed leading/trailing NO_REPLY, copied internal runtime context, inbound metadata, and HEARTBEAT_OK short ack preservation.
Review Conversations
Compatibility / Migration
Risks and Mitigations
HEARTBEAT_OK in the generic sanitizer and keep ack-size stripping in the existing heartbeat classifier; regression test covers HEARTBEAT_OK short.
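The split of responsibilities described in this mitigation can be sketched as follows. This is a hypothetical classifier, not the existing heartbeat classifier: the function name, the HeartbeatDecision type, and the exact comparison against ackMaxChars are assumptions made for illustration, and the thresholds in the usage lines are arbitrary.

```typescript
const HEARTBEAT_TOKEN = "HEARTBEAT_OK";

// Hypothetical result type: either a silent ack or text to deliver.
type HeartbeatDecision =
  | { kind: "ack" }
  | { kind: "deliver"; text: string };

// Assumed rule: a reply that starts with HEARTBEAT_OK counts as an ack when
// the text after the token is at most ackMaxChars characters long.
function classifyHeartbeatReply(
  text: string,
  ackMaxChars: number,
): HeartbeatDecision {
  const trimmed = text.trim();
  if (!trimmed.startsWith(HEARTBEAT_TOKEN)) {
    return { kind: "deliver", text: trimmed };
  }
  const remainder = trimmed.slice(HEARTBEAT_TOKEN.length).trim();
  return remainder.length <= ackMaxChars
    ? { kind: "ack" }
    : { kind: "deliver", text: remainder };
}

// Arbitrary thresholds, chosen only to show both branches.
console.log(classifyHeartbeatReply("HEARTBEAT_OK short", 300).kind);
console.log(classifyHeartbeatReply("HEARTBEAT_OK short", 2).kind);
```

Because the classifier needs to see the token to apply the size rule, a generic sanitizer that removed HEARTBEAT_OK first would make the two cases above indistinguishable, which is the risk this section describes.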