fix: prune replay control messages by steipete · Pull Request #82242 · openclaw/openclaw

steipete · 2026-05-15T17:35:06Z

Summary

Problem: restart and heartbeat replay paths could feed persisted internal runtime metadata or silent control payloads back into provider context.
Why it matters: replaying NO_REPLY, copied runtime context, or inbound metadata can change the next model turn and leak control text into user-visible recovery.
What changed: sanitize pending final-delivery text at capture and replay boundaries, strip copied runtime/inbound metadata from assistant replay, and drop standalone silent assistant replay turns.
What did NOT change (scope boundary): heartbeat ack-size classification still owns HEARTBEAT_OK decisions; provider/tool replay repair behavior outside display/control text pruning is unchanged.

Change Type (select all)

Scope (select all touched areas)

Linked Issue/PR

Closes [Bug] replay should prune internal/no-op control messages before provider context #76629
This PR fixes a bug or regression

Real behavior proof

Behavior addressed: restart/heartbeat/provider replay no longer preserves internal runtime context, inbound metadata, or standalone silent control replies in model-visible replay text.
Real environment tested: local OpenClaw source checkout on macOS, Node/Vitest repo test lanes, plus a direct local OpenClaw source command invoking the landed sanitizer.
Exact steps or command run after this patch: pnpm exec tsx -e ...sanitizePendingFinalDeliveryText(...); supplemental regression suite: pnpm test src/auto-reply/reply/pending-final-delivery.test.ts src/agents/pi-embedded-runner/replay-history.test.ts src/agents/main-session-restart-recovery.test.ts src/auto-reply/reply/get-reply.fast-path.test.ts src/gateway/chat-sanitize.test.ts -- --reporter=verbose
Evidence after fix: terminal output from local OpenClaw source command:

$ pnpm exec tsx -e 'import { sanitizePendingFinalDeliveryText } from "./src/auto-reply/reply/pending-final-delivery.ts"; ...'
dirty -> "Visible reply"
mixed -> "The user is saying hello"
heartbeat -> "HEARTBEAT_OK short"

Supplemental regression output: 4 Vitest shards passed; 62 tests passed after rebase onto current origin/main.
Observed result after fix: direct terminal capture shows runtime-context text stripped to Visible reply, glued NO_REPLY stripped to user-visible text, and HEARTBEAT_OK short preserved for the ack-aware classifier; regression tests also passed for pending final-delivery recovery, provider replay, restart recovery, heartbeat replay, and gateway display sanitization.
What was not tested: live provider restart/heartbeat replay against an external channel; coverage is focused regression tests plus local sanitizer/replay seams.

Root Cause

Root cause: pending final-delivery and provider replay boundaries reused persisted text that normal display/delivery paths sanitize elsewhere.
Missing detection / guardrail: no regression coverage for stale pending-final-delivery text containing runtime context or silent control payloads.
Contributing context: gateway display sanitization, provider replay sanitization, and pending delivery recovery duplicated overlapping text cleanup responsibilities.

Regression Test Plan

Coverage level that should have caught this:
- Unit test
- Seam / integration test
- End-to-end test
- Existing coverage already sufficient
Target test or file: src/auto-reply/reply/pending-final-delivery.test.ts, src/agents/pi-embedded-runner/replay-history.test.ts, src/auto-reply/reply/get-reply.fast-path.test.ts, src/agents/main-session-restart-recovery.test.ts, src/gateway/chat-sanitize.test.ts
Scenario the test should lock in: stale pending/replay text strips internal metadata and silent sentinels without breaking heartbeat ack classification.
Why this is the smallest reliable guardrail: the bug is at text-normalization and replay boundaries, so focused seam tests cover the branchy behavior without live provider flake.
Existing test that already covers this (if any): gateway inbound metadata stripping had partial coverage; pending delivery and provider replay control text did not.
If no new test is added, why not: N/A

User-visible / Behavior Changes

Restart and heartbeat recovery no longer feed OpenClaw internal runtime metadata or NO_REPLY control text back into the next model turn.

Diagram

Before:
[persisted pending/replay text] -> [provider/recovery prompt] -> [internal metadata/control text visible to model]

After:
[persisted pending/replay text] -> [boundary sanitizer] -> [provider/recovery prompt with user-visible text only]

Security Impact

New permissions/capabilities? No
Secrets/tokens handling changed? No
New/changed network calls? No
Command/tool execution surface changed? No
Data access scope changed? No
If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

OS: macOS
Runtime/container: local Node 22 OpenClaw checkout
Model/provider: mocked/unit replay paths
Integration/channel (if any): gateway/auto-reply/replay seams
Relevant config (redacted): default test configs

Steps

Seed pending final-delivery or assistant replay text with runtime-context delimiters, inbound metadata, and/or NO_REPLY.
Run pending delivery, heartbeat replay, restart recovery, and provider replay sanitizers.
Assert provider/recovery text contains only user-visible content and heartbeat ack classification remains ack-aware.

Expected

Internal metadata and standalone silent control text are removed before replay.
Heartbeat ack text is not stripped before ackMaxChars classification.

Actual

Matches expected after this patch.

Evidence

Failing test/log before + passing after
Trace/log snippets
Screenshot/recording
Perf numbers (if relevant)

Human Verification

Verified scenarios: focused replay, pending final-delivery, heartbeat pending replay, restart recovery, and gateway display sanitizer tests.
Edge cases checked: exact NO_REPLY, JSON NO_REPLY, mixed leading/trailing NO_REPLY, copied internal runtime context, inbound metadata, and HEARTBEAT_OK short ack preservation.
What you did not verify: live external channel restart replay.

Review Conversations

I replied to or resolved every bot review conversation I addressed in this PR.
I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

Backward compatible? Yes
Config/env changes? No
Migration needed? No
If yes, exact upgrade steps: N/A

Risks and Mitigations

Risk: over-stripping heartbeat acknowledgements before replay.
- Mitigation: preserve HEARTBEAT_OK in the generic sanitizer and keep ack-size stripping in the existing heartbeat classifier; regression test covers HEARTBEAT_OK short.

steipete · 2026-05-15T17:35:32Z

Verification for landing:

Local focused tests: pnpm test src/auto-reply/reply/pending-final-delivery.test.ts src/agents/pi-embedded-runner/replay-history.test.ts src/agents/main-session-restart-recovery.test.ts src/auto-reply/reply/get-reply.fast-path.test.ts src/gateway/chat-sanitize.test.ts -- --reporter=verbose
Result: passed after rebase onto current origin/main; 4 Vitest shards, 62 tests.
Diff hygiene: git diff --check origin/main...HEAD && node scripts/check-changelog-attributions.mjs
Result: passed.
Review: /Users/steipete/Projects/agent-scripts/skills/codex-review/scripts/codex-review
Result: clean; no accepted/actionable findings.

Known proof gap: no live external-channel restart replay; covered with focused pending-delivery, heartbeat replay, restart-recovery, provider replay, and gateway sanitizer regression tests.

steipete · 2026-05-15T17:38:39Z

Landed via rebase onto main.

Source commit: 99cc779
Landed commit: daef8e7
Gate: focused replay/pending-delivery/gateway tests passed locally; diff hygiene and changelog attribution passed; codex-review clean.
Real behavior proof gate: passed after PR body terminal-capture proof update.

Thanks @steipete!

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 99cc7795b9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-15T17:39:08Z

+import { stripInternalMetadataForDisplay } from "./display-text-sanitize.js";
+
+export function sanitizePendingFinalDeliveryText(text: string): string {
+  let stripped = stripInternalMetadataForDisplay(text).trim();


Avoid stripping user-visible timestamp prefixes in pending replay text

Calling stripInternalMetadataForDisplay on every pending-final-delivery string can silently alter legitimate assistant output, because that sanitizer removes leading timestamp-like prefixes (via the inbound-metadata timestamp regex) even when the text is not OpenClaw metadata. If a reply starts with content like [Mon 2026-05-15 09:30 UTC] ... and delivery is retried/replayed, the recovered message no longer matches what the model produced, which corrupts user-visible replay behavior.

Useful? React with 👍 / 👎.

fix: prune replay control messages

99cc779

openclaw-barnacle Bot added gateway Gateway runtime agents Agent runtime and tooling size: M maintainer Maintainer-authored PR labels May 15, 2026

steipete merged commit daef8e7 into main May 15, 2026
94 of 101 checks passed

steipete deleted the fix/replay-prune-control-messages-76629 branch May 15, 2026 17:35

steipete added the proof: override Maintainer override for the external PR real behavior proof gate. label May 15, 2026

steipete mentioned this pull request May 15, 2026

[Bug] replay should prune internal/no-op control messages before provider context #76629

Closed

chatgpt-codex-connector Bot reviewed May 15, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: prune replay control messages#82242

fix: prune replay control messages#82242
steipete merged 1 commit into
mainfrom
fix/replay-prune-control-messages-76629

steipete commented May 15, 2026 •

edited

Loading

Uh oh!

steipete commented May 15, 2026

Uh oh!

Uh oh!

steipete commented May 15, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

steipete commented May 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Change Type (select all)

Scope (select all touched areas)

Linked Issue/PR

Real behavior proof

Root Cause

Regression Test Plan

User-visible / Behavior Changes

Diagram

Security Impact

Repro + Verification

Environment

Steps

Expected

Actual

Evidence

Human Verification

Review Conversations

Compatibility / Migration

Risks and Mitigations

Uh oh!

steipete commented May 15, 2026

Uh oh!

Uh oh!

steipete commented May 15, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 15, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

steipete commented May 15, 2026 •

edited

Loading