Skip to content

Suppress heartbeat fallback after message-tool delivery#84273

Closed
TurboTheTurtle wants to merge 4 commits into
openclaw:mainfrom
TurboTheTurtle:fix/heartbeat-message-tool-fallback-84217
Closed

Suppress heartbeat fallback after message-tool delivery#84273
TurboTheTurtle wants to merge 4 commits into
openclaw:mainfrom
TurboTheTurtle:fix/heartbeat-message-tool-fallback-84217

Conversation

@TurboTheTurtle
Copy link
Copy Markdown
Contributor

Summary

Fixes #84217.

  • Carries internal message-tool delivery evidence on reply payload metadata after a real message tool send succeeds in message-tool-only mode.
  • Makes the heartbeat runner suppress its fallback text when that delivery evidence is present, instead of sending duplicate narration through the heartbeat channel.
  • Adds a heartbeat regression covering the issue shape: message-tool delivery evidence plus stray fallback text.

Root cause

Heartbeat dispatch only understood the heartbeat_respond tool channel data. When non-Codex providers produced a generic message tool call and also returned visible fallback text, runHeartbeatOnce treated the fallback text as a normal heartbeat reply and sent it to Telegram.

Real behavior proof

Behavior or issue addressed:
Heartbeat message-tool-only dispatch no longer sends fallback text when a visible generic message tool send has already succeeded.

Real environment tested:
Local OpenClaw source checkout with production runHeartbeatOnce, production reply payload metadata helpers, and a real temporary session store on macOS. The proof used an in-memory Telegram sender so the dispatch path could be observed without contacting Telegram.

Exact steps or command run after this patch:
PATH=/Users/andy/.cache/codex-runtimes/codex-primary-runtime/dependencies/node/bin:$PATH node --import tsx --input-type=module --eval '<script importing runHeartbeatOnce, seeding a temp session store, returning markReplyPayloadForMessageToolDelivery({ text: "fallback narration that would duplicate the message tool" }) from getReplyFromConfig, and recording Telegram send calls>'

Evidence after fix:

{
  "result": {
    "status": "ran"
  },
  "replyCalls": [
    {
      "sourceReplyDeliveryMode": "message_tool_only",
      "enableHeartbeatTool": true,
      "forceHeartbeatTool": true
    }
  ],
  "telegramSendCallCount": 0,
  "heartbeatEvent": {
    "status": "sent",
    "preview": "fallback narration that would duplicate the message tool",
    "channel": "telegram"
  }
}

Observed result after fix:
The heartbeat run still uses message-tool-only mode, records a successful heartbeat event, and does not call the Telegram fallback sender for the duplicate narration.

What was not tested:
No live Telegram API call was made; the proof observes the production heartbeat dispatch decision before network delivery.

Validation

  • node scripts/run-vitest.mjs src/infra/heartbeat-runner.tool-response.test.ts
  • node scripts/run-vitest.mjs src/auto-reply/reply/agent-runner-payloads.test.ts src/auto-reply/reply/dispatch-from-config.test.ts
  • git diff --check

Attribution

If maintainers squash or rework this PR, please preserve author attribution or include:

Co-authored-by: Andy Ye <35905412+TurboTheTurtle@users.noreply.github.com>

@openclaw-barnacle openclaw-barnacle Bot added size: S proof: supplied External PR includes structured after-fix real behavior proof. labels May 19, 2026
@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented May 19, 2026

Codex review: needs real behavior proof before merge.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Summary
The PR adds internal reply-payload metadata for successful generic message-tool sends, suppresses heartbeat fallback delivery when that metadata is present, and adds heartbeat/runReplyAgent regression coverage.

Reproducibility: yes. from source inspection, but not by a live run in this read-only review. Current main allows a generic message-tool send plus free text to leave heartbeatToolResponse undefined and then deliver the free-text fallback through sendDurableMessageBatch.

PR rating
Overall: 🦪 silver shellfish
Proof: 🦪 silver shellfish
Patch quality: 🦐 gold shrimp
Summary: The implementation is focused and well aimed, but proof is below the Telegram-visible bar and the patch has one route-scoping blocker.

Rank-up moves:

  • Route-scope the message-tool delivery marker using the existing dedupe route semantics.
  • Add redacted live Telegram proof or a Mantis artifact showing no duplicate fallback message after a heartbeat message-tool send.
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

PR egg
🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat.

Where did the egg go?
  • The egg game starts only after the PR passes the real-behavior proof check.
  • Before that, no creature, rarity, or ASCII portrait is rolled. The treat waits for real proof.
  • This is still just collectible flavor: proof affects review readiness, not creature quality.

Real behavior proof
Needs stronger real behavior proof before merge: The PR body includes terminal-style after-fix output from production heartbeat code, but it uses an in-memory Telegram sender; this Telegram-visible delivery change still needs a redacted live transcript, recording, transport log, linked artifact, Mantis proof, or explicit maintainer proof override before merge. After proof is added to the PR body, ClawSweeper should re-review automatically; if not, a maintainer can comment @clawsweeper re-review.

Mantis proof suggestion
A native Telegram proof would directly show whether one heartbeat tick produces only the message-tool content and no duplicate fallback bubble. A maintainer can ask Mantis to capture proof by posting a new PR comment that starts with the OpenClaw Mantis account mention, followed by:

telegram desktop proof: verify a heartbeat tick where the agent calls message(action=send) and also emits fallback text produces only one Telegram message, with no duplicate fallback.

Risk before merge
Why this matters: - The current patch treats any successful heartbeat message-tool send as proof that the heartbeat target was already served; if the tool send targeted a different provider, chat, account, or thread, the configured heartbeat fallback text could be dropped.

  • The supplied proof uses an in-memory Telegram sender, so it does not yet show the live Telegram-visible outcome for the duplicate-suppression behavior.

Maintainer options:

  1. Route-scope the delivery marker before merge (recommended)
    Use the existing provider/target/account/thread matching rules before marking heartbeat payloads as message-tool-delivered, then keep the current focused regressions.
  2. Require live Telegram duplicate-suppression proof
    Ask for a redacted Telegram transcript, recording, transport log, or Mantis artifact showing one heartbeat tick sends only the tool message and no fallback duplicate.
  3. Accept broad suppression intentionally
    Maintainers may accept the current broader suppression, but they should own the edge case where a message-tool send to another route suppresses fallback text for the heartbeat target.

Next step before merge
Needs contributor live Telegram proof plus an author/maintainer fix for route-scoped delivery evidence before merge; this is not a safe repair-automation handoff while proof is missing.

Security
Cleared: The diff only adds in-process reply metadata, heartbeat suppression logic, and tests; it does not touch secrets, dependencies, scripts, permissions, or external code execution.

Review findings

  • [P2] Route-scope message-tool delivery evidence — src/auto-reply/reply/agent-runner.ts:2128-2130
Review details

Best possible solution:

Keep the metadata-based suppression, but only set the delivery marker when the message-tool send matches the heartbeat/source route or target details are unavailable under the existing dedupe fallback, then prove the result with live Telegram-visible evidence or an explicit maintainer proof override.

Do we have a high-confidence way to reproduce the issue?

Yes from source inspection, but not by a live run in this read-only review. Current main allows a generic message-tool send plus free text to leave heartbeatToolResponse undefined and then deliver the free-text fallback through sendDurableMessageBatch.

Is this the best way to solve the issue?

Mostly yes, but the marker needs route scoping. The maintainable shape is to carry internal delivery evidence from runReplyAgent to runHeartbeatOnce, while preserving the existing provider/target/account/thread matching boundary already used for message-tool dedupe.

Label justifications:

  • P1: The PR addresses an active duplicate Telegram heartbeat delivery regression that affects real channel output.
  • merge-risk: 🚨 message-delivery: The diff changes when heartbeat fallback text is suppressed, so a mistake can either duplicate Telegram messages or drop a fallback message.

Full review comments:

  • [P2] Route-scope message-tool delivery evidence — src/auto-reply/reply/agent-runner.ts:2128-2130
    This marks heartbeat payloads as delivered whenever any generic message-tool send succeeds. Existing message-tool dedupe is route-aware by provider, target, account, and thread; without the same check here, a heartbeat that sends a tool message to another route can cause runHeartbeatOnce to return before sending fallback text to the configured heartbeat target.
    Confidence: 0.78

Overall correctness: patch is incorrect
Overall confidence: 0.78

Acceptance criteria:

  • node scripts/run-vitest.mjs src/infra/heartbeat-runner.tool-response.test.ts src/auto-reply/reply/agent-runner.runreplyagent.e2e.test.ts src/auto-reply/reply/agent-runner-payloads.test.ts src/auto-reply/reply/dispatch-from-config.test.ts
  • node scripts/run-tsgo.mjs -p test/tsconfig/tsconfig.core.test.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/core-test.tsbuildinfo
  • git diff --check
  • Live Telegram transcript/recording, redacted transport log, or Mantis proof for the duplicate-suppression path

What I checked:

  • Current main duplicate path: On current main, runHeartbeatOnce sets heartbeat message-tool-only options, resolves heartbeat_respond separately, and then falls back to normal sendDurableMessageBatch when the reply payload has free text but no heartbeat tool response. (src/infra/heartbeat-runner.ts:1725, a059309a9f9a)
  • Message-tool dedupe is already route-aware: Existing reply-payload dedupe checks provider, target, account, and thread before deciding a message-tool send matches the originating reply route. (src/auto-reply/reply/reply-payloads-dedupe.ts:206, a059309a9f9a)
  • PR globally marks heartbeat payloads after any message-tool delivery: The PR marks all final heartbeat payloads with message-tool delivery evidence whenever any generic message-tool send succeeds, without checking whether the send matched the heartbeat route. (src/auto-reply/reply/agent-runner.ts:2128, e56dbe516e19)
  • PR suppression return path: The PR adds an early runHeartbeatOnce return when message-tool delivery metadata is present, before the fallback send path that would otherwise deliver normalized heartbeat text. (src/infra/heartbeat-runner.ts:1784, e56dbe516e19)
  • Regression coverage added: The PR adds synthetic regressions for message-tool-only and automatic heartbeat suppression, plus a runReplyAgent metadata-marking regression. (src/infra/heartbeat-runner.tool-response.test.ts:234, e56dbe516e19)
  • Telegram proof standard: The matching maintainer note says Telegram behavior PRs that touch transport-visible behavior need real Telegram proof, preferably bot-to-bot QA or an equivalent live Telegram probe. (.agents/maintainer-notes/telegram.md:40, a059309a9f9a)

Likely related people:

  • @steipete: git shortlog --all shows Peter Steinberger as the dominant contributor across the heartbeat, reply-runner, reply-payload, and heartbeat-tool-response files sampled for this review. (role: heavy area contributor; confidence: high; commits: 50a2481652b6; files: src/infra/heartbeat-runner.ts, src/auto-reply/reply/agent-runner.ts, src/auto-reply/reply-payload.ts)
  • Galin Iliev: Current-main blame and git log -S point to recent work carrying the heartbeat message-tool-only path and adjacent runReplyAgent behavior in commit 57ec361. (role: recent area contributor; confidence: medium; commits: 57ec361682e2; files: src/infra/heartbeat-runner.ts, src/auto-reply/reply/agent-runner.ts)
  • Patrick Erichsen: Commit d60ab48 recently touched Telegram progress preview flows, message-tool behavior, heartbeat docs, and channel delivery surfaces adjacent to this PR. (role: adjacent Telegram/message-delivery contributor; confidence: medium; commits: d60ab485114a; files: extensions/telegram/src/bot-message-dispatch.ts, src/agents/tools/message-tool.ts, src/infra/heartbeat-runner.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against a059309a9f9a.

@TurboTheTurtle TurboTheTurtle marked this pull request as ready for review May 19, 2026 19:24
@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. mantis: telegram-visible-proof Mantis should capture Telegram visible proof. P1 High-priority user-facing bug, regression, or broken workflow. merge-risk: 🚨 message-delivery 🚨 May drop, duplicate, misroute, suppress, or wrongly target messages. labels May 19, 2026
@TurboTheTurtle
Copy link
Copy Markdown
Contributor Author

Pushed follow-up commit 1eb166348b to address the P1 automatic-mode coverage finding.

What changed:

  • runReplyAgent now marks reply payloads with message-tool delivery evidence whenever a heartbeat run actually had a successful generic message-tool send, even when sourceReplyDeliveryMode is automatic.
  • runHeartbeatOnce now suppresses fallback text whenever that delivery evidence is present, instead of gating suppression on heartbeat tool prompt mode.
  • Added regressions for both sides of the path: automatic heartbeat payloads are marked after generic message-tool delivery, and automatic heartbeat fallback text is not sent when marked delivery evidence is present.

Validation:

  • PATH=/Users/andy/.cache/codex-runtimes/codex-primary-runtime/dependencies/node/bin:$PATH node scripts/run-vitest.mjs src/infra/heartbeat-runner.tool-response.test.ts src/auto-reply/reply/agent-runner.runreplyagent.e2e.test.ts src/auto-reply/reply/agent-runner-payloads.test.ts src/auto-reply/reply/dispatch-from-config.test.ts — 3 files / 176 tests passed
  • PATH=/Users/andy/.cache/codex-runtimes/codex-primary-runtime/dependencies/node/bin:$PATH pnpm exec oxfmt --check --threads=1 src/auto-reply/reply/agent-runner.ts src/infra/heartbeat-runner.ts src/infra/heartbeat-runner.tool-response.test.ts src/auto-reply/reply/agent-runner.runreplyagent.e2e.test.ts
  • git diff --check

Author attribution verified before push:

1eb166348b Andy Ye <35905412+TurboTheTurtle@users.noreply.github.com> fix(heartbeat): mark generic message-tool delivery
38a494761d Andy Ye <35905412+TurboTheTurtle@users.noreply.github.com> Suppress heartbeat fallback after message-tool delivery

@clawsweeper re-review

@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented May 19, 2026

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@clawsweeper clawsweeper Bot added rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. and removed rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. labels May 19, 2026
@TurboTheTurtle
Copy link
Copy Markdown
Contributor Author

Pushed a follow-up fix for the failed check-test-types job. The failure was only in the new regression test: TypeScript could not narrow the reply payload after the optional-chain assertion, so I added an explicit guard before reading the metadata.

Local validation:

  • node scripts/run-tsgo.mjs -p test/tsconfig/tsconfig.core.test.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/core-test.tsbuildinfo
  • node scripts/run-vitest.mjs src/infra/heartbeat-runner.tool-response.test.ts src/auto-reply/reply/agent-runner.runreplyagent.e2e.test.ts src/auto-reply/reply/agent-runner-payloads.test.ts src/auto-reply/reply/dispatch-from-config.test.ts (176 tests passed)
  • pnpm exec oxfmt --check --threads=1 src/infra/heartbeat-runner.tool-response.test.ts src/auto-reply/reply/agent-runner.runreplyagent.e2e.test.ts src/auto-reply/reply/agent-runner.ts src/infra/heartbeat-runner.ts
  • git diff --check

@clawsweeper re-review

@TurboTheTurtle
Copy link
Copy Markdown
Contributor Author

One remaining current-head failure appears unrelated to this branch: dependency-change-awareness failed with a GitHub App installation API rate limit before it could inspect files. I tried rerunning the failed workflow, but GitHub requires repository admin rights for that rerun. A maintainer/admin rerun should clear that specific check if the rate limit has cooled down.

@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented May 19, 2026

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@TurboTheTurtle
Copy link
Copy Markdown
Contributor Author

Closing this out to keep the open PR queue focused. This one is still marked as needing stronger proof and carries message-delivery risk, and I cannot provide the required live/equivalent proof from the contributor side. Happy to reopen or re-spin if maintainers want this direction later.

@TurboTheTurtle TurboTheTurtle deleted the fix/heartbeat-message-tool-fallback-84217 branch May 20, 2026 18:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

mantis: telegram-visible-proof Mantis should capture Telegram visible proof. merge-risk: 🚨 message-delivery 🚨 May drop, duplicate, misroute, suppress, or wrongly target messages. P1 High-priority user-facing bug, regression, or broken workflow. proof: supplied External PR includes structured after-fix real behavior proof. rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. size: S status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Heartbeat dispatch delivers free text block alongside message-tool call (chatty non-Codex providers, v2026.5.18)

1 participant