fix: schedule discord heartbeat checks after sends #78087

Merged
steipete merged 2 commits into openclaw:main from bryce-d-greybeard:bryce/issue-77668-discord-heartbeat-readiness on May 6, 2026

Conversation

@bryce-d-greybeard (Contributor) commented May 5, 2026

Summary

Fixes the Discord gateway heartbeat scheduler so ACK timeout checks are measured from the actual heartbeat send time, not from the HELLO-time fixed interval.

The previous scheduler randomized the first heartbeat but started a fixed interval immediately. If the first heartbeat fired late in that interval — or the event loop was delayed — the next interval tick could check lastHeartbeatAck too soon after the send and trigger a false Gateway heartbeat ACK timeout/reconnect cycle while the Discord channel was still awaiting readiness.
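
To make the race concrete, here is a minimal sketch of the timing arithmetic. It is not code from gateway-lifecycle.ts: the function names are hypothetical, and it assumes the first heartbeat is sent at `intervalMs * jitter` after HELLO, with `jitter` between 0 and 1.

```typescript
// Hypothetical helpers for illustration only; not taken from gateway-lifecycle.ts.
// All times are milliseconds measured from the HELLO event.

/** Old behavior: a fixed interval starts at HELLO, while the first send is jittered. */
function ackWindowFixedInterval(intervalMs: number, jitter: number): number {
  const firstSendAt = intervalMs * jitter;
  const firstCheckAt = intervalMs; // first tick of the fixed interval
  return firstCheckAt - firstSendAt; // time the ACK has to arrive
}

/** New behavior: the ACK check is scheduled one full interval after the send. */
function ackWindowChained(intervalMs: number, jitter: number): number {
  const firstSendAt = intervalMs * jitter;
  const firstCheckAt = firstSendAt + intervalMs;
  return firstCheckAt - firstSendAt; // always the full interval
}
```

With a 40000 ms interval and a jitter of 0.75, the fixed-interval scheduler leaves only 10000 ms between the first send and the first ACK check, while the chained scheduler always leaves the full 40000 ms. As the jitter approaches 1 the old window approaches zero, which is where the false timeouts come from.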

Changes

  • Replace the fixed heartbeat interval with a chained timeout cycle.
  • Keep randomized first heartbeat behavior.
  • After each heartbeat send, schedule the ACK check one full heartbeat interval later.
  • If ACKed, send the next heartbeat and schedule the next check from that send.
  • Keep reconnect cleanup clearing both first-heartbeat and heartbeat-cycle timers.
  • Add dedicated regression coverage for late randomized first heartbeat, genuine ACK timeout, ACKed heartbeat cycles, and timer cleanup.
  • Add a changelog entry crediting both contributors.
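
The chained timeout cycle described in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged implementation: `ChainedHeartbeat`, `Timers`, and `HeartbeatHooks` are hypothetical names, and the timer functions are injected so the cycle can be driven by a fake clock in tests.

```typescript
// Hypothetical sketch; names and shape are assumptions, not the code in gateway-lifecycle.ts.
type TimerHandle = unknown;

interface Timers {
  setTimeout(fn: () => void, ms: number): TimerHandle;
  clearTimeout(handle: TimerHandle): void;
}

interface HeartbeatHooks {
  sendHeartbeat(): void; // send a heartbeat over the gateway socket
  onAckTimeout(): void; // no ACK within one full interval; caller reconnects
}

class ChainedHeartbeat {
  private readonly timers: Timers;
  private readonly hooks: HeartbeatHooks;
  private firstTimer: TimerHandle | undefined;
  private cycleTimer: TimerHandle | undefined;
  private acked = true;

  constructor(timers: Timers, hooks: HeartbeatHooks) {
    this.timers = timers;
    this.hooks = hooks;
  }

  /** Called on HELLO: the first heartbeat stays randomized. */
  start(intervalMs: number, jitter: number = Math.random()): void {
    this.stop();
    this.firstTimer = this.timers.setTimeout(() => {
      this.firstTimer = undefined;
      this.sendAndScheduleCheck(intervalMs);
    }, Math.floor(intervalMs * jitter));
  }

  /** Called when a heartbeat ACK arrives. */
  ack(): void {
    this.acked = true;
  }

  /** Reconnect cleanup: clears both the first-heartbeat and the cycle timer. */
  stop(): void {
    if (this.firstTimer !== undefined) this.timers.clearTimeout(this.firstTimer);
    if (this.cycleTimer !== undefined) this.timers.clearTimeout(this.cycleTimer);
    this.firstTimer = undefined;
    this.cycleTimer = undefined;
  }

  private sendAndScheduleCheck(intervalMs: number): void {
    this.acked = false;
    this.hooks.sendHeartbeat();
    // The ACK check is measured from this send, not from HELLO.
    this.cycleTimer = this.timers.setTimeout(() => {
      this.cycleTimer = undefined;
      if (!this.acked) {
        this.hooks.onAckTimeout();
        return;
      }
      this.sendAndScheduleCheck(intervalMs);
    }, intervalMs);
  }
}
```

In a real runtime the `timers` argument would presumably wrap the global `setTimeout` and `clearTimeout`. Injecting it keeps the sketch testable with a hand-rolled fake clock, in the same spirit as the fake-timer regression tests this PR adds.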

Real Behavior Proof

Behavior or issue addressed: Discord gateway heartbeat ACK timeout race causing false reconnect loops and intermittent hangs at "awaiting gateway readiness" (#77668).

Real environment tested: OpenClaw 2026.5.5 from commit 43dcdcd9 on WSL2 Ubuntu 22.04, Node.js v22.22.0, Discord bot runtime.

Exact steps or command run after this patch:

  1. pnpm build in the OpenClaw source checkout.
  2. Patched the installed @openclaw/discord runtime bundle with the recursive timeout heartbeat logic.
  3. systemctl --user restart openclaw-gateway.
  4. Monitored journalctl --user -u openclaw-gateway -f for 2+ minutes.

Evidence after fix: Runtime log excerpt from the real Discord gateway run:

19:32:32 [discord] client initialized as 1483184501986164748; awaiting gateway readiness
19:32:32 WebSocket OPEN event fired
19:32:33 message: op=0 t=READY
19:32:34 message: op=11 t=null
19:33:15 message: op=11 t=null

Observed result after fix: Zero Gateway heartbeat ACK timeout entries; Discord reached READY and stayed connected with repeated heartbeat ACKs instead of reconnecting.
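
For anyone repeating this check on their own logs, a small helper along these lines could count heartbeat ACKs (op=11) and timeout entries. It is a hypothetical sketch, not part of the PR. It assumes the `HH:MM:SS message: op=N` line format shown in the excerpt and the literal text "Gateway heartbeat ACK timeout" for timeout entries, and it does not handle logs that cross midnight.

```typescript
// Hypothetical log checker; the line format is assumed from the excerpt in this PR.
interface HeartbeatLogSummary {
  ackCount: number;
  timeoutCount: number;
  ackGapsSeconds: number[]; // seconds between consecutive ACKs
}

function summarizeHeartbeatLog(lines: string[]): HeartbeatLogSummary {
  const ackPattern = /^(\d{2}):(\d{2}):(\d{2}) message: op=11\b/;
  const ackGapsSeconds: number[] = [];
  let ackCount = 0;
  let timeoutCount = 0;
  let previousAck: number | undefined;

  for (const line of lines) {
    if (line.includes("Gateway heartbeat ACK timeout")) {
      timeoutCount += 1;
    }
    const match = ackPattern.exec(line);
    if (match) {
      // Convert HH:MM:SS to seconds since midnight.
      const seconds = Number(match[1]) * 3600 + Number(match[2]) * 60 + Number(match[3]);
      if (previousAck !== undefined) {
        ackGapsSeconds.push(seconds - previousAck);
      }
      previousAck = seconds;
      ackCount += 1;
    }
  }
  return { ackCount, timeoutCount, ackGapsSeconds };
}
```

Applied to the five lines of the excerpt above, this reports two ACKs, zero timeout entries, and a single 41-second gap between the ACKs.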

What was not tested: Long-running 24h+ stability, multi-guild load, and Windows native runtime were not covered by this after-fix run. The original #77668 reporter offered to rerun 5 macOS launchd restart cycles once this lands.

Testing

  • pnpm exec oxfmt --write --threads=1 CHANGELOG.md extensions/discord/src/internal/gateway-lifecycle.ts extensions/discord/src/internal/gateway-lifecycle.test.ts extensions/discord/src/internal/gateway.test.ts — passed.
  • pnpm test extensions/discord/src/internal/gateway-lifecycle.test.ts extensions/discord/src/internal/gateway.test.ts — passed, 2 files / 24 tests.
  • pnpm exec oxfmt --check --threads=1 CHANGELOG.md extensions/discord/src/internal/gateway-lifecycle.ts extensions/discord/src/internal/gateway-lifecycle.test.ts extensions/discord/src/internal/gateway.test.ts — passed.
  • git diff --check — passed.

Fixes #77668
Supersedes #77956

@openclaw-barnacle (Bot) added labels on May 5, 2026: channel: discord (Channel integration: discord); triage: needs-real-behavior-proof (Candidate: external PR needs after-fix proof from a real setup.); size: S

@clawsweeper (Bot, Contributor) commented May 5, 2026

ClawSweeper status: review started.

I am starting a fresh review of this pull request: "fix: schedule discord heartbeat checks after sends". This is item 1/1 in the current shard. Shard 0/1.

This placeholder means the worker is alive and reading the current context. I will edit this same comment with the actual review when the claws are done clicking.

Crustacean status: shell secured, claws on keyboard, evidence pebbles being sorted.

@byungskers byungskers left a comment

Good catch on the heartbeat timing! Switching from "setInterval" to "setTimeout" makes the ACK check accurately measure one full interval after the actual heartbeat send, which aligns better with Discord Gateway expectations. The fake-timer tests clearly demonstrate the fix.

@RyanSandoval

I'm the original reporter of #77668 — happy to provide real-behavior proof to help unblock the triage: needs-real-behavior-proof label.

Deterministic repro signature on this host

Configuration that rules out other contributing factors

Offer

If this PR lands, I will:

  1. Update OpenClaw on this affected host
  2. Run launchctl bootout + bootstrap 5 times in a row
  3. Report success/failure rate as a follow-up comment here within an hour

That should give you the empirical data needed to clear the needs-real-behavior-proof label.

Thanks @bryce-d-greybeard for digging into this — same heartbeat-lifecycle race that @wena369 / @mfbergmann / @holgergruenhagen have been narrowing down on the issue thread.

@steipete force-pushed the bryce/issue-77668-discord-heartbeat-readiness branch from 7de7eb7 to bf239b8 on May 6, 2026, 03:44
@openclaw-barnacle (Bot) added the label proof: supplied (External PR includes structured after-fix real behavior proof.) and removed the label triage: needs-real-behavior-proof (Candidate: external PR needs after-fix proof from a real setup.) on May 6, 2026
@steipete merged commit b5c33bc into openclaw:main on May 6, 2026
18 of 23 checks passed

@steipete (Contributor) commented May 6, 2026

Landed via squash onto main.

  • PR head: bf239b8
  • Merge commit: b5c33bc
  • Verification:
    • pnpm test extensions/discord/src/internal/gateway-lifecycle.test.ts extensions/discord/src/internal/gateway.test.ts
    • pnpm exec oxfmt --check --threads=1 CHANGELOG.md extensions/discord/src/internal/gateway-lifecycle.ts extensions/discord/src/internal/gateway-lifecycle.test.ts extensions/discord/src/internal/gateway.test.ts
    • git diff --check
    • Real behavior proof check passed on the PR head after the body update.

Thanks @bryce-d-greybeard and @NikolaFC.

Labels

  • channel: discord (Channel integration: discord)
  • proof: supplied (External PR includes structured after-fix real behavior proof.)
  • size: S

4 participants