
fix(discord): suppress intentional reconnect exception during health monitor restart#55042

Closed
Bartok9 wants to merge 2 commits into openclaw:main from Bartok9:fix/55026-discord-health-monitor-crash


@Bartok9 (Contributor) commented Mar 26, 2026

Summary

When the Discord health monitor triggers a restart (e.g., due to stale-socket detection), the onAbort handler sets maxAttempts to 0 before calling disconnect(). The gateway library interprets this as a reconnection failure and throws "Max reconnect attempts (0) reached" — causing an uncaught exception that crashes the entire gateway process.

Root Cause

The gateway library's disconnect flow treats maxAttempts: 0 as a signal that reconnection has exhausted its attempts, and throws an error. In the health monitor's abort flow, however, maxAttempts: 0 is set intentionally to prevent reconnection during a clean shutdown, so this exception should not bubble up.
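The failure mode can be reproduced in miniature. This is a hypothetical stand-in: `FakeGateway`, its `options.reconnect.maxAttempts` shape, and `onAbortBeforeFix` are illustrative names, not the real gateway library or openclaw code, and the throw-on-zero behavior is assumed from the description above.

```typescript
// Hypothetical stand-in for the upstream gateway library. The class name, the
// options shape, and this disconnect() logic are assumptions for illustration.
class FakeGateway {
  options: { reconnect: { maxAttempts: number } };

  constructor(maxAttempts: number) {
    this.options = { reconnect: { maxAttempts } };
  }

  disconnect(): void {
    const max = this.options.reconnect.maxAttempts;
    // Assumed behavior, modeled on the PR text: a limit of 0 is read as
    // "reconnection exhausted", so disconnect reports it by throwing.
    if (max === 0) {
      throw new Error(`Max reconnect attempts (${max}) reached`);
    }
  }
}

// The pre-fix abort path: zero the limit to block reconnection, then disconnect.
// Nothing here catches the resulting error, so it escapes to the caller.
function onAbortBeforeFix(gateway: FakeGateway): void {
  gateway.options.reconnect.maxAttempts = 0;
  gateway.disconnect();
}

let escaped: unknown;
try {
  onAbortBeforeFix(new FakeGateway(50));
} catch (err) {
  // Caught here only to inspect it; in the reported crash it went uncaught.
  escaped = err;
}
```

With any non-zero limit the stub disconnects quietly; only the intentional zero triggers the throw, which is why the error shows up exactly when the health monitor aborts.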

Fix

Wrap the gateway.disconnect() call in a try-catch that:

  1. Suppresses the expected "Max reconnect attempts" error (this is an intentional abort, not a connection failure)
  2. Rethrows any other unexpected errors (preserves visibility into real issues)
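A minimal TypeScript sketch of the guarded disconnect, assuming a hypothetical `GatewayLike` interface and helper name `disconnectForIntentionalAbort` (neither comes from the PR). It uses the `err instanceof Error ? err.message : String(err)` extraction adopted in the second commit; the first commit used `String(err)`.

```typescript
// Hypothetical minimal gateway shape; the real gateway library type differs.
interface GatewayLike {
  options: { reconnect: { maxAttempts: number } };
  disconnect(): void;
}

// Suppress only the expected reconnect-exhaustion error; rethrow anything else.
function disconnectForIntentionalAbort(gateway: GatewayLike): void {
  gateway.options.reconnect.maxAttempts = 0; // block reconnection on purpose
  try {
    gateway.disconnect();
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    if (!message.includes("Max reconnect attempts")) {
      throw err; // unexpected failure: keep it visible
    }
    // Expected: the library reports exhaustion because maxAttempts is 0.
  }
}

// Usage with a stub gateway that throws the expected error.
const stub: GatewayLike = {
  options: { reconnect: { maxAttempts: 50 } },
  disconnect() {
    throw new Error(`Max reconnect attempts (${this.options.reconnect.maxAttempts}) reached`);
  },
};
disconnectForIntentionalAbort(stub); // returns normally
```

Matching on message text keeps the change local to onAbort; the trade-off, raised in review, is that a wording change upstream would silently disable the suppression.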

Testing

Added two test cases:

  • suppresses 'Max reconnect attempts' exception during intentional abort — verifies the fix
  • rethrows non-reconnect exceptions during abort — ensures we don't suppress legitimate errors

All 17 tests in provider.lifecycle.test.ts pass.
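The two test cases can be sketched without a test framework. `guardedDisconnect`, `testSuppressesMaxReconnect`, and `testRethrowsOtherErrors` are hypothetical names; the real tests in provider.lifecycle.test.ts run under the project's own test setup and exercise onAbort itself.

```typescript
// Hypothetical guarded disconnect with the same suppression rule as the fix.
function guardedDisconnect(disconnect: () => void): void {
  try {
    disconnect();
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    if (!message.includes("Max reconnect attempts")) throw err;
  }
}

// Case 1: the intentional-abort error is suppressed.
function testSuppressesMaxReconnect(): void {
  guardedDisconnect(() => {
    throw new Error("Max reconnect attempts (0) reached");
  }); // must not throw
}

// Case 2: any other error propagates unchanged (same object, not a copy).
function testRethrowsOtherErrors(): void {
  const boom = new Error("socket write failed");
  let caught: unknown;
  try {
    guardedDisconnect(() => {
      throw boom;
    });
  } catch (err) {
    caught = err;
  }
  if (caught !== boom) throw new Error("expected the original error to be rethrown");
}

testSuppressesMaxReconnect();
testRethrowsOtherErrors();
```

Checking identity (`caught !== boom`) rather than just "something was thrown" guards against a handler that swallows the real error and throws a different one.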

Impact

  • Before: Gateway crashes every ~30 minutes when the health monitor triggers a restart
  • After: Health monitor gracefully restarts Discord connection without crashing

Fixes #55026

…monitor restart

When the Discord health monitor triggers a restart (e.g., due to stale-socket
detection), the onAbort handler sets maxAttempts to 0 before calling disconnect().
The gateway library interprets this as a reconnection failure and throws
'Max reconnect attempts (0) reached' — causing an uncaught exception that crashes
the entire gateway process.

This fix wraps the disconnect() call in a try-catch that specifically suppresses
the expected 'Max reconnect attempts' error during intentional aborts, while still
rethrowing any other unexpected errors.

Fixes openclaw#55026
@openclaw-barnacle (bot) added the labels channel: discord (Channel integration: discord) and size: S on Mar 26, 2026
@greptile-apps (bot) commented Mar 26, 2026

Greptile Summary

This PR fixes a recurring gateway process crash (~every 30 minutes) caused by the health monitor's onAbort handler setting maxAttempts: 0 before calling gateway.disconnect(). The gateway library interprets that combination as exhausted reconnection attempts and throws "Max reconnect attempts (0) reached", which previously bubbled up as an uncaught exception.

The fix wraps gateway.disconnect() in a targeted try-catch inside onAbort that suppresses only the expected "Max reconnect attempts" error and rethrows everything else, preserving visibility into genuine failures. Two new test cases validate both branches.

Key changes:

  • provider.lifecycle.ts: adds a try-catch in onAbort around gateway.disconnect() with selective suppression based on error message content.
  • provider.lifecycle.test.ts: two new tests, one confirming the known error is suppressed and one confirming unexpected errors still propagate.

The fix is minimal, well-scoped, and correctly targeted at the root cause. One minor style concern: the error match uses String(err) rather than err.message, and the sentinel string is inlined; see the inline comment.

Confidence Score: 5/5

Safe to merge — the fix correctly suppresses an intentional exception without masking real errors, and both new test cases pass.

The change is small, targeted, and well-tested. The root cause (gateway library throwing on maxAttempts: 0) is clearly documented and the suppression logic is tight. The only open item is a P2 style suggestion about using err.message vs String(err), which does not affect correctness.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| extensions/discord/src/monitor/provider.lifecycle.ts | Wraps gateway.disconnect() in onAbort with a targeted try-catch that suppresses the expected 'Max reconnect attempts (0) reached' exception while rethrowing anything else. The logic is correct and well-scoped. |
| extensions/discord/src/monitor/provider.lifecycle.test.ts | Adds two well-structured tests covering both the suppression path and the rethrow path. Uses the pre-aborted signal pattern to exercise onAbort() synchronously, which is appropriate for these scenarios. |
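The pre-aborted signal pattern relies on the fact that an `abort` event fires only once, at the moment `abort()` is called; a listener added afterward never runs, so code that registers abort handling typically checks `signal.aborted` first and calls the handler directly. A sketch, with `registerAbort` as a hypothetical helper rather than the openclaw API:

```typescript
// Hypothetical wiring: run onAbort immediately if the signal is already
// aborted, otherwise wait for the one-time "abort" event.
function registerAbort(signal: AbortSignal, onAbort: () => void): void {
  if (signal.aborted) {
    // A listener added now would never fire, so call the handler directly.
    onAbort();
    return;
  }
  signal.addEventListener("abort", onAbort, { once: true });
}

const controller = new AbortController();
controller.abort(); // abort before registration
const calls: string[] = [];
registerAbort(controller.signal, () => calls.push("onAbort"));
// calls already contains "onAbort": the handler ran synchronously.
```

Because the handler runs synchronously inside `registerAbort`, a test can pre-abort the controller and assert on a throw (or its absence) directly, without awaiting any event.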
Prompt To Fix All With AI
This is a comment left during a code review.
Path: extensions/discord/src/monitor/provider.lifecycle.ts
Line: 139-142

Comment:
**Fragile string-match for error detection**

The suppression logic relies on a partial string match against the serialised error. Two minor concerns:

1. `String(err)` on an `Error` instance yields `"Error: Max reconnect attempts …"`, so the match works today — but using `err instanceof Error ? err.message : String(err)` makes the intent clearer and avoids accidentally matching a hypothetical `Error: Something wrapping 'Max reconnect attempts'` object.

2. If the upstream gateway library ever changes its error message wording, the condition silently stops suppressing, and the crash resurfaces with no warning. Centralising the sentinel string in a named constant (e.g. `const MAX_RECONNECT_ERROR_PREFIX = "Max reconnect attempts"`) makes future auditing easier.

```suggestion
      const message = err instanceof Error ? err.message : String(err);
      if (!message.includes("Max reconnect attempts")) {
        throw err;
      }
```

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "fix(discord): suppress intentional recon..."

Comment on lines +139 to +142
    const message = String(err);
    if (!message.includes("Max reconnect attempts")) {
      throw err;
    }

P2 Fragile string-match for error detection

The suppression logic relies on a partial string match against the serialised error. Two minor concerns:

  1. String(err) on an Error instance yields "Error: Max reconnect attempts …", so the match works today — but using err instanceof Error ? err.message : String(err) makes the intent clearer and avoids accidentally matching a hypothetical Error: Something wrapping 'Max reconnect attempts' object.

  2. If the upstream gateway library ever changes its error message wording, the condition silently stops suppressing, and the crash resurfaces with no warning. Centralising the sentinel string in a named constant (e.g. const MAX_RECONNECT_ERROR_PREFIX = "Max reconnect attempts") makes future auditing easier.

Suggested change
- const message = String(err);
- if (!message.includes("Max reconnect attempts")) {
-   throw err;
- }
+ const message = err instanceof Error ? err.message : String(err);
+ if (!message.includes("Max reconnect attempts")) {
+   throw err;
+ }
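A sketch combining both review points, assuming the library's message begins with the text quoted in this PR. `MAX_RECONNECT_ERROR_PREFIX` is the reviewer's example name; `errorMessage` and `isIntentionalReconnectExhaustion` are hypothetical. With `.includes()`, switching from `String(err)` to `err.message` does not change which errors match (as the follow-up commit says); anchoring with `startsWith` on the message, a stricter check than the PR used, is what would exclude an error that merely wraps the sentinel text.

```typescript
// Reviewer's suggested constant name; the text must track what the gateway
// library actually throws.
const MAX_RECONNECT_ERROR_PREFIX = "Max reconnect attempts";

// Use err.message for Error instances (String(err) would prepend "Error: "),
// and fall back to String() for non-Error throwables.
function errorMessage(err: unknown): string {
  return err instanceof Error ? err.message : String(err);
}

// Stricter variant (not what the PR shipped): the message must start with the
// sentinel, so an error that only mentions it somewhere is not suppressed.
function isIntentionalReconnectExhaustion(err: unknown): boolean {
  return errorMessage(err).startsWith(MAX_RECONNECT_ERROR_PREFIX);
}

const expected = new Error("Max reconnect attempts (0) reached");
const wrapped = new Error("disconnect failed: Max reconnect attempts (0) reached");

console.log(String(expected)); // prints: Error: Max reconnect attempts (0) reached
console.log(isIntentionalReconnectExhaustion(expected)); // prints: true
console.log(isIntentionalReconnectExhaustion(wrapped)); // prints: false
```

A named constant only makes the dependency easier to find; if the upstream wording changes, either check still stops matching without warning.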

Addresses review feedback: uses err.message when available instead of
String(err) to avoid the 'Error: ' prefix in the serialized form.
The matching behavior is unchanged since we use .includes().
@Bartok9 (Contributor, Author) commented Mar 27, 2026

Addressed the style feedback — now using err instanceof Error ? err.message : String(err) for cleaner message extraction. The matching behavior is unchanged since we use .includes(), but the explicit type guard makes the intent clearer.

@ryancush commented Apr 7, 2026

This fix is needed — the stale-socket health monitor crash is hitting production users on 2026.3.24. The fix looks correct (try-catch around disconnect() to suppress the expected Max reconnect attempts error on intentional aborts). Happy to help test. Is there anything blocking merge?

@steipete (Contributor)

Closing this as implemented after Codex review.

Current main already covers this Discord reconnect-exhaustion crash with a broader lifecycle/supervisor rewrite, so the open PR is superseded.

What I checked:

So I’m closing this as already implemented rather than keeping a duplicate issue open.

Review notes: reviewed against 34896839ba22; fix evidence: commit 6a61cb73c562.

@steipete closed this Apr 25, 2026

Labels

channel: discord (Channel integration: discord), size: S


Development

Successfully merging this pull request may close these issues.

Discord health monitor triggers uncaught exception → gateway crash (2026.3.24)

4 participants