Handle Slack SDK .original in transient network detection #22896

Open
creditblake wants to merge 1 commit into openclaw:main from creditblake:fix/slack-transient-network-detection

creditblake commented Feb 21, 2026

Summary

Fixes the gateway crash loop caused by Slack DNS failures. The @slack/web-api SDK wraps network errors in a custom error shape whose .original property holds the underlying error, but the existing isTransientNetworkError() only checked .cause. As a result, ENOTFOUND errors went unrecognized and triggered process.exit(1).

Changes

  • Add .original recursion in isTransientNetworkError() to handle Slack SDK error wrapping
  • Add extractTransientNetworkCode() helper to extract inner error codes for logging
  • Add explicit guard against slack_webapi_platform_error to ensure auth failures are never suppressed
  • Update log message to show inner error code: [openclaw] Non-fatal unhandled rejection (continuing) [ENOTFOUND]:
  • Add unit tests for Slack SDK wrapper patterns
  • Add integration test confirming no-exit behavior
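
The first three changes can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the contents of TRANSIENT_NETWORK_CODES shown here are an assumed subset (only ENOTFOUND is confirmed by the description), and the ErrorLike type, the asErrorLike helper, and the seen parameter are hypothetical names introduced for the example.

```typescript
// Assumed subset for illustration; the gateway's real TRANSIENT_NETWORK_CODES is not shown in this PR text.
const TRANSIENT_NETWORK_CODES = new Set<string>(["ENOTFOUND", "ECONNRESET", "ETIMEDOUT"]);

// Hypothetical minimal shape covering both standard (.cause) and Slack SDK (.original) wrapping.
type ErrorLike = { code?: unknown; cause?: unknown; original?: unknown };

function asErrorLike(value: unknown): ErrorLike | undefined {
  return typeof value === "object" && value !== null ? (value as ErrorLike) : undefined;
}

// Returns the first transient code found on the error itself or on anything
// reachable through .cause or .original, or undefined if there is none.
function extractTransientNetworkCode(
  err: unknown,
  seen: Set<unknown> = new Set<unknown>(),
): string | undefined {
  const e = asErrorLike(err);
  if (!e || seen.has(e)) return undefined; // guard against circular references
  seen.add(e);

  // Platform errors (for example auth failures) are never treated as transient,
  // even if something transient-looking is nested inside them.
  if (e.code === "slack_webapi_platform_error") return undefined;

  if (typeof e.code === "string" && TRANSIENT_NETWORK_CODES.has(e.code)) return e.code;

  return (
    extractTransientNetworkCode(e.cause, seen) ??
    extractTransientNetworkCode(e.original, seen)
  );
}

function isTransientNetworkError(err: unknown): boolean {
  return extractTransientNetworkCode(err) !== undefined;
}

// Usage with the wrapped shape the Slack SDK produces on DNS failure.
const wrappedDnsFailure = {
  code: "slack_webapi_request_error",
  message: "A request error occurred: getaddrinfo ENOTFOUND slack.com",
  original: { code: "ENOTFOUND", message: "getaddrinfo ENOTFOUND slack.com" },
};
console.log(extractTransientNetworkCode(wrappedDnsFailure)); // prints "ENOTFOUND"
```

Deriving isTransientNetworkError() from the extraction helper keeps the classification and the logged code from drifting apart, though the actual PR may structure the two functions differently.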

Root Cause

When DNS fails, the Slack SDK creates:

{
  code: "slack_webapi_request_error",
  message: "A request error occurred: getaddrinfo ENOTFOUND slack.com",
  original: {
    code: "ENOTFOUND",  // ← The real error is here
    message: "getaddrinfo ENOTFOUND slack.com"
  }
}

The handler checked .cause but not .original, so ENOTFOUND was never found → error classified as unknown → process.exit(1) → launchd respawn → repeat.
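
To make the failure chain concrete, the sketch below contrasts a cause-only lookup with one that also follows .original, using the error shape above. The function names, the WrappedError type, and describeOutcome are hypothetical and exist only for this illustration; the real handler calls process.exit(1) rather than returning a string, and the lookup is narrowed to ENOTFOUND to keep the example short.

```typescript
// Hypothetical minimal shape for the example.
type WrappedError = { code?: unknown; cause?: unknown; original?: unknown };

const isWrappedError = (v: unknown): v is WrappedError =>
  typeof v === "object" && v !== null;

// Pre-fix behaviour as described above: only the .cause chain is walked.
function findCodeViaCauseOnly(err: unknown): string | undefined {
  const seen = new Set<unknown>();
  let current: unknown = err;
  while (isWrappedError(current) && !seen.has(current)) {
    seen.add(current);
    if (current.code === "ENOTFOUND") return "ENOTFOUND";
    current = current.cause;
  }
  return undefined;
}

// Post-fix behaviour: .original is followed as well as .cause.
function findCodeViaCauseOrOriginal(
  err: unknown,
  seen: Set<unknown> = new Set<unknown>(),
): string | undefined {
  if (!isWrappedError(err) || seen.has(err)) return undefined;
  seen.add(err);
  if (err.code === "ENOTFOUND") return "ENOTFOUND";
  return (
    findCodeViaCauseOrOriginal(err.cause, seen) ??
    findCodeViaCauseOrOriginal(err.original, seen)
  );
}

// Stand-in for the handler's decision; returns a description instead of exiting.
function describeOutcome(code: string | undefined): string {
  return code === undefined
    ? "unknown error: process.exit(1)"
    : `[openclaw] Non-fatal unhandled rejection (continuing) [${code}]:`;
}

const slackDnsError = {
  code: "slack_webapi_request_error",
  message: "A request error occurred: getaddrinfo ENOTFOUND slack.com",
  original: { code: "ENOTFOUND", message: "getaddrinfo ENOTFOUND slack.com" },
};

console.log(describeOutcome(findCodeViaCauseOnly(slackDnsError)));
// prints "unknown error: process.exit(1)"
console.log(describeOutcome(findCodeViaCauseOrOriginal(slackDnsError)));
// prints "[openclaw] Non-fatal unhandled rejection (continuing) [ENOTFOUND]:"
```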

Verification

  1. Start gateway with Slack configured
  2. Disable Wi-Fi or break DNS (sudo dscacheutil -flushcache; sudo killall -HUP mDNSResponder)
  3. Confirm no restart loop (launchctl print gui/$(id -u)/ai.openclaw.gateway | grep runs)
  4. Confirm CLI/dashboard still work: openclaw gateway probe
  5. Check logs show: grep "Non-fatal unhandled rejection (continuing) \[ENOTFOUND\]" ~/.openclaw/logs/gateway.log
  6. Restore network → Slack reconnects automatically

Related Issues

Notes

This change does NOT widen the existing TRANSIENT_NETWORK_CODES set. It only makes the existing policy apply correctly to Slack's wrapped error shape.

Greptile Summary

Adds .original property recursion to isTransientNetworkError() to handle @slack/web-api SDK error wrapping. The Slack SDK wraps underlying network errors (like ENOTFOUND) in a custom error shape with the real error in .original, but the existing handler only checked .cause. This caused DNS failures to be misclassified and trigger process.exit(1).

  • Adds recursive .original checking in both isTransientNetworkError() and new extractTransientNetworkCode() helper
  • Includes explicit guard against slack_webapi_platform_error to ensure auth failures are never suppressed
  • Updates log messages to show inner error codes (e.g., [ENOTFOUND]) for better debugging
  • Comprehensive test coverage for Slack SDK wrapper patterns across multiple transient error codes
  • Integration test confirms no-exit behavior for wrapped errors

Confidence Score: 5/5

  • Safe to merge - focused bug fix with excellent test coverage and no behavioral changes to existing error handling
  • The implementation is well-designed with proper circular reference guards, comprehensive test coverage across multiple scenarios, and an explicit safety guard for platform errors. The change only affects error detection logic for wrapped errors and doesn't widen the set of suppressed error codes.
  • No files require special attention

Last reviewed commit: 44f3d31
