
Improve Slack startup resilience #19

Open
coe0718 wants to merge 3 commits into ghostwright:main from
coe0718:slack-startup-resilience

Conversation

Contributor

coe0718 commented Mar 31, 2026

Summary

This improves the Slack half of #16 by making startup more resilient when Socket Mode does not finish connecting.

  • add channel connection timeouts so a hung Slack connect does not block startup forever
  • expose richer channel diagnostics in /health via channel_details
  • track Slack connection errors/state for troubleshooting
  • keep onboarding logging explicit when Slack is still using Web API without an active Socket Mode connection
  • add focused router/slack tests and troubleshooting docs

Root Cause

Phantom can register the Slack channel and start the HTTP server before Socket Mode finishes connecting. If the Slack connect hangs, router.connectAll() can wait forever, leaving Slack registered but never connected and giving the operator very little signal about what state the channel is actually in.
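The timeout idea described above can be sketched as follows. This is a minimal illustration, not the code from this PR: the `Channel` interface and the `connectWithTimeout` and `connectAllWithTimeout` helpers are hypothetical names, and the real `router.connectAll()` may be structured differently.

```typescript
// Hypothetical shape of a channel; the real interface in Phantom may differ.
interface Channel {
  id: string;
  connect(): Promise<void>;
}

// Race the channel's connect() against a timer so a hung connect cannot block forever.
async function connectWithTimeout(channel: Channel, ms: number): Promise<void> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`Channel "${channel.id}" did not connect within ${ms} ms`)),
      ms,
    );
  });
  try {
    await Promise.race([channel.connect(), timeout]);
  } finally {
    // Always clear the timer so it does not keep the process alive.
    clearTimeout(timer);
  }
}

// Connect every channel independently; one failure or timeout does not stop the others.
// Returns a map of channel id to the error it hit, or null on success.
async function connectAllWithTimeout(
  channels: Channel[],
  ms: number,
): Promise<Map<string, Error | null>> {
  const entries = await Promise.all(
    channels.map(async (channel): Promise<[string, Error | null]> => {
      try {
        await connectWithTimeout(channel, ms);
        return [channel.id, null];
      } catch (err) {
        return [channel.id, err instanceof Error ? err : new Error(String(err))];
      }
    }),
  );
  return new Map(entries);
}
```

Note that the race only stops waiting; it does not cancel the underlying connect attempt, which would need its own abort mechanism.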

Impact

Operators can now tell the difference between Slack being disconnected, still connecting, or failing with an error, and a stuck Slack connection will no longer stall the rest of startup.
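As a rough sketch of how the states mentioned above could surface in `/health`, assume each channel tracks a state and its last error. Only the `channel_details` key comes from this PR; the `TrackedChannel` class, the state names, and the `lastError` field are assumptions made for illustration.

```typescript
// Assumed state names; the PR only says operators can distinguish
// disconnected, still connecting, and failing with an error.
type ConnectionState = "disconnected" | "connecting" | "connected" | "error";

interface ChannelDetail {
  state: ConnectionState;
  lastError: string | null; // hypothetical field name
}

// Hypothetical wrapper that records state transitions around a connect function.
class TrackedChannel {
  readonly id: string;
  private readonly doConnect: () => Promise<void>;
  private state: ConnectionState = "disconnected";
  private lastError: string | null = null;

  constructor(id: string, doConnect: () => Promise<void>) {
    this.id = id;
    this.doConnect = doConnect;
  }

  async connect(): Promise<void> {
    this.state = "connecting";
    this.lastError = null;
    try {
      await this.doConnect();
      this.state = "connected";
    } catch (err) {
      this.state = "error";
      this.lastError = err instanceof Error ? err.message : String(err);
      throw err;
    }
  }

  detail(): ChannelDetail {
    return { state: this.state, lastError: this.lastError };
  }
}

// Build the channel_details portion of a health payload.
function healthPayload(channels: TrackedChannel[]): { channel_details: Record<string, ChannelDetail> } {
  const channel_details: Record<string, ChannelDetail> = {};
  for (const channel of channels) {
    channel_details[channel.id] = channel.detail();
  }
  return { channel_details };
}
```

With this shape, a channel whose connect never settles keeps reporting `connecting`, which is the signal the Root Cause section says was missing.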

Validation

  • bun test src/channels/__tests__/router.test.ts src/channels/__tests__/slack.test.ts
  • bun run lint
  • bun run typecheck

coe0718 marked this pull request as ready for review March 31, 2026 22:18
imonlinux added a commit to imonlinux/phantom that referenced this pull request Apr 26, 2026
…k adapter

Implements full test suite for nextcloud.ts addressing all critical areas
identified in the nextcloud-talk-review document. 943 lines of tests
covering security, functionality, and edge cases.

Test coverage by category:

1. Signature verification (Fix ghostwright#1, ghostwright#18) - Security Critical
   - Valid HMAC signature acceptance
   - Invalid HMAC signature rejection
   - Replay attack protection via nonce cache
   - Nonce cache size limits (1000 entries, FIFO eviction)
   - Nonce expiration and periodic pruning (5-minute TTL)
   - Asymmetric signing (inbound: random+body, outbound: random+content)

2. Request size limits (Fix ghostwright#2) - Security Critical
   - Content-Length validation before buffering
   - Double-check after reading (missing Content-Length)
   - 64 KB limit enforcement (Nextcloud caps at 32k chars)

3. JSON unwrapping (Fix ghostwright#7) - Functionality Critical
   - ActivityStreams Note objects unwrap correctly
   - Plain text passes through unchanged
   - Literal JSON-like text not corrupted (only Note type unwraps)
   - Invalid JSON fallback to plain text

4. parseConversationId (Fix ghostwright#5) - Correctness Critical
   - Valid conversationId format parsing
   - Missing prefix returns null
   - Tokens containing colons handled correctly (indexOf+slice)
   - Thread-scoped ID to room token extraction

5. Bot loop guard (Fix ghostwright#12) - Multi-Bot Safety
   - Application actor filtering (actorType === "Application")
   - Self-filtering (actorId === config.botId)
   - Person messages processed normally
   - Multi-bot room scenarios

6. Retry and backoff (Fix ghostwright#16) - Resilience
   - 429 rate limiting with Retry-After header
   - 5xx server errors with exponential backoff + jitter
   - Network error retry logic
   - Non-retryable 4xx handling

7. Reaction error handling (Fix ghostwright#9)
   - 404 on remove treated as success
   - 409 on add treated as success
   - 5xx retry for reaction operations

8. URL validation and encoding (Fix ghostwright#17)
   - talkServer scheme removal (http://, https://)
   - Trailing slash removal
   - URL-encoding of roomToken and messageId

9. Target validation (Fix ghostwright#6)
   - Missing target.id rejection (no silent fallback)

10. Emoji normalization (Fix ghostwright#8)
    - Variation selector removal (U+26A0 vs U+26A0 U+FE0F)

11. Unique message IDs (Fix ghostwright#4)
    - crypto.randomUUID() vs Date.now()
    - Uniqueness across concurrent calls

12. Config normalization (Fix ghostwright#13, ghostwright#14)
    - webhookPath default in constructor
    - Configurable port
    - Session window configuration

13. Health check (Fix ghostwright#15)
    - Path precedence (webhook before health)

14. Message ID extraction
    - Numeric and string ID handling
    - Missing ID handling

15. Time-window session coalescing
    - Recent session continuation
    - New session creation
    - Parent message ID handling

16. Capabilities declaration (Fix ghostwright#21)
    - reactions: true declared

All tests use bun:test with mocked dependencies and follow existing
patterns from webhook.test.ts, slack.test.ts, and email.test.ts.

Related: nextcloud-talk-review.md Issue ghostwright#19
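
For readers unfamiliar with the replay protection listed under item 1, a nonce cache with a size cap, FIFO eviction, and a TTL could look roughly like the sketch below. This is not the code from nextcloud.ts: the `NonceCache` class and its method names are hypothetical, and it prunes on every check rather than periodically. The defaults mirror the 1000-entry and 5-minute figures from the commit message.

```typescript
// Hypothetical replay-protection cache; names are illustrative only.
class NonceCache {
  // Map preserves insertion order, so the first key is always the oldest entry.
  private readonly seen = new Map<string, number>(); // nonce -> first-seen time (ms)
  private readonly maxEntries: number;
  private readonly ttlMs: number;

  constructor(maxEntries: number = 1000, ttlMs: number = 5 * 60 * 1000) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
  }

  // Returns true and records the nonce if it is fresh; returns false for a replay.
  // Assumes `now` never goes backwards between calls.
  checkAndRecord(nonce: string, now: number = Date.now()): boolean {
    this.prune(now);
    if (this.seen.has(nonce)) return false;
    if (this.seen.size >= this.maxEntries) {
      // FIFO eviction: drop the oldest entry to stay within the cap.
      const oldest = this.seen.keys().next().value;
      if (oldest !== undefined) this.seen.delete(oldest);
    }
    this.seen.set(nonce, now);
    return true;
  }

  // Remove entries whose age has reached the TTL. Entries are in time order,
  // so the loop can stop at the first one that is still fresh.
  prune(now: number): void {
    for (const [nonce, seenAt] of this.seen) {
      if (now - seenAt < this.ttlMs) break;
      this.seen.delete(nonce);
    }
  }

  get size(): number {
    return this.seen.size;
  }
}
```

One trade-off of FIFO eviction is that an evicted nonce would be accepted again, so the cap has to be large relative to the request rate within one TTL window.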