feat(notifications): operational webhook alerts for daemon events #292

Merged: Aaronontheweb merged 4 commits into dev from claude-wt-alert-hooks on Mar 20, 2026

@Aaronontheweb (Collaborator)

Summary

Closes #291

  • Adds a webhook notification channel that POSTs structured JSON to configured HTTP endpoints when operational events require human attention
  • Supports multiple webhook targets, deduplication within a configurable time window, and retry with exponential backoff
  • Emits alerts from McpClientManager (MCP auth expired, server disconnected), FailoverChatClient (provider failover, both providers unreachable), AlertingChatClientDecorator (single-provider unreachable), and SlackChannel (connection failure at startup)
  • Adds optional webhook URL input to the init wizard's Exposure step
  • Updates netclaw-config.v1.schema.json with the Notifications section
  • Fixes pre-existing flaky Reconcile_disables_zombie_oneshot_reminders test (blocking .Result → async await, increased polling timeout)
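The deduplication behavior mentioned above can be illustrated with a minimal sketch. This is not the PR's implementation: the class name `AlertDeduplicator`, the choice of `(alert type, key)` as the dedup identity, and the injectable clock are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class AlertDeduplicator:
    """Suppresses repeats of the same alert within a fixed window after the last send.

    Hypothetical helper: the real service may key and expire alerts differently.
    """

    def __init__(self, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._window = window_seconds
        self._clock = clock
        # Maps (alert_type, key) to the time the alert was last delivered.
        self._last_sent: Dict[Tuple[str, str], float] = {}

    def should_send(self, alert_type: str, key: str) -> bool:
        now = self._clock()
        ident = (alert_type, key)
        last = self._last_sent.get(ident)
        if last is not None and now - last < self._window:
            return False  # still inside the window: drop the duplicate
        self._last_sent[ident] = now
        return True
```

Injecting the clock keeps the window logic testable without sleeping, which matters for a default window measured in minutes.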

Alert types

| Type | Severity | Trigger |
| --- | --- | --- |
| mcp.auth.expired | warning | MCP server needs OAuth re-auth |
| mcp.server.disconnected | warning | MCP server connection failed |
| channel.disconnected | warning | Slack connection failed at startup |
| provider.failover | warning | Primary LLM failed, using fallback |
| provider.unreachable | critical | All LLM providers failed |
| provider.auth.expired | | Reserved for #288 |
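For readers consuming these alerts, the type-to-severity mapping above could be captured in a small lookup. The names `Severity`, `ALERT_SEVERITY`, and `severity_for` are hypothetical; only the type strings and severities come from this PR, and the reserved type is left without a severity because none is listed.

```python
from enum import Enum
from typing import Dict, Optional


class Severity(str, Enum):
    WARNING = "warning"
    CRITICAL = "critical"


# Alert type strings and severities as listed in the PR description.
ALERT_SEVERITY: Dict[str, Optional[Severity]] = {
    "mcp.auth.expired": Severity.WARNING,
    "mcp.server.disconnected": Severity.WARNING,
    "channel.disconnected": Severity.WARNING,
    "provider.failover": Severity.WARNING,
    "provider.unreachable": Severity.CRITICAL,
    "provider.auth.expired": None,  # reserved for #288; no severity listed
}


def severity_for(alert_type: str) -> Optional[Severity]:
    """Returns the listed severity, or None for the reserved type.

    Raises KeyError for alert types that are not in the table.
    """
    return ALERT_SEVERITY[alert_type]
```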

Example config

{
  "Notifications": {
    "Webhooks": [
      { "Url": "https://hooks.slack.com/services/T.../B.../xxx" },
      { "Url": "https://ntfy.sh/netclaw-alerts", "Name": "ntfy" }
    ],
    "DeduplicationWindowSeconds": 300,
    "MaxRetries": 2,
    "TimeoutSeconds": 10
  }
}
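As a rough illustration of how these settings might be consumed, the sketch below parses the `Notifications` section and derives a retry schedule from `MaxRetries`. The dataclass names, the one-second base delay, and the doubling factor are assumptions; the PR states only that retries use exponential backoff.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WebhookTarget:
    url: str
    name: Optional[str] = None  # "Name" is omitted on some targets in the example


@dataclass
class NotificationOptions:
    webhooks: List[WebhookTarget]
    deduplication_window_seconds: int
    max_retries: int
    timeout_seconds: int


def parse_notifications(raw_json: str) -> NotificationOptions:
    """Parses the Notifications section; every setting except Name is required here."""
    section = json.loads(raw_json)["Notifications"]
    targets = [WebhookTarget(url=w["Url"], name=w.get("Name"))
               for w in section["Webhooks"]]
    return NotificationOptions(
        webhooks=targets,
        deduplication_window_seconds=section["DeduplicationWindowSeconds"],
        max_retries=section["MaxRetries"],
        timeout_seconds=section["TimeoutSeconds"],
    )


def backoff_delays(max_retries: int, base_seconds: float = 1.0) -> List[float]:
    """Delay before each retry, doubling each time (base delay is an assumption)."""
    return [base_seconds * (2 ** attempt) for attempt in range(max_retries)]
```

Under these assumptions, `MaxRetries: 2` means one initial attempt followed by up to two retries, each bounded by `TimeoutSeconds`.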

Example webhook payload

{
  "alertId": "a1b2c3d4e5f6",
  "type": "mcp.server.disconnected",
  "severity": "warning",
  "summary": "MCP server 'memorizer' connection failed: Connection refused",
  "timestamp": "2026-03-19T14:30:00.000Z",
  "source": "netclaw",
  "hostname": "pi1",
  "context": { "serverName": "memorizer" }
}
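A payload of this shape could be assembled roughly as follows. The function name `build_alert_payload` and the way `alertId` is generated (12 hex characters from a UUID) are guesses based on the example alone; the actual ID scheme is not described in this PR.

```python
import socket
import uuid
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def build_alert_payload(alert_type: str, severity: str, summary: str,
                        context: Dict[str, Any],
                        hostname: Optional[str] = None,
                        now: Optional[datetime] = None) -> Dict[str, Any]:
    """Builds a dict with the same fields as the example payload."""
    now = now or datetime.now(timezone.utc)
    # UTC ISO 8601 with millisecond precision and a trailing "Z".
    timestamp = (now.astimezone(timezone.utc)
                 .isoformat(timespec="milliseconds")
                 .replace("+00:00", "Z"))
    return {
        "alertId": uuid.uuid4().hex[:12],  # assumption: any short unique ID
        "type": alert_type,
        "severity": severity,
        "summary": summary,
        "timestamp": timestamp,
        "source": "netclaw",
        "hostname": hostname or socket.gethostname(),
        "context": context,
    }
```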

Test plan

  • dotnet build passes
  • All 1,115 tests pass (0 failures)
  • 8 new tests for WebhookNotificationService covering delivery, dedup, multi-target, headers, retry, graceful degradation
  • Existing tests updated for constructor signature changes
  • Flaky reminder test stabilized
  • Manual: configure webhook URL in netclaw.json, start daemon with misconfigured MCP server, verify POST received
  • Manual: verify no webhooks configured → NullNotificationSink used (log-only)
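For the manual "verify POST received" step, one option is a throwaway local receiver that records whatever JSON it is sent. This is a sketch using only the Python standard library; `start_capture_server` is a hypothetical helper and not part of netclaw.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import List, Tuple


def start_capture_server(port: int = 0) -> Tuple[HTTPServer, List[dict]]:
    """Starts a background HTTP server that stores each POSTed JSON body.

    Port 0 lets the OS pick a free port; read it from server.server_address.
    """
    received: List[dict] = []

    class Handler(BaseHTTPRequestHandler):
        def do_POST(self) -> None:
            length = int(self.headers.get("Content-Length", 0))
            received.append(json.loads(self.rfile.read(length)))
            self.send_response(204)  # acknowledge with no body
            self.end_headers()

        def log_message(self, format, *args) -> None:
            pass  # keep the console quiet

    server = HTTPServer(("127.0.0.1", port), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, received
```

Pointing a webhook `Url` in netclaw.json at `http://127.0.0.1:<port>/` and starting the daemon with a misconfigured MCP server should then leave an `mcp.server.disconnected` payload in `received`, assuming the daemon can reach the loopback address.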

Add a webhook notification channel that POSTs structured JSON to configured
HTTP endpoints when operational events require human attention. Supports
multiple webhook targets, deduplication, and retry with exponential backoff.

Alert types: mcp.auth.expired, mcp.server.disconnected, channel.disconnected,
provider.failover, provider.unreachable, provider.auth.expired (reserved).

Emission points:
- McpClientManager: MCP server connection failures and OAuth re-auth required
- FailoverChatClient: primary→fallback failover and both-providers-down
- AlertingChatClientDecorator: single-provider unreachable (no fallback)
- SlackChannel: Slack connection failure at startup

Init wizard: optional webhook URL input added to the Exposure step.
Config schema: Notifications section with Webhooks array, dedup window,
retry count, and timeout settings.

Also fixes flaky Reconcile_disables_zombie_oneshot_reminders test by
replacing blocking .Result with async await and increasing poll timeout.
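The failover emission points described above can be sketched as a small wrapper. This is an illustration of the control flow only, assuming providers are plain callables and `emit` is a hypothetical stand-in for the notification sink; it is not the `FailoverChatClient` code.

```python
from typing import Callable


def complete_with_failover(primary: Callable[[str], str],
                           fallback: Callable[[str], str],
                           prompt: str,
                           emit: Callable[[str, str], None]) -> str:
    """Tries the primary provider, then the fallback, emitting alerts as it goes.

    emit(alert_type, severity) is called once per degraded outcome.
    """
    try:
        return primary(prompt)
    except Exception:
        try:
            result = fallback(prompt)
        except Exception:
            # Both providers failed: the critical case from the alert table.
            emit("provider.unreachable", "critical")
            raise
        # The fallback answered, so the request succeeds but operators are told.
        emit("provider.failover", "warning")
        return result
```

Emitting the failover warning only after the fallback succeeds avoids sending both a warning and a critical alert for the same failed request.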
Aaronontheweb mentioned this pull request Mar 20, 2026
- Add ReminderExecutionFailed and ReminderAutoDisabled alert types
- Emit warning alert when a reminder execution fails
- Emit critical alert when a reminder is auto-disabled after hitting
  the failure threshold
- Inject IOperationalNotificationSink into ReminderManagerActor
- Add ReconcileCompleted ack to HandleReconcileAsync so callers can
  synchronize on completion instead of polling
- Fix flaky Reconcile_disables_zombie_oneshot_reminders test by
  replacing Tell + AwaitAssertAsync with deterministic Ask pattern
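The reminder alerting in this commit can be pictured with the following sketch. The threshold value, the reset-on-success behavior, and the `ReminderFailureTracker` class are assumptions; only the two alert type names and their severities come from the commit message.

```python
from typing import Callable, Dict, Set


class ReminderFailureTracker:
    """Counts consecutive failures per reminder and disables it at a threshold."""

    def __init__(self, failure_threshold: int,
                 emit: Callable[[str, str, str], None]) -> None:
        self._threshold = failure_threshold
        self._emit = emit  # emit(alert_type, severity, reminder_id)
        self._failures: Dict[str, int] = {}
        self.disabled: Set[str] = set()

    def record_failure(self, reminder_id: str) -> None:
        count = self._failures.get(reminder_id, 0) + 1
        self._failures[reminder_id] = count
        # Every failed execution produces a warning.
        self._emit("ReminderExecutionFailed", "warning", reminder_id)
        if count >= self._threshold and reminder_id not in self.disabled:
            self.disabled.add(reminder_id)
            # Hitting the threshold additionally produces one critical alert.
            self._emit("ReminderAutoDisabled", "critical", reminder_id)

    def record_success(self, reminder_id: str) -> None:
        # Assumption: a successful run resets the failure count.
        self._failures.pop(reminder_id, None)
```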
Aaronontheweb enabled auto-merge (squash) March 20, 2026 02:36
Aaronontheweb merged commit 1968b91 into dev Mar 20, 2026
3 checks passed
Aaronontheweb deleted the claude-wt-alert-hooks branch March 20, 2026 02:41