feat(alerting): handle Sentry resource:issue webhooks (Internal Integration surface) #1291
…ration surface)
Closes the silent-skip path for Sentry's "Internal Integration" / Custom
Webhook surface. Prod 2026-05-09: a wedged-lock-canary alert fired in the
cascade Sentry project, the alerting agent was enabled, but no agent ran.
Webhook log id `fbdc6d87-b962-444c-8a2a-a9452a74ff71` shows
`processed=false, decisionReason="Event unparseable or not processable"` —
the trigger was never invoked.
Root cause: `src/router/adapters/sentry.ts:31` whitelisted only
`['event_alert', 'metric_alert']` (Sentry Alert Rule surfaces). The webhook
arrived with `Sentry-Hook-Resource: issue` (Internal Integration default
surface — the natural way users wire Sentry → cascade). Spec 019 was
scoped to event_alert; the issue-lifecycle path was deferred and never
landed. Users who configured Sentry the natural way got silent skips for
every issue.
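The root cause above comes down to a resource whitelist. A minimal sketch of that gate, assuming the adapter reads the `Sentry-Hook-Resource` header from a lower-cased header map (the function names and the header-map shape are hypothetical, only `PROCESSABLE_RESOURCES` and the resource strings come from this PR):

```typescript
// Hypothetical sketch of the router-side gate; not the real adapter code.
// The fix described in this PR is the added 'issue' entry.
const PROCESSABLE_RESOURCES = ['event_alert', 'metric_alert', 'issue'] as const;
type ProcessableResource = (typeof PROCESSABLE_RESOURCES)[number];

// Type guard: true only for resources the router is willing to process.
function isProcessableResource(value: string | undefined): value is ProcessableResource {
  return value !== undefined && PROCESSABLE_RESOURCES.some((r) => r === value);
}

// Assumption: the HTTP layer hands us lower-cased header names.
function parseSentryResource(
  headers: Record<string, string | undefined>,
): ProcessableResource | null {
  const resource = headers['sentry-hook-resource'];
  // Anything outside the whitelist is skipped; before the fix this
  // included 'issue', which produced the silent skip.
  return isProcessableResource(resource) ? resource : null;
}
```

With `'issue'` missing from the list, `parseSentryResource` would return `null` for every Internal Integration delivery, which matches the `processed=false` log entry quoted above.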
This adds end-to-end support, mirroring the event_alert pattern:
- Router adapter accepts `'issue'` resource + new test asserting it parses.
- New `SentryIssueLifecycleTrigger` (matches `resource: 'issue'` +
`action: 'created'`) fires the alerting agent. Resolved/archived/etc.
actions are deferred (would auto-close the cascade card; out of scope).
- Distinct AlertSource literal `'sentry-issue'` so the
`(project_id, external_source, external_id)` partial-unique index on
`pr_work_items` doesn't collide if the same Sentry issue arrives via
both surfaces (event_alert and issue) — each surface materializes its
own card.
- New `formatSentryIssueLifecycleCardBody` builds AlertHints from
`data.issue.{title, web_url, level, shortId, culprit, metadata.{filename, function}}`.
Mirror of `formatSentryCardBody` adapted to the issue-lifecycle payload shape.
- Worker-side `processSentryWebhook` extends the materializer dispatch with
a third branch keyed on `agentInput.triggerEvent === 'alerting:issue-lifecycle'`.
Same AlertSlotMissingError graceful-skip + transient-PM-error retry semantics
as the existing two branches.
- `SentryIssuePayload` type updated to match the actual Sentry webhook shape
(nested `data.issue.{...}` instead of flattened `data.{...}` — the captured
prod fixture confirmed the existing type was wrong).
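The trigger match and the nested payload shape from the list above can be sketched roughly as follows. This is an illustration under assumptions: the interfaces are trimmed to the fields named in this message, and `matchesIssueCreated`, `buildIssueHints`, and the `AlertHints` field names are hypothetical stand-ins rather than the real `SentryIssueLifecycleTrigger` or `formatSentryIssueLifecycleCardBody`.

```typescript
// Hypothetical, trimmed payload: fields live under data.issue, not data.
interface SentryIssuePayload {
  action: string;
  data: {
    issue: {
      title: string;
      web_url: string;
      level: string;
      shortId: string;
      culprit?: string;
      metadata?: { filename?: string; function?: string };
    };
  };
}

// Hypothetical hint shape; the real AlertHints type may differ.
interface AlertHints {
  title: string;
  url: string;
  level: string;
  shortId: string;
  culprit: string | undefined;
  filename: string | undefined;
  functionName: string | undefined;
}

// Only resource 'issue' with action 'created' fires; other lifecycle
// actions (resolved, archived, ...) are deferred.
function matchesIssueCreated(resource: string, payload: { action: string }): boolean {
  return resource === 'issue' && payload.action === 'created';
}

// Reads every hint from the nested data.issue object.
function buildIssueHints(payload: SentryIssuePayload): AlertHints {
  const issue = payload.data.issue;
  return {
    title: issue.title,
    url: issue.web_url,
    level: issue.level,
    shortId: issue.shortId,
    culprit: issue.culprit,
    filename: issue.metadata?.filename,
    functionName: issue.metadata?.function,
  };
}
```

Reading from `data.issue` rather than `data` is the point of the type change: with the flattened type, every hint would have resolved to `undefined` on a real payload.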
Drive-by lint cleanup (per request): refactored `materializeAlertWorkItem`
to extract `reuseOrLazyHealMapping` and `pollForConcurrentWinner` helpers,
bringing the parent function under the cognitive-complexity ceiling.
Verification:
- 9112 unit tests passing (3 new test files: issue-lifecycle-format,
issue-lifecycle handler, plus extensions to sentry-webhook-handler and
the router/adapters/sentry tests). Captured live prod fixture used as
the regression baseline.
- Lint clean (0 errors, 0 warnings).
- Typecheck clean.
Operator notes:
- Cascade project's PM `lists.alerts` (Trello) / `statuses.alerts`
(JIRA, Linear) must be configured for materialisation to actually
create a card. The pre-flight validation rule at
`src/triggers/shared/integration-validation.ts` already emits a
`pm`-category error when alerting is enabled but the slot is unset —
unchanged; same message will fire for the new `'sentry-issue'` source.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
nhopeatall
left a comment
Summary
Requesting changes: the new Sentry issue lifecycle handler is registered, but its enablement event is not declared in the alerting agent definition, so normal resource: issue webhooks still skip before enqueueing an agent run.
Code Issues
Blocking
- `src/triggers/sentry/alerting-issue-lifecycle.ts:53`: The handler gates on `alerting:issue-lifecycle`, but `src/agents/definitions/alerting.yaml` only declares `alerting:issue-alert` and `alerting:metric-alert`. `checkTriggerEnabledWithParams` calls `getResolvedTriggerConfig`, which returns `null` when the event is absent from the YAML definition before considering any DB override. The runtime path is: Sentry sends `resource: issue` / `action: created` -> this handler matches -> line 53 resolves an unknown trigger event -> `enabled=false` -> handler returns `null` -> router logs no trigger matched and no job is queued. That leaves the prod incident path effectively unfixed unless the event is added to the alerting definition/catalog (with the expected context pipeline/default) or this handler intentionally reuses the existing `alerting:issue-alert` event.
🕵️ codex · gpt-5.5 · run details
    const triggerConfig = await checkTriggerEnabledWithParams(
      ctx.project.id,
      'alerting',
      'alerting:issue-lifecycle',
This event is not declared in src/agents/definitions/alerting.yaml (nor the trigger event catalog). Because checkTriggerEnabledWithParams goes through getResolvedTriggerConfig, an unknown event returns null and this handler always sees enabled=false, even if the alerting agent itself is enabled. A resource: issue / created webhook will still return null here and no worker job will be queued unless alerting:issue-lifecycle is added to the alerting definition/catalog (or this intentionally gates on an existing alerting event).
Fixed. Added `alerting:issue-lifecycle` as a declared trigger in `src/agents/definitions/alerting.yaml` with `defaultEnabled: true`, `providers: [sentry]`, and `contextPipeline: [alertingIssue, directoryListing, contextFiles]` (matching the existing `alerting:issue-alert` entry).
Without this entry, getResolvedTriggerConfig couldn't find the event in the YAML definition and returned null, so checkTriggerEnabledWithParams always saw enabled=false — leaving resource: issue webhooks silently skipped even after the router fix.
Also added TRIGGER_EVENTS.ALERTING.ISSUE_LIFECYCLE to the event catalog (events.ts) and updated the handler + webhook-handler.ts to reference the constant instead of raw strings, as required by the trigger-event-string consistency static guard for new handlers.
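The lookup behaviour described in this thread can be illustrated with a small model. It is a sketch under assumptions: `resolveTriggerConfig` and `isTriggerEnabled` are hypothetical stand-ins for `getResolvedTriggerConfig` and `checkTriggerEnabledWithParams`, the YAML definition is modelled as an array, and DB overrides as a plain map.

```typescript
// Hypothetical model of a declared trigger (one YAML entry).
interface TriggerDefinition {
  event: string;
  defaultEnabled: boolean;
}

interface ResolvedTriggerConfig {
  event: string;
  enabled: boolean;
}

// Returns null for undeclared events BEFORE consulting any override,
// mirroring the behaviour the review describes.
function resolveTriggerConfig(
  declared: readonly TriggerDefinition[],
  overrides: Readonly<Record<string, boolean | undefined>>,
  event: string,
): ResolvedTriggerConfig | null {
  let definition: TriggerDefinition | undefined;
  for (const candidate of declared) {
    if (candidate.event === event) {
      definition = candidate;
      break;
    }
  }
  if (definition === undefined) return null; // overrides never reached
  const override = overrides[event];
  return { event, enabled: override !== undefined ? override : definition.defaultEnabled };
}

// An unknown event is indistinguishable from a disabled one.
function isTriggerEnabled(
  declared: readonly TriggerDefinition[],
  overrides: Readonly<Record<string, boolean | undefined>>,
  event: string,
): boolean {
  const resolved = resolveTriggerConfig(declared, overrides, event);
  return resolved !== null && resolved.enabled;
}
```

In this model, an override that enables `alerting:issue-lifecycle` has no effect until the event is also declared, which is why the router fix alone left the handler returning `null`.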
…inition
Adds `alerting:issue-lifecycle` to `src/agents/definitions/alerting.yaml` so `getResolvedTriggerConfig` can find the event. Without this entry the lookup returned `null`, causing `checkTriggerEnabledWithParams` to always see `enabled=false`, leaving the prod incident path effectively unfixed even though the router now accepts `resource: issue` webhooks.
Also adds `TRIGGER_EVENTS.ALERTING.ISSUE_LIFECYCLE` to the events catalog and updates the handler + webhook-handler to use the constant (required by the trigger-event-string consistency static guard for new handlers).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
nhopeatall
left a comment
LGTM - verified the new Sentry issue lifecycle path is registered in the trigger catalog/agent definition, routes resource=issue through the Sentry adapter, materializes via the distinct sentry-issue source, and preserves the existing event_alert and metric_alert branches. Focused unit tests passed locally: 5 files, 72 tests.
🕵️ codex · gpt-5.5 · run details
Summary
Closes the silent-skip path for Sentry's "Internal Integration" / Custom Webhook surface. Prod 2026-05-09: a wedged-lock-canary alert fired in the cascade Sentry project, the alerting agent was enabled, but no agent ran. Webhook log id `fbdc6d87-b962-444c-8a2a-a9452a74ff71` shows `processed=false, decisionReason="Event unparseable or not processable"`: the trigger was never invoked.
Root cause: `src/router/adapters/sentry.ts:31` whitelisted only `['event_alert', 'metric_alert']` (Sentry Alert Rule surfaces). The webhook arrived with `Sentry-Hook-Resource: issue` (Internal Integration default surface, the natural way users wire Sentry → cascade). Spec 019 was scoped to `event_alert`; the issue-lifecycle path was deferred and never landed. Users who configured Sentry the natural way got silent skips for every issue.
Changes
This is Option B end-to-end: extend cascade to natively process `resource: issue` webhooks, mirroring the existing `event_alert` shape.
Code
| File | Change |
| --- | --- |
| `src/router/adapters/sentry.ts:31` | `PROCESSABLE_RESOURCES` += `'issue'` |
| `src/sentry/types.ts` | `SentryIssuePayload` updated to match the actual webhook shape (nested `data.issue.{...}` instead of flattened `data.{...}`; the captured prod fixture confirmed the existing type was wrong) |
| `src/integrations/alerting/_shared/types.ts` | `AlertSource` += `'sentry-issue'` (distinct literal so the partial-unique index on `pr_work_items` doesn't collide if the same Sentry issue arrives via both surfaces) |
| `src/integrations/alerting/_shared/format.ts` | `formatSentryIssueLifecycleCardBody`: builds AlertHints from `data.issue.{title, web_url, level, shortId, culprit, metadata.{filename, function}}` |
| `src/triggers/sentry/alerting-issue-lifecycle.ts` | `SentryIssueLifecycleTrigger`: matches `resource: 'issue'` + `action: 'created'`, fires the alerting agent. Resolved/archived/etc. lifecycle actions are deferred (would auto-close the cascade card; out of scope) |
| `src/triggers/sentry/register.ts` | Registers the new trigger |
| `src/triggers/sentry/webhook-handler.ts` | `materializeSentryAlertWorkItem` helper with three branches (event_alert / metric_alert / issue-lifecycle) keyed on `agentInput.triggerEvent`. Same AlertSlotMissingError graceful-skip + transient-PM-error retry semantics as before |

Tests
- `tests/unit/triggers/sentry/issue-lifecycle-format.test.ts` (7 tests): pins `formatSentryIssueLifecycleCardBody` against the captured prod fixture
- `tests/unit/triggers/sentry/issue-lifecycle.test.ts` (18 tests): pins handler `matches()` (issue-resource only, action=created only, distinct from event_alert) and `handle()` (alertIssueId, alertOrgId, alertTitle, lockKey/coalesceKey namespacing, deferred materialisation, slot-missing pre-flight)
- `tests/unit/triggers/sentry-webhook-handler.test.ts` (+5 tests): pins materializer-dispatch picks `formatSentryIssueLifecycleCardBody` + `'sentry-issue'` AlertSource for `triggerEvent: 'alerting:issue-lifecycle'`; existing event_alert + metric_alert paths unchanged
- `tests/unit/router/adapters/sentry.test.ts`: flips the previously-asserting-rejection cases to assert acceptance for `resource: 'issue'`
- `tests/unit/triggers/builtins.test.ts`: bumps registered-handler count 24 → 25
Drive-by
Refactored `materializeAlertWorkItem` (per discussion in review): extracted `reuseOrLazyHealMapping` and `pollForConcurrentWinner` helpers, bringing the parent under the cognitive-complexity ceiling. No behavioural change: same idempotency contract (existing tests at `tests/unit/triggers/sentry/alerting-issue-materializer.test.ts` continue to pass unchanged).
Distinctness from `event_alert`
Both surfaces can deliver for the same Sentry issue ID. The new `'sentry-issue'` literal isolates the materializer's dedup namespace: two surfaces, same Sentry issue ID → two cards materialize (one per surface). That's the safe default; collapsing across surfaces would need a separate decision.
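The namespace isolation can be pictured with a toy in-memory index keyed on the same `(project_id, external_source, external_id)` tuple. This is only a sketch: `WorkItemIndex` and `dedupKey` are hypothetical, the `'sentry'` literal for the `event_alert` surface is an assumption, and the real uniqueness is enforced by the partial-unique index on `pr_work_items`.

```typescript
// Assumption: 'sentry' stands in for whatever literal event_alert uses today.
type AlertSource = 'sentry' | 'sentry-issue';

// Models the (project_id, external_source, external_id) uniqueness tuple.
function dedupKey(projectId: string, source: AlertSource, externalId: string): string {
  return JSON.stringify([projectId, source, externalId]);
}

// Toy stand-in for the unique index: one card per distinct key.
class WorkItemIndex {
  private readonly cardsByKey: Record<string, string> = {};
  private nextId = 1;

  materialize(
    projectId: string,
    source: AlertSource,
    externalId: string,
  ): { cardId: string; created: boolean } {
    const key = dedupKey(projectId, source, externalId);
    const existing = this.cardsByKey[key];
    if (existing !== undefined) {
      // Redelivery on the same surface reuses the existing card.
      return { cardId: existing, created: false };
    }
    const cardId = 'card-' + this.nextId++;
    this.cardsByKey[key] = cardId;
    return { cardId, created: true };
  }
}
```

Because the source literal is part of the key, the same Sentry issue ID delivered once per surface yields two cards, while a repeat delivery on either surface is deduplicated.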
`lockKey`/`coalesceKey` also use a `sentry-issue:` namespace distinct from the existing `sentry:` (event_alert) so concurrent deliveries via both surfaces don't lock-contend.
Operator pre-req (no code change)
Cascade project's PM `lists.alerts` (Trello) / `statuses.alerts` (JIRA, Linear) must be configured for materialisation to actually create a card. The pre-flight validation rule at `src/triggers/shared/integration-validation.ts` already emits a `pm`-category error when alerting is enabled but the slot is unset. This is unchanged; the same message will fire for the new `'sentry-issue'` source. Configure via the dashboard's PM wizard "Status Mapping" → "Alerts" row.
Test plan
- `npm test`: 9112 / 9112 passing (3 new test files + extensions)
- `npm run lint`: clean (0 errors, 0 warnings; the prior `materialize.ts` complexity warning is gone after the drive-by refactor)
- `npm run typecheck`: clean
- Captured prod fixture (`fbdc6d87-b962-444c-8a2a-a9452a74ff71`) used as regression baseline in the new tests
- `event_alert` and `metric_alert` flows pinned unchanged (regression net in extended sentry-webhook-handler tests)
Out of scope (deferred)
- Other `resource: issue` actions: `'resolved'`, `'unresolved'`, `'archived'`, `'assigned'`. First cut handles `'created'` only. Auto-closing the cascade work item on Sentry resolution is spec-019 §7's deferred concern (likely a future spec 020).
- `lists.alerts` slot (separate ergonomics ticket).
🤖 Generated with Claude Code