feat(slack): Create fallback incident channel on DB failure #176
Merged
Add post_message_return_ts as a variant of post_message that returns the message timestamp instead of a boolean, enabling callers to reference the message for threading or pinning. Co-Authored-By: Claude <noreply@anthropic.com> Agent transcript: https://claudescope.sentry.dev/share/zmqQ1m0NJE_-FuqvcuXZ18ZttUQr0XtFVr02YLG2tyw
Add pin_message to pin a message in a Slack channel by timestamp, returning a bool to indicate success. This will be used alongside post_message_return_ts in the fallback channel creation flow. Co-Authored-By: Claude <noreply@anthropic.com> Agent transcript: https://claudescope.sentry.dev/share/8XVdAzyoTtZahrYYEo_XJD2cZgNAOzObwuOpS5oVA4Y
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Agent transcript: https://claudescope.sentry.dev/share/9XZ79vcEAJ-h1Azs6eN2Q_tBuEsGT_-oQ0plOdG3Qkg
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Agent transcript: https://claudescope.sentry.dev/share/rjrbalkIE0i-YUmHrpHSG7hA6a8SbDA4w3Mlre-w7AY
…handler Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Agent transcript: https://claudescope.sentry.dev/share/z4usHPb0NPFSbxmG4_5EpSRblekmiOPuAGoT7yS6jpU
post_message now returns the message timestamp (str | None) instead of bool, making post_message_return_ts redundant. All callers that discarded the return value are unaffected. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Agent transcript: https://claudescope.sentry.dev/share/HTzewExf7sMcwUKuYVYdLDdeJTjwtlCMFemciQiB6ho
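A minimal sketch of the return-type change described in this commit, assuming a client object that exposes chat_postMessage and pins_add in the style of the Slack Web API. The SlackService body and FakeClient below are illustrative stand-ins, not the code from this PR.

```python
from typing import Any, Dict, List, Optional, Tuple


class SlackService:
    """Hypothetical thin wrapper around a Slack Web API client."""

    def __init__(self, client: Any) -> None:
        self.client = client

    def post_message(self, channel_id: str, text: str) -> Optional[str]:
        """Post a message and return its timestamp, or None on failure."""
        try:
            response = self.client.chat_postMessage(channel=channel_id, text=text)
        except Exception:
            return None
        return response.get("ts")

    def pin_message(self, channel_id: str, ts: str) -> bool:
        """Pin the message identified by ts; return True on success."""
        try:
            self.client.pins_add(channel=channel_id, timestamp=ts)
        except Exception:
            return False
        return True


class FakeClient:
    """In-memory stand-in so the sketch runs without network access."""

    def __init__(self) -> None:
        self.pinned: List[Tuple[str, str]] = []

    def chat_postMessage(self, *, channel: str, text: str) -> Dict[str, Any]:
        return {"ok": True, "ts": "1715170000.000100"}

    def pins_add(self, *, channel: str, timestamp: str) -> Dict[str, Any]:
        self.pinned.append((channel, timestamp))
        return {"ok": True}
```

Callers that only used the old boolean in a truthiness check keep working, since a timestamp string is truthy and None is falsy.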
Both the normal DB-backed and fallback (DB-unreachable) incident creation paths now call the same decorate_incident_channel() function for channel setup steps (guide message, DD notebook, Notion doc, IC mention, description, user invites, oncall invites, status channel, feed post). This eliminates ~250 lines of duplicated logic in _create_fallback_channel and prevents drift between the two paths. Primitives-only helpers (page_for_channel, _invite_oncall_to_channel, _create_status_channel_for_context) accept a SlackService via dependency injection so callers can pass their own instance, keeping existing test mocks working without dual-mock complexity. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Agent transcript: https://claudescope.sentry.dev/share/Pi2pox5vOj8iMXFMdUt1noXDBVhcFwiMUZY98SQGc3E
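The dependency-injection idea can be sketched roughly as follows. The helper name mirrors _invite_oncall_to_channel from the commit message, but its signature, the invite_to_channel method, and the RecordingSlack test double are assumptions made for illustration.

```python
from typing import List, Optional, Tuple


class SlackService:
    """Hypothetical minimal service; the real one wraps the Slack Web API."""

    def invite_to_channel(self, channel_id: str, user_ids: List[str]) -> bool:
        raise NotImplementedError("network call omitted in this sketch")


_default_slack = SlackService()


def invite_oncall_to_channel(
    channel_id: str,
    oncall_user_ids: List[str],
    slack_service: Optional[SlackService] = None,
) -> bool:
    """Invite on-call users, using the injected service when one is given."""
    slack = _default_slack if slack_service is None else slack_service
    if not oncall_user_ids:
        return False
    return slack.invite_to_channel(channel_id, oncall_user_ids)


class RecordingSlack(SlackService):
    """Test double that records calls instead of hitting Slack."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, List[str]]] = []

    def invite_to_channel(self, channel_id: str, user_ids: List[str]) -> bool:
        self.calls.append((channel_id, list(user_ids)))
        return True
```

Because the helper only touches the instance it is handed, a test that already mocks one SlackService does not need to patch a second module-level instance.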
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Agent transcript: https://claudescope.sentry.dev/share/fWpFRQLrfEg5uLpD0-Z-SGzTz5tHLmcotohsKYXl-cI
…efactor The decorate_incident_channel extraction (e184792) moved oncall invite and status channel creation into the shared orchestrator, which calls _invite_oncall_to_channel and _create_status_channel_for_context directly. Tests for on_incident_created were still patching the higher-level wrappers (_invite_oncall_users, _create_status_channel) that are only used by on_severity_changed, so the mocks had no effect. Co-Authored-By: Claude <noreply@anthropic.com> Agent transcript: https://claudescope.sentry.dev/share/Is4vKBCpx9IMlodDre7fniFgvTC_49tk6w6Djl51QW8
…hannel Conflicts: src/firetower/incidents/hooks.py, src/firetower/slack_app/handlers/new_incident.py Agent transcript: https://claudescope.sentry.dev/share/wOfmVrIupzZyuS1QphAL0rmTC5f-Kavnrplig2a0QnA
Move Datadog/Notion creation after decorate_incident_channel in the normal path so the guide message appears first in the channel. Narrow the fallback channel trigger to OperationalError only, so non-DB failures (e.g. hook errors after incident is already persisted) send an error DM instead of creating a misleading fallback channel. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Agent transcript: https://claudescope.sentry.dev/share/SyhTFocRgYXsz1EyKb08o_I-sJXfpjypi8yKYXWL1ZM
The fallback channel logic only protected serializer.save(), but get_or_create_user_from_slack_id and serializer.is_valid() also hit the DB. If the DB was down, those calls would raise OperationalError before the try block, so the fallback never fired. Extract _create_incident_via_db to group all DB work under one OperationalError handler so the fallback triggers on any DB failure. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Agent transcript: https://claudescope.sentry.dev/share/gKzMkOx7EVyxHv3oPk4TlTJo9JI-lLl3kZL6HFmKgiQ
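A rough sketch of the fix, assuming the three DB-touching steps can be passed in as callables. OperationalError here is a local stand-in for the Django exception so the example runs on its own, and handle_submission is a hypothetical name for the modal handler.

```python
class OperationalError(Exception):
    """Stand-in for django.db.OperationalError (assumption for this sketch)."""


def create_incident_via_db(lookup_user, validate, save):
    """Run every DB-touching step in one place; any of them may raise."""
    user = lookup_user()   # e.g. get_or_create_user_from_slack_id
    data = validate(user)  # e.g. serializer.is_valid()
    return save(data)      # e.g. serializer.save()


def handle_submission(lookup_user, validate, save, create_fallback_channel):
    """Return ("incident", obj) normally, ("fallback", channel) on DB failure."""
    try:
        return ("incident", create_incident_via_db(lookup_user, validate, save))
    except OperationalError:
        # One handler now covers all three steps, not just save().
        return ("fallback", create_fallback_channel())
```

Errors that are not database failures still propagate, which matches the narrower trigger introduced in the earlier commit.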
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Agent transcript: https://claudescope.sentry.dev/share/M5ual3bzy5m8nbhKYB0ZfPtFZRjeiszjnSuT9Do-gsk
Cover creating a test Slack app from the manifest, collecting bot and app-level tokens, configuring the feed channel, and running the bot in Socket Mode. Co-Authored-By: Claude <noreply@anthropic.com> Agent transcript: https://claudescope.sentry.dev/share/sQNA-c-iG1pDnBUKoQ_oYDQkecjc2-cKqbCGDd2SQf4
…retower into rgibert/db-fallback-channel
… DD/Notion creation The Notion API rejects empty strings for url properties with a 400 error. In the DB-outage fallback path, incident_url was coerced from None to "" causing silent Notion page creation failures. Also extracts shared Datadog notebook and Notion troubleshooting doc creation logic into _do_create_datadog_notebook and _do_create_troubleshooting_doc helpers, eliminating duplication between the fallback path in decorate_incident_channel and the DB-dedup wrappers. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Agent transcript: https://claudescope.sentry.dev/share/Y16D4_HD176UY4rSz1vnRvGzmp2bZSvXEwBTc-Bt2-M
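One way to avoid sending an empty url value is to leave the property out of the payload when there is no URL. The sketch below builds a Notion-style properties dict; the property names "Name" and "Incident URL" and the function name are hypothetical, not taken from this PR.

```python
from typing import Any, Dict, Optional


def build_notion_properties(title: str, incident_url: Optional[str]) -> Dict[str, Any]:
    """Build page properties, omitting the url property when no URL exists."""
    properties: Dict[str, Any] = {
        "Name": {"title": [{"text": {"content": title}}]},
    }
    # In the DB-outage fallback path there is no Firetower URL yet.
    # Sending "" is rejected with a 400, so skip the property entirely.
    if incident_url:
        properties["Incident URL"] = {"url": incident_url}
    return properties
```

Keeping None as None up to this point, instead of coercing it to an empty string earlier in the call chain, is what makes the check possible.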
…creation Split _do_create_datadog_notebook into API-only creation and a separate _notify_datadog_notebook for Slack bookmark/message posting. The DB-dedup path calls the notification after the transaction commits, restoring the original design that avoids holding the SELECT FOR UPDATE row lock during Slack API round-trips. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Agent transcript: https://claudescope.sentry.dev/share/TtWM4yO2EjgDdTzfeXgVUvkZ6z_al9mKUnCUpoVaT0o
…ooting doc Same regression as the Datadog notebook fix: _do_create_troubleshooting_doc bundled Notion API calls with Slack notifications, causing bookmark and message to fire before the ExternalLink URL was committed. If the subsequent transaction failed, Slack would reference a doc that Firetower had no record of. Split into _do_create_troubleshooting_doc (API only) and _notify_troubleshooting_doc (Slack bookmark + message), matching the Datadog notebook pattern. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Agent transcript: https://claudescope.sentry.dev/share/SXYWlDsb9OYt_zu_5j71IZ9Yab_Uz9g-Kn_myaoGRf8
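The create/notify split used for both the Datadog notebook and the troubleshooting doc can be sketched like this. The atomic context manager is a toy stand-in for a database transaction that records events, and create_doc with its three callables is a hypothetical shape chosen for illustration.

```python
from contextlib import contextmanager

events = []


@contextmanager
def atomic():
    """Toy stand-in for a DB transaction holding a row lock (assumption)."""
    events.append("begin")
    try:
        yield
    except Exception:
        events.append("rollback")
        raise
    else:
        events.append("commit")


def create_doc(api_create, record_link, notify):
    """Create the external doc and record it, then notify Slack after commit."""
    with atomic():
        url = api_create()  # external API call only
        record_link(url)    # persist the link while the lock is held
    # The lock is released and the link is committed, so slow Slack
    # round-trips are safe here and cannot reference an unrecorded doc.
    notify(url)
    return url
```

If record_link raises, the with block re-raises before notify runs, so Slack never points at a doc that was not recorded.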
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Reviewed by Cursor Bugbot for commit fb58bf1.
PostgreSQL raises InterfaceError (not just OperationalError) when a previously-established connection drops during failover. Catch both so the fallback channel is created in either case. Also escape user-provided title, description, and impact_summary in the fallback metadata message to prevent Slack markup injection. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Agent transcript: https://claudescope.sentry.dev/share/DxUhIF-6Ihcby1dMJoyuOmOlLEG-ROMhJDQKb76H6rg
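A small sketch of both changes in this commit. The exception classes are local stand-ins for the Django ones so the example runs without a database, and the function names and message layout are assumptions. The escaping follows Slack's documented rule of replacing &, < and > with HTML entities.

```python
class OperationalError(Exception):
    """Stand-in for django.db.OperationalError (assumption)."""


class InterfaceError(Exception):
    """Stand-in for django.db.InterfaceError (assumption)."""


# Either error is treated as "database unavailable" by the fallback trigger,
# e.g. `except DB_UNAVAILABLE_ERRORS:` around the DB work.
DB_UNAVAILABLE_ERRORS = (OperationalError, InterfaceError)


def escape_slack_text(text: str) -> str:
    """Escape the three characters Slack treats as control characters."""
    # "&" must be replaced first so the entities added below stay intact.
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def fallback_metadata_message(title: str, description: str, impact_summary: str) -> str:
    """Render user-provided fields with markup neutralised."""
    return "\n".join(
        [
            f"*Title:* {escape_slack_text(title)}",
            f"*Description:* {escape_slack_text(description)}",
            f"*Impact:* {escape_slack_text(impact_summary)}",
        ]
    )
```

Without the escaping, a title such as `<!channel>` would be interpreted by Slack as a channel-wide mention instead of literal text.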
spalmurray
reviewed
May 8, 2026
# TODO: These shouldn't be in frontend/ ?
env_file: frontend/.env

slack-bot:
spalmurray
approved these changes
May 8, 2026

When serializer.save() fails during incident creation from the Slack modal (DB unreachable, or any other error), we now create a fallback Slack channel named inc-<uuid[:8]> so teams can coordinate immediately instead of being told to create one manually.

The fallback replicates as much of the normal on_incident_created flow as possible without DB access, including a pinned metadata message (title, severity, description, tags, captain, etc.) for later backfill.

The DB-dependent parts are skipped: no Incident row, no ExternalLink dedup, no Firetower URL. Backfill of orphaned inc-* channels will be handled separately.

Also adds post_message_return_ts and pin_message methods to SlackService to support pinning the metadata message.

Agent transcript: https://claudescope.sentry.dev/share/SZmgtrAzIjJHMrC-H8EqX1AhpwsgNQYghS8s-795rR0
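As an illustration of the inc-<uuid[:8]> naming scheme, here is a minimal sketch. The function name is hypothetical, and it assumes the prefix is taken from the string form of a random UUID.

```python
import uuid
from typing import Optional


def fallback_channel_name(incident_uuid: Optional[uuid.UUID] = None) -> str:
    """Return a channel name of the form inc-<first 8 hex chars of a UUID>."""
    value = incident_uuid if incident_uuid is not None else uuid.uuid4()
    # str(UUID) is lowercase hex, so the result only uses characters
    # that are valid in a Slack channel name.
    return f"inc-{str(value)[:8]}"
```

For example, fallback_channel_name(uuid.UUID("12345678-1234-5678-1234-567812345678")) yields "inc-12345678".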
From testing:
