feat: implement escalated notifications for monitors #199

Open
realnoob007 wants to merge 1 commit into ramfam101:develop from realnoob007:feat/escalated-notifications

Conversation

@realnoob007

Describe your changes

Add escalation rules that allow users to define time-based alert escalation when incidents persist. Users can configure multiple escalation rules per monitor, each with a wait time and notification channels to alert after the specified duration of downtime.

Frontend:

  • Add escalation rules UI section to monitor create/edit page
  • Add EscalationRule type, form validation, and i18n translations
  • Support dynamic add/remove of escalation rules with time and notification channel selection

Backend:

  • Add EscalationRule type and escalationRules field to Monitor schema
  • Add escalation checking in heartbeat job on every tick when monitor is down
  • Add escalation message builder for escalation-specific notifications
  • Track sent escalations per incident to prevent duplicate alerts
  • Clear escalation tracking on incident resolution
  • Map escalationRules in repository toEntity methods
  • Support comma-separated CLIENT_HOST for multiple CORS origins

Write your issue number after "Fixes "

Fixes #123

Please ensure all items are checked off before requesting a review. "Checked off" means you need to add an "x" character between brackets so they turn into checkmarks.

  • (Do not skip this or your PR will be closed) I deployed the application locally.
  • (Do not skip this or your PR will be closed) I have performed a self-review and testing of my code.
  • I have included the issue # in the PR.
  • I have added i18n support to visible strings (instead of <div>Add</div>, use):
const { t } = useTranslation();
<div>{t('add')}</div>
  • I have not included any files that are not related to my pull request, including package-lock and package-json if dependencies have not changed
  • I didn't use any hardcoded values (otherwise it will not scale, and will make it difficult to maintain consistency across the application).
  • I made sure font sizes, color choices etc are all referenced from the theme. I don't have any hardcoded dimensions.
  • My PR is granular and targeted to one specific feature.
  • I ran npm run format in server and client directories, which automatically formats your code.
  • I took a screenshot or a video and attached to this PR if there is a UI change.

Copilot AI review requested due to automatic review settings April 9, 2026 01:32

Copilot AI left a comment

Pull request overview

Implements time-based escalation notifications for monitors so additional notification channels can be alerted if an incident remains unresolved past configured thresholds.

Changes:

  • Adds escalationRules to monitor types/schemas and monitor create/edit validation.
  • Triggers escalation checks on each heartbeat tick while a monitor is down/breached and clears escalation tracking on recovery.
  • Adds escalation-specific notification message building and enables comma-separated CORS origins via CLIENT_HOST.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 11 comments.

| File | Description |
| --- | --- |
| server/src/validation/monitorValidation.ts | Validates escalationRules payload for create/edit monitor requests. |
| server/src/validation/envValidation.ts | Relaxes CLIENT_HOST validation to support non-URL / comma-separated values. |
| server/src/types/notificationMessage.ts | Adds new notification type "escalation". |
| server/src/types/monitor.ts | Introduces EscalationRule and Monitor.escalationRules. |
| server/src/service/infrastructure/SuperSimpleQueue/SuperSimpleQueueHelper.ts | Runs escalation checks during heartbeat processing and clears tracking on resolution. |
| server/src/service/infrastructure/notificationsService.ts | Implements escalation sending and deduplication tracking. |
| server/src/service/infrastructure/notificationMessageBuilder.ts | Builds escalation-specific notification messages. |
| server/src/repositories/monitors/MongoMonitorsRepository.ts | Maps escalationRules from DB documents into monitor entities. |
| server/src/db/models/Monitor.ts | Adds escalationRules to the Mongoose monitor schema. |
| server/src/app.ts | Splits CLIENT_HOST into multiple CORS origins when comma-separated. |
| client/src/Validation/monitor.ts | Adds client-side validation for escalationRules. |
| client/src/Types/Monitor.ts | Adds EscalationRule and Monitor.escalationRules on the client. |
| client/src/Pages/CreateMonitor/index.tsx | Adds escalation rules UI with add/remove and notification channel selection. |
| client/src/locales/en.json | Adds i18n strings for the escalation rules UI. |
| client/src/Hooks/useMonitorForm.ts | Adds escalation rules defaults for the monitor form. |

Comment on lines +444 to +450
const activeIncident = await this.incidentsRepository.findActiveByMonitorId(monitor.id, monitor.teamId);
if (!activeIncident) {
  return;
}

const incidentStartTime = new Date(parseInt(activeIncident.startTime));
await this.notificationsService.handleEscalations(monitor, incidentStartTime);

Copilot AI Apr 9, 2026

activeIncident.startTime is an ISO string (see incidents repository toDateString), so parseInt(activeIncident.startTime) will parse only the year (e.g. 2026) and produce an incorrect incidentStartTime. This will make incidentDuration huge and escalate immediately. Parse it as a date string instead (e.g. new Date(activeIncident.startTime)) and handle invalid dates explicitly.
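
A minimal sketch of the suggested parsing, assuming startTime is an ISO-8601 string as described above. The helper name parseIncidentStart is hypothetical and not part of the PR; returning null for invalid input is one possible way to handle bad dates explicitly.

```typescript
// Hypothetical helper (not in the PR): parse an ISO-8601 start time and
// return null instead of an Invalid Date when the input cannot be parsed.
function parseIncidentStart(startTime: string): Date | null {
  const parsed = new Date(startTime);
  return Number.isNaN(parsed.getTime()) ? null : parsed;
}

// parseInt stops at the first non-digit character, so only the year survives.
const iso = "2026-04-09T01:32:00.000Z";
const wrong = new Date(parseInt(iso, 10)); // 2026 milliseconds after the Unix epoch
const right = parseIncidentStart(iso); // the intended instant
```

With a null result the caller can log and skip the escalation check rather than compute a duration from a bogus start time.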

Copilot uses AI. Check for mistakes.
Comment on lines +168 to +176
// Check if enough time has elapsed
if (incidentDuration >= rule.waitTime) {
  this.escalationsSent.add(ruleKey);

  const notifications = await this.notificationsRepository.findNotificationsByIds(rule.notificationIds);
  if (notifications.length === 0) {
    continue;
  }

Copilot AI Apr 9, 2026

handleEscalations adds ruleKey to escalationsSent before verifying there are any matching notifications and before knowing whether any send succeeded. If notificationIds is empty/invalid or all sends fail, the rule will be permanently suppressed for this incident. Only mark the rule as sent after a successful send (or at least after a non-empty notification set and an attempted send) depending on desired retry semantics.
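
One way to express the "mark only after a successful send" semantics is sketched below. It assumes the caller has already awaited every send for the rule and collected one boolean per channel; recordEscalation is a hypothetical name, not a function in the PR.

```typescript
// Hypothetical helper (not in the PR): record a rule as sent only when at
// least one channel reported success. An empty result list (no matching
// notifications) leaves the rule unmarked so it is retried on the next tick.
function recordEscalation(sent: Set<string>, ruleKey: string, sendResults: boolean[]): boolean {
  const anySucceeded = sendResults.some(Boolean);
  if (anySucceeded) {
    sent.add(ruleKey);
  }
  return anySucceeded;
}
```

If the desired semantics are "attempted at least once" rather than "succeeded at least once", the condition would instead be sendResults.length > 0.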

Comment on lines +160 to +166
for (const rule of escalationRules) {
  const ruleKey = `${incidentKey}:${rule.waitTime}`;

  // Skip if already escalated for this rule in this incident
  if (this.escalationsSent.has(ruleKey)) {
    continue;
  }

Copilot AI Apr 9, 2026

Escalation deduplication key is based only on incidentStartTime + waitTime. If a monitor has two escalation rules with the same waitTime but different notificationIds, only one will ever send because they collide on the same ruleKey. Either enforce unique waitTime in validation/UI or include additional rule identity (e.g. index or a stable hash of notificationIds) in the key.
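
A sketch of the second option, assuming the rule shape implied by the validation schema (waitTime in milliseconds plus notificationIds). buildRuleKey is a hypothetical helper; it appends the sorted notification IDs so two rules with the same waitTime but different channels no longer collide.

```typescript
// Rule shape assumed from the PR's validation schema.
type EscalationRule = { waitTime: number; notificationIds: string[] };

// Hypothetical helper (not in the PR): include the rule's channels in the key.
// Sorting a copy makes the key independent of the order of notificationIds.
function buildRuleKey(incidentKey: string, rule: EscalationRule): string {
  const channels = [...rule.notificationIds].sort().join(",");
  return `${incidentKey}:${rule.waitTime}:${channels}`;
}
```

Two rules that are identical in both waitTime and channels still share a key, which is the desired behavior for deduplication.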

Comment on lines +29 to +31
// Tracks which escalation rules have already been sent, keyed by "monitorId:incidentStartTime:waitTime"
private escalationsSent: Set<string> = new Set();

Copilot AI Apr 9, 2026

Escalation tracking is stored only in an in-memory Set. In a multi-process / multi-instance deployment or after a restart, escalations can be re-sent for the same incident because the set is empty again. If the intention is to prevent duplicates per incident across restarts/replicas, persist this state (e.g. on the Incident record) or derive it from stored notification history.
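
A rough sketch of deriving pending escalations from persisted state instead of an in-memory Set. The sentEscalationKeys field on the incident record is an assumption for illustration; the PR does not define it, and the actual Incident schema may differ.

```typescript
type EscalationRule = { waitTime: number; notificationIds: string[] };

// Hypothetical incident shape: sentEscalationKeys is an assumed persisted field.
type IncidentRecord = { startTime: string; sentEscalationKeys: string[] };

// Stable identity for a rule within one incident.
function ruleId(rule: EscalationRule): string {
  return `${rule.waitTime}:${[...rule.notificationIds].sort().join(",")}`;
}

// Returns the rules whose wait time has elapsed and that are not yet
// recorded on the incident, so the result survives restarts and replicas.
function dueRules(incident: IncidentRecord, rules: EscalationRule[], nowMs: number): EscalationRule[] {
  const startedMs = new Date(incident.startTime).getTime();
  if (Number.isNaN(startedMs)) {
    return []; // invalid start time: do not escalate
  }
  const elapsedMs = nowMs - startedMs;
  const alreadySent = new Set(incident.sentEscalationKeys);
  return rules.filter((rule) => elapsedMs >= rule.waitTime && !alreadySent.has(ruleId(rule)));
}
```

After a successful send, the caller would append ruleId(rule) to the stored incident; with several instances running, that write would need to be atomic to avoid races.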

Comment on lines +29 to +31
const allowedOrigin = envSettings.clientHost.includes(",")
  ? envSettings.clientHost.split(",").map((s) => s.trim())
  : envSettings.clientHost;

Copilot AI Apr 9, 2026

CLIENT_HOST is now allowed to be comma-separated and is split for CORS, but the same clientHost is also used to build links in emails/notifications (e.g. password reset, invite links, incident URLs). If CLIENT_HOST contains commas, those links will be invalid. Consider introducing a separate env var for CORS origins (or parse and pick a single canonical base URL for link building).
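
A sketch of the "parse and pick a single canonical base URL" option. parseClientHosts is a hypothetical helper, and treating the first entry as the canonical base URL is an assumption; a dedicated environment variable for CORS origins would avoid that convention entirely.

```typescript
// Hypothetical helper (not in the PR): split CLIENT_HOST once and expose
// both the CORS origin list and a single base URL for building links.
function parseClientHosts(clientHost: string): { corsOrigins: string[]; baseUrl: string } {
  const corsOrigins = clientHost
    .split(",")
    .map((origin) => origin.trim())
    .filter((origin) => origin.length > 0);
  if (corsOrigins.length === 0) {
    throw new Error("CLIENT_HOST must contain at least one origin");
  }
  // Assumption: the first listed origin is the canonical one for links.
  return { corsOrigins, baseUrl: corsOrigins[0] };
}
```

Link builders (password reset, invites, incident URLs) would then read baseUrl, while the CORS configuration reads corsOrigins.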

Comment on lines +203 to +209
const details = [
  `URL: ${monitor.url}`,
  `Status: Down`,
  `Type: ${monitor.type}`,
  `Downtime Duration: ${durationMinutes} minute(s)`,
  `Escalation Threshold: ${waitMinutes} minute(s)`,
];

Copilot AI Apr 9, 2026

Escalation message details hard-code Status: Down, but escalations are also triggered for status === "breached". This will send incorrect status information for threshold incidents. Use monitor.status (or map to a friendly label) in the message content.
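
A small sketch of the friendly-label mapping. The statusLabel helper and the label text "Threshold breached" are illustrative assumptions; only the status values "down" and "breached" come from the PR.

```typescript
// Hypothetical label table: the wording of each label is an assumption.
const STATUS_LABELS = new Map<string, string>([
  ["down", "Down"],
  ["breached", "Threshold breached"],
]);

// Falls back to the raw status so unknown values are still reported as-is.
function statusLabel(status: string): string {
  return STATUS_LABELS.get(status) ?? status;
}

// Example detail line built from the monitor's actual status.
const statusLine = `Status: ${statusLabel("breached")}`;
```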

Comment on lines +70 to +72
escalationRules: z.array(z.object({
  waitTime: z.number().min(1000, "Wait time must be at least 1 second"),
  notificationIds: z.array(z.string()).min(1, "At least one notification is required"),

Copilot AI Apr 9, 2026

Server-side validation allows waitTime as low as 1s, but the client-side schema/UI enforce a minimum of 1 minute and only present minute+ options. This mismatch can lead to confusing behavior (e.g. API accepts values UI can’t set; logs that show 0 minute(s)). Align server validation with the client/UI constraints (or update the client/UI to match the intended minimum).
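
One way to keep the two sides aligned is a single shared minimum, sketched below without any validation library so it stands alone. MIN_ESCALATION_WAIT_MS and validateEscalationRule are hypothetical names; in the PR's zod schemas the same constant could be passed to the existing z.number().min(...) call.

```typescript
// Hypothetical shared constant: 1 minute, matching the client/UI minimum.
const MIN_ESCALATION_WAIT_MS = 60 * 1000;

type EscalationRuleInput = { waitTime: number; notificationIds: string[] };

// Returns a list of validation errors; an empty list means the rule is valid.
function validateEscalationRule(rule: EscalationRuleInput): string[] {
  const errors: string[] = [];
  if (!Number.isFinite(rule.waitTime) || rule.waitTime < MIN_ESCALATION_WAIT_MS) {
    errors.push("Wait time must be at least 1 minute");
  }
  if (rule.notificationIds.length === 0) {
    errors.push("At least one notification is required");
  }
  return errors;
}
```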

Comment on lines +96 to +98
escalationRules: z.array(z.object({
  waitTime: z.number().min(1000, "Wait time must be at least 1 second"),
  notificationIds: z.array(z.string()).min(1, "At least one notification is required"),

Copilot AI Apr 9, 2026

Same as create validation: server allows waitTime down to 1s while the client enforces 1 minute. Please keep create/edit validation consistent with the UI and the escalation scheduler expectations.

Comment on lines +171 to +186
// Step 6.5. Handle escalation notifications (check on every heartbeat when monitor is down)
if (statusChangeResult.monitor.status === "down" || statusChangeResult.monitor.status === "breached") {
  this.handleEscalationCheck(statusChangeResult.monitor).catch((error: unknown) => {
    this.logger.error({
      message: `Error handling escalations for monitor ${statusChangeResult.monitor.id}: ${error instanceof Error ? error.message : "Unknown error"}`,
      service: SERVICE_NAME,
      method: "getMonitorJob",
      stack: error instanceof Error ? error.stack : undefined,
    });
  });
}

// Step 6.6. Clear escalation tracking when monitor recovers
if (decision.shouldResolveIncident) {
  this.notificationsService.clearEscalationTracking(statusChangeResult.monitor.id);
}

Copilot AI Apr 9, 2026

New escalation handling in the heartbeat job isn’t covered by existing tests in server/test/superSimpleQueueHelper.test.ts. Adding tests for (1) escalation checks being triggered only when status is down/breached and (2) clearEscalationTracking being called on incident resolution would help prevent regressions.

Comment on lines +894 to +898
<IconButton
  size="small"
  onClick={() => removeEscalation(index)}
  aria-label="Remove escalation rule"
  sx={{ mt: theme.spacing(SPACING.LG) }}

Copilot AI Apr 9, 2026

The aria-label for the remove button is hard-coded in English. Since the page uses i18n for visible strings, this label should also be translated to ensure consistent localization and accessibility.


2 participants