Feature/alert muting#512
Conversation
Implement pattern-based and instance-based alert mute rules that suppress notifications while continuing to log alerts for auditability. Core: - MuteRule model with AND-logic matching across server, metric, database, query text, wait type, and job name fields - MuteRuleService with JSON persistence (Dashboard) and DuckDB persistence (Lite) - Optional expiration (1h/24h/7d/permanent) with automatic skip on expired rules Alert Pipeline: - Mute checks wired into all 7 alert metrics in both editions - Muted alerts logged with muted=true flag, notifications suppressed - DuckDB schema v20: muted column on config_alert_log, config_mute_rules table UI: - MuteRuleDialog for creating/editing rules with match criteria and expiration - ManageMuteRulesWindow with DataGrid listing, enable/disable/delete/purge - Settings button in both editions to access mute rule management - Right-click context menus in alert history: Mute This Alert / Mute Similar - Muted alerts shown with reduced opacity and italic text in alert history MCP: - get_mute_rules tool added to both editions' MCP servers - MuteRuleService registered in DI for MCP tool resolution Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Right-clicking an alert now shows 'View Details...' at the top of the context menu, opening a read-only popup with all alert fields (time, server, metric, values, notification type, status) and a muted indicator banner. Implemented in both Dashboard and Lite editions. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Store metric-specific detail text (query text, job names, wait types, space breakdowns, etc.) alongside each alert log entry. The Alert Detail Window now displays a Details section with rich contextual information that varies by metric type: - Long-Running Query: session ID, database, query text, CPU/reads/writes - Long-Running Job: job name, duration, average/P95, % of average - Blocking: blocked session count, longest wait duration - Deadlock: new deadlock count - Poison Wait: per-wait type breakdown (avg ms, delta, tasks) - TempDB Space: usage breakdown by category Both Dashboard and Lite editions updated with matching changes. DuckDB schema bumped to v21 for the new detail_text column. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Clarify the difference between 'Mute This Alert' (server + metric) and 'Mute Similar Alerts' (metric only, all servers). Add documentation for the context-sensitive View Details popup window. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Double-clicking a row in the Alert History DataGrid now opens the Alert Detail Window, matching the existing right-click context menu behavior. Both Dashboard and Lite editions updated. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The Enabled checkbox in the Manage Mute Rules DataGrid was non-clickable because the grid had IsReadOnly=True. Changed to IsReadOnly=False on the grid with IsReadOnly=True on individual text columns. Added CellEditEnding handler to persist the toggle via the MuteRuleService. Both editions. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
CellEditEnding fires before the binding commits, so rule.Enabled still held the old value when passed to SetRuleEnabled. Fixed by reading the current state from the service and toggling the inverse instead. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replaced DataGridCheckBoxColumn with a DataGridTemplateColumn containing a regular CheckBox. The WPF DataGrid requires the first click to enter edit mode on DataGridCheckBoxColumn, making it feel unresponsive. A templated CheckBox with a Click handler responds immediately and persists the change directly via the service. Reverted grid to IsReadOnly=True since the checkbox is now independent of the edit pipeline. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move BuildBlockingContextAsync and BuildDeadlockContextAsync calls before RecordAlert in Dashboard so the rich context (session IDs, databases, queries, wait times, lock modes) is stored as detail text instead of just summary counts. Falls back to simple text if the context builder returns null. Lite already had this behavior. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
MuteRuleDialog now shows only relevant pattern fields based on the selected metric type: - Blocking: Database, Query Text - Deadlocks: Database - High CPU / TempDB Space: no pattern fields - Poison Wait: Wait Type - Long-Running Query: Database, Query Text, Wait Type - Long-Running Job: Job Name - (any): all fields shown Hidden fields are cleared on save to prevent stale matches. ManageMuteRulesWindow Toggle and Delete buttons now return focus to the rules grid and preserve selection index, enabling repeated operations without re-clicking the list. Both Dashboard and Lite editions updated. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Added missing metrics to the alert thresholds table (poison waits, long-running queries, TempDB space, long-running agent jobs) and documented the three poison wait types: THREADPOOL, RESOURCE_SEMAPHORE, and RESOURCE_SEMAPHORE_QUERY_COMPILE. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Root cause: DataGrid SelectionChanged event bubbled up to the TabControl's SelectionChanged handler, which called RefreshAlerts() on every row click, replacing ItemsSource and clearing selection before the UI could render it. Fixes: - Guard TabControl handler with OriginalSource check to ignore bubbled events from child controls - Use visual tree hit-testing for double-click handlers instead of relying on SelectedItem (more robust in both editions) - Defensive Owner handling for AlertDetailWindow Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1. Dashboard: Add OriginalSource guard to ServerTabControl_SelectionChanged to prevent child control SelectionChanged events from triggering the handler (same bubbling bug pattern fixed earlier in Lite). 2. Lite: Add re-entrancy guard to ServerTab refresh timer to prevent overlapping RefreshAllDataAsync calls if a refresh takes longer than the 60-second timer interval. Matches Dashboard's _isRefreshing pattern. 3. Dashboard: Add missing DefaultRowStyle to WaitDrillDownWindow resources so the DataGrid's RowStyle reference resolves and rows get the context menu. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
erikdarlingdata
left a comment
There was a problem hiding this comment.
Code Review: Alert Muting
Overall this is well-structured with full parity across both apps. Nice work on the context-sensitive dialog, the auditability (muted alerts still logged), and the MCP integration. A few things to address:
Bugs
1. [BLOCKER] DuckDB migration NOT NULL on ALTER TABLE
Lite/Database/DuckDbInitializer.cs:563 — The v20 migration uses BOOLEAN NOT NULL DEFAULT false, but DuckDB doesn't support NOT NULL on ALTER TABLE ADD COLUMN. There's even a comment in the v9 migration (line 424) warning about this exact issue. The catch block swallows the error so the column never gets added, and subsequent INSERTs that reference muted will fail.
Fix: Change to BOOLEAN DEFAULT false (drop the NOT NULL). The NOT NULL in Schema.CreateAlertLogTable for fresh installs is fine — this only affects the ALTER TABLE migration path.
2. "Mute This Alert" dialog title says "Edit Mute Rule"
Both Dashboard/Controls/AlertsHistoryContent.xaml.cs and Lite/Controls/AlertsHistoryTab.xaml.cs pass a pre-filled MuteRule to the MuteRuleDialog(MuteRule existingRule) constructor. Since existingRule != null, the dialog sets its title to "Edit Mute Rule" instead of "Create Mute Rule".
Fix: Either use the no-arg constructor and set fields after, add a bool isNew parameter, or use the AlertMuteContext constructor path.
3. "Mute This Alert" doesn't pre-fill context fields
Right-clicking an alert only pre-fills server name + metric name, not database/query text/wait type/job name. The AlertMuteContext constructor exists for rich pre-fill but is never called from the UI — the structured fields aren't on the display model. Users have to manually type pattern fields even though the detail info is visible in the alert.
This could be deferred to a follow-up, but worth noting the AlertMuteContext constructor is currently dead code.
Medium
4. Lite MuteRuleService skips the LockedConnection pattern
All DuckDB operations in Lite/Services/MuteRuleService.cs call _dbInitializer.CreateConnection() directly without AcquireReadLock()/AcquireWriteLock(). The established pattern in LocalDataService uses locks to coordinate with CHECKPOINT and archive operations. A concurrent CHECKPOINT during a mute rule write could cause issues.
5. Cache-first, persist-second risks silent data loss
Lite's MuteRuleService mutates the in-memory cache under lock, then does a best-effort async DuckDB write. If the write fails silently, the user's changes are lost on next app restart since LoadAsync() reloads from DuckDB.
Consider: attempt the DB write first, update the cache only on success. Or at minimum log a visible warning when persistence fails.
6. Poison wait mute only checks worst wait type
Lite/MainWindow.xaml.cs:1195 — The mute context only includes worst.WaitType. If a user mutes RESOURCE_SEMAPHORE but THREADPOOL is the worst wait, the mute check evaluates THREADPOOL and the alert fires. If the muted wait happens to be worst, other unmuted waits get suppressed too. This is a design choice but may surprise users.
7. MCP tools gap
get_alert_historyin both apps doesn't include the newmutedordetail_textcolumnsget_mute_rulesis missing from the README tools table (line 411)- README tool counts (lines 49, 278) need incrementing
Low / Cosmetic
8. Hardcoded color #33D97706 on MutedBanner in AlertDetailWindow — verify it looks right on Light/Cool Breeze themes.
9. Dashboard AlertDetailWindow missing Icon attribute (Lite sets /EDD.ico).
10. Dashboard MuteRuleService.Save() swallows all exceptions silently — should at least Logger.Error() them.
11. No automatic purge of expired rules — they accumulate until manual purge. Consider purging on startup in LoadAsync().
What's Done Well
- All 7 alert metrics covered with consistent mute check pattern
- Muted alerts still logged for full auditability
- Thread safety via
lock(_lock)with defensiveToList()copies - All DuckDB queries properly parameterized (no SQL injection)
- Schema migrations idempotent with
IF NOT EXISTS config_mute_rulescorrectly excluded from archivable tables- Context-sensitive dialog hides irrelevant fields per metric type
- Clean settings integration entry point
- Schema test updated correctly
Bug #1 is a blocker for existing Lite users upgrading. The rest can be follow-up items if needed.
Bugs: - Fix DuckDB migration v20 NOT NULL on ALTER TABLE (blocker for upgrades) - Fix Mute This Alert dialog showing 'Edit' title instead of 'Create' - Use AlertMuteContext constructor for proper field pre-fill path Medium: - Lite MuteRuleService now uses AcquireReadLock/AcquireWriteLock for CHECKPOINT coordination (matches LocalDataService pattern) - Restructure to persist-first: attempt DB write before updating cache, log warnings on failure instead of silently swallowing - Add poison wait mute limitation comment in both editions - Add muted and detail_text columns to get_alert_history MCP tool - Add get_mute_rules to README tools table, update tool counts Low/Cosmetic: - MutedBanner uses DynamicResource WarningColor instead of hardcoded color - Add missing Icon attribute to Dashboard AlertDetailWindow - Add Logger.Error() to Dashboard MuteRuleService.Save() catch block - Auto-purge expired mute rules on startup in both editions Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PR Review Feedback — Changes Made (commit
|
| File | Changes |
|---|---|
Dashboard/AlertDetailWindow.xaml |
Icon, theme-aware MutedBanner |
Dashboard/Controls/AlertsHistoryContent.xaml.cs |
Use AlertMuteContext constructor |
Dashboard/MainWindow.xaml.cs |
Poison wait limitation comment |
Dashboard/Mcp/McpAlertTools.cs |
Add muted + detail_text columns |
Dashboard/Services/MuteRuleService.cs |
Error logging in Save(), auto-purge |
Lite/Controls/AlertsHistoryTab.xaml.cs |
Use AlertMuteContext constructor |
Lite/Database/DuckDbInitializer.cs |
Fix NOT NULL on v20 migration |
Lite/MainWindow.xaml.cs |
Poison wait limitation comment |
Lite/Mcp/McpAlertTools.cs |
Add muted + detail_text columns |
Lite/Services/MuteRuleService.cs |
Locked connections, persist-first, logging, auto-purge |
Lite/Windows/AlertDetailWindow.xaml |
Theme-aware MutedBanner |
README.md |
Tool counts, get_mute_rules in tools table |
Build: 0 errors, 0 new warnings (all warnings are pre-existing). All 6 unit tests pass.
Resolved conflict in Lite/Controls/ServerTab.xaml.cs: kept our re-entrancy guard (_isRefreshing) and adopted upstream's fullRefresh: false parameter for timer-based refreshes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
DuckDB does not support NOT NULL on ALTER TABLE ADD COLUMN, which caused the v20 migration blocker. This test scans DuckDbInitializer.cs source code for violations, catching the mistake at build/CI time. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use regex with Singleline mode to match ADD COLUMN ... NOT NULL across line boundaries within the same SQL statement (delimited by semicolons). Also strips comments before scanning to avoid false positives. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
erikdarlingdata
left a comment
There was a problem hiding this comment.
Code Review — Feature/alert muting
Nice work on this feature, Hannah — the architecture is clean and the first round of fixes in 7dc5744 addressed a lot. Here's what I found in the current state:
BLOCKERS (4)
1. DynamicResource on SolidColorBrush.Color will crash at runtime
Both Dashboard/AlertDetailWindow.xaml:78 and Lite/Windows/AlertDetailWindow.xaml:78:
<SolidColorBrush Color="{DynamicResource WarningColor}" Opacity="0.2"/>WPF doesn't support DynamicResource on Freezable.Color — this throws InvalidOperationException when the MutedBanner becomes visible. Change to StaticResource.
2. CheckBox Enabled toggle persists the wrong value
Lite/Windows/ManageMuteRulesWindow.xaml:38 and Dashboard/ManageMuteRulesWindow.xaml:37 — the Click handler fires before the WPF binding pushes IsChecked back to rule.Enabled, so SetRuleEnabled(rule.Id, rule.Enabled) persists the pre-toggle value. Fix: read cb.IsChecked == true directly, or negate rule.Enabled in the handler.
3. Lite get_alert_history MCP tool queries config_alert_log instead of v_config_alert_log
Lite/Mcp/McpAlertTools.cs:40 — misses all archived alert history. Also omits AND dismissed = FALSE, so dismissed alerts leak into results. Should call dataService.GetAlertHistoryAsync() instead of duplicating raw SQL.
4. Dashboard get_alert_settings MCP tool — missing DI registration
Dashboard/Mcp/McpAlertTools.cs:22 — GetAlertSettings takes UserPreferencesService via DI, but it's never registered in McpHostService.cs. Will throw InvalidOperationException when called.
MEDIUM (9)
5. Edit dialog mutates the original MuteRule in-place (Dashboard)
Dashboard/MuteRuleDialog.xaml.cs:24-29 — Rule = existingRule stores a direct reference. If a future change adds intermediate writes or the user cancels after partial state, the live rule is corrupted. Should clone before editing.
6. "Mute Similar Alerts" opens in Edit mode (Lite)
Lite/Controls/AlertsHistoryTab.xaml.cs:463-468 — passes a new MuteRule{...} to the existingRule constructor, so the dialog title says "Edit Mute Rule" instead of "Create." Should use the AlertMuteContext constructor.
7. AddRuleAsync adds to memory even when DB persist fails (Lite)
Lite/Services/MuteRuleService.cs:101-116 — rule works until restart, then silently vanishes. User has no indication it didn't persist. Consider not adding to _rules on failure, or surfacing a warning.
8. LogAlertAsync fallback writes to DuckDB without AcquireWriteLock()
Lite/Services/EmailAlertService.cs:210-238 — the INSERT runs without lock coordination, can conflict with CHECKPOINT or archive operations.
9. High CPU alert missing detailText (Lite)
Lite/MainWindow.xaml.cs (CPU alert section) — all other alert types pass detailText to TrySendAlertEmailAsync, but High CPU omits it. The "View Details" dialog for muted CPU alerts shows no details.
10. No guard against "mute everything" rule
Both MuteRule.Matches() — a rule with all filter fields null matches every alert. Summary shows "(matches all alerts)" but there's no validation or confirmation dialog preventing accidental creation.
11. PurgeExpiredRulesAsync removes from memory before confirming DB delete (Lite)
Lite/Services/MuteRuleService.cs:205-234 — if DB delete fails, rules reappear as zombies on restart.
12. Null-forgiving contextMenu! (Dashboard)
Dashboard/Controls/AlertsHistoryContent.xaml.cs:465,477,499 — menuItem.Parent as ContextMenu can return null. Should add null checks instead of !.
13. Inconsistent get_alert_settings JSON shapes between Dashboard and Lite
Field names differ: notifications_enabled vs alerts_enabled, notify_connection_lost vs notify_connection_changes, threshold field names vary. MCP clients switching editions need different parsing logic.
LOW (7)
14. AlertDetailWindow has no Close button or Escape key support (both editions)
15. Editing a rule resets its expiration timer from "now" (both editions)
16. Schema difference: fresh installs have NOT NULL on muted/dismissed columns; migrations create them nullable (Lite) — reader code handles both, but schema diverges
17. Auto-purge on startup silently removes expired rules with no trace
18. Hidden pattern fields retain stale text when metric changes and field re-appears
19. MuteRule model duplicated between Dashboard and Lite (future sync burden)
20. Grid rows 14-15 defined but unused in Dashboard/MuteRuleDialog.xaml
Summary
The 4 blockers need fixes before merge — #1 and #2 will crash or behave incorrectly at runtime, #3 and #4 break MCP tools. The medium items are worth addressing but aren't ship-stoppers individually. The lows are cleanup/hardening.
Great feature overall — looking forward to the next round! 🎯
WPF does not support DynamicResource on Freezable properties like SolidColorBrush.Color. Changed to StaticResource in both editions' AlertDetailWindow.xaml to avoid potential runtime crash. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…n/PerformanceMonitor into feature/alert-muting
The Click handler fires before WPF binding updates rule.Enabled, so SetRuleEnabled was persisting the pre-toggle value. Read cb.IsChecked directly instead of rule.Enabled in both editions. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replaced raw SQL against config_alert_log with a call to dataService.GetAlertHistoryAsync(), which queries v_config_alert_log (includes archived parquet data) and filters out dismissed alerts. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The get_alert_settings MCP tool takes UserPreferencesService via DI, but it was never registered in McpHostService, causing InvalidOperationException when called. Added it to the constructor and DI registration. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… lock, CPU detailText, mute-all guard, purge ordering, null checks, JSON alignment 5. MuteRuleDialog now clones the rule before editing (both editions) 6. Mute Similar uses AlertMuteContext constructor (both editions) 7. AddRuleAsync skips memory-add on persist failure (Lite) 8. LogAlertAsync acquires write lock before INSERT (Lite) 9. CPU alert now passes detailText to TrySendAlertEmailAsync (Lite) 10. Save_Click warns when all filter fields are null (both editions) 11. PurgeExpiredRulesAsync deletes from DB before memory (Lite) 12. Null checks on ContextMenu cast in AlertsHistoryContent (Dashboard) 13. Aligned get_alert_settings JSON field names across editions (Lite) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Blockers fixed1. DynamicResource on Freezable.Color crash risk
2. CheckBox Enabled toggle persisting wrong value
3. Lite MCP get_alert_history queries wrong table
4. Dashboard get_alert_settings MCP tool — missing DI registration
5. Migration guard test improvement
Bug Fixes (9 issues addressed)6. Edit dialog mutates original MuteRule in-place (Dashboard & Lite)
7. "Mute Similar Alerts" opens in Edit mode (Lite & Dashboard)
8. AddRuleAsync adds to memory even when DB persist fails (Lite)
9. LogAlertAsync missing AcquireWriteLock (Lite)
10. High CPU alert missing detailText (Lite)
11. No guard against "mute everything" rule (Dashboard & Lite)
12. PurgeExpiredRulesAsync removes from memory before DB delete (Lite)
13. Null-forgiving contextMenu! (Dashboard)
14. Inconsistent get_alert_settings JSON shapes (Lite vs Dashboard)
Files changed (44 files, +3006 / -222 lines total across all commits) |
erikdarlingdata
left a comment
There was a problem hiding this comment.
Follow-up Review — All Fix Commits Verified
Checked all 5 new commits (8b2efd3, 40e5819, 8a20b75, 5cf7fce, 5a543cc). Every item from the original review has been addressed:
| # | Item | Status |
|---|---|---|
| 1 | StaticResource on MutedBanner |
✅ Fixed |
| 2 | CheckBox reads cb.IsChecked directly |
✅ Fixed |
| 3 | Lite MCP uses dataService.GetAlertHistoryAsync() |
✅ Fixed |
| 4 | UserPreferencesService DI registration |
✅ Fixed |
| 5 | Clone-on-edit in MuteRuleDialog | ✅ Fixed |
| 6 | Mute Similar uses AlertMuteContext constructor |
✅ Fixed |
| 7 | AddRuleAsync returns early on persist failure |
✅ Fixed |
| 8 | LogAlertAsync acquires write lock |
✅ Fixed |
| 9 | CPU alert passes detailText |
✅ Fixed |
| 10 | Mute-all guard with confirmation dialog | ✅ Fixed |
| 11 | Purge deletes from DB before memory | ✅ Fixed |
| 12 | Null checks on ContextMenu cast |
✅ Fixed |
| 13 | JSON field alignment | ✅ Not a bug — Dashboard and Lite have genuinely different config models (separate lost/restored toggles vs single connection-changes toggle, Dashboard-only SMTP health tracking). The MCP output correctly reflects each app's actual settings. |
One minor note
EmailAlertService.LogAlertAsync acquires the write lock after CreateConnection(), while MuteRuleService consistently acquires the lock before creating the connection. Not a functional issue (the connection object is inert until opened), just a pattern inconsistency. Optional cleanup.
Looks good to merge. Nice work on the thorough fixes! 👍
1. EmailAlertService.LogAlertAsync: acquire write lock BEFORE creating connection, matching the pattern used throughout the codebase. 2. Dashboard CPU alert: pass detailText to RecordAlert so the alert detail window shows CPU percentage and threshold. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Changes since last reviewSelf-review pass (code review + concurrency audit + cross-edition parity)Ran three automated review passes against the full diff. Found and fixed two real issues; dismissed two false positives. Bugs fixedLock ordering in
Dashboard CPU alert missing
False positives investigated and dismissedDashboard
Missing
Minor parity note (not a bug)
|
What does this PR do?
Adds configurable muting of Alerts. For full description, plan, and concise change-log see #493
Which component(s) does this affect?
How was this tested?
Manually tested both Dashboard and Lite per screenshots shown in #493
Checklist
dotnet build -c Debug)