Skip to content

Feature/alert muting#512

Merged
erikdarlingdata merged 33 commits intoerikdarlingdata:devfrom
HannahVernon:feature/alert-muting
Mar 11, 2026
Merged

Feature/alert muting#512
erikdarlingdata merged 33 commits intoerikdarlingdata:devfrom
HannahVernon:feature/alert-muting

Conversation

@HannahVernon
Copy link
Copy Markdown
Contributor

What does this PR do?

Adds configurable muting of Alerts. For full description, plan, and concise change-log see #493

Which component(s) does this affect?

  • [✅] Full Dashboard
  • [✅] Lite Dashboard
  • Lite Tests
  • SQL collection scripts
  • CLI Installer
  • GUI Installer
  • [✅] Documentation

How was this tested?

Manually tested both Dashboard and Lite per screenshots shown in #493

Checklist

  • [✅] I have read the contributing guide
  • [✅] My code builds with zero warnings (dotnet build -c Debug)
  • [✅] I have tested my changes against at least one SQL Server version
  • [✅] I have not introduced any hardcoded credentials or server names

HannahVernon and others added 18 commits March 9, 2026 12:36
Implement pattern-based and instance-based alert mute rules that suppress
notifications while continuing to log alerts for auditability.

Core:
- MuteRule model with AND-logic matching across server, metric, database,
  query text, wait type, and job name fields
- MuteRuleService with JSON persistence (Dashboard) and DuckDB persistence (Lite)
- Optional expiration (1h/24h/7d/permanent) with automatic skip on expired rules

Alert Pipeline:
- Mute checks wired into all 7 alert metrics in both editions
- Muted alerts logged with muted=true flag, notifications suppressed
- DuckDB schema v20: muted column on config_alert_log, config_mute_rules table

UI:
- MuteRuleDialog for creating/editing rules with match criteria and expiration
- ManageMuteRulesWindow with DataGrid listing, enable/disable/delete/purge
- Settings button in both editions to access mute rule management
- Right-click context menus in alert history: Mute This Alert / Mute Similar
- Muted alerts shown with reduced opacity and italic text in alert history

MCP:
- get_mute_rules tool added to both editions' MCP servers
- MuteRuleService registered in DI for MCP tool resolution

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Right-clicking an alert now shows 'View Details...' at the top of the
context menu, opening a read-only popup with all alert fields (time,
server, metric, values, notification type, status) and a muted indicator
banner. Implemented in both Dashboard and Lite editions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Store metric-specific detail text (query text, job names, wait types,
space breakdowns, etc.) alongside each alert log entry. The Alert Detail
Window now displays a Details section with rich contextual information
that varies by metric type:

- Long-Running Query: session ID, database, query text, CPU/reads/writes
- Long-Running Job: job name, duration, average/P95, % of average
- Blocking: blocked session count, longest wait duration
- Deadlock: new deadlock count
- Poison Wait: per-wait type breakdown (avg ms, delta, tasks)
- TempDB Space: usage breakdown by category

Both Dashboard and Lite editions updated with matching changes.
DuckDB schema bumped to v21 for the new detail_text column.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Clarify the difference between 'Mute This Alert' (server + metric)
and 'Mute Similar Alerts' (metric only, all servers). Add documentation
for the context-sensitive View Details popup window.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Double-clicking a row in the Alert History DataGrid now opens the
Alert Detail Window, matching the existing right-click context menu
behavior. Both Dashboard and Lite editions updated.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The Enabled checkbox in the Manage Mute Rules DataGrid was non-clickable
because the grid had IsReadOnly=True. Changed to IsReadOnly=False on the
grid with IsReadOnly=True on individual text columns. Added CellEditEnding
handler to persist the toggle via the MuteRuleService. Both editions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
CellEditEnding fires before the binding commits, so rule.Enabled still
held the old value when passed to SetRuleEnabled. Fixed by reading the
current state from the service and toggling the inverse instead.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replaced DataGridCheckBoxColumn with a DataGridTemplateColumn containing
a regular CheckBox. The WPF DataGrid requires the first click to enter
edit mode on DataGridCheckBoxColumn, making it feel unresponsive. A
templated CheckBox with a Click handler responds immediately and persists
the change directly via the service. Reverted grid to IsReadOnly=True
since the checkbox is now independent of the edit pipeline.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move BuildBlockingContextAsync and BuildDeadlockContextAsync calls before
RecordAlert in Dashboard so the rich context (session IDs, databases,
queries, wait times, lock modes) is stored as detail text instead of just
summary counts. Falls back to simple text if the context builder returns
null. Lite already had this behavior.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
MuteRuleDialog now shows only relevant pattern fields based on the
selected metric type:
- Blocking: Database, Query Text
- Deadlocks: Database
- High CPU / TempDB Space: no pattern fields
- Poison Wait: Wait Type
- Long-Running Query: Database, Query Text, Wait Type
- Long-Running Job: Job Name
- (any): all fields shown

Hidden fields are cleared on save to prevent stale matches.

ManageMuteRulesWindow Toggle and Delete buttons now return focus to the
rules grid and preserve selection index, enabling repeated operations
without re-clicking the list.

Both Dashboard and Lite editions updated.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Added missing metrics to the alert thresholds table (poison waits,
long-running queries, TempDB space, long-running agent jobs) and
documented the three poison wait types: THREADPOOL, RESOURCE_SEMAPHORE,
and RESOURCE_SEMAPHORE_QUERY_COMPILE.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Root cause: DataGrid SelectionChanged event bubbled up to the
TabControl's SelectionChanged handler, which called RefreshAlerts()
on every row click, replacing ItemsSource and clearing selection
before the UI could render it.

Fixes:
- Guard TabControl handler with OriginalSource check to ignore
  bubbled events from child controls
- Use visual tree hit-testing for double-click handlers instead of
  relying on SelectedItem (more robust in both editions)
- Defensive Owner handling for AlertDetailWindow

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1. Dashboard: Add OriginalSource guard to ServerTabControl_SelectionChanged
   to prevent child control SelectionChanged events from triggering the
   handler (same bubbling bug pattern fixed earlier in Lite).

2. Lite: Add re-entrancy guard to ServerTab refresh timer to prevent
   overlapping RefreshAllDataAsync calls if a refresh takes longer than
   the 60-second timer interval. Matches Dashboard's _isRefreshing pattern.

3. Dashboard: Add missing DefaultRowStyle to WaitDrillDownWindow resources
   so the DataGrid's RowStyle reference resolves and rows get the context
   menu.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Owner

@erikdarlingdata erikdarlingdata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: Alert Muting

Overall this is well-structured with full parity across both apps. Nice work on the context-sensitive dialog, the auditability (muted alerts still logged), and the MCP integration. A few things to address:


Bugs

1. [BLOCKER] DuckDB migration NOT NULL on ALTER TABLE
Lite/Database/DuckDbInitializer.cs:563 — The v20 migration uses BOOLEAN NOT NULL DEFAULT false, but DuckDB doesn't support NOT NULL on ALTER TABLE ADD COLUMN. There's even a comment in the v9 migration (line 424) warning about this exact issue. The catch block swallows the error so the column never gets added, and subsequent INSERTs that reference muted will fail.

Fix: Change to BOOLEAN DEFAULT false (drop the NOT NULL). The NOT NULL in Schema.CreateAlertLogTable for fresh installs is fine — this only affects the ALTER TABLE migration path.

2. "Mute This Alert" dialog title says "Edit Mute Rule"
Both Dashboard/Controls/AlertsHistoryContent.xaml.cs and Lite/Controls/AlertsHistoryTab.xaml.cs pass a pre-filled MuteRule to the MuteRuleDialog(MuteRule existingRule) constructor. Since existingRule != null, the dialog sets its title to "Edit Mute Rule" instead of "Create Mute Rule".

Fix: Either use the no-arg constructor and set fields after, add a bool isNew parameter, or use the AlertMuteContext constructor path.

3. "Mute This Alert" doesn't pre-fill context fields
Right-clicking an alert only pre-fills server name + metric name, not database/query text/wait type/job name. The AlertMuteContext constructor exists for rich pre-fill but is never called from the UI — the structured fields aren't on the display model. Users have to manually type pattern fields even though the detail info is visible in the alert.

This could be deferred to a follow-up, but worth noting the AlertMuteContext constructor is currently dead code.


Medium

4. Lite MuteRuleService skips the LockedConnection pattern
All DuckDB operations in Lite/Services/MuteRuleService.cs call _dbInitializer.CreateConnection() directly without AcquireReadLock()/AcquireWriteLock(). The established pattern in LocalDataService uses locks to coordinate with CHECKPOINT and archive operations. A concurrent CHECKPOINT during a mute rule write could cause issues.

5. Cache-first, persist-second risks silent data loss
Lite's MuteRuleService mutates the in-memory cache under lock, then does a best-effort async DuckDB write. If the write fails silently, the user's changes are lost on next app restart since LoadAsync() reloads from DuckDB.

Consider: attempt the DB write first, update the cache only on success. Or at minimum log a visible warning when persistence fails.

6. Poison wait mute only checks worst wait type
Lite/MainWindow.xaml.cs:1195 — The mute context only includes worst.WaitType. If a user mutes RESOURCE_SEMAPHORE but THREADPOOL is the worst wait, the mute check evaluates THREADPOOL and the alert fires. If the muted wait happens to be worst, other unmuted waits get suppressed too. This is a design choice but may surprise users.

7. MCP tools gap

  • get_alert_history in both apps doesn't include the new muted or detail_text columns
  • get_mute_rules is missing from the README tools table (line 411)
  • README tool counts (lines 49, 278) need incrementing

Low / Cosmetic

8. Hardcoded color #33D97706 on MutedBanner in AlertDetailWindow — verify it looks right on Light/Cool Breeze themes.

9. Dashboard AlertDetailWindow missing Icon attribute (Lite sets /EDD.ico).

10. Dashboard MuteRuleService.Save() swallows all exceptions silently — should at least Logger.Error() them.

11. No automatic purge of expired rules — they accumulate until manual purge. Consider purging on startup in LoadAsync().


What's Done Well

  • All 7 alert metrics covered with consistent mute check pattern
  • Muted alerts still logged for full auditability
  • Thread safety via lock(_lock) with defensive ToList() copies
  • All DuckDB queries properly parameterized (no SQL injection)
  • Schema migrations idempotent with IF NOT EXISTS
  • config_mute_rules correctly excluded from archivable tables
  • Context-sensitive dialog hides irrelevant fields per metric type
  • Clean settings integration entry point
  • Schema test updated correctly

Bug #1 is a blocker for existing Lite users upgrading. The rest can be follow-up items if needed.

erikdarlingdata

This comment was marked as resolved.

Bugs:
- Fix DuckDB migration v20 NOT NULL on ALTER TABLE (blocker for upgrades)
- Fix Mute This Alert dialog showing 'Edit' title instead of 'Create'
- Use AlertMuteContext constructor for proper field pre-fill path

Medium:
- Lite MuteRuleService now uses AcquireReadLock/AcquireWriteLock for
  CHECKPOINT coordination (matches LocalDataService pattern)
- Restructure to persist-first: attempt DB write before updating cache,
  log warnings on failure instead of silently swallowing
- Add poison wait mute limitation comment in both editions
- Add muted and detail_text columns to get_alert_history MCP tool
- Add get_mute_rules to README tools table, update tool counts

Low/Cosmetic:
- MutedBanner uses DynamicResource WarningColor instead of hardcoded color
- Add missing Icon attribute to Dashboard AlertDetailWindow
- Add Logger.Error() to Dashboard MuteRuleService.Save() catch block
- Auto-purge expired mute rules on startup in both editions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@HannahVernon
Copy link
Copy Markdown
Contributor Author

PR Review Feedback — Changes Made (commit 7dc5744)

All 11 items from the review have been addressed. Here's a summary organized by severity:

Bug Fixes

1. [BLOCKER] DuckDB NOT NULL on ALTER TABLE (DuckDbInitializer.cs:563)
Changed v20 migration from BOOLEAN NOT NULL DEFAULT false to BOOLEAN DEFAULT false. DuckDB does not support NOT NULL on ALTER TABLE ADD COLUMN — this would crash on upgrade. Matches the pattern established in the v9 migration (line 424) which already had a comment about this limitation.

2. Dialog title showing "Edit" instead of "Create" (both editions)
Changed MuteThisAlert_Click in both AlertsHistoryContent.xaml.cs (Dashboard) and AlertsHistoryTab.xaml.cs (Lite) to construct MuteRuleDialog with an AlertMuteContext instead of a MuteRule. This uses the context-aware constructor which sets the title to "Create Mute Rule" and pre-fills available fields.

3. Context pre-fill / dead code (both editions)
By switching to the AlertMuteContext constructor (item #2 above), the pre-fill code path is now exercised — ServerName and MetricName are populated from the alert row. Richer field extraction from DetailText (database, query text, etc.) is deferred to a follow-up as suggested.

Medium Priority

4. Lite MuteRuleService — missing locked connections
Added AcquireReadLock() / AcquireWriteLock() to all DuckDB operations in Lite's MuteRuleService.cs, matching the LockedConnection pattern used throughout LocalDataService. This prevents CHECKPOINT from reorganizing the database file while mute rule operations are in progress.

5. Persist-first pattern (Lite MuteRuleService)
Restructured all mutation methods (AddRuleAsync, RemoveRuleAsync, UpdateRuleAsync, SetRuleEnabledAsync) to attempt the DB write first, then update the in-memory cache. Failures are logged via AppLogger.Warn() so the feature still works for the current session even if DuckDB persistence fails.

6. Poison wait mute limitation comment (both editions)
Added explanatory comment in both MainWindow.xaml.cs files (Dashboard line ~1390, Lite line ~1195) documenting that the mute check uses the worst triggered wait type, and that muting a non-worst type won't suppress the alert.

7. MCP tool gaps (both editions + README)

  • Added muted and detail_text columns to get_alert_history results in both McpAlertTools.cs files
  • Added get_mute_rules to the Alerts row in the README tools table
  • Updated tool counts: Dashboard 27→28, Lite 31→32

Low Priority / Cosmetic

8. MutedBanner theme color (both editions)
Replaced hardcoded #33D97706 background with a theme-aware SolidColorBrush using {DynamicResource WarningColor} at 0.2 opacity in both AlertDetailWindow.xaml files. This ensures the muted banner renders correctly across Dark, Light, and CoolBreeze themes.

9. Dashboard AlertDetailWindow icon
Added Icon="/EDD.ico" to Dashboard/AlertDetailWindow.xaml to match the application icon shown on other windows.

10. Dashboard MuteRuleService.Save() error logging
Replaced silent catch {} with catch (Exception ex) { Helpers.Logger.Error(...) } in the Dashboard MuteRuleService.Save() method so persistence failures are logged.

11. Auto-purge expired rules on startup

  • Dashboard: PurgeExpiredRules() is called after Load() in the MuteRuleService constructor
  • Lite: PurgeExpiredRulesAsync() is called at the end of LoadAsync()

Files Changed (12 files, +112 / -65)

File Changes
Dashboard/AlertDetailWindow.xaml Icon, theme-aware MutedBanner
Dashboard/Controls/AlertsHistoryContent.xaml.cs Use AlertMuteContext constructor
Dashboard/MainWindow.xaml.cs Poison wait limitation comment
Dashboard/Mcp/McpAlertTools.cs Add muted + detail_text columns
Dashboard/Services/MuteRuleService.cs Error logging in Save(), auto-purge
Lite/Controls/AlertsHistoryTab.xaml.cs Use AlertMuteContext constructor
Lite/Database/DuckDbInitializer.cs Fix NOT NULL on v20 migration
Lite/MainWindow.xaml.cs Poison wait limitation comment
Lite/Mcp/McpAlertTools.cs Add muted + detail_text columns
Lite/Services/MuteRuleService.cs Locked connections, persist-first, logging, auto-purge
Lite/Windows/AlertDetailWindow.xaml Theme-aware MutedBanner
README.md Tool counts, get_mute_rules in tools table

Build: 0 errors, 0 new warnings (all warnings are pre-existing). All 6 unit tests pass.

HannahVernon and others added 7 commits March 10, 2026 20:10
Resolved conflict in Lite/Controls/ServerTab.xaml.cs: kept our
re-entrancy guard (_isRefreshing) and adopted upstream's
fullRefresh: false parameter for timer-based refreshes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
DuckDB does not support NOT NULL on ALTER TABLE ADD COLUMN, which
caused the v20 migration blocker. This test scans DuckDbInitializer.cs
source code for violations, catching the mistake at build/CI time.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use regex with Singleline mode to match ADD COLUMN ... NOT NULL across
line boundaries within the same SQL statement (delimited by semicolons).
Also strips comments before scanning to avoid false positives.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Owner

@erikdarlingdata erikdarlingdata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review — Feature/alert muting

Nice work on this feature, Hannah — the architecture is clean and the first round of fixes in 7dc5744 addressed a lot. Here's what I found in the current state:


BLOCKERS (4)

1. DynamicResource on SolidColorBrush.Color will crash at runtime
Both Dashboard/AlertDetailWindow.xaml:78 and Lite/Windows/AlertDetailWindow.xaml:78:

<SolidColorBrush Color="{DynamicResource WarningColor}" Opacity="0.2"/>

WPF doesn't support DynamicResource on Freezable.Color — this throws InvalidOperationException when the MutedBanner becomes visible. Change to StaticResource.

2. CheckBox Enabled toggle persists the wrong value
Lite/Windows/ManageMuteRulesWindow.xaml:38 and Dashboard/ManageMuteRulesWindow.xaml:37 — the Click handler fires before the WPF binding pushes IsChecked back to rule.Enabled, so SetRuleEnabled(rule.Id, rule.Enabled) persists the pre-toggle value. Fix: read cb.IsChecked == true directly, or negate rule.Enabled in the handler.

3. Lite get_alert_history MCP tool queries config_alert_log instead of v_config_alert_log
Lite/Mcp/McpAlertTools.cs:40 — misses all archived alert history. Also omits AND dismissed = FALSE, so dismissed alerts leak into results. Should call dataService.GetAlertHistoryAsync() instead of duplicating raw SQL.

4. Dashboard get_alert_settings MCP tool — missing DI registration
Dashboard/Mcp/McpAlertTools.cs:22GetAlertSettings takes UserPreferencesService via DI, but it's never registered in McpHostService.cs. Will throw InvalidOperationException when called.


MEDIUM (9)

5. Edit dialog mutates the original MuteRule in-place (Dashboard)
Dashboard/MuteRuleDialog.xaml.cs:24-29Rule = existingRule stores a direct reference. If a future change adds intermediate writes or the user cancels after partial state, the live rule is corrupted. Should clone before editing.

6. "Mute Similar Alerts" opens in Edit mode (Lite)
Lite/Controls/AlertsHistoryTab.xaml.cs:463-468 — passes a new MuteRule{...} to the existingRule constructor, so the dialog title says "Edit Mute Rule" instead of "Create." Should use the AlertMuteContext constructor.

7. AddRuleAsync adds to memory even when DB persist fails (Lite)
Lite/Services/MuteRuleService.cs:101-116 — rule works until restart, then silently vanishes. User has no indication it didn't persist. Consider not adding to _rules on failure, or surfacing a warning.

8. LogAlertAsync fallback writes to DuckDB without AcquireWriteLock()
Lite/Services/EmailAlertService.cs:210-238 — the INSERT runs without lock coordination, can conflict with CHECKPOINT or archive operations.

9. High CPU alert missing detailText (Lite)
Lite/MainWindow.xaml.cs (CPU alert section) — all other alert types pass detailText to TrySendAlertEmailAsync, but High CPU omits it. The "View Details" dialog for muted CPU alerts shows no details.

10. No guard against "mute everything" rule
Both MuteRule.Matches() — a rule with all filter fields null matches every alert. Summary shows "(matches all alerts)" but there's no validation or confirmation dialog preventing accidental creation.

11. PurgeExpiredRulesAsync removes from memory before confirming DB delete (Lite)
Lite/Services/MuteRuleService.cs:205-234 — if DB delete fails, rules reappear as zombies on restart.

12. Null-forgiving contextMenu! (Dashboard)
Dashboard/Controls/AlertsHistoryContent.xaml.cs:465,477,499menuItem.Parent as ContextMenu can return null. Should add null checks instead of !.

13. Inconsistent get_alert_settings JSON shapes between Dashboard and Lite
Field names differ: notifications_enabled vs alerts_enabled, notify_connection_lost vs notify_connection_changes, threshold field names vary. MCP clients switching editions need different parsing logic.


LOW (7)

14. AlertDetailWindow has no Close button or Escape key support (both editions)
15. Editing a rule resets its expiration timer from "now" (both editions)
16. Schema difference: fresh installs have NOT NULL on muted/dismissed columns; migrations create them nullable (Lite) — reader code handles both, but schema diverges
17. Auto-purge on startup silently removes expired rules with no trace
18. Hidden pattern fields retain stale text when metric changes and field re-appears
19. MuteRule model duplicated between Dashboard and Lite (future sync burden)
20. Grid rows 14-15 defined but unused in Dashboard/MuteRuleDialog.xaml


Summary

The 4 blockers need fixes before merge — #1 and #2 will crash or behave incorrectly at runtime, #3 and #4 break MCP tools. The medium items are worth addressing but aren't ship-stoppers individually. The lows are cleanup/hardening.

Great feature overall — looking forward to the next round! 🎯

HannahVernon and others added 6 commits March 11, 2026 13:36
WPF does not support DynamicResource on Freezable properties like
SolidColorBrush.Color. Changed to StaticResource in both editions'
AlertDetailWindow.xaml to avoid potential runtime crash.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The Click handler fires before WPF binding updates rule.Enabled, so
SetRuleEnabled was persisting the pre-toggle value. Read cb.IsChecked
directly instead of rule.Enabled in both editions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replaced raw SQL against config_alert_log with a call to
dataService.GetAlertHistoryAsync(), which queries v_config_alert_log
(includes archived parquet data) and filters out dismissed alerts.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The get_alert_settings MCP tool takes UserPreferencesService via DI,
but it was never registered in McpHostService, causing
InvalidOperationException when called. Added it to the constructor
and DI registration.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… lock, CPU detailText, mute-all guard, purge ordering, null checks, JSON alignment

5. MuteRuleDialog now clones the rule before editing (both editions)
6. Mute Similar uses AlertMuteContext constructor (both editions)
7. AddRuleAsync skips memory-add on persist failure (Lite)
8. LogAlertAsync acquires write lock before INSERT (Lite)
9. CPU alert now passes detailText to TrySendAlertEmailAsync (Lite)
10. Save_Click warns when all filter fields are null (both editions)
11. PurgeExpiredRulesAsync deletes from DB before memory (Lite)
12. Null checks on ContextMenu cast in AlertsHistoryContent (Dashboard)
13. Aligned get_alert_settings JSON field names across editions (Lite)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@HannahVernon
Copy link
Copy Markdown
Contributor Author

Blockers fixed

1. DynamicResource on Freezable.Color crash risk

  • Changed DynamicResource WarningColor to StaticResource WarningColor on SolidColorBrush.Color in both AlertDetailWindow.xaml files — WPF doesn't support DynamicResource on Freezable properties

2. CheckBox Enabled toggle persisting wrong value

  • EnabledCheckBox_Click handler now reads cb.IsChecked == true instead of rule.Enabled, which hadn't been updated by the WPF binding at Click time

3. Lite MCP get_alert_history queries wrong table

  • Replaced raw SQL against config_alert_log with call to dataService.GetAlertHistoryAsync(), which queries v_config_alert_log (includes archived parquet data) and filters dismissed = FALSE

4. Dashboard get_alert_settings MCP tool — missing DI registration

  • Added UserPreferencesService to McpHostService constructor and DI container registration

5. Migration guard test improvement

  • Updated Migrations_DoNotUseNotNullOnAlterTableAddColumn test to detect multi-line SQL statements using regex with Singleline mode, not just per-line scanning

Bug Fixes (9 issues addressed)

6. Edit dialog mutates original MuteRule in-place (Dashboard & Lite)

  • Added MuteRule.Clone() method to both editions
  • Edit dialog now clones the rule before modifying, preventing corruption of the live rule if the dialog is cancelled after partial state changes

7. "Mute Similar Alerts" opens in Edit mode (Lite & Dashboard)

  • Changed from MuteRuleDialog(new MuteRule{...}) (which shows "Edit Mute Rule") to MuteRuleDialog(new AlertMuteContext{...}) (which shows "Create Mute Rule")

8. AddRuleAsync adds to memory even when DB persist fails (Lite)

  • Now returns early on persist failure instead of adding to memory — rule won't silently vanish on restart

9. LogAlertAsync missing AcquireWriteLock (Lite)

  • EmailAlertService.LogAlertAsync INSERT now acquires write lock from DuckDbInitializer, preventing conflicts with CHECKPOINT or archive operations

10. High CPU alert missing detailText (Lite)

  • CPU alert now passes detailText to TrySendAlertEmailAsync, consistent with all other alert types

11. No guard against "mute everything" rule (Dashboard & Lite)

  • Added confirmation dialog in Save_Click when all filter fields are null: "This rule has no filters and will mute ALL alerts. Are you sure?"

12. PurgeExpiredRulesAsync removes from memory before DB delete (Lite)

  • Reordered: DB delete now happens first, memory removal only on success — prevents zombie rules reappearing after restart on DB failure

13. Null-forgiving contextMenu! (Dashboard)

  • Added proper null checks for menuItem.Parent as ContextMenu in ViewAlertDetails_Click, MuteThisAlert_Click, and MuteSimilarAlerts_Click

14. Inconsistent get_alert_settings JSON shapes (Lite vs Dashboard)

  • Aligned Lite field names to match Dashboard: alerts_enablednotifications_enabled, thresholdthreshold_percent (CPU) / threshold_seconds (blocking)

Files changed (44 files, +3006 / -222 lines total across all commits)

Copy link
Copy Markdown
Owner

@erikdarlingdata erikdarlingdata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up Review — All Fix Commits Verified

Checked all 5 new commits (8b2efd3, 40e5819, 8a20b75, 5cf7fce, 5a543cc). Every item from the original review has been addressed:

# Item Status
1 StaticResource on MutedBanner ✅ Fixed
2 CheckBox reads cb.IsChecked directly ✅ Fixed
3 Lite MCP uses dataService.GetAlertHistoryAsync() ✅ Fixed
4 UserPreferencesService DI registration ✅ Fixed
5 Clone-on-edit in MuteRuleDialog ✅ Fixed
6 Mute Similar uses AlertMuteContext constructor ✅ Fixed
7 AddRuleAsync returns early on persist failure ✅ Fixed
8 LogAlertAsync acquires write lock ✅ Fixed
9 CPU alert passes detailText ✅ Fixed
10 Mute-all guard with confirmation dialog ✅ Fixed
11 Purge deletes from DB before memory ✅ Fixed
12 Null checks on ContextMenu cast ✅ Fixed
13 JSON field alignment ✅ Not a bug — Dashboard and Lite have genuinely different config models (separate lost/restored toggles vs single connection-changes toggle, Dashboard-only SMTP health tracking). The MCP output correctly reflects each app's actual settings.

One minor note

EmailAlertService.LogAlertAsync acquires the write lock after CreateConnection(), while MuteRuleService consistently acquires the lock before creating the connection. Not a functional issue (the connection object is inert until opened), just a pattern inconsistency. Optional cleanup.

Looks good to merge. Nice work on the thorough fixes! 👍

1. EmailAlertService.LogAlertAsync: acquire write lock BEFORE creating
   connection, matching the pattern used throughout the codebase.
2. Dashboard CPU alert: pass detailText to RecordAlert so the alert
   detail window shows CPU percentage and threshold.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@HannahVernon
Copy link
Copy Markdown
Contributor Author

Changes since last review

Self-review pass (code review + concurrency audit + cross-edition parity)

Ran three automated review passes against the full diff. Found and fixed two real issues; dismissed two false positives.


Bugs fixed

Lock ordering in Lite/Services/EmailAlertService.cs:LogAlertAsync

  • Write lock was acquired AFTER creating the DuckDB connection, violating the lock-then-connect pattern used everywhere else in the codebase
  • Swapped the two lines so AcquireWriteLock() comes before CreateConnection()

Dashboard CPU alert missing detailText

False positives investigated and dismissed

Dashboard MuteRuleService.Save() "race condition"

  • Automated review flagged Save() as unprotected, but all callers invoke it from inside lock(_lock)_rules is fully protected during JSON serialization. Not a bug.

Missing ApplyDefaultExpiration() in AlertMuteContext constructor

  • That feature lives on feature/alert-muting-part-2, not this PR. Not applicable here.

Minor parity note (not a bug)

  • Lite MCP get_alert_settings uses notify_connection_changes (single toggle) vs Dashboard uses notify_connection_lost + notify_connection_restored (two toggles). This reflects a genuine feature difference between editions — Lite has a single setting, Dashboard has granular control.

@erikdarlingdata erikdarlingdata merged commit 5609d83 into erikdarlingdata:dev Mar 11, 2026
3 checks passed
This was referenced Mar 12, 2026
@erikdarlingdata erikdarlingdata mentioned this pull request Mar 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants