v3.0.0 - A Month Of Sundays

@erikdarlingdata erikdarlingdata released this 16 Jun 23:41
1dad2f6

[3.0.0] - 2026-06-16

Important

  • Major release — 2.11.0 → 3.0.0, no breaking changes. This version rolls up a codebase-wide correctness and security hardening pass (the code-review-* series spanning the SQL schema, collectors, and views, the installer, the Lite and Dashboard services, and the shared libraries); a major UI-responsiveness overhaul that moves the data path off the WPF dispatcher in both apps; new object- and index-level collection (per-table / per-index size, growth, usage, and locking/contention); the rebuilt Recommendations / Apply Fix engine (advise-and-act, with safe and destructive fixes appliable behind informed two-sided consent); and a batch of smaller fixes and features. Nothing here is a breaking change — existing installations upgrade in place via upgrades/2.11.0-to-3.0.0/ (typed blocked-process columns, a nullable host-CPU column, the TRANSACTION_MUTEX ignored wait, and new server-health columns), and the Dashboard and Lite apps auto-update over the top

Fixed

  • Lite and Dashboard: Azure SQL Database shows its real product name in FinOps → Server Inventory — the Edition column displayed the legacy SQL Azure value that SERVERPROPERTY('Edition') returns for Azure SQL DB; it now reads Azure SQL Database (<service tier>) (e.g. Azure SQL Database (General Purpose)), derived from DATABASEPROPERTYEX(DB_NAME(), 'Edition'), for any engine-edition-5 instance. Normalized at every edition display/storage site across both apps — the live inventory queries (Lite + Dashboard) and the SQL-side collectors (install/42, install/53) plus Lite's server_properties collector — so the value is consistent app-wide; on-prem editions are unchanged. (The licensing-recommendation queries are left raw and identical in both apps: they only do an Enterprise substring check and never display the edition for Azure.)

  • Dashboard: "Deadlocks Cleared" no longer flaps right after every deadlock (#1091) — deadlock detection is edge-triggered off a delta against the cumulative perfmon counter, so the check immediately after a deadlock saw a zero delta and fired a "Deadlocks Cleared" notification ~one interval (≈60s) after every "Deadlock Detected". The alert now stays active and clears only once a deadlock-quiet window (1 hour) has elapsed since the last new deadlock, so the detect/clear pair lines up with Lite, whose rolling 1-hour count drains about an hour after the last deadlock. Each new deadlock resets the window. The clear message is now "No deadlocks in the last hour" (was "No deadlocks since last check"). Covered by DeadlockAlertClearPolicyTests

  • Lite: blocking and deadlock alerts no longer re-fire for the same events every cooldown (#1091) — the overview alert engine treated the blocking and deadlock counts as a level: each check compared the rolling 1-hour count against the threshold, so a single deadlock (or blocked-process report) kept the count above the threshold for the whole hour it lingered in the window, and the alert re-fired every cooldown (the reporter saw the same "2 deadlocks in the last hour" notification every five minutes for an hour). The Dashboard already edge-triggers off a delta; Lite now does too. Both alerts are gated by a new RollingCountAlertGate that fires only when the rolling count climbs above the count recorded at the last fired alert — a genuinely new event. The watermark decays as old events age out of the window (so a later rise re-alerts), resets when the window empties, and advances only when an alert actually fires (so an event arriving during a cooldown is reported once the cooldown elapses rather than being swallowed). Covered by RollingCountAlertGateTests

  • Lite and Dashboard: low-disk (Volume Free Space) alert no longer re-fires every cooldown for a standing full volume — a breached volume is a sustained condition, but the alert engine treated free space as a level and re-fired every AlertCooldownMinutes (default 5) for as long as the volume stayed below threshold. Besides the repeated tray/email, every cycle wrote a fresh Alert-History row, so dismissing the alert appeared not to work — the dismissed row was immediately replaced by an identical, newer one. The alert is now gated by a shared LowDiskAlertGate that notifies only on a fresh breach or one that has worsened by at least 1 percentage point of free space below the last-alerted level, and clears its watermark when the volume recovers — mirroring the failed-job watermark and the #1091 rolling-count edge trigger. Fixed identically in both apps. Covered by LowDiskAlertGateTests

  • Lite and Dashboard: low-disk and failed-Agent-job conditions now light the server tab badge (#754/#749) — the per-server tab badge was driven only by blocking, deadlocks, CPU, and memory, so a server whose only problem was a full volume or a failed Agent job showed no tab indicator — you couldn't tell which server was affected at a glance (the alert surfaced only as a one-shot tray toast and an Alert-History row). Both apps now fold the alert engine's active low-disk / failed-job state into the badge: it lights while the breach (or a failure within the lookback window) is active, auto-clears when the disk recovers or the failure ages out, and acknowledges/silences exactly like the other badges. This is the persistent-indicator complement to the low-disk re-fire fix above — alert once, then stay quietly flagged instead of re-nagging. Covered by AlertBadgeConditionTests

  • Lite: blocking/deadlock XE sessions now self-heal and failures are surfaced (#1086) — the PerformanceMonitor_BlockedProcess and PerformanceMonitor_Deadlock Extended Events sessions were created only when a server tab was opened; the recurring background collection loop never created or retried them. A server monitored without an open tab (e.g. app minimized to tray after a restart), or a first attempt that failed (connection not ready, missing ALTER ANY EVENT SESSION), left blocking/deadlock capture permanently dead — while the collectors read the non-existent ring buffer, got zero rows, and reported OK. The session ensure now runs inside the collector itself on every cycle (cheap existence check once created), so both the tab-open path and the background loop create/start/retry it. A failed ensure can no longer be masked: it fails the collector run, shows in the status-bar collector health (including permission failures, which previously didn't count as "erroring"), and fires a one-time tray notification ("Capture Not Running") on the transition. The Azure SQL DB database-scoped sessions also gain STARTUP_STATE = ON so they restart automatically after a failover

  • Dashboard: blocking/deadlock XE sessions self-heal, Azure SQL DB sessions are actually created, and a missing session raises a Capture Down alert — same silent-failure family as #1086, worse on the Dashboard side. (1) The server-scoped sessions were created once at install and never re-ensured: if later stopped or dropped, collect.blocked_process_xml_collector and collect.deadlock_xml_collector swallowed the missing-session error and logged SUCCESS with zero rows forever. Both procs now ensure (create/start) the session at the top of every run. (2) On Azure SQL DB, the code comments claimed the database-scoped sessions were "auto-created by the collection procedures" — nothing anywhere created them, so blocking/deadlock capture was 100% non-functional on Azure SQL DB; the procs now create and start them (database_xml_deadlock_report for deadlocks — the Azure read also filtered on the wrong event name and would have returned nothing even with a session present). (3) Honest logging: when the session is genuinely absent and can't be created (typically missing ALTER ANY EVENT SESSION on-prem / CREATE ANY DATABASE EVENT SESSION on Azure SQL DB), the run logs SESSION_MISSING with the real error instead of SUCCESS. (4) The alert engine reads that status and raises a Capture Down alert through the standard pipeline — snoozable tray notification, email, webhook, alert history, cooldown, and mute — with a Capture Restored clear when the session comes back. Note: on Azure SQL DB the blocked-process threshold cannot be set via sp_configure and Microsoft documents no default, so the blocked-process session may exist yet capture nothing there; deadlock capture has no such dependency

  • Blocked-process and deadlock XML processors no longer loop on un-parseable events — the second-phase parsers (collect.process_blocked_process_xml → sp_HumanEventsBlockViewer, and collect.process_deadlock_xml → sp_BlitzLock) only marked a captured event processed when the parse produced at least one row. Events that legitimately yield zero — a self-block or non-lock wait (e.g. a memory-grant RESOURCE_SEMAPHORE wait that tripped blocked_process_threshold, which SQL Server reports as a session blocked by itself), or a deadlock graph the parser can't reconstruct — were never marked, so every collection cycle re-ran the CPU-intensive parser over the same dead events and re-logged a perpetual NO_RESULTS while the staging table never drained. Both processors now mark events processed after any clean parse run and log SUCCESS; genuine parse failures still roll back and retry. Separately, the blocked-process processor's parse window was half-open (event_time < @end_date), so a batch of reports sharing one timestamp — the common case, since a blocked-process monitor loop emits every report at a single instant — fell outside [MIN, MAX) and was silently dropped; the upper bound is now inclusive (matching the deadlock processor). Covered by tools/test_blocked_process_processor.sql using real self-block and two-session samples

  • Lite and Dashboard UI no longer goes blank or disappears after sleep/wake (#1050) — closing a laptop lid (or locking the screen) and then resuming could leave the app running with no usable window: notifications kept firing but the window was gone from the desktop and taskbar, and relaunching showed an empty window until a full exit/restart. Two causes, both fixed. (1) WPF's GPU render thread can lose its rendering surface across a sleep/wake or RDP reconnect and never recover, leaving a live-but-blank window; both apps now use software rendering (RenderOptions.ProcessRenderMode = SoftwareOnly) to remove the GPU dependency — charts are unaffected because ScottPlot already renders via SkiaSharp. (2) When Windows turned the sleep-driven minimize into a hidden window, the minimize-to-tray logic left it hidden with no automatic way back; a new shared resume guard now restores the window from the tray on resume/unlock if it was visible beforehand (a window the user deliberately sent to the tray is left alone)

  • "Silence All Alerts" now suppresses email too (#1035) — right-clicking a monitored instance and choosing Silence All Alerts hid tray notifications and Alerts-tab badges, but two email paths ignored the silenced state and kept sending: connection up/down emails (Server Unreachable / Server Restored) and analysis-finding emails (the narrative findings from the analysis engine, which include CPU/memory/blocking stories). Only the threshold-alert path (High CPU, blocking, deadlocks, etc.) honored silencing. Both gaps are closed — a silenced server now produces no tray, email, or alert-history row from any path. The analysis path was the likely source of the reporter's "High CPU" email, since the threshold-based High CPU alert was already suppressed. The shared AnalysisNotificationService (used by Lite too) gains an optional per-server silence predicate; Lite has no silencing feature and passes none

  • Dashboard time labels are now consistently 24-hour (#1012) — the time-range header at the top of each tab (e.g. "Original: May 28, 11:30 PM – May 29, 1:30 AM (PST)") and the Query Performance heatmap x-axis tick labels used h:mm tt, while every other timestamp in the app (footer "Last refresh", DataGrid columns, slicer, tooltips, logs) already used 24-hour HH:mm/HH:mm:ss. The AM/PM marker was also being truncated in the column shown by the reporter. Normalized the four outliers to HH:mm to match the rest of the app. The Lite heatmap had the same h:mm tt straggler — fixed alongside

  • Lite UI no longer freezes during archival (#979) — archival held DuckDB's exclusive write lock across the entire export-to-Parquet step, blocking every UI query (tab switches showed the spinning wheel, worse with more monitored servers). Export-to-Parquet only reads the database, so it now runs under a shared read lock concurrently with the UI; only the brief DELETE takes the exclusive write lock

  • Lite FinOps no longer recommends an edition downgrade on an Availability Group secondary (#980) — the licensing recommendations suggested "downgrade to Standard to save $X/mo" for any Enterprise instance, with no AG awareness. On a secondary replica that advice is misleading — every replica in an AG must run the same edition. FinOps now detects the AG replica role and, on a secondary, shows an informational note instead of the downgrade/savings estimate

  • Lite alert emails no longer re-fire after an app restart (#981) — the per-metric email cooldown lived only in memory, so restarting Lite cleared it and an alert sent minutes earlier could be sent again immediately. The cooldown is now seeded from config_alert_log (the most recent successful send for that server/metric) the first time each alert is evaluated, so it survives restarts

  • Dashboard alert emails no longer re-fire after an app restart — brings Dashboard EmailAlertService to parity with the Lite-side persistence introduced in #981. The cooldown is now seeded from the in-memory alert log (loaded from alert_history.json on startup) the first time each {serverId}:{metricName} key is evaluated

  • Analysis-finding notification cooldowns now persist across restarts on both Lite and Dashboard — the per-finding re-notification cooldown in AnalysisNotificationService lived only in memory, so restarting either app cleared it and a finding that had just fired (and entered its AnalysisNotifyCooldownMinutes cooldown) could re-notify immediately. The cooldown now seeds lazily from the alert log (Lite: config_alert_log; Dashboard: alert_history.json) on first lookup per finding, mirroring the email-cooldown pattern from #981. Entries past 2× the cooldown window are pruned on each notify cycle so the dictionary stays bounded

  • Data Retention job no longer fails with xp_delete_file error 22049 (#972) — the trace-file cleanup added in v2.11.0 passed a wildcard path to xp_delete_file, raising an uncatchable Msg 22049 that failed the entire PerformanceMonitor - Data Retention Agent job on every run once any Monitor_LongQueries_*.trc files existed. xp_delete_file also cannot delete .trc files at all — it only accepts SQL Server backup files and Maintenance Plan report files — so that cleanup step has been removed from config.data_retention

  • Codebase-wide correctness and security hardening pass — a broad review (the code-review-* PR series, #1093–#1108) fixed defects across the stack without changing behavior users depend on:

    • Shared libraries — defects in the extracted PerformanceMonitor.Analysis / .PlanAnalysis / .Ui / .Common code
    • Dashboard — timezone and CPU-path defects
    • Lite — services, analysis, and UI defects, plus ArchiveService data-loss / corruption fixes
    • Installer — CLI version-detection and failure-handling
    • SQL — high-impact collector defects, view / analyzer crashes (including a Linux CPU gap), and schema / job / validation defects
  • FinOps no longer recommends downgrading to Standard Edition on a server running Availability Groups (#1085) — an Enterprise instance with no TDE was told to "review whether Standard Edition would meet workload requirements" even when it was running AGs, which Standard supports only in the limited Basic Availability Groups form. FinOps now counts advanced (non-basic) AGs via sys.availability_groups.basic_features and, when any are present, appends a caveat naming the AG count and Standard's Basic-AG limitations (two replicas, one database per group, no readable secondary), retitles the finding to "review Availability Group requirements before downgrading," and lowers its confidence — the savings estimate is retained. The Dashboard, which previously had no AG awareness at all, was brought to full parity and also gains the #980 AG-secondary informational note it never received

  • Server-tab alert badge is now clearable (#1092) — the red alert badge on a server tab could previously only be cleared through an undiscoverable right-click menu. Left-clicking the badge now acknowledges and clears it (hand cursor, "Click to dismiss · Right-click for options" tooltip), and Alert History Dismiss All clears the matching server badge(s) too. A follow-up (#1122) closed the last gap: Dismiss Selected now also clears the badge for every distinct server represented in the dismissed rows. On the Dashboard, which already had richer auto-resolving badges, this added the missing left-click affordance for parity

  • Long-running-query alert no longer constantly trips on CDC capture jobs (#1096) — the Change Data Capture capture job runs as a continuous SQL Agent session (sp_MScdc_capture_job → sp_cdc_scan), so its elapsed time permanently exceeded the long-running-query threshold and the alert fired non-stop; none of the four existing wait_type-based exclusions caught it. Both apps gain an Exclude CDC capture jobs toggle (default on) that identifies the capture session server-side by decoding its Agent program_name to a job_id and matching msdb.dbo.cdc_jobs (job_type = 'capture'), falling back to a whole-text match when msdb is unreadable or cdc_jobs doesn't yet exist — so it stays CDC-specific and never hides unrelated Agent jobs. Dashboard filters the live DMV query inline; Lite computes a per-row is_cdc_capture flag in the collector (its snapshots store only statement-level text) and filters on read
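
As an illustration of the edge-triggered gating described in the #1091 Lite fix above, here is a minimal Python sketch. It is not the project's C# RollingCountAlertGate: the class name, the method signature, and the at-or-above threshold comparison are assumptions made for the example. Only the watermark behavior (decay as events age out, reset when the window empties, advance only when an alert fires) is taken from the notes.

```python
from dataclasses import dataclass


@dataclass
class RollingCountGate:
    """Hypothetical edge-trigger gate over a rolling event count."""

    watermark: int = 0  # rolling count recorded when the last alert fired

    def should_fire(self, rolling_count: int, threshold: int, in_cooldown: bool) -> bool:
        # Decay: as old events age out of the window the watermark follows
        # the count down, reaching zero when the window empties.
        if rolling_count < self.watermark:
            self.watermark = rolling_count

        # Assumption: "breach" means count >= threshold. A count that has not
        # climbed past the watermark carries no new event, so stay quiet.
        if rolling_count < threshold or rolling_count <= self.watermark:
            return False

        # During a cooldown the watermark is left alone, so the new event is
        # reported on the first check after the cooldown elapses.
        if in_cooldown:
            return False

        self.watermark = rolling_count
        return True
```

Driving the gate with a sequence of checks shows the intended shape: one alert per genuinely new event, nothing while the same events linger in the window, and a deferred alert for an event that arrives during a cooldown.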

Changed

  • Plan parsing / analysis extracted to shared library PerformanceMonitor.PlanAnalysis — the previously duplicated ShowPlanParser, PlanAnalyzer, BenefitScorer, PlanLayoutEngine, and PlanModels pairs across Dashboard/Services + Dashboard/Models and Lite/Services + Lite/Models are now one copy referenced by both apps via <ProjectReference>. The new library targets net10.0 (no WPF) and has zero dependency on PerformanceMonitor.Analysis — the two shared libraries are independent. ~5,100 LOC of byte-equivalent duplication eliminated. The planalyzer-sync-checker agent is retired (no copies to sync). ActualPlanExecutor stays per-app this release because it calls ReproScriptBuilder (Class B, drifted between Lite and Dashboard); both will be extracted in a follow-up PR once ReproScriptBuilder is reconciled and a logging abstraction is designed
  • PlanIconMapper split to break a shared-library WPF dependency — ShowPlanParser calls PlanIconMapper.GetIconName to populate PlanNode.IconName during parse, but the rest of PlanIconMapper is WPF-bound (GetIcon returns BitmapImage). The pure-data half (the IconMap dictionary + the GetIconName lookup) is now IconNameMapper inside PerformanceMonitor.PlanAnalysis. The per-app PlanIconMapper.GetIcon(string iconName) is unchanged; the per-app GetIconName forwarder is gone (ShowPlanParser calls IconNameMapper.GetIconName directly, and there were no other callers)
  • Analysis engine extracted to shared library PerformanceMonitor.Analysis — the previously duplicated FactScorer, RelationshipGraph, InferenceEngine, AnalysisModels, IFactCollector, IPlanFetcher, and BlockingChainReconstructor pairs across Dashboard/Analysis/ and Lite/Analysis/ are now one copy referenced by both apps and both test projects via <ProjectReference>. The new library targets net10.0 (no WPF) so it can be picked up by future non-WPF consumers without a multi-target rewrite. The blocking-reconstructor-sync-checker agent is retired (no copies to sync). BlockingChainReconstructorTests ported to Dashboard.Tests (10 tests) as part of the same change — Dashboard now exercises the same reconstruction coverage as Lite. AnalysisService and the DB-bound adapters (*FactCollector, *DrillDownCollector, *FindingStore, *AnomalyDetector, *BaselineProvider, *PlanFetcher) stay per-app because they bind to DuckDBConnection vs SqlConnection. PlanAnalyzer and its planalyzer-sync-checker are outside this extraction's scope and stay
  • Trace files are now bounded at the source (#972) — collect.trace_management_collector creates the long-query trace with a rollover file-count cap (@filecount, via the new @max_files parameter, default 5), so SQL Server itself deletes the oldest .trc file as the trace rolls. The scheduled collector also now issues START instead of RESTART: it keeps one trace running rather than tearing it down and spawning a fresh timestamped trace — and a fresh batch of orphaned files — every cycle
  • Blocked-process reports expose blocker-side fields as typed columns — collect.blocking_BlockedProcessReport now carries blocking_spid, blocking_last_tran_started, blocking_status, blocked_sql_text, and blocking_sql_text populated at insert time from blocked_process_report_xml. Existing rows are backfilled idempotently by the 2.11.0 → 3.0.0 upgrade script
  • Blocking-chain reconstruction now reads typed columns from collect.blocking_BlockedProcessReport instead of re-parsing blocked_process_report_xml on every analysis cycle — eliminates up to 5000 XElement.Parse calls per BLOCKING_CHAIN fact collection. The Dashboard BlockedProcessXmlParser has been deleted; the Lite collection-time parser is unchanged (Lite has no SQL-side staging table and still parses once at collect time)
  • Analysis minimum-data threshold lowered to 24 hours — Lite/Analysis/AnalysisService.cs and Dashboard/Analysis/AnalysisService.cs now require 24 hours of collected data before analysis runs, down from 72. Validated empirically as sufficient for fraction-of-period calculations, so a fresh install starts producing findings after one day instead of three
  • Major UI-responsiveness overhaul — the data path now runs off the WPF dispatcher in both apps — DuckDB.NET is synchronous, so in Lite await _dataService.X() completed on the calling (UI) thread, and a single DuckDB connection open under load is ~750 ms; the result was multi-hundred-millisecond to multi-second UI freezes on the per-minute pipeline, refreshes, and alert checks. The fix moves the work onto pool threads (Task.Run) across the board: Lite's background collect/checkpoint/archive pipeline, the full-refresh fan-out, the 60-second sub-tab refreshes, picker charts, the overview sweep, timeline lanes, connect, and the FinOps and Recommendations reads; the Dashboard's ServerTab row materialization and its execution-plan parse/analyze; and — found later by wall-clock thread-time profiling under a HammerDB TPC-C load — the alert-check / overview-sweep DuckDB queries that were still on the dispatcher (#1121, which cut the worst measured dispatcher stall from ~1.2 s to under 10 ms). Lite also skips the heavy refresh for non-selected (hidden) server tabs, the shared crosshair/hover hot path was made cheaper for both apps, and Dashboard timers gained re-entrancy guards. A cluster of long-session memory leaks that progressively degraded responsiveness was fixed alongside (#1116): an Alerts-tab DispatcherTimer that kept ticking after the tab closed, unbounded per-run alert-key dictionaries, a tray-service handler re-subscribed on every theme change, and plan-viewer controls leaked through a static theme event. (The related sleep/wake blank-window and software-rendering fix is tracked separately under #1050 above.) Net effect: the UI stays responsive under heavy collection and query load
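
To illustrate the parse-once idea behind the typed blocker-side columns above, here is a hedged Python sketch that extracts the five fields from a report a single time, at insert. The sample XML is a hand-written, heavily abridged stand-in shaped like a blocked-process report; the attribute names (spid, status, lasttranstarted) and the inputbuf element should be checked against real captures. extract_typed_columns is a hypothetical helper, not the project's T-SQL.

```python
import xml.etree.ElementTree as ET

# Abridged, invented sample; real reports carry many more attributes.
SAMPLE_REPORT = """
<blocked-process-report>
  <blocked-process>
    <process spid="61" status="suspended" lasttranstarted="2026-06-01T10:15:02.113">
      <inputbuf>UPDATE dbo.Orders SET Status = 2 WHERE OrderId = 42;</inputbuf>
    </process>
  </blocked-process>
  <blocking-process>
    <process spid="54" status="sleeping" lasttranstarted="2026-06-01T10:14:40.907">
      <inputbuf>BEGIN TRAN; UPDATE dbo.Orders SET Status = 1;</inputbuf>
    </process>
  </blocking-process>
</blocked-process-report>
"""


def _sql_text(process):
    """Return the trimmed input buffer text, or None when absent or empty."""
    if process is None:
        return None
    text = process.findtext("inputbuf")
    return text.strip() if text and text.strip() else None


def _attr(process, name):
    """Return an attribute value, or None when the process node is missing."""
    return process.get(name) if process is not None else None


def extract_typed_columns(report_xml: str) -> dict:
    """Parse one report and return the typed column values as a dict."""
    root = ET.fromstring(report_xml.strip())
    blocked = root.find("blocked-process/process")
    blocking = root.find("blocking-process/process")
    spid = _attr(blocking, "spid")
    return {
        "blocking_spid": int(spid) if spid is not None else None,
        "blocking_last_tran_started": _attr(blocking, "lasttranstarted"),
        "blocking_status": _attr(blocking, "status"),
        "blocked_sql_text": _sql_text(blocked),
        "blocking_sql_text": _sql_text(blocking),
    }
```

Once the values are stored alongside the XML, later analysis passes can read plain columns and never touch the parser, which is the saving the change describes.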

Added

  • tools/Remove-OrphanedTraceFiles.ps1 (#972) — one-time cleanup script for Monitor_LongQueries_*.trc files left on disk by versions through 2.11.0. Run it on the SQL Server host; it skips files belonging to a running trace and files that are in use
  • FactAdvice and FactRemediation in PerformanceMonitor.Analysis — new shared-library data layer that maps every scorable fact-key to a Headline / Investigation / Remediation advice block, plus a copy-paste-ready sp_query_store_force_plan T-SQL generator for PLAN_REGRESSION findings (gated to that single fact-key in v1; PARAMETER_SENSITIVITY deliberately does not generate plan-force T-SQL because forcing locks in the wrong plan for some parameter values). Drill-down collectors now also project best_plan_id (via MAX(plan_id) in the plan-dedup CTE) so the generated EXEC carries the integer ID sp_query_store_force_plan actually accepts, not just the hash. Lite's BuildContext now mirrors Dashboard's — both apps emit a Diagnosis card at Details[0] carrying Story / Severity / Notify threshold / Confidence / Facts / Database / Window before the drill-down items. The rendering surfaces that consume this data (email HTML, plain-text email, Teams + Slack webhook payloads, in-app Alert Details window) ship in a separate follow-up PR
  • Object- and index-level collection: sizes, growth, usage, and locking/contention (#1103) — both apps gain a daily collector that snapshots per-table and per-index storage (sys.dm_db_partition_stats), index usage (sys.dm_db_index_usage_stats — seeks/scans/lookups/updates), and per-object locking/latch/escalation (sys.dm_db_index_operational_stats — row-lock waits, page-latch waits, lock-escalation attempts), all from stock DMVs verified stable from SQL Server 2016 through 2025 and on Azure SQL DB / Managed Instance. On-prem and MI iterate user databases (honoring the collector exclusion list); Azure SQL DB uses its single-database branch. Dashboard collects into collect.index_object_stats (install/55_collect_index_object_stats.sql, scheduled daily with 90-day retention picked up by dynamic retention); Lite collects into DuckDB with archival registered. Three new FinOps sub-tabs in each app — Object Sizes & Growth (per-table size plus 7-/30-day growth and daily rate), Index Usage (Unused / Write-only / Active classification), and Locking & Contention (top-contended indexes) — plus MCP read tools (get_table_index_sizes, get_index_usage, get_object_locking). Because the daily snapshots are cumulative, the new object-growth (ANOMALY_OBJECT_GROWTH, a table grew >100 MB and ≥20% day-over-day) and lock-contention (ANOMALY_OBJECT_CONTENTION, an index gained ≥60s of new row-lock wait) alerts are delta-based (the two most recent snapshots, reset-guarded) and flow through the existing anomaly → AnalysisNotificationService pipeline. Thresholds are fixed constants in this release; making them user-configurable is a follow-up
  • Recommendations / Apply Fix engine (advise-and-act rebuild) — the analysis engine's advisory output is now a first-class Recommendations surface in both apps, alongside Critical Issues. Each finding renders as a card with a plain-language Headline / Investigation / Remediation block (from the FactAdvice/FactRemediation shared-library data layer) and routes the reader into the relevant in-app view or MCP tool instead of dumping raw DMV queries. Advise-only recommendations include server-config advisories (MAXDOP / cost threshold for parallelism / max server memory), per-database config (autogrowth, percent-growth on large files), server-health facts (Lock Pages in Memory, Instant File Initialization, recent memory dumps), and missing-index / plan-warning recommendations mined from collected plans — missing-index CREATE statements are surfaced as copy-paste text. A subset is appliable in place behind informed, two-sided consent: always-safe ALTER DATABASE SET config fixes, and the destructive RCSI (enable read-committed snapshot) and clear cached plan (DBCC FREEPROCCACHE / unforce) fixes, which gate behind an acknowledge-each-risk dialog that quantifies both the risk of changing and the risk of doing nothing from the finding's own monitoring data. The advice and remediation T-SQL also render across every notification surface — email (HTML and plain text), Teams and Slack webhook payloads, and the in-app Alert Details window — and through the analyze_server and get_analysis_findings MCP tools
  • Low volume free-space alert (#754) in both apps — a new Volume Free Space alert (default on) fires when a monitored server's disk volume drops below a free-space percentage or a fixed GB amount (set either threshold to 0 to disable that dimension; if both are set, either breach fires). It reads the per-volume size/free data already collected by the database-size collector, evaluates every volume on the server, and fires one alert per server naming the worst (lowest-free) volume with up to five breaching volumes in the context — with the same cooldown, mute, alert-history, tray, and email plumbing as the existing tempdb-space alert. Defaults: 10% / 5 GB. Azure SQL DB has no volume data, so the alert never fires there
  • Failed SQL Agent job alert (#749) in both apps — complements the existing job-duration alerts with a Failed Agent Job alert (default on) that issues a live msdb.dbo.sysjobhistory query at alert-check time for job-outcome rows (step_id = 0, run_status = 0) that failed within a configurable look-back window (default 60 minutes). The read degrades gracefully when the login lacks msdb / SQLAgentReaderRole access (returns empty, never faults the alert cycle) and is skipped entirely on Azure SQL DB, which has no SQL Agent
  • Installer: optional custom data/log file locations (#768) — two optional CLI flags, --data-path and --log-path (both --flag VALUE and --flag=VALUE forms accepted), place the PerformanceMonitor database's .mdf/.ldf on specific server-side volumes at install time; an omitted flag falls back to the instance default path as before. The paths apply only on first creation (the create block is guarded by IF DB_ID(N'PerformanceMonitor') IS NULL), and Azure SQL Managed Instance ignores them. The path is validated and escaped (control characters and the dangerous filename characters are rejected; single quotes are doubled in both the C# injection layer and the dynamic CREATE DATABASE) because a data-file FILENAME literal cannot be parameterized
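
The two-threshold rule for the Volume Free Space alert (#754) above can be sketched as follows. This is a minimal Python illustration, not the apps' C# code: the Volume type, the function names, and the choice to rank the "worst" volume by free percentage are assumptions. The defaults (10% / 5 GB), the zero-disables rule, the either-breach-fires rule, and the five-volume context cap come from the notes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Volume:
    mount: str
    total_gb: float
    free_gb: float

    @property
    def free_pct(self) -> float:
        # Assumption: a zero-size volume is reported as 0% free.
        return 100.0 * self.free_gb / self.total_gb if self.total_gb > 0 else 0.0


def breaches(vol: Volume, min_free_pct: float = 10.0, min_free_gb: float = 5.0) -> bool:
    """True when either enabled threshold is breached; 0 disables a dimension."""
    pct_breach = min_free_pct > 0 and vol.free_pct < min_free_pct
    gb_breach = min_free_gb > 0 and vol.free_gb < min_free_gb
    return pct_breach or gb_breach


def evaluate_server(volumes, min_free_pct: float = 10.0, min_free_gb: float = 5.0) -> Optional[dict]:
    """Return one alert payload per server, or None when nothing breaches."""
    breaching = sorted(
        (v for v in volumes if breaches(v, min_free_pct, min_free_gb)),
        key=lambda v: v.free_pct,  # assumption: "worst" means lowest free %
    )
    if not breaching:
        return None  # also the outcome when no volume data exists
    return {
        "worst": breaching[0].mount,
        "context": [v.mount for v in breaching[:5]],  # at most five volumes
    }
```

For example, a server with a nearly full system drive and a small volume under the GB floor yields a single alert naming the system drive, with both volumes listed in the context; setting the GB threshold to 0 leaves only the percentage check active.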