Skip retrying collectors after permission denial (#857) #870
erikdarlingdata merged 1 commit into dev
The collector loop already classifies SQL errors 229 / 297 / 300 as PERMISSIONS status and excludes them from the failure rate, but it keeps re-running the collector every interval and logging an identical denial each time. For DB-scoped logins on Azure SQL DB (e.g. D365FO) this churns the collection log and gives no new information; the permission won't change mid-session.

Flag the collector on first denial and short-circuit RunCollectorAsync so we don't make the round-trip or the log entry. Flag is in-memory per (server, collector), cleared on app restart so newly granted permissions are picked up on the next launch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
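The latch-and-skip behaviour described in this commit message can be sketched roughly as follows. This is a minimal illustration under assumptions, not the merged code: the `HealthTracker` class, the string server id, and the `runQuery` delegate that returns a status string are all hypothetical; only the `IsPermissionRestricted` flag, the `PERMISSIONS` status, the `_healthLock` name, and the (server, collector) key come from the description.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for the per-(server, collector) health record.
public sealed class CollectorHealthEntry
{
    public bool IsPermissionRestricted { get; set; }
    public string? LastErrorMessage { get; set; }
}

// Hypothetical tracker; the real service and its signatures may differ.
public sealed class HealthTracker
{
    private readonly object _healthLock = new();
    private readonly Dictionary<(string ServerId, string Collector), CollectorHealthEntry> _collectorHealth = new();

    public void RecordCollectorResult(string serverId, string collector, string status, string? errorMessage = null)
    {
        lock (_healthLock)
        {
            if (!_collectorHealth.TryGetValue((serverId, collector), out var entry))
            {
                entry = new CollectorHealthEntry();
                _collectorHealth[(serverId, collector)] = entry;
            }

            if (status == "PERMISSIONS")
            {
                entry.LastErrorMessage = errorMessage;
                entry.IsPermissionRestricted = true; // latched for the rest of the session
            }
        }
    }

    public bool IsCollectorPermissionRestricted(string serverId, string collector)
    {
        lock (_healthLock)
        {
            return _collectorHealth.TryGetValue((serverId, collector), out var entry)
                && entry.IsPermissionRestricted;
        }
    }

    // Returns true if the collector ran, false if it was skipped.
    public async Task<bool> RunCollectorAsync(string serverId, string collector, Func<Task<string>> runQuery)
    {
        if (IsCollectorPermissionRestricted(serverId, collector))
        {
            return false; // no round-trip and no new log entry
        }

        var status = await runQuery();
        RecordCollectorResult(serverId, collector, status);
        return true;
    }
}
```

Because the dictionary lives only in memory, constructing a fresh tracker (the equivalent of an app restart) is what lets a previously denied collector run again in this sketch.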
erikdarlingdata left a comment
What this does
Follow-up to #857. Adds an in-memory IsPermissionRestricted flag on CollectorHealthEntry, set in RecordCollectorResult when a collector hits SQL error 229/297/300, and short-circuits RunCollectorAsync on the next scheduled run so we stop churning identical denials into the collection log. Scope is correctly limited to Lite (issue originated with D365FO on Azure SQL).
Good
- Lite-only change, base branch dev: correct per repo conventions.
- Flag is read/written under the existing _healthLock; no new lock and no lock-ordering risk (IsCollectorPermissionRestricted takes a single lock and returns).
- Decision to leave per-collector DB-scoped fallback queries out of scope is the right call: those have different semantics from the boxed DMVs and each deserves its own decision.
- NO_PERMISSIONS render path in Lite/Services/LocalDataService.CollectionHealth.cs is unchanged; the single recorded denial row still drives the UI state correctly.
- No PlanAnalyzer changes (N/A), no SQL changes (N/A), no workflow/signing changes (N/A), no schema upgrade concerns (N/A).
Needs attention
- Flag never clears mid-session. ClearHealthForServer is only wired to server removal, not server edit (Lite/MainWindow.xaml.cs:1118). Fixing perms or swapping in a higher-privilege login via the Edit dialog won't recover the collector until app restart. See inline on line 50.
- SUCCESS branch doesn't reset the flag. RecordCollectorResult latches IsPermissionRestricted = true but never sets it back to false. Self-healing would be one line. See inline on line 242.
- No test coverage added. The PR description explicitly defers smoke testing to the nightly + @TrudAX; unit tests on the flag state machine are the only thing catching regressions in the meantime. See inline on line 265.
- Minor: skip check sits downstream of several cheap gates in RunCollectorAsync; could live in the scheduler instead. Not a blocker, just noted for future cleanup. See inline on line 415.
No blockers. (1) and (2) together would close the "user fixes perms without restart" gap cleanly.
Generated by Claude Code
 * identical denials every interval. Cleared on app restart - if
 * permissions get granted later, the next launch retries once.
 */
public bool IsPermissionRestricted { get; set; }
Flag is never cleared mid-session. ClearHealthForServer is only invoked from ServerContextMenu_Remove_Click (Lite/MainWindow.xaml.cs:1144); the Edit-server success path at Lite/MainWindow.xaml.cs:1124 just calls RefreshServerList(). So if a user hits a permission denial, fixes perms on the SQL side (or edits the server entry to use a higher-privilege login) without restarting the app, the collector stays skipped until restart even though the denial is no longer valid.
Cheap fix: call _collectorService?.ClearHealthForServer(...) in the dialog.ShowDialog() == true branch of ServerContextMenu_Edit_Click, or expose a narrower ClearPermissionFlagsForServer that only resets IsPermissionRestricted so success/error counts aren't lost.
Generated by Claude Code
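The narrower clear suggested in this comment might look something like the sketch below. It is illustrative only: `HealthStore`, `RecordDenial`, `Snapshot`, and the `ErrorCount` counter are hypothetical names, and the string server id is an assumption. The point it shows is that resetting `IsPermissionRestricted` for one server does not need to touch the accumulated counts.

```csharp
using System.Collections.Generic;

public sealed class CollectorHealthEntry
{
    public int ErrorCount { get; set; }              // hypothetical counter
    public bool IsPermissionRestricted { get; set; }
}

// Hypothetical store standing in for the collector service's health dictionary.
public sealed class HealthStore
{
    private readonly object _healthLock = new();
    private readonly Dictionary<(string ServerId, string Collector), CollectorHealthEntry> _collectorHealth = new();

    public void RecordDenial(string serverId, string collector)
    {
        lock (_healthLock)
        {
            if (!_collectorHealth.TryGetValue((serverId, collector), out var entry))
            {
                entry = new CollectorHealthEntry();
                _collectorHealth[(serverId, collector)] = entry;
            }
            entry.ErrorCount++;
            entry.IsPermissionRestricted = true;
        }
    }

    public (bool Restricted, int ErrorCount) Snapshot(string serverId, string collector)
    {
        lock (_healthLock)
        {
            return _collectorHealth.TryGetValue((serverId, collector), out var entry)
                ? (entry.IsPermissionRestricted, entry.ErrorCount)
                : (false, 0);
        }
    }

    // Resets only the permission latch for one server and keeps the counters.
    // Returns how many flags were cleared.
    public int ClearPermissionFlagsForServer(string serverId)
    {
        var cleared = 0;
        lock (_healthLock)
        {
            foreach (var pair in _collectorHealth)
            {
                if (pair.Key.ServerId == serverId && pair.Value.IsPermissionRestricted)
                {
                    pair.Value.IsPermissionRestricted = false;
                    cleared++;
                }
            }
        }
        return cleared;
    }
}
```

Under these assumptions, the Edit-server success path would call `ClearPermissionFlagsForServer` with the edited server's id, so the next scheduled run retries each collector once with the new credentials.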
the rest of the app session. */
entry.LastErrorTime = DateTime.UtcNow;
entry.LastErrorMessage = errorMessage;
entry.IsPermissionRestricted = true;
IsPermissionRestricted is latched to true but never flipped back to false. If RecordCollectorResult is later called with status SUCCESS (e.g. after the Edit-server fix suggested above, or if perms get granted and the flag is cleared by some future path), the success branch at line 227 doesn't reset this flag. Worth resetting entry.IsPermissionRestricted = false; in the SUCCESS branch so the state is self-healing even without an explicit clear call.
Generated by Claude Code
    _logger?.LogDebug("Skipping collector '{Collector}' for server '{Server}' - permission denied this session",
        collectorName, server.DisplayName);
    return;
}
Minor: the skip happens after GetConnectionStatus, IsCollectorSupported, and the MFA gate — all cheap, but it means every skipped collector still runs three dictionary lookups and a method call before bailing. Not worth restructuring, but if RunAllCollectorsForServerAsync ever gets called on a server where most collectors are permission-restricted, the cumulative cost is noticeable. Consider hoisting the restricted check into the scheduler (GetDueCollectorsForServer) so denied collectors aren't even enumerated. Not a blocker.
Generated by Claude Code
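One way to picture the hoisting suggested in this comment is a filter applied where the due collectors are enumerated. The sketch below is an assumption-heavy illustration: the real signature of GetDueCollectorsForServer is not shown in the review, so `FilterRunnable` and its delegate parameter are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CollectorScheduling
{
    // Hypothetical helper: drops permission-restricted collectors before they
    // are handed to the run loop, so the later gates never see them.
    public static List<string> FilterRunnable(
        string serverId,
        IEnumerable<string> dueCollectors,
        Func<string, string, bool> isPermissionRestricted)
    {
        return dueCollectors
            .Where(name => !isPermissionRestricted(serverId, name))
            .ToList();
    }
}
```

Passing the check as a delegate keeps the scheduler free of any direct dependency on the health dictionary or its lock; whether that separation is worth it for the actual codebase is a judgment call.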
        return _collectorHealth.TryGetValue((serverId, collectorName), out var entry)
            && entry.IsPermissionRestricted;
    }
}
No test coverage in Lite.Tests/. The behavior here is pure state logic and easily unit-testable: instantiate RemoteCollectorService, call RecordCollectorResult(..., "PERMISSIONS", ...) via a test-only accessor (or refactor the entry dictionary behind a small testable surface), then assert IsCollectorPermissionRestricted returns true and that a subsequent SUCCESS resets it (see the other comment). Worth adding alongside FactCollectorTests.cs or as a new CollectorHealthTests.cs — the D365FO smoke test in the PR description is deferred, so unit coverage is the only thing that catches a regression before the next nightly.
Generated by Claude Code
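The state-machine checks described in this comment could be sketched as below. To stay standalone it uses plain assertions rather than any particular test framework, and it exercises a hypothetical `PermissionLatch` class instead of RemoteCollectorService, whose constructor and accessors are not shown here. The SUCCESS reset is the behaviour proposed in the review, not something the PR is stated to implement.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal model of the flag state machine.
public sealed class PermissionLatch
{
    private readonly object _healthLock = new();
    private readonly HashSet<(string ServerId, string Collector)> _restricted = new();

    public void RecordCollectorResult(string serverId, string collector, string status)
    {
        lock (_healthLock)
        {
            if (status == "PERMISSIONS")
            {
                _restricted.Add((serverId, collector));
            }
            else if (status == "SUCCESS")
            {
                _restricted.Remove((serverId, collector)); // proposed self-healing reset
            }
        }
    }

    public bool IsCollectorPermissionRestricted(string serverId, string collector)
    {
        lock (_healthLock)
        {
            return _restricted.Contains((serverId, collector));
        }
    }
}

public static class CollectorHealthChecks
{
    private static void Check(bool condition, string message)
    {
        if (!condition) throw new InvalidOperationException(message);
    }

    // Throws on the first failed expectation.
    public static void Run()
    {
        var latch = new PermissionLatch();
        Check(!latch.IsCollectorPermissionRestricted("srv1", "memory_clerks"), "starts unrestricted");

        latch.RecordCollectorResult("srv1", "memory_clerks", "PERMISSIONS");
        Check(latch.IsCollectorPermissionRestricted("srv1", "memory_clerks"), "denial latches the flag");
        Check(!latch.IsCollectorPermissionRestricted("srv1", "waiting_tasks"), "other collectors unaffected");
        Check(!latch.IsCollectorPermissionRestricted("srv2", "memory_clerks"), "other servers unaffected");

        latch.RecordCollectorResult("srv1", "memory_clerks", "SUCCESS");
        Check(!latch.IsCollectorPermissionRestricted("srv1", "memory_clerks"), "success resets the flag");
    }
}
```

In the real test project the same expectations would presumably become individual test methods against the service itself.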
Azure SQL Database DBs hosted in an elastic pool (notably D365FO customer tenants) enforce VIEW SERVER PERFORMANCE STATE on sys.dm_os_schedulers regardless of the login's DB-scoped grants; VIEW DATABASE STATE + VIEW DATABASE PERFORMANCE STATE on the user DB are not sufficient. Verified by reproducing the failure in a standard Azure SQL DB elastic pool with a contained DB user; bare sys.dm_exec_requests/sys.dm_os_sys_info/sys.dm_os_performance_counters succeed but sys.dm_os_memory_clerks / sys.dm_os_schedulers / sys.dm_os_waiting_tasks fail with error 300.

The other failing collectors (memory_clerks, waiting_tasks, tempdb_stats) have no DB-scoped alternative and will stay skip-gated via #870 for these users.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Implements #843 in Lite

* Implements #843 for Full Dashboard

* Add trailing newlines to ScrollPanBehavior files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Harden DuckDB queries: parameterize values, escape paths, fix IsArchiving race

Addresses security findings from #840:
- #846: Escape single quotes in file paths interpolated into read_parquet() and COPY TO
- #847: Use DuckDB $1 parameters for DateTime values instead of string interpolation
- #849: Make IsArchiving volatile-backed to prevent stale reads across threads

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Encrypt webhook URLs with DPAPI via Windows Credential Manager

Moves Teams and Slack webhook URLs from plaintext settings.json/preferences.json to Windows Credential Manager (DPAPI-encrypted), matching the existing pattern used for SMTP passwords and SQL Server credentials.

Includes automatic migration: on first settings load, any plaintext URLs are moved to Credential Manager and removed from the JSON file.

Closes #848

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Lazy-load server tabs: only load visible tab on open, full-load on first visit

Initial tab open and Refresh button now only load the currently visible tab. First switch to any tab triggers a full refresh of that tab (all sub-tabs). Subsequent refreshes only hit the active sub-tab.

Ctrl+Click on Refresh Tab (or Ctrl+F5) refreshes all tabs at once. Apply to All Tabs retains existing full-refresh behavior.

Fixes #835: prevents heavy queries (e.g. GetQueryStatsAsync) from running on tab open when the user is only viewing Overview.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Cap query/procedure/query store grid results to TOP 500

GetQueryStatsAsync, GetProcedureStatsAsync, and GetQueryStoreDataAsync were returning unbounded result sets. With 49 databases and 742K rows in query_stats over 3 days, the GROUP BY with plan XML could produce thousands of rows and timeout after 120 seconds.

TOP 500 ordered by avg CPU desc is plenty for a grid view and prevents the query from consuming unbounded memory on large installations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Remove pointless WAITFOR DECOMPRESS filters from stats/store queries

The CAST(DECOMPRESS(...)) NOT LIKE N'WAITFOR%' filter was decompressing query text on every row in query_stats and query_store_data just to skip WAITFOR queries. WAITFOR has no plan and no meaningful stats; it only matters in query snapshots (active sessions), where the filter remains.

On a 742K-row query_stats table, this was a significant contributor to the 120-second query timeouts reported in #835.

The snapshot filters (report.query_snapshots) and MCP phased queries are untouched; they filter after TOP on already-hydrated text.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Refactor query/procedure/query store stats to phased DECOMPRESS approach

All three grid queries now use a 3-phase pattern:
1. Aggregate numerics into temp table (no DECOMPRESS)
2. Sum across lifetimes, rank TOP 500
3. OUTER APPLY to decompress text/plan for only the 500 winners

On a 742K-row query_stats table, this reduces DECOMPRESS calls from 742K to 500, eliminating the 16+ minute query times reported in #835.

Matches the existing phased pattern used by the MCP query tools.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix FinOps TDE recommendation on SQL Server 2019+ (#854)

TDE moved to Standard Edition in SQL 2019, so dm_db_persisted_sku_features no longer reports it as Enterprise-only. Add version check to give version-appropriate licensing guidance instead of falsely claiming no databases use TDE.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Sync PlanAnalyzer and BenefitScorer from PerformanceStudio (Apr 9-16)

Port PS PRs #216, #217, #219, #224, #229, #230, #231 to PM.

PlanAnalyzer changes:
- Rule 5: Suppress for Key Lookups (point lookups mislead per-execution estimates)
- Rule 8: Enhanced parallel skew with batch mode sort detection and practical context
- Rule 9: Large memory grant shows top 3 consumers sorted by row count
- Rule 10: Key lookup overhaul (show output columns, check predicate filtering, softer advice)
- Rules 11/12/29: Suppress on 0-execution nodes (operator never ran)
- Rule 11: I/O wait severity elevation when scan hits disk
- Rule 24: FormatNodeRef helper includes object name for data access operators
- Rule 26: Suppress when row goal prediction was correct, specific cause detection
- Wait stats: DescribeWaitType with full wait type coverage, multi-wait summary
- New helpers: GetWaitLabel, HasSignificantIoWaits, IdentifyRowGoalCause, FormatNodeRef
- GetOperatorOwnElapsedMs changed to internal for BenefitScorer access

BenefitScorer (new file):
- Stage 1: MaxBenefitPercent for operator-level rules (filter, spill, lookup, etc.)
- Stage 2: Wait stats benefit scoring with parallel allocation (Joe's formula)

PlanModels additions:
- MaxBenefitPercent and ActionableFix on PlanWarning
- WaitBenefit class and WaitBenefits list on PlanStatement

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fall back to single-database mode when Azure master is inaccessible (#857)

On Azure SQL DB, some logins (e.g. Microsoft Dynamics 365 FO) are granted access only to a specific user database and not to master. The three collectors that enumerate databases via master (query_stats, database_size_stats, file_io_stats) would fail the first time and produce an empty screen.

GetAzureDatabaseListAsync now catches known access-denied/login-failed errors from the master connection, caches the per-server decision, and returns the connection's InitialCatalog as a single-element list. The three callers already loop per-database, so single-DB mode works without further changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Add nonclustered indexes for query/procedure/query store lookups

Phase 3 OUTER APPLY hydration of compressed query_text/plan_text was forcing an Eager Index Spool over the full collect.query_stats table (and similar for procedure_stats / query_store_data), which took 104 seconds on a 742K-row table in #835.

Changes:
- Remove CONVERT(binary(8), nvarchar-hash, 1) anti-pattern from OUTER APPLY WHERE clauses by keeping query_hash as native binary(8) in temp tables. query_hash is only converted to nvarchar(20) in the final output projection.
- Add three nonclustered indexes (install script and upgrade script):
  IX_query_stats_hash_lookup (query_hash, database_name, collection_time DESC)
  IX_procedure_stats_name_lookup (database_name, schema_name, object_name, collection_time DESC)
  IX_query_store_data_id_lookup (database_name, query_id, collection_time DESC)
- Indexes use SORT_IN_TEMPDB = ON and DATA_COMPRESSION = PAGE.
- ONLINE = ON is applied conditionally via dynamic SQL based on SERVERPROPERTY('EngineEdition'): Enterprise/Developer/Azure only, since Standard/Web/Express don't support online index operations.

Tested against CADelete's 742K-row table: Phase 3 went from 104s to well under 1s (5s total for the full three-phase query).

Fixes #835

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Scope query snapshots to current database on Azure SQL DB (#857)

On Azure SQL Database, logins without access to master can't resolve cross-database rows returned by sys.dm_exec_requests, which caused the Live Snapshot button and the query snapshots collector to error in D365FO-style environments (reported by @TrudAX in #857 after PR #858).

BuildQuerySnapshotsQuery now takes an isAzureSqlDatabase flag and emits AND der.database_id = DB_ID() only when true. Boxed SQL Server, MI, and elastic pool behavior is unchanged. The Live Snapshot button path gets the flag through a new ServerTab constructor parameter wired from the cached ServerConnectionStatus.SqlEngineEdition.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Polish Lite chart axes and sub-tab styling

- Chart X-axis prints the date line only on the first tick and on ticks where the date changes; all other ticks show time only. Format respects current culture (en-GB → dd/MM, de-DE → dd.MM, 24h clocks, etc.). Implemented as a DateTimeTicksBottomDateChange() extension in Lite/Helpers/AxesExtensions.cs and applied to every DateTimeTicksBottom call site in ServerTab and CorrelatedTimelineLanesControl.
- Server name no longer duplicated in the ServerTab header status line; ConnectionStatusText now shows just "Connecting..." / "Last refresh: ...".
- Chart tick label font bumped from 12 to 13 for readability.
- New SubTabItemStyle (thin accent underline, transparent background) in all three themes, applied to Queries / Memory / File I/O / Blocking / Perfmon / Running Jobs sub-TabControls so sub-tab selection no longer looks identical to main-tab selection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Port Lite chart/tab polish to Dashboard + LSP diagnostics cleanup

Dashboard polish (ports the same items merged to Lite in #862):
- New Dashboard/Helpers/AxesExtensions.cs with DateTimeTicksBottomDateChange(), culture-aware (dd/MM for en-GB, dd.MM for de-DE, 24h clocks, etc.). All 52 call sites of DateTimeTicksBottom() across 10 files swapped to use it.
- TabHelpers.ApplyTheme + ReapplyAxisColors bump chart tick label font from 12 to 13 so numbers read cleaner on wide charts.
- SubTabItemStyle added to Dark / Light / CoolBreeze themes: thin accent underline + transparent background instead of filled cyan, so sub-tabs don't look identical to main tabs when selected. Wired via ItemContainerStyle on 11 sub-TabControls (Overview's inner tabs, Collection Health's inner tabs, Locking, ConfigChanges, CurrentConfig, FinOps, Memory, ResourceMetrics ×2, SystemEvents, QueryPerformance).

LSP diagnostics cleanup (tracked work from chore/lsp-diagnostics-cleanup):
- Small nullability/warning fixes across Dashboard and Lite services, analysis helpers, and BenefitScorer / PlanAnalyzer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix Overview crosshair disappearing after tab switches / layout passes

Root cause: the control wired `Unloaded += ...Dispose()` on the crosshair manager, and WPF fires Unloaded for transient reasons (tab virtualization, layout rebuilds, etc.), not just when the control is actually going away. Dispose() clears the manager's lane list, after which ReattachVLines runs over an empty list and the crosshair is gone permanently.

Changes:
- Remove the Unloaded → Dispose() handler in both Lite and Dashboard copies. The manager holds only managed state (a Popup + lane references); GC will clean it up with the control.
- Remove the now-redundant `_isRefreshing` flag from CorrelatedCrosshairManager. The `lane.VLine == null` check in OnMouseMove is a sufficient "not ready" guard and is self-healing once VLines are recreated.
- Wrap ReattachVLines in a try/finally on the control side, with a new idempotent EnsureVLinesAttached() safety net that only creates VLines for lanes where they're still null.
- Make CreateVLine catch per-lane exceptions so one failing chart can't prevent the others from recovering.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix Memory Pressure Events chart filter; add MCP interpretation (#865)

Chart previously filtered to HIGH severity only (indicator>=3), which on most servers never fires, producing an empty chart even when sp_pressuredetector-level medium pressure (indicator=2) was occurring constantly. Switch to stacked bars per hour, split by SQL Server (process) vs Operating System (system), with severe events capped on top of medium in a darker shade. Extend ChartHoverHelper to support BarPlot tooltips. Add MCP guidance for interpreting indicator values and routing to the right follow-up tool.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Port Memory Pressure Events feature to Lite (#865)

Lite was missing the RING_BUFFER_RESOURCE_MONITOR collector entirely: no collector, no table, no chart, no MCP tool. This adds the full feature:

- Schema: new memory_pressure_events table + index, schema v25, added to ArchivableTables, server-id-fix list, and ArchiveService.
- Collector: CollectMemoryPressureEventsAsync queries the ring buffer and client-side-dedupes against DuckDB's MAX(sample_time). Azure SQL DB returns zero rows (ring buffer not exposed there). Scheduled every 5 min (Aggressive and Balanced presets) or 15 min (Low-Impact).
- UI: new 'Memory Pressure Events' sub-tab on the Memory tab with the same stacked-bar chart as Dashboard (SQL Server medium/severe, Operating System medium/severe). Wired into full-load and sub-tab-switch refresh paths.
- Hover: ported the BarPlot support from Dashboard's ChartHoverHelper so bar tooltips work and report the correct segment height for stacked bars.
- MCP: new get_memory_pressure_events tool + the 'Interpreting Memory Pressure Events' guidance section in McpInstructions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Bump schema table count test to 30 for memory_pressure_events

Companion update to the new memory_pressure_events table added in this PR. SchemaStatements_MatchTableCount asserts the total table count; needs to move from 29 to 30 to reflect the new table.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix blocked process report plan lookup (#867) (#868)

Right-click > View Plan on a Blocked Process Reports row silently fell through (no handler case) and Get Actual Plan erred with "no query text."

- Split the grid onto its own BlockedProcessContextMenu with separate View Blocked Plan / View Blocking Plan actions; drop Get Actual Plan (re-executing a mid-transaction blocked query is a foot-gun).
- Parse all <frame> entries from the BPR XML's executionStack, filter the 42-byte all-zero sql_handle placeholder (dynamic SQL / system context), default stmtstart=0 / stmtend=-1 per the dm_exec_text_query_plan convention. Matches sp_HumanEventsBlockViewer's XPath and join shape.
- Add FetchPlanBySqlHandleAsync keyed on sql_handle + statement offsets against sys.dm_exec_query_stats. Caller iterates frames until one resolves; falls back to a clear "plan no longer in cache" message.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Pre-filter query snapshot requests into #temp on Azure SQL DB (#857) (#869)

Follow-up to #861. The DB_ID() predicate in the WHERE clause wasn't enough: the OUTER APPLYs to sys.dm_exec_sql_text and sys.dm_exec_text_query_plan were still being evaluated against master-scoped rows from sys.dm_exec_requests before the filter was applied, tripping VIEW SERVER PERFORMANCE STATE errors for DB-scoped logins (D365FO). A CTE or derived table wouldn't guarantee the filter order, so materialise the filtered request rows into #req first and drive the DMFs off that.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Stop retrying collectors after non-transient permission denial (#857) (#870)

The collector loop already classifies SQL errors 229 / 297 / 300 as PERMISSIONS status and excludes them from the failure rate, but it keeps re-running the collector every interval and logging an identical denial each time. For DB-scoped logins on Azure SQL DB (e.g. D365FO) this churns the collection log and gives no new information; the permission won't change mid-session.

Flag the collector on first denial and short-circuit RunCollectorAsync so we don't make the round-trip or the log entry. Flag is in-memory per (server, collector), cleared on app restart so newly granted permissions are picked up on the next launch.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Skip live query plans on Azure SQL DB (#857) (#871)

sys.dm_exec_query_statistics_xml requires VIEW SERVER PERFORMANCE STATE on Azure SQL Database regardless of scope, so DB-scoped logins (e.g. D365FO) still hit error 300 even after the #temp pre-filter landed in #869. The OUTER APPLY evaluates the DMF for every session in #req and fails on the permission check before returning rows.

Force supportsLiveQueryPlan=false for SqlEngineEdition=5 in both the collector and the Live Snapshot button paths. Boxed SQL Server and Azure MI (edition 8) still get live plans as before.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix FinOps recommendation severity sort order (#872)

Sort recommendations by severity rank (High=1, Medium=2, Low=3) instead of alphabetically. Adds SeveritySort property to RecommendationRow and uses it as SortMemberPath for the Severity column. Display strings are unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix FinOps severity sort order in Dashboard (#872)

Severity column was sorting alphabetically (High, Low, Medium) instead of by severity ranking. Added SeveritySort computed property on FinOpsRecommendation, ordered results by it, and wired the DataGrid column's SortMemberPath so click-sort matches the default order. Mirrors the Lite fix in PR #874.

* Drop sys.dm_os_schedulers from memory_stats on Azure SQL DB (#857) (#876)

Azure SQL Database DBs hosted in an elastic pool (notably D365FO customer tenants) enforce VIEW SERVER PERFORMANCE STATE on sys.dm_os_schedulers regardless of the login's DB-scoped grants; VIEW DATABASE STATE + VIEW DATABASE PERFORMANCE STATE on the user DB are not sufficient. Verified by reproducing the failure in a standard Azure SQL DB elastic pool with a contained DB user; bare sys.dm_exec_requests/sys.dm_os_sys_info/sys.dm_os_performance_counters succeed but sys.dm_os_memory_clerks / sys.dm_os_schedulers / sys.dm_os_waiting_tasks fail with error 300.

The other failing collectors (memory_clerks, waiting_tasks, tempdb_stats) have no DB-scoped alternative and will stay skip-gated via #870 for these users.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Release v2.8.0: version bumps and changelog (#877)

Adds nonclustered indexes to collect.query_stats, procedure_stats, and query_store_data for Dashboard grid lookups (#835). Ports Memory Pressure Events to Lite (#865). Multiple Azure SQL DB collector fixes (#857). FinOps severity sort order fix (#872). Grid auto-scrolling (#843).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Scope v2.8.0 webhook DPAPI note to Dashboard (#879)

Lite webhook URLs still read from plaintext settings; avoids implying the security hardening shipped to both editions.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: ClaudioESSilva <claudiosil100@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Follow-up to #857. @TrudAX's Collection Health screenshot showed five collectors (memory_clerks, memory_stats, tempdb_stats, query_snapshots, waiting_tasks) all stuck at "10 runs, 0 success": each re-running every collection interval, hitting the same SQL error 300 (VIEW SERVER PERFORMANCE STATE permission was denied) every time, and logging a fresh identical denial row into the collection log.

The denial isn't transient. A DB-scoped login (D365FO) will never grow server-level permission mid-session. Retrying just churns the log.

This PR flags the collector on first denial and short-circuits RunCollectorAsync so we skip the round-trip and the log entry entirely.

- CollectorHealthEntry.IsPermissionRestricted bool set when RecordCollectorResult sees status PERMISSIONS
- IsCollectorPermissionRestricted(serverId, collectorName) helper read under the existing _healthLock
- RunCollectorAsync right after the existing MFA-cancelled skip, logging at Debug level

Collection Health's NO_PERMISSIONS status logic is unchanged; it still renders correctly from the single recorded denial row.

Per-collector DB-scoped fallback queries (e.g. sys.dm_db_resource_stats for memory_stats) were considered but deliberately out of scope; their semantics differ from the boxed DMVs and each is its own decision.

Test plan
- dotnet build Lite/PerformanceMonitorLite.csproj -c Debug: 0 errors
- NO_PERMISSIONS instead of 10 runs, 0 success growing unbounded

🤖 Generated with Claude Code