Skip retrying collectors after permission denial (#857) #870

Merged: erikdarlingdata merged 1 commit into dev from fix/857-skip-permission-denied-collectors on Apr 20, 2026
@erikdarlingdata
Owner

Summary

Follow-up to #857. @TrudAX's Collection Health screenshot showed five collectors (memory_clerks, memory_stats, tempdb_stats, query_snapshots, waiting_tasks) all stuck at "10 runs, 0 success" — each re-running every collection interval, hitting the same SQL error 300 (VIEW SERVER PERFORMANCE STATE permission was denied) every time, and logging a fresh identical denial row into the collection log.

The denial isn't transient. A DB-scoped login (D365FO) will never grow server-level permission mid-session. Retrying just churns the log.

This PR flags the collector on first denial and short-circuits RunCollectorAsync so we skip the round-trip and the log entry entirely.

  • New CollectorHealthEntry.IsPermissionRestricted bool set when RecordCollectorResult sees status PERMISSIONS
  • New IsCollectorPermissionRestricted(serverId, collectorName) helper read under the existing _healthLock
  • Early-return in RunCollectorAsync right after the existing MFA-cancelled skip, logging at Debug level
  • Flag is in-memory per (server, collector), so an app restart retries once. If perms are still missing, the flag re-applies after the first attempt

Collection Health's NO_PERMISSIONS status logic is unchanged — it still renders correctly from the single recorded denial row.

Per-collector DB-scoped fallback queries (e.g. sys.dm_db_resource_stats for memory_stats) were considered but deliberately out of scope — their semantics differ from the boxed DMVs and each is its own decision.
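
For readers without the diff open, here is a minimal sketch of the latch-and-skip flow described above. It is not the PR's actual code. Only `IsPermissionRestricted`, `RecordCollectorResult`, `IsCollectorPermissionRestricted`, `RunCollectorAsync`, `_healthLock` and `_collectorHealth` are names from this PR; the `int` server id, the `runQuery` delegate, the `RoundTrips` counter and the class name `CollectorServiceSketch` are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public sealed class CollectorHealthEntry
{
    public bool IsPermissionRestricted { get; set; }
}

// Hypothetical, stripped-down stand-in for the collector service.
public sealed class CollectorServiceSketch
{
    private readonly object _healthLock = new object();
    private readonly Dictionary<(int ServerId, string Collector), CollectorHealthEntry> _collectorHealth
        = new Dictionary<(int ServerId, string Collector), CollectorHealthEntry>();

    // Assumption: counts simulated round-trips so the skip is observable.
    public int RoundTrips { get; private set; }

    public void RecordCollectorResult(int serverId, string collectorName, string status)
    {
        lock (_healthLock)
        {
            if (!_collectorHealth.TryGetValue((serverId, collectorName), out var entry))
            {
                entry = new CollectorHealthEntry();
                _collectorHealth[(serverId, collectorName)] = entry;
            }

            // Latch the flag on the first denial.
            if (status == "PERMISSIONS")
            {
                entry.IsPermissionRestricted = true;
            }
        }
    }

    public bool IsCollectorPermissionRestricted(int serverId, string collectorName)
    {
        lock (_healthLock)
        {
            return _collectorHealth.TryGetValue((serverId, collectorName), out var entry)
                && entry.IsPermissionRestricted;
        }
    }

    // runQuery is a hypothetical stand-in that returns the collector's status string.
    public async Task RunCollectorAsync(int serverId, string collectorName, Func<Task<string>> runQuery)
    {
        if (IsCollectorPermissionRestricted(serverId, collectorName))
        {
            return; // early return: no round-trip and no log row
        }

        RoundTrips++;
        var status = await runQuery();
        RecordCollectorResult(serverId, collectorName, status);
    }
}

public static class Program
{
    public static async Task Main()
    {
        var service = new CollectorServiceSketch();
        Func<Task<string>> denied = () => Task.FromResult("PERMISSIONS");

        // Three scheduled runs; only the first reaches the server.
        await service.RunCollectorAsync(1, "memory_clerks", denied);
        await service.RunCollectorAsync(1, "memory_clerks", denied);
        await service.RunCollectorAsync(1, "memory_clerks", denied);

        Console.WriteLine(service.RoundTrips); // prints 1
    }
}
```

Because the flag lives only in the in-memory dictionary, constructing a new service instance (the sketch's equivalent of an app restart) starts with no restrictions and retries once.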

Test plan

  • Build: dotnet build Lite/PerformanceMonitorLite.csproj -c Debug — 0 errors
  • Boxed SQL Server smoke test: no behavioural change expected (no permission errors normally)
  • D365FO confirmation (deferred to @TrudAX once the nightly lands): collectors should hit denial once then stop; Collection Health shows NO_PERMISSIONS instead of 10 runs, 0 success growing unbounded

🤖 Generated with Claude Code

The collector loop already classifies SQL errors 229 / 297 / 300 as
PERMISSIONS status and excludes them from the failure rate, but it
keeps re-running the collector every interval and logging an
identical denial each time. For DB-scoped logins on Azure SQL DB
(e.g. D365FO) this churns the collection log and gives no new
information — the permission won't change mid-session.

Flag the collector on first denial and short-circuit RunCollectorAsync
so we don't make the round-trip or the log entry. Flag is in-memory
per (server, collector) — cleared on app restart so newly granted
permissions are picked up on the next launch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Owner Author

@erikdarlingdata erikdarlingdata left a comment

What this does

Follow-up to #857. Adds an in-memory IsPermissionRestricted flag on CollectorHealthEntry, set in RecordCollectorResult when a collector hits SQL error 229/297/300, and short-circuits RunCollectorAsync on the next scheduled run so we stop churning identical denials into the collection log. Scope is correctly limited to Lite (issue originated with D365FO on Azure SQL).

Good

  • Lite-only change, base branch dev — correct per repo conventions.
  • Flag is read/written under the existing _healthLock; no new lock and no lock-ordering risk (IsCollectorPermissionRestricted takes a single lock and returns).
  • Decision to leave per-collector DB-scoped fallback queries out of scope is the right call — those have different semantics from the boxed DMVs and each deserves its own decision.
  • NO_PERMISSIONS render path in Lite/Services/LocalDataService.CollectionHealth.cs is unchanged — the single recorded denial row still drives the UI state correctly.
  • No PlanAnalyzer changes (N/A), no SQL changes (N/A), no workflow/signing changes (N/A), no schema upgrade concerns (N/A).

Needs attention

  1. Flag never clears mid-session. ClearHealthForServer is only wired to server removal, not server edit (Lite/MainWindow.xaml.cs:1118). Fixing perms or swapping in a higher-privilege login via the Edit dialog won't recover the collector until app restart. See inline on line 50.
  2. SUCCESS branch doesn't reset the flag. RecordCollectorResult latches IsPermissionRestricted = true but never sets it back to false. Self-healing would be one line. See inline on line 242.
  3. No test coverage added. The PR description explicitly defers smoke testing to the nightly + @TrudAX; unit tests on the flag state machine are the only thing catching regressions in the meantime. See inline on line 265.
  4. Minor — skip check sits downstream of several cheap gates in RunCollectorAsync; could live in the scheduler instead. Not a blocker, just noted for future cleanup. See inline on line 415.

No blockers. (1) and (2) together would close the "user fixes perms without restart" gap cleanly.


Generated by Claude Code

* identical denials every interval. Cleared on app restart — if
* permissions get granted later, the next launch retries once.
*/
public bool IsPermissionRestricted { get; set; }

Flag is never cleared mid-session. ClearHealthForServer is only invoked from ServerContextMenu_Remove_Click (Lite/MainWindow.xaml.cs:1144); the Edit-server success path at Lite/MainWindow.xaml.cs:1124 just calls RefreshServerList(). So if a user hits a permission denial, fixes perms on the SQL side (or edits the server entry to use a higher-privilege login) without restarting the app, the collector stays skipped until restart even though the denial is no longer valid.

Cheap fix: call _collectorService?.ClearHealthForServer(...) in the dialog.ShowDialog() == true branch of ServerContextMenu_Edit_Click, or expose a narrower ClearPermissionFlagsForServer that only resets IsPermissionRestricted so success/error counts aren't lost.
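
A rough sketch of the narrower option, assuming the health dictionary is keyed by a (server id, collector name) tuple as in the diff. `ClearPermissionFlagsForServer` is only the name proposed in this comment and does not exist yet. `SuccessCount`, `Seed`, `Get`, the `int` server id and the class name `HealthStoreSketch` are hypothetical, added so the example runs on its own.

```csharp
using System;
using System.Collections.Generic;

public sealed class CollectorHealthEntry
{
    public int SuccessCount { get; set; }            // hypothetical counter
    public bool IsPermissionRestricted { get; set; }
}

// Hypothetical stand-in for the part of the service that owns health state.
public sealed class HealthStoreSketch
{
    private readonly object _healthLock = new object();
    private readonly Dictionary<(int ServerId, string Collector), CollectorHealthEntry> _collectorHealth
        = new Dictionary<(int ServerId, string Collector), CollectorHealthEntry>();

    // Test helper (assumption): seeds an entry directly.
    public void Seed(int serverId, string collector, int successCount, bool restricted)
    {
        lock (_healthLock)
        {
            _collectorHealth[(serverId, collector)] = new CollectorHealthEntry
            {
                SuccessCount = successCount,
                IsPermissionRestricted = restricted
            };
        }
    }

    // Resets only the permission latch for one server; counts are left alone.
    public void ClearPermissionFlagsForServer(int serverId)
    {
        lock (_healthLock)
        {
            foreach (var pair in _collectorHealth)
            {
                if (pair.Key.ServerId == serverId)
                {
                    pair.Value.IsPermissionRestricted = false;
                }
            }
        }
    }

    public CollectorHealthEntry Get(int serverId, string collector)
    {
        lock (_healthLock)
        {
            return _collectorHealth[(serverId, collector)];
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var store = new HealthStoreSketch();
        store.Seed(1, "memory_clerks", 0, true);
        store.Seed(1, "query_stats", 12, false);
        store.Seed(2, "memory_clerks", 0, true);

        // Would be called from the Edit-server success branch.
        store.ClearPermissionFlagsForServer(1);

        Console.WriteLine(store.Get(1, "memory_clerks").IsPermissionRestricted); // False
        Console.WriteLine(store.Get(1, "query_stats").SuccessCount);            // 12
        Console.WriteLine(store.Get(2, "memory_clerks").IsPermissionRestricted); // True
    }
}
```

Setting a property on an existing entry does not modify the dictionary itself, so the loop is safe to run inside the `foreach`.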


Generated by Claude Code

the rest of the app session. */
entry.LastErrorTime = DateTime.UtcNow;
entry.LastErrorMessage = errorMessage;
entry.IsPermissionRestricted = true;

IsPermissionRestricted is latched to true but never flipped back to false. If RecordCollectorResult is later called with status SUCCESS (e.g. after the Edit-server fix suggested above, or if perms get granted and the flag is cleared by some future path), the success branch at line 227 doesn't reset this flag. Worth resetting entry.IsPermissionRestricted = false; in the SUCCESS branch so the state is self-healing even without an explicit clear call.


Generated by Claude Code

_logger?.LogDebug("Skipping collector '{Collector}' for server '{Server}' - permission denied this session",
collectorName, server.DisplayName);
return;
}

Minor: the skip happens after GetConnectionStatus, IsCollectorSupported, and the MFA gate — all cheap, but it means every skipped collector still runs three dictionary lookups and a method call before bailing. Not worth restructuring, but if RunAllCollectorsForServerAsync ever gets called on a server where most collectors are permission-restricted, the cumulative cost is noticeable. Consider hoisting the restricted check into the scheduler (GetDueCollectorsForServer) so denied collectors aren't even enumerated. Not a blocker.
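
One possible shape for the hoisted check, as a sketch only. The real signature of `GetDueCollectorsForServer` is not shown in this PR, so the parameters below (a list of due collector names plus an `isRestricted` predicate) and the class name `SchedulerSketch` are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SchedulerSketch
{
    // Hypothetical signature: filters the due list before anything is enumerated for a run.
    public static List<string> GetDueCollectorsForServer(
        IEnumerable<string> dueCollectors,
        Func<string, bool> isRestricted)
    {
        return dueCollectors.Where(name => !isRestricted(name)).ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        var due = new[] { "memory_clerks", "query_stats", "waiting_tasks", "file_io_stats" };

        // Stand-in for IsCollectorPermissionRestricted(serverId, name) on one server.
        var restricted = new HashSet<string> { "memory_clerks", "waiting_tasks" };

        var runnable = SchedulerSketch.GetDueCollectorsForServer(due, restricted.Contains);

        Console.WriteLine(string.Join(", ", runnable)); // prints query_stats, file_io_stats
    }
}
```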


Generated by Claude Code

return _collectorHealth.TryGetValue((serverId, collectorName), out var entry)
&& entry.IsPermissionRestricted;
}
}

No test coverage in Lite.Tests/. The behavior here is pure state logic and easily unit-testable: instantiate RemoteCollectorService, call RecordCollectorResult(..., "PERMISSIONS", ...) via a test-only accessor (or refactor the entry dictionary behind a small testable surface), then assert IsCollectorPermissionRestricted returns true and that a subsequent SUCCESS resets it (see the other comment). Worth adding alongside FactCollectorTests.cs or as a new CollectorHealthTests.cs — the D365FO smoke test in the PR description is deferred, so unit coverage is the only thing that catches a regression before the next nightly.
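
As a starting point, the state machine can be pinned down without a test framework. The sketch below is a stand-in, not the real `RemoteCollectorService`. It already includes the SUCCESS reset suggested in the other comment, and the `"ERROR"` status string, the `int` server id, the `Check` helper and the class name `HealthStateSketch` are assumptions. The same checks could be ported to whatever framework Lite.Tests uses.

```csharp
using System;
using System.Collections.Generic;

public sealed class CollectorHealthEntry
{
    public bool IsPermissionRestricted { get; set; }
}

// Hypothetical stand-in for the flag logic under test.
public sealed class HealthStateSketch
{
    private readonly object _healthLock = new object();
    private readonly Dictionary<(int ServerId, string Collector), CollectorHealthEntry> _collectorHealth
        = new Dictionary<(int ServerId, string Collector), CollectorHealthEntry>();

    public void RecordCollectorResult(int serverId, string collectorName, string status)
    {
        lock (_healthLock)
        {
            if (!_collectorHealth.TryGetValue((serverId, collectorName), out var entry))
            {
                entry = new CollectorHealthEntry();
                _collectorHealth[(serverId, collectorName)] = entry;
            }

            if (status == "PERMISSIONS")
            {
                entry.IsPermissionRestricted = true;   // latch on denial
            }
            else if (status == "SUCCESS")
            {
                entry.IsPermissionRestricted = false;  // proposed self-healing reset
            }
        }
    }

    public bool IsCollectorPermissionRestricted(int serverId, string collectorName)
    {
        lock (_healthLock)
        {
            return _collectorHealth.TryGetValue((serverId, collectorName), out var entry)
                && entry.IsPermissionRestricted;
        }
    }
}

public static class Program
{
    private static void Check(bool condition, string name)
    {
        if (!condition) throw new InvalidOperationException("FAILED: " + name);
        Console.WriteLine("ok: " + name);
    }

    public static void Main()
    {
        var health = new HealthStateSketch();

        Check(!health.IsCollectorPermissionRestricted(1, "memory_clerks"), "unknown pair is not restricted");

        health.RecordCollectorResult(1, "memory_clerks", "PERMISSIONS");
        Check(health.IsCollectorPermissionRestricted(1, "memory_clerks"), "denial latches the flag");
        Check(!health.IsCollectorPermissionRestricted(1, "waiting_tasks"), "other collector is unaffected");
        Check(!health.IsCollectorPermissionRestricted(2, "memory_clerks"), "other server is unaffected");

        health.RecordCollectorResult(1, "memory_clerks", "ERROR"); // hypothetical non-permission failure
        Check(health.IsCollectorPermissionRestricted(1, "memory_clerks"), "unrelated error keeps the latch");

        health.RecordCollectorResult(1, "memory_clerks", "SUCCESS");
        Check(!health.IsCollectorPermissionRestricted(1, "memory_clerks"), "success resets the flag");
    }
}
```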


Generated by Claude Code

@erikdarlingdata erikdarlingdata merged commit 4cebae2 into dev Apr 20, 2026
3 checks passed
@erikdarlingdata erikdarlingdata deleted the fix/857-skip-permission-denied-collectors branch April 20, 2026 16:27
erikdarlingdata added a commit that referenced this pull request Apr 21, 2026
)

Azure SQL Database DBs hosted in an elastic pool (notably D365FO
customer tenants) enforce VIEW SERVER PERFORMANCE STATE on
sys.dm_os_schedulers regardless of the login's DB-scoped grants —
VIEW DATABASE STATE + VIEW DATABASE PERFORMANCE STATE on the user DB
are not sufficient. Verified by reproducing the failure in a
standard Azure SQL DB elastic pool with a contained DB user; bare
sys.dm_exec_requests/sys.dm_os_sys_info/sys.dm_os_performance_counters
succeed but sys.dm_os_memory_clerks / sys.dm_os_schedulers /
sys.dm_os_waiting_tasks fail with error 300.

The other failing collectors (memory_clerks, waiting_tasks,
tempdb_stats) have no DB-scoped alternative and will stay skip-gated
via #870 for these users.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
erikdarlingdata added a commit that referenced this pull request Apr 22, 2026
* Implements #843 in Lite

* Implements #843 for Full Dashboard

* Add trailing newlines to ScrollPanBehavior files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Harden DuckDB queries: parameterize values, escape paths, fix IsArchiving race

Addresses security findings from #840:
- #846: Escape single quotes in file paths interpolated into read_parquet() and COPY TO
- #847: Use DuckDB $1 parameters for DateTime values instead of string interpolation
- #849: Make IsArchiving volatile-backed to prevent stale reads across threads

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Encrypt webhook URLs with DPAPI via Windows Credential Manager

Moves Teams and Slack webhook URLs from plaintext settings.json/preferences.json
to Windows Credential Manager (DPAPI-encrypted), matching the existing pattern
used for SMTP passwords and SQL Server credentials.

Includes automatic migration: on first settings load, any plaintext URLs are
moved to Credential Manager and removed from the JSON file.

Closes #848

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Lazy-load server tabs: only load visible tab on open, full-load on first visit

Initial tab open and Refresh button now only load the currently visible tab.
First switch to any tab triggers a full refresh of that tab (all sub-tabs).
Subsequent refreshes only hit the active sub-tab.

Ctrl+Click on Refresh Tab (or Ctrl+F5) refreshes all tabs at once.
Apply to All Tabs retains existing full-refresh behavior.

Fixes #835 — prevents heavy queries (e.g. GetQueryStatsAsync) from running
on tab open when the user is only viewing Overview.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Cap query/procedure/query store grid results to TOP 500

GetQueryStatsAsync, GetProcedureStatsAsync, and GetQueryStoreDataAsync
were returning unbounded result sets. With 49 databases and 742K rows
in query_stats over 3 days, the GROUP BY with plan XML could produce
thousands of rows and timeout after 120 seconds.

TOP 500 ordered by avg CPU desc is plenty for a grid view and prevents
the query from consuming unbounded memory on large installations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Remove pointless WAITFOR DECOMPRESS filters from stats/store queries

The CAST(DECOMPRESS(...)) NOT LIKE N'WAITFOR%' filter was decompressing
query text on every row in query_stats and query_store_data just to skip
WAITFOR queries. WAITFOR has no plan and no meaningful stats — it only
matters in query snapshots (active sessions), where the filter remains.

On a 742K-row query_stats table, this was a significant contributor to
the 120-second query timeouts reported in #835.

The snapshot filters (report.query_snapshots) and MCP phased queries
are untouched — they filter after TOP on already-hydrated text.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Refactor query/procedure/query store stats to phased DECOMPRESS approach

All three grid queries now use a 3-phase pattern:
1. Aggregate numerics into temp table (no DECOMPRESS)
2. Sum across lifetimes, rank TOP 500
3. OUTER APPLY to decompress text/plan for only the 500 winners

On a 742K-row query_stats table, this reduces DECOMPRESS calls from
742K to 500 — eliminating the 16+ minute query times reported in #835.

Matches the existing phased pattern used by the MCP query tools.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix FinOps TDE recommendation on SQL Server 2019+ (#854)

TDE moved to Standard Edition in SQL 2019, so dm_db_persisted_sku_features
no longer reports it as Enterprise-only. Add version check to give
version-appropriate licensing guidance instead of falsely claiming no
databases use TDE.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Sync PlanAnalyzer and BenefitScorer from PerformanceStudio (Apr 9-16)

Port PS PRs #216, #217, #219, #224, #229, #230, #231 to PM.

PlanAnalyzer changes:
- Rule 5: Suppress for Key Lookups (point lookups mislead per-execution estimates)
- Rule 8: Enhanced parallel skew with batch mode sort detection and practical context
- Rule 9: Large memory grant shows top 3 consumers sorted by row count
- Rule 10: Key lookup overhaul — show output columns, check predicate filtering, softer advice
- Rules 11/12/29: Suppress on 0-execution nodes (operator never ran)
- Rule 11: I/O wait severity elevation when scan hits disk
- Rule 24: FormatNodeRef helper includes object name for data access operators
- Rule 26: Suppress when row goal prediction was correct, specific cause detection
- Wait stats: DescribeWaitType with full wait type coverage, multi-wait summary
- New helpers: GetWaitLabel, HasSignificantIoWaits, IdentifyRowGoalCause, FormatNodeRef
- GetOperatorOwnElapsedMs changed to internal for BenefitScorer access

BenefitScorer (new file):
- Stage 1: MaxBenefitPercent for operator-level rules (filter, spill, lookup, etc.)
- Stage 2: Wait stats benefit scoring with parallel allocation (Joe's formula)

PlanModels additions:
- MaxBenefitPercent and ActionableFix on PlanWarning
- WaitBenefit class and WaitBenefits list on PlanStatement

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fall back to single-database mode when Azure master is inaccessible (#857)

On Azure SQL DB, some logins (e.g. Microsoft Dynamics 365 FO) are granted
access only to a specific user database and not to master. The three
collectors that enumerate databases via master — query_stats,
database_size_stats, file_io_stats — would fail the first time and
produce an empty screen.

GetAzureDatabaseListAsync now catches known access-denied/login-failed
errors from the master connection, caches the per-server decision, and
returns the connection's InitialCatalog as a single-element list. The
three callers already loop per-database, so single-DB mode works without
further changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Add nonclustered indexes for query/procedure/query store lookups

Phase 3 OUTER APPLY hydration of compressed query_text/plan_text was forcing
an Eager Index Spool over the full collect.query_stats table (and similar
for procedure_stats / query_store_data), which took 104 seconds on a
742K-row table in #835.

Changes:
- Remove CONVERT(binary(8), nvarchar-hash, 1) anti-pattern from OUTER APPLY
  WHERE clauses by keeping query_hash as native binary(8) in temp tables.
  query_hash is only converted to nvarchar(20) in the final output projection.
- Add three nonclustered indexes (install script and upgrade script):
    IX_query_stats_hash_lookup (query_hash, database_name, collection_time DESC)
    IX_procedure_stats_name_lookup (database_name, schema_name, object_name, collection_time DESC)
    IX_query_store_data_id_lookup (database_name, query_id, collection_time DESC)
- Indexes use SORT_IN_TEMPDB = ON and DATA_COMPRESSION = PAGE.
- ONLINE = ON is applied conditionally via dynamic SQL based on
  SERVERPROPERTY('EngineEdition') — Enterprise/Developer/Azure only, since
  Standard/Web/Express don't support online index operations.

Tested against CADelete's 742K-row table: Phase 3 went from 104s to
well under 1s (5s total for the full three-phase query).

Fixes #835

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Scope query snapshots to current database on Azure SQL DB (#857)

On Azure SQL Database, logins without access to master can't resolve
cross-database rows returned by sys.dm_exec_requests, which caused the
Live Snapshot button and the query snapshots collector to error in
D365FO-style environments (reported by @TrudAX in #857 after PR #858).

BuildQuerySnapshotsQuery now takes an isAzureSqlDatabase flag and emits
AND der.database_id = DB_ID() only when true. Boxed SQL Server, MI, and
elastic pool behavior is unchanged. The Live Snapshot button path gets
the flag through a new ServerTab constructor parameter wired from the
cached ServerConnectionStatus.SqlEngineEdition.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Polish Lite chart axes and sub-tab styling

- Chart X-axis prints the date line only on the first tick and on ticks
  where the date changes; all other ticks show time only. Format respects
  current culture (en-GB → dd/MM, de-DE → dd.MM, 24h clocks, etc.).
  Implemented as a DateTimeTicksBottomDateChange() extension in
  Lite/Helpers/AxesExtensions.cs and applied to every DateTimeTicksBottom
  call site in ServerTab and CorrelatedTimelineLanesControl.
- Server name no longer duplicated in the ServerTab header status line;
  ConnectionStatusText now shows just "Connecting..." / "Last refresh: ...".
- Chart tick label font bumped from 12 to 13 for readability.
- New SubTabItemStyle (thin accent underline, transparent background) in
  all three themes, applied to Queries / Memory / File I/O / Blocking /
  Perfmon / Running Jobs sub-TabControls so sub-tab selection no longer
  looks identical to main-tab selection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Port Lite chart/tab polish to Dashboard + LSP diagnostics cleanup

Dashboard polish (ports the same items merged to Lite in #862):
- New Dashboard/Helpers/AxesExtensions.cs with DateTimeTicksBottomDateChange(),
  culture-aware (dd/MM for en-GB, dd.MM for de-DE, 24h clocks, etc.). All 52
  call sites of DateTimeTicksBottom() across 10 files swapped to use it.
- TabHelpers.ApplyTheme + ReapplyAxisColors bump chart tick label font from
  12 to 13 so numbers read cleaner on wide charts.
- SubTabItemStyle added to Dark / Light / CoolBreeze themes: thin accent
  underline + transparent background instead of filled cyan, so sub-tabs
  don't look identical to main tabs when selected. Wired via
  ItemContainerStyle on 11 sub-TabControls (Overview's inner tabs,
  Collection Health's inner tabs, Locking, ConfigChanges, CurrentConfig,
  FinOps, Memory, ResourceMetrics ×2, SystemEvents, QueryPerformance).

LSP diagnostics cleanup (tracked work from chore/lsp-diagnostics-cleanup):
- Small nullability/warning fixes across Dashboard and Lite services,
  analysis helpers, and BenefitScorer / PlanAnalyzer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix Overview crosshair disappearing after tab switches / layout passes

Root cause: the control wired `Unloaded += ...Dispose()` on the crosshair
manager, and WPF fires Unloaded for transient reasons (tab virtualization,
layout rebuilds, etc.), not just when the control is actually going away.
Dispose() clears the manager's lane list, after which ReattachVLines runs
over an empty list and the crosshair is gone permanently.

Changes:
- Remove the Unloaded → Dispose() handler in both Lite and Dashboard copies.
  The manager holds only managed state (a Popup + lane references) — GC
  will clean it up with the control.
- Remove the now-redundant `_isRefreshing` flag from CorrelatedCrosshairManager.
  The `lane.VLine == null` check in OnMouseMove is a sufficient "not ready"
  guard and is self-healing once VLines are recreated.
- Wrap ReattachVLines in a try/finally on the control side, with a new
  idempotent EnsureVLinesAttached() safety net that only creates VLines
  for lanes where they're still null.
- Make CreateVLine catch per-lane exceptions so one failing chart can't
  prevent the others from recovering.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix Memory Pressure Events chart filter; add MCP interpretation (#865)

Chart previously filtered to HIGH severity only (indicator>=3), which on most
servers never fires, producing an empty chart even when sp_pressuredetector-
level medium pressure (indicator=2) was occurring constantly. Switch to stacked
bars per hour, split by SQL Server (process) vs Operating System (system), with
severe events capped on top of medium in a darker shade. Extend ChartHoverHelper
to support BarPlot tooltips. Add MCP guidance for interpreting indicator values
and routing to the right follow-up tool.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Port Memory Pressure Events feature to Lite (#865)

Lite was missing the RING_BUFFER_RESOURCE_MONITOR collector entirely — no
collector, no table, no chart, no MCP tool. This adds the full feature:

- Schema: new memory_pressure_events table + index, schema v25, added to
  ArchivableTables, server-id-fix list, and ArchiveService.
- Collector: CollectMemoryPressureEventsAsync queries the ring buffer and
  client-side-dedupes against DuckDB's MAX(sample_time). Azure SQL DB returns
  zero rows (ring buffer not exposed there). Scheduled every 5 min (Aggressive
  and Balanced presets) or 15 min (Low-Impact).
- UI: new 'Memory Pressure Events' sub-tab on the Memory tab with the same
  stacked-bar chart as Dashboard (SQL Server medium/severe, Operating System
  medium/severe). Wired into full-load and sub-tab-switch refresh paths.
- Hover: ported the BarPlot support from Dashboard's ChartHoverHelper so bar
  tooltips work and report the correct segment height for stacked bars.
- MCP: new get_memory_pressure_events tool + the 'Interpreting Memory
  Pressure Events' guidance section in McpInstructions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Bump schema table count test to 30 for memory_pressure_events

Companion update to the new memory_pressure_events table added in this PR.
SchemaStatements_MatchTableCount asserts the total table count; needs to
move from 29 to 30 to reflect the new table.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix blocked process report plan lookup (#867) (#868)

Right-click > View Plan on a Blocked Process Reports row silently fell
through (no handler case) and Get Actual Plan erred with "no query text."

- Split the grid onto its own BlockedProcessContextMenu with separate
  View Blocked Plan / View Blocking Plan actions; drop Get Actual Plan
  (re-executing a mid-transaction blocked query is a foot-gun).
- Parse all <frame> entries from the BPR XML's executionStack, filter
  the 42-byte all-zero sql_handle placeholder (dynamic SQL / system
  context), default stmtstart=0 / stmtend=-1 per the dm_exec_text_query_plan
  convention. Matches sp_HumanEventsBlockViewer's XPath and join shape.
- Add FetchPlanBySqlHandleAsync keyed on sql_handle + statement offsets
  against sys.dm_exec_query_stats. Caller iterates frames until one
  resolves; falls back to a clear "plan no longer in cache" message.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Pre-filter query snapshot requests into #temp on Azure SQL DB (#857) (#869)

Follow-up to #861. The DB_ID() predicate in the WHERE clause wasn't
enough — the OUTER APPLYs to sys.dm_exec_sql_text and
sys.dm_exec_text_query_plan were still being evaluated against
master-scoped rows from sys.dm_exec_requests before the filter was
applied, tripping VIEW SERVER PERFORMANCE STATE errors for DB-scoped
logins (D365FO). A CTE or derived table wouldn't guarantee the
filter order, so materialise the filtered request rows into #req
first and drive the DMFs off that.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Stop retrying collectors after non-transient permission denial (#857) (#870)

The collector loop already classifies SQL errors 229 / 297 / 300 as
PERMISSIONS status and excludes them from the failure rate, but it
keeps re-running the collector every interval and logging an
identical denial each time. For DB-scoped logins on Azure SQL DB
(e.g. D365FO) this churns the collection log and gives no new
information — the permission won't change mid-session.

Flag the collector on first denial and short-circuit RunCollectorAsync
so we don't make the round-trip or the log entry. Flag is in-memory
per (server, collector) — cleared on app restart so newly granted
permissions are picked up on the next launch.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Skip live query plans on Azure SQL DB (#857) (#871)

sys.dm_exec_query_statistics_xml requires VIEW SERVER PERFORMANCE
STATE on Azure SQL Database regardless of scope, so DB-scoped logins
(e.g. D365FO) still hit error 300 even after the #temp pre-filter
landed in #869. The OUTER APPLY evaluates the DMF for every session
in #req and fails on the permission check before returning rows.

Force supportsLiveQueryPlan=false for SqlEngineEdition=5 in both the
collector and the Live Snapshot button paths. Boxed SQL Server and
Azure MI (edition 8) still get live plans as before.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix FinOps recommendation severity sort order (#872)

Sort recommendations by severity rank (High=1, Medium=2, Low=3)
instead of alphabetically. Adds SeveritySort property to
RecommendationRow and uses it as SortMemberPath for the Severity
column. Display strings are unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix FinOps severity sort order in Dashboard (#872)

Severity column was sorting alphabetically (High, Low, Medium) instead
of by severity ranking. Added SeveritySort computed property on
FinOpsRecommendation, ordered results by it, and wired the DataGrid
column's SortMemberPath so click-sort matches the default order.

Mirrors the Lite fix in PR #874.

* Drop sys.dm_os_schedulers from memory_stats on Azure SQL DB (#857) (#876)

Azure SQL Database DBs hosted in an elastic pool (notably D365FO
customer tenants) enforce VIEW SERVER PERFORMANCE STATE on
sys.dm_os_schedulers regardless of the login's DB-scoped grants —
VIEW DATABASE STATE + VIEW DATABASE PERFORMANCE STATE on the user DB
are not sufficient. Verified by reproducing the failure in a
standard Azure SQL DB elastic pool with a contained DB user; bare
sys.dm_exec_requests/sys.dm_os_sys_info/sys.dm_os_performance_counters
succeed but sys.dm_os_memory_clerks / sys.dm_os_schedulers /
sys.dm_os_waiting_tasks fail with error 300.

The other failing collectors (memory_clerks, waiting_tasks,
tempdb_stats) have no DB-scoped alternative and will stay skip-gated
via #870 for these users.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Release v2.8.0: version bumps and changelog (#877)

Adds nonclustered indexes to collect.query_stats, procedure_stats, and
query_store_data for Dashboard grid lookups (#835). Ports Memory Pressure
Events to Lite (#865). Multiple Azure SQL DB collector fixes (#857).
FinOps severity sort order fix (#872). Grid auto-scrolling (#843).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Scope v2.8.0 webhook DPAPI note to Dashboard (#879)

Lite webhook URLs still read from plaintext settings — avoids implying
the security hardening shipped to both editions.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: ClaudioESSilva <claudiosil100@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>