B3 PR-A: approval-gated Apply Fix — privileged execution core (no UI) #1032

Merged: erikdarlingdata merged 4 commits into dev from feature/b3-pr-a-apply-fix-core on May 31, 2026

erikdarlingdata commented May 30, 2026

PR-A — Approval-gated "Apply Fix": privileged execution core (no UI)

PR-A of B3: the security-critical core that mutates a monitored, likely-production SQL Server via sp_query_store_force_plan. No UI, no MCP exposure, no mutating caller — PR-B adds the gated Apply button over this already-reviewed core. Do not merge until the security-reviewer pass on this code is complete.

Plan (spec): C:\Users\edarl\.claude\plans\b3-phase1-implementation.md. Baseline origin/dev b404c76. Three commits: core, gate-split, real-server fixes.

The six security invariants and how each is enforced

  1. Structured execution only. sp_query_store_force_plan/unforce_plan is issued with typed @query_id/@plan_id as SqlDbType.BigInt; never the rendered SQL text. database is applied only as InitialCatalog, built solely through SqlConnectionStringBuilder (never concatenated). No general Execute(sql) exists. → DatabaseService.Remediation.cs.
  2. Single-connection self-gating (R2-MOD-1). ForcePlanAsync/UnforcePlanAsync open one retargeted SqlConnection and run, on that same open connection with no re-open: identity/permission gate (DB_NAME()==target assert (A5) + DB-scoped ALTER check), then the Query Store freshness read, then — only if all pass — the EXEC. The outcome carries the @@SPID at the gate read and at the EXEC; the real-server harness asserts they are equal (observed 72==72). ApplyAsync takes no preflight disposition and re-derives its own gate — unbypassable.
  3. No elevation. Runs under the existing per-server monitoring connection. No credential prompt, no SecureString/SqlCredential, no elevated path. has_alter=0 fails closed with map-then-grant guidance. used_elevated_cred always 0.
  4. Audit-table-absent = HARD BLOCK (R2-MOD-2). OBJECT_ID('config.remediation_action_log') IS NULL (monitoring connection) blocks every target with no mutation attempted.
  5. Audited apply + unapply. One config.remediation_action_log row per attempt (success/skip/error/abort), after both apply and unapply, on the monitoring connection.
  6. Never automatic. PR-A has no caller that triggers a mutation (A4 no-caller test). PR-B adds the gated UI.
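
The gate ordering in invariants 1 and 2 can be sketched in T-SQL as one batch on one session. This is a minimal illustration, not the code in DatabaseService.Remediation.cs. The database name, the IDs, the error messages, and the simplified freshness check (plan present and not already forced) are assumptions made for the example.

```sql
-- Sketch only. Run in the target database on a test server.
SET ARITHABORT ON;  -- required by sp_query_store_force_plan; see "Notes / deviations"

DECLARE @target_db sysname = N'YourQueryStoreDb';  -- hypothetical database name
DECLARE @query_id  bigint  = 42;                   -- hypothetical Query Store query_id
DECLARE @plan_id   bigint  = 7;                    -- hypothetical Query Store plan_id
DECLARE @gate_spid int     = @@SPID;               -- session id captured at the gate

-- Gate 1: identity and permission, using only always-accessible intrinsics.
IF DB_NAME() <> @target_db
BEGIN
    RAISERROR(N'Blocked: connected database is not the target.', 16, 1);
    RETURN;
END;

IF ISNULL(HAS_PERMS_BY_NAME(DB_NAME(), 'DATABASE', 'ALTER'), 0) <> 1
BEGIN
    RAISERROR(N'PermissionDenied: login lacks ALTER on this database.', 16, 1);
    RETURN;
END;

-- Gate 2: Query Store state, read only after the ALTER check has passed.
IF NOT EXISTS
(
    SELECT 1
    FROM sys.query_store_plan AS qsp
    WHERE qsp.query_id = @query_id
      AND qsp.plan_id = @plan_id
      AND qsp.is_forced_plan = 0
)
BEGIN
    RAISERROR(N'Skipped: plan is missing or already forced.', 16, 1);
    RETURN;
END;

-- Mutation: typed bigint parameters only, never rendered SQL text.
EXEC sys.sp_query_store_force_plan @query_id = @query_id, @plan_id = @plan_id;

-- Same session for gate and EXEC, so the two values match.
SELECT @gate_spid AS gate_spid, @@SPID AS exec_spid;
```

In a single batch the two SPID values are equal by construction. The PR's harness checks the same property across separate commands issued from C# on one open SqlConnection, which is where a re-open could otherwise slip in.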

Real-server findings — two bugs the faked-executor unit tests could not reach

Caught by running the actual DatabaseServiceRemediationExecutor + ForcePlanHandler against sql2022 (the faked IRemediationExecutor can't exercise live T-SQL):

  • ALTER-permission form (security-gate correctness). The plan's O5 specified HAS_PERMS_BY_NAME(NULL, NULL, 'ALTER'), but that server-scoped form returns NULL even for sysadmin (verified on sql2022) — so the gate would fail closed for every login and Apply could never run. Fixed to the DB-scoped form HAS_PERMS_BY_NAME(DB_NAME(), 'DATABASE', 'ALTER') (1 for ALTER-holders incl. sysadmin/db_owner, 0 otherwise). DB_NAME() keeps it correct after the catalog retarget.
  • Unforce delegation. UnforcePlanAsync delegated with isUnforce: false (would re-force on un-apply); corrected to isUnforce: true.
  • Gate split so has_alter=0 fails closed before any Query Store catalog read (a least-privilege login lacks VIEW DATABASE STATE and would otherwise error 297 reading sys.query_store_plan). Identity+ALTER use only always-accessible intrinsics; the QS read runs only after ALTER passes — on the same open connection (R2-MOD-1 intact).
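
For reference, the two HAS_PERMS_BY_NAME forms discussed above can be compared side by side. The query itself is read-only; the column aliases and the fail-closed wrapper are illustrative, not taken from the PR's code.

```sql
-- Compare the form from plan O5 with the corrected DB-scoped form.
SELECT
    HAS_PERMS_BY_NAME(NULL, NULL, 'ALTER')            AS o5_form,         -- reported NULL on sql2022, even for sysadmin
    HAS_PERMS_BY_NAME(DB_NAME(), 'DATABASE', 'ALTER') AS db_scoped_form;  -- 1 for ALTER holders, 0 otherwise

-- Fail-closed interpretation: anything other than 1 (including NULL) counts as "no ALTER".
SELECT
    CASE
        WHEN ISNULL(HAS_PERMS_BY_NAME(DB_NAME(), 'DATABASE', 'ALTER'), 0) = 1 THEN 1
        ELSE 0
    END AS has_alter;
```

Because the check uses DB_NAME() rather than a literal, it evaluates against whichever database the retargeted connection is actually in.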

Structured-params persistence

RemediationAction/ForcePlanTarget (.Analysis, incl. LatestCpuPerExecUs/BestCpuPerExecUs for render stability — M1). FactRemediation refactored to extract once (ExtractPlanRegressionTargets) + BuildAction; rendered preview byte-for-byte unchanged. Round-trips via an optional member on AlertDetailItem/AlertContextDto/AlertContextSerializer in the existing contextJson. Old contexts without it → null (no Apply, no crash).

Upgrade script

upgrades/2.11.0-to-2.12.0/02_create_remediation_action_log.sql (+ upgrade.txt), config schema, idempotent (OBJECT_ID guard). 2.12.0 is the in-flight version (tag v2.11.0, csprojs 2.11.0) so it appends to the existing folder. No install/*.sql edited.
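
The idempotent OBJECT_ID guard pattern looks roughly like the sketch below. The table name config.example_action_log and its three columns are hypothetical stand-ins; the real script creates config.remediation_action_log with 15 columns that are not reproduced here. The sketch assumes the config schema already exists.

```sql
-- Idempotent create: a second run finds the table and does nothing.
IF OBJECT_ID(N'config.example_action_log', N'U') IS NULL
BEGIN
    CREATE TABLE config.example_action_log
    (
        id        bigint IDENTITY(1, 1) NOT NULL PRIMARY KEY,
        logged_at datetime2(7) NOT NULL DEFAULT SYSUTCDATETIME(),
        outcome   nvarchar(20) NOT NULL
    );
END;

-- The same OBJECT_ID test, pointed at the real audit table, is the hard-block
-- condition from invariant 4. This form is read-only.
SELECT
    CASE
        WHEN OBJECT_ID(N'config.remediation_action_log') IS NULL THEN 'Blocked'
        ELSE 'Present'
    END AS audit_table_state;
```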

Tests (unit)

  • Golden render-stability (byte-for-byte incl. the two (cpu/exec … us) lines).
  • Serializer round-trip (Dashboard + Lite); legacy JSON without the field → null.
  • Handler gate vs faked executor: has_alter=0 fail-closed; audit-table-absent block; already-forced/QS-off/stale/wrong-DB dispositions; per-target independence; gate re-derivation; applied-but-unlogged; per-outcome audit; un-apply restricted to prior B3 forces.
  • A4 no-caller: ForcePlanAsync/UnforcePlanAsync referenced only in the executor seam.
  • Suites green: Dashboard.Tests 68/0, Lite.Tests 310/0.

Real-server verification (sql2022, Query Store DB)

  • Installer @2.12.0 (temporary <Version>/<AssemblyVersion>/<FileVersion>/<InformationalVersion> bump in both installer csprojs — reverted, not in this PR; server is the bare hostname SQL2022): Existing installation detected: v2.11.0.0; Found 1 upgrade(s) → applied 2.11.0-to-2.12.0 (02_create_remediation_action_log.sql - Success), RC=0 "Installation completed successfully". Verified: config.remediation_action_log PRESENT (15 cols), history recorded 2.12.0.0 : SUCCESS. Second run: No pending upgrades found (idempotent).
  • Executor/handler harness (throwaway, passed 1/1; not committed). Audit table ground-truth (queried via sqlcmd):
    • force/success, executing_login=sa, operator_identity=HARNESS\tester; is_forced_plan=1.
    • R2-MOD-1: gate @@SPID == EXEC @@SPID (72 == 72).
    • unforce/success, executing_login=sa; is_forced_plan=0.
    • Absent-table → Blocked, no mutation.
    • Scoped login (b3_noalter, VIEW DATABASE STATE but no ALTER) → has_alter=False, PermissionDenied, fail closed, no mutation. (Preflight surfaced the login name correctly; the no-ALTER message login string had a harness-logging quirk — the audit table and preflight both record the correct login.)
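
The is_forced_plan ground truth above can be read straight from the Query Store catalog. The sketch below shows one way to check it before and after an un-apply. The IDs are hypothetical, and reading sys.query_store_plan needs VIEW DATABASE STATE. Audit-table columns are not queried because the table's full definition is not shown in this PR.

```sql
SET ARITHABORT ON;  -- avoids error 1934 on connections that default it to OFF

DECLARE @query_id bigint = 42;  -- hypothetical
DECLARE @plan_id  bigint = 7;   -- hypothetical

-- Before: expect is_forced_plan = 1 after a successful force.
SELECT qsp.query_id, qsp.plan_id, qsp.is_forced_plan, qsp.force_failure_count
FROM sys.query_store_plan AS qsp
WHERE qsp.query_id = @query_id
  AND qsp.plan_id = @plan_id;

-- Un-apply with typed parameters. This mutates Query Store; test servers only.
EXEC sys.sp_query_store_unforce_plan @query_id = @query_id, @plan_id = @plan_id;

-- After: expect is_forced_plan = 0.
SELECT qsp.query_id, qsp.plan_id, qsp.is_forced_plan, qsp.force_failure_count
FROM sys.query_store_plan AS qsp
WHERE qsp.query_id = @query_id
  AND qsp.plan_id = @plan_id;
```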

Not UI-reachable

A4 no-caller test confirms nothing in the running app reaches the executor. DatabaseService.Remediation.* methods are internal, reachable only via DatabaseServiceRemediationExecutor.

Notes / deviations

  • Executor sets standard ANSI options (incl. ARITHABORT ON) before the gate/EXEC — sp_query_store_force_plan errors 1934 otherwise (Microsoft.Data.SqlClient defaults ARITHABORT OFF). Same open connection, so R2-MOD-1 holds.
  • RemediationIdentity carries an optional SourceAlertRef (audit traceback) — minor superset of the plan's { OperatorIdentity }.
  • Plan O5 was wrong about the HAS_PERMS_BY_NAME form (see above); corrected with real-server evidence.
  • All six invariants met; none could-not-be-met.
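
The ARITHABORT note can be checked on any session with SESSIONPROPERTY. The option list below is the set commonly named in error 1934 messages. It is an assumption that the executor sets exactly these; the PR states only "standard ANSI options (incl. ARITHABORT ON)".

```sql
-- Inspect the current session first. A Microsoft.Data.SqlClient connection
-- normally shows ARITHABORT = 0 here, while SSMS shows 1.
SELECT SESSIONPROPERTY('ARITHABORT') AS arithabort_before;

-- Session options assumed for this sketch.
SET ANSI_NULLS ON;
SET ANSI_PADDING ON;
SET ANSI_WARNINGS ON;
SET ARITHABORT ON;
SET CONCAT_NULL_YIELDS_NULL ON;
SET QUOTED_IDENTIFIER ON;
SET NUMERIC_ROUNDABORT OFF;

-- Confirm the values the gate and EXEC will run under on this session.
SELECT
    SESSIONPROPERTY('ARITHABORT')         AS arithabort,
    SESSIONPROPERTY('ANSI_NULLS')         AS ansi_nulls,
    SESSIONPROPERTY('QUOTED_IDENTIFIER')  AS quoted_identifier,
    SESSIONPROPERTY('NUMERIC_ROUNDABORT') AS numeric_roundabort;
```

SET options are per session, so they must be issued on the same open connection that later runs the gate and the EXEC.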

Do NOT merge — awaiting security-reviewer pass; PR-B follows.

🤖 Generated with Claude Code

erikdarlingdata and others added 4 commits May 30, 2026 19:37
The security-critical core for the "Apply Fix" feature: structured remediation
params, the audited force-plan execution path, and its self-gating handler.
No UI, no MCP exposure, no mutating caller — PR-B wires the gated UI later.

Shared libs:
- RemediationAction / ForcePlanTarget (PerformanceMonitor.Analysis): typed,
  data-only payload. FactRemediation refactored to extract once
  (ExtractPlanRegressionTargets) + BuildAction; the rendered preview is
  byte-for-byte unchanged (golden test).
- AlertContext round-trips RemediationAction in the existing contextJson; legacy
  contexts without it deserialize to null (no Apply). Populated in
  FindingMessageFormatter.BuildContext.

Dashboard execution core (dead code until PR-B):
- IRemediationExecutor seam + IRemediationHandler + RemediationHandlerRegistry +
  one ForcePlanHandler (PLAN_REGRESSION).
- DatabaseService.Remediation.cs: structured sp_query_store_force_plan /
  unforce with typed BigInt params; DB applied only as InitialCatalog via
  SqlConnectionStringBuilder. The authoritative gate (DB_NAME assert +
  HAS_PERMS_BY_NAME ALTER + freshness) and the EXEC run on ONE open connection
  (R2-MOD-1). No elevation, existing monitoring connection only; has_alter=0
  fails closed with grant guidance.
- Audit-table-absent is a HARD BLOCK before any mutation (R2-MOD-2). Every
  apply/unapply attempt writes config.remediation_action_log.

Schema: upgrades/2.11.0-to-2.12.0/02_create_remediation_action_log.sql (config
schema, idempotent). Coupled to the 2.12.0 schema upgrade.

Tests: golden render-stability, serializer round-trip + legacy-null, faked-
executor handler gate (fail-closed perms, absent-table block, freshness/skip,
per-target independence, gate re-derivation, applied-but-unlogged), un-apply
restriction, and an A4 no-caller guard.

Plan: C:\Users\edarl\.claude\plans\b3-phase1-implementation.md

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Real-server verification on sql2022 caught that a least-privilege login (no
ALTER, no VIEW DATABASE STATE) hit error 297 reading sys.query_store_plan /
sys.database_query_store_options inside the gate query — so the intended clean
"PermissionDenied + grant guidance" never surfaced (it came back as a wrong-DB
Blocked with a null gate row instead).

Fix: split the per-target gate into two reads on the SAME open connection
(R2-MOD-1 preserved):
  1. ReadGateIdentityAsync — DB_NAME / SUSER_SNAME / HAS_PERMS_BY_NAME / @@spid
     (always-accessible intrinsics). DB-match (A5) and ALTER are checked here.
  2. ReadQueryStoreStateAsync — qs_state / plan_present / is_forced /
     force_failure_count, run only after the ALTER check passes (and, in
     preflight, only when ALTER is held and the DB matches).

So has_alter=0 fails closed with the map-then-grant message without ever
touching Query Store catalog views the login can't read. Verified on sql2022:
NO-ALTER scoped login -> PermissionDenied (login surfaced), no mutation; force/
unforce/audit and the single-connection gate (gate SPID == exec SPID) still pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…r findings)

Two bugs the faked-executor unit tests could not reach, caught by real-server
verification on sql2022:

1. HAS_PERMS_BY_NAME form. The plan's O5 specified
   HAS_PERMS_BY_NAME(NULL, NULL, 'ALTER') — but that server-scoped form returns
   NULL even for a sysadmin (verified on sql2022), so the gate would fail closed
   for EVERY login and Apply could never run. Switched to the DB-scoped form
   HAS_PERMS_BY_NAME(DB_NAME(), 'DATABASE', 'ALTER'), which returns 1 for a
   principal holding ALTER on the connected DB (sysadmin / db_owner / granted)
   and 0 otherwise — the permission sp_query_store_force_plan actually needs.
   DB_NAME() (not a literal) stays correct after the catalog retarget.

2. UnforcePlanAsync delegated to ForceOrUnforceAsync with isUnforce: false,
   so un-apply would re-force instead of unforce. Corrected to isUnforce: true.

Verified end-to-end on sql2022 (throwaway harness, not committed): force ->
is_forced_plan=1 + audit force/success (executing_login=sa); single connection
for gate+EXEC (gate @@spid == exec @@spid, e.g. 72==72); unforce ->
is_forced_plan=0 + audit unforce/success; audit-table-absent -> Blocked, no
mutation; scoped login without ALTER -> PermissionDenied, fail closed, no
mutation. config.remediation_action_log rows confirmed on the server.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…eview LOW-1)

The PR-A no-caller guard greps only ForcePlanAsync/UnforcePlanAsync, but PR-B
reaches the privileged executor via the handler/registry types
(registry.TryGet(...).ApplyAsync()) without ever typing those method names — so
the guard that protects PR-A's "not UI-reachable" invariant would pass a PR-B
wiring silently. Extend the guard to the whole machinery
(RemediationHandlerRegistry / DatabaseServiceRemediationExecutor /
ForcePlanHandler / IRemediationExecutor / IRemediationHandler) so it actually
fires when a future surface wires it in. Still green today (nothing outside the
core references these).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@erikdarlingdata erikdarlingdata merged commit 7432f45 into dev May 31, 2026
2 checks passed
@erikdarlingdata erikdarlingdata deleted the feature/b3-pr-a-apply-fix-core branch May 31, 2026 14:18