hardening: pre-tenant readiness pass by giftedloser · Pull Request #8 · giftedloser/Runway

giftedloser · 2026-05-02T21:53:05Z

Summary

Pre-tenant readiness audit + targeted hardening pass. No new product
features, no version bump, no Settings IA changes, no Graph scope
removals. All findings either fixed under tight scope or marked PASS
after verification.

Findings fixed

Security

Gate POST /api/rules/preview behind requireDelegatedAuth so a
same-host process can't fan out predicates against synced tenant
data (rules.ts). Focused tests in
rules-preview-gate.api.test.ts.
Gate GET /api/settings/tag-config behind requireDelegatedAuth
for consistency with the mutation siblings — verified the client
reads via the parent /api/settings endpoint, so no flow regresses
(settings.ts). Existing test
updated to assert 401.
Desktop-token compare switched to crypto.timingSafeEqual with
explicit length-mismatch and missing-header guards
(local-access.ts). Three new
cases in local-access.api.test.ts.
Pino redact paths extended to cover request body credential
fields and the idempotency-key request header
(logger.ts). Bodies aren't logged today;
this is defence in depth.

Destructive UX

Typed-confirmation fallback uses
serial -> deviceName -> intuneId -> autopilotId. Devices missing
all four refuse to render the destructive flow, with an explanatory
toast. Replaces a silent fallback to the literal string CONFIRM
(ActionsToolbar.tsx).
At-rest danger affordance on destructive remote-action buttons:
subtle critical-color left border, critical-tinted icon, explicit
focus-visible outline matching severity, and aria-label that
includes "destructive" for screen readers.

Performance

Migration 015 adds idx_action_log_triggered_at ON action_log(triggered_at DESC)
so the audit-log CSV/NDJSON export at
GET /api/actions/logs/export no longer falls back to a sequential
scan as action history grows.

Scale tooling

npm run db:seed:scale generates ~4,500 synthetic devices
across ~30 properties with realistic problem distribution and
prints rough query timings. Dev-only; never invoked by tests/CI.
Sample run on this branch:
- Seed total: 278 ms
- Dashboard query: 37 ms (4500 devices, 7 failure patterns)
- Devices page 1 (25): 2.9 ms
- Tags inventory (31 tags): 2.7 ms
- Provisioning lookup (146 devices): 0.8 ms
- Device detail: 0.3 ms
- computeAllDeviceStates full run: 184 ms

Findings verified PASS (no change)

Properties sidebar overflow — the existing outer <nav> already
has lg:overflow-y-auto and the footer is a separate sibling
outside the nav, so 30+ properties scroll cleanly inside the nav and
the footer stays pinned. No inner cap needed.
CSV exports (server + client) — both sides already escape
formula-injection prefixes (=, +, -, @) and quote
newlines/quotes. Filenames include the date.
Idempotency keys — minted fresh per click in
useActions.ts, 24h replay window
on the server with scope-hash validation.
Effect cleanup sweep — EntityPicker (AbortController), HelpTooltip
(event listener), SyncStatusPill (state-only), FirstRunBanner (SSR-safe
storage guards), LapsWidget/BitLockerWidget (auto-rehide timers),
ConfirmDialog/BulkActionConfirm/Tags drawer (keydown listeners) all
clean up on unmount. No leaks found.
Local-storage namespacing — all keys under pilotcheck.pref.*
or runway.*. No secrets/tokens stored.
Graph timeout/retry — 15s AbortController, 3-retry on 429/5xx,
Retry-After clamped to 1-60s. Partial sync preserves prior good
data.
Tauri shell — restrictive CSP, embedded updater pubkey, narrow
window-control capability, in-memory desktop token.

Findings intentionally deferred

Per-resource sync error tracking — current sync_log granularity
is per-sync with NDJSON errors array. Surfacing per-resource
status on the Sync page is a UI improvement that's not blocking
tenant testing.
Confirmation modal with Intune/Autopilot ID alongside name+serial
— current modal already shows device name + serial; adding portal
IDs is a nice-to-have.
Removing User.ReadBasic.All etc. — verified all three scopes
the initial audit flagged are actually used (EntityPicker user
search, group-actions writes, autopilot-import). Documented as
"Optional" by feature in the README/security-report tables instead.

Permissions docs

README.md and docs/security-report.md now show an Optional?
column on the delegated permission table so a security owner can grant
only the scopes for the features they'll use. Application scopes are
all required.

Live-tenant dry-run runbook

New docs/live-tenant-dry-run.md walks
through prerequisites, first sync, read-only verification, first safe
action (sync one known device), portal deep-link verification,
EntityPicker dry run, LAPS/BitLocker on a lab device, deferred
destructive list, log inspection, disconnect path, and what to capture.

Tests added

test/server/rules-preview-gate.api.test.ts — 401 unauthenticated /
200 after delegated sign-in, including the loopback Origin header
the real client sets.
test/server/local-access.api.test.ts — three new cases: empty
desktop token, length mismatch, same-length wrong-content token.

Existing test updated:

test/server/provisioning-groups.api.test.ts — GET tag-config now
expects 401, matching the new gate.

Verification

npm run lint — clean
npm run typecheck — clean
npm run test — 324 passed / 32 files
npm run test:e2e — 30 passed / 4 files
npm run build — client (~430 KB) + server (271 KB), clean
git diff --check — clean
package.json version unchanged (1.6.0)

Risks

Scope of destructive action visual change — the new at-rest
danger affordance is subtle (left border + tinted icon). Visually
reviewing on each theme would help but isn't required for the gate.
Tag-config gate — closes a loophole the client never used; if an
external automation script relied on the unauthenticated GET, it
will now need a delegated session.
All other changes are additive (tests, docs, scale seed) or
defence-in-depth (token compare, redact paths, action_log index).

Verdict

READY FOR REVIEW. Suggest squash-merge once a reviewer scans
the auth/destructive-UX commits and the live-tenant dry-run doc.

- Gate POST /api/rules/preview behind requireDelegatedAuth so rule authors must be signed in to evaluate predicates against the live device snapshot. New rules-preview-gate.api.test.ts proves the 401 / 200 transitions across the auth boundary. - Gate GET /api/settings/tag-config behind requireDelegatedAuth for consistency with the mutation siblings. The Tags view continues to read mappings via the parent /api/settings endpoint, so no client flow regresses; provisioning-groups.api.test.ts updated to assert the new 401. - Switch the desktop-token compare to crypto.timingSafeEqual with explicit length-mismatch and missing-header guards. Adds three focused cases to local-access.api.test.ts (empty token, length mismatch, same-length wrong content). - Extend pino redact paths to cover request body credential fields (clientSecret, azureClientSecret, password, token, accessToken, idToken, refreshToken, session secrets, certificate thumbprints) plus the idempotency-key request header. Defence in depth — bodies aren't logged today.

The /api/actions/logs/export route reads up to 100k rows ordered by triggered_at DESC. Without a covering index that's a sequential scan; adding idx_action_log_triggered_at keeps the export snappy as action history grows past the first tenant soak.

…dance - Build the typed-confirmation token from serial -> deviceName -> intuneId -> autopilotId. Devices missing all four refuse to render the destructive flow; the previous behaviour silently fell back to the literal string "CONFIRM", which let an admin confirm a destructive action against an unidentifiable record. An explanatory toast surfaces the missing-identifier case. - Destructive remote-action buttons now carry a subtle critical-color left border at rest and a critical-tinted icon, so an admin can read the danger affordance without hovering. Hover and confirmation behaviour unchanged. - Add an explicit focus-visible outline that matches severity (critical for destructive, accent for safe), and an aria-label that includes the "destructive" suffix for screen readers and keyboard navigation safety.

Adds npm run db:seed:scale which builds a synthetic SnapshotPayload of ~4,500 devices spread across ~30 properties with realistic problem distribution (no_profile, not_in_target_group, user_mismatch, tag_mismatch, stalled, compliance_drift, orphan_autopilot, no_autopilot, hybrid_risk). Reuses the production persistSnapshot + computeAllDeviceStates helpers and prints rough query timings: • build payload ~18ms • persist snapshot ~75ms • compute all device states ~184ms • dashboard query 37ms (4500 devices) • devices page 1 (25) 2.9ms • tags inventory (31 tags) 2.7ms • provisioning lookup (146 devices) 0.8ms • device detail 0.3ms • recompute (second pass) ~186ms Indicative timings only — never invoked by the test suite or CI. The script wipes whatever DATABASE_PATH points at, so use a dedicated DATABASE_PATH=./data/scale-smoke.sqlite when running it.

- New docs/live-tenant-dry-run.md: a step-by-step runbook for the first time Runway is pointed at a real tenant. Read-and-look-only for the first soak; destructive actions explicitly deferred. Pairs with the existing live-testing-checklist.md preflight rather than replacing it. - README + docs/security-report.md permission tables now have an Optional? column so a security owner can grant only the delegated scopes for the features they intend to use during the pilot. The five application scopes remain non-optional. - CHANGELOG [Unreleased] entries under Security, Performance, Changed, Added, and Documentation describing the readiness pass.

- Revert gratuitous export on hasDesktopToken; nothing else imports it. - Drop req.headers["idempotency-key"] from the redact list. Idempotency keys are per-click UUIDs (not secrets) and redacting them costs debug value when chasing duplicate-dispatch behaviour. - Soften the "clear permission error" claim in the README + docs/security-report.md permission tables. Action handlers surface the Graph status code (e.g. 403), not a translated permission string. Updated copy points operators at Action Audit and explains that 403 distinguishes "missing Graph scope" from "missing Entra/Intune role". - Add a defensive DATABASE_PATH guard to seed-scale.ts: refuses to run when DATABASE_PATH is unset or points at the default pilotcheck.sqlite, so a fat-finger can't wipe the user's primary database. Verified that the guard trips on the default and lets ./data/scale-smoke.sqlite through. Skipped my own self-review item about a device_state_history composite index — the existing PRIMARY KEY (device_key, computed_at) is already a covering index for the trend query at devices.ts:669. A separate index would be redundant.

giftedloser added 6 commits May 2, 2026 15:51

giftedloser marked this pull request as ready for review May 3, 2026 03:18

giftedloser merged commit f772c55 into main May 3, 2026
1 check passed

giftedloser deleted the codex/pre-tenant-readiness branch May 3, 2026 03:18

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

hardening: pre-tenant readiness pass#8

hardening: pre-tenant readiness pass#8
giftedloser merged 6 commits into
mainfrom
codex/pre-tenant-readiness

giftedloser commented May 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

giftedloser commented May 2, 2026

Summary

Findings fixed

Security

Destructive UX

Performance

Scale tooling

Findings verified PASS (no change)

Findings intentionally deferred

Permissions docs

Live-tenant dry-run runbook

Tests added

Verification

Risks

Verdict

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant