Fix audit findings: 6 high + 5 medium + 4 low/info by jeremyfelt · Pull Request #40 · happyprime/reglance

jeremyfelt · 2026-06-03T20:04:39Z

Implements 16 findings from the multi-persona audit, as 12 atomic commits. Each fix has tests where it's unit-testable and was verified by npm test (62 tests) + npm run lint (0 errors).

Stacked on #39. Branched off task/tests-and-validation because this work builds on that PR's test harness and the compareSlug/padImage exports. Base is set to #39's branch so the diff here is only the audit fixes; merge #39 first and GitHub will retarget this to trunk. (The compare rework here also supersedes #39's compareSlug export usage.)

High

HIGH-001 — postinstall moved to a best-effort scripts/postinstall.mjs that honors PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD and never fails a consumer's install; capture now reports a missing browser actionably; playwright pinned. (The PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD in test.yml from Add config validation and a test suite #39 is now actually effective.)
HIGH-002 — capture records degraded pages instead of faking success; loud summary; opt-in --fail-on-degraded for a non-zero CI exit (D-004).
HIGH-003 — control reports "Promoted N of M", warns on partial promotion, writes controls/manifest.json; compare warns when the baseline mixes controls from multiple runs (D-004).
HIGH-004 — --concurrency=0/negative/8x are rejected up front instead of hanging the capture loop forever.
HIGH-005 / HIGH-006 — compare now runs across a worker_threads pool (off the main thread, across cores), bounded by a configurable --compare-concurrency as the memory cap, with the redundant pad copy removed.

Medium

MED-004 — unmatched --only key now errors (listing known keys) in all three commands instead of a misleading success.
MED-005 — width-mismatched slugs are reported as a large diff instead of being silently dropped.
MED-006 / LOW-013 — the three stacked capture waits are reconciled into one event-driven, configurable settle; autoScroll grows-until-stable; timeouts are configurable.
MED-007 — capture's fixed batches replaced with a queue-based worker pool; stagger applied once at worker startup.
MED-008 — HTML diff guarded by a 2MB size limit and built via array-join; runs in the compare worker.

Low / Info

LOW-002 — TLS validation now stays on for non-local hosts; ignored only for .test/localhost, or via --insecure (with a warning) (D-002).
LOW-012 — writable-output preflight + actionable write-error messages.
LOW-015 — control drops orphan HTML snapshots.
INFO-001 — capture warns when a path points off the configured domain (kept non-breaking per D-001; not blocked, since local dev legitimately uses loopback hosts).

New CLI / config surface

Flags: --fail-on-degraded, --insecure, --compare-concurrency=<n>.
Config: timeouts: { goto, settle }.
README and reglance.example.json updated.

Decisions applied

D-001 (config treated as untrusted), D-002 (TLS on for non-local), D-004 (permissive + loud + manifest). D-003 (npm publishing / issue #20) is not in this PR.

Deliberate scope notes

HIGH-006: a full dimension-aware memory budget was left out; the configurable concurrency cap + reduced allocation is the pragmatic bound.
INFO-001: off-domain hosts are warned, not blocked (blocking would break the tool's local-dev use of loopback/.test).

🤖 Generated with Claude Code

`--concurrency=0` previously flowed into the capture batching loop as the step size, so `i += 0` never advanced and capture hung forever with no error; negatives ran the loop backward to the same effect, and `parseInt` silently truncated `8x` to 8. Extract flag parsing into `src/args.mjs` as a testable `toInt` that uses `Number` (rejecting trailing garbage) and enforces a minimum — 1 for concurrency, 0 for stagger (which legitimately disables staggering). It throws an actionable error caught by the CLI's top-level handler instead of calling process.exit, which makes it unit-testable. Also clamp the batch step to a positive value as defense in depth. Fixes HIGH-004. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

`capture` already guarded against a `--only` filter that matched nothing, but `control` and `compare` applied the same filter with no guard: a typo'd key (`reglance control blgo`) silently filtered the target list to empty and printed a success-looking summary ("Promoted 0 capture(s)" / "Nothing to compare"). Factor the filter into a shared `filterTargets` helper in config.mjs that throws — listing the known keys to help spot the typo — and use it in all three commands. capture's behavior is now consistent (a non-zero exit via the CLI's error handler instead of a swallowed exit 0). Fixes MED-004. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

When the retry loop exhausted its attempts it set `success = true`, logged "continuing anyway", and screenshotted whatever was on screen — an error page, a blank page, a half-loaded page — with no signal. The empty `catch {}` discarded the navigation error, and `capture` printed "✅ Capture complete" and exited 0 regardless. A silently-captured error page then becomes a baseline via `control`, defeating the tool's core promise, with CI staying green. Per the D-004 decision (permissive + loud): - The retry loop now records the real error and pushes the slug to a `failures` list instead of faking success; best-effort screenshotting is kept but the slug is marked degraded. - The outer catch records which slug failed rather than swallowing it. - `capture` prints a loud summary of every degraded slug and tells the operator to review before running `control`. - A new `--fail-on-degraded` flag opts into a non-zero exit so CI can gate on it; the default still exits 0 to preserve partial/best-effort runs. The exit-policy decision is factored into a pure, unit-tested `shouldFailRun` helper; the browser-driven path is covered by manual verification (capture against an unreachable domain). Fixes HIGH-002. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

control moved whatever captures existed and printed "Promoted N", never revealing that the controls for missing slugs were left untouched — so a partial capture (a failed target, or a --only run) silently produced a baseline mixing captures from different eras, which compare then trusted. Per the D-004 decision (permissive + loud, with a manifest): - control now reports "Promoted N of M" and warns when N < M that the untouched controls are now stale relative to the freshly promoted ones, and returns { moved, expected }. - Each promotion is recorded in controls/manifest.json (new src/manifest.mjs) with a shared per-run timestamp. - compare reads the manifest and warns when the baseline mixes controls from more than one control run, naming the oldest slug. Also fixes LOW-015: an orphan HTML snapshot (HTML present, PNG absent from a failed prior run) is now removed rather than left to mispair with a future PNG. Fixes HIGH-003, LOW-015. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

When the control and capture differed in width, compareSlug warned and returned null, omitting the slug entirely — no row, no diff image, no entry in the comparison count. A width change is itself a visual regression (a collapsed layout, an appearing/disappearing scrollbar), so the report showed "all clear" for exactly the pages that changed most, with only a console warning that scrolls past. Generalize padImage to pad onto a common width-and-height canvas (returning the image untouched when it already fits, which also avoids a needless copy), and pad both images that way before pixelmatch. A width change now surfaces as a large diff and the slug appears in the report. The test that asserted a width mismatch returns null is rewritten to assert the new contract. Fixes MED-005. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

compare processed every (target × viewport) pair in a single serial loop, each iteration decoding two PNGs and running pixelmatch synchronously on the main thread — so the whole run was single-core and the event loop was blocked start to finish, while capture (next to it) already ran in parallel. - Run each comparison in a worker_threads pool (src/compare-worker.mjs) that reuses workers off a shared queue, so diffs spread across cores, the main thread stays responsive, and a slow slug doesn't gate the others (HIGH-005). - Bound peak memory by capping the pool: each in-flight comparison holds several large RGBA buffers, so concurrency defaults to CPU-count − 1 and is tunable via --compare-concurrency for very tall pages / constrained runners. padImage now returns the image untouched when no padding is needed, dropping a redundant full-buffer copy (HIGH-006). - Guard the HTML line-diff: skip snapshots over 2MB (which otherwise grind through diffLines for no useful result) and build the diff via array-join instead of repeated string `+=` (MED-008). A full dimension-aware memory budget was left out as a deliberate simplification; the configurable cap plus reduced allocation is the pragmatic bound. An end-to-end compare() test exercises the pool. Fixes HIGH-005, HIGH-006, MED-008. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Capture ran targets in fixed batches joined by `Promise.all`, with two costs: the stagger delay was `j * staggerDelay` (an absolute delay from batch start), so the last context in each batch waited `(concurrency-1) * stagger` before starting — raising concurrency to go faster grew that tail and re-burst the herd at every batch boundary; and `Promise.all` waited for the whole batch, so one slow target (a page hitting the timeout × retries) blocked the finished contexts from picking up new work. Use a queue-based pool instead: `concurrency` workers each pull the next target as soon as they finish, so throughput is bounded by total work over concurrency rather than by the slowest target per batch, and the stagger is applied once per worker at startup. Failure collection is unchanged. Fixes MED-007. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Each capture stacked three overlapping waits: goto with waitUntil 'networkidle' (already idle-bounded), a fixed-cadence auto-scroll (100ms per 500px step — ~3s to scroll a 15000px page regardless of load speed), and a hand-rolled busy-poll waitForNetworkIdle that re-confirmed idle with a 3s cap shorter than the 15s goto timeout (so lazy assets could be cut off). Reconcile MED-006 (drop the redundant double wait, speed the scroll) and LOW-013 (the post-scroll cap is too short and not configurable) into one change: - autoScroll now scrolls to the bottom and repeats only until the page height stabilizes (50ms between checks, capped at 100 iterations), so short pages settle almost instantly and tall ones keep going as content loads. - Replace the busy-poll with a single event-driven page.waitForLoadState('networkidle') after scrolling, bounded by a configurable timeout and best-effort (a never-idle page still screenshots). - goto and settle timeouts are now configurable via a `timeouts` block in reglance.json (defaults: goto 15000, settle 8000 — up from the old 3000 cap). Fixes MED-006, LOW-013. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Capture launched Chromium with --ignore-certificate-errors unconditionally, so TLS validation was off for every run — including when pointed at a public HTTPS host, where a swapped or expired cert would be captured silently. (The ignoreHTTPSErrors flag was also passed to launch(), where it has no effect; it belongs on the browser context.) Per the D-002 decision: ignore certificate errors only for local development hosts (localhost, 127.0.0.1, ::1, and .test/.local/.localhost domains), where self-signed certs are normal. For any other host, validation stays on unless the new --insecure flag is passed, and capture warns when it disables checks for a non-local host. ignoreHTTPSErrors is now correctly applied on newContext. Fixes LOW-002. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

LOW-012: capture and compare wrote artifacts with no handling for write failures, so a full disk or read-only output surfaced as an opaque error (or, in capture, was swallowed). ensureOutputDir now fails fast with a clear message when the output directory is not writable, and compare wraps its diff and html-diff writes to report the path and error code (EACCES/ENOSPC/...) instead of a bare stack. INFO-001: a `paths` value that is an absolute URL bypasses the configured domain and is navigated as-is. Per the D-001 decision (treat config as potentially untrusted) this stays supported but is no longer silent — capture now warns which path keys point off the configured domain, so off-domain (and potentially internal-network) capture is a conscious choice. The hosts are not blocked, since local development legitimately uses loopback/.test addresses; the documentation note covers the remaining surface. Fixes LOW-012, INFO-001. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The postinstall ran `playwright install chromium` directly. Because reglance is a dev dependency, that hook runs in every consuming install, and a failed download (air-gapped runner, blocked CDN, proxy) aborted the consumer's entire `npm ci`. The PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD env var set in CI did nothing, because the explicit `playwright install` command ignores it — so CI also re-downloaded Chromium on every leg. - Move the hook to scripts/postinstall.mjs (added to the published `files`): it returns early when PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD is set, otherwise runs the install and, on failure, prints an actionable message but always exits 0 so it can never break a consumer's install. - capture now catches a missing-browser launch error and tells the user to run `npx playwright install chromium`, instead of a raw Playwright stack. - Pin playwright to an exact version so the installed browser revision is deterministic (Dependabot already bumps it). The CI skip flag in test.yml is now meaningful (the comment is finally true). Fixes HIGH-001. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Update the README and example config for the changes in this branch: new --fail-on-degraded / --insecure / --compare-concurrency flags, the configurable `timeouts` block, the off-domain path warning, and the degraded-capture and partial-promotion (manifest) safeguards. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The multi-persona code review writes its artifacts under .reviews/; keep them out of git like the other generated output directories. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

jeremyfelt and others added 14 commits June 3, 2026 12:41

Merge branch 'task/tests-and-validation' into task/audit-fixes

1f6f1cb

Ignore the .reviews/ code-review output directory

380237e

The multi-persona code review writes its artifacts under .reviews/; keep them out of git like the other generated output directories. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

jeremyfelt merged commit 6d38731 into task/tests-and-validation Jun 3, 2026
2 checks passed

jeremyfelt deleted the task/audit-fixes branch June 3, 2026 20:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix audit findings: 6 high + 5 medium + 4 low/info#40

Fix audit findings: 6 high + 5 medium + 4 low/info#40
jeremyfelt merged 14 commits into
task/tests-and-validationfrom
task/audit-fixes

jeremyfelt commented Jun 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jeremyfelt commented Jun 3, 2026

High

Medium

Low / Info

New CLI / config surface

Decisions applied

Deliberate scope notes

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant