Skip to content

Fix audit findings: 6 high + 5 medium + 4 low/info#40

Merged
jeremyfelt merged 14 commits into
task/tests-and-validationfrom
task/audit-fixes
Jun 3, 2026
Merged

Fix audit findings: 6 high + 5 medium + 4 low/info#40
jeremyfelt merged 14 commits into
task/tests-and-validationfrom
task/audit-fixes

Conversation

@jeremyfelt
Copy link
Copy Markdown
Member

Implements 16 findings from the multi-persona audit, as 12 atomic commits. Each fix has tests where it's unit-testable and was verified by npm test (62 tests) + npm run lint (0 errors).

Stacked on #39. Branched off task/tests-and-validation because this work builds on that PR's test harness and the compareSlug/padImage exports. Base is set to #39's branch so the diff here is only the audit fixes; merge #39 first and GitHub will retarget this to trunk. (The compare rework here also supersedes #39's compareSlug export usage.)

High

  • HIGH-001postinstall moved to a best-effort scripts/postinstall.mjs that honors PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD and never fails a consumer's install; capture now reports a missing browser actionably; playwright pinned. (The PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD in test.yml from Add config validation and a test suite #39 is now actually effective.)
  • HIGH-002capture records degraded pages instead of faking success; loud summary; opt-in --fail-on-degraded for a non-zero CI exit (D-004).
  • HIGH-003control reports "Promoted N of M", warns on partial promotion, writes controls/manifest.json; compare warns when the baseline mixes controls from multiple runs (D-004).
  • HIGH-004--concurrency=0/negative/8x are rejected up front instead of hanging the capture loop forever.
  • HIGH-005 / HIGH-006compare now runs across a worker_threads pool (off the main thread, across cores), bounded by a configurable --compare-concurrency as the memory cap, with the redundant pad copy removed.

Medium

  • MED-004 — unmatched --only key now errors (listing known keys) in all three commands instead of a misleading success.
  • MED-005 — width-mismatched slugs are reported as a large diff instead of being silently dropped.
  • MED-006 / LOW-013 — the three stacked capture waits are reconciled into one event-driven, configurable settle; autoScroll grows-until-stable; timeouts are configurable.
  • MED-007 — capture's fixed batches replaced with a queue-based worker pool; stagger applied once at worker startup.
  • MED-008 — HTML diff guarded by a 2MB size limit and built via array-join; runs in the compare worker.

Low / Info

  • LOW-002 — TLS validation now stays on for non-local hosts; ignored only for .test/localhost, or via --insecure (with a warning) (D-002).
  • LOW-012 — writable-output preflight + actionable write-error messages.
  • LOW-015control drops orphan HTML snapshots.
  • INFO-001capture warns when a path points off the configured domain (kept non-breaking per D-001; not blocked, since local dev legitimately uses loopback hosts).

New CLI / config surface

  • Flags: --fail-on-degraded, --insecure, --compare-concurrency=<n>.
  • Config: timeouts: { goto, settle }.
  • README and reglance.example.json updated.

Decisions applied

D-001 (config treated as untrusted), D-002 (TLS on for non-local), D-004 (permissive + loud + manifest). D-003 (npm publishing / issue #20) is not in this PR.

Deliberate scope notes

  • HIGH-006: a full dimension-aware memory budget was left out; the configurable concurrency cap + reduced allocation is the pragmatic bound.
  • INFO-001: off-domain hosts are warned, not blocked (blocking would break the tool's local-dev use of loopback/.test).

🤖 Generated with Claude Code

jeremyfelt and others added 14 commits June 3, 2026 12:41
`--concurrency=0` previously flowed into the capture batching loop as the
step size, so `i += 0` never advanced and capture hung forever with no
error; negatives ran the loop backward to the same effect, and `parseInt`
silently truncated `8x` to 8.

Extract flag parsing into `src/args.mjs` as a testable `toInt` that uses
`Number` (rejecting trailing garbage) and enforces a minimum — 1 for
concurrency, 0 for stagger (which legitimately disables staggering). It
throws an actionable error caught by the CLI's top-level handler instead
of calling process.exit, which makes it unit-testable. Also clamp the
batch step to a positive value as defense in depth.

Fixes HIGH-004.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`capture` already guarded against a `--only` filter that matched nothing,
but `control` and `compare` applied the same filter with no guard: a typo'd
key (`reglance control blgo`) silently filtered the target list to empty and
printed a success-looking summary ("Promoted 0 capture(s)" / "Nothing to
compare").

Factor the filter into a shared `filterTargets` helper in config.mjs that
throws — listing the known keys to help spot the typo — and use it in all
three commands. capture's behavior is now consistent (a non-zero exit via
the CLI's error handler instead of a swallowed exit 0).

Fixes MED-004.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
When the retry loop exhausted its attempts it set `success = true`, logged
"continuing anyway", and screenshotted whatever was on screen — an error
page, a blank page, a half-loaded page — with no signal. The empty
`catch {}` discarded the navigation error, and `capture` printed
"✅ Capture complete" and exited 0 regardless. A silently-captured error
page then becomes a baseline via `control`, defeating the tool's core
promise, with CI staying green.

Per the D-004 decision (permissive + loud):
- The retry loop now records the real error and pushes the slug to a
  `failures` list instead of faking success; best-effort screenshotting is
  kept but the slug is marked degraded.
- The outer catch records which slug failed rather than swallowing it.
- `capture` prints a loud summary of every degraded slug and tells the
  operator to review before running `control`.
- A new `--fail-on-degraded` flag opts into a non-zero exit so CI can gate
  on it; the default still exits 0 to preserve partial/best-effort runs.

The exit-policy decision is factored into a pure, unit-tested
`shouldFailRun` helper; the browser-driven path is covered by manual
verification (capture against an unreachable domain).

Fixes HIGH-002.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
control moved whatever captures existed and printed "Promoted N", never
revealing that the controls for missing slugs were left untouched — so a
partial capture (a failed target, or a --only run) silently produced a
baseline mixing captures from different eras, which compare then trusted.

Per the D-004 decision (permissive + loud, with a manifest):
- control now reports "Promoted N of M" and warns when N < M that the
  untouched controls are now stale relative to the freshly promoted ones,
  and returns { moved, expected }.
- Each promotion is recorded in controls/manifest.json (new src/manifest.mjs)
  with a shared per-run timestamp.
- compare reads the manifest and warns when the baseline mixes controls from
  more than one control run, naming the oldest slug.

Also fixes LOW-015: an orphan HTML snapshot (HTML present, PNG absent from a
failed prior run) is now removed rather than left to mispair with a future
PNG.

Fixes HIGH-003, LOW-015.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
When the control and capture differed in width, compareSlug warned and
returned null, omitting the slug entirely — no row, no diff image, no entry
in the comparison count. A width change is itself a visual regression (a
collapsed layout, an appearing/disappearing scrollbar), so the report showed
"all clear" for exactly the pages that changed most, with only a console
warning that scrolls past.

Generalize padImage to pad onto a common width-and-height canvas (returning
the image untouched when it already fits, which also avoids a needless copy),
and pad both images that way before pixelmatch. A width change now surfaces
as a large diff and the slug appears in the report. The test that asserted a
width mismatch returns null is rewritten to assert the new contract.

Fixes MED-005.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
compare processed every (target × viewport) pair in a single serial loop,
each iteration decoding two PNGs and running pixelmatch synchronously on the
main thread — so the whole run was single-core and the event loop was blocked
start to finish, while capture (next to it) already ran in parallel.

- Run each comparison in a worker_threads pool (src/compare-worker.mjs) that
  reuses workers off a shared queue, so diffs spread across cores, the main
  thread stays responsive, and a slow slug doesn't gate the others
  (HIGH-005).
- Bound peak memory by capping the pool: each in-flight comparison holds
  several large RGBA buffers, so concurrency defaults to CPU-count − 1 and is
  tunable via --compare-concurrency for very tall pages / constrained runners.
  padImage now returns the image untouched when no padding is needed, dropping
  a redundant full-buffer copy (HIGH-006).
- Guard the HTML line-diff: skip snapshots over 2MB (which otherwise grind
  through diffLines for no useful result) and build the diff via array-join
  instead of repeated string `+=` (MED-008).

A full dimension-aware memory budget was left out as a deliberate
simplification; the configurable cap plus reduced allocation is the pragmatic
bound. An end-to-end compare() test exercises the pool.

Fixes HIGH-005, HIGH-006, MED-008.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Capture ran targets in fixed batches joined by `Promise.all`, with two costs:
the stagger delay was `j * staggerDelay` (an absolute delay from batch start),
so the last context in each batch waited `(concurrency-1) * stagger` before
starting — raising concurrency to go faster grew that tail and re-burst the
herd at every batch boundary; and `Promise.all` waited for the whole batch, so
one slow target (a page hitting the timeout × retries) blocked the finished
contexts from picking up new work.

Use a queue-based pool instead: `concurrency` workers each pull the next
target as soon as they finish, so throughput is bounded by total work over
concurrency rather than by the slowest target per batch, and the stagger is
applied once per worker at startup. Failure collection is unchanged.

Fixes MED-007.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Each capture stacked three overlapping waits: goto with waitUntil
'networkidle' (already idle-bounded), a fixed-cadence auto-scroll (100ms per
500px step — ~3s to scroll a 15000px page regardless of load speed), and a
hand-rolled busy-poll waitForNetworkIdle that re-confirmed idle with a 3s cap
shorter than the 15s goto timeout (so lazy assets could be cut off).

Reconcile MED-006 (drop the redundant double wait, speed the scroll) and
LOW-013 (the post-scroll cap is too short and not configurable) into one
change:
- autoScroll now scrolls to the bottom and repeats only until the page height
  stabilizes (50ms between checks, capped at 100 iterations), so short pages
  settle almost instantly and tall ones keep going as content loads.
- Replace the busy-poll with a single event-driven
  page.waitForLoadState('networkidle') after scrolling, bounded by a
  configurable timeout and best-effort (a never-idle page still screenshots).
- goto and settle timeouts are now configurable via a `timeouts` block in
  reglance.json (defaults: goto 15000, settle 8000 — up from the old 3000 cap).

Fixes MED-006, LOW-013.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Capture launched Chromium with --ignore-certificate-errors unconditionally,
so TLS validation was off for every run — including when pointed at a public
HTTPS host, where a swapped or expired cert would be captured silently.
(The ignoreHTTPSErrors flag was also passed to launch(), where it has no
effect; it belongs on the browser context.)

Per the D-002 decision: ignore certificate errors only for local development
hosts (localhost, 127.0.0.1, ::1, and .test/.local/.localhost domains), where
self-signed certs are normal. For any other host, validation stays on unless
the new --insecure flag is passed, and capture warns when it disables checks
for a non-local host. ignoreHTTPSErrors is now correctly applied on
newContext.

Fixes LOW-002.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
LOW-012: capture and compare wrote artifacts with no handling for write
failures, so a full disk or read-only output surfaced as an opaque error
(or, in capture, was swallowed). ensureOutputDir now fails fast with a clear
message when the output directory is not writable, and compare wraps its diff
and html-diff writes to report the path and error code (EACCES/ENOSPC/...)
instead of a bare stack.

INFO-001: a `paths` value that is an absolute URL bypasses the configured
domain and is navigated as-is. Per the D-001 decision (treat config as
potentially untrusted) this stays supported but is no longer silent — capture
now warns which path keys point off the configured domain, so off-domain
(and potentially internal-network) capture is a conscious choice. The hosts
are not blocked, since local development legitimately uses loopback/.test
addresses; the documentation note covers the remaining surface.

Fixes LOW-012, INFO-001.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The postinstall ran `playwright install chromium` directly. Because reglance
is a dev dependency, that hook runs in every consuming install, and a failed
download (air-gapped runner, blocked CDN, proxy) aborted the consumer's entire
`npm ci`. The PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD env var set in CI did nothing,
because the explicit `playwright install` command ignores it — so CI also
re-downloaded Chromium on every leg.

- Move the hook to scripts/postinstall.mjs (added to the published `files`):
  it returns early when PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD is set, otherwise runs
  the install and, on failure, prints an actionable message but always exits 0
  so it can never break a consumer's install.
- capture now catches a missing-browser launch error and tells the user to run
  `npx playwright install chromium`, instead of a raw Playwright stack.
- Pin playwright to an exact version so the installed browser revision is
  deterministic (Dependabot already bumps it).

The CI skip flag in test.yml is now meaningful (the comment is finally true).

Fixes HIGH-001.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Update the README and example config for the changes in this branch:
new --fail-on-degraded / --insecure / --compare-concurrency flags, the
configurable `timeouts` block, the off-domain path warning, and the
degraded-capture and partial-promotion (manifest) safeguards.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The multi-persona code review writes its artifacts under .reviews/; keep them
out of git like the other generated output directories.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jeremyfelt jeremyfelt merged commit 6d38731 into task/tests-and-validation Jun 3, 2026
2 checks passed
@jeremyfelt jeremyfelt deleted the task/audit-fixes branch June 3, 2026 20:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant