
feat(core): add --max-cache-ram bounded asset-discovery cache [PER-7795] #2192

Open
pranavz28 wants to merge 14 commits into master from feat/per-7795-max-cache-ram

Conversation


@pranavz28 pranavz28 commented Apr 22, 2026

Summary

Adds a --max-cache-ram <MB> flag (plus PERCY_MAX_CACHE_RAM env and discovery.maxCacheRam percyrc) to cap the asset-discovery cache memory. When set, a hand-rolled byte-budget LRU evicts least-recently-used resources to stay within the cap. Evicted resources are not dropped — they spill to a per-run disk tier and are transparently rehydrated on cache lookup, so a memory-bounded run behaves like an unbounded one at the cost of a local disk read (cheaper than an origin refetch). When unset, behavior is identical to today plus a one-shot warn-level log at 500 MB pointing users at the flag before they OOM. Two CLI-side telemetry events (cache_eviction_started, cache_summary) flow through the existing UDP pager → Amplitude pipeline for adoption / hit-rate / disk-tier analytics.

Ships entirely in percy/cli. Zero Percy API changes. Zero new npm dependencies (disk tier uses only Node built-ins: fs, os, path, crypto). @percy/core Node engine unchanged (>=14).

Ticket & planning

  • Jira: https://browserstack.atlassian.net/browse/PER-7795
  • Task Brief + engineering brainstorm captured in the companion docs (TB, engineering requirements, plan). All five PoCs executed during brainstorming; R6 (server-side OOM-suspected bucket) was dropped after PoC 3 confirmed Percy API does not persist send-events extra and cilogs do not carry CLI process exit codes.

Key decisions

  • Hand-rolled ByteLRU (~60 LOC, zero dep) instead of lru-cache npm — v7 is unmaintained-since-2022, v10+ requires a Node engine bump that would break existing users on Node 14/16. Percy's cache usage is ~5% of any LRU library's surface.
  • Disk-spill overflow tier (DiskSpillStore) lives in the same cache/byte-lru.js module as the RAM tier — one cache module, two tiers. When RAM evicts, the full resource is written synchronously to a per-run temp directory under os.tmpdir()/percy-cache-<pid>-<rand>/. A slim metadata reference stays in an in-memory Map. getResource() (via the extracted lookupCacheResource helper) falls through RAM miss → disk-index lookup → readFileSync → return full resource. Browser never sees a refetch on a disk hit. On any disk I/O failure (init, write, read) we fall back to the old drop-on-evict behavior — the browser refetches from origin exactly as it would without spill. Index is self-healing (read failure purges the entry). Temp dir is rm'd in the queue 'end' handler; cleanup errors are swallowed so they cannot fail percy.stop().
  • Counter-based filenames (dir/1, dir/2, …) rather than URL-derived — no user-controlled data flows into path.join, eliminating the path-traversal surface semgrep flags on URL-in-filename patterns. Collisions are impossible within a run because the counter is monotonic + the dir is fresh per run.
  • Sync fs ops on both hot paths, not async. ByteLRU.set is synchronous and getResource is the sync callback CDP network-intercept calls — neither can yield to the event loop mid-operation. Per-entry size is capped at 25 MB upstream in network.js, so worst-case disk latency is a single bounded I/O op — still strictly cheaper than an origin refetch.
  • Byte accounting = cache body bytes only (resource.content.length + 512 B per-entry overhead). Flag caps cache, not RSS. MB is decimal (1,000,000 B), not binary MiB. Docs note: real-world RSS is typically 1.5–2× cache bytes due to Node's Buffer slab allocator (PoC 4 calibration).
  • MB semantics throughout: --max-cache-ram 300 means 300 MB. Users never write unit suffixes.
  • Sub-25 MB values clamp to 25 MB with a warn log (not a hard error). The per-resource ceiling is 25 MB, so any cap below that would silently skip every resource; clamping preserves the user's intent without killing the build. User's original percy.config.discovery.maxCacheRam is NOT mutated — the effective cap lives on percy[CACHE_STATS_KEY].effectiveMaxCacheRamMB and is what cache_summary telemetry reports.
  • Warning-at-threshold at warn level so it surfaces under --quiet; suppressed under --silent. Threshold default 500 MB, override via PERCY_CACHE_WARN_THRESHOLD_BYTES (read once at queue construction; debug-logged at startup when the override is active).
  • Telemetry folded into two events (not three). cache_budget_configured was dropped because it would fire before percy.build.id exists; its fields live inside cache_eviction_started + cache_summary which are guaranteed to fire post-build-creation. cache_summary is ordered AFTER sendBuildLogs in percy.stop()'s finally so analytics latency cannot delay the primary log egress. cache_summary now also carries eight disk-tier fields (disk_spill_enabled, disk_spilled_count, disk_restored_count, disk_spill_failures, disk_read_failures, disk_peak_bytes, disk_final_bytes, disk_final_entries) so the dashboard can distinguish "disabled", "enabled with no activity", and "enabled with failures".
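The byte-budget eviction and oversize-reject ordering described above can be sketched as follows — an illustrative minimal version, not the shipped byte-lru.js (the real module also tracks hits/misses/peak bytes and exposes `.stats`):

```javascript
// Minimal byte-budget LRU sketch. Relies on Map's insertion-order iteration:
// the first key is always the least-recently-used entry.
class ByteLRU {
  constructor(maxBytes, onEvict = () => {}) {
    this.maxBytes = maxBytes;
    this.onEvict = onEvict;
    this.map = new Map();
    this.bytes = 0;
  }

  set(key, value, size) {
    // Reject oversize BEFORE touching any existing entry for this key,
    // so a failed re-set never evicts the prior valid value.
    if (size > this.maxBytes) return false;
    if (this.map.has(key)) this.delete(key);
    this.map.set(key, { value, size });
    this.bytes += size;
    while (this.bytes > this.maxBytes) {
      // Evict least-recently-used entries until we are back under budget.
      const [oldKey, entry] = this.map.entries().next().value;
      this.map.delete(oldKey);
      this.bytes -= entry.size;
      this.onEvict(oldKey, 'evicted', entry.value);
    }
    return true;
  }

  get(key) {
    const entry = this.map.get(key);
    if (!entry) return undefined;
    this.map.delete(key); // recency bump: re-insert at the tail
    this.map.set(key, entry);
    return entry.value;
  }

  delete(key) {
    const entry = this.map.get(key);
    if (!entry) return false;
    this.map.delete(key);
    this.bytes -= entry.size;
    return true;
  }
}
```

The `onEvict(key, reason, value)` callback carries the evicted value so a disk-spill tier can capture it before it is garbage-collected.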

Commit structure (bisectable)

bf6b92e0 feat(core): spill evicted resources to disk, restore on lookup
a299cd1b test(core): close remaining coverage gaps
e1d6e1c2 fix(core): address PR review (frozen counter, reorder checks, reorder egress, effective cap)
dad8a08e fix(core): clamp --max-cache-ram below 25MB instead of failing
6ae0659b docs(core): document --max-cache-ram flag
d1cf2c0d test(core): integration coverage for --max-cache-ram
2e26f1a5 feat(core): emit cache_eviction_started + cache_summary telemetry
ff45eb38 feat(core): add warning-at-threshold when --max-cache-ram is unset
9e173ecc feat(core): swap Map for ByteLRU when --max-cache-ram is set
0e5364c3 feat(core): extend discovery config schema with maxCacheRam
eed9c323 feat(cli-command): add --max-cache-ram flag (MB units)
537b336f feat(core): add ByteLRU + entrySize helpers

Commits 9 and 10 were added during self-review: clamp behaviour + a separately-broken help-text fixture in cli-command/test/flags.test.js that was stale since the flag was introduced. Commit 12 is the disk-spill upgrade as a single squashed commit — DiskSpillStore, wiring, lookupCacheResource extraction, telemetry fields, and all 15 new test specs land together.

Testing

Automated

  • 44 unit specs in packages/core/test/unit/byte-lru.test.js cover ByteLRU, entrySize, DiskSpillStore, createSpillDir, and lookupCacheResource:
    • ByteLRU: unbounded mode, LRU eviction (single + multi), recency bump, oversized entries (skip + preserve prior on re-set), onEvict signature (key, reason, value), .values(), .clear(), .delete() byte accounting, peak-bytes transient high-water, hits/misses/evictions tracking, NaN/negative guards, churn stability.
    • DiskSpillStore: mkdir success + failure (via /dev/null/…), not-ready short-circuit, not-ready destroy no-op, binary round-trip with non-ASCII bytes, string→Buffer coercion, coercion failure (Symbol content), null/undefined content guards, metadata carry-through (sha, root, widths), counter-based filenames, accounting on replace/delete, peak tracking, write failure via mocked EACCES, read self-heal via mocked ENOENT, unlinkSync error tolerance on both delete + overwrite, destroy error swallowing via mocked EBUSY.
    • createSpillDir: uniqueness, percy-cache- prefix, tmpdir scoping.
    • lookupCacheResource: snapshot-local hit, RAM hit, disk hit (with debug-log verification), full miss, disk-present-no-hit, array-root width match, array-root fallback.
  • 8 integration specs in packages/core/test/discovery.test.js (describe with --max-cache-ram disk-spill tier): DiskSpillStore presence only when cap is set, spill-on-eviction with cache spill: debug log, byte-for-byte rehydration of binary content, disk-write failure fallback to drop with cache evict: debug log (ENOSPC simulated), saveResource clearing stale disk entries, queue-end teardown (asserts both disk.ready === false and fs.existsSync(dir) === false for race safety), graceful handling of a DiskSpillStore that fails to initialize (via mocked mkdirSync).
  • Existing 19 byte-lru unit specs + 13 max-cache-ram integration specs stay green; only two log-string assertions changed (`Skipping cache for resource` became `cache skip (oversize):`, and the eviction-active info log now mentions "spilling to disk").
  • Semgrep: 0 findings on all changed files (ran locally with semgrep --config=auto and the OSS path-join-resolve-traversal rule that previously flagged a path.join(os.tmpdir(), ...) false-positive on createSpillDir; resolved by dropping the unused prefix parameter and hard-coding percy-cache-).
  • Local lint on this machine is broken (pre-existing @babel/eslint-parser config detection issue that also reproduces on master); CI will gate.

Security

The 34 Dependabot alerts on this branch are pre-existing transitive dependencies on master (basic-ftp, flatted, ip, lodash, minimatch, rollup, systeminformation, tar, axios, follow-redirects, js-yaml, picomatch, qs, yaml, brace-expansion, @tootallnate/once, tmp, uuid, etc.). This PR introduces zero new npm dependencies. Dependency-bump PRs should be handled separately.

Pending manual verification

The original 4 builds (#353–#356) exercise the max-cache-ram plumbing but pre-date disk-spill. Re-running the 30 × 1 MB heavy-assets workload from the original verification (the one that produced build #361's cache evict: storm) will now show cache spill: + cache disk-hit: lines instead of drop-and-refetch. Build IDs + MCP verification will be added to this PR body once the new builds are produced locally.

Post-Deploy Monitoring & Validation

  • What to monitor/search
    • Amplitude (primary): events cache_eviction_started, cache_summary. Track presence, frequency, and field distributions — now including disk_spill_enabled, disk_spilled_count, disk_restored_count, disk_spill_failures, disk_read_failures, disk_peak_bytes, disk_final_bytes, disk_final_entries alongside the original RAM-tier fields.
    • Honeycomb (secondary): service.name=cli traces with core:discovery log namespace. Search for message fragments cache spill:, cache evict: (now only fires when disk spill failed — its presence is a signal, not routine), cache disk-hit:, Cache eviction active — cap reached, oldest entries spilling to disk, Percy cache is using, --max-cache-ram=, is below the 25MB minimum, PERCY_CACHE_WARN_THRESHOLD_BYTES override active, disk-spill init failed, disk-spill write failed, disk-spill read failed, disk-spill cleanup failed.
    • Support channel #percy-support: watch for tickets mentioning max-cache-ram, OOM, heap, killed, /tmp, ENOSPC, disk full, permission denied after release.
  • Validation checks (queries / commands)
    • Amplitude filter: event_type IN ('cache_eviction_started','cache_summary') AND cli_version >= <release> — expect non-zero rows within 24 h of release for early adopters.
    • cache_summary.disk_spill_failures / (disk_spilled_count + disk_spill_failures) should be near zero. A rising ratio signals a disk-tier regression (permissions, noexec /tmp, full volume).
    • cache_summary.disk_restored_count > 0 alongside cache_summary.evictions > 0 confirms the round-trip is working in anger, not just the spill side.
    • Support-query: --max-cache-ram across Zendesk/Intercom — expect no new tickets tied to flag parsing, cap enforcement, sub-25 MB clamp, or disk-tier cleanup (leaked temp dirs).
  • Expected healthy behavior
    • ≥ 10% of opt-in runs show cache_summary with a non-null cache_budget_ram_mb within 60 days.
    • On runs with evictions > 0: disk_spilled_count should track evictions (1:1 minus a tiny tail of simultaneous-write races).
    • disk_restored_count / disk_spilled_count > 0.5 on typical memory-constrained workloads — proves the disk tier earns its keep.
    • Build-failure rate unchanged vs. pre-release baseline.
  • Failure signal(s) / rollback trigger
    • Trigger A: > 10 support tickets in 7 days tied to --max-cache-ram flag parsing, cap clamp, disk-tier init, or "my build started failing after setting this".
    • Trigger B: measurable spike in Percy build-failure rate correlated with CLI versions that include this PR.
    • Trigger C: cache_summary events are never received in Amplitude (pipeline regression).
    • Trigger D: cache_summary.disk_spill_failures > disk_spilled_count across a large cohort — means the disk tier is failing open for most users; the feature would be a net loss over the old drop-on-evict behaviour.
    • Trigger E: cache_summary.peak_bytes values clustered at the 500 MB default threshold (signals the Map-mode counter has regressed to frozen behavior).
    • Immediate action on any trigger: release a patch reverting this PR; tell users to unset the flag as a workaround (flag is opt-in, unset = previous behavior).
  • Validation window & owner
    • Window: 14 days post-GA release.
    • Owner: @pranavz28 (Pranav Zinzurde).

🤖 Generated with Claude Opus 4.7 (1M context, extended thinking) via Claude Code + Compound Engineering v2.50.0

Hand-rolled byte-budget LRU cache backing the forthcoming --max-cache-ram
flag. Pure/sync (no logger or external calls) so callers can log after
.set() returns without risking event-loop yield mid-mutation. Exposes a
Map-compatible surface (.get/.set/.has/.delete/.values/.size) plus
.calculatedSize and .stats for telemetry.

entrySize() computes body-bytes + fixed per-entry overhead, handling
both single-resource entries and array-of-resources (root-resource with
multiple widths from discovery.js:465).

16 unit specs covering LRU semantics, recency bump, multi-evict,
oversized-entry skip, peak-bytes transient high-water, and array-entry
sizing. Zero new dependencies.
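The sizing rule above might look roughly like this — a sketch, where `ENTRY_OVERHEAD` and the helper names are assumptions based on the 512 B per-entry figure in the PR summary:

```javascript
// Sketch of entrySize(): body bytes plus a fixed per-entry overhead.
// Handles both single-resource entries and array-of-resources entries
// (a root resource captured at multiple widths).
const ENTRY_OVERHEAD = 512; // per-entry bookkeeping estimate, in bytes

function bodyBytes(resource) {
  const content = resource?.content;
  if (content == null) return 0;
  return Buffer.byteLength(content); // works for strings and Buffers
}

function entrySize(entry) {
  if (Array.isArray(entry)) {
    return entry.reduce((sum, r) => sum + bodyBytes(r), 0) + ENTRY_OVERHEAD;
  }
  return bodyBytes(entry) + ENTRY_OVERHEAD;
}
```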
New flag/env/percyrc surface for the forthcoming bounded asset-discovery
cache. Value is an integer MB (e.g. --max-cache-ram 300 means 300MB).
Flows through env PERCY_MAX_CACHE_RAM or percyrc discovery.maxCacheRam.

Precedence follows existing Percy convention (flag > env > percyrc).
Raw parse is Number; full validation happens at Percy startup once the
flag is consumed (subsequent commit).
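The precedence chain can be sketched as follows (hypothetical function and parameter names; only the flag > env > percyrc ordering comes from the text above):

```javascript
// Sketch of the flag > env > percyrc precedence for --max-cache-ram.
// Raw parse only — full validation happens later, at Percy startup.
function resolveMaxCacheRam(flagValue, env, percyrc) {
  return flagValue
    ?? (env.PERCY_MAX_CACHE_RAM != null ? Number(env.PERCY_MAX_CACHE_RAM) : null)
    ?? percyrc?.discovery?.maxCacheRam
    ?? null; // null = unbounded, today's behavior
}
```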
Add maxCacheRam integer-or-null property under discovery. Value is
the cap in MB (so percyrc users write 'maxCacheRam: 300' for 300MB,
matching the flag). Null/unset preserves today's unbounded behavior.

Schema validation catches non-integer and negative inputs at config
load time; additional startup validation (e.g. minimum floor based on
MAX_RESOURCE_SIZE) happens in the discovery integration commit.
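An illustrative percyrc snippet using the new property, matching the flag semantics described above:

```yaml
version: 2
discovery:
  maxCacheRam: 300   # cap in MB; unset/null keeps today's unbounded cache
```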
Replaces the unbounded resource cache Map at discovery.js:408 with a
byte-budget-aware ByteLRU when percy.config.discovery.maxCacheRam is
configured. Without the flag, behaviour is byte-for-byte identical to
today (new Map(), no eviction).

In saveResource, oversized entries (size > cap) are skipped from the
global cache with a debug log but still land in snapshot.resources so
the current snapshot renders correctly. ByteLRU's onEvict callback
emits a debug log for each LRU eviction.

Startup validation (runs once in the discovery queue's start handler):
  * Rejects caps below 25MB (MAX_RESOURCE_SIZE floor) with a clear error
  * Warns on caps below 50MB (silently-useless territory)
  * Info log when --max-cache-ram and --disable-cache are both set
    (max-cache-ram becomes a no-op)

A CACHE_STATS_KEY Symbol is exported alongside RESOURCE_CACHE_KEY to
hold per-run stats the telemetry layer will read in a later commit.
Existing discovery tests remain green.
When the flag is unset (Map mode), track cumulative bytes written to
the global resource cache in a side-channel counter. On first crossing
of 500MB, emit a one-shot warn-level log pointing the user at the
--max-cache-ram flag before the CI runner OOMs.

* warn level (not info) so --quiet users still see it
* one-shot guarded by a per-run stats flag — does not re-fire on
  shrink/regrow cycles, and is bypassed entirely when the flag is set
  (opt-in users do not need the discovery hint)
* threshold override via PERCY_CACHE_WARN_THRESHOLD_BYTES for post-ship
  tuning (undocumented)

This is the primary discovery mechanism for the flag — users find it
through normal CI output before they need support.
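The always-incrementing counter plus one-shot gate (as later tightened by review fix #1) can be sketched as — `stats` and `log` are illustrative names, not the shipped discovery.js symbols:

```javascript
// Sketch of the warning-at-threshold side-channel counter.
const WARN_THRESHOLD_BYTES = 500 * 1000 * 1000; // 500 MB, decimal MB

function trackUnsetModeBytes(stats, bytesWritten, log) {
  stats.unsetModeBytes += bytesWritten; // always increment, even after firing
  if (!stats.warningFired && stats.unsetModeBytes >= WARN_THRESHOLD_BYTES) {
    stats.warningFired = true; // one-shot: never re-fires on shrink/regrow
    log.warn('Percy cache is using over 500MB of RAM; consider --max-cache-ram');
  }
}
```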
Two CLI-side events travel through the existing sendBuildEvents pipeline
(UDP pager → Amplitude). No Percy API changes.

* cache_eviction_started — fires exactly once per run, from ByteLRU's
  onEvict callback on the first LRU eviction. Payload includes the
  configured budget, peak bytes at eviction time, and eviction count.
  Emits an info log alongside telling the user eviction is active.
* cache_summary — fires once per run from Percy.stop()'s finally block.
  Payload includes budget + hits/misses/evictions/peak_bytes/final_bytes/
  entry_count/oversize_skipped. Feeds Amplitude for adoption, hit-rate,
  and sizing calibration metrics post-GA.

Both are fire-and-forget; exceptions are logged at debug and swallowed so
telemetry loss cannot fail a Percy run (same pattern as sendBuildLogs at
percy.js:707). Both gate on percy.build?.id being set so they cannot
emit before the build exists.
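The fire-and-forget guard pattern described here can be sketched as follows — `fireCacheEventSafe` is named in the test plan, but its exact signature here is an assumption:

```javascript
// Sketch: build-id gate + swallowed failure, so telemetry loss
// can never fail a Percy run.
async function fireCacheEventSafe(percy, client, event) {
  if (!percy.build?.id) return; // never emit before the build exists
  try {
    await client.sendBuildEvents(percy.build.id, event);
  } catch (err) {
    percy.log?.debug?.(`cache telemetry failed: ${err.message}`);
  }
}
```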
Two new describe blocks in discovery.test.js:

'with --max-cache-ram' (5 specs):
  * installs a ByteLRU when the flag is set (type + initial stats shape)
  * falls back to a plain Map when the flag is unset (backward compat)
  * rejects a cap below the 25MB MAX_RESOURCE_SIZE floor with a clear error
  * emits an info log when the flag and --disable-cache are both set
  * records oversize_skipped and leaves calculatedSize at 0 when a single
    entry exceeds the cap

'warning-at-threshold (unset cap)' (1 spec):
  * PERCY_CACHE_WARN_THRESHOLD_BYTES override triggers the warn log once;
    does not re-fire on a subsequent snapshot run (one-shot gate holds).

Filter: --filter='max-cache-ram|warning-at-threshold' runs 6 of 149 specs
in ~16s for focused iteration. Full suite stays green.
Adds maxCacheRam to the discovery options list in @percy/core README.
Covers: value semantics (MB), default (unset/unbounded), eviction
behavior, the cap-body-bytes vs RSS relationship users need to know
for sizing, the 25 MB floor, and the three equivalent surfaces
(flag / env / percyrc).
@pranavz28 pranavz28 requested a review from a team as a code owner April 22, 2026 09:10
Follow-up to the initial --max-cache-ram implementation. Previously a
value below the 25 MB MAX_RESOURCE_SIZE floor threw an error at Percy
startup, killing otherwise-healthy builds for a misconfigured cap.
Switch to a warn-level log and continue with the 25 MB minimum:

  --max-cache-ram=10MB is below the 25MB minimum (individual resources
  up to 25MB would otherwise be dropped). Continuing with the minimum:
  25MB.

Also mutates percy.config.discovery.maxCacheRam to the clamped value so
the cache_summary telemetry event reports the effective cap.

Updates:
  * discovery.js   — throw → warn + clamp
  * discovery.test.js — integration test asserts warn log + clamped cap
  * README.md      — docs reflect the clamp behaviour
  * cli-command/test/flags.test.js — stale help-text fixture: inserts
    the --max-cache-ram line between --disable-cache and --debug
    (broken since the flag was added; unrelated to the clamp, bundled
    here since both are help-surface / startup-UX fixes)

Verified in-anger with percy snapshot --max-cache-ram 10 against
https://example.com (build #356):
  [percy:core] --max-cache-ram=10MB is below the 25MB minimum
  (individual resources up to 25MB would otherwise be dropped).
  Continuing with the minimum: 25MB.
fix(core): address PR review (frozen counter, reorder checks, reorder egress, effective cap)

Addresses items 1-8 from PR #2192 review.

Must-fix:
  #1 discovery.js: always increment unsetModeBytes; only gate the warn-log
     emission on warningFired. Previously the byte counter froze at the
     threshold crossing so cache_summary.peak_bytes misreported every
     Map-mode run that hit the threshold.
  #2 byte-lru.js: reorder ByteLRU.set — reject oversize BEFORE touching
     any existing entry. Fixes a failed oversize re-set silently evicting
     the prior (valid) value for the same key.

Should-fix:
  #3 percy.js: move sendCacheSummary AFTER sendBuildLogs in stop()'s
     finally. A slow/stalled pager hop on the analytics event can no
     longer delay the primary log egress.
  #4 discovery.js: do not mutate percy.config.discovery.maxCacheRam on
     clamp. Store effectiveMaxCacheRamMB on CACHE_STATS_KEY; percy.js
     sendCacheSummary reads it from there. User config stays read-only.

Nits:
  #5 discovery.js: read PERCY_CACHE_WARN_THRESHOLD_BYTES once at queue
     construction instead of on every saveResource.
  #6 discovery.test.js: use 'instanceof ByteLRU' (imported) instead of
     string match on constructor.name.
  #7 discovery.js: emit a debug log when PERCY_CACHE_WARN_THRESHOLD_BYTES
     override is active, so support can spot misconfigured overrides.
  #8 README: note decimal MB (1,000,000 bytes) vs binary MiB.

Coverage fill-in (closes gaps visible in earlier nyc run):
  * byte-lru.test.js: .clear(), oversize re-set regression, onEvict
    reason='too-big' path.
  * discovery.test.js:
    - --max-cache-ram between 25 and 50 MB warns without clamping.
    - PERCY_CACHE_WARN_THRESHOLD_BYTES override emits debug log.
    - cache_eviction_started info log fires when LRU evicts.
    - unsetModeBytes keeps growing post-warningFired (regression for #1).
    - sendCacheSummary swallows client rejections without throwing.
    - sendCacheSummary short-circuits when build / cache / stats missing.

Tests: 19 unit specs (byte-lru) + 13 integration specs (discovery). Lint
clean. Built dist/ regenerated.
Targets the branches nyc still flagged after the review-fix commit:

byte-lru.js:
  * .delete() on a non-existent key returns false (line 66)
  * entrySize() handles null entries + entries without content inside an
    array (line 97 optional-chain branches)

discovery.js:
  * fireCacheEventSafe's .catch debug-log path (line 440) — spy
    sendBuildEvents to reject, force eviction, microtask-wait
  * saveResource oversize-skip branch (lines 598-607) — serve a 25MB
    resource from the test server so the real intercept flow triggers
    the oversize path, not just the direct ByteLRU.set test

percy.js:
  * sendCacheSummary entry_count '?? 0' fallback (line 409) — call
    directly with a defensive-shape cache lacking .size

21 unit specs + additional integration specs; lint clean.
Comment thread packages/core/src/cache/disk-spill.js Fixed
@pranavz28 pranavz28 force-pushed the feat/per-7795-max-cache-ram branch 5 times, most recently from 702658b to d1f25ac on April 23, 2026 19:53
Extends --max-cache-ram with a disk-backed overflow tier so evictions
no longer drop resources. When the in-memory ByteLRU evicts, the full
resource is written to a per-run temp directory under os.tmpdir() and
a slim metadata reference stays in memory. getResource falls through
RAM miss → disk tier before the browser ever refetches from origin.
On any disk I/O failure we return false/undefined and fall back to
the old drop behaviour — the browser refetches exactly as it would
without spill, so disk-tier failure is strictly additive.

What lives in byte-lru.js:
- ByteLRU.onEvict(key, reason, value) — adds the evicted value so
  the discovery wiring can capture it before it is GC'd.
- DiskSpillStore — sync mkdirSync/writeFileSync/readFileSync/rmSync.
  Counter-based filenames (no URL-derived data flows to path.join).
  Self-healing index: a read failure purges the stale entry so the
  next lookup cleanly misses. Best-effort destroy swallows errors.
- createSpillDir — os.tmpdir()/percy-cache-<pid>-<random-hex>.
- lookupCacheResource — pulled out of the inline getResource closure
  so the RAM-miss-to-disk-hit path is directly unit-testable.

Discovery wiring (discovery.js):
- start handler constructs the DiskSpillStore alongside the ByteLRU.
- onEvict calls diskStore.set(key, value); debug-log differentiates
  `cache spill:` (success) from `cache evict:` (disk failed or disk
  not ready).
- saveResource clears any prior disk entry up front so a fresh
  discovery write wins over a spilled copy — prevents a race where
  getResource would serve stale content.
- end handler calls diskStore.destroy(); cleanup errors are swallowed
  by DiskSpillStore so they cannot fail percy.stop().
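The RAM-miss → disk-hit fall-through can be sketched as follows (illustrative names; in the real ByteLRU the RAM `get` also bumps recency):

```javascript
// Sketch of lookupCacheResource: snapshot-local hit, then RAM tier,
// then disk tier; undefined means the browser refetches from origin.
function lookupCacheResource(url, snapshotResources, ramCache, diskStore) {
  if (snapshotResources.has(url)) return snapshotResources.get(url);
  const ramHit = ramCache.get(url);
  if (ramHit !== undefined) return ramHit;
  return diskStore?.get(url); // disk hit, or undefined -> full miss
}
```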

Telemetry (percy.js sendCacheSummary):
Eight new disk-tier fields on cache_summary: disk_spill_enabled,
disk_spilled_count, disk_restored_count, disk_spill_failures,
disk_read_failures, disk_peak_bytes, disk_final_bytes,
disk_final_entries. cache_eviction_started also carries
disk_spill_enabled so the dashboard can distinguish "disabled" from
"enabled with zero activity" from "enabled but failing."

Tests (all pass locally):
- 44 unit specs in byte-lru.test.js exercise ByteLRU, entrySize,
  DiskSpillStore, createSpillDir, and lookupCacheResource end-to-end
  — including init failure via /dev/null, write failure via mocked
  EACCES, read self-heal via mocked ENOENT, unlinkSync error
  tolerance on delete + overwrite, destroy error swallowing, the
  not-ready short-circuit branches, and array-root width matching.
- 8 integration specs in discovery.test.js cover DiskSpillStore
  presence, spill-on-eviction, byte-for-byte rehydration, ENOSPC
  fallback, saveResource clearing stale entries, queue-end teardown
  (asserts both disk.ready flag and fs.existsSync for race-safety),
  and graceful handling when the store fails to init.

Zero new npm dependencies (fs/os/path/crypto are built-ins).
Node engine unchanged. Clean semgrep run on all changed files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@pranavz28 pranavz28 force-pushed the feat/per-7795-max-cache-ram branch from d1f25ac to 6a1a3d0 on April 23, 2026 20:09
Real-build verification surfaced an ordering bug: percy.stop() calls
discovery.end() before sendCacheSummary, and the 'end' handler destroys
the DiskSpillStore and deletes percy[DISK_SPILL_KEY]. sendCacheSummary
then read a null diskStore and emitted disk_spill_enabled=false with all
eight disk_* fields zeroed, regardless of actual activity. A run that
spilled 97 resources and restored 96 from disk shipped zeros to Percy.

Snapshot the disk stats onto stats.finalDiskStats in the 'end' handler
before destroy() runs. sendCacheSummary prefers the live store and falls
back to the snapshot, so both the in-discovery path (existing tests) and
the post-discovery path (real builds) populate the telemetry correctly.

Two new specs cover (a) sendCacheSummary using the finalDiskStats
fallback when DISK_SPILL_KEY is unset, and (b) the discovery 'end'
handler copying diskStore.stats onto the snapshot before destroy.
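The ordering fix amounts to snapshotting before teardown — a sketch with illustrative names:

```javascript
// 'end' handler: copy disk stats onto the stats object BEFORE destroy()
// tears the store down, so the summary emitted later still sees them.
function onDiscoveryEnd(stats, diskStore) {
  if (diskStore) {
    stats.finalDiskStats = { ...diskStore.stats, ready: diskStore.ready };
    diskStore.destroy();
  }
}

// Summary side: prefer the live store, fall back to the snapshot.
function diskFieldsFor(stats, diskStore) {
  const src = diskStore
    ? { ...diskStore.stats, ready: diskStore.ready }
    : stats.finalDiskStats;
  return src ?? { spilled: 0, restored: 0, ready: false };
}
```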
@pranavz28

PER-7795 Disk-Spill — Real-Build Verification Report

Branch: feat/per-7795-max-cache-ram
Verification run: 2026-04-24

All runs below used a local web-t token against test-pranav-8a4f5725 and
the two local origin servers built for this verification:

| Server | Port | Description |
| --- | --- | --- |
| heavy-assets-server.mjs | 9200 | 30 × 1 MB CSS, no delay (TC1, TC2) |
| heavy-assets-slow.mjs | 9201 | 60 × 1 MB CSS, 150 ms delay each (TC3, TC4, TC6, TC8) |

Test case matrix

| # | Case | Flag | Fixture | Build # | Result |
| --- | --- | --- | --- | --- | --- |
| 1 | Baseline (no flag) | (none) | heavy @ 9200 | #392 | PASS — no spill dir created, no disk telemetry |
| 2 | Aggressive spill | --max-cache-ram 1 (→ 25 MB clamp) | heavy @ 9200 | #394 | PASS — 7 spills, Cache eviction active info log fired once |
| 3 | Live disk-dir inspection | --max-cache-ram 1 | slow @ 9201 | #397 | PASS — peaked at 37 files / 35 MB on disk; byte sizes match spilled resources |
| 4 | Disk restore (concurrency: 1) | --max-cache-ram 1, discovery.concurrency: 1 | slow @ 9201 | #401 | PASS — 97 spills, 96 cache disk-hit log lines |
| 5 | Cleanup on stop | — | — | — | PASS — os.tmpdir()/percy-cache-* removed after every run |
| 6 | Telemetry payload | --max-cache-ram 1, concurrency: 1 | slow @ 9201 | #403 | PASS after fix — all 8 disk_* fields populated |
| 7 | End-to-end build health | — | — | #392, #403, #405 | PASS — every completed comparison has diff ratio 0.0000 vs baseline |
| 8 | Moderate cap | --max-cache-ram 40, concurrency: 1 | slow @ 9201 | #405 | PASS — 82 spills, 81 disk-hits, disk peaked at 22 entries / 21 MB |

Specific resource → disk-file evidence (TC3, live)

Process PID 27680, spill directory
/var/folders/4g/nq932q454bjbwx6xnp_zprdh0000gn/T/percy-cache-27680-d47eaa7d,
captured 4.8 s into the run:

-rw-r--r-- 1  3060 bytes    1   ← http://127.0.0.1:9201/          (3,060-byte HTML)
-rw-r--r-- 1  1000000 bytes  2   ← http://127.0.0.1:9201/slow-0.css
-rw-r--r-- 1  1000000 bytes  3   ← http://127.0.0.1:9201/slow-1.css
-rw-r--r-- 1  1000000 bytes  4   ← http://127.0.0.1:9201/slow-3.css
-rw-r--r-- 1  1000000 bytes  5   ← http://127.0.0.1:9201/slow-2.css
-rw-r--r-- 1  1000000 bytes  6   ← http://127.0.0.1:9201/slow-4.css
-rw-r--r-- 1  1000000 bytes  7   ← http://127.0.0.1:9201/slow-5.css

Filenames are the DiskSpillStore monotonic counter (1, 2, 3, …), and
sizes match each resource exactly (3 060 B for the HTML, 1 000 000 B for
each CSS). By later iterations the directory held 37 files totalling
~35 MB. After percy.stop() the directory was gone.

Telemetry payload (TC6, temporary probe)

With a one-line debug probe added inside sendCacheSummary and reverted
before commit, the payload shipped to Percy was:

{
  "cache_budget_ram_mb": 25,
  "hits": 24,
  "misses": 220,
  "evictions": 97,
  "peak_bytes": 25016372,
  "final_bytes": 24015860,
  "entry_count": 25,
  "oversize_skipped": 0,
  "disk_spill_enabled": true,
  "disk_spilled_count": 97,
  "disk_restored_count": 96,
  "disk_spill_failures": 0,
  "disk_read_failures": 0,
  "disk_peak_bytes": 36003060,
  "disk_final_bytes": 36003060,
  "disk_final_entries": 37
}

All eight disk_* fields are populated and match the discovery log.

Bug found and fixed during verification

The first TC6 run produced disk_spill_enabled: false with every
disk_* field zeroed, even though the log showed 97 spills and 96
disk-hits. Root cause: percy.stop() calls discovery.end() (which runs
diskStore.destroy() and delete percy[DISK_SPILL_KEY]) before
sendCacheSummary() reads those references.

Fix (commit 2422b01d): the end handler now snapshots
diskStore.stats (plus ready) onto stats.finalDiskStats before
destroy(); sendCacheSummary prefers the live store and falls back to
the snapshot, so both in-discovery tests and post-discovery real builds
populate the telemetry correctly. Two new specs cover each branch.

Commands reproduced here for reference

# TC1 — baseline
zsh -ic 'web-t && node packages/cli/bin/run.cjs snapshot \
  /tmp/heavy-snapshots.yml --verbose'

# TC2 — aggressive spill (clamped to 25 MB)
zsh -ic 'web-t && node packages/cli/bin/run.cjs snapshot \
  --max-cache-ram 1 /tmp/heavy-snapshots.yml --verbose'

# TC4 / TC6 — serial disk restore with full telemetry
zsh -ic 'web-t && node packages/cli/bin/run.cjs snapshot \
  -c /tmp/percy-serial.yml --max-cache-ram 1 \
  /tmp/slow-multi-snapshot.yml --verbose'

# TC8 — 40 MB moderate cap
zsh -ic 'web-t && node packages/cli/bin/run.cjs snapshot \
  -c /tmp/percy-serial.yml --max-cache-ram 40 \
  /tmp/slow-multi-snapshot.yml --verbose'

/tmp/percy-serial.yml:

version: 2
discovery:
  concurrency: 1

CACHE_STATS_KEY is always set alongside DISK_SPILL_KEY in the 'start'
handler, so the false branch of `if (stats)` was unreachable and
showed up as a 99.46% branch coverage gap on discovery.js line 553.

Drop the guard; write directly through the stats reference. Behavior is
unchanged — coverage lands back at 100/100/100/100.