Skip to content

test(producer): extract frameDirMaxIndexCache to its own module and pin cross-job isolation#381

Open
vanceingalls wants to merge 1 commit intovance/streaming-encoder-lifecycle-testsfrom
vance/frame-dir-cache-isolation-tests
Open

test(producer): extract frameDirMaxIndexCache to its own module and pin cross-job isolation#381
vanceingalls wants to merge 1 commit intovance/streaming-encoder-lifecycle-testsfrom
vance/frame-dir-cache-isolation-tests

Conversation

@vanceingalls
Copy link
Copy Markdown
Collaborator

@vanceingalls vanceingalls commented Apr 21, 2026

Summary

Extract the frameDirMaxIndexCache from a private module-scoped Map inside renderOrchestrator.ts into its own frameDirCache.ts module, then add a 11-test bun:test suite that pins the cross-job isolation contract added in Chunk 5B.

Why

Chunk 9E of plans/hdr-followups.md. The cache lived as a private Map inside renderOrchestrator.ts, which made the cross-job isolation contract from Chunk 5B impossible to unit-test directly. Extracting it both makes the contract testable and reduces orchestrator complexity slightly.

What changed

  • New packages/producer/src/services/frameDirCache.ts exposes getMaxFrameIndex / clearMaxFrameIndex / getMaxFrameIndexCacheSize (plus a test-only __resetMaxFrameIndexCacheForTests helper). Behavior is unchanged: callers still get the same module-scoped sharing inside a job, and renderOrchestrator's outer finally still clears every entry it registered so the cache cannot grow monotonically across renders.
  • renderOrchestrator.ts: imports the new helpers, drops the unused readdirSync import, updates inline comments, and replaces two frameDirMaxIndexCache.delete sites with clearMaxFrameIndex.
  • New frameDirCache.test.ts (bun:test, 11 tests) covering:
    • Reading the max index from a populated directory.
    • Ignoring filenames that don't match frame_NNNN.png (wrong ext, wrong prefix, wrong case, double extension, empty index group, same-named subdirectory).
    • Empty- and missing-directory paths returning 0 and being cached.
    • Intra-job invariant: subsequent readdir mutations not observed once cached.
    • clearMaxFrameIndex forcing a re-read; returns false for paths that were never cached.
    • Per-directory isolation when multiple directories are registered.
    • The cross-job contract from Chunk 5B: cache empty between well-behaved jobs, doesn't grow monotonically across 20 simulated renders with 3 HDR videos each (steady-state cache size stays at 3), and a buggy job that forgets to clear leaks exactly its own entries rather than affecting unrelated jobs.

Test plan

  • frameDirCache.test.ts 11/11 pass.
  • Existing producer tests unchanged.
  • Behavior preserved: same module-scoped sharing inside a job, same outer-finally eviction.

Stack

Chunk 9E of plans/hdr-followups.md. Test-driven extraction; complements Chunk 5B.

Copy link
Copy Markdown
Collaborator Author

vanceingalls commented Apr 21, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

This stack of pull requests is managed by Graphite. Learn more about stacking.

@vanceingalls vanceingalls force-pushed the vance/frame-dir-cache-isolation-tests branch from e4eb138 to d781d68 Compare April 22, 2026 02:03
Copy link
Copy Markdown
Collaborator

@jrusso1020 jrusso1020 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extracting frameDirMaxIndexCache into its own module is the right move after #371 added the eviction logic — a module-scoped Map with cross-cutting add/evict/sweep semantics deserves its own seam rather than living as a top-level constant in renderOrchestrator.ts. The cross-job isolation tests are the bit I most care about: the previous lifetime-leak bug was about the cache persisting entries across renders, and the only durable way to prevent regression is to explicitly assert that a new render starts with a clean cache state relative to the previous one. A unit test with two mock render jobs where job 2 must not see job 1's entries is the right shape.

Three things worth considering for followups (non-blocking):

  1. Bounded-size enforcement. #371 added eviction but didn't add a size cap. If some future render path forgets to call the eviction block, the cache still self-limits at something reasonable (say, max 1000 entries with LRU eviction). Defense in depth.
  2. Metric emission on eviction. A Datadog counter on "cache entries evicted during render" would make it observable when something's leaking; currently the only signal of a regression is process memory climbing under sustained load.
  3. The cache key shape should probably include the render-job ID explicitly (not implicit via the frameDir path) — makes cross-job contamination mechanically impossible rather than just tested-against.

Approved.

Rames Jusso

@vanceingalls vanceingalls force-pushed the vance/streaming-encoder-lifecycle-tests branch from 9b2eefa to e270588 Compare April 22, 2026 03:03
@vanceingalls vanceingalls force-pushed the vance/frame-dir-cache-isolation-tests branch from d781d68 to fb6866c Compare April 22, 2026 03:03
@vanceingalls vanceingalls force-pushed the vance/streaming-encoder-lifecycle-tests branch from e270588 to 445409b Compare April 22, 2026 03:34
@vanceingalls vanceingalls force-pushed the vance/frame-dir-cache-isolation-tests branch from fb6866c to 6cc7215 Compare April 22, 2026 03:34
@vanceingalls vanceingalls force-pushed the vance/streaming-encoder-lifecycle-tests branch from 445409b to d83c70c Compare April 22, 2026 04:45
@vanceingalls vanceingalls force-pushed the vance/frame-dir-cache-isolation-tests branch from 6cc7215 to c9aefa7 Compare April 22, 2026 04:45
@vanceingalls
Copy link
Copy Markdown
Collaborator Author

Thanks for the approval @jrusso1020. Addressed (1) directly in this PR; (2) and (3) are tracked as follow-ups.

1. Bounded-size enforcement — done in this PR.

The module now enforces a hard MAX_ENTRIES = 1000 cap with O(1) approximate LRU eviction (Map insertion-order trick: `delete`-then-`set` on every hit moves the entry to most-recently-used; `Map.keys().next()` returns the LRU on insert when at cap). The orchestrator's `finally` remains the primary boundedness mechanism — the LRU cap is defense in depth so a future code path that forgets `clearMaxFrameIndex` cannot leak without bound. Worst-case resident size: ~88 KB. Sized at ~100× the working set of a single job, so well-behaved callers never trip it.

This is documented at the top of `packages/producer/src/services/frameDirCache.ts` and pinned by three tests in `frameDirCache.test.ts`:

  • `bounded LRU: cache size never exceeds MAX_ENTRIES under sustained insert pressure`
  • `bounded LRU: oldest entries are evicted first, newest are retained`
  • `bounded LRU: re-accessing an entry bumps its recency and protects it from eviction`

The cross-job isolation contract you most cared about is also pinned at the unit-test layer with three explicit tests:

  • `cross-job isolation: cache is empty between jobs when callers honor the contract`
  • `cross-job isolation: cache does not grow monotonically across many jobs`
  • `cross-job isolation: a job that forgets to clear leaks exactly its own entries (regression bound)`

2. Metric emission on eviction — follow-up.

Not in this PR. The mechanical hooks are in place (the LRU eviction site in `getMaxFrameIndex` and `getMaxFrameIndexCacheSize()` for sampling), so wiring a Datadog counter is a small follow-up. Tracked in `plans/hdr-followups.md`.

3. Job-ID-scoped cache key — follow-up.

Not in this PR. The current key is the `frameDir` path string, which is implicitly job-scoped because `workDir` is per-job, but you're right that an explicit `{ jobId, frameDir }` composite key would make cross-job contamination mechanically impossible rather than tested-against. Will need a small `renderOrchestrator.ts` plumbing change to thread the job ID. Tracked in `plans/hdr-followups.md`.

No outstanding action items in this PR.

@qodo-ai-reviewer
Copy link
Copy Markdown

Hi, Several LRU tests use hardcoded /tmp/... paths instead of tmpdir(), which is non-portable and will fail or behave unexpectedly on environments without a /tmp (notably Windows) if these tests are later wired into CI.

Severity: remediation recommended | Category: maintainability

How to fix: Use tmpdir() for paths

Agent prompt to fix - you can give this to your LLM of choice:

Issue description

The LRU tests hardcode /tmp/... paths, which isn’t portable to Windows.

Issue Context

These are currently intended as synthetic “missing paths” to avoid filesystem overhead, but they can still be generated using tmpdir() without creating directories.

Fix Focus Areas

  • packages/producer/src/services/frameDirCache.test.ts[261-316]

Suggested changes

  • Replace string literals like /tmp/frame-cap-test-${i} with join(tmpdir(), frame-cap-test-${i}).
  • Keep them non-existent (don’t mkdir), preserving the original performance intent.

Found by Qodo code review

@vanceingalls vanceingalls force-pushed the vance/streaming-encoder-lifecycle-tests branch from d83c70c to c4cf447 Compare April 22, 2026 18:01
@vanceingalls vanceingalls force-pushed the vance/frame-dir-cache-isolation-tests branch from c9aefa7 to 8bda232 Compare April 22, 2026 18:01
@vanceingalls vanceingalls force-pushed the vance/streaming-encoder-lifecycle-tests branch from c4cf447 to 88f43c4 Compare April 22, 2026 18:35
@vanceingalls vanceingalls force-pushed the vance/frame-dir-cache-isolation-tests branch 2 times, most recently from 01404fa to 7f94754 Compare April 22, 2026 18:50
@vanceingalls vanceingalls force-pushed the vance/streaming-encoder-lifecycle-tests branch 2 times, most recently from 736c5bc to 4502711 Compare April 22, 2026 18:59
@vanceingalls vanceingalls force-pushed the vance/frame-dir-cache-isolation-tests branch from 7f94754 to 3aaaa48 Compare April 22, 2026 18:59
@vanceingalls vanceingalls force-pushed the vance/streaming-encoder-lifecycle-tests branch from 4502711 to de981e0 Compare April 22, 2026 19:34
@vanceingalls vanceingalls force-pushed the vance/frame-dir-cache-isolation-tests branch from 3aaaa48 to a26c7c9 Compare April 22, 2026 19:34
@vanceingalls vanceingalls force-pushed the vance/streaming-encoder-lifecycle-tests branch from de981e0 to acc15ce Compare April 22, 2026 20:45
@vanceingalls vanceingalls force-pushed the vance/frame-dir-cache-isolation-tests branch from a26c7c9 to 3998701 Compare April 22, 2026 20:45
@vanceingalls vanceingalls force-pushed the vance/streaming-encoder-lifecycle-tests branch from acc15ce to ddcef60 Compare April 22, 2026 20:48
@vanceingalls vanceingalls force-pushed the vance/frame-dir-cache-isolation-tests branch 2 times, most recently from 04f1932 to d62dbc2 Compare April 22, 2026 22:13
@vanceingalls vanceingalls force-pushed the vance/streaming-encoder-lifecycle-tests branch 2 times, most recently from 76b62e3 to f47ae00 Compare April 22, 2026 22:52
…in cross-job isolation

Chunk 9E. The frame-directory max-index cache lived as a private
module-scoped Map inside renderOrchestrator.ts, which made the cross-job
isolation contract added in Chunk 5B impossible to unit-test directly.

Extract the cache into packages/producer/src/services/frameDirCache.ts
behind getMaxFrameIndex / clearMaxFrameIndex / getMaxFrameIndexCacheSize
(plus a test-only __resetMaxFrameIndexCacheForTests helper). Behavior is
unchanged: callers still get the same module-scoped sharing inside a
job, and renderOrchestrator's outer finally still clears every entry it
registered so the cache cannot grow monotonically across renders.

renderOrchestrator now imports the new helpers, drops the unused
readdirSync import, and updates the inline comments to point at the new
module. Two cleanup sites that previously called
frameDirMaxIndexCache.delete now call clearMaxFrameIndex.

Add frameDirCache.test.ts (bun:test) with 11 tests covering:
  - reading the max index from a populated directory
  - ignoring filenames that do not match frame_NNNN.png (wrong ext,
    wrong prefix, wrong case, double extension, empty index group, and
    a same-named subdirectory)
  - empty- and missing-directory paths returning 0 and being cached
  - the intra-job invariant that subsequent readdir mutations are not
    observed once a directory has been cached
  - clearMaxFrameIndex forcing a re-read and returning false for paths
    that were never cached
  - per-directory isolation when multiple directories are registered
  - the cross-job contract from Chunk 5B: the cache is empty between
    well-behaved jobs, does not grow monotonically across 20 simulated
    renders with 3 HDR videos each (steady-state cache size stays at 3),
    and a buggy job that forgets to clear leaks exactly its own entries
    rather than affecting unrelated jobs.

Made-with: Cursor
@vanceingalls vanceingalls force-pushed the vance/frame-dir-cache-isolation-tests branch from d62dbc2 to b171d31 Compare April 22, 2026 22:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants