perf(engine): superset extraction for overlapping trims of one source #1885

Merged: miguel-heygen merged 2 commits into main from worktree-renderer-frame-extraction-perf on Jul 3, 2026

miguel-heygen (Collaborator) commented Jul 3, 2026

Stack 6/6 (top): #1898 → #1899 → #1900 → #1901 → #1902 → this

This PR started as the whole perf effort and was split into the stack above for review; the benchmark/repro comments below cover the full stack and its A/B evidence. This top unit now contains only the superset-trim change plus a small GC skip.

What

  1. Superset extraction for overlapping trims. Cache-missing trims of the same source that are frame-aligned and overlapping decode their union window in ONE ffmpeg pass; each trim's frames are materialized by hardlinking the superset frames with renumbered names (copy fallback on EXDEV). Disjoint or misaligned trims keep the direct per-trim path; any union-extraction failure falls back to per-trim extraction, so the optimization cannot introduce a new failure mode.
  2. GC skip on all-hit renders. Warm renders (zero cache misses) no longer pay the full cache size scan introduced in #1901 (perf(engine): extraction cache on by default with atomic publish and LRU gc).

Why

N clips trimming one source each ran their own ffmpeg decode: overlapping windows decoded the overlap N times, and every -ss re-decoded from the previous keyframe (expensive on sparse-keyframe screen recordings). Sample-time correctness is preserved by construction: member frame k uses superset frame offset+k, so its source time is exactly mediaStart + k/fps — that is what the frame-alignment precondition buys. Hardlinks keep inodes alive, so later cache eviction cannot tear a member's frames.
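
The frame-alignment precondition described above can be sketched as follows. This is a minimal illustration rather than the engine's code: `frameOffset` and `sampleTime` are hypothetical helper names, and the tolerance constant is an assumption chosen to be far smaller than one frame.

```javascript
// Hypothetical helpers illustrating the frame-alignment precondition.
const TOLERANCE_FRAMES = 1e-4; // assumed value: far below one frame

// Returns the whole-frame offset of a trim inside the union window,
// or null when the trim does not start on a frame boundary of that window.
function frameOffset(mediaStart, baseStart, fps) {
  const exact = (mediaStart - baseStart) * fps;
  const rounded = Math.round(exact);
  return Math.abs(exact - rounded) <= TOLERANCE_FRAMES ? rounded : null;
}

// Source time sampled by superset frame `index` (0-based).
function sampleTime(baseStart, index, fps) {
  return baseStart + index / fps;
}

const fps = 30;
const baseStart = 0; // earliest mediaStart in the group
const aligned = frameOffset(2, baseStart, fps); // 60
const misaligned = frameOffset(2.01, baseStart, fps); // null (about 60.3 frames)

// Member frame k of the trim starting at 2 s reads superset frame 60 + k,
// which samples the same source time the trim would have sampled directly.
const k = 15;
console.log(aligned, misaligned);
console.log(sampleTime(baseStart, aligned + k, fps), 2 + k / fps); // both 2.5
```

A trim that fails the check (the `null` case) would simply stay on the direct per-trim path, which is why misalignment never changes which source time a frame represents.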

Measured / verified

  • 4 overlapping 30 s trims of a 60 s 1080p source: byte-identical content hash to per-trim extraction on main (fd07cd22e945 from both builds), 3353 ms → 2670 ms wall on a 14-core machine. CPU work halves (one decode+encode instead of four overlapping ones), so the gain grows on core-constrained runners; the wall gap is narrower here because main's four extractions ran in parallel.
  • Warm video_extract with the GC skip: 9 ms.
  • Tests: overlap hardlink inode equality + byte-parity vs direct extraction, disjoint and misaligned trims stay direct (different inodes), cache round-trip (2 misses then 2 hits, no superset dir left), loop-past-EOF count clamping, GC skip leaves an aged partial untouched on a hit-only render.
  • Engine suite at this commit: 36 files / 863 tests green; the tree is byte-identical to the previously verified pre-split branch tip.
  • Known VFR nuance (documented in the benchmark comment): a VFR source whose trims superset can pick a one-frame-adjacent held frame at irregular-timestamp tie points vs per-trim extraction; sample times and frame counts are unchanged and the mid-seek content check lands on the exact source timestamp.

Repro

Fixture + composition + observe commands are in the benchmark comments below. For this PR specifically: render a composition with two clips of the same source at overlapping data-media-start offsets (e.g. 0 and 2 s), confirm one __superset-* temp extraction, hardlinked frames (same inode across the trims' overlap), and unchanged output.

miguel-heygen (Collaborator, Author) commented

Reproducible A/B proof

Skepticism-proofing the claims: this benchmark imports extractAllVideoFrames from the built engine dist of main and of this branch and runs both on identical synthesized fixtures (3 runs per cell, median reported). No hand-rolled ffmpeg commands; whatever each build does is what gets timed.

┌─────────┬───────────────────────────────────────────┬───────────┬──────────┬────────┬────────┬────────────────┬────────┐
│ (index) │ scenario                                  │ build     │ medianMs │ bestMs │ frames │ filesHash      │ errors │
├─────────┼───────────────────────────────────────────┼───────────┼──────────┼────────┼────────┼────────────────┼────────┤
│ 0       │ 'VFR mid-seek (mediaStart 3s, 4s @30fps)' │ 'main   ' │ 299      │ 290    │ 120    │ 'b195e2c71361' │ 0      │
│ 1       │ 'VFR mid-seek (mediaStart 3s, 4s @30fps)' │ 'branch ' │ 91       │ 88     │ 120    │ 'cbf673930211' │ 0      │
│ 2       │ 'vp9-alpha 20s -> PNG'                    │ 'main   ' │ 3155     │ 1664   │ 600    │ '87bd343f44b8' │ 0      │
│ 3       │ 'vp9-alpha 20s -> PNG'                    │ 'branch ' │ 944      │ 774    │ 600    │ '9660b578bef9' │ 0      │
│ 4       │ 'h264 60s -> JPEG (control)'              │ 'main   ' │ 1577     │ 1553   │ 1800   │ 'e927695c6536' │ 0      │
│ 5       │ 'h264 60s -> JPEG (control)'              │ 'branch ' │ 1652     │ 1576   │ 1800   │ 'e927695c6536' │ 0      │
│ 6       │ '3x duplicated h264 60s (dedupe)'         │ 'main   ' │ 5056     │ 4989   │ 5400   │ '0dd735c3d260' │ 0      │
│ 7       │ '3x duplicated h264 60s (dedupe)'         │ 'branch ' │ 2153     │ 2004   │ 5400   │ 'e927695c6536' │ 0      │
└─────────┴───────────────────────────────────────────┴───────────┴──────────┴────────┴────────┴────────────────┴────────┘
VFR mid-seek: 299ms -> 91ms (3.29x), frames 120/120
vp9-alpha:    3155ms -> 944ms (3.34x), frames 600/600
control:      1577ms -> 1652ms (0.95x, noise), frames 1800/1800
3x dup:       5056ms -> 2153ms (2.35x), frames 5400/5400

Reading the evidence:

  • Control row is byte-identical: same filesHash (e927695c6536) on main and branch. The default H.264-to-JPEG path produces bit-for-bit the same frames, and its timing is unchanged within noise (1577 ms vs 1652 ms). Only the slow paths changed.
  • Frame counts match in every scenario (120/120, 600/600, 1800/1800, 5400/5400).
  • VFR and alpha hashes differ by design. VFR frames no longer pass through an intermediate x264 generation, so they are closer to the source (PSNR between the two extractions is 40-49 dB, and full rendered outputs match at 55.5 dB). The alpha PNGs are written at a different zlib level (1 vs 6); PNG is lossless at either level, so only the compression effort changes.
  • Dedupe row: branch's hash equals the control hash because the three duplicate elements share one frame set on disk.

Second commit (0b9f221): extraction cache on by default

Cold vs warm CLI render of a 4-video composition (duplicated src + VFR mid-seek + vp9-alpha), video_extract stage from the render trace:

COLD:  "phase":"video_extract","status":"end","durationMs":400
WARM:  "phase":"video_extract","status":"end","durationMs":13

Cold and warm outputs are pixel-identical (ffmpeg psnr reports average:inf).

Benchmark script (run from packages/engine so puppeteer-adjacent deps resolve; adjust the two dist paths)
// A/B benchmark: runs the REAL extractAllVideoFrames from the main checkout's
// built engine vs the PR worktree's built engine, on identical synthesized
// fixtures. No hand-rolled ffmpeg: whatever each build does is what's timed.
//
// Usage: node ab-bench.mjs [runs]
import { spawnSync } from "node:child_process";
import { mkdtempSync, mkdirSync, rmSync, readdirSync, readFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { createHash } from "node:crypto";

const RUNS = Number(process.argv[2] ?? 3);
const MAIN = "/Users/miguel07code/dev/hyperframes-oss/packages/engine/dist/index.js";
const BRANCH =
  "/Users/miguel07code/dev/hyperframes-oss/.claude/worktrees/renderer-frame-extraction-perf/packages/engine/dist/index.js";

const FIX = mkdtempSync(join(tmpdir(), "hf-ab-bench-"));
const sh = (args) => {
  const r = spawnSync("ffmpeg", ["-y", "-hide_banner", "-loglevel", "error", ...args]);
  if (r.status !== 0) throw new Error("ffmpeg failed: " + r.stderr);
};

console.log("Synthesizing fixtures in", FIX);
// 1) VFR source: 10s testsrc2@60fps with 4 one-second gaps, single keyframe
//    (byte-identical recipe to the repo's own VFR regression fixture).
sh([
  "-f", "lavfi", "-i", "testsrc2=s=1280x720:d=10:rate=60",
  "-vf", "select='not(between(n,30,89))*not(between(n,180,239))*not(between(n,330,389))*not(between(n,480,539))'",
  "-vsync", "vfr", "-c:v", "libx264", "-preset", "ultrafast", "-pix_fmt", "yuv420p",
  "-g", "600", "-keyint_min", "600", join(FIX, "vfr.mp4"),
]);
// 2) vp9 + alpha, 20s 720p
sh([
  "-f", "lavfi", "-i", "testsrc2=size=1280x720:rate=30:duration=20",
  "-f", "lavfi", "-i", "color=white:size=1280x720:rate=30:duration=20",
  "-filter_complex", "[1]format=gray[a];[0][a]alphamerge",
  "-c:v", "libvpx-vp9", "-pix_fmt", "yuva420p", "-crf", "30", "-b:v", "0",
  "-cpu-used", "8", "-row-mt", "1", join(FIX, "alpha.webm"),
]);
// 3) plain h264 60s 1080p30 (default JPEG path, expected unchanged)
sh([
  "-f", "lavfi", "-i", "testsrc2=size=1920x1080:rate=30:duration=60",
  "-c:v", "libx264", "-preset", "fast", "-crf", "20", "-pix_fmt", "yuv420p",
  join(FIX, "h264.mp4"),
]);

const scenarios = [
  {
    name: "VFR mid-seek (mediaStart 3s, 4s @30fps)",
    videos: [{ id: "v", src: join(FIX, "vfr.mp4"), start: 0, end: 4, mediaStart: 3, loop: false, hasAudio: false }],
  },
  {
    name: "vp9-alpha 20s -> PNG",
    videos: [{ id: "a", src: join(FIX, "alpha.webm"), start: 0, end: 20, mediaStart: 0, loop: false, hasAudio: false }],
  },
  {
    name: "h264 60s -> JPEG (control)",
    videos: [{ id: "h", src: join(FIX, "h264.mp4"), start: 0, end: 60, mediaStart: 0, loop: false, hasAudio: false }],
  },
  {
    name: "3x duplicated h264 60s (dedupe)",
    videos: [0, 1, 2].map((i) => ({
      id: "d" + i, src: join(FIX, "h264.mp4"), start: 0, end: 60, mediaStart: 0, loop: false, hasAudio: false,
    })),
  },
];

function hashDir(dir) {
  // Order-stable content hash over every extracted frame file.
  const h = createHash("sha256");
  const walk = (d) => {
    for (const f of readdirSync(d, { withFileTypes: true }).sort((a, b) => a.name.localeCompare(b.name))) {
      const p = join(d, f.name);
      if (f.isDirectory()) walk(p);
      else { h.update(f.name); h.update(readFileSync(p)); }
    }
  };
  walk(dir);
  return h.digest("hex").slice(0, 12);
}

const builds = [
  { label: "main   ", mod: await import(MAIN) },
  { label: "branch ", mod: await import(BRANCH) },
];

const table = [];
for (const sc of scenarios) {
  for (const b of builds) {
    const times = [];
    let frames = 0, hash = "", errors = [];
    for (let r = 0; r < RUNS; r++) {
      const out = join(FIX, `out-${b.label.trim()}-${scenarios.indexOf(sc)}-${r}`);
      mkdirSync(out, { recursive: true });
      const t0 = performance.now();
      const res = await b.mod.extractAllVideoFrames(
        sc.videos.map((v) => ({ ...v })), FIX, { fps: 30, outputDir: out },
      );
      times.push(performance.now() - t0);
      frames = res.totalFramesExtracted;
      errors = res.errors;
      if (r === 0) hash = hashDir(out);
      rmSync(out, { recursive: true, force: true });
    }
    times.sort((x, y) => x - y);
    table.push({
      scenario: sc.name, build: b.label,
      medianMs: Math.round(times[Math.floor(times.length / 2)]),
      bestMs: Math.round(times[0]),
      frames, filesHash: hash, errors: errors.length,
    });
  }
}

console.log(`\nruns per cell: ${RUNS} (median reported)\n`);
console.table(table);
for (const sc of scenarios) {
  const [m, br] = table.filter((t) => t.scenario === sc.name);
  console.log(
    `${sc.name}: ${m.medianMs}ms -> ${br.medianMs}ms  (${(m.medianMs / br.medianMs).toFixed(2)}x)` +
    `  frames ${m.frames}/${br.frames}`,
  );
}
rmSync(FIX, { recursive: true, force: true });

miguel-heygen (Collaborator, Author) commented

Commit 3 (5e0f19b): superset trims, one-pass HDR, cache-key fix — with A/B proof

Re-ran the same harness (built engine dist of main vs branch, identical fixtures, 3 runs, median):

┌──────────────────────────────────────────────────┬───────────┬──────────┬────────┬────────────────┐
│ scenario                                         │ build     │ medianMs │ frames │ filesHash      │
├──────────────────────────────────────────────────┼───────────┼──────────┼────────┼────────────────┤
│ 'VFR mid-seek (mediaStart 3s, 4s @30fps)'        │ 'main   ' │ 270      │ 120    │ 'b195e2c71361' │
│ 'VFR mid-seek (mediaStart 3s, 4s @30fps)'        │ 'branch ' │ 80       │ 120    │ 'cbf673930211' │
│ 'vp9-alpha 20s -> PNG'                           │ 'main   ' │ 1444     │ 600    │ '87bd343f44b8' │
│ 'vp9-alpha 20s -> PNG'                           │ 'branch ' │ 669      │ 600    │ '9660b578bef9' │
│ 'h264 60s -> JPEG (control)'                     │ 'main   ' │ 1479     │ 1800   │ 'e927695c6536' │
│ 'h264 60s -> JPEG (control)'                     │ 'branch ' │ 1593     │ 1800   │ 'e927695c6536' │
│ '4 overlapping 30s trims of h264 60s (superset)' │ 'main   ' │ 3353     │ 3600   │ 'fd07cd22e945' │
│ '4 overlapping 30s trims of h264 60s (superset)' │ 'branch ' │ 2670     │ 3600   │ 'fd07cd22e945' │
│ '3x duplicated h264 60s (dedupe)'                │ 'main   ' │ 4426     │ 5400   │ '0dd735c3d260' │
│ '3x duplicated h264 60s (dedupe)'                │ 'branch ' │ 1521     │ 5400   │ 'e927695c6536' │
└──────────────────────────────────────────────────┴───────────┴──────────┴────────┴────────────────┘
  • Superset row is byte-identical to main (fd07cd22e945 from both builds): slicing the union window produces exactly the frames per-trim extraction produced, while decoding the source once instead of four times. Wall-clock gain on a 14-core box is modest (1.26x — main's four extractions ran in parallel; the union is one process) but CPU work halves, so the gain grows on core-constrained runners (Lambda) and the keyframe seek cost is paid once instead of per trim. Disjoint or non-frame-aligned trims keep the direct path; a union failure falls back to per-trim extraction, so superset can't introduce a new failure mode.
  • VFR note: a VFR source whose trims superset can pick a one-frame-adjacent held frame at irregular-timestamp tie points vs the old per-trim extraction (both picks are valid for the same sample time; frame counts and sample times are unchanged, verified by the mid-seek content check landing on the exact source timestamp).
  • Cache-key correctness fix found while making trims smart: the HDR preflight rewrote videoPath AFTER the cache-key snapshot, so a mixed-HDR render cached BT.2020-converted frames under the plain source key — a later SDR render of the same trim would have served HDR-tinted frames. Latent while the cache was opt-in; real once commit 2 defaulted it on. The key now carries a transform discriminator (absent = byte-compatible with existing entries), pinned by tests.
  • One-pass SDR→HDR: same shape as the VFR fix — the libx264 pre-encode is gone, the colorspace remap runs inside the extraction pass.
  • Warm renders (zero misses) now skip the GC size scan; warm video_extract measured at 9ms.

Engine suite after commit 3: 36 files / 863 tests green.

@miguel-heygen miguel-heygen changed the base branch from main to graphite-base/1885 July 3, 2026 18:51
@miguel-heygen miguel-heygen changed the base branch from graphite-base/1885 to main July 3, 2026 18:51
@miguel-heygen miguel-heygen changed the base branch from main to graphite-base/1885 July 3, 2026 18:52
@miguel-heygen miguel-heygen force-pushed the worktree-renderer-frame-extraction-perf branch from 5e0f19b to 9ae2be4 Compare July 3, 2026 18:53
@miguel-heygen miguel-heygen changed the base branch from graphite-base/1885 to 07-02-perf_engine_one_pass_sdr_hdr_cache_key_transform July 3, 2026 18:53

vanceingalls (Collaborator) left a comment

🟠 LGTM on the superset math; two runtime-interop nits on the fallback path

The superset-boundary math is correct by construction (offsetFrames = round((mediaStart - baseStart) * fps) with an integrality precondition, and slice reads superset[offset + k] into member frame index k), the frame-alignment gate blocks the off-by-one hazard on the seams, disjoint and misaligned trims fall back to direct extraction (asserted by the two negative tests), and the loop-past-EOF path clamps to superset.totalFrames - offset before hardlinking. Cross-PR interop with #1900's dedup is clean — supersetGroupingKey is a proper subset of the extended dedupeKey (both add sdrToHdrTransfer), and superset planning runs after dedup collapses identical clips into uniqueWorks, so the two layers don't double-count. Two nits on the failure path.

Findings (runtime-interop lens)

1. Bare catch { ... } on the superset extraction swallows abort-signal errors — 🟠

File: packages/engine/src/services/videoFrameExtractor.ts:989-999

try {
  ...
  const superset = await extractVideoFramesRange(..., signal, config, tempDir);
  ...
} catch {
  const fallback = await Promise.all(
    group.members.map(async (member) => [..., await executeDirectMiss(member.miss)] as [...]),
  );
  return fallback;
}

The catch is bindingless — the original error is discarded and every member re-attempts via executeDirectMiss. Two runtime concerns:

  • Abort/cancel indistinguishable from decode failure. If the caller aborted mid-superset via signal, extractVideoFramesRange throws "Video frame extraction cancelled"; the catch then fires N direct extractions with the same (still-aborted) signal. Each will fail with the same cancel error, so the outcome is per-member cancel errors instead of a single top-level abort. Callers that count errors.length and short-circuit see N errors instead of 1. Recommend: bind the error, check signal?.aborted (or err instanceof AbortError / message match), and re-throw / propagate a single error rather than fanning out.
  • Root-cause observability lost. The PR body's "any union-extraction failure falls back to per-trim extraction, so the optimization cannot introduce a new failure mode" is true for correctness, but the primary failure (ffmpeg exit code, ENOSPC on tempDir, etc.) never surfaces anywhere — no console.warn, no breakdown counter, nothing. If superset extraction silently regresses in prod, we'd only see the symptom (slow per-trim renders) with no signal in the extractor. A single console.warn with the primary error message on fallback would make this triage-able. This is the observability lens Rames is covering; flagging here because it's inseparable from the abort-signal issue above.
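
The recommendation in finding 1 might look roughly like the sketch below. It is not the engine's code: `extractUnion` and `extractDirect` are hypothetical stand-ins for the union pass and the per-trim path, injected as parameters so the control flow can be read in isolation.

```javascript
// Sketch only: extractUnion and extractDirect are hypothetical stand-ins
// for the union pass and the per-trim path.
async function extractGroupWithFallback(members, signal, extractUnion, extractDirect) {
  try {
    return await extractUnion(members, signal);
  } catch (err) {
    // A cancelled render should surface once, not fan out into N doomed retries.
    if (signal?.aborted) throw err;
    // Any other failure is recoverable; record the root cause before falling back.
    console.warn(`superset extraction failed, falling back to per-trim: ${err?.message ?? err}`);
    return Promise.all(members.map((member) => extractDirect(member, signal)));
  }
}
```

Binding the error costs nothing on the happy path, and the single `console.warn` gives triage a starting point when renders get slower without failing.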

2. sliceSupersetMember unconditionally calls rmSync on the target dir before writing — 🟢 (verify)

File: packages/engine/src/services/videoFrameExtractor.ts:559

rmSync(outputDir, { recursive: true, force: true });
mkdirSync(outputDir, { recursive: true });

For the cache-publish path (partialDir) this is fine — partialDir is a fresh per-entry temp. For the non-cache path, outputDir = join(options.outputDir, work.video.id). Because upstream dedup (this PR + #1900) collapses same-dedupeKey work into one uniqueWork, and distinct dedupeKeys imply distinct videoPath+mediaStart+duration tuples (which usually implies distinct video.ids, but not enforced), two elements with the SAME video.id but different mediaStarts would resolve to distinct works targeting the same output subdir — and the rmSync would race with any concurrent member's writes. I don't see this happening in the current test fixtures or in resolvedVideos construction, and the risk exists on main too, but the rmSync + mkdirSync here amplifies it. Worth an assertion at intake (resolvedVideos should have unique video.ids) if not already elsewhere in the pipeline.
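
An intake assertion of the kind suggested here could be as small as the sketch below. `assertUniqueVideoIds` is a hypothetical name, and the only assumption about the input is that each element carries an `id` field.

```javascript
// Hypothetical intake guard: reject duplicate video ids before any extraction starts.
function assertUniqueVideoIds(videos) {
  const seen = new Set();
  for (const { id } of videos) {
    if (seen.has(id)) {
      throw new Error(`duplicate video id "${id}": two works would share one output subdir`);
    }
    seen.add(id);
  }
}

assertUniqueVideoIds([{ id: "v1" }, { id: "v2" }]); // passes silently
```

Failing fast at intake turns a potential rmSync race into a clear configuration error.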

3. extractedFramesFromDirectory reads the directory to build framePaths — 🟢

File: packages/engine/src/services/videoFrameExtractor.ts:448-468

Verified: sliceSupersetMember writes files as frame_00001.jpg, frame_00002.jpg, ... and extractedFramesFromDirectory sorts them and assigns framePaths starting at index 0. Per-trim frame-index remapping is correct — a consumer asking for framePaths.get(0) on trim-B gets trim-B's frame 0 (which is a hardlink to the superset's frame offset_B + 0, which corresponds to source time mediaStart_B + 0/fps). Sample-time invariant preserved.
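
The renumbering verified above, together with the loop-past-EOF clamp, can be illustrated with a pure planning function. `planSlice` and `frameName` are hypothetical names; the sketch only computes which superset file each member file would link to and performs no I/O.

```javascript
// Hypothetical planner: maps each member frame to the superset frame it links to.
// File names are 1-based (frame_00001.jpg); frame indices are 0-based.
const frameName = (index, ext = "jpg") => `frame_${String(index + 1).padStart(5, "0")}.${ext}`;

function planSlice(offsetFrames, memberFrames, supersetTotalFrames) {
  // Clamp so a member that runs past the end of the superset never links a missing frame.
  const count = Math.max(0, Math.min(memberFrames, supersetTotalFrames - offsetFrames));
  const links = [];
  for (let k = 0; k < count; k++) {
    links.push({ from: frameName(offsetFrames + k), to: frameName(k) });
  }
  return links;
}

// A member asking for 3 frames at offset 60 of a 62-frame superset gets 2 links:
// frame_00061.jpg -> frame_00001.jpg and frame_00062.jpg -> frame_00002.jpg.
const links = planSlice(60, 3, 62);
console.log(links);
```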

Cross-PR interop (#1900 ↔ this ↔ #1901)

  • supersetGroupingKey = (videoPath, fps, format, sdrToHdrTransfer) is a strict subset of #1900's dedupeKey = (videoPath, mediaStart, videoDuration, fps, format, sdrToHdrTransfer). Composition is: dedup first (collapse identical clips), then superset over the deduped uniques (collapse overlapping trims of the same source). Order verified at the call site (lines 1043-1075).
  • Each superset member individually feeds into #1901's disk cache via publishCacheEntry(cacheTarget.entry, partialDir) where partialDir is the sliced-and-hardlinked per-member dir. A second render that only wants trim-B's window hits its own cache entry directly — no need to re-plan the superset. Cache-key composition per member = the same key #1900 computes, so #1901's lookup is oblivious to whether the entry was published by direct or superset extraction. Good: the two optimizations are opaque to the persistent-cache layer.
  • One follow-up thought (not blocking, not this PR's scope): #1901 does not currently know how to serve a later "trim of a superset" request from a cached superset — each member gets its own cache entry, so if a later render asks for a narrower window of the same source that happens to be contained in a cached member, we'd cache-miss and redo the extraction. That's a separate optimization the current design doesn't preclude.

Review by Via (runtime-interop lens)

james-russo-rames-d-jusso left a comment

Reviewed at 9ae2be48aeb5ae6969cddba3707c97b02e29ff2c.
Peer scan: no prior reviews or in-line comments; only Miguel's own A/B benchmark posts + the Graphite mergeability warning.
Stack context: Stack 6/6 (tip) against 07-02-perf_engine_one_pass_sdr_hdr_cache_key_transform (#1902). Composes on top of #1900's dedupe + #1901's cache-on-by-default + #1902's HDR cache-key transform.

Summary — Two orthogonal changes bundled: (a) superset extraction for overlapping trims of one source. Cache-missing trims sharing (videoPath, fps, format, sdrToHdrTransfer) are checked for frame-alignment (isIntegralFrameOffset, tol=1e-4 frames) and overlap-or-touch (unionDuration <= sumDuration + 1e-9); qualifying groups decode the union window once and materialize member frames via linkSync (falling back to copyFileSync on EXDEV). Any union failure falls back to per-member direct extraction. (b) GC skip on all-hit renders: gcExtractionCache now only runs when breakdown.cacheMisses > 0. Superset mechanics are careful (partial-dir publish per member, temp dir cleanup in finally, hardlink for zero storage cost); primary concerns are around the all-or-nothing group heuristic and one race the cache-on-by-default rollout amplifies.

Concerns

🟠 One disjoint outlier collapses the whole superset group to direct extraction. planSupersetGroups buckets by supersetGroupingKey (videoPath, fps, format, sdrToHdrTransfer), then windowsOverlapOrTouch demands unionDuration <= sumDuration across ALL members of the bucket. A common shape — three overlapping trims [0..5], [3..8], [6..11] PLUS one disjoint clip [100..105] of the same source — sees union=105s, sum=20s, whole bucket rejected → all four fall to direct extraction. The three overlapping trims lose the optimization because of one unrelated peer. Consider partitioning each bucket into connected-overlap subgroups (interval-graph connected components) before the union check, or greedy-cluster trims by proximity. The tests only exercise 2-member buckets (either all-overlap or all-disjoint), so this failure mode is invisible to the current suite. Worth either fixing or explicitly documenting as a known heuristic limit + adding a 3-of-4-overlap test to pin the behavior.
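
One way to do the connected-component partition suggested above is a single sweep over the windows sorted by start time. The sketch below is an illustration under assumptions: `overlapComponents` is a hypothetical name, and each window is taken to be a plain `{ id, start, end }` object in seconds.

```javascript
// Sketch: split one source bucket into overlap-connected groups.
// Each window is { id, start, end } in seconds; names are illustrative.
function overlapComponents(windows) {
  const sorted = [...windows].sort((a, b) => a.start - b.start);
  const groups = [];
  let current = [];
  let runningEnd = -Infinity;
  for (const w of sorted) {
    // Cut when the next window starts past everything seen so far.
    if (current.length > 0 && w.start > runningEnd) {
      groups.push(current);
      current = [];
    }
    current.push(w);
    runningEnd = Math.max(runningEnd, w.end);
  }
  if (current.length > 0) groups.push(current);
  return groups;
}

const bucket = [
  { id: "a", start: 0, end: 5 },
  { id: "b", start: 3, end: 8 },
  { id: "c", start: 6, end: 11 },
  { id: "d", start: 100, end: 105 },
];
const groups = overlapComponents(bucket).map((g) => g.map((w) => w.id));
console.log(groups); // a, b and c share one group; d stands alone
```

With this split, the union check would run per group, so the three overlapping trims keep their shared decode and only the outlier goes direct.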

🟠 Concurrent-render race on the same partial cache dir, amplified by cache-on-by-default. sliceSupersetMember does rmSync(outputDir, {recursive:true, force:true}) before writing hardlinks (line ~427). For the cache-target path, outputDir = partialCacheEntryDir(cacheTarget.entry) — deterministic per cache entry. Two concurrent renders that both cache-miss the same (path, ms, dur, fps, format, transform) tuple and both go into a superset group will both rmSync + mkdirSync + write into the shared partial dir. First publishCacheEntry wins the atomic rename; the loser's rmSync mid-write on the winner's already-published (or actively-writing) partial dir either no-ops (if the winner already renamed away) or blows away in-flight files. With #1901 flipping caching to on-by-default and the HYPERFRAMES_EXTRACT_CACHE_DIR env fallback pointing all renderers at the same shared dir, the probability of this collision rises materially in production. Pre-PR, tryCachedExtract's direct path had the same race but the window was one ffmpeg call; superset widens the window because the whole group must extract before members publish. Preempt by writing to a per-render tempdir first, then renameSync into the partial dir, and let publishCacheEntry's atomicity do the rest — or, more conservatively, hold a per-cache-entry lockfile during the write. #1900 landed the intra-render dedupe that closed the same-render half of this race; the inter-render half remains open and superset widens it.

🟠 EXDEV copy fallback doubles disk footprint per member, silently. linkOrCopyFrame is try { linkSync } catch { copyFileSync }. Common EXDEV trigger: outputDir and the superset tempDir sit on different filesystems (e.g. /tmp tmpfs vs a data volume, or the cache dir on a different mount). The superset stays in join(options.outputDir, group.groupId), so member-output writes stay same-fs, which is fine for the non-cache path. But cache-target writes go to partialCacheEntryDir(cacheTarget.entry), which is under HYPERFRAMES_EXTRACT_CACHE_DIR, commonly a different mount than options.outputDir. So the cache-published path likely hits EXDEV by default in production, and a 4-member superset silently costs 5x storage (superset temp + 4 copies) instead of ~1x (superset + 4 hardlink inodes). The PR body claim "hardlinks keep inodes alive, so later cache eviction cannot tear a member's frames" holds only when linkSync succeeds. On EXDEV, eviction of a superset entry would leave the copies orphaned but intact; that cannot happen today because there is no superset cache entry, but the guarantee is fragile. Not a correctness bug, but the disk-cost story is not what the PR body suggests for the cache-published path. At minimum: log EXDEV fallback frequency to a counter, or short-circuit the superset path when the cache-target FS is known to differ from the extraction workdir.
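
Counting the fallback, as suggested, could look like the sketch below. It is an assumption-laden illustration: the I/O object is injected (in real use it would be Node's `fs` module, whose `linkSync` and `copyFileSync` have these shapes), `stats` is a hypothetical counter object, and unlike the bare catch quoted above this version rethrows anything that is not EXDEV.

```javascript
// Sketch with injected I/O: pass Node's `fs` module as `io` in real use.
// `stats` is a hypothetical counter object, not an engine field.
function linkOrCopyFrame(io, src, dest, stats) {
  try {
    io.linkSync(src, dest);
    stats.linked++;
  } catch (err) {
    // Only a cross-device link is expected here; anything else is a real failure.
    if (err?.code !== "EXDEV") throw err;
    io.copyFileSync(src, dest);
    stats.copied++;
  }
}
```

After a render, a nonzero `stats.copied` would indicate that the superset temp dir and the member target sit on different filesystems and that each copied frame is paying full storage cost.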

Nits

🟡 windowsOverlapOrTouch with <= accepts exactly-touching trims ([0..5] + [5..10], union=10=sum). These share zero frames, so supersetting them is neutral, not a win: one 10s decode instead of two 5s decodes for the same total work. Harmless but adds cognitive load; consider < (strict overlap) or a comment explaining the touching case is deliberate for keyframe-seek amortization on sparse-keyframe sources.

🟡 isIntegralFrameOffset tolerance = 1e-4 frames. At 30 fps that's ~3.3 μs of source time: far tighter than one frame, yet far looser than the rounding error of timeline math (ms→s conversions in IEEE 754 doubles are accurate to roughly 1e-16 relative). No action, but the tolerance rationale isn't documented; a comment // tolerance << 1 frame; ~3μs at 30fps would age well.

🟡 __superset-<N> groupId numbering is per-planSupersetGroups call, so temp dirs collide across concurrent supersets under the SAME options.outputDir (rare, since there is one extractAllVideoFrames call per render, but not impossible with future refactors). Namespace with a random suffix or PID for defense-in-depth.

🟡 The GC-skip-on-all-hit change bundled here (part 2 of the PR body) is a separate concern from superset extraction and would have been cleaner as its own PR in the stack. Not a blocker; noting for stack-hygiene next time.

Cross-stack interactions

  • #1900 (dedupe): composition is clean. #1900's uniqueWorks collapses identical tuples first; #1885 then supersets among the survivors. If two elements dedupe (same tuple) and a third overlaps (different mediaStart but same source), the survivor + third become a 2-member superset group. Verified in diff. The dedupeKey from #1900/#1902 flows through as PreparedExtraction.dedupeKey and is used only for the uniqueOutcomes map; superset uses its own supersetGroupingKey (subset of fields — no mediaStart, no videoDuration — so distinct trims of one source cluster together). Coherent.
  • #1901 (cache-on-by-default): this is where the risk sits. #1901 makes extractCacheDir default to a resolved path, so every render now goes through the cache-target code path. The superset-partial-dir race (see 2nd 🟠 above) becomes production-relevant only under #1901; without #1901, dev renders without an env var don't hit the race. And the "GC skip on all-hit renders" here is a companion optimization to the LRU-gc #1901 introduces — the PR notes "warm renders no longer pay the full cache size scan introduced in #1901". Sanity-checked: gcExtractionCache guard is cacheMisses > 0 which correctly means "we wrote something this render, sweep afterwards." Edge: a warm-workload cluster of renders never runs GC → cache can grow beyond maxBytes between cold renders. Recovery on first miss. Bounded-in-expectation but unbounded-in-worst-case (100% warm workload = no GC forever). Suggest a per-render probability sweep (e.g. 1/100 warm renders sweep anyway) or an out-of-band GC.
  • #1902 (SDR→HDR cache-key transform): supersetGroupingKey correctly includes sdrToHdrTransfer, so SDR clips and HDR-tagged clips of the same source never combine into one superset (they can't share bytes — the ffmpeg filter chain differs). And #1902's cache-key transform field flows to lookupCacheEntry before the group-planning step, so cacheMisses handed to planSupersetGroups are already segregated by transform. Verified in lookupCacheFor at line ~496. Coherent — but the coherence is enforced by two independent inclusions of sdrToHdrTransfer, not one canonical key. Consider factoring both from a single shared helper to prevent the two lists from drifting.
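
The shared-helper suggestion in the last bullet could be sketched as follows. The field names are the ones listed in the reviews; `keyOf`, the JSON encoding, and the `"pq"` transfer value are assumptions made for illustration and say nothing about how the engine actually encodes its keys.

```javascript
// Hypothetical helper: both keys are derived from one field list so they cannot drift.
const GROUPING_FIELDS = ["videoPath", "fps", "format", "sdrToHdrTransfer"];
const DEDUPE_FIELDS = [...GROUPING_FIELDS, "mediaStart", "videoDuration"];

const keyOf = (fields) => (work) => JSON.stringify(fields.map((f) => work[f] ?? null));

const supersetGroupingKey = keyOf(GROUPING_FIELDS);
const dedupeKey = keyOf(DEDUPE_FIELDS);

const trimA = { videoPath: "clip.mp4", fps: 30, format: "jpg", mediaStart: 0, videoDuration: 5 };
const trimB = { ...trimA, mediaStart: 2 };
const trimHdr = { ...trimA, sdrToHdrTransfer: "pq" }; // "pq" is an illustrative value

console.log(supersetGroupingKey(trimA) === supersetGroupingKey(trimB)); // true: same source, may share a superset
console.log(dedupeKey(trimA) === dedupeKey(trimB)); // false: distinct trims
console.log(supersetGroupingKey(trimA) === supersetGroupingKey(trimHdr)); // false: different transform
```

Because the dedupe list is built by extending the grouping list, adding a discriminator to the grouping fields would automatically reach both keys.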

Questions

  • What's the memory bound on a pathological superset? Union window is unbounded by construction — three trims of one source at [0..5], [100..105], [200..205] are disjoint and skip supersetting (good). But three trims at [0..5], [3..8], [6..11] union=11s, sum=15s → passes, produces 11s×fps frames on disk. Realistic bound is source duration × fps × frame_size, capped by source length. Any config knob for a max-superset-duration guard, or is "if we could extract-per-trim we can extract-union" the correct implicit bound?
  • Cross-render observability: any phaseBreakdown field for "superset hits" (how many trims used shared extraction) and "superset waste" (extracted-but-unmaterialized frames when a member's window doesn't fully cover the union)? For a perf PR with disk-footprint implications, absence of this signal means production can't tell you when the optimization is helping vs. burning FS budget.
  • VFR nuance from the PR body ("a VFR source whose trims superset can pick a one-frame-adjacent held frame at irregular-timestamp tie points"): this is documented behavior, but where is the tie-breaker made deterministic? If two renders of the same VFR source in different sub-groupings produce a 1-frame-off member for the same videoId, is that observable to downstream frame consumers (e.g. as a hash mismatch on golden-frame checks)? The PR body says "sample times and frame counts are unchanged"; trusting that, but worth confirming there's a regression test that would catch a VFR held-frame drift.

What I didn't verify

  • Did not run the A/B benchmark; trusted the 3353ms → 2670ms wall + byte-identical hash number from Miguel's benchmark comment.
  • Did not exercise the cross-fs EXDEV fallback locally; concern is inferred from the code structure and typical prod-cache-dir mount topology.
  • Did not audit for other callers of partialCacheEntryDir that might race with the superset-published partial dir; assumed the primitive is documented as single-writer.
  • Did not verify the "3-overlap-plus-1-disjoint" pathology with a red test — reasoned from the windowsOverlapOrTouch shape. A quick unit test on planSupersetGroups in isolation would confirm.

— Rames D Jusso

@miguel-heygen miguel-heygen force-pushed the 07-02-perf_engine_one_pass_sdr_hdr_cache_key_transform branch from c56e2d2 to d54a048 Compare July 3, 2026 20:01
@miguel-heygen miguel-heygen force-pushed the worktree-renderer-frame-extraction-perf branch from 9ae2be4 to 34dc308 Compare July 3, 2026 20:01
@miguel-heygen

Collaborator Author

Review feedback addressed in the updated branch (commit: 'superset review hardening'):

  • Fixed — disjoint-outlier collapse: planSupersetGroups now partitions each source bucket into overlap-connected components (sort by start, cut where the next window starts past the running end) before the union check. Three overlapping trims plus one disjoint outlier keep their shared superset; pinned by a 3-of-4-overlap test asserting cross-trim inode equality plus direct extraction of the outlier.
  • Fixed — abort handling: when the union extraction fails because the render was aborted, the fallback no longer re-runs every member through direct extraction (N doomed ffmpeg spawns); the cancellation surfaces per member.
  • Fixed — EXDEV disk blow-up: the superset temp dir now lives on the cache filesystem when the cache is active, so member hardlinks into partial dirs stay same-fs and the copyFileSync fallback (which silently multiplies disk usage per member) stops being the common case. Its .partial- name also puts crashed leftovers under the GC's aged-partial sweep.
  • Fixed — warm-workload GC starvation: a .hf-last-gc marker is stamped per sweep; all-hit renders sweep anyway once it is older than 24h, so a 100%-warm cluster still reclaims space. Pinned by a stale-marker test.
  • Invalid: concurrent-render race on 'the same partial dir'. partialCacheEntryDir is ${entry.dir}.partial-${pid}-${randomUUID().slice(0,8)}: unique per writer, not deterministic per entry. Two renders missing the same key write to two different partial dirs; the only shared step is publishCacheEntry's atomic rename, which adopts the winner on collision. The rmSync in sliceSupersetMember operates on the caller's own fresh partial dir.

@vanceingalls vanceingalls left a comment

Collaborator

🟢 R2 all four findings addressed — three ✅, one ⚠️ mitigated-accepting

Reviewed R1 9ae2be4 → R2 34dc308c via gh api .../contents?ref=<sha> per-file (not gh api compare, to strip base-advance noise).
Freshness: no new reviewer state since R1 (only me + Rames at R1 head); CI regression-shards still in progress at post time.

Miguel's follow-up commit 34dc308c is the classic surgical R2 shape — one commit, four narrow patches, each pinned by a test. The disjoint-outlier partitioning has the strongest test coverage (3-of-4 overlap, verifies inode-sharing on the cluster AND direct-extract on the outlier); the GC staleness-fallback picks up a 100%-warm workload with a 48h-aged marker; the abort branch and the TOCTOU adopt-winner helper are code-observable but not directly exercised in the test file. Non-abort superset failures still fall back to direct extraction silently on the non-cache branch, but the follower-annotation change ([shared extraction, leader <id>] prefix) is the primary diagnostic surface for the shared-failure fan-out that Via R1 called out. Accepting.

Finding-by-finding disposition

1. Via R1 🟠 — bare catch {} on superset fallback discards error + N-fans cancellations — ✅

File: packages/engine/src/services/videoFrameExtractor.ts:1072-1094
Now catch (err) with an explicit if (signal?.aborted) branch: aborted extractions return extractionError(...) per member instead of re-running every member through executeDirectMiss with the same tripped signal. The N-doomed-ffmpeg amplification is gone.
Diagnostic surface for the non-abort failure fan-out is addressed one layer up at line 1172-1181: followers on a shared (deduped/superset) failure are now tagged [shared extraction, leader <videoId>] <err> so N copies of one root failure are traceable to a single upstream extraction. That's the visibility hook I asked for — it lands in the returned errors[] and any trace consumer picks it up without a separate counter.

I would still prefer an explicit breakdown.supersetFallbacks counter or a console.warn on the non-abort branch so operators can dashboard "how often is the superset optimization actually failing", but that's a follow-up polish, not a blocker. The correctness-and-visibility bar is met.
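
The abort branch and the suggested warning can be sketched together as below. This is a minimal illustration, not the engine's code: extractGroup, MemberResult, and the callback parameters are hypothetical names, the control flow is shown synchronously for brevity, and the console.warn line is the follow-up proposed in this review rather than something the PR ships.

```typescript
// Hypothetical result shape; the engine's real error type is not shown in this thread.
interface MemberResult {
  videoId: string;
  error?: string;
}

function extractGroup(
  members: string[],
  extractSuperset: () => void,
  extractDirect: (videoId: string) => MemberResult,
  signal?: AbortSignal,
): MemberResult[] {
  try {
    extractSuperset();
    return members.map((videoId) => ({ videoId }));
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    if (signal?.aborted) {
      // Aborted render: report the cancellation per member, spawn nothing new.
      return members.map((videoId) => ({ videoId, error: `aborted: ${reason}` }));
    }
    // Suggested follow-up (not in the PR): make the fallback visible in logs.
    console.warn(`superset extraction failed, falling back to per-trim: ${reason}`);
    return members.map((videoId) => extractDirect(videoId));
  }
}

// Demo: the same failing union extraction, once aborted and once not.
let directCalls = 0;
const direct = (videoId: string): MemberResult => {
  directCalls++;
  return { videoId };
};
const failingSuperset = (): void => {
  throw new Error("ffmpeg exited");
};

const controller = new AbortController();
controller.abort();
const aborted = extractGroup(["a", "b", "c"], failingSuperset, direct, controller.signal);
const callsAfterAbort = directCalls;
const recovered = extractGroup(["a", "b", "c"], failingSuperset, direct);
console.log({ callsAfterAbort, directCalls, aborted, recovered });
```

In the aborted case no direct extraction runs; in the non-aborted case each member is retried once, and the warning carries the root cause that the per-member results would otherwise drop.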

2. Rames R1 🟠 — one disjoint outlier collapses the whole bucket — ✅

File: packages/engine/src/services/videoFrameExtractor.ts:562-583 (new overlapClusters), packages/engine/src/services/videoFrameExtractor.ts:595-606 (wired into planSupersetGroups)
Sort by mediaStart, cut whenever the next window opens past the running end (start > currentEnd + 1e-9). Test at videoFrameExtractor.test.ts:1155-1188 pins the 3-of-4 case: three overlapping trims [0..4], [2..6], [4..8] still share a superset (inode-equal at the overlap frames), the disjoint [10..12] extracts directly (60 frames, no shared inode). Heuristic-limit gap closed with a behavior test. Clean.
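
The partition described here is a one-pass interval sweep. A minimal sketch, assuming a hypothetical TrimWindow shape (the real member type in videoFrameExtractor.ts is not shown in this thread) and using the tolerance quoted above:

```typescript
// Hypothetical window shape, for illustration only.
interface TrimWindow {
  videoId: string;
  mediaStart: number; // seconds
  mediaEnd: number; // seconds
}

const EPSILON = 1e-9; // tolerance quoted in the review

// Split windows into overlap-connected components: sort by start, then cut
// whenever the next window opens past the running end of the current cluster.
function overlapClusters(windows: TrimWindow[]): TrimWindow[][] {
  const sorted = [...windows].sort((a, b) => a.mediaStart - b.mediaStart);
  const clusters: TrimWindow[][] = [];
  let current: TrimWindow[] = [];
  let currentEnd = -Infinity;
  for (const w of sorted) {
    if (current.length > 0 && w.mediaStart > currentEnd + EPSILON) {
      clusters.push(current);
      current = [];
      currentEnd = -Infinity;
    }
    current.push(w);
    currentEnd = Math.max(currentEnd, w.mediaEnd);
  }
  if (current.length > 0) clusters.push(current);
  return clusters;
}

// The 3-of-4 case from the pinning test: three overlapping trims plus one outlier.
const clusterIds = overlapClusters([
  { videoId: "d", mediaStart: 10, mediaEnd: 12 },
  { videoId: "a", mediaStart: 0, mediaEnd: 4 },
  { videoId: "b", mediaStart: 2, mediaEnd: 6 },
  { videoId: "c", mediaStart: 4, mediaEnd: 8 },
]).map((cluster) => cluster.map((w) => w.videoId).join(","));
console.log(clusterIds); // [ 'a,b,c', 'd' ]
```

Tracking the running maximum end, rather than the previous window's end, is what keeps a short window nested inside a long one from cutting the cluster early.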

3. Rames R1 🟠 — concurrent-render race on partialCacheEntryDir — ✅

File: packages/engine/src/services/extractionCache.ts:203-249 (new adoptPublishedWinner helper + re-check after retry rename)
Two changes tighten the TOCTOU window:

  • The pre-existing "winner already published" check is extracted into adoptPublishedWinner(entry, partialDir) — no behavior change on the first call.
  • The retry rename's catch now falls through to adoptPublishedWinner(entry, partialDir) ?? { published: false }. If a concurrent writer publishes between the initial winner check, the rm entry.dir, and the retry rename, the second check catches it and serves the winner instead of reporting a spurious published: false (which would have triggered the render to re-extract on top of a live cache entry).

No unit test pins the race directly (hard to synthesize deterministically), but the code is straight-line and the invariant is clear: any publish attempt either lands atomically, adopts a live winner, or reports failure — no partial cache-poisoning path remains.
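
The "land atomically, adopt a live winner, or report failure" invariant can be illustrated with two writers racing for one entry. This is a simplified sketch under stated assumptions: partialDirFor and publish are hypothetical stand-ins for partialCacheEntryDir and publishCacheEntry, the result shape is invented, and the real rm-and-retry step is omitted. It relies on rename failing when the target is a non-empty directory, which holds on POSIX filesystems.

```typescript
import { existsSync, mkdirSync, mkdtempSync, readFileSync, renameSync, rmSync, writeFileSync } from "node:fs";
import { randomUUID } from "node:crypto";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Unique per writer, following the naming quoted in this thread.
function partialDirFor(entryDir: string): string {
  return `${entryDir}.partial-${process.pid}-${randomUUID().slice(0, 8)}`;
}

// Hypothetical result shape, for illustration only.
interface PublishResult {
  published: boolean;
  adopted: boolean;
}

function publish(entryDir: string, partialDir: string): PublishResult {
  try {
    renameSync(partialDir, entryDir); // atomic on one filesystem
    return { published: true, adopted: false };
  } catch {
    if (existsSync(entryDir)) {
      // A concurrent writer won: discard our copy and serve theirs.
      rmSync(partialDir, { recursive: true, force: true });
      return { published: false, adopted: true };
    }
    return { published: false, adopted: false };
  }
}

// Two writers miss the same key and fill separate partial dirs.
const cacheRoot = mkdtempSync(join(tmpdir(), "cache-demo-"));
const entryDir = join(cacheRoot, "entry");
const partialA = partialDirFor(entryDir);
const partialB = partialDirFor(entryDir);
mkdirSync(partialA);
writeFileSync(join(partialA, "frame_00001.jpg"), "A");
mkdirSync(partialB);
writeFileSync(join(partialB, "frame_00001.jpg"), "B");

const first = publish(entryDir, partialA);
const second = publish(entryDir, partialB);
const served = readFileSync(join(entryDir, "frame_00001.jpg"), "utf8");
console.log({ first, second, served });
```

Neither writer ever touches the other's partial dir, so the only contended operation is the rename itself.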

4. Rames R1 🟠 — EXDEV silent 2x+ disk cost on cross-mount cache — ⚠️ mitigated-accepting

File: packages/engine/src/services/videoFrameExtractor.ts:1041-1050
The superset tempDir moves under cacheRootDir when the cache is active: join(cacheRootDir, ${group.groupId}.partial-${process.pid}). That keeps linkSync on one filesystem for the common cache-bound path (frames land in .partial-* dirs under the same cache mount), and the .partial- naming brings crashed leftovers under the aged-partial GC sweep — nice packaging.

Mitigation, not full resolution:

  • Non-cache branch (cacheRootDir === undefined, so tempDir falls back to outputDir) still relies on the copyFileSync EXDEV fallback in linkOrCopy. Local dev and no-cache prod configs still pay the 2x disk if outputDir crosses a mount from wherever the members land. In practice non-cache renders don't have members landing in a separate cache mount, so this is fine today — but if a caller ever wires up a non-cache multi-mount config, the silent-copy pathology returns.
  • No log/counter when the EXDEV fallback fires in linkOrCopy. Same observability gap as the bare-catch — dashboards can't tell whether the fix landed effectively in prod. Would take a console.warn on the first EXDEV per render + a breakdown.exdevCopies counter.

The correctness case (cache config, which is the dominant prod path) is fixed. Accepting the outputDir-only branch as an out-of-scope tail.
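
For readers following the EXDEV discussion, the link-with-copy-fallback shape and the counter suggested above might look like the sketch below. The frame naming, the sliceMember helper, its 1-based indexing, and the returned copy count are all assumptions for illustration; only the linkSync / copyFileSync / EXDEV mechanics come from this thread.

```typescript
import { copyFileSync, linkSync, mkdirSync, mkdtempSync, readFileSync, statSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical frame naming; the engine's real pattern is not shown here.
const frameName = (n: number): string => `frame_${String(n).padStart(5, "0")}.jpg`;

// Hardlink when possible; copy only when the link would cross filesystems.
function linkOrCopy(src: string, dest: string): "linked" | "copied" {
  try {
    linkSync(src, dest);
    return "linked";
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code !== "EXDEV") throw err;
    copyFileSync(src, dest); // this frame now occupies disk twice
    return "copied";
  }
}

// Member frame k (1-based here) is superset frame offset + k, renumbered from 1.
// Returns how many frames had to be copied, i.e. an exdevCopies-style counter.
function sliceMember(supersetDir: string, memberDir: string, offset: number, count: number): number {
  mkdirSync(memberDir, { recursive: true });
  let copies = 0;
  for (let k = 1; k <= count; k++) {
    const mode = linkOrCopy(join(supersetDir, frameName(offset + k)), join(memberDir, frameName(k)));
    if (mode === "copied") copies++;
  }
  return copies;
}

// Demo: a 6-frame superset and a 3-frame member starting 2 frames in.
const demoRoot = mkdtempSync(join(tmpdir(), "superset-demo-"));
const supersetDir = join(demoRoot, "superset.partial-demo");
mkdirSync(supersetDir);
for (let n = 1; n <= 6; n++) writeFileSync(join(supersetDir, frameName(n)), `frame ${n}`);

const memberDir = join(demoRoot, "member");
const copies = sliceMember(supersetDir, memberDir, 2, 3);
const sameInode =
  statSync(join(supersetDir, frameName(3))).ino === statSync(join(memberDir, frameName(1))).ino;
console.log({ copies, sameInode });
```

Returning the copy count from the slicing step is one cheap way to feed a breakdown counter without adding logging inside the per-frame loop.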

Bonus observability wins (not in R1 but shipped in R2)

  • cachePublishFailures counter now bumps on both finalize sites (extractionCache line 993 + 1026-1029) — surfaces the "warm renders going silently cold" signal.
  • cacheGcEvictions / cacheGcBytesFreed / cacheAgedPartialsCleared in the phase breakdown — dashboards can now see eviction pressure.
  • gcSweepDue() + GC_MARKER staleness fallback (24h) so 100%-warm workloads still reclaim disk — pinned by the test at 1190-1216.
  • SDR→HDR error attribution at videoFrameExtractor.ts:349-358 — filter-chain failures no longer surface as generic extract errors.

CI status at R2

Regression-shards (shard-1 through shard-8) still IN_PROGRESS, Graphite mergeability_check IN_PROGRESS. player-perf, preview-regression, preflight, WIP all green on the R2 SHA. No red signal; watch the shards land.


R2 by Via

@james-russo-rames-d-jusso james-russo-rames-d-jusso left a comment


R2 verification — reviewed at 34dc308c3b74f5c05e1d58c829ce90c7824db354 (R1 at 9ae2be48).
R1 findings verified: 2 resolved ✅, 1 withdrawn (R1 misread) ⚠️, 3 nits resolved ✅, 1 nit standing 🟡.
Peer scan: Via (Vance's runtime-interop lens) posted an overlapping review at the R1 SHA — his F1 (bare-catch swallows abort-signal) is now resolved by the same commit that addresses my F2/F3. Layering below.

Summary — Miguel's R2 commit fix(engine): superset review hardening - clustering, abort, cache-fs temp, gc staleness is one of the cleanest omnibus response commits I've seen this batch: it introduces overlapClusters (partitions each source's misses into overlap-connected components before the union check), moves the superset tempDir onto cacheRootDir when the cache is active (so hardlinks stay same-fs and dodge the EXDEV copy), adds an abort-signal short-circuit inside the superset catch, and closes the warm-workload GC-skip gap with a .hf-last-gc marker + 24h staleness fallback. Each change is pinned by a new test. Two of my three 🟠s land clean, one I need to withdraw on evidence review, and the GC-skip nit gets a full fix.

R1 resolutions

  • 🟠 F1 — One disjoint outlier collapses the whole bucket — ✅ resolved. New overlapClusters at videoFrameExtractor.ts:562-585 sorts by mediaStart and cuts on start > currentEnd + 1e-9 — canonical connected-component partition, correct by construction. Wired into planSupersetGroups at :596-605 (iterate clusters, build a superset per cluster). New pinning test at videoFrameExtractor.test.ts:1155-1187 — three overlapping trims [0..4]/[2..6]/[4..8] + one disjoint outlier [10..12]: asserts inode equality across the cluster (superset shared) AND asserts the outlier's frames share no inode (direct extraction). Exactly the 3-of-4-overlap red test my R1 suggested. Zero nit standing.

  • 🟠 F2 — Concurrent-render race on partialCacheEntryDir: ⚠️ withdrawn (R1 misread). Independent re-verify at extractionCache.ts:189-190: partialCacheEntryDir(entry) returns ${entry.dir}.partial-${process.pid}-${randomUUID().slice(0, 8)}. Each render gets a unique partial dir per cache entry via UUID suffix; two concurrent renders that both cache-miss the same tuple write to different .partial-<pid>-<uuid>/ dirs, then publishCacheEntry's atomic rename + adoptPublishedWinner fallback handles the winner/loser handoff at extractionCache.ts:212-249. The R1 concern ("both rmSync + write into a shared partial dir") was wrong — I read the primitive as deterministic-per-entry, but it's per-render-per-entry. Withdrawing cleanly. The race I flagged doesn't exist and the cache primitive was already correct here.

  • 🟠 F3 — EXDEV silent 2×+ disk cost on cross-mount cache — ✅ resolved. videoFrameExtractor.ts:1041-1050: const tempDir = cacheRootDir ? join(cacheRootDir, ${group.groupId}.partial-${process.pid}) : join(options.outputDir, group.groupId);. Extracts the superset next to the cache when the cache is active, so linkSync from tempDir → partialCacheEntryDir(cacheTarget.entry) (also under cacheRootDir) stays same-fs — no EXDEV, no copyFileSync fallback, no silent disk multiplication. The .partial-<pid> name also puts crashed leftovers under the aged-partial GC sweep as a bonus. Docstring at :1041-1047 explicitly names the linkSync + EXDEV rationale. Clean fix.

  • 🟡 nit — sdrToHdrTransfer drift risk across two independent inclusions — 🟡 standing. No refactor into a shared helper in this R2. Still cosmetic; not standing in the way. Would be a natural cleanup for a later PR.

  • 🟡 nit — GC-skip on all-hit renders → unbounded growth on warm workloads — ✅ resolved. New gcSweepDue(rootDir, maxAgeMs) in extractionCache.ts:380-395 + GC_STALENESS_MS = 24h at videoFrameExtractor.ts:89. gcExtractionCache now stamps .hf-last-gc at every sweep start (extractionCache.ts:407-411), and the guard at videoFrameExtractor.ts:1199-1203 becomes cacheMisses > 0 || (cacheRootDir && gcSweepDue(cacheRootDir, 24h)). So a 100%-warm workload still reclaims space after 24h of no misses. Pinning test at videoFrameExtractor.test.ts:1189-1216 ages the marker past 24h and asserts an aged partial gets cleared on the next all-hit render. Exactly the staleness fallback my R1 suggested (I proposed a per-render probability sweep; the deterministic-by-marker shape here is strictly better — more predictable, testable). Great close.

  • 🟡 nits — supersetGroupingKey composition + sliceSupersetMember output cleanup — ✅ closed as no-action. Composition nit is subsumed by the (still-standing) drift nit above. Output cleanup nit was already correct at R1 — verified again.
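
The marker-based staleness guard from the GC-skip item above can be sketched as follows. The marker name and the 24h threshold come from this thread; stampGcMarker, shouldSweep, and the choice to treat a missing marker as "due" are assumptions for illustration, not the engine's confirmed behavior.

```typescript
import { mkdtempSync, statSync, utimesSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

const GC_MARKER = ".hf-last-gc";
const GC_STALENESS_MS = 24 * 60 * 60 * 1000;

// Stamp the marker at sweep start (hypothetical helper name).
function stampGcMarker(rootDir: string): void {
  const marker = join(rootDir, GC_MARKER);
  const now = new Date();
  writeFileSync(marker, "");
  utimesSync(marker, now, now); // set mtime explicitly rather than rely on the write
}

// True once the last sweep is older than maxAgeMs.
function gcSweepDue(rootDir: string, maxAgeMs: number, now: number = Date.now()): boolean {
  try {
    return now - statSync(join(rootDir, GC_MARKER)).mtimeMs > maxAgeMs;
  } catch {
    return true; // assumption: no marker yet means a sweep is due
  }
}

// The guard shape quoted in the review: sweep on any miss, or when stale.
function shouldSweep(cacheMisses: number, cacheRootDir: string | undefined): boolean {
  return cacheMisses > 0 || (cacheRootDir !== undefined && gcSweepDue(cacheRootDir, GC_STALENESS_MS));
}

// Demo: an all-hit render right after a sweep, then again with a 48h-old marker.
const gcRoot = mkdtempSync(join(tmpdir(), "gc-demo-"));
stampGcMarker(gcRoot);
const freshDue = shouldSweep(0, gcRoot);

const twoDaysAgo = new Date(Date.now() - 48 * 60 * 60 * 1000);
utimesSync(join(gcRoot, GC_MARKER), twoDaysAgo, twoDaysAgo);
const staleDue = shouldSweep(0, gcRoot);

const missDue = shouldSweep(1, undefined);
console.log({ freshDue, staleDue, missDue });
```

Because the decision depends only on the marker's mtime, a test can force either branch by aging the file, which is the property that makes this shape easier to pin than a probabilistic sweep.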

Layering with Via (Vance's runtime-interop review)

Via posted an overlapping runtime-interop pass at the R1 SHA. Both of his 🟠 land in this R2:

  • Via F1 — Bare catch { ... } on superset extraction swallows abort-signal errors — ✅ resolved by the same commit. New branch at videoFrameExtractor.ts:1075-1082: catch (err) { if (signal?.aborted) { return group.members.map(...) } ... }, where the aborted branch returns an extractionError per member and the other branch falls back to direct extraction. Abort no longer fans out to N direct-extraction retries with the same aborted signal — it surfaces the cancellation per member instead. Docstring at :1076-1080 names the "N doomed ffmpeg processes" hazard Via flagged. Note the error is still not logged: Via's second half ("root-cause observability lost, single console.warn on fallback would make it triage-able") isn't addressed. Not blocking; suggest a follow-up console.warn(...) on the non-abort branch so a superset-decode regression in prod is visible in extractor logs.

  • Via F2 — sliceSupersetMember rmSynces the target dir; same-video.id different mediaStart would race — 🟢 (verify) still standing. Via marked this as "verify — not currently reachable, but not asserted." R2 doesn't add an intake assertion that resolvedVideos have unique video.ids. Same posture as R1: no observed reachability, small ambient risk. Fine to defer as a follow-up.

Cross-stack recheck

Prereq status intact: #1900 closes the intra-render half of the identical-clip race (single writer per dedupeKey per render); #1885's F2 concern would have been the inter-render half, but per the withdrawal above, the inter-render race is already handled by the UUID-suffixed partialCacheEntryDir + atomic publish rename. So the "#1885's partial-dir race is amplified by #1901's on-by-default" framing from R1 was wrong on the mechanism, though the direction (dedupe below cache-on-by-default) is still the right stack ordering. #1901 does not, in fact, introduce a new race here — the cache primitive was designed for concurrent writers from the start (randomUUID suffix + atomic publish rename with adoptPublishedWinner). Correcting the record on the cross-stack relationship.

Composition with #1900's dedupe stays clean: dedupe first → cache lookup → superset planning over cache-missed uniques (now clustered) → materialize. supersetGroupingKey is a strict subset of dedupeKey; both key families extend with sdrToHdrTransfer (still two independent inclusions — see standing nit).

Residuals

  • 🟡 sdrToHdrTransfer drift risk across dedupeKey and supersetGroupingKey — not fixed. Cosmetic, cheap follow-up.
  • Via's non-abort-branch observability suggestion (console.warn with the primary error on superset fallback) — not addressed. Non-blocking; follow-up.
  • Via's F2 intake assertion for unique video.ids across resolvedVideos — not addressed. Non-blocking; follow-up.
  • No AI-trailer squash-strip needed (checked commit messages).

Batch state

  • CI: preflight pass, preview-regression pass, player-perf pass; regression-shards pending.
  • Author: miguel-heygen (confirmed via pulls/1885.author).
  • No stamp — COMMENT-only per protocol.

— Rames D Jusso

@miguel-heygen miguel-heygen force-pushed the worktree-renderer-frame-extraction-perf branch from 34dc308 to ab13332 Compare July 3, 2026 20:30
@miguel-heygen miguel-heygen force-pushed the 07-02-perf_engine_one_pass_sdr_hdr_cache_key_transform branch from d54a048 to 7478fec Compare July 3, 2026 20:30
@miguel-heygen miguel-heygen force-pushed the worktree-renderer-frame-extraction-perf branch from ab13332 to e6aa5a2 Compare July 3, 2026 20:41
@miguel-heygen miguel-heygen force-pushed the 07-02-perf_engine_one_pass_sdr_hdr_cache_key_transform branch from 7478fec to 69910ff Compare July 3, 2026 20:41
@miguel-heygen miguel-heygen changed the base branch from 07-02-perf_engine_one_pass_sdr_hdr_cache_key_transform to main July 3, 2026 20:41
@miguel-heygen miguel-heygen changed the base branch from main to graphite-base/1885 July 3, 2026 20:41
@miguel-heygen miguel-heygen force-pushed the worktree-renderer-frame-extraction-perf branch from e6aa5a2 to 9184aff Compare July 3, 2026 20:42
@miguel-heygen miguel-heygen changed the base branch from graphite-base/1885 to 07-02-perf_engine_one_pass_sdr_hdr_cache_key_transform July 3, 2026 20:42
Base automatically changed from 07-02-perf_engine_one_pass_sdr_hdr_cache_key_transform to main July 3, 2026 22:08
Cache-missing trims of the same source that are frame-aligned and
overlapping decode their union window in ONE ffmpeg pass; each trim's
frames are materialized by hardlinking the superset frames with
renumbered names (copy fallback on EXDEV). Byte-identical to per-trim
extraction on CFR sources (verified by content hash in the A/B run),
~2x less decode+encode work for typical overlapping trims, and
sparse-keyframe sources pay the keyframe seek once instead of once per
trim. Disjoint or misaligned trims keep the direct path; any union
failure falls back to per-trim extraction.

Also: warm renders (zero cache misses) skip the extraction-cache GC
sweep instead of paying a full cache size scan.
…temp, gc staleness

- Partition each source's trims into overlap-connected components before
  the union check, so one disjoint outlier no longer collapses the whole
  bucket to direct extraction (pinned by a 3-of-4-overlap test).
- On abort, the superset fallback no longer re-runs every member through
  direct extraction (N doomed ffmpeg spawns); the cancellation surfaces
  per member instead.
- The superset temp dir moves onto the cache filesystem when the cache
  is active so member hardlinks into partial dirs cannot EXDEV-copy and
  silently multiply disk usage; its .partial- name puts crashed
  leftovers under the GC's aged-partial sweep.
- GC staleness fallback: a .hf-last-gc marker is stamped per sweep and
  all-hit renders sweep anyway once it is older than 24h, so 100%-warm
  workloads still reclaim space (pinned by a stale-marker test).
@miguel-heygen miguel-heygen force-pushed the worktree-renderer-frame-extraction-perf branch from 9184aff to e6606ad Compare July 3, 2026 22:09
@miguel-heygen miguel-heygen merged commit 1a7002f into main Jul 3, 2026
13 checks passed
@miguel-heygen miguel-heygen deleted the worktree-renderer-frame-extraction-perf branch July 3, 2026 22:09

3 participants