Summary
`tests/integration/memory-leak.test.ts` (the `should not leak memory over multiple init/shutdown cycles` case) reports heap deltas in the low single-digit MB range that look like a Runtime leak but are actually vitest harness overhead. This was diagnosed and worked around in commit 355e342 (raised the threshold to 6000 KB, added `--expose-gc` to vitest workers, and added a final pre-measurement GC call so the baseline and final reads are symmetric). Filing this so the underlying limitation is documented and we can revisit the test design.
Evidence
A standalone Node script running the exact same workload as the failing test — 20 init/shutdown cycles, 10 plugins per cycle, action + screen registration, full introspection, with `global.gc()` between cycles — measures ~32 KB delta on Node 24 / V8 13.x.
The same workload inside a vitest test (any pool, with `--expose-gc` enabled) measures ~3.9 MB on Node 24. Nearly all of that delta is per-test-file vitest harness state (module graph entries, source maps, snapshot bookkeeping) that isn't released within the worker's lifetime.
This wasn't visible on Node 22 because the harness footprint under V8 12.x was smaller and stayed below the existing 3 MB threshold. Node 24 / V8 13.x pushed it over.
Why it matters
The test claims to enforce "Requirement 12.1: Base runtime memory increase < 100KB" but it can't actually measure at that resolution: the ~3.9 MB harness floor is roughly 40 times the target. The current 6000 KB threshold catches a leak only if it's already in the 10+ MB range at this scale (200 plugin lifecycles).
Suggested fix
Rewrite the test to subtract a harness baseline. Roughly:
```ts
// measureWorkload is a helper that does not exist yet; it is assumed to run
// the workload for `cycles` iterations and return the heap delta in KB.
// Run an empty cycle loop first to measure harness allocation
const harnessBaselineKB = await measureWorkload(emptyWorkload, cycles);
// Then the real workload
const fullDeltaKB = await measureWorkload(realWorkload, cycles);
const runtimeDeltaKB = fullDeltaKB - harnessBaselineKB;
expect(runtimeDeltaKB).toBeLessThan(100); // back to the requirement target (< 100 KB)
```
The empty-workload measurement captures the per-iteration harness cost, so subtracting it leaves the real Runtime contribution. This needs care around vitest's allocation patterns: the harness pays some fixed costs only once, so the second measurement may not see the same overhead as the first and the subtraction can be skewed in either direction. Still worth attempting.
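For discussion, here is one way the measurement helper could look. This is a minimal sketch, not code from the repo: `measureWorkload`, `measureRuntimeDeltaKB`, `forceGc`, and the `Workload` type are all hypothetical names, and the unmeasured warm-up pass is an assumption about how to keep one-time costs out of both measurements. It only uses `process.memoryUsage().heapUsed` and the `gc` global that Node exposes under `--expose-gc`.

```ts
// Hypothetical helper types and names; nothing here exists in the repo yet.
type Workload = () => void | Promise<void>;

// Trigger a collection when Node runs with --expose-gc; otherwise a no-op.
function forceGc(): void {
  const gc = (globalThis as { gc?: () => void }).gc;
  if (typeof gc === "function") gc();
}

// Heap growth in KB across `cycles` runs of `workload`, with a GC on both
// sides of the measurement so the baseline and final reads are symmetric.
async function measureWorkload(workload: Workload, cycles: number): Promise<number> {
  forceGc();
  const before = process.memoryUsage().heapUsed;
  for (let i = 0; i < cycles; i++) {
    await workload();
    forceGc();
  }
  forceGc();
  const after = process.memoryUsage().heapUsed;
  return (after - before) / 1024;
}

// Runtime-only delta in KB: full workload minus an empty-loop baseline.
async function measureRuntimeDeltaKB(
  realWorkload: Workload,
  emptyWorkload: Workload,
  cycles: number,
): Promise<number> {
  // Assumption: one unmeasured pass pays one-time costs (lazy init, JIT)
  // before either measured run, so neither measurement absorbs them.
  await measureWorkload(realWorkload, 1);
  const harnessBaselineKB = await measureWorkload(emptyWorkload, cycles);
  const fullDeltaKB = await measureWorkload(realWorkload, cycles);
  return fullDeltaKB - harnessBaselineKB;
}
```

Whether a single warm-up pass is enough to equalize the two measurements inside a vitest worker is exactly the open question; the result can legitimately come out slightly negative when the baseline run happens to allocate more than the real one.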
Acceptance
- The test enforces a threshold meaningful at the level of the original requirement (low hundreds of KB, not low MB).
- The test passes on Node 22 and Node 24 deterministically across many seeds.
- A real Runtime leak (e.g. an intentionally-introduced retained reference in shutdown) is caught by the test, demonstrated in a separate fixture or commit.
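The third criterion could be demonstrated with a deliberately leaky fixture along these lines. This is a sketch under stated assumptions: `LeakyRuntime`, its `initialize`/`shutdown` methods, and `runLeakyCycles` are hypothetical stand-ins, not the real Runtime API, and the allocation size is arbitrary. The point is only that each cycle leaves one more object reachable after shutdown, which the rewritten test should flag.

```ts
// Hypothetical stand-in for the real Runtime; only the leak mechanism matters.
const retainedAfterShutdown: number[][] = [];

class LeakyRuntime {
  private state: number[] = [];

  initialize(): void {
    // Per-instance state standing in for plugin/action/screen registries.
    this.state = new Array<number>(8192).fill(1);
  }

  shutdown(): void {
    // Deliberate bug: park the state in a module-level array instead of
    // dropping it, so it stays reachable after shutdown.
    retainedAfterShutdown.push(this.state);
    this.state = [];
  }
}

// Runs `cycles` init/shutdown cycles and returns how many state objects
// are still retained afterwards (0 would mean no leak).
function runLeakyCycles(cycles: number): number {
  for (let i = 0; i < cycles; i++) {
    const runtime = new LeakyRuntime();
    runtime.initialize();
    runtime.shutdown();
  }
  return retainedAfterShutdown.length;
}
```

Feeding a fixture like this through the memory test in a separate commit would show whether the chosen threshold actually trips on retained per-cycle state.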
Related
- Diagnosis happened during CI bring-up for v0.1.0 → 355e342.
- See the comment block in the test for the in-line context.