
docs(soak): close Phase 6 24-h re-run — 9/9 assertions PASS end-to-end #246

Merged
githubrobbi merged 1 commit into main from docs/close-phase-6-24h-rerun-2026-05-15 on May 15, 2026

Conversation

@githubrobbi
Collaborator

Summary

The last remaining v0.6.0 24-h-soak gate now closes against a live Windows-host capture.

`LOG/uffs_soak/phase6-20260514-122946/` ran for 24 h on the 7-drive reference box (2026-05-14 12:29:46Z → 2026-05-15 12:29:46Z) against the post-PR-218 harness fix (`shard.ttl=debug` → `shard.ttl=trace`). The validator reported 9 of 9 assertions PASS end-to-end.

Contract evidence

| Contract | Evidence | Status |
| --- | --- | --- |
| C never demotes below Warm | 0 `to=Parked` events for letter=C; 2 870 `Demote target clamped by per-drive min_tier` debug events | ✅ end-to-end |
| Peer drives demote Warm → Parked | D / E / F / G / M / S each fired 2 transitions | ✅ end-to-end |
| Adaptive TTL bonus engages | C.max_warm_ttl=3 786 s vs peer max=300 s (12.6× bonus) | ✅ end-to-end (was §4.5b ⚠️) |
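
For context, a minimal sketch of the kind of log-scan check behind the first row above. The validator itself is not part of this PR, so the log file name (`soak.log`) and the substring-matching approach are assumptions; only the event strings come from the evidence column:

```rust
use std::fs;

fn main() -> std::io::Result<()> {
    // Capture directory from this run; the log file name inside it is assumed.
    let log = fs::read_to_string("LOG/uffs_soak/phase6-20260514-122946/soak.log")?;

    // Contract 1: drive C never demotes below Warm, i.e. zero Parked transitions.
    let c_parked = log
        .lines()
        .filter(|l| l.contains("to=Parked") && l.contains("letter=C"))
        .count();

    // The floor must have been actively applied, not merely never needed.
    let clamped = log
        .lines()
        .filter(|l| l.contains("Demote target clamped by per-drive min_tier"))
        .count();

    assert_eq!(c_parked, 0, "drive C demoted below Warm");
    assert!(clamped > 0, "min_tier floor never engaged");
    println!("c_parked={c_parked} clamped={clamped}"); // this run: 0 / 2 870
    Ok(())
}
```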

Memory trajectory

```
                              00h              23h        post-load
Working Set (WS) :  6 746 800 128 →     22 888 448 →     69 238 784  (308× WS trim, 3× post-load re-page)
Private Memory   :  8 293 457 920 →  1 791 188 992 →  1 669 558 272  (78 % real release as drives demoted)
Virtual Memory   : 28 172 120 064 → 28 168 974 336 → 28 168 974 336  (flat — no address-space leak)
NPM (non-paged)  :         26 736 →         26 328 →         26 328  (flat)
```

The 78 % PM release is materially different from §4.5d ws-trace (where pm_bytes stayed within 3 % because the keep-warm worker held all 7 drives Warm). Here the controller actively demoted the 6 peer drives, which unloaded cold shards from memory — exactly the intended tiering behavior. The end-of-soak synthetic-load window on C re-paged recently-needed shards back into WS without re-allocating PM (PM actually continued to fall slightly).

No `panic` / `OutOfMemoryError` / `FATAL` log lines across the 24-h window.

Why the re-run was needed (§4.5b deferral root-cause)

The pre-fix soak (`phase6-20260509-213122/`) was filtered by `RUST_LOG=shard.ttl=debug`, which dropped every `below-ttl` TRACE event — the only event class that carries the bonused `warm_ttl_sec` field during the synthetic-load window. Drive C sits in Warm/Hot with `idle_secs ≈ 0` there, so the demote-eval ladder never reaches either DEBUG arm.

PR #218 raised the harness filter to `shard.ttl=trace`. This re-run is the first 24-h soak captured against that fix.
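
For illustration, a minimal sketch of that filter change, assuming the harness drives `tracing` with a `tracing_subscriber` `EnvFilter` (the harness code is not part of this PR; the event shape is reconstructed from the field and event names quoted above):

```rust
use tracing_subscriber::EnvFilter;

fn main() {
    // Pre-fix harness filter: TRACE events under the `shard.ttl` target are
    // dropped before they ever reach the soak log.
    // let filter = EnvFilter::new("shard.ttl=debug");

    // Post-PR-218 filter: TRACE passes, so the validator can see the bonused
    // TTL during the synthetic-load window.
    let filter = EnvFilter::new("shard.ttl=trace");
    tracing_subscriber::fmt().with_env_filter(filter).init();

    // A `below-ttl` event carrying the bonused TTL would look roughly like:
    tracing::trace!(target: "shard.ttl", warm_ttl_sec = 3786_u64, "below-ttl");
}
```

Because drive C never goes idle during the load window, this TRACE arm is the only place the bonused `warm_ttl_sec` surfaces; at `debug` the entire evidence class vanishes from the capture.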

Docs updated

| File | Change |
| --- | --- |
| `docs/architecture/memory-tiering-windows-host-validation.md` | New §4.5e closing-capture sub-section. §4.5b's closing paragraph forwards to §4.5e. §2 criterion 3 cross-ref picks up §4.5e. |
| `docs/architecture/memory-tiering-bake-criteria.md` | §3 Phase-6 checkbox ticked `[x]`; status text refreshed; cross-refs §4.5b + §4.5e. |
| `docs/architecture/memory-tiering-readiness-validation-2026-05-05.md` | Remaining-items list restructured: Phase 6 removed (closed); paper-trail subsection added for Phase 6 / Phase 7 / ws-trace closures; §6 closure note refreshed. |
| `CHANGELOG.md` | New "Verified — Phase 6 24-h Windows-host soak closes end-to-end (2026-05-14/15)" entry under Unreleased, with gate-status table showing all three v0.6.0 24-h-soak gates green. |

v0.6.0 gate slate

| Gate | Source | Result | Closed |
| --- | --- | --- | --- |
| Phase 6 (`min_tier=Warm` floor + adaptive bonus) | `phase6-20260514-122946/` | 9 / 9 PASS | 2026-05-15 |
| Phase 7 (USN-journal churn) | `phase7-20260510-214412/` | 7 / 7 PASS (regex fix) | 2026-05-13 |
| ws-trace (Working-Set trajectory) | `wstrace-20260513-113344/` | 4 / 4 PASS | 2026-05-13 |

Only the one-week `main` bake remains.

Discipline notes

  • Pure-docs PR; no Rust changes.
  • Atomic single-purpose commit, GPG-signed.
  • Pre-push gates green locally (lint-fast: 3s; lint-pre-push: 4s).
  • No suppression hacks; no tests touched; no public-API changes.

Commit message (title truncated in the UI): …d (`phase6-20260514-122946/`)

Last remaining v0.6.0 24-h-soak gate closes against a live Windows-host capture.  LOG/uffs_soak/phase6-20260514-122946/ ran 2026-05-14 12:29:46Z → 2026-05-15 12:29:46Z on the 7-drive reference box against the post-PR-218 harness fix (shard.ttl=debug → shard.ttl=trace, which makes the catch-all 'below-ttl' TRACE events carrying the bonused warm_ttl_sec field visible to the validator).  Validator reported 9 of 9 assertions PASS.

Direct end-to-end evidence for all three Phase 6 contracts:

  * Drive C 'min_tier=Warm' floor: 0 to=Parked events on letter=C across 24 h; 2 870 'Demote target clamped by per-drive min_tier' debug events (the floor was actively applied thousands of times, not merely never tripped).

  * Peer-drive demotion: D/E/F/G/M/S each fired exactly 2 Warm→Parked transitions at the configured warm_ttl_base_secs.

  * Adaptive TTL bonus: C.max_warm_ttl=3 786 s vs max(peers.warm_ttl_sec)=300 s — a 12.6× bonus on the high-rate drive, matching the +600·log2(rate) formula in crate::cache::policy::warm_ttl (see the sketch after this list).  This was the criterion deferred in §4.5b's 2026-05-11 reference run.
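
A minimal sketch of that formula follows (assumed shape: the real crate::cache::policy::warm_ttl may clamp or smooth differently, and the events/min rate unit is illustrative; only the 600·log2 coefficient and the 300 s base come from this run's numbers):

    // Hypothetical reconstruction of the adaptive warm-TTL bonus.
    fn warm_ttl_secs(base_secs: f64, events_per_min: f64) -> f64 {
        if events_per_min <= 1.0 {
            return base_secs; // idle peers keep warm_ttl_base_secs (300 s here)
        }
        base_secs + 600.0 * events_per_min.log2()
    }
    // base 300 s, rate ≈ 56  →  300 + 600·log2(56) ≈ 3 786 s: the 12.6× bonus
    // on C, while the six idle peers stay at 300 s and demote on schedule.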

Memory trajectory across the 24h window also validates the tiering machinery doing real work:

    WS  :  6 746 800 128 →     22 888 448 →     69 238 784  (308× WS trim, 3× post-load re-page)
    PM  :  8 293 457 920 →  1 791 188 992 →  1 669 558 272  (78 % real release as drives demoted)
    VM  : 28 172 120 064 → 28 168 974 336 → 28 168 974 336  (flat — no address-space leak)
    NPM :         26 736 →         26 328 →         26 328  (flat)

The 78% PM release is materially different from §4.5d ws-trace (where pm_bytes stayed within 3% because the keep-warm worker held all 7 drives Warm).  Here the controller actively demoted the 6 peer drives, unloading cold shards from memory — exactly the intended tiering behavior under sustained idle.  The end-of-soak synthetic-load window on C re-paged recently-needed shards back into WS without re-allocating PM (PM actually continued to fall slightly), confirming the page-cache vs. private-bytes split is healthy.

No panic/OutOfMemoryError/FATAL log lines across the 24h window.

Docs updated:

  * memory-tiering-windows-host-validation.md — new §4.5e closing capture sub-section (per-snapshot memory trajectory, contract assertion table, operational signals).  §4.5b's closing paragraph now points forward to §4.5e.  §2 criterion 3's cross-reference picks up §4.5e alongside §4.5b.

  * memory-tiering-bake-criteria.md §3 — Phase-6 checkbox ticked [x] with status reflecting the 2026-05-15 closure.  Body text cross-refs §4.5b (deferral root-cause) and §4.5e (closing capture).

  * memory-tiering-readiness-validation-2026-05-05.md — 'remaining items' list restructured: Phase 6 removed (now closed); paper trail subsection added for Phase 6 / Phase 7 / ws-trace gate closures; §6 closure note updated to reflect that only the one-week main bake remains.

  * CHANGELOG.md — new 'Verified — Phase 6 24-h Windows-host soak closes end-to-end (2026-05-14/15)' entry under Unreleased, with the gate-status table showing all three v0.6.0 24-h-soak gates green.

@githubrobbi enabled auto-merge (squash) May 15, 2026 12:54
@githubrobbi merged commit b8af7f3 into main May 15, 2026
18 checks passed
@githubrobbi deleted the docs/close-phase-6-24h-rerun-2026-05-15 branch May 15, 2026 12:56