Add staged read loading strategy #85
…y evict of mmap-resident tensors

Closes PR #83 P1 review threads:
- PRRT_kwDORRHzJs5_hhbx (src/emel/io/mmap/actions.hpp:209-225): on unmap failure the OS mapping/file descriptor may still be live, so effect_mark_unmap_failed_and_release_slot must keep the slot owned (in_use/base/mapped_bytes/os_resource intact) and not push the handle back onto free_stack. The caller can then retry release and a later map_tensor cannot reuse the slot while the prior mapping leaks.
- PRRT_kwDORRHzJs5_hhby (src/emel/model/tensor/guards.hpp): the legacy evict_tensor path nulled tensor pointers without releasing the mmap mapping, leaking the slot and OS mapping. evict_tensor_request_valid now rejects mmap_resident lifecycle so the legacy evict routes to errored. Mapped tensors must be released through release_mapped_load.

Failing unit coverage added first per AGENTS.md:
- tests/io/mmap/lifecycle_tests.cpp 'io mmap unmap failure keeps mapping slot owned for retry' drives effect_mark_unmap_failed_and_release_slot against a constructed live-slot context and asserts in_use stays true, base/bytes/os_resource preserved, and the handle does not appear in free_stack.
- tests/model/tensor/lifecycle_tests.cpp 'model_tensor_evict_tensor_rejects_mmap_resident_tensors' brings a tensor to mmap_resident via request_mapped_load, dispatches legacy evict_tensor, asserts CHECK_FALSE on the dispatch and invalid_request error_out, confirms lifecycle/buffer/handle survive, and proves the proper release_mapped_load path still works after the rejected evict.

snapshots/lint/clang_format.txt baseline refreshed via maintained scripts/lint_snapshot.sh --update to register the now-clang-formatted tests/io/mmap/lifecycle_tests.cpp (single-line addition).
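The slot-retention rule described in the first review thread can be pictured with a minimal sketch. Everything here is an assumption for illustration: `mapping_slot`, `slot_table`, and `finish_unmap` are hypothetical names, not the actual `emel::io::mmap` context or the `effect_mark_unmap_failed_and_release_slot` implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified slot table; not the actual emel::io::mmap context.
struct mapping_slot {
  bool in_use = false;
  void * base = nullptr;
  std::size_t mapped_bytes = 0;
};

struct slot_table {
  std::vector<mapping_slot> slots;
  std::vector<std::uint32_t> free_stack;  // handles available for reuse
};

// Finishes a release attempt given the result of the OS unmap call.
// Returns true when the slot was recycled, false when it stays owned for retry.
inline bool finish_unmap(slot_table & table, std::uint32_t handle, bool unmap_ok) {
  mapping_slot & slot = table.slots[handle];
  if (!unmap_ok) {
    // The OS mapping may still be live: leave every field intact and do not
    // recycle the handle, so a later map cannot reuse the slot.
    return false;
  }
  slot = mapping_slot{};
  table.free_stack.push_back(handle);
  return true;
}
```

Under these assumptions, a failed unmap leaves `in_use`, `base`, and `mapped_bytes` untouched and `free_stack` unchanged, which is the property the new lifecycle test asserts.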
Adds the missing `#### Phase NNN:` detail blocks (Goal, Depends on, Requirements, Success Criteria, Plans) inside the active v1.25 milestone section of `.planning/ROADMAP.md`, derived from `.planning/REQUIREMENTS.md` traceability and the canonical structure used in the v1.24 archived roadmap. Required so `gsd-tools.cjs roadmap analyze` returns `phase_count: 7` instead of `0` and `gsd-plan-phase` has goals/criteria to plan against. Planning-artifact only — no source, test, snapshot, benchmark, model, or quality-gate behavior changes.
- Cover public read_tensor_batch copied-byte success and first-error reporting
- Assert ready-state recovery after each batch dispatch

- Add shared tensor_load_span and public read_tensor_batch events
- Route batch validation through explicit guards and copy selected spans in io/read

- Add Plan 01 execution summary and self-check
- Advance Phase 225 state, roadmap progress, and requirements traceability

- add Plan 04 execution summary and validation evidence
- advance Phase 225 state and roadmap progress

- keep reopened Phase 225 requirements pending until validation evidence
- remove stale state claims that v1.25 gap closure is already validated

- point v1.25 roadmap closeout artifacts at archived milestone files
- clarify archived requirements versus reopened active requirements

- add Plan 05 execution summary with validation evidence
- advance state and roadmap progress to Plan 06

- publish dyld launch blocker output and source-backed fallback
- mark Phase 225 validation compliant from scoped gate evidence

- align active and archived v1.25 audits with Phase 225 evidence
- summarize dyld fallback and validation outcomes

- update state, roadmap, and requirements after Plan 06 evidence
- record final task hashes and self-check in summary
…loading-strategy
# Conflicts:
# .planning/MILESTONES.md
# .planning/PROJECT.md
# .planning/ROADMAP.md
# .planning/STATE.md
# .planning/architecture/mermaid/model_tensor.mmd
# .planning/architecture/model_tensor.md
# CMakeLists.txt
# README.md
# docs/roadmap.md
# docs/templates/README.md.j2
# snapshots/bench/benchmarks.txt
# snapshots/lint/clang_format.txt
# snapshots/quality_gates/timing.txt
# src/emel/machines.hpp
# src/emel/model/tensor/actions.hpp
# src/emel/model/tensor/context.hpp
# src/emel/model/tensor/detail.hpp
# src/emel/model/tensor/errors.hpp
# src/emel/model/tensor/events.hpp
# src/emel/model/tensor/guards.hpp
# src/emel/model/tensor/sm.hpp
# tests/io/mmap/lifecycle_tests.cpp
# tests/model/tensor/lifecycle_tests.cpp
# tools/bench/diarization/sortformer_fixture.hpp
# tools/bench/generation_bench.cpp
# tools/embedded_size/emel_probe/main.cpp
# tools/paritychecker/parity_engines.cpp
Implements the bounded staged-read loading path through public I/O and tensor loader surfaces, with source-backed tests, guardrails, snapshots, and milestone audit evidence.
Co-authored-by: Cursor <cursoragent@cursor.com>

Keep workspace-local agent cache and Cursor rule files out of the milestone PR payload.
Co-authored-by: Cursor <cursoragent@cursor.com>

Resolve the squashed v1.25 mainline overlap while preserving the v1.26 staged-read milestone payload for review against main.
Co-authored-by: Cursor <cursoragent@cursor.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 938cb2d195
Pull request overview
This PR adds a new io/staged_read Stateforward.SML loading strategy and integrates it into the existing public I/O/model loading surfaces, while also updating benchmarks, quality-gate defaults, and milestone/planning documentation for v1.26.
Changes:
- Introduce the `src/emel/io/staged_read` actor (events/errors/context/detail) and expose it via `io::loader` as a new `strategy_kind::staged_read`.
- Extend `model/tensor` (and `model/loader` guards) to support staged-load request/error plumbing and storage-backed strategy routing.
- Update bench/parity/probe tooling plus quality-gate/bench/fuzz scripts and documentation to reflect staged-read support and new bounded default benchmark settings.
Reviewed changes
Copilot reviewed 130 out of 130 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tools/paritychecker/parity_engines.cpp | Wires io_staged_read into parity harness loader wiring and storage allocation for staged strategy. |
| tools/embedded_size/emel_probe/main.cpp | Injects io_staged_read into embedded probe loader wiring and staged strategy storage handling. |
| tools/bench/quality_gates_tests.cpp | Adds tests asserting bounded benchmark defaults in scripts + runner sources. |
| tools/bench/model_load_strategy.hpp | Adds env parsing/naming for staged_read strategy selection. |
| tools/bench/generation_bench.cpp | Wires io_staged_read into generation bench fixture and expands workload filter to accept all. |
| tools/bench/diarization/sortformer_fixture.hpp | Wires io_staged_read into diarization fixture and staged storage handling. |
| tools/bench/diarization/sortformer_bench.cpp | Adds diarization env overrides for iters/runs and bounds pipeline benchmark config. |
| tools/bench/bench_runner.cpp | Reduces default benchmark iterations/runs/warmups for routine runs. |
| tests/text/encoders/ugm_tests.cpp | Adds trie-path preservation regression test for UGM table rebuild. |
| src/emel/text/encoders/ugm/detail.hpp | Switches UGM trie insertion to naive_trie::insert. |
| src/emel/text/encoders/types.hpp | Fixes naive_trie::insert to avoid holding references across vector growth. |
| src/emel/model/tensor/events.hpp | Adds public request_staged_load + done/error event types and includes staged-read errors. |
| src/emel/model/tensor/errors.hpp | Adds tensor error codes for staged-read unsupported/failed outcomes. |
| src/emel/model/tensor/detail.hpp | Adds staged-load runtime/status carriers for tensor staged-load path. |
| src/emel/model/tensor/context.hpp | Injects io_staged_read pointer into tensor actor context. |
| src/emel/model/tensor/actions.hpp | Implements staged-load dispatch via io::staged_read and publishes done/error outcomes. |
| src/emel/model/loader/sm.hpp | Generalizes “storage-backed strategy” guards for read_copy + staged_read routing. |
| src/emel/model/loader/guards.hpp | Adds staged-read strategy guards and storage-backed guard wrappers. |
| src/emel/model/loader/events.hpp | Documents read_copy_storage as shared backing for storage-backed strategies. |
| src/emel/machines.hpp | Adds IoStagedRead alias and includes staged-read machine header. |
| src/emel/io/staged_read/events.hpp | Defines staged window + batch public events and done/error payloads. |
| src/emel/io/staged_read/errors.hpp | Defines staged-read error taxonomy and platform support macro. |
| src/emel/io/staged_read/detail.hpp | Adds internal runtime/status carriers for staged-window and batch processing. |
| src/emel/io/staged_read/context.hpp | Adds empty staged-read persistent context per actor rules. |
| src/emel/io/loader/sm.hpp | Adds staged-read dispatch states/transitions for single and batch loads. |
| src/emel/io/loader/guards.hpp | Adds staged-read strategy + source-span validity guards and staged actor presence checks. |
| src/emel/io/loader/events.hpp | Adds strategy_kind::staged_read. |
| src/emel/io/loader/context.hpp | Injects io_staged_read pointer into loader context. |
| src/emel/io/loader/actions.hpp | Implements staged-read single/batch dispatch and staged-read callbacks. |
| snapshots/quality_gates/timing.txt | Updates persisted quality-gate timing snapshot. |
| snapshots/lint/clang_format.txt | Adds newly formatted files and adjusts list ordering. |
| scripts/quality_gates.sh | Lowers default bench iterations/runs/warmups used by the aggregate quality gate. |
| scripts/fuzz_smoke.sh | Adds outer timeout wrapper for fuzzers with configurable timeout. |
| scripts/bench.sh | Adds bounded defaults for generation/diarization env wiring and adds absolute+relative regression thresholding. |
| README.md | Updates docs to describe staged-read strategy as implemented and links architecture doc. |
| docs/templates/README.md.j2 | Mirrors README staged-read documentation updates in template form. |
| docs/roadmap.md | Updates roadmap text to reflect staged-read strategy is implemented. |
| CMakeLists.txt | Adds staged-read IO tests and adjusts fuzz sanitizer configuration for macOS vs non-Apple. |
| .planning/research/SUMMARY.md | Updates research summary to v1.26 staged-read scope and findings. |
| .planning/research/STACK.md | Updates stack research to staged-read milestone scope and constraints. |
| .planning/research/PITFALLS.md | Updates pitfalls research to staged-read SML constraints and risks. |
| .planning/research/FEATURES.md | Updates feature research to staged-read MVP/table-stakes framing. |
| .planning/research/ARCHITECTURE.md | Updates architecture research to staged-read actor integration approach. |
| .planning/PROJECT.md | Sets current milestone to v1.26 staged-read and updates milestone narrative/constraints. |
| .planning/phases/238-audit-artifact-and-probe-reporting-cleanup/238-VERIFICATION.md | Adds Phase 238 verification artifact. |
| .planning/phases/238-audit-artifact-and-probe-reporting-cleanup/238-VALIDATION.md | Adds Phase 238 validation strategy artifact. |
| .planning/phases/238-audit-artifact-and-probe-reporting-cleanup/238-CONTEXT.md | Adds Phase 238 context artifact. |
| .planning/phases/238-audit-artifact-and-probe-reporting-cleanup/238-01-SUMMARY.md | Adds Phase 238 summary artifact. |
| .planning/phases/238-audit-artifact-and-probe-reporting-cleanup/238-01-PLAN.md | Adds Phase 238 plan artifact. |
| .planning/phases/237-direct-tensor-staged-offset-contract-repair/237-VERIFICATION.md | Adds Phase 237 verification artifact. |
| .planning/phases/237-direct-tensor-staged-offset-contract-repair/237-VALIDATION.md | Adds Phase 237 validation strategy artifact. |
| .planning/phases/237-direct-tensor-staged-offset-contract-repair/237-CONTEXT.md | Adds Phase 237 context artifact. |
| .planning/phases/237-direct-tensor-staged-offset-contract-repair/237-01-SUMMARY.md | Adds Phase 237 summary artifact. |
| .planning/phases/237-direct-tensor-staged-offset-contract-repair/237-01-PLAN.md | Adds Phase 237 plan artifact. |
| .planning/phases/236-publication-and-evidence-truthfulness/236-VALIDATION.md | Adds Phase 236 validation strategy artifact. |
| .planning/phases/236-publication-and-evidence-truthfulness/236-CONTEXT.md | Adds Phase 236 context artifact. |
| .planning/phases/236-publication-and-evidence-truthfulness/236-01-SUMMARY.md | Adds Phase 236 summary artifact. |
| .planning/phases/236-publication-and-evidence-truthfulness/236-01-PLAN.md | Adds Phase 236 plan artifact. |
| .planning/phases/235-scope-and-non-regression-guardrails/235-VERIFICATION.md | Adds Phase 235 verification artifact. |
| .planning/phases/235-scope-and-non-regression-guardrails/235-VALIDATION.md | Adds Phase 235 validation strategy artifact. |
| .planning/phases/235-scope-and-non-regression-guardrails/235-CONTEXT.md | Adds Phase 235 context artifact. |
| .planning/phases/235-scope-and-non-regression-guardrails/235-01-SUMMARY.md | Adds Phase 235 summary artifact. |
| .planning/phases/235-scope-and-non-regression-guardrails/235-01-PLAN.md | Adds Phase 235 plan artifact. |
| .planning/phases/234-public-dispatch-tests/234-VERIFICATION.md | Adds Phase 234 verification artifact. |
| .planning/phases/234-public-dispatch-tests/234-VALIDATION.md | Adds Phase 234 validation strategy artifact. |
| .planning/phases/234-public-dispatch-tests/234-CONTEXT.md | Adds Phase 234 context artifact. |
| .planning/phases/234-public-dispatch-tests/234-01-SUMMARY.md | Adds Phase 234 summary artifact. |
| .planning/phases/234-public-dispatch-tests/234-01-PLAN.md | Adds Phase 234 plan artifact. |
| .planning/phases/233-public-loader-and-maintained-entrypoints/233-VERIFICATION.md | Adds/updates Phase 233 verification artifact. |
| .planning/phases/233-public-loader-and-maintained-entrypoints/233-VALIDATION.md | Adds Phase 233 validation strategy artifact. |
| .planning/phases/233-public-loader-and-maintained-entrypoints/233-CONTEXT.md | Adds Phase 233 context artifact. |
| .planning/phases/233-public-loader-and-maintained-entrypoints/233-01-SUMMARY.md | Adds Phase 233 summary artifact. |
| .planning/phases/233-public-loader-and-maintained-entrypoints/233-01-PLAN.md | Adds Phase 233 plan artifact. |
| .planning/phases/232-tensor-owned-integration-graph/232-VERIFICATION.md | Adds Phase 232 verification artifact. |
| .planning/phases/232-tensor-owned-integration-graph/232-VALIDATION.md | Adds Phase 232 validation strategy artifact. |
| .planning/phases/232-tensor-owned-integration-graph/232-CONTEXT.md | Adds Phase 232 context artifact. |
| .planning/phases/232-tensor-owned-integration-graph/232-01-SUMMARY.md | Adds Phase 232 summary artifact. |
| .planning/phases/232-tensor-owned-integration-graph/232-01-PLAN.md | Adds Phase 232 plan artifact. |
| .planning/phases/231-deterministic-error-taxonomy/231-VERIFICATION.md | Adds/updates Phase 231 verification artifact. |
| .planning/phases/231-deterministic-error-taxonomy/231-VALIDATION.md | Adds Phase 231 validation strategy artifact. |
| .planning/phases/231-deterministic-error-taxonomy/231-CONTEXT.md | Adds Phase 231 context artifact. |
| .planning/phases/231-deterministic-error-taxonomy/231-01-SUMMARY.md | Adds Phase 231 summary artifact. |
| .planning/phases/231-deterministic-error-taxonomy/231-01-PLAN.md | Adds Phase 231 execute-plan artifact. |
| .planning/phases/230-context-cleanness-and-per-attempt-lifetime/230-VERIFICATION.md | Adds/updates Phase 230 verification artifact. |
| .planning/phases/230-context-cleanness-and-per-attempt-lifetime/230-VALIDATION.md | Adds Phase 230 validation strategy artifact. |
| .planning/phases/230-context-cleanness-and-per-attempt-lifetime/230-CONTEXT.md | Adds Phase 230 context artifact. |
| .planning/phases/230-context-cleanness-and-per-attempt-lifetime/230-01-SUMMARY.md | Adds Phase 230 summary artifact. |
| .planning/phases/230-context-cleanness-and-per-attempt-lifetime/230-01-PLAN.md | Adds Phase 230 execute-plan artifact. |
| .planning/phases/229-staged-copy-progress-and-completion-semantics/229-VERIFICATION.md | Adds/updates Phase 229 verification artifact. |
| .planning/phases/229-staged-copy-progress-and-completion-semantics/229-VALIDATION.md | Adds Phase 229 validation strategy artifact. |
| .planning/phases/229-staged-copy-progress-and-completion-semantics/229-CONTEXT.md | Adds Phase 229 context artifact. |
| .planning/phases/229-staged-copy-progress-and-completion-semantics/229-01-SUMMARY.md | Adds Phase 229 summary artifact. |
| .planning/phases/229-staged-copy-progress-and-completion-semantics/229-01-PLAN.md | Adds Phase 229 execute-plan artifact. |
| .planning/phases/228-span-target-window-platform-gating/228-VERIFICATION.md | Adds/updates Phase 228 verification artifact. |
| .planning/phases/228-span-target-window-platform-gating/228-VALIDATION.md | Adds Phase 228 validation strategy artifact. |
| .planning/phases/228-span-target-window-platform-gating/228-CONTEXT.md | Adds Phase 228 context artifact. |
| .planning/phases/228-span-target-window-platform-gating/228-01-SUMMARY.md | Adds Phase 228 summary artifact. |
| .planning/phases/227-staged-read-strategy-component-boundary/227-VERIFICATION.md | Adds/updates Phase 227 verification artifact. |
| .planning/phases/227-staged-read-strategy-component-boundary/227-VALIDATION.md | Adds Phase 227 validation strategy artifact. |
| .planning/phases/227-staged-read-strategy-component-boundary/227-CONTEXT.md | Adds Phase 227 context artifact. |
| .planning/architecture/mermaid/model_tensor.mmd | Updates model/tensor architecture diagram to include staged-load graph. |
| .planning/architecture/mermaid/model_loader.mmd | Updates model/loader diagram to reflect generalized storage-backed strategy guards. |
| .planning/architecture/mermaid/io_loader.mmd | Updates io/loader diagram to include staged-read routing/guards. |
| .planning/phases/211-phase-verification-artifact-backfill/211-VALIDATION.md | Removes legacy Phase 211 validation artifact (cleanup). |
| .planning/phases/211-phase-verification-artifact-backfill/211-CONTEXT.md | Removes legacy Phase 211 context artifact (cleanup). |
| .planning/phases/211-phase-verification-artifact-backfill/211-01-SUMMARY.md | Removes legacy Phase 211 summary artifact (cleanup). |
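
The `naive_trie::insert` row above describes a classic hazard: a reference to an element of a `std::vector` dangles once the vector reallocates. The following is a minimal sketch of the index-based pattern that avoids it. `trie_node` and `index_trie` are hypothetical names and the real `naive_trie` layout may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string_view>
#include <vector>

// Hypothetical index-based trie; the real naive_trie layout may differ.
struct trie_node {
  std::map<char, std::size_t> children;  // child byte -> node index
  bool has_value = false;
  std::int32_t value = 0;
};

struct index_trie {
  std::vector<trie_node> nodes = std::vector<trie_node>(1);  // node 0 is the root

  void insert(std::string_view key, std::int32_t value) {
    std::size_t current = 0;
    for (const char c : key) {
      const auto found = nodes[current].children.find(c);
      if (found != nodes[current].children.end()) {
        current = found->second;
        continue;
      }
      const std::size_t child = nodes.size();
      nodes.emplace_back();  // may reallocate; no node reference is held here
      nodes[current].children.emplace(c, child);  // re-index after growth
      current = child;
    }
    nodes[current].has_value = true;
    nodes[current].value = value;
  }

  // Returns true and writes value_out when key was inserted earlier.
  bool find(std::string_view key, std::int32_t & value_out) const {
    std::size_t current = 0;
    for (const char c : key) {
      const auto found = nodes[current].children.find(c);
      if (found == nodes[current].children.end()) {
        return false;
      }
      current = found->second;
    }
    if (!nodes[current].has_value) {
      return false;
    }
    value_out = nodes[current].value;
    return true;
  }
};
```

The design choice is to carry a node index across the `emplace_back` call and look the parent up again afterwards, rather than binding `trie_node &` before the vector grows.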
Preserve accurate staged batch failure indexes, route staged chunk sizing through policy instead of hard-coded tensor sizes, and restore representative quality-gate timing evidence.
Co-authored-by: Cursor <cursoragent@cursor.com>
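As a rough illustration of what an accurate batch failure index means, here is a sketch that stops at the first invalid span and reports its position in the caller's batch. `copy_span`, `batch_result`, and `copy_batch` are hypothetical names chosen for this example, not the event or status types in `io/staged_read`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical span and result types, for illustration only.
struct copy_span {
  std::uint64_t source_offset = 0;
  std::uint64_t byte_count = 0;
  std::uint8_t * target = nullptr;
};

struct batch_result {
  bool ok = true;
  std::size_t completed = 0;     // spans fully copied before stopping
  std::size_t failed_index = 0;  // meaningful only when ok is false
};

// Copies each span in order and stops at the first span that fails validation.
// Assumes source points at source_size readable bytes.
inline batch_result copy_batch(const std::uint8_t * source, std::uint64_t source_size,
                               const std::vector<copy_span> & spans) {
  batch_result result{};
  for (std::size_t i = 0; i < spans.size(); ++i) {
    const copy_span & span = spans[i];
    const bool in_bounds = span.target != nullptr &&
                           span.source_offset <= source_size &&
                           span.byte_count <= source_size - span.source_offset;
    if (!in_bounds) {
      result.ok = false;
      result.failed_index = i;  // index within the caller's batch
      return result;
    }
    std::memcpy(span.target, source + span.source_offset,
                static_cast<std::size_t>(span.byte_count));
    result.completed = i + 1;
  }
  return result;
}
```

Spans after the failing one are left untouched in this sketch, so the caller can tell from `failed_index` and `completed` exactly which targets hold copied bytes.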
Summary
- `io/staged_read` Stateforward.SML actor for bounded source-span staged loading with explicit guard, copy, error, and completion semantics.
- `io::loader`, `model/tensor`, model-loader, benchmark, paritychecker, and embedded probe surfaces without actor-detail reach-through.
- `34/34` active requirements satisfied.

Test plan
- `ctest --test-dir build --output-on-failure -R emel_tests_io`
- `ctest --test-dir build --output-on-failure -R 'emel_tests_(io|model)'`
- `./build/emel_tests_bin --test-case="model_tensor_request_staged_load_*"`
- `EMEL_QUALITY_GATES_SCOPE=full EMEL_QUALITY_GATES_PARALLEL=0 scripts/quality_gates.sh`
- `EMEL_QUALITY_GATES_CHANGED_FILES=".planning/v1.26-MILESTONE-AUDIT.md" scripts/quality_gates.sh`

Notes
- `origin/main` after PR Milestone v1.25: I/O Read Loading Strategy #84 landed, so the PR diff is scoped to v1.26 staged-read work.
- `ESG-02B` remains explicitly deferred/future until an approved file-backed staged-read source path owns open/seek/read failures.

Made with Cursor
Note
Medium Risk
Adds a new tensor-loading strategy and wires it through `io::loader` and `model/tensor`, which can affect model load correctness and error handling despite being additive to existing mmap/read paths.
Overview
Implements a new bounded staged/chunked tensor load path via the `io/staged_read` Stateforward.SML actor, including explicit guard-validated request contracts, deterministic window copy semantics, and categorized error outcomes.
Wires staged reads through the public runtime boundary so `io::loader` can dispatch staged loads (including batch) and `model/tensor` can request/observe staged-load `_done`/`_error` terminals while keeping tensor residency ownership in `model/tensor`.
Updates planning/state artifacts for milestone v1.26 (requirements, roadmap, state, and architecture mermaid/docs) to reflect the new strategy and its validation/guardrails, including the Phase 237 nonzero-offset staged-load contract repair and audit cleanup.
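The bounded window copy in the overview can be sketched as a loop that validates the request once and then moves data through a staging buffer whose size comes from a policy value. This is only an illustration under stated assumptions: `staging_policy` and `staged_copy` are hypothetical names, the sketch allocates its staging buffer with `std::vector` for brevity, and the real actor expresses validation as SML guards rather than inline checks.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical policy: the chunk bound is configured, not derived from tensor size.
struct staging_policy {
  std::size_t max_chunk_bytes = 4096;
};

// Copies byte_count bytes starting at offset through a bounded staging buffer.
// Returns the number of chunks used, or 0 when the request is rejected.
inline std::size_t staged_copy(const std::uint8_t * source, std::size_t source_size,
                               std::size_t offset, std::size_t byte_count,
                               std::uint8_t * target, const staging_policy & policy) {
  const bool valid = source != nullptr && target != nullptr && byte_count > 0 &&
                     policy.max_chunk_bytes > 0 && offset <= source_size &&
                     byte_count <= source_size - offset;
  if (!valid) {
    return 0;
  }
  std::vector<std::uint8_t> staging(std::min(policy.max_chunk_bytes, byte_count));
  std::size_t copied = 0;
  std::size_t chunks = 0;
  while (copied < byte_count) {
    const std::size_t n = std::min(staging.size(), byte_count - copied);
    std::memcpy(staging.data(), source + offset + copied, n);  // stage the window
    std::memcpy(target + copied, staging.data(), n);           // publish to target
    copied += n;
    ++chunks;
  }
  return chunks;
}
```

Writing the bounds check as `byte_count <= source_size - offset` after `offset <= source_size` avoids unsigned overflow for nonzero offsets, which is the kind of case the Phase 237 offset repair is concerned with.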
Reviewed by Cursor Bugbot for commit 70599b8.
Closes #63