Merged
58 commits
0ec2fee
docs: start milestone v1.23 io boundary
gabewillen May 4, 2026
f0013c3
feat: add io loading boundary
gabewillen May 4, 2026
e56e370
fix: close io boundary review gaps
gabewillen May 4, 2026
e9739e4
fix: reset tensor loader after io failure
gabewillen May 4, 2026
1e747c2
docs(roadmap): add gap closure phase 211
gabewillen May 4, 2026
a44499f
docs(211): execute phase 211 verification artifact backfill
gabewillen May 4, 2026
d1fd1a2
docs(audit): re-audit v1.24 after phase 211 backfill (passed)
gabewillen May 4, 2026
217944b
docs(211): add wave field to phase plan frontmatter (linter)
gabewillen May 4, 2026
6358b6a
chore: archive v1.24 milestone files
gabewillen May 4, 2026
438d200
chore: remove REQUIREMENTS.md for v1.24 milestone
gabewillen May 4, 2026
5e64d62
feat: add mmap loading strategy
gabewillen May 4, 2026
c0774df
fix: address io mmap PR review comments
gabewillen May 4, 2026
92712ce
merge origin/main into issue 60 branch
gabewillen May 5, 2026
12e8f1f
fix: reject mmap spans beyond eof
gabewillen May 5, 2026
8aad365
fix: close mmap review follow-ups
gabewillen May 5, 2026
70c622b
fix: simplify mapped load done transition
gabewillen May 5, 2026
278ccf5
fix: require mmap map completion callback
gabewillen May 5, 2026
7ba0155
fix(io/mmap,model/tensor): retain slot on unmap failure; reject legac…
gabewillen May 5, 2026
22a240b
fix: preserve mmap metadata and partial cleanup state
gabewillen May 5, 2026
7de907a
fix: preserve mmap bindings on storage rebind errors
gabewillen May 5, 2026
172df14
fix: reject bind tensor on mmap resident tensors
gabewillen May 5, 2026
111cd6c
fix: align v1.24 planning state counters
gabewillen May 5, 2026
1f6635b
docs: start milestone v1.25 io read loading strategy
gabewillen May 5, 2026
b84ae31
docs(v1.25): scaffold roadmap phase details for phases 212-218
gabewillen May 5, 2026
00e9e8d
feat(io): add read strategy boundary actor
gabewillen May 5, 2026
9fec8d3
feat(io): add read validation gates
gabewillen May 5, 2026
1466400
feat(io): execute read strategy attempts
gabewillen May 5, 2026
3ba2096
test(225-01): add failing io read batch doctests
gabewillen May 6, 2026
f2f38cd
feat(225-01): implement io read batch copy path
gabewillen May 6, 2026
3d7985a
docs(225-01): complete io read batch copy plan
gabewillen May 6, 2026
8c4b8fc
test(225-02): add failing io loader batch doctests
gabewillen May 6, 2026
56295e6
feat(225-02): implement io loader batch read copy route
gabewillen May 6, 2026
da3d84e
docs(225-02): complete io loader batch route plan
gabewillen May 6, 2026
69e6039
test(225-03): add failing model loader batch doctests
gabewillen May 6, 2026
5c87f76
feat(225-03): route model loader io through batch dispatch
gabewillen May 6, 2026
c27b347
docs(225-03): complete model loader batch dispatch plan
gabewillen May 6, 2026
724a12f
feat(225-04): wire maintained callers to io load spans
gabewillen May 6, 2026
c04b890
test(225-04): require io load spans in maintained caller guardrails
gabewillen May 6, 2026
dc7c7b9
docs(225-04): complete maintained caller batch scratch plan
gabewillen May 6, 2026
f08a145
docs(225-05): update active phase traceability
gabewillen May 6, 2026
2997d5c
docs(225-05): correct archived closeout paths
gabewillen May 6, 2026
2e08af9
docs(225-05): complete closeout traceability plan
gabewillen May 6, 2026
40a901c
docs(225-06): record current validation evidence
gabewillen May 6, 2026
357409d
docs(225-06): publish closeout audit evidence
gabewillen May 6, 2026
28dbfa8
docs(225-06): complete validation publication plan
gabewillen May 6, 2026
d127362
fix(225): CR-01 guard missing parse callback
gabewillen May 6, 2026
594b5a5
fix(225): WR-01 classify unknown read errors
gabewillen May 6, 2026
d40c6bc
style(225): format read error regression test
gabewillen May 6, 2026
ca25e79
style(225): prefix model loader parse guard
gabewillen May 6, 2026
14da7a5
chore: archive v1.25 milestone
gabewillen May 6, 2026
5ac0055
chore(v1.25): close read loading audit cleanup
gabewillen May 6, 2026
ca95f4a
docs(state): record v1.25 PR
gabewillen May 6, 2026
2cbbea3
Merge remote-tracking branch 'origin/main' into gsd/issue-62-io-read-…
gabewillen May 6, 2026
bcae489
chore: refresh quality gate timing after merge
gabewillen May 6, 2026
93c61c7
fix: address PR review comments
gabewillen May 6, 2026
a70908a
fix: address read-copy review feedback
gabewillen May 7, 2026
495a0e0
fix: address benchmark and source review feedback
gabewillen May 7, 2026
bdfb729
fix: require explicit tensor read targets
gabewillen May 7, 2026
25 changes: 25 additions & 0 deletions .planning/MILESTONES.md
@@ -1,5 +1,30 @@
# Project Milestones: EMEL

## v1.25 I/O Read Loading Strategy (Shipped: 2026-05-06)

**Phases completed:** 16 phases, 21 plans, 12 tasks

**Key accomplishments:**

- Established the canonical src/emel/io/read Stateforward.SML boundary actor with fail-closed read/copy semantics and scoped lifecycle tests.
- Added explicit read request validation and platform gates before the read-attempt placeholder.
- Added concrete read/copy execution, deterministic read errors, and transient resource close-before-done behavior.
- Reconciled the read actor with its RTC-safe source-span implementation.
- Added tensor-owned public read/copy load orchestration through io/read.
- Reopened closeout with Phase 225 to replace the model-loader per-tensor I/O action loop with one public batch dispatch, wire maintained callers through request-owned `io_load_spans`, and publish source-backed validation evidence.
- Closed Phase 225 code review findings for missing parse callback validation and unclassified read source errors; post-fix review is clean.
- Closed Phase 226 audit tech debt by adding an independent public `io/read`
batch cap and exact-cap/over-cap public-dispatch doctests.
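A minimal sketch of the exact-cap/over-cap behavior named in the last bullet, assuming a fail-closed count check that runs before any read work. `k_max_batch_spans`, `batch_outcome`, and `validate_batch` are hypothetical names for illustration, and the cap value is arbitrary; none of them are taken from the EMEL source.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical cap for illustration only; not the value EMEL uses.
constexpr std::size_t k_max_batch_spans = 4;

// Hypothetical outcome set for a batch-size gate.
enum class batch_outcome { accepted, rejected_empty, rejected_over_cap };

// Validate the span count before any read is attempted (fail closed).
constexpr batch_outcome validate_batch(std::size_t span_count) noexcept {
  if (span_count == 0) return batch_outcome::rejected_empty;
  if (span_count > k_max_batch_spans) return batch_outcome::rejected_over_cap;
  return batch_outcome::accepted;
}

// Exact-cap is accepted, one past the cap is rejected.
static_assert(validate_batch(k_max_batch_spans) == batch_outcome::accepted,
              "a batch exactly at the cap must be accepted");
static_assert(validate_batch(k_max_batch_spans + 1) == batch_outcome::rejected_over_cap,
              "a batch one past the cap must be rejected");
```

Keeping the check `constexpr` lets the boundary cases be pinned at compile time as well as in runtime tests.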

**Audit:** Final source-backed audit passed with 13/13 active requirements
satisfied and no current tech-debt rows after Phase 226 cleanup.

**Known deferred items at close:** 5 carried-forward items remain outside v1.25
(one non-v1.23 quick task and four optimization todos; see STATE.md Deferred
Items).

---

## v1.24 I/O Mmap Loading Strategy (Shipped: 2026-05-04)

**Phases completed:** 8 phases, 8 plans, 0 tasks
105 changes: 79 additions & 26 deletions .planning/PROJECT.md
@@ -16,19 +16,51 @@ before widening API surface or model scope.

## Current State

Current milestone: `v1.24 I/O Mmap Loading Strategy`
Current milestone: not yet defined

Latest shipped milestone: `v1.23 I/O Loading Strategy Boundary`
Latest shipped milestone: `v1.25 I/O Read Loading Strategy`

Status: `v1.23` shipped on 2026-05-04 after final source-backed audit passed. The repo now treats
`src/emel/io` as the loading strategy boundary owner, `src/emel/model/tensor` as the canonical
owner of tensor load, bind, evict, and residency semantics, and `model/loader` as the orchestrator
across those public actor surfaces.
Status: `v1.25` shipped on 2026-05-06 after Phase 225 review-fix cleanup and a
refreshed source-backed milestone audit passed. The repo now ships the issue #62 read/copy
strategy path beneath the existing `src/emel/io` boundary while preserving
tensor-owned residency. `src/emel/io/read` is the canonical read/copy actor,
`model/tensor` owns the target buffer and residency commit, and maintained
loader/tool lanes select/report read/copy through public runtime surfaces.
Maintained source-byte loading for benchmark, paritychecker, and embedded probe
evidence now uses the public `emel::io::source::load_file_bytes` setup-time
contract instead of actor-internal `io/read/detail.hpp`. Phase 224 also
confirmed Phase 214 is historical, clarified the direct `request_read_load`
coverage shape, and captured fresh passing `emel_tests_io` evidence before
archive.

Current planning focus: implement the issue #61 mmap strategy path beneath the v1.23 I/O
boundary.
## Latest Shipped Milestone: v1.25 I/O Read Loading Strategy

## Current Milestone: v1.24 I/O Mmap Loading Strategy
**Goal:** Add a dedicated `io/read` Stateforward.SML strategy actor under `src/emel/io`
so tensor-owned model loading can request explicit read/copy residency into a
caller-provided owned target buffer through the existing I/O boundary, without moving
tensor lifecycle ownership out of `model/tensor` and without folding mmap, staged/chunked
constrained-memory, async, or device strategy behavior into this issue.

**Source:** GitHub issue #62, "Add io/read state machine for copy-based tensor loading"

**Target features:**
- Dedicated `src/emel/io/read` machine, events, guards, actions, context, errors, and
public aliases for read/copy tensor loading.
- Tensor-to-I/O integration that lets `model/tensor` request read-based residency through
the public `emel/io` boundary while retaining tensor-owned load, bind, evict, and
residency semantics. The read strategy operates on a caller-provided owned target
buffer; the strategy never takes residency ownership.
- Explicit read/copy success, unsupported, validation, source-open, source-read, and
short-read failure outcomes surfaced deterministically through events and states.
- RTC-safe source/result handling: filesystem work is externalized from SML dispatch,
`io/read` stores no dispatch-local request data in context, and no kernel handle pool
persists beyond dispatch.
- Maintained tests, docs, lint snapshots, benchmark snapshots, benchmark outputs, and
model artifacts updated from maintained commands when required.
- Public setup-time source-byte loading under `emel::io::source`, with benchmark and
parity lanes kept off actor `detail.hpp` helpers.
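The read/copy outcomes listed above can be sketched as a plain function, assuming a path-based source and standard C file I/O. This is not the `io/read` actor: `read_outcome` and `read_copy` are hypothetical names, the unsupported-platform outcome is omitted, and the real implementation routes these outcomes through events and states. The sketch only shows validation before any filesystem work, a copy into a caller-owned buffer, and a file handle that is closed before the outcome is reported.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical outcome set mirroring the failure classes named in the text.
enum class read_outcome {
  ok,
  invalid_request,
  source_open_failed,
  source_read_failed,
  short_read
};

// Copy `length` bytes at `offset` from `path` into a caller-owned target buffer.
// The function never takes ownership of `target`, and the file handle is
// transient: it is always closed before the outcome is returned.
inline read_outcome read_copy(const char* path, long offset, void* target,
                              std::size_t target_capacity,
                              std::size_t length) noexcept {
  // Validation gate: reject malformed requests before touching the filesystem.
  if (path == nullptr || target == nullptr || offset < 0 || length == 0 ||
      length > target_capacity) {
    return read_outcome::invalid_request;
  }

  std::FILE* file = std::fopen(path, "rb");
  if (file == nullptr) return read_outcome::source_open_failed;

  read_outcome outcome = read_outcome::ok;
  if (std::fseek(file, offset, SEEK_SET) != 0) {
    outcome = read_outcome::source_read_failed;
  } else {
    const std::size_t got = std::fread(target, 1, length, file);
    if (got != length) {
      // Distinguish an I/O error from reaching end-of-file early.
      outcome = std::ferror(file) ? read_outcome::source_read_failed
                                  : read_outcome::short_read;
    }
  }
  std::fclose(file);  // close before reporting done
  return outcome;
}
```

A caller would pass its own buffer, for example `read_copy(path, 0, buffer, sizeof buffer, sizeof buffer)`, and commit residency only when the result is `read_outcome::ok`.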

## Previous Shipped Milestone: v1.24 I/O Mmap Loading Strategy

**Goal:** Add a dedicated `io/mmap` Stateforward.SML strategy actor under `src/emel/io` so
tensor-owned model loading can request memory-mapped residency through the existing I/O boundary
@@ -37,15 +69,25 @@ behavior.

**Source:** GitHub issue #61, "Add io/mmap state machine for tensor-backed model loading"

**Target features:**
- Dedicated `src/emel/io/mmap` machine, events, guards, actions, context, errors, and public
aliases for mmap-backed tensor loading.
- Tensor-to-I/O integration that lets `model/tensor` request mmap-backed residency while retaining
tensor-owned load, bind, evict, and residency semantics.
- Explicit mmap success, unsupported, validation, and platform/resource failure outcomes surfaced
deterministically through events and states.
- Maintained tests, docs, lint snapshots, benchmark snapshots, benchmark outputs, and model
artifacts updated from maintained commands when required.
**Shipped:** 2026-05-04

**Delivered:**
- Established the canonical `src/emel/io/mmap` Stateforward.SML actor with component-local
context, events, guards, actions, errors, and `emel::io::mmap::sm` ownership.
- Modeled mmap request/platform/file/offset/length/layout validation and unsupported-platform
rejection through explicit guards and transitions before any mapping attempt.
- Added real `open`+`mmap`+`munmap` paths under `#if defined(_WIN32)` selection, a
fixed-capacity slot pool (`EMEL_IO_MMAP_MAX_MAPPINGS = 256`), `event::release_mapping`
as the actor-owned unmap surface, and a deterministic mmap error taxonomy.
- Added `event::request_mapped_load` / `event::release_mapped_load`,
`lifecycle::mmap_resident`, and `sm(emel::io::mmap::sm*)` injection on `model/tensor`,
preserving tensor-owned load/bind/evict/residency orchestration with zero handle state in
tensor.
- Kept `model/loader`, maintained benchmark, paritychecker, and embedded-probe lanes off
actor internals; mmap reporting flows through public tensor surfaces only.
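As an illustration of the fixed-capacity slot pool mentioned in the list above, here is a minimal sketch that assumes a linear scan over a statically sized array. Only the capacity of 256 comes from the text (`EMEL_IO_MMAP_MAX_MAPPINGS`); `mapping_slot`, `slot_pool`, and their members are hypothetical, and the sketch performs no real mapping or unmapping.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Capacity taken from the text; every other name here is hypothetical.
constexpr std::size_t k_max_mappings = 256;

struct mapping_slot {
  void* address = nullptr;
  std::size_t length = 0;
  bool in_use = false;
};

class slot_pool {
 public:
  // Sentinel index returned when no slot is free.
  static constexpr std::size_t npos = k_max_mappings;

  // Claim the first free slot; returns npos when the pool is exhausted.
  std::size_t acquire(void* address, std::size_t length) noexcept {
    for (std::size_t i = 0; i < slots_.size(); ++i) {
      if (!slots_[i].in_use) {
        slots_[i].address = address;
        slots_[i].length = length;
        slots_[i].in_use = true;
        return i;
      }
    }
    return npos;
  }

  // Free a slot; returns false for out-of-range or already-free indices.
  bool release(std::size_t index) noexcept {
    if (index >= slots_.size() || !slots_[index].in_use) return false;
    slots_[index] = mapping_slot{};
    return true;
  }

 private:
  // Fixed storage: no allocation happens after construction.
  std::array<mapping_slot, k_max_mappings> slots_{};
};
```

A fixed array keeps capacity exhaustion an explicit, testable outcome rather than an allocation failure, which fits the deterministic error taxonomy the list describes.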

**Audit:** Final source-backed audit passed with 13/13 active requirements satisfied
(MMAP-01..03, TIO-01..03, PLAT-01, LIFE-01, ERR-01, VAL-01..04).

## Previous Shipped Milestone: v1.23 I/O Loading Strategy Boundary

@@ -367,14 +409,20 @@ truth anchor and without broadening into generic Liquid-family support.

### Active

- v1.24 adds a dedicated `src/emel/io/mmap` Stateforward.SML strategy actor for mmap-backed
- Next milestone requirements are not defined yet. Run `$gsd-new-milestone` to choose
the next scope.

### Recently Validated

- v1.25 added a dedicated `src/emel/io/read` Stateforward.SML strategy actor for read/copy
tensor loading.
- v1.24 integrates mmap-backed residency requests through the existing tensor-to-I/O boundary while
`model/tensor` remains the tensor lifecycle and residency owner.
- v1.24 models mmap support, validation, success, and failure as explicit guard/state/event
behavior without hiding runtime strategy choice in actions or detail helpers.
- v1.24 keeps staged read/copy, device-specific loading, cooperative async loading, new model
families, and broad public API expansion out of scope.
- v1.25 integrated read-backed residency requests through the public tensor-to-I/O boundary
while `model/tensor` remains the tensor lifecycle and residency owner.
- v1.25 modeled read/copy support, validation, success, and failure as explicit
guard/state/event behavior without hidden runtime behavior selection in actions or detail
helpers.
- v1.25 kept mmap changes, staged/chunked read policy, device-specific loading, cooperative
async loading, new model families, and broad public API expansion out of scope.

### Validated

@@ -582,12 +630,17 @@ only, under `src/emel/io`, with tensor residency still owned by `model/tensor`.
- **Mmap strategy scope**: `v1.24` implements only the mmap strategy behind `emel/io`. It must not
add staged read/copy, chunked, device-specific, or cooperative async loading behavior, and must
not move tensor residency lifecycle ownership out of `model/tensor`.
- **Read strategy scope**: `v1.25` implements only the read/copy strategy behind `emel/io`. It must
not change the shipped mmap runtime, add staged/chunked constrained-memory policy, add
device-specific or cooperative async loading behavior, or move tensor residency lifecycle
ownership out of `model/tensor`.

## Key Decisions

| Decision | Rationale | Outcome |
|----------|-----------|---------|
| Start v1.24 from GitHub issue #61 as the `io/mmap` loading strategy milestone | v1.23 established the `emel/io` strategy boundary and explicitly deferred concrete mmap behavior; issue #61 is the next narrow strategy path to land beneath tensor-owned residency | - Pending |
| Start v1.25 from GitHub issue #62 as the `io/read` loading strategy milestone | v1.24 shipped the mmap strategy and left read/copy as the next narrow concrete strategy path beneath tensor-owned residency | ✓ Shipped |
| Start v1.24 from GitHub issue #61 as the `io/mmap` loading strategy milestone | v1.23 established the `emel/io` strategy boundary and explicitly deferred concrete mmap behavior; issue #61 is the next narrow strategy path to land beneath tensor-owned residency | ✓ Shipped |
| Start v1.23 from GitHub issue #60 as the `emel/io` boundary milestone | v1.22 moved tensor residency ownership into `model/tensor`; the next architecture step is the explicit I/O strategy seam beneath tensor-owned residency before concrete mmap or staged strategy work lands | Phase 203 closeout cleanup |
| Start v1.22 from GitHub issue #59 as the weight-loading ownership cutover | `model/tensor` owns individual tensor lifecycle state while `model/weight_loader` still owns bulk residency transition planning; the next runtime architecture milestone should remove that split before adding future I/O strategy work | ✓ Shipped |
| Start v1.21 from GitHub issue #58 as quality-gate selective runner optimization | v1.18 and v1.19 added parity and benchmark dependency manifests; the next milestone should cash in that structure at the mandatory gate-orchestration level without weakening conservative fallback behavior | ✓ Shipped |
49 changes: 49 additions & 0 deletions .planning/RETROSPECTIVE.md
@@ -2,6 +2,55 @@

*A living document updated after each milestone. Lessons feed forward into future planning.*

## Milestone: v1.25 - I/O Read Loading Strategy

**Shipped:** 2026-05-06
**Phases:** 15 | **Plans:** 20 | **Sessions:** autonomous audit, cleanup, and closeout

### What Was Built

- Canonical `src/emel/io/read` Stateforward.SML actor for read/copy tensor loading.
- RTC-safe source-span read/copy execution into caller-owned target buffers.
- Tensor-owned public `request_read_load` route through injected `io/read`.
- Maintained model-loader read/copy evidence through public source loading, tensor plan/apply,
and `io/loader -> io/read`.
- Final closeout artifacts distinguishing the direct tensor result-carrier path from maintained
model-loader lanes.

### What Worked

- The final integration check caught an overbroad audit claim before archive.
- Phase 224 was useful as a cleanup-only phase because it closed ambiguity without changing source.
- Re-running `emel_tests_io` after the transient dyld launch failure produced fresh passing evidence.

### What Was Inefficient

- Milestone archive automation counted only the reopened active phase after earlier phase archival,
leaving duplicate/stale planning prose that needed manual cleanup.
- The audit truth around direct `request_read_load` versus maintained model-loader lanes required a
second integration pass to phrase precisely.

### Patterns Established

- Closeout audits must distinguish direct public actor routes from maintained benchmark/parity lanes
when they use different same-RTC handoff mechanisms.
- Archived phase directories should be folded under `.planning/milestones/vX.Y-phases/` before
treating the active roadmap as clean.

### Key Lessons

1. Source-backed wording needs to name the exact lane it applies to.
2. Transient local launch failures should be rerun before becoming archive-time tech debt.
3. Milestone completion should be followed by a consistency check after phase-directory cleanup.

### Cost Observations

- Model mix: not measured.
- Sessions: one autonomous closeout session with verifier and integration-checker agents.
- Notable: `commit_docs=false` skipped all planning commits, so archive changes remain local.

---

## Milestone: v1.22 - Weight Loading Ownership Cutover

**Shipped:** 2026-05-03