chore(release): 0.55.0 — PMAT-918..921 wave (convert/export runnable + GPU parity reconciled + autograd training proven) by noahgift · Pull Request #2218 · paiml/aprender

noahgift · 2026-06-24T11:47:47Z

v0.55.0 — version-bump PR (crates.io publish deferred to a separate human-gated step)

Bumps the aprender ecosystem 0.54.0 → 0.55.0 across all workspace Cargo.toml
(101 ecosystem version pins, 33 files) + Cargo.lock regen + CHANGELOG.md.

User-facing correctness + a reconciled GPU-parity gate + the autograd training story proven. The headline pair: (1) apr convert/apr export now produce runnable models for tied-embedding architectures (a converted .apr was missing its lm_head; an exported GGUF mis-stamped num_heads); (2) an end-to-end training proof caught that the transformer FFN was still severing the autograd graph (functional::gelu) after the v0.53/v0.54 "complete" sweep — per-layer gradchecks never saw it; a real train-to-loss test did. Each ships a named proof-obligation + a mutation-verified RED-on-bug/GREEN-on-fix falsifier + a pv-validated contract.

Wave = 6 merged beats

PR	Ticket	Summary
#2208	—	Gate the duckdb competitive bench behind a feature (cold `merge_group` builds intermittently failed the queue while PR heads were green)
#2209	PMAT-918	`apr convert --quantize q4k` now synthesizes the tied `lm_head` — fixes non-runnable `.apr` that failed load with "tensor not found: lm_head.weight"
#2210	PMAT-919	GPU/CPU parity gate reconciled vs ground truth (llama.cpp, per-position): fp32-`Mwv` is the correct Blackwell default, `HwDp4a` is degraded; F2 gate now per-position argmax + min-cosine
#2211	—	Gate `coop_gemm_bench` example behind opt-in `cooperative-matrix` (wgpu 27 dropped the Vulkan coop-matrix path → broke `--all-targets`)
#2212	PMAT-920	`apr export --format gguf` uses explicit `head_dim` for exact `num_heads`, hard-fails (no GGUF written) instead of silently stamping a wrong `num_heads`
#2213	PMAT-921	Fix transformer FFN `gelu` severing the autograd graph (`functional::gelu` → no `grad_fn`); + new end-to-end train-to-loss proof (loss 3.565 → 1.4e-5, every param group updates)

Verification

Bump scope: 101 version = "0.54.0" → 0.55.0 occurrences (all ecosystem package-versions + sibling path-dep pins; no third-party deps touched), 33 Cargo.toml files.
cargo update --workspace --offline regenerated Cargo.lock.
cargo metadata resolves; cargo check -p aprender-core finished clean.

Not in this PR: git tag, GH release, and make publish / cargo publish are deferred to a separate human-gated step.

…reconciled + autograd training proven (PMAT-918..921) Version bump 0.54.0 → 0.55.0 across all workspace Cargo.toml (101 ecosystem version pins) + Cargo.lock regen + CHANGELOG. The v0.55.0 wave = 6 merged beats: - #2208 — degate the duckdb competitive bench behind a feature; cold merge_group builds were intermittently failing the merge queue while PR heads were green. - #2209 (PMAT-918) — apr convert --quantize q4k now synthesizes the tied lm_head for tied-embedding models; the Q4K save path produced a non-runnable .apr that failed at load with "tensor not found: lm_head.weight". - #2210 (PMAT-919) — GPU/CPU parity gate reconciled against ground truth (llama.cpp, per-position): fp32-Mwv is the correct Blackwell default; HwDp4a is genuinely degraded. F2 gate now checks per-position argmax-match + min-cosine over positions >=1, replacing the last-token-only check. - #2211 — gate the coop_gemm_bench example behind opt-in cooperative-matrix; wgpu 27 dropped the Vulkan cooperative-matrix path, breaking --all-targets. - #2212 (PMAT-920) — apr export --format gguf now uses explicit head_dim for exact num_heads, and hard-fails with an actionable error instead of silently stamping a wrong num_heads into a valid-looking GGUF. - #2213 (PMAT-921) — fix the transformer FFN gelu severing the autograd graph (functional::gelu builds output via Tensor::from_vec with no grad_fn), plus a new end-to-end train-to-loss proof that catches the severed-graph class. Version-bump PR only. crates.io publish + git tag + GH release are deferred to a separate human-gated step (do NOT run make publish / cargo publish from this PR). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

noahgift enabled auto-merge June 24, 2026 11:47

noahgift added this pull request to the merge queue Jun 24, 2026

github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Jun 24, 2026

noahgift added this pull request to the merge queue Jun 24, 2026

Merged via the queue into main with commit 6f7c864 Jun 24, 2026
11 checks passed

noahgift deleted the chore/release-0.55.0 branch June 24, 2026 15:05

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

chore(release): 0.55.0 — PMAT-918..921 wave (convert/export runnable + GPU parity reconciled + autograd training proven)#2218

chore(release): 0.55.0 — PMAT-918..921 wave (convert/export runnable + GPU parity reconciled + autograd training proven)#2218
noahgift merged 1 commit into
mainfrom
chore/release-0.55.0

noahgift commented Jun 24, 2026

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

noahgift commented Jun 24, 2026

v0.55.0 — version-bump PR (crates.io publish deferred to a separate human-gated step)

Wave = 6 merged beats

Verification

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant