chore(release): 0.55.0 — PMAT-918..921 wave (convert/export runnable + GPU parity reconciled + autograd training proven)#2218
Merged
Conversation
…reconciled + autograd training proven (PMAT-918..921) Version bump 0.54.0 → 0.55.0 across all workspace Cargo.toml (101 ecosystem version pins) + Cargo.lock regen + CHANGELOG. The v0.55.0 wave = 6 merged beats: - #2208 — degate the duckdb competitive bench behind a feature; cold merge_group builds were intermittently failing the merge queue while PR heads were green. - #2209 (PMAT-918) — apr convert --quantize q4k now synthesizes the tied lm_head for tied-embedding models; the Q4K save path produced a non-runnable .apr that failed at load with "tensor not found: lm_head.weight". - #2210 (PMAT-919) — GPU/CPU parity gate reconciled against ground truth (llama.cpp, per-position): fp32-Mwv is the correct Blackwell default; HwDp4a is genuinely degraded. F2 gate now checks per-position argmax-match + min-cosine over positions >=1, replacing the last-token-only check. - #2211 — gate the coop_gemm_bench example behind opt-in cooperative-matrix; wgpu 27 dropped the Vulkan cooperative-matrix path, breaking --all-targets. - #2212 (PMAT-920) — apr export --format gguf now uses explicit head_dim for exact num_heads, and hard-fails with an actionable error instead of silently stamping a wrong num_heads into a valid-looking GGUF. - #2213 (PMAT-921) — fix the transformer FFN gelu severing the autograd graph (functional::gelu builds output via Tensor::from_vec with no grad_fn), plus a new end-to-end train-to-loss proof that catches the severed-graph class. Version-bump PR only. crates.io publish + git tag + GH release are deferred to a separate human-gated step (do NOT run make publish / cargo publish from this PR). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
v0.55.0 — version-bump PR (crates.io publish deferred to a separate human-gated step)
Bumps the aprender ecosystem 0.54.0 → 0.55.0 across all workspace
Cargo.toml(101 ecosystem version pins, 33 files) +
Cargo.lockregen +CHANGELOG.md.User-facing correctness + a reconciled GPU-parity gate + the autograd training story proven. The headline pair: (1)
apr convert/apr exportnow produce runnable models for tied-embedding architectures (a converted.aprwas missing itslm_head; an exported GGUF mis-stampednum_heads); (2) an end-to-end training proof caught that the transformer FFN was still severing the autograd graph (functional::gelu) after the v0.53/v0.54 "complete" sweep — per-layer gradchecks never saw it; a real train-to-loss test did. Each ships a named proof-obligation + a mutation-verified RED-on-bug/GREEN-on-fix falsifier + apv-validated contract.Wave = 6 merged beats
merge_groupbuilds intermittently failed the queue while PR heads were green)apr convert --quantize q4know synthesizes the tiedlm_head— fixes non-runnable.aprthat failed load with "tensor not found: lm_head.weight"Mwvis the correct Blackwell default,HwDp4ais degraded; F2 gate now per-position argmax + min-cosinecoop_gemm_benchexample behind opt-incooperative-matrix(wgpu 27 dropped the Vulkan coop-matrix path → broke--all-targets)apr export --format ggufuses explicithead_dimfor exactnum_heads, hard-fails (no GGUF written) instead of silently stamping a wrongnum_headsgelusevering the autograd graph (functional::gelu→ nograd_fn); + new end-to-end train-to-loss proof (loss 3.565 → 1.4e-5, every param group updates)Verification
version = "0.54.0"→0.55.0occurrences (all ecosystem package-versions + sibling path-dep pins; no third-party deps touched), 33Cargo.tomlfiles.cargo update --workspace --offlineregeneratedCargo.lock.cargo metadataresolves;cargo check -p aprender-corefinished clean.