
chore(release): 0.55.0 — PMAT-918..921 wave (convert/export runnable + GPU parity reconciled + autograd training proven)#2218

Merged
noahgift merged 1 commit into main from chore/release-0.55.0 on Jun 24, 2026


@noahgift

Contributor

v0.55.0 — version-bump PR (crates.io publish deferred to a separate human-gated step)

Bumps the aprender ecosystem 0.54.0 → 0.55.0 across all workspace Cargo.toml
(101 ecosystem version pins, 33 files) + Cargo.lock regen + CHANGELOG.md.

This release delivers user-facing correctness fixes, a reconciled GPU-parity gate, and proof that the autograd training path works end to end. The headline pair: (1) apr convert/apr export now produce runnable models for tied-embedding architectures (a converted .apr was missing its lm_head; an exported GGUF mis-stamped num_heads); (2) an end-to-end training proof caught that the transformer FFN was still severing the autograd graph (via functional::gelu) after the v0.53/v0.54 "complete" sweep. Per-layer gradchecks never saw it; a real train-to-loss test did. Each fix ships with a named proof obligation, a mutation-verified falsifier (RED on the bug, GREEN on the fix), and a pv-validated contract.
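The severed-graph bug class behind PMAT-921 can be illustrated with a toy scalar autograd tape. This is a minimal sketch, not aprender's code: Tape, Op, gelu_detached, grad_w, and the tanh-approximation GELU are hypothetical stand-ins. gelu_detached mimics an op whose output is rebuilt from raw values with no backward edge (analogous to a tensor created without a grad_fn): the forward value is unchanged, but the gradient reaching the upstream parameter w drops to zero.

```rust
use std::f64::consts::PI;

/// GELU, tanh approximation (an assumption; a real kernel may use erf instead).
fn gelu_tanh(x: f64) -> f64 {
    let c = (2.0 / PI).sqrt();
    0.5 * x * (1.0 + (c * (x + 0.044715 * x.powi(3))).tanh())
}

/// Analytic derivative of gelu_tanh with respect to x.
fn gelu_tanh_grad(x: f64) -> f64 {
    let c = (2.0 / PI).sqrt();
    let t = (c * (x + 0.044715 * x.powi(3))).tanh();
    0.5 * (1.0 + t) + 0.5 * x * (1.0 - t * t) * c * (1.0 + 3.0 * 0.044715 * x * x)
}

/// How each tape node was produced (hypothetical toy ops).
#[derive(Clone, Copy)]
enum Op {
    Leaf,
    Mul(usize, usize),
    Gelu(usize),
    /// Value computed from an input, but recorded with no backward edge.
    Detached,
}

#[derive(Default)]
struct Tape {
    vals: Vec<f64>,
    ops: Vec<Op>,
}

impl Tape {
    fn push(&mut self, v: f64, op: Op) -> usize {
        self.vals.push(v);
        self.ops.push(op);
        self.vals.len() - 1
    }
    fn leaf(&mut self, v: f64) -> usize {
        self.push(v, Op::Leaf)
    }
    fn mul(&mut self, a: usize, b: usize) -> usize {
        let v = self.vals[a] * self.vals[b];
        self.push(v, Op::Mul(a, b))
    }
    /// Graph-preserving GELU: records an edge back to its input.
    fn gelu(&mut self, x: usize) -> usize {
        let v = gelu_tanh(self.vals[x]);
        self.push(v, Op::Gelu(x))
    }
    /// Buggy GELU: same forward value, but a fresh node with no edge to x.
    fn gelu_detached(&mut self, x: usize) -> usize {
        let v = gelu_tanh(self.vals[x]);
        self.push(v, Op::Detached)
    }
    /// Reverse-mode sweep; returns d(out)/d(node) for every node.
    fn backward(&self, out: usize) -> Vec<f64> {
        let mut g = vec![0.0; self.vals.len()];
        g[out] = 1.0;
        for i in (0..=out).rev() {
            let gi = g[i];
            match self.ops[i] {
                Op::Leaf | Op::Detached => {}
                Op::Mul(a, b) => {
                    g[a] += gi * self.vals[b];
                    g[b] += gi * self.vals[a];
                }
                Op::Gelu(x) => g[x] += gi * gelu_tanh_grad(self.vals[x]),
            }
        }
        g
    }
}

/// Gradient of loss = gelu(w * x)^2 with respect to the parameter w.
fn grad_w(w: f64, x: f64, detached: bool) -> f64 {
    let mut t = Tape::default();
    let (wi, xi) = (t.leaf(w), t.leaf(x));
    let h = t.mul(wi, xi);
    let a = if detached { t.gelu_detached(h) } else { t.gelu(h) };
    let loss = t.mul(a, a);
    t.backward(loss)[wi]
}
```

Because the forward value is identical in both variants, nothing in a forward pass reveals the bug; checking that every parameter group receives a nonzero gradient (or actually moves during training) is what exposes it, which matches the "every param group updates" criterion of the new train-to-loss proof.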

Wave = 6 merged beats

| PR | Ticket | Summary |
|---|---|---|
| #2208 | | Gate the duckdb competitive bench behind a feature (cold merge_group builds intermittently failed the queue while PR heads were green) |
| #2209 | PMAT-918 | apr convert --quantize q4k now synthesizes the tied lm_head; fixes a non-runnable .apr that failed to load with "tensor not found: lm_head.weight" |
| #2210 | PMAT-919 | GPU/CPU parity gate reconciled against ground truth (llama.cpp, per-position): fp32-Mwv is the correct Blackwell default; HwDp4a is degraded. The F2 gate is now per-position argmax + min-cosine |
| #2211 | | Gate the coop_gemm_bench example behind opt-in cooperative-matrix (wgpu 27 dropped the Vulkan coop-matrix path, which broke --all-targets) |
| #2212 | PMAT-920 | apr export --format gguf uses explicit head_dim for exact num_heads and hard-fails (no GGUF written) instead of silently stamping a wrong num_heads |
| #2213 | PMAT-921 | Fix the transformer FFN gelu severing the autograd graph (functional::gelu output had no grad_fn), plus a new end-to-end train-to-loss proof (loss 3.565 → 1.4e-5, every param group updates) |
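The reconciled F2 check from #2210 can be sketched as a pure function over per-position logits. This is a minimal sketch under assumptions: parity_gate, ParityError, the f32 logits layout (one row per position), and the min_cos threshold are hypothetical, and it reads the PR as requiring argmax agreement at every position and cosine similarity only from position 1 onward.

```rust
/// Index of the largest logit (first one wins on ties).
fn argmax(v: &[f32]) -> usize {
    let mut best = 0;
    for (i, &x) in v.iter().enumerate() {
        if x > v[best] {
            best = i;
        }
    }
    best
}

/// Cosine similarity accumulated in f64; returns 0.0 if either vector is all zeros.
fn cosine(a: &[f32], b: &[f32]) -> f64 {
    let (mut dot, mut na, mut nb) = (0.0f64, 0.0f64, 0.0f64);
    for (&x, &y) in a.iter().zip(b) {
        let (x, y) = (x as f64, y as f64);
        dot += x * y;
        na += x * x;
        nb += y * y;
    }
    if na == 0.0 || nb == 0.0 {
        return 0.0;
    }
    dot / (na.sqrt() * nb.sqrt())
}

#[derive(Debug, PartialEq)]
enum ParityError {
    ShapeMismatch,
    ArgmaxMismatch { position: usize },
    LowCosine { position: usize, cosine: f64 },
}

/// Per-position parity gate. On success, returns the worst cosine over positions >= 1.
fn parity_gate(reference: &[Vec<f32>], candidate: &[Vec<f32>], min_cos: f64) -> Result<f64, ParityError> {
    if reference.len() != candidate.len() {
        return Err(ParityError::ShapeMismatch);
    }
    let (mut worst, mut worst_pos) = (1.0f64, 0usize);
    for (pos, (r, c)) in reference.iter().zip(candidate).enumerate() {
        if r.len() != c.len() {
            return Err(ParityError::ShapeMismatch);
        }
        // Every position must predict the same next token.
        if argmax(r) != argmax(c) {
            return Err(ParityError::ArgmaxMismatch { position: pos });
        }
        // Cosine is only tracked from position 1 onward.
        if pos >= 1 {
            let cs = cosine(r, c);
            if cs < worst {
                worst = cs;
                worst_pos = pos;
            }
        }
    }
    if worst < min_cos {
        return Err(ParityError::LowCosine { position: worst_pos, cosine: worst });
    }
    Ok(worst)
}
```

Compared with a last-token-only check, scanning every position means a GPU path that drifts mid-sequence but happens to agree on the final token still fails the gate.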

Verification

  • Bump scope: 101 occurrences of version = "0.54.0" → "0.55.0" (all ecosystem package versions plus sibling path-dependency pins; no third-party dependencies touched) across 33 Cargo.toml files.
  • cargo update --workspace --offline regenerated Cargo.lock.
  • cargo metadata resolves; cargo check -p aprender-core finished clean.

Not in this PR: git tag, GH release, and make publish / cargo publish are deferred to a separate human-gated step.

…reconciled + autograd training proven (PMAT-918..921)

Version bump 0.54.0 → 0.55.0 across all workspace Cargo.toml (101 ecosystem
version pins) + Cargo.lock regen + CHANGELOG. The v0.55.0 wave = 6 merged beats:

- #2208 — gate the duckdb competitive bench behind a feature; cold merge_group
  builds were intermittently failing the merge queue while PR heads were green.
- #2209 (PMAT-918) — apr convert --quantize q4k now synthesizes the tied lm_head
  for tied-embedding models; the Q4K save path produced a non-runnable .apr that
  failed at load with "tensor not found: lm_head.weight".
- #2210 (PMAT-919) — GPU/CPU parity gate reconciled against ground truth
  (llama.cpp, per-position): fp32-Mwv is the correct Blackwell default; HwDp4a is
  genuinely degraded. F2 gate now checks per-position argmax-match + min-cosine
  over positions >=1, replacing the last-token-only check.
- #2211 — gate the coop_gemm_bench example behind opt-in cooperative-matrix;
  wgpu 27 dropped the Vulkan cooperative-matrix path, breaking --all-targets.
- #2212 (PMAT-920) — apr export --format gguf now uses explicit head_dim for exact
  num_heads, and hard-fails with an actionable error instead of silently stamping
  a wrong num_heads into a valid-looking GGUF.
- #2213 (PMAT-921) — fix the transformer FFN gelu severing the autograd graph
  (functional::gelu builds output via Tensor::from_vec with no grad_fn), plus a
  new end-to-end train-to-loss proof that catches the severed-graph class.
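The #2212 rule (derive num_heads from an explicit head_dim and refuse to write on any inconsistency) might look like the following. It is a sketch under assumptions: exact_num_heads, ExportError, and the use of the query-projection output width (q_out) as the numerator are hypothetical, not aprender's actual export code. For architectures where num_heads × head_dim differs from hidden_size, dividing hidden_size by head_dim yields the wrong count, which is the kind of value that ends up silently stamped into an otherwise valid-looking GGUF.

```rust
#[derive(Debug, PartialEq)]
enum ExportError {
    ZeroHeadDim,
    NotDivisible { q_out: usize, head_dim: usize },
    Conflict { derived: usize, declared: usize },
}

/// Derive num_heads from the query projection width and an explicit head_dim.
/// Any inconsistency is returned as an error, so the caller writes no GGUF at all.
fn exact_num_heads(q_out: usize, head_dim: usize, declared: Option<usize>) -> Result<usize, ExportError> {
    if head_dim == 0 {
        return Err(ExportError::ZeroHeadDim);
    }
    // A non-integer head count means the shapes and metadata disagree.
    if q_out % head_dim != 0 {
        return Err(ExportError::NotDivisible { q_out, head_dim });
    }
    let derived = q_out / head_dim;
    // If the source config also declares a head count, it must match exactly.
    match declared {
        Some(d) if d != derived => Err(ExportError::Conflict { derived, declared: d }),
        _ => Ok(derived),
    }
}
```

Returning a Result instead of falling back to a guessed value keeps the failure loud: an error with the conflicting numbers is actionable, while a wrong num_heads only surfaces later as broken inference.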

Version-bump PR only. crates.io publish + git tag + GH release are deferred to a
separate human-gated step (do NOT run make publish / cargo publish from this PR).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@noahgift enabled auto-merge June 24, 2026 11:47
@noahgift added this pull request to the merge queue Jun 24, 2026
github-merge-queue (bot) removed this pull request from the merge queue due to failed status checks Jun 24, 2026
@noahgift added this pull request to the merge queue Jun 24, 2026
Merged via the queue into main with commit 6f7c864 Jun 24, 2026
11 checks passed
@noahgift deleted the chore/release-0.55.0 branch June 24, 2026 15:05