PUMA v2.7.0 Release Notes

Release date: 2026-05-16
Previous release: v2.6.0 (2026-05-16)
Branch: develop → main (post-tag)

Summary

This release consolidates Sprint 10 (catalog expansion) onto the
v2.6.0 base. It adds two Alibaba Qwen3 family entries to the
catalog — both verified against the Ollama registry before
inclusion — and formally excludes Kimi K2.6 after a 13-tag
registry probe confirmed it is not distributed via Ollama. The
catalog schema is preserved at the 8 fields used since v2.0.0;
all v2.7.0 metadata lives within the existing notes field per
the project's minimum-complexity discipline.

Highlights

Two Qwen3 entries — registry-verified before cataloguing

Tag	Type	GGUF (verified)	Profile	Notes
`qwen3:30b`	Dense	17.3 GB	`gpu-high`	Hybrid Gated DeltaNet + self-attention, 262144 context
`qwen3:30b-a3b`	MoE	17.3 GB	`gpu-high`	30B total / ~3B active per token; F8/D18 MoE caveat preserved in `notes`

Both entries:

- Were verified via an Ollama registry manifest probe
  (registry.ollama.ai/v2/library/qwen3/manifests/* returned HTTP
  200, with the GGUF size derived from the sum of layer sizes).
- Declare logprobs_supported: false conservatively until
  empirical verification on appropriate hardware.
- Are excluded from gpu-entry AND every apple-silicon-* profile
  by the P11 pending-validation invariant. Five new
  regression-guard tests in tests/unit/test_catalog_metadata.py
  pin this contract via exact-equality on
  profiles_compatible == ['gpu-high'].
- Set params_b: 30.0, following the gemma4:26b-a4b precedent
  (TOTAL when the tag encodes both numbers). The MoE/F8 caveat in
  the notes field documents that active-params count does NOT
  predict GGUF size or runtime VRAM consumption.
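
The size derivation described above can be sketched as follows. This is a minimal illustration, not PUMA's actual probe code: the helper names manifest_url and gguf_size_gb are hypothetical, the manifest is assumed to carry a "layers" list whose items have a "size" in bytes, and the conversion to decimal gigabytes (10^9 bytes) is an assumption. The example manifest uses toy numbers, not registry data.

```python
from typing import Any

# URL pattern as quoted in these notes; the https scheme is an assumption.
REGISTRY = "https://registry.ollama.ai/v2/library"


def manifest_url(repo: str, tag: str) -> str:
    """Build the manifest URL for a library model tag (hypothetical helper)."""
    return f"{REGISTRY}/{repo}/manifests/{tag}"


def gguf_size_gb(manifest: dict[str, Any]) -> float:
    """Sum all layer sizes (bytes) and convert to decimal GB, rounded to 0.1."""
    total_bytes = sum(layer["size"] for layer in manifest.get("layers", []))
    return round(total_bytes / 1e9, 1)


# Toy manifest for illustration only; these sizes are NOT real registry values.
example_manifest = {
    "layers": [
        {"size": 2_400_000_000},  # model weights (toy value)
        {"size": 60_000_000},     # auxiliary layer (toy value)
        {"size": 40_000_000},     # auxiliary layer (toy value)
    ]
}

print(manifest_url("qwen3", "30b"))
print(gguf_size_gb(example_manifest))
```

Summing every layer rather than only the weights layer slightly overstates the GGUF size when small template or parameter layers are present; at the 0.1 GB resolution used in the catalog the difference is unlikely to matter.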

Kimi K2.6 — formally excluded after 13-tag registry probe

A registry probe on 2026-05-16 returned HTTP 404 on every
plausible Ollama tag naming for Kimi K2.6:

kimi-k2:6              kimi-k2:latest        kimi-k2:1t
kimi-k2:1t-instruct    kimi-k2:0905          kimi-k2:base
kimi-k2:instruct       kimi:latest           kimi-k2.6:latest
moonshot:latest        moonshot:kimi-k2      kimi-k2-base:latest
kimi-k2-instruct:latest

The model is not distributed via the Ollama registry as of the
v2.7.0 cut. Cataloguing a non-existent ollama_tag would violate
the project's empirical-first principle (P10) and produce a
broken puma models pull command for users following the catalog
metadata. The exclusion decision is recorded in
docs/CATALOG_HISTORY.md v2.7.0 § "Considered but not catalogued"
with the full probe table for academic traceability. It may be
reconsidered in a future release if Moonshot AI or a third-party
distributor publishes K2.6 to the Ollama registry.
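
A probe of this kind can be sketched as below. This is an illustrative reconstruction under stated assumptions, not the script used for the release: the function names are hypothetical, the https scheme is assumed, and the real registry may require request headers that this sketch does not send. The demonstration uses a stub instead of the network so that it runs offline.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code of a GET request (performs a network call)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError; the code is what the probe needs.
        return err.code


def probe_tags(
    names: Iterable[str],
    fetch: Callable[[str], int] = http_status,
) -> dict[str, int]:
    """Map each 'repo:tag' name to the status returned for its manifest URL."""
    results = {}
    for name in names:
        repo, tag = name.split(":", 1)
        url = f"https://registry.ollama.ai/v2/library/{repo}/manifests/{tag}"
        results[name] = fetch(url)
    return results


def is_absent(results: dict[str, int]) -> bool:
    """True only when at least one name was probed and every one returned 404."""
    return bool(results) and all(code == 404 for code in results.values())


# Offline demonstration: the stub stands in for the registry.
candidates = ["kimi-k2:latest", "kimi-k2.6:latest", "moonshot:kimi-k2"]
print(is_absent(probe_tags(candidates, fetch=lambda url: 404)))
```

Requiring every candidate to return 404 before declaring the model absent mirrors the exclusion rule above: a single HTTP 200 would mean a real tag exists and the decision would need revisiting.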

Deferred — known on Ollama but out of v2.7.0 scope

The registry probe confirmed these tags exist (HTTP 200) but they
are deferred from v2.7.0 for scope discipline:

Tag	Real GGUF	Reason
`qwen3:32b` (dense)	18.8 GB	Marginal upgrade over `qwen3:30b`; defer until empirical validation on gpu-high can distinguish them
`qwen3:235b-a22b` (MoE)	132.4 GB	Requires multi-GPU rigs well beyond gpu-high (24+ GB VRAM); pending hardware tier extension
`qwen3-coder:30b`, `qwen3-coder:480b`	—	Coder family is task-specific; out of scope for PMO benchmarks

Schema unchanged — minimum-complexity preserved

The original Sprint 10 plan proposed ~12 new YAML fields (family,
parameters_total_b, parameters_active_b, profile_recommended,
size_gb_disk_estimate, size_gb_vram_estimate, quantization,
license, release_date, capabilities, empirical_validation,
validation_blockers). The user's minimum-complexity decision
kept the catalog at the v2.0.0–v2.6.0 schema (8 fields:
ollama_tag, params_b, gguf_size_gb, context_window,
logprobs_supported, profiles_compatible, timeout_s,
notes). All v2.7.0 metadata (license, release date, MoE caveat,
validation blockers, architecture details) lives within
multi-line notes: text. src/puma/preflight/catalog.py and the
ModelEntry dataclass are byte-identical to v2.6.0.
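
For readers who have not seen the schema, the 8-field shape can be pictured roughly as below. This is a hypothetical sketch, not the contents of src/puma/preflight/catalog.py: the field names come from the list above, but the types, the frozen dataclass, the timeout_s value, and the exact notes wording are assumptions made for illustration.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ModelEntry:
    """Illustrative 8-field catalog entry; field types are assumed."""

    ollama_tag: str
    params_b: float
    gguf_size_gb: float
    context_window: int
    logprobs_supported: bool
    profiles_compatible: tuple[str, ...]
    timeout_s: int
    notes: str


entry = ModelEntry(
    ollama_tag="qwen3:30b",
    params_b=30.0,
    gguf_size_gb=17.3,
    context_window=262144,
    logprobs_supported=False,
    profiles_compatible=("gpu-high",),
    timeout_s=600,  # placeholder value; the real timeout is not given in these notes
    # Everything that would have needed a new field goes into free text.
    notes=(
        "architecture: hybrid Gated DeltaNet + self-attention\n"
        "validation: pending (gpu-high hardware not yet available)"
    ),
)

print([f.name for f in fields(ModelEntry)])
print(entry.notes)
```

Keeping the extra metadata in notes means tooling that reads the 8 fields keeps working unchanged, at the cost of that metadata not being machine-queryable.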

Invariants generalised, not relaxed

The pending-validation exclusion from gpu-entry (established in
Sprint 9 for Apple Silicon entries) is reaffirmed for the new
Qwen3 entries and extended to every apple-silicon-* profile via
explicit tests:

Test	Status	Invariant
`test_gemma4_family_excluded_from_gpu_entry`	PASSED (preserved)	D18/F8 (Sprint 2)
`test_gemma4_family_not_compatible_with_any_apple_silicon`	PASSED (preserved)	P6 extension to Apple Silicon (Sprint 9)
`test_qwen3_entries_excluded_from_gpu_entry`	PASSED (new)	P10/P11 (Sprint 10)
`test_qwen3_entries_excluded_from_all_apple_silicon`	PASSED (new)	P11 generalisation across profile families
`test_qwen3_entries_target_gpu_high_only`	PASSED (new)	Exact-equality anchor against accidental loosening

The pattern is now: new entries default to the safest profile
only; loosening requires empirical evidence and an explicit
debt-tracker entry referencing the prior exclusion.
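
The three new guards can be pictured along these lines. The test names are taken from the table above, but the bodies are a sketch: the real tests in tests/unit/test_catalog_metadata.py presumably load the YAML catalog, whereas this example uses a small stand-in dictionary (CATALOG and QWEN3_TAGS are hypothetical names).

```python
# Stand-in for the loaded catalog: tag -> profiles_compatible.
CATALOG = {
    "qwen3:30b": ["gpu-high"],
    "qwen3:30b-a3b": ["gpu-high"],
}
QWEN3_TAGS = ["qwen3:30b", "qwen3:30b-a3b"]


def test_qwen3_entries_excluded_from_gpu_entry():
    for tag in QWEN3_TAGS:
        assert "gpu-entry" not in CATALOG[tag]


def test_qwen3_entries_excluded_from_all_apple_silicon():
    for tag in QWEN3_TAGS:
        assert not any(p.startswith("apple-silicon-") for p in CATALOG[tag])


def test_qwen3_entries_target_gpu_high_only():
    # Exact equality: adding ANY profile fails, not only the known-unsafe ones.
    for tag in QWEN3_TAGS:
        assert CATALOG[tag] == ["gpu-high"]


if __name__ == "__main__":
    test_qwen3_entries_excluded_from_gpu_entry()
    test_qwen3_entries_excluded_from_all_apple_silicon()
    test_qwen3_entries_target_gpu_high_only()
    print("all invariants hold")
```

The first two tests are deny-lists and would still pass if a profile such as gpu-mid were added; the exact-equality test is what turns any loosening into a visible failure.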

Tests

- 402 → 407 passing (-m "not ollama"), 7 deselected.
- 5 new regression-guard tests in
  tests/unit/test_catalog_metadata.py (see invariant table
  above).
- tests/unit/test_preflight_catalog.py::test_load_catalog_returns_all_entries:
  entry-count expectation updated 15 → 17 to reflect the two new
  Qwen3 additions.
- pre-commit run --all-files: all hooks green.
- puma validate-baseline (triage, F1 path, fresh Ollama):
  PASS f1=0.5831, delta=-0.0036, ±0.01.
- puma validate-baseline --expected-mae 5.7150 (estimation,
  fresh Ollama): PASS mae=5.7150, delta=+0.0000, ±0.05
  (bit-exact).
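
The PASS lines above follow simple tolerance arithmetic, sketched here for clarity. The function name check_baseline is hypothetical and the inclusive comparison is an assumption; this is not the implementation behind puma validate-baseline. The triage expectation of 0.5867 is the baseline F1 reported under Quality.

```python
def check_baseline(
    observed: float, expected: float, tolerance: float
) -> tuple[bool, float]:
    """Return (passed, delta), where delta = observed - expected."""
    delta = observed - expected
    return abs(delta) <= tolerance, delta


# Triage: F1 0.5831 against the 0.5867 baseline, tolerance 0.01.
triage_ok, triage_delta = check_baseline(0.5831, 0.5867, 0.01)
print(f"{'PASS' if triage_ok else 'FAIL'} f1=0.5831, delta={triage_delta:+.4f}")

# Estimation: MAE 5.7150 against --expected-mae 5.7150, tolerance 0.05.
est_ok, est_delta = check_baseline(5.7150, 5.7150, 0.05)
print(f"{'PASS' if est_ok else 'FAIL'} mae=5.7150, delta={est_delta:+.4f}")
```

A delta of -0.0036 sits well inside the ±0.01 band, so the triage run passes without being bit-exact; the estimation run has a delta of zero.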

Quality

- Coverage: 61 % (no significant change from v2.6.0; new entries
  are YAML-only, no Python statements added).
- CI: green on both main and develop. The
  integration-tests-ollama job introduced in v2.5.0 continues to
  run on push to those branches.
- Baseline reproducibility: F1 = 0.5867 ± 0.01 on triage_jira
  preserved; MAE = 5.7150 ± 0.05 on estimation_tawos preserved.
- Linux + NVIDIA dispatch: byte-identical to v2.6.0 (no new
  profiles, no new dispatch logic, no new code paths). The Qwen3
  entries appear in models_for_profile('gpu-high') only.

Design decisions

- Schema unchanged at 8 fields. The original Sprint 10 plan
  proposed 12 new fields; the schema was kept minimal instead.
  All metadata that would have required new fields now lives in
  the multi-line notes text. Governed by P5 (additive over
  modification) and the project's minimum-complexity discipline.
- Real Ollama tags only. Every catalogued ollama_tag is
  verified against registry.ollama.ai/v2/library/<repo>/manifests/<tag>
  before inclusion. The originally-planned qwen3:27b and
  qwen3:35b-a3b were remapped to the real qwen3:30b and
  qwen3:30b-a3b after the probe; Kimi K2.6 was removed entirely
  after every plausible tag returned 404.
- Conservative params_b for MoE. The qwen3:30b-a3b entry
  declares params_b: 30.0 following the gemma4:26b-a4b
  precedent: TOTAL params when the tag encodes both numbers. The
  F8/D18 caveat in notes documents that active-params count
  does NOT predict VRAM consumption.
- logprobs_supported: false conservatively. The Qwen3 family
  announces logprob support upstream, but PUMA has not yet
  empirically verified token-level confidence on these specific
  tags. Flipping to true is part of the empirical validation
  protocol when hardware becomes available.
- gpu-high as the only target. A 17.3 GB GGUF leaves too little
  headroom in gpu-mid's 12–24 GB VRAM range once OS and context
  overhead are accounted for; gpu-high (24+ GB VRAM) is the only
  safe default. The exact-equality anchor test
  test_qwen3_entries_target_gpu_high_only pins this so that
  future loosening requires deliberate intent.

Debt tracking

- No new open debt introduced by this release.
- No closure of pre-existing debt: Sprint 10 is
  forward-looking catalog expansion, not a debt-paydown Sprint.
- Empirical validation of qwen3:30b and qwen3:30b-a3b is the
  explicit follow-up; tracked via the notes text on each entry
  and via the validation roadmap in
  docs/CATALOG_HISTORY.md § "Empirical validation roadmap".

Known limitations

Unchanged from v2.6.0:

- Single hardware tier empirically evaluated (gpu-entry);
  models requiring gpu-mid/gpu-high are catalogued but not
  yet validated.
- AMD ROCm not yet detected.
- All apple-silicon-* profiles declare
  empirical_validation: pending (Sprint 9 forward-work).
- TAWOS SHA-256 end-to-end fetch test pending (Gate D criterion
  3).

New in v2.7.0:

- Both qwen3:30b and qwen3:30b-a3b are catalogued with
  validation pending. Users on gpu-high hardware can use them
  via puma run and report empirical results to close the
  validation gap.
- Kimi K2.6 is not distributed via Ollama; the catalog
  intentionally omits it. If a third-party distributor publishes
  K2.6 to Ollama, a future Sprint can revisit cataloguing.

Empirical validation roadmap (when gpu-high hardware available)

The protocol for closing the validation gap is documented in
docs/CATALOG_HISTORY.md v2.7.0 § "Empirical validation roadmap":

1. Pull the model via ollama pull qwen3:30b and verify the
   digest matches the registry manifest probed at cataloguing.
2. Run the canonical baselines: triage_jira (F1) and
   estimation_tawos (MAE).
3. Measure parse_failure_rate (should be 0 for usable models)
   and reproducibility (bit-exact under T=0.0 + seed=42 on the
   gpu-high hardware).
4. If validation succeeds: bump logprobs_supported to true
   (after a logprobs-enabled probe), extend profiles_compatible
   to vetted Apple Silicon Max/Ultra variants (≥36 GB unified
   memory) pending separate Apple-side validation, and document
   the validating run_id in CATALOG_HISTORY.
5. If validation fails: document the failure mode (analogous to
   gemma4 D18) in docs/known_debt.md and keep the entry at its
   current [gpu-high] restriction.
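
The measurement step of the protocol above (parse failure rate plus bit-exact reproducibility) can be sketched as follows. All names here are hypothetical and the toy parser is an assumption; PUMA's own metrics code may compute these differently.

```python
import hashlib
from typing import Callable, Optional, Sequence


def parse_failure_rate(
    raw_outputs: Sequence[str],
    parse: Callable[[str], Optional[str]],
) -> float:
    """Fraction of outputs for which the parser returns None."""
    if not raw_outputs:
        raise ValueError("no outputs to evaluate")
    failures = sum(1 for text in raw_outputs if parse(text) is None)
    return failures / len(raw_outputs)


def fingerprint(outputs: Sequence[str]) -> str:
    """SHA-256 over the ordered outputs; equal digests mean bit-exact runs."""
    digest = hashlib.sha256()
    for text in outputs:
        digest.update(text.encode("utf-8"))
        digest.update(b"\x00")  # separator so ["ab", "c"] differs from ["a", "bc"]
    return digest.hexdigest()


def validation_passes(
    run_a: Sequence[str],
    run_b: Sequence[str],
    parse: Callable[[str], Optional[str]],
) -> bool:
    """Pass only if nothing fails to parse and two runs match bit for bit."""
    return (
        parse_failure_rate(run_a, parse) == 0.0
        and fingerprint(run_a) == fingerprint(run_b)
    )


# Toy parser: accepts two labels, rejects anything else.
def toy_parse(text: str) -> Optional[str]:
    return text if text in {"bug", "feature"} else None


print(parse_failure_rate(["bug", "???"], toy_parse))
print(validation_passes(["bug", "feature"], ["bug", "feature"], toy_parse))
```

Comparing digests of the full ordered output is stricter than comparing aggregate metrics: two runs can produce the same F1 while differing in individual predictions, which would not count as bit-exact.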

Upgrade notes

- No breaking changes. Existing CI invocations of
  puma validate-baseline continue to work unchanged. Linux +
  NVIDIA dispatch is byte-identical to v2.6.0.
- Catalog grows by 2 entries, total 17. puma models list
  shows the new entries; models_for_profile('gpu-high') now
  returns 6 entries (4 from earlier releases + 2 from v2.7.0).
  models_for_profile('gpu-entry') is unchanged.
- Schema is byte-identical to v2.6.0: no migration needed
  for tooling that reads the catalog YAML directly.
- New docs to know about: docs/CATALOG_HISTORY.md v2.7.0
  section (full Kimi probe table and deferred-variants list).
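
The effect on profile lookups can be illustrated with a stand-in filter. This sketch only mimics the behaviour described above and is not PUMA's models_for_profile; the catalog below is a toy mapping in which only the two qwen3 tags are real, and the "earlier-N" keys are placeholders for the four gpu-high entries from earlier releases, whose tags these notes do not list.

```python
from typing import Mapping, Sequence


def models_for_profile(
    catalog: Mapping[str, Sequence[str]], profile: str
) -> list[str]:
    """Return the tags whose profiles_compatible list contains `profile`."""
    return [tag for tag, profiles in catalog.items() if profile in profiles]


# Toy catalog: tag -> profiles_compatible ("earlier-N" are placeholders).
catalog = {
    "earlier-1": ["gpu-high"],
    "earlier-2": ["gpu-high"],
    "earlier-3": ["gpu-high"],
    "earlier-4": ["gpu-high"],
    "qwen3:30b": ["gpu-high"],
    "qwen3:30b-a3b": ["gpu-high"],
}

print(len(models_for_profile(catalog, "gpu-high")))
print(models_for_profile(catalog, "gpu-entry"))
```

Because the new entries list gpu-high alone, a lookup for any other profile returns exactly what it returned before the release.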

Future work pointer

Sprint 10 closes the originally-planned multi-Sprint sequence
(Sprint 8 hardening → Sprint 9 Apple Silicon → Sprint 10 catalog
expansion). The project's technical infrastructure is complete;
the open follow-ups are empirical validations gated on hardware
availability:

- MacBook Pro M4/M5: validates the Apple Silicon detection,
  native runtime mode, and cross-arch reproducibility hypotheses
  (H0/H1/H2/H3 per docs/CROSS_ARCH_REPRODUCIBILITY.md).
- gpu-high NVIDIA hardware (24+ GB VRAM): validates
  qwen3:30b and qwen3:30b-a3b empirically; closes the
  validation roadmap in these release notes.
- Future catalog Sprints may add the deferred Qwen3 variants
  (qwen3:32b, qwen3:235b-a22b, qwen3-coder:*) and
  reconsider Kimi K2.6 if it becomes available on Ollama.

Acknowledgments

Development assistance provided by generative AI tooling. All
commits are attributed to the project's git identity per
repository convention.
