PUMA v2.7.0 — Catalog expansion: Qwen3 (pending validation)
PUMA v2.7.0 Release Notes
Release date: 2026-05-16
Previous release: v2.6.0 (2026-05-16)
Branch: develop → main (post-tag)
Summary
This release consolidates Sprint 10 (catalog expansion) onto the
v2.6.0 base. It adds two Alibaba Qwen3 family entries to the
catalog — both verified against the Ollama registry before
inclusion — and formally excludes Kimi K2.6 after a 13-tag
registry probe confirmed it is not distributed via Ollama. The
catalog schema is preserved at the 8 fields used since v2.0.0;
all v2.7.0 metadata lives within the existing notes field per
the project's minimum-complexity discipline.
Highlights
Two Qwen3 entries — registry-verified before cataloguing
| Tag | Type | GGUF (verified) | Profile | Notes |
|---|---|---|---|---|
| `qwen3:30b` | Dense | 17.3 GB | `gpu-high` | Hybrid Gated DeltaNet + self-attention, 262144 context |
| `qwen3:30b-a3b` | MoE | 17.3 GB | `gpu-high` | 30B total / ~3B active per token; F8/D18 MoE caveat preserved in notes |
Both entries:
- Verified via Ollama registry manifest probe
  (`registry.ollama.ai/v2/library/qwen3/manifests/*` returned HTTP
  200 with the GGUF size derived from the sum of layer sizes).
- Declare `logprobs_supported: false` conservatively until
  empirical verification on appropriate hardware.
- Excluded from `gpu-entry` AND every `apple-silicon-*` profile by
  the P11 pending-validation invariant. Five new regression-guard
  tests in `tests/unit/test_catalog_metadata.py` pin this contract
  via exact-equality on `profiles_compatible == ['gpu-high']`.
- `params_b: 30.0` follows the `gemma4:26b-a4b` precedent (TOTAL
  when the tag encodes both numbers). The MoE/F8 caveat in the
  `notes` field documents that active-params count does NOT
  predict GGUF size or runtime VRAM consumption.
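The size derivation used by the probe can be sketched as follows. This is a minimal illustration, not PUMA code: it assumes the registry returns a Docker/OCI-style image manifest whose `layers` array carries a `size` in bytes per layer, and that sizes are reported in decimal gigabytes. `manifest_url` and `gguf_size_gb` are hypothetical helper names, and the sample manifest is made up.

```python
REGISTRY = "https://registry.ollama.ai/v2/library"


def manifest_url(repo: str, tag: str) -> str:
    """Build the manifest URL for a repo/tag pair (URL pattern from these notes)."""
    return f"{REGISTRY}/{repo}/manifests/{tag}"


def gguf_size_gb(manifest: dict) -> float:
    """Sum the byte sizes of all layers and convert to decimal GB (assumption)."""
    total_bytes = sum(layer["size"] for layer in manifest.get("layers", []))
    return round(total_bytes / 1e9, 1)


# Made-up manifest for illustration only; real layer sizes will differ.
sample_manifest = {
    "schemaVersion": 2,
    "layers": [
        {"mediaType": "application/vnd.ollama.image.model", "size": 17_299_000_000},
        {"mediaType": "application/vnd.ollama.image.params", "size": 1_000_000},
    ],
}

print(manifest_url("qwen3", "30b"))
print(gguf_size_gb(sample_manifest))
```

Summing every layer slightly overstates the weight file alone, since small parameter and template layers are included; at catalog precision (one decimal place) the difference is unlikely to matter.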
Kimi K2.6 — formally excluded after 13-tag registry probe
A registry probe on 2026-05-16 returned HTTP 404 on every
plausible Ollama tag naming for Kimi K2.6:
kimi-k2:6 kimi-k2:latest kimi-k2:1t
kimi-k2:1t-instruct kimi-k2:0905 kimi-k2:base
kimi-k2:instruct kimi:latest kimi-k2.6:latest
moonshot:latest moonshot:kimi-k2 kimi-k2-base:latest
kimi-k2-instruct:latest
The model is not distributed via the Ollama registry as of the
v2.7.0 cut. Cataloguing a non-existent ollama_tag would violate
the project's empirical-first principle (P10) and produce a
broken puma models pull command for users following the catalog
metadata. The exclusion decision is recorded in
docs/CATALOG_HISTORY.md v2.7.0 § "Considered but not catalogued"
with the full probe table for academic traceability. It may be
reconsidered in a future release if Moonshot AI or a third-party
distributor publishes K2.6 to the Ollama registry.
Deferred — known on Ollama but out of v2.7.0 scope
The registry probe confirmed these tags exist (HTTP 200) but they
are deferred from v2.7.0 for scope discipline:
| Tag | Real GGUF | Reason |
|---|---|---|
| `qwen3:32b` (dense) | 18.8 GB | Marginal upgrade over `qwen3:30b`; defer until empirical validation on `gpu-high` can distinguish them |
| `qwen3:235b-a22b` (MoE) | 132.4 GB | Requires multi-GPU rigs well beyond `gpu-high` (24+ GB VRAM); pending hardware tier extension |
| `qwen3-coder:30b`, `qwen3-coder:480b` | — | Coder family is task-specific; out of scope for PMO benchmarks |
Schema unchanged — minimum-complexity preserved
The original Sprint 10 plan proposed ~12 new YAML fields (family,
parameters_total_b, parameters_active_b, profile_recommended,
size_gb_disk_estimate, size_gb_vram_estimate, quantization,
license, release_date, capabilities, empirical_validation,
validation_blockers). The user's minimum-complexity decision
kept the catalog at the v2.0.0–v2.6.0 schema (8 fields:
ollama_tag, params_b, gguf_size_gb, context_window,
logprobs_supported, profiles_compatible, timeout_s,
notes). All v2.7.0 metadata (license, release date, MoE caveat,
validation blockers, architecture details) lives within
multi-line notes: text. src/puma/preflight/catalog.py and the
ModelEntry dataclass are byte-identical to v2.6.0.
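To make the "metadata lives in `notes`" convention concrete, here is a hedged sketch of an 8-field entry. It is not the actual `ModelEntry` from `src/puma/preflight/catalog.py`: the class name `ModelEntrySketch`, the field types, the `timeout_s` value, and the wording of the notes text are all assumptions for illustration. Only the field names and the `qwen3:30b` values stated in these notes come from the release.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ModelEntrySketch:
    """Illustrative stand-in for the 8-field catalog entry; types are assumed."""

    ollama_tag: str
    params_b: float
    gguf_size_gb: float
    context_window: int
    logprobs_supported: bool
    profiles_compatible: tuple[str, ...]
    timeout_s: int
    notes: str


entry = ModelEntrySketch(
    ollama_tag="qwen3:30b",
    params_b=30.0,
    gguf_size_gb=17.3,
    context_window=262144,
    logprobs_supported=False,
    profiles_compatible=("gpu-high",),
    timeout_s=600,  # placeholder value, not taken from the catalog
    # Extra metadata rides inside the free-text notes instead of new fields.
    notes=(
        "Architecture: hybrid Gated DeltaNet + self-attention.\n"
        "Empirical validation: pending (gpu-high hardware required).\n"
    ),
)

print([f.name for f in fields(ModelEntrySketch)])
```

The trade-off is that metadata in `notes` is human-readable but not machine-queryable; that is the cost of keeping the schema stable.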
Invariants generalised, not relaxed
The pending-validation exclusion from gpu-entry (established in
Sprint 9 for Apple Silicon entries) is reaffirmed for the new
Qwen3 entries and extended to every apple-silicon-* profile via
explicit tests:
| Test | Status | Invariant |
|---|---|---|
| `test_gemma4_family_excluded_from_gpu_entry` | PASSED (preserved) | D18/F8 (Sprint 2) |
| `test_gemma4_family_not_compatible_with_any_apple_silicon` | PASSED (preserved) | P6 extension to Apple Silicon (Sprint 9) |
| `test_qwen3_entries_excluded_from_gpu_entry` | PASSED (new) | P10/P11 (Sprint 10) |
| `test_qwen3_entries_excluded_from_all_apple_silicon` | PASSED (new) | P11 generalisation across profile families |
| `test_qwen3_entries_target_gpu_high_only` | PASSED (new) | Exact-equality anchor against accidental loosening |
The pattern is now: new entries default to the safest profile
only; loosening requires empirical evidence and an explicit
debt-tracker entry referencing the prior exclusion.
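The three Qwen3 guards could look roughly like the sketch below. The test names match the table above, but the bodies are a guess at the shape of such tests, not a copy of `tests/unit/test_catalog_metadata.py`. The `CATALOG` dict is a hypothetical in-memory stand-in for the loaded YAML.

```python
# Hypothetical stand-in for the loaded catalog; the real tests read the YAML.
CATALOG = {
    "qwen3:30b": {"profiles_compatible": ["gpu-high"]},
    "qwen3:30b-a3b": {"profiles_compatible": ["gpu-high"]},
}
QWEN3_TAGS = ("qwen3:30b", "qwen3:30b-a3b")


def test_qwen3_entries_excluded_from_gpu_entry():
    for tag in QWEN3_TAGS:
        assert "gpu-entry" not in CATALOG[tag]["profiles_compatible"]


def test_qwen3_entries_excluded_from_all_apple_silicon():
    for tag in QWEN3_TAGS:
        profiles = CATALOG[tag]["profiles_compatible"]
        assert not any(p.startswith("apple-silicon-") for p in profiles)


def test_qwen3_entries_target_gpu_high_only():
    # Exact equality: adding ANY profile, even a plausible one, fails here.
    for tag in QWEN3_TAGS:
        assert CATALOG[tag]["profiles_compatible"] == ["gpu-high"]
```

The exact-equality test is deliberately stricter than the two exclusion tests: it also fails when a profile that neither exclusion covers (for example `gpu-mid`) is added, so any loosening has to touch the test itself.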
Tests
- 402 → 407 passing (`-m "not ollama"`), 7 deselected.
- 5 new regression-guard tests in
  `tests/unit/test_catalog_metadata.py` (see invariant table
  above).
- `tests/unit/test_preflight_catalog.py::test_load_catalog_returns_all_entries`:
  entry-count expectation updated 15 → 17 to reflect the two new
  Qwen3 additions.
- `pre-commit run --all-files`: all hooks green.
- `puma validate-baseline` (triage, F1 path, fresh Ollama):
  `PASS f1=0.5831, delta=-0.0036, ±0.01`.
- `puma validate-baseline --expected-mae 5.7150` (estimation,
  fresh Ollama): `PASS mae=5.7150, delta=+0.0000, ±0.05`
  (bit-exact).
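The baseline lines above follow a simple tolerance check, sketched below. This assumes the pass criterion is "absolute difference between observed and expected metric is within the tolerance". `check_baseline` is a hypothetical helper, not the implementation behind `puma validate-baseline`. The expected F1 of 0.5867 is the baseline value quoted in the Quality section.

```python
def check_baseline(observed: float, expected: float, tolerance: float) -> tuple[bool, float]:
    """Return (passed, delta), where delta = observed - expected."""
    delta = observed - expected
    return abs(delta) <= tolerance, delta


# Triage baseline: observed F1 against the expected 0.5867, tolerance 0.01.
f1_ok, f1_delta = check_baseline(0.5831, 0.5867, 0.01)
print(f"{'PASS' if f1_ok else 'FAIL'} f1=0.5831, delta={f1_delta:+.4f}, ±0.01")

# Estimation baseline: observed MAE equals the expected value exactly.
mae_ok, mae_delta = check_baseline(5.7150, 5.7150, 0.05)
print(f"{'PASS' if mae_ok else 'FAIL'} mae=5.7150, delta={mae_delta:+.4f}, ±0.05")
```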
Quality
- Coverage: 61 % (no significant change from v2.6.0; new entries
  are YAML-only, no Python statements added).
- CI: green on both `main` and `develop`. The
  `integration-tests-ollama` job introduced in v2.5.0 continues to
  run on push to those branches.
- Baseline reproducibility: F1 = 0.5867 ± 0.01 on `triage_jira`
  preserved; MAE = 5.7150 ± 0.05 on `estimation_tawos` preserved.
- Linux + NVIDIA dispatch byte-identical to v2.6.0 (no new
  profiles, no new dispatch logic, no new code paths). The Qwen3
  entries appear in `models_for_profile('gpu-high')` only.
Design decisions
- Schema unchanged at 8 fields. Documented Sprint-10-original
  proposal of 12 new fields; chose to keep schema minimal. All
  metadata that would have required new fields now lives in the
  `notes` multi-line text. Governed by P5 (additive over
  modification) and the project's minimum-complexity discipline.
- Real Ollama tags only. Every catalogued `ollama_tag` is
  verified against `registry.ollama.ai/v2/library/<repo>/manifests/<tag>`
  before inclusion. The originally-planned `qwen3:27b` and
  `qwen3:35b-a3b` were remapped to the real `qwen3:30b` and
  `qwen3:30b-a3b` after probe; Kimi K2.6 was removed entirely
  after every plausible tag returned 404.
- Conservative `params_b` for MoE. The `qwen3:30b-a3b` entry
  declares `params_b: 30.0` following the `gemma4:26b-a4b`
  precedent: TOTAL params when the tag encodes both numbers. The
  F8/D18 caveat in `notes` documents that active-params count
  does NOT predict VRAM consumption.
- `logprobs_supported: false` conservatively. The Qwen3 family
  announces logprob support upstream, but PUMA has not yet
  empirically verified token-level confidence on these specific
  tags. Flipping to true is part of the empirical validation
  protocol when hardware becomes available.
- gpu-high as the only target. 17.3 GB GGUF exceeds gpu-mid's
  12–24 GB upper bound once OS + context overhead are accounted
  for; gpu-high (24+ GB VRAM) is the only safe default. The
  exact-equality anchor test
  `test_qwen3_entries_target_gpu_high_only` pins this so future
  loosening requires deliberate intent.
Debt tracking
- No new open debt introduced by this release.
- No closure of pre-existing debt: Sprint 10 is
  forward-looking catalog expansion, not a debt-paydown Sprint.
- Empirical validation of `qwen3:30b` and `qwen3:30b-a3b` is the
  explicit follow-up; tracked via the `notes` text on each entry
  and via the validation roadmap in
  `docs/CATALOG_HISTORY.md` § "Empirical validation roadmap".
Known limitations
Unchanged from v2.6.0:
- Single hardware tier empirically evaluated (`gpu-entry`);
  models requiring `gpu-mid` / `gpu-high` are catalogued but not
  yet validated.
- AMD ROCm not yet detected.
- All `apple-silicon-*` profiles declare
  `empirical_validation: pending` (Sprint 9 forward-work).
- TAWOS SHA-256 end-to-end fetch test pending (Gate D criterion
  3).
New in v2.7.0:
- Both `qwen3:30b` and `qwen3:30b-a3b` are catalogued with
  validation pending. Users on `gpu-high` hardware can use them
  via `puma run` and report empirical results to close the
  validation gap.
- Kimi K2.6 is not distributed via Ollama; the catalog
  intentionally omits it. If a third-party distributor publishes
  K2.6 to Ollama, a future Sprint can revisit cataloguing.
Empirical validation roadmap (when gpu-high hardware available)
The protocol for closing the validation gap is documented in
docs/CATALOG_HISTORY.md v2.7.0 § "Empirical validation roadmap":
- Pull the model via `ollama pull qwen3:30b` and verify the
  digest matches the registry manifest probed at cataloguing.
- Run the canonical baselines: triage_jira (F1) and
  estimation_tawos (MAE).
- Measure `parse_failure_rate` (should be 0 for usable models)
  and reproducibility (bit-exact under T=0.0 + seed=42 on the
  gpu-high hardware).
- If validation succeeds: bump `logprobs_supported` to true
  (after a logprobs-enabled probe), extend `profiles_compatible`
  to vetted Apple Silicon Max/Ultra variants (≥36 GB unified
  memory) pending separate Apple-side validation, and document
  the validating run_id in CATALOG_HISTORY.
- If validation fails: document the failure mode (analogous to
  gemma4 D18) in `docs/known_debt.md` and keep the entry at its
  current `[gpu-high]` restriction.
Upgrade notes
- No breaking changes. Existing CI invocations of
  `puma validate-baseline` continue to work unchanged. Linux +
  NVIDIA dispatch is byte-identical to v2.6.0.
- Catalog grows by 2 entries, total 17. `puma models list`
  shows the new entries; `models_for_profile('gpu-high')` now
  returns 6 entries (4 from earlier releases + 2 from v2.7.0).
  `models_for_profile('gpu-entry')` is unchanged.
- Schema is byte-identical to v2.6.0, so no migration is needed
  for tooling that reads the catalog YAML directly.
- New docs to know about: `docs/CATALOG_HISTORY.md` v2.7.0
  section (full Kimi probe table and deferred-variants list).
Future work pointer
Sprint 10 closes the originally-planned multi-Sprint sequence
(Sprint 8 hardening → Sprint 9 Apple Silicon → Sprint 10 catalog
expansion). The project's technical infrastructure is complete;
the open follow-ups are empirical validations gated on hardware
availability:
- MacBook Pro M4/M5: validates the Apple Silicon detection,
  native runtime mode, and cross-arch reproducibility hypotheses
  (H0/H1/H2/H3 per `docs/CROSS_ARCH_REPRODUCIBILITY.md`).
- gpu-high NVIDIA hardware (24+ GB VRAM): validates
  `qwen3:30b` and `qwen3:30b-a3b` empirically; closes the
  validation roadmap in this release notes file.
- Future catalog Sprints may add the deferred Qwen3 variants
  (`qwen3:32b`, `qwen3:235b-a22b`, `qwen3-coder:*`) and
  reconsider Kimi K2.6 if it becomes available on Ollama.
Acknowledgments
Development assistance provided by generative AI tooling. All
commits are attributed to the project's git identity per
repository convention.