PUMA v2.3.0 Release Notes

Release date: 2026-05-13
Previous release: v2.2.0 (2026-05-13)
Branch: develop → main (post-tag)

Summary

This release consolidates Sprint 6 (dashboard polish + structural
refactor) and retrospective documentation work (INDEX.md +
docs/overview.md + README branding) onto the v2.2.0 base. With this
release, Phase C of the master plan is fully complete.

Highlights

Dashboard production-quality (Sprint 6)

Major refactor: app.py reduced from an 803-LOC monolith to a 168-LOC
router (-79 %). View logic is delegated to seven modules in
src/puma/dashboard/views/. Each view is independently importable
and testable; the router publishes filters to st.session_state and
dispatches via a VIEWS dict.
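
The router pattern described above can be sketched roughly as follows. This is a minimal illustration, not the actual app.py: the render functions, the "filters" key, and the filter contents are hypothetical, and a plain dict stands in for st.session_state so the sketch runs without Streamlit.

```python
from typing import Any, Callable, Dict, MutableMapping

State = MutableMapping[str, Any]


def render_overview(state: State) -> str:
    # Hypothetical view: reads the filters the router published.
    return f"Overview for model={state['filters']['model']}"


def render_drilldown(state: State) -> str:
    # Hypothetical view: same contract, different content.
    return f"Instance Drill-down for model={state['filters']['model']}"


# Maps a view title to the callable that renders it.
VIEWS: Dict[str, Callable[[State], str]] = {
    "Overview": render_overview,
    "Instance Drill-down": render_drilldown,
}


def route(state: State, view_name: str, filters: Dict[str, Any]) -> str:
    """Publish filters to shared state, then dispatch to the chosen view."""
    state["filters"] = filters  # assumed key name
    view = VIEWS.get(view_name)
    if view is None:
        raise KeyError(f"Unknown view: {view_name!r}")
    return view(state)


session_state: State = {}  # stand-in for st.session_state
output = route(session_state, "Overview", {"model": "example-model"})
print(output)
```

Because each view only depends on the shared state mapping, it can be imported and exercised in a test without starting the router.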

Ten polish improvements applied:

#	Improvement	Impact
1	`@st.cache_data(ttl=60)` on 7 loaders	Performance
2	`st.spinner` on slow operations	UX
3	CSV export on 4 tables	Productivity
4	Tooltips on ≈ 12 metric cards	UX
5	Unified empty-filtered-state component	UX
6	Friendly expander titles in Overview	UX
7	Module-level imports (no more inline)	Code quality
8	Emoji prefixes consistent across 7 view titles	UX
9	Dark-mode dataframe text legibility	UX (bug fix)
10	Empty-selectbox guard in Instance Drill-down	Robustness

Plus: a first-visit guided tour with a view overview and tips
(download CSV, dark mode, tooltips). Dismissal is stored in
st.session_state["tour_dismissed"] and so lasts for the current
session; a "📖 Show tour" button in the sidebar re-opens it.
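
A minimal sketch of the tour's dismiss/re-open state handling, assuming the flag is a plain boolean under the "tour_dismissed" key. The helper names are hypothetical, and a dict stands in for st.session_state so the logic can be run on its own.

```python
from typing import Any, MutableMapping

TOUR_KEY = "tour_dismissed"  # key named in these release notes


def should_show_tour(state: MutableMapping[str, Any]) -> bool:
    # On a first visit the key is absent, so the tour is shown.
    return not state.get(TOUR_KEY, False)


def dismiss_tour(state: MutableMapping[str, Any]) -> None:
    # Called when the user closes the tour.
    state[TOUR_KEY] = True


def reopen_tour(state: MutableMapping[str, Any]) -> None:
    # Called by the sidebar "Show tour" button.
    state[TOUR_KEY] = False


state: MutableMapping[str, Any] = {}  # stand-in for st.session_state
first_visit = should_show_tour(state)
dismiss_tour(state)
after_dismiss = should_show_tour(state)
reopen_tour(state)
after_reopen = should_show_tour(state)
```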

Documentation structure (Phase E.bis retrospective + Phase E.ter)

INDEX.md (root, uppercase): project status, phases, releases,
debt tracking, architecture entry points. Created in Phase E.bis;
this release updates it for v2.3.0 status.
docs/overview.md (new location): preserves the 256 LOC of
architectural content from the legacy lowercase index.md.
README.md: branded header with PUMA logo, descriptive
blockquote, and Related-Resources section linking to puma-vault,
the published knowledge garden, releases, INDEX.md, and
docs/overview.md.

Quality

Tests: 318 passing (up from 313 in v2.2.0; +5 dashboard smoke
tests covering view module integrity, polish helpers, cache
decorator presence, and the end-to-end AppTest render with the
live database).
Coverage: 58 % (up from 55 % in v2.2.0).
Pre-commit: 10/10 hooks green.
CI: green on both main and develop.
Baseline reproducibility: F1 = 0.5867 ± 0.01 holds; verified
via puma validate-baseline (PASS at 0.5831, delta −0.0036).
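
The tolerance check behind the baseline figures can be reproduced with a few lines of arithmetic. This is only a sketch of the comparison implied by the numbers above, not the implementation of puma validate-baseline; the function name and return shape are assumptions.

```python
from typing import Tuple

BASELINE_F1 = 0.5867  # reference value from these notes
TOLERANCE = 0.01      # allowed absolute deviation


def check_baseline(
    observed_f1: float,
    baseline: float = BASELINE_F1,
    tolerance: float = TOLERANCE,
) -> Tuple[bool, float]:
    """Return (passed, delta), where delta = observed - baseline."""
    delta = observed_f1 - baseline
    return abs(delta) <= tolerance, delta


passed, delta = check_baseline(0.5831)
print("PASS" if passed else "FAIL", f"delta={delta:+.4f}")
```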

Methodological findings (academic traceability)

Sprint 6 surfaced one additional finding consistent with the
meta-pattern documented in docs/known_debt.md ("symptom in layer
N, root cause in layer M ≠ N"):

Dark-mode dataframe invisibility. The CSS rule applied
light-mode colours globally; under dark mode, table text inherited
light-mode colours against the dark background, rendering tables
nearly unreadable. Symptom (invisible tables) appeared in the
dashboard layer; root cause (CSS scope without theme awareness)
was in the styling layer. Resolved in the same commit as the
refactor by adding a theme-aware CSS override
(color: #E5E7EB + background-color: #16213E when
dark_mode == True).
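
A theme-aware override of this kind might look like the sketch below. Only the two colour values come from this release; the function name and the CSS selector are assumptions, and the real rule in the dashboard may target different elements.

```python
def dataframe_css(dark_mode: bool) -> str:
    """Return a CSS override for table text, or an empty string in light mode."""
    if not dark_mode:
        # Light mode keeps the global rule untouched.
        return ""
    return (
        "<style>\n"
        '[data-testid="stDataFrame"] {\n'  # hypothetical selector
        "  color: #E5E7EB;\n"
        "  background-color: #16213E;\n"
        "}\n"
        "</style>"
    )


dark_css = dataframe_css(True)
light_css = dataframe_css(False)
```

In a Streamlit app, a string like this would typically be injected with st.markdown(css, unsafe_allow_html=True); scoping the override to the dark_mode flag is what keeps the light-mode colours from leaking into the dark theme.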

This brings the meta-pattern catalogue to five instances (D15, D18,
D21, D22, and this CSS scope issue); the fifth is retired in the
same commit that surfaced it.

CI workflow hygiene

The .github/workflows/release.yml fix introduced in Phase E.bis
(commit 863c166) is now exercised end-to-end by the v2.3.0 tag
push. After the tag was pushed and gh release create ran, exactly
one release was created (no duplicate draft). The fix is verified
effective for v2.X.0 releases going forward.

Debt tracking

No new open debt introduced by this release.
Total resolved across v2.0.0 → v2.3.0: 15 of 24 items (62.5 %).
Phase C: ✓ COMPLETE (was the last open phase; all five
Gate-C criteria met).

Full inventory and diagnostic write-ups in
docs/known_debt.md.

Known limitations

Single hardware tier evaluated (gpu-entry); models requiring
gpu-mid and above (qwen2.5:14b, gemma3:27b, deepseek-r1:14b,
the gemma4 family, llama3.1:70b) are catalogued but not yet
empirically evaluated.
AMD ROCm and Apple Metal backends are not yet detected (development
hardware is NVIDIA-only).
TAWOS SHA-256 end-to-end fetch test pending (Gate D criterion 3).
input_text not persisted in triage_jira instances (D22, Low —
future data-pipeline enhancement). The Dashboard Instance
Drill-down handles this gracefully with an informative message.

Master plan status (post-v2.3.0)

Phase	Status
A — Foundations	✓ COMPLETE
B — Multi-model sweep	✓ COMPLETE
C — Professional dashboard	✓ COMPLETE (this release)
D — Technical depth	✓ ~95 % (ROCm/Metal n/a on current hardware)
E — Documentation and releases	✓ COMPLETE (v2.0.0, v2.1.0, v2.2.0, v2.3.0)

All five phases of the original master plan are now complete or
effectively complete (Phase D's remaining items are
hardware-dependent or scope-deferred).

Upgrade notes

No breaking changes to the public CLI or YAML run-spec schema.
Dashboard refactor is internal; user-facing behaviour is preserved.
Existing run-specs and CLI invocations work unchanged.
The dashboard module structure has changed (app.py is now a
router; each view lives in src/puma/dashboard/views/<name>.py).
Any external tooling that imported view code from app.py should
migrate to the new module paths.

Acknowledgments

Development assistance provided by generative AI tooling. All commits
are attributed to the project's git identity per repository
convention.
