PUMA v2.3.0 — dashboard production-quality + docs structure

pumacp released this 13 May 01:58

PUMA v2.3.0 Release Notes

Release date: 2026-05-13
Previous release: v2.2.0 (2026-05-13)
Branch: develop → main (post-tag)

Summary

This release consolidates Sprint 6 (dashboard polish + structural
refactor) and retrospective documentation work (INDEX.md +
docs/overview.md + README branding) onto the v2.2.0 base. With this
release, Phase C of the master plan is fully complete.

Highlights

Dashboard production-quality (Sprint 6)

Major refactor: app.py reduced from an 803 LOC monolith to a 168 LOC
router (−79 %). View logic delegated to seven modules in
src/puma/dashboard/views/. Each view is independently importable
and testable; the router publishes filters to st.session_state and
dispatches via a VIEWS dict.
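
The router pattern described above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of app.py: the view functions, the view names, and the `model_filter` key are hypothetical, and a plain dict stands in for st.session_state so the sketch runs without Streamlit.

```python
from typing import Callable, Dict, MutableMapping


# Hypothetical view functions. In PUMA each view lives in its own module
# under src/puma/dashboard/views/; these stand-ins just return a string.
def render_overview(state: MutableMapping) -> str:
    return f"Overview (model filter: {state.get('model_filter', 'all')})"


def render_drilldown(state: MutableMapping) -> str:
    return f"Instance Drill-down (model filter: {state.get('model_filter', 'all')})"


# The dispatch table: view label -> callable that renders it.
VIEWS: Dict[str, Callable[[MutableMapping], str]] = {
    "Overview": render_overview,
    "Instance Drill-down": render_drilldown,
}


def route(selected: str, filters: dict, state: MutableMapping) -> str:
    """Publish the filters to shared state, then dispatch to the chosen view."""
    state.update(filters)  # every view reads its filters from the shared state
    view = VIEWS.get(selected)
    if view is None:
        raise KeyError(f"Unknown view: {selected!r}")
    return view(state)


# A plain dict stands in for st.session_state in this sketch.
session_state: dict = {}
print(route("Overview", {"model_filter": "example-model"}, session_state))
```

Keeping the router free of view logic is what makes each view importable and testable on its own: a test can call a view function directly with a prepared state mapping.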

Ten polish improvements applied:

 #   Improvement                                      Impact
 1   @st.cache_data(ttl=60) on 7 loaders              Performance
 2   st.spinner on slow operations                    UX
 3   CSV export on 4 tables                           Productivity
 4   Tooltips on ≈ 12 metric cards                    UX
 5   Unified empty-filtered-state component           UX
 6   Friendly expander titles in Overview             UX
 7   Module-level imports (no more inline)            Code quality
 8   Emoji prefixes consistent across 7 view titles   UX
 9   Dark-mode dataframe text legibility              UX (bug fix)
10   Empty-selectbox guard in Instance Drill-down     Robustness

Also new: a first-visit guided tour with an overview of the views and
usage tips (CSV download, dark mode, tooltips). Dismissal is stored in
st.session_state["tour_dismissed"]; a "📖 Show tour" button in the
sidebar re-opens the tour.
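
The dismiss and re-open behaviour can be pictured with a small state-flag sketch. Only the `tour_dismissed` key comes from these notes; the helper names are hypothetical, and a plain dict again stands in for st.session_state.

```python
from typing import MutableMapping


def tour_visible(state: MutableMapping) -> bool:
    """The tour shows on first visit, i.e. while the flag is absent or False."""
    return not state.get("tour_dismissed", False)


def dismiss_tour(state: MutableMapping) -> None:
    """Called when the user closes the tour."""
    state["tour_dismissed"] = True


def show_tour(state: MutableMapping) -> None:
    """Called by a sidebar 'Show tour' button to re-open the tour."""
    state["tour_dismissed"] = False


state: dict = {}          # stand-in for st.session_state
first_visit = tour_visible(state)
dismiss_tour(state)
after_dismiss = tour_visible(state)
show_tour(state)
after_reopen = tour_visible(state)
print(first_visit, after_dismiss, after_reopen)
```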

Documentation structure (Phase E.bis retrospective + Phase E.ter)

  • INDEX.md (root, uppercase): project status, phases, releases,
    debt tracking, architecture entry points. Created in Phase E.bis;
    this release updates it for v2.3.0 status.
  • docs/overview.md (new location): preserves the 256 LOC of
    architectural content from the legacy lowercase index.md.
  • README.md: branded header with PUMA logo, descriptive
    blockquote, and Related-Resources section linking to puma-vault,
    the published knowledge garden, releases, INDEX.md, and
    docs/overview.md.

Quality

  • Tests: 318 passing (up from 313 in v2.2.0; +5 dashboard smoke
    tests covering view module integrity, polish helpers, cache
    decorator presence, and the end-to-end AppTest render with the
    live database).
  • Coverage: 58 % (up from 55 % in v2.2.0).
  • Pre-commit: 10/10 hooks green.
  • CI: green on both main and develop.
  • Baseline reproducibility: F1 = 0.5867 ± 0.01 holds; verified
    via puma validate-baseline (PASS at 0.5831, delta −0.0036).

Methodological findings (academic traceability)

Sprint 6 surfaced one additional finding consistent with the
meta-pattern documented in docs/known_debt.md ("symptom in layer
N, root cause in layer M ≠ N"):

  • Dark-mode dataframe invisibility. The CSS rule applied
    light-mode colours globally; under dark mode, table text inherited
    light-mode colours against the dark background, rendering tables
    nearly unreadable. Symptom (invisible tables) appeared in the
    dashboard layer; root cause (CSS scope without theme awareness)
    was in the styling layer. Resolved in the same commit as the
    refactor by adding a theme-aware CSS override
    (color: #E5E7EB + background-color: #16213E when
    dark_mode == True).
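
    A theme-aware override of this kind might look like the sketch below. The two colour values come from the notes above; the function name and the CSS selector are assumptions for illustration, since the actual rule in the PUMA stylesheet is not shown here.

```python
def dataframe_css(dark_mode: bool) -> str:
    """Return a <style> block for dataframes, or "" when no override is needed."""
    if not dark_mode:
        # Light mode keeps the default colours, so nothing is injected.
        return ""
    return (
        "<style>\n"
        # Assumed selector; the real dashboard may target a different element.
        '[data-testid="stDataFrame"] {\n'
        "  color: #E5E7EB;\n"
        "  background-color: #16213E;\n"
        "}\n"
        "</style>"
    )


# In a Streamlit app the string would typically be injected with
# st.markdown(css, unsafe_allow_html=True); here it is only printed.
print(dataframe_css(dark_mode=True))
```

    Scoping the rule to the theme flag is the point of the fix: the colours are emitted only when dark mode is on, so light-mode tables are left untouched.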

This brings the meta-pattern catalogue to five instances (D15, D18,
D21, D22, and this CSS scope issue); the fifth is retired in the
same commit that surfaced it.

CI workflow hygiene

The .github/workflows/release.yml fix introduced in Phase E.bis
(commit 863c166) is now exercised end-to-end by the v2.3.0 tag
push. After the tag was pushed and gh release create ran, exactly
one release was created (no duplicate draft). The fix is verified
effective for v2.X.0 releases going forward.

Debt tracking

  • No new open debt introduced by this release.
  • Total resolved across v2.0.0 → v2.3.0: 15 of 24 items (62.5 %).
  • Phase C: ✓ COMPLETE (was the last open phase; all five
    Gate-C criteria met).

Full inventory and diagnostic write-ups in
docs/known_debt.md.

Known limitations

  • Single hardware tier evaluated (gpu-entry); models requiring
    gpu-mid and above (qwen2.5:14b, gemma3:27b, deepseek-r1:14b,
    the gemma4 family, llama3.1:70b) catalogued but not yet
    empirically evaluated.
  • AMD ROCm and Apple Metal backends not yet detected (development
    hardware is NVIDIA-only).
  • TAWOS SHA-256 end-to-end fetch test pending (Gate D criterion 3).
  • input_text not persisted in triage_jira instances (D22, Low —
    future data-pipeline enhancement). The Dashboard Instance
    Drill-down handles this gracefully with an informative message.

Master plan status (post-v2.3.0)

Phase                            Status
A: Foundations                   ✓ COMPLETE
B: Multi-model sweep             ✓ COMPLETE
C: Professional dashboard        ✓ COMPLETE (this release)
D: Technical depth               ✓ ~95 % (ROCm/Metal n/a in current hardware)
E: Documentation and releases    ✓ COMPLETE (v2.0.0, v2.1.0, v2.2.0, v2.3.0)

All five phases of the original master plan are now complete or
effectively complete (Phase D's remaining items are
hardware-dependent or scope-deferred).

Upgrade notes

  • No breaking changes to the public CLI or YAML run-spec schema.
  • Dashboard refactor is internal; user-facing behaviour is preserved.
  • Existing run-specs and CLI invocations work unchanged.
  • The dashboard module structure has changed (app.py is now a
    router; each view lives in src/puma/dashboard/views/<name>.py).
    Any external tooling that imported view code from app.py should
    migrate to the new module paths.

Acknowledgments

Development assistance provided by generative AI tooling. All commits
are attributed to the project's git identity per repository
convention.