
tr+totlib: physical EXTERNAL_DRIVEN_I scalar (L-7b-i)#187

Merged
k-yoshimi merged 4 commits into chore/pre-push-hook-worktree-compat from claude/2026-05-02-l7b-i-external-driven-i
May 2, 2026

@k-yoshimi (Owner)

Summary

Replaces the L-7a skeleton coupling (PLHCD as the MA-magnitude carrier)
with a physically meaningful EXTERNAL_DRIVEN_I scalar [MA] injected into
AJRF via a Gaussian profile. It also exposes AJRFT through the C ABI for
integration verification and extends tr_api_validate to catch the
RW=0 + I!=0 misconfiguration.

4 commits:

  1. Fortran + C ABI (82479e82): adds EXTERNAL_DRIVEN_I/R0/RW scalars
    to tr/trcomm_param.f90, registers in tr_param_registry.f90, defaults
    in trinit.f90, injects Gaussian into AJRF in trprf.f90's TRPWRF,
    exposes AJRFT via tr_state_c + tr_api.h (end-of-struct), updates
    tr_api_get_state + tr_api_validate, and surfaces AJRFT in
    python/trlib/state.py.
  2. Python pipeline (4d533608): switches
    python/totlib/pipeline.py's COUPLING_RULES[("fp","tr")] from PLHCD
    skeleton to EXTERNAL_DRIVEN_I, removes the 5-line "skeleton coupling
    caveat" comment block. Existing pipeline tests updated.
  3. Tests + docs (aaf1c1c5): adds 5 new test cases under
    python/trlib/tests/test_external_driven_i.py (default/zero
    equivalence, downstream effect, AJRFT integration, registry
    round-trip, validate diagnostic) + README updates in
    python/totlib/ and python/trlib/.
  4. Pre-push review fixes (b09a0d06): gates the existing RF
    normalization block on (PECTOT+PLHTOT+PICTOT > 0) so an
    EXTERNAL_DRIVEN_I != 0 config with zero RF cannot reach
    0/0 divisions; introduces TR_STATE_ABI_VERSION constant in
    tr_api.h (=2 after this PR appends AJRFT); fixes a stale
    docstring count and documents the test threshold rationale.

Spec: docs/superpowers/specs/2026-05-02-l7b-i-external-driven-i-design.md
Plan: docs/superpowers/plans/2026-05-02-l7b-i-external-driven-i.md

Test plan

  • Layer 1 equivalence (demo2014 + ht6m) PASS at 1e-10 — backward
    compat gate (EXTERNAL_DRIVEN_I=0 default leaves AJRF unchanged)
  • All existing pipeline tests PASS with new dst_param
  • 5 new test cases PASS (default no-op / Q0 shift / AJRFT
    integration to 1e-12 / 3-scalar round-trip / OUT_OF_RANGE diag)
  • Pathological edge case verified: EXTERNAL_DRIVEN_I=1, PECRW=0
    no longer NaN-poisons (was a regression in Phase 1, fixed in
    commit 4)
  • In-house superpowers:code-reviewer: HIGH/MED resolved
  • Codex codex-rescue independent reviewer: HIGH (ABI break)
    addressed via TR_STATE_ABI_VERSION documentation; MED
    (zero-width divide) addressed via RF block gate
  • CLAUDE.md pre-push gate: REVIEW_OK marker for b09a0d06
  • Canonical sweep (totlib + tot_mcp): 206 passed, 1 unrelated fail
    (documented Python 3.10 ExceptionGroup test, pre-existing)

L-7b-i scope confirmation

In scope (this PR):

  • 3 new scalars (EXTERNAL_DRIVEN_I, EXTERNAL_DRIVEN_R0,
    EXTERNAL_DRIVEN_RW) registered + defaulted + Gaussian-injected
  • AJRFT exposed via tr_state_c + tr_api.h (struct ABI v2)
  • tr_api_validate extension for RW=0 misconfiguration
  • pipeline.py COUPLING_RULES rewrite (PLHCD → EXTERNAL_DRIVEN_I)
  • 5 new test cases + README updates
  • RF block gating + AJRF zero-init for the no-RF + external case

Out of scope (L-7b follow-ups):

  • L-7b-ii: BPSD broker profile coupling (wr → fp/tr, eq → tr)
  • L-7b-iii: Declarative tot.couple(src, dst) API
  • L-7b-iv: Per-module state aggregation
  • AJOHT/AJBST/AJNBT additional exposure
  • Test combining RF (PEC/PLH/PIC) with EXTERNAL_DRIVEN_I (additive
    composition path) — flagged as a future hardening test by the
    in-house reviewer

🤖 Generated with Claude Code

k-yoshimi and others added 4 commits May 2, 2026 16:49
Adds 3 new scalars to tr/trcomm_param.f90: EXTERNAL_DRIVEN_I [MA],
EXTERNAL_DRIVEN_R0 (Gaussian center, normalized rho), EXTERNAL_DRIVEN_RW
(Gaussian width, normalized rho). Defaults: 0/0/0.3 (no-op when I=0).

Registered in tr_param_registry.f90 (3 CASEs + USE import). Defaults
set in trinit.f90 EXTERNAL DRIVEN CURRENT block. Injected additively
into AJRF(NR) in trprf.f90's TRPWRF after the existing PEC/PLH/PIC
loop, normalized so AJRFT [MA] = SUM(AJRF * DSRHO * DR) / 1e6 includes
exactly EXTERNAL_DRIVEN_I [MA] of contribution. Inline math uses
DVRHO/(2*PI*RR) since DSRHO is local to other routines, not in TRCOMM.
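A minimal Python sketch of the normalization described above, assuming a Gaussian of the form exp(-((rho - R0)/RW)^2) on a uniform cell-centred grid (the commit does not state the exact shape or grid layout). It shows why scaling by the shape's own discrete integral makes the injected part of AJRFT equal EXTERNAL_DRIVEN_I by construction. The function names are hypothetical.

```python
import math
from typing import List


def inject_external_current(ajrf: List[float], rho: List[float], dvrho: List[float],
                            rr: float, dr: float, i_ma: float, r0: float, rw: float) -> List[float]:
    """Add a Gaussian EXTERNAL_DRIVEN_I contribution [A/m^2] to ajrf in place.

    Hypothetical sketch: the Gaussian form and the grid layout are assumptions.
    """
    # DSRHO (dS/drho) rebuilt inline from DVRHO, as the commit does: DVRHO / (2*PI*RR)
    dsrho = [dv / (2.0 * math.pi * rr) for dv in dvrho]
    shape = [math.exp(-((r - r0) / rw) ** 2) for r in rho]
    # Discrete integral of the unnormalized shape, in the same measure as AJRFT
    sum_ext = sum(s * ds * dr for s, ds in zip(shape, dsrho))
    # Amplitude chosen so that SUM(j_ext * DSRHO * DR) == i_ma * 1e6 [A]
    amp = i_ma * 1.0e6 / sum_ext
    for n, s in enumerate(shape):
        ajrf[n] += amp * s  # additive: any existing RF current is kept
    return ajrf


def ajrft_ma(ajrf: List[float], dvrho: List[float], rr: float, dr: float) -> float:
    """AJRFT [MA] = SUM(AJRF * DSRHO * DR) / 1e6."""
    return sum(j * dv / (2.0 * math.pi * rr) * dr for j, dv in zip(ajrf, dvrho)) / 1.0e6
```

Because the injection is additive, a non-zero RF baseline keeps its own total and the external term adds exactly `i_ma` on top, up to floating-point rounding.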

The early-return guard at trprf.f90:19 was extended so the routine
also runs when only EXTERNAL_DRIVEN_I is set (all-zero PEC/PLH/PIC
inputs then produce zero contributions, which is harmless; they are
not skipped).

Two-stage guard in trprf injection: outer (I!=0 .AND. RW>0) for default
no-op + silent-skip-on-zero-width, inner (SUM_EXT>0) for extreme RW
edge case. tr_api_validate surfaces RW<=0 + I!=0 as OUT_OF_RANGE.
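The two-stage guard and the matching validate check can be sketched in Python as below. The function names and the (code, message) tuple shape are hypothetical; only the conditions come from the commit (I != 0 and RW > 0 for the outer guard, SUM_EXT > 0 for the inner one, RW <= 0 with I != 0 reported as OUT_OF_RANGE with code 1). The design point is that the runtime path stays silent while validate() is where the misconfiguration gets reported.

```python
from typing import List, Tuple

OUT_OF_RANGE = 1  # diag code observed in the commit's smoke test (code=1)


def should_inject(i_ma: float, rw: float) -> bool:
    """Outer guard: I == 0 is the default no-op; RW <= 0 is skipped silently."""
    return i_ma != 0.0 and rw > 0.0


def external_amplitude(i_ma: float, sum_ext: float) -> float:
    """Inner guard: an extreme RW can underflow every Gaussian sample to 0."""
    if sum_ext > 0.0:
        return i_ma * 1.0e6 / sum_ext
    return 0.0


def validate_external_driven(i_ma: float, rw: float) -> List[Tuple[int, str]]:
    """Mirror of the RW <= 0 + I != 0 check, returned as (code, message) pairs."""
    diags = []
    if i_ma != 0.0 and rw <= 0.0:
        diags.append((OUT_OF_RANGE,
                      f"EXTERNAL_DRIVEN_RW={rw} must be > 0 when EXTERNAL_DRIVEN_I={i_ma}"))
    return diags
```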

AJRFT exposed via tr_state_c (Fortran, end-of-struct), tr_api.h C
header (matching offset), python/trlib/_ffi.py ctypes mirror, and
python/trlib/state.py SCALAR_FIELDS. test_ffi layout assertion
updated for the +1 scalar (sizeof(TrStateC) +8 bytes).
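Appending a c_double at the end of a struct leaves every earlier field offset unchanged and grows the struct by 8 bytes, which is what the updated test_ffi layout assertion relies on. The ctypes sketch below uses a hypothetical three-field stand-in for TrStateC. Unchanged offsets do not make the change binary-compatible: a consumer built against the old layout would allocate too small a buffer, which is the ABI concern the later TR_STATE_ABI_VERSION commit documents.

```python
import ctypes


class TrStateV1(ctypes.Structure):
    # Hypothetical, heavily abridged field list; the real TrStateC has many more.
    _fields_ = [
        ("Q0", ctypes.c_double),
        ("AJT", ctypes.c_double),
        ("WPT", ctypes.c_double),
    ]


class TrStateV2(ctypes.Structure):
    # v2 = v1 with AJRFT appended at the end of the struct.
    _fields_ = TrStateV1._fields_ + [("AJRFT", ctypes.c_double)]


def offsets(struct_type):
    """Map each field name to its byte offset inside the struct."""
    return {name: getattr(struct_type, name).offset for name, _ in struct_type._fields_}
```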

Backward compat: EXTERNAL_DRIVEN_I=0 default → trprf injection block
skipped → AJRF unchanged. Verified locally: Layer 1 equivalence
(demo2014 + ht6m) PASS at 1e-10. PLHCD logic untouched.

Smoke tests confirm: AJRFT in trlib state.scalars (default 0.0);
EXTERNAL_DRIVEN_I=1.0 produces AJRFT=1.0000000000000007 (within 1e-15
of target); validate() with RW=0+I!=0 emits OUT_OF_RANGE diag (code=1).

Spec: docs/superpowers/specs/2026-05-02-l7b-i-external-driven-i-design.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Switches python/totlib/pipeline.py COUPLING_RULES[("fp","tr")] from
the L-7a skeleton (PLHCD as MA-magnitude carrier) to the physically
meaningful EXTERNAL_DRIVEN_I scalar [MA] introduced in the previous
commit. The transform stays * 1e-6 (amperes -> MA); only the dst_param
string changes; the docstring is updated; the 5-line "skeleton coupling
caveat" comment block is removed.
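For readers unfamiliar with pipeline.py, the switched rule has roughly the shape below. This is a sketch, not the real module: the CouplingRule field names mirror those used in these commits (src_state_key, dst_param, transform), but the fp-side state key and the one-rule-per-pair dict layout are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass(frozen=True)
class CouplingRule:
    src_state_key: str
    dst_param: str
    transform: Callable[[Any], Any]
    doc: str = ""


COUPLING_RULES: Dict[Tuple[str, str], CouplingRule] = {
    ("fp", "tr"): CouplingRule(
        src_state_key="driven_current",      # hypothetical fp-side state key [A]
        dst_param="EXTERNAL_DRIVEN_I",       # tr scalar [MA]
        transform=lambda amps: amps * 1e-6,  # A -> MA, unchanged by this commit
        doc="fp-driven current -> tr EXTERNAL_DRIVEN_I",
    ),
}


def apply_transfer(rule: CouplingRule, src_state: Dict[str, float],
                   dst_params: Dict[str, float]) -> Dict[str, float]:
    """Extract -> transform -> set, i.e. one transfer step of the pipeline."""
    dst_params[rule.dst_param] = rule.transform(src_state[rule.src_state_key])
    return dst_params
```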

Existing pipeline tests (test_pipeline.py / test_pipeline_equiv.py)
updated to assert against the new dst_param. Pattern X (direct) and
pattern Y (run_pipeline) equivalence at 1e-10 is preserved; the test
logic is unchanged, only the scalar name. test_pipeline_registry.py
had no PLHCD references and is unchanged.

Layer 1 equivalence (test_equivalence.py: demo2014 + ht6m) continues
to PASS at 1e-10 -- EXTERNAL_DRIVEN_I=0 default leaves tr behavior
identical to before this series.

Spec: docs/superpowers/specs/2026-05-02-l7b-i-external-driven-i-design.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…oval

Adds 5 new test cases under python/trlib/tests/test_external_driven_i.py:

  - default (un-set) ≡ explicit 0.0 (no-op invariant; backward-compat)
  - I=1.0 MA shifts Q0 measurably (downstream effect proof)
  - AJRFT == 1.0 to 1e-12 when I=1.0 (Gaussian normalization correctness)
  - 3 scalars round-trip through set_param (registry CASE coverage)
  - validate() emits OUT_OF_RANGE for RW=0 + I!=0

Updates python/totlib/README.md (replaces PLHCD skeleton block with
physical EXTERNAL_DRIVEN_I description) and adds a new "External driven
current" section to python/trlib/README.md with the 3-row parameter
table, usage example, normalization note, and validation note. The
TrState scalar list is updated from 13 to 14 to include AJRFT.

The "downstream observable" is Q0 rather than AJT: AJT is a boundary
condition (RIPS/RIPE) and is invariant for ntmax=1, while Q0 reflects
the redistributed current density and so cleanly tracks the injected
external current.

Spec: docs/superpowers/specs/2026-05-02-l7b-i-external-driven-i-design.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four fixes from in-house code-reviewer + Codex independent reviewer:

1. tr/trprf.f90: gate the original RF heating block on (PECTOT+PLHTOT
   +PICTOT > 0) so a config with EXTERNAL_DRIVEN_I != 0 + zero RF
   power + zero RF width (PECRW/PLHRW/PICRW) cannot reach the
   PECTOT/SUMEC normalization that would divide 0/0. Adds AJRF
   zero-init when the RF block was skipped, so the post-condition
   AJRFT == EXTERNAL_DRIVEN_I holds exactly with no stale carry-over
   from a prior call. (Codex MED)

2. tr/tr_api.h: introduce TR_STATE_ABI_VERSION (=2) constant. tr_state_t
   ABI bumped from layout v1 (Phase L-2) to v2 by appending AJRFT in
   this PR. In-tree consumers (tot_api_check_*, python/trlib/_ffi.py)
   rebuild from this header automatically; out-of-tree binary consumers
   (none currently) can compare the constant against any cached layout
   assumption. (Codex HIGH addressed via documentation; in-tree
   consumers do not need binary stability.)

3. python/trlib/state.py: docstring updated from "13 scalar plasma
   quantities" to "14 ... AJRFT" to match the SCALAR_FIELDS tuple.
   (in-house MED-1)

4. python/trlib/tests/test_external_driven_i.py: documented the
   1e-3 Q0-shift threshold rationale (~10x below observed ~1.5e-2).
   Prevents future silent down-tuning that would mask a regression.
   (in-house MED-2)
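The RF-block gate from fix 1 can be sketched in Python as below. Everything here is hypothetical except the control flow: the (total, shape) pairs stand in for the PEC/PLH/PIC systems, and the per-system `norm > 0` check is an extra safety assumption of this sketch rather than something the commit states. Note that Python raises ZeroDivisionError on 0.0/0.0, whereas Fortran typically yields NaN silently, which is why the original problem showed up as NaN-poisoned state rather than a crash.

```python
from typing import List, Sequence, Tuple


def rf_block(ajrf: List[float], systems: Sequence[Tuple[float, Sequence[float]]]) -> List[float]:
    """Update ajrf in place; ajrf plays the role of the persistent AJRF array.

    systems: (total, shape) pairs, one per RF system (EC/LH/IC in TR).
    """
    if sum(total for total, _ in systems) > 0.0:
        # Gate open: original normalization path, total / SUM(shape) per system.
        new = [0.0] * len(ajrf)
        for total, shape in systems:
            norm = sum(shape)
            if total > 0.0 and norm > 0.0:  # sketch-only extra guard
                for n, s in enumerate(shape):
                    new[n] += total / norm * s
        ajrf[:] = new
    else:
        # Gate closed: no total/norm division is reached, and AJRF is
        # zero-initialized so no values carry over from a prior call.
        ajrf[:] = [0.0] * len(ajrf)
    return ajrf
```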

Verification:
- Layer 1 equivalence (demo2014 + ht6m): 2 PASS at 1e-10
- 5 new test_external_driven_i cases: 5 PASS
- Pipeline equivalence (X vs Y): 3 PASS
- Pathological edge case (I=1 + PECRW=0): no NaN, AJRFT=1.0 exact
- Canonical sweep (totlib + tot_mcp): 206 passed, 1 unrelated fail
  (documented Python 3.10 ExceptionGroup, pre-existing)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@k-yoshimi k-yoshimi merged commit e049a1e into chore/pre-push-hook-worktree-compat May 2, 2026
2 checks passed
@k-yoshimi k-yoshimi deleted the claude/2026-05-02-l7b-i-external-driven-i branch May 2, 2026 13:15
k-yoshimi added a commit that referenced this pull request May 4, 2026
…188)

* docs(spec): add L-7b-ii BPSD broker coupling verification design

Brainstorm-stage spec for L-7b-ii: orchestrator-level verification of
the eq -> tr BPSD coupling. MVP scope ("A-medium"):

  - new non-mutating Fortran helper tr_check_bpsd_pull (4-slot check)
  - C ABI export + Python Trlib.check_bpsd_pull() wrapper
  - CouplingRule extended with kind/verify fields
  - new ("eq","tr") verify rule in COUPLING_RULES
  - 3-layer test strategy (mock dispatch / unit / integration)

Independently reviewed by Codex in 4 rounds (overall design, then sec 6, sec 7, and sec 8); all
HIGH/MED findings addressed:

  - non-mutating BPSD pull (per-slot ierr accumulation, no TRCOMM mutation)
  - existing TotPipelineCouplingError reused (no new exception class)
  - 3-level exception chain via existing broad except
  - MODELG=0 + eq->tr -> CouplingError as acceptance criterion
  - wr-side BPSD speaker pending evaluation (L-7a divergence-risk
    judgement preserved; re-evaluation trigger documented)

Implementation risks (4) carry explicit fallbacks. Out-of-scope items
explicitly cross-reference L-7a / L-7b-i specs by section name (not
file:line) for churn resilience.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(spec): L-7b-ii self-review fixes (BS7)

  - replace MAX_NRHO placeholder with reference to existing BPSD
    callers' allocation pattern
  - reconcile test case count (12 -> 11; C-2 documented but not
    implemented per drop note)
  - clarify error message format uses callable_repr (with
    repr() fallback for non-qualname callables like partial)

No semantic change; corrects three minor inconsistencies found in
the brainstorming spec self-review pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(spec): L-7b-ii final Codex review fixes

Six fixes from final Codex full-spec review (HIGH 1, MED 3, LOW 2):

  HIGH: verify checks 3 eq-pushed slots only (device/equ1D/metric1D),
        not 4. plasmaf is tr's own BPSD output (tr_bpsd_put), absent
        on a fresh eq->tr pipeline -- including it would cause
        spurious False on first-time pipelines. Updated §1 data flow,
        §2 Fortran helper, §3 Python wrapper docstring, C prototype.

  MED-1: __post_init__ now validates transform is non-None for
        kind=transfer (was missing -- transfer dispatch unconditionally
        calls rule.transform(raw), would crash on None default).

  MED-2: AC6 clarified -- Layer C drop decision is plan-time, not
        post-MVP. Either Layer C is required (Eq wrapper sufficient)
        or removed from MVP scope (insufficient + follow-up PR).
        Removes the 'both required and droppable' ambiguity.

  MED-3: §8.4 risk #5 added: stale BPSD slot data across tests/runs.
        Layer A/B unaffected; Layer C uses --forked or per-test BPSD
        reset for isolation.

  LOW-1: line 76 estimate table 12 -> 11 cases (matches §7 count
        after C-2 drop).

  LOW-2: @dataclass(frozen=True) preserved in §4 dataclass; explicit
        note that __post_init__ raises ValueError only (no field
        mutation) so frozen remains compatible.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(spec): L-7b-ii confirmation review fixes

3 missed fixes from confirmation review:

  HIGH (still-broken): 'all four' verbiage at line 126 + plasmaf in
        rule.doc string at lines 247-255 -- both fixed. Remaining
        plasmaf mentions are all explicit 'intentionally excluded'
        rationale (intentional, kept).

  MED-1 (partial): added type-narrowing asserts in §5 dispatch
        snippet -- assert rule.transform is not None / assert
        rule.dst_param is not None -- to satisfy mypy/pyright on the
        Optional[Callable] / Optional[str] fields. __post_init__
        guarantees they hold for kind=transfer.

  MED-2 (partial): §1 in-scope clarified -- Layer A+B unconditional,
        Layer C conditional on plan-time Eq wrapper sufficiency
        check (cross-references AC6 + §8.4 risk #2). Resolves the
        'integration coverage promised but droppable' contradiction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(spec): L-7b-ii final assert narrowing for rule.verify

One-line fix from final confirmation review: §5 dispatch verify
branch was missing 'assert rule.verify is not None' for static
checker narrowing on the Optional[Callable] field. Symmetric to
the transfer-branch asserts already present.

Codex verdict after this fix: READY-FOR-IMPLEMENTATION.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plan): add L-7b-ii BPSD broker coupling implementation plan

Implementation plan for the L-7b-ii spec
(docs/superpowers/specs/2026-05-03-l7b-ii-bpsd-broker-coupling-design.md).

4-phase commit pattern mirroring L-7b-i (PR #187):

  Phase 1: tr_check_bpsd_pull Fortran helper + C ABI + Python wrapper
  Phase 2: CouplingRule kind/verify extension + dispatch + ('eq','tr') rule
  Phase 3: 11 test cases across 3 layers + README updates
  Phase 4: pre-push gate + PR open + CI/Bugbot wait + squash merge

Plan-time risks resolved at write-time:

  - Eq wrapper sufficient (set_param/set_param_str/run/get_state all
    present in python/eqlib/eqlib.py); Layer C in scope
  - bpsd_get_data is INTENT(INOUT), self-allocates when nrmax=0;
    helper sets local%nrmax=0 explicitly
  - g_initialized/g_prepared lifecycle flags exist in tr/tr_api.f90:64-65

Plan-time risk deferred to implementation:

  - Layer C runtime (< 5s estimate unmeasured); fallback is
    @pytest.mark.slow gating per spec §8.4

Step granularity bite-sized (read context -> edit -> verify -> commit)
per writing-plans skill convention. Suitable for executing-plans or
subagent-driven-development.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plan): L-7b-ii Codex review fixes

Six fixes from Codex plan review (HIGH 2, MED 3, LOW 1):

  HIGH-1: pre-push pytest commands now include --forked --timeout=120
        --timeout-method=signal per CLAUDE.md non-negotiable. Note
        added on pytest-forked plugin install + documented exception
        when plugin unavailable.

  HIGH-2: reviewer agent type is now
        Agent(subagent_type='feature-dev:code-reviewer', ...) per
        CLAUDE.md. A fallback note for environments where feature-dev:
        is unavailable points to superpowers:code-reviewer.

  MED-1: Layer C --forked is now required (not advisory). Without
        BPSD broker isolation, C-3 (MODELG=0 expected failure) may
        spuriously pass after C-1 populated slots.

  MED-2: base SHA references updated f3dafae -> bdc0913 (the plan
        commit itself is the new chore tip, so executor will branch
        from it). Phase 0/1.5/2.4/3.6 squash bases corrected.

  MED-3: spec-coverage matrix entry for §7 MODELG fixture corrected
        to acknowledge deviation: plan uses local _eq_params_modelg()
        helper instead of mutating shared tot_demo2014_params.py
        (rationale: avoid Layer 1 baseline side effects). eqdata path
        IS still reused.

  LOW-1: Layer A case count clarified -- 11 logical cases (A-5
        implemented as 3 sub-tests), 13 individual pytest tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plan): align Phase 4.1 Step 2 --timeout to CLAUDE.md (=120)

Final fix from confirmation review: Phase 4.1 Step 2 used
--timeout=60 (inconsistent with Step 1/3 and CLAUDE.md CI flags).
Aligned to --timeout=120.

Plan now READY-FOR-EXECUTION per Codex.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plan): align all pytest commands to CLAUDE.md flags

5 additional pytest commands (Tasks 1.4, 2.2, 2.3, 3.1, 3.2 sanity
checks) updated to --forked --timeout=120 --timeout-method=signal,
matching CI workflow and Phase 4 pre-push gate. Per CLAUDE.md
non-negotiable: 'Local test: run the relevant pytest locally with
the SAME flags the CI workflow uses.'

All 9 pytest invocations in the plan now consistent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* tr+trlib: add tr_check_bpsd_pull BPSD broker verification helper (L-7b-ii)

New non-mutating Fortran helper tr_check_bpsd_pull pulls the 3
eq-pushed BPSD slots (device, equ1D, metric1D) into local
discardable types and reports ok = 1 iff all 3 per-slot ierr == 0.
plasmaf is intentionally NOT checked: it is tr's own BPSD output
(tr_bpsd_put), absent on a fresh eq->tr pipeline.

Implementation notes:
- Local types initialized with %nrmax = 0 to trigger BPSD's
  self-allocation path (per ../../bpsd/bpsd_equ1D.f90:143-150,
  bpsd_get_* is INTENT(INOUT) and allocates internally when
  the caller passes nrmax=0).
- Per-slot ierr accumulated to AND-condition (avoids masking
  earlier failures by later successes).
- TRCOMM untouched -- safe to call before or after tr.run().
- TR_STATE_ABI_VERSION not bumped (no struct change).
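The per-slot AND accumulation is the part that is easy to get wrong. A small Python sketch, with hypothetical callables standing in for the Fortran bpsd_get_* calls, contrasts it with the overwrite pattern it avoids; only the three slot names come from the commit.

```python
from typing import Callable, Dict


def check_bpsd_pull(pullers: Dict[str, Callable[[], int]]) -> bool:
    """Return True iff every slot pull reports ierr == 0.

    pullers maps a slot name to a zero-argument callable returning ierr;
    each callable is a hypothetical stand-in for bpsd_get_* into a
    discardable local.
    """
    ok = True
    for _slot, pull in pullers.items():
        ierr = pull()          # every slot is pulled, even after a failure
        ok = ok and ierr == 0  # AND-accumulate: a later success cannot clear an earlier failure
    return ok


def check_bpsd_pull_overwrite(pullers: Dict[str, Callable[[], int]]) -> bool:
    """Anti-pattern for contrast: only the last slot's ierr survives."""
    ierr = 0
    for _slot, pull in pullers.items():
        ierr = pull()
    return ierr == 0
```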

Exposed via C ABI (tr/tr_api.h) and Python wrapper
(Trlib.check_bpsd_pull). Smoke-tested: fresh Trlib() init
returns False (BPSD slots empty); existing trlib tests unchanged.

Spec: docs/superpowers/specs/2026-05-03-l7b-ii-bpsd-broker-coupling-design.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* pipeline: extend CouplingRule for verify rules (kind/verify dispatch) (L-7b-ii)

CouplingRule dataclass gains two new fields:

  kind: str = "transfer"   # "transfer" | "verify"
  verify: Optional[Callable[[Any], bool]] = None

Existing transfer-rule fields (src_state_key, dst_param, transform)
are defaulted to None and validated by __post_init__:

  * kind="transfer" requires all three transfer fields
  * kind="verify"   requires verify callable
  * unknown kind raises ValueError

@dataclass(frozen=True) preserved; __post_init__ only raises and
never mutates self, so frozen remains compatible.
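A minimal sketch of the validation rules listed above. The field names come from the commit message; the field order, the `doc` field, and the error texts are assumptions. Because `__post_init__` only reads attributes and raises, it never triggers the FrozenInstanceError that `frozen=True` raises on assignment.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class CouplingRule:
    src_state_key: Optional[str] = None
    dst_param: Optional[str] = None
    transform: Optional[Callable[[Any], Any]] = None
    kind: str = "transfer"  # "transfer" | "verify"
    verify: Optional[Callable[[Any], bool]] = None
    doc: str = ""

    def __post_init__(self) -> None:
        # Read-only checks: nothing is assigned to self, so frozen=True is safe.
        if self.kind == "transfer":
            missing = [name for name in ("src_state_key", "dst_param", "transform")
                       if getattr(self, name) is None]
            if missing:
                raise ValueError(f"transfer rule missing: {', '.join(missing)}")
        elif self.kind == "verify":
            if self.verify is None:
                raise ValueError("verify rule requires a verify callable")
        else:
            raise ValueError(f"unknown kind: {self.kind!r}")
```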

run_pipeline rule iteration gains a kind branch:

  * transfer: existing extract -> transform -> set_param -> record
  * verify:   ok = rule.verify(curr_inst); raise CouplingError if
              False or if verify itself raised.
              applied.append only on success (matches transfer-rule
              "succeeded snapshot" semantics of PipelineStep.coupling_applied)

Both error paths flow through the existing broad except at
pipeline.py and become TotPipelineRunError with __cause__ chain
(3-level: RunError -> CouplingError -> OriginalError if any).
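The dispatch and the resulting exception chains can be sketched as below. The two exception class names come from these commits; everything else (duck-typed rules built with SimpleNamespace, FakeInst, and run_step standing in for the broad except in pipeline.py) is hypothetical. The sketch also makes the chain-depth difference visible: a False return produces RunError -> CouplingError, while an exception raised inside verify adds the original exception as a third level.

```python
from types import SimpleNamespace
from typing import Any, Dict, List


class TotPipelineCouplingError(Exception):
    pass


class TotPipelineRunError(Exception):
    pass


def make_rule(**fields: Any) -> SimpleNamespace:
    """Build a duck-typed rule with the attributes apply_rules reads."""
    rule = dict(kind="transfer", src_state_key=None, dst_param=None,
                transform=None, verify=None, doc="")
    rule.update(fields)
    return SimpleNamespace(**rule)


def apply_rules(rules: List[Any], prev_state: Dict[str, Any], curr_inst: Any,
                applied: List[Any]) -> None:
    for rule in rules:
        if rule.kind == "transfer":
            # Narrowing asserts for static checkers; __post_init__ guarantees them.
            assert rule.transform is not None and rule.dst_param is not None
            value = rule.transform(prev_state[rule.src_state_key])
            curr_inst.set_param(rule.dst_param, value)
            applied.append(rule)
        elif rule.kind == "verify":
            assert rule.verify is not None
            try:
                ok = rule.verify(curr_inst)
            except Exception as exc:
                # 3-level chain: RunError -> CouplingError -> exc
                raise TotPipelineCouplingError(f"verify raised: {rule.doc}") from exc
            if not ok:
                # 2-level chain: RunError -> CouplingError (no __cause__)
                raise TotPipelineCouplingError(f"verify returned False: {rule.doc}")
            applied.append(rule)  # recorded only on success


def run_step(rules, prev_state, curr_inst, applied):
    """Stand-in for the broad except around the coupling step."""
    try:
        apply_rules(rules, prev_state, curr_inst, applied)
    except Exception as exc:
        raise TotPipelineRunError("pipeline step failed") from exc


class FakeInst:
    """Hypothetical module instance that just records set_param calls."""

    def __init__(self):
        self.params = {}

    def set_param(self, name, value):
        self.params[name] = value
```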

Scope note (L-7b-ii deferred at registry level):

  The original spec proposed registering an ('eq','tr') verify rule
  using BPSD as a cross-module broker (Trlib.check_bpsd_pull). During
  implementation we found that libeqapi.so and libtrapi.so each carry
  a private copy of the BPSD module-level state -- ___bpsd_equ1d_MOD_
  equ1dx is static (s) in nm output of both shared objects. Therefore
  bpsd_put from libeqapi.so does NOT propagate to libtrapi.so's
  bpsd_get, and the verify rule cannot succeed under the per-module
  TotPipeline architecture. The legacy Tot / libtotapi.so path keeps
  eq+tr+bpsd co-linked and is unaffected.

  Until a cross-.so sharing scheme (unified .so, IPC, or RTLD_GLOBAL
  with weak symbols) is decided, no ('eq','tr') rule is registered.
  The verify dispatch infrastructure is in place and mock-tested
  (Layer A) so that the rule can be added later without further
  changes to pipeline.py.

One existing test (test_run_pipeline_missing_source_state_raises_
coupling_error) was updated to pass an explicit transform=lambda v: v
to satisfy the new __post_init__ contract; it had previously
relied on the old identity default.

Existing ("fp","tr") transfer rule and Layer 1 baselines (demo2014,
ht6m at 1e-10) are unaffected.

Spec: docs/superpowers/specs/2026-05-03-l7b-ii-bpsd-broker-coupling-design.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test+docs: Layer A/B BPSD coupling verify tests + READMEs (L-7b-ii)

Adds 11 new test cases across 2 layers (Layer C deferred — see below):

  Layer A (test_pipeline_verify.py, no .so): 8 mock-based dispatch
    tests covering verify-True success, verify-False raise+cause-chain,
    verify-raise 3-level chain, mixed transfer+verify ordering,
    __post_init__ validation (3 cases), unregistered-pair silent skip.

  Layer B (test_bpsd_check.py, libtrapi.so): 3 unit tests for
    Trlib.check_bpsd_pull() -- fresh init returns False, closed
    Trlib raises TrlibError, no exception leak from Fortran.

Also updates test_pipeline_dataclasses::test_coupling_rule_defaults
to pass an explicit transform=lambda v: v: the previous version
relied on the old default `transform = lambda v: v`, which is no
longer a default per the new __post_init__ contract introduced in
the previous commit.

Layer C (eq + tr integration) deferred:

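  Investigation during implementation showed that `libeqapi.so` and
  `libtrapi.so` are independent .so files, each holding a private copy
  of the BPSD module-level state (`___bpsd_equ1d_MOD_equ1dx` and
  others), confirmed with nm (static linkage in both .so files). The
  spec's premise that "BPSD = shared eq/tr broker" holds for `Tot` /
  `libtotapi.so` (eq+tr+bpsd bundled together) but not for
  `TotPipeline` (per-module .so). Therefore the ('eq','tr') verify rule
  is not registered in this PR, and the Layer C integration tests are
  also deferred (even if run, the push would not be visible, so C-1
  could never pass structurally and C-3 would pass for the wrong
  reason).

  The verify dispatch mechanism itself is complete in this PR (fully
  covered by Layer A), so once a sharing approach is decided, the
  ('eq','tr') rule can be enabled by adding a single line to
  COUPLING_RULES. For details, see the Coupling rules section of
  totlib/README.md and the "BPSD ブローカー pull 検証" (BPSD broker pull
  verification) section of trlib/README.md.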

README updates:

  python/totlib/README.md: documents the kind="transfer" / "verify"
    distinction in the Coupling rules section, plus a Japanese-language
    deferral note explaining the per-.so BPSD isolation finding.

  python/trlib/README.md: new "BPSD ブローカー pull 検証" ("BPSD broker
    pull verification") section (Japanese) documenting
    Trlib.check_bpsd_pull() with a usage example and the same scope
    caveat about per-.so isolation.

Existing Layer 1 baselines (demo2014, ht6m at 1e-10) and existing
pipeline tests are unaffected (the verify dispatch doesn't fire when
no rule is registered for the pair).

Spec: docs/superpowers/specs/2026-05-03-l7b-ii-bpsd-broker-coupling-design.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(L-7b-ii): address reviewer findings (MED + LOW polish)

Codex MED1 (older .so compatibility): wrap tr_check_bpsd_pull
ctypes prototype attachment in try/except, mirroring the existing
tr_validate pattern in _ffi.py. Older builds of libtrapi.so that
predate L-7b-ii will now defer the AttributeError to first call of
Trlib.check_bpsd_pull instead of failing at module import.
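The try/except pattern can be sketched with plain ctypes as below. The helper name is hypothetical and the real _ffi.py code may be structured differently; the point is that a missing symbol is tolerated at import time and only surfaces as AttributeError when the wrapper is actually called.

```python
import ctypes
from typing import Any, Optional, Sequence


def attach_optional_prototype(lib: Any, name: str, argtypes: Sequence[Any],
                              restype: Optional[Any]) -> bool:
    """Set argtypes/restype on lib.<name> if the symbol exists.

    Returns False instead of raising when the symbol is missing, so an older
    shared library still imports; calling lib.<name> later raises AttributeError.
    """
    try:
        fn = getattr(lib, name)
    except AttributeError:
        return False
    fn.argtypes = list(argtypes)
    fn.restype = restype
    return True

# Usage at module import (the argtypes below are hypothetical; check tr/tr_api.h):
#   lib = ctypes.CDLL("libtrapi.so")
#   attach_optional_prototype(lib, "tr_check_bpsd_pull",
#                             [ctypes.POINTER(ctypes.c_int)], ctypes.c_int)
```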

Codex MED3 (chain-depth accuracy): totlib/README's "3-level cause
chain" was inaccurate for the False-return path. Clarify that
False produces a 2-level chain (TotPipelineRunError ->
TotPipelineCouplingError, no original cause), while a verify()
exception produces the 3-level chain (... -> original exception).

In-house L-1 (docstring count): test_pipeline_verify.py header
said "Six cases" but defines 8 test functions because A-5 expands
into three separate validation tests. Updated the docstring to
clearly distinguish "six logical cases" from "eight test
functions" with a per-case description.

Codex MED2 (spec staleness): added §0 (in Japanese, matching
totlib/trlib README convention) at the top of the spec doc
recording the implementation-time deferral, summarising what
shipped vs deferred and listing candidate resolution paths
(shared .so / IPC / RTLD_GLOBAL+weak / serialisation). Status
field changed from "Draft" to "Partial".

Verification:
- Layer A (8) + Layer B (3) + dataclasses (8) + pipeline (30) +
  equivalence (2) = 51 passed; 1 pre-existing Py3.10
  ExceptionGroup test still fails (documented in plan).
- Trlib() smoke load + check_bpsd_pull still returns False
  on fresh init.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
k-yoshimi added a commit that referenced this pull request May 4, 2026
Brainstorm-stage spec for the 'Input files + extending tr'
deepening menu item from project_tr_proper_manual.md (F, sized
1 session). Two bilingual pages with separate audiences:

input-files.md (User guide, after parameter-setting) -- user-
facing reference for where data files live:
- eqdata / EQDSK: MODELG ∈ {3,5,7,8} triggers eq's external
  load via KNAMEQ; tr only pulls via BPSD broker
  (tr/trbpsd.f90:213-245).
- ufiles + MDLUF: reader chain trufile.f90 ->
  tr_ufile_task.f90 -> tr_ufile_topics.f90; MDLUF default 0.
- trmodels/-style external data: implementation step verifies
  whether any runtime directory is consumed; if not, the page
  states it explicitly rather than fabricating a layout.
- Path constraints: 80-byte limit on KNAMEQ (we hit this in
  the L-7b-ii session); recommend chdir + bare filename.

extending-tr.md (Internals, after design) -- maintainer-facing
how-to with 3 walkthroughs:
- Add a scalar parameter: 1 CASE in tr_param_registry.f90:76+,
  default in trinit.f90, rebuild. C ABI unchanged.
- Add a transport model under MDLKAI: SELECT CASE in
  trcoef_turbulence.f90:400+ following the numbering ranges
  documented at :392-398 (constant / drift-wave / Rebu-Lalla
  / CDBM / DW-ballooning / ITG).
- Add a TrState field: 7-step recipe spanning Fortran
  TRCOMM compute side, tr/tr_state.f90 + tr/tr_api.h
  (BIND-C struct + ABI version bump from 2 -> 3),
  python/trlib/_ffi.py + state.py mirrors. L-7b-i AJRFT
  field is the worked precedent (PR #187, commit e049a1e).

15 acceptance criteria; spec verifies factual claims via
file:line citations the implementation step re-checks.

Codex design-stage review pending after this commit; same
multi-round pattern G/E/A used earlier today.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
k-yoshimi added a commit that referenced this pull request May 4, 2026
Internals reference, maintainer-facing. Three concrete
walkthroughs:

- Walkthrough A (5 steps) -- adding a scalar parameter:
  one CASE in tr/tr_param_registry.f90:76+, default in
  trinit.f90, rebuild. C ABI unchanged because
  tr_set_param is string-keyed
  (tr/tr_api.f90:124-128,136-148).

- Walkthrough B (4 steps) -- adding a transport model
  under MDLKAI: SELECT CASE in
  tr/trcoef_turbulence.f90:400 (NOT line 64 which only
  sets graph labels), following the numbering convention
  at :392-398.

- Walkthrough C (8 steps) -- adding a TrState field with
  ABI impact. Touches tr_state.f90 / tr_api.h /
  tr_api.f90 (zero-init :263-283, scalar copy :299-315,
  profile loops :321-330) / TR_STATE_ABI_VERSION bump /
  python/trlib/_ffi.py / python/trlib/state.py
  (SCALAR_FIELDS at :22-28, scalar comprehension at :87,
  dataclass construction at :92-100). The trap section
  warns about Fortran column-major / C row-major
  transposition for 2-D fields. AJRFT (L-7b-i, PR #187,
  e049a1e) is the worked example.
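The column-major trap can be made concrete with index arithmetic. In the pure-Python sketch below (hypothetical helper names), a Fortran array A(n1, n2), for example a (radial, species) profile, lands in memory with the first index varying fastest, so an interoperable C declaration reverses the dimensions to `double a[n2][n1]` and reads a[j-1][i-1] to get A(i, j).

```python
from typing import List


def fortran_offset(i: int, j: int, n1: int) -> int:
    """0-based flat offset of Fortran A(i, j) (1-based indices, leading dim n1)."""
    return (i - 1) + (j - 1) * n1


def c_offset(row: int, col: int, ncols: int) -> int:
    """0-based flat offset of C a[row][col] in a row-major array with ncols columns."""
    return row * ncols + col


def fortran_buffer_as_c_rows(buf: List[float], n1: int, n2: int) -> List[List[float]]:
    """View a column-major A(n1, n2) buffer the way C sees double a[n2][n1].

    Row k of the result is Fortran column k+1, i.e. a[j-1][i-1] == A(i, j).
    """
    return [buf[k * n1:(k + 1) * n1] for k in range(n2)]
```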

Each load-bearing claim has a tr/* / python/trlib/*
file:line citation verified across 3 Codex design-stage
review rounds.

Pairs with input-files.md (previous commit) to close out
item F in the deepening menu.

Spec: docs/superpowers/specs/2026-05-04-tr-input-files-and-extending-tr-design.md (63a1332)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
k-yoshimi added a commit that referenced this pull request May 10, 2026
PR #187 (e049a1e) added AJRFT to the libtrapi.so C state struct
and Python wrapper SCALAR_FIELDS, but missed two paired Phase-0
artefacts:
- tr/trregress.f90 USE TRCOMM list and WRITE statements
- test_run/scripts/extract_tr_metrics.py SCALAR_KEYS allowlist

The asymmetry made test_equivalence.py::TestEquivalence::test_iter01
fail at 1e-10 with `compare_metrics FAIL: scalars.AJRFT: missing`
because libtrapi.so output now has 14 scalars (incl AJRFT) but the
phase-0-generated baseline still had 13. CI did not surface this
because eqdata.ITER01 is not staged into test_output by the workflow,
so the equivalence test was SKIPPED (per CLAUDE.md feedback_equivalence_must_pass,
this is the invisibility pattern).
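The failure here comes from an asymmetric key set, not from numerical drift. A minimal sketch of a comparator that reports it the same way (hypothetical; the real compare_metrics may handle tolerances and output differently):

```python
import math
from typing import Dict, List


def compare_scalars(actual: Dict[str, float], baseline: Dict[str, float],
                    rel_tol: float = 1e-10) -> List[str]:
    """Return failure strings; a key present on only one side counts as missing."""
    failures = []
    for key in sorted(set(actual) | set(baseline)):
        if key not in actual or key not in baseline:
            failures.append(f"scalars.{key}: missing")
        elif not math.isclose(actual[key], baseline[key], rel_tol=rel_tol):
            failures.append(f"scalars.{key}: {actual[key]!r} != {baseline[key]!r}")
    return failures
```

With `abs_tol` left at its default of 0, a baseline value of exactly 0.0 (AJRFT's default) only matches an exact 0.0, which is the desired strictness for a no-op invariant.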

This commit:
- Adds AJRFT to trregress.f90 USE list and one WRITE line
- Adds "AJRFT" to extract_tr_metrics.py SCALAR_KEYS
- Regenerates test_run/baselines/tr_iter01/metrics.json from a
  fresh tr2 build on a Linux box (clavius), so the baseline includes
  AJRFT and is consistent with chore branch's libtrapi.so
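Why both edits are needed: an extractor of this shape silently drops any key it does not know, so the Fortran WRITE and the Python allowlist have to change together. A minimal sketch, assuming a KEY=VALUE line format (the fixture line AJRFT=0.0E+00 suggests this); the parser, its names, and the abridged allowlist are hypothetical.

```python
from typing import Dict, Iterable

# Abridged, hypothetical allowlist; the real SCALAR_KEYS is longer.
SCALAR_KEYS = ("WPT", "AJT", "AJRFT", "Q0")


def extract_scalars(dump: str, allowlist: Iterable[str] = SCALAR_KEYS) -> Dict[str, float]:
    """Parse KEY=VALUE lines from a regress-style dump, keeping allowlisted keys only."""
    allowed = set(allowlist)
    scalars = {}
    for line in dump.splitlines():
        key, sep, value = line.partition("=")
        key = key.strip()
        if sep and key in allowed:
            scalars[key] = float(value)
    return scalars
```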

Layer-1 equivalence (test_iter01) now PASSES at 1e-10 locally.

Out of scope for this commit:
- tr_tst2 baseline regen (eq_tst2 has a separate pre-existing eq drift
  ~3e-9 rel_err under develop's eq physics; needs its own investigation)
- python/trlib/state.py (already updated by PR #187)
- CI workflow changes to actually run run_tests.sh + equiv (orthogonal)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
k-yoshimi added a commit that referenced this pull request May 10, 2026
Per pre-push code review on 24b1b12: the extractor unit-test
fixture (test_run/scripts/tests/fixtures/sample_tr_regress.dat)
is the third paired artefact PR #187 missed. Without it, future
copy-paste from the fixture as a "what does a real dump look like"
reference would silently omit AJRFT, recreating the same asymmetry
class this PR is closing.

- Adds AJRFT=0.0E+00 line after AJT in sample_tr_regress.dat
- Adds assertIn("AJRFT", ...) and value-equals-0.0 assertion to
  test_extracts_scalars so the fixture stays locked

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
k-yoshimi added a commit that referenced this pull request May 11, 2026
…follow-up) (#193)

* docs(spec): TOT AJRFT triangle backfill design (#191)

Expands #191 scope beyond the issue's literal text (regress mirror only)
to cover the full TOT C ABI + Python wrapper triangle. Codex design-stage
review (2026-05-12) surfaced the gap: TR-side PR #187 added AJRFT to the
tr_state_c + tr_api + trlib triangle, but the parallel TOT triangle was
untouched — meaning the L-7b-i `EXTERNAL_DRIVEN_I → tr → AJRFT` pipeline
has zero coverage at the TOT exit even after the regress-mirror work is
done. Verify-only on acceptance #6 is only defensible after the TOT
triangle is closed.

12 modification points across 3 commits:
- C1: tot_state.f90 + tot_api.h + tot_api.f90 + _ffi.py + state.py +
      test_ffi.py + TOT_STATE_ABI_VERSION 2 (mirrors PR #187 ABI block)
- C2: totregress.f90 + extract_tot_metrics.py + 2 baseline regens on
      clavius (mirrors TR commit 24b1b12)
- C3: sample_tot_regress.dat + test_extract_tot_metrics.py
      (mirrors TR commit d17f71e)

Self-reviewed: corrected §4.1.3 (tot_api.f90 reaches AJRFT via
trstate%, not direct TRCOMM USE) and §5 commit-ordering rationale
(commits are functionally independent; ordered for PR #187 parity).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(spec): apply Codex spec-file review fixes (#191)

Second Codex review pass on the design spec (post-commit 1df5493)
surfaced HIGH + MED + LOW findings. All verified against live tree
and incorporated.

HIGH:
- §4.1.2 (tot_api.h ABI version placement): the placement hint "after
  line 9" pointed inside the header's /* ... */ comment block (lines
  8-32). Corrected to "between #define TOT_MAX_NSMAX (line 35) and enum
  tot_error (line 37+)" with explicit sample content mirroring
  tr_api.h:27-38.

MED:
- §4.3.1 sample_tot_regress.dat line off-by-one (9 → 8); AJT= is on
  line 8 between WPT= (7) and Q0= (9).
- §4.1.6 expanded: TR PR #187 added a test_size_matches_header_math
  test (test_ffi.py:70-85) that asserts ctypes.sizeof(TrStateC)
  matches header math. Mirror this for TOT — name-only assertIn
  alone misses misplaced-field bugs that the sizeof check catches.
- §4.1.7 new subsection (components 6a-6d): 4 user-facing surfaces
  hard-code the 13-scalar count and become stale once SCALAR_FIELDS
  grows to 14 — python/totlib/README.md, tot_mcp/server.py, and en/ja
  sphinx state.md. Folded into C1 to keep the scalar-count narrative
  self-consistent.
- §5 commit-dependency claim corrected: C3 actually depends on C2.
  Without C2's SCALAR_KEYS addition, the extractor's allowlist filters
  AJRFT out before C3's assertIn assertion can see it. Ordering
  C1 → C2 → C3 is now stated as functional, not just review parity.
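The C3-on-C2 dependency comes down to the extractor's allowlist. The sketch below is illustrative only: `extract_scalars`, the sample values, and the assumption that the dump is a sequence of `KEY=VALUE` lines (suggested by the `AJRFT=0.0E+00` fixture line) are hypothetical, not the real extract_tot_metrics.py code.

```python
import re

# Matches lines such as "AJRFT=0.0E+00"; Fortran E-format values parse with float().
_LINE = re.compile(r"^\s*([A-Za-z][A-Za-z0-9_]*)\s*=\s*(\S+)\s*$")


def extract_scalars(text, scalar_keys):
    """Hypothetical extractor: keep only allowlisted KEY=VALUE scalars."""
    scalars = {}
    for line in text.splitlines():
        m = _LINE.match(line)
        if m is None:
            continue
        key, raw = m.groups()
        if key not in scalar_keys:
            continue  # keys outside the allowlist are dropped without any warning
        scalars[key] = float(raw)
    return {"scalars": scalars}


# Hypothetical fixture excerpt: AJRFT sits after AJT, between WPT and Q0.
FIXTURE = """\
WPT=1.25E+00
AJT=3.00E+00
AJRFT=0.0E+00
Q0=1.10E+00
"""

KEYS_BEFORE_C2 = ("WPT", "AJT", "Q0")
KEYS_AFTER_C2 = KEYS_BEFORE_C2 + ("AJRFT",)

before = extract_scalars(FIXTURE, KEYS_BEFORE_C2)
after = extract_scalars(FIXTURE, KEYS_AFTER_C2)
print("AJRFT" in before["scalars"])  # False: filtered out by the allowlist
print(after["scalars"]["AJRFT"])     # 0.0
```

Because the filter drops unknown keys without complaint, an `assertIn("AJRFT", ...)` check added before the allowlist grows would fail even though the dump contains the line.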

LOW:
- §4.1.6 field-order in test_has_expected_fields locked to end of
  tuple (after "QP") so a future sizeof/offset check can't disagree
  with the name list.

Spec scope grows from 12 → 16 modification points; commit shape
unchanged (3 commits) but C1 now carries 10 components (6 ABI/wrapper
+ 4 docs-parity).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(spec): apply Codex round-3 spec review fixes (#191)

Round-3 Codex review on `87734400` surfaced an incomplete doc sweep
in §4.1.7 (4 → 12 hardcoded "13 scalars" references), a missing test
verification path for docs, and two LOW polish items.

MED:
- §4.1.7 reframed as a "doc-parity sweep" with 12 known hits (was 4):
  added totlib.py:319 (get_state docstring), server.py:831 (second hit
  beyond the schema description), en/state.md body line 37, en/
  applications.md:154, ja/state.md table row :102, ja/index.md:32,
  ja/applications.md:151, tot-library/architecture.md:116. Now driven
  by a single explicit grep + verification table.
- §7.4 added: post-impl grep returns 0 TOT-side hits + sphinx
  `make html` smoke-build catches markup errors in the new table rows.
- §5 commit-shape row C1 updated: components now "1, 2, 3, 4, 5, 6, 6a
  (doc sweep, 12 lines across 9 files)".

LOW:
- §4.1.2 sample ABI comment block adds the in-tree-consumers/build-
  system sentence from `tr_api.h:30-32` (faithful TR mirror).
- §8 risk table: added doc-drift + sphinx-markup risk rows.

Scope claim revised: 13 modification points + 1 doc-sweep (12 lines
across 9 files) + ABI bump + 3 commits. Group 1 = 6 ABI/wrapper +
1 doc sweep. Commit shape (3 commits) unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(spec): apply Codex round-4 spec review fixes (#191)

Round-4 Codex review on 5128997 returned PASS on checks 1-5 and
surfaced ONE MED finding.

MED:
- python/totlib/tests/test_totlib.py:135-185 (TestTotStateFromC)
  hardcodes a populated TotStateC instance + asserts a subset of
  scalars; the existing test silently passes when AJRFT is added to
  _fields_ because the field defaults to 0.0 and is never asserted.
  The TotStateC -> TotState.from_c -> scalars["AJRFT"] round-trip
  is therefore left uncovered.

Fix: new §4.1.8 (component 6b) prescribes two paired edits:
  - _populated_state sets s.AJRFT = 1.5
  - test_from_c_slices_correctly asserts st.scalars["AJRFT"] == 1.5

This is analogous to TR's d17f71e for the extractor, applied to the
totlib from_c round-trip path. Folded into C1 (same triangle as
_ffi.py + state.py).
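Why the sentinel value matters can be seen with a plain ctypes structure. This is a minimal sketch; `DemoStateC`, its two fields, and the `from_c` helper are hypothetical stand-ins for TotStateC and TotState.from_c.

```python
import ctypes


class DemoStateC(ctypes.Structure):
    """Hypothetical stand-in for TotStateC with just two scalar fields."""
    _fields_ = [
        ("QP", ctypes.c_double),
        ("AJRFT", ctypes.c_double),  # newly appended field
    ]


SCALAR_FIELDS = ("QP", "AJRFT")


def from_c(c_state, fields=SCALAR_FIELDS):
    """Hypothetical from_c: copy each named scalar out of the C struct."""
    return {"scalars": {name: float(getattr(c_state, name)) for name in fields}}


# ctypes zero-initializes structures, so an unset AJRFT already reads 0.0;
# leaving it unset and unasserted proves nothing about the copy path.
print(DemoStateC().AJRFT)  # 0.0

# A non-zero sentinel, as in the spec (s.AJRFT = 1.5), must survive the trip.
populated = DemoStateC(QP=2.0, AJRFT=1.5)
state = from_c(populated)
print(state["scalars"]["AJRFT"])  # 1.5
```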

Also updated §4 header (14 modification points), §4.1 header
(7 components + 1 doc sweep), §5 commit-shape C1 row, §7.1 to
include the new TestTotStateFromC pytest invocation, and §10.4
reviewer trail.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(spec): apply Codex round-5 spec review fixes (#191)

Round-5 Codex confirms diminishing returns: checks on §1-3, §6, §9,
§10, and cross-section consistency mostly PASS. Found:

MED:
- §6 acceptance-mapping line 423 still listed components "1-6 + 6a-6d"
  — stale from R2 before §4.1.7 was consolidated into a single 6a
  (doc sweep) and §4.1.8 added 6b (test_totlib round-trip). Updated
  to "1-6 + 6a + 6b".

LOW:
- §4.1 line 88 prose "All six components are interdependent" was
  stale after 6a + 6b were added — rewrote to clarify that 1-6 form
  the ABI triangle (interdependent) while 6a and 6b are co-shipped
  to keep narrative/coverage in lockstep.
- §6 reviewer-history sentence said "four user-facing scalar-count
  surfaces" — stale from R2; rewrote to reflect the R3 expansion
  to 12 hits across 9 files plus R4's §4.1.8 addition.

§10.4 reviewer trail updated with R5 entry. R5 also surfaces a META
finding (issue #191 body should be updated before C1 lands to
reflect the expanded scope) — tracked as a manual user task, not
spec content.

Per Codex: no other material findings. §3 non-goal still defensible,
§9 out-of-scope still consistent, §10 memory refs still relevant,
no stale component counts elsewhere (grep clean).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plan): TOT AJRFT triangle backfill implementation plan (#191)

1014-line bite-sized task plan for the spec
`docs/superpowers/specs/2026-05-12-tot-ajrft-triangle-design.md`
(HEAD 3c2da8e). Breaks the 3-commit workflow into ~25 numbered
tasks across 4 phases:

- Pre-flight (worktree + toolchain sanity)
- Phase C1 (12 tasks): TDD-style C ABI + Python wrapper + 12-line
  doc sweep + test_ffi sizeof + test_totlib round-trip
- Phase C2 (6 tasks): regress code + 2 baseline regens on clavius
  (SSH workflow detailed)
- Phase C3 (3 tasks): extractor fixture + unit-test assertion
- Phase F (5 tasks): pre-push gate (parallel reviewers + REVIEW_OK
  marker) + push + PR create

Each task lists exact files + line ranges + before/after code
snippets + verification command + expected output. Acceptance
checklist at end cross-references issue #191 items 1-6 to plan
tasks.

Self-review: spec coverage complete (each §4 component maps to
1+ tasks), zero placeholders, names consistent across tasks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(tot): add AJRFT to tot_state ABI + Python wrapper (#191 / PR #187 follow-up)

Mirror of PR #187's TR-side AJRFT triangle on the TOT orchestrator
struct. Closes the totlib pipeline path of issue #191's L-7b-i
invisibility class (the regress-dump path is closed by C2/C3).

C ABI:
- tot/tot_state.f90: append REAL(C_DOUBLE) :: AJRFT at end of struct
- tot/tot_api.h: add #define TOT_STATE_ABI_VERSION 2 + matching
  double AJRFT field; ABI version comment block mirrors tr_api.h:27-38
- tot/tot_api.f90: add zero-init + copy-from-trstate%AJRFT lines

Python wrapper:
- python/totlib/_ffi.py: ("AJRFT", c_double) at end of TotStateC._fields_
- python/totlib/state.py: "AJRFT" in SCALAR_FIELDS, docstring 13 -> 14

Tests:
- python/totlib/tests/test_ffi.py: AJRFT in test_has_expected_fields
  + new test_size_matches_header_math (mirrors TR test_ffi.py:70-85,
  adjusted for 7 ints with 28/32 byte padding tolerance)
- python/totlib/tests/test_totlib.py: TestTotStateFromC._populated_state
  sets s.AJRFT = 1.5; test_from_c_slices_correctly asserts it
  round-trips through TotState.from_c (mirrors d17f71e's
  test_extract_tr_metrics.py treatment, applied to the from_c path)
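The header-math check can be sketched with ctypes. Assumptions: `DemoStateC` is a hypothetical layout (7 ints followed by doubles ending in QP and AJRFT), not the real TotStateC, and padding is computed from `ctypes.alignment` instead of hard-coding 28 or 32 bytes. The second struct shows a limit of a size-only check: reordering same-sized fields leaves `sizeof` unchanged, while the field offsets differ.

```python
import ctypes

N_INTS = 7  # TOT's int count per the commit message

# Hypothetical int field names; only QP and AJRFT come from the thread.
_INT_FIELDS = [(f"i{k}", ctypes.c_int) for k in range(N_INTS)]


class DemoStateC(ctypes.Structure):
    _fields_ = _INT_FIELDS + [
        ("Q0", ctypes.c_double),
        ("QP", ctypes.c_double),
        ("AJRFT", ctypes.c_double),  # ABI version 2: appended at the end
    ]


class SwappedStateC(ctypes.Structure):
    """Same fields, but AJRFT placed before QP (a layout-breaking reorder)."""
    _fields_ = _INT_FIELDS + [
        ("Q0", ctypes.c_double),
        ("AJRFT", ctypes.c_double),
        ("QP", ctypes.c_double),
    ]


def header_math_size(n_ints, n_doubles):
    """Expected sizeof: int block padded up to double alignment, then doubles."""
    int_bytes = n_ints * ctypes.sizeof(ctypes.c_int)   # 28 with 4-byte ints
    align = ctypes.alignment(ctypes.c_double)           # platform-dependent
    padded = (int_bytes + align - 1) // align * align   # 32 when align == 8
    return padded + n_doubles * ctypes.sizeof(ctypes.c_double)


print(ctypes.sizeof(DemoStateC) == header_math_size(N_INTS, 3))    # True
# Size alone cannot see the reorder ...
print(ctypes.sizeof(SwappedStateC) == ctypes.sizeof(DemoStateC))  # True
# ... but the field offsets differ.
print(DemoStateC.AJRFT.offset == SwappedStateC.AJRFT.offset)      # False
```

Locking AJRFT to the end of the expected-name tuple, as the spec does, covers the case the size check misses.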

Doc parity sweep (12 lines / 9 files):
- python/totlib/totlib.py, README.md
- python/mcp-servers/tot_mcp/server.py (x2 lines)
- docs/sphinx/modules/tot/en/{state,applications}.md
- docs/sphinx/modules/tot/ja/{state,applications,index}.md
- docs/tot-library/architecture.md
All bump "13 scalars" -> "14 scalars (incl. AJRFT)" and add AJRFT
row to scalar tables where present.

Post-impl grep returns 0 TOT-side hardcoded "13 scalars" hits.
Sphinx HTML smoke-build succeeds.
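A post-implementation sweep of this kind can be sketched in Python. The regex, the file suffixes, and `find_stale_counts` are assumptions for illustration, not the grep used in the spec; the pattern also covers the Japanese "13 個" form.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical patterns for the stale count (English and Japanese forms).
STALE = re.compile(r"\b13 scalars\b|13 個")
SUFFIXES = {".md", ".py", ".rst"}


def find_stale_counts(root):
    """Return (path, line_no, line) for every stale scalar-count reference."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for no, line in enumerate(text.splitlines(), start=1):
            if STALE.search(line):
                hits.append((path, no, line.strip()))
    return hits


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "state.md").write_text("TotState exposes 13 scalars.\n", encoding="utf-8")
    (root / "slides.md").write_text("TrState.scalars は 13 個\n", encoding="utf-8")
    (root / "README.md").write_text("14 scalars (incl. AJRFT)\n", encoding="utf-8")
    hits = find_stale_counts(root)
    for path, no, line in hits:
        print(f"{path.name}:{no}: {line}")  # the sweep is done when nothing prints
```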

Spec: docs/superpowers/specs/2026-05-12-tot-ajrft-triangle-design.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test+tot: backfill AJRFT in regression dump (#191 / PR #187 follow-up)

Mirrors TR-side commit 24b1b12, applied to the TOT regress path:
- tot/totregress.f90: AJRFT in USE TRCOMM list + WRITE line between
  AJT and Q0 (inside the TR_OK guard, so AJRFT is dumped only when
  TR's allocatable arrays are present, matching the existing scalars)
- test_run/scripts/extract_tot_metrics.py: "AJRFT" in SCALAR_KEYS

Baselines (tot_demo2014_short, tot_ht6m_short) will be regenerated
on clavius and added in a follow-up commit/amend (see spec §4.2.3/4).
Without the regen, ./test_run/run_tests.sh would fail at the schema
comparison because the new tot_regress.dat has AJRFT and the cached
baseline does not.

Spec: docs/superpowers/specs/2026-05-12-tot-ajrft-triangle-design.md §4.2

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: backfill AJRFT in extractor unit-test fixture (#191)

Mirrors the TR-side commit d17f71e (`test: backfill AJRFT in
extractor unit-test fixture (PR #187 follow-up)`): the TOT
extractor's unit-test fixture is the analogous third paired
artefact, and without it future copy-paste of sample_tot_regress.dat
as a "what does a real dump look like" reference would silently
omit AJRFT, recreating the asymmetry class this PR is closing.

Depends on C2 (commit 6dec5cd), which already wired AJRFT into
totregress.f90's WRITE block and extract_tot_metrics.py's
SCALAR_KEYS — so the extractor already knows the key; this commit
only locks it into the unit-test fixture.

- Adds AJRFT=0.0E+00 line after AJT in sample_tot_regress.dat
- Adds `assert "AJRFT" in data["scalars"]` and value-equals-0.0
  assertion to test_extracts_tr_scalars so the fixture stays locked

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(tr): backfill stale 13->14 scalar references (PR #187 follow-on)

PR #187 (e049a1e, 2026-05-02) added AJRFT to the tr_state_c C ABI
+ Python wrapper but missed the docs/MCP-schema strings that hard-code
the 13-scalar count. The asymmetry surfaced when this PR's TOT-side
doc sweep asserted "TrState has the same 14 scalars as TotState",
contradicting TR's own docs that still said 13.

8 stale references fixed across 5 files:
- docs/sphinx/modules/tr/en/state.md (heading + body line + add table row)
- docs/sphinx/modules/tr/ja/state.md (heading + body line + add table row)
- docs/sphinx/modules/tr/en/design.md (single line bump)
- docs/sphinx/modules/tr/ja/design.md (single line bump)
- python/mcp-servers/tr_mcp/server.py (schema string + docstring)

Post-impl grep returns 0 TR-side stale hits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(presentations): backfill 13->14 in trlib slide builder + cached md (#191)

Final cumulative grep after C4 surfaced 3 residual "13 個" ("13 items")
hits in docs/presentations/ that escape the sphinx modules sweep:
- _build_trlib_usage.py:590 (TrState.scalars enumeration string)
- _build_trlib_usage.py:653 (speaker notes string)
- 2026-04-20-trlib-python-usage.md:191 (cached speaker notes)

While the markdown file is date-stamped, both files are git-tracked
active reference content (the .py is a re-runnable slide builder;
the .md is browsed as current trlib usage doc). Codex post-push
review (2026-05-12) flagged these as merge blockers for the same
reason C4 fixed the sphinx/MCP sweep — readers comparing TOT vs TR
get inconsistent counts otherwise.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
k-yoshimi added a commit that referenced this pull request May 12, 2026
CI surfaced what #192's eq-mirror fallback was designed to expose:
test_tst2 now actively runs (no longer silent SKIP) and fails with
`scalars.AJRFT: missing` because `test_run/baselines/tr_tst2/
metrics.json` predates PR #187 (AJRFT addition).
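A hedged sketch of how a metrics comparison can produce a `scalars.AJRFT: missing` failure: `compare_scalars`, `rel_err`, and the message format are hypothetical, modelled only on the error text above and the 1e-10 tolerance quoted in this thread; the real comparison script may define relative error differently.

```python
REL_TOL = 1e-10  # equivalence tolerance quoted in this thread


def rel_err(a, b):
    """Hypothetical relative error: |a - b| scaled by the larger magnitude."""
    scale = max(abs(a), abs(b))
    return 0.0 if scale == 0.0 else abs(a - b) / scale


def compare_scalars(baseline, current, tol=REL_TOL):
    """Return failure messages in a 'scalars.<KEY>: ...' style."""
    failures = []
    for key in sorted(set(baseline) | set(current)):
        if key not in baseline or key not in current:
            failures.append(f"scalars.{key}: missing")
            continue
        err = rel_err(baseline[key], current[key])
        if err > tol:
            failures.append(f"scalars.{key}: rel_err={err:.1e} > {tol:.0e}")
    return failures


# A baseline written before PR #187 has no AJRFT entry.
baseline = {"AJT": 3.0, "Q0": 1.1}
current = {"AJT": 3.0, "Q0": 1.1, "AJRFT": 0.0}
print(compare_scalars(baseline, current))  # ['scalars.AJRFT: missing']

# Drift of ~3e-9 relative error also fails at 1e-10 (one rel_err failure).
print(compare_scalars({"Q0": 1.0}, {"Q0": 1.0 + 3e-9}))
```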

#190 tracks the baseline regen, which is in turn blocked by
upstream eq_tst2 drift (~3e-9 > 1e-10). Until that chain resolves,
mark test_tst2 as xfail(strict=True) so:
- CI is green again (this PR's #192 goal preserved for tr_iter01)
- When #190 closes and the baseline is regenerated to include
  AJRFT, the xfail flips to XPASS and CI fails red, forcing
  removal of this decorator (CLAUDE.md narrow exception protocol).

This is the documented contingency path from spec §6.4, except the
cause is not gfortran drift — it's a pre-existing baseline staleness
that was hidden by the silent SKIP this PR fixed. In a real sense
#192 is now doing its job: invisibility → visibility.

test_iter01 still PASSes (its baseline was regenerated in 24b1b12
to include AJRFT).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
k-yoshimi added a commit that referenced this pull request May 12, 2026
…195)

* docs(spec): trlib test_equivalence eq-mirror fallback design (#192)

Approach A2 (eq-mirror fallback) replaces the original A1 (CI step
fixture-copy). Pivot driven by Codex spec review #1 finding: eq's
test_equivalence.py:197-205 ALREADY has the FIXTURES_DIR fallback;
trlib is missing it. Adding the same fallback to trlib is the root-
cause fix vs adding a CI shim.

Scope (3 files, 1 commit):
- python/trlib/tests/test_equivalence.py: add FIXTURES_DIR constant
  + replace SKIP-only logic with two-tier fallback mirroring eq:53,
  197-211 verbatim.
- python/trlib/tests/fixtures/eqdata.ITER01 (NEW, 45956 B): copy of
  python/eqlib/tests/fixtures/eqdata.ITER01 (eqdata.TST-2 mirror is
  already present in trlib/tests/fixtures/).
- .gitignore: add !python/trlib/tests/fixtures/eqdata.ITER01 negation
  next to the existing TST-2 negation.

Out of scope (follow-ups):
- fp/wr/wrx/ti silent-SKIP audit (different mechanism, no KNAMEQ)
- eq.x build for CI regen / eq-physics drift detection

Drift risk (Codex MED): CI gfortran-13.2 vs baseline's 13.3 may
produce > 1e-10 drift; §6.4 documents the contingency regen path.

Reviewer trail: brainstorming → Codex spec review #1 → pivot A1→A2
→ this spec.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plan): trlib test_equivalence eq-mirror fallback impl plan (#192)

Bite-sized task plan for the spec at
docs/superpowers/specs/2026-05-12-ci-tr-equiv-staging-design.md
(committed be5245e). 8 numbered tasks across 2 phases:

- Pre-flight (worktree + branch + current-SKIP-state baseline)
- Phase 1 (5 tasks): copy eqdata.ITER01 fixture, .gitignore
  negation, FIXTURES_DIR + 2-tier fallback in test_equivalence.py,
  local sanity, commit C1
- Phase 2 (3 tasks): bounded pytest + parallel reviewers (in-house
  + Codex) on diff + REVIEW_OK marker + push + PR create + Bugbot
  trigger

Each task has exact files/lines, before/after snippets, verification
commands, expected outputs. Acceptance maps to issue #192 #1-#3.

Self-review: spec coverage complete, no placeholders, names
consistent across tasks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(trlib): fallback to committed eqdata fixture so test_equivalence runs (#192)

Mirror python/eqlib/tests/test_equivalence.py's two-tier eqdata
fallback into python/trlib/tests/test_equivalence.py so
TestEquivalence::test_{iter01,tst2} run on CI (and on any fresh
checkout) instead of silently SKIPping. Closes the invisibility
gap that let PR #187 (L-7b-i AJRFT) pass CI for ~10 days while the
chore branch's local test showed a real failure at the 1e-10 tolerance.

Three changes:
- python/trlib/tests/test_equivalence.py: add FIXTURES_DIR constant
  (line 46 area, between TEST_OUTPUT_DIR and COMPARE_SCRIPT) +
  replace SKIP-only logic in _check_case with the two-tier fallback
  (prefer Phase-0 runner output at TEST_OUTPUT_DIR/<case>/<KNAMEQ>,
  fall back to FIXTURES_DIR/<KNAMEQ>, only SKIP if both absent).
  Verbatim structural mirror of eqlib test_equivalence:53, 197-211.
- python/trlib/tests/fixtures/eqdata.ITER01 (new, 45956 B): exact
  copy of python/eqlib/tests/fixtures/eqdata.ITER01. The TST-2
  mirror already lives at python/trlib/tests/fixtures/eqdata.TST-2.
- .gitignore: add !python/trlib/tests/fixtures/eqdata.ITER01
  negation, adjacent to the existing TST-2 negation.
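The two-tier lookup can be sketched as below. `resolve_eqdata`, its parameters, and the case directory names are hypothetical; only the ordering (runner output first, committed fixture second, SKIP only when both are missing) comes from the commit message.

```python
import tempfile
from pathlib import Path


def resolve_eqdata(case, knameq, test_output_dir, fixtures_dir):
    """Hypothetical two-tier lookup for an eqdata file.

    1. Prefer the Phase-0 runner output at <test_output_dir>/<case>/<KNAMEQ>.
    2. Fall back to the committed fixture at <fixtures_dir>/<KNAMEQ>.
    3. Return None only if both are absent; the caller then SKIPs.
    """
    candidates = (
        Path(test_output_dir) / case / knameq,
        Path(fixtures_dir) / knameq,
    )
    for path in candidates:
        if path.is_file():
            return path
    return None


with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp, "test_output")
    fix_dir = Path(tmp, "fixtures")
    fix_dir.mkdir()
    (fix_dir / "eqdata.ITER01").write_text("fixture\n")
    # No runner output yet: the committed fixture is used.
    first = resolve_eqdata("iter01", "eqdata.ITER01", out_dir, fix_dir)
    (out_dir / "iter01").mkdir(parents=True)
    (out_dir / "iter01" / "eqdata.ITER01").write_text("runner\n")
    # Runner output now exists and takes precedence.
    second = resolve_eqdata("iter01", "eqdata.ITER01", out_dir, fix_dir)
    # Neither source exists: the caller would SKIP.
    missing = resolve_eqdata("tst2", "eqdata.TST-2", out_dir, Path(tmp, "nowhere"))
    print(first.parent.name, second.parent.name, missing)  # fixtures iter01 None
```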

Out of scope (follow-up issues to be filed):
- fp/wr/wrx/ti silent-SKIP audit (different mechanism, no KNAMEQ).
- CI-side eq.x build for fresh eqdata regen (eq-physics drift
  detection); the current fixture-trust model mirrors the project's
  existing convention.

Spec: docs/superpowers/specs/2026-05-12-ci-tr-equiv-staging-design.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(trlib): xfail test_tst2 pending #190 baseline regen

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>