test(tr): Phase 0 - regression test infrastructure#2
Merged
Conversation
- specs: TR module refactoring design (Option B: structure + danger-spot modernization) - plans: Phase 0 regression test infrastructure with env-guarded high-precision dump - Parked on feature/tr-regression-phase0-docs pending dealloc cleanup merge Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add trregress module providing tr_regress_dump_if_enabled. When the environment variable TR_REGRESS_DUMP=1 is set, tr_loop writes key global quantities (WPT, AJT, Q0, BETAn, TAUEn, ZEFF0, ALI, RQ1) and radial profiles (RN, RT, AJ, QP) to tr_regress.dat in 1PE24.16 format at the end of the transport calculation loop. When the variable is unset, the routine returns immediately so normal runs are unaffected. This is Phase 0 Task 3 of the TR regression test infrastructure. Run-to-run bit-exact reproducibility verified. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
pytest is not installed on the target environment. Switch to unittest so the regression suite runs on a vanilla Python 3 install.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Baselines captured with: - tr/tr2 built at commit a361df2 with TR_REGRESS_DUMP=1 env-guarded dump - gfortran OFLAGS = -g -O3 -m64 -std=legacy - Verified run-to-run reproducible within 1e-10 tolerance across 3 consecutive runs.
Owner
Author
|
@cursor review |
3 tasks
… passing Cursor Bugbot flagged that `_rel_err(41.13, inf)` evaluates to NaN (`inf / inf`) and `NaN > tol` is False, so a regression that produces `inf` would be silently accepted. Guard `_check_scalar` with `math.isinf` the same way `math.isnan` is guarded: identical infinities still match, but any finite-vs-inf or +inf/-inf mismatch is reported. Add three unittest cases covering actual-is-inf, baseline-is-inf, and matching-infs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Owner
Author
|
Cursor Bugbot の指摘に対応しました。 Fix: Tests: @cursor review |
Owner
Author
|
@cursor review |
There was a problem hiding this comment.
✅ Bugbot reviewed your changes and found no new issues!
Comment @cursor review or bugbot run to trigger another review on this PR
Reviewed by Cursor Bugbot for commit fe94c1e. Configure here.
This was referenced Apr 18, 2026
k-yoshimi
added a commit
that referenced
this pull request
Apr 28, 2026
Bugbot's second pass on PR #178 surfaced 4 issues that survived the first round of LOW fixes (e1e5a37). All four addressed in this commit, with targeted regression tests: #1 (LOW) PipelineResult.to_dict() now returns defensive copies of inner scalars dicts and coupling_applied lists. Previously, MCP- side mutation of the JSON payload would propagate into the originating PipelineStep. New test_to_dict_returns_defensive_copies asserts the contract. #2 (LOW) _validate_steps now accepts list-of-2-element steps in addition to tuple-of-2-element. JSON-decoded MCP payloads deserialise tuples as lists, so the API now round-trips through to_dict() without manual coercion. New test_run_pipeline_accepts_list_shaped_steps regression guard. #3 (LOW) set_param now explicitly rejects bool inputs with TotPipelineCouplingError instead of silently coercing to 1.0/0.0 via the numeric branch (Python's bool is a subclass of int). The previous test_set_param_records_bool_as_float (which documented the silent-coercion behavior) is replaced with test_set_param_rejects_bool that asserts the explicit rejection. #4 (MED) ExceptionGroup compatibility shim for Python 3.10. The close() multi-failure path uses builtin ExceptionGroup (3.11+); tot_mcp's pyproject.toml declares requires-python=">=3.10". The shim probes for the builtin first, falls back to the exceptiongroup PyPI backport if available, and degrades gracefully to raising errors[0] only when neither is reachable (same as pre-fix behavior). CI runs 3.11+ so the existing ExceptionGroup test still passes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
k-yoshimi
added a commit
that referenced
this pull request
Apr 28, 2026
* docs(spec): add L-7a cross-module coupling design Design for incremental Python-side scalar coupling between fp and tr in the tot orchestrator. Establishes TotPipeline class skeleton in a new file (python/totlib/pipeline.py) without touching the legacy Tot wrapper. Covers fp -> tr driven current (RJT volume integral -> PNBCD scalar) only; profile/coordinate transforms deferred to L-7b via BPSD. Two Codex review passes: 9 findings (HIGH 2 / MED 5 / LOW 1) all addressed inline. Pre-flight Research R1 (singleton coexistence + finalize/init state-leak) / R2 (rjt_volint extraction) / R3 (tr param name confirmation) gated before implementation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plan): add L-7a cross-module coupling implementation plan Bite-sized TDD task list executing the L-7a design spec (0a12bd0) across 4 phases / 16 tasks / 3 sub-PRs: - Phase 0: Pre-flight research R1/R2/R3 (singleton coexistence, rjt_volint extraction, tr driven-current param confirmation) - Phase 1 (PR 1): TotPipeline class + errors + mock unit tests - Phase 2 (PR 2): compute_rjt_volint helper + COUPLING_RULES population + fp -> tr 1e-10 equivalence test - Phase 3 (PR 3): run_pipeline MCP tool + docs Each PR phase ends with the CLAUDE.md pre-push gate (code-reviewer + codex:codex-rescue + REVIEW_OK marker + --forked --timeout=120 --timeout-method=signal pytest). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(spec): record L-7a R1 (singleton coexistence) outcome R1-a (concurrent live) and R1-b (re-init state leak) verified per the L-7a design spec. Outcome and fallback decision recorded. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(spec): record L-7a R2 (rjt_volint extraction) outcome R2 PASS. compute_rjt_volint(state, *, R0, a) verified against fp internal aggregate rtotalIP at 1e-10 (sanity case NRMAX=10, RR=3, RA=1, default fixture: helper -2.769528e-13 MA == fp stdout -2.7695E-13 MA). FpState shape characterized: top-level dataclass attrs [nrmax, nsamax, npmax, nthmax, ntg2, timefp, RNT/RWT/RTT/RJT/RPCT/RPWT]; **no `.scalars` dict** (asymmetric to TrState — R1 concern #3 confirmed). FpState exposes no grid info (RG/DV/VOLR/RA/RR absent), so helper takes R0, a as kwargs and assumes uniform rho-grid in [0,1] (matches fp/fpprep.f90:97-99). RJT units: [MA/m^2] (matches RJS in fp/fpcale.f90:290 where RJ_P = RJS*1.D6 converts to A/m^2). Helper applies *1e6 internally and returns total I in [A]. fplib unchanged — helper lives in totlib/pipeline.py (Phase 2 Task 2.1). Probe stdout (truncated): FpState attributes: RJT/RNT/RPCT/RPWT/RTT/RWT (list[list[float]]), npmax=50, nrmax=1, nsamax=1, ntg2=2, nthmax=50, timefp=0.01. Grid candidates RG/DV/VOLP/RM/RMAX/DR/RA/RR/RB: ALL ABSENT. Scalar candidates AJT/AJT_driven/.../scalars: ALL ABSENT. Sanity check (NRMAX=10): fp stdout: total plasma current [MA] -2.7695E-13 helper: compute_rjt_volint = -2.769528e-07 A = -2.769528e-13 MA (1e-10 match) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(spec): record L-7a R3 (tr driven current param) outcome PARTIAL PASS: PNBCD is NOT in tr_param_registry. Registered scalar candidates are PICCD, PECCD, PLHCD (dimensionless current-drive factors) and PLHTOT (LH input power [MW]). Selected PLHCD as L-7a skeleton target with documented unit mismatch (Amperes -> dimensionless factor); rigorous physical conversion deferred to L-7b. Picked a = tr.RA for compute_rjt_volint(state, R0=tr.RR, a=tr.RA) because fp's radial mesh is RA-normalized (fpcale.f90:34, 56). Active-drive fixture for Phase 2.3 equivalence test: fp.set_param('E0', 0.001) # induction E-field [V/m] yields ~0.946 MA (vs ~1e-13 MA for default), well above the 1e-10 equivalence-test floor. MODEL_WAVE+PABS_LH path failed with rc=3 and needs additional setup; E0 is the minimum-cost active-drive trigger. Open issue surfaced: tr lacks a registered scalar that semantically accepts an external driven current in MA. L-7b should add such a param (or expose PNBCD via the registry). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(l7a): R4 decision gate — accept skeleton coupling, update spec/plan R3 confirmed PNBCD is not registered in tr_param_registry.f90; only PLHCD/PICCD/PECCD (dimensionless multipliers) accept set_param. Per user decision (option A), L-7a proceeds as API-plumbing skeleton with PLHCD as the dst_param; physical fidelity (proper EXTERNAL_DRIVEN_I scalar in tr) is moved to L-7b follow-ups. Spec updates: - §2 table: PNBCD -> PLHCD with skeleton-coupling caveat - §3 non-goals: explicit "physical fidelity is L-7b scope" - §5.2 CouplingRule: src_state_key now (state, params) 2-arg callable - §5.3 PipelineResult + new _state_to_scalars adapter (handles FpState which lacks the .scalars dict per R1/R2) - §5.4 TotPipeline: self._params tracking for callable rules - §6.2 run_pipeline: uses _state_to_scalars + records dst_param injections back into _params - §12 follow-ups: tr EXTERNAL_DRIVEN_I_MA addition + fplib .scalars property consideration Plan updates: - Task 1.4: __init__ adds self._params; set_param records into it - Task 1.5: _extract_source 2-arg callable + _state_to_scalars helper + 5 new tests (callable+params, set_param tracking, adapter) - Task 2.1: compute_rjt_volint(state, *, R0, a) signature with the R2-derived implementation pasted in - Task 2.2: COUPLING_RULES uses lambda wrapper pulling tr:RR/tr:RA from params; dst_param=PLHCD; documents skeleton nature - Task 2.3: equivalence test uses E0=0.001 active-drive fixture (R3), PLHCD instead of PNBCD, _state_to_scalars adapter, plus a sanity guard that the integral magnitude is non-trivial (>= 1e3 A) - Test count expectations updated throughout Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(totlib): add TotPipeline error hierarchy Five new exception classes for the L-7a pipeline: - TotPipelineError (base, extends TotlibError) - TotPipelineUnknownModuleError, TotPipelineCouplingError, TotPipelineLifecycleError - TotPipelineRunError carrying partial_result + failed_step_index + failed_module attributes for debugging mid-pipeline failures. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(totlib): add CouplingRule/PipelineStep/PipelineResult dataclasses Pure data carriers for the run_pipeline orchestrator. CouplingRule is frozen (immutable) so the registry can be hashable in the future. PipelineResult.last(module) returns the most recent matching step, to_dict() is the MCP serialization format. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * style(totlib): match typing convention from surrounding modules Code-quality reviewer flagged divergence: pipeline.py used PEP 585 (dict[...], list[...]) while totlib/state.py and sibling state.py files use Dict[...], List[...] from typing. Match the surrounding codebase. Also addresses two reviewer nits: explicit "frozen=True rationale" docstring on CouplingRule, and explicit shallow-ref caveat docstring on PipelineResult.to_dict. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(totlib): add module registry + lazy import helpers _MODULE_REGISTRY maps short names (fp/tr/eq/wr/wrx/ti) to (package, wrapper_class, base_error_class) triples. _import_wrapper / _import_module_error keep the lazy-load contract from spec section 5.5: only modules used in the current pipeline get their lib*.so / errors.py loaded. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(totlib): correct ti wrapper class name + add parametrized registry tests Code-quality reviewer flagged a coverage gap: only fp was exercised by the registry-resolves-class tests. Adding parametrized tests across all 6 modules immediately surfaced a real bug: the ti wrapper class is `TiLib` (capital L), not `Tilib` as the registry claimed. The sibling `TilibError` IS lowercase, so ti is the only entry where the wrapper-class and error-class casing diverge — easy to miss. Fixed registry entry; added a comment flagging the asymmetry. Spec and plan updated to match. Now 31/31 pipeline tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(totlib): add TotPipeline lifecycle (init/set_param/close) Implements the lifecycle subset of TotPipeline: - __init__ creates empty _modules / _params / _closed=False; opens nothing. - _ensure_module lazy-instantiates a wrapper on first reference, gated by the closed flag. - set_param routes strings to set_param_str, numerics to set_param, and records the value into _params after the wrapper accepts it (so failed wrapper validation keeps _params consistent). - close is idempotent and finalizes all modules even when one raises; the first exception is re-raised so the root cause is visible. - Context manager support via __enter__/__exit__. run_pipeline / _extract_source / COUPLING_RULES are deferred to Task 1.5. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(totlib): guard set_param_str when module wrapper lacks the method Code-quality reviewer flagged that wrlib.Wrlib, wrxlib.Wrxlib, and tilib.TiLib do not expose set_param_str (only fp/tr/eq do). Routing a string value through TotPipeline.set_param previously raised a bare AttributeError leaking from getattr. Now raises a typed TotPipelineCouplingError with a clear message identifying the module. Added a regression test using a mock wrapper without set_param_str. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(totlib): implement TotPipeline.run_pipeline orchestrator - _state_to_scalars adapter (handles fp's missing .scalars dict + the standard .scalars-bearing modules uniformly). - COUPLING_RULES registry (empty at this commit; ('fp','tr') is filled in Phase 2 after R2/R3 outcomes are baked into pipeline.py). - _validate_steps performs side-effect-free pre-flight checks. - _extract_source resolves callable src_state_key with (state, params) signature; string keys go through _state_to_scalars. - run_pipeline iterates steps, applies coupling rules between adjacent pairs, and wraps per-module errors as TotPipelineRunError with the partial PipelineResult attached for debugging. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(totlib): export TotPipeline + dataclasses at package level Backwards-compatible: legacy 'from totlib import Tot' / 'TotState' imports continue to work. New 'from totlib import TotPipeline, CouplingRule, PipelineStep, PipelineResult' is added for L-7a users. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(totlib): broaden run_pipeline catch + cover Codex review gaps Codex independent review (2026-04-28) flagged 1 MED + 2 LOW gaps the prior reviewer missed: MED1: run_pipeline's except clause was (base_err, TotPipelineCouplingError), which let TypeError/ValueError from wrong kwargs escape without TotPipelineRunError wrapping — so partial_result was lost. Broadened to `except Exception`. KeyboardInterrupt and SystemExit still propagate (they are BaseException, not Exception). The lazy `_import_module_error(name)` call is kept for forward-compat debugging clarity (annotated noqa). LOW2: test_set_param_records_into_params_dict only exercised the numeric path of set_param. Added a string-value entry so the string path's _params bookkeeping is also verified. LOW3: test_run_pipeline_partial_failure_carries_partial_result did not assert on coupling_applied. Added an empty-list assertion (regression guard against silently dropping coupling metadata). Plus a new test test_run_pipeline_typeerror_from_bad_kwargs_is_wrapped that exercises the broadened catch path directly. 61 tests pass under --forked --timeout=120 --timeout-method=signal. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(totlib): address PR #178 Bugbot LOW findings Three independent fixes surfaced by the Cursor Bugbot pass on PR #178: 1. close() no longer silently drops secondary finalize errors (pipeline.py:259-269). When more than one module's close() raises, the caller now sees an ExceptionGroup carrying every collected error instead of just errors[0]. Single-failure path is unchanged (the original exception type is preserved). Tested via the new test_close_raises_exception_group_when_multiple_modules_fail. 2. set_param's _params bookkeeping no longer diverges from the value the wrapper actually received (pipeline.py:235-249). Numeric inputs are coerced via float() at the boundary, but _params used to store the original (e.g. True or int 2). Coupling rules reading self._params now see the same float the underlying module saw. Existing test_set_param_records_into_params_dict updated for the int → float coercion; new test_set_param_records_bool_as_float pins the bool path explicitly. 3. run_pipeline no longer calls _import_module_error per step (pipeline.py:329-336). The call had become dead after the broad "except Exception" catch in commit 941f353 — base_err was F841'd noqa'd, the comment misleadingly claimed "lazy-import semantics," and the call still triggered an .errors module import per step. _validate_steps already gates on _MODULE_REGISTRY membership pre-execution, and _ensure_module raises on bad names, so removing the dead call doesn't reduce safety. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(totlib): address PR #178 remaining Bugbot findings (4 items) Bugbot's second pass on PR #178 surfaced 4 issues that survived the first round of LOW fixes (e1e5a37). All four addressed in this commit, with targeted regression tests: #1 (LOW) PipelineResult.to_dict() now returns defensive copies of inner scalars dicts and coupling_applied lists. Previously, MCP- side mutation of the JSON payload would propagate into the originating PipelineStep. New test_to_dict_returns_defensive_copies asserts the contract. #2 (LOW) _validate_steps now accepts list-of-2-element steps in addition to tuple-of-2-element. JSON-decoded MCP payloads deserialise tuples as lists, so the API now round-trips through to_dict() without manual coercion. New test_run_pipeline_accepts_list_shaped_steps regression guard. #3 (LOW) set_param now explicitly rejects bool inputs with TotPipelineCouplingError instead of silently coercing to 1.0/0.0 via the numeric branch (Python's bool is a subclass of int). The previous test_set_param_records_bool_as_float (which documented the silent-coercion behavior) is replaced with test_set_param_rejects_bool that asserts the explicit rejection. #4 (MED) ExceptionGroup compatibility shim for Python 3.10. The close() multi-failure path uses builtin ExceptionGroup (3.11+); tot_mcp's pyproject.toml declares requires-python=">=3.10". The shim probes for the builtin first, falls back to the exceptiongroup PyPI backport if available, and degrades gracefully to raising errors[0] only when neither is reachable (same as pre-fix behavior). CI runs 3.11+ so the existing ExceptionGroup test still passes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
k-yoshimi
added a commit
that referenced
this pull request
Apr 29, 2026
Code-review feedback on 8deed08: the handoff stated `768 commits behind chore` but the actual `git log master..HEAD` count is **755**. Updated all four occurrences (lines 10, 17, 163, 251) to match. The "768" figure was a copy from the prior handoff's "765" base plus +3 PRs guess; the right way is `git log --oneline master..HEAD | wc -l`, which is 755 today. Future handoffs should re-derive rather than extrapolate. (Codex MED #2 about hard-coded `/Users/k-yoshimi/...` paths is intentionally not addressed: every prior handoff in `.claude/` follows the same convention, the file is a per-checkout self- reference, and the user's git config already publishes the same identity in commit metadata.) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
k-yoshimi
added a commit
that referenced
this pull request
May 4, 2026
…188) * docs(spec): add L-7b-ii BPSD broker coupling verification design Brainstorm-stage spec for L-7b-ii: orchestrator-level verification of the eq -> tr BPSD coupling. MVP scope ("A-medium"): - new non-mutating Fortran helper tr_check_bpsd_pull (4-slot check) - C ABI export + Python Trlib.check_bpsd_pull() wrapper - CouplingRule extended with kind/verify fields - new ("eq","tr") verify rule in COUPLING_RULES - 3-layer test strategy (mock dispatch / unit / integration) Codex independent reviewed in 4 rounds (overall design + sec 6 + sec 7 + sec 8); all HIGH/MED findings addressed: - non-mutating BPSD pull (per-slot ierr accumulation, no TRCOMM mutation) - existing TotPipelineCouplingError reused (no new exception class) - 3-level exception chain via existing broad except - MODELG=0 + eq->tr -> CouplingError as acceptance criterion - wr-side BPSD speaker pending evaluation (L-7a divergence-risk judgement preserved; re-evaluation trigger documented) Implementation risks (4) carry explicit fallbacks. Out-of-scope items explicitly cross-reference L-7a / L-7b-i specs by section name (not file:line) for churn resilience. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(spec): L-7b-ii self-review fixes (BS7) - replace MAX_NRHO placeholder with reference to existing BPSD callers' allocation pattern - reconcile test case count (12 -> 11; C-2 documented but not implemented per drop note) - clarify error message format uses callable_repr (with repr() fallback for non-qualname callables like partial) No semantic change; corrects three minor inconsistencies found in the brainstorming spec self-review pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(spec): L-7b-ii final Codex review fixes Six fixes from final Codex full-spec review (HIGH 1, MED 3, LOW 2): HIGH: verify checks 3 eq-pushed slots only (device/equ1D/metric1D), not 4. plasmaf is tr's own BPSD output (tr_bpsd_put), absent on a fresh eq->tr pipeline -- including it would cause spurious False on first-time pipelines. Updated §1 data flow, §2 Fortran helper, §3 Python wrapper docstring, C prototype. MED-1: __post_init__ now validates transform is non-None for kind=transfer (was missing -- transfer dispatch unconditionally calls rule.transform(raw), would crash on None default). MED-2: AC6 clarified -- Layer C drop decision is plan-time, not post-MVP. Either Layer C is required (Eq wrapper sufficient) or removed from MVP scope (insufficient + follow-up PR). Removes the 'both required and droppable' ambiguity. MED-3: §8.4 risk #5 added: stale BPSD slot data across tests/runs. Layer A/B unaffected; Layer C uses --forked or per-test BPSD reset for isolation. LOW-1: line 76 estimate table 12 -> 11 cases (matches §7 count after C-2 drop). LOW-2: @DataClass(frozen=True) preserved in §4 dataclass; explicit note that __post_init__ raises ValueError only (no field mutation) so frozen remains compatible. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(spec): L-7b-ii confirmation review fixes 3 missed fixes from confirmation review: HIGH (still-broken): 'all four' verbiage at line 126 + plasmaf in rule.doc string at lines 247-255 -- both fixed. Remaining plasmaf mentions are all explicit 'intentionally excluded' rationale (intentional, kept). MED-1 (partial): added type-narrowing asserts in §5 dispatch snippet -- assert rule.transform is not None / assert rule.dst_param is not None -- to satisfy mypy/pyright on the Optional[Callable] / Optional[str] fields. __post_init__ guarantees they hold for kind=transfer. MED-2 (partial): §1 in-scope clarified -- Layer A+B unconditional, Layer C conditional on plan-time Eq wrapper sufficiency check (cross-references AC6 + §8.4 risk #2). Resolves the 'integration coverage promised but droppable' contradiction. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(spec): L-7b-ii final assert narrowing for rule.verify One-line fix from final confirmation review: §5 dispatch verify branch was missing 'assert rule.verify is not None' for static checker narrowing on the Optional[Callable] field. Symmetric to the transfer-branch asserts already present. Codex verdict after this fix: READY-FOR-IMPLEMENTATION. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plan): add L-7b-ii BPSD broker coupling implementation plan Implementation plan for the L-7b-ii spec (docs/superpowers/specs/2026-05-03-l7b-ii-bpsd-broker-coupling-design.md). 4-phase commit pattern mirroring L-7b-i (PR #187): Phase 1: tr_check_bpsd_pull Fortran helper + C ABI + Python wrapper Phase 2: CouplingRule kind/verify extension + dispatch + ('eq','tr') rule Phase 3: 11 test cases across 3 layers + README updates Phase 4: pre-push gate + PR open + CI/Bugbot wait + squash merge Plan-time risks resolved at write-time: - Eq wrapper sufficient (set_param/set_param_str/run/get_state all present in python/eqlib/eqlib.py); Layer C in scope - bpsd_get_data is INTENT(INOUT), self-allocates when nrmax=0; helper sets local%nrmax=0 explicitly - g_initialized/g_prepared lifecycle flags exist in tr/tr_api.f90:64-65 Plan-time risk deferred to implementation: - Layer C runtime (< 5s estimate unmeasured); fallback is @pytest.mark.slow gating per spec §8.4 Step granularity bite-sized (read context -> edit -> verify -> commit) per writing-plans skill convention. Suitable for executing-plans or subagent-driven-development. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plan): L-7b-ii Codex review fixes Six fixes from Codex plan review (HIGH 2, MED 3, LOW 1): HIGH-1: pre-push pytest commands now include --forked --timeout=120 --timeout-method=signal per CLAUDE.md non-negotiable. Note added on pytest-forked plugin install + documented exception when plugin unavailable. HIGH-2: reviewer agent type Agent(subagent_type='feature-dev: code-reviewer', ...) per CLAUDE.md. Fallback note for environments where feature-dev: is unavailable points to superpowers:code-reviewer. MED-1: Layer C --forked is now required (not advisory). Without BPSD broker isolation, C-3 (MODELG=0 expected failure) may spuriously pass after C-1 populated slots. MED-2: base SHA references updated f3dafae -> bdc0913 (the plan commit itself is the new chore tip, so executor will branch from it). Phase 0/1.5/2.4/3.6 squash bases corrected. MED-3: spec-coverage matrix entry for §7 MODELG fixture corrected to acknowledge deviation: plan uses local _eq_params_modelg() helper instead of mutating shared tot_demo2014_params.py (rationale: avoid Layer 1 baseline side effects). eqdata path IS still reused. LOW-1: Layer A case count clarified -- 11 logical cases (A-5 implemented as 3 sub-tests), 13 individual pytest tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plan): align Phase 4.1 Step 2 --timeout to CLAUDE.md (=120) Final fix from confirmation review: Phase 4.1 Step 2 used --timeout=60 (inconsistent with Step 1/3 and CLAUDE.md CI flags). Aligned to --timeout=120. Plan now READY-FOR-EXECUTION per Codex. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plan): align all pytest commands to CLAUDE.md flags 5 additional pytest commands (Tasks 1.4, 2.2, 2.3, 3.1, 3.2 sanity checks) updated to --forked --timeout=120 --timeout-method=signal, matching CI workflow and Phase 4 pre-push gate. Per CLAUDE.md non-negotiable: 'Local test: run the relevant pytest locally with the SAME flags the CI workflow uses.' All 9 pytest invocations in the plan now consistent. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * tr+trlib: add tr_check_bpsd_pull BPSD broker verification helper (L-7b-ii) New non-mutating Fortran helper tr_check_bpsd_pull pulls the 3 eq-pushed BPSD slots (device, equ1D, metric1D) into local discardable types and reports ok = 1 iff all 3 per-slot ierr == 0. plasmaf is intentionally NOT checked: it is tr's own BPSD output (tr_bpsd_put), absent on a fresh eq->tr pipeline. Implementation notes: - Local types initialized with %nrmax = 0 to trigger BPSD's self-allocation path (per ../../bpsd/bpsd_equ1D.f90:143-150, bpsd_get_* is INTENT(INOUT) and allocates internally when the caller passes nrmax=0). - Per-slot ierr accumulated to AND-condition (avoids masking earlier failures by later successes). - TRCOMM untouched -- safe to call before or after tr.run(). - TR_STATE_ABI_VERSION not bumped (no struct change). Exposed via C ABI (tr/tr_api.h) and Python wrapper (Trlib.check_bpsd_pull). Smoke-tested: fresh Trlib() init returns False (BPSD slots empty); existing trlib tests unchanged. Spec: docs/superpowers/specs/2026-05-03-l7b-ii-bpsd-broker-coupling-design.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * pipeline: extend CouplingRule for verify rules (kind/verify dispatch) (L-7b-ii) CouplingRule dataclass gains two new fields: kind: str = "transfer" # "transfer" | "verify" verify: Optional[Callable[[Any], bool]] = None Existing transfer-rule fields (src_state_key, dst_param, transform) are defaulted to None and validated by __post_init__: * kind="transfer" requires all three transfer fields * kind="verify" requires verify callable * unknown kind raises ValueError @DataClass(frozen=True) preserved; __post_init__ only raises and never mutates self, so frozen remains compatible. run_pipeline rule iteration gains a kind branch: * transfer: existing extract -> transform -> set_param -> record * verify: ok = rule.verify(curr_inst); raise CouplingError if False or if verify itself raised. applied.append only on success (matches transfer-rule "succeeded snapshot" semantics of PipelineStep.coupling_applied) Both error paths flow through the existing broad except at pipeline.py and become TotPipelineRunError with __cause__ chain (3-level: RunError -> CouplingError -> OriginalError if any). Scope note (L-7b-ii deferred at registry level): The original spec proposed registering an ('eq','tr') verify rule using BPSD as a cross-module broker (Trlib.check_bpsd_pull). During implementation we found that libeqapi.so and libtrapi.so each carry a private copy of the BPSD module-level state -- ___bpsd_equ1d_MOD_ equ1dx is static (s) in nm output of both shared objects. Therefore bpsd_put from libeqapi.so does NOT propagate to libtrapi.so's bpsd_get, and the verify rule cannot succeed under the per-module TotPipeline architecture. The legacy Tot / libtotapi.so path keeps eq+tr+bpsd co-linked and is unaffected. Until a cross-.so sharing scheme (unified .so, IPC, or RTLD_GLOBAL with weak symbols) is decided, no ('eq','tr') rule is registered. The verify dispatch infrastructure is in place and mock-tested (Layer A) so that the rule can be added later without further changes to pipeline.py. One existing test (test_run_pipeline_missing_source_state_raises_ coupling_error) was updated to pass an explicit transform=lambda v: v to satisfy the new __post_init__ contract; it had previously relied on the old identity default. Existing ("fp","tr") transfer rule and Layer 1 baselines (demo2014, ht6m at 1e-10) are unaffected. Spec: docs/superpowers/specs/2026-05-03-l7b-ii-bpsd-broker-coupling-design.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test+docs: Layer A/B BPSD coupling verify tests + READMEs (L-7b-ii) Adds 11 new test cases across 2 layers (Layer C deferred — see below): Layer A (test_pipeline_verify.py, no .so): 8 mock-based dispatch tests covering verify-True success, verify-False raise+cause-chain, verify-raise 3-level chain, mixed transfer+verify ordering, __post_init__ validation (3 cases), unregistered-pair silent skip. Layer B (test_bpsd_check.py, libtrapi.so): 3 unit tests for Trlib.check_bpsd_pull() -- fresh init returns False, closed Trlib raises TrlibError, no exception leak from Fortran. Also updates test_pipeline_dataclasses::test_coupling_rule_defaults to pass an explicit transform=lambda v: v: the previous version relied on the old default `transform = lambda v: v`, which is no longer a default per the new __post_init__ contract introduced in the previous commit. Layer C (eq + tr integration) deferred: 実装中の調査で `libeqapi.so` と `libtrapi.so` が独立した .so で、 それぞれが BPSD module-level state (`___bpsd_equ1d_MOD_equ1dx` ほか) を private に持つことが判明 (nm で確認、両 .so で static linkage)。 spec が想定した「BPSD = eq/tr 共有ブローカー」は `Tot` / `libtotapi.so` (eq+tr+bpsd 同梱) では成立するが、 `TotPipeline` (per-module .so) では成立しない。 したがって ('eq','tr') verify ルールは本 PR では登録せず、 Layer C 統合テストも保留 (実行しても push が反映されないので、 C-1 は構造的にパスしえず、C-3 は誤った理由で pass する)。 verify ディスパッチの機構自体は本 PR で完成済み (Layer A 網羅) なので、共有方針が決まり次第 ('eq','tr') ルールは COUPLING_RULES に 1 行追加するだけで有効化できる。 詳細は totlib/README.md の Coupling rules 節と trlib/README.md の "BPSD ブローカー pull 検証" 節を参照。 README updates: python/totlib/README.md: documents the kind="transfer" / "verify" distinction in the Coupling rules section, plus a Japanese-language deferral note explaining the per-.so BPSD isolation finding. python/trlib/README.md: new "BPSD ブローカー pull 検証" section (Japanese) documenting Trlib.check_bpsd_pull() with usage example and the same scope caveat about per-.so isolation. Existing Layer 1 baselines (demo2014, ht6m at 1e-10) and existing pipeline tests are unaffected (the verify dispatch doesn't fire when no rule is registered for the pair). Spec: docs/superpowers/specs/2026-05-03-l7b-ii-bpsd-broker-coupling-design.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(L-7b-ii): address reviewer findings (MED + LOW polish) Codex MED1 (older .so compatibility): wrap tr_check_bpsd_pull ctypes prototype attachment in try/except, mirroring the existing tr_validate pattern in _ffi.py. Older builds of libtrapi.so that predate L-7b-ii will now defer the AttributeError to first call of Trlib.check_bpsd_pull instead of failing at module import. Codex MED3 (chain-depth accuracy): totlib/README's "3-level cause chain" was inaccurate for the False-return path. Clarify that False produces a 2-level chain (TotPipelineRunError -> TotPipelineCouplingError, no original cause), while a verify() exception produces the 3-level chain (... -> original exception). In-house L-1 (docstring count): test_pipeline_verify.py header said "Six cases" but defines 8 test functions because A-5 expands into three separate validation tests. Updated the docstring to clearly distinguish "six logical cases" from "eight test functions" with a per-case description. Codex MED2 (spec staleness): added §0 (in Japanese, matching totlib/trlib README convention) at the top of the spec doc recording the implementation-time deferral, summarising what shipped vs deferred and listing candidate resolution paths (shared .so / IPC / RTLD_GLOBAL+weak / serialisation). Status field changed from "Draft" to "Partial". Verification: - Layer A (8) + Layer B (3) + dataclasses (8) + pipeline (30) + equivalence (2) = 51 passed; 1 pre-existing Py3.10 ExceptionGroup test still fails (documented in plan). - Trlib() smoke load + check_bpsd_pull still returns False on fresh init. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
4 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Stacked on #1. After #1 merges, the base of this PR should be changed to `develop`:
Summary
TASK/TR モジュールのリファクタリングに先立ち、数値挙動を固定する回帰テスト基盤を `test_run/` に整備する Phase 0 の実装。
設計書と実装計画
設計方針は Option B(構造+危険箇所のみの限定的近代化)、既存開発者の参画しやすさを優先。
回帰判定の仕組み
通常実行への影響
なし。 `TR_REGRESS_DUMP` 未設定時は `tr_regress_dump_if_enabled` が即 `RETURN` するため、従来の挙動と完全に同一。
変更の要約(15 コミット)
Test plan
pytest → unittest について
設計書と実装計画では pytest を想定していたが、対象環境に pytest が未インストールのため stdlib `unittest` に移植(コミット a5a6096)。機能的な差異はなく、`python3 -m unittest discover` で同等のテストを実行できる。
次のフェーズ
本 PR 後、Phase 1(`USE TRCOMM` への ONLY 句付与)から順次別 PR で進める予定。
🤖 Generated with Claude Code
Note
Medium Risk
Adds a new TR codepath (file dump) and wires numerical comparisons into the test runner; while gated by
TR_REGRESS_DUMP, failures can now come from baseline drift/tolerance and the dump hook runs on all TR test executions.Overview
Adds Phase 0 TR regression testing infrastructure by introducing
tr/trregress.f90and a one-line hook intrloop.f90(plustr/Makefileupdate) to writetr_regress.datin a fixed high-precision format whenTR_REGRESS_DUMP=1.Extends
test_runto enforce numeric stability:run_tests.shnow setsTR_REGRESS_DUMP=1for TR runs and, afterCLOSED, calls a newscripts/check_regression.shwrapper that extracts metrics (extract_tr_metrics.py) and compares them to committed baselines (compare_metrics.py, default rel tol1e-10), reportingREGRESSIONon mismatch.Adds two new TR input cases (
tr_m0904,tr_tst2) totest_definitions.conf, commits baselinemetrics.jsonfortr_iter01,tr_m0904, andtr_tst2, and documents the workflow intest_run/README.md(withtest_output/and Python caches ignored viatest_run/.gitignore).Reviewed by Cursor Bugbot for commit fe94c1e. Bugbot is set up for automated code reviews on this repo. Configure here.