fix(heck-relay): stop overwriting Rosales OPT params with Seminario projections#280
Merged
Merged
Conversation
There was a problem hiding this comment.
Pull request overview
Fixes the heck-relay system loader so it no longer corrupts the published Rosales OPT force field by re-running Seminario projections over active OPT parameters, and updates diagnostics/docs to reflect the corrected baseline behavior.
Changes:
- Stop calling
estimate_force_constants()inload_heck_relay()while still usingfreeze_standard_params()to preserve the intended active-parameter mask. - Add a one-off diagnostic script to compare multiple heck-relay force-field baselines and write a JSON report with provenance.
- Update heck-relay and optimizer-comparison documentation to reflect the corrected baseline metrics and ratio behavior.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| scripts/diagnose_heck_relay.py | Adds a three-baseline diagnostic + JSON/provenance output for heck-relay. |
| q2mm/diagnostics/systems.py | Fixes heck-relay loader to keep Rosales OPT parameters as-published (no Seminario overwrite). |
| docs/systems/heck-relay.md | Updates heck-relay system page to describe the loader fix and new baseline metrics. |
| docs/benchmarks/optimizer-comparison.md | Updates cross-system comparison tables/text for the new heck-relay ratio and R² values. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
4 tasks
…rojections Resolves #277. The Heck-relay loader silently corrupted the published Rosales OPT force field. The sequence ff_template = ForceField.from_mm3_fld(ff_path, include_standard=True) ff_opt = ForceField.from_mm3_fld(ff_path, include_standard=False) ff_template.freeze_standard_params(ff_opt) ff = estimate_force_constants(molecules, forcefield=ff_template, ...) freezes all *non*-OPT params (standard MM3 backbone) and leaves the OPT-substructure params active. `estimate_force_constants` then skips the frozen params (q2mm/models/seminario.py:341) and re-estimates the *active* (= OPT) ones — overwriting the carefully-fitted Rosales values with raw Seminario projections. The new three-baseline diagnostic (`scripts/diagnose_heck_relay.py`, committed alongside this fix and mirrored to `q2mm-data/benchmarks/heck-relay/diagnostic/`) confirms the divergence: Baseline obj ratio R²(bond_len) A_untouched_rosales 3.42e+06 1.31 0.980 B_current_loader 1.17e+08 n/a (NaN) -4787.5 C_seminario_only 1.24e+08 1.77e+73 -849.0 Fix: keep `freeze_standard_params` (still needed for the optimizer's active mask) but skip `estimate_force_constants`. Use the Rosales OPT values as-published. Post-fix regenerated baseline (in companion q2mm-data PR): obj: 1.48e+08 → 3.46e+06 (43x smaller) ratio: ~10^21 (sanitized 1e30) → 1.29 (finite, real) ratio_status: out_of_band → out_of_band (still 12% over the band) bond_length R²: -48.06 → 0.980 bond_angle R²: -5.47 → 0.786 eig_diagonal R²: -4.66 → -12.62 (worse; reflects the real MM3* ↔ JAX-engine cross-engine gap, not a loader bug) Heck relay now sits with pd-conjugate as a candidate for the ratio_tol=None experiment (issue #276): the gate barely misses instead of catastrophically failing. Out of scope (per #277): the same `freeze_standard_params + estimate_force_constants` pattern exists in load_rh_enamide and load_pd_*. Those systems are not affected catastrophically (Rh-enamide runs at R²=0.97 and Pd-allyl at -1.41 today) but the same audit should probably extend to them in a follow-up. Also includes: - scripts/diagnose_heck_relay.py — the diagnostic script (committed per #277 acceptance criteria, output mirrored to q2mm-data). - docs/systems/heck-relay.md — updated to reflect the fix. - docs/benchmarks/optimizer-comparison.md — heck-relay row and notes updated; per-category R² table updated to current numbers. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
986ec19 to
4e86326
Compare
ericchansen
added a commit
that referenced
this pull request
May 26, 2026
…oad_system
Replaces six per-system loaders + a private _load_wahlers_system helper
with a single declarative dispatch system, addressing the
inconsistencies (load_X vs _load_X_system, divergent shapes) and the
copy-paste between similar loaders.
New module: q2mm/models/loaders.py
- Strategy primitives, named after what they do (no implicit ordering
that produced the q2mm#277 bug):
load_published_opt(ff_path)
load_qfuerza_only(molecules, ff_path)
load_qfuerza_fresh(molecule)
compose_opt_with_mm3_base(opt_path, base_path, *, metal)
load_published_opt_composed(opt_path, base_path, *, metal)
- Each docstring cites Farrugia 2025 (the QFUERZA paper, AGENTS.md
"Key Papers") and documents the returned FF's frozen/active
invariants.
Rewritten q2mm/diagnostics/systems.py:
- New `SystemSpec` dataclass replaces `BenchmarkSystem`. Each spec
declares: molecule_loader callable, ff_strategy literal, ff_paths
mapping, metadata dict, optional metal symbol.
- New `load_system(key, *, engine=None, functional_form=None,
molecule_loader_kwargs=None)` is the single loader entry point.
Reference construction is uniform across multi-molecule systems
(`ReferenceData.from_molecules` with eigenmatrix-diagonal=True);
single-molecule systems with frequency-only references (ch3f) take
the `engine` argument and use `_build_frequency_reference`.
Deleted (no shims, alpha discipline):
- load_ch3f, load_rh_enamide, load_heck_relay, load_pd_allyl,
load_pd_conjugate, load_rh_conjugate (six public functions).
- _load_wahlers_system (private helper).
- BenchmarkSystem dataclass (replaced by SystemSpec).
Survives because still per-system:
- load_X_molecules() functions (genuinely different file
formats/layouts per system; referenced by SystemSpec.molecule_loader).
- _load_gaussian_molecules, _load_ch3f_molecules, _load_wahlers_molecules
(factored out of the deleted loaders).
Per-system code in SYSTEMS shrinks to ~10 lines of declarative spec
per system (vs ~40-60 lines of repetitive procedure in the old design).
Caller migration in the same commit (no shims):
- q2mm/diagnostics/cli.py: SYSTEMS[k].loader(engine) → load_system(k, engine=engine, ...)
BenchmarkSystem → SystemSpec
special-cased load_ch3f(data_dir=...) → load_system("ch3f", molecule_loader_kwargs={"data_dir": ...})
- q2mm/diagnostics/__init__.py: re-export SystemSpec + load_system instead of BenchmarkSystem.
- scripts/regenerate_convergence_results.py: SYSTEMS[k].loader → load_system.
- scripts/bench_composed.py: load_ch3f → load_system("ch3f").
- scripts/run_rh_benchmarks.sh: embedded Python — load_rh_enamide → load_system("rh-enamide").
- test/integration/test_heck_validation.py regression: load_heck_relay → load_system.
- test/integration/test_backend_optimizer_matrix.py: load_ch3f → load_system.
- test/benchmarks/test_optimization.py: same.
Smoke tests:
- `scripts/regenerate_convergence_results.py --system ch3f --skip-optimization`
runs end-to-end and produces the same baseline numbers
(ratio=1.000, ok, single 'frequency' category).
- Full unit suite: 656 passed, 272 skipped.
- Ruff check + format: clean.
Behavior changes (intentional, per plan, will surface in commit 5):
- load_rh_enamide previously called qfuerza_into on a copy of the
rh-enamide mm3.fld — i.e. it OVERWROTE the Donoghue OPT values
with raw FUERZA projections. After this commit it uses
load_published_opt, which preserves the published OPT values.
Numbers for rh-enamide will move in commit 5 (regeneration).
- load_pd_allyl, load_pd_conjugate, load_rh_conjugate had the same
overwrite pattern via _load_wahlers_system + qfuerza_into. After
this commit they use load_published_opt_composed and preserve the
Wahlers OPT values. Numbers will move in commit 5.
- load_heck_relay is unchanged in behavior — its loader was already
fixed in PR #280 to skip qfuerza_into. This commit just routes it
through the load_published_opt strategy.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ericchansen
added a commit
that referenced
this pull request
May 26, 2026
…h + qfuerza_into
The old ``estimate_force_constants(molecules, forcefield=None, ...)``
overloaded two completely different operations on one signature:
forcefield=None → build a brand-new FF (single molecule only).
forcefield=ff → mutate ff (well, ff.copy()) — silently overwriting
every "unfrozen" param.
The second mode is exactly the footgun that produced the q2mm#277
Heck-relay bug. Splitting the function into two names with two
narrower contracts makes the intent explicit at every call site:
qfuerza_fresh(molecule, ...) -> ForceField
Build a fresh FF from one molecule's QM Hessian. Errors if
multiple molecules are passed.
qfuerza_into(ff, molecules, ...) -> None
Overwrite the values of every *unfrozen* parameter in ``ff``
using the QFUERZA projection. Frozen params are skipped (the
load_heck_relay invariant). Mutates ff in place. No return.
Both names cite Farrugia et al. 2025 (J. Chem. Theory Comput.
22, 469, DOI 10.1021/acs.jctc.5c01751), the methodology paper of
record per AGENTS.md. The estimate_force_constants name is deleted
outright — no shim, per alpha discipline.
Other internal renames in seminario.py to match the new vocabulary
(TypeError message, two docstring/comment refs). __init__.py public
API now exports ``qfuerza_fresh`` and ``qfuerza_into`` instead of
``estimate_force_constants``.
Migration of all 85 call sites is mechanical:
fresh path: estimate_force_constants(mol, ...) → qfuerza_fresh(mol, ...)
into path: ff = estimate_force_constants(mols, forcefield=ff_template, ...)
→ ff = ff_template.copy()
qfuerza_into(ff, mols, ...)
19 files touched. The Heck-relay loader behavior is unchanged
(it doesn't call into qfuerza_into at all post-#280); the rh-enamide
and Wahlers loaders still have the same shape, with
``forcefield=ff_template, ...`` → ``copy + qfuerza_into`` and same
semantics.
Tests:
- Existing unit test renamed to
``test_qfuerza_into_does_not_overwrite_frozen_opt_block``;
same regression coverage for the q2mm#277 pattern.
- All 656 unit tests pass; ruff check + format pass clean.
No behavior change visible from outside the call sites — same loops
inside ``_estimate_into_ff`` (the renamed body), same skip-frozen
short-circuits. The Heck-relay regression test from PR #280
continues to pass.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ericchansen
added a commit
that referenced
this pull request
May 26, 2026
…oad_system
Replaces six per-system loaders + a private _load_wahlers_system helper
with a single declarative dispatch system, addressing the
inconsistencies (load_X vs _load_X_system, divergent shapes) and the
copy-paste between similar loaders.
New module: q2mm/models/loaders.py
- Strategy primitives, named after what they do (no implicit ordering
that produced the q2mm#277 bug):
load_published_opt(ff_path)
load_qfuerza_only(molecules, ff_path)
load_qfuerza_fresh(molecule)
compose_opt_with_mm3_base(opt_path, base_path, *, metal)
load_published_opt_composed(opt_path, base_path, *, metal)
- Each docstring cites Farrugia 2025 (the QFUERZA paper, AGENTS.md
"Key Papers") and documents the returned FF's frozen/active
invariants.
Rewritten q2mm/diagnostics/systems.py:
- New `SystemSpec` dataclass replaces `BenchmarkSystem`. Each spec
declares: molecule_loader callable, ff_strategy literal, ff_paths
mapping, metadata dict, optional metal symbol.
- New `load_system(key, *, engine=None, functional_form=None,
molecule_loader_kwargs=None)` is the single loader entry point.
Reference construction is uniform across multi-molecule systems
(`ReferenceData.from_molecules` with eigenmatrix-diagonal=True);
single-molecule systems with frequency-only references (ch3f) take
the `engine` argument and use `_build_frequency_reference`.
Deleted (no shims, alpha discipline):
- load_ch3f, load_rh_enamide, load_heck_relay, load_pd_allyl,
load_pd_conjugate, load_rh_conjugate (six public functions).
- _load_wahlers_system (private helper).
- BenchmarkSystem dataclass (replaced by SystemSpec).
Survives because still per-system:
- load_X_molecules() functions (genuinely different file
formats/layouts per system; referenced by SystemSpec.molecule_loader).
- _load_gaussian_molecules, _load_ch3f_molecules, _load_wahlers_molecules
(factored out of the deleted loaders).
Per-system code in SYSTEMS shrinks to ~10 lines of declarative spec
per system (vs ~40-60 lines of repetitive procedure in the old design).
Caller migration in the same commit (no shims):
- q2mm/diagnostics/cli.py: SYSTEMS[k].loader(engine) → load_system(k, engine=engine, ...)
BenchmarkSystem → SystemSpec
special-cased load_ch3f(data_dir=...) → load_system("ch3f", molecule_loader_kwargs={"data_dir": ...})
- q2mm/diagnostics/__init__.py: re-export SystemSpec + load_system instead of BenchmarkSystem.
- scripts/regenerate_convergence_results.py: SYSTEMS[k].loader → load_system.
- scripts/bench_composed.py: load_ch3f → load_system("ch3f").
- scripts/run_rh_benchmarks.sh: embedded Python — load_rh_enamide → load_system("rh-enamide").
- test/integration/test_heck_validation.py regression: load_heck_relay → load_system.
- test/integration/test_backend_optimizer_matrix.py: load_ch3f → load_system.
- test/benchmarks/test_optimization.py: same.
Smoke tests:
- `scripts/regenerate_convergence_results.py --system ch3f --skip-optimization`
runs end-to-end and produces the same baseline numbers
(ratio=1.000, ok, single 'frequency' category).
- Full unit suite: 656 passed, 272 skipped.
- Ruff check + format: clean.
Behavior changes (intentional, per plan, will surface in commit 5):
- load_rh_enamide previously called qfuerza_into on a copy of the
rh-enamide mm3.fld — i.e. it OVERWROTE the Donoghue OPT values
with raw FUERZA projections. After this commit it uses
load_published_opt, which preserves the published OPT values.
Numbers for rh-enamide will move in commit 5 (regeneration).
- load_pd_allyl, load_pd_conjugate, load_rh_conjugate had the same
overwrite pattern via _load_wahlers_system + qfuerza_into. After
this commit they use load_published_opt_composed and preserve the
Wahlers OPT values. Numbers will move in commit 5.
- load_heck_relay is unchanged in behavior — its loader was already
fixed in PR #280 to skip qfuerza_into. This commit just routes it
through the load_published_opt strategy.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ericchansen
added a commit
that referenced
this pull request
May 26, 2026
…oad_system
Replaces six per-system loaders + a private _load_wahlers_system helper
with a single declarative dispatch system, addressing the
inconsistencies (load_X vs _load_X_system, divergent shapes) and the
copy-paste between similar loaders.
New module: q2mm/models/loaders.py
- Strategy primitives, named after what they do (no implicit ordering
that produced the q2mm#277 bug):
load_published_opt(ff_path)
load_qfuerza_fresh(molecule)
compose_opt_with_mm3_base(opt_path, base_path, *, metal)
-> (composed, opt_only)
load_published_opt_composed(opt_path, base_path, *, metal)
- Each docstring cites Farrugia 2025 (the QFUERZA paper, AGENTS.md
"Key Papers") and documents the returned FF's frozen/active
invariants.
- Module-level `_METAL_VDW` dict holds published metal vdW parameters
for systems whose MM3 base file lacks them (today only PD from
Rosales 2020). Lives here rather than in `q2mm/diagnostics/`
so the dependency direction is models <- diagnostics, not the
reverse.
Rewritten q2mm/diagnostics/systems.py:
- New `SystemSpec` dataclass replaces `BenchmarkSystem`. Each spec
declares: molecule_loader callable, ff_strategy literal, ff_paths
mapping (uniformly `Callable[[], Path]`, no Path|Callable sum
type), optional `normal_modes_path` resolver, metadata dict,
optional metal symbol.
- New `load_system(key, *, engine=None, functional_form=None,
molecule_loader_kwargs=None)` is the single loader entry point.
Reference construction is uniform across multi-molecule systems
(`ReferenceData.from_molecules` with eigenmatrix-diagonal=True);
single-molecule systems with frequency-only references (ch3f) take
the `engine` argument and use `_build_frequency_reference`.
- Normal-modes side-loading is generic: any spec setting
`normal_modes_path` gets its modes loaded; the dispatcher has no
`key == "ch3f"` special-case.
Deleted (no shims, alpha discipline):
- load_ch3f, load_rh_enamide, load_heck_relay, load_pd_allyl,
load_pd_conjugate, load_rh_conjugate (six public functions).
- _load_wahlers_system (private helper).
- BenchmarkSystem dataclass (replaced by SystemSpec).
- _metal_vdw_registry() function (replaced by `_METAL_VDW` dict in
loaders.py, with no lazy import).
Survives because still per-system:
- load_X_molecules() functions (genuinely different file
formats/layouts per system; referenced by SystemSpec.molecule_loader).
- _load_gaussian_molecules, _load_ch3f_molecules, _load_wahlers_molecules
(factored out of the deleted loaders).
Per-system code in SYSTEMS shrinks to ~10 lines of declarative spec
per system (vs ~40-60 lines of repetitive procedure in the old design).
Caller migration in the same commit (no shims):
- q2mm/diagnostics/cli.py: SYSTEMS[k].loader(engine) -> load_system(k, engine=engine, ...)
BenchmarkSystem -> SystemSpec
special-cased load_ch3f(data_dir=...) -> load_system("ch3f", molecule_loader_kwargs={"data_dir": ...})
- q2mm/diagnostics/__init__.py: re-export SystemSpec + load_system instead of BenchmarkSystem.
- scripts/regenerate_convergence_results.py: SYSTEMS[k].loader -> load_system.
- scripts/bench_composed.py: load_ch3f -> load_system("ch3f").
- scripts/run_rh_benchmarks.sh: embedded Python -- load_rh_enamide -> load_system("rh-enamide").
- test/integration/test_heck_validation.py regression: load_heck_relay -> load_system.
- test/integration/test_backend_optimizer_matrix.py: load_ch3f -> load_system.
- test/benchmarks/test_optimization.py: same.
Smoke tests:
- `scripts/regenerate_convergence_results.py --system ch3f --skip-optimization`
runs end-to-end and produces the same baseline numbers
(ratio=1.000, ok, single 'frequency' category).
- Full unit suite: 656 passed, 272 skipped.
- Ruff check + format: clean.
Behavior changes (intentional, per plan, will surface in commit 5):
- load_rh_enamide previously called qfuerza_into on a copy of the
rh-enamide mm3.fld -- i.e. it OVERWROTE the Donoghue OPT values
with raw FUERZA projections. After this commit it uses
load_published_opt, which preserves the published OPT values.
Numbers for rh-enamide will move in commit 5 (regeneration).
- load_pd_allyl, load_pd_conjugate, load_rh_conjugate had the same
overwrite pattern via _load_wahlers_system + qfuerza_into. After
this commit they use load_published_opt_composed and preserve the
Wahlers OPT values. Numbers will move in commit 5.
- load_heck_relay is unchanged in behavior -- its loader was already
fixed in PR #280 to skip qfuerza_into. This commit just routes it
through the load_published_opt strategy.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ericchansen
added a commit
that referenced
this pull request
May 26, 2026
…h + qfuerza_into
The old ``estimate_force_constants(molecules, forcefield=None, ...)``
overloaded two completely different operations on one signature:
forcefield=None → build a brand-new FF (single molecule only).
forcefield=ff → mutate ff (well, ff.copy()) — silently overwriting
every "unfrozen" param.
The second mode is exactly the footgun that produced the q2mm#277
Heck-relay bug. Splitting the function into two names with two
narrower contracts makes the intent explicit at every call site:
qfuerza_fresh(molecule, ...) -> ForceField
Build a fresh FF from one molecule's QM Hessian. Errors if
multiple molecules are passed.
qfuerza_into(ff, molecules, ...) -> None
Overwrite the values of every *unfrozen* parameter in ``ff``
using the QFUERZA projection. Frozen params are skipped (the
load_heck_relay invariant). Mutates ff in place. No return.
Both names cite Farrugia et al. 2025 (J. Chem. Theory Comput.
22, 469, DOI 10.1021/acs.jctc.5c01751), the methodology paper of
record per AGENTS.md. The estimate_force_constants name is deleted
outright — no shim, per alpha discipline.
Other internal renames in seminario.py to match the new vocabulary
(TypeError message, two docstring/comment refs). __init__.py public
API now exports ``qfuerza_fresh`` and ``qfuerza_into`` instead of
``estimate_force_constants``.
Migration of all 85 call sites is mechanical:
fresh path: estimate_force_constants(mol, ...) → qfuerza_fresh(mol, ...)
into path: ff = estimate_force_constants(mols, forcefield=ff_template, ...)
→ ff = ff_template.copy()
qfuerza_into(ff, mols, ...)
19 files touched. The Heck-relay loader behavior is unchanged
(it doesn't call into qfuerza_into at all post-#280); the rh-enamide
and Wahlers loaders still have the same shape, with
``forcefield=ff_template, ...`` → ``copy + qfuerza_into`` and same
semantics.
Tests:
- Existing unit test renamed to
``test_qfuerza_into_does_not_overwrite_frozen_opt_block``;
same regression coverage for the q2mm#277 pattern.
- All 656 unit tests pass; ruff check + format pass clean.
No behavior change visible from outside the call sites — same loops
inside ``_estimate_into_ff`` (the renamed body), same skip-frozen
short-circuits. The Heck-relay regression test from PR #280
continues to pass.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ericchansen
added a commit
that referenced
this pull request
May 26, 2026
…oad_system
Replaces six per-system loaders + a private _load_wahlers_system helper
with a single declarative dispatch system, addressing the
inconsistencies (load_X vs _load_X_system, divergent shapes) and the
copy-paste between similar loaders.
New module: q2mm/models/loaders.py
- Strategy primitives, named after what they do (no implicit ordering
that produced the q2mm#277 bug):
load_published_opt(ff_path)
load_qfuerza_fresh(molecule)
compose_opt_with_mm3_base(opt_path, base_path, *, metal)
-> (composed, opt_only)
load_published_opt_composed(opt_path, base_path, *, metal)
- Each docstring cites Farrugia 2025 (the QFUERZA paper, AGENTS.md
"Key Papers") and documents the returned FF's frozen/active
invariants.
- Module-level `_METAL_VDW` dict holds published metal vdW parameters
for systems whose MM3 base file lacks them (today only PD from
Rosales 2020). Lives here rather than in `q2mm/diagnostics/`
so the dependency direction is models <- diagnostics, not the
reverse.
Rewritten q2mm/diagnostics/systems.py:
- New `SystemSpec` dataclass replaces `BenchmarkSystem`. Each spec
declares: molecule_loader callable, ff_strategy literal, ff_paths
mapping (uniformly `Callable[[], Path]`, no Path|Callable sum
type), optional `normal_modes_path` resolver, metadata dict,
optional metal symbol.
- New `load_system(key, *, engine=None, functional_form=None,
molecule_loader_kwargs=None)` is the single loader entry point.
Reference construction is uniform across multi-molecule systems
(`ReferenceData.from_molecules` with eigenmatrix-diagonal=True);
single-molecule systems with frequency-only references (ch3f) take
the `engine` argument and use `_build_frequency_reference`.
- Normal-modes side-loading is generic: any spec setting
`normal_modes_path` gets its modes loaded; the dispatcher has no
`key == "ch3f"` special-case.
Deleted (no shims, alpha discipline):
- load_ch3f, load_rh_enamide, load_heck_relay, load_pd_allyl,
load_pd_conjugate, load_rh_conjugate (six public functions).
- _load_wahlers_system (private helper).
- BenchmarkSystem dataclass (replaced by SystemSpec).
- _metal_vdw_registry() function (replaced by `_METAL_VDW` dict in
loaders.py, with no lazy import).
Survives because still per-system:
- load_X_molecules() functions (genuinely different file
formats/layouts per system; referenced by SystemSpec.molecule_loader).
- _load_gaussian_molecules, _load_ch3f_molecules, _load_wahlers_molecules
(factored out of the deleted loaders).
Per-system code in SYSTEMS shrinks to ~10 lines of declarative spec
per system (vs ~40-60 lines of repetitive procedure in the old design).
Caller migration in the same commit (no shims):
- q2mm/diagnostics/cli.py: SYSTEMS[k].loader(engine) -> load_system(k, engine=engine, ...)
BenchmarkSystem -> SystemSpec
special-cased load_ch3f(data_dir=...) -> load_system("ch3f", molecule_loader_kwargs={"data_dir": ...})
- q2mm/diagnostics/__init__.py: re-export SystemSpec + load_system instead of BenchmarkSystem.
- scripts/regenerate_convergence_results.py: SYSTEMS[k].loader -> load_system.
- scripts/bench_composed.py: load_ch3f -> load_system("ch3f").
- scripts/run_rh_benchmarks.sh: embedded Python -- load_rh_enamide -> load_system("rh-enamide").
- test/integration/test_heck_validation.py regression: load_heck_relay -> load_system.
- test/integration/test_backend_optimizer_matrix.py: load_ch3f -> load_system.
- test/benchmarks/test_optimization.py: same.
Smoke tests:
- `scripts/regenerate_convergence_results.py --system ch3f --skip-optimization`
runs end-to-end and produces the same baseline numbers
(ratio=1.000, ok, single 'frequency' category).
- Full unit suite: 656 passed, 272 skipped.
- Ruff check + format: clean.
Behavior changes (intentional, per plan, will surface in commit 5):
- load_rh_enamide previously called qfuerza_into on a copy of the
rh-enamide mm3.fld -- i.e. it OVERWROTE the Donoghue OPT values
with raw FUERZA projections. After this commit it uses
load_published_opt, which preserves the published OPT values.
Numbers for rh-enamide will move in commit 5 (regeneration).
- load_pd_allyl, load_pd_conjugate, load_rh_conjugate had the same
overwrite pattern via _load_wahlers_system + qfuerza_into. After
this commit they use load_published_opt_composed and preserve the
Wahlers OPT values. Numbers will move in commit 5.
- load_heck_relay is unchanged in behavior -- its loader was already
fixed in PR #280 to skip qfuerza_into. This commit just routes it
through the load_published_opt strategy.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Resolves #277.
TL;DR
The Heck-relay loader was silently corrupting the published Rosales OPT force field by overwriting its OPT parameters with raw Seminario projections. Fixing this single line drops the ObjectiveFunction by 43×, takes JaxLoss from "diverges by 21 orders of magnitude" to ratio ≈ 1.29, and brings bond_length R² from −48 to 0.98.
The bug
freeze_standard_paramsmarks the non-OPT params (standard MM3 backbone) as frozen and leaves the OPT params active.estimate_force_constantsskips frozen params (seeq2mm/models/seminario.py:341) and re-estimates the active ones — overwriting the carefully-fitted Rosales OPT values with raw Seminario projections.Three-baseline diagnostic
The new
scripts/diagnose_heck_relay.pybuilds three FFs and evaluates each (no optimization, just evaluation). Output committed toq2mm-data/benchmarks/heck-relay/diagnostic/:A's bond_length R² (0.980) is dramatically better than B's (−4787) — matching the issue's decision rule "A fine, B explodes → narrow on Seminario behavior with freeze_standard_params".
The fix
Keep
freeze_standard_params(still needed for the optimizer's active mask) but skipestimate_force_constants. Use the published Rosales OPT parameters as-is.Post-fix baseline
(Companion q2mm-data PR regenerates the heck-relay convergence JSONs.)
The eigenmatrix actually gets worse (−4.66 → −12.6), but that just reflects the real MM3* ↔ JAX-engine cross-engine gap that's now visible without being masked by the corrupted FF. Geometry is now well-reproduced and JaxLoss is no longer divergent.
Out of scope
freeze_standard_params + estimate_force_constantspattern exists inload_rh_enamideand theload_pd_*loaders. They aren't catastrophically broken today (Rh-enamide runs at R² = 0.97, Pd-allyl at −1.41), but the same audit should probably extend to them in a follow-up.ratio_tol=Noneexperiment (#276).Validation
python -m ruff check q2mm/ test/ scripts/— clean.python -m ruff format --check q2mm test scripts examples— clean.python -m pytest test/ -x -q -m "not (openmm or tinker or jax or jax_md or psi4)" -k "heck or systems"— 11 passed.scripts/regenerate_convergence_results.py --system heck-relay --skip-optimizationran clean on a single GPU; output (with full provenance) is in the companion q2mm-data PR.Companion PR
ericchansen/q2mm-data#?— adds the diagnostic JSON and the regenerated convergence baseline.