refactor: loader API + frozen-as-invariant + QFUERZA split#281
Merged
Conversation
…ferences Adds a prominent table of the papers that govern q2mm's parametrization and optimization methodology, with DOIs and one-line scope for each: - Farrugia, Helquist, Norrby & Wiest 2025 (QFUERZA) — the methodology of record for q2mm/models/seminario.py. - Seminario 1996 — original FUERZA, background only. - Limé & Norrby 2015 — TS Hessian inversion. - Donoghue 2008 — rh-enamide TSFF. - Rosales 2020 — heck-relay TSFF. - Wahlers 2022 — pd-allyl, pd-conjugate, rh-conjugate TSFFs. Placed before the existing "TS Hessian Inversion" warning so it serves as general methodology orientation that the warning then specialises. Motivation: during the #277 investigation an agent (me) repeatedly cited Seminario 1996 when the real authority is the 2025 QFUERZA paper, and this caused at least one wasted analysis round. The table makes the right reference unambiguous for future agents and explicitly flags that proposed changes to the parametrization/loader code require reading Farrugia 2025 first. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Until now, ``frozen`` was a passive flag — consumers like Seminario and ``set_param_vector`` checked it defensively, and any unchecked write silently corrupted values. q2mm#277 (the Heck-relay loader bug) was exactly that pattern: ``estimate_force_constants(forcefield=ff)`` overwrote OPT params that the caller had carefully frozen via ``freeze_standard_params``. This commit turns ``frozen`` into a true invariant on the Param objects themselves. Each Param dataclass now mixes in ``_FrozenAwareParam``, which overrides ``__setattr__`` to raise ``FrozenParamError`` when code tries to assign to a *physical value field* (force_constant, equilibrium, ub_force_constant, ub_equilibrium, radius, epsilon, reduction, dipole_moment, phase) while ``self.frozen == True``. Design notes: - ``_VALUE_FIELDS`` is declared per class so identity fields (atom_type, element, env_id, ff_row, ...) are NOT guarded — only physical values are. ``frozen`` itself is also not guarded, so the existing ``param.frozen = False`` idiom still works for code that prefers it over ``.unfreeze()``. - Construction is exempt because ``frozen`` is the last field of every Param dataclass — dataclass ``__init__`` assigns value fields first, ``frozen`` last, so the guard is dormant during init. This is what lets ``BondParam(..., frozen=True)`` still construct. - New ``.freeze()`` / ``.unfreeze()`` methods are the preferred state transitions; existing callers (``freeze_all``, ``freeze_standard_params``) switch to them. - ``with_params`` and ``set_param_vector`` now skip frozen params (rather than relying on the guard to catch silent writes). The full-length vec layout from ``get_param_vector`` is unchanged; the slot indices belonging to frozen params are simply not written through. - The defensive ``getattr(param, "frozen", False)`` shortcuts in ``seminario.py`` and ``forcefield.active_mask`` lose their getattr fallback (every Param now has the attribute). The seminario shortcuts stay as ``if param.frozen: continue`` for now and document that commit 3 will replace ``estimate_force_constants`` entirely with ``qfuerza_fresh`` + ``qfuerza_into`` that take an explicit target list and never iterate frozen params. Tests: - New ``TestFrozenParamInvariant`` (10 tests) covers every Param class, the freeze/unfreeze toggle, construction with ``frozen=True``, the unguarded ``frozen=`` direct write, and a regression for the q2mm#277 ``estimate_force_constants(forcefield=ff)`` overwrite pattern at the unit level. - One existing test (``test_mm3_export_updates_template``) was relying on the silent-overwrite behavior to edit a now-frozen standard MM3 bond; updated to ``.unfreeze()`` + edit + ``.freeze()`` (its intent was roundtrip fidelity, not frozenness). - All 656 unit tests pass; ruff check + format pass clean. No behavior change for valid callers. Buggy callers (q2mm#277 style) that try to overwrite frozen values now crash loudly with a clear ``FrozenParamError`` instead of silently corrupting the FF. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR refactors Q2MM’s system-loading and initial-parameter generation pipeline to make “frozen params are invariant” a first-class contract, remove the overloaded estimate_force_constants() entry point, and centralize per-system configuration into a declarative SystemSpec registry with a single load_system() dispatcher.
Changes:
- Enforce a frozen-parameter mutation invariant via
FrozenParamError+ guarded__setattr__, with explicit.freeze()/.unfreeze()APIs. - Split Seminario/QFUERZA entry points into
qfuerza_fresh()(new FF) andqfuerza_into()(mutate-in-place while skipping frozen params), and migrate callers. - Replace per-system loader functions with
SystemSpec+q2mm.models.loadersstrategies, updating scripts/tests/docs accordingly.
Reviewed changes
Copilot reviewed 38 out of 38 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| test/test_models.py | Updates Seminario/QFUERZA API usage; adds unit tests for the frozen-param mutation invariant. |
| test/test_jaxopt_optimizer.py | Migrates CH3F test pipeline from estimate_force_constants to qfuerza_fresh. |
| test/integration/test_seminario_parity.py | Migrates integration parity tests to qfuerza_fresh/qfuerza_into and in-place update flow. |
| test/integration/test_published_ff_validation.py | Switches baseline FF construction to qfuerza_into in-place updates. |
| test/integration/test_optimization_validation.py | Updates optimization validation to use qfuerza_fresh. |
| test/integration/test_openmm_export.py | Updates OpenMM export integration tests to use qfuerza_fresh. |
| test/integration/test_openmm_backend.py | Updates OpenMM backend integration tests to use qfuerza_fresh. |
| test/integration/test_jaxopt.py | Updates JAXOPT integration tests to use qfuerza_fresh. |
| test/integration/test_jax_backend.py | Updates documentation/comment reference from estimate_force_constants to qfuerza_fresh. |
| test/integration/test_heck_validation.py | Migrates loader usage to load_system and Seminario initialization to qfuerza_into. |
| test/integration/test_full_loop_parity.py | Updates full-loop pipeline tests to new QFUERZA API and in-place mutation flow. |
| test/integration/test_ethane_seminario.py | Updates ethane Seminario pipeline integration tests to qfuerza_fresh/qfuerza_into. |
| test/integration/test_e2e_sn2_validation.py | Updates end-to-end SN2 validation to use qfuerza_fresh. |
| test/integration/test_backend_optimizer_matrix.py | Migrates benchmark pipeline matrix test to load_system. |
| test/benchmarks/test_optimization.py | Migrates benchmark test to load_system("ch3f"). |
| scripts/validate_against_upstream.py | Switches upstream-validation pipeline to qfuerza_into in-place updates. |
| scripts/run_rh_benchmarks.sh | Migrates benchmark helper script from per-system loader to load_system. |
| scripts/regenerate_full_loop_fixtures.py | Updates fixture regeneration to use qfuerza_fresh. |
| scripts/regenerate_convergence_results.py | Updates convergence regeneration to use load_system dispatcher. |
| scripts/diagnose_heck_relay.py | Updates diagnostic to use qfuerza_into pattern (but docstrings need cleanup). |
| scripts/bench_composed.py | Migrates composed benchmark script to load_system("ch3f"). |
| q2mm/models/seminario.py | Replaces estimate_force_constants with qfuerza_fresh + qfuerza_into and shared inner loop that skips frozen params. |
| q2mm/models/loaders.py | Adds named FF assembly strategies (published OPT, composed OPT+base, QFUERZA-only, etc.). |
| q2mm/models/forcefield.py | Implements frozen mutation guard + freeze/unfreeze APIs and makes vector setters skip frozen params. |
| q2mm/diagnostics/systems.py | Introduces SystemSpec, consolidates system config into SYSTEMS, and adds load_system() dispatcher. |
| q2mm/diagnostics/cli.py | Updates CLI system resolution and per-form loading to call load_system(). |
| q2mm/diagnostics/init.py | Updates diagnostics public exports to include SystemSpec and load_system. |
| q2mm/init.py | Updates top-level public API exports to qfuerza_fresh/qfuerza_into (removes estimate_force_constants). |
| examples/sn2-test/run_tsff_pipeline.py | Updates example pipeline to use qfuerza_into in-place updates. |
| examples/sn2-test/demo_pipeline.py | Updates demo example to use qfuerza_fresh. |
| examples/rh-enamide/README.md | Updates example documentation to use qfuerza_into. |
| docs/systems/rh-enamide.md | Updates reported ratio and clarifies optimization numbers are TBD post-loader refactor. |
| docs/systems/rh-conjugate.md | Updates narrative/results to reflect ratio gate now passing after loader refactor. |
| docs/systems/pd-conjugate.md | Updates narrative/results to reflect ratio gate now passing after loader refactor. |
| docs/systems/pd-allyl.md | Updates narrative/results; marks post-optimization rows TBD post-refactor. |
| docs/systems/heck-relay.md | Updates ratio value (1.29 → 1.30). |
| docs/benchmarks/optimizer-comparison.md | Updates ratio table and commentary reflecting new loader behavior and TBD optimization rows. |
| AGENTS.md | Adds “Key Papers” methodology section with DOI references. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
208234e to
d75a132
Compare
…h + qfuerza_into
The old ``estimate_force_constants(molecules, forcefield=None, ...)``
overloaded two completely different operations on one signature:
forcefield=None → build a brand-new FF (single molecule only).
forcefield=ff → mutate ff (well, ff.copy()) — silently overwriting
every "unfrozen" param.
The second mode is exactly the footgun that produced the q2mm#277
Heck-relay bug. Splitting the function into two names with two
narrower contracts makes the intent explicit at every call site:
qfuerza_fresh(molecule, ...) -> ForceField
Build a fresh FF from one molecule's QM Hessian. Errors if
multiple molecules are passed.
qfuerza_into(ff, molecules, ...) -> None
Overwrite the values of every *unfrozen* parameter in ``ff``
using the QFUERZA projection. Frozen params are skipped (the
load_heck_relay invariant). Mutates ff in place. No return.
Both names cite Farrugia et al. 2025 (J. Chem. Theory Comput.
22, 469, DOI 10.1021/acs.jctc.5c01751), the methodology paper of
record per AGENTS.md. The estimate_force_constants name is deleted
outright — no shim, per alpha discipline.
Other internal renames in seminario.py to match the new vocabulary
(TypeError message, two docstring/comment refs). __init__.py public
API now exports ``qfuerza_fresh`` and ``qfuerza_into`` instead of
``estimate_force_constants``.
Migration of all 85 call sites is mechanical:
fresh path: estimate_force_constants(mol, ...) → qfuerza_fresh(mol, ...)
into path: ff = estimate_force_constants(mols, forcefield=ff_template, ...)
→ ff = ff_template.copy()
qfuerza_into(ff, mols, ...)
19 files touched. The Heck-relay loader behavior is unchanged
(it doesn't call into qfuerza_into at all post-#280); the rh-enamide
and Wahlers loaders still have the same shape, with
``forcefield=ff_template, ...`` → ``copy + qfuerza_into`` and same
semantics.
Tests:
- Existing unit test renamed to
``test_qfuerza_into_does_not_overwrite_frozen_opt_block``;
same regression coverage for the q2mm#277 pattern.
- All 656 unit tests pass; ruff check + format pass clean.
No behavior change visible from outside the call sites — same loops
inside ``_estimate_into_ff`` (the renamed body), same skip-frozen
short-circuits. The Heck-relay regression test from PR #280
continues to pass.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
d75a132 to
1772a32
Compare
Comment on lines
+289
to
+290
| Use this when you have no published FF to start from — e.g. | ||
| ``load_ch3f``. All params come from the QFUERZA projection |
|
|
||
| Iterates every parameter slot in *ff* and writes new values from | ||
| the QFUERZA projection (Farrugia et al. *J. Chem. Theory Comput.* | ||
| **2026**, *22*, 469). **Frozen parameters are skipped** — they |
Comment on lines
+255
to
+259
| bonds=ff.bonds, | ||
| angles=ff.angles, | ||
| torsions=ff.torsions, | ||
| vdws=list(ff.vdws) + [vdw], | ||
| stretch_bends=ff.stretch_bends, |
…oad_system
Replaces six per-system loaders + a private _load_wahlers_system helper
with a single declarative dispatch system, addressing the
inconsistencies (load_X vs _load_X_system, divergent shapes) and the
copy-paste between similar loaders.
New module: q2mm/models/loaders.py
- Strategy primitives, named after what they do (no implicit ordering
that produced the q2mm#277 bug):
load_published_opt(ff_path)
load_qfuerza_fresh(molecule)
compose_opt_with_mm3_base(opt_path, base_path, *, metal)
-> (composed, opt_only)
load_published_opt_composed(opt_path, base_path, *, metal)
- Each docstring cites Farrugia 2025 (the QFUERZA paper, AGENTS.md
"Key Papers") and documents the returned FF's frozen/active
invariants.
- Module-level `_METAL_VDW` dict holds published metal vdW parameters
for systems whose MM3 base file lacks them (today only PD from
Rosales 2020). Lives here rather than in `q2mm/diagnostics/`
so the dependency direction is models <- diagnostics, not the
reverse.
Rewritten q2mm/diagnostics/systems.py:
- New `SystemSpec` dataclass replaces `BenchmarkSystem`. Each spec
declares: molecule_loader callable, ff_strategy literal, ff_paths
mapping (uniformly `Callable[[], Path]`, no Path|Callable sum
type), optional `normal_modes_path` resolver, metadata dict,
optional metal symbol.
- New `load_system(key, *, engine=None, functional_form=None,
molecule_loader_kwargs=None)` is the single loader entry point.
Reference construction is uniform across multi-molecule systems
(`ReferenceData.from_molecules` with eigenmatrix-diagonal=True);
single-molecule systems with frequency-only references (ch3f) take
the `engine` argument and use `_build_frequency_reference`.
- Normal-modes side-loading is generic: any spec setting
`normal_modes_path` gets its modes loaded; the dispatcher has no
`key == "ch3f"` special-case.
Deleted (no shims, alpha discipline):
- load_ch3f, load_rh_enamide, load_heck_relay, load_pd_allyl,
load_pd_conjugate, load_rh_conjugate (six public functions).
- _load_wahlers_system (private helper).
- BenchmarkSystem dataclass (replaced by SystemSpec).
- _metal_vdw_registry() function (replaced by `_METAL_VDW` dict in
loaders.py, with no lazy import).
Survives because still per-system:
- load_X_molecules() functions (genuinely different file
formats/layouts per system; referenced by SystemSpec.molecule_loader).
- _load_gaussian_molecules, _load_ch3f_molecules, _load_wahlers_molecules
(factored out of the deleted loaders).
Per-system code in SYSTEMS shrinks to ~10 lines of declarative spec
per system (vs ~40-60 lines of repetitive procedure in the old design).
Caller migration in the same commit (no shims):
- q2mm/diagnostics/cli.py: SYSTEMS[k].loader(engine) -> load_system(k, engine=engine, ...)
BenchmarkSystem -> SystemSpec
special-cased load_ch3f(data_dir=...) -> load_system("ch3f", molecule_loader_kwargs={"data_dir": ...})
- q2mm/diagnostics/__init__.py: re-export SystemSpec + load_system instead of BenchmarkSystem.
- scripts/regenerate_convergence_results.py: SYSTEMS[k].loader -> load_system.
- scripts/bench_composed.py: load_ch3f -> load_system("ch3f").
- scripts/run_rh_benchmarks.sh: embedded Python -- load_rh_enamide -> load_system("rh-enamide").
- test/integration/test_heck_validation.py regression: load_heck_relay -> load_system.
- test/integration/test_backend_optimizer_matrix.py: load_ch3f -> load_system.
- test/benchmarks/test_optimization.py: same.
Smoke tests:
- `scripts/regenerate_convergence_results.py --system ch3f --skip-optimization`
runs end-to-end and produces the same baseline numbers
(ratio=1.000, ok, single 'frequency' category).
- Full unit suite: 656 passed, 272 skipped.
- Ruff check + format: clean.
Behavior changes (intentional, per plan, will surface in commit 5):
- load_rh_enamide previously called qfuerza_into on a copy of the
rh-enamide mm3.fld -- i.e. it OVERWROTE the Donoghue OPT values
with raw FUERZA projections. After this commit it uses
load_published_opt, which preserves the published OPT values.
Numbers for rh-enamide will move in commit 5 (regeneration).
- load_pd_allyl, load_pd_conjugate, load_rh_conjugate had the same
overwrite pattern via _load_wahlers_system + qfuerza_into. After
this commit they use load_published_opt_composed and preserve the
Wahlers OPT values. Numbers will move in commit 5.
- load_heck_relay is unchanged in behavior -- its loader was already
fixed in PR #280 to skip qfuerza_into. This commit just routes it
through the load_published_opt strategy.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…refactor
The loader API refactor (previous commit) stopped the silent QFUERZA
overwrite of published OPT parameter values for all five published-FF
systems. This changes every system's baseline numbers, so every doc
that quotes one needs updating.
Headline change: the ratio gate now passes for **four of five
systems** (rh-enamide 1.07, pd-allyl 1.10, pd-conjugate 0.96, rh-conjugate
1.04), up from two of five before. pd-conjugate and rh-conjugate
were previously misclassified as gate-failures (1.20 and ~4 × 10³
respectively) — those ratios came from the QFUERZA overwrite
corrupting the published Wahlers OPT values and sending JaxLoss's
inner geometry minimization into pathological regions.
This is the result the loader API refactor was designed to produce.
The pre-refactor rh-conjugate "non-determinism" (q2mm#278: ratios of
0.46 / 0.96 / ~4 × 10³ across sessions) was a symptom of the same
bug — the overwrite was chaotic. After the refactor, all systems
report stable ratios across runs.
Per-category R² updated to current numbers (from the q2mm-data
companion PR's `paper_metrics.json`):
| System | R²(bond_len) | R²(bond_ang) | R²(eig_diag) |
|----------------|:------------:|:------------:|:------------:|
| Rh-enamide | 0.987 | 0.918 | 0.963 |
| Heck relay | 0.980 | 0.781 | −12.6 |
| Pd-allyl | 0.042 | 0.330 | −2.82 |
| Pd 1,4-conj | 0.939 | −0.177 | −10.06 |
| Rh 1,4-conj | 0.891 | 0.454 | −7.86 |
Geometry reproduction is now strong for the published-OPT systems
(bond_length R² ≥ 0.89 except pd-allyl); eigenmatrix R² is
consistently negative — the real MM3* ↔ JAX-engine cross-engine gap
that the refactor cannot fix.
Updated:
- docs/benchmarks/optimizer-comparison.md — ratio-gate table,
optimization-results table, per-category R² table, notes.
Post-opt rows marked TBD pending an end-to-end run against the
refactored loader.
- docs/systems/rh-enamide.md
- docs/systems/pd-allyl.md
- docs/systems/pd-conjugate.md
- docs/systems/rh-conjugate.md
- docs/systems/heck-relay.md (small ratio touch-up)
All four touched system pages now consistently note that the
published OPT values are used as-published (no QFUERZA overwrite),
and that the post-optimization numbers will be TBD until a fresh
end-to-end run lands in q2mm-data.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1772a32 to
7f75323
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Companion data PR: ericchansen/q2mm-data#? (regenerated baselines for all 5 systems).
TL;DR
After our long discussion, this PR makes
frozena load-bearing invariant, splitsestimate_force_constantsinto two named functions, and rewrites the per-system loaders into a single declarativeSystemSpecregistry — the design we converged on by mid-conversation.The headline result is that the bug pattern from #277 (Heck-relay) was actually a slow-burn problem in four loaders, not one. Fixing the loaders properly means the ratio gate now passes for four of five published-FF systems, up from two.
Methodology citation
This work is grounded in Farrugia, Helquist, Norrby & Wiest 2025 (the QFUERZA paper, J. Chem. Theory Comput. 22, 469; DOI 10.1021/acs.jctc.5c01751). AGENTS.md now has a "Key Papers" section calling this out (commit 1) so the next agent doesn't cite Seminario 1996 when they mean Farrugia 2025.
Five logical commits
docs(agents): add "Key Papers" sectionrefactor(forcefield): make frozen a load-bearing invariantFrozenParamError+__setattr__guard on physical value fields..freeze()/.unfreeze()methods. Construction still works becausefrozenis the last dataclass field (init assigns it last).refactor(seminario): split estimate_force_constants → qfuerza_fresh + qfuerza_intoforcefield=Noneargument. Each name does one thing. The dangerous "overwrite via forcefield=" path is gone.refactor(loaders): consolidate per-system loaders into SystemSpec + load_systemq2mm/models/loaders.pywith named FF-assembly strategies.q2mm/diagnostics/systems.pybecomes a declarativeSystemSpecregistry + oneload_system(key)dispatcher. Six per-system loaders +_load_wahlers_system+BenchmarkSystemdeleted (no shims, alpha discipline). All callers migrated in the same commit.docs: refresh convergence numbersThe ratio-gate change
Pd 1,4-conj and Rh 1,4-conj were previously misclassified as gate-failures. The QFUERZA overwrite was corrupting their published Wahlers OPT values and sending JaxLoss's inner geometry minimization into pathological regions. This also fully explains the rh-conjugate non-determinism tracked in #278 (ratios of 0.46 / 0.96 / ~4 × 10³ across sessions): the overwrite was chaotic.
What got deleted (no shims)
estimate_force_constants— replaced byqfuerza_fresh/qfuerza_into.load_ch3f,load_rh_enamide,load_heck_relay,load_pd_allyl,load_pd_conjugate,load_rh_conjugate— six per-system loaders._load_wahlers_system— private helper.BenchmarkSystem— replaced bySystemSpec.All 85+ callers (across CLI, regen script, bench_composed, run_rh_benchmarks, ~10 test files, 2 example scripts) migrated in the same commits.
Adding new systems is now easy
Append a ~10-line
SystemSpectoSYSTEMSinq2mm/diagnostics/systems.py. No new code; no module-level functions. The spec declares: molecule loader callable, ff_strategy literal, ff_paths mapping, metadata dict.Validation
ruff check q2mm/ test/ scripts/clean.ruff format --check q2mm test scripts examplesclean.uv run --extra docs properdocs build -f properdocs.ymlclean.TestFrozenParamInvariantexercise the new setter contract on every Param class.Closes / Refs