refactor: Consolidate element data, DRY up codebase, fix packaging#107
Merged
ericchansen merged 3 commits intomasterfrom Mar 21, 2026
Merged
refactor: Consolidate element data, DRY up codebase, fix packaging#107ericchansen merged 3 commits intomasterfrom
ericchansen merged 3 commits intomasterfrom
Conversation
Extract repeated path constants (REPO_ROOT, SN2_XYZ, CH3F_HESS, etc.) and molecule factory functions (_diatomic, _water, _noble_gas_pair) into test/_shared.py. Update 13 test files to import from the shared module instead of redefining them locally. - Create test/_shared.py with path constants, data-availability flags, and parameterised make_diatomic/make_water/make_noble_gas_pair factories - Re-export shared items from test/conftest.py for fixture use - Preserve per-file default overrides via thin wrapper functions (e.g. bond_tolerance=1.5 in JAX tests, name='water-like' in OpenMM tests) No behaviour change: 356 passed, 44 skipped (identical to baseline). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Create q2mm/elements.py as single source of truth for periodic table data (masses, radii, symbols) with annotated provenance (NUBASE2020, Cordero 2008) - Wire constants.py, identifiers.py, objective.py, molecule.py to import from elements.py instead of maintaining separate copies - Delete dead _patterns.py module and MacroModel constant duplicates - Remove duplicate Rhodium entry from MASSES - Extract generic _build_param_maps() and _match_for_export() replacing 5 near-identical functions in forcefield.py - Extract generic _match_mode() and _collect_matching() replacing 4 near-identical functions in seminario.py - Rewrite from_molecules() to delegate to from_molecule(), fixing silent feature drift (missing eigenmatrix params) - Extract shared build_leaderboard_rows() used by cli.py and report.py - Make scipy import lazy in scipy_opt.py - Make Tinker bond_tolerance configurable (was hardcoded 1.3) - Add --data-dir to benchmark CLI for wheel installs - Fix packaging extras: jax/jaxlib in [all], openmm in [dev] - Extract DFT scaling constant DEFAULT_DFT_SCALING in seminario.py - Consolidate test fixtures into test/_shared.py (16 path constants, 3 molecule factories) replacing duplicates across 13 test files Net: -243 lines, 0 test regressions (356 pass, 44 skip) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR refactors Q2MM to reduce duplication by centralizing element data and shared test fixtures, while also tightening packaging/extras and making some imports more defensive to improve install-time behavior.
Changes:
- Add
q2mm/elements.pyas a single source of truth for element symbols/masses/radii and update multiple consumers to import from it. - DRY up repeated logic in core modules (e.g., forcefield/seminario) and consolidate duplicated test paths/molecule factories into
test/_shared.py. - Improve packaging/runtime robustness (extras updates, lazy SciPy imports, benchmark CLI
--data-dir, configurable Tinker bond tolerance) and remove dead code.
Reviewed changes
Copilot reviewed 27 out of 27 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| test/test_reference_data.py | Uses shared fixture paths from test._shared instead of local constants. |
| test/test_models.py | Switches common fixture paths to test._shared; keeps test-specific paths local. |
| test/test_method_e_pipeline.py | Reuses shared SN2/Ethane paths/flags from test._shared. |
| test/test_method_d_e.py | Reuses shared SN2 Hessian path from test._shared. |
| test/test_eigenmatrix.py | Reuses shared SN2/Ethane paths/flags from test._shared. |
| test/integration/test_seminario_parity.py | Imports repo root + SN2 paths from test._shared to avoid duplication. |
| test/integration/test_scipy_optimizer.py | Uses shared molecule factories (make_water, make_diatomic). |
| test/integration/test_optimization_validation.py | Uses shared fixture paths and make_water factory. |
| test/integration/test_openmm_export.py | Uses shared fixture paths and shared molecule factories. |
| test/integration/test_openmm_backend.py | Uses shared fixture paths and shared molecule factories. |
| test/integration/test_jax_backend.py | Uses shared repo/SN2 paths and shared molecule factories. |
| test/integration/test_e2e_sn2_validation.py | Imports all SN2/CH3F fixture paths from test._shared. |
| test/conftest.py | Imports/re-exports shared constants/factories for fixture access. |
| test/_shared.py | New shared module defining common example paths, availability flags, and molecule factories. |
| q2mm/parsers/_patterns.py | Deleted unused shared regex/util module. |
| q2mm/optimizers/scipy_opt.py | Makes SciPy imports lazy (import inside methods) to allow module import without SciPy. |
| q2mm/optimizers/objective.py | Fixes from_molecules() feature drift by delegating to from_molecule() and centralizes atomic symbol lookup. |
| q2mm/models/seminario.py | Consolidates matching helpers and introduces DEFAULT_DFT_SCALING. |
| q2mm/models/molecule.py | Imports covalent radii from centralized q2mm.elements. |
| q2mm/models/identifiers.py | Imports two-letter element set from centralized q2mm.elements. |
| q2mm/models/forcefield.py | Adds generic param-map builder and export-matching helper to reduce duplication. |
| q2mm/elements.py | New centralized periodic table data and derived lookup tables. |
| q2mm/diagnostics/report.py | Extracts shared leaderboard row-building logic into build_leaderboard_rows(). |
| q2mm/diagnostics/cli.py | Adds --data-dir support and reuses shared leaderboard row builder. |
| q2mm/constants.py | Re-exports MASSES from q2mm.elements and removes duplicated element data. |
| q2mm/backends/mm/tinker.py | Makes bond detection tolerance configurable via bond_tolerance. |
| pyproject.toml | Updates extras (all includes jax/jaxlib; dev includes openmm). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Fix 0.0 score treated as NaN due to falsy 'or' check in build_leaderboard_rows() (use explicit None check instead) - Update constants.py docstring to reflect _patterns.py deletion - Add bond_tolerance to TinkerEngine docstring Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
DRY up the codebase, consolidate element data into a single annotated source of truth, fix packaging gaps, and reduce test fixture duplication. Net result: -243 lines with zero test regressions.
Highlights
q2mm/elements.py— Single source of truth for periodic table dataElementNamedTuple with(z, symbol, mass, covalent_radius)and cited provenance (NUBASE2020, Cordero 2008)constants.py,identifiers.py,objective.py,molecule.py) now import from this one moduleDRY refactors
_build_param_maps()+_match_for_export()replace 5 near-identical functions_match_mode()+_collect_matching()replace 4 near-identical functionsbuild_leaderboard_rows()helperBug fixes
from_molecules()feature drift: Silently lackedinclude_eigenmatrixandeigenmatrix_diagonal_onlyparams thatfrom_molecule()had — now delegates properly"RH"entry from MASSES (104 entries for 103 elements)Packaging & defensive imports
[all]extras now includejax/jaxlib;[dev]includesopenmmscipyimport moved to lazy (inside methods) so module loads without scipy installed--data-dirfor wheel installs where example data isn't bundledbond_tolerancenow configurable (was hardcoded 1.3)Dead code removed
q2mm/parsers/_patterns.py(nothing imported from it)constants.pyQ2MMMoleculeimport inforcefield.pyTesting
ruff checkandruff formatclean