
Implement Dataclasses as Input #330

Merged: jan-janssen merged 17 commits into main from dataclass on May 5, 2026

@jan-janssen (Member) commented May 5, 2026

The dataclasses for the output are already available at:
https://github.com/pyiron/pyiron_dataclasses/blob/main/pyiron_dataclasses/v1/jobs/lammps.py

Summary by CodeRabbit

  • New Features
    • Added structured, typed inputs for molecular-dynamics runs (temperature/pressure, timestep, damping and RNG options, thermostat controls).
    • Added structured, typed inputs for structure minimization (convergence tolerances, iteration limits, pressure and minimizer style).
    • Calculation interface now accepts these typed configurations directly, with validation for unsupported input types.


coderabbitai Bot commented May 5, 2026

Warning

Rate limit exceeded

@jan-janssen has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 38 minutes and 39 seconds before requesting another review.


⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 90dfab9b-6792-4630-a271-224f60189d1b

📥 Commits

Reviewing files that changed from the base of the PR and between 94940cd and 58d0405.

⛔ Files ignored due to path filters (7)
  • src/lammpsparser/compatibility/__pycache__/data.cpython-312.pyc is excluded by !**/*.pyc
  • src/lammpsparser/compatibility/__pycache__/file.cpython-312.pyc is excluded by !**/*.pyc
  • tests/__pycache__/test_compatibility_calculate.cpython-312.pyc is excluded by !**/*.pyc
  • tests/__pycache__/test_compatibility_file.cpython-312.pyc is excluded by !**/*.pyc
  • tests/__pycache__/test_compatibility_structure.cpython-312.pyc is excluded by !**/*.pyc
  • tests/__pycache__/test_output.cpython-312.pyc is excluded by !**/*.pyc
  • tests/__pycache__/test_structure.cpython-312.pyc is excluded by !**/*.pyc
📒 Files selected for processing (3)
  • src/lammpsparser/compatibility/data.py
  • src/lammpsparser/compatibility/file.py
  • tests/test_compatibility_file.py
📝 Walkthrough

Adds two public dataclass input schemas, CalcMDInput and CalcMinimizeInput, and extends lammps_file_interface_function with an optional calc_dataclass parameter. When a dataclass is passed, its type determines calc_mode, calc_kwargs is built via dataclasses.asdict(), and any unsupported type raises a TypeError.

Changes

Typed Input Schemas & Integration

  • Data Shape (src/lammpsparser/compatibility/data.py): Introduces CalcMDInput (MD parameters: optional temperature/pressure, time_step, n_print, damping timescales, seed, tloop, initial_temperature, langevin and delta options, rotation_matrix, units) and CalcMinimizeInput (structure: Atoms, ionic_energy_tolerance, ionic_force_tolerance, max_iter, optional pressure, n_print, style, rotation_matrix, units).
  • Core Wiring / API (src/lammpsparser/compatibility/file.py): Extends the lammps_file_interface_function signature with calc_dataclass: Optional[Union[CalcMDInput, CalcMinimizeInput]] = None. When provided, derives calc_mode ("md" or "minimize") and sets calc_kwargs using asdict(); raises TypeError for unsupported types.
  • Imports & Docstring (src/lammpsparser/compatibility/file.py): Adds imports for asdict, Union, and the new dataclass types; updates the function docstring to document calc_dataclass.
  • Manifest / Requirements (pyproject.toml, requirements.txt): No functional changes recorded beyond listing these files in the diff manifest.
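The two schemas and the calc_dataclass dispatch can be sketched roughly as follows. This is an illustration, not the merged code: the names `temperature_damping_timescale`, `pressure_damping_timescale`, `delta_temp` and `delta_press` are assumed spellings of the "damping timescales" and "deltas" in the walkthrough, every default value is a placeholder, `Any` stands in for the ASE `Atoms` type of `structure`, and the helper `resolve_calc_dataclass` is hypothetical (in the PR this logic sits inline in lammps_file_interface_function).

```python
from dataclasses import asdict, dataclass
from typing import Any, Optional, Union


@dataclass
class CalcMDInput:
    # Fields follow the walkthrough; all defaults are placeholders.
    temperature: Optional[float] = None
    pressure: Optional[float] = None
    time_step: float = 1.0
    n_print: int = 100
    temperature_damping_timescale: float = 100.0  # assumed name
    pressure_damping_timescale: float = 1000.0  # assumed name
    seed: Optional[int] = None
    tloop: Optional[float] = None
    initial_temperature: Optional[float] = None
    langevin: bool = False
    delta_temp: Optional[float] = None  # assumed name
    delta_press: Optional[float] = None  # assumed name
    rotation_matrix: Optional[list[list[float]]] = None
    units: str = "metal"


@dataclass
class CalcMinimizeInput:
    structure: Any  # stand-in for ase.Atoms; required, so it comes first
    ionic_energy_tolerance: float = 0.0
    ionic_force_tolerance: float = 1e-4
    max_iter: int = 100000
    pressure: Optional[float] = None
    n_print: int = 100
    style: str = "cg"
    rotation_matrix: Optional[list[list[float]]] = None
    units: str = "metal"


def resolve_calc_dataclass(
    calc_dataclass: Union[CalcMDInput, CalcMinimizeInput],
) -> tuple[str, dict]:
    """Return (calc_mode, calc_kwargs) for a typed input object."""
    if isinstance(calc_dataclass, CalcMDInput):
        return "md", asdict(calc_dataclass)
    if isinstance(calc_dataclass, CalcMinimizeInput):
        return "minimize", asdict(calc_dataclass)
    # Anything else (e.g. a plain dict) is rejected instead of being ignored.
    raise TypeError(
        "calc_dataclass must be an instance of either CalcMDInput or CalcMinimizeInput"
    )


mode, kwargs = resolve_calc_dataclass(CalcMDInput(temperature=300.0, time_step=2.0))
print(mode, kwargs["temperature"], kwargs["time_step"])  # md 300.0 2.0
```

Ending the isinstance chain with a TypeError means a mistyped argument fails loudly rather than silently falling back to the keyword-argument path.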

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I shaped the fields with careful hops,
MD beats and minimizer stops,
Typed inputs snug in neat arrays,
A rabbit's patchwork warms the LAMMPS ways. 🥕

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title 'Implement Dataclasses as Input' directly summarizes the main change: adding support for dataclasses (CalcMDInput and CalcMinimizeInput) as function inputs, as an alternative to direct parameter passing.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, above the required threshold of 80.00%.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.



codecov Bot commented May 5, 2026

Codecov Report

❌ Patch coverage is 97.50% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 89.06%. Comparing base (ef566d4) to head (58d0405).

Files with missing lines:
  • src/lammpsparser/compatibility/file.py: patch coverage 90.90%, 1 line missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #330      +/-   ##
==========================================
+ Coverage   88.78%   89.06%   +0.28%     
==========================================
  Files          11       12       +1     
  Lines        1159     1198      +39     
==========================================
+ Hits         1029     1067      +38     
- Misses        130      131       +1     
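The percentages follow directly from the hit and line counts. A quick check, assuming values are truncated (not rounded) to two decimals, which matches Codecov's documented default (`coverage.round: down`, `precision: 2`). The patch total of 40 lines and the file.py total of 11 lines are inferred from the reported percentages with one missed line each; they are not stated in the report. The helper `coverage_pct` is hypothetical.

```python
from decimal import ROUND_DOWN, Decimal


def coverage_pct(hits: int, lines: int) -> Decimal:
    """Percentage of covered lines, truncated to two decimals."""
    raw = Decimal(hits) * 100 / Decimal(lines)
    return raw.quantize(Decimal("0.01"), rounding=ROUND_DOWN)


base = coverage_pct(1029, 1159)  # main (ef566d4)
head = coverage_pct(1067, 1198)  # PR head (58d0405)
print(base, head, head - base)  # 88.78 89.06 0.28
print(1159 - 1029, 1198 - 1067)  # misses: 130 131
print(coverage_pct(39, 40))  # patch: 97.50
print(coverage_pct(10, 11))  # file.py: 90.90 (round-half-up would give 90.91)
```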



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
src/lammpsparser/compatibility/data.py (1)

Line 23: ⚡ Quick win

units field in both dataclasses is silently overridden and effectively dead

In file.py, the MD path (line 175) and minimize path (line 185) both execute calc_kwargs["units"] = units, where units is the function-level parameter (defaulting to "metal"). This unconditionally overwrites whatever the dataclass field contains. A caller who sets CalcMDInput(units="lj") will have that value silently discarded.

The units field serves no purpose in its current form. The cleanest fix is to remove it from both dataclasses and rely solely on the function-level units parameter, or to read calc_kwargs["units"] into the outer units variable when a dataclass is provided.

♻️ Proposed fix — remove redundant `units` from both dataclasses
 @dataclass
 class CalcMDInput:
     ...
-    units: str = "metal"
 @dataclass
 class CalcMinimizeInput:
     ...
-    units: str = "metal"

Also applies to: line 36
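A minimal sketch of the override and of the second fix the comment suggests (reading units back from calc_kwargs). The one-field stand-in dataclass and both `build_calc_kwargs_*` helpers are hypothetical reductions of the code in file.py that keep only the lines relevant to units:

```python
from dataclasses import asdict, dataclass


@dataclass
class CalcMDInput:
    # Reduced stand-in: only the field relevant here.
    units: str = "metal"


def build_calc_kwargs_current(calc_dataclass: CalcMDInput, units: str = "metal") -> dict:
    """Mimics the reviewed code: the function-level units always wins."""
    calc_kwargs = asdict(calc_dataclass)
    calc_kwargs["units"] = units  # silently discards calc_dataclass.units
    return calc_kwargs


def build_calc_kwargs_fixed(calc_dataclass: CalcMDInput, units: str = "metal") -> dict:
    """Suggested alternative: prefer the value carried by the dataclass."""
    calc_kwargs = asdict(calc_dataclass)
    units = calc_kwargs.get("units", units)
    calc_kwargs["units"] = units
    return calc_kwargs


print(build_calc_kwargs_current(CalcMDInput(units="lj"))["units"])  # metal
print(build_calc_kwargs_fixed(CalcMDInput(units="lj"))["units"])  # lj
```

Because the dataclass field carries its own default, the second variant lets the dataclass win even when the caller only passed units="lj" to the function (CalcMDInput() still yields "metal"), which is an argument for the first option of dropping the field altogether.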

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lammpsparser/compatibility/data.py` at line 23, The dataclass field units
is redundant and gets overwritten by the function-level units parameter; remove
the units attribute from the CalcMDInput and CalcMinimizeInput dataclasses in
src/lammpsparser/compatibility/data.py and rely solely on the function-level
units parameter (and calc_kwargs["units"] assignment) in the MD/minimize paths,
or alternatively ensure the function reads any units provided by a dataclass by
setting units = calc_kwargs.get("units", units) before assigning
calc_kwargs["units"] = units; update either CalcMDInput/CalcMinimizeInput to
drop units or implement the get-from-calc_kwargs read in the MD/minimize code
paths (refer to CalcMDInput, CalcMinimizeInput, calc_kwargs and the
function-level parameter units).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lammpsparser/compatibility/data.py`:
- Around line 8-23: CalcMDInput is missing an n_ionic_steps field so asdict()
never emits it and MD jobs fall back to 1; add an integer field named
n_ionic_steps to the CalcMDInput dataclass (e.g., n_ionic_steps: int = 1) so the
key is present when asdict() is used and existing dispatch code that
int(calc_kwargs.pop("n_ionic_steps")) continues to work.

In `@src/lammpsparser/compatibility/file.py`:
- Around line 96-107: The CalcMinimizeInput branch converts the dataclass to
calc_kwargs with asdict(), which includes a "structure" key and causes a
duplicate keyword when later calling calc_minimize(structure=structure,
**calc_kwargs); fix by removing the structure entry from calc_kwargs before the
call (e.g., pop("structure", None)) in the block that handles CalcMinimizeInput
so calc_minimize receives the explicit structure param only; ensure this change
is applied only for the CalcMinimizeInput path (symbols: calc_dataclass,
CalcMinimizeInput, calc_kwargs, calc_minimize, CalcMDInput).


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e60c6e9f-6cf6-4a7c-9e00-aabcf13adf4f

📥 Commits

Reviewing files that changed from the base of the PR and between 223bea4 and 7106d19.

📒 Files selected for processing (2)
  • src/lammpsparser/compatibility/data.py
  • src/lammpsparser/compatibility/file.py

Comment thread src/lammpsparser/compatibility/data.py Outdated
Comment on lines +96 to +107
if calc_dataclass is not None:
    if isinstance(calc_dataclass, CalcMDInput):
        calc_mode = "md"
        calc_kwargs = asdict(calc_dataclass)

    elif isinstance(calc_dataclass, CalcMinimizeInput):
        calc_mode = "minimize"
        calc_kwargs = asdict(calc_dataclass)
    else:
        raise TypeError(
            "calc_dataclass must be an instance of either CalcMDInput or CalcMinimizeInput"
        )

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

CalcMinimizeInput.structure causes TypeError in calc_minimize call

CalcMinimizeInput includes a structure field. asdict() converts every dataclass field into a dict entry, recursing into nested dataclasses, dicts, lists, and tuples — and other objects (like Atoms) are copied with copy.deepcopy(). So calc_kwargs = asdict(calc_dataclass) will include a "structure" key.

At line 192 this dict is then spread into:

calc_minimize(structure=structure, **calc_kwargs)

which is equivalent to passing structure= twice → TypeError: got multiple values for keyword argument 'structure'. The CalcMinimizeInput path is entirely broken at runtime.

🐛 Proposed fix — pop `structure` from `calc_kwargs`
         elif isinstance(calc_dataclass, CalcMinimizeInput):
             calc_mode = "minimize"
             calc_kwargs = asdict(calc_dataclass)
+            calc_kwargs.pop("structure", None)  # structure is passed separately to the function
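The failure mode and the proposed fix can be reproduced with stand-ins. `FakeAtoms`, the reduced CalcMinimizeInput and the echoing `calc_minimize` below are all hypothetical; only the asdict() behaviour and the duplicate-keyword TypeError are standard Python:

```python
from dataclasses import asdict, dataclass
from typing import Any


class FakeAtoms:
    """Hypothetical stand-in for ase.Atoms (not a dataclass)."""

    def __init__(self, symbols: str) -> None:
        self.symbols = symbols


@dataclass
class CalcMinimizeInput:
    # Reduced stand-in: structure plus one tolerance field.
    structure: Any
    ionic_force_tolerance: float = 1e-4


def calc_minimize(structure, ionic_force_tolerance=1e-4):
    """Hypothetical stand-in that only echoes its inputs."""
    return structure.symbols, ionic_force_tolerance


atoms = FakeAtoms("Al4")
calc_kwargs = asdict(CalcMinimizeInput(structure=atoms))
print(calc_kwargs["structure"] is atoms)  # False: asdict() deep-copied it

try:
    calc_minimize(structure=atoms, **calc_kwargs)
except TypeError as err:
    print(err)  # message ends with "multiple values for keyword argument 'structure'"

calc_kwargs.pop("structure", None)  # the proposed fix
print(calc_minimize(structure=atoms, **calc_kwargs))  # ('Al4', 0.0001)
```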

@jan-janssen (Member, Author)

======================================================================
FAIL: test_calc_minimize_pressure (test_compatibility_file.TestCompatibilityFile.test_calc_minimize_pressure)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/lammpsparser/lammpsparser/tests/test_compatibility_file.py", line 531, in test_calc_minimize_pressure
    self.assertIn(line, content)
    ~~~~~~~~~~~~~^^^^^^^^^^^^^^^
AssertionError: 'variable dumptime equal 1 \n' not found in ['units metal\n', 'dimension 3\n', 'boundary p p p\n', 'atom_style atomic\n', 'read_data lammps.data\n', 'pair_style eam/alloy\n', 'pair_coeff * * /home/runner/work/lammpsparser/lammpsparser/tests/static/potential/potential_LAMMPS/1999--Mishin-Y--Al--LAMMPS--ipr1/Al99.eam.alloy Al\n', 'variable dumptime equal 100 \n', 'dump 1 all custom ${dumptime} dump.out id type xsu ysu zsu fx fy fz vx vy vz\n', 'dump_modify 1 sort id format line "%d %d %20.15g %20.15g %20.15g %20.15g %20.15g %20.15g %20.15g %20.15g %20.15g"\n', 'variable thermotime equal 100 \n', 'thermo_style custom step temp pe etotal pxx pxy pxz pyy pyz pzz vol\n', 'thermo_modify format float %20.15g\n', 'thermo ${thermotime}\n', 'fix ensemble all box/relax iso 0.0\n', 'min_style cg\n', 'minimize 0.0 0.0001 100000 10000000\n']

======================================================================
FAIL: test_calc_minimize_pressure_3d (test_compatibility_file.TestCompatibilityFile.test_calc_minimize_pressure_3d)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/lammpsparser/lammpsparser/tests/test_compatibility_file.py", line 572, in test_calc_minimize_pressure_3d
    self.assertIn(line, content)
    ~~~~~~~~~~~~~^^^^^^^^^^^^^^^
AssertionError: 'variable dumptime equal 1 \n' not found in ['units metal\n', 'dimension 3\n', 'boundary p p p\n', 'atom_style atomic\n', 'read_data lammps.data\n', 'pair_style eam/alloy\n', 'pair_coeff * * /home/runner/work/lammpsparser/lammpsparser/tests/static/potential/potential_LAMMPS/1999--Mishin-Y--Al--LAMMPS--ipr1/Al99.eam.alloy Al\n', 'variable dumptime equal 100 \n', 'dump 1 all custom ${dumptime} dump.out id type xsu ysu zsu fx fy fz vx vy vz\n', 'dump_modify 1 sort id format line "%d %d %20.15g %20.15g %20.15g %20.15g %20.15g %20.15g %20.15g %20.15g %20.15g"\n', 'variable thermotime equal 100 \n', 'thermo_style custom step temp pe etotal pxx pxy pxz pyy pyz pzz vol\n', 'thermo_modify format float %20.15g\n', 'thermo ${thermotime}\n', 'fix ensemble all box/relax x 0.0 y 0.0 z 0.0 couple none\n', 'min_style cg\n', 'minimize 0.0 0.0001 100000 10000000\n']

----------------------------------------------------------------------

@jan-janssen jan-janssen merged commit 942b0d8 into main May 5, 2026
22 checks passed
@jan-janssen jan-janssen deleted the dataclass branch May 5, 2026 10:46

Labels

None yet

Projects

None yet


2 participants