Skip to content

Implement raw parsing functions which only need the output file as input#291

Merged
jan-janssen merged 7 commits intomainfrom
split_output_parsers
Jan 8, 2026
Merged

Implement raw parsing functions which only need the output file as input#291
jan-janssen merged 7 commits intomainfrom
split_output_parsers

Conversation

@jan-janssen
Copy link
Copy Markdown
Member

@jan-janssen jan-janssen commented Jan 8, 2026

Summary by CodeRabbit

  • Refactor

    • Separated raw parsing from post-processing, reorganizing LAMMPS output handling for clearer responsibilities.
    • Introduced unified raw parsers for H5MD, text dump, and log formats and streamlined downstream normalization (cell handling, indices, units).
  • Chores

    • Removed several legacy in-file helpers in favor of the new parser layer.
  • Tests

    • Added/updated tests validating raw parsers and normalization across multiple input scenarios.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Jan 8, 2026

📝 Walkthrough

Walkthrough

Refactors LAMMPS parsing: moves raw parsers and data structures into a new output_raw.py module; output.py now delegates H5MD/text/log parsing to these raw parsers and post-processes their results. Tests updated to import to_amat from the new module.

Changes

Cohort / File(s) Summary
Tests
tests/test_output.py, tests/test_output_raw.py
Removed tests/imports referencing _collect_dump_from_h5md; updated to_amat import to pyiron_lammps.output_raw; added tests/test_output_raw.py validating to_amat, parse_raw_dump_from_text, and parse_raw_lammps_log.
Output parsing refactor
src/pyiron_lammps/output.py
Replaced in-file dump/text/log parsing with delegation to parse_raw_dump_from_h5md, parse_raw_dump_from_text, and parse_raw_lammps_log; removed DumpData, _collect_dump_from_h5md, and to_amat from this module; added post-processing (cell unfolding, index remapping, unit handling).
New raw parsing module
src/pyiron_lammps/output_raw.py
New module introducing DumpData dataclass and functions: to_amat(), parse_raw_dump_from_h5md(), parse_raw_dump_from_text(), parse_raw_lammps_log() to extract and normalize raw dump/log data.

Sequence Diagram(s)

sequenceDiagram
  participant Test as Test/Client
  participant Output as pyiron_lammps.output
  participant Raw as pyiron_lammps.output_raw
  participant FS as FileSystem

  Test->>Output: request parse (dump/log/h5md)
  Output->>FS: open file
  Output->>Raw: call parse_raw_* (file path)
  Raw->>FS: read file contents (H5/text/log)
  Raw-->>Output: return raw parsed dict / DataFrame
  Output->>Output: post-process (unfold cell, remap indices, rotate fields)
  Output-->>Test: return normalized data structures
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰 I hopped through dumps and logs at night,

dug tunnels of code till parsers were tight.
Raw bytes to cells, indices all neat,
a carrot of tests to make it complete.
🥕✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 38.71% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main objective of the PR: introducing new raw parsing functions that take only the output file as input, implemented in the new output_raw.py module.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
pyiron_lammps/output.py (2)

208-225: Add a docstring for the new public function.

This is a new public parsing function but lacks documentation. Please add a docstring that explains:

  • The purpose of the function
  • The expected format of the H5MD file
  • The structure of the returned dictionary
📝 Suggested docstring
 def parse_raw_dump_from_h5md(file_name: str) -> Dict:
+    """
+    Parse raw dump data from an H5MD file.
+
+    This function extracts positions, forces, steps, and cell information from
+    a LAMMPS H5MD trajectory file without applying any coordinate transformations.
+
+    Args:
+        file_name (str): Path to the H5MD file.
+
+    Returns:
+        Dict: A dictionary containing:
+            - 'forces': List of force arrays for each timestep
+            - 'positions': List of position arrays for each timestep
+            - 'steps': List of timestep values
+            - 'cells': List of cell matrices for each timestep
+    """
     import h5py

208-225: Update test imports for renamed function.

The function _collect_dump_from_h5md has been renamed to parse_raw_dump_from_h5md in pyiron_lammps/output.py, but tests/test_output.py still imports and uses the old function name. Update the import statement at line 9 and function calls at lines 450 and 580 to use parse_raw_dump_from_h5md.

🤖 Fix all issues with AI agents
In @pyiron_lammps/output.py:
- Around line 228-237: The docstring for parse_raw_dump_from_text is a
placeholder; replace it with a clear description stating that
parse_raw_dump_from_text parses raw LAMMPS dump text and returns parsed dump
data without applying any coordinate transformations or atom index remapping,
include parameter and return descriptions (file_name: path to the LAMMPS dump
file; returns: Dict with raw parsed frames/atoms metadata), and mention that
coordinates and indices are left as in the original dump to clarify behavior for
callers.
- Around line 425-434: Replace the placeholder docstring in parse_raw_lammps_log
with a clear description of its behavior: explain that the function reads a
LAMMPS log file, extracts timestep/thermo blocks (e.g., step, temperature,
pressure, energy columns) by detecting header lines and numeric columns, parses
numeric values into a pandas.DataFrame, and returns a DataFrame indexed by
timestep (or a sequential index) with one column per thermo key; also document
expected input (file_name: str), error conditions (e.g., missing file or no
thermo blocks), and the exact DataFrame schema (column names and dtypes) so
callers know what fields to expect.
- Around line 186-193: The call to parse_raw_dump_from_h5md is passing a prism
keyword (prism=prism) but the function parse_raw_dump_from_h5md only accepts
file_name, causing a TypeError; update the call site in the block after
_check_ortho_prism to match the function signature by removing the prism keyword
(call parse_raw_dump_from_h5md with dump_h5_full_file_name only) or, if prism is
required, add an optional prism parameter to parse_raw_dump_from_h5md’s
definition and propagate it through that function and any downstream callers so
the symbol parse_raw_dump_from_h5md and its usage of dump_h5_full_file_name and
prism remain consistent.
🧹 Nitpick comments (1)
pyiron_lammps/output.py (1)

186-189: Consider extracting the error message to improve error handling.

The static analysis tool suggests that the error message could be extracted to a custom exception class for better maintainability. However, given the message is context-specific and not overly long, this is an optional improvement.

Based on static analysis hints (Ruff TRY003).

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 202ec5d and d1f46c8.

📒 Files selected for processing (1)
  • pyiron_lammps/output.py
🧰 Additional context used
🧬 Code graph analysis (1)
pyiron_lammps/output.py (1)
pyiron_lammps/structure.py (4)
  • UnfoldingPrism (38-191)
  • structure (228-234)
  • structure (237-251)
  • unfold_cell (138-168)
🪛 GitHub Actions: Pipeline
pyiron_lammps/output.py

[error] 1-1: ImportError: cannot import name '_collect_dump_from_h5md' from 'pyiron_lammps.output' during test execution (test_output).

🪛 Ruff (0.14.10)
pyiron_lammps/output.py

187-189: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (3)
pyiron_lammps/output.py (3)

276-348: LGTM! Clean separation of raw parsing from transformations.

The refactoring correctly moves all coordinate transformations, index remapping, and rotation operations out of the raw parsing function. This allows the function to return unprocessed LAMMPS data, which is the intended design.


352-392: LGTM! Wrapper correctly applies transformations.

The refactored _collect_dump_from_text appropriately delegates to parse_raw_dump_from_text and then applies the necessary coordinate transformations, index remapping, and cell unfolding. The logic correctly handles different data types with their specific transformation requirements.


468-485: LGTM! Cleaner logic for handling LogStep.

The refactored _collect_output_log correctly delegates to parse_raw_lammps_log and the updated logic for handling the optional LogStep column is more straightforward than before.

Comment thread src/pyiron_lammps/output.py
Comment thread pyiron_lammps/output.py Outdated
Comment thread pyiron_lammps/output.py Outdated
Comment on lines 425 to 434
def parse_raw_lammps_log(file_name: str) -> pd.DataFrame:
"""
general purpose routine to extract static from a lammps log file
Docstring for _parse_lammps_log

Args:
file_name (str): The path to the lammps log file.

Returns:
pd.DataFrame: Dataframe containing the parsed log data.
"""
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Update the placeholder docstring.

Line 427 contains placeholder text "Docstring for _parse_lammps_log" that should be replaced with a proper description. The docstring should explain the parsing logic and the structure of the returned DataFrame.

📝 Suggested docstring improvement
 def parse_raw_lammps_log(file_name: str) -> pd.DataFrame:
     """
-    Docstring for _parse_lammps_log
+    Parse thermo output data from a LAMMPS log file.
+
+    This function extracts thermodynamic data printed by LAMMPS during a simulation
+    run. It handles multiple thermo sections and concatenates them with a 'LogStep'
+    column to distinguish different runs.
 
     Args:
         file_name (str): The path to the lammps log file.
 
     Returns:
-        pd.DataFrame: Dataframe containing the parsed log data.
+        pd.DataFrame: DataFrame containing the parsed thermodynamic data with columns
+            for each thermo output (e.g., Step, Temp, PotEng, etc.). If multiple
+            runs are present, includes a 'LogStep' column to identify each run.
     """
🤖 Prompt for AI Agents
In @pyiron_lammps/output.py around lines 425 - 434, Replace the placeholder
docstring in parse_raw_lammps_log with a clear description of its behavior:
explain that the function reads a LAMMPS log file, extracts timestep/thermo
blocks (e.g., step, temperature, pressure, energy columns) by detecting header
lines and numeric columns, parses numeric values into a pandas.DataFrame, and
returns a DataFrame indexed by timestep (or a sequential index) with one column
per thermo key; also document expected input (file_name: str), error conditions
(e.g., missing file or no thermo blocks), and the exact DataFrame schema (column
names and dtypes) so callers know what fields to expect.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/test_output.py (1)

571-576: Missing import causes pipeline failure.

The test test_h5md_error references _collect_dump_from_h5md which is not imported. Based on the AI summary, this function was removed from output.py in favor of parse_raw_dump_from_h5md. Either import the new function from output_raw and update the test, or remove this test if it's no longer applicable.

Option 1: Update test to use _parse_dump which now contains the RuntimeError check
     def test_h5md_error(self):
         with self.assertRaises(RuntimeError):
-            _collect_dump_from_h5md(
-                file_name="test",
+            _parse_dump(
+                dump_h5_full_file_name="test",
+                dump_out_full_file_name="nonexistent",
                 prism=UnfoldingPrism(cell=np.array([[1, 1, 0], [0, 1, 0], [0, 0, 1]])),
+                structure=bulk("Ni"),
+                potential_elements=["Ni"],
             )
Option 2: Remove the test if it's now covered elsewhere
-    def test_h5md_error(self):
-        with self.assertRaises(RuntimeError):
-            _collect_dump_from_h5md(
-                file_name="test",
-                prism=UnfoldingPrism(cell=np.array([[1, 1, 0], [0, 1, 0], [0, 0, 1]])),
-            )
-
🤖 Fix all issues with AI agents
In @pyiron_lammps/output_raw.py:
- Around line 115-117: The code uses line.lstrip("ITEM: ATOMS") which
incorrectly removes any of those characters rather than the literal prefix;
change this to use line.removeprefix("ITEM: ATOMS") (or if supporting older
Python, check line.startswith("ITEM: ATOMS") and slice with line[len("ITEM:
ATOMS"):]) before .split() to populate columns correctly in the logic that
assigns to the variable columns.
- Around line 137-139: The dump.forces.append call assumes columns
"fx","fy","fz" exist and will KeyError if missing; add a guard like the existing
velocity check (e.g., where velocities are validated) to verify all three
columns are present on the DataFrame before stacking and appending to
dump.forces, and skip or append an empty/None entry when any are missing,
logging a warning and referencing the same logic used in the velocity handling
to keep behavior consistent.
- Around line 244-250: The loop that handles multiple DataFrames only sets the
LogStep on the temporary variable df and ends up applying it to the last one;
instead, in the else branch iterate over dfs and set the LogStep on each element
(e.g., assign to dfs[i]["LogStep"] or modify the DataFrame in-place) so every
DataFrame gets the correct step value, then call pd.concat(dfs,
ignore_index=True) to combine them; ensure you no longer overwrite dfs entries
by only updating the temp variable df.
🧹 Nitpick comments (2)
pyiron_lammps/output_raw.py (2)

224-225: Consider renaming loop variable l to line for clarity.

Variable name l is ambiguous (can be confused with 1 or I). A more descriptive name improves readability.

Proposed fix
-        for l in f:
-            l = l.lstrip()
+        for line in f:
+            line = line.lstrip()
 
-            if l.startswith("Step"):
+            if line.startswith("Step"):
                 thermo_lines = ""
                 read_thermo = True
 
             if read_thermo:
-                if l.startswith("Loop") or l.startswith("ERROR"):
+                if line.startswith("Loop") or line.startswith("ERROR"):
                     read_thermo = False
                     dfs.append(
                         pd.read_csv(StringIO(thermo_lines), sep="\\s+", engine="c")
                     )
 
-                elif l.startswith("WARNING:"):
-                    warnings.warn(f"A warning was found in the log:\n{l}")
+                elif line.startswith("WARNING:"):
+                    warnings.warn(f"A warning was found in the log:\n{line}", stacklevel=2)
 
                 else:
-                    thermo_lines += l
+                    thermo_lines += line

238-239: Add stacklevel parameter to warnings.warn.

Without stacklevel=2, the warning will report the location inside this function rather than the caller's location, making debugging harder.

Proposed fix
-                    warnings.warn(f"A warning was found in the log:\n{l}")
+                    warnings.warn(f"A warning was found in the log:\n{l}", stacklevel=2)
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d1f46c8 and 8bfa64f.

📒 Files selected for processing (3)
  • pyiron_lammps/output.py
  • pyiron_lammps/output_raw.py
  • tests/test_output.py
🧰 Additional context used
🧬 Code graph analysis (2)
pyiron_lammps/output.py (2)
pyiron_lammps/output_raw.py (3)
  • parse_raw_dump_from_h5md (67-84)
  • parse_raw_dump_from_text (87-208)
  • parse_raw_lammps_log (211-251)
pyiron_lammps/structure.py (3)
  • unfold_cell (138-168)
  • structure (228-234)
  • structure (237-251)
tests/test_output.py (1)
pyiron_lammps/output_raw.py (1)
  • to_amat (28-64)
🪛 GitHub Actions: Pipeline
pyiron_lammps/output.py

[error] 138-138: TypeError: parse_raw_dump_from_h5md() got an unexpected keyword argument 'prism'

tests/test_output.py

[error] 573-573: NameError: name '_collect_dump_from_h5md' is not defined

🪛 Ruff (0.14.10)
pyiron_lammps/output.py

135-137: Avoid specifying long messages outside the exception class

(TRY003)

pyiron_lammps/output_raw.py

117-117: Using .strip() with multi-character strings is misleading

(B005)


224-224: Ambiguous variable name: l

(E741)


225-225: Ambiguous variable name: l

(E741)


239-239: No explicit stacklevel keyword argument found

Set stacklevel=2

(B028)

🔇 Additional comments (7)
pyiron_lammps/output_raw.py (3)

12-25: LGTM!

The DumpData dataclass is well-structured with appropriate default factories for mutable default values.


28-64: LGTM!

The to_amat function correctly handles both 6-element (orthorhombic) and 9-element (triclinic) box bound formats with proper cell matrix construction following LAMMPS documentation.


67-84: LGTM!

The H5MD parser correctly reads positions, steps, forces, and cells from the HDF5 file. The lazy import of h5py is appropriate for optional dependency handling.

tests/test_output.py (1)

11-11: LGTM!

The import of to_amat is correctly updated to reference the new output_raw module.

pyiron_lammps/output.py (3)

11-15: LGTM!

The imports from the new output_raw module are correctly structured.


167-196: LGTM!

The refactored _collect_dump_from_text correctly delegates raw parsing to parse_raw_dump_from_text and applies the necessary post-processing (cell unfolding, index remapping, coordinate rotation) for each data type.


235-246: LGTM!

The log parsing is correctly delegated to parse_raw_lammps_log, and the LogStep column handling is properly conditional on its presence in the DataFrame.

Comment on lines +137 to +139
dump.forces.append(
np.stack([df["fx"].array, df["fy"].array, df["fz"].array], axis=1)
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Missing check for required force columns.

The code assumes fx, fy, fz columns always exist without checking. If a dump file lacks these columns, this will raise a KeyError. Consider adding a guard similar to the velocity check.

Proposed fix
-                dump.forces.append(
-                    np.stack([df["fx"].array, df["fy"].array, df["fz"].array], axis=1)
-                )
+                if "fx" in columns and "fy" in columns and "fz" in columns:
+                    dump.forces.append(
+                        np.stack([df["fx"].array, df["fy"].array, df["fz"].array], axis=1)
+                    )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
dump.forces.append(
np.stack([df["fx"].array, df["fy"].array, df["fz"].array], axis=1)
)
if "fx" in columns and "fy" in columns and "fz" in columns:
dump.forces.append(
np.stack([df["fx"].array, df["fy"].array, df["fz"].array], axis=1)
)
🤖 Prompt for AI Agents
In @pyiron_lammps/output_raw.py around lines 137 - 139, The dump.forces.append
call assumes columns "fx","fy","fz" exist and will KeyError if missing; add a
guard like the existing velocity check (e.g., where velocities are validated) to
verify all three columns are present on the DataFrame before stacking and
appending to dump.forces, and skip or append an empty/None entry when any are
missing, logging a warning and referencing the same logic used in the velocity
handling to keep behavior consistent.

@codecov
Copy link
Copy Markdown

codecov bot commented Jan 8, 2026

Codecov Report

❌ Patch coverage is 98.44961% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.12%. Comparing base (a329e12) to head (559f623).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
src/pyiron_lammps/output.py 95.00% 1 Missing ⚠️
src/pyiron_lammps/output_raw.py 99.08% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #291      +/-   ##
==========================================
+ Coverage   81.95%   82.12%   +0.16%     
==========================================
  Files          10       11       +1     
  Lines        1125     1141      +16     
==========================================
+ Hits          922      937      +15     
- Misses        203      204       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @pyiron_lammps/output.py:
- Around line 134-140: The H5MD branch currently returns
parse_raw_dump_from_h5md(...) without applying the same cell unfolding and
folding-parameter validation as the text path; update the H5MD path so that
after _check_ortho_prism(prism=prism) you call prism.unfold_cell() (or
equivalent) and validate prism.ns to ensure folding parameters are sane before
returning the parsed data from parse_raw_dump_from_h5md, and extract the current
RuntimeError message string into a module-level constant (e.g.,
H5MD_ORTHO_ERROR) and use that constant in the raise to satisfy TRY003.
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8bfa64f and 643e9f6.

📒 Files selected for processing (2)
  • pyiron_lammps/output.py
  • tests/test_output.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/test_output.py
🧰 Additional context used
🧬 Code graph analysis (1)
pyiron_lammps/output.py (2)
pyiron_lammps/output_raw.py (3)
  • parse_raw_dump_from_h5md (67-84)
  • parse_raw_dump_from_text (87-208)
  • parse_raw_lammps_log (211-251)
pyiron_lammps/structure.py (3)
  • unfold_cell (138-168)
  • structure (228-234)
  • structure (237-251)
🪛 Ruff (0.14.10)
pyiron_lammps/output.py

135-137: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: unittest_matrix (windows-latest, 3.14)
  • GitHub Check: unittest_matrix (macos-latest, 3.14)
🔇 Additional comments (3)
pyiron_lammps/output.py (3)

11-15: LGTM: Clean separation of raw parsing logic.

The new imports properly delegate low-level file parsing to a dedicated module, improving modularity.


166-195: LGTM: Well-structured transformation pipeline.

The refactoring properly separates raw parsing from coordinate transformations. The post-processing correctly applies:

  • Cell unfolding to reverse LAMMPS folding
  • Index remapping to preserve species
  • Rotation for vector quantities (forces, velocities)
  • Scaled-to-Cartesian conversion followed by rotation for positions

The logic is sound and maintains consistency with the original implementation.


234-245: LGTM: Robust handling of optional LogStep column.

The dynamic check for LogStep properly handles both single-run and multi-run log files. This prevents KeyErrors during column renaming while maintaining backward compatibility.

Comment on lines +134 to 140
if not _check_ortho_prism(prism=prism):
raise RuntimeError(
"The Lammps output will not be mapped back to pyiron correctly."
)
return parse_raw_dump_from_h5md(
file_name=dump_h5_full_file_name,
prism=prism,
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, let's understand the file structure and locate key functions
echo "=== File structure check ==="
wc -l pyiron_lammps/output.py

echo -e "\n=== Examining H5MD and text parsing paths ==="
# Look at the parse_raw_dump_from_h5md function definition and its return
rg -n "def parse_raw_dump_from_h5md" pyiron_lammps/

echo -e "\n=== Check what parse_raw_dump_from_h5md returns ==="
fd -e py | xargs rg -l "parse_raw_dump_from_h5md" | head -5

echo -e "\n=== Checking remap_indices_funct usage in text path ==="
rg -n "remap_indices_funct" pyiron_lammps/output.py -A2 -B2

echo -e "\n=== Checking unfold_cell usage ==="
rg -n "unfold_cell" pyiron_lammps/output.py -A2 -B2

echo -e "\n=== Checking parse_raw_dump_from_text ==="
rg -n "def parse_raw_dump_from_text" pyiron_lammps/

Repository: pyiron/pyiron_lammps

Length of output: 1936


🏁 Script executed:

#!/bin/bash
# Let's examine the actual implementation of these parsing functions
# and see what they return

echo "=== Finding parse_raw_dump_from_h5md implementation ==="
rg -A 30 "def parse_raw_dump_from_h5md" --type py

echo -e "\n=== Finding parse_raw_dump_from_text implementation ==="
rg -A 30 "def parse_raw_dump_from_text" --type py | head -60

Repository: pyiron/pyiron_lammps

Length of output: 3960


🏁 Script executed:

#!/bin/bash
# Let's see the complete context of lines around 134-140 and 142-179
echo "=== Reading the complete _parse_dump method context ==="
sed -n '120,190p' pyiron_lammps/output.py

Repository: pyiron/pyiron_lammps

Length of output: 2543


🏁 Script executed:

#!/bin/bash
# Check if H5MD format includes type/atom_type information
echo "=== Searching for H5MD type/atom_type information ==="
rg -n "h5md.*type|h5md.*atom|particles/all" pyiron_lammps/output_raw.py -A5 -B5

echo -e "\n=== Check _check_ortho_prism implementation ==="
rg -n "def _check_ortho_prism" pyiron_lammps/output.py -A15

echo -e "\n=== Check unfold_cell implementation and prism attributes ==="
rg -n "def unfold_cell|prism.s|prism.pbc" pyiron_lammps/ --type py | head -20

echo -e "\n=== Look for tests that verify H5MD path ==="
fd -e py -path '*/test*' | xargs rg -l "h5md\|H5MD" 2>/dev/null | head -5

Repository: pyiron/pyiron_lammps

Length of output: 2129


🏁 Script executed:

#!/bin/bash
# Check what UnfoldingPrism properties are used
echo "=== Check UnfoldingPrism attributes in _collect_dump_from_text ==="
sed -n '158,180p' pyiron_lammps/output.py

echo -e "\n=== Check if unfold_cell depends on prism state beyond R ==="
rg -B5 -A15 "def unfold_cell" --type py

echo -e "\n=== Check test files for H5MD usage ==="
find . -path '*/test*' -name '*.py' -exec rg -l "dump.*h5|h5md" {} \; 2>/dev/null | head -3

Repository: pyiron/pyiron_lammps

Length of output: 2424


🏁 Script executed:

#!/bin/bash
# Check imports and understand what DumpData contains
echo "=== Check DumpData structure ==="
rg -n "class DumpData|def DumpData" pyiron_lammps/ --type py -A20 | head -40

echo -e "\n=== Check if indices are always expected in output ==="
rg -n "indices" pyiron_lammps/output.py | head -15

echo -e "\n=== Verify return values from parse functions are used correctly ==="
sed -n '130,160p' pyiron_lammps/output.py

Repository: pyiron/pyiron_lammps

Length of output: 3785


🏁 Script executed:

#!/bin/bash
# Get full unfold_cell implementation
echo "=== Full unfold_cell implementation ==="
sed -n '138,160p' pyiron_lammps/structure.py

echo -e "\n=== Check prism.s attribute and folding ==="
rg -n "self.s =" pyiron_lammps/structure.py -B5 -A2

echo -e "\n=== Check tests for H5MD output path ==="
sed -n '1,50p' tests/test_output.py

Repository: pyiron/pyiron_lammps

Length of output: 2149


🏁 Script executed:

#!/bin/bash
# Look at test_output.py to see if H5MD parsing is tested
echo "=== Search for H5MD-related tests ==="
rg -n "h5md\|H5MD\|dump_h5" tests/test_output.py -B3 -A5

echo -e "\n=== Check if there are integration tests that use H5MD output ==="
rg -n "def test.*h5|parse_raw_dump_from_h5md" tests/ -A10

Repository: pyiron/pyiron_lammps

Length of output: 940


🏁 Script executed:

#!/bin/bash
# Check what happens when cells are orthogonal with non-identity folding
echo "=== Understanding UnfoldingPrism initialization and folding ==="
sed -n '50,120p' pyiron_lammps/structure.py

echo -e "\n=== Full definition of prism attributes ==="
rg -n "@property" pyiron_lammps/structure.py | head -10

Repository: pyiron/pyiron_lammps

Length of output: 2836


🏁 Script executed:

#!/bin/bash
# Get the rest of unfold_cell implementation
echo "=== Complete unfold_cell implementation ==="
sed -n '138,165p' pyiron_lammps/structure.py

echo -e "\n=== Check what happens after rotation in unfold_cell ==="
rg -n "def unfold_cell" pyiron_lammps/structure.py -A30

echo -e "\n=== Look at test_full_job_output_h5 to see what's actually validated ==="
sed -n '335,380p' tests/test_output.py

Repository: pyiron/pyiron_lammps

Length of output: 3952


🏁 Script executed:

#!/bin/bash
# Check if there's a self.s attribute or folding matrix
echo "=== Search for self.s or folding related attributes ==="
rg -n "self\.s\s*=" pyiron_lammps/structure.py

echo -e "\n=== Check what the test does with H5MD output ==="
sed -n '335,400p' tests/test_output.py

echo -e "\n=== Check if output_dict from H5MD is compared to text output ==="
rg -n "test_full_job_output" tests/test_output.py -A60 | head -80

Repository: pyiron/pyiron_lammps

Length of output: 6300


🏁 Script executed:

#!/bin/bash
# Look at the actual test data to understand what's being tested
echo "=== Check test folder structure ==="
find tests/static -name "*h5*" -o -name "*dump*" 2>/dev/null | head -10

echo -e "\n=== Check if there's a dump.h5 test file ==="
ls -la tests/static/full_job_h5/ 2>/dev/null | grep -E "dump|h5" || echo "Dir not listed"

echo -e "\n=== Check what assertions are made in the H5MD test ==="
sed -n '335,360p' tests/test_output.py

Repository: pyiron/pyiron_lammps

Length of output: 1633


🏁 Script executed:

#!/bin/bash
# Check what parse_lammps_output_files does with the dump dict
echo "=== Find parse_lammps_output_files definition ==="
rg -n "def parse_lammps_output_files" pyiron_lammps/ -A30

echo -e "\n=== Check if dump dict with missing indices causes issues ==="
rg -n "indices.*in.*output\|output.*indices" pyiron_lammps/output.py

echo -e "\n=== Check if indices are used in post-processing ==="
rg -n "\[.indices.\]\|\['indices'\]" pyiron_lammps/output.py

Repository: pyiron/pyiron_lammps

Length of output: 230


🏁 Script executed:

#!/bin/bash
# Let's check what parse_lammps_output_files actually does with the dump
echo "=== Full parse_lammps_output_files function ==="
rg -n "def parse_lammps_output_files" pyiron_lammps/ -A50 | head -80

echo -e "\n=== Check if _parse_dump result is further processed ==="
rg -n "dump.*=" pyiron_lammps/output.py | grep -E "_parse_dump|parse_raw_dump"

Repository: pyiron/pyiron_lammps

Length of output: 274


🏁 Script executed:

#!/bin/bash
# Check what happens with the dump dict after _parse_dump
echo "=== Search for where _parse_dump is called ==="
rg -n "_parse_dump\(" pyiron_lammps/output.py -B2 -A5

echo -e "\n=== Check the full context around line 60-90 ==="
sed -n '50,100p' pyiron_lammps/output.py

Repository: pyiron/pyiron_lammps

Length of output: 2482


🏁 Script executed:

#!/bin/bash
# Continue reading from line 85 onwards to see how dump_dict is processed
echo "=== Continue parsing of dump_dict (lines 85-130) ==="
sed -n '85,130p' pyiron_lammps/output.py

echo -e "\n=== Check if indices key is ever accessed from dump_dict ==="
rg -n "dump_dict\[.*indices" pyiron_lammps/output.py

echo -e "\n=== Check what happens with remaining keys in dump_dict ==="
rg -n "for.*in.*dump_dict\|dump_dict\[" pyiron_lammps/output.py

Repository: pyiron/pyiron_lammps

Length of output: 1764


🏁 Script executed:

#!/bin/bash
# Look at the full processing logic
echo "=== Full output processing logic around lines 85-123 ==="
sed -n '85,123p' pyiron_lammps/output.py

Repository: pyiron/pyiron_lammps

Length of output: 1462


🏁 Script executed:

#!/bin/bash
# Verify if self.ns (folding parameters) are identity for orthogonal cells
echo "=== Check how self.ns is computed ==="
rg -n "self.ns" pyiron_lammps/structure.py

echo -e "\n=== Check what self.ns property returns ==="
sed -n '215,240p' pyiron_lammps/structure.py

echo -e "\n=== Verify that orthogonal cells have identity folding ==="
# For cubic cell like in test, check if ns would be (0, 0, 0)
python3 << 'EOF'
import numpy as np

# Cubic cell like in test
cell = np.eye(3) * 3.52  # Ni cubic FCC

# Simulate UnfoldingPrism initialization logic for orthogonal case
a, b, c = cell
an, bn, cn = [np.linalg.norm(v) for v in cell]

# For cubic/orthogonal, angles should be 90 degrees
alpha = np.arccos(np.dot(b, c) / (bn * cn))
beta = np.arccos(np.dot(a, c) / (an * cn))
gamma = np.arccos(np.dot(a, b) / (an * bn))

print(f"Cell angles (should be ~pi/2 for orthogonal):")
print(f"alpha={alpha:.4f} (pi/2={np.pi/2:.4f})")
print(f"beta={beta:.4f} (pi/2={np.pi/2:.4f})")
print(f"gamma={gamma:.4f} (pi/2={np.pi/2:.4f})")

# For orthogonal case, R should be identity
R = np.dot(np.linalg.inv(cell), cell)  # Should be identity for orthogonal
print(f"\nR (should be identity):\n{R}")

# For orthogonal, folding parameters n1, n2, n3 should be 0
print(f"\nFor orthogonal cell, folding parameters should be 0")
EOF

Repository: pyiron/pyiron_lammps

Length of output: 150


🏁 Script executed:

#!/bin/bash
# Now check what the actual ns property does
echo "=== Full ns property implementation ==="
sed -n '215,230p' pyiron_lammps/structure.py

Repository: pyiron/pyiron_lammps

Length of output: 456


🏁 Script executed:

#!/bin/bash
# Find the ns property definition
echo "=== Search for ns property ==="
rg -n "def ns\|@property" pyiron_lammps/structure.py | grep -A1 "ns"

echo -e "\n=== Get the ns property implementation ==="
rg -n "def ns" pyiron_lammps/structure.py -A10

Repository: pyiron/pyiron_lammps

Length of output: 138


🏁 Script executed:

#!/bin/bash
# Check if there's an ns attribute set during initialization
echo "=== Search for ns attribute assignment ==="
rg -n "\.ns\s*=" pyiron_lammps/structure.py

echo -e "\n=== Check if ns comes from parent UnfoldingPrism class ==="
rg -n "class UnfoldingPrism" pyiron_lammps/structure.py

echo -e "\n=== Check imports to understand parent class ==="
head -20 pyiron_lammps/structure.py | grep -i "import\|from"

Repository: pyiron/pyiron_lammps

Length of output: 641


🏁 Script executed:

#!/bin/bash
# Final verification: Check if H5MD test passes and what it validates
echo "=== Run the H5MD test to see if it passes ==="
cd tests && python3 -m pytest test_output.py::TestLammpsOutput::test_full_job_output_h5 -v 2>&1 | tail -20

Repository: pyiron/pyiron_lammps

Length of output: 1124


🏁 Script executed:

#!/bin/bash
# Check comment about H5MD specification in the code
echo "=== H5MD specification comment ==="
sed -n '67,85p' pyiron_lammps/output_raw.py

echo -e "\n=== Check if there's any validation that cells are orthogonal ==="
rg -n "is_skewed\|orthogonal\|diagonal" pyiron_lammps/ --type py | head -10

echo -e "\n=== Verify what DumpData returns to downstream ==="
rg -n "class DumpData" pyiron_lammps/output_raw.py -A15

Repository: pyiron/pyiron_lammps

Length of output: 1771


🏁 Script executed:

#!/bin/bash
# Check if indices field being empty vs missing causes any issues
echo "=== Check how parse_raw_dump_from_text handles indices ==="
sed -n '87,150p' pyiron_lammps/output_raw.py | grep -A20 "indices"

echo -e "\n=== Verify that H5MD doesn't even attempt to parse indices ==="
sed -n '67,85p' pyiron_lammps/output_raw.py

Repository: pyiron/pyiron_lammps

Length of output: 1599


Address asymmetry in H5MD vs text dump parsing paths.

The H5MD branch returns raw parsed data without applying the transformations that the text path applies, creating inconsistency:

  1. Missing cell unfolding: The text path (line 170) unfolds cells via prism.unfold_cell(), but the H5MD path skips this. While the ortho-prism check ensures no rotation is needed, the folding parameters (prism.ns) are not validated. For consistency and defensive programming, H5MD cells should also be unfolded even if they are guaranteed orthogonal by the H5MD format.

  2. Indices absent in H5MD format: Unlike the text format, H5MD does not include atom type information, so parse_raw_dump_from_h5md intentionally returns no indices key. This is a format difference by design, not a bug. Downstream code correctly handles missing keys without error.

  3. Error message: Consider extracting the error message (lines 136–137) to a module-level constant to address the Ruff hint TRY003.

🧰 Tools
🪛 Ruff (0.14.10)

135-137: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
In @pyiron_lammps/output.py around lines 134 - 140, The H5MD branch currently
returns parse_raw_dump_from_h5md(...) without applying the same cell unfolding
and folding-parameter validation as the text path; update the H5MD path so that
after _check_ortho_prism(prism=prism) you call prism.unfold_cell() (or
equivalent) and validate prism.ns to ensure folding parameters are sane before
returning the parsed data from parse_raw_dump_from_h5md, and extract the current
RuntimeError message string into a module-level constant (e.g.,
H5MD_ORTHO_ERROR) and use that constant in the raise to satisfy TRY003.

# Conflicts:
#	src/pyiron_lammps/output_raw.py
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @src/pyiron_lammps/output_raw.py:
- Line 117: Replace the fragile use of line.lstrip("ITEM: ATOMS") when computing
columns with the literal-prefix removal API: change the expression columns =
line.lstrip("ITEM: ATOMS").split() to use line.removeprefix("ITEM:
ATOMS").split() so the literal substring "ITEM: ATOMS" is removed (not a set of
characters); update the same usage anywhere else in the file where lstrip is
being misused for this exact prefix.
- Around line 244-250: The code assumes dfs is non-empty and will call
pd.concat(dfs) which crashes when dfs == []; before the existing length check in
the block handling dfs, add an early branch that handles an empty list (e.g.,
set df = pd.DataFrame() or return an empty DataFrame with the expected columns)
so pd.concat is never called on an empty sequence; keep the existing logic for
len(dfs) == 1 and the loop that assigns df["LogStep"] for multiple frames, but
ensure you check "if not dfs:" first and handle it appropriately.
🧹 Nitpick comments (2)
src/pyiron_lammps/output_raw.py (2)

224-225: Consider renaming ambiguous variable l to line.

The variable name l can be easily confused with 1 (digit one). Using line improves readability.

Proposed fix
-        for l in f:
-            l = l.lstrip()
+        for line in f:
+            line = line.lstrip()

-            if l.startswith("Step"):
+            if line.startswith("Step"):
...

239-239: Add stacklevel=2 to warnings.warn.

Setting stacklevel=2 makes the warning point to the caller of this function rather than the warn statement itself, which is more helpful for debugging.

Proposed fix
-                    warnings.warn(f"A warning was found in the log:\n{l}")
+                    warnings.warn(f"A warning was found in the log:\n{l}", stacklevel=2)
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 643e9f6 and ce3abac.

📒 Files selected for processing (2)
  • src/pyiron_lammps/output.py
  • src/pyiron_lammps/output_raw.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/pyiron_lammps/output.py (1)
src/pyiron_lammps/structure.py (3)
  • unfold_cell (138-168)
  • structure (228-234)
  • structure (237-251)
🪛 Ruff (0.14.10)
src/pyiron_lammps/output_raw.py

117-117: Using .strip() with multi-character strings is misleading

(B005)


224-224: Ambiguous variable name: l

(E741)


225-225: Ambiguous variable name: l

(E741)


239-239: No explicit stacklevel keyword argument found

Set stacklevel=2

(B028)

src/pyiron_lammps/output.py

135-137: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: unittest_matrix (windows-latest, 3.14)
  • GitHub Check: unittest_matrix (macos-latest, 3.14)
🔇 Additional comments (7)
src/pyiron_lammps/output_raw.py (3)

12-25: LGTM!

The dataclass properly uses field(default_factory=...) to avoid mutable default argument pitfalls.


28-64: LGTM!

The cell matrix construction correctly handles both orthogonal (6-element) and triclinic (9-element) box formats following LAMMPS conventions.


67-84: LGTM!

The H5MD parsing correctly uses lazy import for h5py and properly handles SWMR mode for concurrent access.

src/pyiron_lammps/output.py (4)

11-15: LGTM!

Clean import of the new raw parsing functions from the extracted module.


134-140: LGTM!

The orthogonality check before H5MD parsing ensures correct coordinate mapping. This is a sensible guard that prevents incorrect results for non-orthogonal cells.


166-195: LGTM!

The post-processing correctly applies coordinate transformations: cell unfolding, index remapping, and rotation from LAMMPS frame to original pyiron frame. The position transformation properly accounts for direct (scaled) coordinates.


234-245: LGTM!

Good defensive handling of the optional LogStep column that only appears when multiple thermo blocks are present.


elif "ITEM: ATOMS" in line:
# get column names from line
columns = line.lstrip("ITEM: ATOMS").split()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Use removeprefix() instead of lstrip() with a multi-character string.

lstrip("ITEM: ATOMS") strips individual characters from the set {'I', 'T', 'E', 'M', ':', ' ', 'A', 'O', 'S'}, not the literal substring. This works by coincidence because LAMMPS column names start with lowercase letters not in that set, but it's misleading and fragile.

Proposed fix
-                columns = line.lstrip("ITEM: ATOMS").split()
+                columns = line.removeprefix("ITEM: ATOMS").split()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
columns = line.lstrip("ITEM: ATOMS").split()
columns = line.removeprefix("ITEM: ATOMS").split()
🧰 Tools
🪛 Ruff (0.14.10)

117-117: Using .strip() with multi-character strings is misleading

(B005)

🤖 Prompt for AI Agents
In @src/pyiron_lammps/output_raw.py at line 117, Replace the fragile use of
line.lstrip("ITEM: ATOMS") when computing columns with the literal-prefix
removal API: change the expression columns = line.lstrip("ITEM: ATOMS").split()
to use line.removeprefix("ITEM: ATOMS").split() so the literal substring "ITEM:
ATOMS" is removed (not a set of characters); update the same usage anywhere else
in the file where lstrip is being misused for this exact prefix.

Comment on lines +244 to +250
if len(dfs) == 1:
df = dfs[0]
else:
for i in range(len(dfs)):
df = dfs[i]
df["LogStep"] = np.ones(len(df)) * i
df = pd.concat(dfs, ignore_index=True)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Handle empty dfs list to prevent pd.concat crash.

If the log file contains no thermo output (no "Step" line found), dfs remains empty and pd.concat([]) raises ValueError: No objects to concatenate.

Proposed fix
+    if len(dfs) == 0:
+        return pd.DataFrame()
     if len(dfs) == 1:
         df = dfs[0]
     else:
         for i in range(len(dfs)):
             df = dfs[i]
             df["LogStep"] = np.ones(len(df)) * i
         df = pd.concat(dfs, ignore_index=True)
     return df
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if len(dfs) == 1:
df = dfs[0]
else:
for i in range(len(dfs)):
df = dfs[i]
df["LogStep"] = np.ones(len(df)) * i
df = pd.concat(dfs, ignore_index=True)
if len(dfs) == 0:
return pd.DataFrame()
if len(dfs) == 1:
df = dfs[0]
else:
for i in range(len(dfs)):
df = dfs[i]
df["LogStep"] = np.ones(len(df)) * i
df = pd.concat(dfs, ignore_index=True)
return df
🤖 Prompt for AI Agents
In @src/pyiron_lammps/output_raw.py around lines 244 - 250, The code assumes dfs
is non-empty and will call pd.concat(dfs) which crashes when dfs == []; before
the existing length check in the block handling dfs, add an early branch that
handles an empty list (e.g., set df = pd.DataFrame() or return an empty
DataFrame with the expected columns) so pd.concat is never called on an empty
sequence; keep the existing logic for len(dfs) == 1 and the loop that assigns
df["LogStep"] for multiple frames, but ensure you check "if not dfs:" first and
handle it appropriately.

* feat(tests): Add comprehensive unit tests for output_raw.py

This commit introduces a comprehensive suite of unit tests for the `output_raw.py` module, achieving 92% test coverage.

The following changes are included:
- Added tests for the `to_amat` function, covering various input formats and error handling.
- Added tests for the `parse_raw_dump_from_text` function, using various test files to cover different parsing paths, including jagged dumps and mean fields.
- Added tests for the `parse_raw_lammps_log` function, covering multiple thermo outputs, logs with warnings, and logs without a pressure column.
- Refactored the parsing logic in `parse_raw_dump_from_text` and `parse_raw_lammps_log` to be more robust and handle multi-snapshot files correctly.
- Added a `.gitignore` file to exclude build artifacts and local configuration files from the repository.

The tests for `parse_raw_dump_from_h5md` were not implemented due to the lack of suitable multi-snapshot H5MD test data.

* Format black

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Delete .gitignore

* Update output_raw.py

* remove generated tests

---------

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: pyiron-runner <pyiron@mpie.de>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jan Janssen <jan-janssen@users.noreply.github.com>
Co-authored-by: Jan Janßen <janssen@mpie.de>
@jan-janssen jan-janssen merged commit d59ba90 into main Jan 8, 2026
20 of 21 checks passed
@jan-janssen jan-janssen deleted the split_output_parsers branch January 8, 2026 16:36
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @tests/test_output_raw.py:
- Around line 46-54: In the test method
test_parse_raw_dump_from_text_mean_fields remove the duplicate assertion that
repeats len(data["mean_unwrapped_positions"]) == 1 (the second occurrence);
leave the single correct assertion and keep the remaining assertions (steps,
mean_forces, mean_velocities, computes) intact so the test only asserts
mean_unwrapped_positions once.
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ce3abac and 559f623.

📒 Files selected for processing (1)
  • tests/test_output_raw.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: unittest_matrix (windows-latest, 3.14)
  • GitHub Check: unittest_matrix (macos-latest, 3.14)
🔇 Additional comments (6)
tests/test_output_raw.py (6)

1-7: LGTM!

The imports are clean and appropriate for testing the new output_raw module.


19-21: LGTM!

Proper error handling test for invalid input.


35-39: Good test coverage for multiple thermo blocks. The assertions verify both the DataFrame structure and the presence of distinct steps, with the necessary test fixture available.


41-44: Good test for conditional column handling.

Correctly verifies that optional columns like "Press" are omitted when absent from the log file. The fixture file is properly set up.


23-33: The test already includes value assertions; review comment mischaracterizes the test implementation.

The test at line 26 contains self.assertEqual(data["natoms"], [1, 2]), which is a direct value assertion—not a length-only check. The review comment incorrectly states the test "only verifies lengths and key presence." While the suggestion to add more comprehensive value assertions (e.g., actual step numbers, cell dimensions, array element values) may still be reasonable for robustness, the problem diagnosis is inaccurate.

Likely an incorrect or invalid review comment.


11-17: No action needed. The test cases correctly validate the to_amat function's transformation of LAMMPS box bounds to cell matrices. The inputs [0, 1, 0, 0, 1, 0, 0, 1, 0] and [0, 1, 0, 1, 0, 1] represent unit cubic boxes with zero tilt, which correctly produce the identity matrix output.

Likely an incorrect or invalid review comment.

Comment thread tests/test_output_raw.py
Comment on lines +46 to +54
def test_parse_raw_dump_from_text_mean_fields(self):
data = parse_raw_dump_from_text("tests/static/mean_dump/dump.out")
self.assertEqual(len(data["steps"]), 1)
self.assertEqual(len(data["mean_forces"]), 1)
self.assertEqual(len(data["mean_velocities"]), 1)
self.assertEqual(len(data["mean_unwrapped_positions"]), 1)
self.assertIn("test", data["computes"])
self.assertEqual(len(data["mean_unwrapped_positions"]), 1)
self.assertEqual(len(data["computes"]["test"]), 1)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, find and examine the test file
fd --type f "test_output_raw.py"

Repository: pyiron/pyiron_lammps

Length of output: 184


🏁 Script executed:

# Check the structure of the test file around lines 46-54
fd --type f "test_output_raw.py" -x cat -n {} | sed -n '40,60p'

Repository: pyiron/pyiron_lammps

Length of output: 912


🏁 Script executed:

# Check if the test fixture exists
fd --type f "dump.out" tests/static/mean_dump/

Repository: pyiron/pyiron_lammps

Length of output: 95


🏁 Script executed:

# Find the parse_raw_dump_from_text function to understand the data structure
rg "def parse_raw_dump_from_text" -A 30

Repository: pyiron/pyiron_lammps

Length of output: 2095


🏁 Script executed:

# Find and examine the DumpData class to understand all available fields
rg "class DumpData" -A 40 src/pyiron_lammps/

Repository: pyiron/pyiron_lammps

Length of output: 2929


🏁 Script executed:

# Look at the return statement of parse_raw_dump_from_text to see what dictionary keys are returned
rg "def parse_raw_dump_from_text" -A 100 src/pyiron_lammps/output_raw.py | grep -A 100 "return"

Repository: pyiron/pyiron_lammps

Length of output: 46


🏁 Script executed:

# Check the actual test fixture content to understand the data structure
head -100 tests/static/mean_dump/dump.out

Repository: pyiron/pyiron_lammps

Length of output: 437


🏁 Script executed:

# Check if there's a mean_positions field or how positions are stored
rg "mean_positions|positions" src/pyiron_lammps/output_raw.py | head -20

Repository: pyiron/pyiron_lammps

Length of output: 978


🏁 Script executed:

# Get the full DumpData class definition
rg "class DumpData" -A 50 src/pyiron_lammps/output_raw.py

Repository: pyiron/pyiron_lammps

Length of output: 1947


Remove duplicate assertion on line 53.

Line 53 duplicates the assertion from line 51, checking len(data["mean_unwrapped_positions"]) == 1 twice. This is a copy-paste error and should be removed.

🔧 Proposed fix
     self.assertEqual(len(data["steps"]), 1)
     self.assertEqual(len(data["mean_forces"]), 1)
     self.assertEqual(len(data["mean_velocities"]), 1)
     self.assertEqual(len(data["mean_unwrapped_positions"]), 1)
     self.assertIn("test", data["computes"])
-    self.assertEqual(len(data["mean_unwrapped_positions"]), 1)
     self.assertEqual(len(data["computes"]["test"]), 1)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def test_parse_raw_dump_from_text_mean_fields(self):
data = parse_raw_dump_from_text("tests/static/mean_dump/dump.out")
self.assertEqual(len(data["steps"]), 1)
self.assertEqual(len(data["mean_forces"]), 1)
self.assertEqual(len(data["mean_velocities"]), 1)
self.assertEqual(len(data["mean_unwrapped_positions"]), 1)
self.assertIn("test", data["computes"])
self.assertEqual(len(data["mean_unwrapped_positions"]), 1)
self.assertEqual(len(data["computes"]["test"]), 1)
def test_parse_raw_dump_from_text_mean_fields(self):
data = parse_raw_dump_from_text("tests/static/mean_dump/dump.out")
self.assertEqual(len(data["steps"]), 1)
self.assertEqual(len(data["mean_forces"]), 1)
self.assertEqual(len(data["mean_velocities"]), 1)
self.assertEqual(len(data["mean_unwrapped_positions"]), 1)
self.assertIn("test", data["computes"])
self.assertEqual(len(data["computes"]["test"]), 1)
🤖 Prompt for AI Agents
In @tests/test_output_raw.py around lines 46 - 54, In the test method
test_parse_raw_dump_from_text_mean_fields remove the duplicate assertion that
repeats len(data["mean_unwrapped_positions"]) == 1 (the second occurrence);
leave the single correct assertion and keep the remaining assertions (steps,
mean_forces, mean_velocities, computes) intact so the test only asserts
mean_unwrapped_positions once.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant