Skip to content

Add test for output#196

Merged
jan-janssen merged 10 commits intomainfrom
output
Feb 24, 2025
Merged

Add test for output#196
jan-janssen merged 10 commits intomainfrom
output

Conversation

@jan-janssen
Copy link
Copy Markdown
Member

@jan-janssen jan-janssen commented Feb 23, 2025

Summary by CodeRabbit

  • New Features

    • Introduced a new function for remapping Lammps-dumped indices to structure indices, enhancing flexibility in data parsing.
    • Added a utility function for converting lists or arrays into a specific format based on their length.
  • Refactor

    • Improved data parsing by updating the whitespace regular expression handling for better separation of CSV data.
    • Enhanced precision in floating-point number representation in specific methods.
  • Tests

    • Added new unit tests to verify the functionality and correctness of the index remapping and data parsing under various conditions.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Feb 23, 2025

Warning

Rate limit exceeded

@jan-janssen has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 13 minutes and 35 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 3173f77 and cba3629.

📒 Files selected for processing (1)
  • .github/workflows/pipeline.yml (1 hunks)

Walkthrough

The changes introduce a new function remap_indices_ase for remapping Lammps-dumped indices to structure indices in the pyiron_lammps/output.py file. The parse_lammps_output and _parse_dump functions have been updated to accept a customizable remapping function. A new utility function to_amat has been added for formatting arrays. Additionally, the separator in CSV reading has been modified. A new test file tests/test_output.py has been created to validate the functionality of the new and modified functions.

Changes

File(s) Change Summary
pyiron_lammps/output.py Added remap_indices_ase for index remapping; updated parse_lammps_output, _parse_dump, and _collect_dump_from_text to accept remap_indices_funct; added to_amat for array formatting; changed regex separator in CSV reading from "\s+" to "\\s+".
tests/test_output.py Introduced TestLammpsOutput class with test_remap_indices and test_dump_chemical methods to validate remap_indices_ase and _parse_dump functionality.
pyiron_lammps/structure.py Modified f2qdec and f2s methods in UnfoldingPrism class to use str(f) for converting input to Decimal.

Possibly related PRs

  • Migrate to file based interface #193: The changes in the main PR are related to the modifications made in the parse_lammps_output and remap_indices_ase functions, which are tested in the retrieved PR's TestLammpsOutput class.

🐇 In the code where indices roam,
A new function found its home.
With tests to check each little part,
We ensure that it plays its part.
So let us hop and celebrate,
For our code is now first-rate! 🎉


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@codecov
Copy link
Copy Markdown

codecov bot commented Feb 23, 2025

Codecov Report

Attention: Patch coverage is 86.95652% with 3 lines in your changes missing coverage. Please review.

Project coverage is 44.45%. Comparing base (0485da1) to head (cba3629).
Report is 11 commits behind head on main.

Files with missing lines Patch % Lines
pyiron_lammps/output.py 90.47% 2 Missing ⚠️
pyiron_lammps/structure.py 50.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #196      +/-   ##
==========================================
- Coverage   47.08%   44.45%   -2.63%     
==========================================
  Files           4        5       +1     
  Lines         720      893     +173     
==========================================
+ Hits          339      397      +58     
- Misses        381      496     +115     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
tests/test_output.py (3)

8-44: Consider using a more concise way to set chemical symbols.

The current implementation manually lists each chemical symbol, which is verbose and error-prone.

Consider using list multiplication and concatenation for a more concise implementation:

-        structure.set_chemical_symbols(
-            [
-                "Ag",
-                "Ag",
-                "Ag",
-                "Ag",
-                "Au",
-                "Au",
-                "Au",
-                "Au",
-                "Cu",
-                "Cu",
-                "Cu",
-                "Cu",
-                "Ag",
-                "Ag",
-                "Ag",
-                "Ag",
-                "Ag",
-                "Ag",
-                "Ag",
-                "Ag",
-                "Au",
-                "Au",
-                "Au",
-                "Au",
-                "Cu",
-                "Cu",
-                "Cu",
-                "Cu",
-                "Ag",
-                "Ag",
-                "Ag",
-                "Ag",
-            ]
-        )
+        structure.set_chemical_symbols(
+            ["Ag"] * 4 + ["Au"] * 4 + ["Cu"] * 4 +
+            ["Ag"] * 8 + ["Au"] * 4 + ["Cu"] * 4 +
+            ["Ag"] * 4
+        )

45-83: Consider using a more concise way to define lammps_indices.

Similar to the chemical symbols, the indices can be defined more concisely.

Consider using list multiplication and concatenation:

-            lammps_indices=[
-                1,
-                1,
-                1,
-                1,
-                5,
-                5,
-                5,
-                5,
-                3,
-                3,
-                3,
-                3,
-                1,
-                1,
-                1,
-                1,
-                1,
-                1,
-                1,
-                1,
-                5,
-                5,
-                5,
-                5,
-                3,
-                3,
-                3,
-                3,
-                1,
-                1,
-                1,
-                1,
-            ],
+            lammps_indices=[1] * 4 + [5] * 4 + [3] * 4 +
+                          [1] * 8 + [5] * 4 + [3] * 4 +
+                          [1] * 4,

84-95: Add docstring to explain test cases.

The test method lacks documentation explaining the purpose of each test case and the expected outcomes.

Consider adding a docstring:

     def test_remap_indices(self):
+        """Test remap_indices function with different input conditions.
+        
+        Test cases:
+        1. Full structure with all elements - expect sum of indices to be 24
+        2. Partial structure with 4 elements - expect sum of indices to be 8
+        3. Different element order - expect sum of indices to be 0
+        """
         structure = bulk("Ag", cubic=True).repeat([2, 2, 2])
pyiron_lammps/output.py (1)

529-529: Consider implementing the TODO for vectorization.

The TODO comment suggests vectorizing the for-loop for better performance.

I can help implement a vectorized version of this loop. Would you like me to propose a solution or create an issue to track this improvement?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0485da1 and d80ebe2.

📒 Files selected for processing (2)
  • pyiron_lammps/output.py (3 hunks)
  • tests/test_output.py (1 hunks)
🔇 Additional comments (3)
tests/test_output.py (1)

1-6: LGTM!

The imports are appropriate and the test class is well-structured.

pyiron_lammps/output.py (2)

518-518: LGTM! Performance improvement in remap_indices.

The use of np.unique to get unique chemical symbols is more efficient than the previous implementation.


185-185:

Details

❓ Verification inconclusive

Verify the separator change impact.

The separator pattern has been changed from "\s+" to "\\s+". While both patterns are valid in Python, they might behave differently in pandas' C engine.

Let's verify if this change affects the parsing of the dump file:


🏁 Script executed:

#!/bin/bash
# Description: Check if the separator change affects file parsing
# Create a sample dump file and test both separators

# Create a test file with various whitespace patterns
echo "1    2\t\t3   4" > test_dump.txt
echo "5\t6  7    8" >> test_dump.txt

# Test both separators
python3 << EOF
import pandas as pd
from io import StringIO

with open('test_dump.txt', 'r') as f:
    content = f.read()

# Test old separator
df1 = pd.read_csv(StringIO(content), sep="\s+", header=None, engine="c")
print("Old separator result:", df1.values.tolist())

# Test new separator
df2 = pd.read_csv(StringIO(content), sep="\\s+", header=None, engine="c")
print("New separator result:", df2.values.tolist())

# Compare results
print("Results match:", (df1 == df2).all().all())
EOF

Length of output: 175


Action required: Re-run verification in an environment with pandas installed.

The test script encountered a ModuleNotFoundError for the pandas module, so we couldn't determine if changing the separator from "\s+" to "\\s+" impacts the parsing behavior. Please ensure that pandas is available (e.g., by installing it locally or in your CI environment) and re-run the test script manually to verify that both separators yield equivalent parsing results.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
tests/test_output.py (1)

13-102: Improve test maintainability and clarity.

While the test is thorough, consider these improvements:

  1. Extract test data into constants or fixtures to improve maintainability.
  2. Add more descriptive assertion messages to explain the expected outcomes.

Example refactor:

+    # Test data
+    CHEMICAL_SYMBOLS = ["Ag"] * 4 + ["Au"] * 4 + ["Cu"] * 4 + ["Ag"] * 8 + ["Au"] * 4 + ["Cu"] * 4 + ["Ag"] * 4
+    LAMMPS_INDICES_1 = [1] * 4 + [5] * 4 + [3] * 4 + [1] * 8 + [5] * 4 + [3] * 4 + [1] * 4
+    POTENTIAL_ELEMENTS_1 = ["Ag", "Al", "Cu", "Co", "Au"]

     def test_remap_indices(self):
         structure = bulk("Ag", cubic=True).repeat([2, 2, 2])
-        structure.set_chemical_symbols([...])  # Long list of symbols
+        structure.set_chemical_symbols(self.CHEMICAL_SYMBOLS)
         
         ind = remap_indices(
-            lammps_indices=[...],  # Long list of indices
+            lammps_indices=self.LAMMPS_INDICES_1,
-            potential_elements=["Ag", "Al", "Cu", "Co", "Au"],
+            potential_elements=self.POTENTIAL_ELEMENTS_1,
             structure=structure,
         )
-        self.assertEqual(sum(ind), 24)
+        self.assertEqual(sum(ind), 24, "Expected sum of indices to be 24 for Ag atoms")
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d80ebe2 and f8935ea.

⛔ Files ignored due to path filters (3)
  • tests/static/dump_chemical/dump_AlH.out is excluded by !**/*.out
  • tests/static/dump_chemical/dump_NiAlH.out is excluded by !**/*.out
  • tests/static/dump_chemical/dump_NiH.out is excluded by !**/*.out
📒 Files selected for processing (1)
  • tests/test_output.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
tests/test_output.py

111-111: Ambiguous variable name: l

(E741)

🪛 GitHub Actions: Pipeline
tests/test_output.py

[error] 119-119: ERROR: test_dump_chemical failed due to decimal.InvalidOperation: [<class 'decimal.ConversionSyntax'>].

🔇 Additional comments (2)
tests/test_output.py (2)

1-12: LGTM!

The imports are well-organized and the class setup follows unittest conventions.


116-122:

Details

❓ Verification inconclusive

Investigate decimal conversion error in test.

The pipeline is failing with a decimal conversion error. This might be due to how the numeric data is being handled in the UnfoldingPrism class or the dump file parsing.

Let's verify the contents of the dump files:


🏁 Script executed:

#!/bin/bash
# Description: Check the format of numeric values in dump files

# Test: Check for any non-standard numeric formats in dump files
echo "Checking dump_NiH.out:"
cat tests/static/dump_chemical/dump_NiH.out

echo -e "\nChecking dump_AlH.out:"
cat tests/static/dump_chemical/dump_AlH.out

echo -e "\nChecking dump_NiAlH.out:"
cat tests/static/dump_chemical/dump_NiAlH.out

Length of output: 3413


Decimal Conversion Error: Revisit Numeric Parsing and Empty Input Handling

The dump files display standard scientific notation and appear valid, so the conversion issue isn’t due to the file formats. Please double-check the following:

  • Confirm if the empty string passed as dump_h5_full_file_name should trigger alternative logic rather than being processed like a valid numeric value.
  • Verify that the conversion routines in both _parse_dump and the UnfoldingPrism class correctly handle all components of the dump (e.g., ignoring labels like "pp" in the BOX BOUNDS line) and don’t inadvertently attempt to convert non-numeric strings.
  • Review unit tests around numerical conversions to ensure robust handling of empty/missing fields and proper parsing of scientific notation values.
🧰 Tools
🪛 GitHub Actions: Pipeline

[error] 119-119: ERROR: test_dump_chemical failed due to decimal.InvalidOperation: [<class 'decimal.ConversionSyntax'>].

Comment thread tests/test_output.py
def test_dump_chemical(self):
test_folder = os.path.join(self.static_folder, "dump_chemical")
structure_ni = bulk("Ni", cubic=True)
structure_al = bulk("Ni", cubic=True)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix incorrect structure initialization.

The structure for aluminum is incorrectly using "Ni" as the base structure.

Apply this diff:

-        structure_al = bulk("Ni", cubic=True)
+        structure_al = bulk("Al", cubic=True)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
structure_al = bulk("Ni", cubic=True)
structure_al = bulk("Al", cubic=True)

Comment thread tests/test_output.py
Comment on lines +111 to +115
for l, s, ind in zip(
["dump_NiH.out", "dump_AlH.out", "dump_NiAlH.out"],
[structure_ni, structure_al, structure_all],
[np.array([0, 1, 1, 1]), np.array([1, 0, 0, 0]), np.array([0, 0, 1, 2])],
):
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Rename ambiguous variable for better readability.

The variable name l is ambiguous and should be more descriptive.

Apply this diff:

-        for l, s, ind in zip(
+        for dump_file, s, ind in zip(
             ["dump_NiH.out", "dump_AlH.out", "dump_NiAlH.out"],
             [structure_ni, structure_al, structure_all],
             [np.array([0, 1, 1, 1]), np.array([1, 0, 0, 0]), np.array([0, 0, 1, 2])],
         ):
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for l, s, ind in zip(
["dump_NiH.out", "dump_AlH.out", "dump_NiAlH.out"],
[structure_ni, structure_al, structure_all],
[np.array([0, 1, 1, 1]), np.array([1, 0, 0, 0]), np.array([0, 0, 1, 2])],
):
for dump_file, s, ind in zip(
["dump_NiH.out", "dump_AlH.out", "dump_NiAlH.out"],
[structure_ni, structure_al, structure_all],
[np.array([0, 1, 1, 1]), np.array([1, 0, 0, 0]), np.array([0, 0, 1, 2])],
):
🧰 Tools
🪛 Ruff (0.8.2)

111-111: Ambiguous variable name: l

(E741)

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
pyiron_lammps/output.py (1)

34-71: Consider vectorizing the index remapping loop.

The for-loop at line 67 could be vectorized for better performance.

Here's a vectorized implementation:

-    for i_struct, i_lammps in enumerate(map_):
-        np.place(structure_indices, lammps_indices == i_lammps, i_struct)
+    # Create a mapping array initialized with zeros
+    mapping = np.zeros_like(lammps_indices)
+    # Use boolean indexing to update all matching indices at once
+    for i_struct, i_lammps in enumerate(map_):
+        mapping[lammps_indices == i_lammps] = i_struct
+    structure_indices = mapping
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 87da7ba and 3173f77.

📒 Files selected for processing (2)
  • pyiron_lammps/output.py (8 hunks)
  • tests/test_output.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
tests/test_output.py

111-111: Ambiguous variable name: l

(E741)

🪛 GitHub Check: codecov/patch
pyiron_lammps/output.py

[warning] 142-142: pyiron_lammps/output.py#L142
Added line #L142 was not covered by tests


[warning] 158-158: pyiron_lammps/output.py#L158
Added line #L158 was not covered by tests

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: unittest_matrix (windows-latest, 3.12)
  • GitHub Check: unittest_matrix (macos-latest, 3.12)
🔇 Additional comments (4)
tests/test_output.py (3)

13-101: LGTM! Comprehensive test cases.

The test cases thoroughly validate the remap_indices_ase function with different input combinations, ensuring correct remapping behavior.


106-106: Fix incorrect structure initialization.

The structure for aluminum is incorrectly using "Ni" as the base structure.


111-115: Rename ambiguous variable for better readability.

The variable name l is ambiguous and should be more descriptive.

🧰 Tools
🪛 Ruff (0.8.2)

111-111: Ambiguous variable name: l

(E741)

pyiron_lammps/output.py (1)

269-269: LGTM! Proper regex escaping.

The change correctly escapes the whitespace pattern in the separator.

Comment thread pyiron_lammps/output.py
Comment on lines +139 to +176
def to_amat(l_list: Union[np.ndarray, List]) -> List:
lst = np.reshape(l_list, -1)
if len(lst) == 9:
(
xlo_bound,
xhi_bound,
xy,
ylo_bound,
yhi_bound,
xz,
zlo_bound,
zhi_bound,
yz,
) = lst

elif len(lst) == 6:
xlo_bound, xhi_bound, ylo_bound, yhi_bound, zlo_bound, zhi_bound = lst
xy, xz, yz = 0.0, 0.0, 0.0
else:
raise ValueError("This format for amat not yet implemented: " + str(len(lst)))

# > xhi_bound - xlo_bound = xhi -xlo + MAX(0.0, xy, xz, xy + xz) - MIN(0.0, xy, xz, xy + xz)
# > xhili = xhi -xlo = xhi_bound - xlo_bound - MAX(0.0, xy, xz, xy + xz) + MIN(0.0, xy, xz, xy + xz)
xhilo = (
(xhi_bound - xlo_bound)
- max([0.0, xy, xz, xy + xz])
+ min([0.0, xy, xz, xy + xz])
)

# > yhilo = yhi -ylo = yhi_bound -ylo_bound - MAX(0.0, yz) + MIN(0.0, yz)
yhilo = (yhi_bound - ylo_bound) - max([0.0, yz]) + min([0.0, yz])

# > zhi - zlo = zhi_bound- zlo_bound
zhilo = zhi_bound - zlo_bound

cell = [[xhilo, 0, 0], [xy, yhilo, 0], [xz, yz, zhilo]]
return cell

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Add test coverage for edge cases.

The function lacks test coverage for the 9-element case and error handling path. Consider adding test cases to cover:

  1. 9-element input with tilt factors
  2. Invalid input length error case

The following lines need test coverage:


🏁 Script executed:

#!/bin/bash
# Description: Check test coverage for to_amat function

# Show lines that need coverage
rg -A 5 "def to_amat"

Length of output: 332


Action: Add comprehensive test cases for the to_amat function.

It looks like the function doesn’t have tests covering both the 9-element and error-handling paths. Please add tests to ensure:

  • The 9-element input case (with tilt factors) computes the correct cell.
  • The 6-element input case (with zeros for tilt factors) works as expected.
  • An invalid input length triggers the appropriate ValueError.

Consider updating the test suite to include these scenarios.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 142-142: pyiron_lammps/output.py#L142
Added line #L142 was not covered by tests


[warning] 158-158: pyiron_lammps/output.py#L158
Added line #L158 was not covered by tests

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant