Fix sporadic parsing errors with binary ScalarList #762

Merged
gerlero merged 3 commits into main from copilot/fix-binary-scalarlist-parsing
Mar 2, 2026

Copilot AI commented Feb 24, 2026

Binary-format scalarList files would sporadically raise FoamFileDecodeError, UnicodeDecodeError, or ValueError depending on the specific float values written — because the parser tried to interpret raw binary bytes as ASCII text before falling back to the binary parser.

Root causes

  • _parse_token fatal errors escaping suppression: When binary data begins with an ASCII letter followed by an opening parenthesis (e.g. "H("), _parse_token's depth-tracking logic scanned through the binary payload and raised FoamFileDecodeError (not a ParseError subclass), bypassing all contextlib.suppress(ParseError) guards. The same happened with UnicodeDecodeError when a binary byte equal to the double-quote character (") triggered the quoted-string path.

  • _ASCIINumericListParser decode/reshape errors: The comment-aware regex (_pattern, no re.ASCII flag) could match binary data containing non-ASCII bytes, causing UnicodeDecodeError from data.decode("ascii"). For vector lists, a mismatched element count caused ValueError from reshape().

  • int32 binary parser masking float64: Binary parsers were chained: int32 was tried first, float64 only on failure. The byte 0x29 (the ASCII closing parenthesis) can appear at offset count × 4 within float64 data, making the int32 parser falsely succeed, so float64 was never attempted.
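
The third root cause can be reproduced in isolation. The following is a minimal sketch, assuming the binary list layout count(<raw bytes>) with little-endian values; looks_like_end is a hypothetical helper written for illustration, not a foamlib function:

```python
import struct

# A float64 slightly above 1.0 whose least-significant byte is 0x29 (")").
tricky = struct.unpack("<d", bytes([0x29, 0, 0, 0, 0, 0, 0xF0, 0x3F]))[0]

count = 2
payload = struct.pack("<2d", 1.0, tricky)  # 16 bytes of float64 data
data = payload + b")"                      # closing parenthesis of the list


def looks_like_end(data: bytes, count: int, itemsize: int) -> bool:
    """Hypothetical check: is there a ")" where `count` items of `itemsize` bytes would end?"""
    end = count * itemsize
    return data[end:end + 1] == b")"


# Byte 8 of the payload is 0x29, so the int32 interpretation is a false match.
int32_matches = looks_like_end(data, count, 4)
# Byte 16 is the real closing parenthesis.
float64_matches = looks_like_end(data, count, 8)
```

Both checks pass on this input, which is why a chain that stops at the first success would return the int32 interpretation. Whether a given file hits this depends only on the float values written, which matches the sporadic behavior reported.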

Changes

  • _parse_token: Raise ParseError instead of FoamFileDecodeError for unmatched parens and unclosed quoted/#{} tokens; wrap all decode() calls to convert UnicodeDecodeError → ParseError.

  • _ASCIINumericListParser.__call__ and _parse_ascii_faces_like_list: Wrap data.decode("ascii") and ret.reshape() to convert UnicodeDecodeError/ValueError → ParseError.

  • _parse_binary_numeric_list: Wrap ret.reshape() to convert ValueError → ParseError.

  • _parse_standalone_data_entry: Try all three binary dtypes (int32, float64, float64×3) independently rather than in a chain, always keeping the result with the highest consumed position. Float64 will always win over int32 for the same count because it consumes twice the bytes. Also catch FoamFileDecodeError/UnicodeDecodeError from _parse_data.
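
The selection strategy in the last item can be sketched as follows. This is an illustration under stated assumptions, not the code merged in this PR: ParseError here is a local stand-in class, parse_binary_list and parse_best are hypothetical names, and values are assumed to be little-endian.

```python
import struct


class ParseError(Exception):
    """Local stand-in for the parser's recoverable error type."""


def parse_binary_list(data: bytes, pos: int, code: str, width: int = 1):
    """Parse `count(<raw bytes>)` at `pos`; return (values, new_pos) or raise ParseError."""
    open_paren = data.find(b"(", pos)
    if open_paren == -1:
        raise ParseError("no opening parenthesis")
    try:
        count = int(data[pos:open_paren].decode("ascii"))
    except (UnicodeDecodeError, ValueError) as e:
        # Convert low-level errors so callers only need to handle ParseError.
        raise ParseError("bad element count") from e
    if count < 0:
        raise ParseError("negative element count")
    n = count * width
    start = open_paren + 1
    end = start + n * struct.calcsize("<" + code)
    if data[end:end + 1] != b")":
        raise ParseError("no closing parenthesis at expected offset")
    try:
        flat = struct.unpack(f"<{n}{code}", data[start:end])
    except struct.error as e:
        raise ParseError("payload does not match dtype") from e
    if width > 1:
        values = [flat[i:i + width] for i in range(0, n, width)]
    else:
        values = list(flat)
    return values, end + 1


def parse_best(data: bytes, pos: int = 0):
    """Try every candidate independently and keep the one that consumes the most bytes."""
    best = None
    for code, width in [("i", 1), ("d", 1), ("d", 3)]:  # int32, float64, float64x3
        try:
            values, new_pos = parse_binary_list(data, pos, code, width)
        except ParseError:
            continue
        if best is None or new_pos > best[1]:
            best = (values, new_pos)
    if best is None:
        raise ParseError("no binary candidate matched")
    return best


# A float64 whose first byte is 0x29 (")"), placed so the int32 candidate also matches.
tricky = struct.unpack("<d", bytes([0x29, 0, 0, 0, 0, 0, 0xF0, 0x3F]))[0]
data = b"2(" + struct.pack("<2d", 1.0, tricky) + b")"
values, new_pos = parse_best(data)
```

On this input the int32 candidate also succeeds, but it stops after 11 bytes, while the float64 candidate consumes the whole buffer and is therefore kept.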

Original prompt

This section details on the original issue you should resolve

<issue_title>Parsing error with binary ScalarList</issue_title>
<issue_description>I experienced a strange parsing error when trying to read a scalarList with the FoamFile class when in binary format, in the form:

[...]
  File "/home/framognino/anaconda3/envs/TabICE/lib/python3.11/site-packages/foamlib/_files/_parsing/_parser.py", line 1143, in parse_located
    raise e.make_fatal() from None
foamlib._files._parsing.exceptions.foamlib.FoamFileDecodeError: parsing failed on line 83, column 264:
���d�?��$I]��?�4��J�?��s���?����)�?|l�7��?�}r���?�������?Y�K�j,�?4������?�� M���?������?��1D�?to�m�?@b��g��?BP�t�?i!���e�?���>?���,�?����B�?a�o���?�8�����?�%����?��e��?�r�����?�Vsw���?�x����?5�Pa�   �?k�T_s0�?��N�%W�?�uA����?Vmn����?,�b���?)`K�U6�?J<'��4�?+�X-�?V���-�?�$�mZ��?���,���?��>�vz�?3��NA�?E��        �?�J�+\��|4��?
                                                                                                                                                      c�?���?�,�?��72���?$�&����?�l�?�R��9�?nSm�'??C�?�bB_���?d~�9���?��wdX~�?ak�&�H�?^bN�>?s�}:���?�Y@7f��?W�|7%s�?���>?��T�3       �?�2a����?`�s�`��?"<(
                                                                                                                                                                                                                                                                       ^
Expected: keyword or standalone data

This error is really sporadic. I can tell that it is not caused by floating-point special cases (I had datasets with nan and inf, and they worked fine). I tried storing the data in the FoamFile as a list and as numpy arrays before saving, but still got the same problem.

Thank you a lot in advance and let me know if you need further info.

How to reproduce

Here are the functions that I used to write the scalar list, and the one that I am using to read it.

import os
from typing import Iterable
from foamlib import FoamFile
import numpy as np
import traceback
import struct

# Here I define some convenience functions to I/O scalar fields with the foamlib FoamFile class
def readScalarList(fileName:str) -> Iterable[float]:
    """
    Reads an OpenFOAM file storing a scalarList. Automatically detects if the file is binary or not.

    Args:
        fileName (str): Name of the OpenFOAM file.
    
    Raises:
        IOError: If the file does not exist or if it does not store a scalarList
    
    Returns:
        Iterable[float]: The data stored in the file.
    """
    #Check path:
    if not(os.path.isfile(fileName)):
        raise IOError("File '{}' not found.".format(fileName))
    
    with FoamFile(fileName) as f:
        if f.class_ != "scalarList":
            raise IOError("File '{}' does not store a scalarList.".format(fileName))
        
        data = f[None]
        
        # Check if the data were correctly read as float, otherwise cast them
        if isinstance(data, np.ndarray) and data.dtype == np.int64:
            data.dtype = np.float64
        
        return data

#Write OF file with scalar list
def writeScalarList(values:Iterable[float], path:str, *, overwrite:bool=False, binary:bool=False) -> None:
    """
    Write an OpenFOAM file storing a scalarList.

    Args:
        values (Iterable[float]): The data to store.
        path (str): The location where to file the scalarList.
        overwrite (bool, optional): Overwrite if found? Defaults to False.
        binary (bool, optional): Write in binary? Defaults to False.
    
    Raises:
        IOError: If the file exists and overwrite is False.
    """
    #Check path:
    if os.path.isfile(path) and not overwrite:
        raise IOError("File '{}' exists. Run with overwrite=True.".format(path))
    
    if os.path.exists(path):
        os.remove(path)

    #Create the file object
    with FoamFile(path) as File:
        File.path.touch()
        root, file = os.path.split(path)
        # Use add because this prevents trying to parse the file, which would lead to an error since it is empty
        File.add("FoamFile", {
            "class":"scalarList",
            "version":2.0,
            "object":file,
            "location":os.path.split(os.path.abspath(root))[1],
            "format": "binary" if binary else "ascii"
        })
        File.format = "binary" if binary else "ascii"
        File.add(None,np.array(values).flatten().tolist())

#---------------------------------------------------
# Testing with foamlib
print("Testing with foamlib")

# Loading ascii file 
data = readScalarList("data_ascii")

# Writing in binary format
writeScalarList(data, "data_binary", overwrite=True, binary=True)

# Read again in binary
try:
    data_binary = readScalarList("data_binary")
    print("Done!")
except BaseException as e:
    print("Error!")
    traceback.print_exc()

Output:

Testing with foamlib
Error!
Traceback (most recent ...

- Fixes gerlero/foamlib#760


Multiple root causes:
1. _parse_token raised FoamFileDecodeError (not ParseError) when binary data
   started with ASCII letter + '(' (depth tracking scanned through binary data)
2. _parse_token raised UnicodeDecodeError for quoted strings with non-ASCII bytes
3. _ASCIINumericListParser raised UnicodeDecodeError from decode('ascii') when
   comment-pattern regex matched binary data
4. _ASCIINumericListParser/_parse_binary_numeric_list raised ValueError from
   reshape when element count wasn't divisible by 3
5. Binary parsers in _parse_standalone_data_entry were chained (int32 first,
   float64 only if int32 failed), so int32 could win accidentally when ')' byte
   appeared at position count*4 in the float64 data

Fixes:
- _parse_token: raise ParseError instead of FoamFileDecodeError/UnicodeDecodeError
- _ASCIINumericListParser: handle UnicodeDecodeError and ValueError as ParseError
- _parse_binary_numeric_list: handle ValueError from reshape as ParseError
- _parse_standalone_data_entry: try all binary types independently, pick the
  one that advances furthest; catch FoamFileDecodeError/UnicodeDecodeError from
  _parse_data

Add regression tests in test_issue_binary_scalar_list.py

Co-authored-by: gerlero <15150530+gerlero@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix parsing error with binary ScalarList in FoamFile class" to "Fix sporadic parsing errors with binary ScalarList" Feb 24, 2026
@gerlero gerlero marked this pull request as ready for review March 2, 2026 13:52
@gerlero gerlero merged commit 48b48da into main Mar 2, 2026
34 of 35 checks passed
@gerlero gerlero deleted the copilot/fix-binary-scalarlist-parsing branch March 2, 2026 13:55