Fix: Read Inconsistent Zero Pads #1118

ax3l · 2021-10-07T01:54:03Z

Some codes mess up the zero-padding in fileBased encoding, e.g., when specifying padding to 5 digits but creating >100'000 output steps.

Files like those cannot yet be parsed using an openpmd_%T.h5 file open string. Reading falls back to no padding, which then fails to open the file:

openpmd_00000.h5
openpmd_02000.h5
openpmd_101000.h5
openpmd_01000.h5
openpmd_100000.h5
openpmd_104000.h5

Error:

RuntimeError: [HDF5] Failed to open HDF5 file diags/diag1/openpmd_0.h5
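
How the mixed widths and the failing file name come about can be sketched without any I/O. This is a minimal illustration, assuming a writer that zero-pads to a fixed minimum width; `iteration_filename` is a hypothetical helper and not part of openPMD-api.

```python
def iteration_filename(prefix, iteration, padding, postfix):
    """Build a file name the way a writer with a minimum padding would.

    Hypothetical helper for illustration only.
    """
    # zfill pads to *at least* `padding` digits; longer numbers are kept whole.
    return prefix + str(iteration).zfill(padding) + postfix


steps = [0, 1000, 2000, 100000, 101000, 104000]
names = [iteration_filename("openpmd_", s, 5, ".h5") for s in steps]

# Steps below 100000 get five digits, later steps get six.
widths = sorted({len(n) - len("openpmd_") - len(".h5") for n in names})

# Without any padding, iteration 0 maps to the file named in the error.
unpadded = iteration_filename("openpmd_", 0, 0, ".h5")
```

A reader that settles on a single padding for the whole series cannot reconstruct both the five-digit and the six-digit names from the iteration number alone.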

To Do

clean up
pass existing tests
add new test with above pattern

franzpoeschel · 2021-10-21T14:04:59Z

I've reverted your changes and pushed a suggested implementation that should fix this.
A quick test seems to confirm it:

In [1]: import openpmd_api as io                

In [2]: io.Series("data%T.json", io.Access.create)                                              
Out[2]: <openPMD.Attributable with '8' attributes>

In [3]: series = io.Series("data%T.json", io.Access.create)                                     
In [4]: series.iterations[3000]                 
Out[4]: <openPMD.Iteration of at t = '0.000000 s'>

In [5]: del series      

In [6]: series = io.Series("data%02T.json", io.Access.read_only)                                

In [7]: [key for key in series.iterations]      
Out[7]: [3000]

The trick was to slightly modify the regex that parses a directory's filenames.
For example, the regex for a padding of 5 is ^<prefix>(([1-9][[:digit:]]*)?([[:digit:]]{5}))<postfix>$

The part after the question mark: exactly five digits, which are the required minimum.
The part before the question mark: optionally, further digits, but only if the first digit is not 0.

As a consequence, this regex now accepts 00001, 50000, 550000 and 999999999999999999, but not 001, 1 or 000050000.
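
The accept/reject list above can be checked with a small Python translation of that regex. This is only a sketch: the actual implementation is C++, and `padded_iteration_regex` and `parse_iteration` are hypothetical names introduced here.

```python
import re


def padded_iteration_regex(prefix, padding, postfix):
    """Python translation of the padded-filename regex quoted above.

    Python's `re` has no POSIX classes, so [[:digit:]] is written as [0-9].
    """
    return re.compile(
        re.escape(prefix)
        + r"(([1-9][0-9]*)?([0-9]{%d}))" % padding
        + re.escape(postfix)
    )


def parse_iteration(regex, filename):
    """Return the iteration number, or None if the name does not match."""
    match = regex.fullmatch(filename)  # fullmatch stands in for ^...$
    return int(match.group(1)) if match else None


regex = padded_iteration_regex("openpmd_", 5, ".h5")

# Accepted: exactly five digits, or more digits without a leading zero.
for digits in ["00001", "50000", "550000", "999999999999999999"]:
    assert parse_iteration(regex, "openpmd_" + digits + ".h5") == int(digits)

# Rejected: too few digits, or overlong numbers that start with 0.
for digits in ["001", "1", "000050000"]:
    assert parse_iteration(regex, "openpmd_" + digits + ".h5") is None
```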

ax3l · 2021-10-25T19:02:54Z

Cool, I think that works well!

A more general issue we could also consider is that oddly named prefixes can cause problems, e.g., a prefix diag1 that then produces diag10004.bp. In user code, I generally add a separating underscore to avoid the ambiguity.

Should we maybe add a warning if the user-provided prefix before the %T ends in a digit?

ax3l · 2021-10-28T06:20:39Z

I pushed a test, but it looks like that one already passes before the patch, as long as we use %T.

Maybe that's all that is needed in openPMD-viewer then.

ax3l · 2021-10-28T16:39:01Z

Ah, I think the problem was not in Series::Series but when we actually read the data and do a series.flush() on a specific iteration. Let's refine the test.

franzpoeschel · 2021-11-01T09:04:14Z

> Should we maybe add a warning if the user-provided name does end in a number of the prefix before the %T?

Theoretically, we could start soft-enforcing the underscore by giving a warning if anything but an underscore is used?

A prefix diag1 should be technically unproblematic, since the 1 is a hardcoded part of the regex and will not be considered for the iteration number. But user-side confusion is very likely in such cases, yeah.

> I pushed a test, but it looks like that one already passes before the patch, as long as we use %T.

Yep, if using %T, the regex will just match for [[:digit:]]+, so any number of digits will be accepted. So, %T is a good catch-all for inconsistent paddings. This could well fix your use case, but I still think that going forward with this PR is necessary: %06T should be able to read all files that it produces.
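
The difference between the %T catch-all and a padded pattern can be sketched as follows. This assumes the two regex shapes discussed in this thread; `pattern_to_regex` and `list_iterations` are hypothetical helpers and do not mirror the code in src/Series.cpp.

```python
import re


def pattern_to_regex(file_pattern):
    """Translate 'prefix%T.ext' or 'prefix%0NT.ext' into a compiled regex.

    Hypothetical helper; the real parsing lives in openPMD-api's C++ code.
    """
    expansion = re.search(r"%(?:0([0-9]+))?T", file_pattern)
    if expansion is None:
        raise ValueError("no %T expansion in " + repr(file_pattern))
    prefix = file_pattern[: expansion.start()]
    postfix = file_pattern[expansion.end():]
    padding = int(expansion.group(1) or 0)
    if padding == 0:
        digits = r"([0-9]+)"  # %T: any number of digits
    else:
        # %0NT: at least N digits; extra digits must not start with 0
        digits = r"((?:[1-9][0-9]*)?[0-9]{%d})" % padding
    return re.compile(re.escape(prefix) + digits + re.escape(postfix))


def list_iterations(file_pattern, filenames):
    """Return the sorted iteration numbers of all matching file names."""
    regex = pattern_to_regex(file_pattern)
    matches = (regex.fullmatch(name) for name in filenames)
    return sorted(int(m.group(1)) for m in matches if m)


files = [
    "openpmd_00000.h5",
    "openpmd_02000.h5",
    "openpmd_101000.h5",
    "openpmd_01000.h5",
    "openpmd_100000.h5",
    "openpmd_104000.h5",
]

catch_all = list_iterations("openpmd_%T.h5", files)
padded = list_iterations("openpmd_%05T.h5", files)
```

With these two shapes, both `openpmd_%T.h5` and `openpmd_%05T.h5` find all six files from the original report, while `openpmd_%06T.h5` only finds the six-digit ones.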

franzpoeschel · 2021-11-01T15:12:13Z

> Theoretically we could start soft-enforcing the underscore by giving a warning if anything but an underscore is used?

I've pushed a commit that does this. In case we want to only warn if the prefix ends in a digit, that should be easy enough to fix.

Apparently, there is an instance of data%T in our test suite:

[Warning] In file-based iteration encoding, it is strongly recommended
to prepend the expansion pattern of the filename with an underscore '_'.
Example: 'data_%T.json' or 'simOutput_%06T.h5'
Given file pattern: 'data%T'

ax3l · 2021-11-01T15:19:14Z

I'm not sure we need to warn when the prefix does not end in an underscore; ending in a digit is maybe worth a warning.
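
The narrower check, warning only when the prefix ends in a digit, might look like the sketch below. `prefix_ends_in_digit` is a hypothetical function and its warning text is made up for illustration; it is not the message emitted by openPMD-api.

```python
import re
import warnings


def prefix_ends_in_digit(file_pattern):
    """Return True (and warn) if the text right before %T ends in a digit.

    Hypothetical check; the warning text is illustrative only.
    """
    expansion = re.search(r"%(?:0[0-9]+)?T", file_pattern)
    if expansion is None:
        return False  # no expansion pattern, nothing to check
    prefix = file_pattern[: expansion.start()]
    if prefix and prefix[-1] in "0123456789":
        warnings.warn(
            "file pattern %r: the prefix ends in a digit, which is easy to "
            "confuse with the iteration number" % file_pattern
        )
        return True
    return False
```

Under this rule, `diag1%T.bp` triggers the warning, while `data%T.json` and `simOutput_%06T.h5` do not.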



test/SerialIOTest.cpp

ax3l · 2021-11-03T20:25:36Z

+         * So even when opening the series with a padding of 1,
+         * that iteration will be opened.
+         */
+        REQUIRE(o.iterations.count(123456) == 1);


I was initially wondering here: should we read 123456 if our padding was specifically asked to be 00?

But if we skipped it, this would be a bit inconsistent with reading unpadded numbers.
Thus, I think the logic change in the test that you added is the proper way to handle this 👍

franzpoeschel · 2021-11-04

> should we read 123456 if our padding was specifically asked to be 00?

Yep, it should, because that number is not padded ;)

ax3l · 2021-11-03T20:27:51Z

@franzpoeschel, thanks a lot for your help and updates 💖

ax3l · 2021-11-03T21:28:26Z

CI: conda install seems to be hanging today, nothing to worry about.

* Read: time/dt also in long double (#1096)
  * Python: time/dt round-trip
    Test writing and reading time and dt on an iteration via properties.
  * Fix: Iteration read of long double time
    Support reading of `dt` and `time` attributes if they are of type `long double`. (openPMD standard: all `floatX` supported)
* Executables: CXX_STANDARD/EXTENSIONS (#1102)
  Set `CXX_EXTENSIONS OFF` and `CXX_STANDARD_REQUIRED ON` for created executables. This mitigates issues with NVCC 11.0 and C++17 builds seen as added `-std=gnu++17` flags that lead to
  ```
  nvcc fatal : Value 'gnu++17' is not defined for option 'std'
  ```
  when using `nvcc` as CXX compiler directly.
* Doc: More Locations -DPython_EXECUTABLE (#1104)
  Mention the `-DPython_EXECUTABLE` twice more in build examples.
* NVCC + C++17 (#1103)
  * NVCC + C++17
    Work-around a build issue with NVCC in C++17 builds.
    ```
    include/openPMD/backend/Attributable.hpp(437): error #289: no instance of constructor "openPMD::Attribute::Attribute" matches the argument list
        argument types are: (std::__cxx11::string)
        detected during instantiation of "__nv_bool openPMD::AttributableInterface::setAttribute(const std::__cxx11::string &, T) [with T=std::__cxx11::string]"
    ```
    from
    ```
    inline bool
    AttributableInterface::setAttribute( std::string const & key, char const value[] )
    {
        return this->setAttribute(key, std::string(value));
    }
    ```
    Seen with:
    - NVCC 11.0.2 + GCC 8.3.0
    - NVCC 11.0.2 + GCC 7.5.0
  * NVCC 11.0.2 C++17 work-around: Add Comment
* Lazy parsing: Make findable in docs and use in openpmd-ls (#1111)
  * Use deferred iteration parsing in openpmd-ls
  * Make lazy/deferred parsing searchable
  * Add a way to search for usesteps key
* HDF5: Document HDF5_USE_FILE_LOCKING (#1106)
  Document a HDF5 read work-around that we currently need on OLCF Jupyter (https://jupyter.olcf.ornl.gov), due to a mounting issue of GPFS in the Jupyter service (OLCFHELP-3685).
  From the HDF5 1.10.1 Release Notes:
  ```
  Other New Features and Enhancements
  ===================================

  Library
  -------
  - Added a mechanism for disabling the SWMR file locking scheme.

    The file locking calls used in HDF5 1.10.0 (including patch1) will fail when the underlying file system does not support file locking or where locks have been disabled. To disable all file locking operations, an environment variable named HDF5_USE_FILE_LOCKING can be set to the five-character string 'FALSE'. This does not fundamentally change HDF5 library operation (aside from initial file open/create, SWMR is lock-free), but users will have to be more careful about opening files to avoid problematic access patterns (i.e.: multiple writers) that the file locking was designed to prevent.

    Additionally, the error message that is emitted when file lock operations set errno to ENOSYS (typical when file locking has been disabled) has been updated to describe the problem and potential resolution better.

    (DER, 2016/10/26, HDFFV-9918)
  ```
  This also exists as a compilation option for HDF5 in CMake, where it defaults to ``TRUE``, which is also what distributions/package managers ship.
  Disabling from Bash:
  ```bash
  export HDF5_USE_FILE_LOCKING=FALSE
  ```
  Disabling from Python:
  ```py
  import os
  os.environ['HDF5_USE_FILE_LOCKING'] = "FALSE"
  ```
* Avoid object slicing when deriving from Series class (#1107)
  * Make Series class final
  * Use private constructor to avoid object slicing
* Doc: OMPI_MCA_io Control (#1114)
  Document OpenMPI MPI-I/O backend control. We have documented this long in #446.
* openPMD.hpp: Include auxiliary StringManip (#1124)
  Include this, handy functions.
* CXX Std: Remember <variant> Impl. (#1128)
  We use `<variant>` or `<mpark/variant.hpp>` in our public API interface for datatypes, depending on the C++ standard. This pull request makes sure that the same implementation is used in downstream code, even if the C++ standard is switched. This avoids ABI issues when, e.g., using a C++14 built openPMD-api in a C++17 downstream code.
* Spack: No More `load -r` (#1125)
  The `-r` argument was removed from `spack load` and is now implied.
* Fix AppVeyor: Python Executable (#1127)
  * GH Action: Add MSVC & ClangCL on Win
  * Fix AppVeyor: Python Executable
  * Avoid mismatching system Python and Conda Python
  * Conda: Fix Numpy
  * CMake: Skip Pipe Test
    Written in a too special way, we cannot assume SH is always present
* Test 8b (Bench Read Parallel): Support Variable encoding, Fix Bugs (#1131)
  * added support to read variable encoding, plus fixed some bugs
  * fixed style
  * Update examples/8b_benchmark_read_parallel.cpp
    remove commented out code
    Co-authored-by: Axel Huebl <axel.huebl@plasma.ninja>
  * Update examples/8b_benchmark_read_parallel.cpp
    Co-authored-by: Axel Huebl <axel.huebl@plasma.ninja>
  * removed commented line
  * updated 8b env option
  Co-authored-by: Axel Huebl <axel.huebl@plasma.ninja>
* HDF5 I/O optimizations (#1129)
  * Include HDF5 optimization options
  * Fix code style check
  * Fix validations and include checks
  * Fix style check
  * Remove unecessary strict check
  * Update documentation with HDF5 tuning options
  * Update contributions
  * Fix Guards for H5Pset_all_coll_metadata*
  * MPI Guard: H5Pset_all_coll_metadata*
  * Remove duplicated variable
  Co-authored-by: Axel Huebl <axel.huebl@plasma.ninja>
* Include known issues section for HDF5 (#1132)
  * Update known issues with HDF5 and collective metadata operations
  * Fix rst link and tiny typo
  * Add targeted bugfix releases.
  Co-authored-by: Axel Huebl <axel.huebl@plasma.ninja>
* Include check for paged allocation (#1133)
  * Include check for paged allocation
  * Update ParallelHDF5IOHandler.cpp
* libfabric 1.6+: Document SST Work-Arounds (#1134)
  * libfabric 1.6+: Document SST Work-Arounds
    Document work-arounds for libfabric 1.6+ on Cray systems when using data staging / streaming with ADIOS2 SST.
  Co-authored-by: Franz Pöschel <franz.poeschel@gmail.com>
* Fix: Read Inconsistent Zero Pads (#1118)
  * [Draft] Fix: Read Inconsistent Zero Pads
  * Revert previous changes except for test
    Parse iteration numbers that are longer than their padding
    Read inconsistent zero padding
  * Overflow Padding: Read Test
  * Warn if the prefix does end in a digit
  * Fix: Don't let oversize numbers accidentally bump the padding
  * Update test
  * Issue warnings on misleading patterns also when writing
  * Minor Style Update
  Co-authored-by: Franz Pöschel <franz.poeschel@gmail.com>
* Release: 0.14.3

Co-authored-by: Franz Pöschel <franz.poeschel@gmail.com>
Co-authored-by: guj <guj@users.noreply.github.com>
Co-authored-by: Jean Luca Bez <jeanlucabez@gmail.com>
Co-authored-by: Jean Luca Bez <jlbez@lbl.gov>

ax3l · 2022-01-15T02:03:54Z

@camille12225 reports that even with this port in 0.14.3, she still sees issues with inconsistent padding in files.

I added a report to #1173

ax3l added bug affects latest release internal labels Oct 7, 2021

ax3l commented on src/Series.cpp Oct 7, 2021 (outdated, resolved)

ax3l force-pushed the fix-inconsistentZeroPadRead branch from 11a40b8 to 311f107 on October 7, 2021 02:02

ax3l requested a review from franzpoeschel October 7, 2021 02:04

franzpoeschel force-pushed the fix-inconsistentZeroPadRead branch from 756e9ae to 0670508 on October 21, 2021 14:01

franzpoeschel force-pushed the fix-inconsistentZeroPadRead branch from 0670508 to bbc3306 on October 21, 2021 14:07

ax3l force-pushed the fix-inconsistentZeroPadRead branch from bbc3306 to 83730a0 on October 25, 2021 19:00

ax3l changed the title from "[Draft] Fix: Read Inconsistent Zero Pads" to "Fix: Read Inconsistent Zero Pads" Oct 25, 2021

ax3l force-pushed the fix-inconsistentZeroPadRead branch 2 times, most recently from 63aa78a to c4b0cde on November 1, 2021 21:45

ax3l and others added 4 commits November 1, 2021 14:57

Revert previous changes except for test

0199e34

Parse iteration numbers that are longer than their padding Read inconsistent zero padding

Overflow Padding: Read Test

fdf80bf

Warn if the prefix does end in a digit

c6dca59

ax3l force-pushed the fix-inconsistentZeroPadRead branch from f614f39 to c6dca59 on November 1, 2021 21:58

Fix: Don't let oversize numbers accidentally bump the padding

54a9dbc

franzpoeschel reviewed test/SerialIOTest.cpp Nov 3, 2021 (resolved)

Update test

970c2f4

franzpoeschel force-pushed the fix-inconsistentZeroPadRead branch from 8c8f534 to 970c2f4 on November 3, 2021 12:23

Issue warnings on misleading patterns also when writing

a85c0d4

ax3l added this to the 0.14.3 milestone Nov 3, 2021

ax3l commented on test/SerialIOTest.cpp Nov 3, 2021 (outdated, resolved)

Minor Style Update

5ea8f24

ax3l merged commit 15fc049 into openPMD:dev Nov 3, 2021

ax3l deleted the fix-inconsistentZeroPadRead branch November 3, 2021 21:29

ax3l mentioned this pull request Jan 15, 2022

Pattern Problem Returns in 0.14.3 #1173

Closed

ax3l mentioned this pull request Apr 21, 2022

Fix Zero Pattern Issue #1253

Merged
