
Fix remaining RPM build test failures #477

Merged
markcmiller86 merged 27 commits into main from bug-mcm86-ci-testing-fixes2 on Oct 26, 2025


@markcmiller86
Member

No description provided.

@markcmiller86
Member Author

I have a hunch why the checksum test fails on the RPM build CI but passes on the Basic CI: the two use different versions of HDF5. The checksum test walks the HDF5 error stack looking for specific entries, and I think something changed in the way the error stack works between the two versions. I get the expected behavior on 1.14.4 (which perhaps represents the behavior of all previous versions) and unexpected behavior on 1.14.6.
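
To illustrate the kind of check described here, below is a minimal sketch in Python (not Silo's actual C code) that scans the text of an HDF5 error stack dump for a frame that mentions the Fletcher32 checksum. The names `FRAME_RE`, `parse_frames`, and `looks_like_checksum_failure` are hypothetical, and the frame format is assumed from the `HDF5-DIAG` output that h5ls prints with `--enable-error-stack`.

```python
import re

# Assumed frame layout, taken from HDF5-DIAG dumps such as:
#   #009: /path/H5Zfletcher32.c line 102 in H5Z__filter_fletcher32(): data error ...
FRAME_RE = re.compile(r"#(\d+):\s+(\S+) line (\d+) in (\w+)\(\):\s*(.*)")


def parse_frames(diag_text):
    """Return a (function, description) pair for each frame line in the dump."""
    frames = []
    for line in diag_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append((match.group(4), match.group(5).strip()))
    return frames


def looks_like_checksum_failure(diag_text):
    """True if any frame names the Fletcher32 filter or mentions a checksum."""
    for func, desc in parse_frames(diag_text):
        if "fletcher32" in func.lower() or "checksum" in desc.lower():
            return True
    return False


# Two short excerpts in the assumed format (paths shortened for illustration).
WITH_CHECKSUM_FRAME = """\
  #008: src/H5Z.c line 1450 in H5Z_pipeline(): filter returned failure during read
  #009: src/H5Zfletcher32.c line 102 in H5Z__filter_fletcher32(): data error detected by Fletcher32 checksum
"""
WITHOUT_CHECKSUM_FRAME = """\
  #007: src/H5Dchunk.c line 4542 in H5D__chunk_lock(): data pipeline read failed
  #008: src/H5Z.c line 1450 in H5Z_pipeline(): filter returned failure during read
"""

print(looks_like_checksum_failure(WITH_CHECKSUM_FRAME))
print(looks_like_checksum_failure(WITHOUT_CHECKSUM_FRAME))
```

A check of this shape depends entirely on the innermost frame being present in the stack, so it would stop reporting a checksum error if a library version no longer records that frame.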

@markcmiller86
Member Author

markcmiller86 commented Oct 25, 2025

I am recording here a smoking-gun reproducer of the issue in the error stack messages. Attached is an intentionally corrupted HDF5 file. 1.14.4 produces an error message indicating a checksum failure; 1.14.6 does not.

multi_ucd3d_corrupt.h5.gz

Here is what h5ls with 1.14.4 produces (the "correct" result, with the bottom entry being the checksum failure):

env PATH=$PATH:/mnt/nvme/mark/silo/hdf5-1.14.4/build/my_install/bin h5ls --enable-error-stack -vlrd multi_ucd3d_corrupt.h5/.silo/#000041
Opened "multi_ucd3d_corrupt.h5" with sec2 driver.
.silo/#000041            Dataset {1200/1200}
    Location:  1:453468
    Links:     1
    Chunks:    {1200} 4800 bytes
    Storage:   4800 logical bytes, 4804 allocated bytes, 99.92% utilization
    Filter-0:  fletcher32-3  {}
    Type:      native int
    Data:
HDF5-DIAG: Error detected in HDF5 (1.14.4-3) thread 0:
  #000: /mnt/nvme/mark/silo/hdf5-1.14.4/src/H5D.c line 1061 in H5Dread(): can't synchronously read data
    major: Dataset
    minor: Read failed
  #001: /mnt/nvme/mark/silo/hdf5-1.14.4/src/H5D.c line 1008 in H5D__read_api_common(): can't read data
    major: Dataset
    minor: Read failed
  #002: /mnt/nvme/mark/silo/hdf5-1.14.4/src/H5VLcallback.c line 2092 in H5VL_dataset_read_direct(): dataset read failed
    major: Virtual Object Layer
    minor: Read failed
  #003: /mnt/nvme/mark/silo/hdf5-1.14.4/src/H5VLcallback.c line 2048 in H5VL__dataset_read(): dataset read failed
    major: Virtual Object Layer
    minor: Read failed
  #004: /mnt/nvme/mark/silo/hdf5-1.14.4/src/H5VLnative_dataset.c line 374 in H5VL__native_dataset_read(): can't read data
    major: Dataset
    minor: Read failed
  #005: /mnt/nvme/mark/silo/hdf5-1.14.4/src/H5Dio.c line 402 in H5D__read(): can't read data
    major: Dataset
    minor: Read failed
  #006: /mnt/nvme/mark/silo/hdf5-1.14.4/src/H5Dchunk.c line 2913 in H5D__chunk_read(): unable to read raw data chunk
    major: Low-level I/O
    minor: Read failed
  #007: /mnt/nvme/mark/silo/hdf5-1.14.4/src/H5Dchunk.c line 4542 in H5D__chunk_lock(): data pipeline read failed
    major: Dataset
    minor: Filter operation failed
  #008: /mnt/nvme/mark/silo/hdf5-1.14.4/src/H5Z.c line 1450 in H5Z_pipeline(): filter returned failure during read
    major: Data filters
    minor: Read failed
  #009: /mnt/nvme/mark/silo/hdf5-1.14.4/src/H5Zfletcher32.c line 102 in H5Z__filter_fletcher32(): data error detected by Fletcher32 checksum
    major: Data storage
    minor: Read failed
                 Unable to print data.
H5tools-DIAG: Error detected in HDF5:tools (1.14.4) thread 0:
  #000: /mnt/nvme/mark/silo/hdf5-1.14.4/tools/lib/h5tools_dump.c line 1770 in h5tools_dump_simple_dset(): H5Dread failed
    major: Failure in tools library
    minor: error in function

Here is what h5ls with 1.14.6 produces.

Notice that the last message, about the Fletcher32 failure, is missing. That message is what Silo looks for to flag a checksum error, as opposed to any one of a number of other errors.

env PATH=$PATH:/mnt/nvme/mark/silo/hdf5-1.14.6/build/my_install/bin h5ls --enable-error-stack -vlrd multi_ucd3d_corrupt.h5/.silo/#000041
Opened "multi_ucd3d_corrupt.h5" with sec2 driver.
.silo/#000041            Dataset {1200/1200}
    Location:  1:453468
    Links:     1
    Chunks:    {1200} 4800 bytes
    Storage:   4800 logical bytes, 4804 allocated bytes, 99.92% utilization
    Filter-0:  fletcher32-3  {}
    Type:      native int
    Data:
HDF5-DIAG: Error detected in HDF5 (1.14.6):
  #000: /mnt/nvme/mark/silo/hdf5-1.14.6/src/H5D.c line 1044 in H5Dread(): can't synchronously read data
    major: Dataset
    minor: Read failed
  #001: /mnt/nvme/mark/silo/hdf5-1.14.6/src/H5D.c line 992 in H5D__read_api_common(): can't read data
    major: Dataset
    minor: Read failed
  #002: /mnt/nvme/mark/silo/hdf5-1.14.6/src/H5VLcallback.c line 2078 in H5VL_dataset_read(): dataset read failed
    major: Virtual Object Layer
    minor: Read failed
  #003: /mnt/nvme/mark/silo/hdf5-1.14.6/src/H5VLcallback.c line 2034 in H5VL__dataset_read(): dataset read failed
    major: Virtual Object Layer
    minor: Read failed
  #004: /mnt/nvme/mark/silo/hdf5-1.14.6/src/H5VLnative_dataset.c line 374 in H5VL__native_dataset_read(): can't read data
    major: Dataset
    minor: Read failed
  #005: /mnt/nvme/mark/silo/hdf5-1.14.6/src/H5Dio.c line 402 in H5D__read(): can't read data
    major: Dataset
    minor: Read failed
  #006: /mnt/nvme/mark/silo/hdf5-1.14.6/src/H5Dchunk.c line 2913 in H5D__chunk_read(): unable to read raw data chunk
    major: Low-level I/O
    minor: Read failed
  #007: /mnt/nvme/mark/silo/hdf5-1.14.6/src/H5Dchunk.c line 4542 in H5D__chunk_lock(): data pipeline read failed
    major: Dataset
    minor: Filter operation failed
  #008: /mnt/nvme/mark/silo/hdf5-1.14.6/src/H5Z.c line 1450 in H5Z_pipeline(): filter returned failure during read
    major: Data filters
    minor: Read failed
                 Unable to print data.
H5tools-DIAG: Error detected in HDF5:tools (1.14.6):
  #000: /mnt/nvme/mark/silo/hdf5-1.14.6/tools/lib/h5tools_dump.c line 1770 in h5tools_dump_simple_dset(): H5Dread failed
    major: Failure in tools library
    minor: error in function
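
The difference between the two stacks above can be made explicit by comparing the function names of their frames. The following is a small Python sketch; the function names are copied from the two h5ls outputs, while `STACK_1_14_4`, `STACK_1_14_6`, and `only_in` are hypothetical names introduced for illustration.

```python
# Frame function names, outermost first, copied from the two h5ls outputs.
STACK_1_14_4 = [
    "H5Dread", "H5D__read_api_common", "H5VL_dataset_read_direct",
    "H5VL__dataset_read", "H5VL__native_dataset_read", "H5D__read",
    "H5D__chunk_read", "H5D__chunk_lock", "H5Z_pipeline",
    "H5Z__filter_fletcher32",
]
STACK_1_14_6 = [
    "H5Dread", "H5D__read_api_common", "H5VL_dataset_read",
    "H5VL__dataset_read", "H5VL__native_dataset_read", "H5D__read",
    "H5D__chunk_read", "H5D__chunk_lock", "H5Z_pipeline",
]


def only_in(first, second):
    """Return the frames of `first` that do not appear in `second`, in order."""
    seen = set(second)
    return [func for func in first if func not in seen]


print("only in 1.14.4:", only_in(STACK_1_14_4, STACK_1_14_6))
print("only in 1.14.6:", only_in(STACK_1_14_6, STACK_1_14_4))
print("innermost frames:", STACK_1_14_4[-1], "vs", STACK_1_14_6[-1])
```

Besides the missing `H5Z__filter_fletcher32` frame, the comparison also surfaces that frame #002 has a different function name in the two versions.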

@markcmiller86 markcmiller86 merged commit 82f6dce into main Oct 26, 2025
4 checks passed
@markcmiller86 markcmiller86 deleted the bug-mcm86-ci-testing-fixes2 branch October 26, 2025 05:12

1 participant