Skip to content

Fix OOB read / segfault when reading SHAPE-scalar HDF5 datasets#63

Closed
SimonPinches wants to merge 1 commit into
iterorganization:developfrom
SimonPinches:fix/hdf5-shape-read-oob
Closed

Fix OOB read / segfault when reading SHAPE-scalar HDF5 datasets#63
SimonPinches wants to merge 1 commit into
iterorganization:developfrom
SimonPinches:fix/hdf5-shape-read-oob

Conversation

@SimonPinches
Copy link
Copy Markdown
Contributor

Fix OOB read / segfault when reading SHAPE-scalar HDF5 datasets

Closes #51. Alternative to #52.

Problem

Reading GGD geometry data segfaults inside the HDF5 reader, e.g.:

ids.grid_ggd[0].space[0].objects_per_dimension[3].object[0].geometry[0]

crashes with SIGSEGV in __intel_avx_rep_memcpy (see the backtrace in #51).

Root cause

HDF5HsSelectionReader computes dim = dataset_rank - AOSRank. For a *_SHAPE
dataset whose rank equals the AOS depth, dim == 0 while rank > 0.

readIntNDFromBuffer and readDoubleNDFromBuffer then take the rank != dim
branch and unconditionally append a trailing value-dimension index before
computing the flat offset:

std::vector<int> indices = current_arrctx_indices;   // size == rank
indices.push_back(0);                                // -> size rank + 1
int index = hdf5_utils.indices_to_flat_index(indices, hsSelectionReader.getDataSpaceDims());
memcpy(*data, v + index, shape * sizeof(int));

indices_to_flat_index iterates indices.size() times reading dataSpaceDims[i],
so the extra element reads one past the dims array and folds it into the offset.
The resulting index is out of range and the memcpy reads unmapped memory.

Notably, the buffered write path (fillFullBuffers()) already handles this
correctly — it guards the same push_back with if (request_dim != 0) and, when
dim == 0, indexes directly from the AOS indices. The read-from-buffer functions
were simply missing that guard.

Fix

Mirror the existing write-path guard into both read functions: only append the
trailing index when dim != 0; otherwise address the scalar directly from
current_arrctx_indices. Minimal, symmetric with fillFullBuffers(), and it
lets the read return the correct value instead of crashing.

Why this as an alternative to #52

#52 addresses the same issue from the caller side by adding rank checks in
readPersistentShapes_Get / _GetSlice that throw an ALBackendException when
the SHAPE-dataset rank does not match arrctx_indices.size() + 1. That removes
the undefined behaviour, but for the reproduction case the rank is exactly
arrctx_indices.size(), so the guard converts the crash into a thrown
exception — the GGD geometry still cannot be read. (The PR description for #52
also describes a conditional-push_back fix and a write-path change that are not
present in its diff.)

This PR instead fixes the indexing at its source, so the affected reads succeed.
The two approaches are not mutually exclusive: #52's guard could be kept as a
defence-in-depth safety net (with its condition relaxed so it does not reject the
legitimate rank == AOSRank SHAPE case), but it is not required once the
indexing is correct.

Verification

Reproduced the crash on the released IMAS-Core/5.6.0 modules (both intel-2023b
and foss-2023b) with the script from #51SIGSEGV (exit 139).

Built libal from this branch and re-ran with the patched library preloaded over
the released Python extension (so the patched C library is the only thing that
changed):

Case Unpatched 5.6.0 Patched
GGD geometry read (#51 repro) SIGSEGV (exit 139) empty array, exit 0
Non-empty FLT_1D (point coordinates, objects_per_dimension[0]) reads correctly
Non-empty INT_1D (nodes, sampled across the full object range) reads correctly

Both value-dimension branches (dim != 0, int and double) and the new dim == 0
branch are exercised.

Follow-up suggestions (not in this PR)

  • Harden indices_to_flat_index to take a rank/length and assert/throw if the
    index count exceeds it, so any future caller bug fails loudly instead of
    reading out of bounds.
  • Add a regression test using a *_SHAPE dataset whose rank equals the AOS depth.

readIntNDFromBuffer and readDoubleNDFromBuffer unconditionally appended a
trailing value-dimension index (indices.push_back(0)) whenever rank != dim,
then computed a flat offset with indices_to_flat_index. For a *_SHAPE
dataset whose rank equals the AOS depth (dim == 0), the index vector
already matched the dataset rank, so the extra element pushed the loop one
past the dataspace dims array and produced an out-of-range flat offset. The
subsequent memcpy(*data, v + index, ...) then read unmapped memory and
crashed (SIGSEGV in __intel_avx_rep_memcpy).

Guard the push_back with `if (dim != 0)`, mirroring the existing, correct
logic in fillFullBuffers() (the buffered write path): when dim == 0 the
scalar is addressed directly by current_arrctx_indices with no extra index.

Fixes the segfault reported when reading GGD geometry (issue iterorganization#51), e.g.
grid_ggd[0]/space[0]/objects_per_dimension[3]/object[0]/geometry, where the
geometry_SHAPE dataset has one fewer dimension than the geometry values.

Verified against the reproduction D3D dataset: with the patched libal the
previously-crashing empty-geometry read returns an empty array, while
non-empty FLT_1D (point coordinates) and INT_1D (nodes) reads remain
correct.
@prasad-sawantdesai
Copy link
Copy Markdown
Collaborator

prasad-sawantdesai commented Jun 2, 2026

Compiled and check with above PR parallel to #52

It avoids segfault and provides following message but does not tell about underlying issue with the rank.. #52 (comment)

import imas

entry = imas.DBEntry("imas:hdf5?path=/home/ITER/pankina/public/d3d/4/163518/75", "r")
ids = entry.get("mhd", occurrence=1, lazy=True, autoconvert=False)
ids.grid_ggd[0].space[0].objects_per_dimension[3].object[0].geometry[0]
$ python test.py
16:04:08 INFO     Parsing data dictionary version 4.1.1 @dd_zip.py:89
Traceback (most recent call last):
  File "/home/ITER/sawantp1/github/IMAS-Core/test.py", line 11, in <module>
    ids.grid_ggd[0].space[0].objects_per_dimension[3].object[0].geometry[0]
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
  File "/home/ITER/sawantp1/github/IMAS-Core/myenv/lib/python3.13/site-packages/imas/ids_primitive.py", line 224, in __getitem__
    return self.value[index]
           ~~~~~~~~~~^^^^^^^
IndexError: index 0 is out of bounds for axis 0 with size 0

@SimonPinches
Copy link
Copy Markdown
Contributor Author

Thanks for testing it. The IndexError here is expected and correct — it's not coming from IMAS-Core, and there is actually no rank anomaly to report. Let me explain why.

The geometry really is empty for that element. objects_per_dimension[3] holds the highest-dimension objects (cells), which in this grid are defined by their nodes, not by an explicit geometry. So …/object[0]/geometry genuinely has length 0:

opd[3].object[0]: geometry len=0   nodes len=6
opd[0].object[0]: geometry len=3   val=[ 1.80929947  0.  -0.02699206 ]   # points carry coordinates

The original snippet does geometry[0], i.e. it indexes element 0 of a legitimately empty array — that's a plain Python IndexError, exactly as you'd get from np.array([])[0]. Reading the field itself, obj.geometry, returns [] cleanly. So #63 returns the correct value; the test line just happens to index into an empty field.

There is no "rank 4 vs rank 5" problem. For a 1D field stored in homogeneous-AOS form, the _SHAPE dataset stores a single scalar length per AOS cell, so its rank equals the AOS depth — one fewer than the value dataset, which carries the extra value dimension. That's exactly what the file shows and it is the correct, intended layout:

…&object[]&geometry        {2, 1, 4, 3702416, 3}   # value: AOS(4) + value-dim(1) = rank 5
…&object[]&geometry_SHAPE  {2, 1, 4, 3702416}      # shape: AOS(4)               = rank 4

So geometry_SHAPE having rank 4 is not a malformation — it's normal for every 1D field. The expected_rank = aos + 1 assumption in #52 only holds for ≥2D fields (e.g. geometry_2d, whose SHAPE needs an extra dimension to hold the 2 per-cell lengths). Applied to a 1D field it is off by one, which is why #52 reports "rank 4 but rank 5 was expected" and raises an exception — it's flagging the normal 1D layout as an error and refusing to read valid data.

What #63 does. It reads the SHAPE correctly in both cases by branching on dim = shape_rank − aos (mirroring the existing fillFullBuffers() write path): dim == 0 → 1D scalar length (this case), dim ≥ 1 → ND. Verified against the same dataset:

  • opd[3].object[0].geometry[] (length 0, correct — cell defined by nodes)
  • opd[0].object[0].geometry[1.809, 0, -0.027] (length 3, non-empty FLT_1D read correctly)
  • nodes (INT_1D) → 6 values per object across the full object range

So both branches (and both int/double readers) are exercised and return correct data when present, and correct empties when the stored length is 0.

On surfacing the rank. If we do want a diagnostic for genuinely inconsistent SHAPE datasets, the right invariant is shape_rank == value_dataset_rank − 1 (equivalently dim ∈ {0, …} consistent with the field's DD rank), not aos + 1 — that would catch a truly corrupt file without rejecting valid 1D data. I'm happy to add such a check as belt-and-suspenders on top of this fix if that's useful, but the empty geometry in your test is the correct result, not a masked error.

@SimonPinches
Copy link
Copy Markdown
Contributor Author

Correction to my earlier comment — @prasad-sawantdesai is right, and I was wrong about the layout.

I wrote a 1D GGD geometry with pure IMAS-Core and inspected the datasets with h5py:

geometry        shape=(1, 1, 1, 2, 3)  rank=5
geometry_SHAPE  shape=(1, 1, 1, 2, 1)  rank=5      <-- conformant: trailing length-1 dim present

So a conformant _SHAPE dataset has rank = AOS depth + 1 (the trailing dimension of length = field ndim, i.e. 1 for a 1D field). That matches the writer (createOrUpdateShapesDataSet: dim = 1, tensorized over the AOS → rank AOSRank+1). My earlier claim that "rank 4 is the normal layout for a 1D field" was incorrect.

The file in #51 has geometry_SHAPE of rank 4 ({2,1,4,3702416}) — it is missing that trailing dimension, i.e. it is non-conformant with the IMAS-Core HDF5 contract. As @prasad-sawantdesai notes, that file was written partly via h5py (nimrod2imas/dump2imas.py) rather than through IMAS-Core's put/put_slice, and the direct-h5py path omits the trailing shapes dimension. A conformant, IMAS-Core-written file reads correctly on the unpatched release (verified), so the crashing code path is only reachable with the non-conformant layout — this is not a regression on normal data.

That means #52's expected_rank = aos + 1 check is correct against the contract: it accurately diagnoses the malformed dataset.

Given that, the real fix belongs in the producer — nimrod2imas should write the SHAPE dataset with the trailing dimension (or simply use IMAS-Core put/put_slice). On the IMAS-Core side the question is purely how to behave on non-conformant input:

Either way, the one thing both PRs agree on and that we clearly want is: no segfault / no out-of-bounds read on malformed input — it should be a clear error (or a safe read), never UB. I'm happy to converge #63 onto that: keep the bounds-safe guard but raise the same kind of explicit "non-conformant SHAPE dataset" diagnostic as #52 instead of silently reading, so the two PRs aren't at odds. Apologies for the confusion in my previous comment.

@SimonPinches
Copy link
Copy Markdown
Contributor Author

Closing in favour of #52. As established above, the file in #51 is non-conformant — its geometry_SHAPE dataset is missing the trailing shapes dimension that IMAS-Core's own writer produces (verified: a pure-IMAS-Core 1D GGD geometry yields a rank-5 geometry_SHAPE, the nimrod file has rank 4). #52's aos + 1 rank check correctly diagnoses this against the contract, so it is the right place to handle it.

The root-cause fix belongs in the producer (nimrod2imas/dump2imas.py), which should write the SHAPE dataset with the trailing dimension or use IMAS-Core put/put_slice rather than writing via raw h5py.

Thanks @prasad-sawantdesai for the clarification.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Accessing GGD geometry data triggers a segmentation fault

2 participants