Skip to content

Conversation

@nvaytet
Copy link
Member

@nvaytet nvaytet commented Nov 3, 2025

Some tweaks were required in the GenericNeXusWorkflow for histogram mode detectors.

  • a detector_number coordinate may not be present
  • we do not need to group event data by detector id

@nvaytet nvaytet requested a review from jl-wynen November 3, 2025 14:09
'ess/tbl',
files={
"857127_00000112_small.hdf": "md5:6f3059e0e5e111a2a8f1b368f24c6f93",
"857127_00000112_small.hdf": "md5:be5741892537a264c6f5df009a7c2881",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did the hash change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to add some x_pixel_offset and y_pixel_offset coordinates to the file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why didn't you bump the version? Doesn't this break old versions of the package?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did bump the version... see 2 lines below

# Histogram mode detectors may not have a detector_number, as it is not needed
# for grouping data by detector id.
if 'detector_number' in children:
children['data'] = children['detector_number']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised the downstream code can handle this without changes. Is it possible to completely remove this line even for event data?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why it's so surprising? Until now, any data we have used has a detector_number, so the if is always True, and the behaviour is basically unchanged?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am surprised that we don't need this check in other places. That the remaining code can handle this even without a detector number or 'data'. Hence my question, do we even need 'data'?

Copy link
Member Author

@nvaytet nvaytet Nov 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it was so that we were able to make a DataArray later? And that you can't create a DataArray without data? Only Datasets can be coordinate only if I remember correctly?
But I could be wrong.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we wouldn't have any data here if there is no detector number. But actualy, we do for histogram detectors because we don't drop the NXdata entry above in children = _drop(children, (snx.NXoff_geometry, snx.NXevent_data)). This means that this code does not load only the geometry. But it loads the complete detector including the data. And that data then gets loaded again in a separate provider.

So please change this to not load the data here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, but I don't know what to do then, apart from requiring a detector_number from ECDC...
I don't see how we can load the empty detector without loading the data to know the shape.
Loading the x/y_pixel_offsets wouldn't necessarily give you the shape because they can be one-dimensional (e.g. for timepix3, the x_pixel_offset is a 1D array of size 4096, and the same for the y_pixel_offset, only the detector_number has shape 4096x4096)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where and when do we need the shape? Can we postpone this until we have the data? Otherwise, you could construct a detector number from an arange and fold it according to the shape of the data (loading that without loading the actual values).
Let's talk to figure this out!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I forgot that we can look at the shape of an hdf5 dataset without loading the data... it should be possible to create it on the fly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of the latest update? (I still need to add tests, but would like to know if you think this is the right direction)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the overall approach is good. See my individual comments.

field_sizes = None
if ('detector_number' not in children) and ('data' in children):
if 'value' in children['data']:
field_sizes = children['data']['value'].sizes.copy()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is no 'value', can you raise an exception here? Otherwise, we get a more cryptic error later on.

children['data'] = children['detector_number']
elif field_sizes is not None:
field_sizes.pop('time', None)
children['data'] = _EmptyField(sizes=field_sizes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use an empty field in all cases to make it more uniform? Or would that have to allocate additional memory compared to assigning the detector number?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I followed what you meant by "all cases". Did you mean have else instead of elif field_sizes is not None:?
What should we do in the case where there is no "data" in the children? (Not sure that is a valid case btw)
Should we add a scalar DummyField like for the monitor?
Or should we assume that we always have a data field, and then we can always add an EmptyField?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See updated logic.

if 'detector_number' in children:
children['data'] = children['detector_number']
elif field_sizes is not None:
field_sizes.pop('time', None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment about why 'time' is dropped here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there other "time dimension names" that might be found in files (I remember frame_time floating around?) and need to be dropped?

self.shape = tuple(sizes.values())

def __getitem__(self, key: Any) -> sc.Variable:
return sc.empty(dims=self.dims, shape=self.shape, unit=None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd feel more safe if this was sc.zeros. There is a risk of reading uninitialised memory here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just copied over from DummyField.
If I use zeros, should I also use dtype int?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went with zeros and int32 to save memory.

# Histogram mode detectors may not have a detector_number, as it is not needed
# for grouping data by detector id.
if 'detector_number' in children:
children['data'] = children['detector_number']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the overall approach is good. See my individual comments.

@nvaytet nvaytet merged commit 4639153 into main Nov 6, 2025
4 checks passed
@nvaytet nvaytet deleted the histogram-detectors branch November 6, 2025 13:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants