-
Notifications
You must be signed in to change notification settings - Fork 1
Small fixes for loading histogram-mode detectors #274
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
tests/conftest.py
Outdated
| 'ess/tbl', | ||
| files={ | ||
| "857127_00000112_small.hdf": "md5:6f3059e0e5e111a2a8f1b368f24c6f93", | ||
| "857127_00000112_small.hdf": "md5:be5741892537a264c6f5df009a7c2881", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did the hash change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to add some x_pixel_offset and y_pixel_offset coordinates to the file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why didn't you bump the version? Doesn't this break old versions of the package?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did bump the version... see 2 lines below
src/ess/reduce/nexus/workflow.py
Outdated
| # Histogram mode detectors may not have a detector_number, as it is not needed | ||
| # for grouping data by detector id. | ||
| if 'detector_number' in children: | ||
| children['data'] = children['detector_number'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm surprised the downstream code can handle this without changes. Is it possible to completely remove this line even for event data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why it's so surprising? Until now, any data we have used has a detector_number, so the if is always True, and the behaviour is basically unchanged?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am surprised that we don't need this check in other places. That the remaining code can handle this even without a detector number or 'data'. Hence my question, do we even need 'data'?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought it was so that we were able to make a DataArray later? And that you can't create a DataArray without data? Only Datasets can be coordinate only if I remember correctly?
But I could be wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we wouldn't have any data here if there is no detector number. But actualy, we do for histogram detectors because we don't drop the NXdata entry above in children = _drop(children, (snx.NXoff_geometry, snx.NXevent_data)). This means that this code does not load only the geometry. But it loads the complete detector including the data. And that data then gets loaded again in a separate provider.
So please change this to not load the data here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, but I don't know what to do then, apart from requiring a detector_number from ECDC...
I don't see how we can load the empty detector without loading the data to know the shape.
Loading the x/y_pixel_offsets wouldn't necessarily give you the shape because they can be one-dimensional (e.g. for timepix3, the x_pixel_offset is a 1D array of size 4096, and the same for the y_pixel_offset, only the detector_number has shape 4096x4096)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where and when do we need the shape? Can we postpone this until we have the data? Otherwise, you could construct a detector number from an arange and fold it according to the shape of the data (loading that without loading the actual values).
Let's talk to figure this out!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I forgot that we can look at the shape of an hdf5 dataset without loading the data... it should be possible to create it on the fly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think of the latest update? (I still need to add tests, but would like to know if you think this is the right direction)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the overall approach is good. See my individual comments.
src/ess/reduce/nexus/workflow.py
Outdated
| field_sizes = None | ||
| if ('detector_number' not in children) and ('data' in children): | ||
| if 'value' in children['data']: | ||
| field_sizes = children['data']['value'].sizes.copy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is no 'value', can you raise an exception here? Otherwise, we get a more cryptic error later on.
src/ess/reduce/nexus/workflow.py
Outdated
| children['data'] = children['detector_number'] | ||
| elif field_sizes is not None: | ||
| field_sizes.pop('time', None) | ||
| children['data'] = _EmptyField(sizes=field_sizes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use an empty field in all cases to make it more uniform? Or would that have to allocate additional memory compared to assigning the detector number?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I followed what you meant by "all cases". Did you mean have else instead of elif field_sizes is not None:?
What should we do in the case where there is no "data" in the children? (Not sure that is a valid case btw)
Should we add a scalar DummyField like for the monitor?
Or should we assume that we always have a data field, and then we can always add an EmptyField?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See updated logic.
src/ess/reduce/nexus/workflow.py
Outdated
| if 'detector_number' in children: | ||
| children['data'] = children['detector_number'] | ||
| elif field_sizes is not None: | ||
| field_sizes.pop('time', None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment about why 'time' is dropped here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there other "time dimension names" that might be found in files (I remember frame_time floating around?) and need to be dropped?
src/ess/reduce/nexus/workflow.py
Outdated
| self.shape = tuple(sizes.values()) | ||
|
|
||
| def __getitem__(self, key: Any) -> sc.Variable: | ||
| return sc.empty(dims=self.dims, shape=self.shape, unit=None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd feel more safe if this was sc.zeros. There is a risk of reading uninitialised memory here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just copied over from DummyField.
If I use zeros, should I also use dtype int?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went with zeros and int32 to save memory.
src/ess/reduce/nexus/workflow.py
Outdated
| # Histogram mode detectors may not have a detector_number, as it is not needed | ||
| # for grouping data by detector id. | ||
| if 'detector_number' in children: | ||
| children['data'] = children['detector_number'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the overall approach is good. See my individual comments.
Some tweaks were required in the
GenericNeXusWorkflowfor histogram mode detectors.detector_numbercoordinate may not be present