-
Notifications
You must be signed in to change notification settings - Fork 1
Add load_field and load_group to allow loading custom entries using GenericNeXusWorkflow #278
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
tests/nexus/workflow_test.py
Outdated
| location: NeXusComponentLocationSpec[UserAffiliation, RunType], | ||
| ) -> UserAffiliation[RunType]: | ||
| return UserAffiliation[RunType]( | ||
| load_component(location, nx_class=snx.NXuser)['affiliation'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could not figure out how to just have load_component(location, nx_class=None) and (below) wf[NeXusName[UserAffiliation]] = '/entry/user_0/affiliation'...
I'm running into the error ValueError: Expected a NeXus group as item '/entry/title' but got a field.
Help welcome!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I understand the approach at all: If we anyway add a custom "load" provider, why was the generic NeXus workflow modified? What role does it still play? Is it the location-spec mechanism?
Regarding the error, could it be related that load_component looks only within NXinstrument (except for the sample)? Is there some search logic that needs to be generalized?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it the location-spec mechanism
Yes, I thought we needed to do things this way to re-use the load_component properly.
If there is a less invasive way to just add a provider and not modify the generic workflow, then it would be great if you can show me?
thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No that's fine... provided we can actually re-use load_component?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the error, could it be related that load_component looks only within NXinstrument (except for the sample)?
Sorry, the error message was confusing. I copied the wrong message (from a previous iteration).
The error is actually
ValueError: Expected a NeXus group as item '/entry/user_0/affiliation' but got a field.
So it is because load_component expects to load a group and not a field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
load_component was made for any "physical component" (it handles positions, etc.). Say maybe it is reasonable and by design that it fails? Which brings us back to the question of what you gain by trying to uses the existing machinery instead of having a simple load provider?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really don't mind either way, I could not figure out how to properly just add a simple load provider which would hook into the part of the workflow that opens the file for reading.
I thought that using the Filename[SampleRun] as input to the provider would mean we open the file multiple times, and I wanted to avoid that.
So I would welcome some pointers as to what the provider would take in as input so it can load a custom field. Thanks in advance 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use the same inputs that load_component does, but in a reusable provider (load_field or load_group?)?
| def LoadDetectorWorkflow( | ||
| *, | ||
| run_types: Iterable[sciline.typing.Key], | ||
| monitor_types: Iterable[sciline.typing.Key], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why we needed to have the monitor_types here, when we are loading detectors?
So I removed it.
| def load_field( | ||
| filename: NeXusFileSpec, | ||
| field_path: str, | ||
| selection: snx.typing.ScippIndex | slice = (), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to add the selection as an arg here instead of going the NeXusComponentLocationSpec way as it would have required creating a new TypeVar which was not only for Component, and then creating new types like NeXusEntryLocationSpec and also some new
def entry_spec_by_name(
filename: NeXusFileSpec[RunType], name: NeXusName[EntryTypeVar]
) -> NeXusEntryLocationSpec[EntryTypeVar, RunType]:(the analog to component_spec_by_name: https://github.com/scipp/essreduce/blob/main/src/ess/reduce/nexus/workflow.py#L89).
The current approach is less invasive.
But I don't mind implementing the above if there would be a use for it?
Right now, since the load_field would be called inside a custom providers, and selection parameters could simple be added to that provider, e.g.
def load_custom_entry(
file: NeXusFileSpec[RunType],
path: NeXusName[MyEntry[RunType]],
start: MyEntryRangeStart[RunType],
end: MyEntryRangeEnd[RunType]
) -> MyEntry[RunType]:
return MyEntry[RunType](load_field(file, path, selection=slice(start, end)))Not sure what other cases could be needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't you just use NeXusLocationSpec? It underlies NeXusComponentLocationSpec among others. So in a workflow, we only need a single new domain type instead of 2 custom range types as in your example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't you just use
NeXusLocationSpec
I don't think so because NeXusLocationSpec is not a generic. I would need to depend on both RunType and also the EntryTypeVar so I have to create a new one no matter what?
Unless I missed something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant as an argument type to load_field and load_group because they are anyway not usable as providers. When you need a provider, you anyway have to wrap these functions and define custom domain types. So you may as well use the same mechanism as the existing code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if load_field takes in a NeXusLocationSpec as input, I either have to repeat the filename, e.g.
wf[Filename[SampleRun]] = loki_tutorial_sample_run_60250()
wf[NeXusLocationSpec[UserAffiliation[SampleRun]]] = NeXusLocationSpec(
filename=loki_tutorial_sample_run_60250(),
entry_name='/entry/user_0/affiliation',
selection=...
)or add a function like component_spec_by_name that makes the transition from NeXusFileSpec and NeXusName to a NeXusLocationSpec.
As mentioned above, I would then have to make a new generic NeXusEntryLocationSpec which uses a new TypeVar not limited to what is covered by Component, etc.
So you may as well use the same mechanism as the existing code.
That's what I tried to explain as to why I didn't use the same mechanism.
But I also said I don't mind implementing that if you think it's better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am proposing something different: As shown in you example, you have to wrap load_field in a separate provider function:
class UserAffiliation(sl.Scope[RunType, str], str):
"""User affiliation."""
def load_user_affiliation(
file: NeXusFileSpec[RunType],
path: NeXusName[UserAffiliation[RunType]],
) -> UserAffiliation[RunType]:
return UserAffiliation[RunType](load_field(file, path))I am proposing to change this to
class UserAffiliation(sl.Scope[RunType, str], str):
"""User affiliation."""
def load_user_affiliation(
file: NeXusFileSpec[RunType],
path: NeXusName[UserAffiliation[RunType]],
) -> UserAffiliation[RunType]:
return UserAffiliation[RunType](load_field(
NeXusLocationSpec(
filename=file,
entry_name=path,
selection=...
))and adjust load_field accordingly. This looks the same from the outside but allows us to reuse the existing mechanism for specifying locations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it
| wf[NeXusName[UserInfo]] = '/entry/user_0' | ||
| user_info = wf.compute(UserInfo[SampleRun]) | ||
| assert user_info['affiliation'] == 'ESS' | ||
| assert user_info['name'] == 'John Doe' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should ideally be handled similarly to
| def load_metadata( |
and load one or more https://github.com/scipp/scippneutron/blob/3c49525dd89af68d375119c6f9072008f337dc6c/src/scippneutron/metadata/_model.py#L168
Does your code provide the basis for doing that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So are you suggesting that instead of having
class UserAffiliation(sl.Scope[RunType, str], str):
"""User affiliation."""
def load_user_affiliation(
file: NeXusFileSpec[RunType],
path: NeXusName[UserAffiliation[RunType]],
) -> UserAffiliation[RunType]:
return UserAffiliation[RunType](load_field(file, path))I would need to create a pydantic model for the UserAffiliation with a from_nexus_entry classmethod, and then my load_user_affiliation would call load_metadata instead of load_field?
I guess I could but it feels a bit overkill?
I agree that for the proton charge, it might be worth it, because it will be used by multiple instruments.
But for something like an ExposureTime which is very specific to imaging, I feel it's too much effort?
In addition, I already felt it was annoying to have to make changes in essreduce so I could use them in essimaging, now I would have to first change scippneutron, then essreduce, then imaging...
I will say it again: should scippneutron and essreduce be merged?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need to make a model for UserAffiliation. We already have a Person model. All you need is a generic domain type for it (Like your UserAffiliation) and a provider to load it. It doesn't have to use a class method in ScippNeutron. And the code that needs the affiliation can extract that from a Person.
I think that the generic workflow should be able to load all users and provide a way to select one. (by index, path, or name, probably) Especially because this will be relevant to all workflows in some way.
And please don't misunderstand me, I think it is good to have tools for loading anything from the file in a simple way. But people related info is used in a lot of places. So I think we should have a common, robust, and flexible solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, did I understand correctly that your objection was not adding load_field and load_group, but it was because I was using them to load the affiliation instead of using the load_metadata and Person model?
I just picked that field randomly, to illustrate that we can load something custom from the file. Is it better if I pick something else to load?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, did I understand correctly that your objection was not adding load_field and load_group, but it was because I was using them to load the affiliation instead of using the load_metadata and Person model?
Correct. I am fine with the example and test case. My comment was only about checking whether this implementation is useful for implementing a loader for Person.
I'm now thinking that it probably isn't because we need to load an unknown number of users in general. So your functions here are unrelated.
| """ | ||
| with open_nexus_file(filename.value) as f: | ||
| field = f[field_path] | ||
| return cast(sc.Variable | sc.DataArray, field[selection]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does scippnexus guarantee to return a variable even if the dataset is a string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. I'm actually not sure why the cast doesn't fail in the test where I am loading the user affiliation, which is just a string.
Should I just remove the cast?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cast does nothing at run time. It only narrows the type during type checking.
Should I just remove the cast?
And change the return type.
| field = f[field_path] | ||
| return field[selection] | ||
| with open_nexus_file(location.filename, definitions=definitions) as f: | ||
| field = f[location.entry_name] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be location.component_name. entry_name is only meant to identify the entry in the file. So really, this should be
| field = f[location.entry_name] | |
| entry = _unique_child_group(f, snx.NXentry, location.entry_name) | |
| field = entry[location.component_name] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I'm not sure I understand what the benefit is? Does _unique_child_group provide some security that we are missing otherwise?
As I understand, this would mean that in the test, I would now need to split the path into the entry_name and the component_name (which may go wrong)?
def load_user_affiliation(
file: NeXusFileSpec[RunType],
path: NeXusName[UserAffiliation[RunType]],
) -> UserAffiliation[RunType]:
psplit = path.split('/')
return UserAffiliation[RunType](
load_field(NeXusLocationSpec(filename=file.value, entry_name="/".join(psplit[:-1], component_name=psplit[-1]))
)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe it's because if we pass a NeXusLocationSpec that has a component_name, it would currently be ignored by load_field?
But the component_name is ignored by load_group. Is that expected? If so, should we document it better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is about staying closer to the existing code. In practice, you rarely have to specify entry_name. If it is None, _unique_child_group will find the entry. So you only need the path from the entry.
Also, 'entry' always refers to the NXentry. So using netry_name to point to a dataset or group would be confusing. So load_group needs to change accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In practice, you rarely have to specify entry_name. If it is None, _unique_child_group will find the entry. So you only need the path from the entry.
Sorry I still don't understand. In my example above, how would you then load the user affiliation?
You can't load a field directly with _unique_child_group, it will raise Expected a NeXus group as item '{name}' but got a field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I still don't understand. In my example above, how would you then load the user affiliation? You can't load a field directly with
_unique_child_group, it will raiseExpected a NeXus group as item '{name}' but got a field.
def load_user_affiliation(
file: NeXusFileSpec[RunType],
path: NeXusName[UserAffiliation[RunType]],
) -> UserAffiliation[RunType]:
return UserAffiliation[RunType](load_field(
NeXusLocationSpec(
filename=file,
component_name=path,
selection=...
))
# ...
wf[NeXusName[UserAffiliation[SampleRun]]] = 'user_0/affiliation'where NeXusName[UserAffiliation[SampleRun]] is relative to the entry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks
…ngle load_from_path function
Example (from the tests):