Add load_field and load_group to allow loading custom entries using GenericNeXusWorkflow #278

nvaytet · 2025-11-11T10:35:44Z

Example (from the tests):

class UserAffiliation(sl.Scope[RunType, str], str):
    """User affiliation."""

def load_user_affiliation(
    file: NeXusFileSpec[RunType],
    path: NeXusName[UserAffiliation[RunType]],
) -> UserAffiliation[RunType]:
    return UserAffiliation[RunType](load_field(file, path))

wf = GenericNeXusWorkflow(run_types=[SampleRun], monitor_types=[])
wf.insert(load_user_affiliation)
wf[Filename[SampleRun]] = loki_tutorial_sample_run_60250
wf[NeXusName[UserAffiliation[SampleRun]]] = '/entry/user_0/affiliation'
affiliation = wf.compute(UserAffiliation[SampleRun])
assert affiliation == 'ESS'


nvaytet · 2025-11-11T10:37:20Z

tests/nexus/workflow_test.py

+        location: NeXusComponentLocationSpec[UserAffiliation, RunType],
+    ) -> UserAffiliation[RunType]:
+        return UserAffiliation[RunType](
+            load_component(location, nx_class=snx.NXuser)['affiliation']


I could not figure out how to just have load_component(location, nx_class=None) and (below) wf[NeXusName[UserAffiliation]] = '/entry/user_0/affiliation'...

I'm running into the error ValueError: Expected a NeXus group as item '/entry/title' but got a field.
Help welcome!

I am not sure I understand the approach at all: if we add a custom "load" provider anyway, why was the generic NeXus workflow modified? What role does it still play? Is it the location-spec mechanism?

Regarding the error, could it be related that load_component looks only within NXinstrument (except for the sample)? Is there some search logic that needs to be generalized?

> Is it the location-spec mechanism

Yes, I thought we needed to do things this way to re-use load_component properly.
If there is a less invasive way to just add a provider without modifying the generic workflow, it would be great if you could show me.
Thanks!

No that's fine... provided we can actually re-use load_component?

> Regarding the error, could it be related that load_component looks only within NXinstrument (except for the sample)?

Sorry, the error message was confusing. I copied the wrong message (from a previous iteration).
The error is actually
ValueError: Expected a NeXus group as item '/entry/user_0/affiliation' but got a field.

So it is because load_component expects to load a group and not a field.
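To make the failure mode concrete, here is a toy model of a NeXus tree in plain Python. Field, Group, resolve, and load_group_only are hypothetical stand-ins (not scippnexus or essreduce code); the sketch only shows why a loader that insists on groups rejects a leaf dataset such as /entry/user_0/affiliation.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class Field:
    """Stand-in for a NeXus dataset (a leaf node)."""
    value: Any


@dataclass
class Group:
    """Stand-in for a NeXus group (a node with named children)."""
    nx_class: str
    children: dict[str, Group | Field] = field(default_factory=dict)


def resolve(root: Group, path: str) -> Group | Field:
    """Walk a '/'-separated path starting at root."""
    node: Group | Field = root
    for name in path.strip('/').split('/'):
        if not isinstance(node, Group):
            raise KeyError(path)
        node = node.children[name]
    return node


def load_group_only(root: Group, path: str) -> Group:
    """Mimics a group-only loader: it refuses to return a field."""
    node = resolve(root, path)
    if isinstance(node, Field):
        raise ValueError(f"Expected a NeXus group as item '{path}' but got a field.")
    return node
```

Asking for the NXuser group works; asking for the affiliation dataset inside it raises, which matches the error reported above.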

load_component was made for any "physical component" (it handles positions, etc.). So maybe it is reasonable, and by design, that it fails? Which brings us back to the question of what you gain by trying to use the existing machinery instead of having a simple load provider.

I really don't mind either way; I just could not figure out how to add a simple load provider that hooks into the part of the workflow that opens the file for reading.

I thought that using the Filename[SampleRun] as input to the provider would mean we open the file multiple times, and I wanted to avoid that.
So I would welcome some pointers as to what the provider would take in as input so it can load a custom field. Thanks in advance 🙂

Why not use the same inputs that load_component does, but in a reusable provider (e.g. load_field or load_group)?

nvaytet · 2025-11-11T10:39:06Z

src/ess/reduce/nexus/workflow.py

 def LoadDetectorWorkflow(
-    *,
-    run_types: Iterable[sciline.typing.Key],
-    monitor_types: Iterable[sciline.typing.Key],


Not sure why we needed to have the monitor_types here, when we are loading detectors?
So I removed it.



nvaytet · 2025-11-14T14:52:48Z

src/ess/reduce/nexus/_nexus_loader.py

+def load_field(
+    filename: NeXusFileSpec,
+    field_path: str,
+    selection: snx.typing.ScippIndex | slice = (),


I decided to add the selection as an argument here instead of going the NeXusComponentLocationSpec way. That would have required creating a new TypeVar that is not restricted to Component, new types like NeXusEntryLocationSpec, and also a new function

def entry_spec_by_name(
    filename: NeXusFileSpec[RunType], name: NeXusName[EntryTypeVar]
) -> NeXusEntryLocationSpec[EntryTypeVar, RunType]: ...

(the analog to component_spec_by_name: https://github.com/scipp/essreduce/blob/main/src/ess/reduce/nexus/workflow.py#L89).

The current approach is less invasive.
But I don't mind implementing the above if there is a use for it.

Right now, load_field would be called inside custom providers, so selection parameters could simply be added to such a provider, e.g.

def load_custom_entry(
    file: NeXusFileSpec[RunType],
    path: NeXusName[MyEntry[RunType]],
    start: MyEntryRangeStart[RunType],
    end: MyEntryRangeEnd[RunType],
) -> MyEntry[RunType]:
    return MyEntry[RunType](load_field(file, path, selection=slice(start, end)))
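The pattern above (range parameters turned into a slice inside the provider) can be sketched without any NeXus dependencies. FILE, load_field, and load_custom_entry below are hypothetical toy versions operating on a dict of lists; the real load_field reads from a NeXus file, and an empty selection () is assumed to mean "load everything", as with scippnexus indexing.

```python
from __future__ import annotations

from typing import Any

# Toy "file": absolute paths mapped to field values (hypothetical stand-in
# for a NeXus file, not the real NeXusFileSpec).
FILE: dict[str, Any] = {
    '/entry/instrument/detector/frame_time': [10, 20, 30, 40, 50],
}


def load_field(file: dict[str, Any], path: str, selection: slice | tuple = ()) -> Any:
    """Return the field at `path`; an empty selection loads everything."""
    value = file[path]
    if selection == ():
        return value
    return value[selection]


def load_custom_entry(file: dict[str, Any], path: str, start: int, end: int) -> Any:
    """Provider-style wrapper: two range parameters become one slice."""
    return load_field(file, path, selection=slice(start, end))
```

With this design, every custom provider that needs a range also needs its own pair of start/end domain types, which is the cost raised in the reply below.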

Not sure what other cases could be needed?

Can't you just use NeXusLocationSpec? It underlies NeXusComponentLocationSpec among others. So in a workflow, we only need a single new domain type instead of 2 custom range types as in your example.

> Can't you just use NeXusLocationSpec

I don't think so, because NeXusLocationSpec is not generic. I would need to depend on both RunType and the EntryTypeVar, so I have to create a new type no matter what?
Unless I missed something?
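The concern here is that a workflow key must be distinct per entry type and per run type. A minimal sketch with plain typing.Generic shows what a two-parameter generic buys: each parameter combination is a separate, hashable key. EntryLocationSpec, UserAffiliation, SampleRun, and BackgroundRun are hypothetical stand-ins, not the essreduce types; a non-generic class like NeXusLocationSpec would give only a single key.

```python
from typing import Generic, TypeVar, get_args

EntryT = TypeVar('EntryT')
RunT = TypeVar('RunT')


# Hypothetical marker types standing in for domain/run types.
class UserAffiliation: ...
class SampleRun: ...
class BackgroundRun: ...


class EntryLocationSpec(Generic[EntryT, RunT]):
    """Hypothetical spec that is generic in both the entry and the run type."""

    def __init__(self, path: str) -> None:
        self.path = path


# Each subscription is its own key, so a sciline-style workflow could
# register one provider per (entry, run) combination.
sample_key = EntryLocationSpec[UserAffiliation, SampleRun]
background_key = EntryLocationSpec[UserAffiliation, BackgroundRun]
```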

I meant as an argument type to load_field and load_group, because they are not usable as providers anyway. When you need a provider, you have to wrap these functions and define custom domain types anyway. So you may as well use the same mechanism as the existing code.

So if load_field takes in a NeXusLocationSpec as input, I either have to repeat the filename, e.g.

wf[Filename[SampleRun]] = loki_tutorial_sample_run_60250()
wf[NeXusLocationSpec[UserAffiliation[SampleRun]]] = NeXusLocationSpec(
    filename=loki_tutorial_sample_run_60250(),
    entry_name='/entry/user_0/affiliation',
    selection=...,
)

or add a function like component_spec_by_name that makes the transition from NeXusFileSpec and NeXusName to a NeXusLocationSpec.

As mentioned above, I would then have to make a new generic NeXusEntryLocationSpec which uses a new TypeVar not limited to what is covered by Component, etc.

> So you may as well use the same mechanism as the existing code.

That's what I tried to explain as to why I didn't use the same mechanism.
But I also said I don't mind implementing that if you think it's better.

I am proposing something different: as shown in your example, you have to wrap load_field in a separate provider function:

class UserAffiliation(sl.Scope[RunType, str], str):
    """User affiliation."""

def load_user_affiliation(
    file: NeXusFileSpec[RunType],
    path: NeXusName[UserAffiliation[RunType]],
) -> UserAffiliation[RunType]:
    return UserAffiliation[RunType](load_field(file, path))

I am proposing to change this to

class UserAffiliation(sl.Scope[RunType, str], str):
    """User affiliation."""

def load_user_affiliation(
    file: NeXusFileSpec[RunType],
    path: NeXusName[UserAffiliation[RunType]],
) -> UserAffiliation[RunType]:
    return UserAffiliation[RunType](
        load_field(NeXusLocationSpec(filename=file, entry_name=path, selection=...))
    )

and adjust load_field accordingly. This looks the same from the outside but allows us to reuse the existing mechanism for specifying locations.
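A toy version of this proposal, assuming nothing about essreduce beyond what is quoted here: LocationSpec is a hypothetical, simplified stand-in for NeXusLocationSpec (filename, component_name, selection), and FILES replaces a real NeXus file. The provider keeps its (file, path) signature, but load_field receives a single location object, so a range travels as one value instead of separate start/end types.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any

# Toy "files": filename -> {path: value}. Hypothetical stand-in data.
FILES: dict[str, dict[str, Any]] = {
    'run.nxs': {
        '/entry/user_0/affiliation': 'ESS',
        '/entry/frame_time': [10, 20, 30],
    },
}


@dataclass(frozen=True)
class LocationSpec:
    """Hypothetical, simplified stand-in for NeXusLocationSpec."""
    filename: str
    component_name: str
    selection: Any = ()


def load_field(location: LocationSpec) -> Any:
    """Load one value; an empty selection means 'everything'."""
    data = FILES[location.filename][location.component_name]
    return data if location.selection == () else data[location.selection]


def load_user_affiliation(file: str, path: str) -> str:
    """Same inputs as before; the location is bundled before loading."""
    return load_field(LocationSpec(filename=file, component_name=path))
```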

jl-wynen · 2025-11-17T08:22:50Z

tests/nexus/workflow_test.py

+    wf[NeXusName[UserInfo]] = '/entry/user_0'
+    user_info = wf.compute(UserInfo[SampleRun])
+    assert user_info['affiliation'] == 'ESS'
+    assert user_info['name'] == 'John Doe'


This should ideally be handled similarly to load_metadata (essreduce/src/ess/reduce/nexus/_nexus_loader.py, line 104 in b46edac) and load one or more instances of the model defined at https://github.com/scipp/scippneutron/blob/3c49525dd89af68d375119c6f9072008f337dc6c/src/scippneutron/metadata/_model.py#L168

Does your code provide the basis for doing that?

So are you suggesting that instead of having

class UserAffiliation(sl.Scope[RunType, str], str):
    """User affiliation."""

def load_user_affiliation(
    file: NeXusFileSpec[RunType],
    path: NeXusName[UserAffiliation[RunType]],
) -> UserAffiliation[RunType]:
    return UserAffiliation[RunType](load_field(file, path))

I would need to create a pydantic model for the UserAffiliation with a from_nexus_entry classmethod, and then my load_user_affiliation would call load_metadata instead of load_field?

I guess I could but it feels a bit overkill?
I agree that for the proton charge, it might be worth it, because it will be used by multiple instruments.
But for something like an ExposureTime which is very specific to imaging, I feel it's too much effort?

In addition, I already felt it was annoying to have to make changes in essreduce so I could use them in essimaging, now I would have to first change scippneutron, then essreduce, then imaging...
I will say it again: should scippneutron and essreduce be merged?

You don't need to make a model for UserAffiliation. We already have a Person model. All you need is a generic domain type for it (Like your UserAffiliation) and a provider to load it. It doesn't have to use a class method in ScippNeutron. And the code that needs the affiliation can extract that from a Person.

I think that the generic workflow should be able to load all users and provide a way to select one. (by index, path, or name, probably) Especially because this will be relevant to all workflows in some way.

And please don't misunderstand me, I think it is good to have tools for loading anything from the file in a simple way. But people related info is used in a lot of places. So I think we should have a common, robust, and flexible solution.
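Selecting one user out of an unknown number could look roughly like the sketch below. Person here is a deliberately simplified stand-in (the real scippneutron model has more fields, and the field names used here are assumptions), and select_user is a hypothetical helper that picks exactly one user by index, name, or path.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    """Simplified stand-in for a person/user metadata model."""
    name: str
    affiliation: str | None = None
    path: str | None = None  # location of the NXuser group in the file


def select_user(
    users: list[Person],
    *,
    index: int | None = None,
    name: str | None = None,
    path: str | None = None,
) -> Person:
    """Pick exactly one user by index, name, or path (hypothetical helper)."""
    if sum(x is not None for x in (index, name, path)) != 1:
        raise ValueError('Specify exactly one of index, name, or path')
    if index is not None:
        return users[index]
    matches = [
        u for u in users
        if (name is not None and u.name == name) or (path is not None and u.path == path)
    ]
    if len(matches) != 1:
        raise LookupError(f'Expected exactly one matching user, found {len(matches)}')
    return matches[0]
```

Code that only needs the affiliation can then read it from the selected Person, as suggested above.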

So, did I understand correctly that your objection was not adding load_field and load_group, but it was because I was using them to load the affiliation instead of using the load_metadata and Person model?

I just picked that field randomly, to illustrate that we can load something custom from the file. Is it better if I pick something else to load?

> So, did I understand correctly that your objection was not adding load_field and load_group, but it was because I was using them to load the affiliation instead of using the load_metadata and Person model?

Correct. I am fine with the example and test case. My comment was only about checking whether this implementation is useful for implementing a loader for Person.
I'm now thinking that it probably isn't because we need to load an unknown number of users in general. So your functions here are unrelated.

jl-wynen · 2025-11-17T08:23:56Z

src/ess/reduce/nexus/_nexus_loader.py

+    """
+    with open_nexus_file(filename.value) as f:
+        field = f[field_path]
+        return cast(sc.Variable | sc.DataArray, field[selection])


Does scippnexus guarantee that it returns a variable even if the dataset is a string?

I don't think so. I'm actually not sure why the cast doesn't fail in the test where I am loading the user affiliation, which is just a string.

Should I just remove the cast?

cast does nothing at run time. It only narrows the type during type checking.
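This is why the test with a string affiliation passed: typing.cast only informs the type checker and returns its argument unchanged, so a str "cast" to another type is still a str at run time.

```python
from typing import cast

value = "ESS"  # e.g. a string read from a dataset
as_int = cast(int, value)  # type checkers now treat as_int as an int ...

# ... but at run time nothing was converted or checked.
runtime_type = type(as_int).__name__
```

The honest fix is therefore the one suggested next: drop the cast and make the annotated return type cover what can actually come back.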

> Should I just remove the cast?

And change the return type.

jl-wynen · 2025-11-18T08:25:53Z

src/ess/reduce/nexus/_nexus_loader.py

-        field = f[field_path]
-        return field[selection]
+    with open_nexus_file(location.filename, definitions=definitions) as f:
+        field = f[location.entry_name]


This should be location.component_name. entry_name is only meant to identify the entry in the file. So really, this should be

Suggested change

-        field = f[location.entry_name]
+        entry = _unique_child_group(f, snx.NXentry, location.entry_name)
+        field = entry[location.component_name]

Hmm, I'm not sure I understand what the benefit is? Does _unique_child_group provide some security that we are missing otherwise?

As I understand, this would mean that in the test, I would now need to split the path into the entry_name and the component_name (which may go wrong)?

def load_user_affiliation(
    file: NeXusFileSpec[RunType],
    path: NeXusName[UserAffiliation[RunType]],
) -> UserAffiliation[RunType]:
    psplit = path.split('/')
    return UserAffiliation[RunType](
        load_field(
            NeXusLocationSpec(
                filename=file.value,
                entry_name='/'.join(psplit[:-1]),
                component_name=psplit[-1],
            )
        )
    )

Or maybe it's because if we pass a NeXusLocationSpec that has a component_name, it would currently be ignored by load_field?

But the component_name is ignored by load_group. Is that expected? If so, should we document it better?

This is about staying closer to the existing code. In practice, you rarely have to specify entry_name. If it is None, _unique_child_group will find the entry. So you only need the path from the entry.

Also, 'entry' always refers to the NXentry. So using entry_name to point to a dataset or group would be confusing. So load_group needs to change accordingly.
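The behavior described here (an explicit name wins, otherwise there must be exactly one child group of the requested class) can be sketched as follows. This is a hypothetical re-implementation of the idea on a toy tree, not the actual _unique_child_group; plain strings stand in for fields.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class Group:
    """Toy NeXus group; non-Group children stand in for fields."""
    nx_class: str
    children: dict[str, Any] = field(default_factory=dict)


def unique_child_group(parent: Group, nx_class: str, name: str | None) -> Group:
    """Return the named child group, or the single child of class nx_class."""
    if name is not None:
        child = parent.children[name]
        if not isinstance(child, Group):
            raise ValueError(f"Expected a NeXus group as item '{name}' but got a field.")
        return child
    candidates = [
        c for c in parent.children.values()
        if isinstance(c, Group) and c.nx_class == nx_class
    ]
    if len(candidates) != 1:
        raise ValueError(f'Expected exactly one {nx_class} group, found {len(candidates)}')
    return candidates[0]
```

With a single NXentry in the file, entry_name can stay None and only the path inside the entry needs to be given.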

> In practice, you rarely have to specify entry_name. If it is None, _unique_child_group will find the entry. So you only need the path from the entry.

Sorry I still don't understand. In my example above, how would you then load the user affiliation?
You can't load a field directly with _unique_child_group, it will raise Expected a NeXus group as item '{name}' but got a field.

> Sorry I still don't understand. In my example above, how would you then load the user affiliation? You can't load a field directly with _unique_child_group, it will raise Expected a NeXus group as item '{name}' but got a field.

def load_user_affiliation(
    file: NeXusFileSpec[RunType],
    path: NeXusName[UserAffiliation[RunType]],
) -> UserAffiliation[RunType]:
    return UserAffiliation[RunType](
        load_field(NeXusLocationSpec(filename=file, component_name=path, selection=...))
    )

# ...
wf[NeXusName[UserAffiliation[SampleRun]]] = 'user_0/affiliation'

where NeXusName[UserAffiliation[SampleRun]] is relative to the entry.
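Putting the two pieces together (find the entry, then resolve a path relative to it, returning either a field value or a whole group) gives roughly the shape of the single load_from_path function the PR ended up with. The sketch below is a hypothetical toy on an in-memory tree; its signature and return types are assumptions, not the merged essreduce code.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class Field:
    value: Any


@dataclass
class Group:
    nx_class: str
    children: dict[str, Group | Field] = field(default_factory=dict)


def _load(node: Group | Field) -> Any:
    """A field loads as its value; a group loads as a dict of its children."""
    if isinstance(node, Field):
        return node.value
    return {name: _load(child) for name, child in node.children.items()}


def load_from_path(root: Group, component_name: str, entry_name: str | None = None) -> Any:
    """Resolve component_name relative to the (possibly auto-detected) NXentry."""
    if entry_name is None:
        entries = [
            g for g in root.children.values()
            if isinstance(g, Group) and g.nx_class == 'NXentry'
        ]
        if len(entries) != 1:
            raise ValueError('Cannot pick a unique NXentry; pass entry_name')
        entry = entries[0]
    else:
        entry = root.children[entry_name]
    node: Group | Field = entry
    for name in component_name.strip('/').split('/'):
        if not isinstance(node, Group):
            raise KeyError(component_name)
        node = node.children[name]
    return _load(node)
```

The same call then serves both the field case ('user_0/affiliation') and the group case ('user_0'), so no separate load_field/load_group split is needed.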


nvaytet added 3 commits November 11, 2025 11:07

add component_types to contraints to allow loading custom entries in the nexus file

681179c

add tests

50ad6a0

formatting

43df02f

nvaytet commented Nov 11, 2025


fix LoadDetectorWorkflow tests

ca661a1

nvaytet mentioned this pull request Nov 11, 2025

Image normalization workflow for TBL/Orca scipp/essimaging#127

Merged

nvaytet requested a review from SimonHeybrock November 11, 2025 10:53

nvaytet and others added 3 commits November 12, 2025 13:10

Merge branch 'main' into component-types

b95ef2f

trying without component_types and adding a load_field

8fbb233

Merge branch 'component-types' of github.com:scipp/essreduce into component-types

f5a448e

nvaytet marked this pull request as draft November 14, 2025 12:02

pre-commit-ci-lite bot and others added 3 commits November 14, 2025 12:02

Apply automatic formatting

49b4b4c

remove component_types in favor of providers that use load_field and load_group

e18cd6b

Apply automatic formatting

c43c1b0

nvaytet changed the title ~~Add component types contraints to allow loading custom component in a nexus file~~ Add load_field and load_group to make it simpler to load custom entries using GenericNeXusWorkflow Nov 14, 2025

nvaytet commented Nov 14, 2025


nvaytet marked this pull request as ready for review November 14, 2025 14:53

nvaytet changed the title ~~Add load_field and load_group to make it simpler to load custom entries using GenericNeXusWorkflow~~ Add load_field and load_group to allow loading custom entries using GenericNeXusWorkflow Nov 14, 2025

jl-wynen reviewed Nov 17, 2025


nvaytet added 2 commits November 17, 2025 11:58

remove cast for load_field

0f3fd11

use NeXusLocationSpec

c6e1bf8

jl-wynen requested changes Nov 18, 2025


nvaytet and others added 2 commits November 19, 2025 11:19

use _unique_child_group and merge load_field and load_group into a single load_from_path function

f19dd47

Apply automatic formatting

f4b1611

jl-wynen approved these changes Nov 19, 2025


nvaytet merged commit 194e23f into main Nov 19, 2025
4 checks passed

nvaytet deleted the component-types branch November 19, 2025 10:48
