Skip to content

Conversation

@nvaytet
Copy link
Member

@nvaytet nvaytet commented Nov 11, 2025

Example (from the tests):

class UserAffiliation(sl.Scope[RunType, str], str):
    """User affiliation."""

def load_user_affiliation(
    file: NeXusFileSpec[RunType],
    path: NeXusName[UserAffiliation[RunType]],
) -> UserAffiliation[RunType]:
    return UserAffiliation[RunType](load_field(file, path))

wf = GenericNeXusWorkflow(run_types=[SampleRun], monitor_types=[])
wf.insert(load_user_affiliation)
wf[Filename[SampleRun]] = loki_tutorial_sample_run_60250
wf[NeXusName[UserAffiliation[SampleRun]]] = '/entry/user_0/affiliation'
affiliation = wf.compute(UserAffiliation[SampleRun])
assert affiliation == 'ESS'

location: NeXusComponentLocationSpec[UserAffiliation, RunType],
) -> UserAffiliation[RunType]:
return UserAffiliation[RunType](
load_component(location, nx_class=snx.NXuser)['affiliation']
Copy link
Member Author

@nvaytet nvaytet Nov 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could not figure out how to just have load_component(location, nx_class=None) and (below) wf[NeXusName[UserAffiliation]] = '/entry/user_0/affiliation'...

I'm running into the error ValueError: Expected a NeXus group as item '/entry/title' but got a field.
Help welcome!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I understand the approach at all: If we anyway add a custom "load" provider, why was the generic NeXus workflow modified? What role does it still play? Is it the location-spec mechanism?

Regarding the error, could it be related that load_component looks only within NXinstrument (except for the sample)? Is there some search logic that needs to be generalized?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it the location-spec mechanism

Yes, I thought we needed to do things this way to re-use the load_component properly.
If there is a less invasive way to just add a provider and not modify the generic workflow, then it would be great if you can show me?
thanks

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No that's fine... provided we can actually re-use load_component?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the error, could it be related that load_component looks only within NXinstrument (except for the sample)?

Sorry, the error message was confusing. I copied the wrong message (from a previous iteration).
The error is actually
ValueError: Expected a NeXus group as item '/entry/user_0/affiliation' but got a field.

So it is because load_component expects to load a group and not a field.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

load_component was made for any "physical component" (it handles positions, etc.). Say maybe it is reasonable and by design that it fails? Which brings us back to the question of what you gain by trying to uses the existing machinery instead of having a simple load provider?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't mind either way, I could not figure out how to properly just add a simple load provider which would hook into the part of the workflow that opens the file for reading.

I thought that using the Filename[SampleRun] as input to the provider would mean we open the file multiple times, and I wanted to avoid that.
So I would welcome some pointers as to what the provider would take in as input so it can load a custom field. Thanks in advance 🙂

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use the same inputs that load_component does, but in a reusable provider (load_field or load_group?)?

def LoadDetectorWorkflow(
*,
run_types: Iterable[sciline.typing.Key],
monitor_types: Iterable[sciline.typing.Key],
Copy link
Member Author

@nvaytet nvaytet Nov 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why we needed to have the monitor_types here, when we are loading detectors?
So I removed it.

@nvaytet nvaytet marked this pull request as draft November 14, 2025 12:02
@nvaytet nvaytet changed the title Add component types contraints to allow loading custom component in a nexus file Add load_field and load_group to make it simpler to load custom entries using GenericNeXusWorkflow Nov 14, 2025
def load_field(
filename: NeXusFileSpec,
field_path: str,
selection: snx.typing.ScippIndex | slice = (),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to add the selection as an arg here instead of going the NeXusComponentLocationSpec way as it would have required creating a new TypeVar which was not only for Component, and then creating new types like NeXusEntryLocationSpec and also some new

def entry_spec_by_name(
    filename: NeXusFileSpec[RunType], name: NeXusName[EntryTypeVar]
) -> NeXusEntryLocationSpec[EntryTypeVar, RunType]:

(the analog to component_spec_by_name: https://github.com/scipp/essreduce/blob/main/src/ess/reduce/nexus/workflow.py#L89).

The current approach is less invasive.
But I don't mind implementing the above if there would be a use for it?

Right now, since the load_field would be called inside a custom providers, and selection parameters could simple be added to that provider, e.g.

def load_custom_entry(
    file: NeXusFileSpec[RunType],
    path: NeXusName[MyEntry[RunType]],
    start: MyEntryRangeStart[RunType],
    end: MyEntryRangeEnd[RunType]
) -> MyEntry[RunType]:
    return MyEntry[RunType](load_field(file, path, selection=slice(start, end)))

Not sure what other cases could be needed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't you just use NeXusLocationSpec? It underlies NeXusComponentLocationSpec among others. So in a workflow, we only need a single new domain type instead of 2 custom range types as in your example.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't you just use NeXusLocationSpec

I don't think so because NeXusLocationSpec is not a generic. I would need to depend on both RunType and also the EntryTypeVar so I have to create a new one no matter what?
Unless I missed something?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant as an argument type to load_field and load_group because they are anyway not usable as providers. When you need a provider, you anyway have to wrap these functions and define custom domain types. So you may as well use the same mechanism as the existing code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if load_field takes in a NeXusLocationSpec as input, I either have to repeat the filename, e.g.

wf[Filename[SampleRun]] = loki_tutorial_sample_run_60250()
wf[NeXusLocationSpec[UserAffiliation[SampleRun]]] = NeXusLocationSpec(
    filename=loki_tutorial_sample_run_60250(),
    entry_name='/entry/user_0/affiliation',
    selection=...
)

or add a function like component_spec_by_name that makes the transition from NeXusFileSpec and NeXusName to a NeXusLocationSpec.

As mentioned above, I would then have to make a new generic NeXusEntryLocationSpec which uses a new TypeVar not limited to what is covered by Component, etc.

So you may as well use the same mechanism as the existing code.

That's what I tried to explain as to why I didn't use the same mechanism.
But I also said I don't mind implementing that if you think it's better.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am proposing something different: As shown in you example, you have to wrap load_field in a separate provider function:

class UserAffiliation(sl.Scope[RunType, str], str):
    """User affiliation."""

def load_user_affiliation(
    file: NeXusFileSpec[RunType],
    path: NeXusName[UserAffiliation[RunType]],
) -> UserAffiliation[RunType]:
    return UserAffiliation[RunType](load_field(file, path))

I am proposing to change this to

class UserAffiliation(sl.Scope[RunType, str], str):
    """User affiliation."""

def load_user_affiliation(
    file: NeXusFileSpec[RunType],
    path: NeXusName[UserAffiliation[RunType]],
) -> UserAffiliation[RunType]:
    return UserAffiliation[RunType](load_field(
        NeXusLocationSpec(
                filename=file,
                entry_name=path,
                selection=...
        ))

and adjust load_field accordingly. This looks the same from the outside but allows us to reuse the existing mechanism for specifying locations.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it

@nvaytet nvaytet marked this pull request as ready for review November 14, 2025 14:53
@nvaytet nvaytet changed the title Add load_field and load_group to make it simpler to load custom entries using GenericNeXusWorkflow Add load_field and load_group to allow loading custom entries using GenericNeXusWorkflow Nov 14, 2025
wf[NeXusName[UserInfo]] = '/entry/user_0'
user_info = wf.compute(UserInfo[SampleRun])
assert user_info['affiliation'] == 'ESS'
assert user_info['name'] == 'John Doe'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should ideally be handled similarly to


and load one or more https://github.com/scipp/scippneutron/blob/3c49525dd89af68d375119c6f9072008f337dc6c/src/scippneutron/metadata/_model.py#L168

Does your code provide the basis for doing that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So are you suggesting that instead of having

    class UserAffiliation(sl.Scope[RunType, str], str):
        """User affiliation."""

    def load_user_affiliation(
        file: NeXusFileSpec[RunType],
        path: NeXusName[UserAffiliation[RunType]],
    ) -> UserAffiliation[RunType]:
        return UserAffiliation[RunType](load_field(file, path))

I would need to create a pydantic model for the UserAffiliation with a from_nexus_entry classmethod, and then my load_user_affiliation would call load_metadata instead of load_field?

I guess I could but it feels a bit overkill?
I agree that for the proton charge, it might be worth it, because it will be used by multiple instruments.
But for something like an ExposureTime which is very specific to imaging, I feel it's too much effort?

In addition, I already felt it was annoying to have to make changes in essreduce so I could use them in essimaging, now I would have to first change scippneutron, then essreduce, then imaging...
I will say it again: should scippneutron and essreduce be merged?

Copy link
Member

@jl-wynen jl-wynen Nov 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to make a model for UserAffiliation. We already have a Person model. All you need is a generic domain type for it (Like your UserAffiliation) and a provider to load it. It doesn't have to use a class method in ScippNeutron. And the code that needs the affiliation can extract that from a Person.

I think that the generic workflow should be able to load all users and provide a way to select one. (by index, path, or name, probably) Especially because this will be relevant to all workflows in some way.

And please don't misunderstand me, I think it is good to have tools for loading anything from the file in a simple way. But people related info is used in a lot of places. So I think we should have a common, robust, and flexible solution.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, did I understand correctly that your objection was not adding load_field and load_group, but it was because I was using them to load the affiliation instead of using the load_metadata and Person model?

I just picked that field randomly, to illustrate that we can load something custom from the file. Is it better if I pick something else to load?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, did I understand correctly that your objection was not adding load_field and load_group, but it was because I was using them to load the affiliation instead of using the load_metadata and Person model?

Correct. I am fine with the example and test case. My comment was only about checking whether this implementation is useful for implementing a loader for Person.
I'm now thinking that it probably isn't because we need to load an unknown number of users in general. So your functions here are unrelated.

"""
with open_nexus_file(filename.value) as f:
field = f[field_path]
return cast(sc.Variable | sc.DataArray, field[selection])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does scippnexus guarantee to return a variable even if the dataset is a string?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. I'm actually not sure why the cast doesn't fail in the test where I am loading the user affiliation, which is just a string.

Should I just remove the cast?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cast does nothing at run time. It only narrows the type during type checking.

Should I just remove the cast?

And change the return type.

field = f[field_path]
return field[selection]
with open_nexus_file(location.filename, definitions=definitions) as f:
field = f[location.entry_name]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be location.component_name. entry_name is only meant to identify the entry in the file. So really, this should be

Suggested change
field = f[location.entry_name]
entry = _unique_child_group(f, snx.NXentry, location.entry_name)
field = entry[location.component_name]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I'm not sure I understand what the benefit is? Does _unique_child_group provide some security that we are missing otherwise?

As I understand, this would mean that in the test, I would now need to split the path into the entry_name and the component_name (which may go wrong)?

    def load_user_affiliation(
        file: NeXusFileSpec[RunType],
        path: NeXusName[UserAffiliation[RunType]],
    ) -> UserAffiliation[RunType]:
        psplit = path.split('/')
        return UserAffiliation[RunType](
            load_field(NeXusLocationSpec(filename=file.value, entry_name="/".join(psplit[:-1], component_name=psplit[-1]))
        )

Copy link
Member Author

@nvaytet nvaytet Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe it's because if we pass a NeXusLocationSpec that has a component_name, it would currently be ignored by load_field?

But the component_name is ignored by load_group. Is that expected? If so, should we document it better?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is about staying closer to the existing code. In practice, you rarely have to specify entry_name. If it is None, _unique_child_group will find the entry. So you only need the path from the entry.

Also, 'entry' always refers to the NXentry. So using netry_name to point to a dataset or group would be confusing. So load_group needs to change accordingly.

Copy link
Member Author

@nvaytet nvaytet Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In practice, you rarely have to specify entry_name. If it is None, _unique_child_group will find the entry. So you only need the path from the entry.

Sorry I still don't understand. In my example above, how would you then load the user affiliation?
You can't load a field directly with _unique_child_group, it will raise Expected a NeXus group as item '{name}' but got a field.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I still don't understand. In my example above, how would you then load the user affiliation? You can't load a field directly with _unique_child_group, it will raise Expected a NeXus group as item '{name}' but got a field.

def load_user_affiliation(
    file: NeXusFileSpec[RunType],
    path: NeXusName[UserAffiliation[RunType]],
) -> UserAffiliation[RunType]:
    return UserAffiliation[RunType](load_field(
        NeXusLocationSpec(
                filename=file,
                component_name=path,
                selection=...
        ))

# ...

wf[NeXusName[UserAffiliation[SampleRun]]] = 'user_0/affiliation'

where NeXusName[UserAffiliation[SampleRun]] is relative to the entry.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks

@nvaytet nvaytet merged commit 194e23f into main Nov 19, 2025
4 checks passed
@nvaytet nvaytet deleted the component-types branch November 19, 2025 10:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants