Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Application definitions #63

Merged
merged 25 commits into from
Nov 3, 2022
Merged

Application definitions #63

merged 25 commits into from
Nov 3, 2022

Conversation

SimonHeybrock
Copy link
Member

@SimonHeybrock SimonHeybrock commented Oct 17, 2022

Experimental (and incomplete) support for reading NXcanSAS.

Also fixes #65.

self._group = group

def child_strategy(self, group: H5Group):
if (definition_class := group.attrs.get(self.class_attribute)) is not None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This requires child classes to define class_attribute and strategies. @nvaytet and I talked about this recently and concluded that we should avoid this kind of requirement.
Can we avoid it here? That is, can this be implemented by defining those fields in ApplicationDefinitionStrategy? Or can we use a decorator instead of inheritance?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, this file is just a quick draft, to see if the strategy mechanism can be used for this. There are many things to improve here.


def child_strategy(self, group: H5Group):
if (definition_class := group.attrs.get(self.class_attribute)) is not None:
return self.strategies.get(definition_class)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function returns None when

  • there is no class attribute in the file
  • there is no child strategy for the given attribute

Shouldn't the 2nd case be a hard failure, i.e. something is not supported by scippnexus?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is for the application definition to decide. Without a custom strategy, child strategies are always set to None, meaning the default strategy will be used by the child. Conceptually, this means the parent "does not care about or control" the child.

pass


NXcanSAS.classes = [SASroot, SASentry, SASdata, SAStransmission_spectrum]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By inheriting from NXcanSAS in all classes, this application definition supports arbitrary nesting. E.g. a root inside data. Does this make sense? Should we limit it more? For instance using something like the nexus loader prototype in ess?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had considered this. My conclusion was that in many cases structure may not be something people want to enforce (it will prevent reading partially broken/non-standard files, for example). At the same time, you can write your strategy in a way that it enforces structure.

@@ -2,7 +2,7 @@
# Copyright (c) 2022 Scipp contributors (https://github.com/scipp)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach seems quite invasive in that it requires the NX* classes to know about and handle strategies. Is this necessary or could it be implemented on a higher level like the loader prototype in ess?
So what kind of modification to the behaviour of, e.g. NXdata does the application definition need to make which cannot be done as post processing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach seems quite invasive in that it requires the NX* classes to know about and handle strategies.

Well, this is the strategy pattern. It does not have to know about the details of the strategy.

Is this necessary or could it be implemented on a higher level like the loader prototype in ess?

Not really, you would need to fall back to h5py, since scippnexus will generally not be able to load groups that follow an application definition, e.g., because they define axes (dims) in a custom manner.

So what kind of modification to the behaviour of, e.g. NXdata does the application definition need to make which cannot be done as post processing?

Probably some could be made. But the post processing would need to reopen the file and read custom attributes, etc. So basically you could just go back and implemented your loader with plain h5py instead of scippnexus, as there might be little benefit.

@jl-wynen
Copy link
Member

Overall comment to the approach in #63 (comment)

@SimonHeybrock
Copy link
Member Author

SimonHeybrock commented Oct 24, 2022

@jl-wynen I refactored to use decorators (I will move this to a separate file once we are happy with it). One thing that bothers me is the creation of instances of the strategies for each group. I have considering changing the self argument of, e.g., NXdataStrategy or SASdata to group, and remove the __init__s that store the group. Then, instead of instantiating in the Nexus class, we would just call self._strategy.dims(self) instead of self._strategy.dims. This also looks a bit odd, but the strategy implementations themselves would be cleaner. Note they always have self._group.bla, with _group defined invisibly in a base class via the decorator.

@SimonHeybrock
Copy link
Member Author

@jl-wynen I refactored to use decorators (I will move this to a separate file once we are happy with it). One thing that bothers me is the creation of instances of the strategies for each group. I have considering changing the self argument of, e.g., NXdataStrategy or SASdata to group, and remove the __init__s that store the group. Then, instead of instantiating in the Nexus class, we would just call self._strategy.dims(self) instead of self._strategy.dims. This also looks a bit odd, but the strategy implementations themselves would be cleaner. Note they always have self._group.bla, with _group defined invisibly in a base class via the decorator.

I made this change, since it will make support for writing more convenient. Not perfectly happy with the result, but not worse that before either.

@SimonHeybrock SimonHeybrock marked this pull request as ready for review October 25, 2022 12:24
@SimonHeybrock
Copy link
Member Author

@jl-wynen I managed to cleanup the mechanism significantly. Can you have a look?

def __init__(self, data):
self.data = data

def __write_to_nexus_group__(self, group: NXobject):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there is a need for a dunder method since we are not going to support writing wildly different objects from different libraries. A protocol like this should be sufficient:

class NXWritable(Protocol):
    def write_to_nexus_group(self, group:NXobject):
        ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for not making this a normal method was that users may want to make their existing classes "writable", without implementing a new class, and without changing the publicly visible interface.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dunder method is public. In my mind, a dunder method relates to a general convention across libraries / codebases. In this case here, we basically have duck typing, we just require a certain method.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes public, but it does not interfere with tab completion, etc.

# Note that there is also an "uncertainties" attribute. It is not clear
# to me what the difference is.
coord.attrs['resolutions'] = 'Qdev'
group.create_field('Qdev', sc.stddevs(da.coords['Q']))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'resolution' and 'uncertainty' seem to be different concepts. I would interpret it as resolution referring to systematic effects (e.g. binning in Q, potentially implicitly induced by the detector resolution) and uncertainty to statistical errors.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if someone actually uses both they will need to provide this in a different manner.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But shouldn't we write uncertainties instead of resolutions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually people talk about "Q-resolution", and I do not think it is statistical. I don't know if anyone has decided how it will be stored, and for now just assumed they might use Scipp's variances.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, if you think this is safe. I would be worried that people treat a systematic uncertainty as though it was statistical and get bad fits.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will definitely need to check with the scientists, for now a lot of this is incomplete anyway.

if isinstance(axes, str):
axes = axes.split(" ")
if axes is None:
axes = group.attrs.get('I_axes')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be a string with multiple 'entries'? I.e. does this if need to be swapped with the previous one?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation does not say. Will probably need to fix this in the future if it causes problems.

axes = axes.split(" ")
if axes is None:
axes = group.attrs.get('I_axes')
if not isinstance(axes, list):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can axes be a numpy array?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Checking for int now instead.

class SASdataStrategy:

@staticmethod
def dims(group: NXobject) -> Tuple[str]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def dims(group: NXobject) -> Tuple[str]:
def dims(group: NXobject) -> Tuple[str, ...]:

src/scippnexus/nxobject.py Outdated Show resolved Hide resolved
self.child_params = {}
self._definition = definition
self._strategy = self._default_strategy()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self._strategy = self._default_strategy()

src/scippnexus/nxdata.py Show resolved Hide resolved
src/scippnexus/definitions/nxcansas.py Outdated Show resolved Hide resolved
tests/application_definition_test.py Show resolved Hide resolved
self.data = data
assert Q_variances in (None, 'uncertainties', 'resolutions')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I learned a surprising thing recently, assertions are removed when making optimised builds (e.g. in a wheel). So this should be a proper if-raise construct.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't know that either!

@SimonHeybrock SimonHeybrock merged commit 840c260 into main Nov 3, 2022
@SimonHeybrock SimonHeybrock deleted the application-definitions branch November 3, 2022 15:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Writing NeXus files based on application definitions
2 participants