New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-25447: Add support for read-only components #319
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks good. I have some of the same concerns on the big picture and increasing complexity that you do, but I'm mostly happy to trust that you've looked into alternatives more than I have and have chosen least-bad options. On the off chance a thought I had for simplifying things might not be one you've considered, I've left a couple of comments on those big-picture things at various points in the PR.
One more quick big-picture thought: in terms of nomenclature, I wonder if we should not be calling these "readComponents" a type of "component" at all - maybe "properties", with what we're calling "allComponents" here (i.e. real components and properties) -> "attributes"? Just a thought - I don't like those names enough better that I'd want to push for changing things this late in the game, but the naming right now feels like we started out by thinking of these as another kind of component, and by the end it became clear that they were more different than similar.
config/storageClasses.yaml
Outdated
Point2I: | ||
pytype: lsst.geom.Point2I | ||
Filter: | ||
pytype: lsst.afw.image.Filter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This gets me wondering about whether we should allow (maybe just in this context, maybe more broadly) regular Python classes to be used as StorageClasses
when their definitions are trivial, rather than requiring those to be added explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's an interesting thought albeit not for this ticket. So for StorageClass("lsst.afw.image.Filter")
then rather than defaulting to object
we assume that the pytype name is the storage class name... (we could even adopt the convention that names that start with upper case are never pytypes). Worth pondering.
if isReadOnlyComponent: | ||
inMemoryDataset = writeStorageClass.assembler().handleParameters(inMemoryDataset, parameters) | ||
# Then disable parameters for later | ||
parameters = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a concrete case where this matters? I'm just wondering if we should say, "you can't pass parameters if you want a read-only component" to try to make things a little simpler.
Edit: after writing the above I saw the commit message that referenced an origin
parameter to bbox
. I'd be okay with just dropping support for that (origin=PARENT
is all we need - the box returned with origin=LOCAL
is trivially calculable from the one returned with origin=PARENT
). I'd be fine with dropping the xy0
and dimensions
read components as being similarly redundant, though I gather that doesn't gain us as much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll ponder this some more when I write up the formatter/assembler documentation. Stating that parameters are never allowed for read-only components is the easiest approach.
I called them read components because from the point of view of the caller they act exactly like components. The caller doesn't even know that some are read/write and some are read-only because they have no idea which of the components ends up being important for disassembly. We still refer to them as "composite.component" and calling them a property or attribute might be easier to talk about internally (since talking about read-write component relevant for read-only component is annoying at times) but for the general interface they are components. |
dfb1217
to
b3e3bd9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! I'm glad the last review pass was useful. I don't have much to add in this pass - just minor comments - though some of the old ones are still relevant as well.
python/lsst/daf/butler/registry/datasets/byDimensions/_manager.py
Outdated
Show resolved
Hide resolved
This necessitated a allComponents method to return the read/write components and read-only components together.
No longer used.
This allows us to test specialist formatter capabilities without having to expand the generic YAML and pickle formatters. It also allows us to test read parameters in formatters in addition to assemblers.
For a disassembled component the formatter is only interested in reading the entire file, not extracting a component from the component.
This enables disassembly testing with the new formatters without involving butler registry.
We have to process the parameters before we extract the read-only component.
This creates dataset types for all the components.
We no longer store components at the registry level so there is not much to be gained by registering the dataset types.
This requires a new method on assembler to return which component should be used to calculate the read-only component.
Parameters for read-only components are problematic since it is not entirely clear whether the parameters should be applied to the component that is used to calculate the read-only component, or should be applied to the calculation of the read-only component itself. The complication is that assemblers must support the same parameters as formatters (otherwise in-memory datastore can not function) and also can apply parameters that the formatter did not understand. In the current implementation this means that the assembler can only see the final storage class and at that point parameters might not be relevant. As concrete examples. In the test suite I have added a "counter" read-only component that returns the number of elements in metrics.data. This means that the assembler sees an Integer storage class and integers aren't amenable to parameters. Instead the "slice" parameter is assumed to apply to the "data" component and then "counter" is calculated on the sliced "data". Another example is bbox in Exposure. This can be applied to the "image" component. Where should parameters go? To the calculation of the "image" or to the calculation of the bounding box (origin is the only one). Once the bounding box is created though the "origin" parameter has no meaning so "origin" is only relevant when passed to the "image" and can't be relevant to the assembler. Therefore read-only components for disassembled composites do not receive any parameters.
…mponent Check that a component does have a parent but a composite does not. The parent storage class is not checked to determine if the component is defined by it.
32cf9eb
to
235453e
Compare
This required changes to how pickling and deep copy worked. Also to simplify pickling the parentStorageClass is now also a positional argument.
This allows pipeline definitions to set a temporary parent when analyzing the pipeline but then during execution update it with the real parent.
This was quite a lot more code than I was expecting. There is still an issue over what to do with parameters with read-only components. For now they are sent to the component that is being used to calculate the read-only component and can't be sent to the assembler at all.