Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-25447: Add support for read-only components #319

Merged
merged 24 commits into from Jul 2, 2020
Merged

Conversation

timj
Copy link
Member

@timj timj commented Jun 25, 2020

This was quite a lot more code than I was expecting. There is still an issue over what to do with parameters with read-only components. For now they are sent to the component that is being used to calculate the read-only component and can't be sent to the assembler at all.

Copy link
Member

@TallJimbo TallJimbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good. I have some of the same concerns on the big picture and increasing complexity that you do, but I'm mostly happy to trust that you've looked into alternatives more than I have and have chosen least-bad options. On the off chance a thought I had for simplifying things might not be one you've considered, I've left a couple of comments on those big-picture things at various points in the PR.

One more quick big-picture thought: in terms of nomenclature, I wonder if we should not be calling these "readComponents" a type of "component" at all - maybe "properties", with what we're calling "allComponents" here (i.e. real components and properties) -> "attributes"? Just a thought - I don't like those names enough better that I'd want to push for changing things this late in the game, but the naming right now feels like we started out by thinking of these as another kind of component, and by the end it became clear that they were more different than similar.

config/storageClasses.yaml Show resolved Hide resolved
python/lsst/daf/butler/core/storageClass.py Outdated Show resolved Hide resolved
Point2I:
pytype: lsst.geom.Point2I
Filter:
pytype: lsst.afw.image.Filter
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This gets me wondering about whether we should allow (maybe just in this context, maybe more broadly) regular Python classes to be used as StorageClasses when their definitions are trivial, rather than requiring those to be added explicitly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's an interesting thought albeit not for this ticket. So for StorageClass("lsst.afw.image.Filter") then rather than defaulting to object we assume that the pytype name is the storage class name... (we could even adopt the convention that names that start with upper case are never pytypes). Worth pondering.

tests/test_datastore.py Show resolved Hide resolved
if isReadOnlyComponent:
inMemoryDataset = writeStorageClass.assembler().handleParameters(inMemoryDataset, parameters)
# Then disable parameters for later
parameters = {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a concrete case where this matters? I'm just wondering if we should say, "you can't pass parameters if you want a read-only component" to try to make things a little simpler.

Edit: after writing the above I saw the commit message that referenced an origin parameter to bbox. I'd be okay with just dropping support for that (origin=PARENT is all we need - the box returned with origin=LOCAL is trivially calculable from the one returned with origin=PARENT). I'd be fine with dropping the xy0 and dimensions read components as being similarly redundant, though I gather that doesn't gain us as much.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll ponder this some more when I write up the formatter/assembler documentation. Stating that parameters are never allowed for read-only components is the easiest approach.

python/lsst/daf/butler/core/formatter.py Outdated Show resolved Hide resolved
@timj
Copy link
Member Author

timj commented Jun 26, 2020

I called them read components because from the point of view of the caller they act exactly like components. The caller doesn't even know that some are read/write and some are read-only because they have no idea which of the components ends up being important for disassembly. We still refer to them as "composite.component" and calling them a property or attribute might be easier to talk about internally (since talking about read-write component relevant for read-only component is annoying at times) but for the general interface they are components.

@timj timj force-pushed the tickets/DM-25447 branch 5 times, most recently from dfb1217 to b3e3bd9 Compare June 30, 2020 22:35
Copy link
Member

@TallJimbo TallJimbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I'm glad the last review pass was useful. I don't have much to add in this pass - just minor comments - though some of the old ones are still relevant as well.

tests/test_butler.py Outdated Show resolved Hide resolved
python/lsst/daf/butler/core/datasets/type.py Outdated Show resolved Hide resolved
python/lsst/daf/butler/core/datasets/type.py Outdated Show resolved Hide resolved
python/lsst/daf/butler/core/datasets/type.py Outdated Show resolved Hide resolved
python/lsst/daf/butler/core/datasets/type.py Outdated Show resolved Hide resolved
python/lsst/daf/butler/datastores/fileLikeDatastore.py Outdated Show resolved Hide resolved
timj added 19 commits July 1, 2020 08:36
This necessitated a allComponents method to return the
read/write components and read-only components together.
This allows us to test specialist formatter capabilities
without having to expand the generic YAML and pickle
formatters.  It also allows us to test read parameters
in formatters in addition to assemblers.
For a disassembled component the formatter is only
interested in reading the entire file, not extracting
a component from the component.
This enables disassembly testing with the new formatters
without involving butler registry.
We have to process the parameters before we extract the
read-only component.
This creates dataset types for all the components.
We no longer store components at the registry level
so there is not much to be gained by registering
the dataset types.
timj added 3 commits July 1, 2020 08:57
This requires a new method on assembler to return which component
should be used to calculate the read-only component.
Parameters for read-only components are problematic since it
is not entirely clear whether the parameters should be applied
to the component that is used to calculate the read-only
component, or should be applied to the calculation of
the read-only component itself.

The complication is that assemblers must support the same
parameters as formatters (otherwise in-memory datastore
can not function) and also can apply parameters that
the formatter did not understand. In the current implementation
this means that the assembler can only see the final storage
class and at that point parameters might not be relevant.

As concrete examples. In the test suite I have added
a "counter" read-only component that returns the
number of elements in metrics.data.  This means that
the assembler sees an Integer storage class and integers
aren't amenable to parameters. Instead the "slice" parameter
is assumed to apply to the "data" component and then
"counter" is calculated on the sliced "data".

Another example is bbox in Exposure. This can be applied
to the "image" component. Where should parameters go?
To the calculation of the "image" or to the calculation
of the bounding box (origin is the only one). Once
the bounding box is created though the "origin" parameter
has no meaning so "origin" is only relevant when passed
to the "image" and can't be relevant to the assembler.

Therefore read-only components for disassembled
composites do not receive any parameters.
…mponent

Check that a component does have a parent but a composite does not.
The parent storage class is not checked to determine if the component
is defined by it.
@timj timj force-pushed the tickets/DM-25447 branch 9 times, most recently from 32cf9eb to 235453e Compare July 2, 2020 17:16
This required changes to how pickling and deep copy worked.
Also to simplify pickling the parentStorageClass is now
also a positional argument.
This allows pipeline definitions to set a temporary parent
when analyzing the pipeline but then during execution update
it with the real parent.
@timj timj merged commit 0a1db07 into master Jul 2, 2020
@timj timj deleted the tickets/DM-25447 branch July 2, 2020 23:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants