DM-25447: Add support for read-only components #319

timj · 2020-06-25T19:41:53Z

This was quite a lot more code than I was expecting. There is still an open question about what to do with parameters for read-only components. For now they are passed to the component that is used to calculate the read-only component, and they cannot be passed to the assembler at all.

TallJimbo

Code looks good. I have some of the same concerns on the big picture and increasing complexity that you do, but I'm mostly happy to trust that you've looked into alternatives more than I have and have chosen least-bad options. On the off chance a thought I had for simplifying things might not be one you've considered, I've left a couple of comments on those big-picture things at various points in the PR.

One more quick big-picture thought: in terms of nomenclature, I wonder if we should not be calling these "readComponents" a type of "component" at all - maybe "properties", with what we're calling "allComponents" here (i.e. real components and properties) -> "attributes"? Just a thought - I don't like those names enough better that I'd want to push for changing things this late in the game, but the naming right now feels like we started out by thinking of these as another kind of component, and by the end it became clear that they were more different than similar.

config/storageClasses.yaml

python/lsst/daf/butler/core/storageClass.py

TallJimbo · 2020-06-26T19:15:10Z

config/storageClasses.yaml

+  Point2I:
+    pytype: lsst.geom.Point2I
+  Filter:
+    pytype: lsst.afw.image.Filter


This gets me wondering about whether we should allow (maybe just in this context, maybe more broadly) regular Python classes to be used as StorageClasses when their definitions are trivial, rather than requiring those to be added explicitly.

That's an interesting thought, albeit not for this ticket. So for StorageClass("lsst.afw.image.Filter"), rather than defaulting the pytype to object, we would assume that the storage class name is the pytype name (we could even adopt the convention that names starting with an upper-case letter are never pytypes). Worth pondering.
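To make the convention floated above concrete, here is a minimal sketch of how a missing pytype might be resolved from the storage class name. The function `resolve_pytype` is hypothetical (it is not part of daf_butler). It assumes that dotted names beginning with a lower-case letter are fully qualified Python types and that everything else keeps the current `object` default.

```python
import importlib


def resolve_pytype(name: str) -> type:
    """Guess a pytype from a storage class name (hypothetical helper).

    Names starting with an upper-case letter, or containing no dot, are
    treated as plain storage class names and keep the ``object`` default.
    Dotted lower-case names are treated as fully qualified Python types.
    """
    if name[:1].isupper() or "." not in name:
        return object
    module_name, _, type_name = name.rpartition(".")
    try:
        module = importlib.import_module(module_name)
        return getattr(module, type_name)
    except (ImportError, AttributeError):
        # Not importable in this environment: fall back to the default.
        return object
```

With this sketch, `resolve_pytype("collections.OrderedDict")` returns `collections.OrderedDict` and `resolve_pytype("Exposure")` returns `object`. A real implementation would probably want to raise rather than silently fall back, so that typos in pytype names are caught.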

tests/test_datastore.py

TallJimbo · 2020-06-26T19:31:41Z

python/lsst/daf/butler/datastores/inMemoryDatastore.py

+        if isReadOnlyComponent:
+            inMemoryDataset = writeStorageClass.assembler().handleParameters(inMemoryDataset, parameters)
+            # Then disable parameters for later
+            parameters = {}


Is there a concrete case where this matters? I'm just wondering if we should say, "you can't pass parameters if you want a read-only component" to try to make things a little simpler.

Edit: after writing the above I saw the commit message that referenced an origin parameter to bbox. I'd be okay with just dropping support for that (origin=PARENT is all we need - the box returned with origin=LOCAL is trivially calculable from the one returned with origin=PARENT). I'd be fine with dropping the xy0 and dimensions read components as being similarly redundant, though I gather that doesn't gain us as much.
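The redundancy argument can be checked with a small sketch. It uses a hypothetical `Box` dataclass rather than `lsst.geom.Box2I`, and assumes the afw convention that a PARENT-origin box starts at the image's xy0 while a LOCAL-origin box has the same dimensions but starts at (0, 0). Under that assumption the LOCAL box, xy0 and dimensions all follow from the PARENT box alone.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Box:
    """Minimal stand-in for an integer bounding box (not lsst.geom.Box2I)."""

    min_x: int
    min_y: int
    width: int
    height: int


def local_from_parent(parent_bbox: Box) -> Box:
    """Return the LOCAL-origin box: same dimensions, corner moved to (0, 0)."""
    return Box(0, 0, parent_bbox.width, parent_bbox.height)


def xy0_and_dimensions(parent_bbox: Box) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """xy0 is the PARENT box's minimum corner; dimensions are its extent."""
    return (parent_bbox.min_x, parent_bbox.min_y), (parent_bbox.width, parent_bbox.height)
```

For example, a PARENT box `Box(100, 200, 512, 256)` gives the LOCAL box `Box(0, 0, 512, 256)`, xy0 `(100, 200)` and dimensions `(512, 256)`.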

I'll ponder this some more when I write up the formatter/assembler documentation. Stating that parameters are never allowed for read-only components is the easiest approach.
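The simplest policy mentioned here, never allowing parameters for read-only components, could be enforced with an up-front check like the following. This is a hypothetical sketch; the function name and arguments are illustrative and are not the daf_butler API.

```python
from typing import AbstractSet, Any, Mapping, Optional


def check_read_parameters(component: Optional[str],
                          read_only_components: AbstractSet[str],
                          parameters: Optional[Mapping[str, Any]]) -> None:
    """Reject read parameters when the requested component is read-only.

    ``component`` is None when the whole composite is being read.
    """
    if component is not None and component in read_only_components and parameters:
        raise ValueError(
            f"Parameters {sorted(parameters)} are not supported for "
            f"read-only component {component!r}"
        )
```

Failing early like this keeps the datastores simple, at the cost of removing the option of slicing the source component before a read-only value is calculated.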

python/lsst/daf/butler/core/formatter.py

timj · 2020-06-26T21:22:01Z

I called them read components because, from the point of view of the caller, they act exactly like components. The caller doesn't even know that some are read/write and some are read-only, because they have no idea which of the components ends up being important for disassembly. We still refer to them as "composite.component". Calling them a property or attribute might make them easier to talk about internally (talking about the read/write component that a read-only component is calculated from is awkward at times), but for the general interface they are components.
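From the caller's side, both kinds of component are addressed through the same "composite.component" name. A hypothetical helper that splits such a name (assuming the split happens at the first dot; "calexp" is just an example dataset type name) shows why the distinction is invisible at this level:

```python
from typing import Optional, Tuple


def split_dataset_type_name(name: str) -> Tuple[str, Optional[str]]:
    """Split "composite.component" into (composite, component).

    The dotted form is the same for read/write and read-only components,
    so nothing here (or in the caller) depends on which kind it is.
    """
    composite, sep, component = name.partition(".")
    return composite, (component if sep else None)
```

`split_dataset_type_name("calexp.bbox")` and `split_dataset_type_name("calexp.image")` are handled identically, even though one names a read-only component and the other a read/write one.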

TallJimbo

Looks good! I'm glad the last review pass was useful. I don't have much to add in this pass - just minor comments - though some of the old ones are still relevant as well.

python/lsst/daf/butler/registry/datasets/byDimensions/_manager.py

tests/test_butler.py

python/lsst/daf/butler/core/datasets/type.py

python/lsst/daf/butler/datastores/fileLikeDatastore.py


Parameters for read-only components are problematic because it is not entirely clear whether they should be applied to the component that is used to calculate the read-only component, or to the calculation of the read-only component itself. The complication is that assemblers must support the same parameters as formatters (otherwise the in-memory datastore cannot function) and can also apply parameters that the formatter did not understand. In the current implementation this means that the assembler only sees the final storage class, and at that point the parameters might no longer be relevant.

Two concrete examples. In the test suite I have added a "counter" read-only component that returns the number of elements in metrics.data. The assembler therefore sees an Integer storage class, and integers are not amenable to parameters. Instead, the "slice" parameter is assumed to apply to the "data" component, and "counter" is then calculated from the sliced "data".

The second example is bbox in Exposure, which can be calculated from the "image" component. Where should the parameters go: to the calculation of the "image", or to the calculation of the bounding box (for which "origin" is the only parameter)? Once the bounding box has been created, "origin" has no meaning, so it is only relevant when passed to the "image" and cannot be relevant to the assembler.

Therefore read-only components for disassembled composites do not receive any parameters.
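The rule described above can be sketched in a few lines of plain Python. Everything here is hypothetical (the dictionaries, `apply_parameters` and `read_component` are not daf_butler code, and the stored values are made up); it mimics the test-suite "counter" example, with "slice" applied to "data" before the count is calculated.

```python
from typing import Any, Callable, Dict, Mapping, Optional, Tuple

# Hypothetical stored read/write components of a disassembled composite.
STORED: Dict[str, Any] = {"data": [10, 20, 30, 40, 50]}

# Read-only component -> (read/write component it is calculated from, calculation).
READ_ONLY: Dict[str, Tuple[str, Callable[[Any], Any]]] = {"counter": ("data", len)}


def apply_parameters(component: str, value: Any, parameters: Mapping[str, Any]) -> Any:
    """Apply read parameters to a read/write component; only "slice" is known."""
    if component == "data" and "slice" in parameters:
        return value[parameters["slice"]]
    return value


def read_component(name: str, parameters: Optional[Mapping[str, Any]] = None) -> Any:
    parameters = parameters or {}
    if name in READ_ONLY:
        source, calculate = READ_ONLY[name]
        # Parameters are applied to the source component first...
        sliced = apply_parameters(source, STORED[source], parameters)
        # ...and the calculation of the read-only component never sees them.
        return calculate(sliced)
    return apply_parameters(name, STORED[name], parameters)
```

Here `read_component("counter", {"slice": slice(0, 2)})` counts the sliced data and returns 2, while the `len` calculation itself receives no parameters.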


timj force-pushed the tickets/DM-25447 branch from 40f29ff to 9455f78 on June 25, 2020 22:25

timj mentioned this pull request Jun 25, 2020

DM-25447: Add read-only components to gen3 formatter for bbox/xy0/dimensions lsst/obs_base#267 (merged)

timj force-pushed the tickets/DM-25447 branch from 9455f78 to 605ff20 on June 25, 2020 22:37

TallJimbo approved these changes Jun 26, 2020


timj force-pushed the tickets/DM-25447 branch 5 times, most recently from dfb1217 to b3e3bd9 on June 30, 2020 22:35

TallJimbo approved these changes Jul 1, 2020


timj added 19 commits July 1, 2020 08:36

Add detector and validPolygon as Exposure components

8bf3e17

Add Polygon formatter definition

965657e

Add support for StorageClass read-only components

1a77ec8

This necessitated an allComponents method to return the read/write components and read-only components together.
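A minimal sketch of what such a method does, assuming components are stored as simple name-to-storage-class-name mappings. `StorageClassSketch`, its attribute names and the example storage class names are illustrative, not the real `StorageClass` API:

```python
from typing import Dict, Optional


class StorageClassSketch:
    """Hypothetical stand-in for a storage class with both kinds of component."""

    def __init__(self, name: str,
                 components: Optional[Dict[str, str]] = None,
                 read_components: Optional[Dict[str, str]] = None):
        self.name = name
        # Component name -> storage class name, for each kind of component.
        self.components = dict(components or {})
        self.read_components = dict(read_components or {})
        # A name must not be both read/write and read-only, or merging is ambiguous.
        overlap = self.components.keys() & self.read_components.keys()
        if overlap:
            raise ValueError(f"Defined as both read/write and read-only: {sorted(overlap)}")

    def all_components(self) -> Dict[str, str]:
        """Return read/write and read-only components in a single mapping."""
        return {**self.components, **self.read_components}
```

Keeping the two mappings separate and merging only on request lets disassembly iterate over the read/write components alone, while callers that just need every valid component name use the merged view.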

Add some read-only components to Exposure

669e3fb

Use DatasetRef.makeComponentRef simplification API

198b563

Add more detail to error message

8405916

Add a str method for dumping composite map details

4f5fd72

Remove getStoredItemInfo API

695aeef

No longer used.

Add new specialist formatters for MetricsExample

3c63cf6

This allows us to test specialist formatter capabilities without having to expand the generic YAML and pickle formatters. It also allows us to test read parameters in formatters in addition to assemblers.

Do not tell component formatter what component it is

d513d90

For a disassembled component the formatter is only interested in reading the entire file, not extracting a component from the component.

Enable true disassembly testing in low-level datastore tests

cbe8784

This enables disassembly testing with the new formatters without involving butler registry.

Cleanup component datasetType name API usage

faa4525

Change way parameters are handled if read-only component

b3b4868

We have to process the parameters before we extract the read-only component.

Use modern DatasetType API in tests

eebade8

Add makeAllComponentDatasetTypes method

529533a

This creates dataset types for all the components.
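A sketch of the idea, assuming all components (read/write and read-only) are available as a name-to-storage-class mapping. The `ComponentDatasetType` tuple and `make_all_component_dataset_types` are hypothetical stand-ins for real dataset type objects:

```python
from typing import Dict, List, NamedTuple


class ComponentDatasetType(NamedTuple):
    """Hypothetical lightweight description of a component dataset type."""

    name: str
    storage_class: str
    parent_storage_class: str


def make_all_component_dataset_types(parent_name: str,
                                     parent_storage_class: str,
                                     all_components: Dict[str, str]) -> List[ComponentDatasetType]:
    """Create one dataset type per component, read/write and read-only alike."""
    return [
        ComponentDatasetType(f"{parent_name}.{component}", storage_class, parent_storage_class)
        for component, storage_class in sorted(all_components.items())
    ]
```

Each result records the parent's storage class alongside its own, so a component dataset type can be interpreted without looking up its parent.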

No longer write component dataset types to registry

19d36cd

We no longer store components at the registry level so there is not much to be gained by registering the dataset types.

Update queryDatasetTypes now that registry does not include components

6f7053f

Record parent storage class in composite dataset types

1d45183

Remove duplication of test formatter definitions between posix and s3

08dc7b1

timj added 3 commits July 1, 2020 08:57

Enable read-only component reading from disassembled components

c34118c

This requires a new method on the assembler to return which component should be used to calculate the read-only component.
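A hypothetical sketch of such a selection method. The preference order is an assumption for illustration: "bbox" is taken from "image" when it is available (as in the Exposure example discussed in this PR), while falling back to "mask" or "variance" is assumed rather than taken from the source.

```python
from typing import AbstractSet, Dict, Sequence

# Hypothetical preference order of read/write components that can be used
# to calculate each read-only component.
CANDIDATES: Dict[str, Sequence[str]] = {
    "bbox": ("image", "mask", "variance"),
}


def select_responsible_component(read_only_component: str,
                                 available: AbstractSet[str]) -> str:
    """Pick the first stored component that can supply the read-only value."""
    for candidate in CANDIDATES.get(read_only_component, ()):
        if candidate in available:
            return candidate
    raise ValueError(
        f"Cannot calculate read-only component {read_only_component!r} "
        f"from any of {sorted(available)}"
    )
```

Only the selected component's file then needs to be read, which matters when the composite was disassembled into several files.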

Raise exception if DatasetType has parent storage class but is not component

76c899d

Check that a component does have a parent but a composite does not. The parent storage class is not checked to determine if the component is defined by it.
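The check can be sketched as a standalone function. It is hypothetical and assumes that a dataset type name containing a dot denotes a component. As in the commit, it does not verify that the parent storage class actually defines the component.

```python
from typing import Optional


def check_parent_storage_class(name: str, parent_storage_class: Optional[str]) -> None:
    """Components must have a parent storage class; composites must not."""
    is_component = "." in name
    if is_component and parent_storage_class is None:
        raise ValueError(f"Component dataset type {name!r} requires a parent storage class")
    if not is_component and parent_storage_class is not None:
        raise ValueError(f"Dataset type {name!r} is not a component but has a parent storage class")
    # Deliberately no check that parent_storage_class defines this component.
```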

timj force-pushed the tickets/DM-25447 branch 9 times, most recently from 32cf9eb to 235453e on July 2, 2020 17:16

Add some tests for DatasetType parentStorageClass

59a9f95

This required changes to how pickling and deep copy work. To simplify pickling, parentStorageClass is now also a positional argument.
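A positional argument helps because `__reduce__` can then return the class plus a plain tuple of constructor arguments, and both `pickle` and `copy.deepcopy` can rebuild the object from it. `DatasetTypeSketch` is a hypothetical stand-in that uses strings for storage classes; it is not the real `DatasetType`.

```python
import copy
import pickle
from typing import Optional


class DatasetTypeSketch:
    """Hypothetical dataset type whose constructor takes only positional-capable args."""

    def __init__(self, name: str, storage_class: str,
                 parent_storage_class: Optional[str] = None):
        self.name = name
        self.storage_class = storage_class
        self.parent_storage_class = parent_storage_class

    def _key(self):
        return (self.name, self.storage_class, self.parent_storage_class)

    def __eq__(self, other):
        if not isinstance(other, DatasetTypeSketch):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        return hash(self._key())

    def __reduce__(self):
        # Every constructor argument, including the parent storage class, can
        # be passed positionally, so a plain tuple is enough to rebuild the
        # object. copy.deepcopy also falls back to this method.
        return (self.__class__, self._key())
```

With this in place, `pickle.loads(pickle.dumps(dt))` and `copy.deepcopy(dt)` both return an equal but distinct object without any custom `__deepcopy__`.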

timj force-pushed the tickets/DM-25447 branch from 235453e to 1b803ea on July 2, 2020 19:36

Add special placeholder parent storage class and update method

b02123d

This allows pipeline definitions to set a temporary parent when analyzing the pipeline but then during execution update it with the real parent.
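A minimal sketch of the placeholder pattern. It assumes a string sentinel and an update method that returns a new object, so the temporary definition is never mutated in place. `PLACEHOLDER_PARENT`, `ComponentTypeSketch` and `finalize_parent_storage_class` are hypothetical names.

```python
# Hypothetical sentinel used while a pipeline is being analyzed, before the
# real parent storage class is known.
PLACEHOLDER_PARENT = "<placeholder>"


class ComponentTypeSketch:
    """Hypothetical component dataset type with a possibly temporary parent."""

    def __init__(self, name: str, storage_class: str, parent_storage_class: str):
        self.name = name
        self.storage_class = storage_class
        self.parent_storage_class = parent_storage_class

    def is_placeholder(self) -> bool:
        return self.parent_storage_class == PLACEHOLDER_PARENT

    def finalize_parent_storage_class(self, real_parent: str) -> "ComponentTypeSketch":
        """Return a copy with the placeholder replaced by the real parent."""
        if not self.is_placeholder():
            # The parent is already real, so leave it unchanged.
            return self
        return ComponentTypeSketch(self.name, self.storage_class, real_parent)
```

Returning a new object means a pipeline definition can be analyzed once and then finalized during execution without surprising any code that still holds the placeholder version.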

timj force-pushed the tickets/DM-25447 branch from 1b803ea to b02123d on July 2, 2020 19:49

timj merged commit 0a1db07 into master Jul 2, 2020

timj deleted the tickets/DM-25447 branch July 2, 2020 23:07

