DM-15561: Allow templates to be retrieved by DatasetRef and other cleanups #80

timj · 2018-08-28T01:44:10Z

Use DatasetRef, DatasetType or StorageClass to retrieve a file template.
Use same "give me names" code for formatter retrieval.
Remove storageClass and datasetType distinction from composites configuration.

pschella · 2018-08-28T14:56:58Z

config/composites.yaml

-    # into default config. Config class has no way of merging lists.
+  # Use a dict rather than a list to allow easy merging of user config
+  # into default config. Config class has no way of merging lists.
+  # Types can be StorageClass names or DatasetType names


Should this be explicit, with one overriding the other?

Per-DatasetType configurations should always override per-StorageClass configurations.

pschella · 2018-08-28T14:57:33Z

python/lsst/daf/butler/core/composites.py

-            for k in self[n]:
-                key = f"{n}.{k}"
-                if not isinstance(self[key], bool):
-                    raise ValueError(f"CompositesConfig: Key {key} is not a Boolean")


Ah, I see it was explicit. But we don't want this anymore?

It adds a lot of extra faff in all the routines that look up datasetType and storageClass in config files: they always have to look in two locations. That's not how we did it for formatters (we had a single block of configuration). If we really want to be explicit, I should change formatters and templates to be explicit as well and keep composites as it is. I don't think we ever expect there to be clashes between storage class names and dataset type names. We could ensure that by insisting that all storage class names start with an uppercase letter and all dataset type names start with a lowercase letter...
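A minimal sketch of the naming-convention option, assuming StorageClass names must start with an uppercase letter and DatasetType names with a lowercase letter. The helper names (`isStorageClassName`, `isDatasetTypeName`, `checkNamingConvention`) and the example names are hypothetical, not daf_butler API.

```python
def isStorageClassName(name: str) -> bool:
    # Hypothetical convention: StorageClass names start with an uppercase letter.
    return name[:1].isupper()


def isDatasetTypeName(name: str) -> bool:
    # Hypothetical convention: DatasetType names start with a lowercase letter.
    return name[:1].islower()


def checkNamingConvention(storageClassNames, datasetTypeNames):
    """Raise ValueError if any name breaks the convention.

    If every name passes, the two sets cannot overlap, so a single
    merged config section keyed by either kind of name is unambiguous.
    """
    for name in storageClassNames:
        if not isStorageClassName(name):
            raise ValueError(f"StorageClass name {name!r} must start with an uppercase letter")
    for name in datasetTypeNames:
        if not isDatasetTypeName(name):
            raise ValueError(f"DatasetType name {name!r} must start with a lowercase letter")


# Passes silently: no name can appear in both sets.
checkNamingConvention(["ExposureF", "SourceCatalog"], ["calexp", "src"])
```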

pschella · 2018-08-28T15:00:35Z

python/lsst/daf/butler/core/composites.py

+            if name is not None and name in self.config["names"]:
+                disassemble = self.config[f"names.{name}"]
+                matchName = name
+                break


So we had better hope that there is never a clash between a storageClassName and a datasetTypeName (the latter being something that we can't control; we rely on users not to pick incorrectly)?

Yes. We have to decide. At the moment we decided that they can't clash. If we want to ensure they can't clash we need to decide which of the options we want to use (as defined in my previous comment: separate sections in every config, or insist on naming convention).
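For comparison, a rough sketch of the other option, separate sections in every config. The section names `datasetTypes` and `storageClasses` and the function `lookupExplicit` are assumptions for illustration. The override order is explicit and a shared name cannot clash, at the cost that every lookup routine has to consult two sections.

```python
def lookupExplicit(config, datasetTypeName, storageClassName):
    """Look up a value using separate per-kind sections.

    Checking "datasetTypes" before "storageClasses" makes the override
    explicit, and a DatasetType may share a name with a StorageClass
    without clashing because the two live in different sections.
    """
    if datasetTypeName in config.get("datasetTypes", {}):
        return config["datasetTypes"][datasetTypeName]
    if storageClassName in config.get("storageClasses", {}):
        return config["storageClasses"][storageClassName]
    raise KeyError(f"No entry for {datasetTypeName!r} or {storageClassName!r}")


config = {
    "storageClasses": {"ExposureF": True},
    "datasetTypes": {"ExposureF": False},  # same name, different section: no clash
}
print(lookupExplicit(config, "ExposureF", "ExposureF"))   # False
print(lookupExplicit(config, "postISRCCD", "ExposureF"))  # True
```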

pschella · 2018-08-28T15:01:32Z

python/lsst/daf/butler/core/config.py

-        for key in name.split("."):
+        # Override the split for the simple case
+        if name in data:
+            keys = (name,)


Space after comma?

flake8 did not complain. I think flake8 doesn't like lists without a space at the end but tuples are fine.
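A simplified sketch of the lookup behaviour in the diff, using plain nested dicts rather than the real `Config` class (`getNested` is a hypothetical name). The one-element tuple `(name,)` is the line the comma question is about.

```python
def getNested(data, name):
    """Look up a "."-separated key in nested dicts.

    If the full name is itself a top-level key (for example "calexp.wcs"),
    use it as-is instead of splitting it; otherwise walk one level per
    "."-separated component.
    """
    if name in data:
        keys = (name,)  # override the split for the simple case
    else:
        keys = name.split(".")
    for key in keys:
        data = data[key]
    return data


flat = {"calexp.wcs": "literal key"}
nested = {"calexp": {"wcs": "nested key"}}
print(getNested(flat, "calexp.wcs"))    # literal key
print(getNested(nested, "calexp.wcs"))  # nested key
```

In this sketch the literal-key check happens only at the top level, so `{"a": {"b.c": 1}}` is not found via `"a.b.c"`; compound keys at deeper levels would need the more general search.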

pschella · 2018-08-28T15:03:52Z

python/lsst/daf/butler/core/datasets.py

+        names : `tuple` of `str`
+            Tuple of the `DatasetType` name and the `StorageClass` name.
+        """
+        return (self.name, *self.storageClass.lookupNames())


We have multiple lookup names for storageclasses now? Is this to deal with ImageI, ImageF, etc?

No, but to retain compatibility with the lookupNames() interface it returns a tuple with a single element in it that needs to be unpacked.

Whoa, I had no idea you could use argument unpacking in a regular tuple definition.
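This is iterable unpacking in a tuple display, available since Python 3.5 (PEP 448). A small illustration with example names:

```python
storageClassNames = ("ExposureF",)  # one-element tuple, as lookupNames() returns
names = ("calexp", *storageClassNames)
print(names)  # ('calexp', 'ExposureF')

# Equivalent spelling without unpacking:
assert names == ("calexp",) + storageClassNames
```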

pschella · 2018-08-28T15:19:42Z

python/lsst/daf/butler/core/datasets.py

+        names : `tuple` of `str`
+            Tuple of the `DatasetType` name and the `StorageClass` name.
+        """
+        return self.datasetType.lookupNames()


Why are there multiple names? Why is there a lookup in configuration for a specific DatasetRef? If the intent is to look up something about the DatasetType in the configuration, I'd rather have the user explicitly call datasetRef.datasetType.lookupNames() so that the intention is clear.

Doing it like this means I have a single interface for DatasetType, DatasetRef and StorageClass so I call one method on whatever I've been given and I can look it up in the config. Without this I have to have an explicit isinstance check in those cases and forward it on myself. Since we want all these look ups to work with all three types it seemed easier and clearer to do it this way. Whilst looking at this I also realized that in theory we can have a disagreement between DatasetRef.components and DatasetRef.datasetType.storageClass.components and we never check.
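A simplified sketch of the single-interface idea, assuming stripped-down stand-ins for `StorageClass`, `DatasetType` and `DatasetRef` (only `lookupNames()` is modelled; the real classes carry much more, and the method is renamed `_lookupNames` later in this PR). `findInConfig` and the template strings are hypothetical.

```python
class StorageClass:
    """Simplified stand-in; only what the lookup needs."""

    def __init__(self, name):
        self.name = name

    def lookupNames(self):
        # One-element tuple keeps the return type uniform.
        return (self.name,)


class DatasetType:
    def __init__(self, name, storageClass):
        self.name = name
        self.storageClass = storageClass

    def lookupNames(self):
        # DatasetType name first, so it takes priority in config lookups.
        return (self.name, *self.storageClass.lookupNames())


class DatasetRef:
    def __init__(self, datasetType):
        self.datasetType = datasetType

    def lookupNames(self):
        # Nothing DatasetRef-specific: forward to the DatasetType.
        return self.datasetType.lookupNames()


def findInConfig(config, entity):
    """Return the config entry for the first matching lookup name.

    Any object with lookupNames() works, so no isinstance() checks are needed.
    """
    names = entity.lookupNames()
    for name in names:
        if name in config:
            return config[name]
    raise KeyError(f"No config entry for any of {names}")


exposureF = StorageClass("ExposureF")
calexp = DatasetType("calexp", exposureF)
ref = DatasetRef(calexp)
templates = {"ExposureF": "storage-class template", "calexp": "dataset-type template"}

print(findInConfig(templates, ref))        # dataset-type template
print(findInConfig(templates, exposureF))  # storage-class template
```

Because the DatasetType name comes first in the tuple, a per-DatasetType entry wins over a per-StorageClass entry, which is the priority requested in the composites.yaml thread.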

pschella · 2018-08-28T15:25:15Z

python/lsst/daf/butler/core/fileTemplates.py

@@ -79,23 +86,28 @@ def getTemplate(self, datasetTypeName):
        KeyError
            No template could be located for this Dataset type.
        """
+
+        # Get the names to use for lookup
+        names = entity.lookupNames()


I'm still on the fence about this. On the one hand special casing this for DatasetRef would be sad, but on the other hand I don't like having lookupNames on datasetRef if it isn't actually DatasetRef specific. Maybe I'm just not liking the name lookupNames? Perhaps configTypeNames/typeNamesInConfig/typeNames or something would be better?

Correct. It's allowing us to take an object with a compatible interface and let us work out which entry in the config object is relevant. I have no problem with changing the name of this method.

Maybe make it private, since it is an implementation detail of the Butler framework and the classes that use it are effectively "friends"?

This allows a key of "calexp.wcs" to be found at the top level if the YAML file defined such a key. This is a bit of a band aid over the general solution of having to try combinations all the way down. We may have to reconsider allowing compound component names in keys of config files.

Storage Class names and Dataset Type names are assumed to be distinct, so it gains us nothing to keep them separate in the configuration. Merging them also makes composites consistent with formatters and templates, which already allow per-datasetTypeName or per-StorageClass-name configuration.

TallJimbo · 2018-08-28T16:54:09Z

config/composites.yaml

-    # into default config. Config class has no way of merging lists.
+  # Use a dict rather than a list to allow easy merging of user config
+  # into default config. Config class has no way of merging lists.
+  # Types can be StorageClass names or DatasetType names


Per-DatasetType configurations should always override per-StorageClass configurations.

TallJimbo · 2018-08-28T18:58:08Z

python/lsst/daf/butler/core/composites.py

@@ -69,7 +66,7 @@ def doDisassembly(self, entity):

        Parameters
        ----------
-        entity : `StorageClass` or `DatasetType`
+        entity : `StorageClass` or `DatasetType` or `DatasetRef`


(not actually this line) I'd be more comfortable just calling this method isComposite instead of doDisassembly, too - it matches the new and better precedent, and doDisassembly to me sounds like something that is doing disassembly.

It's not really saying "is this a composite"; it's saying "is this a composite that should be disassembled", so they aren't the same thing. How about shouldBeDisassembled?

Oops. I forgot to change the name to shouldBeDisassembled. I'll sneak it in to another ticket. Sorry.

TallJimbo · 2018-08-28T18:59:49Z

python/lsst/daf/butler/core/datasets.py

+        names : `tuple` of `str`
+            Tuple of the `DatasetType` name and the `StorageClass` name.
+        """
+        return (self.name, *self.storageClass.lookupNames())


Whoa, I had no idea you could use argument unpacking in a regular tuple definition.

TallJimbo · 2018-08-28T19:19:19Z

python/lsst/daf/butler/core/fileTemplates.py

+            Instance to use to look for a corresponding template.
+            A `DatasetType` name or a `StorageClass` name will be used
+            depending on the supplied entity. Priority is given to a
+            `DatasetType` name.


Just a thought for the future: template lookup by StorageClass is probably less useful in practice than template lookup by "set of DataUnits". In other words, I probably want my templates for Tract+Patch Datasets to be more similar than my templates for all SourceCatalog Datasets. But the existing support for optional replacement tokens in templates will hopefully make that a non-issue.

TallJimbo · 2018-08-28T19:24:02Z

python/lsst/daf/butler/core/fileTemplates.py

@@ -79,23 +86,28 @@ def getTemplate(self, datasetTypeName):
        KeyError
            No template could be located for this Dataset type.
        """
+
+        # Get the names to use for lookup
+        names = entity.lookupNames()


Maybe make it private, since it is an implementation detail of the Butler framework and the classes that use it are effectively "friends"?

TallJimbo · 2018-08-28T19:26:38Z

tests/config/basic/composites-bad.yaml

@@ -1,11 +1,10 @@
 composites:
-  storageClasses:
+  names:


What do you think about just dropping this names level and letting "default" be a specially-recognized key at the next level? It just feels like it's a superfluous node in the vast majority of config trees, which will inherit the default from somewhere. And anyone who wants to name their DatasetType "default" deserves what they get 🙂 .

I did think about that, but then I worried that it would make it very difficult to add any other configuration items to that config in the future.

How about changing names to disassembly (then at least the True/False relates to the name)?

disassembled?

Then the boolean values relate to the name of the section in the config.

timj requested a review from TallJimbo August 28, 2018 03:12

pschella reviewed Aug 28, 2018

timj added 6 commits August 28, 2018 12:12

Add methods for looking up search strings for configurations

31a9624

Returns datasetTypeName and/or StorageClassName.

Rewrite getTemplates method to use lookupNames

607fc8a

No longer takes a str. This allows it to work with DatasetRef, DatasetType or StorageClass.

Add tests for templates with components

41bacf1

calexp.wcs falls back to calexp if not explicitly defined.
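A minimal sketch of the component fallback described in this commit. It takes a plain name and a dict for brevity, whereas the method in this PR takes a DatasetRef, DatasetType or StorageClass; `getTemplate` here is a hypothetical free function and any StorageClass fallback is omitted.

```python
def getTemplate(templates, datasetTypeName):
    """Return the template for a dataset type name, falling back to the parent.

    A component name such as "calexp.wcs" is tried as-is first; without an
    explicit entry, the parent name ("calexp") is used instead.
    """
    if datasetTypeName in templates:
        return templates[datasetTypeName]
    parent, sep, _ = datasetTypeName.partition(".")
    if sep and parent in templates:
        return templates[parent]
    raise KeyError(f"No template for {datasetTypeName!r}")


templates = {"calexp": "parent template", "calexp.psf": "psf template"}
print(getTemplate(templates, "calexp.wcs"))  # parent template
print(getTemplate(templates, "calexp.psf"))  # psf template
```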

Rewrite getFormatter to use lookupNames()

a3c7222

Now works with DatasetRef, DatasetType, StorageClass or str.

timj force-pushed the tickets/DM-15561 branch from f139bd6 to 0d72870 August 28, 2018 19:17

Add isComposite method to DatasetRef, DatasetType and StorageClass

dcd44b6

This provides an explicit interface for asking the question of whether the thing is a composite type or not. Simplifies doDisassembly significantly.
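A rough sketch of how `isComposite` can be forwarded from `DatasetRef` to `DatasetType` to `StorageClass`, assuming that a StorageClass is composite when it declares at least one component. The classes are simplified stand-ins, not the daf_butler implementations.

```python
class StorageClass:
    def __init__(self, name, components=None):
        self.name = name
        # Assumption: a mapping of component name -> component StorageClass.
        self.components = components or {}

    def isComposite(self):
        # Assumption: composite means "declares at least one component".
        return bool(self.components)


class DatasetType:
    def __init__(self, name, storageClass):
        self.name = name
        self.storageClass = storageClass

    def isComposite(self):
        # Whether a DatasetType is composite is decided by its StorageClass.
        return self.storageClass.isComposite()


class DatasetRef:
    def __init__(self, datasetType):
        self.datasetType = datasetType

    def isComposite(self):
        # Forward to the DatasetType.
        return self.datasetType.isComposite()


wcs = StorageClass("Wcs")
exposure = StorageClass("ExposureF", components={"wcs": wcs})
print(DatasetRef(DatasetType("calexp", exposure)).isComposite())  # True
print(DatasetRef(DatasetType("calexp.wcs", wcs)).isComposite())   # False
```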

timj force-pushed the tickets/DM-15561 branch from 0d72870 to dcd44b6 August 28, 2018 19:23

TallJimbo approved these changes Aug 28, 2018

timj added 3 commits August 28, 2018 12:45

Use "disassembled" for composites.yaml rather than "name"

6aa7ac5

Then the boolean values relate to the name of the section in the config.
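A sketch of the disassembly decision with the renamed `disassembled` section, using a plain dict for the composites config and a hypothetical `FakeEntity` in place of a real DatasetRef or DatasetType. Falling back to `False` when no entry matches is an assumption; the function keeps the `doDisassembly` name used in this PR.

```python
from dataclasses import dataclass


@dataclass
class FakeEntity:
    """Hypothetical stand-in for a DatasetRef, DatasetType or StorageClass."""

    name: str
    storageClassName: str
    composite: bool

    def isComposite(self):
        return self.composite

    def lookupNames(self):
        # DatasetType name first so it overrides the StorageClass entry.
        return (self.name, self.storageClassName)


# Plain dict standing in for the composites config after the rename.
compositesConfig = {
    "disassembled": {
        "ExposureF": True,  # disassemble every ExposureF dataset...
        "calexp": False,    # ...except calexp, overridden by DatasetType name
    }
}


def doDisassembly(config, entity):
    """Return True if the entity should be written as separate components."""
    if not entity.isComposite():
        return False  # a non-composite can never be disassembled
    section = config["disassembled"]
    for name in entity.lookupNames():
        if name in section:
            return section[name]
    return False  # assumption: default to not disassembling


print(doDisassembly(compositesConfig, FakeEntity("calexp", "ExposureF", True)))      # False
print(doDisassembly(compositesConfig, FakeEntity("postISRCCD", "ExposureF", True)))  # True
```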

Rename lookupNames to _lookupNames

47bb169

It's effectively an internal interface for doing look ups into configuration files.

Tweak docstring in yaml file

05d867b

timj merged commit 74cdce9 into master Aug 28, 2018

timj deleted the tickets/DM-15561 branch August 28, 2018 22:27
