New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-15561: Allow templates to be retrieved by DatasetRef and other cleanups #80
Conversation
timj
commented
Aug 28, 2018
•
edited
edited
- Use DatasetRef, DatasetType or StorageClass to retrieve a file template.
- Use same "give me names" code for formatter retrieval.
- Remove storageClass and datasetType distinction from composites configruation.
config/composites.yaml
Outdated
# into default config. Config class has no way of merging lists. | ||
# Use a dict rather than a list to allow easy merging of user config | ||
# into default config. Config class has no way of merging lists. | ||
# Types can be StorageClass names or DatasetType names |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be explicit, with one overriding the other?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per-DatasetType configurations should always override per-StorageClass configurations.
for k in self[n]: | ||
key = f"{n}.{k}" | ||
if not isinstance(self[key], bool): | ||
raise ValueError(f"CompositesConfig: Key {key} is not a Boolean") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see it was explicit. But we don't want this anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It adds a lot of extra faff in all the routines that look up datasetType and storageClass in config files. They have to always look in two locations. That's not how we did it for formatters (we had a single block of configuration). If we really want to be explicit I should change formatters and templates to be explicit and keep composites as it is. I don't think we ever expect there to be clashes between storage class names and dataset type names. We could ensure that that is true by insisting that all storage class names start with an upper case and all dataset type names start with lower case...
if name is not None and name in self.config["names"]: | ||
disassemble = self.config[f"names.{name}"] | ||
matchName = name | ||
break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we better hope that there is never a clash between a storageClassName and a datasetTypeName (the latter being something that we can't control and rely on users not to pick incorrectly)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. We have to decide. At the moment we decided that they can't clash. If we want to ensure they can't clash we need to decide which of the options we want to use (as defined in my previous comment: separate sections in every config, or insist on naming convention).
for key in name.split("."): | ||
# Override the split for the simple case | ||
if name in data: | ||
keys = (name,) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Space after comma?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
flake8 did not complain. I think flake8 doesn't like lists without a space at the end but tuples are fine.
names : `tuple` of `str` | ||
Tuple of the `DatasetType` name and the `StorageClass` name. | ||
""" | ||
return (self.name, *self.storageClass.lookupNames()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have multiple lookup names for storageclasses now? Is this to deal with ImageI
, ImageF
, etc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, but to retain compatibility with the lookupNames()
interface it returns a tuple with a single element in it that needs to be unpacked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoa, I had no idea you could use argument unpacking in a regular tuple definition.
names : `tuple` of `str` | ||
Tuple of the `DatasetType` name and the `StorageClass` name. | ||
""" | ||
return self.datasetType.lookupNames() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are there multiple names? Why is there a lookup in configuration for a specific DatasetRef
? If the intent is to lookup something about the DatasetType
in the configuration i'd rather explicitly have the user do datasetRef.datasetType.lookupNames()
such that the intention is clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doing it like this means I have a single interface for DatasetType, DatasetRef and StorageClass so I call one method on whatever I've been given and I can look it up in the config. Without this I have to have an explicit isinstance
check in those cases and forward it on myself. Since we want all these look ups to work with all three types it seemed easier and clearer to do it this way. Whilst looking at this I also realized that in theory we can have a disagreement between DatasetRef.components
and DatasetRef.datasetType.storageClass.components
and we never check.
@@ -79,23 +86,28 @@ def getTemplate(self, datasetTypeName): | |||
KeyError | |||
No template could be located for this Dataset type. | |||
""" | |||
|
|||
# Get the names to use for lookup | |||
names = entity.lookupNames() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still on the fence about this. On the one hand special casing this for DatasetRef
would be sad, but on the other hand I don't like having lookupNames
on datasetRef
if it isn't actually DatasetRef
specific. Maybe I'm just not liking the name lookupNames
? Perhaps configTypeNames
/typeNamesInConfig
/typeNames
or something would be better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. It's allowing us to take an object with a compatible interface and let us work out which entry in the config object is relevant. I have no problem with changing the name of this method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe make it private, since it is an implementation detail of the Butler framework and the classes that use it are effectively "friends"?
Returns datasetTypeName and/or StorageClassName.
No longer takes a str. This allows it to work with DatasetRef, DatasetType or StorageClass.
This allows a key of "calexp.wcs" to be found at the top level if the YAML file defined such a key. This is a bit of a band aid over the general solution of having to try combinations all the way down. We may have to reconsider allowing compound component names in keys of config files.
calexp.wcs falls back to calexp if not explicitly defined.
Now works with DatasetRef, DatasetType, StorageClass or str.
Storage Class names and Dataset Type names are assumed to be distinct so it gains us nothing to keep them separate in the configuration and merging them makes it consistent with formatter and templates that also allow per-datasetTypeName or per-StorageClass name configuration.
This provides an explicit interface for askign the question of whether the thing is a composite type or not. Simplifies doDisassembly significantly.
config/composites.yaml
Outdated
# into default config. Config class has no way of merging lists. | ||
# Use a dict rather than a list to allow easy merging of user config | ||
# into default config. Config class has no way of merging lists. | ||
# Types can be StorageClass names or DatasetType names |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per-DatasetType configurations should always override per-StorageClass configurations.
@@ -69,7 +66,7 @@ def doDisassembly(self, entity): | |||
|
|||
Parameters | |||
---------- | |||
entity : `StorageClass` or `DatasetType` | |||
entity : `StorageClass` or `DatasetType` or `DatasetRef` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(not actually this line) I'd be more comfortable just calling this method isComposite
instead of doDisassembly
, too - it matches the new and better precedent, and doDisassembly
to me sounds like something that is doing disassembly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not really saying "is this a composite" it's saying "is this a composite that should be disassembled", so they aren't the same thing. How about shouldBeDisassembled
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops. I forgot to change the name to shouldBeDisassembled
. I'll sneak it in to another ticket. Sorry.
names : `tuple` of `str` | ||
Tuple of the `DatasetType` name and the `StorageClass` name. | ||
""" | ||
return (self.name, *self.storageClass.lookupNames()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoa, I had no idea you could use argument unpacking in a regular tuple definition.
Instance to use to look for a corresponding template. | ||
A `DatasetType` name or a `StorageClass` name will be used | ||
depending on the supplied entity. Priority is given to a | ||
`DatasetType` name. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a thought for the future: template lookup by StorageClass is probably less useful in practice than template lookup by "set of DataUnits". In other words, I probably want my templates for Tract+Patch Datasets to be more similar than my templates for all SourceCatalog Datasets. But the existing support for optional replacement tokens in templates will hopefully make that a non-issue.
@@ -79,23 +86,28 @@ def getTemplate(self, datasetTypeName): | |||
KeyError | |||
No template could be located for this Dataset type. | |||
""" | |||
|
|||
# Get the names to use for lookup | |||
names = entity.lookupNames() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe make it private, since it is an implementation detail of the Butler framework and the classes that use it are effectively "friends"?
@@ -1,11 +1,10 @@ | |||
composites: | |||
storageClasses: | |||
names: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about just dropping this names
level and letting "default" be a specially-recognized key at the next level? It just feels like it's a superfluous node in the vast majority of config trees, which will inherit the default from somewhere. And anyone who wants to name their DatasetType "default" deserves what they get 🙂 .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did think about that, but then I worried that it would make it very difficult to add any other configuration items to that config in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about changing names
to disassembly
(then at least the True/False relates to the name)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
disassembled
?
Then the boolean values relate to the name of the section in the config.
It's effectively an internal interface for doing look ups into configuration files.