Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-15561: Allow templates to be retrieved by DatasetRef and other cleanups #80

Merged
merged 10 commits into from Aug 28, 2018

Conversation

timj
Copy link
Member

@timj timj commented Aug 28, 2018

  • Use DatasetRef, DatasetType or StorageClass to retrieve a file template.
  • Use same "give me names" code for formatter retrieval.
  • Remove storageClass and datasetType distinction from composites configruation.

@timj timj requested a review from TallJimbo August 28, 2018 03:12
# into default config. Config class has no way of merging lists.
# Use a dict rather than a list to allow easy merging of user config
# into default config. Config class has no way of merging lists.
# Types can be StorageClass names or DatasetType names
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be explicit, with one overriding the other?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per-DatasetType configurations should always override per-StorageClass configurations.

for k in self[n]:
key = f"{n}.{k}"
if not isinstance(self[key], bool):
raise ValueError(f"CompositesConfig: Key {key} is not a Boolean")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see it was explicit. But we don't want this anymore?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It adds a lot of extra faff in all the routines that look up datasetType and storageClass in config files. They have to always look in two locations. That's not how we did it for formatters (we had a single block of configuration). If we really want to be explicit I should change formatters and templates to be explicit and keep composites as it is. I don't think we ever expect there to be clashes between storage class names and dataset type names. We could ensure that that is true by insisting that all storage class names start with an upper case and all dataset type names start with lower case...

if name is not None and name in self.config["names"]:
disassemble = self.config[f"names.{name}"]
matchName = name
break
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we better hope that there is never a clash between a storageClassName and a datasetTypeName (the latter being something that we can't control and rely on users not to pick incorrectly)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. We have to decide. At the moment we decided that they can't clash. If we want to ensure they can't clash we need to decide which of the options we want to use (as defined in my previous comment: separate sections in every config, or insist on naming convention).

for key in name.split("."):
# Override the split for the simple case
if name in data:
keys = (name,)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Space after comma?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flake8 did not complain. I think flake8 doesn't like lists without a space at the end but tuples are fine.

names : `tuple` of `str`
Tuple of the `DatasetType` name and the `StorageClass` name.
"""
return (self.name, *self.storageClass.lookupNames())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have multiple lookup names for storageclasses now? Is this to deal with ImageI, ImageF, etc?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, but to retain compatibility with the lookupNames() interface it returns a tuple with a single element in it that needs to be unpacked.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoa, I had no idea you could use argument unpacking in a regular tuple definition.

names : `tuple` of `str`
Tuple of the `DatasetType` name and the `StorageClass` name.
"""
return self.datasetType.lookupNames()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are there multiple names? Why is there a lookup in configuration for a specific DatasetRef? If the intent is to lookup something about the DatasetType in the configuration i'd rather explicitly have the user do datasetRef.datasetType.lookupNames() such that the intention is clear.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing it like this means I have a single interface for DatasetType, DatasetRef and StorageClass so I call one method on whatever I've been given and I can look it up in the config. Without this I have to have an explicit isinstance check in those cases and forward it on myself. Since we want all these look ups to work with all three types it seemed easier and clearer to do it this way. Whilst looking at this I also realized that in theory we can have a disagreement between DatasetRef.components and DatasetRef.datasetType.storageClass.components and we never check.

@@ -79,23 +86,28 @@ def getTemplate(self, datasetTypeName):
KeyError
No template could be located for this Dataset type.
"""

# Get the names to use for lookup
names = entity.lookupNames()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still on the fence about this. On the one hand special casing this for DatasetRef would be sad, but on the other hand I don't like having lookupNames on datasetRef if it isn't actually DatasetRef specific. Maybe I'm just not liking the name lookupNames? Perhaps configTypeNames/typeNamesInConfig/typeNames or something would be better?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. It's allowing us to take an object with a compatible interface and let us work out which entry in the config object is relevant. I have no problem with changing the name of this method.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe make it private, since it is an implementation detail of the Butler framework and the classes that use it are effectively "friends"?

Returns datasetTypeName and/or StorageClassName.
No longer takes a str. This allows it to work with DatasetRef,
DatasetType or StorageClass.
This allows a key of "calexp.wcs" to be found at the top level
if the YAML file defined such a key. This is a bit of a band
aid over the general solution of having to try combinations
all the way down.  We may have to reconsider allowing
compound component names in keys of config files.
calexp.wcs falls back to calexp if not explicitly defined.
Now works with DatasetRef, DatasetType, StorageClass or str.
Storage Class names and Dataset Type names are assumed to be
distinct so it gains us nothing to keep them separate in the
configuration and merging them makes it consistent with
formatter and templates that also allow per-datasetTypeName or
per-StorageClass name configuration.
This provides an explicit interface for askign the question
of whether the thing is a composite type or not. Simplifies
doDisassembly significantly.
# into default config. Config class has no way of merging lists.
# Use a dict rather than a list to allow easy merging of user config
# into default config. Config class has no way of merging lists.
# Types can be StorageClass names or DatasetType names
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per-DatasetType configurations should always override per-StorageClass configurations.

@@ -69,7 +66,7 @@ def doDisassembly(self, entity):

Parameters
----------
entity : `StorageClass` or `DatasetType`
entity : `StorageClass` or `DatasetType` or `DatasetRef`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(not actually this line) I'd be more comfortable just calling this method isComposite instead of doDisassembly, too - it matches the new and better precedent, and doDisassembly to me sounds like something that is doing disassembly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not really saying "is this a composite" it's saying "is this a composite that should be disassembled", so they aren't the same thing. How about shouldBeDisassembled ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops. I forgot to change the name to shouldBeDisassembled. I'll sneak it in to another ticket. Sorry.

names : `tuple` of `str`
Tuple of the `DatasetType` name and the `StorageClass` name.
"""
return (self.name, *self.storageClass.lookupNames())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoa, I had no idea you could use argument unpacking in a regular tuple definition.

Instance to use to look for a corresponding template.
A `DatasetType` name or a `StorageClass` name will be used
depending on the supplied entity. Priority is given to a
`DatasetType` name.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a thought for the future: template lookup by StorageClass is probably less useful in practice than template lookup by "set of DataUnits". In other words, I probably want my templates for Tract+Patch Datasets to be more similar than my templates for all SourceCatalog Datasets. But the existing support for optional replacement tokens in templates will hopefully make that a non-issue.

@@ -79,23 +86,28 @@ def getTemplate(self, datasetTypeName):
KeyError
No template could be located for this Dataset type.
"""

# Get the names to use for lookup
names = entity.lookupNames()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe make it private, since it is an implementation detail of the Butler framework and the classes that use it are effectively "friends"?

@@ -1,11 +1,10 @@
composites:
storageClasses:
names:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about just dropping this names level and letting "default" be a specially-recognized key at the next level? It just feels like it's a superfluous node in the vast majority of config trees, which will inherit the default from somewhere. And anyone who wants to name their DatasetType "default" deserves what they get 🙂 .

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did think about that, but then I worried that it would make it very difficult to add any other configuration items to that config in the future.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about changing names to disassembly (then at least the True/False relates to the name)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

disassembled?

Then the boolean values relate to the name of the section
in the config.
It's effectively an internal interface for doing look ups into
configuration files.
@timj timj merged commit 74cdce9 into master Aug 28, 2018
@timj timj deleted the tickets/DM-15561 branch August 28, 2018 22:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants