New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-13350: Use templates to work out file names from data units #17
Conversation
497b9cf
to
a31a2dd
Compare
The templates use the standard Format Specification Mini-Language | ||
with the caveat that only named fields can be used. The field names | ||
are taken from the DataUnits along with two additional fields: | ||
"datasettype" will be replaced with the DataSetType and "component" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"datasetType"
and DatasetType
.
def __init__(self, template): | ||
self.template = template | ||
|
||
def format(self, dataunits, datasettype=None, component=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dataUnits
or units
and datasetType
.
component. | ||
component : `str`, optional | ||
Component of a composite. If `datasettype` defines a component | ||
this parameter will be ignored. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same in the above.
|
||
def definedUnits(self): | ||
"""DataUnits with non-None values.""" | ||
return {k: v for k, v in self.units.items() if v is not None} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why this is needed. Should be absorbed into DataUnitTypeSet
. But we will have to do that later anyway, so fine to keep it around for now I suppose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not know we had a class for that. I asked on Slack but got no hints. Sorry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't see a DataUnitTypeSet
in the codebase.
# calexp.wcs means wcs component of a calexp | ||
if "." in datasettype: | ||
datasettype, component = datasettype.split(".", maxsplit=1) | ||
units["datasettype"] = datasettype |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now the name units
becomes confusing. Better call it something else, like fields
.
# Complain if we were meant to use a component | ||
if component is not None and not usedComponent: | ||
raise KeyError("Component {} specified but template {} did not use it".format(component, | ||
self.template)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this different from any other field?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because if you explicitly specify a component but the component is not used that's going to be a problem because it means the template does not specify how to distinguish components.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was assuming that DataUnits could be far more detailed than is needed for the file templates. That's why I didn't check that every unit was used.
|
||
if template is None: | ||
if "default" in self.templates: | ||
template = self.templates["default"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This loading could go into a separate function or classmethod
of Template
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is determining the right template to use based on the dataset type by looking up in the datastore's template registry. I think I can make this work like the Factory we use for formatters and storage classes (although we are storing instances and not classes).
tests/config/basic/butler.yaml
Outdated
templates: | ||
default: "{datasettype}/{tract:?}/{patch:?}/{filter:?}/{visit:?}" | ||
calexp: "{datasettype}.{component:?}/{datasettype}_v{visit}_f{filter}_{component:?}" | ||
metric: "{datasettype}.{component:?}/{datasettype}_v{visit:08d}_f{filter}_{component:?}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per DatasetType
templates should (probably) live in Registry
. Given that we are dynamically adding them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are in datastore because we explicitly decided they should be moved inside datastore (before we had storage hints which would have been the butler using the templates). Not all datastores should define templates so I'm not sure why registry wants to define them. Is the issue that DataSetTypes are defined in registry and it seems odd that data store needs to have its own list of them? If I have chained datastores shouldn't I be able to use different templates in each one?
tests/test_templates.py
Outdated
self.assertTemplate(tmplstr, "calexp.wcs/v00052_fU", | ||
self.du, datasettype="calexp.wcs") | ||
self.assertTemplate(tmplstr, "calexp.psf/v00052_fU", | ||
self.du, datasettype="calexp", component="psf") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without components? With multiple optionals?
tests/test_templates.py
Outdated
self.du, datasettype="calexp", component="psf") | ||
|
||
with self.assertRaises(KeyError): | ||
self.assertTemplate(tmplstr, "", self.du, component="wcs") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Different testcase?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with having a separate test method for components.
Templates are specified in the YAML configuration file and indexed by dataset type.
Newer versions of sqlalchemy do not like generators.
@pschella I've rejigged things so that template collections are in their own class and I've extended the tests. Are you now okay with these changes? We agreed we would not move the template definitions just yet. |
Templates are specified in the YAML configuration file and indexed
by dataset type.
Needs some clean up but fundamentally seems sound. Doesn't yet support fallback to a user function and I haven't got standalone tests for the template formatting.