Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-13350: Use templates to work out file names from data units #17

Merged
merged 6 commits into from Mar 9, 2018

Conversation

timj
Copy link
Member

@timj timj commented Feb 22, 2018

Templates are specified in the YAML configuration file and indexed
by dataset type.

Needs some clean up but fundamentally seems sound. Doesn't yet support fallback to a user function and I haven't got standalone tests for the template formatting.

@timj timj force-pushed the tickets/DM-13350 branch 2 times, most recently from 497b9cf to a31a2dd Compare March 1, 2018 21:09
@timj timj requested a review from pschella March 1, 2018 21:10
The templates use the standard Format Specification Mini-Language
with the caveat that only named fields can be used. The field names
are taken from the DataUnits along with two additional fields:
"datasettype" will be replaced with the DataSetType and "component"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"datasetType" and DatasetType.

def __init__(self, template):
self.template = template

def format(self, dataunits, datasettype=None, component=None):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dataUnits or units and datasetType.

component.
component : `str`, optional
Component of a composite. If `datasettype` defines a component
this parameter will be ignored.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same in the above.


def definedUnits(self):
"""DataUnits with non-None values."""
return {k: v for k, v in self.units.items() if v is not None}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why this is needed. Should be absorbed into DataUnitTypeSet. But we will have to do that later anyway, so fine to keep it around for now I suppose.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not know we had a class for that. I asked on Slack but got no hints. Sorry.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't see a DataUnitTypeSet in the codebase.

# calexp.wcs means wcs component of a calexp
if "." in datasettype:
datasettype, component = datasettype.split(".", maxsplit=1)
units["datasettype"] = datasettype
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now the name units becomes confusing. Better call it something else, like fields.

# Complain if we were meant to use a component
if component is not None and not usedComponent:
raise KeyError("Component {} specified but template {} did not use it".format(component,
self.template))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this different from any other field?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because if you explicitly specify a component but the component is not used that's going to be a problem because it means the template does not specify how to distinguish components.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was assuming that DataUnits could be far more detailed than is needed for the file templates. That's why I didn't check that every unit was used.


if template is None:
if "default" in self.templates:
template = self.templates["default"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This loading could go into a separate function or classmethod of Template.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is determining the right template to use based on the dataset type by looking up in the datastore's template registry. I think I can make this work like the Factory we use for formatters and storage classes (although we are storing instances and not classes).

templates:
default: "{datasettype}/{tract:?}/{patch:?}/{filter:?}/{visit:?}"
calexp: "{datasettype}.{component:?}/{datasettype}_v{visit}_f{filter}_{component:?}"
metric: "{datasettype}.{component:?}/{datasettype}_v{visit:08d}_f{filter}_{component:?}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per DatasetType templates should (probably) live in Registry. Given that we are dynamically adding them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are in datastore because we explicitly decided they should be moved inside datastore (before we had storage hints which would have been the butler using the templates). Not all datastores should define templates so I'm not sure why registry wants to define them. Is the issue that DataSetTypes are defined in registry and it seems odd that data store needs to have its own list of them? If I have chained datastores shouldn't I be able to use different templates in each one?

self.assertTemplate(tmplstr, "calexp.wcs/v00052_fU",
self.du, datasettype="calexp.wcs")
self.assertTemplate(tmplstr, "calexp.psf/v00052_fU",
self.du, datasettype="calexp", component="psf")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without components? With multiple optionals?

self.du, datasettype="calexp", component="psf")

with self.assertRaises(KeyError):
self.assertTemplate(tmplstr, "", self.du, component="wcs")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Different testcase?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with having a separate test method for components.

@timj
Copy link
Member Author

timj commented Mar 9, 2018

@pschella I've rejigged things so that template collections are in their own class and I've extended the tests. Are you now okay with these changes? We agreed we would not move the template definitions just yet.

@timj timj merged commit 7f01492 into master Mar 9, 2018
@ktlim ktlim deleted the tickets/DM-13350 branch August 25, 2018 06:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants