New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-30335: Add special flag to support put on execution butler #529
Conversation
python/lsst/daf/butler/_butler.py
Outdated
if self._allow_put_of_predefined_dataset: | ||
# Get the matching ref for this run. | ||
existing_refs = list(self.registry.queryDatasets(datasetType, collections=run, | ||
dataId=dataId)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd use Registry.findDataset
instead here; that will be more efficient, and it will always return either a resolved ref or None
. Getting more than that should be impossible anyway due to database unique constraints.
I also think it'd be a lot cleaner if we actually put the resolved output refs in the QuantumGraph and only enabled this logic if the ref was already resolved (or even just called butler.datastore.put
). But that involves making the execution butler before we actually write out the QG (or generating UUIDs for predicted datasets out the butler, once we can assume UUIDs everwhere), so it's probably not worth worrying about right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, Sorry. Will switch. I remembered getDataset but that needed the ID.
I also just had to push a change that is bigger than I expected because writing the quick test demonstrated that calling datastore.put leads to the data file being deleted if it goes wrong and that is very bad if you were relying on the file already being there (and it also overwrites it). I've added a datastore.knows()
API to work around that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New changes look good, too, though all of this special handling for execution butlers reinforces my thinking that we want to do something (like QuantumDirectory) to let us drop it someday. But not today. Merge away, whenever you're ready.
This is distinct from asking if it knows about the dataset and the artifact exists.
allow_put_of_predefined_dataset can be set to True to allow a put to work even if the registry thinks it knows about the entry. If datastore already knows about the dataset the put will still fail.
allow_put_of_predefined_dataset
can be set toTrue
to allow a put to work even if the registry thinks it knows about the entry. The put will still fail if datastore knows about the dataset.Checklist
doc/changes