Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-30335: Add special flag to support put on execution butler #529

Merged
merged 4 commits into from May 27, 2021

Conversation

timj
Copy link
Member

@timj timj commented May 26, 2021

allow_put_of_predefined_dataset can be set to True to allow a put to work even if the registry thinks it knows about the entry. The put will still fail if datastore knows about the dataset.

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes

if self._allow_put_of_predefined_dataset:
# Get the matching ref for this run.
existing_refs = list(self.registry.queryDatasets(datasetType, collections=run,
dataId=dataId))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd use Registry.findDataset instead here; that will be more efficient, and it will always return either a resolved ref or None. Getting more than that should be impossible anyway due to database unique constraints.

I also think it'd be a lot cleaner if we actually put the resolved output refs in the QuantumGraph and only enabled this logic if the ref was already resolved (or even just called butler.datastore.put). But that involves making the execution butler before we actually write out the QG (or generating UUIDs for predicted datasets out the butler, once we can assume UUIDs everwhere), so it's probably not worth worrying about right now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, Sorry. Will switch. I remembered getDataset but that needed the ID.

I also just had to push a change that is bigger than I expected because writing the quick test demonstrated that calling datastore.put leads to the data file being deleted if it goes wrong and that is very bad if you were relying on the file already being there (and it also overwrites it). I've added a datastore.knows() API to work around that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New changes look good, too, though all of this special handling for execution butlers reinforces my thinking that we want to do something (like QuantumDirectory) to let us drop it someday. But not today. Merge away, whenever you're ready.

timj added 4 commits May 27, 2021 09:36
This is distinct from asking if it knows about the dataset and
the artifact exists.
allow_put_of_predefined_dataset can be set to True to
allow a put to work even if the registry thinks it
knows about the entry. If datastore already knows about
the dataset the put will still fail.
@timj timj merged commit 7204485 into master May 27, 2021
@timj timj deleted the tickets/DM-30335 branch May 27, 2021 20:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants