Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-25919: utilize new butler query functionality in QuantumGraph generation #139

Merged
merged 4 commits into from Aug 7, 2020

Conversation

TallJimbo
Copy link
Member

No description provided.

Copy link
Contributor

@natelust natelust left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments

_LOG.debug("Iterating over query results to associate quanta with datasets.")
# Iterate over query results, populating data IDs for datasets and
# quanta and then connecting them to each other.
n = 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for typing you generally just write n: int and not add other interpreter work

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't just for typing; there is logging code after the loop body that uses n, and we don't want that to crash if the loop is never executed.

@@ -495,6 +497,15 @@ def connectDataIds(self, registry, collections, userQuery):
Object representing the collections to search for input datasets.
userQuery : `str`, optional
User-provided expression to limit the data IDs processed.

Returns
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not right, this function is not a function once the context manager hits it. The function call actually returns a object that can be used in a with statement. The result of returned_object.enter is what you are documenting here. I really don't know how to document that, but what you wrote here is potentially confusing to someone trying to use it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I know. This seemed like the the least-bad way to document it, and I figured the docstring itself explained the relationship to the context manager well enough.

for datasetType, refs in itertools.chain(self.inputs.items(), self.intermediates.items(),
self.outputs.items()):
datasetDataId = commonDataId.subset(datasetType.dimensions)
ref = refs.get(datasetDataId)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider just biting the bullet and doing the double hash lookup that should be fast to make this cleaner to read (and below)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm using get here to avoid try...__getitem__...except for control flow, not to avoid a double lookup. Or maybe I'm just not seeing what you're proposing as an alternative?

This avoids the many-single-row-queries problem in data ID expansion
and regular dataset lookups.  The biggest problem, in prerequisite
dataset lookups, still exists, but will probably be deferred to another
ticket that builds on this one.
Higher level logic can explicitly skip writing these if they already
exist, so this code can't consider that an error.
This addresses a change to the default behavior of queryDatasets in
daf_butler.
@TallJimbo
Copy link
Member Author

@natelust, there are some conversations ongoing on this ticket (which I'd forgotten about when I asked you whether you'd signed off earlier), but Jenkins is green, this is a multi-package merge, and I don't want to miss my window and have to get through Jenkins again. I'm happy to fix anything that comes out of these ongoing conversations on my next ticket.

@TallJimbo TallJimbo merged commit 9573cbd into master Aug 7, 2020
@TallJimbo TallJimbo deleted the tickets/DM-25919 branch August 7, 2020 19:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants