DM-25919: utilize new butler query functionality in QuantumGraph generation #139

TallJimbo · 2020-07-19T01:46:16Z

No description provided.

natelust

A few comments

natelust · 2020-07-21T17:45:45Z

python/lsst/pipe/base/graphBuilder.py

+            _LOG.debug("Iterating over query results to associate quanta with datasets.")
+            # Iterate over query results, populating data IDs for datasets and
+            # quanta and then connecting them to each other.
+            n = 0


I think for typing you generally just write `n: int` rather than adding extra interpreter work.

This isn't just for typing; there is logging code after the loop body that uses n, and we don't want that to crash if the loop is never executed.
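A bare annotation such as `n: int` declares a type but does not bind the name. Any code after a loop that never runs would then fail. The minimal sketch below shows this; the function names are hypothetical and not taken from `graphBuilder.py`.

```python
import logging

_LOG = logging.getLogger(__name__)


def count_annotated_only(rows):
    # Annotation only: this declares the type but does NOT bind `n`.
    n: int
    for n, _row in enumerate(rows, start=1):
        pass
    # If `rows` is empty the loop body never runs, so `n` is unbound here
    # and this line raises UnboundLocalError.
    return n


def count_initialized(rows):
    # Binding `n` up front keeps the post-loop logging safe for empty input.
    n = 0
    for n, _row in enumerate(rows, start=1):
        pass
    _LOG.debug("Processed %d rows.", n)
    return n
```

`count_initialized([])` returns 0. `count_annotated_only([])` raises `UnboundLocalError`, which is the crash the initialization guards against.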

natelust · 2020-07-21T20:48:42Z

python/lsst/pipe/base/graphBuilder.py

@@ -495,6 +497,15 @@ def connectDataIds(self, registry, collections, userQuery):
            Object representing the collections to search for input datasets.
        userQuery : `str`, optional
            User-provided expression to limit the data IDs processed.
+
+        Returns


This is not right: this function is no longer an ordinary function once the context manager decorator wraps it. Calling it actually returns an object that can be used in a `with` statement, and what you are documenting here is the result of `returned_object.__enter__()`. I really don't know how to document that, but what you wrote here is potentially confusing to someone trying to use it.

Yeah, I know. This seemed like the least-bad way to document it, and I figured the docstring itself explained the relationship to the context manager well enough.
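The distinction under discussion can be shown with a minimal sketch. It assumes the method is a generator decorated with `contextlib.contextmanager`, as the review comment implies. Calling the decorated function returns a context-manager object. The "Returns" section actually describes the value that `with ... as` binds, which is what the generator yields. `connect_data_ids` and its body are hypothetical stand-ins, not the real `connectDataIds`.

```python
from contextlib import contextmanager


@contextmanager
def connect_data_ids(values):
    """Hypothetical stand-in for a generator-based context manager.

    Returns
    -------
    ids : `list`
        The value bound by ``with connect_data_ids(...) as ids``, i.e. what
        ``__enter__`` produces, not the value of calling this function.
    """
    # Setup (e.g. creating temporary tables) would run here.
    result = sorted(values)
    try:
        yield result
    finally:
        # Cleanup runs when the ``with`` block exits, even on error.
        result.clear()
```

For example, `cm = connect_data_ids([3, 1, 2])` gives a context-manager object, not a list. Only `with cm as ids:` binds the sorted list.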

natelust · 2020-07-21T21:35:15Z

python/lsst/pipe/base/graphBuilder.py

+                for datasetType, refs in itertools.chain(self.inputs.items(), self.intermediates.items(),
+                                                         self.outputs.items()):
+                    datasetDataId = commonDataId.subset(datasetType.dimensions)
+                    ref = refs.get(datasetDataId)


Consider just biting the bullet and doing the double hash lookup (which should be fast) to make this cleaner to read (here and below).

I'm using get here to avoid try...__getitem__...except for control flow, not to avoid a double lookup. Or maybe I'm just not seeing what you're proposing as an alternative?
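The three lookup styles debated here behave identically for present and missing keys. They differ only in readability, in the number of hash lookups, and in whether exceptions drive control flow. The sketch below is minimal and uses plain tuples as hypothetical stand-ins for data IDs. `subset` is only a rough analogue of `commonDataId.subset(datasetType.dimensions)`, not the daf_butler API.

```python
from typing import Iterable, Mapping, Optional, Tuple

# A hashable, order-independent stand-in for a data ID (hypothetical).
DataIdKey = Tuple[Tuple[str, object], ...]


def subset(data_id: Mapping[str, object], dimensions: Iterable[str]) -> DataIdKey:
    # Keep only the dimensions this dataset type uses, as a hashable key.
    return tuple((d, data_id[d]) for d in sorted(dimensions))


def lookup_get(refs: Mapping[DataIdKey, str], key: DataIdKey) -> Optional[str]:
    # One hash lookup; a missing key gives None instead of raising.
    return refs.get(key)


def lookup_double(refs: Mapping[DataIdKey, str], key: DataIdKey) -> Optional[str]:
    # The "double lookup": a membership test, then indexing.
    if key in refs:
        return refs[key]
    return None


def lookup_try(refs: Mapping[DataIdKey, str], key: DataIdKey) -> Optional[str]:
    # Exceptions used for control flow, the pattern ``get`` avoids.
    try:
        return refs[key]
    except KeyError:
        return None
```

All three return the same result. The `get` form expresses "maybe absent" in a single call, which matches the reply above.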

This avoids the many-single-row-queries problem in data ID expansion and regular dataset lookups. The biggest problem, in prerequisite dataset lookups, still exists, but will probably be deferred to another ticket that builds on this one.
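The "many single-row queries" problem is easy to see in miniature. Expanding N data IDs one at a time costs N database round trips, while a batched query costs one. The toy sketch below uses `sqlite3`. The `visit` table, the counting wrapper, and all function names are hypothetical; this is not the butler registry's schema or API.

```python
import sqlite3


class CountingConnection:
    """Wraps a sqlite3 connection and counts executed statements."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def execute(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params)


def make_db():
    # Toy schema standing in for dimension records.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visit (id INTEGER PRIMARY KEY, filter TEXT)")
    conn.executemany("INSERT INTO visit (id, filter) VALUES (?, ?)",
                     [(1, "g"), (2, "r"), (3, "i")])
    conn.commit()
    return conn


def expand_one_by_one(db, visit_ids):
    # One query per data ID: the pattern the commit message says it avoids.
    result = {}
    for vid in visit_ids:
        row = db.execute("SELECT filter FROM visit WHERE id = ?", (vid,)).fetchone()
        result[vid] = row[0]
    return result


def expand_batched(db, visit_ids):
    # One query for the whole batch, using a parameterized IN clause.
    ids = list(visit_ids)
    if not ids:
        return {}
    placeholders = ", ".join("?" for _ in ids)
    rows = db.execute(
        f"SELECT id, filter FROM visit WHERE id IN ({placeholders})", ids
    ).fetchall()
    return dict(rows)
```

Both functions return the same mapping. The one-by-one version issues three queries for three IDs, while the batched version issues one.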


TallJimbo · 2020-08-07T19:39:22Z

@natelust, there are some conversations ongoing on this ticket (which I'd forgotten about when I asked you whether you'd signed off earlier), but Jenkins is green, this is a multi-package merge, and I don't want to miss my window and have to get through Jenkins again. I'm happy to fix anything that comes out of these ongoing conversations on my next ticket.

natelust approved these changes Aug 6, 2020


TallJimbo added 4 commits August 7, 2020 09:34

Add more logging to GraphBuilder.

ec304d9

Don't complain about existing init outputs in GraphBuilder.

aefa74b

Higher level logic can explicitly skip writing these if they already exist, so this code can't consider that an error.

Ensure prerequisite DatasetRefs have expanded data IDs.

f33be83

This addresses a change to the default behavior of queryDatasets in daf_butler.

TallJimbo force-pushed the tickets/DM-25919 branch from c504440 to f33be83 Compare August 7, 2020 13:34

TallJimbo merged commit 9573cbd into master Aug 7, 2020

TallJimbo deleted the tickets/DM-25919 branch August 7, 2020 19:40
