DM-26629: switch to calibration collections instead of the calibration_label dimension #148

TallJimbo · 2020-09-24T13:45:23Z

This method was clearly intended for this purpose, but wasn't being used, and couldn't actually be used due to a missing base-class version and lack of support for parent storage classes.

QuantumGraph generation and the executor classes already guarantee that this class is given resolved DatasetRefs, so this always saves an extra Registry lookup. It's also now necessary for calibration lookups, which can no longer be done on data ID alone (a timespan is needed as well).
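To illustrate the trade-off, here is a minimal sketch using hypothetical toy classes (`ToyRef`, `ToyButler`), not the real daf_butler `Butler` or `DatasetRef`. It assumes a resolved ref is one whose `id` is set: `get` has to search the registry by dataset type and data ID, while `getDirect` goes straight to the stored dataset and refuses unresolved refs.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

DataId = Tuple[Tuple[str, object], ...]


@dataclass(frozen=True)
class ToyRef:
    """Hypothetical stand-in for a DatasetRef; id is None when unresolved."""
    datasetType: str
    dataId: DataId
    id: Optional[int] = None


class ToyButler:
    """Hypothetical butler that counts how often it has to search its registry."""

    def __init__(self, datasets: Dict[int, str], index: Dict[Tuple[str, DataId], int]):
        self._datasets = datasets  # dataset ID -> stored object
        self._index = index        # (dataset type, data ID) -> dataset ID
        self.registryLookups = 0

    def get(self, ref: ToyRef) -> str:
        # Unresolved path: search the registry by dataset type and data ID.
        # For a calibration this key is not enough, since several certified
        # versions can share one data ID and differ only in validity range.
        self.registryLookups += 1
        return self._datasets[self._index[(ref.datasetType, ref.dataId)]]

    def getDirect(self, ref: ToyRef) -> str:
        # Resolved path: the ID is already known, so no registry search happens.
        if ref.id is None:
            raise ValueError(f"{ref} is not resolved")
        return self._datasets[ref.id]
```

In this toy model, a quantum that arrives with resolved refs never increments `registryLookups`, which is the saving the commit message describes.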

natelust

I think it mostly looks fine.

natelust · 2020-09-24T17:30:35Z

python/lsst/pipe/base/butlerQuantumContext.py

@@ -71,11 +71,11 @@ def __init__(self, butler: Butler, quantum: Quantum):
        def _get(self, ref):
            if isinstance(ref, DeferredDatasetRef):
                self._checkMembership(ref.datasetRef, self.allInputs)
-                return butler.getDeferred(ref.datasetRef)
+                return butler.getDirectDeferred(ref.datasetRef)


I'm not super comfortable linking this to something that MUST have been done by some other class. At minimum you should throw an exception here if ref.id is None (or in the constructor), and add something to the documentation.

getDirectDeferred will raise an exception if the ref.id is None.

Yes, thanks to Tim's comment on the daf_butler PR. As we discussed out-of-band, I'm also uncomfortable with this relying on logic in ctrl_mpexec, but I think the problem is where that code lives, not what it does or guarantees. In addition to that check for a not-None ID (which we can defer to the butler), I'll add some documentation to the ButlerQuantumContext class docs, and anywhere else I can find, stating that resolved DatasetRefs are a precondition.
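The reviewer's alternative, checking in the constructor rather than deferring to the butler, could look roughly like the sketch below. `ToyRef`, `ToyQuantumContext`, `requireResolved`, and `UnresolvedRefError` are hypothetical names, not pipe_base API; the PR itself relies on `getDirect`/`getDirectDeferred` raising instead.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class ToyRef:
    """Hypothetical stand-in for a DatasetRef; id is None when unresolved."""
    datasetType: str
    id: Optional[int] = None


class UnresolvedRefError(ValueError):
    """Raised when a ref that must already be resolved has no ID."""


def requireResolved(refs: Iterable[ToyRef]) -> List[ToyRef]:
    """Return refs as a list, raising if any of them lacks a dataset ID."""
    refs = list(refs)
    unresolved = [ref for ref in refs if ref.id is None]
    if unresolved:
        names = ", ".join(ref.datasetType for ref in unresolved)
        raise UnresolvedRefError(f"Expected resolved DatasetRefs; unresolved: {names}")
    return refs


class ToyQuantumContext:
    """Precondition: every input ref must be resolved (ref.id is not None)."""

    def __init__(self, inputs: Iterable[ToyRef]):
        # Checking here reports a bad quantum at construction time,
        # instead of on the first get() call inside runQuantum.
        self.allInputs = requireResolved(inputs)
```

The design trade-off is where the error surfaces: a constructor check fails before any task code runs, while deferring to the butler avoids duplicating a check it already performs.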

natelust · 2020-09-24T17:30:52Z

python/lsst/pipe/base/butlerQuantumContext.py


            else:
                self._checkMembership(ref, self.allInputs)
-                return butler.get(ref)
+                return butler.getDirect(ref)


Same as above.

getDirect already gets upset if it's not a resolved ref.

natelust · 2020-09-25T22:04:30Z

python/lsst/pipe/base/graphBuilder.py

                        refs = list(
                            lookupFunction(datasetType, registry, quantum.dataId, collections)
                        )
+                    elif (datasetType.isCalibration()


My only comment about this block is that I had always intended to go back and make the else branch a free function, so it would be easier to use in a custom lookupFunction that wanted to extend that behavior. Perhaps you could do that, and the same for the calibration bits, if you think it is worth it. It's not too much code for someone to copy, and it won't be used much, so if you don't think it's a good use of time I am fine with you not doing it.

I agree that's a good idea, but this ticket is too big and cumbersome for any more scope, and I definitely plan to revisit this block in the next month anyway.
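The free-function idea can be sketched with toy types. Everything here is hypothetical: the "registry" is a plain dict keyed by (dataset type, collection, data ID), and `defaultPrerequisiteLookup` / `lookupWithFallback` are illustrative names. Only the four-argument call shape comes from the `lookupFunction(datasetType, registry, quantum.dataId, collections)` call in graphBuilder.py, and the first-match-wins rule is our assumption about what the deduplicated search does.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical toy "registry": (dataset type, collection, data ID) -> ref name.
ToyRegistry = Dict[Tuple[str, str, str], str]
LookupFunction = Callable[[str, ToyRegistry, str, Sequence[str]], List[str]]


def defaultPrerequisiteLookup(datasetType: str, registry: ToyRegistry,
                              dataId: str, collections: Sequence[str]) -> List[str]:
    """Hypothetical free-function version of the default (else-branch) search.

    Collections are searched in order and the first match wins, so at most
    one ref is returned for the data ID.
    """
    for collection in collections:
        ref = registry.get((datasetType, collection, dataId))
        if ref is not None:
            return [ref]
    return []


def lookupWithFallback(datasetType: str, registry: ToyRegistry,
                       dataId: str, collections: Sequence[str]) -> List[str]:
    """Example custom lookupFunction that reuses and extends the default search."""
    refs = defaultPrerequisiteLookup(datasetType, registry, dataId, collections)
    if not refs:
        # Hypothetical extension: fall back to a catch-all data ID.
        refs = defaultPrerequisiteLookup(datasetType, registry, "default", collections)
    return refs


# Both functions share the same call signature, so either can be plugged in.
customLookup: LookupFunction = lookupWithFallback
```

Exposing the default as a function lets a custom lookup delegate to it instead of copying the block.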

natelust · 2020-09-25T22:05:34Z

python/lsst/pipe/base/graphBuilder.py

+                        # temporal join on a non-dimension-based timespan yet.
+                        timespan = quantum.dataId.timespan
+                        try:
+                            refs = [registry.findDataset(datasetType, quantum.dataId,


Can you be sure that this will only ever return one? Is an exception raised if more than one could be found for a timespan? Does the certification process prevent this?

Yes, findDataset will raise in that case. The certification process is supposed to guarantee that this is impossible for any calibration I can think of, but it's not something we can guarantee at the level of database constraints. So the exception basically says, "someone probably put together a malformed calibration repo".
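The single-result contract can be sketched as follows: find the one certified calibration whose validity range overlaps the quantum's timespan, and raise if several do. `Timespan`, `Certification`, `findCalibration`, and `AmbiguousCalibrationError` are hypothetical toy names, not the daf_butler `Registry.findDataset` implementation, and the half-open overlap rule is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Timespan:
    """Hypothetical half-open interval [begin, end) in arbitrary time units."""
    begin: float
    end: float

    def overlaps(self, other: "Timespan") -> bool:
        return self.begin < other.end and other.begin < self.end


@dataclass(frozen=True)
class Certification:
    """A calibration dataset certified as valid over a validity range."""
    datasetId: int
    validity: Timespan


class AmbiguousCalibrationError(LookupError):
    """Raised when more than one certified calibration matches a timespan."""


def findCalibration(certifications: List[Certification],
                    timespan: Timespan) -> Optional[int]:
    """Return the single dataset ID valid for timespan, or None if there is none."""
    matches = [c.datasetId for c in certifications if c.validity.overlaps(timespan)]
    if len(matches) > 1:
        # More than one match, e.g. overlapping validity ranges, which the
        # certification process is supposed to prevent.
        raise AmbiguousCalibrationError(
            f"{len(matches)} calibrations match {timespan}: {matches}")
    return matches[0] if matches else None
```

Nothing in this structure itself stops two certifications from overlapping, which mirrors the point that the guarantee is a convention rather than a database constraint; the lookup can only detect the violation and raise.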

natelust · 2020-09-25T22:07:10Z

python/lsst/pipe/base/graphBuilder.py

                        refs = list(registry.queryDatasets(datasetType,
                                                           collections=collections,
                                                           dataId=quantum.dataId,
                                                           deduplicate=True).expanded())
-                    quantum.prerequisites[datasetType].update({ref.dataId: ref for ref in refs})
+                    quantum.prerequisites[datasetType].update({ref.dataId: ref for ref in refs
+                                                               if ref is not None})


If my ticket lands first, this is broken; in fact... looking at this makes me feel I need to go back and look at my ticket...

Never mind, this is a QuantumScaffolding.
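The `if ref is not None` filter matters because the calibration branch builds `refs` from a single `findDataset` result, which presumably is `None` when no calibration matches. Without the guard, `ref.dataId` would raise `AttributeError`. A toy version, with the hypothetical names `ToyRef` and `addPrerequisites`:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ToyRef:
    """Hypothetical stand-in for a DatasetRef keyed by a string data ID."""
    dataId: str
    id: int


def addPrerequisites(prerequisites: Dict[str, ToyRef],
                     refs: List[Optional[ToyRef]]) -> None:
    """Merge refs into prerequisites by data ID, skipping failed lookups (None)."""
    prerequisites.update({ref.dataId: ref for ref in refs if ref is not None})
```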

natelust · 2020-09-25T22:08:44Z

python/lsst/pipe/base/testUtils.py

+                newRefsForDatasetType.append(resolvedRef)
+            else:
+                newRefsForDatasetType.append(ref)
+        refsForDatasetType[:] = newRefsForDatasetType


This changes with my ticket too; the race is on. Maybe I should have held back these comments? :)

At least (I imagine) it'll be an easy change in either case.
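The `refsForDatasetType[:] = newRefsForDatasetType` line in testUtils.py replaces the list's contents in place rather than rebinding the name, so any other structure holding the same list object sees the resolved refs. A minimal sketch with hypothetical names (`ToyRef`, `resolveInPlace`, and an `ids` mapping standing in for a registry lookup):

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ToyRef:
    """Hypothetical stand-in for a DatasetRef; id is None when unresolved."""
    datasetType: str
    dataId: str
    id: Optional[int] = None


def resolveInPlace(refsByType: Dict[str, List[ToyRef]],
                   ids: Dict[Tuple[str, str], int]) -> None:
    """Resolve every unresolved ref, mutating each list rather than replacing it."""
    for refsForDatasetType in refsByType.values():
        newRefsForDatasetType = []
        for ref in refsForDatasetType:
            if ref.id is None:
                # `ids` stands in for a registry lookup by dataset type + data ID.
                resolvedRef = replace(ref, id=ids[(ref.datasetType, ref.dataId)])
                newRefsForDatasetType.append(resolvedRef)
            else:
                newRefsForDatasetType.append(ref)
        # Slice assignment keeps the same list object, so aliases see the update.
        refsForDatasetType[:] = newRefsForDatasetType
```

Rebinding with `refsForDatasetType = newRefsForDatasetType` would only change the local name, leaving the mapping's lists unresolved.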

TallJimbo force-pushed the tickets/DM-26629 branch from fe3bb94 to 73fbc3a September 24, 2020 13:59

TallJimbo added 5 commits September 25, 2020 15:05

Add isCalibration flag for dataset type definition to connections.

01e40fb

Special-case calibration lookups in QG generation.

65aa6a2

Actually use connection methods for creating dataset types.

20d5031

Resolve datasets prior to calling runQuantum in test utilities.

1158623
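The first two commits can be pictured together: a connection declares whether its dataset type is a calibration, and graph generation picks the lookup strategy from that flag. The sketch below uses hypothetical names (`ToyConnection`, `makeDatasetTypeSpec`, `chooseLookup`) and a plain dict in place of a real `DatasetType`; it is not the pipe_base connections API.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ToyConnection:
    """Hypothetical stand-in for a pipeline connection definition."""
    name: str
    storageClass: str
    dimensions: Tuple[str, ...] = ()
    isCalibration: bool = False  # ordinary (non-calibration) dataset type by default

    def makeDatasetTypeSpec(self) -> dict:
        # Building the definition from the connection itself keeps the
        # isCalibration flag from being dropped by intermediate code.
        return {"name": self.name, "storageClass": self.storageClass,
                "dimensions": self.dimensions, "isCalibration": self.isCalibration}


def chooseLookup(spec: dict) -> str:
    """Pick the QuantumGraph lookup strategy for a prerequisite input."""
    if spec["isCalibration"]:
        return "temporal calibration lookup"
    return "default data ID lookup"
```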

TallJimbo force-pushed the tickets/DM-26629 branch from 73fbc3a to 1b0a4c5 September 25, 2020 19:05

natelust approved these changes Sep 25, 2020

TallJimbo merged commit 1ce627f into master Sep 26, 2020

TallJimbo deleted the tickets/DM-26629 branch September 26, 2020 05:28
