-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM- 26629: switch to calibration collections instead of the calibration_label dimension #148
Changes from all commits
01e40fb
65aa6a2
20d5031
1158623
1b0a4c5
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -71,11 +71,11 @@ def __init__(self, butler: Butler, quantum: Quantum): | |
def _get(self, ref): | ||
if isinstance(ref, DeferredDatasetRef): | ||
self._checkMembership(ref.datasetRef, self.allInputs) | ||
return butler.getDeferred(ref.datasetRef) | ||
return butler.getDirectDeferred(ref.datasetRef) | ||
|
||
else: | ||
self._checkMembership(ref, self.allInputs) | ||
return butler.get(ref) | ||
return butler.getDirect(ref) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. same as above There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. getDirect already gets upset if it's not a resolved ref. |
||
|
||
def _put(self, value, ref): | ||
self._checkMembership(ref, self.allOutputs) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -707,22 +707,37 @@ def resolveDatasetRefs(self, registry, collections, run, commonDataIds, *, skipE | |
# These may have dimensions that extend beyond those we queried | ||
# for originally, because we want to permit those data ID | ||
# values to differ across quanta and dataset types. | ||
# For example, the same quantum may have a flat and bias with | ||
# a different calibration_label, or a refcat with a skypix | ||
# value that overlaps the quantum's data ID's region, but not | ||
# the user expression used for the initial query. | ||
for datasetType in task.prerequisites: | ||
lookupFunction = lookupFunctions.get(datasetType.name) | ||
if lookupFunction is not None: | ||
# PipelineTask has provided its own function to do the | ||
# lookup. This always takes precedence. | ||
refs = list( | ||
lookupFunction(datasetType, registry, quantum.dataId, collections) | ||
) | ||
elif (datasetType.isCalibration() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My only comment about this block is that I had always intended to go back and make the else a free function so it was easier to use in a custom lookupFunction that wanted to extend that behavior. Perhaps you could do that and the same for the calibration bits if you think it is worth it. Its not too much code to copy for someone, and its use will be small so if you dont think its a good use of time I am fine with you not. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree that's a good idea, but this ticket is too big and cumbersome for any more scope, and I definitely plan to be revisit this block in the next month anyway. |
||
and datasetType.dimensions <= quantum.dataId.graph | ||
and quantum.dataId.graph.temporal): | ||
# This is a master calibration lookup, which we have to | ||
# handle specially because the query system can't do a | ||
# temporal join on a non-dimension-based timespan yet. | ||
timespan = quantum.dataId.timespan | ||
try: | ||
refs = [registry.findDataset(datasetType, quantum.dataId, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you be sure that this will only ever return one, is an exception raised if more than one could be found for a timespan? Does the certification process prevent this? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, |
||
collections=collections, | ||
timespan=timespan)] | ||
except KeyError: | ||
# This dataset type is not present in the registry, | ||
# which just means there are no datasets here. | ||
refs = [] | ||
else: | ||
# Most general case. | ||
refs = list(registry.queryDatasets(datasetType, | ||
collections=collections, | ||
dataId=quantum.dataId, | ||
deduplicate=True).expanded()) | ||
quantum.prerequisites[datasetType].update({ref.dataId: ref for ref in refs}) | ||
quantum.prerequisites[datasetType].update({ref.dataId: ref for ref in refs | ||
if ref is not None}) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If my ticket lands first, this is broken, infact... looking at this makes me feel I need to go look back at my ticket... There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nevermind, this is a QuantumScaffolding |
||
# Actually remove any quanta that we decided to skip above. | ||
if dataIdsToSkip: | ||
_LOG.debug("Pruning %d quanta for task with label '%s' because all of their outputs exist.", | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -163,6 +163,41 @@ def _refFromConnection(butler, connection, dataId, **kwargs): | |
from e | ||
|
||
|
||
def _resolveTestQuantumInputs(butler, quantum): | ||
"""Look up all input datasets a test quantum in the `Registry` to resolve | ||
all `DatasetRef` objects (i.e. ensure they have not-`None` ``id`` and | ||
``run`` attributes). | ||
|
||
Parameters | ||
---------- | ||
quantum : `~lsst.daf.butler.Quantum` | ||
Single Quantum instance. | ||
butler : `~lsst.daf.butler.Butler` | ||
Data butler. | ||
""" | ||
# TODO (DM-26819): This function is a direct copy of | ||
# `lsst.ctrl.mpexec.SingleQuantumExecutor.updateQuantumInputs`, but the | ||
# `runTestQuantum` function that calls it is essentially duplicating logic | ||
# in that class as well (albeit not verbatim). We should probably move | ||
# `SingleQuantumExecutor` to ``pipe_base`` and see if it is directly usable | ||
# in test code instead of having these classes at all. | ||
for refsForDatasetType in quantum.predictedInputs.values(): | ||
newRefsForDatasetType = [] | ||
for ref in refsForDatasetType: | ||
if ref.id is None: | ||
resolvedRef = butler.registry.findDataset(ref.datasetType, ref.dataId, | ||
collections=butler.collections) | ||
if resolvedRef is None: | ||
raise ValueError( | ||
f"Cannot find {ref.datasetType.name} with id {ref.dataId} " | ||
f"in collections {butler.collections}." | ||
) | ||
newRefsForDatasetType.append(resolvedRef) | ||
else: | ||
newRefsForDatasetType.append(ref) | ||
refsForDatasetType[:] = newRefsForDatasetType | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This changes too with my ticket, the race is on, maybe I should have held back these comments? :) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. At least (I imagine) it'll be an easy change in either case. |
||
|
||
|
||
def runTestQuantum(task, butler, quantum, mockRun=True): | ||
"""Run a PipelineTask on a Quantum. | ||
|
||
|
@@ -185,6 +220,7 @@ def runTestQuantum(task, butler, quantum, mockRun=True): | |
If ``mockRun`` is set, the mock that replaced ``run``. This object can | ||
be queried for the arguments ``runQuantum`` passed to ``run``. | ||
""" | ||
_resolveTestQuantumInputs(butler, quantum) | ||
butlerQc = ButlerQuantumContext(butler, quantum) | ||
connections = task.config.ConnectionsClass(config=task.config) | ||
inputRefs, outputRefs = connections.buildDatasetRefs(quantum) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Im not supper comfortable linking this to something that MUST have been done by some other class. At minimum you should throw an execption here if ref.id is none (or in the constructor), and add something to the documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getDirectDeferred will raise an exception if the ref.id is None.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, thanks to Tim's comment on the daf_butler PR. As we discussed out-of-band, I'm also uncomfortable with this relying on logic in ctrl_mpexec, but I think the problem is where that code lives, not what it does or guarantees. But in addition to that check for not-None ID (which we can defer to butler), I'll go add some documentation to the
ButlerQuantumContext
class docs and anywhere else I can find that resolvedDatasetRefs
are a precondition.