DM-12432: Fix timing measurement construction #9
Conversation
Force-pushed from 55f1b90 to ff75694
python/lsst/ap/pipe/ap_pipe.py
Outdated
```diff
@@ -802,8 +804,10 @@ def _doDiffIm(processed_repo, dataId, templateType, template, diffim_repo):
     log.info('Running ImageDifference...')
     if not os.path.isdir(diffim_repo):
         os.mkdir(diffim_repo)
-    diffim_result = ImageDifferenceTask.parseAndRun(args=args, config=config, doReturnResults=True)
-    diffim_metadata = diffim_result.resultList[0].metadata
+    ImageDifferenceTask.parseAndRun(args=args, config=config, doReturnResults=True)
```
Since you're no longer grabbing the results of this call to parseAndRun, you can presumably drop the doReturnResults.
python/lsst/ap/pipe/ap_pipe.py
Outdated
```python
ImageDifferenceTask.parseAndRun(args=args, config=config, doReturnResults=True)
butler = dafPersist.Butler(inputs=diffim_repo)
metadataType = ImageDifferenceTask()._getMetadataName()
diffim_metadata = butler.get(metadataType, dataId_dict)
```
I found this a bit confusing, primarily because I'm not sure I understand what's supposed to be in dataId.

The example in the docstring for this function looks as though it's the sort of string you might pass on the command line, potentially specifying multiple data units (e.g. ccd=1^2^3^4 visit=5^6^7^8). But the way it's treated in the code implies that it's actually a fully-specified data ID.

This makes a difference, since you're handing it off to parseAndRun, which will iterate over all the data units if there's more than one. If that happens, I think your logic above will break, since you're not requesting a well-qualified unit of metadata from the Butler. (Mind you, other logic in the function will probably have broken first...)

All of this makes me wonder why we're calling parseAndRun() here, rather than just run() (which is what we do in _doProcessCcd() above): the latter seems much more straightforward and easier to reason about.

Of course, introducing parseAndRun() is not a new innovation in this changeset.
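To make the concern concrete: a command-line-style data ID string with `^`-separated values describes a *set* of data units, and parseAndRun runs the task once per fully-specified combination. The sketch below mimics that expansion in plain Python; it is an illustration of the idea only, not the actual pipe_base ArgumentParser logic, and the helper name expand_data_id is made up for this example.

```python
from itertools import product

def expand_data_id(spec):
    """Expand a command-line-style data ID string, e.g.
    'ccd=1^2^3^4 visit=5^6^7^8', into fully-specified data IDs.
    Illustrative only: the real expansion lives in pipe_base's
    ArgumentParser and is more general than this sketch."""
    keys, value_lists = [], []
    for term in spec.split():
        key, _, values = term.partition("=")
        keys.append(key)
        # Each '^' separates alternative values for this key.
        value_lists.append([int(v) for v in values.split("^")])
    # Every combination of values is one fully-specified data ID.
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]

ids = expand_data_id("ccd=1^2^3^4 visit=5^6^7^8")
# 16 fully-specified IDs, e.g. {'ccd': 1, 'visit': 5}; parseAndRun
# would invoke the task once per such ID, so code that assumes a
# single (ccd, visit) pair breaks as soon as len(ids) > 1.
```

This is why a butler.get with a single dataId_dict only works if the string really does pin down one data unit.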
You are correct that the code will break if there are multiple data units specified. However, other functions (such as _doProcessCcd) will break under those circumstances as well, because they always assume one visit and frequently assume one CCD. IMO this is the main benefit of making ap_pipe a command-line task: the dataID-expanding code in pipe_base is so tightly coupled to the rest of ArgumentParser that using the whole thing is about the only option.

I would argue that parseAndRun is what we should be calling, because that would let us support parallel processing of multiple datasets once the too-specific dataID handling is fixed. Meredith did not use it for the other top-level tasks because she said that something was incompatible with parseAndRun... I think it might have been an obs_decam issue, but I unfortunately don't remember.
Force-pushed from ff75694 to 53bd55f
ap_pipe has been changed to always report "full" metadata (i.e., metadata for a task and all its subtasks) for each pipeline stage. This is the format expected by ap_verify, which needs to compute metrics for both top-level tasks and subtasks.

For ImageDifferenceTask, which is currently the only top-level task called using parseAndRun(), I've opted to recover the metadata using the Butler rather than implementing DM-11935. However, this seems a rather clumsy solution -- in order to get the metadata for a (sub)task, I need to both explicitly identify the associated top-level task and pass in a parsed data ID. This approach seems like it will be difficult to scale up to a more generic measurement-handling system. Is there a way to cast a broader net using the Butler (e.g., "return all metadata for all data IDs")?
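One possible shape for that "broader net" is a subset-style query: give the Butler a *partial* data ID and get back one reference per matching fully-specified ID. The toy sketch below demonstrates the pattern with stand-in classes; ToyButler and DataRef are invented for this illustration and only loosely mirror what a real Butler subset query might look like.

```python
class DataRef:
    """Toy stand-in for a per-data-ID reference returned by a query."""
    def __init__(self, dataId, payload):
        self.dataId = dataId
        self._payload = payload

    def get(self):
        return self._payload


class ToyButler:
    """Toy stand-in butler: maps fully-specified data IDs to metadata."""
    def __init__(self, store):
        # store: list of (dataId dict, metadata dict) pairs
        self._store = store

    def subset(self, partialId):
        """Return a DataRef for every stored data ID matching partialId."""
        return [DataRef(dataId, md) for dataId, md in self._store
                if all(dataId.get(k) == v for k, v in partialId.items())]


store = [
    ({"visit": 5, "ccd": 1}, {"runtime": 1.2}),
    ({"visit": 5, "ccd": 2}, {"runtime": 1.4}),
    ({"visit": 6, "ccd": 1}, {"runtime": 1.1}),
]
butler = ToyButler(store)

# One partial data ID pulls metadata for every matching CCD of visit 5,
# without the caller having to spell out each (visit, ccd) pair.
all_md = {tuple(sorted(ref.dataId.items())): ref.get()
          for ref in butler.subset({"visit": 5})}
```

If the real Butler supports a query of this shape, the measurement-handling code could collect metadata per data ID without hard-coding the top-level task or a single parsed data ID.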