
DM-12432: Fix timing measurement construction #9

Merged
merged 1 commit into master on Nov 22, 2017

Conversation

@kfindeisen (Member) commented Nov 9, 2017

ap_pipe has been changed to always report "full" metadata (i.e., metadata for a task and all of its subtasks) for each pipeline stage. This is the format expected by ap_verify, which needs to compute metrics for both top-level tasks and subtasks.

For ImageDifferenceTask, which is currently the only top-level task called using parseAndRun(), I've opted to recover the metadata using the Butler rather than implementing DM-11935. However, this seems a rather clumsy solution -- in order to get the metadata for a (sub)task, I need to both explicitly identify the associated top-level task and pass in a parsed data ID. This approach seems like it will be difficult to scale up to a more generic measurement-handling system. Is there a way to cast a broader net using the Butler (e.g., "return all metadata for all data IDs")?
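The per-dataId lookup and the "broader net" question can be sketched with a toy stand-in for the Butler. Everything below is hypothetical illustration, not the lsst.daf.persistence API: `FakeButler`, the `imageDifference_metadata` dataset type name, and the sample dataIds are all made up to show the access pattern being discussed.

```python
# Toy stand-in for the Butler metadata lookup pattern described above.
# FakeButler, the dataset type name, and the sample dataIds are all
# hypothetical; the real code uses lsst.daf.persistence.Butler.

class FakeButler:
    """Maps (datasetType, frozen dataId) pairs to stored objects."""

    def __init__(self, store):
        self._store = store

    @staticmethod
    def _freeze(data_id):
        # Normalize a dataId dict into a hashable, order-independent key.
        return tuple(sorted(data_id.items()))

    def get(self, dataset_type, data_id):
        # Like the change under review, this requires a fully specified
        # dataId; a partial one simply misses the key.
        return self._store[(dataset_type, self._freeze(data_id))]

    def get_all(self, dataset_type):
        # The "broader net": all metadata for all dataIds of one type.
        return {data_id: obj
                for (dtype, data_id), obj in self._store.items()
                if dtype == dataset_type}


butler = FakeButler({
    ("imageDifference_metadata", (("ccd", 1), ("visit", 5))): {"runtime": 12.3},
    ("imageDifference_metadata", (("ccd", 2), ("visit", 5))): {"runtime": 11.8},
})
one_metadata = butler.get("imageDifference_metadata", {"visit": 5, "ccd": 1})
all_metadata = butler.get_all("imageDifference_metadata")
```

The `get_all` method is the operation the comment asks about; whether the real Butler exposes an equivalent bulk query is exactly the open question here.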

```diff
@@ -802,8 +804,10 @@ def _doDiffIm(processed_repo, dataId, templateType, template, diffim_repo):
     log.info('Running ImageDifference...')
     if not os.path.isdir(diffim_repo):
         os.mkdir(diffim_repo)
-    diffim_result = ImageDifferenceTask.parseAndRun(args=args, config=config, doReturnResults=True)
-    diffim_metadata = diffim_result.resultList[0].metadata
+    ImageDifferenceTask.parseAndRun(args=args, config=config, doReturnResults=True)
```
Contributor:

Since you're no longer grabbing the results of this call to parseAndRun, you can presumably drop the doReturnResults.

```python
ImageDifferenceTask.parseAndRun(args=args, config=config, doReturnResults=True)
butler = dafPersist.Butler(inputs=diffim_repo)
metadataType = ImageDifferenceTask()._getMetadataName()
diffim_metadata = butler.get(metadataType, dataId_dict)
```
Contributor:

I found this a bit confusing, primarily because I'm not sure I understand what's supposed to be in dataId.

The example in the docstring for this function looks as though it's the sort of string you might pass on the command line, potentially specifying multiple data units (e.g. ccd=1^2^3^4 visit=5^6^7^8). But the way it's treated in the code implies that it's actually a fully-specified data ID.

This makes a difference, since you're handing it off to parseAndRun, which will iterate over all the data units if there's more than one. And if that happens, I think your logic above will break (since you're not requesting a well-qualified unit of metadata from the Butler). (Mind you, other logic in the function will probably have broken first...).

All of this makes me wonder why we're calling parseAndRun() here, rather than just run() (which is what we do in _doProcessCcd() above): the latter seems much more straightforward and easy to reason about.

Of course, the use of parseAndRun() is not new in this changeset.

Member Author:

You are correct that the code will break if there are multiple data units specified. However, other functions (such as _doProcessCcd) will break under those circumstances as well, because they always assume one visit and frequently assume one CCD. IMO this is the main benefit of making ap_pipe a command-line task: the dataID-expanding code in pipe_base is so tightly coupled to the rest of ArgumentParser that using the whole thing is about the only option.

I would argue that parseAndRun is what we should be calling, because that would let us support parallel processing of multiple datasets once the too-specific dataID handling is fixed. Meredith did not use it for the other top-level tasks because she said that something was incompatible with parseAndRun... I think it might have been an obs_decam issue, but I unfortunately don't remember.
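The multiple-data-unit expansion discussed in this exchange can be illustrated with a simplified sketch. `expand_data_id` is a hypothetical helper written for illustration; the real expansion lives in pipe_base's ArgumentParser and is considerably more involved.

```python
# Simplified illustration of how a command-line dataId string can expand
# into multiple data units. expand_data_id is a hypothetical helper; the
# real expansion is done by pipe_base's ArgumentParser.
from itertools import product

def expand_data_id(spec):
    """Expand e.g. 'ccd=1^2 visit=5' into fully specified dataId dicts."""
    keys = []
    value_lists = []
    for term in spec.split():
        key, _, values = term.partition("=")
        keys.append(key)
        value_lists.append(values.split("^"))  # '^' separates alternatives
    # Cartesian product: one dataId dict per combination of values.
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]


# The reviewer's example expands to 16 data units, so code that assumes
# exactly one dataId (like the metadata lookup above) would only see one.
data_ids = expand_data_id("ccd=1^2^3^4 visit=5^6^7^8")
```

This is why the distinction between "command-line dataId string" and "fully-specified dataId" matters: parseAndRun would iterate over all 16 combinations, while the Butler `get` call retrieves metadata for only one.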

@kfindeisen kfindeisen merged commit 53bd55f into master Nov 22, 2017
@kfindeisen kfindeisen deleted the tickets/DM-12432 branch February 25, 2019 19:52