
DM-12432: Fix timing measurement construction #9

Merged
merged 1 commit into master on Nov 22, 2017

Conversation

@kfindeisen (Member) commented Nov 9, 2017

ap_pipe has been changed to always report "full" metadata (i.e., metadata for a task and all of its subtasks) for each pipeline stage. This is the format expected by ap_verify, which needs to compute metrics for both top-level tasks and subtasks.

For ImageDifferenceTask, which is currently the only top-level task called using parseAndRun(), I've opted to recover the metadata using the Butler rather than implementing DM-11935. However, this seems a rather clumsy solution -- in order to get the metadata for a (sub)task, I need to both explicitly identify the associated top-level task and pass in a parsed data ID. This approach seems like it will be difficult to scale up to a more generic measurement-handling system. Is there a way to cast a broader net using the Butler (e.g., "return all metadata for all data IDs")?
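The per-dataId lookup and the "broader net" question can be sketched with a toy stand-in for the Butler. Everything below is hypothetical illustration, not the lsst.daf.persistence API: `FakeButler`, the `imageDifference_metadata` dataset type name, and the sample dataIds are all made up to show the access pattern being discussed.

```python
# Toy stand-in for the Butler metadata lookup pattern described above.
# FakeButler, the dataset type name, and the sample dataIds are all
# hypothetical; the real code uses lsst.daf.persistence.Butler.

class FakeButler:
    """Maps (datasetType, frozen dataId) pairs to stored objects."""

    def __init__(self, store):
        self._store = store

    @staticmethod
    def _freeze(data_id):
        # Normalize a dataId dict into a hashable, order-independent key.
        return tuple(sorted(data_id.items()))

    def get(self, dataset_type, data_id):
        # Like the change under review, this requires a fully specified
        # dataId; a partial one simply misses the key.
        return self._store[(dataset_type, self._freeze(data_id))]

    def get_all(self, dataset_type):
        # The "broader net": all metadata for all dataIds of one type.
        return {data_id: obj
                for (dtype, data_id), obj in self._store.items()
                if dtype == dataset_type}


butler = FakeButler({
    ("imageDifference_metadata", (("ccd", 1), ("visit", 5))): {"runtime": 12.3},
    ("imageDifference_metadata", (("ccd", 2), ("visit", 5))): {"runtime": 11.8},
})
one_metadata = butler.get("imageDifference_metadata", {"visit": 5, "ccd": 1})
all_metadata = butler.get_all("imageDifference_metadata")
```

The `get_all` method is the operation the comment asks about; whether the real Butler exposes an equivalent bulk query is exactly the open question here.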

```diff
@@ -802,8 +804,10 @@ def _doDiffIm(processed_repo, dataId, templateType, template, diffim_repo):
     log.info('Running ImageDifference...')
     if not os.path.isdir(diffim_repo):
         os.mkdir(diffim_repo)
-    diffim_result = ImageDifferenceTask.parseAndRun(args=args, config=config, doReturnResults=True)
-    diffim_metadata = diffim_result.resultList[0].metadata
+    ImageDifferenceTask.parseAndRun(args=args, config=config, doReturnResults=True)
```
Contributor:

Since you're no longer grabbing the results of this call to parseAndRun, you can presumably drop the doReturnResults.

```python
ImageDifferenceTask.parseAndRun(args=args, config=config, doReturnResults=True)
butler = dafPersist.Butler(inputs=diffim_repo)
metadataType = ImageDifferenceTask()._getMetadataName()
diffim_metadata = butler.get(metadataType, dataId_dict)
```
Contributor:

I found this a bit confusing, primarily because I'm not sure I understand what's supposed to be in dataId.

The example in the docstring for this function looks as though it's the sort of string you might pass on the command line, potentially specifying multiple data units (e.g. ccd=1^2^3^4 visit=5^6^7^8). But the way it's treated in the code implies that it's actually a fully-specified data ID.

This makes a difference, since you're handing it off to parseAndRun, which will iterate over all the data units if there's more than one. And if that happens, I think your logic above will break (since you're not requesting a well-qualified unit of metadata from the Butler). (Mind you, other logic in the function will probably have broken first...).

All of this makes me wonder why we're calling parseAndRun() here, rather than just run() (which is what we do in _doProcessCcd() above): the latter seems much more straightforward and easy to reason about.

Of course, the use of parseAndRun() is not new in this changeset.

Member Author:

You are correct that the code will break if there are multiple data units specified. However, other functions (such as _doProcessCcd) will break under those circumstances as well, because they always assume one visit and frequently assume one CCD. IMO this is the main benefit of making ap_pipe a command-line task: the dataID-expanding code in pipe_base is so tightly coupled to the rest of ArgumentParser that using the whole thing is about the only option.

I would argue that parseAndRun is what we should be calling, because that would let us support parallel processing of multiple datasets once the too-specific dataID handling is fixed. Meredith did not use it for the other top-level tasks because she said that something was incompatible with parseAndRun... I think it might have been an obs_decam issue, but I unfortunately don't remember.
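The multiple-data-unit expansion discussed in this exchange can be illustrated with a simplified sketch. `expand_data_id` is a hypothetical helper written for illustration; the real expansion lives in pipe_base's ArgumentParser and is considerably more involved.

```python
# Simplified illustration of how a command-line dataId string can expand
# into multiple data units. expand_data_id is a hypothetical helper; the
# real expansion is done by pipe_base's ArgumentParser.
from itertools import product

def expand_data_id(spec):
    """Expand e.g. 'ccd=1^2 visit=5' into fully specified dataId dicts."""
    keys = []
    value_lists = []
    for term in spec.split():
        key, _, values = term.partition("=")
        keys.append(key)
        value_lists.append(values.split("^"))  # '^' separates alternatives
    # Cartesian product: one dataId dict per combination of values.
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]


# The reviewer's example expands to 16 data units, so code that assumes
# exactly one dataId (like the metadata lookup above) would only see one.
data_ids = expand_data_id("ccd=1^2^3^4 visit=5^6^7^8")
```

This is why the distinction between "command-line dataId string" and "fully-specified dataId" matters: parseAndRun would iterate over all 16 combinations, while the Butler `get` call retrieves metadata for only one.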

@kfindeisen kfindeisen merged commit 53bd55f into master Nov 22, 2017
@kfindeisen kfindeisen deleted the tickets/DM-12432 branch February 25, 2019 19:52