Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-14359: Fix data ID handling in ap_* #19

Merged
merged 6 commits into from May 14, 2018
Merged

Conversation

kfindeisen
Copy link
Member

This PR fixes several bugs in how datarefs were passed to ApPipeTask. It also updates ap_pipe to current style standards and removes use of future.

@kfindeisen kfindeisen requested a review from parejkoj May 8, 2018 22:59
Copy link
Contributor

@parejkoj parejkoj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to say something about your __call__ being an almost direct copy of the one in TaskRunner, but it looks like you've already got tickets for that.

_siblingRef() worries me: you shouldn't have to do that at all.

@@ -186,6 +182,10 @@ def run(self, rawRef, calexpRef, templateIds=None, reuse=[]):
- differencer : output of `config.differencer.run` (`lsst.pipe.base.Struct` or `None`).
- associator : output of `config.associator.run` (`lsst.pipe.base.Struct` or `None`).
"""
if reuse is None:
reuse = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch and fix. Though you should document the None choice in the docstring.

# Work around mismatched HDU lists for raw and processed data
cleanId = original.dataId.copy()
if 'hdu' in cleanId:
del cleanId['hdu']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method shouldn't be necessary: dataId munging like this shouldn't be needed at all . How does processCcd manage the case where 'hdu' is in the dataId?

Copy link
Member Author

@kfindeisen kfindeisen May 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't do anything special (including when writing calexps or backgrounds).

As for the function as a whole being unnecessary, I think you're right. The original purpose (creating datarefs for calexp templates) would be better done using Butler.subset and a run-local butler.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@parejkoj After changing and testing, I think I originally introduced _siblingRef as an optimization for the following common case:

  • The user specifies a (raw) data ID on the command line that restricts both visit and ccd, and
  • The user specifies only a visit for --templateId, as specified by ImageDifferenceTask.

In fact, GetCalexpAsTemplateTask does exactly the same thing as _siblingRef: it takes the primary dataRef and overwrites only its visit field.

Anyway, I can replace the calls to _siblingRef with butler.subset, but this will mean that ap_pipe will always reduce an entire visit for --templateId even when only a few CCDs are needed (or when broken chips have been excluded from rawRef).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oof. This sort of dataId munging is probably going to break with Gen3 butler. I guess we can leave it as is and deal with it when the new API is available.

Please file a ticket about how to handle the "dataRef and templateId" question with gen3 butler (and note the similar uses, e.g. GetCalexpAsTemplateTask), and reference it in _siblingRef's docstring (probably in Notes?). I think we need much better guidance on this sort of thing, but getting that guidance is at some point in the future.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure all butler usage will break in version 3.

Where exactly do you want me to file a ticket? For which component, to do what?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The various manipulations of dataIds are the particular culprit here.

File it against ap_pipe, ip_diffim, and daf_butler. A title something like "pipeline management of dataRefs for data and templates". Does that make sense?

@kfindeisen kfindeisen merged commit d1eaeb9 into master May 14, 2018
@kfindeisen kfindeisen deleted the tickets/DM-14359 branch February 25, 2019 19:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants