New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-14359: Fix data ID handling in ap_* #19
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going to say something about your __call__
being an almost direct copy of the one in TaskRunner
, but it looks like you've already got tickets for that.
_siblingRef()
worries me: you shouldn't have to do that at all.
@@ -186,6 +182,10 @@ def run(self, rawRef, calexpRef, templateIds=None, reuse=[]): | |||
- differencer : output of `config.differencer.run` (`lsst.pipe.base.Struct` or `None`). | |||
- associator : output of `config.associator.run` (`lsst.pipe.base.Struct` or `None`). | |||
""" | |||
if reuse is None: | |||
reuse = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch and fix. Though you should document the None choice in the docstring.
python/lsst/ap/pipe/ap_pipe.py
Outdated
# Work around mismatched HDU lists for raw and processed data | ||
cleanId = original.dataId.copy() | ||
if 'hdu' in cleanId: | ||
del cleanId['hdu'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method shouldn't be necessary: dataId
munging like this shouldn't be needed at all . How does processCcd manage the case where 'hdu' is in the dataId?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't do anything special (including when writing calexps or backgrounds).
As for the function as a whole being unnecessary, I think you're right. The original purpose (creating datarefs for calexp templates) would be better done using Butler.subset
and a run
-local butler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@parejkoj After changing and testing, I think I originally introduced _siblingRef
as an optimization for the following common case:
- The user specifies a (raw) data ID on the command line that restricts both visit and ccd, and
- The user specifies only a visit for
--templateId
, as specified byImageDifferenceTask
.
In fact, GetCalexpAsTemplateTask
does exactly the same thing as _siblingRef
: it takes the primary dataRef and overwrites only its visit field.
Anyway, I can replace the calls to _siblingRef
with butler.subset
, but this will mean that ap_pipe
will always reduce an entire visit for --templateId
even when only a few CCDs are needed (or when broken chips have been excluded from rawRef
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oof. This sort of dataId munging is probably going to break with Gen3 butler. I guess we can leave it as is and deal with it when the new API is available.
Please file a ticket about how to handle the "dataRef and templateId" question with gen3 butler (and note the similar uses, e.g. GetCalexpAsTemplateTask
), and reference it in _siblingRef
's docstring (probably in Notes
?). I think we need much better guidance on this sort of thing, but getting that guidance is at some point in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure all butler usage will break in version 3.
Where exactly do you want me to file a ticket? For which component, to do what?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The various manipulations of dataIds are the particular culprit here.
File it against ap_pipe, ip_diffim, and daf_butler. A title something like "pipeline management of dataRefs for data and templates". Does that make sense?
Having two dataRefs in the ApPipeTask API makes a lot of supporting code more difficult and more complicated than it needs to be.
58565c2
to
d1eaeb9
Compare
This PR fixes several bugs in how datarefs were passed to
ApPipeTask
. It also updatesap_pipe
to current style standards and removes use offuture
.