DM-14359: Fix data ID handling in ap_* #19

kfindeisen · 2018-05-08T22:59:24Z

This PR fixes several bugs in how datarefs were passed to ApPipeTask. It also updates ap_pipe to current style standards and removes its use of the future package (Python 2 compatibility).

parejkoj

I was going to say something about your __call__ being an almost direct copy of the one in TaskRunner, but it looks like you've already got tickets for that.

_siblingRef() worries me: you shouldn't have to do that at all.

parejkoj · 2018-05-08T23:14:20Z

python/lsst/ap/pipe/ap_pipe.py

@@ -186,6 +182,10 @@ def run(self, rawRef, calexpRef, templateIds=None, reuse=[]):
            - differencer : output of `config.differencer.run` (`lsst.pipe.base.Struct` or `None`).
            - associator : output of `config.associator.run` (`lsst.pipe.base.Struct` or `None`).
        """
+        if reuse is None:
+            reuse = []


Good catch and fix, though you should document the None default in the docstring.
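For context, here is a minimal sketch of why the old reuse=[] default was unsafe and how the None sentinel avoids the problem, with the None default documented as requested. Python evaluates a default value once, when the def statement runs, so every call that omits the argument shares the same list. Both function names and the step names are hypothetical; only the reuse parameter mirrors ApPipeTask.run.

```python
def append_unsafe(item, items=[]):
    """Hypothetical: the default list is created once, when ``def`` runs."""
    items.append(item)  # mutates the single shared default list
    return items


def steps_to_run(steps, reuse=None):
    """Hypothetical helper: return the steps that still need to run.

    Parameters
    ----------
    steps : `list` of `str`
        Names of all pipeline steps, in order.
    reuse : `list` of `str`, optional
        Steps whose existing outputs should be reused rather than rerun.
        If `None` (the default), no outputs are reused.
    """
    if reuse is None:
        reuse = []  # a fresh list per call, never shared between calls
    return [step for step in steps if step not in reuse]


first = append_unsafe("a")
second = append_unsafe("b")
print(second)  # ['a', 'b']: the second call sees the first call's item

steps = ["ccdProcessor", "differencer", "associator"]  # illustrative names
print(steps_to_run(steps))                          # ['ccdProcessor', 'differencer', 'associator']
print(steps_to_run(steps, reuse=["ccdProcessor"]))  # ['differencer', 'associator']
```

steps_to_run never mutates reuse, so here the sentinel is purely defensive; it becomes essential as soon as a function modifies its argument, as append_unsafe does.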

parejkoj · 2018-05-08T23:16:16Z

python/lsst/ap/pipe/ap_pipe.py

+    # Work around mismatched HDU lists for raw and processed data
+    cleanId = original.dataId.copy()
+    if 'hdu' in cleanId:
+        del cleanId['hdu']


This method shouldn't be necessary: dataId munging like this shouldn't be needed at all. How does processCcd manage the case where 'hdu' is in the dataId?

kfindeisen

processCcd doesn't do anything special (including when writing calexps or backgrounds).

As for the function as a whole being unnecessary, I think you're right. The original purpose (creating datarefs for calexp templates) would be better done using Butler.subset and a run-local butler.
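The hdu workaround in the diff is a plain dictionary operation. The following is a minimal sketch on an ordinary dict standing in for a dataRef's dataId; the helper name, key names, and values are illustrative and not taken from a real repository. Copying first matters: deleting 'hdu' from original.dataId itself would change the data ID of a dataRef the caller still holds.

```python
def strip_hdu(data_id):
    """Hypothetical helper: return a copy of ``data_id`` without an ``'hdu'`` key.

    Same effect as the diff's ``copy()`` followed by a guarded ``del``:
    ``dict.pop`` with a default removes the key only if it is present.
    """
    clean_id = dict(data_id)  # shallow copy; the caller's mapping is untouched
    clean_id.pop("hdu", None)
    return clean_id


raw_id = {"visit": 410915, "ccdnum": 25, "hdu": 26}  # illustrative values
print(strip_hdu(raw_id))  # {'visit': 410915, 'ccdnum': 25}
print("hdu" in raw_id)    # True: the original data ID still has its hdu
```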

kfindeisen

@parejkoj After changing and testing, I think I originally introduced _siblingRef as an optimization for the following common case:

1. The user specifies a (raw) data ID on the command line that restricts both visit and ccd, and
2. The user specifies only a visit for --templateId, following the ImageDifferenceTask convention.

In fact, GetCalexpAsTemplateTask does exactly the same thing as _siblingRef: it takes the primary dataRef and overwrites only its visit field.
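The behaviour described here, copying the primary data ID and replacing only its visit, can be sketched with plain dicts. template_data_id is a hypothetical helper, not the actual _siblingRef or GetCalexpAsTemplateTask code, and the visit and ccdnum values are made up.

```python
def template_data_id(science_id, template_visit):
    """Hypothetical: copy the science data ID and overwrite only its visit.

    Every other key (here ``ccdnum``) is inherited from the science data ID,
    so a template is looked up only for the CCDs actually being processed.
    """
    template_id = dict(science_id)  # copy so the science data ID is unchanged
    template_id["visit"] = template_visit
    return template_id


science_id = {"visit": 410915, "ccdnum": 25}
print(template_data_id(science_id, 411371))  # {'visit': 411371, 'ccdnum': 25}
```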

Anyway, I can replace the calls to _siblingRef with butler.subset, but this will mean that ap_pipe will always reduce an entire visit for --templateId even when only a few CCDs are needed (or when broken chips have been excluded from rawRef).
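To make the trade-off concrete, here is a toy model: a list of data IDs stands in for the repository registry, subset() stands in for a Butler subset query on visit, and siblings() for the _siblingRef approach. All three names, the visit numbers, and the 62-CCD count are assumptions for illustration only; this is not the real Butler API.

```python
# Toy registry (hypothetical): every (visit, ccdnum) pair in a repository.
REGISTRY = [{"visit": v, "ccdnum": c} for v in (410915, 411371) for c in range(1, 63)]


def subset(registry, **query):
    """Stand-in for a Butler subset query: all data IDs matching every key given."""
    return [d for d in registry if all(d.get(k) == v for k, v in query.items())]


def siblings(science_ids, template_visit):
    """Stand-in for the _siblingRef approach: keep each science CCD, swap the visit."""
    return [dict(d, visit=template_visit) for d in science_ids]


# Command line restricted both visit and ccd: one science data ID.
science_ids = subset(REGISTRY, visit=410915, ccdnum=25)

# --templateId gave only a visit.
by_visit = subset(REGISTRY, visit=411371)
by_sibling = siblings(science_ids, 411371)

print(len(science_ids), len(by_visit), len(by_sibling))  # 1 62 1
```

With the subset approach the template side grows to every CCD in the visit; with the sibling approach it stays matched to the science data IDs, including any CCDs deliberately left out.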

parejkoj

Oof. This sort of dataId munging is probably going to break with the Gen3 butler. I guess we can leave it as is and deal with it when the new API is available.

Please file a ticket about how to handle the "dataRef and templateId" question with the Gen3 butler (and note similar uses, e.g. GetCalexpAsTemplateTask), and reference it in _siblingRef's docstring (probably in Notes?). I think we need much better guidance on this sort of thing, but that guidance is still some way off.

kfindeisen

I'm pretty sure all butler usage will break in version 3.

Where exactly do you want me to file a ticket? For which component, to do what?

parejkoj

The various manipulations of dataIds are the particular culprit here.

File it against ap_pipe, ip_diffim, and daf_butler. A title something like "pipeline management of dataRefs for data and templates". Does that make sense?


kfindeisen requested a review from parejkoj May 8, 2018 22:59

parejkoj reviewed May 8, 2018


kfindeisen added 6 commits May 11, 2018 15:49

Clarify purpose of _siblingRef.

0a1094a

Remove calexpRef from ApPipeTask.run.

8fef325

Having two dataRefs in the ApPipeTask API makes a lot of supporting code more difficult and more complicated than it needs to be.

Allow handling of complex dataIds.

68aa5ee

Style cleanup.

fb2b370

Remove unsafe default argument.

8bc411b

Remove Python 2 support.

d1eaeb9

kfindeisen force-pushed the tickets/DM-14359 branch from 58565c2 to d1eaeb9 May 11, 2018 22:50

kfindeisen merged commit d1eaeb9 into master May 14, 2018

kfindeisen deleted the tickets/DM-14359 branch February 25, 2019 19:51


kfindeisen commented May 8, 2018

parejkoj left a comment

parejkoj commented May 8, 2018 (two comments)

kfindeisen commented May 9, 2018

kfindeisen commented May 11, 2018

parejkoj commented May 11, 2018

kfindeisen commented May 11, 2018

parejkoj commented May 11, 2018
