
DM-24638: Convert TransformSourceTableTask and friends to Gen3 #426

Merged
merged 3 commits into master from tickets/DM-24638 on Feb 24, 2021

Conversation

yalsayyad (Contributor)

No description provided.

@erykoff (Contributor) left a comment

There are some updates needed where physical_filter replaces filter, but otherwise this looks good. The default pipeline file should also be updated (this will be clear after a rebase).

        df = parq.get(columns=self.columns)
    elif isinstance(parq, pd.DataFrame):
        df = parq
    else:
Contributor

What would the parq instance be here? I think it's worth making this explicit and raising on else.

Contributor

Meanwhile, does __call__ need a docstring that explicitly says what parq can be? A lot of datatypes are listed, but not all the possible datatypes that this supports.

Contributor Author

The parq instance here is for the Gen3 case when deferLoad=False: the input datatypes are pd.DataFrames after loading. The parq.toDataFrame branch is the original (Gen2) path. I'll add inline comments.
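For context, a minimal sketch of how that dispatch could look with inline comments and an explicit raise on unexpected types, as suggested above. The helper name and the self.columns attribute are assumptions taken from the excerpt; note that pd.DataFrame itself has a .get method, so the isinstance check comes first here:

import pandas as pd

def _asDataFrame(self, parq):
    # Hypothetical helper mirroring the three branches discussed above.
    if isinstance(parq, pd.DataFrame):
        # Gen3 with deferLoad=False: the butler already loaded a DataFrame.
        df = parq
    elif hasattr(parq, "toDataFrame"):
        # Original Gen2 path: a ParquetTable.
        df = parq.toDataFrame(columns=self.columns)
    elif hasattr(parq, "get"):
        # Gen3 with deferLoad=True: a deferred handle; read only the
        # columns we need.
        df = parq.get(columns=self.columns)
    else:
        raise TypeError(f"Unsupported type for parq: {type(parq)!r}")
    return df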

storageClass="DataFrame",
dimensions=("instrument", "visit", "detector")
)

Contributor

So for DM-27164 (and this will need to be rebased of course), I defined from lsst.pipe.base import connectionTypes so you can use just connectionTypes here.
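That is, something like the following sketch (the class name is hypothetical; the field values are copied from the nearby excerpts, and the doc text is illustrative):

from lsst.pipe.base import connectionTypes
import lsst.pipe.base as pipeBase

class ExampleConnections(pipeBase.PipelineTaskConnections,
                         dimensions=("instrument", "visit", "detector")):
    # Shorter spelling: connectionTypes.Input rather than
    # pipeBase.connectionTypes.Input.
    inputCatalog = connectionTypes.Input(
        name="",
        doc="Example input catalog (illustrative)",
        storageClass="DataFrame",
        dimensions=("instrument", "visit", "detector"),
    )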

Contributor Author

It's not obvious to me that connectionTypes is more readable than pipeBase.connectionTypes when there are other pipeBase objects abounding, but sure, I'll change them all.

Contributor

I don't want to create makework! But we don't have any standard convention on this, and that's unpleasant.

    Must be subclassed.
    """
    inputCatalog = pipeBase.connectionTypes.Input(
        name="",
Contributor

Same here for connectionTypes

def runQuantum(self, butlerQC, inputRefs, outputRefs):
    inputs = butlerQC.get(inputRefs)
    result = self.run(parq=inputs['inputCatalog'], funcs=self.funcs,
                      dataId=outputRefs.outputCatalog.dataId)
Contributor

I'm not sure that using dataId this way is really kosher in Gen3. Specifically, filter is now physical_filter, so in this case I think this won't work correctly. I'm trying to follow how dataId gets used downstream to see if anything else needs to come from it; if it's just returning the dataId in the struct, that seems like a Gen2 necessity rather than a Gen3 one (though I could be wrong).

What I would suggest is specifically pulling out physical_filter for Gen3 (and filter for Gen2) and having that as a separate keyword, since it seems special and necessary. Then I think the dataId can be set only in the Gen2 run code. Though I may be missing one of the uses here.

Contributor Author

Ah right, because the regular dataId depends on what you pass on the command line. OK, I changed it to outputRefs.outputCatalog.dataId.full so that at least the same keys are added to the table every time. Technically, all this info is redundant: everything will eventually be looked up with the ccdVisitId. I've been adding these columns because we don't yet have a corresponding ccdVisit table to look up the full ccd info given a ccdVisitId.

Passing an optional dataId for logging was kosher last I checked.
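A sketch of runQuantum under that change, assuming run returns a Struct matching the output connection:

def runQuantum(self, butlerQC, inputRefs, outputRefs):
    inputs = butlerQC.get(inputRefs)
    # dataId.full is the expanded dataId, so the same keys (including
    # physical_filter) are attached to the table on every run,
    # regardless of what was passed on the command line.
    result = self.run(parq=inputs['inputCatalog'], funcs=self.funcs,
                      dataId=outputRefs.outputCatalog.dataId.full)
    butlerQC.put(result, outputRefs)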

                                        dimensions=("instrument", "visit", "detector")):

    inputCatalog = pipeBase.connectionTypes.Input(
        doc="Wide input catalog of sources produced by WriteSourceTableTask",
Contributor

Same here about connectionTypes.

class ConsolidateSourceTableConnections(pipeBase.PipelineTaskConnections,
                                        dimensions=("instrument", "visit")):
    inputCatalogs = pipeBase.connectionTypes.Input(
        doc="Input per-detector Source Tables",
Contributor

And here.

@natelust (Contributor) left a comment

Sorry for the drive-by review; this just caught my eye while I was waiting for a batch job and looking at something else in my email. You might have this well handled already, just something to think about.


def runQuantum(self, butlerQC, inputRefs, outputRefs):
    inputs = butlerQC.get(inputRefs)
    result = self.run(parq=inputs['inputCatalog'], funcs=self.funcs,
Contributor

If self.config.functorFile is False, does that mean self.funcs is never defined? If so, I think this line will raise an exception.

Contributor Author

self.config.functorFile is optional in the sense that you should be able to instantiate the task in a notebook and run any functors you want (not necessarily from a yaml), and there's no default set of functors that makes sense for the base class.

self.config.functorFile is not optional when running the subclasses as a command-line task or pipeline task. I'll add an appropriate validation error message somewhere.
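For example, a guard along these lines could give a clear error (a sketch only; the placement and message are assumptions):

def runQuantum(self, butlerQC, inputRefs, outputRefs):
    # Hypothetical guard: self.funcs is only built when functorFile is
    # set, so fail with a clear message rather than an AttributeError.
    if getattr(self, "funcs", None) is None:
        raise ValueError(
            "config.functorFile must be set when running as a pipeline task")
    inputs = butlerQC.get(inputRefs)
    result = self.run(parq=inputs['inputCatalog'], funcs=self.funcs,
                      dataId=outputRefs.outputCatalog.dataId.full)
    butlerQC.put(result, outputRefs)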

@@ -3,6 +3,8 @@ tasks:
   isr: lsst.ip.isr.IsrTask
   characterizeImage: lsst.pipe.tasks.characterizeImage.CharacterizeImageTask
   calibrate: lsst.pipe.tasks.calibrate.CalibrateTask
+  writeSourceTable: lsst.pipe.tasks.postprocess.WriteSourceTableTask
+  transformSourceTable: lsst.pipe.tasks.postprocess.TransformSourceTableTask
Contributor

You might want to define another subset (or put them in Eli's consolidate group). With the current subset, someone who runs processCcd will only run the top three tasks and not your two new ones (which might be what we want), but if you want a grouping that encompasses your new tasks, maybe something like:

subsets:
  singleFrame:
    subset:
      - isr
      - characterizeImage
      - calibrate
      - writeSourceTable
      - transformSourceTable
    description: Single frame processing that includes table transformations
  processCcd:
      ....

These subsets let people quickly run a select group of tasks from the command line.

@yalsayyad (Contributor Author) commented on Feb 17, 2021

That makes sense to mirror what we had before. Can subsets refer to each other? e.g.

subsets:
  processCcd:
      - isr
      - characterizeImage
      - calibrate
  singleFrame:
    subset:
      - processCcd
      - writeSourceTable
      - transformSourceTable
    description: Single frame processing that includes table transformations

Edit: They can't.

Contributor Author

Couldn't get the DECam repo going in time to add this for all cameras, so @natelust take a look at https://github.com/lsst/obs_subaru/pull/341/files

Contributor

So currently this is the only place where labeled subsets can't be used as a substitute for labels. This is because, at the time, we didn't think it was worth adding the code and complexity to track and handle cyclical definitions just to save a few lines in a (mostly) static file. If you think this is a good feature to have, we can add it.


     def getAnalysis(self, parq, funcs=None, band=None):
         # Avoids disk access if funcs is passed
         if funcs is None:
-            funcs = self.getFunctors()
+            funcs = self.funcs
Contributor

same

-        funcs = CompositeFunctor.from_file(self.config.functorFile)
-        funcs.update(dict(PostprocessAnalysis._defaultFuncs))
-        return funcs
+        return self.funcs
Contributor

same

@yalsayyad force-pushed the tickets/DM-24638 branch 4 times, most recently from d89413f to 08ab119 on February 18, 2021 04:52
@@ -8,4 +8,3 @@ subsets:
       - forcedPhotCcd
       - forcedPhotCoadd
     description: A set of tasks to run when doing forced measurements
-
Contributor

Does this pass GitHub checks? I thought yamllint required a newline at the end (but maybe I have that backwards).

Contributor Author

lint DOES require a newline at the end. You had 2 newlines before: https://github.com/lsst/pipe_tasks/blob/master/pipelines/_Forced.yaml

      - writeSourceTable
      - transformSourceTable
      - consolidateSourceTable
    description: Set of tasks for complete single frame processing. Analogous to SingleFrameDriver.
Contributor

I'm not sure we should mention SingleFrameDriver in the docs; it doesn't help anyone unfamiliar with pipe_drivers, and it somewhat ties this package to that one, potentially even after the latter is retired. It's up to your judgment, though. If you feel it should be there, then feel free to leave it.

@yalsayyad yalsayyad merged commit 0a6ae79 into master Feb 24, 2021
@yalsayyad yalsayyad deleted the tickets/DM-24638 branch February 24, 2021 02:29