DM-22277: Gen3 object tables #490

yalsayyad · 2021-03-31T06:42:12Z

No description provided.

erykoff

Some minor comments below, otherwise looks good (assuming it works with ci_hsc_gen3).

erykoff · 2021-03-31T15:10:17Z

python/lsst/pipe/tasks/postprocess.py

+            catalogs[band] = {}
+            catalogs[band]['meas'] = measDict[band]['meas']
+            catalogs[band]['forced_src'] = forcedSourceDict[band]['forced_src']
+            catalogs[band]['ref'] = inputs['inputCatalogRef']


Would this read a bit better with:

catalogs[band] = {'meas': measDict[band]['meas'], 'forced_src': forcedSourceDict[band]['forced_src'], 'ref': inputs['inputCatalogRef']}
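
A minimal sketch of the two forms side by side, to show that they build the same nested dictionary. The string values and the band name are stand-ins chosen for illustration (the PR stores catalogs there), so treat this as an assumption-laden example rather than the code under review:

```python
# Stand-in inputs: plain strings replace the catalogs used in the PR.
band = "r"
measDict = {band: {"meas": "meas-catalog"}}
forcedSourceDict = {band: {"forced_src": "forced-catalog"}}
inputs = {"inputCatalogRef": "ref-catalog"}

# Form in the diff: create an empty dict, then assign key by key.
catalogs_stepwise = {}
catalogs_stepwise[band] = {}
catalogs_stepwise[band]["meas"] = measDict[band]["meas"]
catalogs_stepwise[band]["forced_src"] = forcedSourceDict[band]["forced_src"]
catalogs_stepwise[band]["ref"] = inputs["inputCatalogRef"]

# Suggested form: a single dict literal per band.
catalogs_literal = {}
catalogs_literal[band] = {
    "meas": measDict[band]["meas"],
    "forced_src": forcedSourceDict[band]["forced_src"],
    "ref": inputs["inputCatalogRef"],
}
```

Splitting the literal across lines keeps it within a typical line-length limit while still reading as one assignment.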

erykoff · 2021-03-31T15:12:39Z

python/lsst/pipe/tasks/postprocess.py

+
+        dataId = butlerQC.quantum.dataId
+        parq = self.run(catalogs=catalogs, tract=dataId['tract'], patch=dataId['patch'])
+        outputs = pipeBase.Struct(outputCatalog=parq.toDataFrame())


I don't know if there's a big overhead to converting between parquet and DataFrames, but it seems odd that the last thing the run method does is convert the DataFrame to parquet, and the first thing this code does is convert it back to a DataFrame. I know that's what the Gen2 code expects, but in terms of planning for Gen2 removal, I think run should return a DataFrame, and runDataRef could then parquet-ify the table?

I agree. ParquetTable will disappear along with Gen2.
Was avoiding API changes on this one, but if the reviewer advocates for API changes, well then I guess I have to :)
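
The layering suggested above could look roughly like the following sketch. Everything here is hypothetical: `TableWrapper` stands in for the Gen2-only wrapper, a dict of columns stands in for the DataFrame, and `gen3_entry` / `gen2_entry` are illustrative names for the two entry points, not the methods in `postprocess.py`. The data ID values are made up.

```python
from dataclasses import dataclass


@dataclass
class TableWrapper:
    """Hypothetical stand-in for the Gen2-era ParquetTable wrapper."""
    frame: dict


class ObjectTableSketch:
    """Hypothetical task skeleton; not the real pipe_tasks class."""

    def run(self, catalogs, tract, patch):
        # Build and return the in-memory table directly. A dict of columns
        # stands in for the DataFrame the real task would produce.
        bands = sorted(catalogs)
        return {
            "band": bands,
            "tract": [tract] * len(bands),
            "patch": [patch] * len(bands),
        }

    def gen3_entry(self, catalogs, data_id):
        # Gen3 path: use the table as returned, with no round trip.
        return self.run(catalogs, tract=data_id["tract"], patch=data_id["patch"])

    def gen2_entry(self, catalogs, data_id):
        # Gen2 path: wrap only at the boundary where the old I/O expects it.
        frame = self.run(catalogs, tract=data_id["tract"], patch=data_id["patch"])
        return TableWrapper(frame)


task = ObjectTableSketch()
data_id = {"tract": 0, "patch": "0,0"}  # made-up values for illustration
catalogs = {"r": {}, "g": {}}
gen3_out = task.gen3_entry(catalogs, data_id)
gen2_out = task.gen2_entry(catalogs, data_id)
```

With this shape, removing Gen2 later would mean deleting the wrapping entry point and the wrapper, leaving `run` untouched.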

erykoff · 2021-03-31T15:17:18Z

python/lsst/pipe/tasks/postprocess.py

+        dimensions=("tract", "patch", "band", "skymap"),
+        storageClass="SourceCatalog",
+        name="{coaddName}Coadd_meas",
+        multiple=True


I guess there's no advantage to these being deferLoad=True because we need to hold them all in memory at the same time anyway?

That's right. We're vertically concatenating them.
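
A rough illustration of why deferring the loads buys little here. `LazyHandle` is a hypothetical stand-in for a deferred-load handle and lists of row dicts stand in for the catalogs; neither is the butler API. Because the combined table contains every row, each handle has to be resolved before the result exists, so deferral only changes when each read happens:

```python
class LazyHandle:
    """Hypothetical stand-in for a deferred-load handle."""

    def __init__(self, rows):
        self._rows = rows
        self.loaded = False

    def get(self):
        # Simulate reading the dataset on demand.
        self.loaded = True
        return list(self._rows)


def concat_rows(tables):
    """Vertically concatenate tables given as lists of row dicts."""
    combined = []
    for table in tables:
        combined.extend(table)
    return combined


# Made-up per-band inputs for illustration.
handles = [
    LazyHandle([{"band": "g", "id": 1}, {"band": "g", "id": 2}]),
    LazyHandle([{"band": "r", "id": 1}]),
]

# Every handle must be resolved to build the combined table.
combined = concat_rows(handle.get() for handle in handles)
```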

erykoff approved these changes Mar 31, 2021


yalsayyad force-pushed the tickets/DM-22277 branch 4 times, most recently from ce9e814 to 789e1b7 Compare April 1, 2021 03:57

yalsayyad added 2 commits April 1, 2021 01:42

Read in only requested columns and improve band-handling

42d3d98

Port to Gen3 the Tasks to convert Object Tables

a65dc2e

yalsayyad force-pushed the tickets/DM-22277 branch from 789e1b7 to a65dc2e Compare April 1, 2021 06:42

yalsayyad merged commit 1122c24 into master Apr 1, 2021

yalsayyad deleted the tickets/DM-22277 branch April 1, 2021 06:49
