Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-22277: Gen3 object tables #490

Merged
merged 2 commits into from Apr 1, 2021
Merged

DM-22277: Gen3 object tables #490

merged 2 commits into from Apr 1, 2021

Conversation

yalsayyad
Copy link
Contributor

No description provided.

Copy link
Contributor

@erykoff erykoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments below, otherwise looks good (assuming it works with ci_hsc_gen3).

catalogs[band] = {}
catalogs[band]['meas'] = measDict[band]['meas']
catalogs[band]['forced_src'] = forcedSourceDict[band]['forced_src']
catalogs[band]['ref'] = inputs['inputCatalogRef']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this read a bit better with:

catalogs[band] = {'meas': measDict[band]['meas'],
                  'forced_src': forcedSourceDict[band]['forced_src'],
                  'ref': inputs['inputCatalogRef']}


dataId = butlerQC.quantum.dataId
parq = self.run(catalogs=catalogs, tract=dataId['tract'], patch=dataId['patch'])
outputs = pipeBase.Struct(outputCatalog=parq.toDataFrame())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if there's a big overhead to converting to and from parquet and dataFrames, but I think it's odd that the last thing the run method does is convert the dataframe to parquet, and the first thing this code does is convert it back to a dataframe. I know that's what the gen2 code expects, but in terms of planning for gen2 removal, I think that run should return a dataframe and runDataRef could then parquet-ify the table?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. ParquetTable will disappear with along with Gen2.
Was avoiding API changes on this one, but if the reviewer advocates for API changes, well then I guess I have to :)

dimensions=("tract", "patch", "band", "skymap"),
storageClass="SourceCatalog",
name="{coaddName}Coadd_meas",
multiple=True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess there's no advantage to these being deferLoad=True because we need to hold them all in memory at the same time anyway?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right. We're vertically concatenating them.

@yalsayyad yalsayyad force-pushed the tickets/DM-22277 branch 4 times, most recently from ce9e814 to 789e1b7 Compare April 1, 2021 03:57
@yalsayyad yalsayyad merged commit 1122c24 into master Apr 1, 2021
@yalsayyad yalsayyad deleted the tickets/DM-22277 branch April 1, 2021 06:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants