Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-43516: Pass ObservationalIdentifiers to SingleCellCoadd #47

Merged
merged 11 commits into from Apr 1, 2024

Conversation

arunkannawadi
Copy link
Member

No description provided.

for ccd_row in warp.getInfo().getCoaddInputs().ccds:
if ccd_row.contains(cell_centers_sky[cellInfo.index]):
break

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious to know which of the two patterns is preferable. The former allows for certain assumptions, which had been useful in identifying existing bugs (that are getting fixed in DM-43515). The latter seems slightly more efficient in that the loop can break immediately after the corresponding detector has been found.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that there are very few detectors that could potentially overlap a warp, I think the first one's efficiency would be fine.

Or, rather, the big inefficiency here is that we're doing per-cell AST WCS calls, and we know from experience with our current coadds that that can be very slow. Ideally we would:

  • Make a Numpy array of cell centers in sky coordinates outside the loop over warps.
  • For each detector in each warp's coaddInputs.ccds, transform the sky cell centers to pixel coordinates (but outside of a loop over cells; this needs to be a vectorized skyToPixel call).
  • Compare those detector-coordinate positions to the detector bounding boxes.
  • To guard against WCSs that extrapolate really badly, make sure the cell-center positions on detectors that match round-trip back to the sky cell-center.

ExposureCatalog.subsetContaining and ExposureRecord.contains were written before we switched to AST and they're just fundamentally the wrong order of operations for dealing with its horrific non-vectorized performance.

That will probably be a fair bit uglier than what you have, but I think it's still the right thing to do.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make a Numpy array of cell centers in sky coordinates outside the loop over warps.

OK, we do this now but as a dict instead of an array.

Compare those detector-coordinate positions to the detector bounding boxes.

We're going to have to do comparisons with polygons, since detectors aren't within rectangles anymore. I'll have to check that those comparisons are efficient as well.

But I'm going to create another ticket to improve the efficiency, since we are still very much limited by PSF evaluation per-cell.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since detectors aren't within rectangles anymore.

That's why I suggested going all the way back to detector coordinates. But doing this in polygons on the sky or in coadd coordinates may well be more efficient if it avoids the reverse-transform to check for bad extrapolation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I misunderstood the coordinate system in your comment at first.

for ccd_row in warp.getInfo().getCoaddInputs().ccds:
if ccd_row.contains(cell_centers_sky[cellInfo.index]):
break

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that there are very few detectors that could potentially overlap a warp, I think the first one's efficiency would be fine.

Or, rather, the big inefficiency here is that we're doing per-cell AST WCS calls, and we know from experience with our current coadds that that can be very slow. Ideally we would:

  • Make a Numpy array of cell centers in sky coordinates outside the loop over warps.
  • For each detector in each warp's coaddInputs.ccds, transform the sky cell centers to pixel coordinates (but outside of a loop over cells; this needs to be a vectorized skyToPixel call).
  • Compare those detector-coordinate positions to the detector bounding boxes.
  • To guard against WCSs that extrapolate really badly, make sure the cell-center positions on detectors that match round-trip back to the sky cell-center.

ExposureCatalog.subsetContaining and ExposureRecord.contains were written before we switched to AST and they're just fundamentally the wrong order of operations for dealing with its horrific non-vectorized performance.

That will probably be a fair bit uglier than what you have, but I think it's still the right thing to do.

@@ -359,7 +403,7 @@ def run(self, inputWarps, skyInfo, **kwargs):
outer=image_planes,
psf=cell_coadd_psf.computeKernelImage(cell_coadd_psf.getAveragePosition()),
inner_bbox=cellInfo.inner_bbox,
inputs=None, # TODO
inputs=frozenset(observation_identifiers_gc[cellInfo.index]),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we use a frozenset or set here we need to make sure we sort on save to avoid making the persisted order dependent on salted hashing. Might be better to remove duplicates then sort it and make a tuple right here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, we really don't want to be using getAveragePosition two lines above. That is the average position of all the stars that went into the PSF model, and it's not necessarily going to be anywhere near this cell.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PSF coaddition, including dropping the use of getAveragePosition, is coming next in DM-43515.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we care about the ordering here when persisting. I can see if we try to compare if two files are identical without even opening them, but is that something we expect to support?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we can make it all the way to full support, but I think it's good practice to minimize unnecessary differences.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, in the other PR, I added a __lt__ operator for ObservationIdentifiers and sort the inputs in the __init__ method of SingleCellCoadd.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ObservationIdentifiers are sorted in the order of packed.

@arunkannawadi arunkannawadi merged commit 563af15 into main Apr 1, 2024
3 checks passed
@arunkannawadi arunkannawadi deleted the tickets/DM-43516 branch April 1, 2024 20:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants