New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-43516: Pass ObservationalIdentifiers to SingleCellCoadd #47
Conversation
4884921
to
7402797
Compare
since all kwargs get treated as part of the dataId.
7408787
to
c1f3b87
Compare
for ccd_row in warp.getInfo().getCoaddInputs().ccds: | ||
if ccd_row.contains(cell_centers_sky[cellInfo.index]): | ||
break | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious to know which of the two patterns is preferable. The former allows for certain assumptions, which had been useful in identifying existing bugs (that are getting fixed in DM-43515). The latter seems slightly more efficient in that the loop can break immediately after the corresponding detector has been found.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that there are very few detectors that could potentially overlap a warp, I think the first one's efficiency would be fine.
Or, rather, the big inefficiency here is that we're doing per-cell AST WCS calls, and we know from experience with our current coadds that that can be very slow. Ideally we would:
- Make a Numpy array of cell centers in sky coordinates outside the loop over warps.
- For each detector in each warp's
coaddInputs.ccds
, transform the sky cell centers to pixel coordinates (but outside of a loop over cells; this needs to be a vectorizedskyToPixel
call). - Compare those detector-coordinate positions to the detector bounding boxes.
- To guard against WCSs that extrapolate really badly, make sure the cell-center positions on detectors that match round-trip back to the sky cell-center.
ExposureCatalog.subsetContaining
and ExposureRecord.contains
were written before we switched to AST and they're just fundamentally the wrong order of operations for dealing with its horrific non-vectorized performance.
That will probably be a fair bit uglier than what you have, but I think it's still the right thing to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make a Numpy array of cell centers in sky coordinates outside the loop over warps.
OK, we do this now but as a dict
instead of an array.
Compare those detector-coordinate positions to the detector bounding boxes.
We're going to have to do comparisons with polygons, since detectors aren't within rectangles anymore. I'll have to check that those comparisons are efficient as well.
But I'm going to create another ticket to improve the efficiency, since we are still very much limited by PSF evaluation per-cell.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since detectors aren't within rectangles anymore.
That's why I suggested going all the way back to detector coordinates. But doing this in polygons on the sky or in coadd coordinates may well be more efficient if it avoids the reverse-transform to check for bad extrapolation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I misunderstood the coordinate system in your comment at first.
for ccd_row in warp.getInfo().getCoaddInputs().ccds: | ||
if ccd_row.contains(cell_centers_sky[cellInfo.index]): | ||
break | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that there are very few detectors that could potentially overlap a warp, I think the first one's efficiency would be fine.
Or, rather, the big inefficiency here is that we're doing per-cell AST WCS calls, and we know from experience with our current coadds that that can be very slow. Ideally we would:
- Make a Numpy array of cell centers in sky coordinates outside the loop over warps.
- For each detector in each warp's
coaddInputs.ccds
, transform the sky cell centers to pixel coordinates (but outside of a loop over cells; this needs to be a vectorizedskyToPixel
call). - Compare those detector-coordinate positions to the detector bounding boxes.
- To guard against WCSs that extrapolate really badly, make sure the cell-center positions on detectors that match round-trip back to the sky cell-center.
ExposureCatalog.subsetContaining
and ExposureRecord.contains
were written before we switched to AST and they're just fundamentally the wrong order of operations for dealing with its horrific non-vectorized performance.
That will probably be a fair bit uglier than what you have, but I think it's still the right thing to do.
@@ -359,7 +403,7 @@ def run(self, inputWarps, skyInfo, **kwargs): | |||
outer=image_planes, | |||
psf=cell_coadd_psf.computeKernelImage(cell_coadd_psf.getAveragePosition()), | |||
inner_bbox=cellInfo.inner_bbox, | |||
inputs=None, # TODO | |||
inputs=frozenset(observation_identifiers_gc[cellInfo.index]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we use a frozenset
or set
here we need to make sure we sort on save to avoid making the persisted order dependent on salted hashing. Might be better to remove duplicates then sort it and make a tuple
right here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, we really don't want to be using getAveragePosition
two lines above. That is the average position of all the stars that went into the PSF model, and it's not necessarily going to be anywhere near this cell.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PSF coaddition, including dropping the use of getAveragePosition
, is coming next in DM-43515.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we care about the ordering here when persisting. I can see if we try to compare if two files are identical without even opening them, but is that something we expect to support?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we can make it all the way to full support, but I think it's good practice to minimize unnecessary differences.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, in the other PR, I added a __lt__
operator for ObservationIdentifiers
and sort the inputs in the __init__
method of SingleCellCoadd
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ObservationIdentifiers
are sorted in the order of packed
.
No description provided.