DM-30490: Add magnitude outlier rejection to AstrometryTask. #144
Conversation
```python
# rejecting objects in zero-noise tests.
zpSigma = np.clip(1.4826*np.median(np.abs(deltaMag[goodDelta] - zp)), 1e-3, None)

self.log.debug(f"Rough zeropoint from astrometry matches is {zp:.4f} +/- {zpSigma:.4f}.")
```
Just a fly-by comment... I'd actually like to see this as a log.info, since it could be very useful to know in case things went horribly "wrong" with this zeropoint estimate (i.e. most likely something amiss with the source catalog itself). We should normally have a "decent" estimate of what a photometric zeropoint should be, so having this logged here might help reveal seriously non-photometric visits (or something else funky going on).
I don't think that should be something we have to extract from the logs: that's a number that we should be recording as a metric (much like the astrometric scatter from the fit), so that we can readily access and plot it.
Please use the lazy-logging formatter here, so that the string doesn't have to be evaluated unless it is logged, e.g. log.debug("Rough zeropoint from astrometry matches is %.4f +/- %.4f.", zp, zpSigma)
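To illustrate the lazy-logging point: with an f-string the message is always built, even when the level is disabled, whereas the %-style form defers formatting to the logging machinery. A minimal standalone sketch (the logger name and values are made up):

```python
import logging

log = logging.getLogger("astrometry")
log.setLevel(logging.INFO)  # DEBUG is disabled, so debug records are dropped

zp, zpSigma = 31.1234, 0.0456

# f-string: the string is formatted here, before logging decides to drop it.
log.debug(f"Rough zeropoint from astrometry matches is {zp:.4f} +/- {zpSigma:.4f}.")

# Lazy %-formatting: the args are stored on the record and only formatted
# if a handler actually emits the message.
log.debug("Rough zeropoint from astrometry matches is %.4f +/- %.4f.", zp, zpSigma)
```

The two forms produce identical text when the record is emitted; the difference is only when the formatting work happens.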
I have some suggestions on logging and using numpy/scipy for nan-free calculations, and a comment on the tests.
I think you can squash the "nan protection" commit into the main one, since they both apply to the same method.
```diff
 # Use median absolute deviation (MAD) for zpSigma
 # Also require a minimum scatter to prevent floating-point errors from
 # rejecting objects in zero-noise tests.
-zpSigma = np.clip(1.4826*np.median(np.abs(deltaMag - zp)), 1e-3, None)
+zpSigma = np.clip(1.4826*np.median(np.abs(deltaMag[goodDelta] - zp)), 1e-3, None)
```
Can we replace this whole calculation with scipy.stats.median_abs_deviation, which has a nan_policy option?
See above about protecting against nans being insufficient. But I can use the scipy stats version instead of rolling my own.
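For reference, the hand-rolled MAD and the scipy call discussed here agree; this is an illustrative sketch with made-up data, not the task's code, and it also notes why NaN handling alone is not enough:

```python
import numpy as np
import scipy.stats

# Made-up sample standing in for deltaMag.
rng = np.random.default_rng(42)
deltaMag = rng.normal(25.0, 0.05, size=100)
deltaMag[:3] = np.nan  # simulated bad fluxes

# Hand-rolled: MAD scaled to a Gaussian sigma (1.4826 ~ 1/Phi^-1(3/4)),
# with NaNs masked out by hand.
good = np.isfinite(deltaMag)
manual = 1.4826*np.median(np.abs(deltaMag[good] - np.median(deltaMag[good])))

# scipy: scale="normal" applies the same factor, nan_policy="omit" skips
# NaNs. Note it does NOT skip +/-inf, which is why NaN protection alone
# is insufficient (zero fluxes give infinite magnitudes).
auto = scipy.stats.median_abs_deviation(deltaMag, scale="normal",
                                        nan_policy="omit")

assert np.isclose(manual, auto, rtol=1e-4)
```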
```python
                     <= self.config.magnitudeOutlierRejectionNSigma*zpSigma)
goodStars = goodDelta[goodStars]
nOutlier = nMatch - goodStars.size
self.log.info(f"Removing {nOutlier} of {nMatch} magnitude outliers.")
```
How about: info("Removed %s magnitude outliers out of %s total astrometry matches.", nOutlier, nMatch)?
I think it would also be good to have a lazy-formatted debug-level message that includes some more information about those outliers (say, the min/max sigma of the removed outliers?)
Because of the indexing, this can't easily be done with a lazy computation, and I don't think it's necessary to go into some great computations just for a debug statement.
What did you think about my reworded log message? As written, I feel like it might be a bit hard to get the meaning in the middle of a log.
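The rejection step under discussion can be reconstructed end to end with made-up data; the variable names follow the quoted diff, but the numbers and the 3-sigma threshold are illustrative only:

```python
import numpy as np

deltaMag = np.full(100, 25.0)  # zero-noise inliers
deltaMag[0] = 10.0             # gross outlier
deltaMag[1] = 40.0             # gross outlier
deltaMag[2] = np.nan           # bad flux
nMatch = deltaMag.size

# Only finite magnitude differences enter the statistics.
goodDelta, = np.where(np.isfinite(deltaMag))
zp = np.median(deltaMag[goodDelta])
# MAD scaled to sigma, with a floor so zero-noise data cannot reject itself.
zpSigma = np.clip(1.4826*np.median(np.abs(deltaMag[goodDelta] - zp)), 1e-3, None)

goodStars = np.abs(deltaMag[goodDelta] - zp) <= 3.0*zpSigma
goodStars = goodDelta[goodStars]  # map the boolean mask back to original indices
nOutlier = nMatch - goodStars.size
print(f"Removed {nOutlier} magnitude outliers out of {nMatch} total astrometry matches.")
```

The chained indexing (`goodDelta[goodStars]`) is the step the author notes makes a lazily computed debug summary of the rejected outliers awkward.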
tests/test_astrometryTask.py (Outdated)
```python
    sourceCat=sourceCat,
    exposure=self.exposure,
)
self.assertEqual(len(resultsMagOutlierRejection.matches), len(resultsRefSelect.matches))
```
I think all that this is testing is that we get the same number of matches whether we do outlier rejection or not? That doesn't seem like a useful test of the outlier rejection system itself. Could you set a tighter outlier threshold and check that some things actually got rejected? I don't know what this test data actually looks like.
There's a maxAngSep and maxPixSep calculation above this that could be recycled (in a closure or free function, maybe?) to check the actual result.
So the test catalogs are all 0-magnitude, perfect. I didn't add any scatter to this, I just wanted to test here that my code ran smoothly and didn't reject anything. Adding scatter to the test reference catalogs would be another whole thing that I was hoping to avoid.
Upon further review, this is necessary to do all the proper checking, and I have added some outliers to the source and reference catalogs.
Force-pushed from afecf9e to d2a7dfb.
```python
# Use median absolute deviation (MAD) for zpSigma
# Also require a minimum scatter to prevent floating-point errors from
# rejecting objects in zero-noise tests.
zpSigma = np.clip(1.4826*scipy.stats.median_abs_deviation(deltaMag[goodDelta]),
```
I think you can avoid that constant by setting scale="normal" in the median_abs_deviation call.
```python
    sourceCat=sourceCat,
    exposure=self.exposure,
)
self.assertLess(len(resultsMagOutlierRejection.matches), len(resultsRefSelect.matches))
```
Don't you know exactly how many sources should be rejected, since you explicitly constructed the list? If you're not going to test the actual fit result, at least be more strict about testing the rejections.
Relatedly: I'd like the rejection test to check something close to the specific value of the nSigma config field, instead of 1000x. There's some math going on in the new method, and it would be good to know that nSigma means what a user thinks it means.
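The boundary test being asked for could look something like this sketch (hypothetical, not the actual test file): with zero-noise inliers the MAD clips to the 1e-3 floor, so an offset just above nSigma*1e-3 must be cut while one just below must survive.

```python
import numpy as np
import scipy.stats

nSigma = 3.0
deltaMag = np.full(52, 25.0)
deltaMag[0] += 4.0e-3   # just outside nSigma * 1e-3 = 3e-3
deltaMag[1] += 2.0e-3   # just inside 3e-3

zp = np.median(deltaMag)
# Zero-noise sample: the scaled MAD is 0, so the clip floor of 1e-3 applies.
zpSigma = np.clip(scipy.stats.median_abs_deviation(deltaMag, scale="normal"),
                  1e-3, None)
kept = np.abs(deltaMag - zp) <= nSigma*zpSigma
```

A test asserting `not kept[0]` and `kept[1]` pins down that nSigma has its advertised meaning, rather than only checking that "something" was rejected at a 1000x threshold.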
tests/test_astrometryTask.py (Outdated)
```python
sourceCat[sourceInstFluxKey][unResolved[0]] = 0.0
sourceCat[sourceInstFluxKey][unResolved[1]] = -1.0
sourceCat[sourceInstFluxKey][unResolved[2]] *= 1000.0
```
A comment would be good here, to note what each of these values should be testing. I think 0 and -1 are to check the nan/inf rejection math, while 1000 is to check the outlier rejection level?
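The reviewer's reading of the three poisoned fluxes can be checked directly (my own sketch, not taken from the task): an instrumental magnitude is -2.5*log10(flux), so a zero flux gives an infinite magnitude, a negative flux gives NaN, and a 1000x flux gives a finite 7.5 mag offset that should trip the sigma cut.

```python
import numpy as np

# Suppress the expected log10(0) and log10(-1) warnings for this demo.
with np.errstate(divide="ignore", invalid="ignore"):
    mags = -2.5*np.log10(np.array([0.0, -1.0, 1000.0]))

print(mags)  # inf, nan, and a finite -7.5 mag shift respectively
```

So the first two values exercise the isfinite/NaN protection, and only the third exercises the actual nSigma outlier rejection.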
Force-pushed from d2a7dfb to b2d31d0.
Thank you for the new unittest: that's a lot better, and I like that it's a true unit test. A few more comments on it.
```diff
@@ -162,6 +164,17 @@ def doTest(self, pixelsToTanPixels, order=3):
     )
     self.assertLess(len(resultsRefSelect.matches), len(results.matches))

+    # try again, allowing magnitude outlier rejection.
```
If you're going to keep this test--and given your new one, I don't know what point this serves--you should at least mention why you expect the same number of matches. If you could tell that setting the config actually changed the result, it might be worth keeping, but the test doesn't even tell us if the code was run (outside of looking at the coverage plot).
I think I'd advocate for restoring the code you added that made a few things outliers, and now you don't have to be careful about what kind of outlier, since all you care about is "did the outlier rejection actually run".
I wanted to leave these tests as "null" tests to just show that the code does no harm if everything is perfect. However, it's true that there's no other way to know that the code was actually run. So I'll add in the bad guys.
```python
# Deliberately poison some reference fluxes.
refCat['refFlux'][-1] = np.inf
refCat['refFlux'][-2] = np.nan
```
Any particular reason you're putting these (and the srcMag changes above) at the end of the array instead of the beginning? Don't need to change it, I'm just curious.
I have no idea, I just started by saying "I'll fit the first 96" and went down that road.
Force-pushed from b2d31d0 to a5450d1.
No description provided.