DM-30490: Add magnitude outlier rejection to AstrometryTask. #144

Merged
3 commits merged into master from tickets/DM-30490 on Jun 16, 2021

Conversation

@erykoff (Contributor) commented on Jun 9, 2021

No description provided.

# rejecting objects in zero-noise tests.
zpSigma = np.clip(1.4826*np.median(np.abs(deltaMag[goodDelta] - zp)), 1e-3, None)

self.log.debug(f"Rough zeropoint from astrometry matches is {zp:.4f} +/- {zpSigma:.4f}.")
Contributor:

Just a fly-by comment...I'd actually like to see this as a log.info since it could be very useful to know in case things went horribly "wrong" with this zeropoint estimate (i.e. most likely something amiss with the source catalog itself). We should normally have a "decent" estimate of what a photometric zeropoint should be, so having this logged here might help reveal seriously non-photometric visits (or something else funky going on).

Contributor:

I don't think that should be something we have to extract from the logs: that's a number that we should be recording as a metric (much like the astrometric scatter from the fit), so that we can readily access and plot it.

@parejkoj (Contributor) commented on Jun 15, 2021:

Please use the lazy-logging formatter here, so that the string doesn't have to be evaluated unless it is logged, e.g. log.debug("Rough zeropoint from astrometry matches is %.4f +/- %.4f.", zp, zpSigma)
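For illustration, a minimal sketch of the difference (assumed logger and values, not the PR code): an f-string is built on every call, while %-style arguments defer formatting until the logging framework actually emits the record.

import logging

log = logging.getLogger("lsst.meas.astrom")
zp, zpSigma = 27.1234, 0.0456  # hypothetical values for illustration

# Eager: the f-string is evaluated even when DEBUG is disabled.
log.debug(f"Rough zeropoint from astrometry matches is {zp:.4f} +/- {zpSigma:.4f}.")

# Lazy: the message is only formatted if the record is emitted.
log.debug("Rough zeropoint from astrometry matches is %.4f +/- %.4f.", zp, zpSigma)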

@parejkoj (Contributor) left a comment:

I have some suggestions on logging and using numpy/scipy for nan-free calculations, and a comment on the tests.

I think you can squash the "nan protection" commit into the main one, since they both apply to the same method.

python/lsst/meas/astrom/astrometry.py (resolved thread)
# Use median absolute deviation (MAD) for zpSigma
# Also require a minimum scatter to prevent floating-point errors from
# rejecting objects in zero-noise tests.
- zpSigma = np.clip(1.4826*np.median(np.abs(deltaMag - zp)), 1e-3, None)
+ zpSigma = np.clip(1.4826*np.median(np.abs(deltaMag[goodDelta] - zp)), 1e-3, None)
Contributor:

Can we replace this whole calculation with scipy.stats.median_abs_deviation, which has a nan_policy option?

@erykoff (Author):

See above about protecting against nans being insufficient. But I can use the scipy stats version instead of rolling my own.
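As context, a minimal self-contained sketch (assumed values, not the PR code) of the masked statistic under discussion: nan_policy="omit" alone would not catch the infinities produced by zero or negative fluxes, hence the explicit finite mask.

import numpy as np
import scipy.stats

# Hypothetical source-minus-reference magnitude differences; bad fluxes can yield nan or inf.
deltaMag = np.array([0.01, -0.02, 0.03, np.nan, np.inf])
goodDelta, = np.where(np.isfinite(deltaMag))

zp = np.median(deltaMag[goodDelta])
# 1.4826 converts the median absolute deviation to a Gaussian-equivalent sigma;
# the floor of 1e-3 keeps zero-noise test data from rejecting everything.
zpSigma = np.clip(1.4826*scipy.stats.median_abs_deviation(deltaMag[goodDelta]), 1e-3, None)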

python/lsst/meas/astrom/astrometry.py (resolved thread)
python/lsst/meas/astrom/astrometry.py (resolved thread)
<= self.config.magnitudeOutlierRejectionNSigma*zpSigma)
goodStars = goodDelta[goodStars]
nOutlier = nMatch - goodStars.size
self.log.info(f"Removing {nOutlier} of {nMatch} magnitude outliers.")
Contributor:

How about: info("Removed %s magnitude outliers out of %s total astrometry matches.", nOutlier, nMatch)?

I think it would also be good to have a lazy-formatted debug-level message that includes some more information about those outliers (say, the min/max sigma of the removed outliers?)
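A rough illustration of the kind of debug message being suggested (hypothetical, not the PR code; it reuses the deltaMag, goodDelta, zp, and zpSigma names from the snippets above, and the sigmas are computed eagerly, which is part of the objection below):

# Hypothetical sketch: report how far the rejected outliers sit from the rough zeropoint.
nSigmaAll = np.abs(deltaMag[goodDelta] - zp)/zpSigma
rejected = nSigmaAll > self.config.magnitudeOutlierRejectionNSigma
if rejected.any():
    self.log.debug("Rejected magnitude outliers span %.1f to %.1f sigma from the rough zeropoint.",
                   nSigmaAll[rejected].min(), nSigmaAll[rejected].max())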

@erykoff (Author):

Because of the indexing, this can't easily be done with a lazy computation, and I don't think it's necessary to go into some great computations just for a debug statement.

Contributor:

What did you think about my reworded log message? As written, I feel like it might be a bit hard to get the meaning in the middle of a log.

python/lsst/meas/astrom/astrometry.py (resolved thread)
sourceCat=sourceCat,
exposure=self.exposure,
)
self.assertEqual(len(resultsMagOutlierRejection.matches), len(resultsRefSelect.matches))
Contributor:

I think all that this is testing is that we get the same number of matches whether we do outlier rejection or not? That doesn't seem like a useful test of the outlier rejection system itself. Could you set a tighter outlier threshold and check that some things actually got rejected? I don't know what this test data actually looks like.

There's a maxAngSep and maxPixSep calculation above this that could be recycled (in a closure or free function, maybe?) to check the actual result.

@erykoff (Author):

So the test catalogs are all 0-magnitude, perfect. I didn't add any scatter to this, I just wanted to test here that my code ran smoothly and didn't reject anything. Adding scatter to the test reference catalogs would be another whole thing that I was hoping to avoid.

@erykoff (Author):

Upon further review, this is necessary to do all the proper checking, and I have added some outliers to the source and reference catalogs.

python/lsst/meas/astrom/astrometry.py (outdated, resolved thread)
@erykoff force-pushed the tickets/DM-30490 branch 3 times, most recently from afecf9e to d2a7dfb, on June 15, 2021 at 17:34
# Use median absolute deviation (MAD) for zpSigma
# Also require a minimum scatter to prevent floating-point errors from
# rejecting objects in zero-noise tests.
zpSigma = np.clip(1.4826*scipy.stats.median_abs_deviation(deltaMag[goodDelta]), 1e-3, None)
Contributor:

I think you can avoid that constant by setting scale="normal" in the median_abs_deviation call.
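For reference, the two forms agree up to the rounding of the 1.4826 constant (assumed data, not the PR code):

import numpy as np
import scipy.stats

x = np.array([0.01, -0.02, 0.03, 0.00, 0.05])
manual = 1.4826*scipy.stats.median_abs_deviation(x)
# scale="normal" applies the Gaussian-consistency factor 1/Phi^-1(0.75) ~ 1.4826 internally.
builtin = scipy.stats.median_abs_deviation(x, scale="normal")
assert np.isclose(manual, builtin, rtol=1e-4)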

sourceCat=sourceCat,
exposure=self.exposure,
)
self.assertLess(len(resultsMagOutlierRejection.matches), len(resultsRefSelect.matches))
Contributor:

Don't you know exactly how many sources should be rejected, since you explicitly constructed the list? If you're not going to test the actual fit result, at least be more strict about testing the rejections.

Contributor:

Relatedly: I'd like the rejection test to check something close to the specific value of the nSigma config field, instead of 1000x. There's some math going on in the new method, and it would be good to know that nSigma means what a user thinks it means.


sourceCat[sourceInstFluxKey][unResolved[0]] = 0.0
sourceCat[sourceInstFluxKey][unResolved[1]] = -1.0
sourceCat[sourceInstFluxKey][unResolved[2]] *= 1000.0
Contributor:

A comment would be good here, to note what each of these values should be testing. I think 0 and -1 are to check the nan/inf rejection math, while 1000 is to check the outlier reject level?
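If that reading is right, the commented version might look roughly like this (an illustrative sketch; the interpretation of each value is the reviewer's guess above, not something confirmed in this thread):

# Non-positive fluxes produce non-finite magnitudes, exercising the finite-value guard:
sourceCat[sourceInstFluxKey][unResolved[0]] = 0.0     # zero flux -> infinite magnitude
sourceCat[sourceInstFluxKey][unResolved[1]] = -1.0    # negative flux -> nan magnitude
# A 1000x flux change (~7.5 mag) should trip the magnitude outlier rejection itself:
sourceCat[sourceInstFluxKey][unResolved[2]] *= 1000.0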

@parejkoj (Contributor) left a comment:

Thank you for the new unittest: that's a lot better, and I like that it's a true unit test. A few more comments on it.

@@ -162,6 +164,17 @@ def doTest(self, pixelsToTanPixels, order=3):
)
self.assertLess(len(resultsRefSelect.matches), len(results.matches))

# try again, allowing magnitude outlier rejection.
Contributor:

If you're going to keep this test--and given your new one, I don't know what point this serves--you should at least mention why you expect the same number of matches. If you could tell that setting the config actually changed the result, it might be worth keeping, but the test doesn't even tell us if the code was run (outside of looking at the coverage plot).

I think I'd advocate for restoring the code you added that made a few things outliers, and now you don't have to be careful about what kind of outlier, since all you care about is "did the outlier rejection actually run".

@erykoff (Author):

I wanted to leave these tests as "null" tests to just show that the code does no harm if everything is perfect. However, it's true that there's no other way to know that the code was actually run. So I'll add in the bad guys.

tests/test_astrometryTask.py (outdated, resolved thread)
python/lsst/meas/astrom/astrometry.py (outdated, resolved thread)
tests/test_astrometryTask.py (outdated, resolved thread)

# Deliberately poison some reference fluxes.
refCat['refFlux'][-1] = np.inf
refCat['refFlux'][-2] = np.nan
Contributor:

Any particular reason you're putting these (and the srcMag changes above) at the end of the array instead of the beginning? Don't need to change it, I'm just curious.

@erykoff (Author):

I have no idea, I just started by saying "I'll fit the first 96" and went down that road.

@erykoff merged commit ada85d7 into master on Jun 16, 2021
@erykoff deleted the tickets/DM-30490 branch on June 16, 2021 at 18:01