Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-13055 Reject NaN centroid sigmas #98

Merged
merged 1 commit into from Dec 21, 2017
Merged

DM-13055 Reject NaN centroid sigmas #98

merged 1 commit into from Dec 21, 2017

Conversation

parejkoj
Copy link
Contributor

No description provided.

Copy link
Member

@TallJimbo TallJimbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple minor comments; nothing critical.

My only big-picture critique is that these SourceSelectors seem quite rigid, and I think the proximate cause is that not enough of these tests are configurable - perhaps this new test should be, at least, so code that doesn't care about centroid uncertainty will see no change in behavior?

I said "proximate" because I think ultimate cause of this rigidity is that Python polymorphism and pex_config are just not an ideal way to express a parameterized filter expression on tabular data (compared to, say, a user-specified SQL where clause). Of course, fixing that would require a lot more work on our table infrastructure, even if it's "just" finding a way to move to Pandas.

@@ -122,12 +124,16 @@ def _hasCentroid_vector(self, sourceCat):
"""Return True for each source that has a valid centroid"""
return np.isfinite(sourceCat.get(self.centroidXKey)) \
& np.isfinite(sourceCat.get(self.centroidYKey)) \
& np.isfinite(sourceCat.get(self.centroidXSigmaKey)) \
& np.isfinite(sourceCat.get(self.centroidYSigmaKey)) \
& ~sourceCat.get(self.centroidFlagKey)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've always been a little uncomfortable with using binary & as a replacement for numpy.logical_and (I'd probably use logical_and.reduce here); we're sure nothing in this expression could cause one of the arguments to be promoted from bool to int and cause all the logic to go awry?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've never really had an issue with the binary & for np expressions, but I can convert to using np.logical_and

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually don't know of anyone having trouble either, so maybe I'm just paranoid. Just thought I'd bring it up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just tried some tests, and it looks like np.logical_and.reduce() and np.bitwise_and.reduce() is significantly slower than chained &:

...
In [27]: aa=np.random.random(1000000) > .5

In [28]: %timeit np.logical_and.reduce((xx,yy,zz,aa))
669 µs ± 47 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [29]: %timeit xx & yy & zz & aa
182 µs ± 1.82 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [35]: %timeit np.bitwise_and.reduce((xx,yy,zz,aa))
651 µs ± 9.46 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

So, I'd rather go with &, even though your concern isn't crazy.

return np.all(np.isfinite(centroid)) and not source.getCentroidFlag()
centroidErr = source.getCentroidErr()
return np.all(np.isfinite(centroid)) and np.all(np.isfinite(centroidErr)) \
and not source.getCentroidFlag()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

np.all(np.isfinite(centroid)) should always be unnecessary. If you really want to check, you could assert on it - it should be finite even when source.getCentroidFlag() is true.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I guess I can turn the centroid finiteness checks into asserts, but I'm going to argue in an RFC shortly that the errors should also be finite for unflagged sources.

@parejkoj
Copy link
Contributor Author

My only big-picture critique is that these SourceSelectors seem quite rigid, and I think the proximate cause is that not enough of these tests are configurable - perhaps this new test should be, at least, so code that doesn't care about centroid uncertainty will see no change in behavior?

astrometrySourceSelector is rigid by design: its job is to select sources that are useful for astrometric measurements, which has a pretty clear definition that includes "reliable centroids". I don't see how "reliable centroid" doesn't include "reliable centroid error".

~np.isfinite(sourceCat.get(self.centroidYKey))) &
~sourceCat.get(self.centroidFlagKey))
assert ~centroid_nonfinite_but_not_flagged().any(), \
"Centroids not finite for %d unflagged sources." % (centroid_nonfinite_but_not_flagged().sum())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can actually just do

assert np.isfinite(sourceCat.get(self.centroidXKey)).all() and np.isfinite(sourceCat.get(self.centroidYKey)).all()

In other words, centroid values should never be NaN, even when the flag is set. And that's why I wanted to make that an assert: if a centroid algorithm disobeys that rule, it's a serious bug.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I'm obligated to remind you that snake case isn't allowed :(

centroid = source.getCentroid()
return np.all(np.isfinite(centroid)) and not source.getCentroidFlag()
assert np.all(np.isfinite(source.getCentroid())) and not source.getCentroidFlag(), \
'Centroid not finite for unflagged source: %s'%source
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above: no need to check for centroid flag in this assert.

Currently, there is no guarantee that a valid centroid_flag implies
a finite variance."""
add_good_source(self.src, 1)
add_good_source(self.src, 2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More snake case. Looks like a lot of it predates these changes; maybe file a ticket to fix that later?

assert on nonfinite centroids, flagged or not.
@parejkoj parejkoj merged commit 9d36d91 into master Dec 21, 2017
@ktlim ktlim deleted the tickets/DM-13055 branch August 25, 2018 06:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants