New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-13055 Reject NaN centroid sigmas #98
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple minor comments; nothing critical.
My only big-picture critique is that these SourceSelectors seem quite rigid, and I think the proximate cause is that not enough of these tests are configurable - perhaps this new test should be, at least, so code that doesn't care about centroid uncertainty will see no change in behavior?
I said "proximate" because I think ultimate cause of this rigidity is that Python polymorphism and pex_config are just not an ideal way to express a parameterized filter expression on tabular data (compared to, say, a user-specified SQL where clause). Of course, fixing that would require a lot more work on our table infrastructure, even if it's "just" finding a way to move to Pandas.
@@ -122,12 +124,16 @@ def _hasCentroid_vector(self, sourceCat): | |||
"""Return True for each source that has a valid centroid""" | |||
return np.isfinite(sourceCat.get(self.centroidXKey)) \ | |||
& np.isfinite(sourceCat.get(self.centroidYKey)) \ | |||
& np.isfinite(sourceCat.get(self.centroidXSigmaKey)) \ | |||
& np.isfinite(sourceCat.get(self.centroidYSigmaKey)) \ | |||
& ~sourceCat.get(self.centroidFlagKey) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've always been a little uncomfortable with using binary &
as a replacement for numpy.logical_and
(I'd probably use logical_and.reduce
here); we're sure nothing in this expression could cause one of the arguments to be promoted from bool
to int
and cause all the logic to go awry?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've never really had an issue with the binary &
for np
expressions, but I can convert to using np.logical_and
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually don't know of anyone having trouble either, so maybe I'm just paranoid. Just thought I'd bring it up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just tried some tests, and it looks like np.logical_and.reduce()
and np.bitwise_and.reduce()
is significantly slower than chained &
:
...
In [27]: aa=np.random.random(1000000) > .5
In [28]: %timeit np.logical_and.reduce((xx,yy,zz,aa))
669 µs ± 47 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [29]: %timeit xx & yy & zz & aa
182 µs ± 1.82 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [35]: %timeit np.bitwise_and.reduce((xx,yy,zz,aa))
651 µs ± 9.46 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
So, I'd rather go with &
, even though your concern isn't crazy.
return np.all(np.isfinite(centroid)) and not source.getCentroidFlag() | ||
centroidErr = source.getCentroidErr() | ||
return np.all(np.isfinite(centroid)) and np.all(np.isfinite(centroidErr)) \ | ||
and not source.getCentroidFlag() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
np.all(np.isfinite(centroid))
should always be unnecessary. If you really want to check, you could assert
on it - it should be finite even when source.getCentroidFlag()
is true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I guess I can turn the centroid finiteness checks into asserts, but I'm going to argue in an RFC shortly that the errors should also be finite for unflagged sources.
|
~np.isfinite(sourceCat.get(self.centroidYKey))) & | ||
~sourceCat.get(self.centroidFlagKey)) | ||
assert ~centroid_nonfinite_but_not_flagged().any(), \ | ||
"Centroids not finite for %d unflagged sources." % (centroid_nonfinite_but_not_flagged().sum()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can actually just do
assert np.isfinite(sourceCat.get(self.centroidXKey)).all() and np.isfinite(sourceCat.get(self.centroidYKey)).all()
In other words, centroid values should never be NaN, even when the flag is set. And that's why I wanted to make that an assert: if a centroid algorithm disobeys that rule, it's a serious bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I'm obligated to remind you that snake case isn't allowed :(
centroid = source.getCentroid() | ||
return np.all(np.isfinite(centroid)) and not source.getCentroidFlag() | ||
assert np.all(np.isfinite(source.getCentroid())) and not source.getCentroidFlag(), \ | ||
'Centroid not finite for unflagged source: %s'%source |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above: no need to check for centroid flag in this assert.
Currently, there is no guarantee that a valid centroid_flag implies | ||
a finite variance.""" | ||
add_good_source(self.src, 1) | ||
add_good_source(self.src, 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More snake case. Looks like a lot of it predates these changes; maybe file a ticket to fix that later?
assert on nonfinite centroids, flagged or not.
fff379f
to
9d36d91
Compare
No description provided.