New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use sample standard deviation for AMx. #105
Conversation
n = len(finiteEntries) | ||
if n > 1: | ||
rawRmsDist = np.std(np.array(distances)[finiteEntries]) | ||
rmsDist = n/(n-1) * rawRmsDist # Correct biased 1/n np.std() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be sqrt(n/n-1)
.
You could also just add a ddof=1
rmsDist = np.std(np.array(distances)[finiteEntries], ddof=1)
https://docs.scipy.org/doc/numpy/reference/generated/numpy.std.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be sqrt(n/n-1).
Doh! I shouldn't code while sick. Thank you.
You could also just add a ddof=1
Much better!
I.e., use sqrt(1/(n-1)) normalization Was previously using sqrt(1/n) Add 'ddof=1' to np.std keywords to get sample std deviation.
OK. After being a bit embarrassed by not getting a one-line change right, I lightly re-factor to make at least this part testable with a doctest. Take another look. I think this is now ready. |
@yalsayyad Could you take another look at this? |
visit[obj1], ra[obj1], dec[obj1], | ||
visit[obj2], ra[obj2], dec[obj2]) | ||
if not distances: | ||
rmsDist = calcRmsDistanceForOneObject(visit, ra, dec, obj1, obj2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure this runs? You're calling it with a different call signature than its definition.
|
||
Should return None | ||
>>> visit1, ra1, dec1 = [1], [10.12344], [0, 0] | ||
>>> visit2, ra2, dec2 = [1], [20.00000], [0, 0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting that ra and dec don't have to be the same length.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you likely suspect, this was a mistake. I didn't mean to test this.
I'm not surprised that it works because matchVisitComputeDistance
iterates over visits and indexes ra
and dec
by that index, so if dec
is longer then it never accesses those extra entries.
# ddof=1 to get sample standard deviation (e.g., 1/(n-1)) | ||
return np.std(np.array(distances)[finiteEntries], ddof=1) | ||
|
||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return None
is implicit, but if you like it being obvious, it's fine.
@yalsayyad OK, I've pulled back from the refactor+testing rabbit hole and made this a simple one-line (plus two comment line changes) request. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
Use 1/(n-1) instead of 1/n to define standdard deviation.
Multiplies
np.std
, which is1/n
byn/n-1
.Passes Jenkins.