-
-
Notifications
You must be signed in to change notification settings - Fork 5.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix mstats.kurtosistest, and test coverage for skewtest/normaltest #3008
Conversation
denom[denom < 0] = masked | ||
if np.ma.isMaskedArray(denom): | ||
# For multi-dimensional array input | ||
denom[denom < 0] = masked |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see masked
defined in this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In [1]: np.ma.masked
Out[1]: masked
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@josef-pkt: see line 45
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found it finally, I had searched for masked with an additional whitespace
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see from reading the code when denom < 0.
However, I think this could be replaced by n < 5 or add this as additional condition for cases that should be masked.
The same lines are missing in skewtest.
Change looks good, I'm not sure the functions are good. Do they behave correctly if the number of valid observations in a column is too small? The |
@EvgeniBurovski sent me a PR to make the small value behavior identical. EDIT: PR merged to here I didn't intend to close the Statistics Review tickets on these functions yet, just fix them to not be hopelessly broken and finish up gh-2673. |
Coverage remained the same when pulling ae17bb44647dc714d28a2cc66de493ba12e27a2a on rgommers:pull-2673 into b7aa678 on scipy:master. |
I just prefer if we get functions fixed so we don't have to look at them anymore for a few years, unless it's just a quick one-line fix (which this was initially). |
@josef-pkt I agree in principle, but for me the priority should be to fix up the stats functions first and the mstats versions after that. So I fixed the obvious issues and added decent tests. If you want the full review/doc/fixes then this PR may sit here for a while. |
Deleted astype(), because it is not an attribute of int (which could be returned instead of an ndarray).
Also provide decent test coverage.
Change default of axis kw to 0 in ttest_rel. This is OK without warning because the function didn't work before anyway. Also provide basic test coverage (comparison against nonmasked version). Testing with various masked array inputs to be done. Closes scipygh-3047.
Rebased and fixed ttest issues in gh-3047 plus a bunch of other ones in the ttest functions. @josef-pkt I'd like to merge this. Can't fix every last thing about these mstats functions now but that's no good reason to not merge bug fixes that are good to go. |
The only question: is it "standard" to return nan nan for empty arrays?
Otherwise looks fine (I'm not going to look for missing pieces.) |
It depends on the function. It was already returning nan (or crashing in a few cases), so I didn't think too hard about it. But I think nan makes sense here. Empty makes sense usually for functions that return an array of the same shape as the input array. Here |
Travis error was a server glitch, can be ignored. |
I can think of both use cases, where I want nan and where I want empty. But, I don't have currently a use case for empty arrays. I think then that the masked function should return
(I don't know why std is not nan.) |
Currently masked is never returned, also not if all input is masked:
So stick to nan? |
Stick to nan. |
OK. Thanks for the review. |
Fix mstats.kurtosistest, and test coverage for skewtest/normaltest
Sorry for being late to the party. In Assume (nan, nan) is the correct result for a single empty data set, and suppose On the other hand, This is how the regular
(A separate issue is to fix the regular |
Thanks Warren. I won't forget about this one (but a bit short on time now). |
Supercedes gh-2673