DOC: stats: add realistic examples to variance tests #17778
[skip cirrus] [skip actions]
Co-authored-by: Jake Bowhay <60778417+j-bowhay@users.noreply.github.com>
[skip cirrus] [skip actions]
I'm curious if a variance test was performed in the paper?
I'd be interested in seeing the results of the permutation version of the test. If that portion of the example belongs in the spearmanr/kendalltau, might as well follow the rest of that format here.
Then again, I'm having second thoughts about putting these in all the function documentation. The files are growing considerably, and I think they might be too lengthy (too much plotting code for the amount we use the function itself). Maybe we should move them over into a section of tutorials for hypothesis tests like we do for the distributions?
It was not done, but I've seen some blog posts (of questionable quality) using this example for that.
Permutation version? Not sure I understand.
I think there are 2 separate things here.
I propose we discuss that next week.
It is possible to use
Yes, you're right, there are really two issues.
We would get the following:
def statistic(x, y, z):
    return stats.levene(x, y, z).statistic
ref = stats.permutation_test((small_dose, medium_dose, large_dose), statistic,
                             permutation_type='pairings')
# PermutationTestResult(statistic=0.6457341109631506, pvalue=1.0, null_distribution=array([0.64573411, 0.64573411, 0.64573411, ..., 0.64573411, 0.64573411, 0.64573411]))
Shall I include it then?
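For context on the constant null distribution and `pvalue=1.0` above: Levene's statistic does not depend on the order of observations within a sample, and `permutation_type='pairings'` only reorders observations inside each sample, so every resample reproduces the observed statistic. A minimal sketch of the contrast with `permutation_type='independent'`, which reassigns observations across samples, follows. The data are synthetic stand-ins (not the values used in the PR), and `n_resamples=999` and `alternative='greater'` are illustrative choices.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in data (an assumption, NOT the PR's data):
# three groups of 10 observations with increasing spread.
rng = np.random.default_rng(12345)
small_dose = rng.normal(loc=10, scale=1.0, size=10)
medium_dose = rng.normal(loc=20, scale=2.0, size=10)
large_dose = rng.normal(loc=26, scale=4.0, size=10)
data = (small_dose, medium_dose, large_dose)


def statistic(x, y, z):
    # Levene's statistic; invariant to the order of values within x, y, z
    return stats.levene(x, y, z).statistic


# 'pairings' shuffles observations within each sample only, so the
# statistic is identical for every resample.
ref = stats.permutation_test(data, statistic,
                             permutation_type='pairings', n_resamples=999)

# 'independent' reassigns observations across samples, which matches the
# null hypothesis that all groups come from the same distribution.
res = stats.permutation_test(data, statistic,
                             permutation_type='independent',
                             n_resamples=999, alternative='greater')

print("pairings null is constant:",
      np.allclose(ref.null_distribution, ref.statistic))
print("independent p-value:", res.pvalue)
```

Note that pooling observations across groups also mixes any difference in location, so for a pure test of variances one might center each sample first; that step is left out of this sketch.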
Co-authored-by: Matt Haberland <mhaberla@calpoly.edu>
[skip cirrus] [skip actions]
I did the update. (I noticed the colour of the hist does not correspond to the legend for For the rest I am honestly getting lost with the wording. Could you make suggestions? Thanks!
* DOC: stats.levene: corrections
* Apply suggestions from code review
Co-authored-by: Matt Haberland <mhaberla@calpoly.edu>
Co-authored-by: Pamphile Roy <roy.pamphile@gmail.com>
Thanks @tupui! Yes, go ahead and add this example to the other variance tests.
[skip actions] [skip cirrus]
@mdhaber the failure is not related. Depending on how close this is to being merged, I could put the fix here, which is just a variable name change in a stats test that was introduced yesterday.
Oops, I took care of that in gh-17865.
Looks close. If you wouldn't mind digging a bit into the conditions under which the null distributions used in the tests were derived, I'd appreciate it. (Never mind. The computational experiments don't lie.) The statement about the null distributions being an asymptotic approximation is not always true. In any case, it does not appear to be the most important thing to say for bartlett.
scipy/stats/_morestats.py
Outdated
>>> flig_val = np.linspace(0, 8, 100)
>>> pdf = dist.pdf(flig_val)
>>> fig, ax = plt.subplots(figsize=(8, 5))
>>> def lev_plot(ax):  # we'll re-use this
The name of this test could be changed. In some PRs I removed the prefix and just made it plot so that it would be easier to copy.
Yeah I will do this 👍
scipy/stats/_morestats.py
Outdated
Note that the chi-square distribution provides an asymptotic approximation
of the null distribution; it is only accurate for samples with many
I'm not sure if this is true here. It might be, but I'm not sure it's the thing that's worth mentioning.
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
rng = np.random.default_rng(1638083107694713882823079058616272161)
ps = []
ss = []
for i in range(10000):
    samples = rng.normal(size=(5, 5))
    s, p = stats.bartlett(*samples)
    ss.append(s)
    ps.append(p)
x = np.linspace(0, np.max(ss))
dist = stats.chi2(df=len(samples)-1)
plt.plot(x, dist.pdf(x))
plt.hist(ss, density=True, bins=50)
plt.show()
The graph matches the chi2 distribution quite well even for just a few observations per sample.
If we just change to the uniform distribution, it's a very different story:
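The uniform-distribution plot is not reproduced here. As a rough, plot-free sketch of that comparison, the snippet below repeats the simulation for both normal and uniform samples and compares each against the chi-square reference with 4 degrees of freedom. The helper `simulate_bartlett`, the seed, the replication count, and the 5% level are illustrative assumptions, not part of the original comment.

```python
import numpy as np
from scipy import stats


def simulate_bartlett(draw, n_rep=2000, n_samples=5, n_obs=5):
    """Bartlett statistics for n_rep sets of samples drawn via draw(size=...).

    Hypothetical helper for this sketch only.
    """
    return np.array([
        stats.bartlett(*draw(size=(n_samples, n_obs))).statistic
        for _ in range(n_rep)
    ])


rng = np.random.default_rng(0)
ss_normal = simulate_bartlett(rng.normal)    # null holds, normal data
ss_uniform = simulate_bartlett(rng.uniform)  # null holds, non-normal data

# Reference distribution used by stats.bartlett: chi-square, df = k - 1
dist = stats.chi2(df=5 - 1)
crit = dist.ppf(0.95)

# Under an accurate null distribution, about 5% of statistics exceed crit.
rate_normal = np.mean(ss_normal > crit)
rate_uniform = np.mean(ss_uniform > crit)
print(f"normal:  mean statistic={ss_normal.mean():.2f}, "
      f"rejection rate={rate_normal:.3f}")
print(f"uniform: mean statistic={ss_uniform.mean():.2f}, "
      f"rejection rate={rate_uniform:.3f}")
```

If the chi-square reference were adequate for uniform data, the two rows would look alike; a clear gap between them is the "very different story" referred to above.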
The following might be more important:
Note that the chi-square distribution provides the null distribution when the observations are normally distributed. For samples drawn from non-normal populations, it may be more appropriate to perform a permutation test...
But it would probably be worth doing some research to confirm.
[skip actions] [skip cirrus]
Thanks Matt. Since these are just examples I propose to only write what we know to be true. I don't think there is much value to gain, for the purpose of the examples, in going deeper.
scipy/stats/_morestats.py
Outdated
Note that the chi-square distribution provides an asymptotic approximation
of the null distribution.
Since these are just examples I propose to only write what we know to be true.
But we don't know that this is true. It doesn't appear to be true.
We do know that bartlett is sensitive to non-normality. It is mentioned in the docstring.
So my suggestion was to remove the part we don't know and replace it with something we do know:
Note that the chi-square distribution provides the null distribution when the observations are normally distributed. For small samples drawn from non-normal populations, it may be more appropriate to perform a permutation test...
[skip ci] Co-authored-by: Pamphile Roy <roy.pamphile@gmail.com>
Thanks Matt!
What does this implement/fix?
Adds a realistic example to scipy.stats.levene. The example is from:
Additional information
When this looks good, I'll add similar examples to the other variance tests.