Student's versus Welch's t in emmeans package; apparent inconsistency #90
The choice of which t test is used is most definitely NOT in the design of emmeans. The emmeans() function and its relatives simply use the model that is fitted to the data, and use an appropriate test for that model. For repeated-measures situations, social scientists (at least historically) seem to concentrate on the choice between the "univariate" and "multivariate" methods, depending on some assessment of sphericity or compound symmetry. The former typically involves pooled t tests; the latter could be performed using a multivariate model. A somewhat separate matter is degrees of freedom. While the t statistic may come out the same as the Welch t statistic, the model object may or may not support the needed Satterthwaite degrees-of-freedom calculation. Most don't. I will try to alert Jonathon Love to this response and see if I can somehow add him as a participant. |
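For readers following the thread, the Welch t statistic and the Satterthwaite degrees of freedom mentioned above can be sketched in plain Python. This is an illustration of the textbook two-sample formulas only, not of any emmeans internals; the function name `welch_t` is made up for this sketch.

```python
import math

def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two independent samples, without a pooled-variance assumption."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    v1 = sum((xi - m1) ** 2 for xi in x) / (n1 - 1)  # sample variance, group 1
    v2 = sum((yi - m2) ** 2 for yi in y) / (n2 - 1)  # sample variance, group 2
    se2 = v1 / n1 + v2 / n2   # squared standard error of the mean difference
    t = (m1 - m2) / math.sqrt(se2)
    # Satterthwaite df: typically fractional, and responsive to the
    # variance inequality between the two cells.
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df
```

With equal variances and equal group sizes this df reduces to n1 + n2 - 2, the usual Student's-t value; with wildly unequal variances it shrinks toward the smaller effective degrees of freedom.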
hey russell,
haha, i think we might be arguing semantics. i feel like these two sentences contradict each other :) but i get what you're saying. so @R180 has observed that emmeans when fed a 'normal' ANOVA produces t stats similar/same as a student's t-test, but when emmeans is fed an RM ANOVA, it produces t stats similar/same as a welch's t-test. @R180 felt this wasn't quite correct/consistent, so i suggested he check in with you. is this all working as expected? or does something here suggest that jamovi isn't doing something correctly? (we're both social scientists [unfortunately?], so you may find it easier to explain using that vernacular). with thanks |
Well, I don't really know what jamovi does, so I don't understand the contradiction. If the user chooses the model, then the user must choose an appropriate one. If jamovi chooses it, then perhaps there needs to be clearer communication about the impact of those choices. But to @R180, the main point is that the test follows from the model that was fitted. Jonathon, FWIW, I am not a social scientist, and often have trouble understanding social scientists' needs. I have been known to lock horns with social scientists over issues like post hoc power calculations (I think they are a sham). I'm a retired statistics prof, and like physical science/engineering applications the best. |
Well the thing is, I input data with wildly different variances that could
not possibly fit the equal-variance ANOVA model, but still no Welch's test
(I don't think). See below and attached.
I think that at a minimum, the output should indicate in a footnote what
type of t test was used, and what model criterion value led to the use of
that particular type. Example: "* The _________________. Therefore each
post hoc test employs a Welch's t."
[image: Wildly different variances.png]
|
Honestly, I can't help with that. It's up to whatever process leads to developing the model. Again, emmeans() does not analyze the data. If you have wildly different variances, then it is up to you to choose an appropriate model. |
OK. So I'll assume that somewhere in jamovi, there is a model parameter value in the ANOVA routine (i.e., specification of Student's t test) that's set differently in the Repeated-Measures ANOVA routine (i.e., specification of Welch's t). Thanks. |
hi @R180, there is a practice in the social sciences where you perform an ANOVA and, if it's significant, you do post-hoc t-tests (simply on the data) to determine where the difference(s) is/are. with this approach, i don't actually think it makes sense to do the ANOVA in the first place, seeing as you're using completely different models with the post-hoc t-tests. you should just do the t-tests. we've been meaning to develop a pair-wise t-test module for jamovi for this other approach (which we don't really think is a good approach, but hopefully it can spur discussion). the beauty of the emmeans approach is that the t-tests make use of the same model as the primary test. |
no, i don't think this is true. i think the different t-values are simply emergent phenomena of the models. i think if we set the RM Model up in such a way that it provided student's t-test type results, it would no longer be an RM ANOVA. if you want to see the code, here we run the RM ANOVA: https://github.com/jamovi/jmv/blob/5ed9f64d6ea903d1c9453a75847809d656524116/R/anovarm.b.R#L43 and here we call emmeans on the model: https://github.com/jamovi/jmv/blob/5ed9f64d6ea903d1c9453a75847809d656524116/R/anovarm.b.R#L539 the t-values are determined by the model, not by a decision somewhere to use one type of t-test over another. |
Ok. I think that such behavior is problematic. We've already established,
I think, that the choice of the type of t test is wholly dependent
on what model jamovi and/or emmeans chooses to attempt to fit to the data,
and wholly independent of the specific characteristics of the data. So for
the package to be credible, I think there needs to be clear documentation
of which model parameters govern the selection of the type of t test.
|
Or to put it slightly differently, the documentation needs to present a
convincing rationale, and explanation, for the model's behavior. The model
can be capable of selecting between Student's t and Welch's t only if it has
been given the freedom to make that choice. So there needs to be a stated
rationale for granting the model that freedom. Without such a rationale, the
scientist cannot defend the model's choices, and so cannot defend his or her
use of the model.
|
Hmmmm. Well, there is a practice in statistics whereby one fits a model, then -- before doing any testing at all, much less post hoc comparisons -- one makes some serious efforts to determine whether the model reasonably fits the data. This process involves a lot of residual plots and such, and also perhaps some tests of particular model assumptions, such as a test of variance homogeneity like Levene's test, or (in the case of repeated measures) sphericity. If one just forges ahead with a bunch of statistical tests that may or may not be appropriate, that's reckless. [Insert some random analogy with recent proceedings involving the US Government.] Statistics is serious and difficult business, and a recipe approach is as sure to fail as using a sea-level recipe for chocolate cake when high in the Andes.

Any reasonable user interface would ensure that users have the opportunity to assess the success of the fitted model in explaining the data and in meeting its underlying assumptions. Additionally, while I do not personally use the afex package, I am pretty sure its aov model setups do try to perform some of these steps in their default output, and the returned object does include both the "univariate" and "multivariate" models so that the user may choose which one to use in post hoc analyses. To me, that is still too routinized, but it's better than blundering ahead without ever questioning the underlying model.

In the case of jamovi, it is a serious design failure if the user does not have the opportunity to choose the model, or at least to be informed as to which one was chosen automatically. |
yes. but it's worth noting that we just fit standard models. we haven't made any decisions here.
so i suppose this is a question for @rvlenth. is there a ruleset a curious person could use to determine what type of t-test will be performed for a given model? i kinda imagined the choice of t-test (and corrections?) was all pretty complicated - and i've been happy to assume it was being handled correctly. with thanks |
sorry, we do provide all of that, i just took that being performed for granted. |
You can't take it for granted if the post hoc results are displayed before the user has a chance to make such determinations. |
yes, that's true. which is why we don't display the post hoc results straight away. but i feel like this is a bit of a tangent. |
(or are you talking about the pairwise t-test module? in which case, we're agreed that's a bad idea). |
Well in fact I don't know what I'm talking about, as I haven't used jamovi. So I should stay out of this and let you and Richard figure this out. |
but returning to my earlier question, if i understand correctly, richard wants to better understand the choices being made. is there a ruleset a curious person could use to determine what type of t-test will be performed for a given model? i kinda imagined the choice of t-test (and corrections?) was all pretty complicated - and i've been happy to assume it was being handled correctly. with thanks |
I will say it yet again: EMMEANS() DOES NOT MAKE A CHOICE REGARDING WHICH T TEST TO USE. YOU ARE ABSOLUTELY INCORRECT TO ASSUME IT IS CHOOSING WHICH ONE TO USE. It uses the one model that it is given, and it computes t tests based on that model. If you think it is receiving more than one model and is being asked to choose between them, you are mistaken and you need to recode jamovi accordingly. If it's an object from afex, I think Henrik's driver (not mine) for those models defaults to the univariate model and there is an optional argument for specifying the multivariate one. But you (or your user) have to specify that option. |
sorry, i think we're muddled by semantics. the word 'choice' or 'choose' might be muddling us. i mean these two statements to be equivalent: "the package runs a t-test based on the model" or "the package chooses which t-test to run based on the model". but either way, we're interested in what properties of the model lead to different t-tests being performed. for example, there's the suggestion that an RM ANOVA leads to a welch's t-test being performed - i'm trying to understand why - because it doesn't seem obvious (or to determine that it's not actually a welch's t-test). it sounds like we might need to ask henrik next ... but i can imagine him saying "t-tests? that's all emmeans, you'd have to ask russell". anyhow, i hope you're not too exasperated by all of this. i do try very hard to communicate clearly, but there is the odd spectacular fail. |
Yes, I am completely exasperated, because you are assuming that my software does things that it simply does not do. This is not a matter of semantics. I do not think you are understanding what I am trying to tell you -- e.g., what is meant by a statistical model. One model leads to one form of t test. If you want another form of t test, you need a different model.
There is nothing anywhere in the documentation of the emmeans package that says that different assumptions are examined and used to decide on an appropriate form of a t statistic. I don't know where you got that idea. |
Perhaps I can ask a simpler question. Do we know what formula emmeans uses to calculate the degrees of freedom for a given post-hoc t test? Or is that formula computed by some other package on which emmeans depends? -- Richard Anderson |
It uses a formula that depends on the model class.
And I remind you that the model class is determined by the R code you used to fit the model.
This is just another way of saying that the form of the t test follows from the model. |
I'll be more specific with that example... Model where we assume constant error variance:
Model where we assume a different error variance for each cell:
These are essentially the Welch tests; however, gls doesn't use the Satterthwaite method to get the d.f. At this point, I suggest you not use jamovi to do your analyses, as it apparently does not currently support the kind of models you need. |
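As a side note on why the distinction can be hard to spot: with balanced cells, the pooled-variance (Student-style) and per-cell-variance (Welch-style) standard errors for a difference of two means coincide, and only the degrees of freedom differ. A minimal Python sketch with made-up numbers (not the models fitted above):

```python
import math

# Two balanced cells with very different spreads (hypothetical data).
a = [10.0, 11.0, 9.0, 10.5, 9.5]
b = [20.0, 35.0, 5.0, 30.0, 10.0]
n = len(a)  # both cells have n = 5

mean_a, mean_b = sum(a) / n, sum(b) / n
var_a = sum((x - mean_a) ** 2 for x in a) / (n - 1)
var_b = sum((x - mean_b) ** 2 for x in b) / (n - 1)

# Pooled (constant-error-variance) standard error of the difference:
se_pooled = math.sqrt((var_a + var_b) / 2 * (1 / n + 1 / n))
# Per-cell (separate-variance, Welch-style) standard error:
se_welch = math.sqrt(var_a / n + var_b / n)

# With equal cell sizes the two standard errors -- and hence the t
# ratios -- are identical; the tests differ only in degrees of freedom
# (2n - 2 = 8 pooled, versus a fractional Satterthwaite value).
print(se_pooled, se_welch)
```

This is one reason the choice of model may not show up in the t statistics themselves, only in the d.f., for balanced designs.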
Thank you. This is actually quite illuminating and goes a long way toward addressing my initial question. You indicated: "These are essentially the Welch tests; however, gls doesn't use the Satterthwaite method to get the d.f."

Satterthwaite, which I believe is another name for Welch's, does typically use a unique method to calculate degrees of freedom -- a method that can produce fractional rather than integer-valued degrees of freedom, and that can produce d.f. values that are responsive to the variances in the cells being tested. So instead of having d.f. = 48 for each contrast, d.f. could vary across the contrasts. I have been looking at output and assuming that the post-hoc tests were Student's t tests because the d.f. did not reflect the Welch-Satterthwaite method (i.e., were not responsive to variance inequality across cells and were never fractional). So the ANOVA does not use the Satterthwaite method to get the d.f. However, repeated-measures ANOVA does use the Satterthwaite method to get the d.f. And so repeated-measures post hocs are responsive to unequal cell variances, are sometimes fractional, and so sometimes produce a different d.f. value despite n being constant.

I'm starting to think that these inconsistencies (e.g., the fact that the gls routine doesn't use the Satterthwaite d.f., while some other routine, perhaps written by someone else, for repeated-measures post hocs does use Satterthwaite d.f.) may be part of what one accepts when one uses R packages. Each package author may make choices that aren't wrong, but that also may not be consistent with choices made by other authors. And when those various packages are brought together inside an overarching package, the inconsistencies become more noticeable. Is this what's probably happening? Or am I off base here? @rvlenth @jonathon-love |
There is indeed a lot of diversity in how R packages are implemented. Some are more carefully written than others. Some reflect the biases of their developers and may pass over certain areas that are important to others. Many serve a very general class of situations, so that specific methods such as Welch's t test become just a special case, for which details such as degrees-of-freedom methods kind of go by the wayside. This is actually true of any open-source product that accepts contributed code. Much of what you get is on the bleeding edge of new developments -- but it can be your blood!

PS: but don't lose sight of what's important. The d.f. are not as important as having t ratios that are correctly made. Once the d.f. exceed 15 or so, it hardly makes a difference; but the wrong t value makes a big difference. |
FWIW, I have added to the
In general, Satterthwaite d.f. requires finding the variance of the variance estimates. I do this by simulation, perturbing the response values and obtaining a gradient in the estimated covariance matrix. So the result varies a bit from one call to another. In the above, each group has 9 observations, so each mean should have 8 d.f., and the results shown are pretty close. For the differences, we can compare with what we get by other means; for example, comparing (wool A, tension L) with (wool A, tension M), we have
and the output shows 11.4 d.f. (the P values differ because of the (approximate) Tukey adjustment). This additional feature will be in the next emmeans update. Note that it applies only to |
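The combining of d.f. described above follows the general Welch-Satterthwaite rule for a sum of independent variance estimates. Here is a minimal Python sketch with hypothetical variances (not the actual wool/tension values from the example above):

```python
def satterthwaite_df(variances, dfs):
    """Welch-Satterthwaite df for a sum of independent variance
    estimates, each carrying its own degrees of freedom."""
    total = sum(variances)
    return total ** 2 / sum(v ** 2 / d for v, d in zip(variances, dfs))

# Two group-mean variances, each estimated with 8 df (made-up values).
print(satterthwaite_df([1.0, 1.0], [8, 8]))  # equal variances -> 16.0
print(satterthwaite_df([1.0, 3.0], [8, 8]))  # unequal -> 12.8, between 8 and 16
```

The more unequal the component variances, the closer the combined result falls to the smaller component's d.f., which is why a difference of two 8-d.f. means can land at a fractional value such as the 11.4 shown above rather than at 16.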
ah! russell this is the answer i was fishing for! i just didn't know how to ask it:
so returning to richard's remark:
but shouldn't the t stats be determined by the model formula, irrespective of the package? i.e. it's been an assumption of mine that fitting the same model with different packages would yield the same post hocs. am i correct on this russell? (or does each package construct its own post-hoc strategy, and put that in the model object?) with thanks |
I give up. I wrote the code and I know what it does and doesn’t do.
Jonathon, I very much appreciate what you have done to help me understand and rein in some package-dependency issues. But in this thread, you have fished for a misinterpretation of one verb in my latest extended answer. I have run out of ways to explain it. So I think I have to just let you continue to believe what you insist on believing, regardless of my protests. It is too bad, but there is no other option.
Perhaps you have a statistics department (as opposed to a social science department) near you with a colleague who can understand the way I think, and who can sit down with you and explain what I’ve been trying to say. A model is more than its formula; it is also the underlying assumptions upon which it is based.
BTW, in social-science terms, what we are seeing here is a blazing example of confirmation bias.
|
Sorry for adding something more here after it is closed. But as the author of afex, perhaps I can help. The question as I understand it is: what are the degrees of freedom for repeated-measures models? Let us use the example data set described in more detail here. It has both between-subjects and within-subjects factors.
For the between-subjects factors, the df are easy to understand, N - k:
For repeated-measures factors, the df are not that clear. For example:
For the other factor:
As a reminder, they are based on the
This shows that this model estimates individual variances per error strata. How exactly the df are calculated from this is not something I fully understand (even after looking at the source code). But maybe you can give a high-level overview of this, Russ? One final word: I agree that it is somewhat unfortunate. The default is
I plan to change the default in the future from the |
sorry russell, this has been hard for me too. every time you've characterised what i think, my thoughts are "that isn't what i think", and every time you describe what is the case, my thoughts are "that is (more or less) what i think". i feel like you've seized on the word 'choose' as having a very specific meaning, when i mean it in a very general sense ... and i was hoping by pointing out that you'd used the word 'decide' that you would see we're actually on the same page here. you seem bewildered that i can think such foolish things, and i'm a bit bewildered that you can think that i think such foolish things. so it seems like the best move is to leave this discussion. anyhow, i have appreciated you taking the time, even when you were feeling frustrated. jonathon |
Henrik's remarks bring up a whole different dimension to this whole thing, having to do with mixed models and a different manifestation of the Satterthwaite df, one that has to do with combining errors from different sources (error strata).

The challenge faced by a software developer is that, by making sophisticated methods available, there is an implicit assumption that the user understands the methods being offered and is responsible for ensuring that s/he is using them appropriately. But of course, there will always be users who don't understand what is appropriate and consequently will misuse the software -- perhaps resulting in an analysis that is markedly worse than some simpler analysis they could have done more or less correctly. Then there is the challenge of simplified user interfaces aimed at less experienced users, where we are also trusting that the design of the interface keeps users out of trouble and that the user interface correctly represents what is being offered -- but it may not always fulfill that need either.

And, let's admit it, people like me are part of the problem too. Many professional statisticians are a lot more interested in math than in data and scientific reasoning, and sometimes lead people down a very misleading path. So you can't trust us either. I could claim myself as an exception, but I'd be wrong in doing so because I have my own set of blind spots, and I am blind to my own blind spots.

I guess what maybe I'll try to do is add something to the FAQs vignette and perhaps another illustrative example somewhere else. (Or perhaps a whole new vignette on repeated measures? No, that'd just be asking for trouble.) But I'm not going to comment further here. |
OK, I'll say one more thing. In response to Jonathon's question:
YES. Each package potentially implies a different post hoc strategy. You choose the package and modeling options that do the best job of fitting the data. Then you give that result to emmeans(). |
thanks russell, i see that's where i was incorrect. i'd assumed that for a given model formula, irrespective of package used to construct the model, i'd get the same post hocs. i see now the situation is more complicated than that. with thanks |
Greetings,
I recently had an exchange with Jonathon Love (developer of the jamovi statistical software) about some behavior of the emmeans package, which jamovi makes use of. I submitted a request to Jonathon that jamovi enforce some consistency regarding the use of Student's t versus some alternative-to-Student's (Welch's?) t tests in post-hoc comparisons. Through trial and error, I've noticed that in a non-repeated-measures ANOVA, the degrees of freedom for the post-hoc tests do not change no matter how unequal the within-cell variances are. So I assume they are based on Student's t. But for repeated-measures ANOVA, the degrees of freedom for each post-hoc test do depend on the degree of inequality of variances. So I presume these are Welch's (or some other non-Student's) t tests. Jonathon said that this difference in the type of t test employed is caused by the design of the emmeans package (not jamovi) and so the issue should be directed to the developer(s) of the emmeans package. So my question is: what t-test types does emmeans employ, why is there an apparent inconsistency in the type of post-hoc t test employed, and is there a way to fix the inconsistency?