Problem with difference smooths and identifiability constraint #108
Comments
@dinga92 I need to look at case 1 more closely, but case 2 looks like a misunderstanding of what we mean by a difference of smooth functions. In case 2 your functions have exactly the same shape and differ only in their group means. As those group means are not part of the smooths, we don't count them as differences between the smooths. Indeed, for more complex models there is no other way to think of a difference of smooth functions. For models with multiple smooth and parametric terms, the best we could do is compare differences of fitted values from the model while holding the other covariates at fixed values, which would condition the differences on the particular combination of factor covariates chosen. So we just focus on pairs of factor-by smooths and ignore the differences in means. If you want to test those, you need to focus on the parametric group means, not the smooths.
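A minimal sketch of the case 2 point, in plain Python rather than mgcv or gratia. The two curves, the grid, and the offset of 3 are made up for illustration, and centring by the sample mean stands in for the sum-to-zero identifiability constraint. Two curves with the same shape but different group means have identical centred versions, so the difference of the centred curves is zero and the whole gap sits in the means.

```python
import math

def centre(values):
    """Return the values with their mean removed, plus that mean."""
    m = sum(values) / len(values)
    return [v - m for v in values], m

# Hypothetical evaluation grid on [0, 1]
xs = [i / 99 for i in range(100)]

# Same shape in both groups; group b is shifted up by a constant 3
group_a = [math.sin(2 * math.pi * x) for x in xs]
group_b = [math.sin(2 * math.pi * x) + 3.0 for x in xs]

# Centring mimics the sum-to-zero constraint applied to each smooth
smooth_a, mean_a = centre(group_a)
smooth_b, mean_b = centre(group_b)

# Difference of the centred curves: zero up to floating point error
max_abs_diff = max(abs(a - b) for a, b in zip(smooth_a, smooth_b))

# The entire difference between the groups is carried by the means
mean_gap = mean_b - mean_a

print(max_abs_diff, mean_gap)
```

In a fitted model the role of `mean_gap` would be played by the parametric factor term, which is why the smooths alone show no difference here.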
Thanks for your reply. Case 1 is also just a direct consequence of the mean being removed from the smooth function. Since the function is shifted a bit, its average is also different. I think what I expected was a difference between smooth-by-factor terms that also includes the factor intercept, which is of course different from a difference between the mean-centered smooths alone. I know I can test the main effect, but that is a slightly different question. E.g.,
I'm not sure I follow your bullet 3. Consider the model
I don't follow the point about correcting for the average temperature. What is correcting for this? By default, and in the way you are fitting these models, there is no such thing as the average temperature. The model intercept is coded as the mean of Y for the reference level of the factor, with the deviations between this reference level and each other level coded by the other parametric terms related to the factor. You can change the meaning of the coefficients by changing the contrasts used, but I don't know of any that would correct for the average of the response. Anyway, I'm not saying that what you want to do is not possible or desirable; it's just not what was implemented, as the function was designed to focus on
? Now there is nothing in this model's model matrix that codes for the levels of
Turns out I was massively over-thinking this, at least for the case of including group means (the complexity where there are multiple factors just cancels for any reasonable comparison, because the level of the second factor you condition on is the same for both groups you are differencing). From 0.7.3.12 with
I think I found a problem/bug with the difference smooths computation, although maybe I am just doing something wrong.
I was trying to create difference smooths, and I noticed that they can be quite biased because of the identifiability constraint that forces each smooth to be centered at 0. In your blog post, you wrote that this could be mitigated by including additional intercept terms in the model; however, this does not seem to solve the problem. Using the factor smooth basis also didn't work.
I think that including the parametric term in the model does indeed take the different group averages into account, but these averages are then not taken into account when calculating the difference.
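A rough sketch of that claim, in plain Python rather than mgcv or gratia. The two quadratic curves and the grid are invented for illustration, and subtracting the sample mean stands in for the identifiability constraint. When one curve is a shifted copy of the other, the difference of the centred curves misses the full difference by a constant, and that constant is exactly the difference of the group means.

```python
def mean(values):
    return sum(values) / len(values)

# Hypothetical evaluation grid on [0, 1]
xs = [i / 99 for i in range(100)]

# Group b is a horizontally shifted copy of group a, so the averages differ
curve_a = [x ** 2 for x in xs]
curve_b = [(x - 0.2) ** 2 for x in xs]

mean_a, mean_b = mean(curve_a), mean(curve_b)

# Centred curves, standing in for the constrained smooths
centred_a = [v - mean_a for v in curve_a]
centred_b = [v - mean_b for v in curve_b]

# Difference a user might expect vs. difference of the centred curves
full_diff = [a - b for a, b in zip(curve_a, curve_b)]
centred_diff = [a - b for a, b in zip(centred_a, centred_b)]

# Adding the difference of group means back recovers the full difference
offset = mean_a - mean_b
restored = [d + offset for d in centred_diff]
max_error = max(abs(f - r) for f, r in zip(full_diff, restored))

print(offset, max_error)
```

Here `offset` plays the role of the parametric group difference: the centred difference averages to zero by construction, so any mean gap between the groups has to be added separately.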
Is this a known issue, or a bug, or am I modeling it wrong?
See the code below, I hope it makes sense.