Evaluating parametric terms with factor by-variable #68

Open
Excidion opened this issue May 12, 2020 · 3 comments
Assignees
Labels
bug Something isn't working

Comments

@Excidion

When calling evaluate_parametric_term() on a term made of two factor variables, the information about which values belong to which combination of levels seems to get lost.

Imagine a model with a formula that includes y~factorvar1:factorvar2. When calling

evaluate_parametric_term(model, "factorvar1:factorvar2")

the resulting table contains no information on which row belongs to which combination of levels of the factors. The column "term" only contains the entry "factorvar1:factorvar2" in every row.
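A minimal base-R sketch of why a single label ends up in every row: in a fitted model, factorvar1:factorvar2 is recorded as one term label, while the individual level combinations only appear as separate, combination-named columns of the model matrix. The data are made up, and lm() stands in for the reporter's GAM purely as an assumption for illustration.

```r
# Illustrative data: every combination of two factors, three replicates each
set.seed(1)
dat <- expand.grid(factorvar1 = factor(c("a", "b")),
                   factorvar2 = factor(c("x", "y")))
dat <- dat[rep(seq_len(nrow(dat)), each = 3), ]
dat$y <- rnorm(nrow(dat))

# lm() stands in for mgcv::gam(); both keep the parametric part in a terms object.
# With no main effects this fit is rank deficient, which lm() tolerates (NA coefficient).
fit <- lm(y ~ factorvar1:factorvar2, data = dat)

# A single term label covers the whole interaction...
term_labels <- attr(terms(fit), "term.labels")
print(term_labels)

# ...while the model matrix has separate columns named after level combinations
mm_cols <- colnames(model.matrix(fit))
print(mm_cols)
```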

Also, the table has as many rows as my dataset, which is not the behavior of evaluate_parametric_term() or evaluate_smooth() that I am used to.

Am I missing something or have I been interpreting the purpose of these functions wrong? Or is this indeed a bug?

@gavinsimpson
Owner

There's a bug here somewhere (either by design or by problem with the code). I'll take a look. Thanks for letting me know.

@gavinsimpson gavinsimpson self-assigned this May 13, 2020
@gavinsimpson gavinsimpson added the bug Something isn't working label May 13, 2020
@gavinsimpson gavinsimpson changed the title Evaluating paremetric terms with factory by-variable Evaluating parametric terms with factor by-variable May 13, 2020
@gavinsimpson
Owner

gavinsimpson commented May 14, 2020

@Excidion I figured out what the issue was but fixing this means you won't be able to do what you wanted with the form of interaction. R treats f1:f2 as an order 2 term even if it is the only term (beyond the constant) in the model, and I'm using the order according to R to stop evaluate_parametric_term() with an error.
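The term order R assigns can be inspected with base R's terms(). The sketch below uses illustrative formulas (the variables need not exist, since terms() only parses the formula) and shows that f1:f2 has order 2 whether or not it is the only term, so a check that rejects terms of order greater than 1 would reject it.

```r
# An interaction that is the only term after the intercept is still order 2
order_only_interaction <- attr(terms(y ~ f1:f2), "order")
print(order_only_interaction)

# With another term present, orders are reported per term label
tt <- terms(y ~ x + f1:f2)
ord <- attr(tt, "order")
names(ord) <- attr(tt, "term.labels")
print(ord)

# Terms a first-order-only check would refuse
print(names(ord)[ord > 1])
```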

You could achieve what you want using interaction(f1, f2, drop = TRUE) to create a single factor from the interaction of f1 and f2 in the data before fitting the model. R would consider that term order 1.
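A minimal sketch of that workaround, on made-up data in which the combination b/y never occurs; lm() is used here as an assumed stand-in for the GAM. interaction() with drop = TRUE builds one factor whose levels are only the observed combinations, and the resulting model term has order 1.

```r
# Illustrative data in which the combination b/y never occurs
dat <- data.frame(
  f1 = factor(c("a", "a", "b", "a", "a", "b")),
  f2 = factor(c("x", "y", "x", "x", "y", "x")),
  y  = c(1.0, 2.1, 2.9, 1.2, 1.9, 3.1)
)

# Collapse the two factors into one; drop = TRUE removes unobserved combinations
dat$f12 <- interaction(dat$f1, dat$f2, drop = TRUE)
print(levels(dat$f12))

# With the combined factor, the only non-intercept term has order 1
fit <- lm(y ~ f12, data = dat)
print(attr(terms(fit), "order"))
```

The default separator in the new level names is "."; it can be changed with the sep argument of interaction().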

I'd welcome some input on what you would like to have happen if evaluate_parametric_term() were to support interaction terms. The original behaviour was to plot the partial effect of a term (as plot.gam() would and as per termplot()), and handling interactions this way doesn't seem to make sense.

I could literally return the contribution to the fitted model of the f1:f2 term only and, in the value column, return either

  • the usual R-generated labels you see for combinations of factor levels in interactions: f1Level1:f2Level2, or
  • just the pair of levels, one per factor, concatenated into a string: Level1-Level2.
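A rough sketch of the first option in base R, under several assumptions: lm() stands in for mgcv::gam(), the model is y ~ f1 * f2 (main effects included so the f1:f2 columns are treatment-coded and the fit is full rank), and label_r and label_pair are hypothetical column names. The contribution of the f1:f2 term is its model-matrix columns times its coefficients, collapsed to one row per observed combination rather than one per observation. With treatment contrasts these contributions are uncentered, so combinations involving a reference level are 0.

```r
# Illustrative balanced data: 5 replicates of each f1/f2 combination
set.seed(2)
dat <- expand.grid(f1 = factor(c("a", "b")), f2 = factor(c("x", "y")))
dat <- dat[rep(seq_len(nrow(dat)), each = 5), ]
dat$y <- rnorm(nrow(dat))

# lm() stands in for mgcv::gam(); main effects keep the fit full rank
fit <- lm(y ~ f1 * f2, data = dat)

# Model-matrix columns belonging to the "f1:f2" term
mm <- model.matrix(fit)
term_idx <- match("f1:f2", attr(terms(fit), "term.labels"))
cols <- which(attr(mm, "assign") == term_idx)

# Contribution of the f1:f2 term alone to each fitted value
contrib <- drop(mm[, cols, drop = FALSE] %*% coef(fit)[cols])

# Collapse to one row per observed combination of levels
out <- unique(data.frame(dat[c("f1", "f2")], value = contrib))
rownames(out) <- NULL

# Option 1 (hypothetical column): R-style labels like the model-matrix names
out$label_r <- paste0("f1", out$f1, ":f2", out$f2)
# Option 2 (hypothetical column): the plain pair of levels
out$label_pair <- paste(out$f1, out$f2, sep = "-")
print(out)
```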

The other option, which would be a departure from what evaluate_smooth does, would be to return what are often called the estimated marginal means; i.e. return predictions for all combinations of levels in the data. This would no longer be a partial effect for a single term; I'd need to include the main effects of the factors involved in the interaction. They'd also be conditional on some values for the other terms in the model as I have to provide something for the other terms in the model to be able to use predict(). I feel there are good existing ways to do this (emmeans), however.
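A hand-rolled sketch of that second option, only to show what "predictions for all combinations of levels, conditional on the other terms" means; emmeans is the proper tool. Assumptions: lm() stands in for the GAM, the data are simulated, and the continuous covariate x is held at its sample mean.

```r
# Illustrative balanced data with a continuous covariate x
set.seed(3)
dat <- data.frame(
  f1 = factor(rep(c("a", "b"), each = 20)),
  f2 = factor(rep(c("x", "y"), times = 20)),
  x  = runif(40)
)
dat$y <- 1 + 0.5 * (dat$f1 == "b") + 2 * dat$x + rnorm(40, sd = 0.1)

# lm() stands in for mgcv::gam(); main effects, interaction and x
fit <- lm(y ~ f1 * f2 + x, data = dat)

# Every combination of factor levels (expand.grid makes factors by default),
# with the other term, x, held at its sample mean
newd <- expand.grid(f1 = levels(dat$f1), f2 = levels(dat$f2))
newd$x <- mean(dat$x)

# Each prediction sums intercept, main effects, interaction and the x term
newd$fit <- predict(fit, newdata = newd)
print(newd)
```

Unlike a partial effect, each value here mixes the intercept, both main effects and the interaction, and it changes if x is held at a different value.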

Thoughts on what behaviour you were expecting?

@Excidion
Author

I am definitely more fond of returning the contributions to the fitted model. This is, at least for me, closer to one of the core reasons I value GAMs: the intuitive interpretability of their results.

On how to handle the entries of the value column: the first bullet point seems very R-ish and more in line with how evaluate_smooth() handles this (e.g. a term such as s(var):fvar has s(var):f1level1 in the smooth column and an extra by_variable column).

Hope I picked up on your questions in the right way and my answers can be helpful.
