Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for mgcv #130

Closed
mjskay opened this issue Sep 4, 2018 · 17 comments
Closed

Add support for mgcv #130

mjskay opened this issue Sep 4, 2018 · 17 comments

Comments

@mjskay
Copy link
Owner

mjskay commented Sep 4, 2018

Looks like gratia::simulate might be a direction for supporting mgcv in tidybayes::fitted_draws. Still todo: find the pieces to support tidy_draws and predicted_draws as well.

cc @gavinsimpson @noamross @alexpghayes

@noamross
Copy link

noamross commented Sep 4, 2018

I think implementing those pieces in gratia and importing them here makes sense. For instance, https://github.com/gavinsimpson/gratia/blob/master/R/simulate-methods.R#L50-L51, is basically the fitted_draws and just needs to be abstracted out.

@mjskay
Copy link
Owner Author

mjskay commented Sep 4, 2018

Yeah, that makes sense. That is about the right level of abstraction based on how brms and rstanarm support works --- tidybayes just takes the output of brms::fitted / rstanarm::posterior_linpred (for fitted_draws) or brms::predict / rstanarm::posterior_predict (for predicted_draws) and turns it into a long-format data frame.

@gavinsimpson
Copy link

I have realised that simulate() is pre-my tidy ephiphany --- it returns a matrix, the horror! But that's doable.

@mjskay
Copy link
Owner Author

mjskay commented Sep 5, 2018

You're in good company, brms and rstanarm also return matrices! I'm happy for tidybayes to handle the matrix -> long data frame part if gratia generates the matrix form for both the posterior and the posterior predictive.

@alexpghayes
Copy link

alexpghayes commented Sep 5, 2018 via email

@mjskay
Copy link
Owner Author

mjskay commented Sep 5, 2018

I would not object! Makes my life easier. :)

@gavinsimpson, if you are interested in having the output be compatible with tidybayes, the columns output by fitted_draws and predicted_draws are described here. That naming scheme was a result of extended discussion with the Stan developers, and I like to think we came to a pretty good consensus.

Mean time I'll wait on gratia for this and then wrap whatever you decide to do in tidybayes::fitted_draws/tidybayes::predicted_draws for the purpose of a consistent interface in tidybayes (@gavinsimpson if you open an issue in the gratia repo for this, can you @ me? thanks).

@gavinsimpson
Copy link

gavinsimpson commented Sep 19, 2019

Hi @mjskay @alexpghayes - following up on this.

I've been working on this from the mgcv/gratia side. I now have

  • smooth_samples() for posterior draws of entire smooth functions
  • fitted_samples() --> tidybayes::fitted_draws()
  • predicted_samples() --> tidybayes::predicted_draws()

(I'll add a linpred_samples() to map to tidybayes::linpred_draws(), just haven't got round to adding that yet)

The examples here show this in action: https://gavinsimpson.github.io/gratia/reference/predicted_samples.html

I opened an issue on gratia gavinsimpson/gratia#52 to note that I need to convert to tidybayes objects as I followed a different naming convention. As that issue notes, I thought of adding as_tidybayes() generic and methods to gratia to do the conversion to match the objects tidybayes, but perhaps that's not needed or perhaps some of that code should be in tidybayes not gratia? I'm not sure.

So, just to confirm; seems like @alexpghayes was favouring having the conversion code in gratia, but not sure if that was just going to a data frame or if he preferred gratia to have code needed to return tidybayes compliant tibbles directly. The as_tidybayes() functions would do that for those foo_samples() methods I mentioned.

Another issue is the smooth_samples(); I don't see that as fitting in with any of the tidybayes posterior draw functions at the moment (correct me if I'm wrong) as those functions essentially refer to the response (in different ways and with differing levels of sampling variation). In sample_smooths() we're getting draws from functions. Is there something similar in tidybayes that I can look at that I missed?

@alexpghayes
Copy link

alexpghayes commented Sep 22, 2019

So, just to confirm; seems like @alexpghayes was favouring having the conversion code in gratia, but not sure if that was just going to a data frame or if he preferred gratia to have code needed to return tidybayes compliant tibbles directly. The as_tidybayes() functions would do that for those foo_samples() methods I mentioned.

I would just keep it all in gratia, including the code to go to a data frame. No need to force users to load two packages when all the functionality lives in one. Also, name-wise, it may be worth considering something slightly different than as_tidybayes(), which implies a conversion to an object of class tidybayes (rather a tibble with different naming standards).

@mjskay
Copy link
Owner Author

mjskay commented Sep 26, 2019

Thanks for coming back around to this @gavinsimpson!

Some thoughts:

  • I agree with @alexpghayes re: conversion function names; I avoided as_... for the same reason. The convention I adopted in tidybayes for these function names is to_{package}_names() and from_{package}_names(), which I think is particularly nice when combined with :: notation: e.g. you get tidybayes::to_broom_names() or tidybayes::from_broom_names() (see here). So if these functions live in gratia you would have gratia::to_tidybayes_names() / gratia::from_tidybayes_names(); if they're in tidybayes we'd have tidybayes::from_gratia_names()/tidybayes::to_gratia_names(). I don't have a strong preference; gratia` might make the most sense for where to put them? I suppose they could also be put in both without any conflict given that the names are not the same.

  • I'm fine with ..._draws() living in gratia, but wherever they live I would like to make sure the interfaces line up as precisely as possible. One thing I've tried to be very careful about is ensuring that fitted_draws / predicted_draws have as close as possible to the same interface for every model type, which includes renaming arguments and modifying some argument values. E.g. you have scale = c("response", "linear_predictor") in fitted_samples(), which in fitted_draws is scale = c("response", "linear"); this lines up with how brms does it but is translated into different arguments for rstanarm. Some of this standardization is documented in Standardize function arguments where possible #70, but admittedly I could do a much better job of this. One advantage to the ..._draws() forms living in tidybayes is it makes it potentially easier to ensure consistency in argument usage, which I think can be important from a user perspective. On the other hand, if they live in gratia that would incentivize me to better document all of the consistency-related design decisions of the API so that it is easier for other packages to follow the interface. Either of these is fine with me as long as consistency is maintained: for users, I think API consistency is a bigger concern than having an additional dependency (I think dependencies are a bigger concern for package developers than users, unless you're talking about packages that are actually hard to install---which describes neither gratia nor tidybayes). I would be curious how broom has handled the problem of ensuring that tidying functions for similar object types don't have divergent interfaces (@alexpghayes?).

  • One wrinkle in all of this is the question of what happens when both packages are loaded simultaneously. predicted_samples() and fitted_samples() are also functions in tidybayes, but they are deprecated: I had previously used these names and switched to the ..._draws() forms a few versions back when I did an overhaul of function and argument names for the purposes of making things more consistent. The old function names will throw a deprecation warning and then translate arguments in the form needed for ..._draws() (and translate output back on the way out into the old format returned by ..._samples()).

@alexpghayes
Copy link

I would be curious how broom has handled the problem of ensuring that tidying functions for similar object types don't have divergent interfaces (@alexpghayes?).

Is the concern here that *_draws() will behave differently for gam objects and other objects? The broom solution was to write a detailed specification, and to test to ensure the specification was met.

@gavinsimpson
Copy link

Thanks @alexpghayes @mjskay. Some follow-up thoughts/comments/responses.

I was trying to avoid having gratia have any dependency on tidybayes. I have no issue with tidybayes, but I didn't want to become reliant on another package largely because I'm finding it difficult enough at the moment juggling life and work. So I was trying to have a way that tidybayes could hook into gratia for the GAM/mgcv functionality. I assumed you would make the ..._samples() defunct in the next version of tidybayes so any issue is likely to be limited.

I guess I envisaged tidybayes::xxx_draws.gam() being a wrapper around the gratia::xxx_samples() functions.

Thinking about this having read your comments, it would appear that I favour having tidybayes::as_tidybayes_names.foo() and tidybayes::XXX_draws.gam() in your package (I'll happily write a first pass and do a PR), that way should you change the interface of xxx_draws() you can easily adapt these functions rather than needing me to make changes to gratia and get those released to CRAN.

  1. Good point re as_tidybayes() implying something that isn't. Agree that as_tidybayes_names() is better and I like the from_ to_() idea. So I guess if you're happy to have the wrapper functions in tidybayes, that these should be tidybayes::from_gratia_names()/tidybayes::to_gratia_names().
  2. I preferred "linear_predictor" as it is more explicit about the scale and it's how I constant refer to that when teaching as "link" scale (sensu predict.gam()'s type = 'link') is confusing to my students. But as these are going to need xxx_draws() as wrappers then this sort of interface inconsistency can be worked around seamlessly in the wrapper.
  3. I would suggest then that tidybayes import the xxx_samples() functions from gratia and call them with an explicit namespace via gratia::xxx_samples() in the xxx_draws.gam() wrappers.

Would that work for everyone?

If so I can do a first pass on the xxx_draws() methods for classes of GAM objects in mgcv, do a PR to tidybayes and then iterate from there?

@mjskay
Copy link
Owner Author

mjskay commented Sep 26, 2019

@gavinsimpson that sounds like a good approach to me!

@alexpghayes Right, the concern I had was that either the output format or the arguments will be inconsistent across implementations (mostly the arguments; I think the output format is straightforward). I imagine that it is easier to ensure consistency when the various functions all live in the same package, as in broom, and was trying to adopt a similar model for tidybayes. I suspect that trying to keep those interfaces consistent when the implementations are spread across many packages is very difficult; the inconsistencies in base-R generic function interfaces across packages is one example of this.

I see the multiple-packages-necessary-for-users argument as a little less concerning when I try to imagine use cases for ..._draws() over ...samples(). I'm imagining at least two different possible users:

  • a user who wants to make use of the ..._draws() form who is already familiar with it from tidybayes, or who wants to use it with other tidybayes functions that assume its particular column names. Such a user would already have tidybayes installed, so putting those functions in tidybayes does not negatively affect them.
  • a user who hasn't used tidybayes but wants a tidy format because it is useful in itself. Such a user could just use the ..._samples() form in gratia and need never install tidybayes.

This is why I think @gavinsimpson's proposal is a good path forward.

@gavinsimpson
Copy link

The size of the set of packages that tidybayes might want to operate on is much smaller than those that broom might want to operate on, but I do think it is worth considering at this stage whether you really want functionality for potentially all of these packages (and the inherent issues that a huge number of potential Imports/dependencies brings) to live in tidybayes?

With broom we're seeing (perhaps, at least this is my view) the impact of this (it's a huge package with many moving parts now and is probably a nightmare to maintain and have package authors contribute as they need to install a huge amount of add-ons to work on their little bit) and the move to having separate broom-related packages (like the mixed models one).

Is this something you are concerned about, want to think about now? In which case I could prepare a tidybayes_gam pkg containing the wrappers we discussed, but which adheres to the tidybayes standards. It would be easier for now for me to add code to tidybayes and iterate on it until you are happy to include it in tidybayes, but at some point this approach might get unwieldy.

@mjskay
Copy link
Owner Author

mjskay commented Sep 26, 2019

Ah good point, I hadn't thought about that potential maintenance issue (maybe I wasn't thinking ambitiously enough? :) ) Maybe a tidybayes.gam makes the most sense then.

Is the convention . or _ for such package names? I fell like I've only seen ., but that may also have been only from packages that use . in variable names.

@gavinsimpson
Copy link

Re naming conventions; I'm not sure, but in the broom context it is broom.mixed so let's follow them.

OK; I'll set up a repo on GH for tidybayes.gam and get some functionality working and then get back to you for input/comment.

@mjskay
Copy link
Owner Author

mjskay commented Sep 26, 2019

Great, thanks!

@mjskay
Copy link
Owner Author

mjskay commented Jul 12, 2021

I think this is all covered within {gratia} now, no? Closing this unless something comes up.

@mjskay mjskay closed this as completed Jul 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants