Update to emmeans support: dpar = "mean" #993

rvlenth · 2020-09-01T22:20:43Z

This PR adds the option dpar = "mean" to the support methods for the emmeans package. When emmeans() or another function in that package is called with dpar = "mean", we obtain the expectation of the posterior predictive distribution at each grid point, rather than one of the model parameters.

The implementation has two basic parts:

1. In recover_data.brmsfit, we combine all the predictors involved in modeling all fixed-effect parameters, so that the reference grid includes reference levels for all these predictors.
2. In emm_basis.brmsfit, the posterior sample (post.beta slot) is obtained using posterior_epred(). The bhat and V slots are the column means and covariance matrix, and the X matrix is the identity.
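
The second part can be sketched in base R as follows. This is only an illustration under stated assumptions: a simulated matrix stands in for real posterior_epred() output, and the names post_beta, n_draws, and n_grid are made up for the example rather than taken from the PR's code.

```r
# Stand-in for posterior_epred() output (simulated, not from a fitted model):
# rows are posterior draws, columns are reference-grid points.
set.seed(1)
n_draws <- 1000
n_grid <- 3
post_beta <- matrix(
  rnorm(n_draws * n_grid, mean = c(1, 2, 3)),
  nrow = n_draws, byrow = TRUE
)

bhat <- colMeans(post_beta)  # one point estimate per grid point
V <- cov(post_beta)          # covariance of the draws across grid points
X <- diag(n_grid)            # identity: each "coefficient" already is a grid-point estimate

# With X as the identity, the linear predictions X %*% bhat are just bhat.
drop(X %*% bhat)
```

Because X is the identity, nothing further has to be done to map "coefficients" to grid-point estimates; the draws themselves are already on the scale of interest.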

Note that I also added a detail to the documentation, and extended the example that was in place.
That example serves as a good illustration for dpar = "mean" because the family is lognormal, making for
a stark difference between the estimated mu parameter and the posterior_epred values of
exp(mu + sigma^2/2).
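
The size of that difference can be checked with a short simulation in base R. The values of mu and sigma below are arbitrary choices for illustration, not estimates from the documented example.

```r
# Arbitrary illustrative parameter values (assumptions, not from the help-file example).
mu <- 1
sigma <- 0.8

set.seed(42)
y <- rlnorm(1e6, meanlog = mu, sdlog = sigma)

mean_logY <- mu                  # what dpar = "mu" targets: the mean of log(Y)
mean_Y <- exp(mu + sigma^2 / 2)  # what dpar = "mean" targets: the mean of Y

# Compare the two targets with the simulated sample means.
c(mu = mean_logY, sim_mean_logY = mean(log(y)),
  mean = mean_Y, sim_mean_Y = mean(y))
```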


paul-buerkner · 2020-09-02T06:27:59Z

Thank you a lot! That was quick. :-D

I moved the PR to be merged with the new emmeans-mean branch I just created so that I can make changes to the PR before merging into master. I will take a detailed look this week.

nahorp · 2020-09-08T05:50:15Z

Thanks @rvlenth for working on this so quickly! :) I have a couple of questions,

> Note that I also added a detail to the documentation, and extended the example that was in place.
> That example serves as a good illustration for dpar = "mean" because the family is lognormal, making for
> a stark difference between the estimated mu parameter and the posterior_epred values of
> exp(mu + sigma^2/2).

Where can I find this example?

Similar to dpar = "mean", theoretically dpar = "variance" or dpar = "sd" could be calculated too based on the mu and sigma parameters (continuing with the lognormal example)?

rvlenth · 2020-09-08T14:01:28Z

Look in the help file for `emm_basis.brmsfit` in the patched version. Previously, dpar = "mu" and dpar = "sigma" were already supported to obtain estimates of those parameters. Note that in the lognormal case, "mu" is the mean of log(Y), whereas "mean" is the mean of Y.


nahorp · 2020-09-08T21:24:24Z

> Note that in the lognormal case, "mu" is the mean of log(Y), whereas "mean" is the mean of Y.

I see... and in the lognormal case, the "mean" is calculated as exp(mu + sigma^2/2), correct?

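Similarly, if "sigma" is the standard deviation of log(Y), could dpar = "sd" give the standard deviation of Y? According to this website (https://brilliant.org/wiki/log-normal-distribution/#properties-of-the-log-normal-distribution), the variance of a lognormal distribution is (exp(sigma^2) - 1) * mean^2, so the standard deviation of Y would be the square root of that...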

rvlenth · 2020-09-08T21:44:39Z

No -- dpar = "sd" gets you the SD of log(Y). It would take some extra coding to produce estimates of the SD of Y. In general, 'dpar' is used to specify a MODEL parameter to estimate, and is an argument in predict() as well as emm_basis.brmsfit. dpar = "mean" is only allowed in the latter, and is handled as a special case. It appears that this can cause confusion, and if so, something besides dpar should be chosen as the argument name when one wants the results of posterior_epred().

Russ


nahorp · 2020-09-09T00:49:36Z

> It would take some extra coding to produce estimates of the SD of Y.

Indeed, that is what I meant when I suggested dpar = "sd" or dpar = "variance" (apologies for not being clearer). I understand that the dpar argument refers to a distributional parameter of the model, and I agree that a different argument might be better suited for the results of posterior_epred() in the long run.
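
For illustration only, here is a rough base-R sketch of what such extra coding could look like outside of brms and emmeans, for the lognormal case. It assumes posterior draws of mu and sigma at a single grid point are available as numeric vectors; here they are simulated, and the names mu_draws and sigma_draws are hypothetical.

```r
# Simulated stand-ins for posterior draws of mu and sigma at one grid point
# (assumptions for illustration, not output from a fitted brms model).
set.seed(7)
mu_draws <- rnorm(4000, mean = 1, sd = 0.05)
sigma_draws <- abs(rnorm(4000, mean = 0.8, sd = 0.03))

# Lognormal moments, computed draw by draw:
mean_Y <- exp(mu_draws + sigma_draws^2 / 2)
var_Y <- (exp(sigma_draws^2) - 1) * mean_Y^2
sd_Y <- sqrt(var_Y)

# Posterior summary of the SD of Y on the response scale.
quantile(sd_Y, probs = c(0.025, 0.5, 0.975))
```

Transforming each draw and summarizing afterwards keeps the uncertainty in both mu and sigma, which plugging posterior means into the formula would not.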

paul-buerkner · 2020-09-21T08:01:59Z

Thank you again for this PR! It took me some time to work on this as I was on holiday. I merged it to a branch of brms and will make some changes before merging into master. I have one question @rvlenth with regard to your code:

preds <- all.vars(.extract_par_terms(object, resp, NULL, nlpar)$fe)
for (pf in object$formula$pforms)
    preds <- union(preds, all.vars(pf[-2]))
bterms <- list(fe = reformulate(preds))

The for loop over all formulas seems like dangerous territory to me, especially all.vars(pf[-2]). This extracts all variables on the RHS of the formula and puts them into preds. Accordingly, preds may contain many variables from not-yet-supported terms, for example, splines or GPs. Further, because it extracts variables without any preprocessing by brms, it may even contain variables that do not relate to any data (but to parameters or pure symbols used for other purposes by brms). Can you explain what the desired behavior should be (that is, which variables you want to extract here) so that I can change the code accordingly?
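
A small base-R illustration of the concern, using a made-up formula; the variable names and the symbol n_knots are hypothetical, and s() is only parsed, never evaluated:

```r
# Hypothetical distributional-parameter formula with a smooth term.
pf <- sigma ~ Treatment + s(Rainfall, k = n_knots)

# pf[-2] drops the left-hand side; all.vars() then returns every symbol
# on the right-hand side that is not a function name.
all.vars(pf[-2])
# "Treatment" "Rainfall" "n_knots"
```

Here n_knots is not a data column, yet it ends up in the result alongside the genuine predictors, because all.vars() works purely on the parsed expression.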

rvlenth · 2020-09-21T18:34:41Z

I'm a little unclear on whether this has been resolved or not. But what is needed in this loop is to make sure we include all the variables that will be needed later in posterior_epred(), in the section of emm_basis.brmsfit() that has the same conditions. If there are more things here than are needed, then I think that means that you need to use something besides dpar and nlpar to select what model params are in play. Note that, previous to this PR, the emm_basis.brmsfit() allowed the user to specify resp, dpar, and nlpar without any restrictions. So if there are restrictions on what the user should specify, those restrictions need to be imposed in both recover_data.brmsfit() and emm_basis.brmsfit().

paul-buerkner · 2020-09-22T07:26:57Z

Ok, thank you. I will change the code so that all data variables are included and see how things work out and then report back.

paul-buerkner · 2020-09-22T12:17:05Z

Thanks. I think I fixed it now.

rvlenth · 2020-09-22T12:36:38Z

Paul,

We do need all the variables required for predicting the location, scale, and shape parts of the model. But I should have made it clearer that it could be a problem to have other variables NOT needed for that, because the software creates a grid of reference values based on those variables. FYI, that can be observed via emmeans::ref_grid(model).

Russ


paul-buerkner · 2020-09-22T12:59:39Z

Which variables, for example, would not be required for location, scale and shape parts?

rvlenth · 2020-09-22T13:14:26Z

I don't know. Maybe I misunderstood something you said earlier about looping over formulas, about additional parameters and unsupported features. My intent in looping over those formulas was to obtain the variables involved in the fixed-effects models for those different features. For example, the location model may involve Treatment and Rainfall, and the scale model may involve Treatment and Location. We need to make sure we have identified all three variables.
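
In base R, that intent looks roughly like this. The formulas are made up to match the example, and the list pforms is a hypothetical stand-in for the model's per-parameter formulas, not the actual brms object:

```r
# Hypothetical formulas for the location and scale parts of a model.
pforms <- list(
  mu = y ~ Treatment + Rainfall,
  sigma = sigma ~ Treatment + Location
)

# Collect the right-hand-side variables across all parts, without duplicates.
preds <- character(0)
for (pf in pforms) {
  preds <- union(preds, all.vars(pf[-2]))
}
preds
# "Treatment" "Rainfall" "Location"

# One-sided formula over the combined predictors, as in the PR snippet.
fe <- reformulate(preds)
fe
```

Every variable collected here becomes a dimension of the reference grid, which is why extra, unneeded variables would enlarge the grid.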


paul-buerkner · 2020-09-22T13:23:27Z

I see. Yeah, my response earlier was quite cryptic :-D I hope things are ok now, but I am happy to adjust the code if problems occur.

singmann and others added 5 commits August 25, 2020 13:24

add number of chains to emmeans method

8016266

Merge pull request paul-buerkner#989 from singmann/master

6fb36b4

add number of chains to emmeans method

add reference

4628906

Merge branch 'master' of https://github.com/paul-buerkner/brms

c520a8c

Update to emmeans support: dpar = "mean"

81a24d9

paul-buerkner changed the base branch from master to emmeans-mean September 2, 2020 06:25

paul-buerkner merged commit c1ad1fc into paul-buerkner:emmeans-mean Sep 21, 2020

This was referenced Sep 21, 2020

Support dpar = "mean" in emmeans #1005

Closed

Support dpar = "mean" in emmeans #1006

Merged
