new var_type "ran_pars" #90

Merged
merged 8 commits into from Feb 1, 2021

larmarange (Owner) commented Jan 27, 2021

fix #88

  • add unit tests
  • update documentation & vignettes
  • update NEWS

ddsjoberg (Collaborator) commented Jan 28, 2021

This is a 🌟 fantastic 🌟 pull request! Thank you! I love adding the type as ran_pars...makes perfect sense.

There is one change I think should be made. The variable column is being defined as the grouping variable. This makes sense for a simple random-intercept model, but with a random slope the variable column holds the grouping variable rather than the variable associated with the random slope. In the example below, we have a fixed and a random slope on stage, but variable is assigned grade (the grouping variable). I think variable should be a combination of the original term and the group.

For example, the new variable could be sd__(Intercept):group. Also, I think the non-unique term led to a duplication of a row in a merge?

library(broom.helpers)

# random intercept and slope on stage, grouped by grade
mod <- lme4::lmer(age ~ stage + (stage|grade) + (1|grade), gtsummary::trial)
#> boundary (singular) fit: see ?isSingular

tidy_plus_plus(mod, tidy_fun = broom.mixed::tidy) %>% knitr::kable()
#> Registered S3 method overwritten by 'broom.mixed':
#>   method      from 
#>   tidy.gamlss broom

| term | variable | var_label | var_class | var_type | var_nlevels | contrasts | contrasts_type | reference_row | label | n_obs | effect | group | estimate | std.error | statistic | conf.low | conf.high |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| stageT1 | stage | T Stage | factor | categorical | 4 | contr.treatment | treatment | TRUE | T1 | 51 | NA | NA | 0.0000000 | NA | NA | NA | NA |
| stageT2 | stage | T Stage | factor | categorical | 4 | contr.treatment | treatment | FALSE | T2 | 53 | fixed | NA | 1.3296329 | 2.811093 | 0.4729950 | -4.180008 | 6.839274 |
| stageT3 | stage | T Stage | factor | categorical | 4 | contr.treatment | treatment | FALSE | T3 | 41 | fixed | NA | 2.6255378 | 3.006066 | 0.8734131 | -3.266244 | 8.517320 |
| stageT4 | stage | T Stage | factor | categorical | 4 | contr.treatment | treatment | FALSE | T4 | 44 | fixed | NA | -2.0147062 | 2.948711 | -0.6832498 | -7.794074 | 3.764661 |
| sd__(Intercept) | grade | Grade | factor | ran_pars | 3 | NA | NA | NA | sd__(Intercept) | NA | ran_pars | grade | 0.0030761 | NA | NA | NA | NA |
| sd__(Intercept) | grade.1 | grade.1 | NA | ran_pars | NA | NA | NA | NA | sd__(Intercept) | NA | ran_pars | grade | 0.0030761 | NA | NA | NA | NA |
| cor__(Intercept).stageT2 | grade | Grade | factor | ran_pars | 3 | NA | NA | NA | cor__(Intercept).stageT2 | NA | ran_pars | grade | -1.0000000 | NA | NA | NA | NA |
| cor__(Intercept).stageT3 | grade | Grade | factor | ran_pars | 3 | NA | NA | NA | cor__(Intercept).stageT3 | NA | ran_pars | grade | -0.9999980 | NA | NA | NA | NA |
| cor__(Intercept).stageT4 | grade | Grade | factor | ran_pars | 3 | NA | NA | NA | cor__(Intercept).stageT4 | NA | ran_pars | grade | -0.9999727 | NA | NA | NA | NA |
| sd__stageT2 | grade | Grade | factor | ran_pars | 3 | NA | NA | NA | sd__stageT2 | NA | ran_pars | grade | 0.0038700 | NA | NA | NA | NA |
| cor__stageT2.stageT3 | grade | Grade | factor | ran_pars | 3 | NA | NA | NA | cor__stageT2.stageT3 | NA | ran_pars | grade | 0.9999980 | NA | NA | NA | NA |
| cor__stageT2.stageT4 | grade | Grade | factor | ran_pars | 3 | NA | NA | NA | cor__stageT2.stageT4 | NA | ran_pars | grade | 0.9999727 | NA | NA | NA | NA |
| sd__stageT3 | grade | Grade | factor | ran_pars | 3 | NA | NA | NA | sd__stageT3 | NA | ran_pars | grade | 0.0056111 | NA | NA | NA | NA |
| cor__stageT3.stageT4 | grade | Grade | factor | ran_pars | 3 | NA | NA | NA | cor__stageT3.stageT4 | NA | ran_pars | grade | 0.9999651 | NA | NA | NA | NA |
| sd__stageT4 | grade | Grade | factor | ran_pars | 3 | NA | NA | NA | sd__stageT4 | NA | ran_pars | grade | 0.0072645 | NA | NA | NA | NA |
| sd__(Intercept) | grade | Grade | factor | ran_pars | 3 | NA | NA | NA | sd__(Intercept) | NA | ran_pars | grade.1 | 0.0000000 | NA | NA | NA | NA |
| sd__(Intercept) | grade.1 | grade.1 | NA | ran_pars | NA | NA | NA | NA | sd__(Intercept) | NA | ran_pars | grade.1 | 0.0000000 | NA | NA | NA | NA |
| sd__Observation | Residual | Residual | NA | ran_pars | NA | NA | NA | NA | sd__Observation | NA | ran_pars | Residual | 14.3311624 | NA | NA | NA | NA |

Created on 2021-01-27 by the reprex package (v0.3.0)
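
To make the naming suggested above concrete, here is a minimal sketch in base R. The data frame is a hand-typed stand-in for a few of the random-parameter rows in the table, and the `term:group` separator is only the convention floated in this comment, not something broom.helpers is known to implement.

```r
# Toy stand-in for some ran_pars rows returned by broom.mixed::tidy()
# (values copied by hand from the table above; not a live model fit)
ran_pars <- data.frame(
  term  = c("sd__(Intercept)", "sd__stageT2", "sd__(Intercept)", "sd__Observation"),
  group = c("grade", "grade", "grade.1", "Residual"),
  stringsAsFactors = FALSE
)

# Hypothetical convention: build the variable as "<term>:<group>"
ran_pars$variable <- paste(ran_pars$term, ran_pars$group, sep = ":")

# term alone is ambiguous, the combined identifier is not
anyDuplicated(ran_pars$term) > 0      # TRUE
anyDuplicated(ran_pars$variable) > 0  # FALSE
```

With this convention the two random intercepts become `sd__(Intercept):grade` and `sd__(Intercept):grade.1`, so they can no longer be confused in a later join.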

larmarange (Owner, Author) commented

In your example, there is an issue because two terms are identical ("sd__(Intercept)") but for two different groups.
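
A small base R sketch of why a non-unique term can duplicate rows. The `variables` lookup table is hypothetical and only illustrates the mechanism; it is an assumption that the package's internal join behaves the same way.

```r
# Two random intercepts that share the same term but belong to different groups
tidy_rows <- data.frame(
  term     = c("sd__(Intercept)", "sd__(Intercept)"),
  group    = c("grade", "grade.1"),
  estimate = c(0.0030761, 0),
  stringsAsFactors = FALSE
)

# Hypothetical lookup mapping each term to a variable (one row per group)
variables <- data.frame(
  term     = c("sd__(Intercept)", "sd__(Intercept)"),
  variable = c("grade", "grade.1"),
  stringsAsFactors = FALSE
)

# Joining on term alone matches every left row with every right row
joined <- merge(tidy_rows, variables, by = "term")
nrow(joined)  # 4 rows instead of 2

# Joining on the (term, group) pair keeps one row per random parameter
names(variables)[2] <- "group"
variables$variable <- variables$group
joined_ok <- merge(tidy_rows, variables, by = c("term", "group"))
nrow(joined_ok)  # 2 rows
```

The four-row result has the same shape as the duplicated `sd__(Intercept)` rows in the first reprex, where each intercept appears once with variable `grade` and once with `grade.1`.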

If I do not populate the variable column with the group, it will be filled with the value of term. In this specific case, that produces an unwanted header row when add_header_rows = TRUE.

library(broom.helpers)
library(broom.mixed)
#> Registered S3 method overwritten by 'broom.mixed':
#>   method      from 
#>   tidy.gamlss broom

mod <- lme4::lmer(age ~ stage + (stage|grade) + (1|grade), gtsummary::trial)
#> boundary (singular) fit: see ?isSingular

mod %>% tidy_plus_plus() %>% knitr::kable()

| term | variable | var_label | var_class | var_type | var_nlevels | contrasts | contrasts_type | reference_row | label | n_obs | effect | group | estimate | std.error | statistic | conf.low | conf.high |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| stageT1 | stage | T Stage | factor | categorical | 4 | contr.treatment | treatment | TRUE | T1 | 51 | NA | NA | 0.0000000 | NA | NA | NA | NA |
| stageT2 | stage | T Stage | factor | categorical | 4 | contr.treatment | treatment | FALSE | T2 | 53 | fixed | NA | 1.3296329 | 2.811093 | 0.4729950 | -4.180008 | 6.839274 |
| stageT3 | stage | T Stage | factor | categorical | 4 | contr.treatment | treatment | FALSE | T3 | 41 | fixed | NA | 2.6255378 | 3.006066 | 0.8734131 | -3.266244 | 8.517320 |
| stageT4 | stage | T Stage | factor | categorical | 4 | contr.treatment | treatment | FALSE | T4 | 44 | fixed | NA | -2.0147062 | 2.948711 | -0.6832498 | -7.794074 | 3.764661 |
| sd__(Intercept) | sd__(Intercept) | sd__(Intercept) | NA | ran_pars | NA | NA | NA | NA | sd__(Intercept) | NA | ran_pars | grade | 0.0030761 | NA | NA | NA | NA |
| cor__(Intercept).stageT2 | cor__(Intercept).stageT2 | cor__(Intercept).stageT2 | NA | ran_pars | NA | NA | NA | NA | cor__(Intercept).stageT2 | NA | ran_pars | grade | -1.0000000 | NA | NA | NA | NA |
| cor__(Intercept).stageT3 | cor__(Intercept).stageT3 | cor__(Intercept).stageT3 | NA | ran_pars | NA | NA | NA | NA | cor__(Intercept).stageT3 | NA | ran_pars | grade | -0.9999980 | NA | NA | NA | NA |
| cor__(Intercept).stageT4 | cor__(Intercept).stageT4 | cor__(Intercept).stageT4 | NA | ran_pars | NA | NA | NA | NA | cor__(Intercept).stageT4 | NA | ran_pars | grade | -0.9999727 | NA | NA | NA | NA |
| sd__stageT2 | sd__stageT2 | sd__stageT2 | NA | ran_pars | NA | NA | NA | NA | sd__stageT2 | NA | ran_pars | grade | 0.0038700 | NA | NA | NA | NA |
| cor__stageT2.stageT3 | cor__stageT2.stageT3 | cor__stageT2.stageT3 | NA | ran_pars | NA | NA | NA | NA | cor__stageT2.stageT3 | NA | ran_pars | grade | 0.9999980 | NA | NA | NA | NA |
| cor__stageT2.stageT4 | cor__stageT2.stageT4 | cor__stageT2.stageT4 | NA | ran_pars | NA | NA | NA | NA | cor__stageT2.stageT4 | NA | ran_pars | grade | 0.9999727 | NA | NA | NA | NA |
| sd__stageT3 | sd__stageT3 | sd__stageT3 | NA | ran_pars | NA | NA | NA | NA | sd__stageT3 | NA | ran_pars | grade | 0.0056111 | NA | NA | NA | NA |
| cor__stageT3.stageT4 | cor__stageT3.stageT4 | cor__stageT3.stageT4 | NA | ran_pars | NA | NA | NA | NA | cor__stageT3.stageT4 | NA | ran_pars | grade | 0.9999651 | NA | NA | NA | NA |
| sd__stageT4 | sd__stageT4 | sd__stageT4 | NA | ran_pars | NA | NA | NA | NA | sd__stageT4 | NA | ran_pars | grade | 0.0072645 | NA | NA | NA | NA |
| sd__(Intercept) | sd__(Intercept) | sd__(Intercept) | NA | ran_pars | NA | NA | NA | NA | sd__(Intercept) | NA | ran_pars | grade.1 | 0.0000000 | NA | NA | NA | NA |
| sd__Observation | sd__Observation | sd__Observation | NA | ran_pars | NA | NA | NA | NA | sd__Observation | NA | ran_pars | Residual | 14.3311624 | NA | NA | NA | NA |

mod %>% tidy_plus_plus(add_header_rows = TRUE) %>% knitr::kable()

| term | variable | var_label | var_class | var_type | var_nlevels | header_row | contrasts | contrasts_type | reference_row | label | n_obs | effect | group | estimate | std.error | statistic | conf.low | conf.high |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NA | stage | T Stage | factor | categorical | 4 | TRUE | contr.treatment | treatment | NA | T Stage | NA | NA | NA | NA | NA | NA | NA | NA |
| stageT1 | stage | T Stage | factor | categorical | 4 | FALSE | contr.treatment | treatment | TRUE | T1 | 51 | NA | NA | 0.0000000 | NA | NA | NA | NA |
| stageT2 | stage | T Stage | factor | categorical | 4 | FALSE | contr.treatment | treatment | FALSE | T2 | 53 | fixed | NA | 1.3296329 | 2.811093 | 0.4729950 | -4.180008 | 6.839274 |
| stageT3 | stage | T Stage | factor | categorical | 4 | FALSE | contr.treatment | treatment | FALSE | T3 | 41 | fixed | NA | 2.6255378 | 3.006066 | 0.8734131 | -3.266244 | 8.517320 |
| stageT4 | stage | T Stage | factor | categorical | 4 | FALSE | contr.treatment | treatment | FALSE | T4 | 44 | fixed | NA | -2.0147062 | 2.948711 | -0.6832498 | -7.794074 | 3.764661 |
| NA | sd__(Intercept) | sd__(Intercept) | NA | ran_pars | NA | TRUE | NA | NA | NA | sd__(Intercept) | NA | NA | NA | NA | NA | NA | NA | NA |
| sd__(Intercept) | sd__(Intercept) | sd__(Intercept) | NA | ran_pars | NA | FALSE | NA | NA | NA | sd__(Intercept) | NA | ran_pars | grade | 0.0030761 | NA | NA | NA | NA |
| cor__(Intercept).stageT2 | cor__(Intercept).stageT2 | cor__(Intercept).stageT2 | NA | ran_pars | NA | NA | NA | NA | NA | cor__(Intercept).stageT2 | NA | ran_pars | grade | -1.0000000 | NA | NA | NA | NA |
| cor__(Intercept).stageT3 | cor__(Intercept).stageT3 | cor__(Intercept).stageT3 | NA | ran_pars | NA | NA | NA | NA | NA | cor__(Intercept).stageT3 | NA | ran_pars | grade | -0.9999980 | NA | NA | NA | NA |
| cor__(Intercept).stageT4 | cor__(Intercept).stageT4 | cor__(Intercept).stageT4 | NA | ran_pars | NA | NA | NA | NA | NA | cor__(Intercept).stageT4 | NA | ran_pars | grade | -0.9999727 | NA | NA | NA | NA |
| sd__stageT2 | sd__stageT2 | sd__stageT2 | NA | ran_pars | NA | NA | NA | NA | NA | sd__stageT2 | NA | ran_pars | grade | 0.0038700 | NA | NA | NA | NA |
| cor__stageT2.stageT3 | cor__stageT2.stageT3 | cor__stageT2.stageT3 | NA | ran_pars | NA | NA | NA | NA | NA | cor__stageT2.stageT3 | NA | ran_pars | grade | 0.9999980 | NA | NA | NA | NA |
| cor__stageT2.stageT4 | cor__stageT2.stageT4 | cor__stageT2.stageT4 | NA | ran_pars | NA | NA | NA | NA | NA | cor__stageT2.stageT4 | NA | ran_pars | grade | 0.9999727 | NA | NA | NA | NA |
| sd__stageT3 | sd__stageT3 | sd__stageT3 | NA | ran_pars | NA | NA | NA | NA | NA | sd__stageT3 | NA | ran_pars | grade | 0.0056111 | NA | NA | NA | NA |
| cor__stageT3.stageT4 | cor__stageT3.stageT4 | cor__stageT3.stageT4 | NA | ran_pars | NA | NA | NA | NA | NA | cor__stageT3.stageT4 | NA | ran_pars | grade | 0.9999651 | NA | NA | NA | NA |
| sd__stageT4 | sd__stageT4 | sd__stageT4 | NA | ran_pars | NA | NA | NA | NA | NA | sd__stageT4 | NA | ran_pars | grade | 0.0072645 | NA | NA | NA | NA |
| sd__(Intercept) | sd__(Intercept) | sd__(Intercept) | NA | ran_pars | NA | FALSE | NA | NA | NA | sd__(Intercept) | NA | ran_pars | grade.1 | 0.0000000 | NA | NA | NA | NA |
| sd__Observation | sd__Observation | sd__Observation | NA | ran_pars | NA | NA | NA | NA | NA | sd__Observation | NA | ran_pars | Residual | 14.3311624 | NA | NA | NA | NA |

Created on 2021-01-28 by the reprex package (v0.3.0)

Some potential solutions:

  • keep the variable column empty (NA), although this could create issues later in the code and possibly in tbl_regression() and ggcoef_model()
  • populate variable with the value of group, prefixed with "Group: " (or custom text), to clearly distinguish random parameters from fixed effects

larmarange (Owner, Author) commented

After reading https://cran.r-project.org/web/packages/broom.mixed/vignettes/broom_mixed_intro.html I'm thinking about:

  • add a var_type equal to ran_vals for random-effect values;
  • add a selector all_ran_vals()
  • add two arguments to tidy_identify_variables() : disambiguate_terms (default TRUE) & disambiguate_sep (default: "_"). When TRUE, for random effects, group value will be prefixed to term. (should we add a column with the original term?)
  • change tidy_add_header_rows to exclude ran_pars and ran_vals rows
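
A rough sketch of what the disambiguation idea could look like, written in base R. The function name `disambiguate_terms_sketch()`, the `sep` argument and the `original_term` column are all hypothetical and only mirror the proposal in the list above; this is not the broom.helpers implementation.

```r
# Hypothetical helper: prefix the group to the term for random-effect rows only
disambiguate_terms_sketch <- function(x, sep = "_") {
  # rows flagged as random parameters or random-effect values by the tidier
  is_random <- x$effect %in% c("ran_pars", "ran_vals")
  # keep the untouched term so it stays available downstream
  x$original_term <- x$term
  x$term[is_random] <- paste(x$group[is_random], x$term[is_random], sep = sep)
  x
}

# Hand-typed stand-in for a tidied mixed model
res <- data.frame(
  term   = c("stageT2", "sd__(Intercept)", "sd__(Intercept)"),
  effect = c("fixed", "ran_pars", "ran_pars"),
  group  = c(NA, "grade", "grade.1"),
  stringsAsFactors = FALSE
)

out <- disambiguate_terms_sketch(res)
out$term
# "stageT2" "grade_sd__(Intercept)" "grade.1_sd__(Intercept)"
```

Fixed-effect terms are left alone, and the extra `original_term` column is one possible answer to the question of whether the original term should be kept.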

larmarange (Owner, Author) commented

or maybe create a function disambiguate_terms()

ddsjoberg (Collaborator) commented

After reading https://cran.r-project.org/web/packages/broom.mixed/vignettes/broom_mixed_intro.html I'm thinking about:

  • add a var_type equal to ran_vals for random-effect values;

That sounds perfect

  • add a selector all_ran_vals()

👍🏼

  • add two arguments to tidy_identify_variables() : disambiguate_terms (default TRUE) & disambiguate_sep (default: "_"). When TRUE, for random effects, group value will be prefixed to term. (should we add a column with the original term?)

I think this could be helpful, but also maybe not required. Since both the original term and the group column are returned, there would be no need to disambiguate or parse the new variable name?

  • change tidy_add_header_rows to exclude ran_pars and ran_vals rows

👍🏼
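
For illustration, the agreed exclusion can be sketched as a simple filter on var_type. This is a simplified assumption about how header-row candidates might be chosen, using a hand-typed data frame; the actual logic in tidy_add_header_rows() may differ.

```r
# Hand-typed stand-in for a tidied model with identified variables
res <- data.frame(
  variable = c("stage", "stage", "sd__(Intercept)", "sd__Observation"),
  var_type = c("categorical", "categorical", "ran_pars", "ran_pars"),
  stringsAsFactors = FALSE
)

# Rows describing random parameters or random-effect values are skipped
skip <- res$var_type %in% c("ran_pars", "ran_vals")

# Only the remaining variables are considered for a header row
header_candidates <- unique(res$variable[!skip])
header_candidates  # "stage"
```

Under this rule the stray `sd__(Intercept)` header row from the second reprex would not be created.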

@larmarange larmarange mentioned this pull request Feb 1, 2021
3 tasks
codecov bot commented Feb 1, 2021

Codecov Report

Merging #90 (41c2f37) into master (4d44eec) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master      #90   +/-   ##
=======================================
  Coverage   99.85%   99.85%           
=======================================
  Files          33       33           
  Lines        1375     1396   +21     
=======================================
+ Hits         1373     1394   +21     
  Misses          2        2           

| Impacted Files | Coverage Δ |
|---|---|
| R/select_helpers.R | 100.00% <100.00%> (ø) |
| R/tidy_add_header_rows.R | 100.00% <100.00%> (ø) |
| R/tidy_add_variable_labels.R | 100.00% <100.00%> (ø) |
| R/tidy_identify_variables.R | 100.00% <100.00%> (ø) |

larmarange (Owner, Author) commented

OK. For now, let's set tidy_disambiguate_terms() aside and take more time to let that question mature.

The other points have been implemented.

Successfully merging this pull request may close these issues.

Fix lmer Issue with Random Effects being Dropped