Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disambiguate terms #98

Closed
ddsjoberg opened this issue Feb 16, 2021 · 4 comments
Closed

Disambiguate terms #98

ddsjoberg opened this issue Feb 16, 2021 · 4 comments

Comments

@ddsjoberg
Copy link
Collaborator

Hi @larmarange !

Sorry to re-visit this topic so soon. I think I didn't communicate very well last time. If possible, I think the variable name should uniquely identify the terms. To do this, we would need to combine both the "term" and the "group" columns. Below are two suggestions. What do you think?

library(broom.helpers)
packageVersion("broom.helpers")
#> [1] '1.1.0.9002'

lme4::lmer(marker ~ (1|grade) + (1|response) + (death|response), gtsummary::trial) %>%
  tidy_plus_plus() %>%
  dplyr::select(term, variable, group, var_type) %>%
  dplyr::mutate(
    variable_suggested1 = paste(term, group, sep = "."),
    variable_suggested2 = glue::glue("{term}.group({group})")
  ) %>%
  knitr::kable()
#> boundary (singular) fit: see ?isSingular
#> Registered S3 method overwritten by 'broom.mixed':
#>   method      from 
#>   tidy.gamlss broom
term variable group var_type variable_suggested1 variable_suggested2
sd__(Intercept) sd__(Intercept) grade ran_pars sd__(Intercept).grade sd__(Intercept).group(grade)
sd__(Intercept) sd__(Intercept) response ran_pars sd__(Intercept).response sd__(Intercept).group(response)
sd__(Intercept) sd__(Intercept) response.1 ran_pars sd__(Intercept).response.1 sd__(Intercept).group(response.1)
cor__(Intercept).death cor__(Intercept).death response.1 ran_pars cor__(Intercept).death.response.1 cor__(Intercept).death.group(response.1)
sd__death sd__death response.1 ran_pars sd__death.response.1 sd__death.group(response.1)
sd__Observation sd__Observation Residual ran_pars sd__Observation.Residual sd__Observation.group(Residual)

Created on 2021-02-16 by the reprex package (v1.0.0)

@larmarange
Copy link
Owner

larmarange commented Feb 16, 2021

Dear @ddsjoberg

I guess you are referring to the discussion in #90 regarding terms disambiguation. Your suggestion to change "variable" but not "term" seems strange for me.

The thing is that with broom.mixed all terms are not unique, this is why the authors of broom.mixed suggests in a vignette to disambiguate terms, cf. https://cran.r-project.org/web/packages/broom.mixed/vignettes/broom_mixed_intro.html

For me, disambiguation should be done directly in the term column, to restore unicity of that col, and this is why I was suggesting to propose a new function tidy_disambiguate_terms() transforming "sd__(intercept)" into "grade.sd__(intercept)" (with a option to choose the separator and with the creation of a new column "original_term" to identify when disambiguation was performed).

At broom.helpers level, we do not know how the tibble will be used, if results will be grouped by "variable_label" or simply listed by "term".

In tidy_plus_plus() the disambiguation would of course be optional.

Such disambiguation should not be a problem for gtsummaty::tbl_regression().

@larmarange
Copy link
Owner

In broom.mixed vignette:

the categorical or continuous predictor variables that control the expected value (i.e., enter into the linear predictor for some part of the model) are called terms (term column in tidy() output); note that unlike in base broom, the term column may have duplicated values, because the same term may enter multiple model components (e.g. zero-inflated and conditional models; models for more than one response; fixed effects and random effects)

(...)

tidy(fitted_model) %>% tidyr::unite(term, group, term) will create a new term column that’s the combination of the group and term columns (which will disambiguate random-effect terms from different grouping variables); unite(term, component, term) will disambiguate conditional and zero-inflation parameters. The code below shows a slightly more complicated (but prettier) approach. (Some sort of disambiguate_terms() function could be added in a future version of the package …)

@ddsjoberg
Copy link
Collaborator Author

Gotcha! Yes, that is the original issue I was referring to!

I had read the word disambiguate differently than the authors of the vignette intended. I read it as separating a single column into multiple columns rather than the opposite! From there, I wasn't reading the rest of our convo correctly....sorry about that! 👀

I was suggesting to change variable rather than the term initially because I was thinking the term and group columns should remain exactly as they come from the broom.mixed tidier function. But I think disambiguating the term column makes a lot of sense, and I am for it!

@larmarange
Copy link
Owner

OK will prepare a PR tomorrow or the day after with a new function tidy_disambiguate_terms().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants