support of brms regression models #89

larmarange · 2021-01-27T20:21:13Z

add unit tests
update documentation & vignettes
update NEWS

and basic support for crr() models

larmarange · 2021-01-27T20:35:08Z

I have added basic support of brmrs model.

It still needs to be tested with different cases. I had to deal with the fact that there is no terms method for such a model or model.matrix method.

Please have a look in particular to the folowing files:

R/model_get_model_matrix.R
R/model_get_terms.R

In addition, sd__* or cor__* terms are currently as intercept in var_type. What should be the appropriate type?

library(broom.helpers)
library(broom.mixed)
#> Registered S3 method overwritten by 'broom.mixed':
#>   method      from 
#>   tidy.gamlss broom
library(dplyr)
#> 
#> Attachement du package : 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
load(system.file("extdata", "brms_example.rda", package="broom.mixed"))
brms_crossedRE %>% tidy_plus_plus(intercept = T) %>% knitr::kable()

term	variable	var_label	var_class	var_type	var_nlevels	contrasts	contrasts_type	reference_row	label	n_obs	effect	component	group	estimate	std.error	conf.low	conf.high
(Intercept)	(Intercept)	(Intercept)	NA	intercept	NA	NA	NA	NA	(Intercept)	32	fixed	cond	NA	33.8852996	3.8076391	26.2799792	40.4710956
wt	wt	wt	numeric	continuous	NA	NA	NA	NA	wt	32	fixed	cond	NA	-3.9117729	1.3853041	-6.2748412	-1.1740918
sd__(Intercept)	sd__(Intercept)	sd__(Intercept)	NA	intercept	NA	NA	NA	NA	sd__(Intercept)	NA	ran_pars	cond	cyl	4.5724777	3.6413002	0.3504339	14.6291128
sd__(Intercept)	sd__(Intercept)	sd__(Intercept)	NA	intercept	NA	NA	NA	NA	sd__(Intercept)	NA	ran_pars	cond	gear	4.5715387	3.2147052	0.0637144	12.8430223
sd__wt	sd__wt	sd__wt	NA	intercept	NA	NA	NA	NA	sd__wt	NA	ran_pars	cond	gear	1.7124580	1.3191685	0.1206113	4.9200716
cor__(Intercept).wt	cor__(Intercept).wt	cor__(Intercept).wt	NA	intercept	NA	NA	NA	NA	cor__(Intercept).wt	NA	ran_pars	cond	gear	-0.2871626	0.5833413	-0.9630521	0.8859865
sd__Observation	sd__Observation	sd__Observation	NA	intercept	NA	NA	NA	NA	sd__Observation	NA	ran_pars	cond	Residual	2.5889769	0.4143306	2.0324740	3.6253873

library(brms)
#> Le chargement a nécessité le package : Rcpp
#> Loading 'brms' package (version 2.14.4). Useful instructions
#> can be found by typing help('brms'). A more detailed introduction
#> to the package is available through vignette('brms_overview').
#> 
#> Attachement du package : 'brms'
#> The following object is masked from 'package:stats':
#> 
#>     ar
mtcars <- mtcars %>% mutate(gear = gear %>% factor, vs = vs %>% factor, carb = carb %>% factor)
contrasts(mtcars$vs) <- "contr.sum"
model_brms_2 <- brm(mpg ~ (1 | gear) + hp + vs + carb, data = mtcars, backend = "cmdstanr", refresh = 0)
#> Compiling Stan program...
#> Start sampling
#> Running MCMC with 4 sequential chains...
#> 
#> Chain 1 finished in 0.9 seconds.
#> Chain 2 finished in 0.9 seconds.
#> Chain 3 finished in 0.9 seconds.
#> Chain 4 finished in 0.9 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 0.9 seconds.
#> Total execution time: 4.0 seconds.
#> 
#> Warning: 4 of 4000 (0.0%) transitions ended with a divergence.
#> This may indicate insufficient exploration of the posterior distribution.
#> Possible remedies include: 
#>   * Increasing adapt_delta closer to 1 (default is 0.8) 
#>   * Reparameterizing the model (e.g. using a non-centered parameterization)
#>   * Using informative or weakly informative prior distributions
model_brms_2 %>% tidy_plus_plus(intercept = T) %>% knitr::kable()

term	variable	var_label	var_class	var_type	var_nlevels	contrasts	contrasts_type	reference_row	label	n_obs	effect	component	group	estimate	std.error	conf.low	conf.high
(Intercept)	(Intercept)	(Intercept)	NA	intercept	NA	NA	NA	NA	(Intercept)	32	fixed	cond	NA	29.4504172	3.7296890	21.7064200	36.7770375
hp	hp	hp	numeric	continuous	NA	NA	NA	NA	hp	32	fixed	cond	NA	-0.0393540	0.0224428	-0.0852282	0.0036887
vs1	vs	vs	factor	dichotomous	2	contr.sum	sum	FALSE	0	18	fixed	cond	NA	0.1308886	1.0230560	-1.9260285	2.0488853
vs2	vs	vs	factor	dichotomous	2	contr.sum	sum	TRUE	1	14	NA	NA	NA	-0.1401940	NA	NA	NA
carb1	carb	carb	factor	categorical	6	contr.treatment	treatment	TRUE	1	7	NA	NA	NA	0.0000000	NA	NA	NA
carb2	carb	carb	factor	categorical	6	contr.treatment	treatment	FALSE	2	10	fixed	cond	NA	-2.3439493	1.9225791	-5.9479403	1.5653123
carb3	carb	carb	factor	categorical	6	contr.treatment	treatment	FALSE	3	3	fixed	cond	NA	-3.0203249	2.8991606	-8.4971475	2.8255470
carb4	carb	carb	factor	categorical	6	contr.treatment	treatment	FALSE	4	10	fixed	cond	NA	-5.6717046	2.8845651	-11.0020925	0.1474700
carb6	carb	carb	factor	categorical	6	contr.treatment	treatment	FALSE	6	1	fixed	cond	NA	-6.2901211	4.6691185	-15.1743125	3.0509023
carb8	carb	carb	factor	categorical	6	contr.treatment	treatment	FALSE	8	1	fixed	cond	NA	-4.7348576	6.7370994	-17.5422225	8.4991105
sd__(Intercept)	sd__(Intercept)	sd__(Intercept)	NA	intercept	NA	NA	NA	NA	sd__(Intercept)	NA	ran_pars	cond	gear	4.5361121	2.5209174	1.2628375	10.9923625
sd__Observation	sd__Observation	sd__Observation	NA	intercept	NA	NA	NA	NA	sd__Observation	NA	ran_pars	cond	Residual	3.1846172	0.5090278	2.3806148	4.3333210

^{Created on 2021-01-27 by the reprex package (v0.3.0)}

storopoli · 2021-01-27T21:25:40Z

Regarding:

getting the model matrix by brms::standata() it seems OK because that is the appropriate function that brms tell you to use to get data as a list from brms model and X is indeed the model matrix.
getting terms using brms::brmsterms(resp_rhs_all = TRUE) seems a perfect approach because models in brms can have multiple response variables (multivariate models). Also indeed purrr::pluck("allvars") seems to get all variables from the formula.
sd__(Intercept) is the standard deviation (maybe median absolute deviation, since most of the estimates in bmrs are based on the median) for the group-level intercept. Since the formula has one intercept for each one of the gear (factor) (1 | gear). brms will compute a population-level intercept (Intercept) and the sd for all the group-level intercepts sd__(Intercept).
sd__Observation is the model residuals (epsilon random error term)
I took a look at the brms_crossedRE$formula to understand cor__(Intercept).wt and the formula is mpg ~ wt + (1 | cyl) + (1 + wt | gear) . My guess is, since we have two group-level intercepts (one per gear and other per cyl), brms will insert a correlation term for the intercepts. But that is far from my bayesian expertise.
Here is the full summary for brms_crossedRE:

Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: mpg ~ wt + (1 | cyl) + (1 + wt | gear) 
   Data: mtcars (Number of observations: 32) 
Samples: 2 chains, each with iter = 100; warmup = 50; thin = 1;
         total post-warmup samples = 100

Group-Level Effects: 
~cyl (Number of levels: 3) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     4.57      3.64     0.35    14.63 1.12       13       50

~gear (Number of levels: 3) 
                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)         4.57      3.21     0.06    12.84 1.00       43       48
sd(wt)                1.71      1.32     0.12     4.92 1.03       27       77
cor(Intercept,wt)    -0.29      0.58    -0.96     0.89 1.00       45       61

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    33.89      3.81    26.28    40.47 1.04       54       31
wt           -3.91      1.39    -6.27    -1.17 1.03       27       60

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     2.59      0.41     2.03     3.63 1.01      118       77

Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

fix #88

storopoli

I think those files are OK.

ddsjoberg · 2021-01-28T02:11:56Z

@larmarange thank you kindly for updating to accommodate brms models. I see that it took some re-structuring and adding new S3 functions.

storopoli · 2021-02-01T09:59:05Z

Hey guys, I don't know if you are aware but {broom.mixed} has nice tidiers for brms models. Here is a very complex one with random intercepts and slopes

library(brms)
fit  <- brm(bf(mpg ~ 1 + (1 + hp | cyl), decomp = "QR"), data = mtcars, refresh = 0) 
broom.mixed::tidy(fit, conf.int = TRUE)

r$> broom.mixed::tidy(fit, conf.int = TRUE)                                                                                                                                                      
Registered S3 method overwritten by 'broom.mixed':
  method      from 
  tidy.gamlss broom
# A tibble: 5 x 8
  effect   component group    term                estimate std.error conf.low conf.high
  <chr>    <chr>     <chr>    <chr>                  <dbl>     <dbl>    <dbl>     <dbl>
1 fixed    cond      NA       (Intercept)          20.8       3.52   13.9        28.0  
2 ran_pars cond      cyl      sd__(Intercept)       7.09      4.12    1.21       17.5  
3 ran_pars cond      cyl      sd__hp                0.0440    0.0360  0.00184     0.143
4 ran_pars cond      cyl      cor__(Intercept).hp  -0.221     0.554  -0.976       0.881
5 ran_pars cond      Residual sd__Observation       3.27      0.480   2.48        4.33

PS: Sorry reprex is not working well in my environment.

larmarange · 2021-02-01T12:09:28Z

Dear @storopoli broom.mixed::tidy is by default with mixed models.

See also #90 about better management of mixed models in general.

new var_type "ran_pars"

larmarange · 2021-02-01T14:49:08Z

@ddsjoberg R CMD check fails for R3.5 and R3.4. It says that "brms" is not available, although installed through historical version of CRAN.

Any idea?

ddsjoberg · 2021-02-02T02:21:19Z

i'll take a look tomorrow. I may post a bunch of commits to test FYI

larmarange · 2021-02-02T12:44:06Z

@ddsjoberg R CMD check fails for R3.5 and R3.4. It says that "brms" is not available, although installed through historical version of CRAN.

Any idea?

Maybe an option could be to not force suggested packages for R 3.4 and 3.5?

ddsjoberg · 2021-02-02T14:31:48Z

Maybe an option could be to not force suggested packages for R 3.4 and 3.5?

Do you mean not do checks on 3.5 and 3.4, or is there a way to skip the suggested packages?

I looked at 3.4 and it seems to be an issue installing rstan. Either of the options above seem reasonable

larmarange · 2021-02-02T15:00:41Z

In that case, it would be only a partial check for R 3.4 (i.e. for R3.4, we will skip the tests relative to brms). But I have to try to see if it works.

I think you may find that R Cmd check will fail if every package in Suggests cannot be installed (even if the checks using that package are skipped when it's not installed).

larmarange · 2021-02-02T16:07:12Z

OK with the change all checks passed.

Please note that for R 3.4, some tests are skipped due to the fact that brms is not available. However, I guess that it is acceptable for R 3.4 and R 3.5 to skip corresponding tests if some packages are not available. For R release, oldrel and devel, we still force all suggested packages to pass the tests.

ddsjoberg · 2021-02-02T16:12:56Z

That is amazing!
I can't wait to look at the code to see how you did it!

larmarange · 2021-02-02T16:16:01Z

I have separated checks in 2 workflows. For R 3.4 and R 3.5 I have added _R_CHECK_FORCE_SUGGESTS_: false when calling rcmdcheck.

      - name: Check
        env:
          _R_CHECK_CRAN_INCOMING_REMOTE_: false
          _R_CHECK_FORCE_SUGGESTS_: false
        run: rcmdcheck::rcmdcheck(args = c("--no-manual", "--as-cran"), error_on = "warning", check_dir = "check")
        shell: Rscript {0}

fix #91

codecov · 2021-02-02T16:30:15Z

Codecov Report

Merging #89 (45f483d) into master (4d44eec) will decrease coverage by 0.13%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master      #89      +/-   ##
==========================================
- Coverage   99.85%   99.71%   -0.14%     
==========================================
  Files          33       34       +1     
  Lines        1375     1414      +39     
==========================================
+ Hits         1373     1410      +37     
- Misses          2        4       +2

Impacted Files	Coverage Δ
R/model_get_xlevels.R	`100.00% <ø> (ø)`
R/model_compute_terms_contributions.R	`96.82% <100.00%> (-3.18%)`	⬇️
R/model_get_model_matrix.R	`100.00% <100.00%> (ø)`
R/model_get_terms.R	`100.00% <100.00%> (ø)`
R/model_identify_variables.R	`100.00% <100.00%> (ø)`
R/model_list_variables.R	`100.00% <100.00%> (ø)`
R/select_helpers.R	`100.00% <100.00%> (ø)`
R/tidy_add_header_rows.R	`100.00% <100.00%> (ø)`
R/tidy_add_term_labels.R	`100.00% <100.00%> (ø)`
R/tidy_add_variable_labels.R	`100.00% <100.00%> (ø)`
... and 3 more

larmarange added 2 commits January 27, 2021 13:49

model_get_terms()

e3cf5f0

basic support of brms

818e661

new var_type "ran_pars"

48bbba4

fix #88

storopoli reviewed Jan 27, 2021

View reviewed changes

take into account variable column in x to populate var_label

838b137

larmarange added 2 commits January 28, 2021 12:49

random parameters: do not populate variable col

a67178e

test all_ran_pars()

3e54035

larmarange added 11 commits February 1, 2021 13:19

ran_vals

f8f4828

do not apply tidy_add_header_rows() to random parameters

df72ed8

test with all_ran_vals()

41c2f37

update NEWS

bbd48ac

Merge pull request #90 from larmarange/ran_pars

c8d00f3

new var_type "ran_pars"

model_get_terms()

847df08

basic support of brms

ad4fa01

Merge branch 'brms' of github.com:larmarange/broom.helpers into brms

92152dc

historical version of brms for R 3.4

2821ff6

historical version of brms for R 3.5

f132fcc

ouppppppps

14f023b

more generic

2fa5a51

larmarange mentioned this pull request Feb 1, 2021

Support crr models in tidy_plus_plus #91

Closed

larmarange added 2 commits February 2, 2021 13:46

add test for brms::brm

a71b5c1

update NEWS and vignette

8d90367

larmarange added 6 commits February 2, 2021 16:05

try to separate R CMD check for R 4 & R 3

4d716a6

force suggests FALSE

bd98e01

avoid mutate_at()

f2b9527

more internal checks

55b6278

:: instead of :::

f93c4ad

skip in tests

7dba30b

basic support of cmprsk::crr() models

45f483d

fix #91

larmarange merged commit eaa823f into master Feb 2, 2021

larmarange deleted the brms branch December 6, 2021 11:39

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

support of brms regression models #89

support of brms regression models #89

larmarange commented Jan 27, 2021 •

edited

larmarange commented Jan 27, 2021

storopoli commented Jan 27, 2021

storopoli left a comment

ddsjoberg commented Jan 28, 2021

storopoli commented Feb 1, 2021

larmarange commented Feb 1, 2021

larmarange commented Feb 1, 2021

ddsjoberg commented Feb 2, 2021

larmarange commented Feb 2, 2021

ddsjoberg commented Feb 2, 2021

larmarange commented Feb 2, 2021 •

edited by ddsjoberg

larmarange commented Feb 2, 2021

ddsjoberg commented Feb 2, 2021

larmarange commented Feb 2, 2021 •

edited

codecov bot commented Feb 2, 2021

support of brms regression models #89

support of brms regression models #89

Conversation

larmarange commented Jan 27, 2021 • edited

larmarange commented Jan 27, 2021

storopoli commented Jan 27, 2021

storopoli left a comment

Choose a reason for hiding this comment

ddsjoberg commented Jan 28, 2021

storopoli commented Feb 1, 2021

larmarange commented Feb 1, 2021

larmarange commented Feb 1, 2021

ddsjoberg commented Feb 2, 2021

larmarange commented Feb 2, 2021

ddsjoberg commented Feb 2, 2021

larmarange commented Feb 2, 2021 • edited by ddsjoberg

larmarange commented Feb 2, 2021

ddsjoberg commented Feb 2, 2021

larmarange commented Feb 2, 2021 • edited

codecov bot commented Feb 2, 2021

Codecov Report

larmarange commented Jan 27, 2021 •

edited

larmarange commented Feb 2, 2021 •

edited by ddsjoberg

larmarange commented Feb 2, 2021 •

edited