Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support of brms regression models #89

Merged
merged 27 commits into from
Feb 2, 2021
Merged

support of brms regression models #89

merged 27 commits into from
Feb 2, 2021

Conversation

larmarange
Copy link
Owner

@larmarange larmarange commented Jan 27, 2021

fix #87

  • add unit tests
  • update documentation & vignettes
  • update NEWS

and basic support for crr() models

fix #91

@larmarange
Copy link
Owner Author

@ddsjoberg @storopoli

I have added basic support of brmrs model.

It still needs to be tested with different cases. I had to deal with the fact that there is no terms method for such a model or model.matrix method.

Please have a look in particular to the folowing files:

  • R/model_get_model_matrix.R
  • R/model_get_terms.R

In addition, sd__* or cor__* terms are currently as intercept in var_type. What should be the appropriate type?

library(broom.helpers)
library(broom.mixed)
#> Registered S3 method overwritten by 'broom.mixed':
#>   method      from 
#>   tidy.gamlss broom
library(dplyr)
#> 
#> Attachement du package : 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
load(system.file("extdata", "brms_example.rda", package="broom.mixed"))
brms_crossedRE %>% tidy_plus_plus(intercept = T) %>% knitr::kable()
term variable var_label var_class var_type var_nlevels contrasts contrasts_type reference_row label n_obs effect component group estimate std.error conf.low conf.high
(Intercept) (Intercept) (Intercept) NA intercept NA NA NA NA (Intercept) 32 fixed cond NA 33.8852996 3.8076391 26.2799792 40.4710956
wt wt wt numeric continuous NA NA NA NA wt 32 fixed cond NA -3.9117729 1.3853041 -6.2748412 -1.1740918
sd__(Intercept) sd__(Intercept) sd__(Intercept) NA intercept NA NA NA NA sd__(Intercept) NA ran_pars cond cyl 4.5724777 3.6413002 0.3504339 14.6291128
sd__(Intercept) sd__(Intercept) sd__(Intercept) NA intercept NA NA NA NA sd__(Intercept) NA ran_pars cond gear 4.5715387 3.2147052 0.0637144 12.8430223
sd__wt sd__wt sd__wt NA intercept NA NA NA NA sd__wt NA ran_pars cond gear 1.7124580 1.3191685 0.1206113 4.9200716
cor__(Intercept).wt cor__(Intercept).wt cor__(Intercept).wt NA intercept NA NA NA NA cor__(Intercept).wt NA ran_pars cond gear -0.2871626 0.5833413 -0.9630521 0.8859865
sd__Observation sd__Observation sd__Observation NA intercept NA NA NA NA sd__Observation NA ran_pars cond Residual 2.5889769 0.4143306 2.0324740 3.6253873
library(brms)
#> Le chargement a nécessité le package : Rcpp
#> Loading 'brms' package (version 2.14.4). Useful instructions
#> can be found by typing help('brms'). A more detailed introduction
#> to the package is available through vignette('brms_overview').
#> 
#> Attachement du package : 'brms'
#> The following object is masked from 'package:stats':
#> 
#>     ar
mtcars <- mtcars %>% mutate(gear = gear %>% factor, vs = vs %>% factor, carb = carb %>% factor)
contrasts(mtcars$vs) <- "contr.sum"
model_brms_2 <- brm(mpg ~ (1 | gear) + hp + vs + carb, data = mtcars, backend = "cmdstanr", refresh = 0)
#> Compiling Stan program...
#> Start sampling
#> Running MCMC with 4 sequential chains...
#> 
#> Chain 1 finished in 0.9 seconds.
#> Chain 2 finished in 0.9 seconds.
#> Chain 3 finished in 0.9 seconds.
#> Chain 4 finished in 0.9 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 0.9 seconds.
#> Total execution time: 4.0 seconds.
#> 
#> Warning: 4 of 4000 (0.0%) transitions ended with a divergence.
#> This may indicate insufficient exploration of the posterior distribution.
#> Possible remedies include: 
#>   * Increasing adapt_delta closer to 1 (default is 0.8) 
#>   * Reparameterizing the model (e.g. using a non-centered parameterization)
#>   * Using informative or weakly informative prior distributions
model_brms_2 %>% tidy_plus_plus(intercept = T) %>% knitr::kable()
term variable var_label var_class var_type var_nlevels contrasts contrasts_type reference_row label n_obs effect component group estimate std.error conf.low conf.high
(Intercept) (Intercept) (Intercept) NA intercept NA NA NA NA (Intercept) 32 fixed cond NA 29.4504172 3.7296890 21.7064200 36.7770375
hp hp hp numeric continuous NA NA NA NA hp 32 fixed cond NA -0.0393540 0.0224428 -0.0852282 0.0036887
vs1 vs vs factor dichotomous 2 contr.sum sum FALSE 0 18 fixed cond NA 0.1308886 1.0230560 -1.9260285 2.0488853
vs2 vs vs factor dichotomous 2 contr.sum sum TRUE 1 14 NA NA NA -0.1401940 NA NA NA
carb1 carb carb factor categorical 6 contr.treatment treatment TRUE 1 7 NA NA NA 0.0000000 NA NA NA
carb2 carb carb factor categorical 6 contr.treatment treatment FALSE 2 10 fixed cond NA -2.3439493 1.9225791 -5.9479403 1.5653123
carb3 carb carb factor categorical 6 contr.treatment treatment FALSE 3 3 fixed cond NA -3.0203249 2.8991606 -8.4971475 2.8255470
carb4 carb carb factor categorical 6 contr.treatment treatment FALSE 4 10 fixed cond NA -5.6717046 2.8845651 -11.0020925 0.1474700
carb6 carb carb factor categorical 6 contr.treatment treatment FALSE 6 1 fixed cond NA -6.2901211 4.6691185 -15.1743125 3.0509023
carb8 carb carb factor categorical 6 contr.treatment treatment FALSE 8 1 fixed cond NA -4.7348576 6.7370994 -17.5422225 8.4991105
sd__(Intercept) sd__(Intercept) sd__(Intercept) NA intercept NA NA NA NA sd__(Intercept) NA ran_pars cond gear 4.5361121 2.5209174 1.2628375 10.9923625
sd__Observation sd__Observation sd__Observation NA intercept NA NA NA NA sd__Observation NA ran_pars cond Residual 3.1846172 0.5090278 2.3806148 4.3333210

Created on 2021-01-27 by the reprex package (v0.3.0)

@storopoli
Copy link

Regarding:

  1. getting the model matrix by brms::standata() it seems OK because that is the appropriate function that brms tell you to use to get data as a list from brms model and X is indeed the model matrix.

  2. getting terms using brms::brmsterms(resp_rhs_all = TRUE) seems a perfect approach because models in brms can have multiple response variables (multivariate models). Also indeed purrr::pluck("allvars") seems to get all variables from the formula.

  3. sd__(Intercept) is the standard deviation (maybe median absolute deviation, since most of the estimates in bmrs are based on the median) for the group-level intercept. Since the formula has one intercept for each one of the gear (factor) (1 | gear). brms will compute a population-level intercept (Intercept) and the sd for all the group-level intercepts sd__(Intercept).

  4. sd__Observation is the model residuals (epsilon random error term)

  5. I took a look at the brms_crossedRE$formula to understand cor__(Intercept).wt and the formula is mpg ~ wt + (1 | cyl) + (1 + wt | gear) . My guess is, since we have two group-level intercepts (one per gear and other per cyl), brms will insert a correlation term for the intercepts. But that is far from my bayesian expertise.
    Here is the full summary for brms_crossedRE:

Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: mpg ~ wt + (1 | cyl) + (1 + wt | gear) 
   Data: mtcars (Number of observations: 32) 
Samples: 2 chains, each with iter = 100; warmup = 50; thin = 1;
         total post-warmup samples = 100

Group-Level Effects: 
~cyl (Number of levels: 3) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     4.57      3.64     0.35    14.63 1.12       13       50

~gear (Number of levels: 3) 
                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)         4.57      3.21     0.06    12.84 1.00       43       48
sd(wt)                1.71      1.32     0.12     4.92 1.03       27       77
cor(Intercept,wt)    -0.29      0.58    -0.96     0.89 1.00       45       61

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    33.89      3.81    26.28    40.47 1.04       54       31
wt           -3.91      1.39    -6.27    -1.17 1.03       27       60

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     2.59      0.41     2.03     3.63 1.01      118       77

Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Copy link

@storopoli storopoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think those files are OK.

@ddsjoberg
Copy link
Collaborator

@larmarange thank you kindly for updating to accommodate brms models. I see that it took some re-structuring and adding new S3 functions.

@storopoli
Copy link

Hey guys, I don't know if you are aware but {broom.mixed} has nice tidiers for brms models. Here is a very complex one with random intercepts and slopes

library(brms)
fit  <- brm(bf(mpg ~ 1 + (1 + hp | cyl), decomp = "QR"), data = mtcars, refresh = 0) 
broom.mixed::tidy(fit, conf.int = TRUE)
r$> broom.mixed::tidy(fit, conf.int = TRUE)                                                                                                                                                      
Registered S3 method overwritten by 'broom.mixed':
  method      from 
  tidy.gamlss broom
# A tibble: 5 x 8
  effect   component group    term                estimate std.error conf.low conf.high
  <chr>    <chr>     <chr>    <chr>                  <dbl>     <dbl>    <dbl>     <dbl>
1 fixed    cond      NA       (Intercept)          20.8       3.52   13.9        28.0  
2 ran_pars cond      cyl      sd__(Intercept)       7.09      4.12    1.21       17.5  
3 ran_pars cond      cyl      sd__hp                0.0440    0.0360  0.00184     0.143
4 ran_pars cond      cyl      cor__(Intercept).hp  -0.221     0.554  -0.976       0.881
5 ran_pars cond      Residual sd__Observation       3.27      0.480   2.48        4.33 

PS: Sorry reprex is not working well in my environment.

@larmarange
Copy link
Owner Author

Dear @storopoli broom.mixed::tidy is by default with mixed models.

See also #90 about better management of mixed models in general.

@larmarange
Copy link
Owner Author

@ddsjoberg R CMD check fails for R3.5 and R3.4. It says that "brms" is not available, although installed through historical version of CRAN.

Any idea?

@ddsjoberg
Copy link
Collaborator

i'll take a look tomorrow. I may post a bunch of commits to test FYI

@larmarange
Copy link
Owner Author

@ddsjoberg R CMD check fails for R3.5 and R3.4. It says that "brms" is not available, although installed through historical version of CRAN.

Any idea?

Maybe an option could be to not force suggested packages for R 3.4 and 3.5?

@ddsjoberg
Copy link
Collaborator

Maybe an option could be to not force suggested packages for R 3.4 and 3.5?

Do you mean not do checks on 3.5 and 3.4, or is there a way to skip the suggested packages?

I looked at 3.4 and it seems to be an issue installing rstan. Either of the options above seem reasonable

@larmarange
Copy link
Owner Author

larmarange commented Feb 2, 2021

In that case, it would be only a partial check for R 3.4 (i.e. for R3.4, we will skip the tests relative to brms). But I have to try to see if it works.

I think you may find that R Cmd check will fail if every package in Suggests cannot be installed (even if the checks using that package are skipped when it's not installed).

@larmarange
Copy link
Owner Author

OK with the change all checks passed.

Please note that for R 3.4, some tests are skipped due to the fact that brms is not available. However, I guess that it is acceptable for R 3.4 and R 3.5 to skip corresponding tests if some packages are not available. For R release, oldrel and devel, we still force all suggested packages to pass the tests.

@ddsjoberg
Copy link
Collaborator

That is amazing!
I can't wait to look at the code to see how you did it!

@larmarange
Copy link
Owner Author

larmarange commented Feb 2, 2021

I have separated checks in 2 workflows. For R 3.4 and R 3.5 I have added _R_CHECK_FORCE_SUGGESTS_: false when calling rcmdcheck.

      - name: Check
        env:
          _R_CHECK_CRAN_INCOMING_REMOTE_: false
          _R_CHECK_FORCE_SUGGESTS_: false
        run: rcmdcheck::rcmdcheck(args = c("--no-manual", "--as-cran"), error_on = "warning", check_dir = "check")
        shell: Rscript {0}

@codecov
Copy link

codecov bot commented Feb 2, 2021

Codecov Report

Merging #89 (45f483d) into master (4d44eec) will decrease coverage by 0.13%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #89      +/-   ##
==========================================
- Coverage   99.85%   99.71%   -0.14%     
==========================================
  Files          33       34       +1     
  Lines        1375     1414      +39     
==========================================
+ Hits         1373     1410      +37     
- Misses          2        4       +2     
Impacted Files Coverage Δ
R/model_get_xlevels.R 100.00% <ø> (ø)
R/model_compute_terms_contributions.R 96.82% <100.00%> (-3.18%) ⬇️
R/model_get_model_matrix.R 100.00% <100.00%> (ø)
R/model_get_terms.R 100.00% <100.00%> (ø)
R/model_identify_variables.R 100.00% <100.00%> (ø)
R/model_list_variables.R 100.00% <100.00%> (ø)
R/select_helpers.R 100.00% <100.00%> (ø)
R/tidy_add_header_rows.R 100.00% <100.00%> (ø)
R/tidy_add_term_labels.R 100.00% <100.00%> (ø)
R/tidy_add_variable_labels.R 100.00% <100.00%> (ø)
... and 3 more

@larmarange larmarange merged commit eaa823f into master Feb 2, 2021
@larmarange larmarange deleted the brms branch December 6, 2021 11:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support crr models in tidy_plus_plus Support brms regression models
3 participants