Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

caching of cmdstan binaries broken with >= 2.20 (for cmdstan >= 2.29) #1544

Closed
wds15 opened this issue Sep 25, 2023 · 10 comments
Closed

caching of cmdstan binaries broken with >= 2.20 (for cmdstan >= 2.29) #1544

wds15 opened this issue Sep 25, 2023 · 10 comments
Milestone

Comments

@wds15
Copy link
Contributor

wds15 commented Sep 25, 2023

When using cmdstanr as backend with a configured caching directory for the binaries, then this used to avoid recompilation of the Stan programs. In newer brms version this behaviour is lost and one always recompiles the Stan programs. Here is small example:

library(brms)
suppressPackageStartupMessages(library(here))
# instruct brms to use cmdstanr as backend and cache all Stan binaries
options(brms.backend="cmdstanr", cmdstanr_write_stan_file_dir=here("brms-cache"))
# create cache directory if not yet available
dir.create(here("brms-cache"), FALSE)

prior1 <- prior(normal(0,10), class = b) +
  prior(cauchy(0,2), class = sd)

fit1 <- brm(count ~ zAge + zBase * Trt + (1|patient),
            data = epilepsy, family = poisson(), prior = prior1)

## compiles the model AGAIN!!!
refit1 <- brm(count ~ zAge + zBase * Trt + (1|patient),
              data = epilepsy, family = poisson(), prior = prior1)


stan_model <- stancode(fit1)

stan_data <- standata(fit1)

## but ccmdstanr is not the issue here:
library(cmdstanr)

stan_file_1 <- write_stan_file(stan_model)
model_1 <- cmdstan_model(stan_file_1)

stan_file_2 <- write_stan_file(stan_model)
model_2 <- cmdstan_model(stan_file_2)

## the above does NOT recompile 

The issue appears to be here:

if (cmdstanr::cmdstan_version() >= "2.29.0") {

In the line below the overwrite_file flag is set to TRUE. This way the file is always overwritten and gets a new time-stamp which is newer than an already existing model binary and hence the recompile is triggered. Not sure why the file needs to be overwritten, but that should be avoided as this does otherwise force recompilation of already compiled models. Just run the above with brms 2.19 and then the recompilation of the model for the refit model does not happen.

@wds15
Copy link
Contributor Author

wds15 commented Sep 25, 2023

Actually I am not sure about brms 2.19... I used cmdstan 2.28 up until now where this problem does not exist.

@wds15
Copy link
Contributor Author

wds15 commented Sep 25, 2023

Maybe this Stan code canonicalise stuff can be made optional? Whenever one disables this, then stanc will do more complaints I guess, but I rather have stanc complain at me, but still get model binary caching.

@luwidmer
Copy link

Seems to have been introduced with this commit in 2.17.0: 70a1d44

@wds15
Copy link
Contributor Author

wds15 commented Sep 25, 2023

For anyone who wants to fix this right away... here is an ugly hack which disables model overwriting for all brms models:

fix_brms_recompile <- function() {
    cat("INFO: Disabling Stan model canonicalization procedure in brms which forces model recompilations!\n")
    # canonicalize Stan model file in accordance with the current Stan version... AND NEVER overwrite anything
    fixed_canonicalize_stan_model <- function(stan_file, overwrite_file = TRUE) {
        cmdstan_mod <- cmdstanr::cmdstan_model(stan_file, compile = FALSE)
        out <- utils::capture.output(
                          cmdstan_mod$format(
                                          canonicalize = list("deprecations", "braces", "parentheses"),
                                          overwrite_file = FALSE, backup = FALSE
                                      )
                      )
        paste0(out, collapse = "\n")
    }
    brms <- asNamespace("brms")
    unlockBinding(".canonicalize_stan_model", brms)
    assignInNamespace(".canonicalize_stan_model", fixed_canonicalize_stan_model, ns="brms", envir=brms)
    assign(".canonicalize_stan_model", fixed_canonicalize_stan_model, envir=brms)
    lockBinding(".canonicalize_stan_model", env=brms)
}

Not sure what side effects this has other than that stanc will report more warnings, but it does seem to work for me ok as of now (just call the function and after that brms is "fixed")

@paul-buerkner
Copy link
Owner

@rok-cesnovar do you have a suggestion how to best resolve this? Also, do you remember why we put the cmdstan > 2.29 version check in that one place mentioned above?

@wds15
Copy link
Contributor Author

wds15 commented Sep 26, 2023

In all honesty, I think that there is a strong rationale to drop this feature from brms to canonicalise the models. I mean, a user can put the respective stanc3 options in the make/local file of cmdstan (STANCFLAGS). To make it easier for users to enable this permanently a good place to do so is the cmdstanr documentation, for example.

The >= 2.29 version check ensures that in this case there does not exist any rstan version with this high version number, I think. This way you ensure that the canonicalisation is only done when it is safe to assume that rstan is not used, which was needed to export Stan functions from models build under cmdstan. However, this hack (convert a cmdstan model to rstan in order to expose the functions) is not needed any more, since we now have in cmdstanr 0.6.1 the functionality to export Stan user functions.

@paul-buerkner
Copy link
Owner

Just to make sure I understand correctly, you are suggesting to never canonicalize the model?

@wds15
Copy link
Contributor Author

wds15 commented Sep 26, 2023

Yes, that is the suggestion here.

This would cause that the user gets to see warnings from stanc3 if using newer cmdstan installations, since brms is emitting oldish code which still works with oldish rstan.

Not sure if there are other arguments for canonicalization beyond that one avoids warnings.

The argument is that users can setup cmdstan such that the canonicalization is always done for the user in full automation (and not just when they use brms).

The argument to include this into brms is that this makes brms emit "nicer" Stan code without changing brms itself.

However, maybe brms can anyway be ported now to the new array syntax, which is the main motivation here (I guess)? Now rstan 2.26 is on CRAN such that the new array syntax is supported if I recall right.

@paul-buerkner
Copy link
Owner

Makes sense. brms is already using the new array syntax since recently so that should not be a concern anymore.

@paul-buerkner paul-buerkner added this to the 2.21.0 milestone Sep 26, 2023
@paul-buerkner
Copy link
Owner

okay. I have commented out the relevant code for now. I did a CRAN release only recently so we have some time now to safely try it out on github and see if not canonicalizing causes any trouble.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants