Add an R function to convert data list to JSON #202

Open
avehtari opened this issue Jan 28, 2024 · 2 comments
Labels
dependencies Pull requests that update a dependency file enhancement New feature or request R-lang

Comments

@avehtari

In the R interface, StanModel$new() accepts data as "a JSON string literal, a path to a data file in JSON format ending in “.json”". CmdStanR has a handy function, write_stan_json() (https://mc-stan.org/cmdstanr/reference/write_stan_json.html), that writes an R data list to a JSON file compatible with CmdStan (the function does more than directly write the list as JSON).

  1. It might be useful to have this in bridgestan without a dependency on cmdstanr.
  2. It would be awesome to have a function that doesn't require explicitly writing to a file and instead just returns a JSON string literal.

The use case for 2. is when the model needs to be instantiated with new data with minimal overhead. In the following code snippet, sd is a data list; we update part of the list, call new(), and then use the log density, gradient, and Hessian. Based on timing experiments, this is slightly faster than using `write_stan_json()` to write to a file and passing the file path to `new()`:

sd$tau <- draws_sd[s, ]
model <- StanModel$new(model_so, to_stan_json(sd), SEED)

Another use case is using BridgeStan with brms-generated models and data:

model_so <- compile_model(write_stan_file(stancode(brms_fit)))
model <- StanModel$new(model_so, to_stan_json(stan_data(brms_fit)), SEED)

to_stan_json() can be adapted from write_stan_json() by removing the references to the file (the cmdstanr code is under the BSD-3 license).

to_stan_json <- function(data, always_decimal = FALSE) {
  if (!is.list(data)) {
    stop("'data' must be a list.", call. = FALSE)
  }

  data_names <- names(data)
  if (length(data) > 0 &&
      (length(data_names) == 0 ||
       length(data_names) != sum(nzchar(data_names)))) {
    stop("All elements in 'data' list must have names.", call. = FALSE)
  }
  if (anyDuplicated(data_names) != 0) {
    stop("Duplicate names not allowed in 'data'.", call. = FALSE)
  }

  for (var_name in data_names) {
    var <- data[[var_name]]
    if (!(is.numeric(var) || is.factor(var) || is.logical(var) ||
          is.data.frame(var) || is.list(var))) {
      stop("Variable '", var_name, "' is of invalid type.", call. = FALSE)
    }
    if (anyNA(var)) {
      stop("Variable '", var_name, "' has NA values.", call. = FALSE)
    }

    if (is.table(var)) {
      var <- unclass(var)
    } else if (is.logical(var)) {
      mode(var) <- "integer"
    } else if (is.data.frame(var)) {
      var <- data.matrix(var)
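    } else if (is.list(var)) {
      # list_to_array() is an internal cmdstanr helper, so it must be
      # defined alongside this function for the list branch to work.
      var <- list_to_array(var, var_name)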
    }
    data[[var_name]] <- var
  }

  # unboxing variables (N = 10 is stored as N : 10, not N: [10])
  jsonlite::toJSON(
    data,
    auto_unbox = TRUE,
    factor = "integer",
    always_decimal = always_decimal,
    digits = NA,
    pretty = TRUE
  )
}
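
The function above calls list_to_array(), which the snippet does not define; in cmdstanr it is an internal helper. A minimal base-R sketch of such a helper follows. It assumes every list element is a numeric vector, matrix, or array with identical dimensions, and it is an illustration written for this thread, not a copy of cmdstanr's code.

```r
# Sketch: stack a list of equally sized numeric vectors/matrices into one
# array whose first index is the position in the list.
# Assumption: this is the behaviour to_stan_json() relies on; it is not
# taken from cmdstanr.
list_to_array <- function(x, name = NULL) {
  n <- length(x)
  if (n == 0) {
    return(NULL)
  }
  # Dimensions of each element; plain vectors are treated as 1-D.
  dims <- lapply(x, function(z) if (is.null(dim(z))) length(z) else dim(z))
  same_dims <- all(vapply(dims, function(d) identical(d, dims[[1]]), logical(1)))
  if (!same_dims) {
    stop("All elements in list '", name, "' must have the same dimensions.",
         call. = FALSE)
  }
  if (!all(vapply(x, is.numeric, logical(1)))) {
    stop("All elements in list '", name, "' must be numeric.", call. = FALSE)
  }
  elem_ndim <- length(dims[[1]])
  # unlist() flattens each element column-major, so the list index ends up last.
  arr <- array(unlist(x), dim = c(dims[[1]], n))
  # Move the list index from the last position to the first.
  aperm(arr, c(elem_ndim + 1L, seq_len(elem_ndim)))
}

mats <- list(matrix(1:4, 2, 2), matrix(5:8, 2, 2))
arr <- list_to_array(mats, "mats")
```

Here arr[1, , ] recovers the first matrix and arr[2, , ] the second, which matches how a Stan declaration such as array[2] matrix[2, 2] indexes its elements.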
@WardBrian
Collaborator

I agree it would be nice if this code could be easier to use outside cmdstanr. That said, I'm not sure BridgeStan is the right place for it, either.

Recently, for cmdstanpy, I spun off stanio to hold this code. cmdstanpy now depends on it, and other projects like bridgestan could use it more easily. It has both dump_stan_json (returns a string) and write_stan_json (writes to a file). The StanJulia universe did something similar recently.

I would suggest seeing if the cmdstanr folks would be willing to do this. Ideally it would be a very small package that could be put on CRAN, and both bridgestan and cmdstanr could take a dependency on it.
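
As a rough illustration of the string/file pair described above, a small R package could expose two entry points where the file writer is a thin wrapper around the string builder. The function names below are hypothetical and not part of any existing package, and the conversion is reduced to a bare jsonlite::toJSON() call without the validation that write_stan_json() performs.

```r
library(jsonlite)

# Hypothetical name: returns the JSON for a data list as a plain string.
dump_stan_json_sketch <- function(data) {
  as.character(toJSON(data, auto_unbox = TRUE, digits = NA))
}

# Hypothetical name: writes the same JSON to a file and returns the path.
write_stan_json_sketch <- function(data, file) {
  writeLines(dump_stan_json_sketch(data), file)
  invisible(file)
}

sd <- list(N = 3L, y = c(0.5, 1.25, 2))
json_string <- dump_stan_json_sketch(sd)
json_path <- write_stan_json_sketch(sd, tempfile(fileext = ".json"))
```

With this layout the two functions cannot drift apart, because the file on disk always contains exactly what the string function returns.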

@bob-carpenter
Collaborator

Last time I used BridgeStan, I tried to use a dictionary and then realized I had to go look up JSON literal syntax instead. I think it would be worth having a dictionary interface for data in BridgeStan. For R, the equivalent structure is a list.
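
One way such a list interface might look in R is a small normalising step in front of the model constructor: lists are converted to JSON, while strings (a JSON literal or a ".json" path) pass through unchanged. The helper name below is hypothetical, and the conversion is a bare jsonlite::toJSON() call standing in for a fuller converter.

```r
# Hypothetical helper: accept either an R list or an existing JSON
# string / ".json" path and always return a single string.
as_stan_data_json <- function(data) {
  if (is.list(data)) {
    # Assumption: a plain toJSON() call stands in for a full converter.
    return(as.character(jsonlite::toJSON(data, auto_unbox = TRUE, digits = NA)))
  }
  if (is.character(data) && length(data) == 1) {
    return(data)
  }
  stop("'data' must be a list, a JSON string, or a path to a .json file.",
       call. = FALSE)
}

from_list <- as_stan_data_json(list(N = 2L, y = c(1.5, 2.5)))
from_path <- as_stan_data_json("data.json")
```

The result could then be handed to StanModel$new() in place of a hand-written JSON literal.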

@WardBrian WardBrian added enhancement New feature or request R-lang dependencies Pull requests that update a dependency file labels Jan 31, 2024