Support multivariate non-linear models #7

DoktorMike · 2021-11-15T20:54:20Z

I am building a model using the non-linear interface in brms and it works like a charm with your package. Now I need to extend this to be a multivariate model, i.e., multiple response variables and I cannot seem to get it to work.

Brief description of the problem

library(tidyverse)
library(brms)
library(bayesian)

mydf <- tibble(
        x = sin(seq(1, 10, 0.25)),
        z = 2 * x + rnorm(length(x), 0, 0.5),
        y = 1 * x + rnorm(length(x), 0, 0.5)
)

myrec <- mydf %>%
        recipe() %>%
        update_role(z, new_role = "outcome") %>%
        update_role(y, new_role = "outcome") %>%
        update_role(x, new_role = "predictor") 
        #add_role(has_role("outcome"), new_role = "predictor")

myform <- bf(z ~ b1 * x, b1 ~ 1, nl = TRUE) +
        bf(y ~ b2 * x, b2 ~ 1, nl = TRUE)
myprior <- prior(normal(0, 1), nlpar = "b1", resp = "z") +
	prior(normal(0, 1), nlpar = "b2", resp = "y")
mymodel <- bayesian(family = normal(), cores = 4, chains = 4) %>%
        set_engine("brms") %>%
        set_mode("regression") %>%
        update(formula = z ~ z + y + x) %>%
        update(formula.override = bayesian_formula(myform)) %>%
        update(prior = myprior) %>%
        update(family = list("gaussian", "gaussian"))

myworkflow <- workflow() %>%
        add_recipe(myrec) %>%
        add_model(spec = mymodel)

myworkflow_fit <- myworkflow %>%
        fit(data = mydf)

I get the error message Error in formula.default(object, env = baseenv()) : invalid formula but when just using one response variable it works fine. So is this an unsupported feature or am I doing something wrong?

If you instead set the mymodel to be univariate like this:

# Single response variable works
mymodel <- bayesian(family = normal(), cores = 4, chains = 4) %>%
        set_engine("brms") %>%
        set_mode("regression") %>%
        update(formula.override = bayesian_formula(bf(z ~ b * x, b ~ 1, nl = T))) %>%
        update(prior = prior(normal(0, 1), nlpar = "b")) %>%
        update(family = "gaussian")

Everything works.

The text was updated successfully, but these errors were encountered:

DoktorMike · 2021-11-25T13:14:22Z

Update:

I can get the training to work by setting the class of the formula manually instead of relying on bayesian_formula. Thus setting mymodel to this works.

# Multiple response variable FAILS
myform <- bf(z ~ b1 * x, b1 ~ 1, nl = TRUE) +
        bf(y ~ b2 * x, b2 ~ 1, nl = TRUE)
class(myform) <- union(class(myform), "formula") # Need to actually fit the mvmodel
myprior <- prior(normal(0, 1), nlpar = "b1", resp = "z") +
        prior(normal(0, 1), nlpar = "b2", resp = "y")
mymodel <- bayesian(family = normal(), cores = 4, chains = 4) %>%
        set_engine("brms") %>%
        set_mode("regression") %>%
        update(formula = z ~ z + y + x) %>%
        #update(formula.override = bayesian_formula(myform)) %>%
        update(formula.override = myform) %>%
        update(prior = myprior) %>%
        update(family = list("gaussian", "gaussian"))

The predictions do not work though. Most likely because parsnip doesn't know how to handle multivariate responses from brms. The error we get is: Error in results[, 1] : incorrect number of dimensions

The traceback is:

13: eval_tidy(xs[[j]], mask)
12: tibble_quos(xs, .rows, .name_repair)
11: tibble::tibble(.pred = results[, 1])
10: object$spec$method$pred$numeric$post(res, object)
9: predict_numeric.model_fit(object = object, new_data = new_data,
       ...)
8: predict_numeric(object = object, new_data = new_data, ...)
7: predict.model_fit(fit, new_data, type = type, opts = opts, ...)
6: predict(fit, new_data, type = type, opts = opts, ...)
5: predict.workflow(., new_data = mydf, type = NULL, level = 0.4)
4: predict(., new_data = mydf, type = NULL, level = 0.4)
3: list2(...)
2: bind_cols(., mydf)
1: myworkflow_fit %>% predict(new_data = mydf, type = NULL, level = 0.4) %>%
       bind_cols(mydf)

Tracked it down to happen in this function:

myworkflow_fit$fit$fit$spec$method$pred$numeric$post
function (results, object)
{
    tibble::tibble(.pred = results[, 1])
}

But where this is defined I haven't found.

hsbadr · 2021-11-27T00:19:12Z

Thanks @DoktorMike! It'll likely need a minor workaround to avoid the expected linear formula or single family currently required by tidymodels. The main difference between bayesian_formula() and brms::brmsformula() is adding the formula class; so, it should work fine. But, the predictions need to detect and support lists or multiple response variables (or n-dimensional outcome, in general). I'll look into it, ASAP.

DoktorMike · 2021-11-28T10:16:00Z

Sounds great @hsbadr ! Let me know if I can do anything to help.

Partially address #7.

hsbadr · 2021-12-30T17:30:57Z

@DoktorMike Can you test this now? It should work fine. As for the predictions, you can use type = "raw" to get the predictions, or predict directly from the brmsfit object:

mybrmsfit <- myworkflow_fit %>%
  extract_fit_engine()
  
predict(mybrmsfit, head(mydf))

I'll add support for multivariate non-linear models in the predictions.

Partially addresses #7.

hsbadr · 2021-12-30T17:53:28Z

You can now use the following:

myworkflow_fit %>% 
  predict(
    new_data = head(mydf),
    type = "raw"
  )

You may pass other arguments to predict.brmsfit as follows:

myworkflow_fit %>% 
  predict(
    new_data = head(mydf),
    type = "raw",
    opts = list(
      summary = TRUE,
      robust = FALSE,
      probs = c(0.025, 0.975)
    )
  )

DoktorMike · 2021-12-31T12:59:33Z

Awesome @hsbadr ! I can confirm that this indeed works as expected.

hsbadr added the enhancement New feature or request label Nov 19, 2021

hsbadr added a commit that referenced this issue Dec 30, 2021

bayesian_formula: Handle mvbrmsformula class

2c05127

Partially address #7.

hsbadr added a commit that referenced this issue Dec 30, 2021

bayesian: Pass summary argument in raw predictions

e9bbb2a

Partially addresses #7.

hsbadr closed this as completed in 9674c77 Dec 30, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support multivariate non-linear models #7

Support multivariate non-linear models #7

DoktorMike commented Nov 15, 2021 •

edited

DoktorMike commented Nov 25, 2021 •

edited

hsbadr commented Nov 27, 2021

DoktorMike commented Nov 28, 2021

hsbadr commented Dec 30, 2021

hsbadr commented Dec 30, 2021

DoktorMike commented Dec 31, 2021

Support multivariate non-linear models #7

Support multivariate non-linear models #7

Comments

DoktorMike commented Nov 15, 2021 • edited

DoktorMike commented Nov 25, 2021 • edited

hsbadr commented Nov 27, 2021

DoktorMike commented Nov 28, 2021

hsbadr commented Dec 30, 2021

hsbadr commented Dec 30, 2021

DoktorMike commented Dec 31, 2021

DoktorMike commented Nov 15, 2021 •

edited

DoktorMike commented Nov 25, 2021 •

edited