Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support multivariate non-linear models #7

Closed
DoktorMike opened this issue Nov 15, 2021 · 6 comments
Closed

Support multivariate non-linear models #7

DoktorMike opened this issue Nov 15, 2021 · 6 comments
Labels
enhancement New feature or request

Comments

@DoktorMike
Copy link

DoktorMike commented Nov 15, 2021

I am building a model using the non-linear interface in brms and it works like a charm with your package. Now I need to extend this to be a multivariate model, i.e., multiple response variables and I cannot seem to get it to work.

Brief description of the problem

library(tidyverse)
library(brms)
library(bayesian)

mydf <- tibble(
        x = sin(seq(1, 10, 0.25)),
        z = 2 * x + rnorm(length(x), 0, 0.5),
        y = 1 * x + rnorm(length(x), 0, 0.5)
)

myrec <- mydf %>%
        recipe() %>%
        update_role(z, new_role = "outcome") %>%
        update_role(y, new_role = "outcome") %>%
        update_role(x, new_role = "predictor") 
        #add_role(has_role("outcome"), new_role = "predictor")

myform <- bf(z ~ b1 * x, b1 ~ 1, nl = TRUE) +
        bf(y ~ b2 * x, b2 ~ 1, nl = TRUE)
myprior <- prior(normal(0, 1), nlpar = "b1", resp = "z") +
	prior(normal(0, 1), nlpar = "b2", resp = "y")
mymodel <- bayesian(family = normal(), cores = 4, chains = 4) %>%
        set_engine("brms") %>%
        set_mode("regression") %>%
        update(formula = z ~ z + y + x) %>%
        update(formula.override = bayesian_formula(myform)) %>%
        update(prior = myprior) %>%
        update(family = list("gaussian", "gaussian"))

myworkflow <- workflow() %>%
        add_recipe(myrec) %>%
        add_model(spec = mymodel)

myworkflow_fit <- myworkflow %>%
        fit(data = mydf)

I get the error message Error in formula.default(object, env = baseenv()) : invalid formula but when just using one response variable it works fine. So is this an unsupported feature or am I doing something wrong?

If you instead set the mymodel to be univariate like this:

# Single response variable works
mymodel <- bayesian(family = normal(), cores = 4, chains = 4) %>%
        set_engine("brms") %>%
        set_mode("regression") %>%
        update(formula.override = bayesian_formula(bf(z ~ b * x, b ~ 1, nl = T))) %>%
        update(prior = prior(normal(0, 1), nlpar = "b")) %>%
        update(family = "gaussian")

Everything works.

@hsbadr hsbadr added the enhancement New feature or request label Nov 19, 2021
@DoktorMike
Copy link
Author

DoktorMike commented Nov 25, 2021

Update:

I can get the training to work by setting the class of the formula manually instead of relying on bayesian_formula. Thus setting mymodel to this works.

# Multiple response variable FAILS
myform <- bf(z ~ b1 * x, b1 ~ 1, nl = TRUE) +
        bf(y ~ b2 * x, b2 ~ 1, nl = TRUE)
class(myform) <- union(class(myform), "formula") # Need to actually fit the mvmodel
myprior <- prior(normal(0, 1), nlpar = "b1", resp = "z") +
        prior(normal(0, 1), nlpar = "b2", resp = "y")
mymodel <- bayesian(family = normal(), cores = 4, chains = 4) %>%
        set_engine("brms") %>%
        set_mode("regression") %>%
        update(formula = z ~ z + y + x) %>%
        #update(formula.override = bayesian_formula(myform)) %>%
        update(formula.override = myform) %>%
        update(prior = myprior) %>%
        update(family = list("gaussian", "gaussian"))

The predictions do not work though. Most likely because parsnip doesn't know how to handle multivariate responses from brms. The error we get is: Error in results[, 1] : incorrect number of dimensions

The traceback is:

13: eval_tidy(xs[[j]], mask)
12: tibble_quos(xs, .rows, .name_repair)
11: tibble::tibble(.pred = results[, 1])
10: object$spec$method$pred$numeric$post(res, object)
9: predict_numeric.model_fit(object = object, new_data = new_data,
       ...)
8: predict_numeric(object = object, new_data = new_data, ...)
7: predict.model_fit(fit, new_data, type = type, opts = opts, ...)
6: predict(fit, new_data, type = type, opts = opts, ...)
5: predict.workflow(., new_data = mydf, type = NULL, level = 0.4)
4: predict(., new_data = mydf, type = NULL, level = 0.4)
3: list2(...)
2: bind_cols(., mydf)
1: myworkflow_fit %>% predict(new_data = mydf, type = NULL, level = 0.4) %>%
       bind_cols(mydf)

Tracked it down to happen in this function:

myworkflow_fit$fit$fit$spec$method$pred$numeric$post
function (results, object)
{
    tibble::tibble(.pred = results[, 1])
}

But where this is defined I haven't found.

@hsbadr
Copy link
Owner

hsbadr commented Nov 27, 2021

Thanks @DoktorMike! It'll likely need a minor workaround to avoid the expected linear formula or single family currently required by tidymodels. The main difference between bayesian_formula() and brms::brmsformula() is adding the formula class; so, it should work fine. But, the predictions need to detect and support lists or multiple response variables (or n-dimensional outcome, in general). I'll look into it, ASAP.

@DoktorMike
Copy link
Author

Sounds great @hsbadr ! Let me know if I can do anything to help.

hsbadr added a commit that referenced this issue Dec 30, 2021
@hsbadr
Copy link
Owner

hsbadr commented Dec 30, 2021

@DoktorMike Can you test this now? It should work fine. As for the predictions, you can use type = "raw" to get the predictions, or predict directly from the brmsfit object:

mybrmsfit <- myworkflow_fit %>%
  extract_fit_engine()
  
predict(mybrmsfit, head(mydf))

I'll add support for multivariate non-linear models in the predictions.

hsbadr added a commit that referenced this issue Dec 30, 2021
@hsbadr
Copy link
Owner

hsbadr commented Dec 30, 2021

You can now use the following:

myworkflow_fit %>% 
  predict(
    new_data = head(mydf),
    type = "raw"
  )

You may pass other arguments to predict.brmsfit as follows:

myworkflow_fit %>% 
  predict(
    new_data = head(mydf),
    type = "raw",
    opts = list(
      summary = TRUE,
      robust = FALSE,
      probs = c(0.025, 0.975)
    )
  )

@hsbadr hsbadr closed this as completed in 9674c77 Dec 30, 2021
@DoktorMike
Copy link
Author

Awesome @hsbadr ! I can confirm that this indeed works as expected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants