Can't use lme4::lmer() inside a targets workflow #160

Closed
3 tasks done
gorkang opened this issue Sep 17, 2020 · 3 comments

@gorkang

gorkang commented Sep 17, 2020

Prework

  • I understand and agree to the code of conduct.
  • I understand and agree to the contributing guidelines.
  • Be considerate of the maintainer's time and make it as easy as possible to troubleshoot any problems you identify. Read here and here to learn about minimal reproducible examples. Format your code according to the tidyverse style guide to make it easier for others to read.

Description

I can't find a way to use lme4::lmer() inside a targets workflow when the data object is created in one of the previous steps.

As shown below, the model fits when data = ggplot2::mpg (model2), but fails with an error when data = DF_raw (model3).

I assume I am doing something wrong, which is why I am filing this as a "Trouble" issue.

Thanks for the help!

Reproducible example

_targets.R file:

library(targets)

read_data <- function() {
  DF_raw = ggplot2::mpg
  return(DF_raw)
}


targets::tar_option_set(packages = c("targets", "ggplot2", "lme4"))


targets <- list(

  tar_target(DF_raw, read_data()),
  
  tar_target(model1, lm(year ~ displ + manufacturer, data = DF_raw)),
  tar_target(model2, lme4::lmer(year ~ displ + (1|manufacturer), data = ggplot2::mpg)),
  tar_target(model3, lme4::lmer(year ~ displ + (1|manufacturer), data = DF_raw))
  
)

tar_pipeline(targets)
#> <pipeline with 4 targets>

Created on 2020-09-17 by the reprex package (v0.3.0)

Session info
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 4.0.2 (2020-06-22)
#>  os       Ubuntu 18.04.5 LTS          
#>  system   x86_64, linux-gnu           
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  ctype    en_US.UTF-8                 
#>  tz       Atlantic/Canary             
#>  date     2020-09-17                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version    date       lib source                          
#>  assertthat    0.2.1      2019-03-21 [1] CRAN (R 4.0.2)                  
#>  backports     1.1.10     2020-09-15 [1] CRAN (R 4.0.2)                  
#>  callr         3.4.4      2020-09-07 [1] CRAN (R 4.0.2)                  
#>  cli           2.0.2      2020-02-28 [1] CRAN (R 4.0.2)                  
#>  codetools     0.2-16     2018-12-24 [4] CRAN (R 4.0.0)                  
#>  crayon        1.3.4      2017-09-16 [1] CRAN (R 4.0.2)                  
#>  data.table    1.13.0     2020-07-24 [1] CRAN (R 4.0.2)                  
#>  desc          1.2.0      2018-05-01 [1] CRAN (R 4.0.2)                  
#>  devtools      2.3.1      2020-07-21 [1] CRAN (R 4.0.2)                  
#>  digest        0.6.25     2020-02-23 [1] CRAN (R 4.0.2)                  
#>  ellipsis      0.3.1      2020-05-15 [1] CRAN (R 4.0.2)                  
#>  evaluate      0.14       2019-05-28 [1] CRAN (R 4.0.2)                  
#>  fansi         0.4.1      2020-01-08 [1] CRAN (R 4.0.2)                  
#>  fs            1.5.0      2020-07-31 [1] CRAN (R 4.0.2)                  
#>  glue          1.4.2      2020-08-27 [1] CRAN (R 4.0.2)                  
#>  highr         0.8        2019-03-20 [1] CRAN (R 4.0.2)                  
#>  htmltools     0.5.0      2020-06-16 [1] CRAN (R 4.0.2)                  
#>  igraph        1.2.5      2020-03-19 [1] CRAN (R 4.0.2)                  
#>  knitr         1.29.5     2020-09-09 [1] Github (yihui/knitr@7d2dd40)    
#>  lifecycle     0.2.0      2020-03-06 [1] CRAN (R 4.0.2)                  
#>  magrittr      1.5        2014-11-22 [1] CRAN (R 4.0.2)                  
#>  memoise       1.1.0      2017-04-21 [1] CRAN (R 4.0.2)                  
#>  pillar        1.4.6      2020-07-10 [1] CRAN (R 4.0.2)                  
#>  pkgbuild      1.1.0      2020-07-13 [1] CRAN (R 4.0.2)                  
#>  pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.0.2)                  
#>  pkgload       1.1.0      2020-05-29 [1] CRAN (R 4.0.2)                  
#>  prettyunits   1.1.1      2020-01-24 [1] CRAN (R 4.0.2)                  
#>  processx      3.4.4      2020-09-03 [1] CRAN (R 4.0.2)                  
#>  ps            1.3.4      2020-08-11 [1] CRAN (R 4.0.2)                  
#>  purrr         0.3.4      2020-04-17 [1] CRAN (R 4.0.2)                  
#>  R6            2.4.1      2019-11-12 [1] CRAN (R 4.0.2)                  
#>  remotes       2.2.0      2020-07-21 [1] CRAN (R 4.0.2)                  
#>  rlang         0.4.7      2020-07-09 [1] CRAN (R 4.0.2)                  
#>  rmarkdown     2.3        2020-06-18 [1] CRAN (R 4.0.2)                  
#>  rprojroot     1.3-2      2018-01-03 [1] CRAN (R 4.0.2)                  
#>  sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 4.0.2)                  
#>  stringi       1.5.3      2020-09-09 [1] CRAN (R 4.0.2)                  
#>  stringr       1.4.0      2019-02-10 [1] CRAN (R 4.0.2)                  
#>  targets     * 0.0.0.9000 2020-09-10 [1] Github (wlandau/targets@37af27f)
#>  testthat      2.3.2      2020-03-02 [1] CRAN (R 4.0.2)                  
#>  tibble        3.0.3      2020-07-10 [1] CRAN (R 4.0.2)                  
#>  tidyselect    1.1.0      2020-05-11 [1] CRAN (R 4.0.2)                  
#>  usethis       1.6.1      2020-04-29 [1] CRAN (R 4.0.2)                  
#>  vctrs         0.3.4      2020-08-29 [1] CRAN (R 4.0.2)                  
#>  withr         2.2.0      2020-04-20 [1] CRAN (R 4.0.2)                  
#>  xfun          0.17       2020-09-09 [1] CRAN (R 4.0.2)                  
#>  yaml          2.2.1      2020-02-01 [1] CRAN (R 4.0.2)                  
#> 
#> [1] /home/emrys/R/x86_64-pc-linux-gnu-library/4.0
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/lib/R/site-library
#> [4] /usr/lib/R/library

When running targets::tar_make() I get the following output:

targets::tar_make()
● run target DF_raw
● run target model2
boundary (singular) fit: see ?isSingular
● run target model1
● run target model3
x error target model3
Error : 'data' not found, and some variables missing from formula environment .
Error: callr subprocess failed: 'data' not found, and some variables missing from formula environment .
Type .Last.error.trace to see where the error occured

Desired result

I would expect the model3 target to run without errors.

Diagnostic information

packageDescription("targets")$GithubSHA1
[1] "37af27f3b30c620a0dde66ff7d0120f0af98630a"

@wlandau
Collaborator

wlandau commented Sep 17, 2020

This is an instance of ropensci/drake#1012 and a permanent consequence of how targets manages in-memory data. As a workaround, you can write a custom function to force the data into the environment of your formula. Example:

library(targets)

tar_script({
  library(targets)
  options(crayon.enabled = FALSE)
  tar_option_set(packages = c("targets", "ggplot2", "lme4"))
  read_data <- function() {
    DF_raw = ggplot2::mpg
    return(DF_raw)
  }
  run_lmer <- function(data) {
    envir <- environment()
    envir$data <- data
    f <- as.formula("Reaction ~ Days + (Days | Subject)", env = envir)
    lme4::lmer(f, data = data)
  }
  tar_pipeline(
    tar_target(df, read_data()),
    tar_target(model, lme4::lmer(year ~ displ + (1|manufacturer), data = df))
  )
})

tar_make()
#> ● run target df
#> ● run target model
#> boundary (singular) fit: see ?isSingular

tar_read(model)
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: year ~ displ + (1 | manufacturer)
#>    Data: df
#> REML criterion at convergence: 1364.562
#> Random effects:
#>  Groups       Name        Std.Dev.
#>  manufacturer (Intercept) 0.00    
#>  Residual                 4.47    
#> Number of obs: 234, groups:  manufacturer, 15
#> Fixed Effects:
#> (Intercept)        displ  
#>   2001.7084       0.5161  
#> convergence code 0; 0 optimizer warnings; 1 lme4 warnings

Created on 2020-09-17 by the reprex package (v0.3.0)
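
Note that the pipeline above defines run_lmer() but never calls it, and the formula inside the helper refers to the sleepstudy columns rather than to mpg. As a rough sketch of how the workaround might look for the data in this issue, the helper could take the formula as a string and build it in its own frame, where `data` is bound. The name `fit_lmer_in_env()` and its arguments are hypothetical and do not come from targets or lme4; whether this resolves the error with the package versions reported above has not been verified here.

```r
# Hypothetical helper (not part of targets or lme4): build the formula from a
# string so that its environment is this function's frame, where `data` lives.
fit_lmer_in_env <- function(formula_string, data) {
  force(data)  # evaluate `data` now so it is bound in this frame
  f <- as.formula(formula_string, env = environment())
  lme4::lmer(f, data = data)
}

# Stand-alone check outside of any pipeline, using the data from the issue.
DF_raw <- ggplot2::mpg
model3 <- fit_lmer_in_env("year ~ displ + (1 | manufacturer)", DF_raw)
class(model3)
```

Inside the pipeline, the failing target would then presumably become something like tar_target(model3, fit_lmer_in_env("year ~ displ + (1 | manufacturer)", DF_raw)), with the helper defined in _targets.R next to read_data().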

@wlandau wlandau closed this as completed Sep 17, 2020
@gorkang
Author

gorkang commented Sep 18, 2020

Thanks for the quick response @wlandau

I was trying to implement the custom function solution, but apparently there is something else (or something more) going on here.

It seems lmer() models work whenever the data object is called df, with no need for a custom function.

targets example, without the custom function and with the data called df:

library(targets)

tar_script({
  library(targets)
  options(crayon.enabled = FALSE)
  tar_option_set(packages = c("targets", "ggplot2", "lme4"))
  read_data <- function() {
    DF_raw = ggplot2::mpg
    return(DF_raw)
  }
  # run_lmer <- function(data) {
  #   envir <- environment()
  #   envir$data <- data
  #   f <- as.formula("Reaction ~ Days + (Days | Subject)", env = envir)
  #   lme4::lmer(f, data = data)
  # }
  tar_pipeline(
    tar_target(df, read_data()),
    tar_target(model, lme4::lmer(year ~ displ + (1|manufacturer), data = df))
  )
})

tar_make()
#> ● run target df
#> ● run target model
#> boundary (singular) fit: see ?isSingular

tar_read(model)
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: year ~ displ + (1 | manufacturer)
#>    Data: df
#> REML criterion at convergence: 1364.562
#> Random effects:
#>  Groups       Name        Std.Dev.
#>  manufacturer (Intercept) 0.00    
#>  Residual                 4.47    
#> Number of obs: 234, groups:  manufacturer, 15
#> Fixed Effects:
#> (Intercept)        displ  
#>   2001.7084       0.5161  
#> convergence code 0; 0 optimizer warnings; 1 lme4 warnings

Created on 2020-09-18 by the reprex package (v0.3.0)

I'm not sure whether I should post this in the ropensci/drake#1012 thread, but just in case: the same happens in drake. Everything is fine as long as the data is called df:

drake Example (see mod2):

library(drake)
suppressPackageStartupMessages(library(lme4))

fit_lmer <- function(dat) {
  envir <- environment()
  envir$dat <- dat
  f <- as.formula("Reaction ~ Days + (Days | Subject)", env = envir)
  lme4::lmer(f, data = dat)
}

plan <- drake_plan(
  dat = sleepstudy,
  df = sleepstudy,
  mod = fit_lmer(dat),
  mod2 = lme4::lmer("Reaction ~ Days + (Days | Subject)", df)
)

make(plan)
#> ▶ target df
#> ▶ target dat
#> ▶ target mod2
#> ▶ target mod

Created on 2020-09-18 by the reprex package (v0.3.0)

@wlandau
Collaborator

wlandau commented Sep 18, 2020

Interesting. I wonder what's so special about df.

In any case, it may be worth contacting the maintainers of lme4 about this. Seems like it's lmer()'s job to check the parent frame.
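
One possible explanation for why df behaves differently, which this thread does not confirm, is that df is also the name of a function in the stats package (the F distribution density). If lmer() decides whether data exists by looking the symbol up by name, the lookup would find stats::df even when the data frame itself lives in an environment lmer() cannot see. That lmer() performs such a name-based check is an assumption here. The base R behaviour behind it can be seen with:

```r
# A name-based lookup succeeds for `df` because stats::df is on the search
# path, regardless of whether a data frame called `df` exists anywhere.
df_found <- exists("df")
df_is_function <- is.function(get("df"))

# A name such as `DF_raw` is found only if an object with that name has
# actually been created in a visible environment.
df_raw_found <- exists("DF_raw")

df_found
df_is_function
df_raw_found
```

If this is what happens, any data name that collides with an existing object on the search path would appear to work, while a name unique to the pipeline would trigger the error.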
