Cannot create and access a disk.frame target in the same make() #1077

Closed
3 tasks done
brendanf opened this issue Nov 20, 2019 · 3 comments

@brendanf
Contributor

Prework

Description

I ran into this while playing around with #1070 and #1076, but it seems to be a general issue with format = "diskframe".

make() fails with an error when a target stored using format = "diskframe" is built and accessed during the same make(). The error does not occur if building and accessing occur in separate calls to make(), or if the target is not stored using format = "diskframe" (even if it actually is a disk.frame).

Reproducible example

failing
library(drake)
library(magrittr)
n <- 200
observations = data.frame(
  type = sample(letters[1:3], n, replace = TRUE),
  size = runif(n),
  stringsAsFactors = FALSE
)

plan <- drake_plan(
  all_data = target(
    disk.frame::as.disk.frame(
      observations,
      shardby = "type",
      outdir = drake_tempfile()
    ),
    format = "diskframe"
  ),
  result = target(
    all_data %>%
      disk.frame::chunk_group_by(type) %>%
      disk.frame::chunk_summarize(mean = mean(size)) %>%
      as.data.frame()
  )
)

make(plan) # error
#> target all_data
#> target result
#> fail result
#> Error: Target `result` failed. Call `diagnose(result)` for details. Error message:
#>   [ENOENT] Failed to search directory '/tmp/Rtmpbh73wg/reprex3de91a00873f/.drake/drake/tmp/file4d6258852284': no such file or directory
make(plan) # works the second time.
#> target all_data
#> target result
#> Target result messages:
#>   
#> Attaching package: 'purrr'
#> 
#>   The following object is masked from 'package:magrittr':
#> 
#>     set_names

Created on 2019-11-20 by the reprex package (v0.3.0)

no error when building and referencing are in different calls to `make()`
library(drake)
library(magrittr)
n <- 200
observations = data.frame(
  type = sample(letters[1:3], n, replace = TRUE),
  size = runif(n),
  stringsAsFactors = FALSE
)

plan <- drake_plan(
  all_data = target(
    disk.frame::as.disk.frame(
      observations,
      shardby = "type",
      outdir = ignore(drake_tempfile())
    ),
    format = "diskframe"
  ),
  result = target(
    all_data %>%
      disk.frame::chunk_group_by(type) %>%
      disk.frame::chunk_summarize(mean = mean(size)) %>%
      as.data.frame()
  )
)
make(plan, targets = "all_data")
#> In drake, consider r_make() instead of make(). r_make() runs make() in a fresh R session for enhanced robustness and reproducibility.
#> target all_data
deps_profile(all_data, drake_config(plan)) #nothing changed
#> # A tibble: 5 x 4
#>   name     changed old              new             
#>   <chr>    <lgl>   <chr>            <chr>           
#> 1 command  FALSE   e23d9c0137184274 e23d9c0137184274
#> 2 depend   FALSE   8367e81881e846e7 8367e81881e846e7
#> 3 file_in  FALSE   ""               ""              
#> 4 file_out FALSE   ""               ""              
#> 5 seed     FALSE   1605012276       1605012276
make(plan) #success, but why did we build all_data?
#> target all_data
#> target result
#> Target result messages:
#>   
#> Attaching package: 'purrr'
#> 
#>   The following object is masked from 'package:magrittr':
#> 
#>     set_names
readd(result)
#>   type      mean
#> 1    c 0.5116828
#> 2    a 0.5130528
#> 3    b 0.4180417

Created on 2019-11-20 by the reprex package (v0.3.0)

no error when the target is not stored with `format = "diskframe"`
library(drake)
library(magrittr)
n <- 200
observations = data.frame(
  type = sample(letters[1:3], n, replace = TRUE),
  size = runif(n),
  stringsAsFactors = FALSE
)

plan <- drake_plan(
  all_data = target(
    disk.frame::as.disk.frame(
      observations,
      shardby = "type",
      outdir = ignore(drake_tempfile())
    )
  ),
  result = target(
    all_data %>%
      disk.frame::chunk_group_by(type) %>%
      disk.frame::chunk_summarize(mean = mean(size)) %>%
      as.data.frame()
  )
)

make(plan)
#> In drake, consider r_make() instead of make(). r_make() runs make() in a fresh R session for enhanced robustness and reproducibility.
#> target all_data
#> target result
#> Target result messages:
#>   
#> Attaching package: 'purrr'
#> 
#>   The following object is masked from 'package:magrittr':
#> 
#>     set_names
readd(result)
#>   type      mean
#> 1    c 0.5081372
#> 2    a 0.4998035
#> 3    b 0.5227328

Created on 2019-11-20 by the reprex package (v0.3.0)

Expected result

The disk.frame target should be accessible from the same make() that it was created in.

Session info

Using 32b0695

devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 3.5.1 (2018-07-02)
#>  os       Ubuntu 18.04.3 LTS          
#>  system   x86_64, linux-gnu           
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  ctype    en_US.UTF-8                 
#>  tz       Europe/Stockholm            
#>  date     2019-11-20                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version    date       lib source                         
#>  assertthat    0.2.1      2019-03-21 [1] CRAN (R 3.5.1)                 
#>  backports     1.1.5      2019-10-02 [1] CRAN (R 3.5.1)                 
#>  base64url     1.4        2018-05-14 [1] CRAN (R 3.5.1)                 
#>  callr         3.3.2      2019-09-22 [1] CRAN (R 3.5.1)                 
#>  cli           1.1.0      2019-03-19 [1] CRAN (R 3.5.1)                 
#>  crayon        1.3.4      2017-09-16 [1] CRAN (R 3.5.1)                 
#>  desc          1.2.0      2018-05-01 [1] CRAN (R 3.5.1)                 
#>  devtools      2.2.1      2019-09-24 [1] CRAN (R 3.5.1)                 
#>  digest        0.6.22     2019-10-21 [1] CRAN (R 3.5.1)                 
#>  drake       * 7.7.0.9002 2019-11-18 [1] Github (ropensci/drake@32b0695)
#>  ellipsis      0.3.0      2019-09-20 [1] CRAN (R 3.5.1)                 
#>  evaluate      0.14       2019-05-28 [1] CRAN (R 3.5.1)                 
#>  filelock      1.0.2      2018-10-05 [1] CRAN (R 3.5.1)                 
#>  fs            1.3.1      2019-05-06 [1] CRAN (R 3.5.1)                 
#>  glue          1.3.1      2019-03-12 [1] CRAN (R 3.5.1)                 
#>  highr         0.8        2019-03-20 [1] CRAN (R 3.5.1)                 
#>  htmltools     0.4.0      2019-10-04 [1] CRAN (R 3.5.1)                 
#>  igraph        1.2.4.1    2019-04-22 [1] CRAN (R 3.5.1)                 
#>  knitr         1.26       2019-11-12 [1] CRAN (R 3.5.1)                 
#>  magrittr    * 1.5        2014-11-22 [1] CRAN (R 3.5.1)                 
#>  memoise       1.1.0      2017-04-21 [1] CRAN (R 3.5.1)                 
#>  pillar        1.4.2      2019-06-29 [1] CRAN (R 3.5.1)                 
#>  pkgbuild      1.0.6      2019-10-09 [1] CRAN (R 3.5.1)                 
#>  pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 3.5.1)                 
#>  pkgload       1.0.2      2018-10-29 [1] CRAN (R 3.5.1)                 
#>  prettyunits   1.0.2      2015-07-13 [1] CRAN (R 3.5.1)                 
#>  processx      3.4.1      2019-07-18 [1] CRAN (R 3.5.1)                 
#>  ps            1.3.0      2018-12-21 [1] CRAN (R 3.5.1)                 
#>  R6            2.4.1      2019-11-12 [1] CRAN (R 3.5.1)                 
#>  Rcpp          1.0.3      2019-11-08 [1] CRAN (R 3.5.1)                 
#>  remotes       2.1.0      2019-06-24 [1] CRAN (R 3.5.1)                 
#>  rlang         0.4.1      2019-10-24 [1] CRAN (R 3.5.1)                 
#>  rmarkdown     1.17       2019-11-13 [1] CRAN (R 3.5.1)                 
#>  rprojroot     1.3-2      2018-01-03 [1] CRAN (R 3.5.1)                 
#>  sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 3.5.1)                 
#>  storr         1.2.1      2018-10-18 [1] CRAN (R 3.5.1)                 
#>  stringi       1.4.3      2019-03-12 [1] CRAN (R 3.5.1)                 
#>  stringr       1.4.0      2019-02-10 [1] CRAN (R 3.5.1)                 
#>  testthat      2.3.0      2019-11-05 [1] CRAN (R 3.5.1)                 
#>  tibble        2.1.3      2019-06-06 [1] CRAN (R 3.5.1)                 
#>  txtq          0.2.0      2019-10-15 [1] CRAN (R 3.5.1)                 
#>  usethis       1.5.0      2019-04-07 [1] CRAN (R 3.5.1)                 
#>  withr         2.1.2      2018-03-15 [1] CRAN (R 3.5.1)                 
#>  xfun          0.11       2019-11-12 [1] CRAN (R 3.5.1)                 
#>  yaml          2.2.0      2018-07-25 [1] CRAN (R 3.5.1)                 
#> 
#> [1] /home/brendan/miniconda3/envs/oueme-dev/lib/R/library
@wlandau
Collaborator

wlandau commented Nov 20, 2019

When make() stores targets, it usually puts the target's return value in memory too (depending on the memory strategy). I suspect that when drake moves a disk.frame to its permanent home in the cache, it is not updating the initial in-memory representation (a drake_tempfile() path that we already moved). I will investigate.
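
To make the suspected mechanism concrete, here is a minimal base-R sketch. It does not use drake or disk.frame at all: `handle`, `staging_dir`, and `cache_dir` are hypothetical names standing in for the in-memory disk.frame, the `drake_tempfile()` location, and the target's permanent location in the cache. It only shows how an object that remembers a directory path goes stale once that directory is moved.

```r
# Illustration only: plain base R, not drake's or disk.frame's actual code.

# A staging directory with one file in it (stand-in for drake_tempfile()).
staging_dir <- tempfile("staging_")
dir.create(staging_dir)
writeLines("chunk data", file.path(staging_dir, "1.chunk"))

# An in-memory object that only remembers where its files live.
handle <- list(path = staging_dir)

# Move the directory to its "permanent home" (hypothetical cache location).
cache_dir <- tempfile("cache_")
moved <- file.rename(staging_dir, cache_dir)

# The in-memory object still points at the old location, which is gone.
stale_exists <- dir.exists(handle$path)

# Updating the stored path makes the object usable again.
handle$path <- cache_dir
fresh_files <- list.files(handle$path)

# Clean up the demo directory.
unlink(cache_dir, recursive = TRUE)
```

If this is what happens inside `make()`, a downstream target that uses the in-memory copy would look for files under the old path, which would be consistent with the `[ENOENT]` message in the failing example.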

@wlandau
Collaborator

wlandau commented Nov 20, 2019

I need to look at assign_to_envir() and value_format(). We probably also need a more general "drake_format" class for formatted targets.

drake/R/local_build.R

Lines 407 to 433 in 32b0695

assign_to_envir <- function(target, value, config) {
  memory_strategy <- config$layout[[target]]$memory_strategy %||NA%
    config$memory_strategy
  if (memory_strategy %in% c("autoclean", "unload", "none")) {
    return()
  }
  if (
    identical(config$lazy_load, "eager") &&
      !is_encoded_path(target) &&
      !is_imported(target, config)
  ) {
    assign(
      x = target,
      value = value_format(value),
      envir = config$envir_targets
    )
  }
  invisible()
}

value_format <- function(x) {
  if (any(grepl("^drake_format_", class(x)))) {
    x$value
  } else {
    x
  }
}
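
As a quick illustration of what `value_format()` does, the sketch below copies the function from the snippet above and applies it to a wrapped and an unwrapped value. The class name `drake_format_example` and the list-with-`value` layout are assumptions made for the demo, inferred only from the regular expression and the `x$value` access in the quoted code.

```r
# value_format() as quoted from local_build.R above.
value_format <- function(x) {
  if (any(grepl("^drake_format_", class(x)))) {
    x$value
  } else {
    x
  }
}

# Hypothetical wrapper: a list with a `value` element and a class that
# matches the "^drake_format_" pattern. The class name is made up.
wrapped <- structure(
  list(value = "path/to/data"),
  class = "drake_format_example"
)

# An ordinary object with no format class.
plain <- data.frame(a = 1:3)

unwrapped <- value_format(wrapped) # the inner `value` element
untouched <- value_format(plain)   # returned as-is
```

Whatever sits in `value` at the time `assign()` runs is what ends up in the targets environment, so if that value refers to a location that is moved afterwards, the assigned copy would not follow it.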

@wlandau
Collaborator

wlandau commented Nov 20, 2019

Good catch on this one, @brendanf. Should be working now.
