sourceWithProgress() and nested R scripts #4586

Open
wlandau opened this issue Apr 5, 2019 · 4 comments

Comments

@wlandau

commented Apr 5, 2019

System details

RStudio Edition : <!-- Desktop or Server -->
RStudio Version : 
OS Version      : 
R Version       : 

Steps to reproduce the problem

Based on ropensci/drake#807, I predict that if we have a script called inner.R

x <- 1

and if we run the following job.R script in the job launcher

source("inner.R")

then the job launcher's environment will not have a binding for x. @gadenbuie, would you mind verifying this (and filling in your system details above)? I do not have access to the job launcher myself.

Describe the problem in detail

It looks like sourceWithProgress() creates a fresh clean environment in which to source the R script, which seems like a good idea. However, what if the script itself sources other scripts? By default, the child scripts will use the global environment, not the environment that sources the parent script: https://stackoverflow.com/questions/55008645/control-the-environment-of-nested-calls-to-source. I believe this might be affecting ropensci/drake#807 and https://github.com/gadenbuie/drake-rstudio-jobs-example#readme.
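
A minimal sketch of the suspected behavior that can be run without the job launcher. It assumes that sourceWithProgress() evaluates the script roughly the way `source(file, local = <fresh environment>)` would; `job_env` is a hypothetical stand-in for that environment, and the two scripts are written to a temporary directory.

```r
# Hypothetical stand-in for the job launcher: job_env plays the role of the
# fresh environment that sourceWithProgress() is assumed to create.
tmp <- tempfile("nested-source-")
dir.create(tmp)
writeLines("x <- 1", file.path(tmp, "inner.R"))
writeLines('source("inner.R")', file.path(tmp, "job.R"))

job_env <- new.env(parent = globalenv())
# chdir = TRUE so that the relative path "inner.R" resolves next to job.R.
source(file.path(tmp, "job.R"), local = job_env, chdir = TRUE)

# The nested source() call uses its default local = FALSE, so x is
# assigned in the global environment rather than in job_env.
x_in_job_env <- exists("x", envir = job_env, inherits = FALSE)
x_in_global <- exists("x", envir = globalenv(), inherits = FALSE)
print(c(job_env = x_in_job_env, global = x_in_global))
```

In plain R this reports that x is missing from `job_env` and present in the global environment. Whether the job launcher behaves the same way is what still needs verifying.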

Describe the behavior you expected

I expect/hope x to be defined in the environment in which the job launcher runs job.R.

@jmcphers

Member

commented Apr 5, 2019

The reason it creates a fresh environment is to support the "export results" features:

[image not preserved]

Because objects created by the script are placed in the fresh environment by default, that environment contains the "results" of the script which can be easily transferred back to the main session.

Do you export results from your scripts?

@wlandau

Author

commented Apr 6, 2019

I believe the results were exported in ropensci/drake#807. @gadenbuie, is this true? Until I can access the local version of the job launcher, I cannot test this out myself. (I am having trouble compiling the IDE preview of 1.2.1335 because the Ubuntu 18 tarball appears to not have a CMakeLists.txt file).

gadenbuie added a commit to gadenbuie/drake-rstudio-jobs-example that referenced this issue Apr 9, 2019
@gadenbuie


commented Apr 9, 2019

No, the results were not exported.

Briefly, here's the structure of the job I'm running:

  1. An outer script that sets up a drake plan and triggers some computation.
  2. Inner scripts that hold functions and other objects used in the computation.

I'm running the outer script as a job in RStudio, but the use of source() inside the outer script to pull in the inner scripts is causing subtle issues in drake.

@wlandau please correct me if I'm wrong, but by default drake tracks dependencies of the computation in the outer script by inspecting the environment in which the drake plan is created (or where drake::make() is invoked?). Because the job (outer) is executed in a fresh environment, drake looks in this environment for the function definitions, but they exist in the global environment where they are placed by default by source().

Technically, the computation succeeds as expected because evaluating foo() in environment bar_env (where it doesn't exist) still finds foo() in the global environment (where it does exist), but this breaks drake's dependency hashing and invalidates up-to-date drake targets. More generally, any RStudio job that expects objects from source()-ed inner scripts to exist in the same environment as the outer script will break.
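
The difference between "the call works" and "the object is defined here" can be sketched in base R. The names `foo` and `bar_env` follow the paragraph above; this only illustrates environment lookup and is not drake's actual dependency-detection code.

```r
# foo is defined in the global environment only.
assign("foo", function() "found", envir = globalenv())
bar_env <- new.env(parent = globalenv())

# Ordinary evaluation walks the parent chain, so the call succeeds.
result <- eval(quote(foo()), envir = bar_env)

# A lookup restricted to bar_env itself does not see foo.
visible_with_inheritance <- exists("foo", envir = bar_env)
defined_in_bar_env <- exists("foo", envir = bar_env, inherits = FALSE)
```

Any tool that inspects only `bar_env` without inheritance would conclude that foo() is absent, even though code evaluated in `bar_env` runs without error.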

Here's a small example (also available in the reprex repo). We have a script called inner.R.

inside_inner <- 1

And another script called outer.R.

source("inner.R")
inside_outer <- 2
cat(ls(), sep = "\n", file = "outer.out")

cat(ls(envir = .GlobalEnv), sep = "\n", file = "global.out")

Running outer.R as an RStudio job:

rstudioapi::jobRunScript("outer.R", workingDir = getwd(), exportEnv = NULL)

This yields the following results in outer.out:

cat(readLines("outer.out"), sep = "\n")
## inside_outer

Notice that inside_inner is not in the environment where outer.R is evaluated, but it is in the global environment.

cat(readLines("global.out"), sep = "\n")
## emitProgress
## inside_inner
## sourceWithProgress
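
A possible workaround on the script side, not confirmed in this thread, is to have the outer script call `source("inner.R", local = TRUE)` so that the inner script is evaluated in the calling environment. The sketch below checks this in plain R; `job_env` is a hypothetical stand-in for the environment the job runner creates, and the behavior inside an actual RStudio job is an assumption that would need testing.

```r
tmp <- tempfile("local-source-")
dir.create(tmp)
writeLines("inside_inner <- 1", file.path(tmp, "inner.R"))
writeLines(
  c('source("inner.R", local = TRUE)', "inside_outer <- 2"),
  file.path(tmp, "outer.R")
)

# job_env is a hypothetical stand-in for the job runner's fresh environment.
job_env <- new.env(parent = globalenv())
# chdir = TRUE so that the relative path "inner.R" resolves next to outer.R.
source(file.path(tmp, "outer.R"), local = job_env, chdir = TRUE)

# With local = TRUE, the nested source() evaluates inner.R in job_env.
objects_in_job_env <- sort(ls(job_env))
print(objects_in_job_env)
```

Here both inside_inner and inside_outer end up in `job_env`. The cost is that every nested source() call in the project has to opt in explicitly.
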
session_info()

RStudio Version 1.2.1268 Build 1275 (87a9693)

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) QtWebEngine/5.11.1 Chrome/65.0.3325.230 Safari/537.36

> sessioninfo::session_info()
─ Session info ────────────────────────────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 3.5.2 (2018-12-20)
 os       macOS High Sierra 10.13.6   
 system   x86_64, darwin15.6.0        
 ui       RStudio                     
 language (EN)                        
 collate  en_US.UTF-8                 
 ctype    en_US.UTF-8                 
 tz       America/New_York            
 date     2019-04-09                  

─ Packages ────────────────────────────────────────────────────────────────────────────────────────
 package     * version date       lib source           
 assertthat    0.2.1   2019-03-21 [1] standard (@0.2.1)
 cli           1.1.0   2019-03-19 [1] standard (@1.1.0)
 crayon        1.3.4   2017-09-16 [1] CRAN (R 3.5.0)   
 packrat       0.4.9-3 2018-06-01 [1] CRAN (R 3.5.0)   
 rstudioapi    0.9.0   2019-01-09 [1] CRAN (R 3.5.2)   
 sessioninfo   1.1.1   2018-11-05 [1] standard (@1.1.1)
 withr         2.1.2   2018-03-15 [1] CRAN (R 3.5.0) 

Edit: changed language to use outer/inner instead of primary/child throughout.

@wlandau

Author

commented Apr 9, 2019

Thanks @gadenbuie.

> @wlandau please correct me if I'm wrong, but by default drake tracks dependencies of the computation in the outer script by inspecting the environment in which the drake plan is created (or where drake::make() is invoked?). Because the job (outer) is executed in a fresh environment, drake looks in this environment for the function definitions, but they exist in the global environment where they are placed by default by source().

By default, drake tracks dependencies in the environment that calls make(), and it does not look in any ancestor environments (i.e. no inheritance). You supply an environment with functions etc. to the envir argument of make().
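
One way to act on this, sketched under assumptions: source the inner scripts into a single explicit environment and hand that environment to make(). The function `f`, the contents of inner.R, and the `plan` object are hypothetical; only the `envir` argument of make() comes from the comment above, and the drake call is left commented out so the sketch runs without drake installed.

```r
tmp <- tempfile("drake-envir-")
dir.create(tmp)
writeLines("f <- function(x) x + 1", file.path(tmp, "inner.R"))

# Collect everything the plan depends on in one explicit environment.
plan_env <- new.env(parent = globalenv())
source(file.path(tmp, "inner.R"), local = plan_env)

functions_available <- ls(plan_env)
print(functions_available)

# With drake installed and a plan defined, the environment would be passed on:
# drake::make(plan, envir = plan_env)  # `plan` is a hypothetical drake plan
```

Because the functions live directly in `plan_env`, a lookup without inheritance finds them no matter which environment the job runner uses for the outer script.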
