sourceWithProgress() and nested R scripts #4586

Closed
wlandau opened this issue Apr 5, 2019 · 6 comments
Labels
bug jobs stale

Comments

@wlandau wlandau commented Apr 5, 2019

System details

RStudio Edition : <!-- Desktop or Server -->
RStudio Version : 
OS Version      : 
R Version       : 

Steps to reproduce the problem

Based on ropensci/drake#807, I predict that if we have a script called inner.R

x <- 1

and if we run the following job.R script in the job launcher

source("inner.R")

then the job launcher's environment will not have a binding for x. @gadenbuie, would you mind verifying this (and filling in your system details above)? I do not have access to the job launcher myself.

Describe the problem in detail

It looks like sourceWithProgress() creates a fresh clean environment in which to source the R script, which seems like a good idea. However, what if the script itself sources other scripts? By default, the child scripts will use the global environment, not the environment that sources the parent script: https://stackoverflow.com/questions/55008645/control-the-environment-of-nested-calls-to-source. I believe this might be affecting ropensci/drake#807 and https://github.com/gadenbuie/drake-rstudio-jobs-example#readme.
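The default described above can be reproduced outside the job launcher with base R alone. The following is a minimal sketch, assuming the job runner behaves roughly like `source(file, local = <fresh environment>)`; the temporary files and the name `job_env` are hypothetical stand-ins, not part of RStudio.

```r
# Hypothetical temp files standing in for inner.R and job.R.
dir <- tempfile("nested-source-")
dir.create(dir)
inner <- file.path(dir, "inner.R")
job <- file.path(dir, "job.R")
writeLines("x <- 1", inner)
# job.R contains a plain source() call with the default local = FALSE.
writeLines(sprintf("source(%s)", deparse(inner)), job)

# Assumption: the job runner evaluates the script in a fresh environment.
job_env <- new.env(parent = globalenv())
source(job, local = job_env)

# The nested source() call evaluated inner.R in the global environment.
exists("x", envir = job_env, inherits = FALSE)     # FALSE
exists("x", envir = globalenv(), inherits = FALSE) # TRUE
```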

Describe the behavior you expected

I expect/hope x to be defined in the environment in which the job launcher runs job.R.

Member

@jmcphers jmcphers commented Apr 5, 2019

The reason it creates a fresh environment is to support the "export results" feature:

image

Because objects created by the script are placed in the fresh environment by default, that environment contains the "results" of the script which can be easily transferred back to the main session.
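As a rough illustration of that idea (not RStudio's actual implementation), a script can be sourced into a fresh environment and its bindings copied elsewhere afterwards. The names `results` and `target` below are hypothetical, and `target` stands in for the main session's environment.

```r
# Hypothetical script whose objects we want to "export".
script <- tempfile(fileext = ".R")
writeLines(c("a <- 1", "b <- a + 1"), script)

# Evaluate the script in a fresh environment so that it holds only
# the objects the script created.
results <- new.env(parent = globalenv())
source(script, local = results)
ls(results)  # "a" "b"

# Copy the results into another environment (a stand-in for the main session).
target <- new.env()
list2env(as.list(results, all.names = TRUE), envir = target)
target$b  # 2
```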

Do you export results from your scripts?

Author

@wlandau wlandau commented Apr 6, 2019

I believe the results were exported in ropensci/drake#807. @gadenbuie, is this true? Until I can access the local version of the job launcher, I cannot test this out myself. (I am having trouble compiling the IDE preview of 1.2.1335 because the Ubuntu 18 tarball appears not to have a CMakeLists.txt file.)

gadenbuie added a commit to gadenbuie/drake-rstudio-jobs-example that referenced this issue Apr 9, 2019
Member

@gadenbuie gadenbuie commented Apr 9, 2019

No, the results were not exported.

Briefly, here's the structure of the job I'm running:

  1. An outer script that sets up a drake plan and triggers some computation.
  2. Inner scripts that hold functions and other objects used in the computation.

I'm running the outer script as a job in RStudio, but the use of source() inside the outer script to pull in the inner scripts is causing subtle issues in drake.

@wlandau please correct me if I'm wrong, but by default drake tracks dependencies of the computation in the outer script by inspecting the environment in which the drake plan is created (or where drake::make() is invoked?). Because the job (outer) is executed in a fresh environment, drake looks in this environment for the function definitions, but they exist in the global environment where they are placed by default by source().

Technically, the computation succeeds as expected because evaluating foo() in environment bar_env (where it doesn't exist) still finds foo() in the global environment (where it does exist), but this breaks drake's dependency hashing and invalidates up-to-date drake targets. More generally, for RStudio jobs, any job that expects objects from source()-ed inner scripts to exist in the same environment as the outer script will break.
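The difference between "evaluation still works" and "the object is not in this environment" can be seen in base R. This is a minimal sketch using the foo() and bar_env names from above; the function body is hypothetical, and the non-inheriting lookup is only meant to mimic how a tool that inspects a single environment would see things.

```r
# Put foo() in the global environment, as a default source() call would.
assign("foo", function() "found", envir = globalenv())

# A fresh environment whose parent is the global environment.
bar_env <- new.env(parent = globalenv())

# Evaluation succeeds: the lookup falls through to the parent environment.
eval(quote(foo()), envir = bar_env)                  # "found"

# A non-inheriting lookup does not see foo() at all.
exists("foo", envir = bar_env, inherits = FALSE)     # FALSE
ls(bar_env)                                          # character(0)
```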

Here's a small example (also available in the reprex repo). We have a script called inner.R.

inside_inner <- 1

And another script called outer.R.

source("inner.R")
inside_outer <- 2
cat(ls(), sep = "\n", file = "outer.out")

cat(ls(envir = .GlobalEnv), sep = "\n", file = "global.out")

Running outer.R as an RStudio job:

rstudioapi::jobRunScript("outer.R", workingDir = getwd(), exportEnv = NULL)

This yields the following results in outer.out:

cat(readLines("outer.out"), sep = "\n")
## inside_outer

Notice that inside_inner is not in the environment where outer.R is evaluated, but it is in the global environment.

cat(readLines("global.out"), sep = "\n")
## emitProgress
## inside_inner
## sourceWithProgress
session_info()

RStudio Version 1.2.1268 Build 1275 (87a9693)

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) QtWebEngine/5.11.1 Chrome/65.0.3325.230 Safari/537.36

> sessioninfo::session_info()
─ Session info ────────────────────────────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 3.5.2 (2018-12-20)
 os       macOS High Sierra 10.13.6   
 system   x86_64, darwin15.6.0        
 ui       RStudio                     
 language (EN)                        
 collate  en_US.UTF-8                 
 ctype    en_US.UTF-8                 
 tz       America/New_York            
 date     2019-04-09                  

─ Packages ────────────────────────────────────────────────────────────────────────────────────────
 package     * version date       lib source           
 assertthat    0.2.1   2019-03-21 [1] standard (@0.2.1)
 cli           1.1.0   2019-03-19 [1] standard (@1.1.0)
 crayon        1.3.4   2017-09-16 [1] CRAN (R 3.5.0)   
 packrat       0.4.9-3 2018-06-01 [1] CRAN (R 3.5.0)   
 rstudioapi    0.9.0   2019-01-09 [1] CRAN (R 3.5.2)   
 sessioninfo   1.1.1   2018-11-05 [1] standard (@1.1.1)
 withr         2.1.2   2018-03-15 [1] CRAN (R 3.5.0) 

Edit: changed language to use outer/inner instead of primary/child throughout.

Author

@wlandau wlandau commented Apr 9, 2019

Thanks @gadenbuie.

> @wlandau please correct me if I'm wrong, but by default drake tracks dependencies of the computation in the outer script by inspecting the environment in which the drake plan is created (or where drake::make() is invoked?). Because the job (outer) is executed in a fresh environment, drake looks in this environment for the function definitions, but they exist in the global environment where they are placed by default by source().

By default, drake tracks dependencies in the environment that calls make(), and it does not look in any ancestor environments (i.e. no inheritance). You supply an environment with functions etc. to the envir argument of make().
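One possible workaround on the script side, sketched here with base R only, is to call source() with local = TRUE in the outer script so that the inner objects land in the environment that is evaluating the outer script. The temporary files and `job_env` are hypothetical, and the sketch assumes the job runner evaluates outer.R in a fresh environment as discussed above.

```r
# Hypothetical temp files standing in for inner.R and outer.R.
dir <- tempfile("local-source-")
dir.create(dir)
inner <- file.path(dir, "inner.R")
outer <- file.path(dir, "outer.R")
writeLines("inside_inner <- 1", inner)
writeLines(c(
  # local = TRUE evaluates inner.R in the environment calling source().
  sprintf("source(%s, local = TRUE)", deparse(inner)),
  "inside_outer <- 2"
), outer)

# Assumption: the job runner evaluates outer.R in a fresh environment.
job_env <- new.env(parent = globalenv())
source(outer, local = job_env)

ls(job_env)  # "inside_inner" "inside_outer"
# Inside outer.R, this environment could then be supplied to the envir
# argument of make(), e.g. make(plan, envir = environment()).
```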

@stale stale bot commented Feb 6, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs, per https://github.com/rstudio/rstudio/wiki/Issue-Grooming. Thank you for your contributions.

@stale stale bot added the stale label Feb 6, 2021
@stale stale bot commented Feb 20, 2021

This issue has been automatically closed due to inactivity.

@stale stale bot closed this as completed Feb 20, 2021