reuse cargo build directory between runs of R #291

Open
aavogt opened this issue Jun 24, 2023 · 6 comments

Comments

@aavogt

aavogt commented Jun 24, 2023

My code loaded with rust_source does not compile because I have not studied from_robj yet. This makes R exit:

Caused by error in `invoke_cargo()`:
! Rust code could not be compiled successfully. Aborting.
✖ error[E0599]: no function or associated item named `from_robj` found for reference `&ArrayBase<OwnedRepr<i32>, Dim<[usize; 2]>>` in the current scope
 --> src/lib.rs:2:1
  |
2 | #[extendr]
  | ^^^^^^^^^^ function or associated item not found in `&ArrayBase<OwnedRepr<i32>, Dim<[usize; 2]>>`
  |
  = note: this error originates in the attribute macro `extendr` (in Nightly builds, run with -Z macro-backtrace for more info)

✖ error: aborting due to previous error

This process takes 20s because all Rust dependencies are recompiled. If the cargo project stays in the same location, subsequent calls to rust_source take 2s. Here is an ugly way to make the cargo project always live in $PWD/connect_build:

  # duplicates most of rextendr:::get_build_dir
  rlang::env_bind(rextendr:::the, build_dir = {
        b <- file.path(getwd(),"connect_build")
        if (!dir.exists(b)) {
                dir.create(b)
                dir.create(file.path(b, "src"))
                dir.create(file.path(b, "R"))
                dir.create(file.path(b, ".cargo"))
        }
        b
  })
  rust_source("connect.rs", features="ndarray")

Could you change rust_source so that the above code becomes shorter? Perhaps it could look like:

rust_source("connect.rs", cache_build="connect_build")
rust_source("connect.rs") # connect_build comes from connect.rs
@Ilia-Kosenkov
Member

The build is cached by default (cache_build = TRUE):
https://github.com/extendr/rextendr/blob/1a43843b191536ab057949b1c40ca3e9e9949b90/R/source.R#L38C1-L39C31

We do not support different caches for different calls to rust_source() though, since they would be really hard to track.

Check what happens in your case if instead you read the contents of connect.rs into a string and then pass it as code to rust_source():

rextendr::rust_source(code = connect_rs_contents, ...)

@aavogt
Author

aavogt commented Jun 25, 2023

I need to use code= for #234. cache_build works within one R session only:

# a.R
rs <- function(file="lib.rs") {                          
  rextendr::rust_source(code=paste0(readLines(file), collapse="\n"), features="ndarray", env=parent.frame())
}
rs() # always slow
rs() # always fast

R --vanilla -q -s < a.R
ℹ build directory: /tmp/RtmpXeLmcd/file37b52d703be358
    Updating crates.io index
   Compiling autocfg v1.1.0
   Compiling proc-macro2 v1.0.63
   Compiling libR-sys v0.4.0
   Compiling unicode-ident v1.0.9
   Compiling quote v1.0.28
   Compiling num-traits v0.2.15
   Compiling matrixmultiply v0.3.7
   Compiling num-integer v0.1.45
   Compiling syn v1.0.109
   Compiling paste v1.0.12
   Compiling extendr-engine v0.4.0
   Compiling rawpointer v0.2.1
   Compiling num-complex v0.4.3
   Compiling extendr-api v0.4.0
   Compiling ndarray v0.15.6
   Compiling lazy_static v1.4.0
   Compiling extendr-macros v0.4.0
   Compiling rextendr1 v0.0.1 (/tmp/RtmpXeLmcd/file37b52d703be358)
    Finished dev [unoptimized + debuginfo] target(s) in 17.98s
✔ Writing /tmp/RtmpXeLmcd/file37b52d703be358/target/extendr_wrappers.R
ℹ build directory: /tmp/RtmpXeLmcd/file37b52d703be358
   Compiling rextendr2 v0.0.1 (/tmp/RtmpXeLmcd/file37b52d703be358)
    Finished dev [unoptimized + debuginfo] target(s) in 0.82s
✔ Writing /tmp/RtmpXeLmcd/file37b52d703be358/target/extendr_wrappers.R

# run it again and work is duplicated because there is a new build directory
R --vanilla -q -s < a.R
ℹ build directory: /tmp/RtmpANhNDD/file37c0c52dbd9ae8
    Updating crates.io index
   Compiling autocfg v1.1.0
   Compiling libR-sys v0.4.0
   Compiling proc-macro2 v1.0.63
   Compiling quote v1.0.28
   Compiling unicode-ident v1.0.9
   Compiling num-traits v0.2.15
   Compiling matrixmultiply v0.3.7
   Compiling num-integer v0.1.45
   Compiling syn v1.0.109
   Compiling rawpointer v0.2.1
   Compiling extendr-engine v0.4.0
   Compiling paste v1.0.12
   Compiling num-complex v0.4.3
   Compiling extendr-api v0.4.0
   Compiling ndarray v0.15.6
   Compiling lazy_static v1.4.0
   Compiling extendr-macros v0.4.0
   Compiling rextendr1 v0.0.1 (/tmp/RtmpANhNDD/file37c0c52dbd9ae8)
    Finished dev [unoptimized + debuginfo] target(s) in 19.48s
✔ Writing /tmp/RtmpANhNDD/file37c0c52dbd9ae8/target/extendr_wrappers.R
ℹ build directory: /tmp/RtmpANhNDD/file37c0c52dbd9ae8
   Compiling rextendr2 v0.0.1 (/tmp/RtmpANhNDD/file37c0c52dbd9ae8)
    Finished dev [unoptimized + debuginfo] target(s) in 0.85s
✔ Writing /tmp/RtmpANhNDD/file37c0c52dbd9ae8/target/extendr_wrappers.R

# b.R
library(glue)
library(stringr)
rs <- function(file="lib.rs") {                          
    rlang::env_bind(rextendr:::the, build_dir = {
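        # strip the extension safely: the unanchored regex '.rs' could also match inside the file name
        b <- file.path(getwd(), glue("{tools::file_path_sans_ext(file)}_build"))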
        if (!dir.exists(b)) {
                dir.create(b)
                dir.create(file.path(b, "src"))
                dir.create(file.path(b, "R"))
                dir.create(file.path(b, ".cargo"))
        }
        b
  })
  rextendr::rust_source(code=paste0(readLines(file), collapse="\n"), features="ndarray", env=parent.frame())
}
rs() # usually fast
rs() # always fast

# b.R looks like a.R at first:
rm -rf lib_build
R --vanilla -q -s < b.R
ℹ build directory: /home/aavogt/wip/gregg-ocr/fitting/lib_build
    Updating crates.io index
   Compiling autocfg v1.1.0
   Compiling proc-macro2 v1.0.63
   Compiling libR-sys v0.4.0
   Compiling unicode-ident v1.0.9
   Compiling quote v1.0.28
   Compiling num-traits v0.2.15
   Compiling matrixmultiply v0.3.7
   Compiling num-integer v0.1.45
   Compiling syn v1.0.109
   Compiling extendr-engine v0.4.0
   Compiling paste v1.0.12
   Compiling rawpointer v0.2.1
   Compiling num-complex v0.4.3
   Compiling extendr-api v0.4.0
   Compiling ndarray v0.15.6
   Compiling lazy_static v1.4.0
   Compiling extendr-macros v0.4.0
   Compiling rextendr1 v0.0.1 (/home/aavogt/wip/gregg-ocr/fitting/lib_build)
    Finished dev [unoptimized + debuginfo] target(s) in 18.17s
✔ Writing /home/aavogt/wip/gregg-ocr/fitting/lib_build/target/extendr_wrappers.R
ℹ build directory: /home/aavogt/wip/gregg-ocr/fitting/lib_build
   Compiling rextendr2 v0.0.1 (/home/aavogt/wip/gregg-ocr/fitting/lib_build)
    Finished dev [unoptimized + debuginfo] target(s) in 0.81s
✔ Writing /home/aavogt/wip/gregg-ocr/fitting/lib_build/target/extendr_wrappers.R

# but subsequent runs are fast
R --vanilla -q -s < b.R
ℹ build directory: /home/aavogt/wip/gregg-ocr/fitting/lib_build
   Compiling rextendr1 v0.0.1 (/home/aavogt/wip/gregg-ocr/fitting/lib_build)
    Finished dev [unoptimized + debuginfo] target(s) in 0.64s
✔ Writing /home/aavogt/wip/gregg-ocr/fitting/lib_build/target/extendr_wrappers.R
ℹ build directory: /home/aavogt/wip/gregg-ocr/fitting/lib_build
   Compiling rextendr2 v0.0.1 (/home/aavogt/wip/gregg-ocr/fitting/lib_build)
    Finished dev [unoptimized + debuginfo] target(s) in 0.68s
✔ Writing /home/aavogt/wip/gregg-ocr/fitting/lib_build/target/extendr_wrappers.R

So far I use one rs file and one cache. I do not want different caches for different calls.

@Ilia-Kosenkov
Member

Most of the rextendr::rust_*() functions are designed for interactive experimentation; we did not aim to support cross-session caching of cargo artifacts. At this point, I am not sure we should.

In what scenario do you actually rely on {rextendr} interactive compilation in this way?

@aavogt
Author

aavogt commented Jun 29, 2023

In closest_pairs.R I source rust_source.R to replace rust_source(). In one terminal I have:

while true; do timeout 5 R --vanilla -s < closest_pairs.R || echo "timed out"; inotifywait -e modify lib.rs closest_pairs.R; done

After saving changes to either file, I wait a few seconds and then I either see the error in the terminal, or okular reloads the plots.
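
For readers who want to reuse the watch loop from this comment, here is the same loop laid out as a small script. This is only a sketch: the function name watch_and_run is made up for illustration, and it assumes that timeout (coreutils), inotifywait (inotify-tools) and R are on the PATH.

```shell
#!/bin/sh
# Rerun an R script whenever it, or any other watched file, is modified.
# `watch_and_run` is a hypothetical name, not part of any tool in this thread.
watch_and_run() {
    script=$1   # R script to run, e.g. closest_pairs.R
    shift       # remaining arguments: extra files to watch, e.g. lib.rs
    while true; do
        # Stop R after 5 seconds so a runaway loop on the Rust side cannot hang the terminal.
        # As in the original one-liner, the message is printed on any non-zero exit.
        timeout 5 R --vanilla -s < "$script" || echo "timed out"
        # Block until one of the watched files is modified, then loop again.
        inotifywait -e modify "$script" "$@"
    done
}

# Usage (not run here): watch_and_run closest_pairs.R lib.rs
```

Because every iteration starts a fresh R session, each run only stays within the 5 second budget if the cargo build directory survives between sessions, which is the point of this issue.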

Usually I use Nvim-R to edit a Rmd file. But after making a change to lib.rs, I save it, switch to the .Rmd file and then I have to send the right chunks in the right order to R.

The rs/Nvim-R/Rmd workflow is even worse if I have to restart R. An infinite loop on the Rust side can't be broken by a C-c sent to R. In that case I have to type C-a x in the window with R inside tmux. Then I switch back to nvim and send the commands ,rq ,rf gg gn ,cd which start a new R and send the first chunk. Then I wait for evaluation to finish. I repeat ,cd and waiting until I get to the chunk that produces the plot. Compared with the inotifywait loop, there are many more keystrokes, and I have to pause between some of them. Therefore with rs/Nvim-R/Rmd, I have less time, attention and short-term memory left over for changes to my code.

Once closest_pairs.R is complete, I move the code to a chunk in main.Rmd. The tibble from the closest_pairs chunk then makes many plots each in its own chunk. If I discover something that needs a change to lib.rs I move code back to a .R file and run it with the inotifywait loop above.

Ideally, Nvim-R should support {rextendr} Rmd chunks, and it should have a command to recursively source a chunk's dependencies. That would reduce but not eliminate my need for the inotifywait method.

@Ilia-Kosenkov
Member

Oh, that is a very complex setup, and specific to your workflow, I believe. I am not entirely convinced we should explore something like this. {rextendr} serves two purposes:

  • Scaffolding extendr-powered package & facilitating its build process
  • Compiling small chunks of Rust code on the fly, with caching enabled for as long as the R session survives.

Our experience is not tailored for constantly executing Rust chunks in fresh R sessions, and I expect to hit all sorts of weird issues if we try to implement this.
I'll share this issue on our Discord to get more feedback from other maintainers.

@sorhawell

sorhawell commented Jun 30, 2023

I only have experience with extendr via package builds, e.g. helloextendr, using rextendr::document() or R CMD INSTALL ...

It is possible to symlink files and directories to achieve caching on temporary file structures.

If you look inside the temp folder after a build, you might find e.g. a rust/target or myobjectfile.a where all the compiled objects are. Before a temporary build you can symlink a previous file or folder into place.

r-polars uses symlinks to speed up compilation in GitHub Actions, and in development to work around R CMD check creating temporary folders.

I sometimes work on an extendr project in multiple forks which are cloned independently. Then I use symlinks to save disk space.
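
The symlink idea above can be sketched in shell as follows. All paths are stand-ins created under a scratch directory for illustration. In a real setup the cache would live in a stable location and the build directory would be the one the tool reports. Whether rust_source() tolerates a symlinked target directory is an assumption that this thread does not confirm.

```shell
#!/bin/sh
set -eu

# Hypothetical locations, created under a scratch directory for this demo.
work=$(mktemp -d)
cache_dir="$work/cargo-target-cache"   # stand-in for a persistent cache of compiled objects
build_dir="$work/fresh-build"          # stand-in for a freshly created temporary build directory

mkdir -p "$cache_dir" "$build_dir"

# Pretend an earlier build left an artifact in the cache.
touch "$cache_dir/previous-artifact"

# Before building, point the build's target/ at the cache.
# -s creates a symbolic link, -f replaces an existing one,
# -n treats an existing link to a directory as a file instead of following it.
ln -sfn "$cache_dir" "$build_dir/target"

# The fresh build directory now sees the earlier artifact.
ls "$build_dir/target"
```

The same link can be recreated before every temporary build, so that cargo finds the previously compiled dependencies instead of starting from an empty target directory.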
