Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Endless attempts to install pak when enabled #1577

Closed
MiguelRodo opened this issue Jul 21, 2023 · 10 comments · Fixed by #1620
Closed

Endless attempts to install pak when enabled #1577

MiguelRodo opened this issue Jul 21, 2023 · 10 comments · Fixed by #1620

Comments

@MiguelRodo
Copy link
Contributor

When I try to open R in a project that has used renv before on a new machine with RENV_CONFIG_PAK_ENABLED=true, pak is typically (not always) endlessly downloaded:

vscode@codespaces-bfa125:/workspaces/DataTidy23AppMpuFAUSTBCG$ radian
Installing package into ‘/workspaces/.local/.cache/R/renv/R-4.2/x86_64-pc-linux-gnu’
(as ‘lib’ is unspecified)
trying URL 'https://r-lib.github.io/p/pak/stable/source/linux-gnu/x86_64/src/contrib/../../../../../linux/x86_64/pak_0.5.1_R-4-2_x86_64-linux.tar.gz'
Content type 'application/gzip' length 8115818 bytes (7.7 MB)
==================================================
downloaded 7.7 MB

Installing package into ‘/workspaces/.local/.cache/R/renv/R-4.2/x86_64-pc-linux-gnu’
(as ‘lib’ is unspecified)
trying URL 'https://r-lib.github.io/p/pak/stable/source/linux-gnu/x86_64/src/contrib/../../../../../linux/x86_64/pak_0.5.1_R-4-2_x86_64-linux.tar.gz'
Content type 'application/gzip' length 8115818 bytes (7.7 MB)
==================================================
downloaded 7.7 MB

^C
Execution halted

The downloaded source packages are in
        ‘/tmp/RtmpQH5eud/downloaded_packages’
Error in loadNamespace(name) : there is no package called ‘pak’
Calls: source ... loadNamespace -> withRestarts -> withOneRestart -> doWithOneRestart
In addition: Warning message:
In utils::install.packages("pak", repos = renv_pak_repos(stream)) :
  installation of package ‘pak’ had non-zero exit status
Execution halted

The downloaded source packages are in
        ‘/tmp/Rtmp4d09f2/downloaded_packages’
Error in loadNamespace(name) : there is no package called ‘pak’
In addition: Warning message:
In utils::install.packages("pak", repos = renv_pak_repos(stream)) :
  installation of package ‘pak’ had non-zero exit status
R version 4.2.3 (2023-03-15) -- "Shortstop Beagle"
Platform: x86_64-pc-linux-gnu (64-bit)

The above will go on endlessly unless I force stop it.

This persists even though renv v1.0.0 and the nightly version of pak (well, I think - version 0.5.1.9000) were installed (the output below was generated from opening R in another non-renv directory (before attempting to open the directory that uses renv, I should add)):

R version 4.2.3 (2023-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] pak_0.5.1.9000

loaded via a namespace (and not attached):
[1] compiler_4.2.3 tools_4.2.3   

[ins] r$> library(renv)
Error in library(renv) : there is no package called ‘renv’

[ins] r$> install.packages("renv")
Installing package into ‘/workspaces/.local/lib/R’
(as ‘lib’ is unspecified)
trying URL 'https://packagemanager.posit.co/cran/__linux__/jammy/latest/src/contrib/renv_1.0.0.tar.gz'
Content type 'binary/octet-stream' length 2169477 bytes (2.1 MB)
==================================================
downloaded 2.1 MB

* installing *binary* package ‘renv’ ...
* DONE (renv)

The downloaded source packages are in
        ‘/tmp/Rtmp6IZl19/downloaded_packages’

I have tried to produce a minimal example (setting up R4.2.3 and R4.3.1 on two different Docker images each), but I couldn't reproduce the error, however. So, could you give me some pointers as to where to look?

@kevinushey
Copy link
Collaborator

You could try adding some code like the following in your .Rprofile, before the renv activate script is run:

setHook("renv::autoload", function(...) {
  trace(renv:::renv_pak_init, quote(print(rlang::trace_back())))
})

That should hopefully give you some hints as to where renv_pak_init() is being called. (This is the function that's responsible for installing pak in renv.)

@kevinushey
Copy link
Collaborator

pak is typically (not always) endlessly downloaded:

The fact that this only happens sometimes is especially surprising. I'm not sure what to make of that.

@MiguelRodo
Copy link
Contributor Author

Yes, that is exactly the sort of tip I was looking for! Thanks very much. When I've run it, I'll let you know what happened

@kevinushey
Copy link
Collaborator

@MiguelRodo, do you have any updates?

@MiguelRodo
Copy link
Contributor Author

Hi @kevinushey, sorry, I've been snowed in at work this week. I'll have a look in the next couple days

@MiguelRodo
Copy link
Contributor Author

MiguelRodo commented Aug 1, 2023

Hi @kevinushey. So, I've been able to have a look again at this, here's what I've found.

I tried running the code you suggested

setHook("renv::autoload", function(...) {
  trace(renv:::renv_pak_init, quote(print(rlang::trace_back())))
})

but in a new container it complained that rlang didn't exist. So, I googled a bit and it seemed like sys.calls() might be able to replace rlang::trace_back (and lose some formatting). I'm not too sure about it, but in any case I put this above source("renv/activate.R"):

setHook("renv::autoload", function(...) {
  trace(renv:::renv_pak_init, quote(print(sys.calls())))
})

A lot of output was printed out, and I'm not too familiar with the sink function but this seemed to save to file what was going to the terminal:

if (file.exists("r_output.txt")) {
  stop()
}
file.create("r_output.txt")
file_connection <- file("r_output.txt")

sink(file = file_connection)

if (file.exists("r_msg.txt")) {
  file.remove("r_msg.txt")
}
file.create("r_msg.txt")
file_connection_msg <- file("r_msg.txt")
sink(file = file_connection_msg)
sink(file = file_connection_msg, type = "message")

I put that above the code snippet you suggested in .Rprofile (i.e. the above was the first thing to run). Here are the two files created. I'm sure you can understand exactly why but the one is blank (r_output.txt) and the other has the actual output (r_output.txt):

r_msg.txt
r_output.txt

I think the call stack is printed right at the bottom of r_msg.txt. I'm not sure what to make of it.

One thing I did notice was that R is being repeatedly restarted, in between attempts I think to install pak. If I'm understanding correctly, this is suggested from two things:

  • The temporary directory keeps changing

You can see the download directory is different between each attempt:

```R
The downloaded source packages are in/tmp/Rtmptm8Ymm/downloaded_packagesError in loadNamespace(name) : there is no package calledpakCalls: source ... loadNamespace -> withRestarts -> withOneRestart -> doWithOneRestart
In addition: Warning message:
In utils::install.packages("pak", repos = renv_pak_repos(stream)) :
  installation of packagepakhad non-zero exit status
Execution halted

The downloaded source packages are in/tmp/RtmpQa8aMx/downloaded_packagesError in loadNamespace(name) : there is no package calledpakCalls: source ... loadNamespace -> withRestarts -> withOneRestart -> doWithOneRestart
In addition: Warning message:
In utils::install.packages("pak", repos = renv_pak_repos(stream)) :
  installation of packagepakhad non-zero exit status
Execution halted
The downloaded source packages are in/tmp/RtmpYQR1VB/downloaded_packagesError in loadNamespace(name) : there is no package calledpakCalls: source ... loadNamespace -> withRestarts -> withOneRestart -> doWithOneRestart
In addition: Warning message:
In utils::install.packages("pak", repos = renv_pak_repos(stream)) :
  installation of packagepakhad non-zero exit status
Execution halted
  • In the code snippet I wrote, initially I made it remove any old versions of r_output.txt and r_msg.txt. But I noticed these being repeatedly deleted, so I had to stop that.

I'm not sure, but it sounds like what might be happening is R starts, tries to install pak, does not install pak or installs it into a place it doesn't remember (or fails to install pak because it's trying to use pak to install pak), the R process stops and a new one starts that that finds pak not installed and then downloads it, fails to install it correctly and so repeats the cycle.

Note that the snippet I added stopped R restarting (as it used stop() whenever the files output was diverted to existed).

@MiguelRodo
Copy link
Contributor Author

MiguelRodo commented Aug 1, 2023

TLDR: The issue should only appear when a lockfile exists that specifies the BioConductor version, and pak is enabled but not already installed. I suspect that the recursing() inside renv::install is not working correctly, perhaps due to installation happening in a separate process to the renv::install call, leading to repeated install attempts.

Upon further digging:

  • In the call stack is renv_load_bioconductor, which is only called by renv:::load when there is a lockfile:
    if (length(lockfile)) {
        renv_load_r(project, lockfile$R)
        renv_load_python(project, lockfile$Python)
        renv_load_bioconductor(project, lockfile$Bioconductor)
    }

That would make sense of why I couldn't simply reproduce the issue by starting a new repo - there was no lockfile.

  • In renv_load_bioconductor, the function returns before running renv_bioconductor_init if the version of R is old (< 3.4) or if there is no BioConductor version:
if (getRversion() < "3.4") 
    return()
version <- bioconductor$Version
if (is.null(version)) 
    return()
renv_bioconductor_init()

This would also explain the flakiness, assuming the BioConductor version isn't set automatically simply based on the R version (which the presence of the if(is.null(version)) statement makes me think is true), as then this error only crops up when the lock file specifies that there are BioConductor packages.

  • As long you're using BiocManager (which I am), then renv_bioconductor_init will always do the call renv_bioconductor_init_biocmanager(library). So this step always happens.
  • In renv_bioconductor_init_biocmanager, if you're in a "clean" computer (i.e. no cache), then BiocManager will not be installed already and so install("BiocManager") will run. This has been my situation.
  • From there, renv::install calls renv_pak_init as long as the following is TRUE:
config$pak.enabled() && !recursing()

Obviously, pak needs to be enabled. so, recursing is checking if we're recursively calling install. I suppose this might be where the error is happening, as it's recursively calling install.

  • In any case, once you run renv_pak_init, it first checks through the stable, rc and devel pak streams to find the first one with version that exceeds the minimum required version. I don't know where the$ak_minver comes from, but oh well. One thing to note is that renv says that the nightly version of pak is required, but it starts by going through the stable version. That should work as long as it's correct about what pak_minver is and it's checking correctly. I note this in case the incorrect pak version is causing the bug.
  • From there, the function renv_pak_init_impl just installs utils::install.packages("pak", repos = renv_pak_repos(stream) and the namespace should then be loaded immediately afterwards using renv_namespace_load("pak"). So, I suppose it's possible that's what's going wrong is that when installing pak here, the .Rprofile is reloaded after downloading but before attempting the install (perhaps calling R as a system command, which would explain why the temp directory keeps changing), and so we go down this rabbit hole repeatedly because recursing() is not picking up the recursion. This might also be why the recursion is not being picked up - because a new R process is started each time to install the package. So there is no recursion from the function's "process-specific" view.
    So, if the recursing() code is correct and the issue is as above, then I guess you could do something like create a file in the R process calling renv::install that the any R process created to install the downloaded tar.gz can check if it exists so that it knows it's recursing (well, the process is recursing, not so much the function). I'm sure that's not the best plan (what if the install process doesn't "clean up"?), but the idea is to get something outside the R process that the process can refer to, to check if it's in a recursion. Maybe a parameter could work, but then I'm not sure how to selectively set that parameter.
  • To note, this is the actual error message:
Error in loadNamespace(name) : there is no package calledpakCalls: source ... loadNamespace -> withRestarts -> withOneRestart -> doWithOneRestart
In addition: Warning message:
In utils::install.packages("pak", repos = renv_pak_repos(stream)) :
  installation of packagepakhad non-zero exit status

That suggests perhaps that renv_namespace_load is the problem. Perhaps the library installed to is incorrect, and then it isn't picked up afterwards as the R process to install maybe ignores .libPaths() from the original R process. I do use the R_LIBS env var (which I set to /workspace or /workspaces in GitHub Codespaces or GitPod, as that directory persists between container rebuilds). So this would only crop up if utils::install.packages were using R --vanilla or something like that.

No need for a long reply, particularly if I've missed the plot! I suspect it's most likely the recursion detection not working correctly, fwiw.

@kevinushey
Copy link
Collaborator

@MiguelRodo, thank you so much for the in-depth investigation -- it's hugely helpful! Here's my understanding + hypothesis:

  1. When the R session is started, it loads the project .Rprofile, which implies loading renv. As part of this, renv will also try to install the BiocManager package, for projects which are using Bioconductor.

  2. This installation is done by R, through an invocation of the form R CMD INSTALL <...>.

  3. Because that R process is not started with --vanilla, it also tries to process user + site profiles.

  4. This implies that the R process responsible for installing the project will also attempt to load renv, as it sources the project .Rprofile.

  5. This puts us into a loop -- we effectively end up back at step 1. and once again try to install pak recursively.

I have a PR up at #1620 that I think will help. We ensure that the R process launched for installation uses --vanilla, so we avoid re-running user + site profiles that shouldn't be relevant.

@MiguelRodo
Copy link
Contributor Author

Thanks, Kevin! I tested it my side, and it solved the issue - thanks for fixing it so quickly!

@kevinushey
Copy link
Collaborator

Glad to hear it -- thanks for taking the time to verify!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants