Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

R improvements #93

Closed
lorenzwalthert opened this issue Sep 17, 2021 · 5 comments 路 Fixed by #94
Closed

R improvements #93

lorenzwalthert opened this issue Sep 17, 2021 · 5 comments 路 Fixed by #94

Comments

@lorenzwalthert
Copy link
Contributor

lorenzwalthert commented Sep 17, 2021

To my knowledge, I just completed the first ever pre-commit.ci run with R 馃帀 (https://results.pre-commit.ci/run/github/403327126/1631911523.jRvhF2_RSR-hbEgpnbu7DA). There are a few things to improve though:

  • Although I suggested to --without-recommended-packages, I think we should not do that because some recommended packages are hard to build from source (when initializing the environment) and have system dependencies, and it won't add much to the size of the image to include it.
  • I am still wondering if we can make the global renv cache work under /opt/R. Unit tests should cover this, so I might submit a PR.
  • After 120s, it seems like the installation of the environment is timed out. While this seems to make sense for most repos that host only one hook, mine hosts a lot of hooks and hence installation takes a bit longer (but is cached thanks to your work). There are two major R repos from which to get packages, and both only support binaries for either Linux or Windows/macOS. I assume most users will not use Linux, so I prefer to have binary R package install for them locally, which means Linux package installs will be from source. Hence, I propose to increase the timeout limit. Not sure what's needed here for my hooks, should we try to double it? The reason I managed to complete a run was because I temporarily switched to binaries for Linux.
  • I anticipate some problems with missing system dependencies as some hooks have additional_dependencies:. In particular, my {roxygen2} hook requires the user to specify all runtime dependencies of his package as additional_dependencies: because {roxygen2} requires them (to generate the docs). There are tools in R to install these system dependencies, but they rely on apt and hence don't put the installed packages into /opt/R. We can also postpone this for now I think.
@asottile
Copy link
Member

  • Although I suggested to --without-recommended-packages, I think we should not do that because some recommended packages are hard to build from source (when initializing the environment) and have system dependencies, and it won't add much to the size of the image to include it.

If there's specific ones to add that seems fine, building all of them doesn't seem productive though (for instance I don't think a linter will need generalized fortran extensions, hopefully).

  • I am still wondering if we can make the global renv cache work under /opt/R. Unit tests should cover this, so I might submit a PR.

The only persistent state which is kept between runs is inside the environment directory. Additionally runs are run as an unprivileged user which does not have write permission to the /opt directories (again, intentionally). As such, I don't think that enabling /opt/R as a cache helps or is possible.

  • After 120s, it seems like the installation of the environment is timed out. While this seems to make sense for most repos that host only one hook, mine hosts a lot of hooks and hence installation takes a bit longer (but is cached thanks to your work). There are two major R repos from which to get packages, and both only support binaries for either Linux or Windows/macOS. I assume most users will not use Linux, so I prefer to have binary R package install for them locally, which means Linux package installs will be from source. Hence, I propose to increase the timeout limit. Not sure what's needed here for my hooks, should we try to double it? The reason I managed to complete a run was because I temporarily switched to binaries for Linux.

I don't really want to bump this up, it's already quite long and any increase here significantly increases my risk from a cost and abuse perspective.

  • I anticipate some problems with missing system dependencies as some hooks have additional_dependencies:. In particular, my {roxygen2} hook requires the user to specify all runtime dependencies of his package as additional_dependencies: because {roxygen2} requires them (to generate the docs). There are tools in R to install these system dependencies, but they rely on apt and hence don't put the installed packages into /opt/R. We can also postpone this for now I think.

yeahhhh that's not really going to work so well. those'll probably just be unsupported (?). It seems a little bit weird to build docs as part of pre-commit -- I'd usually leave that to a testing phase

@lorenzwalthert
Copy link
Contributor Author

Ok, thanks for the explanations. I found ways to work around these limitations except for the system dependencies.

@lorenzwalthert
Copy link
Contributor Author

Except one problem:

failed to create directory at path '/tmp/renv'

Traceback (most recent calls last):
19: suppressWarnings({
        old <- setwd("/pc/clone/dynByPtYQnevuTTk-fDJ5w/renv-default")
        source("renv/activate.R")
        setwd(old)
        renv::load("/pc/clone/dynByPtYQnevuTTk-fDJ5w/renv-default")
    })
18: withCallingHandlers(expr, warning = function(w) if (inherits(w, 
        classes)) tryInvokeRestart("muffleWarning"))
17: source("renv/activate.R")
16: withVisible(eval(ei, envir))
15: eval(ei, envir)
14: eval(ei, envir)
13: local(...)
12: eval.parent(substitute(eval(quote(expr), envir)))
11: eval(expr, p)
10: eval(expr, p)
 9: eval(quote(...), new.env())
 8: eval(quote(...), new.env())
 7: renv_bootstrap_load(project, libpath, version)
 6: renv::load(project)
 5: renv_load_project(project)
 4: ensure_parent_directory(projects)
 3: ensure_directory(unique(dirname(path)))
 2: stopf("failed to create directory at path '%s'", rownames(info))
 1: stop(sprintf(fmt, ...), call. = call.)

E.g. in https://results.pre-commit.ci/run/github/186313720/1632590538.J-B6BzAhSkSCbs91JsIBUQ. This pops up randomly I feel, i.e. not in every run. You set RENV_PATHS_ROOT in the Dockerfile so the root of the cache will be there. And then renv probably wants to use it (I guess not a problem because during running hooks it does not want to install packages) or at least ensure the directory exists.

I think we should rather disable the cache altogether with RENV_CONFIG_CACHE_ENABLED=False as per the docs (either R option renv.config.cache.enabled or in the above form as an env variable). I can file a PR to that effect if you want.

@asottile
Copy link
Member

I wonder if that's renv not being multiprocess safe (?) does it happen with require_serial: true

disabling the cache also seems fine 馃憤

@lorenzwalthert
Copy link
Contributor Author

lorenzwalthert commented Sep 26, 2021

Seems like you are right, I need to further investigate thread/process safety with {renv}, as I can't create a run with require_serial: True that shows the problematic behavior. Not sure but maybe #94 can resolve the problem anyways, at the very least I don't think it makes anything worse, so I suggest to merge it and see if the problem persists.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging a pull request may close this issue.

2 participants