setup-r-dependencies action always reinstalls site-packages if no cache is available #814

Closed
pascalgulikers opened this issue Mar 31, 2024 · 31 comments
Labels
bug an unexpected problem or unintended behavior

Comments

@pascalgulikers

Describe the bug
setup-r-dependencies calls pak::lockfile_create() without a lib argument. It therefore defaults to .Library, which means only the installed base packages are detected; all (pre-)installed site packages (.Library.site) are ignored and reinstalled.
See:

pak::lockfile_create(

To Reproduce
We're using nightly-built base images for GitHub Actions with the most-used R packages already included, so we can speed up the install-dependencies step in our workflows. Because the cache is stored on the runner and not in the base image itself, it is only used when you rerun the same workflow/branch. Initial runs have no cache, so the installed site packages should be checked; this does not happen because of the finding above (.Library only contains R base packages).

Continuous-Integration:
    runs-on: ubuntu-latest
    container:
      image: ***.dkr.ecr.eu-central-1.amazonaws.com/base-images/builder-r-latest:latest

Expected behavior
Installed site packages should be checked. This does not happen because of the finding above (.Library only contains R base packages). Therefore all site packages are downloaded from remote repositories like CRAN or RSPM.

Additional context
A possible workaround is to install the extra packages in the base images into the .Library folder (in our case /usr/local/lib/R/library) instead of the .Library.site folder (/usr/local/lib/R/site-library).
A nicer solution is to specify .Library.site for the lib parameter of the pak::lockfile_create() call here, like so:

pak::lockfile_create(
  c(deps, extra_deps),
  lockfile = ".github/pkg.lock",
  lib = .Library.site,
  upgrade = (${{ inputs.upgrade }}),
  dependencies = c(needs, (${{ inputs.dependencies }}))
)

Relevant (tried) environment variables:
R_LIBS_USER="/usr/local/lib/R/site-library"
R_LIBS_SITE="/usr/local/lib/R/site-library"
R_LIB_FOR_PAK="/usr/local/lib/R/site-library"
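To confirm what is actually pre-installed in each library directory of the image, counting package directories can help before pak is involved at all. This is only a sketch: `count_pkgs` is a made-up helper, it assumes an installed package is a directory containing a `DESCRIPTION` file, and the demo runs against temporary directories rather than the real /usr/local/lib/R paths.

```shell
# count_pkgs is a hypothetical helper, not part of any action or of pak.
count_pkgs() {
  # $1 = library directory; counts subdirectories that contain a DESCRIPTION file
  n=0
  for d in "$1"/*/; do
    if [ -f "${d}DESCRIPTION" ]; then
      n=$(( n + 1 ))
    fi
  done
  echo "$n"
}

# Demo layout standing in for .Library and .Library.site
demo="$(mktemp -d)"
for p in base stats; do
  mkdir -p "$demo/library/$p" && : > "$demo/library/$p/DESCRIPTION"
done
for p in yaml dplyr cli; do
  mkdir -p "$demo/site-library/$p" && : > "$demo/site-library/$p/DESCRIPTION"
done
mkdir -p "$demo/site-library/_cache"   # not a package: no DESCRIPTION inside

base_count="$(count_pkgs "$demo/library")"
site_count="$(count_pkgs "$demo/site-library")"
echo "library: $base_count, site-library: $site_count"
rm -rf "$demo"
```

On the real image the same helper could be pointed at /usr/local/lib/R/library and /usr/local/lib/R/site-library to see which of the two holds the pre-installed packages.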

@pascalgulikers pascalgulikers added the bug an unexpected problem or unintended behavior label Mar 31, 2024
@gaborcsardi
Member

Therefor it defaults to .Library

It defaults to whatever you set as your primary library. E.g. if you have a user library, that is used:
https://github.com/r-lib/actions/actions/runs/8486269828/job/23252419924#step:5:2599

Installed site-packages should be checked,

That is the expected behavior, pak does not consider other libraries, only the library it is installing to and the base and recommended packages in .Library.

Whether this is always the desired behavior is a good question, possibly not.

@pascalgulikers
Author

pascalgulikers commented Mar 31, 2024

Since no lib parameter is being specified here, it will always be NULL which means:

Package library to install the packages to. Note that all dependent packages will be installed here, even if they are already installed in another library. The only exceptions are base and recommended packages installed in .Library. These are not duplicated in lib, unless a newer version of a recommended package is needed.

As specified here: https://pak.r-lib.org/reference/lockfile_create.html

Unless you're using caching (actions/cache@v4) and a cache file is found.

@pascalgulikers
Author

pascalgulikers commented Apr 2, 2024

I've double checked this, and I can't get it to work. I.e. installed packages are always being ignored, even if they're installed in .Library next to the base packages. This is because the lib parameter is NULL.

Example with the pre-installed yaml package:

ls -al /usr/local/lib/R/library
drwxr-xr-x  8 root root  4096 Apr  2 02:32 yaml

R
# Dependency resolution
  cat("::group::Dependency resolution\n")
  cat("os-version=", sessionInfo()$running, "\n", file = Sys.getenv("GITHUB_OUTPUT"), sep = "", append = TRUE)
  r_version <-
    if (grepl("development", R.version.string)) {
      pdf(tempfile())
      ge_ver <- attr(recordPlot(), "engineVersion")
      dev.off()
      paste0("R version ", getRversion(), " (ge:", ge_ver, "; iid:", .Internal(internalsID()), ")")
    } else {
      R.version.string
    }
  cat("r-version=", r_version, "\n", file = Sys.getenv("GITHUB_OUTPUT"), sep = "", append = TRUE)
  needs <- sprintf("Config/Needs/%s", strsplit("check", "[[:space:],]+")[[1]])
  deps <- strsplit("deps::., any::sessioninfo", "[[:space:],]+")[[1]]
  extra_deps <- strsplit("", "[[:space:],]+")[[1]]
  dir.create(".github", showWarnings=FALSE)
  Sys.setenv("PKGCACHE_HTTP_VERSION" = "2")
  library(pak, lib.loc = Sys.getenv("R_LIB_FOR_PAK"))
  pak::lockfile_create(
    c(deps, extra_deps),
    lockfile = ".github/pkg.lock",
    upgrade = (FALSE),
    dependencies = c(needs, ("all"))
  )
  cat("::endgroup::\n")
  cat("::group::Show Lockfile\n")
  writeLines(readLines(".github/pkg.lock"))
  cat("::endgroup::\n")

The result for the yaml package is:

{
      "ref": "yaml",
      "package": "yaml",
      "version": "2.3.8",
      "type": "standard",
      "direct": false,
      "binary": false,
      "dependencies": [],
      "vignettes": false,
      "needscompilation": true,
      "metadata": {
        "RemoteType": "standard",
        "RemotePkgRef": "yaml",
        "RemoteRef": "yaml",
        "RemoteRepos": "https://packagemanager.rstudio.com/all/latest",
        "RemotePkgPlatform": "source",
        "RemoteSha": "2.3.8"
      },
      "sources": ["https://packagemanager.rstudio.com/all/latest/src/contrib/yaml_2.3.8.tar.gz"],
      "target": "src/contrib/yaml_2.3.8.tar.gz",
      "platform": "source",
      "rversion": "*",
      "directpkg": false,
      "license": "BSD_3_clause + file LICENSE",
      "sha256": "9ed079e2159cae214f3fefcbc4c8eb3b888ceabe902350adbdb1d181eda23fd8",
      "filesize": 94764,
      "dep_types": ["Depends", "Imports", "LinkingTo"],
      "params": [],
      "install_args": "",
      "repotype": "cranlike",
      "sysreqs": "",
      "sysreqs_packages": {}
    }

Which means it's going to be reinstalled.

When I change the pak::lockfile_create() call to:

pak::lockfile_create(
    c(deps, extra_deps),
    lockfile = ".github/pkg.lock",
    lib = .Library.site,
    upgrade = (FALSE),
    dependencies = c(needs, ("all"))
  )

the result for the yaml package is:

{
      "ref": "installed::/usr/local/lib/R/library/yaml",
      "package": "yaml",
      "version": "2.3.8",
      "type": "installed",
      "direct": false,
      "binary": true,
      "dependencies": [],
      "vignettes": false,
      "needscompilation": true,
      "metadata": {
        "RemoteType": "installed",
        "RemotePkgRef": "installed::/usr/local/lib/R/library/yaml",
        "RemotePkgPlatform": "x86_64-pc-linux-gnu",
        "RemoteSha": "2.3.8"
      },
      "sources": [],
      "target": "src/contrib/yaml_2.3.8.tar.gz",
      "platform": "x86_64-pc-linux-gnu",
      "rversion": "R 4.3.3",
      "built": "R 4.3.3; x86_64-pc-linux-gnu; 2024-04-01 23:54:30 UTC; unix",
      "directpkg": false,
      "license": "BSD_3_clause + file LICENSE",
      "dep_types": ["Depends", "Imports", "LinkingTo"],
      "params": [],
      "install_args": "",
      "sysreqs_packages": {}
    }
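The difference between the two results above is the `type` field ("standard" versus "installed"), so counting those fields gives a quick estimate of how much a lockfile would fetch. A minimal sketch, assuming the lockfile keeps the `"type": "..."` spelling shown above; the sample file is a trimmed, illustrative stand-in for `.github/pkg.lock`, and `count_type` is a made-up helper.

```shell
# Count lockfile entries by "type". The sample is illustrative, not a full
# lockfile; only the fields used here are included.
lock="$(mktemp)"
cat > "$lock" <<'EOF'
{
  "packages": [
    { "ref": "installed::/usr/local/lib/R/library/yaml", "package": "yaml", "type": "installed" },
    { "ref": "sessioninfo", "package": "sessioninfo", "type": "standard" },
    { "ref": "cli", "package": "cli", "type": "standard" }
  ]
}
EOF

count_type() {
  # $1 = lockfile path, $2 = type value to count.
  # The pattern is case-sensitive, so "RemoteType" fields are not counted.
  grep -o "\"type\": *\"$2\"" "$1" | wc -l | tr -d ' '
}

installed_count="$(count_type "$lock" installed)"
standard_count="$(count_type "$lock" standard)"
echo "installed=$installed_count standard=$standard_count"
rm -f "$lock"
```

Run against the real lockfile in a workflow step, a high "standard" count on a pre-populated image would point at the detection problem described here.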

@gaborcsardi
Member

As I said above:

That is the expected behavior, pak does not consider other libraries, only the library it is installing to and the base and recommended packages in .Library.

I.e. it only considers the base and recommended packages in .Library.

@pascalgulikers
Author

I see, so there's no way to use prebuilt base images to speed up deployments?
I've made a PR though: #815

@gaborcsardi
Member

I see, so there's no way to use prebuilt base images to speed up deployments?

You need to install them into the same library that pak uses to install packages to.

@pascalgulikers
Author

pascalgulikers commented Apr 2, 2024

I've tried that, but they're still being ignored and are being reinstalled.

Because the pak::lockfile_create() call isn't using a lib parameter and so it's NULL, the lockfile being created only contains upstream package sources and isn't detecting the ones already installed (except for the base and recommended packages in .Library).
Also see: https://github.com/r-lib/pak/blob/main/R/lockfile.R#L39-L43

Later on pak::lockfile_install() uses this lockfile (pak::lockfile_install(".github/pkg.lock")), so all packages are being reinstalled from RSPM/CRAN

@pascalgulikers
Author

I'm losing my hair here. When I pull the base image locally (Docker Desktop) and run the scripts inside the container, the preinstalled packages seem to be detected. When I do the same in a GitHub workflow, they aren't.

  print(Sys.getenv("R_LIB_FOR_PAK"))
  print(Sys.getenv("R_LIBS_SITE"))
  print(Sys.getenv("R_LIBS_USER"))
  print(.libPaths())
  lib <- .libPaths()[1]
  config <- list(library = lib)
  print(config)

produces:

[1] "/usr/local/lib/R/site-library"
[1] "/usr/local/lib/R/site-library"
[1] "/usr/local/lib/R/site-library"
[1] "/usr/local/lib/R/site-library" "/usr/local/lib/R/library"     
$library
[1] "/usr/local/lib/R/site-library"

Nevertheless in GitHub actions:

Install/update packages
  ℹ Installing lockfile '.github/pkg.lock'
   
  → Will install **204** packages.
  → Will download **204** packages with unknown size.

Which takes forever.

Locally (Docker Desktop) in a pulled image (same base image, same DESCRIPTION file):

::group::Install/update packages
ℹ Installing lockfile .github/pkg.lock
 
→ Will install **9** packages.
→ Will download **9** packages with unknown size.

The /usr/local/lib/R/site-library folder contains all the preinstalled packages.
The /usr/local/lib/R/site-library/_cache folder contains .lock files for all preinstalled packages.

It looks like GitHub Actions does not behave the same way when you run actions in a pulled image (a container on top of the runner).
https://docs.github.com/en/enterprise-cloud@latest/actions/using-jobs/running-jobs-in-a-container

Or is it something with pkgcache/cache user dir that is different for a GitHub actor? How can I check?

@pascalgulikers
Author

pascalgulikers commented Apr 2, 2024

In pulled image in Docker:

whoami
root
echo $HOME
/root

In GitHub actions container:

whoami
root
echo $HOME
/github/home

So could the GH action be looking in ~/.cache which is different in both cases?
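The two HOME values above are enough to show why a `~/.cache` lookup would land in different places. A small sketch, assuming the cache follows the usual XDG convention (XDG_CACHE_HOME if set, otherwise $HOME/.cache); the exact subdirectory pkgcache uses is not shown in this thread, and `cache_root_for` is a made-up helper.

```shell
# Show where a "~/.cache"-style lookup lands for a given HOME.
# Assumption: XDG convention; cache_root_for is hypothetical.
cache_root_for() {
  # $1 = HOME value, $2 = XDG_CACHE_HOME value (may be empty)
  if [ -n "$2" ]; then
    printf '%s\n' "$2"
  else
    printf '%s\n' "$1/.cache"
  fi
}

docker_cache="$(cache_root_for /root "")"
gha_cache="$(cache_root_for /github/home "")"
echo "docker: $docker_cache"
echo "gha:    $gha_cache"
```

If that assumption holds, a cache populated at /root/.cache while building the image would simply not be visible to a process whose HOME is /github/home.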

@pascalgulikers
Author

What I did now is:

mkdir -p /github/home/.cache && cp -R /root/.cache/* /github/home/.cache

Resulting in:

✔ Cached copy of wk 0.9.1 (source) is the latest build
  ✔ Cached copy of xfun 0.43 (source) is the latest build
  ✔ Cached copy of xml2 1.3.6 (source) is the latest build
  ✔ Cached copy of xtable 1.8-4 (source) is the latest build
  ✔ Cached copy of yaml 2.3.8 (source) is the latest build
  ✔ Cached copy of zip 2.3.1 (source) is the latest build
  ✔ Cached copy of sessioninfo 1.2.2 (source) is the latest build

Of course this is not a really nice solution; I would still prefer pak::lockfile_create(lib = Sys.getenv("R_LIB_FOR_PAK")) as suggested in my PR. That way both R_LIB_FOR_PAK and pkgcache will be checked for the dependency plan.
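For anyone repeating the copy workaround above, a slightly more defensive variant might look like the following. It is a sketch only: `seed_cache` is a made-up name, and the demo uses temporary directories in place of /root/.cache and /github/home/.cache.

```shell
# Seed one cache directory from another, skipping the copy when there is
# nothing to do. seed_cache is a hypothetical helper.
seed_cache() {
  # $1 = source cache dir, $2 = destination cache dir
  src="$1"
  dst="$2"
  if [ ! -d "$src" ]; then
    echo "no cache at $src, nothing to copy"
    return 0
  fi
  if [ "$src" = "$dst" ]; then
    echo "source and destination are the same"
    return 0
  fi
  mkdir -p "$dst"
  # "src/." copies the directory contents including dotfiles, unlike "src/*"
  cp -R "$src/." "$dst/"
}

# Demo against temporary directories
demo="$(mktemp -d)"
mkdir -p "$demo/root/.cache/R"
echo "x" > "$demo/root/.cache/R/pkg.tar.gz"
seed_cache "$demo/root/.cache" "$demo/github/home/.cache"
copied="$(ls "$demo/github/home/.cache/R")"
echo "copied: $copied"
rm -rf "$demo"
```

In a workflow step the call would be `seed_cache /root/.cache "$HOME/.cache"`, which is a no-op when HOME is already /root.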

@gaborcsardi
Member

Wait, you want to put packages into the pkgcache cache? I thought you wanted to pre-install them into the site library.

In any case, if you have something that works, that's great.

This is the issue that tracks being able to use multiple libraries: r-lib/pkgdepends#189

@pascalgulikers
Author

No, I don't want to put them in pkgcache per se, but if they're not in there, they are being reinstalled and the site library packages are being ignored. You said they shouldn't be ignored, but they are.

Maybe not on a standard GitHub runner, but on a GitHub runner which uses a container on top, they are being ignored, no matter what I do or try. That's basically what this bug is about.

@pascalgulikers
Author

I'm sorry, I can't provide you with an example as our code base and base images are private.
But the setup-r-dependencies action is not working for us without the lib parameter in the lockfile_create() call.
The lockfile being created contains only upstream urls except for the base and recommended packages.

If I use lockfile_create(lib = Sys.getenv("R_LIB_FOR_PAK")), the lockfile marks already installed packages as installed.

For example:
lockfile_create() without the lib-parameter:

{
      "ref": "yaml",
      "package": "yaml",
      "version": "2.3.8",
      "type": "standard",
      "direct": false,
      "binary": false,
      "dependencies": [],
      "vignettes": false,
      "needscompilation": true,
      "metadata": {
        "RemoteType": "standard",
        "RemotePkgRef": "yaml",
        "RemoteRef": "yaml",
        "RemoteRepos": "https://packagemanager.rstudio.com/all/latest",
        "RemotePkgPlatform": "source",
        "RemoteSha": "2.3.8"
      },
      "sources": ["https://packagemanager.rstudio.com/all/latest/src/contrib/yaml_2.3.8.tar.gz"],
      "target": "src/contrib/yaml_2.3.8.tar.gz",
      "platform": "source",
      "rversion": "*",
      "directpkg": false,
      "license": "BSD_3_clause + file LICENSE",
      "sha256": "9ed079e2159cae214f3fefcbc4c8eb3b888ceabe902350adbdb1d181eda23fd8",
      "filesize": 94764,
      "dep_types": ["Depends", "Imports", "LinkingTo"],
      "params": [],
      "install_args": "",
      "repotype": "cranlike",
      "sysreqs": "",
      "sysreqs_packages": {}
    }

And with the lib-parameter set at Sys.getenv("R_LIB_FOR_PAK"), the result is:

{
      "ref": "installed::/usr/local/lib/R/library/yaml",
      "package": "yaml",
      "version": "2.3.8",
      "type": "installed",
      "direct": false,
      "binary": true,
      "dependencies": [],
      "vignettes": false,
      "needscompilation": true,
      "metadata": {
        "RemoteType": "installed",
        "RemotePkgRef": "installed::/usr/local/lib/R/library/yaml",
        "RemotePkgPlatform": "x86_64-pc-linux-gnu",
        "RemoteSha": "2.3.8"
      },
      "sources": [],
      "target": "src/contrib/yaml_2.3.8.tar.gz",
      "platform": "x86_64-pc-linux-gnu",
      "rversion": "R 4.3.3",
      "built": "R 4.3.3; x86_64-pc-linux-gnu; 2024-04-01 23:54:30 UTC; unix",
      "directpkg": false,
      "license": "BSD_3_clause + file LICENSE",
      "dep_types": ["Depends", "Imports", "LinkingTo"],
      "params": [],
      "install_args": "",
      "sysreqs_packages": {}
    }

Although this shouldn't be a problem, since lockfile_install() should detect already installed packages in the target destination (in my case R_LIB_FOR_PAK = /usr/local/lib/R/site-library), it doesn't and reinstalls all packages anyway.

So basically there are two bugs from my point of view:

  1. lockfile_create() should detect already installed packages in the destination dir (by specifying the lib parameter for this function call).
  2. lockfile_install() should detect already installed packages in the destination dir, but it follows the lockfile's "sources" spec, downloads them and installs them. This looks like a missing pkgcache, because the GitHub Actions runner uses a different home directory than the builder that created the base images.

@gaborcsardi
Member

installed packages in the target destination (in my case R_LIB_FOR_PAK = /usr/local/lib/R/site-library)

That's not where packages are installed by default. That's the library where pak is installed, to make it separate from the user's default library.

pak installs packages to .libPaths()[1], which is the user's library usually, but it might be something different on your container.

Also, lockfile_create() does not decide where packages are installed, lockfile_install() does.

@gaborcsardi
Member

lockfile_create() should detect already installed packages in the destination dir (by specifying the lib parameter for this function call).

A lockfile is supposed to contain all dependencies of the project.

lockfile_install() should detect already installed packages in the destination dir, but it follows the lockfile's "sources" spec, downloads them and installs them. This looks like a missing pkgcache, because the GitHub Actions runner uses a different home directory than the builder that created the base images.

lockfile_install() needs to follow the lockfile's spec, that's the whole point of a lockfile. lockfile_install() will not reinstall packages that are already installed if the installed versions match the specification.

Here is an example:

FROM rhub/r-minimal

RUN installr -c

RUN echo -e 'Package: test\nVersion: 1.0.0\nImports: dplyr' \
    > DESCRIPTION

RUN R -e 'source("https://pak.r-lib.org/install.R")'

RUN R -e 'pak::lockfile_create()'

RUN R -e 'pak::lockfile_install()'

# ----------------------------------------------------------------------------
# SAVE Dockerfile here
# ----------------------------------------------------------------------------

# Add data.table as well
RUN echo -e 'Package: test\nVersion: 1.0.0\nImports: dplyr, data.table' \
    > DESCRIPTION

RUN R -e 'pak::lockfile_create()'

RUN R -e 'pak::lockfile_install()'

First we install dplyr, and then use the resulting image to add an extra data.table package to it. In real life the second part would happen in GitHub Actions of course. If you build this, you'll see that the second installation only adds the missing data.table package:

...
#12 0.307 > pak::lockfile_install()
#12 1.049 ℹ Installing lockfile 'pkg.lock'
#12 1.225
#12 1.226 → Will install 2 packages.
#12 1.247 → Will download 1 CRAN package (5.39 MB).
#12 1.285 → Will download 1 package with unknown size.
#12 1.290 + data.table   1.15.4 [bld][cmp][dl] (5.39 MB)
#12 1.292
#12 1.306 ℹ Getting 1 pkg (5.39 MB)
#12 6.761 ✔ Got data.table 1.15.4 (source) (5.39 MB)
#12 6.813 ℹ Building data.table 1.15.4
#12 17.58 ✔ Built data.table 1.15.4 (10.3s)
#12 17.70 ✔ Installed data.table 1.15.4  (60ms)
#12 17.73 ✔ 17 deps: kept 16, added 1, dld 1 (5.39 MB) [17.4s]
#12 17.74 ✔ Installed lockfile 'pkg.lock'
...

@pascalgulikers
Author

Could this be due to the fact that we're using the good old install.packages() for installing the site packages in the base image? This is because we're only installing packages, not a package based on a DESCRIPTION file.

@pascalgulikers
Author

pascalgulikers commented Apr 3, 2024

Actually this is not true, as we're using:

RUN R -e 'update.packages(lib.loc = Sys.getenv("R_LIBS_SITE"), ask=FALSE)' && \
    R -e 'install.packages("pak", repos = sprintf("https://r-lib.github.io/p/pak/stable/%s/%s/%s", .Platform$pkgType, R.Version()$os, R.Version()$arch))' && \
    R -e 'pak::pak_cleanup(force = TRUE)' && \
    R -e 'pak::pkg_install(unlist(strsplit(Sys.getenv("LANGUAGE_LIBS"), ",")), lib = Sys.getenv("R_LIBS_SITE"), upgrade = TRUE)'

in the base image

@gaborcsardi
Member

Even if you are using install.packages(), e.g. as in

FROM rhub/r-minimal

RUN installr -c

RUN R -q -e 'install.packages("dplyr", repos = "https://cloud.r-project.org")'

# ----------------------------------------------------------------------------
# SAVE Dockerfile here
# ----------------------------------------------------------------------------

RUN R -e 'source("https://pak.r-lib.org/install.R")'

# Add data.table as well
RUN echo -e 'Package: test\nVersion: 1.0.0\nImports: dplyr, data.table' \
    > DESCRIPTION

RUN R -e 'pak::lockfile_create()'

RUN R -e 'pak::lockfile_install()'

you'll get

...
#10 0.244 > pak::lockfile_install()
#10 1.014 ℹ Installing lockfile 'pkg.lock'
#10 1.192
#10 1.193 → Will install 2 packages.
#10 1.214 → Will download 1 CRAN package (5.39 MB).
#10 1.251 → Will download 1 package with unknown size.
#10 1.256 + data.table   1.15.4 [bld][cmp][dl] (5.39 MB)
#10 1.257
#10 1.271 ℹ Getting 1 pkg (5.39 MB)
#10 2.354 ✔ Got data.table 1.15.4 (source) (5.39 MB)
#10 2.406 ℹ Building data.table 1.15.4
#10 13.11 ✔ Built data.table 1.15.4 (10.2s)
#10 13.25 ✔ Installed data.table 1.15.4  (63ms)
#10 13.27 ✔ 17 deps: kept 16, added 1, dld 1 (5.39 MB) [13s]
#10 13.29 ✔ Installed lockfile 'pkg.lock'
...

@pascalgulikers
Author

This could be related to using Artifactory as package manager.
If I use RSPM, this is registered in the lockfile for the yaml package:

    {
      "ref": "yaml",
      "package": "yaml",
      "version": "2.3.8",
      "type": "standard",
      "direct": false,
      "binary": false,
      "dependencies": [],
      "vignettes": false,
      "needscompilation": true,
      "metadata": {
        "RemoteType": "standard",
        "RemotePkgRef": "yaml",
        "RemoteRef": "yaml",
        "RemoteRepos": "https://packagemanager.rstudio.com/all/latest",
        "RemotePkgPlatform": "source",
        "RemoteSha": "2.3.8"
      },
      "sources": ["https://packagemanager.rstudio.com/all/latest/src/contrib/yaml_2.3.8.tar.gz"],
      "target": "src/contrib/yaml_2.3.8.tar.gz",
      "platform": "source",
      "rversion": "*",
      "directpkg": false,
      "license": "BSD_3_clause + file LICENSE",
      "sha256": "9ed079e2159cae214f3fefcbc4c8eb3b888ceabe902350adbdb1d181eda23fd8",
      "filesize": 94764,
      "dep_types": ["Depends", "Imports", "LinkingTo"],
      "params": [],
      "install_args": "",
      "repotype": "cranlike",
      "sysreqs": "",
      "sysreqs_packages": {}
    },

If I use our Artifactory setup for package sources, it's:

{
        "ref": "yaml",
        "package": "yaml",
        "version": "2.3.8",
        "type": "standard",
        "direct": false,
        "binary": false,
        "dependencies": [],
        "vignettes": false,
        "needscompilation": true,
        "metadata": {
          "RemoteType": "standard",
          "RemotePkgRef": "yaml",
          "RemoteRef": "yaml",
          "RemoteRepos": "***.jfrog.io/artifactory/cran-remote/",
          "RemotePkgPlatform": "source",
          "RemoteSha": "2.3.8"
        },
        "sources": ["***.jfrog.io/artifactory/cran-remote//src/contrib/yaml_2.3.8.tar.gz"],
        "target": "src/contrib/yaml_2.3.8.tar.gz",
        "platform": "source",
        "rversion": "*",
        "directpkg": false,
        "license": "BSD_3_clause + file LICENSE",
        "dep_types": ["Depends", "Imports", "LinkingTo"],
        "params": [],
        "install_args": "",
        "repotype": "cranlike",
        "sysreqs_packages": {}
      },

As you can see, these fields are missing from the lockfile entry when Artifactory is the source:
sha256
filesize
sysreqs

Resulting in:

ℹ Installing lockfile .github/pkg.lock
 
→ Will install 163 packages.
→ Will download 163 packages with unknown size.
*truncated*
+ yaml                2.3.8    [bld][cmp][dl]
*truncated*
✔ Cached copy of yaml 2.3.8 (source) is the latest build

So it's using pkgcache as a fallback.
But GitHub Actions is using another home directory for the root user (/github/home instead of /root), so no .cache folder is present and it will download and install all packages.

I'm kind of stuck; implementing my PR (#815) would really be helpful.
With it, the packages are marked as installed during lockfile_create(), so lockfile_install() doesn't have to do a SHA comparison (which in our case will always fail). Currently the failed comparison leads to a pkgcache lookup, which fails because the .cache folder isn't available in the active home directory, so all site packages are reinstalled.

@gaborcsardi
Member

If you are using custom repositories, then the important thing is that you use the same exact repositories for the pre-installation and the installation.

E.g. if you pre-install yaml from artifactory, but then use CRAN or PPM only for the installation, then pak will see that there is a yaml package that was installed from artifactory, but according to the lockfile it should be installed from CRAN/PPM, so it will (re)install it from there.
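One way to check which repositories a lockfile actually points at is to list the distinct `RemoteRepos` values it contains. A minimal sketch: the sample lines are illustrative, the `artifactory.example.invalid` host is a placeholder rather than a real repository, and `lock_repos` is a made-up helper.

```shell
# List the distinct RemoteRepos values found in a lockfile.
lock="$(mktemp)"
cat > "$lock" <<'EOF'
{ "package": "yaml",  "metadata": { "RemoteRepos": "https://packagemanager.rstudio.com/all/latest" } }
{ "package": "cli",   "metadata": { "RemoteRepos": "https://packagemanager.rstudio.com/all/latest" } }
{ "package": "dplyr", "metadata": { "RemoteRepos": "https://artifactory.example.invalid/cran-remote/" } }
EOF

lock_repos() {
  # $1 = lockfile path; prints each repository URL once
  grep -o '"RemoteRepos": *"[^"]*"' "$1" \
    | sed -e 's/^"RemoteRepos": *"//' -e 's/"$//' \
    | LC_ALL=C sort -u
}

repos="$(lock_repos "$lock")"
repo_count="$(printf '%s\n' "$repos" | wc -l | tr -d ' ')"
printf '%s\n' "$repos"
echo "distinct repositories: $repo_count"
rm -f "$lock"
```

Comparing this list with the repositories used when building the base image would show whether the two steps really use the same sources.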

@gaborcsardi
Member

I'm kind of stuck, implementing my PR would really be helpful.

Unfortunately that PR breaks other things, so I cannot merge it as is.

@pascalgulikers
Author

That's too bad, but maybe as an optional parameter?
Looks like we'll have to look for another install-dependencies-r action then :(
We're using Artifactory for both the base images and GHA, so that shouldn't be the problem.

Thanks for your time and effort though!

@gaborcsardi
Member

We're using Artifactory for both the base images and GHA so that shouldn't be the problem..

So can you run getOption("repos") on your image, and also pak::repo_status()$url on GHA?

@gaborcsardi
Member

I'm sorry, I can't provide you with an example as our code base and base images are private.

These kinds of issues are much easier to solve if you have a public repository that reproduces your problem. Otherwise it is guesswork. It does not have to be your real image if you want to keep that private.

@pascalgulikers
Author

I'm sorry, I can't provide you with an example as our code base and base images are private.

These kinds of issues are much easier to solve if you have a public repository that reproduces your problem. Otherwise it is guesswork. It does not have to be your real image if you want to keep that private.

Too bad I can't, it would include a private Artifactory package manager (SaaS) as well.

@pascalgulikers
Author

I think we should clone the setup-r-dependencies action and change this one line of code ourselves then, with the risk of getting out of date. There's no way we can influence this in the current setup because too many things are happening in this one action.
At the moment we cannot pass the lib parameter to lockfile_create() and we also can't influence the default of NULL.

@pascalgulikers
Author

As mentioned in r-lib/pak#608 (comment)
Exposing the lockfile_create() lib argument in setup-r-dependencies would be awesome! Anything I can do to implement this?

@gaborcsardi
Member

Can you try this? You need to use the @v2-branch branch and set lockfile-create-lib:

- uses: r-lib/actions/setup-r-dependencies@v2-branch
  with:
    lockfile-create-lib: '.libPaths()[1]'

If it works I'll update the @v2 tag.

@pascalgulikers
Author

This seems to be working. In the lockfile there is, for example:
"ref": "installed::/usr/local/lib/R/site-library/anytime",

So it's detecting the site-library packages now, resulting in:
✔ Cached copy of anytime 0.3.9 (x86_64-pc-linux-gnu) is the latest build

However, all are still being marked as to be downloaded:

  → Will install 201 packages.
  → Will download 201 packages with unknown size.

But in the end:
✔ 1 pkg + 199 deps: added 200, dld 23 (23.42 MB) [2m 49.4s]

It used to be:
✔ 1 pkg + 204 deps: added 205, dld 194 (160.20 MB) [23m 14.1s]

So we've saved 20+ minutes in installing dependencies with just this one parameter, and we're very happy with that!
The total run of our workflow has been reduced from 57 minutes to 13, since containerizing after the package has been built is also much faster with this setup.
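The savings quoted above can be worked out from the two summary lines (23m 14.1s before, 2m 49.4s after) and the workflow totals (57 and 13 minutes). A small sketch using integer arithmetic in tenths of a second; `to_tenths` is a made-up helper.

```shell
# Durations are handled in tenths of a second to stay within integer arithmetic.
to_tenths() {
  # $1 = minutes, $2 = seconds expressed in tenths (141 means 14.1s)
  echo $(( $1 * 600 + $2 ))
}

before="$(to_tenths 23 141)"   # 23m 14.1s
after="$(to_tenths 2 494)"     # 2m 49.4s
saved=$(( before - after ))

saved_min=$(( saved / 600 ))
rest=$(( saved % 600 ))
saved_str="${saved_min}m $(( rest / 10 )).$(( rest % 10 ))s"

speedup_x10=$(( before * 10 / after ))              # speed-up factor times ten
workflow_saved_pct=$(( (57 - 13) * 100 / 57 ))      # whole-workflow reduction

echo "dependency step saved: $saved_str"
echo "dependency step speed-up: $(( speedup_x10 / 10 )).$(( speedup_x10 % 10 ))x"
echo "workflow time reduced by about ${workflow_saved_pct}%"
```

So the dependency step alone accounts for roughly 20 of the 44 minutes saved per run; the rest comes from the faster containerizing step mentioned above.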

@gaborcsardi
Member

OK, updated @v2.

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue and include a link to this issue

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 20, 2024