Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

renv::install() with explicit dependencies should prioritize remote ref #736

Closed
gadenbuie opened this issue Apr 23, 2021 · 12 comments
Closed

Comments

@gadenbuie
Copy link
Member

gadenbuie commented Apr 23, 2021

This is a complicated issue to describe, so I've created a companion repository.

In essence, we are using explicit snapshot types with a DESCRIPTION file. Some of the packages listed in the DESCRIPTION file require package versions hosted on GitHub. Because references such as @main or @master may move, we specifically reference remote package versions by commit SHA, such as:

Imports:
    htmltools,
    learnr,
    parsons,
    sortable
Remotes: 
    rstudio/htmltools@11cfbf3b331387922c0144203cf6e947c8bdaee9,
    rstudio/learnr@428d9af1506447f68890724bfc619ed1f2e7cfc0,
    rstudio/parsons@42d0bfb8f23a3c8048cc4c1fbca95a0ef1e0bca3,
    rstudio/sortable@cdde09d915ee0bc73353b079c7afbed8cb70ab4a

However, we're finding that renv::install() doesn't always prioritize the version listed in Remotes above all other versions. In the case of this reprex, sortable is installed at a different commit.

# GitHub =============================
- htmltools     [* -> rstudio/htmltools@11cfbf3b331387922c0144203cf6e947c8bdaee9]
- learnr        [* -> rstudio/learnr@428d9af1506447f68890724bfc619ed1f2e7cfc0]
- parsons       [* -> rstudio/parsons@42d0bfb8f23a3c8048cc4c1fbca95a0ef1e0bca3]
- sortable      [* -> rstudio/sortable]

The companion includes a number of things that I've tried to force installation of the package at the specific reference, but my attempts invariably arrive at a place where a different version of one of the remote packages is installed.

@gadenbuie gadenbuie changed the title renv::install() with explicit dependencies should prioritize remote ref renv::install() with explicit dependencies should prioritize remote ref Apr 23, 2021
@kevinushey
Copy link
Collaborator

I wonder if this is because parsons also declares a Remotes: field here:

https://github.com/rstudio/parsons/blob/master/DESCRIPTION#L29-L31

and that one is taking precedence over the one in the project's DESCRIPTION file.

@kevinushey
Copy link
Collaborator

The renv code that I believe is responsible for handling these remotes lives here:

renv/R/retrieve.R

Lines 760 to 804 in c145e9f

renv_retrieve_handle_remotes <- function(record) {
# TODO: what should we do if we detect incompatible remotes?
# e.g. if pkg A requests 'r-lib/rlang@0.3' but pkg B requests
# 'r-lib/rlang@0.2'.
# check and see if this package declares Remotes -- if so,
# use those to fill in any missing records
desc <- renv_description_read(record$Path)
if (is.null(desc$Remotes))
return(NULL)
fields <- strsplit(desc$Remotes, "\\s*,\\s*")[[1]]
for (field in fields) {
# TODO: allow customization of behavior when remote parsing fails?
remote <- catch(renv_remotes_resolve(field))
if (inherits(remote, "error")) {
fmt <- "failed to resolve remote '%s' declared by package '%s'; skipping"
warningf(fmt, field, record$Package)
next
}
# if installation of this package was not specifically requested by
# the user (ie: it's been requested as it's a dependency of this package)
# then update the record. note that we don't want to update in explicit
# installs as we don't want to override what was reported / requested
# in e.g. `renv::restore()`.
#
# allow override if a non-specific version of the package was requested
package <- remote$Package
state <- renv_restore_state()
if (package %in% state$packages) {
record <- state$records[[package]]
if (!identical(record, list(Package = package, Source = "Repository")))
next
}
# update the record
state$records[[package]] <- remote
}
}

Perhaps we should only update the remote if it's not already set.

@gadenbuie
Copy link
Member Author

Thanks @kevinushey this is working nicely except for one small problem. Since upgrading to this version of renv, we are now seeing extra Remotes fields being added to the renv.lock file, as in the following example entry for gradethis:

    "gradethis": {
      "Package": "gradethis",
      "Version": "0.2.3.9001",
      "Source": "GitHub",
      "RemoteType": "github",
      "RemoteHost": "api.github.com",
      "RemoteUsername": "rstudio",
      "RemoteRepo": "gradethis",
      "RemoteRef": "ae39493cbe2a9154e699024c379e0134d72a961f",
      "RemoteSha": "ae39493cbe2a9154e699024c379e0134d72a961f",
      "Remotes": "rstudio/learnr",
      "Remotes.1": "rstudio/learnr",
      "Hash": "3ffe9681a450c8062d102fb866f28021"
    },

The additional fields show up as Remotes.1 or Remotes.2. I haven't been able to isolate the behavior into a minimal reprex, but we cycle through renv::restore(), renv::install(), renv::snapshot() in a CI process and are noticing that new Remotes.N fields are being added even if the renv.lock hasn't changed (but maybe the package cache has?).

@kevinushey
Copy link
Collaborator

Okay, let me know when you have a reproducible example.

@dvg-p4
Copy link

dvg-p4 commented Jan 9, 2023

I'm also seeing this strange "Remotes.1" behavior--it's unfortunately with a private repo, so I'll have to see if I can get a repro on some public ones.

@kevinushey
Copy link
Collaborator

Thanks @dvg-p4 -- please let me know if you can share a reproducible example.

@dvg-p4
Copy link

dvg-p4 commented Jan 10, 2023

Reproduction of overwriting installs in R4 or R3

Setup

I created some dummy github package repos: innerPkg, which exports a single function, and outerPkg, which imports and uses innerPkg. Specifically, the DESCRIPTION file of outerPkg has:

Imports: 
    innerPkg (>= 0.1.0)
Remotes: 
    dvg-p4/innerPkg

outerPkg has a single release, 0.1.0, tagged. innerPkg has an 0.1.1 release tagged, but also has a further development commit on master, with DESCRIPTION updated to 0.1.1.9000 -- AFAICT this is reasonably standard practice for R package development.

Reproduction

Create a new R project using renv. Mine has the standard R studio boilerplate, plus an "analysis script" file:
image

my_script.R:

library(outerPkg)
outer_function()

Now, as expected, renv::status() yields:

The following package(s) are used in the project, but are not installed:
	outerPkg

So we go ahead and install it:

> renv::install("dvg-p4/outerPkg")
Retrieving 'https://api.github.com/repos/dvg-p4/outerPkg/tarball/4897d882ca3e3ecb5f908de9e5ffc9db6d6070b9' ...
	OK [downloaded 2 Kb in 0.4 secs]
Retrieving 'https://api.github.com/repos/dvg-p4/innerPkg/tarball/9397494e8c0e54c694585c9cfc1e6fceb945e07f' ...
	OK [downloaded 2 Kb in 0.3 secs]
Installing innerPkg [0.1.1.9000] ...
	OK [built from source]
Moving innerPkg [0.1.1.9000] into the cache ...
	OK [moved to cache in 0.64 milliseconds]
Installing outerPkg [0.1.0] ...
	OK [built from source]
Moving outerPkg [0.1.0] into the cache ...
	OK [moved to cache in 0.6 milliseconds]

However, we notice this seems to have installed a weird dev version of innerPkg, so we decide to be more explicit about release tags:

> renv::install("dvg-p4/innerPkg@0.1.1")
Retrieving 'https://api.github.com/repos/dvg-p4/innerPkg/tarball/04ac8b33e4bfbb59a793ed54bb031a8d7c63805c' ...
	OK [downloaded 2 Kb in 0.3 secs]
Installing innerPkg [0.1.1] ...
	OK [built from source]
Moving innerPkg [0.1.1] into the cache ...
	OK [moved to cache in 0.66 milliseconds]
> renv::status()
The following package(s) are installed but not recorded in the lockfile:
  outerPkg   [dvg-p4/outerPkg]
  innerPkg   [dvg-p4/innerPkg@0.1.1]
Use `renv::snapshot()` to add these packages to your lockfile.

Before we do so, let's explicitly tag the version of outerPkg as well:

> renv::install("dvg-p4/outerPkg@0.1.0")
Retrieving 'https://api.github.com/repos/dvg-p4/outerPkg/tarball/4897d882ca3e3ecb5f908de9e5ffc9db6d6070b9' ...
	OK [downloaded 2 Kb in 0.3 secs]
Installing innerPkg [0.1.1.9000] ...
	OK [linked cache]
Installing outerPkg [0.1.0] ...
	OK [built from source]
Moving outerPkg [0.1.0] into the cache ...
	OK [moved to cache in 0.62 milliseconds]
> renv::status()
The following package(s) are installed but not recorded in the lockfile:
  outerPkg   [dvg-p4/outerPkg@0.1.0]
  innerPkg   [dvg-p4/innerPkg]

This seems to be the core of the issue.
Expected behavior: renv::install() does not overwrite current compatible installations of packages
Observed behavior: renv::install() sometimes reinstalls the most recent commit on master of innerPkg when installing outerPkg
Workaround: install the specific tag of innerPkg after installing outerPkg:

> renv::install("dvg-p4/innerPkg@0.1.1")
Installing innerPkg [0.1.1] ...
	OK [linked cache]
> renv::status()
The following package(s) are installed but not recorded in the lockfile:
  outerPkg   [dvg-p4/outerPkg@0.1.0]
  innerPkg   [dvg-p4/innerPkg@0.1.1]
Use `renv::snapshot()` to add these packages to your lockfile.
> renv::snapshot()
The following package(s) will be updated in the lockfile:
# GitHub =============================
- innerPkg   [* -> dvg-p4/innerPkg@0.1.1]
- outerPkg   [* -> dvg-p4/outerPkg@0.1.0]
Do you want to proceed? [y/N]: y
* Lockfile written to '~/scratchpad/renv_dev_reprex/renv.lock'.

This creates a valid-looking lockfile:

...
"Packages": {
    "innerPkg": {
      "Package": "innerPkg",
      "Version": "0.1.1",
      "Source": "GitHub",
      "RemoteType": "github",
      "RemoteHost": "api.github.com",
      "RemoteUsername": "dvg-p4",
      "RemoteRepo": "innerPkg",
      "RemoteRef": "0.1.1",
      "RemoteSha": "04ac8b33e4bfbb59a793ed54bb031a8d7c63805c",
      "Hash": "6540c97eb55d6fc575127d2c51b547f4",
      "Requirements": []
    },
    "outerPkg": {
      "Package": "outerPkg",
      "Version": "0.1.0",
      "Source": "GitHub",
      "RemoteType": "github",
      "RemoteHost": "api.github.com",
      "RemoteUsername": "dvg-p4",
      "RemoteRepo": "outerPkg",
      "RemoteRef": "0.1.0",
      "RemoteSha": "4897d882ca3e3ecb5f908de9e5ffc9db6d6070b9",
      "Remotes": "dvg-p4/innerPkg",
      "Hash": "9715a0717d6d3f577fc576749cb66144",
      "Requirements": [
        "innerPkg"
      ]
    },
...

And our status and actually-installed-packages look good as well.

> renv::status()
* The project is already synchronized with the lockfile.
> source("my_script.R")
[1] "This is outer_function()"
[1] "This is inner_function()"
[1] "You gave me"
[1] "hello from outer_function!"
[1] "This is the beta version 0.1.x of innerPkg!"
[1] "Outer function out."

If you're using R4, this workaround is sufficient--any further restore()s should work as expected.
In R3, however, there will be difficulty when you try to restore from a clean install, see below.

@dvg-p4
Copy link

dvg-p4 commented Jan 10, 2023

Reproduction of "Remotes.1" behavior (R3)

Create a minimal project with renv and a call to outerPkg::outer_function() as above.
Then:

> renv::install("dvg-p4/outerPkg@*release")
Retrieving 'https://api.github.com/repos/dvg-p4/outerPkg/tarball/4897d882ca3e3ecb5f908de9e5ffc9db6d6070b9' ...
	OK [downloaded 2 Kb in 0.3 secs]
Retrieving 'https://api.github.com/repos/dvg-p4/innerPkg/tarball/9397494e8c0e54c694585c9cfc1e6fceb945e07f' ...
	OK [downloaded 2 Kb in 0.3 secs]
Installing innerPkg [0.1.1.9000] ...
	OK [built from source]
Moving innerPkg [0.1.1.9000] into the cache ...
	OK [moved to cache in 0.99 milliseconds]
Installing outerPkg [0.1.0] ...
	OK [built from source]
Moving outerPkg [0.1.0] into the cache ...
	OK [moved to cache in 0.78 milliseconds]
> renv::status()
The following package(s) are installed but not recorded in the lockfile:
           _
  outerPkg   [dvg-p4/outerPkg@0.1.0]
  innerPkg   [dvg-p4/innerPkg]

Use `renv::snapshot()` to add these packages to your lockfile.

> renv::install("dvg-p4/innerPkg@*release")
Installing innerPkg [0.1.1] ...
	OK [linked cache]
> renv::status()
The following package(s) are installed but not recorded in the lockfile:
           _
  outerPkg   [dvg-p4/outerPkg@0.1.0]
  innerPkg   [dvg-p4/innerPkg@0.1.1]

Use `renv::snapshot()` to add these packages to your lockfile.

> renv::snapshot()
The following package(s) will be updated in the lockfile:

# GitHub =============================
- innerPkg   [* -> dvg-p4/innerPkg@0.1.1]
- outerPkg   [* -> dvg-p4/outerPkg@0.1.0]

Do you want to proceed? [y/N]: y
* Lockfile written to '~/scratchpad/renv_r3/renv.lock'.
> renv::status()
* The project is already synchronized with the lockfile.

As expected so far.
Next, to simulate a fresh installation, delete the renv directory, and purge the cache:

> renv::purge("outerPkg")
The following packages will be purged from the cache:

	outerPkg 0.1.0 [Hash: 9715a0717d6d3f577fc576749cb66144]

Do you want to proceed? [y/N]: y
* Removed 1 package from the cache.
> renv::purge("innerPkg")
The following packages will be purged from the cache:

	innerPkg 0.1.1.9000 [Hash: 35c7f820c451d3895e8773441752fe61]
	innerPkg 0.1.1      [Hash: 6540c97eb55d6fc575127d2c51b547f4]

Do you want to proceed? [y/N]: y
* Removed 2 packages from the cache.

Restart R, then attempt to restore the environment:

> renv::activate()
* Project '~/scratchpad/renv_r3' loaded. [renv 0.16.0]
* The project library is out of sync with the lockfile.
* Use `renv::restore()` to install packages recorded in the lockfile.
> renv::restore()
The following package(s) will be updated:

# GitHub =============================
- innerPkg   [* -> dvg-p4/innerPkg@0.1.1]
- outerPkg   [* -> dvg-p4/outerPkg@0.1.0]

Do you want to proceed? [y/N]: y
Retrieving 'https://api.github.com/repos/dvg-p4/innerPkg/tarball/04ac8b33e4bfbb59a793ed54bb031a8d7c63805c' ...
	OK [downloaded 2 Kb in 0.4 secs]
Retrieving 'https://api.github.com/repos/dvg-p4/outerPkg/tarball/4897d882ca3e3ecb5f908de9e5ffc9db6d6070b9' ...
	OK [downloaded 2 Kb in 0.3 secs]
Installing innerPkg [0.1.1] ...
	OK [built from source]
Moving innerPkg [0.1.1] into the cache ...
	OK [moved to cache in 0.89 milliseconds]
Installing outerPkg [0.1.0] ...
	OK [built from source]
Moving outerPkg [0.1.0] into the cache ...
	OK [moved to cache in 0.72 milliseconds]

Looks good so far but:

> renv::status()
The following package(s) are out of sync:

   Package   Lockfile Version   Library Version
  outerPkg              0.1.0             0.1.0

Use `renv::snapshot()` to save the state of your library to the lockfile.
Use `renv::restore()` to restore your library from the lockfile.

Expected behavior: A package that is the correct version from the same git reference will not display as out of sync with itself
Observed behavior: That message, and (when I attempt to resolve with another restore():

> renv::restore()
The following package(s) will be updated:

# GitHub =============================
- outerPkg   [dvg-p4/outerPkg@0.1.0: unchanged]

Do you want to proceed? [y/N]: y
Installing innerPkg [0.1.1.9000] ...
	OK [linked cache]
Installing outerPkg [0.1.0] ...
	OK [linked cache]
The dependency tree was repaired during package installation:
           _
  innerPkg   [dvg-p4/innerPkg]

Call `renv::snapshot()` to capture these dependencies in the lockfile.

> renv::status()
The following package(s) are out of sync:

   Package   Lockfile Version   Library Version
  innerPkg              0.1.1        0.1.1.9000
  outerPkg              0.1.0             0.1.0

Use `renv::snapshot()` to save the state of your library to the lockfile.
Use `renv::restore()` to restore your library from the lockfile.

The development version of innerPkg has been installed again! That's not what we want, what happens with another restore()?

> renv::restore()
The following package(s) will be updated:
# GitHub =============================
- innerPkg   [ver: 0.1.1.9000 -> 0.1.1; ref: master -> 0.1.1; sha: 9397494e -> 04ac8b33]
- outerPkg   [dvg-p4/outerPkg@0.1.0: unchanged]
Do you want to proceed? [y/N]: y
Installing innerPkg [0.1.1] ...
	OK [linked cache]
Installing outerPkg [0.1.0] ...
	OK [linked cache]
> renv::status()
The following package(s) are out of sync:
   Package   Lockfile Version   Library Version
  outerPkg              0.1.0             0.1.0
Use `renv::snapshot()` to save the state of your library to the lockfile.
Use `renv::restore()` to restore your library from the lockfile.

We're back to the state we were in before.

What if we try to snapshot?

> renv::snapshot()
The following package(s) will be updated in the lockfile:

# GitHub =============================
- outerPkg   [dvg-p4/outerPkg@0.1.0: unchanged]

Do you want to proceed? [y/N]: y
* Lockfile written to '~/scratchpad/renv_r3/renv.lock'.
> renv::status()
* The project is already synchronized with the lockfile.

Looks good, but if we inspect the lockfile:

"innerPkg": {
      "Package": "innerPkg",
      "Version": "0.1.1",
      "Source": "GitHub",
      "RemoteType": "github",
      "RemoteHost": "api.github.com",
      "RemoteUsername": "dvg-p4",
      "RemoteRepo": "innerPkg",
      "RemoteRef": "0.1.1",
      "RemoteSha": "04ac8b33e4bfbb59a793ed54bb031a8d7c63805c",
      "Hash": "6540c97eb55d6fc575127d2c51b547f4",
      "Requirements": []
    },
    "outerPkg": {
      "Package": "outerPkg",
      "Version": "0.1.0",
      "Source": "GitHub",
      "RemoteType": "github",
      "RemoteHost": "api.github.com",
      "RemoteUsername": "dvg-p4",
      "RemoteRepo": "outerPkg",
      "RemoteRef": "0.1.0",
      "RemoteSha": "4897d882ca3e3ecb5f908de9e5ffc9db6d6070b9",
      "Remotes": "dvg-p4/innerPkg",
      "Remotes.1": "dvg-p4/innerPkg",
      "Hash": "961db7b03ea39735e90cbe3d0d337ede",
      "Requirements": [
        "innerPkg"
      ]
    },

we see a nonsensical "Remotes.1"!

@dvg-p4
Copy link

dvg-p4 commented Jan 10, 2023

Further, if we do another fresh installation from the lockfile with "Remotes.1", we get into the same state as before, with outerPkg supposedly not matching its identical version:

> renv::status()
The following package(s) are out of sync:

   Package   Lockfile Version   Library Version
  outerPkg              0.1.0             0.1.0

Use `renv::snapshot()` to save the state of your library to the lockfile.
Use `renv::restore()` to restore your library from the lockfile.

> renv::snapshot()
The following package(s) will be updated in the lockfile:

# GitHub =============================
- outerPkg   [dvg-p4/outerPkg@0.1.0: unchanged]

Do you want to proceed? [y/N]: y
* Lockfile written to '~/scratchpad/renv_r3/renv.lock'.
> renv::status()
* The project is already synchronized with the lockfile.

The snapshot from this state leads to the addition of a "Remotes.2" line in renv.lock:

...
"outerPkg": {
      "Package": "outerPkg",
      "Version": "0.1.0",
      "Source": "GitHub",
      "RemoteType": "github",
      "RemoteHost": "api.github.com",
      "RemoteUsername": "dvg-p4",
      "RemoteRepo": "outerPkg",
      "RemoteRef": "0.1.0",
      "RemoteSha": "4897d882ca3e3ecb5f908de9e5ffc9db6d6070b9",
      "Remotes": "dvg-p4/innerPkg",
      "Remotes.1": "dvg-p4/innerPkg",
      "Remotes.2": "dvg-p4/innerPkg",
      "Hash": "59d61acd29d222d3fff16a28ffa80fb6",
      "Requirements": [
        "innerPkg"
      ]
    },
...

@kevinushey
Copy link
Collaborator

This is immensely helpful; thank you!

I think the core of the first issue you describe is that renv doesn't understand how to "mix" these two constraints:

Imports: 
    innerPkg (>= 0.1.0)
Remotes: 
    dvg-p4/innerPkg

In particular, when it sees the Remotes field in that form, it assumes that this means we should install innerPkg using whatever the "default" branch is (main or master). I believe you could tag the remote with the version directly as well, e.g.

Remotes: 
    dvg-p4/innerPkg@0.1.0

The documentation at https://cran.r-project.org/web/packages/remotes/vignettes/dependencies.html also suggests this is a supported mechanism for depending on a specific tag / version of a package on GitHub.

I'll see if I can reproduce / further understand the issue you describe with duplicated Remotes fields next.

@kevinushey
Copy link
Collaborator

I believe I've fixed the issue with duplicated Remotes fields here: 75fd390

@dvg-p4
Copy link

dvg-p4 commented Nov 29, 2023

A note for future passers-by: I still occasionally see duplicated Remotes fields, usually in situations involving a project that was created with an earlier version of renv and upgraded to one after this fix. Unfortunately I've been unable to get a good reprex for this; fortunately I've found the following steps to work around the problem:

  • Manually remove the relevant packages from renv.lock
  • Manually delete the relevant packages from the project's renv library
  • renv::install() the depender
  • renv::install() the dependency again if it's not the version you want
  • renv::snapshot()
  • If that doesn't work, renv::purge() them from the cache and try again (use with caution, for the reasons described in ?renv::purge)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants