Documentation: mention renv::hydrate() as main step in the workflow? #588

Closed
MatthieuStigler opened this issue Dec 3, 2020 · 12 comments

Comments

@MatthieuStigler

I feel that renv::hydrate() is not documented well enough in the renv documentation. As a newcomer, I had to dig a lot to understand that it is a fundamental function that I am likely to use very often!

The documentation seems to be based on a workflow where, any time a user needs a new library (new to the project, not to the system), she manually installs it with install.packages(). But this is a rather rare case: most of the time one just uses library() in a script to load a new package (again, new here means new to the project). In that case, renv::status() will not detect anything, and the user won't be able to load that package! After a lot of trial and error, I believe the right function in that case is renv::hydrate(). As a newcomer, I first tried restore, refresh, clean and record.

Is it correct that after adding a new library() call the right function is renv::hydrate()? If yes, this would be worth adding in the main workflow.

I suggest adding in the Workflow section of https://rstudio.github.io/renv/articles/renv.html:

  1. Call renv::init() to initialize a new project-local environment with a private R library,
  2. Work in the project as normal, installing and removing new R packages as they are needed in the project,
  3. Call renv::snapshot() to save the state of the project library to the lockfile (called renv.lock),
  4. Continue working on your project, installing and updating R packages as needed. If you want to use packages that are in your system library but not yet in your project library, call library(pkg) in your script and then run renv::hydrate().
@MatthieuStigler
Author

I realize things are more complicated: renv::hydrate() also installs new versions of all dependencies, rather than only tracking/installing newly added dependencies. So I guess my question is: could the intro manual specify how one is supposed to "use" a package called with library() that was not present at init time?

@kevinushey
Collaborator

> Is it correct that after adding a new library() call the right function is renv::hydrate()?

It depends on what you really want:

  • renv::hydrate() tries to copy whatever is available in your user / site libraries into your project library,
  • renv::install() tries to install the latest versions of the package requested into your project library.

If you do indeed want to re-use whatever happens to be available in your user library, renv::hydrate() will be useful. However, I think it will be more common to use renv::install() or install.packages() instead, except for the initial case where a project is initialized via renv::init().
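
As a rough illustration of the two options above, here is a minimal sketch. The helper add_package_call() is hypothetical (it is not part of renv); it only builds the call for each strategy so the choice can be inspected before anything is installed. It assumes renv::hydrate() accepts a packages argument and that renv::install() takes a package name as its first argument.

```r
# Hypothetical helper (not part of renv): build, but do not run, the call
# that matches the chosen strategy.
add_package_call <- function(pkg, reuse_user_library = FALSE) {
  if (reuse_user_library) {
    # Re-use whatever copy is available in the user / site libraries.
    bquote(renv::hydrate(packages = .(pkg)))
  } else {
    # Install the latest available version into the project library.
    bquote(renv::install(.(pkg)))
  }
}

planned <- add_package_call("ash")
print(planned)
# prints: renv::install("ash")

# Inside an renv project, the call could then be executed with eval(planned).
```

Building the call first keeps the sketch runnable without renv installed; in practice one would simply type the chosen call at the console.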

> The documentation seems to be based on a workflow where, any time a user needs a new library (new to the project, not to the system), she manually installs it with install.packages(). But this is a rather rare case: most of the time one just uses library() in a script to load a new package (again, new here means new to the project).

That is correct -- note that this is mainly modeled after the main way users work with R outside of renv: use install.packages() whenever you need a new package, under the expectation that it will install the "latest" version of that package.

> In that case, renv::status() will not detect anything, and then the user won't be able to load that package!

Can you elaborate? If your project is using a package, but it is not installed in the project library, then renv::status() should report that.

Ultimately though I view renv::hydrate() as a somewhat more "advanced" usage (just because it does something not available in base R) but I could see it being useful in certain types of workflows.

@MatthieuStigler
Author

Thanks Kevin for the prompt answer!

I think the whole issue is about status() not necessarily detecting packages called with library()! Those seem to be detected by renv::dependencies(), yet status() does not report them. Is that on purpose?

It's tricky to make a reprex (reprex fails when using renv::init()), but here is what I think is a fully reproducible example. Note that loading a new package, ash, with library(ash) goes undetected by status().

Restarting R session...

> tmp_dir <- tempdir()
> pkg_dir <- paste0(tmp_dir, "/MyNewProject")
> usethis::create_project(path = pkg_dir, open = FALSE, rstudio = TRUE)
✓ Creating '/tmp/RtmpTWkMjE/MyNewProject/'
✓ Setting active project to '/tmp/RtmpTWkMjE/MyNewProject'
✓ Creating 'R/'
✓ Writing 'MyNewProject.Rproj'
✓ Adding '.Rproj.user' to '.gitignore'
✓ Setting active project to '<no active project>'
> 
> cat("library(viridisLite)", file=paste0(tmp_dir, "/MyNewProject/R/first_file.R"))
> 
> renv::init(pkg_dir, restart=FALSE)
* Initializing project ...
* Discovering package dependencies ... Done!
* Copying packages into the cache ... Done!
The following package(s) will be updated in the lockfile:

# CRAN ===============================
- renv          [* -> 0.12.3]
- viridisLite   [* -> 0.3.0]

* Lockfile written to '/tmp/RtmpTWkMjE/MyNewProject/renv.lock'.
> 
> renv::status(pkg_dir)
* The project is already synchronized with the lockfile.
> 
> cat("\nlibrary(ash)", file=paste0(tmp_dir, "/MyNewProject/R/first_file.R"), append = TRUE)
> 
> renv::status(pkg_dir)
* The project is already synchronized with the lockfile.
> deps <- renv::dependencies(pkg_dir)
Finding R package dependencies ... Done!
> deps$Package %in% installed.packages()[, "Package"]
[1] FALSE  TRUE  TRUE
> fs::dir_delete(tmp_dir)

@kevinushey
Collaborator

> I think the whole issue is about status() not necessarily detecting packages called with library()! Those seem to be detected by renv::dependencies(), yet status() does not report them. Is that on purpose?

That seems unintentional to me: I'll try to figure out what's going on. Thanks for reporting!

@MatthieuStigler
Author

MatthieuStigler commented Jan 12, 2021

Hi Kevin!

Do you have a milestone for this issue? Not being able to rely on status() changes the renv workflow quite a lot.

For now I use the code below as a workaround. Do you have a better alternative to suggest? Thanks a lot!!

# Workaround: install packages the project uses but that are not yet installed.
deps <- renv::dependencies()
pkgs <- unique(deps$Package)  # a package can be referenced in several files
is_missing <- !pkgs %in% rownames(installed.packages())
if (any(is_missing)) {
  pkg_miss <- pkgs[is_missing]
  cat("Missing:", paste(pkg_miss, collapse = ", "), "\n")

  renv::install(pkg_miss)
}

@kevinushey
Collaborator

To make sure I understand the issue, the problem here is that:

  1. A package ("ash", in your case) is referenced in your project's scripts, but
  2. The package is not recorded in the lockfile, nor is it installed in the library,
  3. renv::status() doesn't say anything about this state.

That is, renv::status() doesn't report about packages which appear to be used in a project, but aren't actually installed or referenced in the lockfile.

@kevinushey
Collaborator

The state is a bit weird because "technically" the lockfile and library are actually in sync; it's just that the project references packages which haven't yet been installed.
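
To make that state concrete, here is a small sketch that compares three sets of package names: those used in the scripts, those recorded in the lockfile, and those installed in the library. The helper project_state() is hypothetical and compares names only (the real renv::status() also looks at versions); the package sets are assumed from the example earlier in this thread.

```r
# Hypothetical helper: compare package sets by name only (no versions).
project_state <- function(used, locked, installed) {
  list(
    # TRUE when the lockfile and the library list the same packages.
    lockfile_library_in_sync = setequal(locked, installed),
    # Referenced in scripts but missing from the library.
    used_not_installed = setdiff(used, installed),
    # Referenced in scripts but missing from the lockfile.
    used_not_locked = setdiff(used, locked)
  )
}

# Assumed sets after appending library(ash) to the script.
state <- project_state(
  used      = c("renv", "viridisLite", "ash"),
  locked    = c("renv", "viridisLite"),
  installed = c("renv", "viridisLite")
)
str(state)
```

Here the lockfile and library agree with each other, yet "ash" shows up as used but neither installed nor locked, which is the case status() was not reporting.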

@MatthieuStigler
Author

Exactly! I believe the issue arises with packages that were added to scripts after the first init(), which is a very common case.

This suggests that status() should also run renv::dependencies() and check whether all dependencies are already in the library. I think it currently does the opposite test, though: checking whether there are installed packages that are not needed (not found with renv::dependencies())?

Thanks!!

@kevinushey
Collaborator

Thanks! Implemented on master now with 7bd669b.

@MatthieuStigler
Author

Thanks, this is great!

So this means that the workflow is now that every "newly required" package (i.e. after init) should be 1) used with a library() call in scripts and 2) specifically installed with renv::install(), even if the package is already in the user's main library?

Is that correct? If yes, I would strongly recommend updating the documentation, as there is now an important difference with the traditional workflow (install a package once, which will last forever for any script) and the renv one (renv::install a package every time that you use it in a renv project, unless you were already calling it when you did the init() in that project).

This leads to a new request/question: is there a function that automatically screens and installs all "referenced but not installed packages"? That would be very useful, to make the workflow less manual! Thanks!!!

@kevinushey
Collaborator

> So this means that the workflow is now that every "newly required" package (i.e. after init) should be 1) used with a library() call in scripts and 2) specifically installed with renv::install(), even if the package is already in the user's main library?

Yes, that's correct and expected. (The project library is isolated from the default user library, so packages in that library aren't automatically visible in projects.)
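
One way to see this isolation is to check which of the active libraries actually contains a given package. The sketch below uses only base R; the helper installed_in() is hypothetical and treats a package as installed in a library when that library holds a directory for it with a DESCRIPTION file.

```r
# Hypothetical helper: is `pkg` installed in each of the libraries `lib`?
installed_in <- function(pkg, lib) {
  file.exists(file.path(lib, pkg, "DESCRIPTION"))
}

# Libraries visible to the current session. In an renv project the project
# library is expected to come first, and the user library is not listed.
libs <- .libPaths()
print(libs)

# One TRUE/FALSE per library, here for the base package "stats".
print(setNames(installed_in("stats", libs), libs))
```

Running the same check for a package such as dplyr inside and outside an renv project should show why a package installed years ago in the user library is not found from within the project.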

> Is that correct? If yes, I would strongly recommend updating the documentation, as there is now an important difference with the traditional workflow (install a package once, which will last forever for any script) and the renv one (renv::install a package every time that you use it in a renv project, unless you were already calling it when you did the init() in that project).

This is documented in https://rstudio.github.io/renv/articles/renv.html, in that we instruct users to install packages as required in the project.

> This leads to a new request/question: is there a function that automatically screens and installs all "referenced but not installed packages"? That would be very useful, to make the workflow less manual! Thanks!!!

That's what renv::hydrate() does.

@MatthieuStigler
Author

Thanks Kevin! This takes us back to the initial point about the documentation: in my view, the sentence "Work in the project as normal, installing and removing new R packages as they are needed in the project" is quite misleading. There are two distinct workflows:

  • Normal: install a package once in a lifetime with install.packages(), then load it in every project with library().
  • renv: unless the library() call was present at init time, install a package with renv::install() the first time you use it in each project.

This is very different! I installed dplyr four years ago, and in a normal workflow never re-installed it. Now renv's workflow calls for manually installing it in every project (unless present at init time). This is a fundamental difference that would be worth clarifying in the documentation!

Now about renv::hydrate(): it actually also updates packages. Installing newly called packages and updating registered packages are quite distinct tasks. Is there any chance there could be a function specifically dedicated to installing used-yet-not-installed packages, without the side effects of hydrate()? This would provide a very welcome alternative to manual installation with renv::install()!
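
For illustration only, such a dedicated function might look like the sketch below. install_missing_only() is hypothetical and not part of renv. Its defaults assume renv::dependencies() returns a data frame with a Package column and that renv::install() accepts a character vector of package names. It only requests packages that are missing by name; what the installer then does with their dependencies is outside the sketch's control.

```r
# Hypothetical helper: install packages that are used but not installed,
# leaving packages that are already installed untouched.
install_missing_only <- function(
    used = unique(renv::dependencies()$Package),
    installed = rownames(installed.packages()),
    installer = renv::install) {
  missing_pkgs <- setdiff(used, installed)
  if (length(missing_pkgs) > 0) {
    installer(missing_pkgs)
  }
  # Return the names that were requested, invisibly.
  invisible(missing_pkgs)
}

# Dry run with a stand-in installer, so nothing is downloaded and renv is
# not needed (the defaults are never evaluated when all arguments are given).
requested <- install_missing_only(
  used      = c("viridisLite", "ash"),
  installed = c("viridisLite"),
  installer = function(pkgs) message("Would install: ", paste(pkgs, collapse = ", "))
)
print(requested)
```

Inside an renv project, calling install_missing_only() with no arguments would fall back to the renv defaults.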

Thanks a lot!


2 participants