Vignette for Sharing and Version Control #74

Closed
slopp opened this issue May 8, 2019 · 13 comments

@slopp
Contributor

slopp commented May 8, 2019

It'd be good to elaborate on what needs to be tracked in Git, and how cloning an renv project should work for a collaborator. For example:

What would happen if I'm using the renv package with Git and a coworker clones that repo, opens the project, and installs a brand new package? Is it automatically installed in the renv folder even though they haven't installed the renv package and haven't run the init() function?

@kevinushey
Collaborator

The answer to that question specifically: after the collaborator launches R, renv would bootstrap itself through the .Rprofile + renv/activate.R scripts, so that the project would be automagically activated on the collaborator's machine. Because renv was bootstrapped and activated automatically, newly installed packages will be installed into the project library as well.

renv should update the project's .gitignore automatically as well so that users shouldn't need to do anything themselves, but we should document this nonetheless.

@slopp
Contributor Author

slopp commented May 9, 2019

@kevinushey doesn't that require the user to commit the .Rprofile and renv/activate.R files in their Git repo? I was under the impression that users were only expected to commit the lock file, in which case the bootstrapping wouldn't occur.

@kevinushey
Collaborator

There are two possible workflows we could advertise here:

  1. Commit .Rprofile, renv.lock, and renv/activate.R. Collaborators will have renv bootstrapped for them automagically when an R session is opened in that project.

  2. Commit only renv.lock. Users must explicitly invoke renv::init() (or renv::activate()) to ensure that renv is activated.

Right now (1) is what we're doing (and it's what Packrat has done and has encouraged in the past), but I'm open to whether we should consider changing that.
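For illustration, workflow (2) from the collaborator's side could look roughly like the sketch below. The helper name setup_cloned_project() is hypothetical (it is not part of renv); it only wraps renv::activate() and renv::restore(), and it assumes renv is already installed in the collaborator's user library.

```r
# Sketch of workflow (2): only renv.lock is committed, so the collaborator
# must activate and restore explicitly after cloning.
# setup_cloned_project() is a hypothetical helper, not an renv function.
setup_cloned_project <- function(project = ".") {
  lockfile <- file.path(project, "renv.lock")
  if (!file.exists(lockfile))
    stop("no renv.lock found in ", normalizePath(project, mustWork = FALSE))

  if (!requireNamespace("renv", quietly = TRUE))
    stop("renv is not installed; run install.packages(\"renv\") first")

  # set up the project's activation infrastructure
  renv::activate(project = project)

  # install the package versions recorded in the lockfile
  renv::restore(project = project)
}
```

Under workflow (1), none of this would be needed, since the committed .Rprofile would trigger activation as soon as R starts in the project directory.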

@cboettig

I'm in favor of 2. In general I think having a single config file and encouraging explicit calls instead of side-effects is preferable in the long term. I think this also translates more easily when a user tries to reason about how renv behaves outside the envisioned use case of a single project (e.g. when preparing a Docker image, etc).

I think that's a bit closer to Python's use of requirements.txt or virtualenv, but maybe @yuvipanda has thoughts on this.

@yuvipanda

Having a single file that then requires a command to activate sounds like the right thing to do from my perspective. This can be made 'automagic' in different contexts in different ways explicitly - for example, repo2docker can automatically run the appropriate calls if it finds this file (jupyterhub/repo2docker#660). This is what we currently do for DESCRIPTION or install.R files, for example.

Other IDEs / Environments might offer to do the same in a way that's useful for them.

Separating 'these are my dependencies, see them' from 'activate all my dependencies now, so I can use them' as distinct user actions will make it much easier for other tools to integrate with it.

@asodja

asodja commented May 20, 2019

One thing I hate about Packrat is that it modifies the .gitignore in the root folder. renv does this in a somewhat nicer way, since it has its own .gitignore in the renv folder, but I still believe that handling Git is out of scope for this package, since Git is a separate system and should stay under the user's control.

So what I want to say here is: please do not overwrite the user's .gitignore, whichever workflow you choose to advertise. :)

kevinushey added this to the CRAN Release 1.0.0 milestone May 21, 2019
@kevinushey
Collaborator

If we were to go the renv.lock-only route, we'd need to find a solution for how renv is activated after e.g. someone clones a project from GitHub. Some proposals:

  1. We could simply document that the first thing you do after cloning an renv project is to call renv::activate().

  2. The front-end (e.g. RStudio) could check if the user is opening an renv project that does not yet have the associated infrastructure activated. If so, it could offer (as a one-time request) to automatically activate renv for you.

  3. renv itself could also check (e.g. in a .onLoad() hook) whether the current project appears to be an un-activated project, and prompt the user to activate in that situation.

The main thing I'm worried about is that users might expect that an R session launched in an renv project should always activate the renv infrastructure (e.g. use the private library). That won't occur for a newly-cloned project if the only thing that has been committed is renv.lock.
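Proposals 2 and 3 both hinge on detecting an un-activated project. A minimal sketch of such a check follows; both function names are hypothetical, and the detection rule (a lockfile is present but renv/activate.R is not) is an assumption about what "un-activated" would mean, not something renv defines.

```r
# Hypothetical helper: TRUE when a lockfile exists but the activation
# script does not, i.e. the project looks freshly cloned with only
# renv.lock committed.
is_unactivated_project <- function(project = getwd()) {
  has_lockfile <- file.exists(file.path(project, "renv.lock"))
  has_activate <- file.exists(file.path(project, "renv", "activate.R"))
  has_lockfile && !has_activate
}

# Hypothetical prompt along the lines of proposal 3. It only asks in
# interactive sessions and never activates without confirmation.
maybe_offer_activation <- function(project = getwd()) {
  if (!interactive() || !is_unactivated_project(project))
    return(invisible(FALSE))

  answer <- readline("This looks like an un-activated renv project. Activate? [y/N] ")
  if (tolower(trimws(answer)) %in% c("y", "yes")) {
    renv::activate(project = project)
    return(invisible(TRUE))
  }

  invisible(FALSE)
}
```

A front-end could run the same check when a project is opened, which would keep the decision to activate an explicit user action.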

@cderv
Contributor

cderv commented May 22, 2019

I am in favor of 1 or 2. I think it is best not to do everything automagically for the user. Calling renv::activate() is a very small step. The IDE support seems like a good thing. Packrat had it, and it was very useful, mainly for new users.

The other solution, committing everything, would still be available for anyone who wants the project's renv to be activated automatically.

Advertising both workflows seems like the right approach: a default mode of committing only renv.lock, and an advanced mode of committing the other files to get automatic activation. I am not sure this can be prevented, though: if someone has an .Rprofile for other purposes and commits it, I guess the renv activation will be committed along with it, and the project will default to mode 2. Am I right?

@kevinushey
Collaborator

One big downside to these approaches is that we'd have to start modifying the .gitignore to exclude the project .Rprofile, which feels strange to me.

I would rather optimize for the general user experience rather than ease-of-use with tools, since in general tool-builders are motivated and able to work around some of these sorts of things. In this case, one could just invoke R as R --vanilla if they really needed to be insulated from the project .Rprofile, or we could add something like an option or environment variable to suppress automagic loading of renv if need be.

On the balance, I still lean towards including .Rprofile, renv.lock, and the renv folder by default since I think this will lead to the most straight-forward workflow for the average user -- ie, if you start R within an renv project, then that renv project will automatically be activated for you. And this will be true regardless of whether you run R in the terminal, in RStudio, or otherwise.
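The opt-out mentioned above (an option or environment variable that suppresses automagic loading) could be sketched as a guard in the project .Rprofile. The variable name MYPROJECT_SKIP_RENV and the helper should_autoload() are invented for this illustration and are not renv settings.

```r
# Hypothetical project .Rprofile with an opt-out guard.
# MYPROJECT_SKIP_RENV is an invented variable name, not an renv setting.
should_autoload <- function() {
  flag <- Sys.getenv("MYPROJECT_SKIP_RENV", unset = "")
  !(tolower(flag) %in% c("1", "true", "yes"))
}

# Activate only when the opt-out is not set and the script is present.
if (should_autoload() && file.exists("renv/activate.R"))
  source("renv/activate.R")
```

A tool that needs a plain R session could then set the variable before launching R, as an alternative to running R --vanilla.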

@kevinushey
Collaborator

Here are my thoughts: if we're going to automatically activate projects by using a project .Rprofile, then we need to also commit renv/activate.R, and there's really not much way around this. In theory, we could do without this and automatically activate renv within e.g. RStudio, but I think it's nice that the experience with renv is consistent regardless of your front-end in this respect.

So, on the balance, I think the right approach is this:

  1. Document and make clear that renv.lock is the only piece needed for restoring a private project library,

  2. Document that the project .Rprofile and renv/activate.R are used to automatically initialize renv for use with a project,

  3. Recommend committing the project .Rprofile as well as renv/activate.R, in addition to renv.lock.

I think the automatic renv activation is pretty central to the experience (otherwise you risk forgetting to call renv::activate() or something similar).

It's also worth saying that users are already familiar with this workflow through Packrat as well, so it will not feel that foreign.

Ultimately, it's just two files in the top level of the project directory: the renv folder, and renv.lock, and I think that's a fair ask.

@cderv
Contributor

cderv commented Jun 15, 2019

I completely understand this. However, I wanted to share some last thoughts on automatic activation with .Rprofile ☺️

One drawback / edge case that I can think of (but it could be pretty common) is a user who already manages some profile settings in their ~/.Rprofile in their development environment. renv will always work, because the project's .Rprofile takes precedence, but none of the configuration in the user's .Rprofile will be sourced.
Also, usethis::edit_r_profile() defaults to the user .Rprofile for any configuration. I find it common to edit it for, for example:

  • rlang error traceback,
  • conflict configuration,
  • devtools & usethis options for package development (recommendations here),
  • various global package options (like options(blogdown.author = "Christophe Dervieux")).

I find losing that to be an important drawback of renv, as is having to choose between your global options and automatic activation.

Another edge case is if R_PROFILE_USER is set to some file different from the project .Rprofile: I believe renv will not activate automatically, because the project .Rprofile will not be sourced.

What I mean here is that automatic activation based on .Rprofile needs at least some detailed explanation, so that users know what it implies and how to configure their R environment.

There are obviously solutions, like loading the user's profile from the project's profile if it exists, but you still need to master all of this to get the correct configuration.

However, I think there may be a different way of doing automatic activation: something that you would put in your .Rprofile (e.g. the user one, like any other global option) that says: I want automatic activation for my renv projects. It would detect whether an renv project is being opened (file.exists('renv.lock') ❔) and activate it by running something like activate.R (or check whether activate.R is present and source it).
You would put it in your .Rprofile only if you want automatic activation. I acknowledge that cloning the repo wouldn't be enough to get automatic activation, though.

To illustrate the type of custom mechanism, it makes me think of the startup 📦
https://github.com/HenrikBengtsson/startup : You just add a startup::startup() in your .Rprofile (the one you want) and it will load anything in .Rprofile.d directory (project's or user's).
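A minimal sketch of the opt-in mechanism described above, meant to live in the user-level ~/.Rprofile. The function name auto_activate_renv() and the option userprofile.renv.activating are hypothetical. The option acts as a re-entry guard, in case the activation script itself sources the user profile, which would otherwise call this function again.

```r
# Hypothetical snippet for the user-level ~/.Rprofile: opt in to automatic
# activation of any renv project that R is started in.
auto_activate_renv <- function(project = getwd()) {
  # re-entry guard, in case activate.R sources the user profile again
  if (isTRUE(getOption("userprofile.renv.activating")))
    return(invisible("reentrant"))

  # not an renv project at all
  if (!file.exists(file.path(project, "renv.lock")))
    return(invisible("not-renv"))

  # lockfile only: nothing to source, the user must activate explicitly
  if (!file.exists(file.path(project, "renv", "activate.R")))
    return(invisible("no-activate-script"))

  old <- options(userprofile.renv.activating = TRUE)
  owd <- setwd(project)
  on.exit({ setwd(owd); options(old) }, add = TRUE)

  source(file.path("renv", "activate.R"))
  invisible("activated")
}

auto_activate_renv()
```

As noted above, this only helps users who have added the snippet themselves; a fresh clone with only renv.lock committed still has no activate.R to source.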

These are just thoughts and ideas I already had about Packrat's way of doing things. It is now or never to share them 😉
I can work on a POC for what I have in mind if you want.

@kevinushey
Collaborator

Thanks @cderv -- I appreciate your insight!

The activation script used by renv does source the user .Rprofile:

# source the user profile if any, respecting R_PROFILE_USER
profile <- Sys.getenv("R_PROFILE_USER", unset = path.expand("~/.Rprofile"))
if (file.exists(profile)) {
  current <- normalizePath(".Rprofile", winslash = "/", mustWork = FALSE)
  if (!identical(profile, current))
    source(profile)
}

It's worth noting: if you're using your user .Rprofile to configure R packages, and you try to load those packages within your .Rprofile, you risk breaking the encapsulation offered by renv -- ie, those packages could get loaded from your user / global library rather than the renv project library.

In theory, this could be alleviated as long as you're only setting R options and not explicitly loading the packages, but there's no guarantee that all users are doing this. One workaround would be to ensure that renv is always initialized first so that any requested packages are loaded from the project library, but this could break as well (e.g. if the user attempts to load packages not available in the private library).

There are also issues with multi-library configurations -- e.g. if you load a package from the user library that depends on rlang 0.3.1, but you have an older project depending on rlang 0.2.4 and that's what is installed in the private library, then things could go wrong. Ensuring that only a single private library is used is the simplest way forward. That still isn't great (e.g. you always have to install development tools into the private library), but at least the renv cache ensures this is no longer so expensive.

My biggest worry is that users could get confused in collaborative workflows. Suppose you're working with an renv project that you want to share with someone else. You put the project + renv.lock on GitHub, your collaborator clones the project, opens R, and ... nothing happens, because the project is not automatically activated. So now we either have to tell the user to edit their user .Rprofile and add some renv invocation to automatically load projects, or they have to remember to call renv::activate() every time they open the project, which seems easy to forget. (One solution would be to automatically modify the user .Rprofile when the renv package is loaded, but this feels dangerous to me.)

@kevinushey
Collaborator

One other challenge: different projects might use different versions of renv; if we start relying on some API to automatically activate projects (e.g. renv::auto_activate()) then that function will also need to learn how to:

  1. Bootstrap the version of renv requested in the lockfile,
  2. Unload the version of renv that is currently loaded,
  3. Load the bootstrapped version of renv.

I believe this is possible, but could get messy.
