R on Binder: What happens if MRAN goes away? #1240

Closed
DrAndiLowe opened this issue Feb 8, 2019 · 15 comments

@DrAndiLowe

Apologies if this is a naive question, but here goes...

The documentation that describes configuring Binder to install R packages states that daily snapshots of CRAN that are hosted on MRAN are used. So what happens if MRAN disappears? More specifically, what dependencies are there on MRAN? Where are all the places it is used?

I want to set up my repo to trigger a packrat restore from a lock file instead of using MRAN. This provides better control over package versions (I've had problems, albeit rarely, in the past when using checkpoint). Also, packrat works with packages installed from GitHub and Bioconductor. I've already got this working locally, but not with Binder.

@betatim
Member

betatim commented Feb 10, 2019

If MRAN stops existing we'd have to find a different CRAN snapshot'er. As far as I know, MRAN makes a snapshot of CRAN every day. This means that once you fix the MRAN date you can do install.packages("blah") and get the same version every time.

Is packrat widely used? Do you have an example of what it is or how to use it? What would you place in the repository that you publish?

Project Binder doesn't have a lot of experience with the packaging options in each of the languages we support :-/

@DrAndiLowe
Author

I've used MRAN for some work projects for reproducing my local R setup on a client's infrastructure, rather than for being able to reproduce this same setup at some point in time in the future. Of course, I hope that MRAN will not go away in the future, but it's still a single point of failure for systems that depend on checkpoint for package versioning. I don't know of an alternative repo of daily snapshots of CRAN.

Typically I'm building on a laptop running Windows 10 and deploying on AWS running Linux, and I've encountered weird problems in the past in which checkpoint fetches a binary for a package on one system, but on the other there is no binary available, which triggers compilation from source... from a different incompatible version. I'm not able to reliably replicate this behaviour, even less how to avoid it; I just know it exists because I've seen it myself. Probably this is not a concern for Binder, but it makes me nervous.

Packrat has the support of RStudio. It's a bit more fiddly to use, but you can also restore Bioconductor and GitHub packages. I believe I saw a presentation by Microsoft in which they claimed that you need to bundle your packrat package library with the thing you're shipping, but that isn't the case: it's possible to do a restore from a lock file, which is a snapshot of the packages that you're using for a project.
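
The lock-file restore described above can be sketched roughly as follows. This is a minimal sketch, assuming packrat is installed and that the project keeps its lock file in packrat's default location (`packrat/packrat.lock`); the helper name `restore_from_lockfile` is made up for illustration and is not part of packrat or repo2docker.

```r
# Hypothetical helper: restore a project's packages from its packrat lock file.
restore_from_lockfile <- function(project = ".") {
  lockfile <- file.path(project, "packrat", "packrat.lock")
  if (!file.exists(lockfile)) {
    message("No packrat lock file found at ", lockfile)
    return(FALSE)
  }
  if (!requireNamespace("packrat", quietly = TRUE)) {
    message("packrat is not installed; install it before restoring")
    return(FALSE)
  }
  # Reinstall the package versions recorded in the lock file.
  packrat::restore(project = project, prompt = FALSE)
  TRUE
}
```

On the authoring side, `packrat::snapshot()` is the call that writes the lock file, so only that file (not the whole package library) needs to be committed.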

I've done this and would be happy to assist with a PR.

There are other solutions, but I've not used them. For example:
GRANBase: https://cran.r-project.org/web/packages/GRANBase/index.html
Automagic: https://cran.r-project.org/web/packages/automagic/index.html
I don't know what the pros and cons of these are.

Probably I should ask this as a separate question, but I'll leave this here as food for thought: How is Python package versioning done? I ask because with the reticulate package, it's possible to call Python from R and vice versa, which means that it's possible to build R packages that are wrappers for Python packages. This has already been done for TensorFlow and Keras. I don't know if these kinds of R packages are tied to specific Python package versions. (Maybe they just install the latest? No idea.) These kinds of R packages are probably going to become more common. How will they be handled?

@betatim
Member

betatim commented Feb 12, 2019

(Answering the easy question first)

> I don't know if these kinds of R packages are tied to specific Python package versions. (Maybe they just install latest? No idea.) These kinds of R packages are probably going to become more common. How will they be handled?

No idea. I'd say let's see what the authors of these wrappers come up with. I don't think repo2docker will have an opinion on this until there is a large enough user base that people frequently use these wrappers in repo2docker repos; hopefully by then there will be a community standard/best practice we can copy.

@betatim
Member

betatim commented Feb 12, 2019

Moving this issue to repo2docker repository as it is more about adding a new way to specify dependencies for R based repositories than about Binder as a whole.

Not moving it because I don't have the rights to do so.

@betatim
Member

betatim commented Feb 12, 2019

To install packages from MRAN we set the CRAN mirror URL:

https://github.com/jupyter/repo2docker/blob/9766c9545540fb3e461c6ea92b054dbb3cb90184/repo2docker/buildpacks/r.py#L277

and after that users specify install.packages("foobar") in their install.R and they get the version of the package as of that date. You can also install from GitHub using that command (or a variant thereof). Once you do that, the MRAN date is irrelevant and it is up to the user to specify a hash or a tag to pin down the version of the library that is being installed from GitHub.
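
To make the "set the CRAN mirror URL" step concrete, here is a minimal sketch of pointing R at a dated snapshot. It assumes the MRAN URL layout that was in use when this thread was written (`https://mran.microsoft.com/snapshot/<date>`); the helper name `snapshot_repo` is hypothetical and this is not the code repo2docker itself runs.

```r
# Hypothetical helper: build the URL of a dated CRAN snapshot.
# The default base is the MRAN layout from the time of this thread.
snapshot_repo <- function(date, base = "https://mran.microsoft.com/snapshot") {
  date <- as.Date(date)  # stops with an error on a malformed date
  paste0(base, "/", format(date, "%Y-%m-%d"))
}

# Make the snapshot the default CRAN repository for this session.
options(repos = c(CRAN = snapshot_repo("2019-02-08")))

# From here on, a plain install resolves against the snapshot:
# install.packages("foobar")
```

Because only the `repos` option changes, an existing install.R needs no edits; the date alone decides which package versions are visible.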

If the repository is more like a package itself (like https://github.com/tidyverse/tidyverse) you can place a DESCRIPTION file in the root and repo2docker will install the dependencies listed in it for you, as well as your package.

Do you have an example repository with a packrat lockfile in it? If people are putting packrat lock files in repositories as a way for others to recreate their environment we might want to add support for it.

During the repo2docker build a lot of packages need to be compiled because there are no binaries for Linux :-(

@DrAndiLowe
Author

It is sufficient to do require(foobar); checkpoint will scan the working directory for .R and .Rmd files, parse them, find the require statements, and install from MRAN if possible.
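
The scanning idea can be illustrated with a few lines of base R. This is a deliberately simplified sketch, not checkpoint's actual implementation: it only looks for `library(...)` and `require(...)` calls with a regular expression, and the function name `find_required_packages` is made up for illustration.

```r
# Hypothetical sketch: collect package names from library()/require() calls.
find_required_packages <- function(lines) {
  pattern <- "(?:library|require)\\(\\s*[\"']?([A-Za-z][A-Za-z0-9.]*)[\"']?"
  # Pull out every matching call fragment, e.g. 'library(dplyr'.
  calls <- unlist(regmatches(lines, gregexpr(pattern, lines, perl = TRUE)))
  # Keep only the captured package name from each fragment.
  pkgs <- sub(pattern, "\\1", calls, perl = TRUE)
  sort(unique(pkgs))
}

script <- c(
  "library(dplyr)",
  "x <- 1",
  "require(\"ggplot2\")  # plotting",
  "library(dplyr)"
)
print(find_required_packages(script))
```

A real scanner would parse the code rather than pattern-match it, so treat this only as an illustration of why a bare `require(foobar)` is enough for the package to be discovered.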

I used to use devtools::install_github to install from GitHub, but recently discovered the remotes package that does the same but without all the rest of the package development stuff that comes with devtools.
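
For GitHub installs, the version has to be pinned explicitly, since no snapshot date applies. A minimal sketch, assuming the remotes package and a hypothetical repository `someuser/somepkg` with a tag `v1.2.3` (neither comes from this thread); `pinned_spec` is likewise a made-up helper:

```r
# Hypothetical helper: build a "user/repo@ref" spec, where ref is a tag,
# branch, or commit hash.
pinned_spec <- function(repo, ref) {
  stopifnot(nzchar(repo), nzchar(ref))
  paste0(repo, "@", ref)
}

spec <- pinned_spec("someuser/somepkg", "v1.2.3")
print(spec)

# Uncomment to install the pinned version (needs remotes and network access):
# remotes::install_github(spec)
```

Pinning to a commit hash rather than a tag is the stricter choice, because a tag can be moved after the fact.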

I'm wondering if the DESCRIPTION file is sufficient to reconstitute a set of versioned packages, or if it's just a means to instruct the installer to exit if the required package dependencies cannot be satisfied with the latest versions that are on CRAN. Usually you do ">= foobar (1.2.3)" and not "== foobar (1.2.3)". I don't know what happens if the version of foobar on CRAN is 1.2.4. Version 1.2.3 gets installed, or the installer fails?

Look here for something I cooked up with packrat:
https://github.com/andrewjohnlowe/PackRatTest
However, I now know that there's a much better way of working than this.

@DrAndiLowe
Author

Oops, I meant to put ">=" and "==" in the parentheses like so: foobar (>= 1.2.3) and foobar (== 1.2.3).
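
The difference between the two constraints can be made concrete with base R's version comparison. This sketch only illustrates how `>=` and `==` evaluate against an available version; it makes no claim about what the installer actually does when a constraint is not met, and the function name `satisfies` is hypothetical.

```r
# Hypothetical helper: does an available version satisfy a constraint?
satisfies <- function(available, op, required) {
  cmp <- utils::compareVersion(available, required)  # -1, 0, or 1
  switch(op,
    ">=" = cmp >= 0,
    "==" = cmp == 0,
    stop("unsupported operator: ", op)
  )
}

# With foobar 1.2.4 available:
print(satisfies("1.2.4", ">=", "1.2.3"))  # constraint foobar (>= 1.2.3)
print(satisfies("1.2.4", "==", "1.2.3"))  # constraint foobar (== 1.2.3)
```

The first check passes and the second does not, which is why a `>=` constraint alone cannot reconstitute an exact set of package versions.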

@DrAndiLowe
Author

With regards to "During the repo2docker build a lot of packages need to be compiled because there are no binaries for linux":

Does repo2docker do this? https://www.jumpingrivers.com/blog/speeding-up-package-installation/

@betatim
Member

betatim commented Feb 16, 2019

Thanks for the link! I created a new issue on the repo2docker repo to discuss the idea of using Ncpus != 1.
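
For reference, the setting in question is R's `Ncpus` option, which `install.packages()` uses as the default number of parallel processes when building several source packages. A minimal sketch, assuming it runs before any install call; leaving one core free is an arbitrary choice, not a recommendation from this thread:

```r
# Use all but one core, and fall back to 1 if the core count is unknown.
n_cores <- max(1L, parallel::detectCores() - 1L, na.rm = TRUE)
options(Ncpus = n_cores)

# Later installs pick the option up automatically:
# install.packages(c("pkgA", "pkgB"))   # hypothetical package names
```

The same value can also be passed per call as `install.packages(..., Ncpus = n_cores)`.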

@annakrystalli

annakrystalli commented Feb 4, 2023

This issue is sadly relevant now 😭 https://techcommunity.microsoft.com/t5/azure-sql-blog/microsoft-r-application-network-retirement/ba-p/3707161

I couldn't see anything obvious in the documentation about plans for dealing with MRAN being retired in July 2023 but perhaps I missed it (apologies if I did!). I'm scheduled to teach a workshop on binderising an R project using holepunch next week and it just occurred to me that it all depends on MRAN and I'm not quite sure what to say about what will happen post July 2023. Is there any advice I should relay to participants?

@manics manics transferred this issue from jupyterhub/binderhub Feb 4, 2023
@manics
Member

manics commented Feb 5, 2023

What's the current best practice recommended by the R community for reproducibility?

@annakrystalli

I'd say the renv package, which is the successor to the packrat package discussed earlier in this issue, would be most appropriate.
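
A typical renv round trip looks roughly like this. It is a minimal sketch, assuming renv is installed and that the lock file sits at renv's default location (`renv.lock` in the project root); the wrapper `restore_with_renv` is a made-up name for illustration.

```r
# Authoring side (run once in the project, then commit renv.lock):
#   renv::init()      # set up a project-local library
#   renv::snapshot()  # record the package versions in renv.lock

# Hypothetical helper for the consuming side: rebuild the library from renv.lock.
restore_with_renv <- function(project = ".") {
  lockfile <- file.path(project, "renv.lock")
  if (!file.exists(lockfile)) {
    message("No renv.lock found in ", project)
    return(FALSE)
  }
  if (!requireNamespace("renv", quietly = TRUE)) {
    message("renv is not installed; install it before restoring")
    return(FALSE)
  }
  # Install the exact versions recorded in the lock file.
  renv::restore(project = project, prompt = FALSE)
  TRUE
}
```

The lock file records where each package came from, so the restore does not depend on any single snapshot service being available.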

@DrAndiLowe
Author

This looks useful: https://www.brodrigues.co/blog/2023-01-12-repro_r/

@sgibson91
Member

I think we have now officially switched over to the Posit Package Manager system and this issue can be closed.
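
For anyone pinning a date by hand, a dated snapshot on Posit Package Manager can be selected in much the same way as an MRAN one. This is a sketch under stated assumptions: the URL layout shown is the one I understand the public instance at `packagemanager.posit.co` to use, the distro name `jammy` is only an example, and `ppm_snapshot` is a hypothetical helper, not something repo2docker exposes.

```r
# Assumed URL layout of the public Posit Package Manager instance:
#   <base>/cran/<YYYY-MM-DD>                      source packages
#   <base>/cran/__linux__/<distro>/<YYYY-MM-DD>   Linux binaries
ppm_snapshot <- function(date, distro = NULL,
                         base = "https://packagemanager.posit.co") {
  date <- format(as.Date(date), "%Y-%m-%d")
  parts <- c(base, "cran", if (!is.null(distro)) c("__linux__", distro), date)
  paste(parts, collapse = "/")
}

print(ppm_snapshot("2023-02-05"))
print(ppm_snapshot("2023-02-05", distro = "jammy"))

# To use it for the current session:
# options(repos = c(CRAN = ppm_snapshot("2023-02-05")))
```

The Linux binary path is relevant to the earlier complaint in this thread about packages having to be compiled from source during the build.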
