Rstudio package manager #925

Closed
choldgraf opened this issue Jul 4, 2020 · 20 comments

Comments

@choldgraf
Member

Looks like RStudio now runs a package manager too:

https://mobile.twitter.com/hadleywickham/status/1279023422748659712

We should look into it as a more reliable way of getting R packages on Linux!

@kkmann

kkmann commented Jul 6, 2020

Ha, I was just gonna ask about that. Imho, it's not only about reliability but also speed. Building the tidyverse alone nowadays takes ages...

Wouldn't this just require a few changes to https://github.com/jupyter/repo2docker/blob/master/repo2docker/buildpacks/r.py to work with a lockfile instead of install.R?

I guess there is a design choice as to how one wants to handle runtime.txt:

  1. give it precedence over the recorded package versions in the lockfile (imho awkward)
  2. use it only for infrastructure packages not recorded in the lockfile (devtools, IRKernel, ...)
    I don't think it would be user friendly to require a lockfile that contains all infrastructure packages as well.

@choldgraf
Member Author

I don't think that it should be too difficult to make this change. The big question is whether or not this is a "canonical packaging solution for R". The RStudio package manager is super new, and (at least for me) it is hard to tell whether RStudio basically is the R community, or if there are others that would suggest different packaging approaches (e.g. the holepunch package leverages the Rocker community stacks). Maybe either @cboettig or @karthik could advise?

@betatim
Member

betatim commented Jul 6, 2020

I took a look at https://packagemanager.rstudio.com/client/#/repos/1/overview to try and understand how you'd use this. It looks like this is very similar to MRAN in that it offers "frozen snapshots" and that you'd configure it in a way very similar to how we configure MRAN.

If that is true, this would be cool because it would be a small change and compatible with using other mirrors (no lock-in).

@choldgraf
Member Author

Yeah - plus I believe they have Linux binaries for everything, which would be a big improvement.

@cboettig

cboettig commented Jul 6, 2020

Yup, we're doing this on the rocker/versioned stack now as well (using the binaries built for focal on our Ubuntu 20.04 based images). Note that most but not all of the bionic RSPM binaries will work on focal.

@betatim
Member

betatim commented Jul 6, 2020

Does a binary from the RStudio CRAN "mirror" work with all R versions, or do we have to pay attention to getting the right match (or does R take care of that)?

@cboettig

cboettig commented Jul 7, 2020

@betatim that part is no different than installing from source. Most packages are compatible across most R versions, and specific packages declare a minimum R version in their dependencies. install.packages() checks this whether it installs the source or the binary version.

@kkmann

kkmann commented Jul 7, 2020

I assume that this will be adopted by a wider community fairly quickly, especially in conjunction with the renv package. If I understand it correctly there is no risk of lock-in since the basis is still MRAN. We could implement a fallback to installing from the corresponding MRAN snapshot.

@betatim
Member

betatim commented Jul 7, 2020

@kkmann do you want to create a PR that switches the MRAN URL repo2docker uses to the RStudio mirror? Off the top of my head these are some other things we'd have to check/look at in that PR:

  • finding a valid snapshot date when the user doesn't specify one
  • what to do when repo2docker is used on a repo that specifies a date before "today" that is a valid MRAN date but for which there is no snapshot on the RStudio mirror (detect this and fall back to MRAN? always use MRAN for dates before "today" to preserve old behaviour?)

@kkmann

kkmann commented Jul 7, 2020

@betatim happy to. I slightly missed the point initially, in that I thought we needed to support renv-style lockfiles to configure dependencies.

Also, the snapshots are not available daily but roughly twice per week. I guess we would go with the next older one. A problem is that their API uses sequential numbering for the builds, which makes it a bit harder to figure out which build to use. @cboettig how are you resolving that for rocker?
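The "next older one" rule can be sketched as a sorted lookup. This is only an illustration: the function name, the `(date, build)` tuple layout, and the sample dates and build numbers are all made up here and are not taken from the RSPM API.

```python
from bisect import bisect_right
from datetime import date


def next_older_snapshot(requested, snapshots):
    """Return the (date, build) pair with the latest date <= requested.

    `snapshots` is an iterable of (date, build_number) tuples in any order.
    Returns None when every snapshot is newer than the requested date.
    """
    ordered = sorted(snapshots)
    dates = [day for day, _ in ordered]
    # bisect_right gives the count of snapshot dates <= requested.
    index = bisect_right(dates, requested)
    return ordered[index - 1] if index else None


# Hypothetical snapshot dates and build numbers, for illustration only.
snapshots = [
    (date(2020, 7, 2), 101),
    (date(2020, 7, 5), 102),
    (date(2020, 7, 8), 103),
]

print(next_older_snapshot(date(2020, 7, 7), snapshots))
```

A request for a date with no snapshot resolves to the closest earlier one, and a date before the first snapshot yields `None`, which a caller would have to handle separately.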

edits:

  • How are we going to handle system level dependencies? The RSPM only gives us the binaries. I haven't figured out how to query system level dependencies (https://packagemanager.rstudio.com/__docs__/admin/appendix/system-dependency-detection/) from the web service.
  • There is an automatic source fallback, so we do not need to worry about missing binaries.
  • I am having trouble getting binaries on focal/R4.0. Are they only being built for R3.6?

@betatim
Member

betatim commented Jul 7, 2020

repo2docker currently doesn't do anything regarding system level dependencies. Instead, owners of a repository have to know to list those dependencies in apt.txt. It would be interesting to investigate whether we can remove that step by looking at the SystemRequirements field in the DESCRIPTION file. However, I'd leave that for a second PR instead of attempting to do this in the same PR as switching to the RStudio mirror.
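As a rough idea of what reading that field could involve, here is a minimal sketch that pulls one field out of DESCRIPTION text. It assumes the usual "Field: value" layout with indented continuation lines; the function name and the sample file contents are hypothetical, and this is not something repo2docker does.

```python
def read_description_field(text, field):
    """Return the value of one field from DESCRIPTION text, or None."""
    parts = None
    for line in text.splitlines():
        if parts is not None:
            # Indented lines continue the field we are collecting.
            if line[:1] in (" ", "\t"):
                parts.append(line.strip())
                continue
            break
        name, sep, rest = line.partition(":")
        if sep and not line[:1].isspace() and name.strip() == field:
            parts = [rest.strip()]
    if parts is None:
        return None
    return " ".join(part for part in parts if part)


# Hypothetical DESCRIPTION contents, for illustration only.
sample = (
    "Package: example\n"
    "Version: 0.1.0\n"
    "SystemRequirements: libxml2 (>= 2.9),\n"
    "    GNU make\n"
    "Imports: xml2\n"
)

print(read_description_field(sample, "SystemRequirements"))
```

The value is free-form text, so turning it into a list of apt packages would still need a separate mapping step.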

Looking at how the RStudio website does the translation from date to build number: they fetch https://packagemanager.rstudio.com/__api__/repos/1/transaction-dates?_sort=date&_order=asc which contains the build number and date for each entry in the calendar. Maybe a first step is to also fetch this, parse the response, and then check whether the requested date is in this list. For dates before "today" (today == roughly the day we merge the PR switching repo2docker to the RStudio mirror) I'd stick with using MRAN and not let users choose what they want.
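The parse-and-check step might look roughly like this. The response shape is an assumption: the sketch supposes a JSON list of objects with an `id` (build number) and an ISO 8601 `date` string, which should be verified against the real endpoint. The sample payload is made up, and fetching the URL is left out.

```python
import json
from datetime import date


def parse_transaction_dates(body):
    """Map each snapshot date to its build number.

    Assumes `body` is a JSON list of objects with "id" and "date" keys,
    where "date" starts with YYYY-MM-DD. These key names are assumptions.
    """
    builds = {}
    for entry in json.loads(body):
        day = date.fromisoformat(entry["date"][:10])
        builds[day] = entry["id"]
    return builds


# Hypothetical response body, for illustration only.
sample = (
    '[{"id": 101, "date": "2020-07-02T00:00:00Z"},'
    ' {"id": 102, "date": "2020-07-05T00:00:00Z"}]'
)

builds = parse_transaction_dates(sample)
print(date(2020, 7, 5) in builds, date(2020, 7, 3) in builds)
```

With the dates as dictionary keys, checking whether a requested date has a snapshot is a plain membership test.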

@kkmann

kkmann commented Jul 7, 2020

Agreed on keeping system level dependencies a user responsibility. My understanding is that the DESCRIPTION file mechanism is rather informal and it's probably not a good idea to go down that rabbit hole.

Thanks for figuring out the API call. I guess that's the best we have at the moment given that there are no daily builds.

I almost feel that it would be more consistent to use MRAN for any dates that are not available via the RSPM - after all, the user specifies an exact date in runtime.txt. That would allow people who know about the RSPM to quickly look up the closest date with binaries themselves and pick it. So repo2docker would essentially get a free optional speedup without any change in functionality (not entirely true: the MRAN snapshots are taken at a particular time, but I feel that's negligible).
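A minimal sketch of that policy, assuming a mapping from snapshot date to RSPM build number is already available: an exact date match uses the RSPM, anything else falls back to MRAN. The function name, the `distro` default, and the sample build number are hypothetical, and both URL templates should be checked against the services before relying on them.

```python
from datetime import date

# URL templates; treat both as assumptions to verify.
MRAN_URL = "https://mran.microsoft.com/snapshot/{date}"
RSPM_URL = "https://packagemanager.rstudio.com/all/__linux__/{distro}/{build}"


def repo_url(requested, rspm_builds, distro="xenial"):
    """Pick the package repository URL for the requested snapshot date.

    `rspm_builds` maps snapshot dates to RSPM build numbers. Only an exact
    date match uses the RSPM; every other date falls back to MRAN.
    """
    build = rspm_builds.get(requested)
    if build is None:
        return MRAN_URL.format(date=requested.isoformat())
    return RSPM_URL.format(distro=distro, build=build)


# Hypothetical build number, for illustration only.
rspm_builds = {date(2020, 7, 5): 102}

print(repo_url(date(2020, 7, 5), rspm_builds))
print(repo_url(date(2020, 7, 6), rspm_builds))
```

The date in runtime.txt keeps its meaning under this rule; only the source of the packages changes between binary and source installs.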

@betatim
Member

betatim commented Jul 8, 2020

I'm unsure about interleaving MRAN and the RStudio mirror. On the one hand it would be cool because as you describe people get a free speed up. On the other hand it feels like it would be too much "magic" for the average user to have present in their mind when things don't work.

We'd have two systems that could be unavailable at different times and changing the specified date will do more than just move you around in time, it will also change from "compile everything" to "most stuff comes as binary".

Maybe a draft PR is the way to go and then we can more easily judge which option feels better.

@kkmann

kkmann commented Jul 8, 2020

Yep, I'll look into it but it could take a bit - it'll be my first time messing with repo2docker internals ;)

I guess the cleanest way out of this would be to provide an option to directly specify the R package repository to use instead of providing a date. My understanding is that runtime.txt is a repo2docker-specific hack anyhow; why not change the specification to allow a repo URL instead of a date as well? That would give (interested/advanced) users more control without changing the current default. Also, we would only have to modify the parsing of runtime.txt to the repo option in R.

edit:
Maybe change runtime.txt to runtime.yml and separate the parameters, as in

r: 3.6.0
repo: https://packagemanager.rstudio.com/all/__linux__/xenial/299

or

r: 3.6.0
date: 2020-07-05

@betatim
Member

betatim commented Jul 8, 2020

The short answer to changing the format of runtime.txt is "no" :)

It is a semi invented-here format, but strongly inspired by Heroku's runtime.txt. It is also used by things in repo2docker which aren't R, and we'd have to support runtime.txt for a good long while for all the thousands of repos that already use it, while also having a new format, dealing with precedence between the two files, confused users, etc.

As far as I know, no one has ever asked for the ability to configure the CRAN mirror (even during the times when MRAN was down for days). You'd also have to explain which of the mirrors to use (the one that works with the Ubuntu version repo2docker uses), detect when people are using the wrong one, think about how to migrate them over when we change the Ubuntu version in r2d, etc. So I'd not give this functionality to people because it adds maintenance burden and "just puts ideas into people's heads".

Sorry that the answer to almost everything seems to be "nice idea but no" :-/ It isn't you or your ideas; it is mostly the legacy and sheer number of existing repositories that need to keep working that put stones in our way.

@kkmann

kkmann commented Jul 8, 2020

I guess that's what the 'needs: discussion' label is for - there's a bunch of trade-offs here x)

Hm, if we do not want to interleave MRAN and the packagemanager by RStudio, it is maybe better to give it some time and see how stable the service is / how people are using it. I can still try to put together a PR using the next-older RSPM date for dates after the initial RSPM date and we can play around with it a bit.

Thanks for all your comments, tremendously helpful!

@choldgraf
Member Author

I am a +1 on gathering information about the service, how it's going, how many people are using it, etc. Thanks @kkmann for your thoughtful discussion!

@betatim
Member

betatim commented Jul 9, 2020

Cross linking to a previous thread discussing alternatives to using the MRAN mirror: #773

@betatim
Member

betatim commented Jul 9, 2020

I think being a bit conservative and letting others test the waters is a good strategy. A (draft/WIP) Pull Request with some code to let people try things out would be great.

@manics
Member

manics commented Jan 26, 2022

Done in #1104

@manics manics closed this as completed Jan 26, 2022