Rstudio package manager #925
Comments
Ha, I was just gonna ask about that. Imho, it's not only about reliability but also speed. Building the tidyverse alone nowadays takes ages... Wouldn't this just require a few changes to https://github.com/jupyter/repo2docker/blob/master/repo2docker/buildpacks/r.py to work with a lockfile instead of install.R? I guess there is a design choice as to how one wants to handle runtime.txt:
|
I don't think that it should be too difficult to make this change. The big question is whether or not this is a "canonical packaging solution for R". The RStudio package manager is super new, and (at least for me) it is hard to tell whether RStudio basically is the R community, or if there are others that would suggest different packaging approaches (e.g. the |
I took a look at https://packagemanager.rstudio.com/client/#/repos/1/overview to try and understand how you'd use this. It looks like this is very similar to MRAN in that it offers "frozen snapshots" and that you'd configure it in a way very similar to how we configure MRAN. If that is true this would be cool because it would be a small change and compatible with using other mirrors, with no lock-in. |
Yeah - plus I believe they have linux binaries for everything which would be a big improvement |
yup, we're doing this on the rocker/versioned stack now as well. (using the binaries built for |
Does a binary from the RStudio CRAN "mirror" work with all R versions or do we have to pay attention to getting a right match (or does R take care of that?)? |
@betatim that part is no different than installing from source (most packages are compatible across most versions; specific packages will declare a minimum R version number in their dependencies). |
I assume that this will be adopted by a wider community fairly quickly, especially in conjunction with the renv package. If I understand it correctly there is no risk of lock-in since the basis is still MRAN. We could implement a fallback to installing from the corresponding MRAN snapshot. |
@kkmann do you want to create a PR that switches the MRAN URL repo2docker uses to the RStudio mirror? Off the top of my head these are some other things we'd have to check/look at in that PR:
|
@betatim happy to, I slightly missed the point initially in that I thought we need to support renv-style lockfiles to configure dependencies. Also, the snapshots are not available daily but roughly twice per week. I guess we would go with the next older one. A problem is that their API uses sequential numbering for the builds, which makes it a bit harder to figure out which build to use. @cboettig how are you resolving that for rocker? edits:
|
repo2docker currently doesn't do anything regarding system level dependencies. Instead owners of a repository have to know to list those dependencies in the Looking at how the RStudio website does the translation from date to build number: they fetch https://packagemanager.rstudio.com/__api__/repos/1/transaction-dates?_sort=date&_order=asc which contains the build number and date for each entry in the calendar. Maybe a first step is to also fetch this, parse the response and then check if the requested date is in this list. For dates before "today" (today == roughly the day we merge the PR switching repo2docker to the RStudio mirror) I'd stick with using MRAN and not let users choose what they want. |
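A minimal sketch of the date-to-build lookup described above, in Python since that is what repo2docker is written in. It assumes the transaction-dates response has already been fetched and parsed into (date, build number) pairs; the exact response format is not shown in this thread, so the parsing step is left out. The function name `pick_snapshot` and the build numbers in the usage note are hypothetical.

```python
from bisect import bisect_right
from datetime import date


def pick_snapshot(snapshots, requested):
    """Return the (date, build_number) pair on or before `requested`.

    `snapshots` is an iterable of (datetime.date, int) pairs, assumed to be
    parsed from the transaction-dates listing (format is an assumption).
    Returns None if every known snapshot is newer than `requested`.
    """
    ordered = sorted(snapshots)
    dates = [d for d, _ in ordered]
    # bisect_right gives the number of snapshot dates <= requested
    idx = bisect_right(dates, requested)
    if idx == 0:
        return None
    return ordered[idx - 1]
```

With hypothetical entries such as `[(date(2020, 7, 6), 284), (date(2020, 7, 9), 290)]`, a request for `date(2020, 7, 7)` resolves to the 6 July entry, i.e. the "next older" snapshot, and a request for a date before the first entry returns `None` so the caller can decide what to do.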
Agreed on keeping system level dependencies a user responsibility. My understanding is that the DESCRIPTION file mechanism is rather informal and it's probably not a good idea to go down that rabbit hole. Thanks for figuring out the API call. I guess that's the best we have at the moment given that there are no daily builds. I almost feel that it would be more consistent to use MRAN for any dates that are not available via the RSPM - after all, the user specifies an exact date in runtime.txt. That would allow people who know about the RSPM to quickly look up the closest date with binaries themselves and pick it. So repo2docker would essentially get a free optional speedup without any change in functionality (not entirely true: the MRAN snapshots are taken at a particular time but I feel that's negligible). |
I'm unsure about interleaving MRAN and the RStudio mirror. On the one hand it would be cool because as you describe people get a free speed up. On the other hand it feels like it would be too much "magic" for the average user to have present in their mind when things don't work. We'd have two systems that could be unavailable at different times and changing the specified date will do more than just move you around in time, it will also change from "compile everything" to "most stuff comes as binary". Maybe a draft PR is the way to go and then we can more easily judge which option feels better. |
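To make the two options under discussion easier to compare, here is a rough Python sketch. It is not repo2docker code: the function name `choose_mirror`, its parameters, and the returned labels are all made up for illustration. "Interleaving" uses the RStudio mirror only when the exact requested date has a snapshot and MRAN otherwise; the alternative uses MRAN for every date before a fixed cutoff and the RStudio mirror for every date from the cutoff on.

```python
from datetime import date


def choose_mirror(requested, rspm_dates, cutoff, interleave):
    """Return "RSPM" or "MRAN" for a requested snapshot date.

    requested  -- datetime.date taken from runtime.txt
    rspm_dates -- set of dates that have an RSPM snapshot (assumed known)
    cutoff     -- first date for which the non-interleaved policy uses RSPM
    interleave -- True: RSPM only on an exact date match, MRAN otherwise
                  False: MRAN before the cutoff, RSPM from the cutoff on
    """
    if interleave:
        return "RSPM" if requested in rspm_dates else "MRAN"
    return "MRAN" if requested < cutoff else "RSPM"
```

Under the interleaved policy, moving the requested date by a single day can flip the result between the two mirrors, which is the "magic" mentioned above. Under the cutoff policy the mirror changes exactly once.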
Yep, I'll look into it but it could take a bit - it'll be my first time messing with repo2docker internals ;) I guess the cleanest way out of this would be to provide an option to directly specify the R package repository to use instead of providing a date. My understanding is that edit:
or
|
The short answer to changing the format of It is a semi invented-here format but strongly inspired by Heroku's As far as I know no one has ever asked for the ability to configure the CRAN mirror (even during the times when MRAN was down for days). You'd also have to explain which of the mirrors to use (the one that works with the Ubuntu version repo2docker uses), detect when people are using the wrong one, think about how to migrate them over when we change the Ubuntu version in r2d, etc. So I'd not give this functionality to people because it adds maintenance burden and "just puts ideas into people's heads". Sorry that the answer to almost everything seems to be "nice idea but no" :-/ It isn't you or your ideas, it is mostly due to the legacy and sheer number of existing repositories that need to keep working that put stones in our way. |
I guess that's what the 'needs: discussion' label is for - there's a bunch of trade-offs here x) Hm, if we do not want to interleave MRAN and the packagemanager by RStudio, it is maybe better to give it some time and see how stable the service is / how people are using it. I can still try to put together a PR using the next-older RSPM date for dates after the initial RSPM date and we can play around with it a bit. Thanks for all your comments, tremendously helpful! |
I am a +1 on gathering information about the service, how it's going, how many people are using it, etc. Thanks @kkmann for your thoughtful discussion! |
Cross linking to a previous thread discussing alternatives to using the MRAN mirror: #773 |
I think being a bit conservative and letting others test the waters is a good strategy. A (draft/WIP) Pull Request with some code to let people try things out would be great. |
Done in #1104 |
Looks like RStudio now runs a package manager too
https://mobile.twitter.com/hadleywickham/status/1279023422748659712
We should look into it as a more reliable way of getting R packages in Linux!