Different versions of R or staying just a point release behind the current version #661
First of all: a huge thank you for taking the time to fill in our (brand new!) issue template for this. I am already 200% more excited about getting this done :)

Second: I think we should do this. Letting users choose which version of R they need is essential, and there is no good reason why you can do it for Python but not for R (modulo the fact that we need to find time to do it). One reason R support is a bit behind is that we lack R users to guide how repo2docker should do things. One of our guiding principles is that we want to follow what the respective communities are already doing, so we can pick up users where they are instead of prescribing how they should be doing it.

One idea that has been floated is to use the conda package manager to install different versions of the R binaries. You mentioned using whatever mechanism Ubuntu provides to install different versions. Is there another way? Otherwise we can look at the pros and cons of these two.

I like the idea of extending the format for

I think changing the version of R will be as much technical work as making it configurable (famous last words), mostly because with our current setup we can't actually change the version: we use whatever is "the version used with Ubuntu 18.04".

Things to do:
I will take this on! I have spent way too much time creating R containers in the past, so this should help. Although I like the idea of using conda to install R, it tends to inflate the size of the images quite a lot. I have a quick question though: is there a reason for using the Ubuntu base image as opposed to a Debian one to build the R images? (This is purely out of curiosity.)
Miniconda will already be installed because we install a Jupyter notebook server as the default frontend (R via notebooks) and to host our proxy that then sends you on to RStudio. We should collect a few ways of installing the R binaries (not the R packages) in an Ubuntu base image that delivers everything a minimal repo2docker image delivers. Then we can compare how big the various images are, what we can tweak, and what the trade-off is between size and extra engineering effort.
All build packs in the core repo2docker setup share a base image so that you can compose build packs. This is what allows us to produce images that have R and Python and X and Y installed. I don't think we should change either the ability to compose build packs in repo2docker or the base image we use for all of them. For enabling different build packs with different base images that (potentially) don't compose, see #487 (comment). I don't think we have a written-down origin story for why Ubuntu is the base image. To me, Ubuntu is the more widely used desktop version of Debian, with packages that are more up to date.
Cool, I will start with this and report on the findings. We might also want to evaluate whether multistage builds make sense and help us build lighter-weight images.
Agree, keeping a consistent base image should help with long-term maintainability.
Interesting to see a breaking change (in terms of reproducibility) in R 3.6, specifically to do with the random number algorithm and the sample() function: https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17494 So R scripts involving set.seed() and sample() will produce different results in R 3.6 than in older versions of R. I think this is further evidence that being able to specify which version of R to launch in Binder is important for full reproducibility. Twitter chat suggests this change to the random number generation algorithm could confuse people who are unaware of it when they run older scripts under R 3.6.
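A toy Python sketch of why the sample() change alters results. It does not reproduce R's C code: it assumes a uniform generator with only 3 bits of resolution (8 possible values) so the effect can be enumerated exactly. Mapping those values onto 3 outcomes by rounding down (the idea behind the pre-3.6 "Rounding" method) favours some outcomes, while rejection sampling (the idea behind the 3.6 default, "Rejection") keeps them equally likely. With R's real generator the bias is far smaller, but as we understand the linked bug report it becomes noticeable when sampling from very large populations.

```python
from collections import Counter

# Toy assumption: the uniform generator has only BITS bits of resolution,
# so u takes the 2**BITS values k / 2**BITS. R's real generator differs.
BITS = 3


def rounding_counts(n: int, bits: int = BITS) -> Counter:
    """How often each outcome 0..n-1 occurs when u is mapped via floor(n * u)."""
    size = 2 ** bits
    # floor(n * k / size), computed exactly with integer arithmetic
    return Counter((n * k) // size for k in range(size))


def rejection_counts(n: int) -> Counter:
    """How often each outcome occurs when we draw just enough random bits
    to cover 0..n-1 and discard (redraw) any value >= n."""
    needed = max(1, (n - 1).bit_length())
    return Counter(k for k in range(2 ** needed) if k < n)


if __name__ == "__main__":
    print("rounding: ", dict(rounding_counts(3)))   # outcomes 0 and 1 favoured over 2
    print("rejection:", dict(rejection_counts(3)))  # every outcome equally often
```

In R 3.6 and later, RNGkind(sample.kind = "Rounding") restores the old behaviour for rerunning older scripts; pinning the R version, as this issue proposes, avoids users having to know about it at all.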
Is there any suggested way of getting a particular r-base version, without a full Dockerfile, available directly from the R project site?
Not at the moment; this is what we are currently working on. For now, the workaround would be to use a custom Docker (or any of the rocker project) t
Thanks! I've been playing around with setting up a Dockerfile in my repo that uses Rocker to run different versions of R. I've noticed that via this route, packages (such as ggplot2) load really quickly, and it doesn't look as if they're being built by Binder on the fly. Is the Dockerfile (containing the Rocker info) pulling prebuilt versions of packages into the Binder alongside base R, or is the package build just happening super fast? I'm guessing it's the former, as the occasional Rocker R image (e.g., 3.5.3) doesn't seem to install the version of ggplot2 specified in runtime.txt, which makes me think there isn't a Rocker image of 3.5.3 with that version of ggplot2 prebuilt (so the latest is installed by default). You can see the repo I'm using here: https://github.com/ajstewartlang/Binder_demo Either way, this seems like a nice workaround for using different versions of R and also for getting certain packages up and running in the Binder quickly.
Hi @ajstewartlang, yes, the rocker image
Now, to have more granular control of your dependencies and versions, I recommend installing pacman via the
@ajstewartlang @trallard Shouldn't your problems be solved by using karthik/holepunch? For me it only fell short because I needed additional system dependencies. Apparently we're talking about different problems here and in holepunch:
RUN export DEBIAN_FRONTEND=noninteractive; apt-get -y update \
&& apt-get install -y gdal-bin \
libgeos-dev \
libudunits2-dev \
make \
wget

Putting this in the Dockerfile, as in here, did not work for me out of the box.
@pat-s I'm working to get holepunch stable for most use cases before allowing users to add additional system dependencies. If you file an issue there, I can keep it on my list for the next set of fixes.
For repo2docker, whatever solution we want to try has to result in a buildpack that lets users choose the version of R and that is composable with other buildpacks, so users can install (say) Python and R stuff simultaneously. This means it isn't clear whether we could use a different base image for the R buildpack; it would depend on it being "essentially the same" base image as the Ubuntu one we use elsewhere. Another option is to extend the commands here and afterwards so that they know how to install different versions of R.
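To illustrate the composability constraint, here is a hypothetical Python sketch (BuildPack and compose are illustrative names, not repo2docker's actual classes). Each build pack contributes RUN steps on top of one shared base image, so a pack that wants a different base image, such as a Debian-based Rocker image, cannot be merged into the same single-stage Dockerfile.

```python
from dataclasses import dataclass, field


@dataclass
class BuildPack:
    """Hypothetical stand-in for a build pack: a base image plus shell steps."""
    name: str
    base_image: str
    scripts: list[str] = field(default_factory=list)


def compose(packs: list[BuildPack]) -> str:
    """Render one Dockerfile from several build packs.

    Composition only works if every pack agrees on the base image, because a
    single-stage Dockerfile has exactly one FROM line for all the RUN steps.
    """
    bases = {pack.base_image for pack in packs}
    if len(bases) != 1:
        raise ValueError(f"expected exactly one shared base image, got {sorted(bases)}")
    lines = [f"FROM {bases.pop()}"]
    for pack in packs:
        lines.append(f"# --- {pack.name} build pack ---")
        lines.extend(f"RUN {script}" for script in pack.scripts)
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder install steps, for illustration only.
    python_pack = BuildPack("python", "ubuntu:18.04", ["echo 'install Python here'"])
    r_pack = BuildPack("r", "ubuntu:18.04", ["echo 'install R here'"])
    print(compose([python_pack, r_pack]))
```

Under this model, "extending the commands" means changing only the R pack's scripts (which R to install), never its base_image, so it keeps composing with the other packs.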
I agree with @betatim that "use eventually the Rocker images" suggested by @pat-s will be hard to make work with r2d. Rocker images are Debian-based (so not too far away from Ubuntu), but I'm not sure what happens if multiple build packs are triggered on a different base image. [IMHO copying how Rocker does it in the R buildpack without replicating the whole variety of Rocker images is a reasonable approach for r2d.] A related issue about system dependencies: #762.
@karthik The issue already exists: karthik/holepunch#20

@betatim @nuest I see. If we need to stay with the current image, I only see two options:
Another idea I just tested is to install the R binary from conda-forge. They seem to have all(?) versions available, and we already use conda to install other packages. I tested this by taking https://github.com/binder-examples/r and adding a

I also couldn't find a PPA that has different R versions for Ubuntu. The R website itself only seems to have 3.6 now :-/ Do you have any experience with how long it takes to compile R from source?
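A minimal sketch of the conda-forge route, assuming conda is already available in the image (Miniconda is installed for the Jupyter server anyway). The helper name conda_r_install_cmd is hypothetical; it only builds the command line that pins r-base, the R binary package on conda-forge, and leaves R packages to a separate step. In conda's match-spec syntax a single = with a partial version, such as r-base=3.5, matches any 3.5.x build.

```python
import shlex


def conda_r_install_cmd(r_version: str, channel: str = "conda-forge") -> list[str]:
    """Build (but do not run) a conda command that pins r-base, the R binary.

    Hypothetical helper: R packages would still be installed separately.
    """
    parts = r_version.split(".")
    if len(parts) not in (2, 3) or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected X.Y or X.Y.Z, got {r_version!r}")
    return ["conda", "install", "--yes", "--channel", channel, f"r-base={r_version}"]


if __name__ == "__main__":
    # shlex.join (Python 3.8+) renders the list as a copy-pasteable shell line
    print(shlex.join(conda_r_install_cmd("3.5.1")))
```

Returning a list rather than a string keeps the command safe to pass to subprocess.run without shell quoting issues.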
I have done the whole R installation from source multiple times (mainly on Alpine) and it can take quite a while.
This takes up to 5 minutes depending on CPU speed, but anything over 1 minute would probably not be acceptable. That's why the Travis folks use a precompiled binary. I don't know anything about conda-forge, but if that works for one version, fixing others shouldn't be a big deal.
Chiming in with some more details on conda-forge. I'm happy to help test anything out as you are considering the various options.
For conda-forge, we build the first patch of each minor release, e.g. 3.4.1, 3.5.1, 3.6.1, etc. This is because all the R conda binaries have to be re-built against each R release. https://anaconda.org/conda-forge/r-base/files
We added the xorg packages because R users were getting errors while trying to run analyses from minimal Docker containers, e.g. bgruening/docker-galaxy-stable#420
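Given the policy described above (only the first patch of each minor release is built on conda-forge), a requested R version would have to be mapped onto the matching X.Y.1 build. A hypothetical helper sketching that mapping; it encodes the stated policy and is not a conda or conda-forge API:

```python
def conda_forge_r_version(requested: str) -> str:
    """Return the X.Y.1 build that would stand in for a requested R version.

    Assumes conda-forge builds only the first patch release of each minor
    release; a request for X.Y.0 is also mapped to X.Y.1.
    """
    parts = requested.split(".")
    if len(parts) not in (2, 3) or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected X.Y or X.Y.Z, got {requested!r}")
    major, minor = int(parts[0]), int(parts[1])
    return f"{major}.{minor}.1"


if __name__ == "__main__":
    for wanted in ("3.4.4", "3.5.3", "3.6"):
        print(wanted, "->", conda_forge_r_version(wanted))
```

So a request for 3.5.3 would resolve to 3.5.1, and a request for a brand-new X.Y.0 would have no real build to point at until X.Y.1 is published.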
Is this good enough? Could it be stepped up if there were demand? I am not an R user, so I have no idea what the implications are of only(?) having the first patch release to choose from.
Do you know if the
I think it is good enough for most use cases. The differences between patch releases are minimal. The main issue occurs in the 3 months between a minor release and its first patch release. This problem is compounded if Bioconductor requires the new minor release.
That's hard for me to say. The team of people that maintains the conda-forge R packages is small. However, since the patch releases are so similar, the main burden would be increased CI time. conda-forge recently switched to Azure, which has helped reduce CI wait times, but I don't know if it is sufficient. Another recent development that could help here is that conda-forge has started pinning R to only the minor release. See here for discussion. Thus going forward it is possible that we could provide all the patch-level releases of r-base but still only build one binary of each R package for that minor release. As far as I know this potential has only been discussed, and no decision has yet been made.
The Ubuntu r-base package does not pull in any xorg packages as far as I can tell. I ran the following to confirm:
This issue has been mentioned on Jupyter Community Forum. There might be relevant details there: https://discourse.jupyter.org/t/binder-from-github-dockerfile-not-starting-rstudio/7986/21
Proposed change
Allowing different versions of R (and possibly RStudio) to be run (R 3.6 vs. 3.5 vs. 3.4 etc).
Alternative options
Assuming that installing the most recent version of R would be less intensive than installing a pre-specified version, just keeping up to date with the major release versions, but maybe a point or two behind, would be a fantastic alternative. R 3.6 has just been released, so maybe the most stable release is the previous version (3.5.3).
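The "a point behind" policy can be stated precisely: take the newest release in the minor series just before the newest one. A small Python sketch, using a hand-typed release list for illustration (a real implementation would have to fetch the list of R releases from somewhere, which is not shown):

```python
def parse(version: str) -> tuple[int, ...]:
    """Turn "3.5.3" into (3, 5, 3) so versions sort numerically, not as text."""
    return tuple(int(part) for part in version.split("."))


def one_minor_behind(releases: list[str]) -> str:
    """Newest release from the minor series just before the newest one."""
    ordered = sorted(releases, key=parse)
    newest_minor = parse(ordered[-1])[:2]
    older = [v for v in ordered if parse(v)[:2] < newest_minor]
    if not older:
        raise ValueError("no earlier minor series in the release list")
    return older[-1]


if __name__ == "__main__":
    # Hand-typed sample of R releases, for illustration only.
    releases = ["3.4.4", "3.5.0", "3.5.1", "3.5.2", "3.5.3", "3.6.0"]
    print(one_minor_behind(releases))  # 3.5.3
```

Sorting by the parsed tuple matters: as plain strings, "3.10.0" would sort before "3.9.0".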
Who would use this feature?
Reproducibility has become increasingly important for researchers in Psychology and many groups and labs have switched to R for open and reproducible research (incl. in teaching). Given that reproducing the computational environment is arguably the gold standard of reproducibility, it would be hugely beneficial to be able to launch a particular version of R using repo2docker - so the runtime.txt file wouldn't just determine what version of packages are pulled from MRAN, but it could also contain a specification of which version of R to run.
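repo2docker already reads a line of the form r-YYYY-MM-DD from runtime.txt to pick the MRAN snapshot date for packages. One possible extension, sketched below as a hypothetical format, puts the R version in front of the date (for example r-3.5-2019-04-01) and falls back to the default R when no version is given. RUNTIME_RE, RRuntime and parse_runtime are illustrative names, not repo2docker code.

```python
import re
from datetime import date
from typing import NamedTuple, Optional

# Hypothetical runtime.txt grammar: "r-YYYY-MM-DD" (snapshot date only) or
# "r-X.Y-YYYY-MM-DD" / "r-X.Y.Z-YYYY-MM-DD" (R version plus snapshot date).
RUNTIME_RE = re.compile(
    r"^r-(?:(?P<version>\d+\.\d+(?:\.\d+)?)-)?(?P<date>\d{4}-\d{2}-\d{2})$"
)


class RRuntime(NamedTuple):
    version: Optional[str]  # None means "keep the build pack's default R"
    snapshot: date          # date of the package snapshot to install from


def parse_runtime(text: str) -> RRuntime:
    """Parse the single line of a runtime.txt file."""
    line = text.strip()
    match = RUNTIME_RE.match(line)
    if match is None:
        raise ValueError(f"unrecognised runtime.txt entry: {line!r}")
    return RRuntime(match.group("version"), date.fromisoformat(match.group("date")))


if __name__ == "__main__":
    print(parse_runtime("r-2019-04-01\n"))
    print(parse_runtime("r-3.5-2019-04-01"))
```

Keeping the version optional means existing r-YYYY-MM-DD files would keep working unchanged.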
I can imagine a future where research journal articles each contain a link in them so that reviewers and readers can launch a Binder to see the entire analysis script and data exactly as it was on the date the authors carried out the analysis. That would be a huge benefit to the community and a massive boon for reproducibility.
How much effort will adding it take?
I'm not sure I can estimate this.
Who can do this work?
I'm guessing someone who knows how to install different versions of R via Ubuntu’s package manager, and how repo2docker can read info in the runtime.txt file to determine which version of R to then install. I'm happy to help where I can, although I'm only a psychologist!