Get binary R packages from packagemanager.rstudio.com #1104

Merged: 21 commits, merged Jan 8, 2022

@yuvipanda (Collaborator) commented Dec 17, 2021

packagemanager.rstudio.com is a CRAN mirror provided
by RStudio, with binary packages prebuilt for many Linux
distributions! https://www.rstudio.com/blog/announcing-public-package-manager/
has more detail. It cuts down install times for R packages
by almost 90% in some cases!

Like MRAN (which we use now), it also provides a daily snapshot
of CRAN for each date
(https://docs.rstudio.com/rspm/news/#rstudio-package-manager-2021090).
The CRAN URL for a particular date can be fetched via an API
call. We call that API, and retry with earlier dates if we can't find a snapshot
for the requested date. Note, however, that RSPM seems to do server-side magic
to give us packages from an earlier date anyway, so we don't need to
do the MRAN-style backoff yet.
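
A minimal sketch of the retry-with-earlier-dates idea. It assumes a hypothetical `lookup` callable that wraps the RSPM API call and returns a snapshot URL, or None when that day has no snapshot; the real endpoint and response format are not shown, and `find_snapshot_url` and `max_days_prior` are illustrative names, not repo2docker's code.

```python
from datetime import date, timedelta
from typing import Callable, Optional


def find_snapshot_url(
    requested: date,
    lookup: Callable[[date], Optional[str]],
    max_days_prior: int = 7,
) -> str:
    """Return the snapshot URL for `requested`, or for the closest earlier day.

    `lookup` is a hypothetical stand-in for the RSPM API call: it returns a
    snapshot URL for the given day, or None if that day has no snapshot.
    """
    for days_back in range(max_days_prior):
        candidate = requested - timedelta(days=days_back)
        url = lookup(candidate)
        if url is not None:
            return url
    raise ValueError(
        f"no snapshot within {max_days_prior} days up to {requested.isoformat()}"
    )


# Usage with a fake lookup that has two made-up missing days.
MISSING_DAYS = {date(2021, 6, 5), date(2021, 6, 6)}


def fake_lookup(day: date) -> Optional[str]:
    if day in MISSING_DAYS:
        return None
    return f"https://example.invalid/snapshot/{day.isoformat()}"


print(find_snapshot_url(date(2021, 6, 6), fake_lookup))
# prints https://example.invalid/snapshot/2021-06-04
```

Capping the search with `max_days_prior` keeps a date with no nearby snapshot from scanning back indefinitely; the cap of 7 is an arbitrary choice for the sketch.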

One possible issue with switching existing Binder repos from source builds to binary
builds is that the binary builds sometimes require an apt package to be installed,
and will fail if it is not. For example, we had to install the zmq library apt package:
source installs compile zmq themselves, and skipping that compilation is where the
speedup comes from. But unlike Python wheels or conda packages, these binary
builds are not self-contained: they are linked against apt packages from
the specific distros. So some repos that worked before might fail now.
Because of this, we default to RSPM only if one of the following conditions is true:

  1. The snapshot date is recent: in 2022 or later. I picked this cutoff arbitrarily.
  2. R 4.1+ is being requested, as MRAN doesn't
    seem to support R 4.1.
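
The two conditions can be sketched as a single predicate. This is a hypothetical helper, not repo2docker's actual code; it treats Jan 1 2022 itself as recent (later comments in this thread say snapshot dates in 2022 use RSPM) and assumes R versions arrive as "major.minor[.patch]" strings.

```python
from datetime import date
from typing import Tuple

# Cutoff for defaulting to RSPM: snapshot dates in 2022 or later.
RSPM_CUTOFF_DATE = date(2022, 1, 1)


def major_minor(r_version: str) -> Tuple[int, int]:
    """Turn a version string like '4.1' or '4.1.2' into (4, 1)."""
    major, minor = r_version.split(".")[:2]
    return int(major), int(minor)


def default_to_rspm(snapshot_date: date, r_version: str) -> bool:
    """True if the snapshot is recent, or R 4.1+ is requested."""
    is_recent = snapshot_date >= RSPM_CUTOFF_DATE
    # MRAN doesn't seem to support R 4.1, so 4.1+ always goes to RSPM.
    needs_new_r = major_minor(r_version) >= (4, 1)
    return is_recent or needs_new_r


print(default_to_rspm(date(2021, 12, 31), "4.0"))  # False
print(default_to_rspm(date(2022, 1, 1), "3.6"))  # True
print(default_to_rspm(date(2020, 6, 1), "4.1.2"))  # True
```

Comparing `(major, minor)` tuples avoids the classic string-comparison trap where "4.10" would sort before "4.9".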

We also install newer versions of RStudio, chosen by which R versions they support,
along with a matching jupyter-rsession-proxy. Fixes #1041

This also fixes a bug where asking for R 4.0 gave us R 4.1, and adds a separate test for
that. Fixes #1077

TODO:

  • Make sure the fallback MRAN date works by checking whether the URL still exists
  • Modify retry logic to look for snapshots in packagemanager.rstudio.com CRAN
    before looking into MRAN
  • Pin our devtools installs to dates and versions that we know work for a given
    R version
  • Decide what the cutoff date for switchover to binary packages is
  • Fix failing unit test

If there is no RSPM snapshot for the requested date
(anything before Oct 2017), we fall back on MRAN. Adds a test
for this fallback.

Because the binary builds are linked against distro apt packages,
some repos that worked before might fail with them. We can choose a
more recent cut-off date to prevent this from happening.

We were installing devtools from an old MRAN snapshot. I moved the pin
a little ahead, so IRkernel can also be installed from CRAN
instead of from GitHub. R <= 4.0 gets the old version, and anything
newer gets a more recent version of devtools. This gives us
fast installs of IRkernel with binary packages.

Also add R 4.0 and R 4.1 tests.
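
The MRAN fallback for dates with no RSPM snapshot can be sketched as below. This is a hypothetical helper, not the code in this PR: `rspm_lookup` stands in for the RSPM API call, the MRAN URL layout https://mran.microsoft.com/snapshot/YYYY-MM-DD is an assumption, and the first-snapshot date in the usage is made up.

```python
from datetime import date
from typing import Callable, Optional

# Assumed MRAN snapshot URL layout: https://mran.microsoft.com/snapshot/YYYY-MM-DD
MRAN_SNAPSHOT_BASE = "https://mran.microsoft.com/snapshot/"


def cran_mirror_for(
    snapshot_date: date,
    rspm_lookup: Callable[[date], Optional[str]],
) -> str:
    """Prefer an RSPM snapshot URL; fall back to MRAN when RSPM has none."""
    rspm_url = rspm_lookup(snapshot_date)
    if rspm_url is not None:
        return rspm_url
    return MRAN_SNAPSHOT_BASE + snapshot_date.isoformat()


# Hypothetical first RSPM snapshot day, for illustration only.
FIRST_RSPM_DAY = date(2017, 10, 1)


def fake_rspm_lookup(day: date) -> Optional[str]:
    if day < FIRST_RSPM_DAY:
        return None
    return f"https://example.invalid/rspm/{day.isoformat()}"


print(cran_mirror_for(date(2016, 5, 1), fake_rspm_lookup))
# prints https://mran.microsoft.com/snapshot/2016-05-01
```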

@manics (Member) commented Dec 17, 2021

Is it guaranteed that CRAN will have a well-defined daily snapshot for each day after <YYYY-MM-DD>? If not, would it be worth looking backwards for the closest CRAN snapshot? That would avoid a situation where someone's runtime.txt references a date that is in CRAN, and they subsequently update it to a more recent date that isn't, causing the build to switch to MRAN.

@yuvipanda (Collaborator, Author)

@manics I had initially read https://docs.rstudio.com/rspm/news/#rstudio-package-manager-2021090 and concluded they'd have a snapshot for every day, but on re-reading it I'm not sure. I'll base the MRAN / packagemanager.rstudio.com cutoff on dates, and retry slightly older snapshots if it can't find one for the requested date.

@RaoOfPhysics

Looking at https://packagemanager.rstudio.com/client/#/repos/1/overview it appears as if they've gone and snapshotted all of the days (with two exceptions in October 2017)?

@yuvipanda (Collaborator, Author)

@RaoOfPhysics ah, glad you found the holes in October!!! Will help me test and make sure we cover those cases.

@RaoOfPhysics

Here you go, @yuvipanda.

[Screenshot, 2021-12-20]

- Install a different version of RStudio for R < 4.1,
  as the latest RStudio doesn't seem to support these older
  R versions.
- Clean up how Shiny is installed: install it with the same
  apt invocation as RStudio (saves time), and install shiny-proxy
  from PyPI instead of GitHub. The release on PyPI is the same
  as our previous GitHub pin.
- Remove an outdated comment about different behavior for R 3.6. I
  think we now get all our R versions from the same apt repo. Plus,
  the conditional was adding more scripts than just extra apt
  package repos.
- MRAN doesn't seem to have R 4.1-specific snapshots, so let's
  default to RSPM for anything 4.1+.
- Otherwise, snapshot dates in 2022 or later will result in using RSPM.
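
Choosing an RStudio release by R version can be sketched as follows. The release labels are hypothetical placeholders (the actual RStudio versions are not named in this thread), and `rstudio_release_for` is an illustrative name, not repo2docker's API.

```python
from typing import Tuple

# Hypothetical placeholder labels; the actual RStudio versions aren't named here.
RSTUDIO_FOR_OLDER_R = "rstudio-older-release"
RSTUDIO_FOR_R_4_1_PLUS = "rstudio-latest-release"


def major_minor(r_version: str) -> Tuple[int, int]:
    """Turn a version string like '4.0' or '4.0.5' into (4, 0)."""
    major, minor = r_version.split(".")[:2]
    return int(major), int(minor)


def rstudio_release_for(r_version: str) -> str:
    """Pick an RStudio release that the requested R version works with."""
    if major_minor(r_version) < (4, 1):
        # The latest RStudio doesn't seem to support R < 4.1.
        return RSTUDIO_FOR_OLDER_R
    return RSTUDIO_FOR_R_4_1_PLUS


for version in ("3.6", "4.0.5", "4.1"):
    print(version, rstudio_release_for(version))
```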

@yuvipanda (Collaborator, Author)

Ok, I've changed the logic so that RSPM is used by default when either the requested snapshot date is in 2022 or later, or R 4.1 is requested. I think with these two conditions, we shouldn't break any old repos that depended on source builds.

@RaoOfPhysics

I haven’t thought too carefully about it, but I think the logic makes sense. Why R4.1+ and not R4.0+ though?

@yuvipanda (Collaborator, Author)

@RaoOfPhysics The R 4.1 cutoff is because R 4.0 is the latest version I see in MRAN (https://mran.microsoft.com/timemachine). I'm mostly trying to make sure we break as few existing repositories as possible...

@yuvipanda requested a review from @minrk on January 4, 2022, 15:01
@yuvipanda (Collaborator, Author)

Unfortunately it looks like R 4.1 is being installed even when we ask for R 4.0. Trying to figure out why.

Pin the r-base-core version. Otherwise the latest version was being installed, giving us
R 4.1 even when we asked for R 4.0.

@yuvipanda (Collaborator, Author)

Running apt list --installed showed that r-base-core was not pinned by us, so apt ended up picking the latest version. I fixed that.
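
A sketch of the kind of fix described, assuming the R packages are installed with a single apt-get command: each R base package, including r-base-core, is pinned with apt's package=version syntax so apt cannot resolve it to a newer release. The package list and the version string are hypothetical examples, not the values used in this PR.

```python
from typing import List

# Hypothetical package set: r-base-core is pinned explicitly alongside the
# others, so apt cannot resolve it to a newer release.
R_BASE_PACKAGES = ["r-base-core", "r-base-dev"]


def r_apt_install_command(deb_version: str) -> List[str]:
    """Build an apt-get command pinning each R base package to deb_version."""
    pinned = [f"{pkg}={deb_version}" for pkg in R_BASE_PACKAGES]
    return ["apt-get", "install", "--yes", "--no-install-recommends", *pinned]


# The version string is a made-up example of a Debian-style package version.
print(" ".join(r_apt_install_command("4.0.5-1.2004.0")))
```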

Also add another test for R 4.0 + RSPM.

@yuvipanda (Collaborator, Author)

I've fixed the tests as well now!

@manics (Member) left a review:

A few minor comments; once those are answered, this looks good to merge!

@manics (Member) left a review:

🥳

@yuvipanda (Collaborator, Author)

@manics awesome! THANK YOU SO MUCH!

@manics mentioned this pull request on Jan 26, 2022.
@yuvipanda added a commit to yuvipanda/repo2docker that referenced this pull request on Mar 25, 2022:
- Explains jupyterhub#1104
- Advertises that we get RStudio 'for free' when R is installed