Some R 4.0.2 binary packages built under R 4.0.3 #127

Closed
ashiklom opened this issue Feb 23, 2021 · 12 comments

Comments

@ashiklom ashiklom commented Feb 23, 2021

I think this is an issue with the RStudio Package Manager, not with these images, but I wanted to call your attention to it because it prevents code on these images from passing R CMD check:

Some R binary packages seem to have been built using R 4.0.3. For example:

docker pull rocker/r-ver:4.0.2
docker run -it --rm rocker/r-ver:4.0.2
## From inside R
install.packages("coda")
library(coda)
# Warning message:
# package ‘coda’ was built under R version 4.0.3

From our own project, I know this affects at least the following packages:

  • coda
  • mvtnorm
  • XML
  • rjags
  • ggmap
  • gridExtra
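
To find every affected package in a library rather than discovering them one warning at a time, the `Built` column of `installed.packages()` can be compared against the running R version. This is only a minimal sketch: `built_under_newer_r` is a hypothetical helper name, not part of R or of these images.

```r
# Hypothetical helper: names of packages whose binaries were built under a
# newer R version than the one they are being used with.
built_under_newer_r <- function(built, running = getRversion()) {
  # `built` is a named character vector such as installed.packages()[, "Built"]
  built <- built[!is.na(built)]
  running <- package_version(as.character(running))
  newer <- package_version(built) > running
  names(built)[newer]
}

# Usage against the current library (output depends on what is installed):
# built_under_newer_r(installed.packages()[, "Built"])
```

Passing the version vector in as an argument keeps the comparison separate from the library lookup, so it can be tried on made-up values as well.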

An ugly workaround is to install these packages from source (e.g., remotes::install_github("cran/coda")), but that's hardly a good solution.

Author

@ashiklom ashiklom commented Feb 23, 2021

A possibly related issue is that install.packages("coda", type = "source") still installs the binary package, even though the URL it's pulling from has .../src/... in it. So I suspect there really is something wonky with the RStudio Package Manager.

EDIT: Sorry, my ignorance! It seems the default repository only has binary packages. To get source packages, you have to swap out the URL for https://packagemanager.rstudio.com/all/344. So the following works just fine:

install.packages("coda", repos = "https://packagemanager.rstudio.com/all/344")

Is there a way to specify multiple repositories and switch between them based on the install.packages type argument?

Author

@ashiklom ashiklom commented Feb 23, 2021

I've opened a discussion about this on RStudio Community here: https://community.rstudio.com/t/binary-packages-for-different-r-versions-in-rstudio-package-manager/97042

The more I think about this, the more I suspect that the best solution is to only do binary package installs on the latest version of R (or whatever version the RStudio Package Manager uses for package builds) and have all older versions default to source installs. This would significantly increase install times, and might force users to install additional system dependencies for certain packages, but with the trade-off of full reproducibility.

Member

@cboettig cboettig commented Feb 23, 2021

Thanks. Correct: RSPM supports both source and binary builds for snapshots but does not support the type="source" switch, which is designed only for Windows/macOS, so the RSPM solution is effectively a very clever hack that uses different URLs for source and binaries, as you've noted. You can see these at, e.g., https://packagemanager.rstudio.com/client/#/repos/2/overview, which provides both the binary and source links. (@ashiklom you probably know this, I'm noting it here for reference cause I gotta look it up often too!) Also, snapshots have thankfully recently adopted a date-based syntax instead of the numbered snapshots, so we might want to switch to that anyway.

(Yes you can always specify multiple repositories but it won't help here, install.packages() gets the most recent version from any version available across the list of repos. The only way within RSPM to toggle between source and binary is via changing the URL)
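
Since the URL is the only switch, one way to make the choice explicit is a small wrapper that builds the repository URL from a snapshot and a type. This is a sketch only: `rspm_repo` is a hypothetical name, the two URL shapes are copied from links that appear in this thread, and the `focal` default is an assumption about the image's distribution.

```r
# Hypothetical helper; the URL patterns are taken from links in this thread
# and may not cover every RSPM configuration.
rspm_repo <- function(snapshot, type = c("binary", "source"), distro = "focal") {
  type <- match.arg(type)
  base <- "https://packagemanager.rstudio.com/cran"
  if (type == "binary") {
    # Linux binaries are served from a distro-specific path
    paste(base, "__linux__", distro, snapshot, sep = "/")
  } else {
    # Without the __linux__ segment, the same snapshot serves source packages
    paste(base, snapshot, sep = "/")
  }
}

# Usage (not run here):
# install.packages("coda", repos = rspm_repo("2020-10-09", "source"))
```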

I can confirm that:

install.packages("coda", repos = "https://packagemanager.rstudio.com/cran/__linux__/focal/2020-10-09")

gets coda built under R 4.0.3, despite the fact that R 4.0.3 was not released until 2020-10-10. This does appear to be a bug in RSPM. I could be wrong, but I think we should report this to https://github.com/rstudio/r-system-requirements. The RStudio Community thread is a good idea, but as written it reads as though this is a Rocker issue when it actually has nothing to do with Rocker, and we might get RSPM developer eyeballs on it sooner via the GitHub repo.

The behavior is definitely a concern. I know RSPM binaries will often lag behind CRAN by a few days (building all those binaries obviously takes time!), but I'm rather surprised to see a binary built with a version of R that wasn't yet released on the snapshot day. To me that suggests a potentially deeper issue with how the snapshots are being done.

I recognize that falling back on the source installs is a reasonable fix, but I'd rather not make it the default. Users expect binary installs, and some downstream uses effectively require them (e.g. avoiding time-outs in binder.org builds). For us, I think a more acceptable compromise would be to fall back to the next earlier snapshot, e.g. 2020-10-07, aka snapshot 343, three days before the new release instead of one day before the release, as our pin. It looks like builds still use 4.0.2 R there.

https://github.com/rocker-org/rocker-versioned2/blob/master/stacks/core-4.0.2.json#L15

Just wondering, are you seeing any negative consequences from this behavior? I see the potential concern, but I think most packages work just fine this way.

Author

@ashiklom ashiklom commented Feb 23, 2021

Thanks @cboettig ! This is really useful.

> I could be wrong, but I think we should report this to https://github.com/rstudio/r-system-requirements.

Sounds good to me. I tried briefly to find an RSPM GitHub repo, but nothing came up on my first pass. This seems like it would make sense.

> For us, I think a more acceptable compromise would be to fall back to the next earlier snapshot, e.g. 2020-10-07, aka snapshot 343, three days before the new release instead of one day before the release, as our pin.

That seems like a great solution to me, assuming that RSPM only builds packages once using whatever version of R corresponds to that particular snapshot. I'm not sure whether that's actually the case -- it would be good to get an RSPM person to confirm. (I could imagine a use case where R is updated but packages are frozen, though it's a bit of a stretch, and I don't know why you wouldn't want to freeze both).

> Just wondering, are you seeing any negative consequences from this behavior? I see the potential concern, but I think most packages work just fine this way.

We haven't encountered any actual performance issues because of this, and I don't expect that we would. However, because this triggers a Warning from R CMD check, it does cause any CI workflows that use R CMD check to fail. For example, see this recent build from PEcAn; error reproduced below for posterity:

  4 warnings found in modules/allometry.
  3 notes found in modules/allometry.
  ── R CMD check comparison ─────────────────── PEcAn.allometry 1.7.1 / 1.7.0 ────
  Status: BROKEN
  
  ── Fixed
  
  ✔ checking for unstated dependencies in ‘tests’ ... WARNING
  
  ── Still failing
  
  ✖ checking for missing documentation entries ... WARNING
  ✖ checking Rd \usage sections ... WARNING
  ✖ checking files in ‘vignettes’ ... WARNING
  ✖ checking DESCRIPTION meta-information ... NOTE
  ✖ checking dependencies in R code ... NOTE
  ✖ checking R code for possible problems ... NOTE
  
  Error: Please fix these and resubmit.
  Execution halted
  ── Newly failing
  
  ✖ checking whether package ‘PEcAn.allometry’ can be installed ... WARNING
  
  R check of modules/allometry reports the following new problems. Please fix these and resubmit:
  checking whether package ‘PEcAn.allometry’ can be installed ... WARNING
  Found the following significant warnings:
    Warning: package ‘coda’ was built under R version 4.0.3
    Warning: package ‘mvtnorm’ was built under R version 4.0.3
  See ‘/tmp/RtmprcTQo0/PEcAn.allometry.Rcheck/00install.out’ for details.
  make: *** [Makefile:139: .check/modules/allometry] Error 1

(Because we have a huge backlog of non-fatal code problems, we only have PRs fail on new warnings, and have been gradually chipping away at resolving existing ones. That's why the first few WARNINGs there are ignored).

@bdeitte bdeitte commented Feb 24, 2021

RSPM developer eyeballs are on this ticket now; it was raised with the team. We don't have any good ideas right now on the core issue here, "Passing type=source to a binary URL should return source package", which is now logged as an issue. I'm also bringing up the coda part of this discussion, since it came up after people had started looking at it.

Author

@ashiklom ashiklom commented Feb 25, 2021

Thanks @bdeitte ! Nice to know RSPM folks are in the loop on this!

> the core issue here, "Passing type=source to a binary URL should return source package", which is now logged as an issue.

To me, the more pressing issue is knowing reliably what version of R was used to build package binaries for a given RSPM URL. It's already pretty straightforward to use a source-only URL if necessary. As far as I can tell, the current behavior is unpredictable at the package level -- e.g., for snapshot 344 in this thread, coda is installed with R 4.0.3 but, say, dplyr is compiled with R 4.0.2.

@tylfin tylfin commented Feb 25, 2021

Hey @ashiklom, I'm also on the RSPM team and happy to fill in some details here.

> assuming that RSPM only builds packages once using whatever version of R corresponds to that particular snapshot. I'm not sure whether that's actually the case -- it would be good to get an RSPM person to confirm. (I could imagine a use case where R is updated but packages are frozen, though it's a bit of a stretch, and I don't know why you wouldn't want to freeze both).

Our binary building process basically works like this:

  1. Every weekday (M-F) we take a full snapshot of CRAN and produce a diff
  2. From that diff, we'll update and build a binary for any packages that changed inside the package dependency graph

Whenever a new version of R is released, we update our builders accordingly, so the binary versions will only be updated if the packages changed. That leads to this case you're seeing:

> for snapshot 344 in this thread, coda is installed with R 4.0.3 but, say, dplyr is compiled with R 4.0.2.
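
As a toy illustration of why the two steps above produce a mix of R versions within one snapshot, the sketch below rebuilds only the packages that changed, using whatever R version the builders run that day. It is not RSPM's actual implementation: `rebuild_changed` is a hypothetical function, the dependency graph is ignored, and the set of changed packages is invented for the example.

```r
# Toy model, not RSPM's actual implementation: only packages listed in
# `changed` are rebuilt, with the R version the builders currently run.
rebuild_changed <- function(binaries, changed, builder_r) {
  binaries[intersect(names(binaries), changed)] <- builder_r
  binaries
}

# Every binary starts out built with R 4.0.2 ...
binaries <- c(coda = "4.0.2", dplyr = "4.0.2", mvtnorm = "4.0.2")
# ... then the builders move to R 4.0.3, and (hypothetically) only coda and
# mvtnorm show up in the next CRAN diff.
after <- rebuild_changed(binaries, changed = c("coda", "mvtnorm"), builder_r = "4.0.3")
print(after)
```

In this model dplyr keeps its R 4.0.2 binary until the next time it changes on CRAN, which matches the pattern quoted above.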

From what I understand, this is a similar behavior to what happens with CRAN package binaries (for macOS and Windows). I think in this case it would be safe to ignore that pattern in your build system, Warning: package ‘[pkg]’ was built under R version 4.0.*.
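
For a build system that compares warning lines between runs, ignoring that pattern could look roughly like the sketch below. `is_built_under_warning` is a hypothetical helper, and the input is assumed to be a character vector of lines taken from the check output.

```r
# Hypothetical filter for a CI step: flag "built under R version" warnings so
# they can be set aside before counting new problems.
is_built_under_warning <- function(lines) {
  # ".+" also absorbs the typographic quotes R prints around the package name
  grepl("package .+ was built under R version [0-9.]+", lines)
}

log_lines <- c(
  "Warning: package ‘coda’ was built under R version 4.0.3",
  "Warning: package ‘mvtnorm’ was built under R version 4.0.3",
  "checking files in ‘vignettes’ ... WARNING"
)

# Keep only the lines that should still count against the build
remaining <- log_lines[!is_built_under_warning(log_lines)]
```

Filtering on the message text rather than silencing all warnings keeps genuine installation problems visible.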

We're also happy to consider changes to this process if it improves reliability.

Member

@cboettig cboettig commented Feb 25, 2021

Thanks @tylfin , this is super helpful. I agree that the warning is probably innocuous but it is kinda a bad look for us in an image which is claiming to be 'version stable' since R decides to display that warning prominently on every library call... It's a good point that this is similar to how binaries behave on CRAN as well, and we're kinda asking for it by trying to set our snapshot dates at the last moment that said version was current.

To mitigate the issue, we're considering shifting our policy to snapshot a week before the release rather than 'the nearest day before'. Do you think that would give enough time on balance that we wouldn't pick up a lot of binaries built with the subsequent R version? Do you have any stats on the distribution of lag times between CRAN snapshot day and the binary builds for those packages?
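
The candidate pin dates are easy to compute. The sketch below steps back a fixed number of days from a release date and then, because snapshots are taken on weekdays, keeps stepping back until it lands on one. `pin_date` is a hypothetical helper, and treating weekend dates as having no snapshot of their own is an assumption.

```r
# Hypothetical helper: step back `days_before` days from an R release date,
# then keep stepping back until the date falls on a weekday.
pin_date <- function(release, days_before = 7) {
  d <- as.Date(release) - days_before
  # "%u" is the ISO weekday number: 1 = Monday ... 7 = Sunday
  while (as.integer(format(d, "%u")) > 5) {
    d <- d - 1
  }
  d
}

# R 4.0.3 was released on 2020-10-10
pin_date("2020-10-10", days_before = 1)  # "nearest day before" policy
pin_date("2020-10-10", days_before = 7)  # "a week before" policy
```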

@tylfin tylfin commented Feb 26, 2021

> Do you have any stats on the distribution of lag times between CRAN snapshot day and the binary builds for those packages?

I believe we snapshot at 4AM UTC and the builds get kicked off immediately, taking between 1 and 4 hours depending on the number of packages that need to be rebuilt.

> Do you think that would give enough time on balance that we wouldn't pick up a lot of binaries built with the subsequent R version?

I don't see why it wouldn't be a viable solution to pin to an earlier snapshot that has the correct package and R versions, if that works for you.

Member

@cboettig cboettig commented Feb 26, 2021

Thanks Tyler, it's awesome that the full rebuild is that fast. Based on that timeline, though, it seems a bit surprising that the snapshot dated 2020-10-09 would not have finished building coda using R 4.0.2 before R 4.0.3 was released the following day, 2020-10-10, right? Or am I missing something in how this works?

@tylfin tylfin commented Mar 1, 2021

It looks like the pin above is for 344 which I believe corresponds to a snapshot from 2020-10-13. You might want to test with the URL https://packagemanager.rstudio.com/cran/2020-10-09

As an aside, we are planning to simplify the calendar view so that, in the future, it's easier to pick exact dates.

Member

@cboettig cboettig commented Mar 3, 2021

My mistake! That's perfect. Things look good with that URL, and we'll stick with the new date-based format to avoid confusion in the future. Thanks again, really nice.

@ashiklom the image should be rebuilt with the new URL, so will close this out too.
