Alternative to MRAN #773

Closed
betatim opened this issue Sep 4, 2019 · 17 comments

Comments

@betatim
Member

betatim commented Sep 4, 2019

Proposed change

Provide an alternative way of pinning R packages to using MRAN dates. MRAN has been flaky/unavailable several times this week, which means repos that use an MRAN date can't be built.

This issue is for collecting alternatives and their pros and cons, and ideally some information on whether MRAN's current performance is a sign of things to come or just an intermittent problem.

Who would use this feature?

All R users.

Who can do this work?

To work on this you need to understand how CRAN and R packaging work. It would help if you have some understanding or connection to MRAN to judge if the current outages are connected to a policy change.

@trallard
Contributor

trallard commented Sep 4, 2019

In the past, I have successfully used Pacman to pin versions of R packages https://github.com/trinker/pacman

That way the packages and their versions can be declared like so in the install_packages.R file:

pacman::p_install_version(
    c("pacman", "testthat"),
    c("0.2.0", "0.9.1")
)

Pros:

  • It does what you'd expect: it installs the specific version of the package, regardless of whether it lives on CRAN, GitHub, or elsewhere
  • Pacman can also be used to check the dependencies of a given R package (that is, the other R packages it depends on, as opposed to system dependencies), e.g. p_depends(lattice)
  • It does not add additional install time

Cons:

  • Unlike MRAN, it is not a snapshot of the R packages at a point in time; it installs a pinned version of a package (pretty much the same as pip install pandas==0.25).

@sje30

sje30 commented Sep 4, 2019

MRAN seems down right now :-( which seems to be blocking my repo2docker runs

CC: @nuest

Step 40/67 : RUN R --quiet -e "install.packages('devtools', repos='https://mran.microsoft.com/snapshot/2018-02-01', method='libcurl')" && R --quiet -e "devtools::install_github('IRkernel/IRkernel', ref='0.8.11')" && R --quiet -e "IRkernel::installspec(prefix='$NB_PYTHON_PREFIX')"
 ---> Running in 68d9b2021834
> install.packages('devtools', repos='https://mran.microsoft.com/snapshot/2018-02-01', method='libcurl')
Installing package into ‘/srv/rlibs’
(as ‘lib’ is unspecified)
Warning: unable to access index for repository https://mran.microsoft.com/snapshot/2018-02-01/src/contrib:
  cannot open URL 'https://mran.microsoft.com/snapshot/2018-02-01/src/contrib/PACKAGES'

@betatim
Member Author

betatim commented Sep 4, 2019

MRAN has been up and down several times over the last few days :(

Could we keep this issue as a place to only collect ideas for alternatives, with their pros and cons? I think that would help keep us on track and keep an overview of the options. So I'd propose that we neither discuss the options until we have a few, nor use this issue to discuss the current MRAN outage.

@DrAndiLowe

Might be useful: https://moj-analytical-services.github.io/platform_user_guidance/conda-package-management.html

@daroczig

daroczig commented Sep 6, 2019

We were also affected by the MRAN outage, as we use it to pin package versions in our prod Docker builds (which run several times a day). Although we use a caching proxy etc. to get through temporary network issues, the most recent 48-hour downtime was really worrisome, so we decided to look for alternatives at system1.com as well.

On the other hand, MRAN has been pretty stable over the past ~2 years (since we started using it at scale), and MS seems to be taking care of it nowadays as well, so I'm not sure we really need to switch to something else.

Regarding pacman::p_install_version, I think it only pins a minimum version: if that's not met, it installs the most recent version instead of the one you specified, so it's not the best match.

Installing fixed package versions from the archive folder of CRAN works, but that's still a 3rd party dependency, just like MRAN (as files can go away from CRAN).
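
A minimal sketch of the CRAN archive approach, in Python (the language repo2docker is written in). It assumes the usual CRAN layout, where superseded source tarballs live under src/contrib/Archive/<package>/; the function name and the default mirror are illustrative and not part of any existing tool. Note that the current release of a package is not under Archive/ but directly in src/contrib/.

```python
# Assumption: any CRAN mirror with the standard directory layout works here.
CRAN_MIRROR = "https://cran.r-project.org"


def cran_archive_url(package, version, mirror=CRAN_MIRROR):
    """Build the URL of an archived (superseded) source tarball on CRAN.

    Hypothetical helper for illustration only.
    """
    if not package or not version:
        raise ValueError("both package and version are required")
    base = mirror.rstrip("/")
    return f"{base}/src/contrib/Archive/{package}/{package}_{version}.tar.gz"


# Example with one of the pinned versions mentioned earlier in this thread.
print(cran_archive_url("testthat", "0.9.1"))
```

In R, such a URL can be passed to install.packages(url, repos = NULL, type = "source"). Dependencies are not resolved in that case, which is one more difference from a full snapshot like MRAN.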

At system1, we were thinking about starting our own CRAN time-machine implementation. It is pretty easy to set up with daily ZFS snapshots and a webserver (hosting this on S3 or similar would have a huge overhead due to duplicated files), but it needs infra, and I'm not sure who would cover the monthly fees in the long run. We were also thinking about applying for an R Consortium grant to cover these costs, but I'm not sure that's a reasonable idea if MRAN is out there already.

For now, I think we will mirror some MRAN daily snapshots to our own infra via miniCRAN or similar, to make sure we are not in huge trouble if MRAN goes down, but we will keep an eye on this thread for sure. Thanks a lot for the great ideas above, and please let me know if I / we can help with anything (I will be happy to share our experiences later if that would be useful).

@daroczig

daroczig commented Sep 9, 2019

FTR seems like MRAN is also using ZFS in the background:

[image: excerpt from the slides linked below]

From https://www.huber.embl.de/dsc/slides/R_Reproducibility-DSC.pdf

@daroczig

The R Consortium just opened the Call for Proposals for R (infra) projects at https://www.r-consortium.org/blog/2019/09/13/get-funded-by-the-r-consortium-call-for-proposals-open-now

@nuest
Contributor

nuest commented Sep 17, 2019

@daroczig Interesting idea about hosting your own MRAN. Is there a repo/document about your plans? I think with MRAN being used more widely, especially by other platforms such as BinderHub, a few mirrors would not hurt.

@daroczig

daroczig commented Oct 5, 2019

MRAN seems to be down again, with the error "The specified CGI application encountered an error and the server terminated the process." :(

Asking for help at https://twitter.com/daroczig/status/1180483076918521859?s=09

Fortunately, I've got a local mirror created with miniCRAN -- I'm cleaning up the related script and will share here so that others can use it as well.

I'm still not sure where to host a general MRAN mirror, as it seems to require quite some space that I don't have.

@daroczig

daroczig commented Oct 7, 2019

MRAN was fixed 🎉

My script to maintain a local mirror: https://gist.github.com/daroczig/ef858d11b159f390b35fbbf8300b378d

@manics
Member

manics commented Feb 15, 2020

MRAN seems to occasionally miss a snapshot, though other days are working:

This caused the stencila-r build to fail on Travis today:
https://github.com/jupyter/repo2docker/blob/8d490cf9d80f963f3746ab2bd04b9fb183b9bab9/repo2docker/buildpacks/r.py#L138-L140

Step 47/73 : RUN R --quiet -e "install.packages('shiny', repos='https://mran.microsoft.com/snapshot/2020-02-13', method='libcurl')"
 ---> Running in b342adcb6960
> install.packages('shiny', repos='https://mran.microsoft.com/snapshot/2020-02-13', method='libcurl')
Installing package into ‘/srv/rlibs’
(as ‘lib’ is unspecified)
Warning: unable to access index for repository https://mran.microsoft.com/snapshot/2020-02-13/src/contrib:
  cannot open URL 'https://mran.microsoft.com/snapshot/2020-02-13/src/contrib/PACKAGES'
Warning message:
package ‘shiny’ is not available (for R version 3.6.2) 

For this case (no date specified, so the snapshot from 2 days earlier is used), could we automatically try the URLs for the N prior days too?
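
One way to sketch this fallback, in Python since repo2docker is written in Python. The function names, the default of three extra days, and the availability callback are hypothetical and not taken from repo2docker; only the snapshot URL pattern comes from the build log above. A real availability check could request src/contrib/PACKAGES under each URL, which is the file R failed to open in the log.

```python
from datetime import date, timedelta

# URL pattern as seen in the build log above.
MRAN_SNAPSHOT = "https://mran.microsoft.com/snapshot/{:%Y-%m-%d}"


def candidate_snapshot_urls(today, start_offset=2, max_fallback=3):
    """List snapshot URLs to try, newest first.

    Starts `start_offset` days before `today`, then steps back one day
    at a time for up to `max_fallback` additional days.
    """
    urls = []
    for extra in range(max_fallback + 1):
        day = today - timedelta(days=start_offset + extra)
        urls.append(MRAN_SNAPSHOT.format(day))
    return urls


def first_available(urls, is_available):
    """Return the first URL accepted by `is_available`, or None."""
    for url in urls:
        if is_available(url):
            return url
    return None


# Stub check that pretends only the 2020-02-13 snapshot is missing.
urls = candidate_snapshot_urls(date(2020, 2, 15))
print(first_available(urls, lambda u: not u.endswith("2020-02-13")))
```

Keeping the availability check as a callback separates the date arithmetic from network access, so the fallback order can be tested without contacting MRAN.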

Does anyone have an overview of:

  • How often do failures occur due to a single missing recent snapshot?
  • Does the whole MRAN server often go down?
  • Do old snapshots which should exist temporarily go missing (whilst other dates continue working), and if so, for how long?
  • Have things improved since this issue was opened?

@meeseeksmachine

This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/r-packages-in-install-r-fail-to-load-mran-down/3350/2

@betatim
Member Author

betatim commented Feb 18, 2020

I don't have any hard data on how often MRAN goes down. Anecdotally, it goes down "much more often" than, say, the conda package mirrors. I think the last MRAN outage was 3-5 months ago and lasted a few days?

There was discussion at some point about whether MRAN should/could start thinning out the dates for which they have a snapshot. Not sure where that discussion went/ended though.

I think we should consider encouraging people to move to using an environment.yml for their R projects and get their packages via conda-forge.

@DrAndiLowe

There is an issue with encouraging people to move to using an environment.yml for their R projects and get their packages via conda-forge: the repository contains only a fraction of what is available in CRAN and snapshotted by MRAN. Currently, the CRAN package repository features 15403 available packages. R packages in conda-forge have their names prefixed by "r-". There are about 2000 packages in conda-forge that match this pattern, so only about 13%. Several of my favourite R packages are not in conda-forge.
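
As a quick check of that percentage, using only the two figures quoted above (the conda-forge count is approximate, so the result is too):

```python
cran_packages = 15403          # CRAN package count quoted above
conda_forge_r_packages = 2000  # approximate count of "r-" packages quoted above

coverage = conda_forge_r_packages / cran_packages
print(f"{coverage:.1%}")  # prints 13.0%
```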

As an aside, if you're thinking of using an alternative way of installing R packages, it would be especially nice if it also worked with R packages not in CRAN (and therefore not currently snapshotted by MRAN), such as R packages in Bioconductor and Neuroconductor.

@daroczig

daroczig commented Jul 8, 2020

As an alternative to Microsoft's MRAN, there's RStudio's public pkg manager as well now: https://packagemanager.rstudio.com/client/#/repos/1/overview

@yuvipanda
Collaborator

#1104 switches to packagemanager.rstudio.com for newer R dates and versions.

@manics
Member

manics commented Jan 24, 2022

Fixed in #1104!

Note a follow-up issue though: #1116

@manics manics closed this as completed Jan 24, 2022
9 participants