R: Using official cloud URL for CRAN #2956

HenrikBengtsson · 2017-01-28T22:48:05Z

cran.r-project.org runs on a single old-school server in Austria
and could potentially be overloaded if "everyone" used it.

cloud.r-project.org is a cloud-based repository that provides "automatic redirection to servers worldwide [...]", cf. https://cran.r-project.org/mirrors.html.

I assume that cloud.* can be scaled up as needed. Of the official CRAN mirrors, this should be the safest one to pick if a static CRAN mirror is needed.


HenrikBengtsson · 2017-01-28T22:50:56Z

@adamjstewart, I assume you're the author of most of the r-*/package.py files as well. I see that you're using:

$ grep -h -F " url " -r --include="package.py" r-* | sed 's|/src/.*||g' | sort -u
    url      = "https://cran.r-project.org
    url      = "https://cran.rstudio.com

right now. I'd like to suggest that you point these to https://cloud.r-project.org instead.
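Since the download URLs differ only in the hostname, the switch is a plain host substitution. A minimal Python sketch of that rewrite follows; `to_cloud_url` is a hypothetical helper for illustration, not part of Spack. It only touches URLs under `/src/`, so `homepage` entries such as `https://cran.r-project.org/package=...` are left alone.

```python
import re

# Hypothetical helper (not part of Spack): point CRAN download URLs at the
# cloud.r-project.org mirror. Only the two hosts seen in the grep output above
# are recognised.
CRAN_HOSTS = ("cran.r-project.org", "cran.rstudio.com")
CLOUD_HOST = "cloud.r-project.org"

# Group 1: scheme, group 2: old host, group 3: the "/src/" path prefix.
_HOST_RE = re.compile(
    r"^(https?://)(" + "|".join(re.escape(h) for h in CRAN_HOSTS) + r")(/src/)"
)


def to_cloud_url(url: str) -> str:
    """Return url with a known CRAN host replaced by CLOUD_HOST.

    URLs that are not download URLs under /src/ (e.g. homepages)
    are returned unchanged.
    """
    return _HOST_RE.sub(r"\1" + CLOUD_HOST + r"\3", url)
```

For example, `to_cloud_url("https://cran.rstudio.com/src/contrib/Archive/rpart")` gives `https://cloud.r-project.org/src/contrib/Archive/rpart`.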

adamjstewart · 2017-01-29T00:26:02Z

@HenrikBengtsson Haha, I've actually never written an R package before #2952. The reason I show up as the author of all of them is because I wrote an RPackage base class and converted all of the existing packages to remove duplicated code in #2761. @glennpj and @JavierCVilla have actually written most of our R packages.

From what I can tell, it looks like the URLs are identical aside from the hostname, so what you propose would be an easy switch. Want to do it for every package?

HenrikBengtsson · 2017-01-29T00:30:39Z

I see - I guess I picked the wrong R package to find authorship.

Yes, I would update all those CRAN domains to point to cloud.r-project.org. It would also be less confusing for users and for others cut'n'pasting from existing ones.

adamjstewart · 2017-01-29T00:46:23Z

I would probably wait until #2952 is merged and then convert them all at once. I can do it if you want, or you can submit a PR/add to this one.

adamjstewart · 2017-01-29T00:51:51Z

Also, I'm not sure if you have any sway with the R maintainers, but their repository system is a nightmare. I like the uniformity, but Spack generally assumes that all versions of a package will be found in a single directory, and that there won't be any other software in that directory. Since R hosts the latest version of every package in one directory, and older versions in a different directory, we need a list_url for every R package. And fetching is very slow because we need to look through every link on the contrib page to find what we want. Of course, we could change how Spack searches for software to fit R better, but in the meantime, spidering for new versions is very slow. For example, compare the speed of zlib and rpart:

$ time spack versions zlib
==> Safe versions (already checksummed):
  1.2.10  1.2.8
==> Remote versions (not yet checksummed):
  1.2.11   1.2.5.3  1.2.4.1  1.2.3.3  1.2.2    1.2.0.4  1.1.1  1.0.2  0.79
  1.2.9    1.2.5.2  1.2.4    1.2.3.2  1.2.1.2  1.2.0.3  1.1.0  1.0.1  0.71
  1.2.7.3  1.2.5.1  1.2.3.9  1.2.3.1  1.2.1.1  1.2.0.2  1.0.9  0.99   0.9
  1.2.7.2  1.2.5    1.2.3.8  1.2.3    1.2.1    1.2.0.1  1.0.8  0.95   0.8
  1.2.7.1  1.2.4.5  1.2.3.7  1.2.2.4  1.2.0.8  1.2.0    1.0.7  0.94
  1.2.7    1.2.4.4  1.2.3.6  1.2.2.3  1.2.0.7  1.1.4    1.0.6  0.93
  1.2.6.1  1.2.4.3  1.2.3.5  1.2.2.2  1.2.0.6  1.1.3    1.0.5  0.92
  1.2.6    1.2.4.2  1.2.3.4  1.2.2.1  1.2.0.5  1.1.2    1.0.4  0.91

real	0m1.022s
user	0m0.409s
sys	0m0.125s

$ time spack versions r-rpart
==> Safe versions (already checksummed):
  4.1-10
==> Remote versions (not yet checksummed):
  4.1-9  4.1-1   3.1-52  3.1-43  3.1-34  3.1-26  3.1-17  3.1-6  3.0-1  1.0-6
  4.1-8  4.1-0   3.1-51  3.1-42  3.1-33  3.1-24  3.1-16  3.1-5  3.0-0  0.4-2
  4.1-7  4.0-3   3.1-50  3.1-41  3.1-32  3.1-23  3.1-15  3.1-4  2.0-3
  4.1-6  4.0-2   3.1-48  3.1-39  3.1-31  3.1-22  3.1-13  3.1-3  2.0-2
  4.1-5  4.0-1   3.1-47  3.1-38  3.1-30  3.1-21  3.1-12  3.1-2  2.0-1
  4.1-4  3.1-55  3.1-46  3.1-37  3.1-29  3.1-20  3.1-9   3.1-1  1.1-2
  4.1-3  3.1-54  3.1-45  3.1-36  3.1-28  3.1-19  3.1-8   3.1-0  1.1-1
  4.1-2  3.1-53  3.1-44  3.1-35  3.1-27  3.1-18  3.1-7   3.0-2  1.0-7

real	0m9.348s
user	0m3.221s
sys	0m0.178s

If the R maintainers simply added the latest version to the archive directory, we would only need to look in a single directory and there wouldn't be any competing packages to ignore.
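To make the layout issue concrete, here is a minimal Python sketch using a hypothetical `candidate_urls` helper (not Spack code). It follows the url and list_url patterns used in this thread, and assumes that archived tarballs live at `Archive/{name}/{name}_{version}.tar.gz`.

```python
from typing import List

# Hypothetical helper: CRAN keeps the newest release of a package in
# src/contrib/ and superseded releases in src/contrib/Archive/<name>/.
CRAN_CONTRIB = "https://cloud.r-project.org/src/contrib"


def candidate_urls(name: str, version: str) -> List[str]:
    """Return the locations a CRAN tarball may live in, in the order a
    fetcher might try them."""
    tarball = f"{name}_{version}.tar.gz"
    return [
        f"{CRAN_CONTRIB}/{tarball}",                 # only the current release
        f"{CRAN_CONTRIB}/Archive/{name}/{tarball}",  # all older releases
    ]
```

If the latest release were also copied into the archive directory, as proposed above, the second location alone would always suffice.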

tgamblin · 2017-01-29T00:59:59Z

@adamjstewart: activation should get fixed, but I was planning on replacing it with a notion of "environments", kind of like Conda has, where you could have not only Python extensions but any other package activated in the same environment. Ideally that would become a common thing for users to do (like lightweight containers or virtualenv).

tgamblin · 2017-01-29T01:00:56Z

@adamjstewart: can you deduce the list_url from the URL? We could make list_url a property (or something like it) in RPackage.
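One possible answer, sketched under assumptions: because a current-release CRAN download URL already contains the exact CRAN name, the list_url can be recovered by parsing it. `list_url_from_url` is hypothetical (not part of Spack) and assumes URLs of the form `https://<host>/src/contrib/{name}_{version}.tar.gz`; archive URLs are rejected.

```python
import re

# Hypothetical sketch: derive a CRAN list_url from a package's download url.
# CRAN names use letters, digits and periods (no underscores), so the first
# "_" separates the name from the version.
_CRAN_URL_RE = re.compile(
    r"^(?P<base>https?://[^/]+/src/contrib)/"
    r"(?P<name>[A-Za-z0-9.]+)_(?P<version>[^_/]+)\.tar\.gz$"
)


def list_url_from_url(url: str) -> str:
    """Return the Archive directory URL matching a current-release url."""
    m = _CRAN_URL_RE.match(url)
    if m is None:
        raise ValueError(f"not a current-release CRAN tarball URL: {url}")
    return f"{m.group('base')}/Archive/{m.group('name')}"
```

Taking the name from the URL keeps its original capitalization and periods (e.g. `data.table`), which the lowercased Spack package name does not preserve.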

adamjstewart · 2017-01-29T01:14:05Z

@tgamblin For CRAN, every package looks like:

homepage = "https://cran.r-project.org/package={name}"
url      = "https://cloud.r-project.org/src/contrib/{name}_{version}.tar.gz"
list_url = "https://cloud.r-project.org/src/contrib/Archive/{name}"

So technically we could put all of these in RPackage. The only problem is that {name} can contain capital letters or periods, which wouldn't be in self.name. We would need a cran_name attribute or something, similar to the pypi attribute I proposed for Python. Luckily, the R/CRAN ecosystem seems much more consistent than Python/PyPI.

tgamblin · 2017-01-29T01:22:35Z

@adamjstewart: Cool. If we can do class properties as described here, I think you could have the derived package define cran_name and have the other three defined in terms of it in RPackage.
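A minimal Python sketch of that idea, assuming a small hand-written `classproperty` descriptor (Python has no built-in one, and stacking `classmethod` with `property` is deprecated since 3.11). `RPackageSketch` and `RDataTable` are hypothetical stand-ins, not Spack's actual classes. Because a concrete download URL needs a version, it is shown here as a `url_for_version` classmethod rather than a fixed attribute.

```python
class classproperty:
    """Minimal read-only property evaluated on the class (hypothetical helper)."""

    def __init__(self, fget):
        self.fget = fget

    def __get__(self, instance, owner):
        # Works for both Class.attr and instance.attr access.
        return self.fget(owner)


class RPackageSketch:
    """Hypothetical stand-in for an RPackage base class."""

    cran_name = None  # each derived package sets its exact CRAN name

    @classproperty
    def homepage(cls):
        return f"https://cran.r-project.org/package={cls.cran_name}"

    @classproperty
    def list_url(cls):
        return f"https://cloud.r-project.org/src/contrib/Archive/{cls.cran_name}"

    @classmethod
    def url_for_version(cls, version):
        return f"https://cloud.r-project.org/src/contrib/{cls.cran_name}_{version}.tar.gz"


class RDataTable(RPackageSketch):
    cran_name = "data.table"  # capital letters and periods are kept as-is
```

With this layout a derived package declares only `cran_name`; `RDataTable.list_url` evaluates to `https://cloud.r-project.org/src/contrib/Archive/data.table`.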

HenrikBengtsson · 2017-01-29T18:07:30Z

@adamjstewart, trying to get CRAN, which is run by a small group of volunteers with a large load, to change their directory structure for us is probably not worth it.

Instead of scraping the CRAN servers, I think the METACRAN API (mentioned in #2951 (comment)) could be a much faster alternative. It should contain all the information you need.
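A hedged sketch of what querying METACRAN could look like. The endpoint `https://crandb.r-pkg.org/<name>/all` and its response shape (a JSON object whose `versions` key maps each version string to metadata) are assumptions to verify against the METACRAN documentation; the function names are hypothetical. The point is one HTTP request per package instead of spidering directory listings.

```python
import json
import urllib.request

# Assumption: METACRAN's crandb answers GET <CRANDB>/<name>/all with a JSON
# object whose "versions" key maps version strings to package metadata.
CRANDB = "https://crandb.r-pkg.org"


def versions_from_crandb_record(record: dict) -> list:
    """Extract version strings from an (assumed) crandb '/all' record.

    Sorted as plain strings, NOT in version order ("4.1-10" < "4.1-9").
    """
    return sorted(record.get("versions", {}))


def fetch_versions(name: str, timeout: float = 10.0) -> list:
    """Fetch all known versions of a CRAN package with a single request."""
    with urllib.request.urlopen(f"{CRANDB}/{name}/all", timeout=timeout) as resp:
        record = json.load(resp)
    return versions_from_crandb_record(record)
```

Parsing is kept separate from fetching so the response handling can be checked offline against a saved record.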

PS. For clarification, and as you'll hear once in a while in the R forums: CRAN (Comprehensive R Archive Network) != R. CRAN is the de facto standard online repository (and gatekeeper / FOSS protector, licenses, ...) for contributed R packages. The R core team develops and maintains R and distributes it via CRAN. They try to keep the two separate, but there are a few hard-coded ties between them, e.g. the R code knows about CRAN. It's good to know this if you reach out on the R forums to ask for help or want to discuss, say, CRAN.

adamjstewart · 2017-01-30T13:46:28Z

@tgamblin I think I want to wait until you get a chance to rework FetchStrategies before I work on cran and pypi.


tgamblin approved these changes Jan 31, 2017


tgamblin merged commit 4f297f4 into spack:develop Jan 31, 2017

adamjstewart mentioned this pull request May 16, 2017

Added R 3.4.0 #4260

Closed
