R: Using official cloud URL for CRAN #2956

Merged
merged 1 commit into from
Jan 31, 2017


HenrikBengtsson
Contributor

cran.r-project.org runs on a single old-school server in Austria
and could potentially be overloaded if "everyone" used it.

cloud.r-project.org is a cloud-based repository that provides "automatic redirection to servers worldwide [...]", cf. https://cran.r-project.org/mirrors.html.

I assume that cloud.* can be scaled up as needed. Of the official CRAN mirrors, this should be the safest one to pick if a static CRAN mirror is needed.

@HenrikBengtsson
Contributor Author

@adamjstewart, I assume you're the author of most of the r-*/package.py as well. I see that you're using:

$ grep -h -F " url " -r --include="package.py" r-* | sed 's|/src/.*||g' | sort -u
    url      = "https://cran.r-project.org
    url      = "https://cran.rstudio.com

right now. I'd like to suggest that you point these to https://cloud.r-project.org instead.

@adamjstewart
Member

@HenrikBengtsson Haha, I've actually never written an R package before #2952. The reason I show up as the author of all of them is because I wrote an RPackage base class and converted all of the existing packages to remove duplicated code in #2761. @glennpj and @JavierCVilla have actually written most of our R packages.

From what I can tell, it looks like the URLs are identical aside from the hostname, so what you propose would be an easy switch. Want to do it for every package?
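
Since only the hostname differs, the switch could be scripted. Below is a minimal sketch, assuming the two hostnames reported by the grep above are the only ones in use. The helper name `to_cloud_url` is hypothetical and not part of Spack. It only touches download URLs under `/src/`, so homepage links are left as they are.

```python
import re

# Hostnames reported by the grep earlier in this thread.
OLD_HOSTS = ("cran.r-project.org", "cran.rstudio.com")
NEW_HOST = "cloud.r-project.org"

# Match only download URLs (those followed by /src/), so that
# homepage links on cran.r-project.org are not rewritten.
_PATTERN = re.compile(
    r"https?://(?:%s)(?=/src/)" % "|".join(re.escape(h) for h in OLD_HOSTS)
)


def to_cloud_url(text):
    """Return `text` with CRAN download hostnames pointed at the cloud mirror.

    Hypothetical helper for illustration; not part of Spack.
    """
    return _PATTERN.sub("https://" + NEW_HOST, text)
```

Applied to the contents of each r-*/package.py, this would leave everything but the matching hostnames unchanged.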

@HenrikBengtsson
Contributor Author

I see - I guess I picked the wrong R package to find authorship.

Yes, I would update all those CRAN domains to point to cloud.r-project.org. It would also be less confusing to users and others cut'n'pasting from existing ones.

@adamjstewart
Member

I would probably wait until #2952 is merged and then convert them all at once. I can do it if you want, or you can submit a PR/add to this one.

@adamjstewart
Member

Also, I'm not sure if you have any sway with the R maintainers, but their repository system is a nightmare. I like the uniformity, but Spack generally assumes that all versions of a package will be found in a single directory, and that there won't be any other software in that directory. Since R hosts the latest version of every package in one directory, and older versions in a different directory, we need a list_url for every R package. Fetching is also very slow because we need to look through every link on the contrib page to find what we want. Of course, we could change how Spack searches for software to fit R better, but in the meantime, spidering for new versions is very slow. For example, compare the speed of zlib and rpart:

$ time spack versions zlib
==> Safe versions (already checksummed):
  1.2.10  1.2.8
==> Remote versions (not yet checksummed):
  1.2.11   1.2.5.3  1.2.4.1  1.2.3.3  1.2.2    1.2.0.4  1.1.1  1.0.2  0.79
  1.2.9    1.2.5.2  1.2.4    1.2.3.2  1.2.1.2  1.2.0.3  1.1.0  1.0.1  0.71
  1.2.7.3  1.2.5.1  1.2.3.9  1.2.3.1  1.2.1.1  1.2.0.2  1.0.9  0.99   0.9
  1.2.7.2  1.2.5    1.2.3.8  1.2.3    1.2.1    1.2.0.1  1.0.8  0.95   0.8
  1.2.7.1  1.2.4.5  1.2.3.7  1.2.2.4  1.2.0.8  1.2.0    1.0.7  0.94
  1.2.7    1.2.4.4  1.2.3.6  1.2.2.3  1.2.0.7  1.1.4    1.0.6  0.93
  1.2.6.1  1.2.4.3  1.2.3.5  1.2.2.2  1.2.0.6  1.1.3    1.0.5  0.92
  1.2.6    1.2.4.2  1.2.3.4  1.2.2.1  1.2.0.5  1.1.2    1.0.4  0.91

real	0m1.022s
user	0m0.409s
sys	0m0.125s
$ time spack versions r-rpart
==> Safe versions (already checksummed):
  4.1-10
==> Remote versions (not yet checksummed):
  4.1-9  4.1-1   3.1-52  3.1-43  3.1-34  3.1-26  3.1-17  3.1-6  3.0-1  1.0-6
  4.1-8  4.1-0   3.1-51  3.1-42  3.1-33  3.1-24  3.1-16  3.1-5  3.0-0  0.4-2
  4.1-7  4.0-3   3.1-50  3.1-41  3.1-32  3.1-23  3.1-15  3.1-4  2.0-3
  4.1-6  4.0-2   3.1-48  3.1-39  3.1-31  3.1-22  3.1-13  3.1-3  2.0-2
  4.1-5  4.0-1   3.1-47  3.1-38  3.1-30  3.1-21  3.1-12  3.1-2  2.0-1
  4.1-4  3.1-55  3.1-46  3.1-37  3.1-29  3.1-20  3.1-9   3.1-1  1.1-2
  4.1-3  3.1-54  3.1-45  3.1-36  3.1-28  3.1-19  3.1-8   3.1-0  1.1-1
  4.1-2  3.1-53  3.1-44  3.1-35  3.1-27  3.1-18  3.1-7   3.0-2  1.0-7

real	0m9.348s
user	0m3.221s
sys	0m0.178s

If the R maintainers simply added the latest version to the archive directory, we would only need to look in a single directory and there wouldn't be any competing packages to ignore.
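
To make the layout concrete, here is a small sketch of the two places a given version can live. It assumes the src/contrib and src/contrib/Archive layout described above. The function name `candidate_urls` is hypothetical, and this is not how Spack's fetcher is implemented.

```python
CRAN_BASE = "https://cloud.r-project.org/src/contrib"


def candidate_urls(name, version):
    """Return the two places a CRAN source tarball can live, latest first.

    Hypothetical helper for illustration; not part of Spack.
    """
    tarball = "{0}_{1}.tar.gz".format(name, version)
    return [
        # The current release sits directly in src/contrib, next to
        # the current release of every other package.
        "{0}/{1}".format(CRAN_BASE, tarball),
        # Older releases sit in a per-package Archive directory.
        "{0}/Archive/{1}/{2}".format(CRAN_BASE, name, tarball),
    ]
```

A tool that does not know whether a version is the current one has to consider both locations, which is why a single url is not enough.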

@tgamblin
Member

@adamjstewart: activation should get fixed but I was planning on replacing it with a notion of "environments", kind of like Conda has, where you could have not only Python extensions but any other package activated in the same environment. Ideally that would become a common thing for users to do (like lightweight containers or virtualenv).

@tgamblin
Member

@adamjstewart: can you deduce the list_url from the URL? We could make list_url a property (or something like it) in RPackage.

@adamjstewart
Member

@tgamblin For CRAN, every package looks like:

homepage = "https://cran.r-project.org/package={name}"
url      = "https://cloud.r-project.org/src/contrib/{name}_{version}.tar.gz"
list_url = "https://cloud.r-project.org/src/contrib/Archive/{name}"

So technically we could put all of these in RPackage. The only problem is that {name} can contain capital letters or periods, which wouldn't be in self.name. We would need a cran_name attribute or something, similar to the pypi attribute I proposed for Python. Luckily the R/CRAN ecosystem seems much more consistent than Python/PyPI.

@tgamblin
Member

@adamjstewart: Cool. If we can do class properties as described here, I think you could have the derived package define cran_name and have the other three defined in terms of it in RPackage.
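
One way this idea might look, as a sketch only: a small descriptor gives class-level properties, and a stand-in base class derives the URLs from cran_name. The names `classproperty`, `RPackageSketch`, `url_for`, and `RDataTable` are all hypothetical. This is not Spack's RPackage, and the version number in the usage note is illustrative.

```python
class classproperty(object):
    """Descriptor that computes an attribute from the class itself."""

    def __init__(self, getter):
        self.getter = getter

    def __get__(self, instance, owner):
        # `owner` is the class the attribute was looked up on, so a
        # derived class sees values built from its own cran_name.
        return self.getter(owner)


class RPackageSketch(object):
    """Stand-in for an R package base class; not Spack's RPackage."""

    # Derived classes set this to the exact CRAN name, which may
    # contain capital letters or periods.
    cran_name = None

    @classproperty
    def homepage(cls):
        return "https://cran.r-project.org/package={0}".format(cls.cran_name)

    @classproperty
    def list_url(cls):
        return "https://cloud.r-project.org/src/contrib/Archive/{0}".format(
            cls.cran_name
        )

    @classmethod
    def url_for(cls, version):
        # Hypothetical helper: fills in the url template for one version.
        return "https://cloud.r-project.org/src/contrib/{0}_{1}.tar.gz".format(
            cls.cran_name, version
        )


class RDataTable(RPackageSketch):
    cran_name = "data.table"
```

With this, `RDataTable.list_url` and `RDataTable.url_for("1.10.0")` are available on the class without creating an instance, and the derived package only has to state its CRAN name.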

@HenrikBengtsson
Contributor Author

@adamjstewart, trying to get CRAN, which is run by a small group of volunteers with a large load, to change the directory structure for us is probably not worth it.

Instead of scraping the CRAN servers, I think the METACRAN API (mentioned in #2951 (comment)) could be a much faster alternative. It should contain all the information you need.

PS. For clarification, as you'll hear once in a while in the R forums: CRAN (Comprehensive R Archive Network) != R. CRAN is the de facto standard online repository (and gatekeeper / FOSS protector, licenses, ...) for contributed R packages. The R core team develops and maintains R and distributes it via CRAN. They try to keep these two separated, but there are a few hard-coded ties between the two, e.g. the R code knows about CRAN. It's good to know about this if you reach out on the R forums and ask for help or wanna discuss, say, CRAN.

@adamjstewart
Member

@tgamblin I think I want to wait until you get a chance to rework FetchStrategies before I work on cran and pypi.

@tgamblin tgamblin merged commit 4f297f4 into spack:develop Jan 31, 2017
@adamjstewart adamjstewart mentioned this pull request May 16, 2017
diaena pushed a commit to diaena/spack that referenced this pull request May 26, 2017
amklinv pushed a commit that referenced this pull request Jul 17, 2017
healther pushed a commit to electronicvisions/spack that referenced this pull request Jul 26, 2017