---
title: About URLs in DESCRIPTION
date: '2019-12-10'
slug: urls
tags:
- description
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(
  echo = TRUE,
  cache = TRUE,
  warning = FALSE
)
```
Among the usual DESCRIPTION fields is the free-text `URL` field, where package authors can store various links: to the development website, docs, upstream tool, etc. In this post, we shall explain why storing URLs in DESCRIPTION is important, where else you should add URLs, and what kind of URLs are stored in CRAN packages these days.
## Why put URLs in DESCRIPTION?
In the following we'll assume your package has some sort of online development repository ([GitHub](https://happygitwithr.com/big-picture.html)? [GitLab](https://gitlab.com/HeidiSeibold/setup-git-rstudio-gitlab#setup-git-rstudio-gitlab)? R-Forge?) and a documentation website (handily created via [pkgdown](https://pkgdown.r-lib.org/)?). Adding URLs to your package's online homes is extremely useful for several reasons.
> As a side note: Yes, you can store several URLs under URL, even if the field name is singular. [See for instance `rhub`'s DESCRIPTION](https://github.com/r-hub/rhub/blob/c51e0704ae7011536757f151144415323f4d77b9/DESCRIPTION#L15) :link: :link:
```
URL: https://github.com/r-hub/rhub, https://r-hub.github.io/rhub/
```
Why put URLs in DESCRIPTION?
* It will help your users find your package's pretty documentation from the CRAN page, instead of just the less pretty PDF manual.
* Likewise, from the CRAN page your contributors can directly find where to submit patches.
* If your package has a package-level man page, and it should (e.g. as drafted by [`usethis::use_package_doc()`](https://usethis.r-lib.org/reference/use_package_doc.html) and then generated by [`roxygen2`](https://roxygen2.r-lib.org/articles/rd.html#packages)), then after typing say `library("rhub")` and then `?rhub`, your users will find the useful links.
* Other tools such as [`helpdesk`](https://github.com/yonicd/helpdesk) and the [`pkgsearch` RStudio addin](https://r-hub.github.io/pkgsearch/reference/index.html#section-rstudio-addin) can help surface the URLs you store in DESCRIPTION.
* Indirectly, having a link to the docs website and development repo will increase their page rank, [see useful comments in this discussion](https://community.rstudio.com/t/pkgdown-site-seo/26706), so potential users and contributors find them more easily by simply searching for your package.
* __edit after [Hugo Gruson's comment](https://github.com/r-hub/blog/issues/47#issuecomment-564065538)__ *"It's also worth noting that these URLs are used by `pkgdown`:*
    - *the GitHub URL is used to automatically find out the repo containing the source code, and display a handy GitHub icon which links to the repo on the right of the top navbar (with the default theme).*
    - *the URL to the `pkgdown` website is used to crosslink to this site from other `pkgdown` websites, as [explained in this vignette](https://pkgdown.r-lib.org/articles/linking.html#across-packages), creating a decentralized mesh for documentation, instead of relying on a centralized entity such as http://rdrr.io/."*
* __edit after [Jim Hester's tweet](https://twitter.com/jimhester_/status/1204411373708611584)__ *"Another reason for URLs in DESCRIPTION, [`remotes::install_dev()`](https://remotes.r-lib.org/reference/install_dev.html) uses them to find the dev repo!"*
> Quick tip, you can add GitHub URLs (URL and BugReports) to DESCRIPTION by running [`usethis::use_github_links()`](https://usethis.r-lib.org/reference/use_github_links.html). :rocket:
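Tools that surface these links read them straight from DESCRIPTION. As a minimal sketch of that idea, the snippet below writes a made-up DESCRIPTION to a temporary file (the package name `mypkg` and the `BugReports` value are assumptions for illustration; the `URL` line is the one from `rhub` shown above), reads it with base R's `read.dcf()`, and splits the free-text `URL` field into individual links. Real tools may parse the field differently.

```r
# Hypothetical DESCRIPTION content, written to a temporary file for illustration.
desc_lines <- c(
  "Package: mypkg",
  "Version: 0.1.0",
  "URL: https://github.com/r-hub/rhub, https://r-hub.github.io/rhub/",
  "BugReports: https://github.com/r-hub/rhub/issues"
)
desc_file <- tempfile()
writeLines(desc_lines, desc_file)

# read.dcf() returns a character matrix with one column per requested field.
fields <- read.dcf(desc_file, fields = c("URL", "BugReports"))

# The URL field is free text: split it on commas and whitespace.
urls <- strsplit(fields[1, "URL"], "[,[:space:]]+")[[1]]
urls
```

For an installed package, `utils::packageDescription("rhub", fields = "URL")` gives the same raw string without touching the file yourself.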
## Where else should you put your URLs?
For the same reasons as previously, you should make the most of all places that can store your package's URL(s). Have you put your package's docs URL
* in [the pkgdown config file](https://pkgdown.r-lib.org/reference/build_site.html#yaml-config), if that's how you built it?
* in the [GitHub repo website field](https://stackoverflow.com/questions/7757751/how-do-you-change-a-repository-description-on-github) (you need admin rights), or the equivalent for your development platform, [e.g. GitLab](https://docs.gitlab.com/ee/user/project/settings/)?
Have you used any of your package's URLs
* [In your public message about your package, e.g. as an answer to someone's question](https://community.rstudio.com/t/pkgdown-site-seo/26706/6)?
* In the slides of [your talk about the package](https://www.tidyverse.org/blog/2018/07/carpe-talk/)?
Don't miss any opportunity to point users and contributors in the right direction!
## What URLs do people use in DESCRIPTION files of CRAN packages?
In the following, we shall parse the URL field of the CRAN packages database.
```{r urls1}
db <- tools::CRAN_package_db()
db <- tibble::as_tibble(db[, c("Package", "URL")])
db <- dplyr::distinct(db)
```
There are `r nrow(db)` packages on CRAN at the time of writing, among which `r sum(!is.na(db$URL))` have something written in the URL field. We can parse this data.
```{r urls2}
# Keep only the packages that filled the URL field
db <- db[!is.na(db$URL), ]
library("magrittr")

# function from https://github.com/r-hub/pkgsearch/blob/26c4cc24b9296135b6238adc7631bc5250509486/R/addin.R#L490-L496
url_regex <- function() "(https?://[^\\s,;>]+)"

# Extract the unique http(s) URLs from one string, wrapped in a list
find_urls <- function(txt) {
  mch <- gregexpr(url_regex(), txt, perl = TRUE)
  res <- regmatches(txt, mch)[[1]]
  if (length(res) == 0) {
    return(list(NULL))
  } else {
    list(unique(res))
  }
}

# One row per package and URL, with the URL split into its parts
db %>%
  dplyr::group_by(Package) %>%
  dplyr::mutate(actual_url = find_urls(URL)) %>%
  dplyr::ungroup() %>%
  tidyr::unnest(actual_url) %>%
  dplyr::group_by(Package, actual_url) %>%
  dplyr::mutate(url_parts = list(urltools::url_parse(actual_url))) %>%
  dplyr::ungroup() %>%
  tidyr::unnest(url_parts) %>%
  dplyr::mutate(scheme = trimws(scheme)) -> parsed_db
```
There are `r length(unique(parsed_db$Package))` packages with at least one valid URL.
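To get a feel for what counts as a URL here, the sketch below runs the same regular expression on a few small inputs. The helper is repeated (in a slightly condensed form) so that the snippet runs on its own. Only the first input comes from a real DESCRIPTION; the other two are made up for illustration.

```r
# Same regex and logic as the helper in the chunk above.
url_regex <- function() "(https?://[^\\s,;>]+)"
find_urls <- function(txt) {
  mch <- gregexpr(url_regex(), txt, perl = TRUE)
  res <- regmatches(txt, mch)[[1]]
  if (length(res) == 0) list(NULL) else list(unique(res))
}

# Two URLs separated by a comma, as in rhub's DESCRIPTION.
two <- find_urls("https://github.com/r-hub/rhub, https://r-hub.github.io/rhub/")

# A URL wrapped in angle brackets (made-up input): the ">" is not captured.
wrapped <- find_urls("See <https://example.org/docs> for details")

# Free text without any http(s) link (made-up input) gives list(NULL).
none <- find_urls("The package website")
```

Each URL stops at the first whitespace, comma, semicolon or `>`, so separators and closing brackets are not captured. Entries without a match are `NULL`, which `tidyr::unnest()` drops by default, and this is why fewer packages remain after parsing.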
Which packages have the most links?
```{r urlsno}
mostlinks <- dplyr::count(parsed_db, Package, sort = TRUE)
mostlinks
```
The package with the most links in `URL` is `r gluedown::md_link(mostlinks$Package[1], glue::glue("https://CRAN.R-project.org/package={mostlinks$Package[1]}"))`.
What is the most popular scheme, [http or https](https://howhttps.works/)?
```{r scheme}
dplyr::count(parsed_db, scheme, sort = TRUE)
```
A bit less than one third of the links use http.
Can we identify popular domains?
```{r domain}
dplyr::count(parsed_db, domain, sort = TRUE)
```
GitHub seems to be the most popular development platform, at least in this sample of CRAN packages that indicate a URL. It is also possible that some developers set up their own GitLab server under their own domain.
Many packages link to `www.r-project.org` which is not very informative, or to their own CRAN page which can be informative.
Other relatively popular domains are sites.google.com and arxiv.org. There are probably links to venues for scientific publications other than arxiv.org. What about doi.org?
```{r doi}
dplyr::filter(parsed_db, domain %in% c("doi.org", "dx.doi.org")) %>%
dplyr::select(Package, actual_url)
```
The ["earlier but no longer preferred" dx.doi.org](https://www.doi.org/factsheets/DOIProxy.html) is still in use.
The [rOpenSci docs server](https://ropensci.org/technotes/2019/06/07/ropensci-docs/) also makes an appearance.
> Note that you could do a similar analysis of the BugReports field. We'll leave that as an exercise to the reader. :wink:
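If you want a starting point for that exercise, here is a minimal base R sketch on a toy data frame. The package names and URLs are made up. We assume the real data would come from the `BugReports` column of `tools::CRAN_package_db()`, and that a simple regular expression is good enough to pull out the domain (the analysis above uses `urltools::url_parse()` instead).

```r
# Toy stand-in for tools::CRAN_package_db()[, c("Package", "BugReports")];
# the package names and URLs below are made up for illustration.
toy_db <- data.frame(
  Package = c("pkgA", "pkgB", "pkgC", "pkgD"),
  BugReports = c(
    "https://github.com/someone/pkgA/issues",
    "https://gitlab.com/someone/pkgB/issues",
    NA,
    "https://github.com/someone/pkgD/issues"
  ),
  stringsAsFactors = FALSE
)

# Keep the packages that filled the field, then pull out the host of each URL.
bug_urls <- toy_db$BugReports[!is.na(toy_db$BugReports)]
bug_domains <- sub("^https?://([^/]+).*$", "\\1", bug_urls)

# Count domains, most frequent first.
domain_counts <- sort(table(bug_domains), decreasing = TRUE)
domain_counts
```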
## Conclusion
In this note, we explained why having URLs in your package's DESCRIPTION can help users and contributors find the right venues for their needs, and we had a look at URLs currently stored in the DESCRIPTIONs of CRAN packages, in particular discussing current popular domains. How do _you_ ensure the users of your package can find its best online home(s)? How do you look for the online home(s) of the packages you use?