Suggestion: Autolink urls on man generated for DESCRIPTION. #1265

Closed
dieghernan opened this issue Oct 29, 2021 · 7 comments · Fixed by #1315
Labels
feature a feature request or enhancement rd ✍️

Comments

@dieghernan
Contributor

dieghernan commented Oct 29, 2021

Hi,

I generate the package-level Rd file of my packages with the following:

#' @keywords internal
"_PACKAGE"

This works fine. However, the text in the Description field of my DESCRIPTION file is not autolinked, as it is in the rest of my documentation (maybe because it is not parsed as .md?).

Would you be open to exploring this? I prepared a reprex showing how the same text is parsed differently depending on whether it is placed in a regular .R file or in the DESCRIPTION file:

desc_text <- paste(
  "Tools to extract information from the Intergovernmental Organizations",
  "('IGO') Database , version 3, provided by the Correlates of War Project",
  "<https://correlatesofwar.org/>. See also Pevehouse, J. C. et al. (2020), ",
  " <doi:10.1177/0022343319881175>. Version 3 includes information from ",
  " 1815 to 2014."
)

text_fun <- paste0(
  "#' igoR: Intergovernmental Organizations Database
        #'
        #' @description ",
  desc_text,
  "\n#' @md\nfoo <- function() {}"
)



out <- roxygen2::roc_proc_text(
  roxygen2::rd_roclet(),
  text_fun
)[[1]]

# Autolinking on urls and dois
out
#> % Generated by roxygen2: do not edit by hand
#> % Please edit documentation in ./<text>
#> \name{foo}
#> \alias{foo}
#> \title{igoR: Intergovernmental Organizations Database}
#> \usage{
#> foo()
#> }
#> \description{
#> Tools to extract information from the Intergovernmental Organizations ('IGO') Database , version 3, provided by the Correlates of War Project \url{https://correlatesofwar.org/}. See also Pevehouse, J. C. et al. (2020),   \url{doi:10.1177/0022343319881175}. Version 3 includes information from   1815 to 2014.
#> }

# url and doi as a link \url{ } ;)
# Now create a package and use "_PACKAGE" for documenting

temp_pkg <- file.path(tempdir(), "test")
usethis::create_package(temp_pkg, open = FALSE)
#> v Creating 'C:/Users/XXXX/AppData/Local/Temp/RtmpOUqitX/test/'
#> v Setting active project to 'C:/Users/XXXX/AppData/Local/Temp/RtmpOUqitX/test'
#> v Creating 'R/'
#> v Writing 'DESCRIPTION'
#> Package: test
#> Title: What the Package Does (One Line, Title Case)
#> Version: 0.0.0.9000
#> Authors@R (parsed):
#>     * First Last <first.last@example.com> [aut, cre] (YOUR-ORCID-ID)
#> Description: What the package does (one paragraph).
#> License: `use_mit_license()`, `use_gpl3_license()` or friends to pick a
#>     license
#> Encoding: UTF-8
#> Roxygen: list(markdown = TRUE)
#> RoxygenNote: 7.1.2
#> v Writing 'NAMESPACE'
#> v Setting active project to '<no active project>'

desc_text <- paste(
  "Tools to extract information from the Intergovernmental Organizations",
  "('IGO') Database , version 3, provided by the Correlates of War Project",
  "<https://correlatesofwar.org/>. See also Pevehouse, J. C. et al. (2020), ",
  " <doi:10.1177/0022343319881175>. Version 3 includes information from ",
  " 1815 to 2014."
)

desc::desc_set(
  Description = desc_text,
  file = file.path(tempdir(), "test", "DESCRIPTION")
)
#> Package: test
#> Title: What the Package Does (One Line, Title Case)
#> Version: 0.0.0.9000
#> Authors@R (parsed):
#>     * First Last <first.last@example.com> [aut, cre] (YOUR-ORCID-ID)
#> Description: Tools to extract information from the Intergovernmental
#>     Organizations ('IGO') Database , version 3, provided by the Correlates
#>     of War Project <https://correlatesofwar.org/>. See also Pevehouse, J.
#>     C. et al. (2020), <doi:10.1177/0022343319881175>. Version 3 includes
#>     information from 1815 to 2014.
#> License: `use_mit_license()`, `use_gpl3_license()` or friends to pick a
#>     license
#> Encoding: UTF-8
#> Roxygen: list(markdown = TRUE)
#> RoxygenNote: 7.1.2


source <- "
  #' @keywords internal
  \"_PACKAGE\""


write(source, file.path(temp_pkg, "R", "test-package.R"))


roxygen2::roxygenise(temp_pkg)
#> i Loading test
#> Writing test-package.Rd

readLines(con = file.path(temp_pkg, "man", "test-package.Rd"))[8:10]
#> [1] "\\description{"                                                                                                                                                                                                                                                                                          
#> [2] "Tools to extract information from the Intergovernmental Organizations ('IGO') Database , version 3, provided by the Correlates of War Project <https://correlatesofwar.org/>. See also Pevehouse, J. C. et al. (2020), <doi:10.1177/0022343319881175>. Version 3 includes information from 1815 to 2014."
#> [3] "}"

# No links :(

Created on 2021-10-29 by the reprex package (v2.0.1)

@gaborcsardi
Member

Indeed it is not parsed as md, because it is not supposed to be md. IDK if there is a good solution here.

FWIW one workaround is to avoid using "_PACKAGE" and instead use @docType package and create the page of the package manually.
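
A minimal sketch of that workaround, assuming markdown is enabled (`Roxygen: list(markdown = TRUE)` or `@md`) and reusing the igoR text from the reprex above; the file name and topic name are illustrative. Because the description is then ordinary roxygen text, the `<...>` tags should be turned into `\url{}` links just as in the first reprex output. The trade-off is that the text has to be kept in sync with DESCRIPTION by hand.

```r
# R/igoR-package.R (illustrative file name)
# The description is written out by hand instead of being pulled from
# DESCRIPTION, so it goes through the usual roxygen markdown processing.

#' igoR: Intergovernmental Organizations Database
#'
#' Tools to extract information from the Intergovernmental Organizations
#' ('IGO') Database, version 3, provided by the Correlates of War Project
#' <https://correlatesofwar.org/>. See also Pevehouse, J. C. et al. (2020),
#' <doi:10.1177/0022343319881175>.
#'
#' @docType package
#' @name igoR
#' @keywords internal
#' @md
NULL
```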

@Bisaloo
Contributor

Bisaloo commented Mar 29, 2022

I understand the choice of not parsing as md but I think it would make sense to convert URLs and <doi:...> or <arxiv:...> tags since those are supported (and encouraged) by CRAN.

@hadley
Member

hadley commented Mar 29, 2022

This would require some extra manipulation in object_defaults.package(). Do we have a list of these special urls? Neither is mentioned in Writing R extensions.

hadley added the feature (a feature request or enhancement) and rd ✍️ labels on Mar 29, 2022
@dieghernan
Contributor Author

dieghernan commented Mar 29, 2022

Hi, as per the Checklist for CRAN submissions, I think those special URLs are just <doi:...> and <arXiv:...>, aside from regular URLs.

@Bisaloo
Copy link
Contributor

Bisaloo commented Mar 29, 2022

Yes, it seems to be all. Maëlle identified the source for this feature and I cannot see anything else: ropensci/roweb3#56 (comment)
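
To make the proposal concrete, here is a rough sketch of the kind of conversion being discussed. It is not roxygen2's implementation: `link_description()` is a hypothetical helper, and the choice of Rd markup (`\url{}`, `\doi{}`, and an `\href{}` to arxiv.org) is an assumption for illustration. It does not escape Rd special characters such as `%`.

```r
# Hypothetical helper (not part of roxygen2): convert CRAN-style <...> tags
# found in a Description string into Rd markup.
link_description <- function(x) {
  # <http://...> and <https://...> become \url{...}
  x <- gsub("<(https?://[^>[:space:]]+)>", "\\\\url{\\1}", x, perl = TRUE)
  # <doi:...> (any letter case) becomes \doi{...}
  x <- gsub("<doi:([^>[:space:]]+)>", "\\\\doi{\\1}", x,
            ignore.case = TRUE, perl = TRUE)
  # <arXiv:...> (any letter case) becomes a link to the arXiv abstract page
  x <- gsub("<arxiv:([^>[:space:]]+)>",
            "\\\\href{https://arxiv.org/abs/\\1}{arXiv:\\1}", x,
            ignore.case = TRUE, perl = TRUE)
  x
}

desc_text <- paste(
  "provided by the Correlates of War Project <https://correlatesofwar.org/>.",
  "See also Pevehouse, J. C. et al. (2020), <doi:10.1177/0022343319881175>."
)
cat(link_description(desc_text), "\n")
```

Matching is case-insensitive for the `doi` and `arXiv` prefixes on purpose, since package authors do not write them consistently.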

@dieghernan
Contributor Author

So out of curiosity, I ran a small analysis of the Description field of the DESCRIPTION files of all CRAN packages (based on this StackOverflow question). I don't want to overload the issue, so here is a quick summary:

  1. 8,148 CRAN packages (out of 19,073 as of 2022-03-30, i.e. 42.72%) have a string in the Description field that matches the pattern <text__numbers_and_symbols>. I used the regex "<(\\S*?)>", which still returns some false positives, but I went along with it.
  2. There are 13,531 strings in total that match the pattern. The package lactcurves alone has 45 of them!
  3. I tried to identify the "domain" (prefix) of each <text__numbers_and_symbols> match, using . and : as delimiters. There are lots of false positives here, but the most common patterns are:
domain       n    porc  cumsum cumporc
<doi:     7862  58.104  58.104  58.104
<https:   2902  21.447  79.551  79.551
<arXiv:    940   6.947  86.498  86.498
<DOI:      869   6.422  92.920  92.920
<http:     785   5.801  98.721  98.721
<ISBN:      38   0.281  99.002  99.002
<arxiv:     37   0.273  99.275  99.275
<isbn:      15   0.111  99.386  99.386
<10.        10   0.074  99.460  99.460
<doi.       10   0.074  99.534  99.534
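
As a small illustration of steps 1 and 3, the same regex and the "cut at the first . or :" rule can be tried on a single Description string in base R. This is a simplified sketch, not the code used for the summary; the `sub()` call is an assumed equivalent of the splitting done in the full reprex.

```r
# One Description string, taken from the igoR example earlier in the thread
desc <- paste(
  "provided by the Correlates of War Project <https://correlatesofwar.org/>.",
  "See also Pevehouse, J. C. et al. (2020), <doi:10.1177/0022343319881175>."
)

# Step 1: extract every <...> run that contains no whitespace
tags <- regmatches(desc, gregexpr("<(\\S*?)>", desc, perl = TRUE))[[1]]

# Step 3: keep everything up to and including the first "." or ":"
# (tags containing neither delimiter are returned unchanged)
prefix <- sub("^([^.:]*[.:]).*$", "\\1", tags)

print(tags)
print(prefix)
```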

Full reprex

library(stringr)
library(dplyr, warn.conflicts = FALSE)
#> Warning: package 'dplyr' was built under R version 4.1.2
library(tidyr, warn.conflicts = FALSE)
#> Warning: package 'tidyr' was built under R version 4.1.2

cran <- tools::CRAN_package_db()


cran_mod <- cran %>%
  mutate(date_pack = as.Date(str_split_fixed(Packaged, " ", 2)[, 1])) %>%
  select(Package, date_pack)


extract_urls <- str_extract_all(cran$Description,
  # Regex can be improved ...
  regex("<(\\S*?)>"),
  simplify = TRUE
) %>%
  as_tibble() %>%
  bind_cols(cran_mod, .) %>%
  filter(V1 != "")
#> Warning: The `x` argument of `as_tibble.matrix()` must have unique column names if `.name_repair` is omitted as of tibble 2.0.0.
#> Using compatibility `.name_repair`.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.



paste0(
  "Number of packages with <pattern> in Description: ",
  nrow(extract_urls),
  " out of ", nrow(cran), " (",
  round(100 * nrow(extract_urls) / nrow(cran), 2),
  "%)"
)
#> [1] "Number of packages with <pattern> in Description: 8148 out of 19073 (42.72%)"



# Analyse patterns
allurls <- extract_urls %>%
  pivot_longer(
    cols = -c(Package, date_pack),
    values_to = "url"
  ) %>%
  # Remove blanks. etc
  filter(url != "" & !is.na(url))

# Total urls enclosed by <>
nrow(allurls)
#> [1] 13531

# Split by pattern, I would use : and . for splitting

allurls <- allurls %>%
  mutate(split = gsub(".", ".|",
                      gsub(":", ":|", url, fixed = TRUE),
                      fixed = TRUE),
         domain = str_split_fixed(split, "\\|", n = 2)[, 1]
         )

alldomains <- allurls %>%
  group_by(domain) %>%
  summarise(
    max_date = max(date_pack, na.rm = TRUE),
    n = n()
  ) %>%
  arrange(desc(n))

alldomains <- alldomains %>%
  mutate(
    porc = round(100 * n / sum(alldomains$n), 3),
    cumporc = cumsum(porc)
  )

head(alldomains, 10)
#> # A tibble: 10 x 5
#>    domain  max_date       n   porc cumporc
#>    <chr>   <date>     <int>  <dbl>   <dbl>
#>  1 <doi:   2022-03-30  7862 58.1      58.1
#>  2 <https: 2022-03-30  2902 21.4      79.6
#>  3 <arXiv: 2022-03-30   940  6.95     86.5
#>  4 <DOI:   2022-03-29   869  6.42     92.9
#>  5 <http:  2022-03-29   785  5.80     98.7
#>  6 <ISBN:  2022-03-15    38  0.281    99.0
#>  7 <arxiv: 2022-03-15    37  0.273    99.3
#>  8 <isbn:  2022-02-20    15  0.111    99.4
#>  9 <10.    2020-12-05    10  0.074    99.5
#> 10 <doi.   2020-07-28    10  0.074    99.5

Created on 2022-03-30 by the reprex package (v2.0.1)
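
One thing the table suggests is that the prefixes are not written with consistent case (`<doi:` vs `<DOI:`, `<arXiv:` vs `<arxiv:`). A quick sketch that merges the counts reported above case-insensitively, which is why any matching would probably need to ignore case; the counts are copied from the table, and the two non-prefix rows (`<10.` and `<doi.`) are left out:

```r
# Counts copied from the summary table above
counts <- c(
  "<doi:" = 7862, "<https:" = 2902, "<arXiv:" = 940, "<DOI:" = 869,
  "<http:" = 785, "<ISBN:" = 38, "<arxiv:" = 37, "<isbn:" = 15
)

# Sum the counts of prefixes that differ only in letter case
merged <- tapply(counts, tolower(names(counts)), sum)
print(sort(merged, decreasing = TRUE))
```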

@hadley
Member

hadley commented Mar 30, 2022

Thanks for the investigation! Do you also want to do a PR? 😄
