Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check 'repos' repositories and CRAN for URL if package isn't installed #108

Merged
merged 33 commits into from
Oct 29, 2021

Conversation

ARawles
Copy link
Contributor

@ARawles ARawles commented Aug 19, 2021

This PR fixes #106.

Previously, packages needed to be installed for downlit to find the URL field. With this PR, downlit will go through 3 stages to try and find the URL, only proceeding to the next one if the previous one was unsuccessful:

  1. Check for a local install and DESCRIPTION file
  2. Check the user's 'repos' option (excluding any entries like 'cran') and check the PACKAGES file. This does mean that the PACKAGES file will need to include the URL field, which is not included by default, for this stage to be successful.
  3. Check CRAN, using the tools::CRAN_package_db() function, which includes more meta-data than the utils::available.packages() function.

A test case has been written to check that URLS for packages that are not installed are found correctly, however it currently relies on the BMRSr package not being installed. Although it's unlikely that this package would ever be installed when these tests are being run as part of a CI process, it's not perfect.

Still left to consider is the possibility of allowing the user to register a custom function that could be used to find a PACKAGE url (see Issue for discussion).

@ARawles
Copy link
Contributor Author

ARawles commented Aug 19, 2021

Because of the use of CRAN_package_db() function from the tools package, this requires the minimum R version to be changed to 3.4 (instead of 3.2).

Copy link
Member

@hadley hadley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this! Comments below.

DESCRIPTION Show resolved Hide resolved
R/metadata.R Outdated Show resolved Hide resolved
R/metadata.R Outdated Show resolved Hide resolved
R/metadata.R Outdated Show resolved Hide resolved
R/metadata.R Outdated Show resolved Hide resolved
R/metadata.R Outdated Show resolved Hide resolved
R/test-helpers.R Outdated Show resolved Hide resolved
When getting a URL from CRAN or a custom repo using the `url_from_cran()` or `url_from_custom_repo()` functions respectively, you'll always get the result passed from `parse_urls()`. And now `parse_urls()` can handle empty values, the return value is the same for empty character string and NA values - both of which could be returned from the dataframe check depending on whether the package and/or URL is found.
@ARawles
Copy link
Contributor Author

ARawles commented Aug 20, 2021

One thing I wanted to get your thoughts on @hadley is the time penalty associated with checking CRAN every time downlit can't find the local package - here's a comparison of the current speed compared to the PR version:

downlit 0.2.1.9000

> microbenchmark::microbenchmark(
+     # local package
+     package_urls("dplyr"),
+     # non-installed package
+     package_urls("BMRSr"), times = 10)
Unit: microseconds
                  expr    min       lq      mean   median       uq      max neval
 package_urls("dplyr") 2792.2 2809.402 2878.3715 2834.352 2958.201 3065.801    10
 package_urls("BMRSr")  199.7  203.102  250.0211  240.251  259.201  404.601    10

downlit PR

> microbenchmark::microbenchmark(
+     # local package
+ package_urls("dplyr"),
+     # custom-repo package
+ package_urls("customrepopackage"),
+     # non-installed package
+ package_urls("BMRSr"),
+ times = 10)
Unit: milliseconds
                  expr         min        lq       mean    median          uq       max neval
 package_urls("dplyr")    2.876101    3.2662    5.65371    3.9897    4.470101   22.3326    10
 package_urls("customrepopackage") 4056.489800 4153.0438 4224.55182 4199.6713 4292.924902 4434.3823    10
 package_urls("BMRSr") 5720.558001 5786.0249 5911.00847 5876.3085 5950.955101 6240.9524    10

We're looking at about 4 seconds for packages from a custom repo and about 6 seconds for package from CRAN. I don't imagine that there will actually be that many packages that will get this far (i.e. I think most of them will be installed), but I think it's still a consideration. What do you think?

@ARawles
Copy link
Contributor Author

ARawles commented Aug 20, 2021

Following the introduction of the memoise package, the benchmarks are looking a lot more healthy:

> results <- microbenchmark::microbenchmark(
+     package_urls("mycustomrepo"), times = 5
+ )
> results
Unit: microseconds
                         expr   min    lq   mean median      uq     max neval
 package_urls("mycustomrepo") 381.1 440.6 661506  629.2 11362.3 3294717     5
> results$time
[1] 3294716900   11362300     629200     440600     381100

The first run is taking us about 3 seconds but then everything after that is basically instantaneous.

…if a URL cannot be found.

Previously, downlit would always default to "https://rdrr.io" when a custom URL for the package couldn't be found. Now you can register a new function via `register_default_package_ref()` to be used.
@ARawles
Copy link
Contributor Author

ARawles commented Aug 23, 2021

I've also had a go at adding the ability to register a custom function that will be used when no URL can be found for the package (on CRAN or otherwise).

It's a first attempt to any comments/improvements are welcome 👍

@hadley
Copy link
Member

hadley commented Oct 29, 2021

I've reverted the custom function registration out because I think that belongs in a separate PR, and I can merge your earlier work without more discussion.

@hadley
Copy link
Member

hadley commented Oct 29, 2021

@ARawles I've tweaked your code quite a bit, but I figured out how to add a test with a custom repo, so I'm fairly confident that I haven't broken anything. If you want to have a go at the custom function in another PR, you should be able to cherry-pick the commits of of this branch; LMK if you need any hints on how to do that.

@hadley hadley merged commit d5622d6 into r-lib:main Oct 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow users to set alternative default link
2 participants