available_packages_filters disregarded during install #645

Open
dgkf-roche opened this issue Jun 10, 2024 · 6 comments

@dgkf-roche

Not sure if this is necessarily a bug - maybe it's an intentional omission.

We were hoping to leverage the available_packages_filters option as a universal mechanism for applying selection criteria, based on quality measures, to a repository of packages over in pharmaR/pharmapkgs.

Using a simple example, I tried to make a function that ideally would only permit (at least without some intentional side-stepping of common install tools) installation of packages that start with "c".

options(
  available_packages_filters = list(
    add = TRUE,
    starts_with_c = function(ap) ap[startsWith(ap[, "Package"], "c"), , drop = FALSE]
  )
)

head(available.packages(ignore_repo_cache = TRUE), 3)
#      Package Version ...
# c060 "c060"  "0.3-0" ...
# c212 "c212"  "0.98"  ...
# c2c  "c2c"   "0.1.0" ...

pak::cache_clean()
remove.packages("pkgconfig")
pak::pkg_install("pkgconfig")
# succeeds

I would expect this to fail, given that the filter should prevent packages like pkgconfig from being available.

Substituting function(ap) browser() as the filter also never starts a debug session, so my impression is that available.packages is either used internally with some default filters, or an alternative mechanism is used that doesn't implement this behavior.
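For anyone wanting to reproduce the check without an interactive browser() session, a filter that counts its own calls is one option. The sketch below is only an illustration: the package names calpha and zbeta and the throwaway local index are made up, and the file:// URL assumes a Unix-like temporary path. It shows that available.packages() consults the option; running another installer with the same filter in place and inspecting the counter afterwards would show whether that installer does too.

```r
# Build a throwaway repository index (hypothetical packages, not CRAN data)
repo <- tempfile("repo")
dir.create(repo)
writeLines(c(
  "Package: calpha", "Version: 1.0.0", "",
  "Package: zbeta", "Version: 0.2.0"
), file.path(repo, "PACKAGES"))

calls <- 0L
counting_filter <- function(ap) {
  calls <<- calls + 1L  # record that the filter was invoked
  ap[startsWith(ap[, "Package"], "c"), , drop = FALSE]
}

# Set the option, query the local index, then restore the previous value
old <- options(available_packages_filters = list(counting_filter))
ap <- available.packages(contriburl = paste0("file://", repo))
options(old)

calls            # non-zero when the option was consulted
ap[, "Package"]  # only names starting with "c" remain
```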

I'm curious to hear your thoughts. It would be a tremendously valuable feature for us.

@dgkf-roche
Author

Hey @gaborcsardi, is this something that you'd be interested in supporting? On our end, the filtering feature of available.packages, and its ubiquity across most mechanisms of interfacing with repositories, is a core feature of our repo tools.

@gaborcsardi
Member

Yes, I would like to have a way to prioritize repositories, but it would be another way, as we don't use available.packages().

@gaborcsardi
Member

What kind of filters do you use in available_packages_filters?

@dgkf-roche
Author

dgkf-roche commented Jun 24, 2024

This is related to our work in our (currently private) fork of r-lib/rhub re-purposed for regulated industries. As packages are updated, we calculate a number of quantifiable indicators of the package's quality. We embed these indicators inside the PACKAGES file with the hope of allowing the end-user to specify some quality selection criteria. We've piloted using the available_packages_filters option as a universal mechanism of applying a policy.

There's a brief demo in the README of this package

We use some helper functions in the demo to simplify the syntax, but it amounts to doing something like:

options(available_packages_filters = list(add = TRUE, function(ap) {
  dplyr::as_tibble(ap) |>
    dplyr::filter(  # filter() keeps matching rows; select() would pick columns
      QualityLineCoverage >= 0.5,
      QualityExportCoverage >= 0.9,
      QualityExportDocumentationCoverage >= 0.9
    )
}))

Here the logic is just a series of conditions, but we'd like to keep it arbitrary - it could be a decision tree or some aggregation of different qualities.

The ability to provide a function that can arbitrarily filter the available packages pulled from repos in options(repos) is pretty core to our design and our hope is that this can be applied by an administrator, ensuring that all well-intentioned user-facing mechanisms of installing packages apply the filtering criteria.
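As a rough base-R variant of the quality filter described above: available.packages() hands filters a character matrix, so this sketch converts the quality fields to numbers and returns matrix rows rather than a tibble. The package names and scores are invented, the extra fields are requested through the fields argument (they are not among the fields read by default), and the file:// URL assumes a Unix-like temporary path.

```r
# Hypothetical index carrying the quality fields from the comment above
repo <- tempfile("repo")
dir.create(repo)
writeLines(c(
  "Package: goodpkg", "Version: 1.0.0",
  "QualityLineCoverage: 0.80", "QualityExportCoverage: 0.95",
  "QualityExportDocumentationCoverage: 1.00", "",
  "Package: weakpkg", "Version: 2.1.0",
  "QualityLineCoverage: 0.30", "QualityExportCoverage: 0.95",
  "QualityExportDocumentationCoverage: 1.00", "",
  "Package: unscored", "Version: 0.1.0"
), file.path(repo, "PACKAGES"))

quality_fields <- c(
  "QualityLineCoverage",
  "QualityExportCoverage",
  "QualityExportDocumentationCoverage"
)

quality_filter <- function(ap) {
  # Metadata arrives as character, so convert before comparing
  num <- function(field) as.numeric(ap[, field])
  keep <- num("QualityLineCoverage") >= 0.5 &
    num("QualityExportCoverage") >= 0.9 &
    num("QualityExportDocumentationCoverage") >= 0.9
  # which() drops rows whose scores are missing (NA)
  ap[which(keep), , drop = FALSE]
}

ap <- available.packages(
  contriburl = paste0("file://", repo),
  fields = quality_fields,          # ask for the non-standard fields
  filters = list(quality_filter)    # same shape as the option value
)
ap[, "Package"]
```

Treating unscored packages as rejected is a policy choice made for this sketch; an administrator could just as well let them through.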

Speaking only for my company, we also use this behavior to force R to prioritize repositories by their order in options(repos). I've informally chatted with folks from other companies that mentioned they had to enforce this policy as well, so I think it's a rather frequent pitfall that needs to be addressed when locking down systems.
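A minimal sketch of what such a repository-priority filter could look like, assuming rows reach the filter in the order the repositories were queried. The repositories and packages here are invented, make_repo is a helper written only for this example, and the file:// URLs assume Unix-like temporary paths.

```r
# Helper for this example only: write an index and return its file:// URL
make_repo <- function(lines) {
  dir <- tempfile("repo")
  dir.create(dir)
  writeLines(lines, file.path(dir, "PACKAGES"))
  paste0("file://", dir)
}

internal <- make_repo(c("Package: foo", "Version: 1.0.0"))
public <- make_repo(c(
  "Package: foo", "Version: 2.0.0", "",
  "Package: bar", "Version: 0.3.0"
))

prefer_first_repo <- function(ap) {
  pkgs <- ap[, "Package"]
  # Repository of the first row seen for each package name
  first_repo <- ap[match(pkgs, pkgs), "Repository"]
  # Keep only rows coming from that repository
  ap[ap[, "Repository"] == first_repo, , drop = FALSE]
}

ap <- available.packages(
  contriburl = c(internal, public),  # earlier entries take priority
  filters = list(prefer_first_repo)
)
ap[, c("Package", "Version")]
```

Here foo resolves to the internal 1.0.0 even though the public repository has a newer version. As far as I understand the defaults, add = TRUE appends custom filters after the built-in ones, including "duplicates", which keeps only the latest version. A priority filter like this would therefore likely need to be listed explicitly ahead of "duplicates" rather than added with add = TRUE.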

@gaborcsardi
Member

So you basically want to be able to specify arbitrary conditions on arbitrary fields from your package metadata. This is certainly possible, but needs quite a lot of changes, as currently we don't even read in all metadata from PACKAGES* files.
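For comparison, base R's read.dcf() keeps every field it finds when no fields argument is given. This says nothing about how pak parses PACKAGES* files; it is only a small illustration, with a made-up index, of what "all metadata" would look like to a filter.

```r
# Hypothetical two-package index; only one entry carries the custom field
index <- tempfile("PACKAGES")
writeLines(c(
  "Package: calpha", "Version: 1.0.0", "QualityLineCoverage: 0.82", "",
  "Package: zbeta", "Version: 0.2.0"
), index)

meta <- read.dcf(index)        # no `fields` given: every field present is kept
colnames(meta)                 # includes the non-standard QualityLineCoverage
meta[, "QualityLineCoverage"]  # NA where a package lacks the field
```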

@dgkf-roche
Author

> So you basically want to be able to specify arbitrary conditions on arbitrary fields from your package metadata

Yes, exactly. Glad to hear you're open to supporting it - please let us know if there's anything we can take on to help support it.

From what I saw in the PACKAGES parsing, it looked like it supported up to ~1000 fields which should be plenty for our needs. Are there constraints on the field names? We haven't set any standard yet, so we can definitely consider a convention that makes your life easier.
