Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handling dependencies of skeleton reports #69

Closed
TimTaylor opened this issue Mar 23, 2023 · 9 comments · Fixed by #72
Closed

Handling dependencies of skeleton reports #69

TimTaylor opened this issue Mar 23, 2023 · 9 comments · Fixed by #72

Comments

@TimTaylor
Copy link

TimTaylor commented Mar 23, 2023

Is the usage of renv in episoap a temporary fix whilst the package is in development?

I think the skeleton/template reports shipped with episoap should work with the latest CRAN version of used packages but unsure how you would best monitor changes. I think it is better for users to manage their own R environments.

Similarly where a dependency is not present (see line below from skeleton report) I think it would be better to use knitr::knit_exit() as opposed to installing a package from an Rmarkdown, e.g

could be replaced by

if (!requireNamespace("desired_package", quietly = TRUE)) {
  warning(call. = FALSE, "Please install {desired_package} to build these vignettes.")
  knitr::knit_exit()
}
@Bisaloo
Copy link
Member

Bisaloo commented Mar 23, 2023

Is the usage of renv in episoap a temporary fix whilst the package is in development?

No, the use of renv of probably here to stay.

  • given the large list of packages used, and the high probability of breaking changes, many templates could be regularly broken or require constant changes, which is unappealing to users and difficult to maintain. Even worse, if (or more precisely, "when") this package is released on CRAN, it would potentially require very regular updates, at the risk of going against CRAN policy.
  • given the target audience of this package, I don't think it's realistic to expect them to manage their own R environments. There are been many back and forth in how we best manage dependencies in an invisible way: pacman -> install.packages() -> pacman (again) -> standalone lockfile without renv -> renv::use(). I realize doing things "in an invisible way" is not appealing to most advanced users but I do believe it's what we need here. From my own experience working with some of the intended users, dependency management is a huge source of frustration for them and we should remove this hurdle.
  • I think users should use some kind of dependency management system to ensure reproducibility of their reports in the long-term. We might as well already pick the best (IMO) one for them and provide the infrastructure.

I would like to highlight the fact that we're not using the standard renv workflow here though. I agree this would likely be too intrusive and I have received complaints of users non familiar with renv when working with the standard workflow. renv::use() will install / copy packages on-the-fly when the report is rendered and will not interfere in any way with the user library setting or location.

I think it would be better to use knitr::knit_exit() as opposed to installing a package from an Rmarkdown

For the reason exposed above, I don't believe it's a good strategy to almost immediately confront users with errors. Additionally, I have times and times observed that many people don't read error messages and even less warnings, and would likely discard this package as "not working".
Something I have been considering and that may be a better solution would be to simply add renv to Imports since I'm here making the case that it's such an important part of this package.
Does this sound better to you?


I realize all of this is highly irregular compared to standard package development and I would never do this kind of things in any other package but I think episoap is by nature unique and shouldn't be considered as a standard package. One of the proofs of this is that the package almost doesn't include any code in R/.

However, I believe you indirectly raise a very good point about dependency version management and update. My plan at the moment is to update all dependencies to the latest version just before a new episoap release on CRAN. But I now realise this absolutely needs to be documented somewhere and I'll fix it now.

@Bisaloo
Copy link
Member

Bisaloo commented Mar 23, 2023

cc @avallecam if you want to jump in since you had opinions on dependency management and suggested reverting to pacman.

@TimTaylor
Copy link
Author

TimTaylor commented Mar 23, 2023

  • given the large list of packages used, and the high probability of breaking changes, many templates could be regularly broken or require constant changes, which is unappealing to users and difficult to maintain. Even worse, if (or more precisely, "when") this package is released on CRAN, it would potentially require very regular updates, at the risk of going against CRAN policy.

Yes this is tricky and a reasonable concern. I wonder if there could be coordination with the like of EpiNow2 and EpiEstim when they do releases. Bog standard dplyr/data.table is unlikely to change so it's likely mainly the epi focussed packages that may be an issue.

An alternative is a potential separation of reports from the package would help. Reports could be downloaded and the package website could link to the current website to display what these look like. I'm not sure if I'm keen on this but it would be a workable solution.

  • given the target audience of this package, I don't think it's realistic to expect them to manage their own R environments. There are been many back and forth in how we best manage dependencies in an invisible way: pacman -> install.packages() -> pacman (again) -> standalone lockfile without renv -> renv::use(). I realize doing things "in an invisible way" is not appealing to most advanced users but I do believe it's what we need here. From my own experience working with some of the intended users, dependency management is a huge source of frustration for them and we should remove this hurdle.
  • I think users should use some kind of dependency management system to ensure reproducibility of their reports in the long-term. We might as well already pick the best (IMO) one for them and provide the infrastructure.

It is a huge source of frustration but think this is due to it not being taught and too often ignored. Although impressive I think renv is a band aid to this problem and actually an added complication for beginners which obscures the way R packages / CRAN evolve. I really think there should be much more focus on teaching fundamentals of how R/CRAN manage packages and how this is very different from say python.

cc @avallecam if you want to jump in since you had opinions on dependency management and suggested reverting to pacman.

I'd have similar feelings to pacman I'm afraid. I think in the longer term you do a disservice to beginners by not teaching them about package management.

I know this is a different view to many but it's where my viewpoint as evolved to over the last couple of years after trying to make things "easier" as opposed to focussing more on educating :-)

@Bisaloo
Copy link
Member

Bisaloo commented Mar 23, 2023

I see two issues that we should clearly distinguish:

  1. automatically installing packages for users. Although very convenient in a pinch to make sure your code is working out of the box, I agree it runs the risk of contributing to the confusion between installing and loading, between install.packages() and library(). This is what pacman does and this is why I've always had very mixed feelings about it. I agree the best solution long-term would educate users about it. Note that here, there is a clear distinction between these two steps: packages are installed in the lockfile chunk, and loaded afterwards. I explicitly avoided setting attach = TRUE in renv::use() precisely for this reason. We are still automatically installing packages because we don't want users to jump through hoops to be able to start working with the reports

  2. managing versions and protecting users from breaking changes. This is an annoying issue even for users that clearly understand the difference between installing and loading packages, as well as a crucial point for reproducibility. As such, this should be a less controversial point. I believe episoap must follow best practices here and provide pipelines that are reproducible, with specific versions pinned.

If we really wanted 2. without 1., the solution would be to revert to providing a lockfile alongside the template, but not using it by default, just keeping it as a safety net.

I do believe the current solution is the best trade-off though as it will provide a hassle-free experience without hiding too much of the reality behind package management.

@TimTaylor
Copy link
Author

  1. managing versions and protecting users from breaking changes. This is an annoying issue even for users that clearly understand the difference between installing and loading packages, as well as a crucial point for reproducibility. As such, this should be a less controversial point. I believe episoap must follow best practices here and provide pipelines that are reproducible, with specific versions pinned.

I guess I'm just advocating for episoap to be more of a template package and for you not to have to worry about reproducibility within it (it's also a hard task from a dev responsibility side-point). I think it is better handled directly by the user (with epiverse providing education on possible solutions rather than wrapping them for them).

@Bisaloo
Copy link
Member

Bisaloo commented Mar 23, 2023

Even if we decided to delegate this responsibility to the user (which I don't think is a good strategy. One of the main values of episoap IMO is to handle this, rather than having each user trying to solve it on their own), there is still the problem of making sure we don't distribute broken templates.

Coordinating with selected maintainers would be a huge maintenance burden but probably still not enough as we were still getting deprecation warnings from the tidyverse, and as we expect the number of provided pipelines/templates to grow.

I think fetching the templates from the web is a worse solution because it would still require that this package is constantly in a very active maintenance mode, which doesn't seem realistic.

Additionally, we would likely want to have some versioning of the templates and release notes, on top of a standard way to describe their goals, their dependencies, etc. In other words, we would end up building an entire software repository & package management system outside of CRAN, which is much more than what we have capacity to handle. I also think it would be worse in terms of transparency and educating users about dependency management in the R world.

Overall, I understand your concerns and I don't want to dismiss them but using renv::use(), although it has some drawbacks, seems like the best solution by far in terms of transparency, maintainability, usability. I believe the way forward is to own this choice and clearly document it in a design document, with relevant pointers to references and resources.

@TimTaylor
Copy link
Author

It is not a problem. I'm not keen on this approach but we can agree to disagree.

@avallecam
Copy link
Member

I agree not to use pacman for this. I just got it clearer yesterday after looking for debates about it on twitter (1, 2), and understanding that the needs of script-writer and script-runner to install and load packages can differ, and unintentionally affect the runner. In that sense, I agree with the usage of renv for this.

Also, this discussion highlights an interesting training need that will be associated with episoap about "package management" and "reproducible environments". Sharing more explicitly about it, as a training module or vignette, to competent R users could help to understand the decisions and increase awareness on the topic. The outcome of it could also be to learn how to modify and update a template report to use it with your own package versions if wanted.

For context, my preference towards pacman was along the same lines as the epirhandbook editorial decisions, but I didn't have the criteria of the first paragraph. Now I think it can even increase the burden of maintenance, pushing to keep things updated but vulnerable to unexpected breaks. Also, its usage is being propagated to more readers like me, as I recall from this issue in epicontacts. So, it adds to the necessity to talk about these topics and show alternatives like in the same twitter thread.

@Bisaloo
Copy link
Member

Bisaloo commented Apr 6, 2023

Thanks for the additional references, Andree!

You are both right that training on this matter would be helpful no matter what we choose to do here.

I can confirm that pacman is the most problematic choice here because it always forces updates to the latest version and can suddenly cause issues when all was running fine until now even if the user didn't manually update their dependencies.

I have drafted a short section in the design vignette in #72 to justify the renv choice and linked to this issue for counterpoints. I'll merge the PR at the end of next week so please post any comments by then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants