Snapshot very slow on larger package libraries #347
Comments
---
Unfortunately, the snapshot code is especially crufty, and barring a full rewrite I think it's going to be quite difficult to do this correctly and without regression. :/

This commit contains a minimal change that should greatly improve performance in this case, however. The new assumption made here is that the package library is in a consistent state, so a recursive dependency search of packages installed within a library is unnecessary. This implies that, if a user were to get themselves into a state where a required package dependency was not installed, Packrat may no longer detect that and will just snapshot exactly what the user has. Since an R user would have to take some explicit action to put their library into an inconsistent state like this, I think that in practice this won't be an issue.
---
@kevinushey I've run into this same issue, and I noticed that the change ac1a18c mentioned above looks like it has since been reverted: Line 192 in f987891

My temporary solution to this has been to turn off the whole dependency inference functionality, including for snapshots. I am only using the first workflow mentioned in the README (snapshot as-you-go). With dependency inference turned off, it runs quite a bit faster.
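The commenter's actual snippet was not preserved in the thread, but a minimal sketch of the workaround described above might look like the following, assuming the `infer.dependencies` argument that recent packrat releases expose on `packrat::snapshot()` (check `?packrat::snapshot` for your version):

```r
library(packrat)

# Record exactly what is installed in the project library, skipping
# the slow crawl of .R/.Rmd source files for dependency inference.
# This matches the "snapshot as-you-go" workflow: you install what
# you need, then snapshot the library state directly.
packrat::snapshot(infer.dependencies = FALSE)
```

This trades safety for speed: packages used in source files but never installed will no longer be flagged.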
---
Yes, we unfortunately had to revert that change, as it broke a couple of internal RStudio applications using Packrat (some required recursive dependencies were not being discovered with that change). I think this could ultimately be fixed with some intelligent caching of discovered dependencies.
---
Would you consider a global option that turns off all inference functionality? For people who work exclusively in the snapshot-as-you-go workflow, it could speed things up considerably.
---
I would be open to that, but the default would need to stay off.
---
That would be great to have. Like everyone else, I love packrat; I think it's a fantastic tool. However, it does slow things down dramatically, to the point of defeating the purpose. In my case, I have a project with an .Rnw file calling R code with subfunctions, and the constant error messages about failing to parse that file for dependencies are not just annoying: installing any package can take 20 or even 30 minutes, and that's way too much.

So I have tried the new packrat 0.5.0 option, options("packrat.dependency.discovery.disabled" = TRUE), but I still feel (mistakenly?) that the same dependency checks are happening (rstudio/rstudio#2859) and that the messages are just no longer displayed: the right panel with packrat and libraries keeps flickering, RStudio keeps hanging, and the Git panel at the top right shows files appearing and disappearing, everything as before; I just no longer get the "failed to parse ....Rnw for dependencies" messages. I am still a newbie despite using packrat since almost the beginning, so apologies if anything I typed is dead wrong; it's my user impression!

That said, with packrat off, everything runs lightning fast, which narrows down the issue. Many thanks.

PS: I am on an Alienware m17x running Windows 10, and everything is fully updated, e.g. packrat 0.5.0, R 3.5.3, latest RStudio, etc.
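For reference, the option mentioned above is session-scoped, so one way to apply it to every session of a project is to set it in the project's .Rprofile. This is a sketch based on the option name quoted in the comment; whether it fully disables the checks is exactly what this thread is questioning:

```r
# In the project's .Rprofile: disable packrat's automatic dependency
# discovery (option name as reported for packrat 0.5.0; behavior may
# vary by version, as discussed in this thread).
options(packrat.dependency.discovery.disabled = TRUE)
```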
---
You might be interested in trying rstudio/renv, which addresses many of these challenges while maintaining packrat's API.
---
I had the same experience as @tchevri, particularly when one project contained multiple packages and R Markdown documents: a flickering right panel as libraries were rebuilt/loaded, but little output to diagnose what was going on. I assumed it was due to knitting during the attempt to discover dependencies each time I snapshotted or installed libraries. Since I was using the snapshot-as-you-go approach, I just ignored all R/Rmd source files and manually snapshotted whenever I felt the need to do so. @slopp it looks like rstudio/renv still attempts to crawl R files, but it sounds like that might only be in the
---
We're not recommending a transition to renv yet.
In addition, |
---
That sounds very promising. I will gladly chip in to test it out.
---
Very interesting; I just read the renv vignette.
---
It's tough to give a timeline, but I would say that hopefully by summer we'll be getting ready to submit to CRAN. Before that, we would welcome any and all user testing if you have the time to spare. In particular, it would be useful to know if there's something that Packrat does (and you rely on) that we haven't yet implemented in renv.
---
The changes in #615 avoid repeatedly processing the same package dependency, which is where most of the cost lies. Given a packrat library with 130+ packages created by installing tidyverse, rmarkdown, shiny, and devtools:
When the Packrat library is large, the `.snapshotImpl()` function can take a very long time. This is relevant because this function is run on RStudio's startup for Packrat projects, in order to annotate records for the files pane with their Packrat status. For example, for a project with 143 packages in the library, this takes almost three quarters of a minute.

All of the time appears to be spent in a recursive-descent `getPackageRecords`, which performs a huge amount of disk I/O and eventually generates a full package dependency tree weighing in at 2.4 MB.

packrat/R/pkg.R, lines 236 to 252 in b13dcaa
It seems like this could be made much more efficient by either:
a) re-using an existing dependency subtree if we've already generated it, or
b) changing the callers of getPackageRecords to accept a flat list of packages rather than a tree, and changing the recursive-descent code to append packages to the end of the list if they are not already present.
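Option (a) could be sketched as a memoized recursive descent. The code below is illustrative only: `deps()` is a hypothetical stand-in for reading a package's DESCRIPTION from disk, not packrat's actual `getPackageRecords` API, and the package names are a toy graph.

```r
# Build dependency records with a per-package cache, so a shared
# dependency's subtree is computed (and its disk I/O performed) once.
make_record_builder <- function(deps) {
  cache <- new.env(parent = emptyenv())
  build <- function(pkg) {
    if (exists(pkg, envir = cache, inherits = FALSE))
      return(get(pkg, envir = cache))           # reuse cached subtree
    record <- list(name = pkg,
                   depends = lapply(deps(pkg), build))
    assign(pkg, record, envir = cache)
    record
  }
  build
}

# Toy dependency lookup: ggplot2 -> rlang, cli; cli -> rlang.
toy_deps <- function(pkg) {
  switch(pkg,
         ggplot2 = c("rlang", "cli"),
         cli     = "rlang",
         character(0))
}

build <- make_record_builder(toy_deps)
record <- build("ggplot2")  # rlang's subtree is built once, then reused for cli
```

The same cache could also back option (b): instead of nesting records, append each newly discovered package to a flat list and skip packages already seen.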