Parallel styling #682

krlmlr · 2020-10-20T07:02:19Z

via furrr.

Closes #277.

Closes #617 (=supersedes).

Things to consider:

check how suggested and imported CRAN packages use styler and what it means for parallelisation.
check impact on speed with pre-commit.
check impact with /style command in r-lib/actions.

lorenzwalthert · 2020-10-20T12:38:48Z

Thanks @krlmlr. Couple of points (some also discussed in #277):

I am not sure we should use {furrr}. Is it because of progress bars? We could just use future.apply or similar to not expand the dependency graph (I haven't checked if it actually does).
I think for this to work with future.apply() in Parallel styling of files by style_pkg & style_dir (#277) #617, we needed to pass absolute file paths to the styler because the current working directory set with withr::with_dir(). There must be workarounds as in Parallel styling of files by style_pkg & style_dir (#277) #617 or as discussed in Parallelise styling #277 (comment). Seems like {furrr} handles that out of the box?
The authors of the future package do not recommend changing the plan in package code (see help(future::plan())). I am not sure we really want to modify the plan. Why can't we leave to the user and they can set an appropriate plan, e.g. in their .Rprofile? Also, what numbers of cores should/are used with your suggestion. For styling one file or for testing, it's desirable to not turn multicore support on.

codecov-io · 2020-10-21T08:28:11Z

Codecov Report

Merging #682 into master will decrease coverage by 0.27%.
The diff coverage is 30.00%.

@@            Coverage Diff             @@
##           master     #682      +/-   ##
==========================================
- Coverage   90.43%   90.15%   -0.28%     
==========================================
  Files          47       47              
  Lines        2226     2234       +8     
==========================================
+ Hits         2013     2014       +1     
- Misses        213      220       +7

Impacted Files	Coverage Δ
R/transform-files.R	`93.91% <30.00%> (-6.09%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c50338f...e8615e8. Read the comment docs.

krlmlr · 2020-10-21T08:29:57Z

furrr::future_map_lgl() is a drop-in replacement for map_lgl(), gives a progress bar, and has seen an update very recently. It's a suggested package, the default is to revert to serial styling. (We might want to show a one-time message, though, and/or offer a parallel = NA argument.
It just works with furrr.
From ?plan:

If you think it is necessary to modify the future strategy within a function, then make sure to undo the changes when exiting the function. This can be done using: ...

and this is what we do here. I'd say this improves user experience -- users who style their code might not want to even learn about future and whatnot.

lorenzwalthert · 2020-10-21T20:01:39Z

I see.

With regard to the future strategy, I'd like to 1) check if a plan has been set explicitly, if yes, follow this (I don't know if this is possible?). If not, set temporarily to future::plan("multiprocess") and restore on exit. Because with the current implementation and furrr installed, the user has no option to decide on the strategy, which I think is against the spirit of future (apart from setting the testthat env variable, but that shouldn't be the way to avoid a certain future strategy).
Also, setting the future strategy to plan('multiprocess') on my macOS almost takes 2s, so we should really only do it if there are two or more files to style.
Regarding progress bar, I note that the .progress argument will be deprecated in the future in favour of an approach compatible with the progressr package, so not sure we should introduce this here. It is a very visible user-facing change. As I understand from Adding a progress bar by default in a package HenrikBengtsson/progressr#78, progressr is only at the beginning and at the moment, you can either force progress display as a developer (which is against the principles of progressr) or ask the user to wrap every call to styler into with_progress(), which 99% of the users won't do. Other global options are potentially considered later.
Another problem is that we don't get real time updates on the styled files as with sequential processing. Seems like a future.apply approach with progressr also does not allow this, so we might need to drop it.

slow_sqrt <- function(xs) {
  p <- progressor(along = xs)
  future.apply::future_lapply(xs, function(x) {
    message("Calculating the square root of ", x)
    Sys.sleep(2)
    p(sprintf("x=%g", x))
    sqrt(x)
  })
}

library(progressr)
handlers("progress")
with_progress(y <- slow_sqrt(1:10))

No matter how we decide in the end, we should add clear documentation and unit test various cases.

That said, I understand your point that 99% of the users probably don't want to bother with future strategies and progress bar settings but I think it's important to keep things transparent if possible.

- simplify set_arg_paths (#682).

lorenzwalthert · 2020-11-29T18:55:28Z

OK, I will try to finish this up.

krlmlr · 2020-11-29T19:09:52Z

Thanks! Happy to review.

The startup costs are a problem. On Linux we could start one child process with callr, which then could spawn subprocesses very efficiently with the multicore plan. We could do the same on Windows and multiprocess: In both cases this would also eliminate the problem of temporarily altering the local plan. What do you think?

lorenzwalthert · 2020-11-29T20:16:45Z

But going through another sub-process will probably not decrease the start-up costs, would it? Because on top of future::plan(), a new R session has to be started, so costs add up? What would be the advantages of using a sub-process? To avoid the RStudio incompatibility of forked processes on UNIX? I don't think much can go wrong with capturing the plan and setting it on exit.

lorenzwalthert · 2020-12-05T20:21:40Z

I thought about it again. There is an additional problem: When using future, setting a plan is time consuming, even when it's sequential. Hence, when there are less than 3 files to style and we don't set multisession, we should rather use map_lgl() instead of future_map_lgl(). Also, the user should have the option to fully control the process, as discussed in HenrikBengtsson/future#450. For that reason< I think we should use this heuristic with the env variable R_STYLER_USE_FUTURE: (to be added to the docs):

I think we should make a short vignette for this.

lorenzwalthert added a commit that referenced this pull request Nov 9, 2020

Merge pull request #692 from MichaelChirico/patch-1

4fdbfd1

- simplify set_arg_paths (#682).

lorenzwalthert mentioned this pull request Nov 28, 2020

Future for package developers and temporary overriding of plans HenrikBengtsson/future#450

Open

krlmlr and others added 3 commits November 29, 2020 23:57

Parallel styling

6474d06

No parallelization in tests

810f2c7

allow escaping via env variable, document and test

b79ff02

lorenzwalthert force-pushed the f-277-parallel branch from e8615e8 to b79ff02 Compare November 29, 2020 23:08

lorenzwalthert mentioned this pull request Dec 20, 2020

Parallel styling of files by style_pkg & style_dir (#277) #617

Closed

Merge branch 'master' into f-277-parallel

47b9df7

lorenzwalthert marked this pull request as draft June 3, 2021 17:18

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parallel styling #682

Parallel styling #682

krlmlr commented Oct 20, 2020 •

edited by lorenzwalthert

Loading

lorenzwalthert commented Oct 20, 2020 •

edited

Loading

codecov-io commented Oct 21, 2020

krlmlr commented Oct 21, 2020

lorenzwalthert commented Oct 21, 2020 •

edited

Loading

lorenzwalthert commented Nov 29, 2020

krlmlr commented Nov 29, 2020

lorenzwalthert commented Nov 29, 2020

lorenzwalthert commented Dec 5, 2020 •

edited

Loading

Parallel styling #682

Are you sure you want to change the base?

Parallel styling #682

Conversation

krlmlr commented Oct 20, 2020 • edited by lorenzwalthert Loading

lorenzwalthert commented Oct 20, 2020 • edited Loading

codecov-io commented Oct 21, 2020

Codecov Report

krlmlr commented Oct 21, 2020

lorenzwalthert commented Oct 21, 2020 • edited Loading

lorenzwalthert commented Nov 29, 2020

krlmlr commented Nov 29, 2020

lorenzwalthert commented Nov 29, 2020

lorenzwalthert commented Dec 5, 2020 • edited Loading

krlmlr commented Oct 20, 2020 •

edited by lorenzwalthert

Loading

lorenzwalthert commented Oct 20, 2020 •

edited

Loading

lorenzwalthert commented Oct 21, 2020 •

edited

Loading

lorenzwalthert commented Dec 5, 2020 •

edited

Loading