Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable optional parallel site building #383

Open
wants to merge 1 commit into
base: master
from

Conversation

@jozefhajnala
Copy link
Contributor

commented May 15, 2019

This PR proposes an option to build_site() using multiple parallel-running R processes, utilizing only the base package parallel. This can result in significant speed improvements with hardware common in 2019.

To enable the parallelization, the user must specify 2 options:

options(
  blogdown.use_parallel = TRUE,
  blogdown.use_parallel.cores = number_of_cores
)

The functionality is only triggered if:

  • the option blogdown.use_parallel is TRUE
  • the option blogdown.use_parallel.cores is > 1
  • the length of files is > 1
  • the parallel package is available

If this direction of functionality is accepted, we can make the implementation less conservative and easier to use.

@yihui
Copy link
Member

left a comment

Yes, I'd love to have this feature. Please finish up whatever you planned to do, and let me know when the PR is ready for review. Thank you very much!

@yihui yihui added this to the v0.13 milestone May 15, 2019

@jozefhajnala

This comment has been minimized.

Copy link
Contributor Author

commented May 16, 2019

I would consider this ready to be reviewed, just not sure about the "UI" in terms of using the feature. I am a fan of conservative introduction of new features, but happy to change.

@jozefhajnala

This comment has been minimized.

Copy link
Contributor Author

commented May 30, 2019

@yihui, is there anything waited on from my side?

@yihui yihui modified the milestones: v0.13, v0.14 Jun 11, 2019

@yihui
Copy link
Member

left a comment

Sorry about the delay. I just added my own proposal (which is even more conservative but also more general). Please let me know what you think. Thanks!

@@ -44,7 +47,11 @@ build_site = function(
if (local && length(files)) {
files = getOption('blogdown.files_filter', timestamp_filter)(files)
}
build_rmds(files)

This comment has been minimized.

Copy link
@yihui

yihui Jun 11, 2019

Member

I thought a while about it and I tend to let users decide how to parallelize it because there are multiple ways of parallelization. Here is how I'd implement it:

build_fun = getOption('blogdown.build_rmds', function(files, build) {
  build(files)
})
build_fun(files, build_rmds)

Then users can set their own functions (in .Rprofile), e.g.

options(blogdown.build_rmds = function(files, build) {
  parallel::mclapply(files, build)
})

This comment has been minimized.

Copy link
@jozefhajnala

jozefhajnala Jun 14, 2019

Author Contributor
  • Updated the approach to hopefully be in line with your though process, albeit still exposing the build_fun as a parameter, not just retrieving it from options within the function. We can also remove the parameter if you like the other approach better.

  • Would suggest to keep the build_rmds_parallel(), as it gives users an easy way to get parallelization without having to implement themselves, using either

    options("blogdown.build_rmds" = blogdown:::build_rmds_parallel)

    or (assuming we keep the parameter) calling directly

    blogdown::build_site(build_fun = blogdown:::build_rmds_parallel)

@jozefhajnala jozefhajnala force-pushed the jozefhajnala:parallel-build branch 2 times, most recently from 102e98f to 129cf50 Jun 14, 2019

@cderv

This comment has been minimized.

Copy link
Contributor

commented Jun 15, 2019

Just some thoughts about letting user choose how to do parallel. 💭

  • The future package is really helpful for that. It allows a clean separation between what is parallelized and how it should be run. However, it would mean adding it as dependency, at least in suggest, and could be too heavy 🤔
  • A vignette that gives examples of how to use this option could be insteresting. It would allow to show build_rmds_parallel and explain how it build on parallel package. Either the user can copy paste from the vignette, or know that a helper blogdown::build_rmds_parallel exists.

Thanks for this feature by the way !

@jozefhajnala

This comment has been minimized.

Copy link
Contributor Author

commented Jun 18, 2019

Just some thoughts about letting user choose how to do parallel. 💭

  • The future package is really helpful for that. It allows a clean separation between what is parallelized and how it should be run. However, it would mean adding it as dependency, at least in suggest, and could be too heavy 🤔
  • A vignette that gives examples of how to use this option could be insteresting. It would allow to show build_rmds_parallel and explain how it build on parallel package. Either the user can copy paste from the vignette, or know that a helper blogdown::build_rmds_parallel exists.

Thanks for this feature by the way !

Thanks for the feedback!

  • The currently suggested PR would let you choose your own way of parallelization, so you could define options("blogdown.build_rmds" = my_parallelization_fun), where my_parallelization_fun() could use the future package to parallelize, without having to introduce a dependency to blogdown
  • I will happily spend some time writing a vignette if we get the PR merged ;-)

@yihui yihui removed this from the v0.14 milestone Aug 10, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.