`future`-powered parallelism, examples for clusters, subgraph visualization, and a lot more speed

@wlandau-lilly released this 05 Nov 05:20
  • Extend plot_graph() to display subcomponents. Check out arguments from, mode, order, and subset. The graphing vignette has demonstrations.
  • Add "future_lapply" parallelism: parallel backends supported by the future and future.batchtools packages. See ?backend for examples and the parallelism vignette for an introductory tutorial. More advanced instruction can be found in the future and future.batchtools packages themselves.
  • Cache diagnostic information of targets that fail and retrieve diagnostic info with diagnose().
  • Add an optional hook argument to make() to wrap around build(). That way, users can more easily control the side effects of distributed jobs. For example, to redirect error messages to a file in make(..., parallelism = "Makefile", jobs = 2, hook = my_hook), my_hook should be something like function(code){withr::with_message_sink("messages.txt", code)}.
  • Remove console logging for "parLapply" parallelism. Drake was previously using the outfile argument for PSOCK clusters to generate output that could not be caught by capture.output(). It was a hack that should have been removed before.
  • If verbose is TRUE and all targets are already up to date (nothing to build), make() and outdated() print "All targets are already up to date" to the console.
  • Add new examples in 'inst/examples', most of them demonstrating how to use the "future_lapply" backends.
  • Add support for timeouts and retries when building targets.
  • Failed targets are now recorded during the build process. You can see them in plot_graph() and progress(). Also see the new failed() function, which is similar to in_progress().
  • Reduce the overhead of "parLapply" parallelism. The downside to this fix is that drake has to be properly installed. It should not be loaded with devtools::load_all(). The speedup comes from lightening the first clusterExport() call in run_parLapply(). Previously, we exported every individual drake function to all the workers, which created a bottleneck. Now, we just load drake itself in each of the workers, which works because build() and do_prework() are exported.
  • Change default value of overwrite to FALSE in load_basic_example().
  • Warn when overwriting an existing report.Rmd in load_basic_example().
  • Tell the user the location of the cache using a console message. This happens on every call to get_cache(..., verbose = TRUE).
  • Increase efficiency of internal preprocessing via lightly_parallelize() and lightly_parallelize_atomic(). Now, processing happens faster, and only over the unique values of a vector.
  • Add a new storr namespace called imports to be used in is_imported(). That way, clean() need not read the whole object, which makes it much faster and safer.
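
The hook bullet above describes a function that receives the build code and evaluates it inside a wrapper. As a minimal sketch of that shape, the snippet below runs without drake: fake_build() and log_file are hypothetical stand-ins introduced only for illustration (they are not part of drake), and the only library call used is withr::with_message_sink().

```r
library(withr)

# Hypothetical log location for this sketch; the release note uses "messages.txt".
log_file <- tempfile(fileext = ".txt")

# A hook in the shape described above: `code` is evaluated lazily, so it runs
# while messages are being redirected to the file.
my_hook <- function(code) {
  withr::with_message_sink(log_file, code)
}

# Hypothetical stand-in for a build step (NOT drake's build()).
fake_build <- function() {
  message("building target")
  42
}

# The message goes to the log file instead of the console.
result <- my_hook(fake_build())
log_lines <- readLines(log_file)
```

With this release of drake, a hook of this kind would be supplied as in the bullet above: make(..., parallelism = "Makefile", jobs = 2, hook = my_hook).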