Is prepare_distributed in run_future_lapply necessary? #163

Closed
kendonB opened this issue Nov 21, 2017 · 1 comment
Comments

@kendonB
Contributor

kendonB commented Nov 21, 2017

As far as I can tell, this only matters for attempts? If so, I'd suggest running it only when attempts > 1.

@wlandau-lilly
Collaborator

Yes, it is necessary. For both forms of distributed parallelism ("Makefile" and "future_lapply"), the side effects of prepare_distributed() are much more important than the "attempts" flag (which currently exists only to tell drake when it can print "All targets are already up to date."). Initially, your environment is cached so that jobs on the cluster can load it. Next, outdated() is run, which both gets information for the attempts flag and processes all the imports. With all the imports processed, we can devote future_lapply() entirely to proper targets. The imports are usually fast, so there is no reason to waste jobs on them for any kind of distributed computing.
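The workflow described above can be illustrated with a minimal Python analogy (not drake's actual code — the names `imports`, `targets`, and `run_pipeline` are hypothetical stand-ins): cheap imports are resolved up front in the main process, so the parallel workers are devoted entirely to proper targets.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins: "imports" are cheap objects resolved up
# front; "targets" are the expensive builds dispatched to workers.
imports = {"data_path": "data.csv", "n_boot": 100}
targets = {
    "model": lambda env: f"fit({env['data_path']})",
    "summary": lambda env: f"summarize(n={env['n_boot']})",
}

def run_pipeline(imports, targets, jobs=2):
    # Side effect analogous to prepare_distributed(): resolve all
    # imports in the main process so no worker job is wasted on them.
    env = dict(imports)
    # Only proper targets go to the parallel workers, mirroring how
    # future_lapply() receives targets alone.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        results = dict(zip(targets, pool.map(lambda fn: fn(env), targets.values())))
    return results

results = run_pipeline(imports, targets)
```

The point of the sketch is the division of labor: everything in `imports` is handled serially before any worker starts, which is cheap precisely because imports are fast.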

For "Makefile" parallelism, we need to return build_these so we know what fake timestamps to write (to trick the Makefile into running the correct jobs), and it does not slow down "future_lapply" parallelism.

I know it's bad form to have a function with both side effects and a return value, but I would rather do that than have duplicated code.
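The trade-off in question can be sketched generically (a hypothetical example, not drake's API): a single helper that both performs setup side effects and returns the list of targets to build, so the same code serves both parallel backends without duplication.

```python
# Hypothetical sketch of a function with both a side effect and a
# return value, as described above.
cache = {}

def prepare(plan):
    cache["envir"] = {"seed": 42}  # side effect: cache the environment for workers
    # return value: the outdated targets (analogous to build_these)
    build_these = [t for t in plan if plan[t] == "outdated"]
    return build_these

plan = {"model": "outdated", "report": "up to date"}
build_these = prepare(plan)
```

Splitting the side effect and the computation into two functions would avoid the mixed contract, at the cost of each backend calling both in the right order.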
