New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parallelization fixes #2280
Parallelization fixes #2280
Conversation
…ake its structure more similar to parallel_map for readability)
…during parallel_map
@pmenczel Thank you for sorting that out. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a big improvement to me. I asked a couple of questions about whether it can be improved more.
@pmenczel Thanks for sorting this all out. It is much better now. Happy for you to merge it when you're ready. |
Background
Since I am planning to add an mpi_parallel_map to the parallel module, I had a detailed look at the current implementations of parallel_map and loky_pmap. In the case of timeouts, errors or interrupts, I found surprising behavior. I will first summarize the current behavior in these situations, and then the behavior if my changes are included. It seems difficult to write better unit tests for such timing-sensitive behavior, but I have tested in detail on both Linux (WSL) and Windows.
Current behavior
parallel_map
loky_pmap
then returns results of all finished tasks.
fail_fast
, behaves similarly toparallel_map
.Otherwise, completes all tasks but only returns
results of those that were started before the timeout.
enters deadlock requiring second CTRL+C
executes the rest before raising
KeyboardInterrupt
.exception
fail_fast
: finishes running tasks,then raises exception.
!fail_fast
: raisesMapExceptions
at the end.fail_fast
: finishes all tasks, then raises exception.!fail_fast
: raisesMapExceptions
at the end.New behavior
parallel_map
loky_pmap
then returns results of all finished tasks.
then returns results of all finished tasks.
KeyboardInterrupt
KeyboardInterrupt
exception
fail_fast
: finishes running tasks,then raises exception.
!fail_fast
: raisesMapExceptions
at the end.fail_fast
: finishes tasks earlier in the list, thenaborts remaining ones and raises exception.
!fail_fast
: raisesMapExceptions
at the end.Job timeout
Currently, the job timeout parameter is ignored by parallel_map. In loky_pmap, it is not the maximum allowed time for one job, but the maximum time between two job finishes (possibly in different processes). If this time is exceeded, all tasks will still be executed until the end; only the results of the tasks that finished too slowly will be discarded. I do not think that this was the intention of the job timeout parameter?
Unfortunately, both
ProcessPoolExecutor
(which parallel_map is based on) and its loky version do not support timeouts for single tasks, nor do they support aborting single tasks manually. (The loky one supports killing all worker processes at once.) If we wanted to have a job timeout parameter, we would need to either use non-public API to obtain references to the worker processes and interrupt them manually, or to completely rewrite parallel_map and base it on e.g.multiprocessing.pool.Pool
.Maybe better to just remove the job_timeout parameter? If you agree with that, I will then also remove it from the available options for
MultiTrajSolver
and all its subclasses.