Handle when process terminates unexpectedly #120
Closed
This is our attempt at fixing #114. We ran into issues where behavex would never terminate if a test segfaulted. We address this with the changes described below.
We're not expert Python developers and explored several options before settling on this. Happy to have further discussions if necessary 🙂
Handling unexpected process termination
Switched to using concurrent.futures.ProcessPoolExecutor, which handles a process terminating unexpectedly, whereas the previous process pool in Python does not handle this yet. When a process terminates unexpectedly, all running and queued tasks are cancelled and the pool can no longer accept new tasks. The cancelled futures all get the same BrokenProcessPool error, so it's not possible to know specifically which process terminated unexpectedly, which would have helped the user debug the cause of the failure.

Due to the consequences of the above, if a BrokenProcessPool is encountered, we don't generate the end report or the statistics, as the data would be incomplete. behavex no longer waits forever; it exits with a non-zero exit code.

More debug information
We also wanted behavex to output enough information to help the user narrow down which tests are the culprit. To do this within the limits of what the Python process pool exposes, we use a SyncManager list that is shared between the processes executing the tests and the callback functions.
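A minimal sketch of how such a shared list could be wired up, not the actual behavex code. The names run_test, on_done and run_all are hypothetical stand-ins (run_test plays the role of execute_test), and the bookkeeping follows the rules described in this section: record on start, remove on completion, keep the entry when the future failed with BrokenProcessPool.

```python
from concurrent.futures import Future, ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool
from functools import partial
from multiprocessing import Manager


def run_test(test_id: str, running) -> str:
    """Hypothetical stand-in for execute_test; runs in a child process."""
    running.append(test_id)  # record that this test has started
    # ... the real test would run here and might crash the process ...
    return test_id


def on_done(test_id: str, running, future: Future) -> None:
    """Done-callback, called in the parent process for every future."""
    if future.cancelled():
        return  # never produced a result; nothing to clean up
    if isinstance(future.exception(), BrokenProcessPool):
        return  # keep the entry: the test may have been running when a worker died
    if test_id in running:
        running.remove(test_id)  # the test finished (passed or raised normally)


def run_all(test_ids, workers: int = 2) -> list:
    """Run every test; return the ids still in the list (empty if no worker died)."""
    with Manager() as manager:
        running = manager.list()  # proxy list shared with the worker processes
        with ProcessPoolExecutor(max_workers=workers) as pool:
            for test_id in test_ids:
                try:
                    future = pool.submit(run_test, test_id, running)
                except BrokenProcessPool:
                    break  # the pool rejects new work once a worker has died
                future.add_done_callback(partial(on_done, test_id, running))
        return list(running)  # copy the contents out before the manager shuts down
```

Called from under an `if __name__ == "__main__":` guard, run_all should return an empty list when every test completes. A non-empty result names the tests that were in flight when a worker died, at which point the caller can skip the report and exit with a non-zero status.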
When the execute_test task runs in a child process, it adds the test it is running to the list. When the callback is called on task completion, we remove the test from the list. Once all tests have completed, the list should be empty. If it isn't, a process terminated unexpectedly, and the remaining entries tell us which tests failed to finish.

When the pool breaks, all running and queued futures are cancelled, which also triggers this callback. In the callback, we don't remove the test from the list if the callback was triggered by a BrokenProcessPool. By keeping this information, we know which tests were running at the time a process died. Unfortunately, the list also includes tests that were running in other processes and had nothing to do with the one that terminated unexpectedly. Still, we have at least made a smaller haystack to look for the needle in.

To further help with debugging, we changed the output directory for behave's stdout to the output folder and updated the path to match an ID that is associated with a test execution. In the test output, we can then provide the specific path to the behave logs for the failing tests. These behave logs are useful in the event of a segfault because behave logs the steps that were executed up to the point it failed.

An example of the behavex output when a segfault is encountered (2 parallel processes, scheme scenario):

An example of the behavex output when a segfault is encountered (2 parallel processes, scheme feature):