Handle when process terminates unexpectedly #120

jbridger · 2023-12-04T18:09:51Z

This is our attempt at fixing #114. We ran into issues where behavex would never terminate if a test segfaulted. This PR addresses the issue with the following changes:

Eventually exiting with a non-zero exit status if a spawned process terminates unexpectedly
Outputting more information to help with debugging which test failed

We're not expert Python developers and explored several options before settling on this. Happy to have further discussions if necessary 🙂

Handling unexpected process termination

We switched to concurrent.futures.ProcessPoolExecutor, which detects when a worker process terminates unexpectedly; the process pool used previously does not handle this yet. When a process terminates unexpectedly, all running and queued tasks are cancelled, and the pool can no longer accept new tasks. The cancelled futures all receive the same BrokenProcessPool error, so it is not possible to tell which process terminated unexpectedly, which would have helped the user debug the cause of the failure.

Because of this, if a BrokenProcessPool is encountered we don't generate the end report or the statistics, as the data would be incomplete. behavex no longer waits indefinitely and instead exits with a non-zero exit code.
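For illustration, here is a minimal sketch of the behaviour described above. It is not the code from this PR: run_test and run_all are hypothetical names, and os._exit(1) is only a stand-in for a segfault.

```python
import os
from concurrent.futures import ProcessPoolExecutor, as_completed
from concurrent.futures.process import BrokenProcessPool


def run_test(name):
    """Hypothetical stand-in for a test; 'crash' kills the worker abruptly."""
    if name == "crash":
        os._exit(1)  # skips all cleanup, used here to imitate a segfault
    return name


def run_all(names, workers=2):
    """Return (results, exit_code); exit_code is 1 if the pool broke."""
    results = []
    pool_broken = False
    try:
        with ProcessPoolExecutor(max_workers=workers) as pool:
            futures = [pool.submit(run_test, n) for n in names]
            for future in as_completed(futures):
                try:
                    results.append(future.result())
                except BrokenProcessPool:
                    # Every unfinished future fails with this same error,
                    # so it does not identify the process that died.
                    pool_broken = True
    except BrokenProcessPool:
        # submit() itself raises once the pool is already broken.
        pool_broken = True
    return results, (1 if pool_broken else 0)
```

A caller would typically invoke this under an `if __name__ == "__main__":` guard and pass the second value to sys.exit(), skipping any report generation when it is non-zero.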

More debug information

We also wanted behavex to output enough information to help the user narrow down which tests are the culprit. To do this within the limits of what the Python process pool exposes, we use a SyncManager list that is shared between the process executing the test and the callback functions.

When the execute_test task runs in a child process, it adds an entry to the list recording which test it is running. When the callback is called on task completion, we remove the test from the list. Once all the tests complete, the list should be empty. If it isn't, a process terminated unexpectedly, and the remaining entries tell us which tests failed to finish.

When the pool breaks, all running and queued futures are cancelled, which also triggers this callback. In the callback, we don't remove the test from the list if the future failed with a BrokenProcessPool. By keeping this information, we know which tests were running at the time a process died. Unfortunately, the list also includes tests that were running in other processes and had nothing to do with the one that terminated unexpectedly. However, we have at least made a smaller haystack to look for the needle in.
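The bookkeeping described above could look roughly like the following sketch. It only mirrors the idea: the signature of execute_test, the on_done and run helpers, and the use of os._exit(1) to imitate a segfault are all assumptions made for the example, not the PR's actual code.

```python
import os
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool
from functools import partial
from multiprocessing import Manager


def execute_test(name, running):
    """Hypothetical task: record the test as running, then run it."""
    running.append(name)
    if name == "segfaulting":
        os._exit(1)  # imitate abrupt termination of the worker
    return name


def on_done(name, running, future):
    """Done-callback: forget the test unless the pool broke."""
    if future.cancelled():
        return
    if isinstance(future.exception(), BrokenProcessPool):
        return  # keep the entry: the test may have been running when a worker died
    running.remove(name)


def run(names, workers=2):
    """Return the tests still marked as running after the pool shuts down."""
    with Manager() as manager:  # Manager() returns a started SyncManager
        running = manager.list()
        try:
            with ProcessPoolExecutor(max_workers=workers) as pool:
                for name in names:
                    future = pool.submit(execute_test, name, running)
                    future.add_done_callback(partial(on_done, name, running))
        except BrokenProcessPool:
            pass  # submit() raises once the pool is already broken
        return running[:]  # copy out before the manager shuts down
```

An empty result means every test finished. A non-empty result is the "smaller haystack": it contains the test that killed its worker, plus any tests that happened to be running in other workers at that moment.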

To further help with debugging, we changed the directory for behave's stdout logs to the output folder and updated the path to include an ID that is associated with a test execution. In the test output, we can then provide the specific path to the behave logs for the failing tests. These behave logs are useful in the event of a segfault because behave logs the steps that were executed up to the point at which it failed.

An example of the behavex output when a segfault is encountered (2 parallel processes, scenario scheme):

These scenarios failed to complete for an unknown reason:
    Feature name: My feature. Feature file: path/to/myfeature.feature
        Scenario name: Passing scenario
            Behave log for scenario: /Users/myuser/project/output/behavex/logs/45194/behave.log
    Feature name: My feature 2. Feature file: path/to/myfeature2.feature
        Scenario name: Segfaulting scenario
            Behave log for scenario: /Users/myuser/project/output/behavex/logs/52551/behave.log
Exit code: 1

An example of the behavex output when a segfault is encountered (2 parallel processes, feature scheme):

These features failed to complete for an unknown reason:
    Feature name: My feature. Feature file: path/to/myfeature.feature
        Behave log for feature: /Users/myuser/project/output/behavex/logs/45194/behave.log
    Feature name: My feature 2. Feature file: path/to/myfeature2.feature
        Behave log for feature: /Users/myuser/project/output/behavex/logs/52551/behave.log
Exit code: 1
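
As a rough sketch, a report in the scenario-scheme shape shown above could be assembled from the leftover entries of the shared list. The record layout, the function names, and the logs directory layout (one sub-directory per execution ID, inferred from the example paths) are assumptions made for illustration.

```python
import os
from collections import defaultdict


def behave_log_path(logs_dir, execution_id):
    """Assumed layout: <logs_dir>/<execution_id>/behave.log."""
    return os.path.join(logs_dir, str(execution_id), "behave.log")


def format_unfinished(records, logs_dir):
    """Format (feature_name, feature_file, scenario_name, execution_id) records."""
    by_feature = defaultdict(list)
    for feature, path, scenario, execution_id in records:
        by_feature[(feature, path)].append((scenario, execution_id))
    lines = ["These scenarios failed to complete for an unknown reason:"]
    for (feature, path), scenarios in by_feature.items():
        lines.append(f"    Feature name: {feature}. Feature file: {path}")
        for scenario, execution_id in scenarios:
            log_path = behave_log_path(logs_dir, execution_id)
            lines.append(f"        Scenario name: {scenario}")
            lines.append(f"            Behave log for scenario: {log_path}")
    return "\n".join(lines)
```

Grouping by feature keeps scenarios from the same feature file together, which matches the indentation of the example output.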

hrcorval · 2024-09-06T16:30:09Z

Hi @jbridger, the information you provided in this PR has been critical to the improvements we made in the core library logic. We have not been able to merge it, as the baseline was very outdated. However, I wanted to give you credit in https://github.com/hrcorval/behavex/blob/master/CHANGES.rst
We still need to provide the behave logs for the scenarios that were interrupted; this will be an upcoming improvement.
Thanks a lot for your collaboration in making this library much better :)

jbridger added 2 commits December 4, 2023 15:56

Handle when process running tests unexpectedly terminate

9a65597

Merge pull request #1 from validio-io/parallel-fix

81ac345

jbridger marked this pull request as ready for review December 4, 2023 18:29

jbridger mentioned this pull request Dec 4, 2023

Behavex runner gets stuck indefinitely when a test case terminates with sys.exit/seg fault #114

Closed

jbridger changed the title ~~Handle when process dies unexpectedly~~ Handle when process terminates unexpectedly Dec 5, 2023

jbridger mentioned this pull request Dec 19, 2023

Parallel fix #119

Closed

bombsimon added 2 commits January 31, 2024 11:38

fix: Ensure parallel_tests_in_execution is set, fix execution_id

f52776f

fix: Propagate shared_removed_scenarios to execute_tests

48fb459

hrcorval changed the base branch from master to release_4.0.1 August 19, 2024 15:28

hrcorval closed this Sep 6, 2024
