Handle when process terminates unexpectedly #120

Closed
wants to merge 4 commits into from


jbridger

@jbridger jbridger commented Dec 4, 2023

This PR is our attempt at fixing #114: behavex would never terminate if a test segfaulted. It addresses the problem with the following changes:

  • Eventually exiting with a non-zero exit status if a spawned process terminates unexpectedly
  • Outputting more information to help with debugging which test failed

We're not expert Python developers and explored several options before settling on this. Happy to have further discussions if necessary 🙂

Handling unexpected process termination

Switched to concurrent.futures.ProcessPoolExecutor, which detects when a worker process terminates unexpectedly; the process pool used previously does not handle this yet. When a process terminates unexpectedly, all running and queued tasks are cancelled and the pool can no longer accept new tasks. The cancelled futures all receive the same BrokenProcessPool error, so it is not possible to tell which process terminated, which would have helped the user debug the cause of the failure.
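To illustrate the behaviour described above, here is a minimal sketch (not the code in this PR). The names run_test and run_all are hypothetical, and os._exit(1) is only a stand-in for a segfault: it kills the worker abruptly, which is enough for the executor to mark the pool as broken.

```python
import os
from concurrent.futures import ProcessPoolExecutor, as_completed
from concurrent.futures.process import BrokenProcessPool


def run_test(name):
    """Hypothetical stand-in for a test execution."""
    if name == "crash":
        os._exit(1)  # simulate a segfault: the worker dies without cleanup
    return name


def run_all(names, workers=2):
    """Run every name in a pool; return (completed_names, pool_broken)."""
    completed, pool_broken = [], False
    with ProcessPoolExecutor(max_workers=workers) as executor:
        futures = [executor.submit(run_test, name) for name in names]
        for future in as_completed(futures):
            try:
                completed.append(future.result())
            except BrokenProcessPool:
                # Every unfinished future fails with this same error, so it
                # does not identify which task killed the worker.
                pool_broken = True
    return completed, pool_broken


if __name__ == "__main__":
    done, broken = run_all(["a", "crash", "b", "c"])
    print("completed:", done, "pool broken:", broken)
```

Which tasks manage to complete before the pool breaks depends on scheduling, so only the pool_broken flag is reliable here.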

Because of this, if a BrokenProcessPool is encountered we don't generate the end report or the statistics, as the data would be incomplete. behavex no longer waits indefinitely; it exits with a non-zero exit code.
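A rough sketch of that decision, assuming all futures have already finished. The names pool_was_broken, finish_run and generate_report are hypothetical and not behavex functions, and ordinary test failures are ignored for brevity.

```python
from concurrent.futures import Future
from concurrent.futures.process import BrokenProcessPool


def pool_was_broken(futures):
    """Return True if any finished future failed because the pool broke."""
    for future in futures:
        if future.cancelled():
            continue  # a cancelled future has no exception to inspect
        if isinstance(future.exception(), BrokenProcessPool):
            return True
    return False


def finish_run(futures, generate_report):
    """Return the exit code; generate_report is a hypothetical callable."""
    if pool_was_broken(futures):
        return 1  # data is incomplete: skip the report and statistics
    generate_report()
    return 0


if __name__ == "__main__":
    # Futures are built by hand here only to keep the sketch single-process.
    passed = Future()
    passed.set_result("passed")
    broken = Future()
    broken.set_exception(BrokenProcessPool("a worker terminated abruptly"))
    print(finish_run([passed, broken], lambda: print("report generated")))
```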

More debug information

We also wanted behavex to output enough information to help the user narrow down which tests are the culprit. To do this within the limits of what the Python process pool reports, we use a SyncManager list that is shared between the processes executing the tests and the callback functions.

When the execute_test task runs in a child process, it adds the test it is running to the list. When the callback is called on task completion, we remove the test from the list. When all the tests complete, the list should be empty. If it isn't, a process terminated unexpectedly, and the remaining entries tell us which tests failed to complete.

All running and queued futures are cancelled when the pool breaks, and each triggers this callback. In the callback, we don't remove the test from the list if the cause is a BrokenProcessPool. By keeping this information, we know which tests were running when a process died. Unfortunately, the list also includes tests that were running in other processes and had nothing to do with the one that terminated unexpectedly. However, we have at least made a smaller haystack to look for the needle in.
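The bookkeeping described above could look roughly like this. It is a simplified sketch: mark_started and on_done are hypothetical names, and a plain list stands in for the SyncManager list so the example runs in a single process with hand-built futures.

```python
from concurrent.futures import Future
from concurrent.futures.process import BrokenProcessPool
from functools import partial


def mark_started(running, test_id):
    """Would be called inside the worker when a test starts."""
    running.append(test_id)


def on_done(running, test_id, future):
    """Done-callback: forget the test unless the pool broke."""
    if future.cancelled():
        return  # never started, so it was never added to the list
    if isinstance(future.exception(), BrokenProcessPool):
        return  # keep the entry: this test was running when a worker died
    if test_id in running:
        running.remove(test_id)


if __name__ == "__main__":
    running = []  # assumption: a SyncManager list proxy in the real setup
    finished, interrupted = Future(), Future()
    finished.add_done_callback(partial(on_done, running, "Passing scenario"))
    interrupted.add_done_callback(partial(on_done, running, "Segfaulting scenario"))
    mark_started(running, "Passing scenario")
    mark_started(running, "Segfaulting scenario")
    finished.set_result(None)
    interrupted.set_exception(BrokenProcessPool("a worker terminated abruptly"))
    print("did not complete:", running)
```

Anything left in the list after all callbacks have fired is a candidate for the test that brought the pool down.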

To further help with debugging, we now write behave's stdout under the output folder, in a path that includes an ID associated with the test execution. The behavex output can then give the specific path to the behave logs for the failing tests. These logs are useful in the event of a segfault because behave logs the steps that were executed up to the point of failure.
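As a small illustration of the per-execution log location, the helper below builds such a path. The function name behave_log_path is hypothetical, and the behavex/logs/<id>/behave.log layout is inferred from the example output in this description rather than taken from the behavex source.

```python
import os


def behave_log_path(output_dir, execution_id):
    """Build the behave log path for one test execution (assumed layout)."""
    return os.path.join(
        output_dir, "behavex", "logs", str(execution_id), "behave.log"
    )


if __name__ == "__main__":
    print(behave_log_path("/Users/myuser/project/output", 45194))
```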

An example of the behavex output when a segfault is encountered (2 parallel processes, parallel scheme: scenario):

These scenarios failed to complete for an unknown reason:
    Feature name: My feature. Feature file: path/to/myfeature.feature
        Scenario name: Passing scenario
            Behave log for scenario: /Users/myuser/project/output/behavex/logs/45194/behave.log
    Feature name: My feature 2. Feature file: path/to/myfeature2.feature
        Scenario name: Segfaulting scenario
            Behave log for scenario: /Users/myuser/project/output/behavex/logs/52551/behave.log
Exit code: 1

An example of the behavex output when a segfault is encountered (2 parallel processes, parallel scheme: feature):

These features failed to complete for an unknown reason:
    Feature name: My feature. Feature file: path/to/myfeature.feature
        Behave log for feature: /Users/myuser/project/output/behavex/logs/45194/behave.log
    Feature name: My feature 2. Feature file: path/to/myfeature2.feature
        Behave log for feature: /Users/myuser/project/output/behavex/logs/52551/behave.log
Exit code: 1

@jbridger jbridger marked this pull request as ready for review December 4, 2023 18:29
@jbridger jbridger changed the title Handle when process dies unexpectedly Handle when process terminates unexpectedly Dec 5, 2023
@jbridger jbridger mentioned this pull request Dec 19, 2023
@hrcorval hrcorval changed the base branch from master to release_4.0.1 August 19, 2024 15:28
@hrcorval
Owner

hrcorval commented Sep 6, 2024

Hi @jbridger, the information you provided in this PR has been critical to the improvements in the core library logic. We were not able to merge it because the baseline was very outdated; however, I wanted to give you credit in https://github.com/hrcorval/behavex/blob/master/CHANGES.rst
Providing the behave logs for the scenarios that were interrupted is still pending; this will be an upcoming improvement.
Thanks a lot for your collaboration on making this library much better :)

@hrcorval hrcorval closed this Sep 6, 2024