Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle signals gracefully #6574

Merged

Conversation

Projects
None yet
7 participants
@cosmicexplorer
Copy link
Contributor

commented Sep 30, 2018

Problem

#6552 has some changes around signal handling. This was an attempt to move those out, and to gain greater visibility into and control over when pants errors out due to a signal (helping to address difficulties in flaky tests around exiting, such as #6708).

Resolves #7014, resolves #6708, resolves #6847, resolves #7199.

Solution

  • Create a very simple SignalHandler class which can be subclassed to set pants's behavior upon receiving different signals in different execution contexts.
  • Add a _signal_handler field to ExceptionSink.
  • Apply SignalHandler everywhere that tries to modify pants's exception handling.
  • Add the --pantsd-pailgun-quit-timeout bootstrap option and implement a timeout strategy to continue to forward output from the remote pantsd-runner process for a brief period of time upon receiving a signal, then killing the remote process if it's still alive at the end.
  • Unskip some flaky tests that we hopefully won't have to reskip.

Result

It's much more clear when and where signal handlers are set, and the flow of control is made more clear. It is easier to extend signal handlers in a hygienic way, allowing for the resolution of multiple flaky tests.

cosmicexplorer added a commit to cosmicexplorer/pants that referenced this pull request Oct 2, 2018

cosmicexplorer added a commit to cosmicexplorer/pants that referenced this pull request Oct 2, 2018

cosmicexplorer added a commit that referenced this pull request Oct 3, 2018

@jsirois

This comment has been minimized.

Copy link
Member

commented Dec 12, 2018

Can this be killed or still WIP? Ran across skipped tests referencing this PR and it's super-old at this point.

@cosmicexplorer

This comment has been minimized.

Copy link
Contributor Author

commented Dec 12, 2018

Well, the only blocker was not knowing how to test this effectively, so it sounds like on the other hand this PR is ready to kill pants. I will attempt to rebase or do something else equally horrible, as I believe this solves multiple issues users have been seeing in many contexts, or if not, it at least is better than before (we'll see about that).

@cosmicexplorer cosmicexplorer force-pushed the cosmicexplorer:handle-signals-gracefully branch from 7c7df13 to 604a8ff Dec 12, 2018

@cosmicexplorer

This comment has been minimized.

Copy link
Contributor Author

commented Dec 12, 2018

Managed to minimally replicate the previous diff and also remove the ridiculous Exiter destructor nonsense by realizing the sys.exit fungibility of the Exiter also supports a prototype pattern of sorts. It's currently but won't remain late and I should get back to this tomorrow because as mentioned I believe this may resolve a variety of issues, especially relating to pantsd (as indicated in the lengthy TODO in remote_pants_runner.py).

@cosmicexplorer cosmicexplorer force-pushed the cosmicexplorer:handle-signals-gracefully branch from 5e1284e to 8e363db Dec 13, 2018

@cosmicexplorer cosmicexplorer force-pushed the cosmicexplorer:handle-signals-gracefully branch from f0fe011 to 612d8f2 Jan 1, 2019

@cosmicexplorer cosmicexplorer force-pushed the cosmicexplorer:handle-signals-gracefully branch 3 times, most recently from 7d9ac77 to f270cba Jan 21, 2019

@cosmicexplorer cosmicexplorer changed the title [WIP] Handle signals gracefully Handle signals gracefully Jan 24, 2019

@cosmicexplorer

This comment has been minimized.

Copy link
Contributor Author

commented Jan 24, 2019

This was green modulo flaky tests/shards before the rebase just now, and should be ready for review.

@cosmicexplorer cosmicexplorer requested review from stuhood, blorente, Eric-Arellano, benjyw and jsirois and removed request for benjyw Jan 24, 2019

@Eric-Arellano
Copy link
Contributor

left a comment

This is awesome. Thanks Danny.

Two issues when trying to run this locally:

  1. ./pants clean-all, interrupt via control-c before the process completely finishes but after it's starting to print. It gets stuck on SUCCESS message for me and the process never executes, until I hit control-c again.
  2. ./pants3 test tests/python/pants_test/backend/python:integration. Exit in the first quarter second, and I'm getting:
^Ctimestamp: 2019-01-24T19:05:16.505123
Exception caught: (builtins.AttributeError)
  File "/Users/eric/DocsLocal/code/projects/pants/src/python/pants/bin/pants_loader.py", line 89, in <module>
    main()
  File "/Users/eric/DocsLocal/code/projects/pants/src/python/pants/bin/pants_loader.py", line 85, in main
    PantsLoader.run()
  File "/Users/eric/DocsLocal/code/projects/pants/src/python/pants/bin/pants_loader.py", line 81, in run
    cls.load_and_execute(entrypoint)
  File "/Users/eric/DocsLocal/code/projects/pants/src/python/pants/bin/pants_loader.py", line 71, in load_and_execute
    module = importlib.import_module(module_path)
  File "/Users/eric/DocsLocal/code/projects/pants/build-support/pants_dev_deps.py3.venv/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/Users/eric/DocsLocal/code/projects/pants/src/python/pants/bin/pants_exe.py", line 13, in <module>
    from pants.bin.pants_runner import PantsRunner
  File "/Users/eric/DocsLocal/code/projects/pants/src/python/pants/bin/pants_runner.py", line 13, in <module>
    from pants.bin.remote_pants_runner import RemotePantsRunner
  File "/Users/eric/DocsLocal/code/projects/pants/src/python/pants/bin/remote_pants_runner.py", line 20, in <module>
    from pants.pantsd.pants_daemon import PantsDaemon
  File "/Users/eric/DocsLocal/code/projects/pants/src/python/pants/pantsd/pants_daemon.py", line 20, in <module>
    from pants.bin.daemon_pants_runner import DaemonPantsRunner
  File "/Users/eric/DocsLocal/code/projects/pants/src/python/pants/bin/daemon_pants_runner.py", line 21, in <module>
    from pants.bin.local_pants_runner import LocalPantsRunner
  File "/Users/eric/DocsLocal/code/projects/pants/src/python/pants/bin/local_pants_runner.py", line 17, in <module>
    from pants.init.engine_initializer import EngineInitializer
  File "/Users/eric/DocsLocal/code/projects/pants/src/python/pants/init/engine_initializer.py", line 13, in <module>
    from pants.backend.python.rules.python_test_runner import run_python_test
  File "/Users/eric/DocsLocal/code/projects/pants/src/python/pants/backend/python/rules/python_test_runner.py", line 14, in <module>
    from pants.engine.legacy.graph import TransitiveHydratedTarget
  File "/Users/eric/DocsLocal/code/projects/pants/src/python/pants/engine/legacy/graph.py", line 410, in <module>
    @rule(BuildFileAddresses, [Select(SymbolTable), Select(AddressMapper), Select(OwnersRequest)])
  File "/Users/eric/DocsLocal/code/projects/pants/src/python/pants/engine/rules.py", line 243, in wrapper
    caller_frame = inspect.stack()[1][0]
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/inspect.py", line 1513, in stack
    return getouterframes(sys._getframe(1), context)
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/inspect.py", line 1490, in getouterframes
    frameinfo = (frame,) + getframeinfo(frame, context)
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/inspect.py", line 1464, in getframeinfo
    lines, lnum = findsource(frame)
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/inspect.py", line 780, in findsource
    module = getmodule(object, file)
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/inspect.py", line 725, in getmodule
    file = getabsfile(object, _filename)
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/inspect.py", line 709, in getabsfile
    return os.path.normcase(os.path.abspath(_filename))
  File "/Users/eric/DocsLocal/code/projects/pants/src/python/pants/base/exception_sink.py", line 39, in handle_sigint
    ExceptionSink._handle_signal_gracefully(signum, frame)
  File "/Users/eric/DocsLocal/code/projects/pants/src/python/pants/base/exception_sink.py", line 379, in _handle_signal_gracefully
    tb = frame.f_exc_traceback or sys.exc_info()[2]

Exception message: 'frame' object has no attribute 'f_exc_traceback'
Show resolved Hide resolved src/python/pants/base/exception_sink.py Outdated
Class state:
- Overwrites `cls._graceful_signal_handler`.
OS state:
- Overwrites signal handlers for SIGINT, SIGQUIT, and SIGTERM.

This comment has been minimized.

Copy link
@Eric-Arellano

Eric-Arellano Jan 25, 2019

Contributor

Great docstring! I've never seen someone document what's mutated. I find this very helpful.

}
# Retry any system calls interrupted by any of the signals we just installed handlers for
# (instead of having them raise EINTR). siginterrupt(3) says this is the default behavior on
# Linux.

This comment has been minimized.

Copy link
@Eric-Arellano

Eric-Arellano Jan 25, 2019

Contributor

I'm a little confused from man signinterrupt on macOS if the behavior is the same:

The siginterrupt() function is used to change the system call restart behavior when a system
call is interrupted by the specified signal. If the flag is false (0), then system calls will
be restarted if they are interrupted by the specified signal and no data has been transferred
yet. System call restart has been the default behavior since 4.2BSD, and is the default be-
haviour for signal(3) on FreeBSD.

If the flag is true (1), then restarting of system calls is disabled. If a system call is
interrupted by the specified signal and no data has been transferred, the system call will
return -1 with the global variable errno set to EINTR
. Interrupted system calls that have
started transferring data will return the amount of data actually transferred. System call
interrupt is the signal behavior found on 4.1BSD and AT&T System V UNIX systems.

(Emphasis my own)

--

Either way, we should document if it complies with macOS defaults or changes it.

This comment has been minimized.

Copy link
@cosmicexplorer

cosmicexplorer Mar 5, 2019

Author Contributor

After reading again, it looks like False is also the default on osx, and also what we want (since we don't have any checks for EINTR anywhere in the codebase) -- I've left a comment to this effect.

Show resolved Hide resolved src/python/pants/base/exception_sink.py Outdated
Show resolved Hide resolved src/python/pants/base/exception_sink.py Outdated
Show resolved Hide resolved src/python/pants/bin/local_pants_runner.py Outdated
@@ -13,7 +13,7 @@

This comment has been minimized.

Copy link
@Eric-Arellano

Eric-Arellano Jan 25, 2019

Contributor

I don't have a solid understanding of remote_pants_runner. It may be helpful for other reviewers to look more closely at this file.

Show resolved Hide resolved src/python/pants/pantsd/pants_daemon.py Outdated
Show resolved Hide resolved tests/python/pants_test/base/test_exception_sink_integration.py Outdated
@@ -487,11 +487,9 @@ def _assert_pantsd_keyboardinterrupt_signal(self, signum):
# The pantsd-runner processes should be dead, and they should have exited with 1.
self.assertFalse(proc.is_running())

@unittest.skip('TODO: this should be unskipped as part of the work for #6574!')

This comment has been minimized.

Copy link
@Eric-Arellano

Eric-Arellano Jan 25, 2019

Contributor

Woohoo!

@stuhood
Copy link
Member

left a comment

This looks really great. Thanks Danny!

Show resolved Hide resolved src/python/pants/bin/remote_pants_runner.py Outdated
Show resolved Hide resolved src/python/pants/bin/local_pants_runner.py
@jsirois
Copy link
Member

left a comment

I did not page in the mechanics of signalling paths in my review, just style comments for bits that stood out.

Show resolved Hide resolved src/python/pants/base/exception_sink.py Outdated
Show resolved Hide resolved src/python/pants/base/exception_sink.py Outdated
Show resolved Hide resolved src/python/pants/bin/local_pants_runner.py Outdated
Show resolved Hide resolved src/python/pants/base/exception_sink.py Outdated

@cosmicexplorer cosmicexplorer force-pushed the cosmicexplorer:handle-signals-gracefully branch 2 times, most recently from d99e67f to f55537d Feb 9, 2019

@cosmicexplorer

This comment has been minimized.

Copy link
Contributor Author

commented Feb 10, 2019

Responded to review comments, looking now into fixing and testing the issues described in #6574 (review), and determining the appropriate behavior for interrupted signals on osx as per #6574 (comment).

cosmicexplorer added some commits Mar 18, 2019

@cosmicexplorer cosmicexplorer force-pushed the cosmicexplorer:handle-signals-gracefully branch from 95fe86d to 5fb8b18 Mar 19, 2019

@cosmicexplorer

This comment has been minimized.

Copy link
Contributor Author

commented Mar 19, 2019

I have pushed a commit which should make all tests pass, going to do another round of cleanup. Had to break out #7406 for an issue capturing signals during ./pants run that I decided not to tackle today (and shouldn't block the rest of the work).

@cosmicexplorer cosmicexplorer force-pushed the cosmicexplorer:handle-signals-gracefully branch from ab93a24 to 61b09e1 Mar 19, 2019

@cosmicexplorer

This comment has been minimized.

Copy link
Contributor Author

commented Mar 19, 2019

Thought it was ridiculous to leave the timeout part untested, so made a --pantsd-pailgun-quit-timeout bootstrap option and added an integration test to make sure the warning message about the timeout is printed.

@cosmicexplorer cosmicexplorer force-pushed the cosmicexplorer:handle-signals-gracefully branch from 61b09e1 to 68f4bf0 Mar 19, 2019

cosmicexplorer added some commits Mar 20, 2019

@cosmicexplorer cosmicexplorer force-pushed the cosmicexplorer:handle-signals-gracefully branch from 3c808e0 to 1c29ccf Mar 20, 2019

@cosmicexplorer cosmicexplorer force-pushed the cosmicexplorer:handle-signals-gracefully branch from 1c29ccf to 9a79f00 Mar 21, 2019

@cosmicexplorer cosmicexplorer force-pushed the cosmicexplorer:handle-signals-gracefully branch from b6008ab to 3b97ce4 Mar 22, 2019

@cosmicexplorer cosmicexplorer merged commit 7664ca8 into pantsbuild:master Mar 22, 2019

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.