Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pantsd without forking #7596

Merged
merged 19 commits into from May 3, 2019

Conversation

Projects
None yet
4 participants
@ity
Copy link
Contributor

commented Apr 19, 2019

Problem

see #7390. New PR with a shared branch for collaboration (if only github would allow editing target branches in PRs)

Solution

TODO:

  • REPLs can't install signal handlers.
  • Environment variables are inherited for every pants run under pantsd.

Result

@ity ity referenced this pull request Apr 19, 2019

Closed

wip - pantsd without forking #7441

@ity ity added this to the 1.16.x milestone Apr 19, 2019

@blorente blorente force-pushed the twitter:pantsd-sans-forking branch 2 times, most recently from 7cfc6e8 to b8d598d Apr 24, 2019

@blorente blorente force-pushed the twitter:pantsd-sans-forking branch from d1642af to 3513d9f Apr 26, 2019

blorente added a commit that referenced this pull request Apr 30, 2019

Add the possibility to ignore sigint from other threads (#7623)
### Problem
In the context of #7596, pants tasks will be executed outside of the main thread. This means that tasks like python-repl cannot override signal handling (code), because this can only be done from the main thread.

### Solution
Create an atomic variable, SignalHandler._ignore_sigint, that gates SIGINT handling. Create a global context manager inside ExceptionSink that toggles this for a certain time if needed.

### Result
Any thread can pause the handling of SIGINT in a controlled way.
python-repl no longer installs signal handlers, but rather it toggles the handling of the signal.

### Caveat
We may want to gate all signals individually, but the semantics of SIGINT are usually different enough from SIGTERM and SIGKILL that I think it's okay to only gate the first one.
@blorente

This comment has been minimized.

Copy link
Contributor

commented Apr 30, 2019

Heads up, #7623 just merged and it unblocks a better fix for python_repl.py, namely using ExceptionSink.ignoring_sigint().

@ity ity force-pushed the twitter:pantsd-sans-forking branch from 3513d9f to 49ae0dd Apr 30, 2019

@stuhood
Copy link
Member

left a comment

Thanks, looks good, although once things are green the travis edits need to be nuked.

class NoopExiter(Exiter):
def exit(self, result, *args, **kwargs):
if result != 0:
# TODO this exception was not designed for this, but it happens to do what we want.

This comment has been minimized.

Copy link
@stuhood

stuhood Apr 30, 2019

Member

Expanding this comment would be good: what is it that we want from the NoopExiter? As with many things, I expect that we can clean it up post merge, but would be good to include a bit more context in case ... that takes a while.

This comment has been minimized.

Copy link
@blorente

blorente May 2, 2019

Contributor

Some context on this: We want the invocation of LocalPantsRunner.run() to raise an exception if it finishes with an error code, because then DaemonPantsRunner can catch that and process that as necessary (sending the error message to the nailgun client, for example). This processing happens in this set of catch blocks.

_GracefulTerminationException was designed to be a way to signal "errors that happen when we try to close the pants run before it's done". These exceptions were raised by this call.

In this line, we are changing the semantics of _GracefulTerminationException(), from "pants hasn't been able to complete the run" to "pants hasn't been able to complete the run, or pants has completed the run and it is a failure (error code >0)".

The correct way to do this would be to either explicitly update the docstring of _GracefulTerminationException to account for the change in semantics, or to create a new exception type (maybe PantsFinishedWithFailureException), and add another elif branch to this block of code


def __init__(self, socket):
def __init__(self, maybe_shutdown_socket):
# N.B. Assuming a fork()'d child, cause os._exit to be called here to avoid the routine

This comment has been minimized.

Copy link
@stuhood

stuhood Apr 30, 2019

Member

This part isn't accurate anymore, so would be good to clarify what this class does now.

This comment has been minimized.

Copy link
@ity

ity May 2, 2019

Author Contributor

added todo in #7606

graph_helper = None
target_roots = None
options_bootstrapper = None
# N.B. This will be overridden with the correct value if options

This comment has been minimized.

Copy link
@stuhood

stuhood Apr 30, 2019

Member

This comment should still be accurate I think? If it's not, then creating the subprocess_dir should no longer be necessary, and we can leave a TODO about removing it.

yield fd
else:
with open('/dev/null', 'rb') as fh:
yield fh.fileno()

# The NailgunStreamWriter probably shouldn't grab the socket without a lock...

This comment has been minimized.

Copy link
@stuhood

stuhood Apr 30, 2019

Member

Should convert this to a TODO that references a ticket, probably.

self._exiter.exit(e.exit_code)
except Exception:
except Exception as e:
# LocalPantsRunner.set_start_time resets the global exiter,

This comment has been minimized.

Copy link
@stuhood

stuhood Apr 30, 2019

Member

Is this still true? My understanding was that context managers were added to reset the exiters.

finally:
os.close(read_fd)

# TODO: The error is that when the client finishes (and presumably closes the socket),

This comment has been minimized.

Copy link
@stuhood

stuhood Apr 30, 2019

Member

Would be good to refer to a ticket, or expand/remove this.

@@ -402,7 +403,7 @@ def test_pantsd_pid_change(self):
def test_pantsd_memory_usage(self):
"""Validates that after N runs, memory usage has increased by no more than X percent."""
number_of_runs = 10
max_memory_increase_fraction = 0.35
max_memory_increase_fraction = 0.60 # TODO this should be closer to 0.35

This comment has been minimized.

Copy link
@stuhood

stuhood Apr 30, 2019

Member

Can refer to Patrick's ticket for this.

@@ -477,6 +479,7 @@ def test_pantsd_parse_exception_success(self):
finally:
rm_rf(test_path)

@unittest.skip("Pantsd is not expected to handle parallel runs anymore.")

This comment has been minimized.

Copy link
@stuhood

stuhood Apr 30, 2019

Member

If these will be skipped rather than deleted, we should probably refer to a ticket about re-enabling parallel runs for v2.

os.kill(client_pid, signum)
waiter_run = waiter_handle.join()
self.assert_failure(waiter_run)


print("Run finished, stderr data is {}".format(waiter_run.stderr_data))

This comment has been minimized.

Copy link
@stuhood

@ity ity force-pushed the twitter:pantsd-sans-forking branch from ca3b407 to 5f4c6a8 May 1, 2019

@stuhood stuhood requested review from illicitonion and blorente May 1, 2019

@blorente
Copy link
Contributor

left a comment

This is very exciting! Thanks for all the work on this.

# That said, this means that under pantsd,
# Ctrl-C will simply crash the repl (and the daemon).
# TODO(#7623) Potential more robust (but more invasive) fix.
self._run_repl(pex, **pex_run_kwargs)

This comment has been minimized.

Copy link
@blorente

blorente May 2, 2019

Contributor

This is fixed as of #7623, we can just replace the whole body of this method with:

Suggested change
self._run_repl(pex, **pex_run_kwargs)
with ExceptionSink.ignoring_signint():
self._run_repl(pex, **pex_run_kwargs)
@@ -425,6 +425,7 @@ def _exit_with_failure(cls, terminal_msg):
@classmethod
def _log_unhandled_exception_and_exit(cls, exc_class=None, exc=None, tb=None, add_newline=False):
"""A sys.excepthook implementation which logs the error and exits with failure."""

This comment has been minimized.

Copy link
@blorente

blorente May 2, 2019

Contributor

Leftover blank line

from pants.init.util import clean_global_runtime_state
from pants.java.nailgun_io import NailgunStreamStdinReader, NailgunStreamWriter
from pants.java.nailgun_protocol import ChunkType, NailgunProtocol
from pants.pantsd.process_manager import ProcessManager
from pants.java.nailgun_protocol import ChunkType, MaybeShutdownSocket, NailgunProtocol
from pants.util.contextutil import hermetic_environment_as, stdio_as
from pants.util.socket import teardown_socket


class DaemonSignalHandler(SignalHandler):

This comment has been minimized.

Copy link
@blorente

blorente May 2, 2019

Contributor

I don't think this signal handler ever takes effect, because we don't have a good way to enable it from the main thread (this whole piece of code runs in a child thread, and we can't use custom signal handlers in those for now). I think this entire class can be removed.

class NoopExiter(Exiter):
def exit(self, result, *args, **kwargs):
if result != 0:
# TODO this exception was not designed for this, but it happens to do what we want.

This comment has been minimized.

Copy link
@blorente

blorente May 2, 2019

Contributor

Some context on this: We want the invocation of LocalPantsRunner.run() to raise an exception if it finishes with an error code, because then DaemonPantsRunner can catch that and process that as necessary (sending the error message to the nailgun client, for example). This processing happens in this set of catch blocks.

_GracefulTerminationException was designed to be a way to signal "errors that happen when we try to close the pants run before it's done". These exceptions were raised by this call.

In this line, we are changing the semantics of _GracefulTerminationException(), from "pants hasn't been able to complete the run" to "pants hasn't been able to complete the run, or pants has completed the run and it is a failure (error code >0)".

The correct way to do this would be to either explicitly update the docstring of _GracefulTerminationException to account for the change in semantics, or to create a new exception type (maybe PantsFinishedWithFailureException), and add another elif branch to this block of code

"""An Exiter that emits unhandled tracebacks and exit codes via the Nailgun protocol."""
"""An Exiter that emits unhandled tracebacks and exit codes via the Nailgun protocol.
TODO: This no longer really follows the Exiter API, per-se (or at least, it doesn't call super).

This comment has been minimized.

Copy link
@blorente

blorente May 2, 2019

Contributor

If this no longer follows the Exiter API, and (for now) we don't need the DaemonExiter to act as an exiter, we could maybe inline this code, or turn it into a standalone class and make this not inherit Exiter (possibly changing the name to DaemonNailgunCloser or something to avoid confusing this with other *Exiter classes.

This comment has been minimized.

Copy link
@ity

ity May 2, 2019

Author Contributor

added todo in #7606


self._exiter = DaemonExiter(socket)
self.exit_code = exit_code
self._exiter = DaemonExiter(maybe_shutdown_socket)

def _make_identity(self):

This comment has been minimized.

Copy link
@blorente

blorente May 2, 2019

Contributor

We may not need this anymore. I'm not sure, but it seems related to ProcessManager. If we need it, I suspect it will be for Nailgun communication.

This comment has been minimized.

Copy link
@ity

ity May 2, 2019

Author Contributor

added TODO

options_bootstrapper=options_bootstrapper,
)
global_options = options.for_global_scope()
setup_logging_from_options(global_options)

This comment has been minimized.

Copy link
@blorente

blorente May 2, 2019

Contributor

Might be worth noting that this change only works because of the encapsulated_logger in DaemonPantsRunner, and therefore we don't have to gate logging setup anymore.

@@ -57,7 +65,10 @@ def run(self):
for message_regexp in global_bootstrap_options.ignore_pants_warnings:
warnings.filterwarnings(action='ignore', message=message_regexp)

if global_bootstrap_options.enable_pantsd:
# TODO For kill-pantsd and clean-all, we cannot run them in the daemon itself, because

This comment has been minimized.

Copy link
@blorente

blorente May 2, 2019

Contributor

There are #7205 (issue) and #7206 (PR), that originally proposed this solution. So this PR should either close that, or leave a TODO linking to that.

self._services = services

def handle_sigint(self, signum, _frame):
for service in self._services:

This comment has been minimized.

Copy link
@blorente

blorente May 2, 2019

Contributor

This should maybe terminate itself but not watchman (with this function).

Something like:

class PantsDaemonSignalHandler(SignalHandler):
   def __init__(self, daemon):
    super(PantsDaemonSignalHandler, self).__init__()
    self._daemon = daemon

   def handle_sigint(self, signum, _frame):
    self._daemon.terminate(include_watchman=False)
    # Send a useful message to users

def handle_sigint(self, signum, _frame):
for service in self._services:
service.record_exception(KeyboardInterrupt('remote client sent control-c!'))

This comment has been minimized.

Copy link
@blorente

blorente May 2, 2019

Contributor

This was a test for an api that I forgot to remove, it doesn't actually do anything, this line can be safely removed.

@illicitonion
Copy link
Contributor

left a comment

Woo!

@@ -458,6 +459,7 @@ def test_pantsd_invalidation_stale_sources(self):
finally:
rm_rf(test_path)

@unittest.skip("Pantsd is not expected to handle parallel runs anymore.")

This comment has been minimized.

Copy link
@illicitonion

illicitonion May 2, 2019

Contributor

Is this long-term true? I thought we did expect pantsd to still handle parallel runs of some goals, including list?

If it's long-term true, let's delete the test.
If it's not long-term true, let's add a link to a ticket about fixing it.

This comment has been minimized.

Copy link
@ity

ity May 2, 2019

Author Contributor

thanks, added TODO

@ity ity force-pushed the twitter:pantsd-sans-forking branch from abe3e0c to 2a79fc9 May 3, 2019

@blorente blorente force-pushed the twitter:pantsd-sans-forking branch from e8fa10d to 9e85d3e May 3, 2019

@ity ity force-pushed the twitter:pantsd-sans-forking branch from 9e85d3e to b6ef5b3 May 3, 2019

@ity ity force-pushed the twitter:pantsd-sans-forking branch from b6ef5b3 to 9215622 May 3, 2019

@ity ity merged commit 5ee0e81 into pantsbuild:master May 3, 2019

1 check was pending

continuous-integration/travis-ci/pr The Travis CI build is in progress
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.