first attempt at centralizing more global error logging state in ExceptionSink #6552

Merged
merged 69 commits into pantsbuild:master Oct 9, 2018

Conversation

4 participants
@cosmicexplorer
Contributor

cosmicexplorer commented Sep 24, 2018

Problem

See #6530. This is bullet point 2 of that issue.

Solution

  • rename `ExceptionSink#set_destination()` to `reset_log_location()`.
  • (re)acquire open file handles and persist them as `ExceptionSink` private fields so we can write to them in signal handlers.
    • use `faulthandler.enable()` to log fatal signals (`SIGABRT`, `SIGSEGV`, ...) and stacktraces to the process-specific error stream from #6539.
  • add `reset_exiter()` and delegate to the exiter to exit on an unhandled exception.
  • add `reset_interactive_output_stream()` to change which stream the Exiter is provided as the `out` argument, and to enable stacktraces to be sent to that stream when the process receives SIGUSR2 (on par with current functionality).
  • move everything besides exiting out of the Exiter class.
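A minimal sketch of the class-level shape the bullets above describe. The `reset_*` method names follow the PR; the bodies, the private field names, and the `exceptions.<pid>.log` naming are illustrative assumptions, not the actual pants implementation:

```python
import faulthandler
import os


class ExceptionSink(object):
  """Illustrative sketch of a process-global sink for fatal error logging."""

  # Process-global state, mutated only through the reset_* classmethods.
  _log_dir = None
  _pid_specific_error_fileobj = None
  _exiter = None
  _interactive_output_stream = None

  @classmethod
  def reset_log_location(cls, new_log_dir):
    """(Re)open the per-pid error log under new_log_dir and point faulthandler at it."""
    if not os.path.isdir(new_log_dir):
      os.makedirs(new_log_dir)
    # Tear down any previous registration before touching its stream.
    if faulthandler.is_enabled():
      faulthandler.disable()
    if cls._pid_specific_error_fileobj:
      cls._pid_specific_error_fileobj.close()
    path = os.path.join(new_log_dir, 'exceptions.{}.log'.format(os.getpid()))
    cls._pid_specific_error_fileobj = open(path, 'a')
    cls._log_dir = new_log_dir
    # Fatal signals (SIGABRT, SIGSEGV, ...) now dump stacktraces to the per-pid log.
    faulthandler.enable(cls._pid_specific_error_fileobj)

  @classmethod
  def reset_exiter(cls, exiter):
    """Delegate exiting on an unhandled exception to the given exiter."""
    cls._exiter = exiter

  @classmethod
  def reset_interactive_output_stream(cls, stream):
    """Set the stream the Exiter is handed as its `out` argument."""
    cls._interactive_output_stream = stream
```

Keeping the file handle open (rather than re-deriving a path at crash time) is the point of the second bullet: `faulthandler` can only write to an already-open file object from inside a signal handler.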

Result

We can un-xfail the integration test introduced in #6539. This makes it significantly easier to diagnose pantsd communication errors and their causes, both locally and in CI.

@cosmicexplorer cosmicexplorer requested a review from stuhood Sep 24, 2018

@stuhood
Member

stuhood left a comment

Thanks... I think the biggest blocker to getting that test passing will be the fact that the signal handler will not currently write to the exception log. Doing that will require a bit of refactoring.

class ExceptionSinkError(Exception): pass

# TODO (in response to the above): we can avoid a single entry point and make everything work with
# great ease if we can ensure all set_* methods are idempotent!

@stuhood

stuhood Sep 24, 2018

Member

I agree that making these methods idempotent would be good.
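A toy illustration of the idempotence being discussed (hypothetical class, not pants code): a `reset_*` classmethod that can be re-applied from any entry point without coordination.

```python
class GlobalConfig(object):
  """Hypothetical holder of process-global state."""
  _destination = None

  @classmethod
  def reset_destination(cls, dir_path):
    # Idempotent: re-applying the same value is a no-op, so multiple
    # entry points may call this without coordinating with each other.
    if dir_path == cls._destination:
      return
    cls._destination = dir_path
```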

@classmethod
def set_destination(cls, dir_path):
  cls._destination = dir_path
  cls._destination = cls._check_or_create_new_destination(dir_path)
  cls._all_destinations.append(cls._destination)

@stuhood

stuhood Sep 24, 2018

Member

I don't think a history of destinations is necessarily useful, so would drop this property.

@cosmicexplorer

cosmicexplorer Sep 25, 2018

Contributor

After thinking about the RemotePantsRunner use case a bit, I think I agree: if we're going to want to dive into any exception logging like we do there, we'll probably know where to look, instead of wanting to dredge up all our possible log files (?).

However, I'm now thinking mostly about a situation which is completely unhandled: if something happens to call ExceptionSink.set_destination() after RemotePantsRunner does, we immediately lose any logs we might have wanted for the thin client's error message.

The _all_destinations bit doesn't really address that, except by way of the commented-out, incomplete method body which would just dredge up all the logs (not useful, for the same/opposite reason), but that situation might be something we want to special-case.

@cosmicexplorer

cosmicexplorer Sep 25, 2018

Contributor

I think you addressed exactly this in your comment below re: set_pid, actually. Figuring out the log file in a signal handler is the remaining part now, though.

if faulthandler.is_enabled():
  faulthandler.disable()
faulthandler.enable(trace_stream)
# This permits a non-fatal `kill -31 <pants pid>` for stacktrace retrieval.

@stuhood

stuhood Sep 24, 2018

Member

This should likely be written to both the current trace_stream and the current exception log(s).

Doing the latter thing would require keeping an open file handle for the current exception log(s), which is a bit harder... but I think that getting the test I added to pass will require this, because killing the process with a signal will only result in this firing, rather than also logging an exception.

Additionally, (re)-opening the exception log for our current pid (on a best effort basis...) would likely require an explicit step, like set_pid: we'd call it as soon as we know we've forked.

@cosmicexplorer

cosmicexplorer Sep 25, 2018

Contributor

The reason I didn't consider this such a huge change when we talked is, I think, that the biggest change I see is just "having an actual signal handler for SIGUSR2 instead of using faulthandler", and the code I threw away was already doing that in order to handle SIGINT and SIGQUIT. I'm not sure now whether the handling of those signals should be done in this PR. We currently set handlers for those signals in RemotePantsRunner to send a signal to the pantsd-runner process on control-c.

@cosmicexplorer

cosmicexplorer Sep 25, 2018

Contributor

Well, the biggest change is figuring out where to write in a signal handler, as described in your comment; the rest comes after that's figured out.

@stuhood

stuhood Sep 25, 2018

Member

having an actual signal handler for SIGUSR2 instead of using faulthandler

If you haven't already, take a look at the restrictions on signal handlers: https://docs.python.org/3/library/faulthandler.html ...

The fault handler is called on catastrophic cases and therefore can only use signal-safe functions (e.g. it cannot allocate memory on the heap). Because of this limitation traceback dumping is minimal compared to normal Python tracebacks:
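The restriction quoted above can be seen directly: `faulthandler.register()` installs a C-level handler that only writes a minimal, signal-safe stack dump, and (unlike a fatal handler) leaves the process running. A quick POSIX-only demonstration, analogous to the SIGUSR2 usage in this PR:

```python
import faulthandler
import os
import signal
import tempfile

trace_file = tempfile.TemporaryFile(mode='w+')
# Register a non-fatal dump handler: the process keeps running afterwards.
faulthandler.register(signal.SIGUSR2, file=trace_file, chain=False)
os.kill(os.getpid(), signal.SIGUSR2)  # triggers the dump immediately
faulthandler.unregister(signal.SIGUSR2)

trace_file.seek(0)
dump = trace_file.read()
trace_file.close()
# The dump is terse compared to a normal Python traceback.
assert 'most recent call first' in dump
```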

@cosmicexplorer

cosmicexplorer Sep 25, 2018

Contributor

That's extremely helpful! As an aside, I have used the glibc docs on signal handlers before, and found the section on re-entrancy super illuminating for understanding what's allowed in a signal handler and why; it applies to other languages too. I will peruse the Python docs you linked: I was looking for exactly that, and literally have a tab open to that page, but was only scanning for method names and didn't read it. Thanks!

  safe_mkdir(destination)
except Exception as e:
  # NB: When this class sets up excepthooks, raising this should be safe, because we always
  # have a destination to log to (os.getcwd() if not otherwise set).

@stuhood

stuhood Sep 24, 2018

Member

Now that this field defaults to None this comment isn't accurate.

But regardless: I think that if this is refactored to send the signal-handler output to the exception log, it won't matter, because we'll need to eagerly open the exception/signal log in order to register it with faulthandler.

@cosmicexplorer

cosmicexplorer Sep 25, 2018

Contributor

Now that this field defaults to None this comment isn't accurate.

Will update.

But regardless: I think that if this is refactored to send the signal-handler output to the exception log, it won't matter, because we'll need to eagerly open the exception/signal log in order to register it with faulthandler.

I'm understanding this as: we need to handle "opening the exception log in a signal/exception handler" anyway so in doing so we don't have to worry about the validity of the exception log in other methods (because it'd be handled specifically somewhere). Which is cool.

@stuhood

stuhood Sep 25, 2018

Member

"opening the exception log in a signal/exception handler" anyway

See the faulthandler docs linked above.

# TODO: this is messy. Is it less surprising to have the Exiter modified during construction, or
# in the run() method we do here?
ExceptionSink.set_destination(self._global_options.pants_workdir)
self._exiter.should_print_backtrace = self._global_options.print_exception_stacktrace

@stuhood

stuhood Sep 24, 2018

Member

Question I do not know the answer to: should the should_print_backtrace property be on the ExceptionSink instead? It applies to the current errstream, rather than to the Exiter, per-se.

@cosmicexplorer

cosmicexplorer Sep 25, 2018

Contributor

I moved this around many times, including letting the traceback formatter be None to denote not printing tracebacks. On master, should_print_backtrace applies only to the exception message printed to stderr on exit, not to the log message, so this code conforms to the current status, which also makes the most sense to me (I was going to delete it, but obviously this is required for the specific case of exiting with error). We take an Exiter as a param in order to avoid subsuming exiting as well as fatal-logging functionality. The current errstream is an implementation detail, and by the way it's not used yet; I need to make ExceptionSink call its Exiter with the out arg provided.

I don't think should_print_backtrace should be a property of the errstream or the ExceptionSink as opposed to the Exiter unless there are more situations besides "we are printing an error message on an unexpected error, possibly to a terminal" that we would want should_print_backtrace to apply to (which I'm all ears for, but don't see right now).

Note to self to make ExceptionSink use the out arg when exiting.


with maybe_profiled(os.environ.get('PANTSC_PROFILE')):
  try:
    PantsRunner(exiter, start_time=start_time).run()
  except KeyboardInterrupt:
    # TODO: ExceptionSink could probably take care of this too, as well as converting SIGINT into

@stuhood

stuhood Sep 24, 2018

Member

Not sure about that one. Where possible, I feel like treating the Exiter as a high level concept associated with a particular run, and ExceptionSink as the "always available manager of exception logs (and maybe an err stream if it's available)" is a cleanish separation.

With regard to handling keyboard interrupts: those are pretty clearly a per-run concept, and so if possible, their handling should be part of the Exiter.

@cosmicexplorer

cosmicexplorer Sep 25, 2018

Contributor

Hm, ok, defining it like that I think is better than what I had in my head, which was "ExceptionSink makes sure your logs are logged" (subsuming trapping as well as logging -- the first half of "ExceptionSink"). If we can move global state e.g. changing signal handlers to the Exiter, or to its file, I think that's a great move.

@cosmicexplorer cosmicexplorer force-pushed the cosmicexplorer:catch-everything branch from 88df4d1 to fd183ec Sep 26, 2018

if log_location.pid is None:
  raise cls.ExceptionSinkError(
    "No-op call to reset_log_location(): argument was {!r}."
    .format(log_location))

@baroquebobcat

baroquebobcat Sep 26, 2018

Contributor

I found this message confusing. I don't think this is a noop. Maybe it should be "log_location must specify either a pid or a directory, but both were None"

Alternatively, you could change LogLocation#__new__ so that it throws when both are None. That would make it easier to trace where the LogLocation in question came from.
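A sketch of the suggested approach (a hypothetical `LogLocation` as a plain namedtuple subclass; the PR used pants' `datatype` helper and later removed this type entirely): validating eagerly in `__new__` makes a bad value fail where it is constructed, and the pid can be defaulted there too.

```python
import os
from collections import namedtuple


class LogLocation(namedtuple('LogLocation', ['pid', 'log_dir'])):
  """Hypothetical sketch: validate and default fields at construction time."""

  def __new__(cls, pid=None, log_dir=None):
    if pid is None and log_dir is None:
      raise ValueError(
        'LogLocation must specify either a pid or a directory, but both were None.')
    if pid is None:
      # Default the pid to the current process.
      pid = os.getpid()
    return super(LogLocation, cls).__new__(cls, pid, log_dir)
```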

@cosmicexplorer

cosmicexplorer Sep 26, 2018

Contributor

The difficulty here is that I'm overloading LogLocation (it used to be called LogLocationRequest) for both setting state and getting state, and those should have different semantics, like you describe. I think the clear answer is to split the two out (one for exceptions_log_path and one for reset_log_location).

@cosmicexplorer

cosmicexplorer Sep 26, 2018

Contributor

Done!

# Ensure the directory is suitable for writing to, or raise.
cls._check_or_create_new_destination(log_location.log_dir)
if log_location.pid is None:
  log_location = log_location.copy(pid=os.getpid())

@baroquebobcat

baroquebobcat Sep 26, 2018

Contributor

You could also default the pid in this way in LogLocation#__new__

@cosmicexplorer

cosmicexplorer Sep 26, 2018

Contributor

Will do as part of above.

@cosmicexplorer cosmicexplorer force-pushed the cosmicexplorer:catch-everything branch 5 times, most recently from 20effeb to a2cf6d7 Sep 27, 2018

if exception_text:
yield exception_text

def broadcast_signal_to_client(self, signum):

@stuhood

stuhood Sep 28, 2018

Member

IMO, we should refactor out all changes to Ctrl+C handling, because that sends signals from the nailgun client to the nailgun server. It should be easy to add later now that we have the pid and pgrp.

@cosmicexplorer

cosmicexplorer Sep 29, 2018

Contributor

Reverted the changes to this file.

@@ -266,20 +261,20 @@ def post_fork_child(self):
# the `pantsd.log`. This ensures that in the event of e.g. a hung but detached pantsd-runner
# process that the stacktrace output lands deterministically in a known place vs to a stray
# terminal window.
self._exiter.set_except_hook(sys.stderr)
ExceptionSink.reset_log_location(LogLocation(os.getpid(), os.getcwd()))

@stuhood

stuhood Sep 28, 2018

Member

We should keep the location we have before this call, because otherwise we will switch away from the configured location to the cwd here.

We do need to set the pid though, because we just finished forking, and definitely have a new pid.

@cosmicexplorer

cosmicexplorer Sep 29, 2018

Contributor

I have removed the ability to do anything about the pid at all. This is still broken for a totally different reason but won't be in a bit.

return super(GetLogLocationRequest, cls).__new__(cls, log_dir=log_dir, pid=pid)


class LogLocation(datatype([

@stuhood

stuhood Sep 28, 2018

Member

I think this datatype might be overkill for this case, especially since these properties can be changed independently.

@cosmicexplorer

cosmicexplorer Sep 29, 2018

Contributor

This datatype is gone.

@ity
Member

ity left a comment

Let's address these and I'll re-review.

raise KeyboardInterrupt('User interrupted execution with control-c!')
# TODO: determine the appropriate signal-safe behavior here (to avoid writing to our file
# descriptors re-entrantly, which raises an IOError). This is a best-effort solution.
if cls._is_handling_signal:

@ity

ity Sep 28, 2018

Member

A docstring here explaining which signals this actually handles would be great.

@cosmicexplorer

cosmicexplorer Sep 29, 2018

Contributor

Will do!

@cosmicexplorer

cosmicexplorer Sep 30, 2018

Contributor

(done!)



logger = logging.getLogger(__name__)


class RunTrackingExiter(Exiter):

@ity

ity Sep 28, 2018

Member

Let's extract the RunTrackingExiter changes out of local_pants_runner, since they're unrelated to this change, and keep this PR limited to pants_daemon_runner changes.

@cosmicexplorer

cosmicexplorer Sep 29, 2018

Contributor

The changes from this file have now also been reverted entirely!

@@ -34,7 +34,9 @@ def __init__(self, exiter, args=None, env=None, start_time=None):
def run(self):
  options_bootstrapper = OptionsBootstrapper(env=self._env, args=self._args)
  bootstrap_options = options_bootstrapper.get_bootstrap_options()
  ExceptionSink.set_destination(bootstrap_options.for_global_scope().pants_workdir)
  ExceptionSink.reset_log_location(LogLocation.from_options_for_current_process(

@ity

ity Sep 28, 2018

Member

as we discussed, separating out pid and log location between daemon_runner and pants_runner is

@cosmicexplorer

cosmicexplorer Sep 29, 2018

Contributor

This has been changed to just update the location and the exiter from the bootstrap options.



logger = logging.getLogger(__name__)


class RemoteExiter(Exiter):

@ity

ity Sep 28, 2018

Member

this may not be required at all, since the server (pantsd) never sends anything to the client

@cosmicexplorer

cosmicexplorer Sep 29, 2018

Contributor

Indeed! This file has been reverted entirely!

logger.fatal('lost active connection to pantsd!')
raise_with_traceback(self._extract_remote_exception(pantsd_handle.pid, e))
error_log_msg = '\nlost active connection to pantsd!'
ExceptionSink.log_exception(error_log_msg)

@ity

ity Sep 28, 2018

Member

this won't be required at all, since you are already handling this earlier

@cosmicexplorer

cosmicexplorer Sep 29, 2018

Contributor

This file has been reverted entirely!

def _restart_pantsd(self):
  return PantsDaemon.Factory.restart(bootstrap_options=self._bootstrap_options)

def _maybe_launch_pantsd(self):
  return PantsDaemon.Factory.maybe_launch(bootstrap_options=self._bootstrap_options)

def run(self, args=None):
  self._setup_logging()
  # Redirect fatal error logging to the current workdir, set the stream to log stacktraces to on

@ity

ity Sep 28, 2018

Member

all of this can be reverted, since we are removing RemoteExiter etc. from this change.

@cosmicexplorer

cosmicexplorer Sep 29, 2018

Contributor

This file has been reverted entirely!

@@ -430,7 +430,13 @@ def _render_tool_detail(self, workunit, title, class_prefix='greyed', initially_

def _emit(self, s):
  """Append content to the main report file."""
  if os.path.exists(self._html_dir):  # Make sure we're not immediately after a clean-all.
    report_file_probably_writable = (

@ity

ity Sep 28, 2018

Member

we can remove this too, as we are removing any RunTracker changes from this.

@cosmicexplorer

cosmicexplorer Sep 29, 2018

Contributor

Reverted!

@@ -63,8 +185,26 @@ def __init__(self, exiter, args, env, bootstrap_options, stdin=None, stdout=None
def _trapped_signals(self, client):
  """A contextmanager that overrides the SIGINT (control-c) and SIGQUIT (control-\) handlers
  and handles them remotely."""
  # TODO: if this control-c is done at the wrong time (before the pantsd-runner process begins and

@ity

ity Sep 28, 2018

Member

Let's extract all the ctrl-c changes from this PR into a separate one.

@cosmicexplorer

cosmicexplorer Sep 29, 2018

Contributor

This file has been reverted entirely!

@cosmicexplorer cosmicexplorer force-pushed the cosmicexplorer:catch-everything branch 3 times, most recently from eddcea0 to e2ac5a2 Sep 30, 2018

cosmicexplorer added some commits Sep 24, 2018

first attempt at centralizing more global error logging state in ExceptionSink

- make it handle SIGUSR2 and unhandled exceptions so we can move that logic out of Exiter
@ity
Member

ity left a comment

nice work! this is looking great to me - will wait for your changes to +1.

@@ -157,6 +161,8 @@ def assert_runner(self, workdir, config, cmd, extra_config={}, extra_env={}, exp
elapsed = time.time() - start_time
print(bold(cyan('\ncompleted in {} seconds'.format(elapsed))))

# TODO: uncomment this and add an issue link!
run_count = self._run_count(workdir)

@ity

ity Oct 2, 2018

Member

this was the culprit ;)

@cosmicexplorer

Contributor

cosmicexplorer commented Oct 2, 2018

Just tried cherry-picking, and remembered we'll probably need #6533 and #6539 first, trying that now (EDIT: that worked, so this should be fine to hotfix onto 1.10.x).

@cosmicexplorer cosmicexplorer force-pushed the cosmicexplorer:catch-everything branch from 898e34c to f70e7c0 Oct 4, 2018

@stuhood

stuhood approved these changes Oct 4, 2018

Member

stuhood left a comment

Thanks, this looks good. It would be great to continue to (lightly) refine the relationship between the Exiter and the ExceptionSink to make their individual responsibilities clearer... but it looks logically correct from my end.

@@ -51,5 +52,6 @@ def execute(self):
  msg = '{cmdline} ... exited non-zero ({code})'.format(cmdline=cmdline, code=result)
  raise TaskError(msg, exit_code=result)
except KeyboardInterrupt:
  po.send_signal(signal.SIGINT)
  # The process may still have exited, even if we were interrupted.
  safe_kill(po.pid, signal.SIGINT)
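A `safe_kill` of the kind used here might look like the following (hypothetical sketch; the real pants helper may differ): it tolerates the race where the process has already exited by the time the signal is sent.

```python
import errno
import os
import signal


def safe_kill(pid, signum):
  """Send signum to pid, swallowing the 'no such process' race.

  Sketch under assumption: the process may exit between our decision to
  signal it and the os.kill() call, in which case ESRCH is expected and
  harmless; any other error is still raised.
  """
  try:
    os.kill(pid, signum)
  except OSError as e:
    if e.errno != errno.ESRCH:
      raise
```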

@stuhood

stuhood Oct 4, 2018

Member

You can leave this in this PR, but a reminder: please do not conflate independent changes in one PR like this in the future. If this line causes a bug, or if the rest of the PR contains a bug, they won't be able to be cherry-picked independently. And it also makes reviewing more challenging.

@cosmicexplorer

cosmicexplorer Oct 4, 2018

Contributor

This was necessary to pass testing. It may not be necessary now with the removal of the signal handlers.

@stuhood

stuhood Oct 4, 2018

Member

Again, I'm fine with leaving it... just a note for future PRs.

@cosmicexplorer

cosmicexplorer Oct 4, 2018

Contributor

If it's required to pass testing because of the now-removed changes in signal handling, that would point to the signal handling (correctly) being an instance of changing multiple things at once (along with the logging changes in this PR), which I agree is made very clear here. But I don't understand why this specific change is something to avoid if we needed it to pass CI?

@cosmicexplorer

cosmicexplorer Oct 5, 2018

Contributor

Reverted!

  # Create the directory if possible, or raise if not writable.
  cls._check_or_create_new_destination(new_log_location)
else:
  # Interpret a None argument to mean to just refresh the file handles for the current pid.

@stuhood

stuhood Oct 4, 2018

Member

This method is never called with None, afaict, so would be good to remove this case to improve clarity.

@cosmicexplorer

cosmicexplorer Oct 4, 2018

Contributor

Yes: after talking with you last week about where we need to reset the logs (whenever we get bootstrap options, or whenever we change pid), we can remove that now.

@cosmicexplorer

cosmicexplorer Oct 5, 2018

Contributor

Removed!

This is where the error message on exit will be printed to as well.
"""
# NB: mutate process-global state!
# TODO: chain=True will log tracebacks to previous values of the trace stream as well -- what

@stuhood

stuhood Oct 4, 2018

Member

The faulthandler docs indicate that you can call unregister or disable to remove previous copies of handlers. Is that worth doing here?

@cosmicexplorer

cosmicexplorer Oct 4, 2018

Contributor

At first I had it doing that, but ended up just copying the current code (hence the TODO, because writing to maybe-closed maybe-unrelated streams after explicitly changing it sounds like the wrong thing to do). I think it would make more sense to unregister and disable here.

@cosmicexplorer

cosmicexplorer Oct 5, 2018

Contributor

Done!

out=cls._interactive_output_stream)

@classmethod
def _log_unhandled_exception_and_exit(cls, exc_class=None, exc=None, tb=None, add_newline=False):

@stuhood

stuhood Oct 4, 2018

Member

This method is called from outside of the class, and should maybe be public.

It's also worth considering whether this should (mostly) stay in the Exiter, and just call back to the ExceptionSink for the purposes of writing to the exception logs...? I'm not very clear on that though, given the fact that they both call one another at various times.

@cosmicexplorer

cosmicexplorer Oct 4, 2018

Contributor

I think I like that idea a lot. I think it specifically should be done, along with the comment on the Exiter holding onto its output stream, in a followup. I will continue addressing your comments and seeing what can be done here without lots more code changes.

@cosmicexplorer

cosmicexplorer Oct 5, 2018

Contributor

Broke out #6596!

faulthandler.register(signal.SIGUSR2, interactive_output_stream, chain=True)

# NB: mutate the class variables!
# We don't *necessarily* need to keep a reference to this, but we do here for clarity.

@stuhood

stuhood Oct 4, 2018

Member

This field is used in _exit_with_failure, so this comment isn't accurate. As commented in _log_unhandled_exception_and_exit, maybe the Exiter should just hold onto the stream it wants to use to render a failure to the user, and the ExceptionSink should just handle the exception logs?

@cosmicexplorer

cosmicexplorer Oct 4, 2018

Contributor

That sounds like a fantastic idea.

@cosmicexplorer

cosmicexplorer Oct 5, 2018

Contributor

Broke out #6596!

@@ -488,6 +488,7 @@ def double_fork():
self.purge_metadata()
self.pre_fork(**pre_fork_opts or {})
is_parent, is_child = fork_func()
# No arguments reuses the same log location, but with the new pid after forking.

@stuhood

stuhood Oct 4, 2018

Member

I think this comment is in the wrong place?

@cosmicexplorer

cosmicexplorer Oct 5, 2018

Contributor

Deleted!

out = out or sys.stderr
if not is_fileobj_definitely_closed(out):
  print(msg, file=out)
# NB: Ensure we write everything out in case it's not an unbuffered stream like stderr.

@stuhood

stuhood Oct 4, 2018

Member

I think that f.close() implicitly flushes.

@cosmicexplorer

cosmicexplorer Oct 4, 2018

Contributor

If we close stderr here, we may see an error when invoking pants as a pex, because the pex executor will call sys.stderr.flush() directly (haven't checked yet, but that was the error we saw in our internal CI). Making pex's direct writes to stderr configurable is something I believe could be extremely useful, maybe. I'll make a followup ticket.

@cosmicexplorer

cosmicexplorer Oct 4, 2018

Contributor

As in, it calls that in the body of a finally: after the pex exits, and then raises an exception if stderr is closed.

@cosmicexplorer

cosmicexplorer Oct 5, 2018

Contributor

The code is now all the same, but wrapped in a try/except which calls logger.exception(e). If it is almost definitely better to close it here, I can do it -- just not sure if there's a reason to do it over the flush alone. I've left a detailed TODO above the flush() call about this.
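The trade-off being discussed, in miniature (plain tempfile demo, not pants code): `close()` does implicitly flush, but it also makes any later write fail, which is why flushing alone is friendlier to a process-global stream like `sys.stderr` that a pex `finally:` block may still touch afterwards.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'out.log')

# close() implicitly flushes...
f = open(path, 'w')
f.write('fatal error\n')
f.close()
assert open(path).read() == 'fatal error\n'

# ...but flush() alone persists the data while leaving the stream
# writable for anyone else who still holds it.
g = open(path, 'a')
g.write('more\n')
g.flush()
assert not g.closed
assert open(path).read() == 'fatal error\nmore\n'
g.close()
```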



def is_fileobj_definitely_closed(fileobj):
"""Return whether fileobj points to a file object which is definitely closed.

@stuhood

stuhood Oct 4, 2018

Member

This method is a bit too context-specific to put in util I think.

@cosmicexplorer

cosmicexplorer Oct 4, 2018

Contributor

Agreed, it's not that pythonic either.

@cosmicexplorer

cosmicexplorer Oct 5, 2018

Contributor

Deleted!

self.nailgun_client.send_control_c()
mock_kill.assert_not_called()

@mock.patch('os.kill', **PATCH_OPTS)

@stuhood

stuhood Oct 4, 2018

Member

This test seems useful... would be good to preserve if possible. If it was broken beforehand, leaving it in as xfail would be good, because there is a race between getting a pid from the server and Ctrl+C arriving at the client that means it's important to be able to noop.

@cosmicexplorer

cosmicexplorer Oct 5, 2018

Contributor

That race is erased in #6579 with the pid/pgrp being required before the rest of the communication continues, btw. I have added these back, fixed the mock so they pass, and added an additional test for the new maybe_send_signal() method.

self.assertRegexpMatches(waiter_run.stderr_data, """\
Signal {signum} was raised\\. Exiting with failure\\.
\\(backtrace omitted\\)
""".format(pid=parent_runner_pid, signum=signal.SIGTERM))

@stuhood

stuhood Oct 4, 2018

Member

The pid isn't used here, but that's probably fine: telling someone the dead pid is less necessary than rendering the error.

@cosmicexplorer

cosmicexplorer Oct 4, 2018

Contributor

This is an artifact of a previous implementation that logged the pid with a timestamp to stderr; that is now only supposed to occur when writing to a log file, which is why this error stayed hidden. I think it can be removed.

@cosmicexplorer

cosmicexplorer Oct 5, 2018

Contributor

The pid has been removed here, replaced with a note about why we only need to check the stderr.

@cosmicexplorer cosmicexplorer changed the title [WIP] first attempt at centralizing more global error logging state in ExceptionSink first attempt at centralizing more global error logging state in ExceptionSink Oct 4, 2018

cosmicexplorer added some commits Oct 5, 2018

cosmicexplorer added some commits Oct 5, 2018

@ity

ity approved these changes Oct 5, 2018

@stuhood
Member

stuhood left a comment

The last commit re-xfails the test that this was supposed to be fixing. We need that test to be passing here. Let's discuss when you're available.

cosmicexplorer added some commits Oct 5, 2018

@stuhood

stuhood approved these changes Oct 8, 2018

Member

stuhood left a comment

Thanks Danny.

@cosmicexplorer

Contributor

cosmicexplorer commented Oct 8, 2018

This is the only remaining failure -- this is what the safe_kill() call was for earlier. I have a list of quick followups (separate from #6596, which doesn't need to happen immediately) but given the size of this PR I'm just going to fix the failure, land this, and then make the followup.

use safe_kill() in case the process has already stopped
this fixes the keyboardinterrupt integration test

@cosmicexplorer cosmicexplorer merged commit 295ea60 into pantsbuild:master Oct 9, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

cosmicexplorer added a commit that referenced this pull request Oct 10, 2018

Fatal error logging followup fixes (#6610)
### Problem

This PR contains several deemed-necessary-by-me-personally fixes to fatal error logging which weren't put into #6552 because that diff was already massive. This should be considered the same PR as that one.

### Solution

- Default the `logs/` directory to be `.pants.d/logs` so we don't litter the buildroot with a top-level `logs/` directory.
- Add a timestamp to any fatal errors printed to the terminal to more easily debug race conditions.
- Add the process title (from `setproctitle`) to the fatal error log message (this makes it more clear when the process is pantsd, compared to the `sys.argv`, which are arbitrary).
- Remove any mention of exception stacktraces from Exiter. 
- Add `ExceptionSink.reset_should_print_backtrace()` (with a new mutable field `_should_print_backtrace`) to determine whether to print stacktraces to the terminal on exit.
- Testing for all of the above.

cosmicexplorer added a commit that referenced this pull request Oct 10, 2018

first attempt at centralizing more global error logging state in ExceptionSink (#6552)

See #6530. This is bullet point 2 of that issue.

- rename `ExceptionSink#set_destination()` to `reset_log_location()`.
- (re)?acquire open file handles and persist them as `ExceptionSink` private fields so we can write to them in signal handlers.
  - use `faulthandler.enable()` to log fatal signals (`SIGABRT`, `SIGSEGV`, ...) and stacktraces to the process-specific error stream from #6539.
- add `reset_exiter()` and delegate to the exiter to exit on an unhandled exception.
- add `reset_interactive_output_stream()` to change which stream the Exiter is provided as the `out` argument, and to enable stacktraces to be sent to that stream when the process receives SIGUSR2 (on par with current functionality)
- move everything besides exiting out of the Exiter class

We can un-xfail [the integration test](https://github.com/pantsbuild/pants/blob/f70e7c0b7d55a389d31b9b3b706d0901bc9f655e/tests/python/pants_test/pantsd/test_pantsd_integration.py#L416) introduced in #6539. This makes it significantly easier to diagnose pantsd communication errors and their causes, both locally and in CI.
