Crash pailgun client on SIGINT if we haven't received the pid yet #7924
@@ -33,6 +33,7 @@ def __init__(self, pailgun_client, timeout=1, *args, **kwargs):
super(PailgunClientSignalHandler, self).__init__(*args, **kwargs)

def _forward_signal_with_timeout(self, signum, signame):
# TODO Consider not accessing the private function here, or making it public.
It's not clear which private function is being accessed...
else:
# NB: We consider not having received a PID yet as "not having started substantial work".
# So in this case, we let the client die gracefully, and the server handle the closed socket.
super(PailgunClientSignalHandler, self).handle_sigint(signum, _frame)
I'm not sure why this fixes #7920. I was thinking that if the client hasn't received the pantsd pid yet, it wouldn't know which pid to kill, so self._forward_signal_with_timeout() wouldn't be crashing the daemon anyway. Am I mistaken? Is it possible to add a test case?
This branch handles the case where we haven't received the pid yet: we just crash the client (via the usual SIGINT handling mechanism) and let the closed socket crash the request on the server side.
Oh, yes!
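To make the branch discussed above concrete, here is a minimal sketch of the client-side decision, assuming the client stores the daemon's PID in an attribute that stays `None` until the daemon sends it. `HypotheticalClientSignalHandler`, `remote_pid` and the injectable `forward` callable are illustrative names, not the actual pants classes; the fallback raises `KeyboardInterrupt`, which is what Python's default SIGINT handling does.

```python
import os
import signal


class HypotheticalClientSignalHandler:
    """Sketch only: not the real PailgunClientSignalHandler."""

    def __init__(self, forward=os.kill):
        # Assumption: None until the daemon has told us its PID.
        self.remote_pid = None
        # Injectable so the sketch can be exercised without killing anything.
        self._forward = forward

    def handle_sigint(self, signum, _frame):
        if self.remote_pid is not None:
            # The daemon has started substantial work: forward the signal to it.
            self._forward(self.remote_pid, signum)
        else:
            # No PID yet: let the client die; the server sees the closed socket.
            raise KeyboardInterrupt("Interrupted before receiving the remote pid")


if __name__ == "__main__":
    handler = HypotheticalClientSignalHandler()
    signal.signal(signal.SIGINT, handler.handle_sigint)
```

The design point is that the client never needs the PID to abort early: dying on its own side is enough, because the closed connection invalidates the pending request.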
212e869 to ffccd8c
ffccd8c to 10e8aa9
Have rebased on top of #7944.
Lots of network flakiness. I'll wait for everything to complete and then go over it with a fine-tooth comb to make sure that none of the failures are real.
Unfortunately, one of the failures looks legit: https://gist.github.com/stuhood/d8cb70998b35bb9e91b266ced935dfaf
I'll restart it to see if it was flaky, but the error message (expected X but got Y) looks potentially relevant.
I have run this 5 times locally and it didn't reproduce.
f41d1c8 to b32a6d0
b32a6d0 to a79af85
3024a28 to b5bd4df
Merging despite the integration shard failures; they seem to be instances of #7952.
…ntsbuild#7924)

## Problem

If the client hadn't received the remote process PID when it received a Ctrl-C signal, it was unable to forward the signal to the daemon. The client only receives the PID once the actual pants work starts, so there was no way of Ctrl-C-ing out of pants while a request was waiting to acquire the pantsd lock. More context and examples here: pantsbuild#7920

## Solution

Crash the client when we receive Ctrl-C and haven't received a remote pid yet, so that the request is no longer valid and the server can stop trying to serve it.

## Result

Ctrl-C now works when waiting for the daemon lock, and it doesn't kill the daemon.
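The Solution relies on the server noticing the closed socket while the request is still waiting for the pantsd lock. A minimal sketch of that idea follows, assuming a request loop that polls the lock with a timeout and peeks at the client socket between attempts. `client_disconnected` and `wait_for_lock_or_disconnect` are hypothetical names; this is not how pantsd is actually implemented.

```python
import socket
import threading


def client_disconnected(conn):
    """Return True if the peer has closed its end of `conn` (hypothetical helper)."""
    prev_timeout = conn.gettimeout()
    conn.setblocking(False)
    try:
        # Peek so that any real request bytes stay in the buffer.
        data = conn.recv(1, socket.MSG_PEEK)
    except BlockingIOError:
        return False  # Still open, just nothing to read right now.
    finally:
        conn.settimeout(prev_timeout)
    return data == b""  # An empty read means EOF: the client went away.


def wait_for_lock_or_disconnect(lock, conn, poll_interval=0.05):
    """Try to acquire `lock`, giving up if the client disconnects first."""
    while True:
        if lock.acquire(timeout=poll_interval):
            return True
        if client_disconnected(conn):
            return False


if __name__ == "__main__":
    busy = threading.Lock()
    busy.acquire()  # Another request holds the lock.
    server_end, client_end = socket.socketpair()
    client_end.close()  # The client was Ctrl-C'd before getting a pid.
    print(wait_for_lock_or_disconnect(busy, server_end))
```

Polling with a short timeout keeps the waiting request responsive to a disconnect without needing a separate watcher thread.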
…ntsbuild#7924 (pantsbuild#7958)

I accidentally merged pantsbuild#7924 without checking, and it didn't have all the wording changes that we wanted from pantsbuild#7913. This PR adds those changes back. Merging despite red CI because the only failing shard is a known issue (pantsbuild#7952).
Fixes #7920