Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash pailgun client on SIGINT if we haven't received the pid yet #7924

Merged
merged 5 commits into from Jun 26, 2019

Conversation

blorente
Copy link
Contributor

Fixes #7920

@blorente blorente requested review from stuhood, cosmicexplorer, illicitonion and ity and removed request for stuhood and cosmicexplorer June 21, 2019 19:50
@blorente blorente requested a review from patliu85 June 21, 2019 19:50
@Eric-Arellano Eric-Arellano added this to the 1.17.x milestone Jun 21, 2019
@@ -33,6 +33,7 @@ def __init__(self, pailgun_client, timeout=1, *args, **kwargs):
super(PailgunClientSignalHandler, self).__init__(*args, **kwargs)

def _forward_signal_with_timeout(self, signum, signame):
# TODO Consider not accessing the private function here, or making it public.
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear which private function is being accessed...

else:
# NB: We consider not having received a PID yet as "not having started substantial work".
# So in this case, we let the client die gracefully, and the server handle the closed socket.
super(PailgunClientSignalHandler, self).handle_sigint(signum, _frame)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why this works to fix #7920, because I think thinking that if the client hasn't received the pantsd pid yet, it wouldn't know which pid to kill, so self._forward_signal_with_timeout() wouldn't be crashing the daemon anyway -- am I mistaken? Is it possible to add a test case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case is the case where we haven't received the pid yet, so we just crash the client (via the usual handling sigint mechanism) in this case and let the closing socket crash the request on the server side.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yes!

else:
# NB: We consider not having received a PID yet as "not having started substantial work".
# So in this case, we let the client die gracefully, and the server handle the closed socket.
super(PailgunClientSignalHandler, self).handle_sigint(signum, _frame)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yes!

@stuhood
Copy link
Sponsor Member

stuhood commented Jun 24, 2019

Have rebased on top of #7944.

@stuhood
Copy link
Sponsor Member

stuhood commented Jun 24, 2019

Lots of network flakiness. Will wait for everything to complete and then go over it with a fine tooth comb to try and ensure that none of the failures are real.

@stuhood
Copy link
Sponsor Member

stuhood commented Jun 25, 2019

One of the failures looks legit, unfortunately: https://gist.github.com/stuhood/d8cb70998b35bb9e91b266ced935dfaf

I'll restart it to see if it was flaky, but the error message (expected X but got Y) looks potentially relevant.

@blorente
Copy link
Contributor Author

One of the failures looks legit, unfortunately: https://gist.github.com/stuhood/d8cb70998b35bb9e91b266ced935dfaf

I'll restart it to see if it was flaky, but the error message (expected X but got Y) looks potentially relevant.

I have tried to run this 5 times locally and it didn't repro.
The error shows pantsd crashing unexpectedly before the client, which makes really think that this was not an issue with this change. There is another test that fails, so will focus on that before diving into this.

@blorente
Copy link
Contributor Author

Needs to either ignore the failing integration test shard and land anyways (#7952) or wait for #7953 to land and then rebase on top of it.

@blorente
Copy link
Contributor Author

Merging despite the integration shard failures, they seem instances of #7952

@blorente blorente merged commit 8edbcef into pantsbuild:master Jun 26, 2019
blorente added a commit that referenced this pull request Jun 26, 2019
I accidentally merged #7924 without checking, and it didn't have all the wording changes that we wanted from #7913. This PR adds back those changes.

Merging despite red CI because the only shard that fails is known (#7952)
blorente added a commit to blorente/pants that referenced this pull request Jun 26, 2019
…ntsbuild#7924)

## Problem

If the client hadn't received the remote process PID when it received a Ctrl-C signal, it was unable to forward it to the daemon. The client receives the PID once the actual pants work starts, so there was no way of Ctrl-C-ing out of pants when a request was waiting to aquire the pantsd lock. More context and examples here: pantsbuild#7920

## Solution

Crash the client when we receive ctrl-c and haven't received a remote pid yet, so that the request is no longer valid, and the server can stop trying to serve it.

## Result

Ctrl-C now works when waiting for the daemon lock, and it doesn't kill the daemon.
blorente added a commit to blorente/pants that referenced this pull request Jun 26, 2019
…ntsbuild#7924  (pantsbuild#7958)

I accidentally merged pantsbuild#7924 without checking, and it didn't have all the wording changes that we wanted from pantsbuild#7913. This PR adds back those changes.

Merging despite red CI because the only shard that fails is known (pantsbuild#7952)
@illicitonion illicitonion removed this from the 1.17.x milestone Jul 10, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[pantsd] Ctrl-C isn't forwarded properly when a pailun request is waiting for another one to finish
6 participants