Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash pailgun client on SIGINT if we haven't received the pid yet #7924

Merged
merged 5 commits into from Jun 26, 2019

Conversation

Projects
None yet
6 participants
@blorente
Copy link
Contributor

commented Jun 21, 2019

Fixes #7920

@blorente blorente requested review from stuhood, cosmicexplorer, illicitonion and ity and removed request for stuhood and cosmicexplorer Jun 21, 2019

@blorente blorente added the pantsd label Jun 21, 2019

@blorente blorente requested a review from patliu85 Jun 21, 2019

@Eric-Arellano Eric-Arellano added this to the 1.17.x milestone Jun 21, 2019

@@ -33,6 +33,7 @@ def __init__(self, pailgun_client, timeout=1, *args, **kwargs):
super(PailgunClientSignalHandler, self).__init__(*args, **kwargs)

def _forward_signal_with_timeout(self, signum, signame):
# TODO Consider not accessing the private function here, or making it public.

This comment has been minimized.

Copy link
@stuhood

stuhood Jun 22, 2019

Member

It's not clear which private function is being accessed...

else:
# NB: We consider not having received a PID yet as "not having started substantial work".
# So in this case, we let the client die gracefully, and the server handle the closed socket.
super(PailgunClientSignalHandler, self).handle_sigint(signum, _frame)

This comment has been minimized.

Copy link
@cosmicexplorer

cosmicexplorer Jun 24, 2019

Contributor

I'm not sure why this works to fix #7920, because I think thinking that if the client hasn't received the pantsd pid yet, it wouldn't know which pid to kill, so self._forward_signal_with_timeout() wouldn't be crashing the daemon anyway -- am I mistaken? Is it possible to add a test case?

This comment has been minimized.

Copy link
@blorente

blorente Jun 24, 2019

Author Contributor

This case is the case where we haven't received the pid yet, so we just crash the client (via the usual handling sigint mechanism) in this case and let the closing socket crash the request on the server side.

This comment has been minimized.

Copy link
@cosmicexplorer

cosmicexplorer Jun 24, 2019

Contributor

Oh, yes!

else:
# NB: We consider not having received a PID yet as "not having started substantial work".
# So in this case, we let the client die gracefully, and the server handle the closed socket.
super(PailgunClientSignalHandler, self).handle_sigint(signum, _frame)

This comment has been minimized.

Copy link
@cosmicexplorer

cosmicexplorer Jun 24, 2019

Contributor

Oh, yes!

@ity

ity approved these changes Jun 24, 2019

@blorente blorente force-pushed the blorente:blorente/7920/ctrl-c branch from 212e869 to ffccd8c Jun 24, 2019

@stuhood stuhood force-pushed the blorente:blorente/7920/ctrl-c branch from ffccd8c to 10e8aa9 Jun 24, 2019

@stuhood

This comment has been minimized.

Copy link
Member

commented Jun 24, 2019

Have rebased on top of #7944.

@stuhood

This comment has been minimized.

Copy link
Member

commented Jun 24, 2019

Lots of network flakiness. Will wait for everything to complete and then go over it with a fine tooth comb to try and ensure that none of the failures are real.

@stuhood

This comment has been minimized.

Copy link
Member

commented Jun 25, 2019

One of the failures looks legit, unfortunately: https://gist.github.com/stuhood/d8cb70998b35bb9e91b266ced935dfaf

I'll restart it to see if it was flaky, but the error message (expected X but got Y) looks potentially relevant.

@blorente

This comment has been minimized.

Copy link
Contributor Author

commented Jun 25, 2019

One of the failures looks legit, unfortunately: https://gist.github.com/stuhood/d8cb70998b35bb9e91b266ced935dfaf

I'll restart it to see if it was flaky, but the error message (expected X but got Y) looks potentially relevant.

I have tried to run this 5 times locally and it didn't repro.
The error shows pantsd crashing unexpectedly before the client, which makes really think that this was not an issue with this change. There is another test that fails, so will focus on that before diving into this.

@blorente blorente force-pushed the blorente:blorente/7920/ctrl-c branch from f41d1c8 to b32a6d0 Jun 25, 2019

@blorente blorente force-pushed the blorente:blorente/7920/ctrl-c branch from b32a6d0 to a79af85 Jun 25, 2019

@blorente

This comment has been minimized.

Copy link
Contributor Author

commented Jun 25, 2019

Needs to either ignore the failing integration test shard and land anyways (#7952) or wait for #7953 to land and then rebase on top of it.

blorente and others added some commits Jun 21, 2019

@ity ity force-pushed the blorente:blorente/7920/ctrl-c branch from 3024a28 to b5bd4df Jun 25, 2019

@blorente

This comment has been minimized.

Copy link
Contributor Author

commented Jun 26, 2019

Merging despite the integration shard failures, they seem instances of #7952

@blorente blorente merged commit 8edbcef into pantsbuild:master Jun 26, 2019

1 check failed

continuous-integration/travis-ci/pr The Travis CI build failed
Details

blorente added a commit that referenced this pull request Jun 26, 2019

Add some of the wording from #7913 that wasn't merged in #7924 (#7958)
I accidentally merged #7924 without checking, and it didn't have all the wording changes that we wanted from #7913. This PR adds back those changes.

Merging despite red CI because the only shard that fails is known (#7952)

blorente added a commit to blorente/pants that referenced this pull request Jun 26, 2019

Crash pailgun client on SIGINT if we haven't received the pid yet (pa…
…ntsbuild#7924)

## Problem

If the client hadn't received the remote process PID when it received a Ctrl-C signal, it was unable to forward it to the daemon. The client receives the PID once the actual pants work starts, so there was no way of Ctrl-C-ing out of pants when a request was waiting to aquire the pantsd lock. More context and examples here: pantsbuild#7920

## Solution

Crash the client when we receive ctrl-c and haven't received a remote pid yet, so that the request is no longer valid, and the server can stop trying to serve it.

## Result

Ctrl-C now works when waiting for the daemon lock, and it doesn't kill the daemon.

blorente added a commit to blorente/pants that referenced this pull request Jun 26, 2019

Add some of the wording from pantsbuild#7913 that wasn't merged in pa…
…ntsbuild#7924  (pantsbuild#7958)

I accidentally merged pantsbuild#7924 without checking, and it didn't have all the wording changes that we wanted from pantsbuild#7913. This PR adds back those changes.

Merging despite red CI because the only shard that fails is known (pantsbuild#7952)

@illicitonion illicitonion removed this from the 1.17.x milestone Jul 10, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.