Skip to content
This repository has been archived by the owner on Nov 10, 2023. It is now read-only.

Nailgun python client gets connection closed #534

Closed
dborowitz opened this issue Nov 30, 2015 · 15 comments
Closed

Nailgun python client gets connection closed #534

dborowitz opened this issue Nov 30, 2015 · 15 comments
Assignees
Labels

Comments

@dborowitz
Copy link

I'm having the issue described in d32c960, but that commit hasn't fixed the problem (I'm running at 665524d). I didn't see an open issue tracking this, so I'm filing a new one.

I went to look for the debug logging referred to in bhamiltoncx/nailgun@c2788d6, but can't find it in buck-out/log. I can reproduce this pretty reliably (25% or so of the time), so let me know where to find logs and I'll be happy to provide them.

I'm reproducing with:

rm -f buck-out/log/buck*.log; buck clean && rm -rf buck-out eclipse-out && buck build gerrit-common:server
@bhamiltoncx
Copy link
Contributor

Thanks for reporting, and sorry for the problem.

Can you just zip up the buck-out/log/ directory and include a link to that?

@davido
Copy link
Contributor

davido commented Dec 1, 2015

It worth noting, that you could easily reproduce it yourself. All you need is to clone gerrit: [1] and apply this pending Buck upgrade patch: [2].

@dborowitz
Copy link
Author

@bhamiltoncx
Copy link
Contributor

Thanks, @dborowitz. That error is helpful. It's not quite the same as the one I fixed; your error is a failure to shut down the Nailgun server.

Could you possibly provide a zip of the buck-out/log/ directory as I mentioned?

@davido, I have been trying and I've been unable to reproduce the problem. There's something about the build environment that triggers this issue. Here's the steps I tried; it worked fine.

https://gist.github.com/bhamiltoncx/8d0469f46a59c6e935b4

@bhamiltoncx
Copy link
Contributor

I also tried a second time and induced the client to kill the server like @dborowitz's gist, but it didn't reproduce the error:

https://gist.github.com/bhamiltoncx/20727de5cdc5c4b02ace

@dborowitz
Copy link
Author

The zip file is in the same gist as the console output.

@bhamiltoncx
Copy link
Contributor

The zip file is in the same gist as the console output.

Ah ha, thanks! I didn't realize you could attach binary files to gists, so my eyes just glanced over that. Reading through now.

@dborowitz
Copy link
Author

Aside...

I didn't realize you could attach binary files to gists

Not the simplest thing in the world, you have to actually clone the gist repo and add the file manually. And directories aren't supported. shrug

@bhamiltoncx
Copy link
Contributor

Ah, there's no buck logs in there at all, since you did rm -f buck-out/log/buck*.log at the start of the command. :( Those logs are the ones I needed, since they were for the previously-running server.

It'd be great if you could reproduce this without the rm -f buck-out/log/buck*.log..

@dborowitz
Copy link
Author

Oops. I was trying to be helpful by not polluting the directory with logs from other runs.

Done: https://gist.github.com/dborowitz/d2c83924066c067dbf5f/3dd0ea6d7694544dae10024a0518c27a565575a4

@bhamiltoncx
Copy link
Contributor

Ah hah. I bet this is a Linux vs. OS X issue. Let me test some more on Linux.

By the way, what version of Python are you running on this system?

@dborowitz
Copy link
Author

Python 2.7.6.

@bhamiltoncx
Copy link
Contributor

OK, now I know what the problem is. Can you try this patch to Buck?

ffaebc0

Here's my analysis.

In 3c872ec , I changed the behavior of the Buck wrapper for the Nailgun client.

Previously, it would exec() the nailgun C client passing the command ng-stop when running buck kill. If the C client exited with an error code of 230, 229, or 227, it would silently swallow the error.

The C client would exit with error code 227 if the server ever closed the socket without sending an exit code—and this is "expected" behavior whenever running the ng-stop command (it just kills the server and doesn't send an exit code to the client).

With the new Python client, I updated the check to look for a NailgunException with one of those error codes inside. However, the Python client currently raises socket.error with errno 104 (ECONNRESET), not NailgunException, when this happens.

@dborowitz
Copy link
Author

Unable to reproduce this with

$ for i in {1..20}; do buck clean && buck build gerrit-common:server; done

so I think we're good. Thanks!

@bhamiltoncx
Copy link
Contributor

Great! Fix should land today. Thanks again for the detailed report.

openstack-gerrit pushed a commit to openstack-infra/gerrit that referenced this issue Dec 3, 2015
The new Buck version fixed annoying stdout spamming bug on unit test
failures: [1]. Now we can revert our monkey patching hack to prevent
that.

Since [2] Buck interferes with files in buck-out directory: [3]. Switch
to using eclipse-out directory as Eclipse output directory instead. For
this change it's necessary to clean up buck-out directory, otherwise
`buck test` would fail.

This version also fixed "Python client lost connection" bug: [4].

This reverts commit 94e93aa.

[1] facebook/buck#505
[2] facebook/buck@35cb495
[3] facebook/buck#527
[4] facebook/buck#534

Change-Id: I4cd1a99ce9d0615713c235d873e6cdd61b1854bb
shs96c pushed a commit to shs96c/buck that referenced this issue Dec 26, 2015
Summary: Users on Linux reported the new Python Nailgun client
was leaking `socket.error` exceptions.

This caused the client (at least on Linux) to incorrectly exit with an
error code when shutting down the server, since the server abruptly
closes the socket without sending an exit code when we do that.

This fixes the Python client to turn `socket.error` exceptions
into `NailgunException`s with code `CONNECTION_BROKEN` to mirror
the behavior of the C client.

Now, the Buck wrapper for the Python Nailgun client will correctly
ignore connection broken errors when shutting down the server.

Closes facebook#534

Test Plan: `buck clean; buck build buck` on Mac.
Asked original bug reporters to test patch, they confirmed it
fixed the issue in the Gerrit repo.

Reviewed By: sdwilsh

fb-gh-sync-id: 1d9aa34
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

4 participants