Asyncio server hang when clients connect and immediately disconnect #71573
Comments
I recently ported ZEO to asyncio. We'd had a bug in our old asyncore-based server where the server would hang if several connections were made and then immediately disconnected on Mac OS X. This was due to an error-handling bug in our code that we fixed. We have a regression test for this case. The regression test for this case fails using asyncio.Server. I've attached a (ZEO-independent) script that demonstrates the problem. If you run the script with Python 3.4 or 3.5, I expect the script will hang. It does for me on Mac OS X 10.10.5 and Ubuntu 14.04. |
FWIW, using uvloop avoids the hang. |
Please reduce program, and make sure it still hangs. |
Yeah, I'd like to see a more minimal repro to understand what's going on. |
This is already pretty minimal. There are no external dependencies. |
Please reduce it even more. I mean remove the debugging, the specific commands, and all extra code that is not related to the original problem. |
Also, I recommend using asyncio streams instead of reinventing the wheel. Reading your command would then look like: data = await stream.readexactly(4)
(len,) = unpack(">I", data)
command = await stream.readexactly(len) |
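A slightly fuller sketch of the stream-based framing suggested above, assuming a 4-byte big-endian length prefix followed by the payload. The names `read_command`, `frame`, and `demo` are hypothetical, and the `StreamReader` is instantiated directly only to keep the example self-contained; a real server would receive its reader from `asyncio.start_server`.

```python
import asyncio
from struct import pack, unpack


async def read_command(reader: asyncio.StreamReader) -> bytes:
    """Read one command: a 4-byte big-endian length, then that many bytes."""
    header = await reader.readexactly(4)
    (size,) = unpack(">I", header)
    return await reader.readexactly(size)


def frame(payload: bytes) -> bytes:
    """Prefix a payload with its length (hypothetical helper for the demo)."""
    return pack(">I", len(payload)) + payload


async def demo() -> bytes:
    # Feed the reader by hand instead of opening a socket (demo only).
    reader = asyncio.StreamReader()
    reader.feed_data(frame(b"ping"))
    reader.feed_eof()
    return await read_command(reader)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

`readexactly` raises `asyncio.IncompleteReadError` if the peer disconnects before the requested number of bytes arrives, which gives a single place to handle clients that vanish mid-message.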
OK, I *was* able to simplify it a fair bit. I'm uploading a new version. I left prints in because I think you'd find them helpful, but I'll upload another version without prints. |
One more thing. Why do you set socket.SO_LINGER, and why is the linger timeout 0 seconds? Removing that eliminates the problem completely. |
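For readers unfamiliar with the option being discussed: a minimal sketch of enabling SO_LINGER with a zero timeout, which on common TCP stacks makes `close()` on a connected socket drop the connection abruptly (a reset rather than the normal shutdown). This is not the code from the attached script; the helper names are hypothetical, and the `"ii"` layout of `struct linger` is an assumption that holds on Linux and macOS but not on Windows.

```python
import socket
import struct


def set_abortive_close(sock: socket.socket) -> None:
    """Enable SO_LINGER with a 0-second timeout (l_onoff=1, l_linger=0)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))


def linger_settings(sock: socket.socket) -> tuple:
    """Return (l_onoff, l_linger) as reported by the OS."""
    raw = sock.getsockopt(
        socket.SOL_SOCKET, socket.SO_LINGER, struct.calcsize("ii")
    )
    return struct.unpack("ii", raw)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    set_abortive_close(s)
    # l_onoff is only guaranteed to be nonzero; its exact value is OS-specific.
    print(linger_settings(s))
    s.close()
```

A client that connects, sets this option, and closes right away is one way to produce the "connect and immediately disconnect" pattern described in the report.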
Here's a version sans prints |
I can't personally run that code and get the results you are getting; could you please walk us through what happens (as far as you can tell)? Reading the code I find myself quite confused about which parts of the code might be active or not. E.g. is self.messages used? Do its actual contents matter? Where does it end up? |
Jim, I think you wanted to post this link in this issue: https://bugs.launchpad.net/zodb/+bug/135108/comments/9 instead of in bpo-27392. I can reproduce this on my mac, but so far I've no idea what's going on. |
Guido, are you saying that the script runs without hanging for you? Are you running the version with prints? This is an adaptation of the echo server and client from the docs. The server runs in a thread. It just echoes its input. The client just waits for a message from the server, then sends messages (one in the attached echo2.py) and waits for replies. When I run this on Mac and Ubuntu 14.04, the server never sees the messages sent by the client. I'm uploading a newer version that simplifies the messages data structure and adds some prints to, I think, make the sequence easier to see. Fixing the bug that causes all the tracebacks to be printed would also make this easier to interpret. Commenting out the code that makes and closes the socket connections with SO_LINGER and running echo2.py should also make it easy to see the trivial expected client/server interaction. I don't think the details of the interaction between the server and the client are very important, other than the fact that the client gets the first message from the server and the server doesn't get the subsequent message from the client. |
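The expected interaction described above (server sends a first message, client then sends messages and reads the echoes) can be sketched roughly as follows. This is not the attached echo2.py: it uses asyncio streams, runs server and client on one event loop instead of a separate thread, and assumes newline-delimited messages. All function names and the `hello` greeting are hypothetical.

```python
import asyncio


async def handle(reader, writer):
    """Greet the client, then echo every line back until EOF."""
    writer.write(b"hello\n")
    await writer.drain()
    while True:
        line = await reader.readline()
        if not line:  # client closed the connection
            break
        writer.write(line)
        await writer.drain()
    writer.close()


async def client(host, port, messages):
    """Wait for the greeting, then send each message and read its echo."""
    reader, writer = await asyncio.open_connection(host, port)
    greeting = (await reader.readline()).rstrip(b"\n")
    replies = []
    for message in messages:
        writer.write(message + b"\n")
        await writer.drain()
        replies.append((await reader.readline()).rstrip(b"\n"))
    writer.close()
    await writer.wait_closed()
    return greeting, replies


async def main():
    # Port 0 lets the OS pick a free port on the loopback interface.
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    host, port = server.sockets[0].getsockname()[:2]
    async with server:
        return await client(host, port, [b"one", b"two"])


if __name__ == "__main__":
    print(asyncio.run(main()))
```

In the failing scenario from the report, the client would receive the greeting but the server would never see the client's subsequent message.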
Yuri, right you are. Thanks. Марк, see https://bugs.launchpad.net/zodb/+bug/135108/comments/9 |
Running out of time to debug this today. I think this is a bug in CPython, in either the socket or the select module. When I inject some debug code in selectors.py and replace KQueue with select(), I can see that the server thread's selector stops working at some point due to an EBADF error. I think something similar is happening with the KQueue selector -- at some point it just stops returning events correctly. Again, I might be wrong about all this, but this is what I think after 2.5 hours of debugging. |
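One less invasive way to try the select()-based selector mentioned above, without editing selectors.py, is to pass a selector explicitly when constructing the event loop. This is only a sketch of that alternative, not what was actually done during the debugging session; `make_select_loop` is a hypothetical helper name.

```python
import asyncio
import selectors


def make_select_loop() -> asyncio.AbstractEventLoop:
    """Build an event loop that polls with select() instead of kqueue/epoll."""
    return asyncio.SelectorEventLoop(selectors.SelectSelector())


if __name__ == "__main__":
    loop = make_select_loop()
    try:
        # Run a trivial coroutine to confirm the loop works.
        print(loop.run_until_complete(asyncio.sleep(0, "ok")))
    finally:
        loop.close()
```

Running the reproduction script on such a loop makes it easier to compare behaviour across selector implementations on the same machine.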
No, I just don't have a computer right now, only a phone. --Guido (mobile) |
WRT CPython/sockets this problem doesn't happen if I use asyncore to accept connections and hand them off to create_connection. :) It also doesn't occur with uvloop, which I assume still uses sockets. Also, FWIW, the relevant ZEO test passes if I use SSL, which is how I'm working around this now for the tests. |
No, uvloop doesn't use Python sockets or select for IO at all. All IO is done in libuv.
Interesting. |
It looks like this was fixed by bpo-27759. Jim, could you please verify? |
Cool, I will verify soon. |
Yes, that change addresses this issue. Thanks! Will this be backported? |
Yuri, are you going to backport the fix to 3.4? |
Isn't 3.4 in security fixes only mode? |
Jim asked for a backport. If the problem is not a security issue that needs to be backported, feel free to close the ticket. |
This is arguably a security issue because it's a DoS vector. I don't feel strongly about it though. |
Sorry Jim, was replying from my email client, didn't see all messages.
Yeah, I can see why. I can commit this to 3.4 in a week. Christian, feel free to commit this if you want this issue to be closed earlier. |
Alright, I've backported the fix to 3.4. Closing this. |