Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tweak dropped connection detection #185

Merged
merged 11 commits into from Oct 17, 2020
Merged

Tweak dropped connection detection #185

merged 11 commits into from Oct 17, 2020

Conversation

florimondmanca
Copy link
Member

@florimondmanca florimondmanca commented Sep 13, 2020

Closes #182

urllib3 had to deal with an issue similar to #182 a while ago: urllib3/urllib3#589

Their approach was to backport a bunch of 3.5+ logic to maintain 2.7 compatibility, but we're lucky enough to be able to use that logic from the stdlib in the form of the selectors module, a "higher-level" alternative to select that handles cross-platform subtleties among other things.

This PR switches the sync backend to use selectors.select() instead of select.select() for detecting whether a connection was dropped (i.e. whether the underlying socket has become immediately readable).

Confirmed that this solves #182, since the CI build passes here but fails on #219.

@florimondmanca
Copy link
Member Author

florimondmanca commented Sep 13, 2020

From my own testing, this updated implementation seems to work fine:

Server:

import uvicorn
from starlette.responses import PlainTextResponse

app = PlainTextResponse("Hello, World!")

if __name__ == "__main__":
    # Make it so that idle client connections are shut down after 3s.
    uvicorn.run(app, timeout_keep_alive=3)

Client script (exits successfully):

import time
import httpcore

with httpcore.SyncConnectionPool() as transport:
    method = b"HEAD"
    url = (b"http", b"localhost", 8000, b"/")
    headers = [(b"host", b"localhost")]
    *_, stream = transport.request(method, url, headers)

    try:
        # During the first two seconds, the connection remains open…
        for __ in range(2):
            time.sleep(1)
            assert not stream.connection.is_connection_dropped()

        # After 3s, Uvicorn shuts down the connection, and we detect it.
        time.sleep(1)
        assert stream.connection.is_connection_dropped()

    finally:
        stream.close()

@florimondmanca florimondmanca requested a review from a team September 13, 2020 10:55
@@ -78,8 +78,8 @@ def close(self) -> None:
self.sock.close()

def is_connection_dropped(self) -> bool:
rready, _wready, _xready = select.select([self.sock], [], [], 0)
return bool(rready)
rready = wait_for_read([self.sock], timeout=0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May you please add the same mechanism to the curio backend?

def is_connection_dropped(self) -> bool:
rready, _, _ = select.select([self.socket.fileno()], [], [], 0)
return bool(rready)

@tomchristie tomchristie mentioned this pull request Sep 14, 2020
@tomchristie
Copy link
Member

Seems like a clear win yup.

We'll probably also want some follow ups, eg... if no selector exists, then our graceful-ish fallback can just be never detecting connections as expired, and just relying on the keepalive expiry.

There might be some neater behaviour we can have for detecting expiry, at the point we're actually making the request. (Eg. attempting a read with a minimal timeout prior to writing any data, in order to see any EOF if it exists?....)

@agronholm
Copy link
Contributor

I find it worrying that there is no test that verifies that dropped connections are actually detected. The test suite passes just fine if I unconditionally return False.

@tomchristie
Copy link
Member

tomchristie commented Sep 14, 2020

It'll come. 😀
We've had to cover a lot of ground to get here.

@agronholm
Copy link
Contributor

agronholm commented Sep 14, 2020

Anyway, I did some manual testing and found that sock.recv(1, MSG_PEEK) does detect disconnection just as well as selectors.select():

  • If the connection is still alive, sock.recv(1, MSG_PEEK) raises BlockingIOError
  • If the connection has been dropped, sock.recv(1, MSG_PEEK) returns b''
  • If there is still data in the buffer, neither sock.recv() or selectors.select() can distinguish that from a dropped connection

Given that selectors.select() will use select.select() on Windows anyway, wouldn't it make sense to use sock.recv(1, MSG_PEEK) since it works on all platforms (I checked on Linux, Windows and macOS)?

@florimondmanca
Copy link
Member Author

wouldn't it make sense to use sock.recv(1, MSG_PEEK) since it works on all platforms (I checked on Linux, Windows and macOS)?

Pretty interesting, indeed…

>>> import socket
>>> alice, bob = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
>>> alice.setblocking(False)
>>> bob.setblocking(False)
>>> alice.recv(1, socket.MSG_PEEK)
BlockingIOError: [Errno 35] Resource temporarily unavailable
>>> bob.send(b"hello")
>>> alice.recv(5)
b'hello'
>>> bob.close()  # Server connection lost
>>> alice.recv(1, socket.MSG_PEEK)
b''

@florimondmanca
Copy link
Member Author

florimondmanca commented Sep 14, 2020

Following on @tomchristie's comment, perhaps this MSG_PEEK trick could be used when there's no selector available…?

Also, it's not clear to me yet whether #182 would surface on Windows. Eg in urllib3/urllib3#589 @sethmlarson wrote:

On non-Windows platforms (or any platform with a better selector than select.select) this will be fixed with

I'm not sure whether this implies that the problem from #182 isn't present on Windows, or if Seth meant that it would be solved with another urllib3 PR at the time.

If Windows is not a problem, then I guess either approach is fine, though I'm more keen on relying on a well-documented and standard solution like selectors than a somewhat obscure (but most likely just fine) solution like MSG_PEEK?

@florimondmanca
Copy link
Member Author

@agronholm Since you mentioned testing on Windows, were you / would you be able to reproduce the issue in #182 on Windows? If we can't reproduce there, then perhaps the safest bet would be to just not deal with it at the moment, issue a fix for Unix, and then deal with any upcoming issues for Windows should they arrive?

@agronholm
Copy link
Contributor

It just requires a lot of established connections within a single process, right? That shouldn't be hard to do.

@tomchristie
Copy link
Member

It looks like MSG_PEEK is possibly a better approach anyways, seems to be available throughout, and less likely to have platform specific differences vs. working with selector?

@tomchristie
Copy link
Member

Incidentally we might want to rename is_connection_dropped to is_at_eof since it better indicates exactly what we mean.

If there's data still in the buffer we want the interface to return False. (We'll most likely end up with a protocol error in that odd case, at least for HTTP/1.1, but the point is that reading on the socket isn't at EOF, so we should go ahead an make the request)

@florimondmanca
Copy link
Member Author

I guess we could roll with MSG_PEEK and see if it explodes in our hands? 😅

@tomchristie
Copy link
Member

I'm all for it - it really does look like the actually most sensible thing to do.

@florimondmanca
Copy link
Member Author

Hitting some road blocks on trying to implement this .recv() idea…

  • When running anyio+asyncio+uvloop, the raw socket is a uvloop PseudoSocket that stubs out recv() calls (among other methods…)
E   TypeError: transport sockets do not support recv() method
  • In the sync case, the raw socket is blocking, so if the connection is alive, sock.recv(1, MSG_PEEK) will block waiting for the other end to send a piece of data (instead of failing immediately). Moving to a non-blocking usage here would mean implementing some infinite loops for read/write operations, which sounds pretty quirky.

Filed #193 if folks would like to see the code.

I'm going to proceed and see if the selectors tweak works with sync, anyio and curio… If it does, then I think we should probably just roll with it.

@agronholm
Copy link
Contributor

When running anyio+asyncio+uvloop, the raw socket is a uvloop PseudoSocket that stubs out recv() calls (among other methods…)

I recall running into this myself. If memory serves, I got the fileno() out of it and then used socket.fromfd() to get a real socket object.

I'm going to proceed and see if the selectors tweak works with sync, anyio and curio… If it does, then I think we should probably just roll with it.

I'll remind you that this does not fix the problem on Windows.

@florimondmanca
Copy link
Member Author

fileno()

Yep, that's what I actually ended up using in this PR too… Updated #193 as well — looking quite good now!

@florimondmanca
Copy link
Member Author

Closing in favor of #193, which I think is getting in good shape now!

@florimondmanca florimondmanca deleted the fm/selectors branch October 2, 2020 21:43
@florimondmanca
Copy link
Member Author

I am reviving this PR based on the insights from Nathaniel in #193 (comment)

TL;DR: this should only be a problem on Unix, so the selectors module is just what we need. (Windows select() is different than on Unix, and shouldn't be exposed to the issue reported in #182. Besides, #182 is indeed about Linux, not Windows.)

@florimondmanca florimondmanca restored the fm/selectors branch October 8, 2020 18:40
@florimondmanca florimondmanca reopened this Oct 8, 2020
@florimondmanca
Copy link
Member Author

Added a test case based on #219.

Should be ready for re-review! cc @JayH5 @cdeler

@florimondmanca florimondmanca requested a review from a team October 8, 2020 19:38
@florimondmanca florimondmanca added the bug Something isn't working label Oct 8, 2020
httpcore/_utils.py Outdated Show resolved Hide resolved
httpcore/_utils.py Outdated Show resolved Hide resolved
@florimondmanca florimondmanca changed the title Tweak sync dropped connection detection Tweak dropped connection detection Oct 9, 2020
@tomchristie
Copy link
Member

Okay, so that one thing I hadn't fully groked about this is that it shouldn't be called is_connection_dropped(), because that's not what it's testing for. It's testing for is_socket_readable(), which in the narrow and very particular sense of a socket that's in between request/response cycles would also just-so-happen to indicate that it must have been dropped.

Why?

Because the only valid data that could be about to come back from the socket at that point in the connection lifecycle is "".
If there's no outstanding request, then we're not expecting any response data.

But in general the state of is_socket_readable() doesn't mean that the connection is dropped.

So... I think we need to:

  1. Rename the method.
  2. Have a brief comment at the point in the connection pool where we call into is_socket_readable() indicating that if the socket is readable in this state then the only valid reason it could have for being readable is a pending disconnect.

@florimondmanca florimondmanca requested a review from a team October 17, 2020 11:46
@florimondmanca
Copy link
Member Author

Okay, I think this is starting to look quite good? Applied latest suggestions and addressed comments, happy to take new reviews!

Copy link
Member

@JayH5 JayH5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me 🍰

Copy link
Member

@cdeler cdeler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats perfect for me 🔥

@florimondmanca florimondmanca merged commit 12a25fd into master Oct 17, 2020
@florimondmanca florimondmanca deleted the fm/selectors branch October 17, 2020 20:29
ddkarpovich pushed a commit to ddkarpovich/httpcore that referenced this pull request Oct 19, 2020
* Tweak sync dropped connection detection

* Nits

* Use common helper across sync, asyncio, curio, anyio

* Update

* Add test case

* Rename is_connection_dropped -> is_socket_readable, tweak comments

* Switch to stealing util from trio

Co-authored-by: Tom Christie <tom@tomchristie.com>
@florimondmanca florimondmanca mentioned this pull request Nov 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Using select is causing issues when many files are open
5 participants