
Add support for unix domain sockets #511

Merged (6 commits) on Nov 19, 2019

Conversation

@lundberg (Contributor) commented Nov 6, 2019

This PR adds support for connecting through unix domain sockets.

A new client argument, uds, is added to define the unix domain socket path to use.

Usage:

async with httpx.AsyncClient(uds="/var/run/docker.sock") as client:
    # This request will connect through the socket file
    r = await client.get("http://docker/version")

Solves the previously discussed topic in #133.

@florimondmanca (Member) left a comment

Thanks for taking the lead on a proposal for this. I submitted a general question, didn't look at the code too much. Maybe we can revive discussion on #133 first.

@@ -232,7 +237,7 @@ def loop(self) -> asyncio.AbstractEventLoop:
     ) -> BaseTCPStream:
         try:
             stream_reader, stream_writer = await asyncio.wait_for(  # type: ignore
-                asyncio.open_connection(hostname, port, ssl=ssl_context),
+                self._open_connection(hostname, port, ssl_context),
@florimondmanca (Member), review comment:

Is there a way we could generalize the backend interface so that it exposes a way to open a unix socket, rather than create a separate backend?

This way, users won't need to use a specific backend, and we can later look at adding this piece of functionality to the trio backend, too.

I'm thinking, maybe a new .open_unix_socket() method similar to the current .open_tcp_socket(), and then make the switch between the two in HTTPConnection.connect() (if I remember correctly, that's where we create a stream instance).

@lundberg (Contributor, Author):

Great feedback! I was thinking the same but wasn't sure if stream dispatching would fit that high in the stack.

If we dispatch between tcp and unix connections in HTTPConnection, we need to determine what makes it select one or the other.

One example could be the origin.scheme being unix or http+unix to favour the unix socket route. But that maybe affects other bits too much, and the path to the socket file also needs to be passed in some way, e.g. as origin.host, which in turn hides any real hostname we might want to pass through the unix socket.

For now, it might be better to add the unix socket path as an optional argument to the base class ConcurrencyBackend and implement the dispatching there in a new .open_stream() method, which in turn calls your suggested .open_tcp_stream() / .open_unix_stream().

That way AsyncioBackend, TrioBackend etc. all get support for the socket file feature, but one needs to instantiate the backend with the socket file arg, as my current implementation in the separate backend does.
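For illustration, a rough sketch of that dispatching idea (names, signatures, and the uds constructor argument are illustrative assumptions, not httpx's actual API):

class ConcurrencyBackend:
    def __init__(self, uds=None):
        # Hypothetical: the socket file path is given at backend construction time.
        self.uds = uds

    async def open_stream(self, hostname, port, ssl_context, timeout):
        # Dispatch between unix and tcp depending on whether a socket path was set.
        if self.uds is not None:
            return await self.open_unix_stream(self.uds, hostname, ssl_context, timeout)
        return await self.open_tcp_stream(hostname, port, ssl_context, timeout)

    async def open_tcp_stream(self, hostname, port, ssl_context, timeout):
        raise NotImplementedError  # implemented by AsyncioBackend, TrioBackend, ...

    async def open_unix_stream(self, path, hostname, ssl_context, timeout):
        raise NotImplementedError  # implemented by AsyncioBackend, TrioBackend, ...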

@florimondmanca (Member):

Hmm. Well, if we put aside the whole concurrency backend aspect (which is only us trying to abstract I/O functionality so that we aren't particularly tied to any async library), maybe there's a way, as @tomchristie suggested, to implement this as a separate, very narrowly-focused dispatcher.

It's clearer to me why it should be doable if we look at requests-unixsocket, which provides unix sockets in the form of a Session subclass that uses its own adapter (a concept somewhat similar to dispatchers in HTTPX). The implementation is based on HTTP connection and connection pool subclasses. In theory we should be able to take the same path in HTTPX.

And then in terms of API, maybe aim at something like…

import httpx
from httpx_unixsocket import UnixSocketAsyncDispatcher

dispatch = UnixSocketAsyncDispatcher("/var/run/docker.sock")
async with httpx.AsyncClient(dispatch=dispatch) as client:
    r = await client.get("http+unix://docker/version")

@tomchristie (Member)

Similarly to #509 (comment) I’d probably like to see any support here via a third party package first, rather than including it in the core package.

I was also slightly surprised to see that the implementation was a ConcurrencyBackend class, rather than a Dispatcher implementation, which is what I would have expected to see, but I’ve not dug into that.

@florimondmanca (Member)

I was also slightly surprised to see that the implementation was a ConcurrencyBackend class, rather than a Dispatcher implementation

It turns out that opening a Unix socket is very much an async library-level piece of functionality (here you see that we must call asyncio.open_unix_connection() instead of asyncio.open_connection()), so it kind of makes sense that adding support for Unix sockets requires us to play with concurrency backends.

That said, I think #133 initially discussed using a special http+unix scheme, which doesn't seem to be the way we're heading here.
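For reference, the asyncio-level difference mentioned above looks roughly like this (a minimal sketch; the socket path is just an example and must exist for the unix variant to connect):

import asyncio

async def main():
    # Regular TCP connection:
    reader, writer = await asyncio.open_connection("example.org", 80)
    writer.close()

    # Unix domain socket connection, e.g. to a local daemon:
    reader, writer = await asyncio.open_unix_connection("/var/run/docker.sock")
    writer.close()

asyncio.run(main())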

@lundberg (Contributor, Author) commented Nov 6, 2019

I was also slightly surprised to see that the implementation was a ConcurrencyBackend class, rather than a Dispatcher implementation, which is what I would have expected to see, but I’ve not dug into that.

It's true that this could be implemented as a Dispatcher, but that would prevent the use of other existing dispatchers through a socket file, e.g. if you're running some wsgi server bound to a unix socket and want to use WSGIDispatch.

@florimondmanca (Member) commented Nov 6, 2019

like if you're running some wsgi server bound to a unix socket and want to use WSGIDispatch.

Except WSGIDispatch doesn't actually make network calls, does it? (I think it merely passes the correct environ/start_response components to the WSGI app.)

I think it's also fine that clients equipped with a Unix socket dispatcher can't make regular http:// requests. We aren't preventing users from using multiple clients, one for each use case (e.g. HTTP/TCP, HTTP/Unix requests, HTTP/WSGI, etc).

@lundberg (Contributor, Author) commented Nov 6, 2019

I think it's also fine that clients equipped with a Unix socket dispatcher can't make regular http:// requests.

Maybe I'm misunderstanding, but tcp vs. unix socket is just the connection layer. A client with an available unix socket probably needs the surrounding logic triggered as-is, like connection pooling, proxies etc.; only the actual communication in the end goes over the socket file rather than over tcp?

@florimondmanca (Member)

Yeah I think you're mostly right there @lundberg. :-) Unix vs TCP is merely a matter of transport, which ideally should be transparent to the end user in terms of dispatchers.

But I also kind of agree with @tomchristie that for HTTPX's own sustainability, pushing for building third-party packages is definitely worthwhile. I reckon that currently HTTPX doesn't provide enough hooks to allow adding Unix socket support as a third party package. But it never will expose such hooks that might be useful for plenty of other use cases if we don't actually challenge the status quo and do the work of uncovering where those hooks or entrypoints should be. At least that's how I'm seeing it. :-)

So, just as a thought experiment, this is the place where I think the switch between TCP or Unix would make sense:

stream = await self.backend.open_tcp_stream(host, port, ssl_context, timeout)

If this line was refactored into a .open_stream() method on HTTPConnection, we'd be able to build a subclass of it, say UnixHTTPConnection that opens a Unix socket stream instead of a TCP stream.

Then, we could also refactor this part:

connection = HTTPConnection(
origin,
verify=self.verify,
cert=self.cert,
timeout=self.timeout,
http_versions=self.http_versions,
backend=self.backend,
release_func=self.release_connection,
trust_env=self.trust_env,
)

into a .create_connection(origin) method, that could be overridden in a UnixConnectionPool subclass that creates a UnixHTTPConnection and also stores the socket path.

At that point, basic needs would be covered as we'd be able to use dispatch=UnixConnectionPool() to make connections over a Unix socket, even if we'd have a split between TCP and Unix. (Except that that functionality would be maintained by the community. HTTPX maintainers do have limited resources as well. 😉)

Then, if we decide to bring the functionality into core, we'll be able to use the newly-created hooks to simplify the implementation, and we'd have pre-existing knowledge about how things work, how they should be tested, etc.

So, what do you all think about the idea of introducing these hooks? (HTTPConnection.open_stream(), ConnectionPool.create_connection())

(My core intuition is that there might be a fundamental abstraction we're missing on. I think @tomchristie mentioned before that there may be key insights to learn from ASGI in that regard. I agree, but the to-be-discovered abstraction isn't here yet.)
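As a purely hypothetical sketch of those hooks, with simplified stand-ins for httpx's real HTTPConnection and ConnectionPool classes (the backend's open_tcp_stream() exists today, while open_uds_stream() is an assumed name):

class HTTPConnection:
    # Simplified stand-in for httpx's real class.
    def __init__(self, origin, backend=None):
        self.origin = origin
        self.backend = backend

    async def open_stream(self, host, port, ssl_context, timeout):
        # Proposed hook: the default keeps the current TCP behaviour.
        return await self.backend.open_tcp_stream(host, port, ssl_context, timeout)


class UnixHTTPConnection(HTTPConnection):
    def __init__(self, origin, uds, backend=None):
        super().__init__(origin, backend=backend)
        self.uds = uds

    async def open_stream(self, host, port, ssl_context, timeout):
        # Open a stream over the unix socket instead of TCP.
        return await self.backend.open_uds_stream(self.uds, host, ssl_context, timeout)


class ConnectionPool:
    # Simplified stand-in for httpx's real class.
    def __init__(self, backend=None):
        self.backend = backend

    def create_connection(self, origin):
        # Proposed hook: connection creation happens in one overridable place.
        return HTTPConnection(origin, backend=self.backend)


class UnixConnectionPool(ConnectionPool):
    def __init__(self, uds, backend=None):
        super().__init__(backend=backend)
        self.uds = uds

    def create_connection(self, origin):
        return UnixHTTPConnection(origin, uds=self.uds, backend=self.backend)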

@lundberg (Contributor, Author) commented Nov 6, 2019

So, what do you all think about the idea of introducing these hooks? (HTTPConnection.open_stream(), ConnectionPool.create_connection())

These are great ideas and would make extending connections a lot easier @florimondmanca!

One thought only:
Let's say your suggested hooks exist, and in a future core/3rd-party UnixHTTPConnection class we can override the suggested open_stream() method to open a unix socket connection. We would still want to use the client's given/selected backend, like AsyncioBackend or TrioBackend, leading us to use self.backend.open??? in the implementation of ???Connection.open_stream(), right?

So even with the hooks in place, the ConcurrencyBackend still, in my opinion, needs both tcp and unix interface methods, with implementations in the related backends, since that's where the actual transport connections are currently made.

@florimondmanca (Member) commented Nov 6, 2019

We still would want to use the client's given/selected backend, like AsyncioBackend or TrioBackend

Well, I think that's how we'd do things if we were to implement this in core, but that's not an immediate requirement for a 3rd party implementation.

The 3rd party package could just have its own open_unix_socket() helper, and use it there instead of calling into the concurrency backend.

It doesn't even have to support trio right away — asyncio-only can be defined as a constraint, which might be resolved by someone contributing trio support later. :-)

@florimondmanca (Member) commented Nov 6, 2019

The only bit that might be off is that right now HTTP connections manipulate TCPStream instances. There's nothing particularly "TCP" about this interface — what seems to be important is that it's a socket of some sort, that exposes read/write operations. So maybe we should rename that to SocketStream or something? (This is the terminology that trio uses.)


It doesn't even have to support trio right away — asyncio-only can be defined as a constraint, which might be resolved by someone contributing trio support later. :-)

Just to solidify this claim, trio also has open_unix_socket() so extending an initial asyncio-only implementation should be easy enough. :)
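For reference, the trio side could look roughly like this (a minimal sketch; assumes the socket file exists):

import trio

async def main():
    # trio.open_unix_socket() returns a SocketStream, the trio analogue of
    # asyncio.open_unix_connection().
    stream = await trio.open_unix_socket("/var/run/docker.sock")
    await stream.aclose()

trio.run(main)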

@lundberg (Contributor, Author) commented Nov 7, 2019

The 3rd party package could just have its own open_unix_socket() helper, and use it there instead of calling into the concurrency backend.

This is true @florimondmanca, but I really think that an http client should have the option of connecting through a unix socket in its core, since Python's core supports both tcp and unix socket connections, and the transport is transparent to whatever protocol is being used on top of it (HTTP in httpx's case).

I’d probably like to see any support here via a third party package first, rather than including it in the core package.

Let's take curl as an example: it has the client option --unix-socket to set transport over a specific unix socket file instead of TCP.

That said, I think the URL shouldn't be used to hold information about any socket file or to mess with the scheme, the way requests-unixsocket has to implement it in a rather "hacky" way due to the lack of hooks or core functionality in the requests lib.

Another example is uvicorn, which has the option --uds to bind to a unix domain socket; why wouldn't we want httpx to have core support for requesting uvicorn that way, @tomchristie?
By the way, this is how I test unix sockets in this PR, i.e. by starting uvicorn bound to a uds file.

@tomchristie (Member)

I'm certainly okay with us considering this functionality for core (though we may want to think about the top-level API if that's what we're doing). I'd expect us to have an open_uds_connection on the concurrency backend, then expose uds=... on the client, and have the connection pool switch between open_tcp_connection or open_uds_connection depending on whether uds is set.
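A rough sketch of that switch, using the method names from the comment above (illustrative only, not the merged implementation):

async def open_stream(backend, hostname, port, ssl_context, timeout, uds=None):
    # `uds` would be the optional socket path passed down from the client's uds=... argument.
    if uds is not None:
        return await backend.open_uds_connection(uds, hostname, ssl_context, timeout)
    return await backend.open_tcp_connection(hostname, port, ssl_context, timeout)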

@lundberg force-pushed the unix-socket-backend branch 2 times, most recently from 9c4abdd to 4d8d6fa on November 7, 2019 at 23:26
@lundberg (Contributor, Author) commented Nov 7, 2019

I'd expect us to have an open_uds_connection on the concurrency backend, then expose uds=... on the client, and have the connection pool switch between open_tcp_connection or open_uds_connection depending on whether uds is set.

Spot on @tomchristie, this was my initial idea, but I started out easy with the previous backend. I have now refactored accordingly! Any suggestions for more tests that should be done?

The only bit that might be off is that right now HTTP connections manipulate TCPStream instances. There's nothing particularly "TCP" about this interface — what seems to be important is that it's a socket of some sort, that exposes read/write operations. So maybe we should rename that to SocketStream or something? (This is the terminology that trio uses.)

Since backends now return both tcp and uds streams, your renaming suggestion TCPStream -> SocketStream makes sense @florimondmanca. Should I do this as well?

@florimondmanca (Member) commented Nov 8, 2019

Should I do this as well?

Thanks for suggesting! If you’re up for it let’s try to get the rename in via a separate PR maybe? :)

I’ll try to have a new look at this one soon!

@lundberg (Contributor, Author) commented Nov 8, 2019

If you’re up for it let’s try to get this in via a separate PR maybe? :)

Well, a new PR might be ok I guess, but since the only code instantiating and returning TCPStream is these open_???_stream methods in the backends, I'd say it belongs to this PR. OK @florimondmanca?

Question: do we need to do this rename as a non-breaking change, like keeping the old class and having it extend the newly named one with a deprecation warning? Any 3rd party package might directly import httpx.BaseTCPStream. Thoughts on that @tomchristie?

Possible deprecation solution:

import typing
import warnings


class BaseSocketStream:
    ...

class BaseTCPStream(BaseSocketStream):
    def __new__(
        cls: typing.Type["BaseTCPStream"], *args: typing.Any, **kwargs: typing.Any
    ) -> "BaseTCPStream":
        warnings.warn(
            "BaseTCPStream is deprecated in favour of BaseSocketStream",
            DeprecationWarning,
            stacklevel=2,
        )
        return super().__new__(cls)

This would yield a deprecation warning if any 3rd party package uses BaseTCPStream.

@florimondmanca (Member) commented Nov 8, 2019

I'd say it belongs to this PR. OK @florimondmanca?

It would probably be easier for all to propose this change separately.

We tend to encourage focused, incremental, isolated PRs as much as possible, even if that means having to deal with changes/conflicts with other pending PRs, as it facilitates reviewing and auditing. Two isolated PRs tend to be reviewed more efficiently, and merged more quickly, than a single PR that tries to add/change two things at once. :-)

FYI, a rename has already happened in the past, if it can be of inspiration: #339

do we need to do this rename as a non-breaking change

Probably not — the only public aspect of concurrency backends is the class name and the import path; methods and internal classes like TCPStream aren't used directly by our users. (Anyway, we're in 0.x, so I think backwards compatibility / strict semver isn't a concern yet. :-) e.g. #482.)

@lundberg (Contributor, Author) commented Nov 8, 2019

It would be easier for all to propose this change separately.

Note taken, I'll do a separate PR for the renaming then.

FYI, a rename has already happened in the past, if it can be of inspiration: #339

Stream -> TCPStream -> SocketStream ;-)

do we need to do this rename as a non-breaking change

Probably not

Ok, great. I only raised the question since it is exposed in __init__, i.e. httpx.BaseTCPStream.

@lundberg force-pushed the unix-socket-backend branch 4 times, most recently from 76968d1 to c066d3e on November 8, 2019 at 22:04
@lundberg (Contributor, Author) commented Nov 8, 2019

Rebased and in line with #517

@florimondmanca (Member)

@lundberg Sorry to have let this drag a bit. I'm just a bit wary of the attempt to DRY out the "open_x_stream" methods — that's probably one level of indirection too much. Having to pass awaitables (being paranoid — who knows if they'll actually be awaited?) also feels a bit strange. Can we keep them separate, for the sake of explicitness and clarity?

@lundberg (Contributor, Author)

I'm just a bit wary of the attempt to DRY out the "open_x_stream" methods

Just to clarify, I moved the timeout and ssl logic into the generic "private" _open_stream method just to avoid duplicating that code, since it's the same for both the uds and tcp coroutines, in both backends. But I intentionally did not add it to the "interface" ConcurrencyBackend, so as not to force it to be implemented that way.

But, I have no problem removing the _open_stream methods. Should I @florimondmanca?

@florimondmanca (Member)

Should I?

I think so, if that sounds OK to you. :)

@lundberg (Contributor, Author)

Fixed and rebased @florimondmanca

@florimondmanca (Member) left a comment

I think we're getting there! The implementation looks top notch to me. I left several final comments on the tests, but nothing critical.

Review comments (outdated, resolved) on: tests/test_concurrency.py (×2), tests/client/test_client.py, tests/client/test_async_client.py, httpx/client.py
lundberg and others added 3 commits November 19, 2019 12:07
@lundberg (Contributor, Author)

I left several final comments on the tests, but nothing critical.

Legit comments, thanks. Refactored and pushed, @florimondmanca.

@florimondmanca (Member) left a comment

Stellar 🌟 Well done and thanks @lundberg!

@florimondmanca (Member)

We may want to add some prose to the "Advanced" section of the docs, to quickly demonstrate why and how to use Unix domain sockets. A short paragraph and a code snippet should do the trick. Would you like to tackle that as well, @lundberg? Up to you whether to add it to this PR, or we can merge and add the docs via a separate PR. Let me know!

@lundberg (Contributor, Author)

We may want to add some prose to the "Advanced" section of the docs....Up to you whether to add it to this PR, or we can merge and add the docs via a separate PR. Let me know!

Let's merge this. I'll happily open a separate PR for the docs section, will start working on it now.
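For reference, such a docs snippet could look roughly like this (wording illustrative only, mirroring the usage example at the top of this PR):

Making requests over a unix domain socket, e.g. to the local Docker daemon:

import httpx

async def docker_version():
    # uds=... routes the connection through the socket file instead of TCP;
    # the URL keeps a plain http:// scheme.
    async with httpx.AsyncClient(uds="/var/run/docker.sock") as client:
        r = await client.get("http://docker/version")
        return r.json()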
