Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Advanced connection options. #88

Closed
tomchristie opened this issue May 13, 2020 · 13 comments
Closed

Advanced connection options. #88

tomchristie opened this issue May 13, 2020 · 13 comments

Comments

@tomchristie
Copy link
Member

Prompted by a comment from @hynek

We ought to add more __init__ controls to AsyncConnectionPool(...) and SyncConnectionPool(...) for advanced connection options, including...

It'd be useful to do a comprehensive review of...

  • What connection options are offered by asyncio?
  • What connection options are offered by stdlib's standard sync networking API?
  • What connection options are offered by trio?

Compare against controls available in urllib3, aiohttp.

@hynek
Copy link

hynek commented May 13, 2020

For reference, this is what you can do with aiohttp: https://docs.aiohttp.org/en/stable/client_reference.html#aiohttp.TCPConnector

@tomchristie
Copy link
Member Author

@hynek - Gotcha. The DNS caching isn't something I think we need to introduce, since it adds less reliable side-effects. Perhaps it's a feasible feature at some point, but let's treat it as out of scope for purposes here.

Out of the other stuff:

  • What would you consider a priority here?
  • Can we enumerate those options / namings, and do they all map to asyncio.open_connection options? Are they all also available in sync and trio backends?

@hynek
Copy link

hynek commented May 13, 2020

I personally need to be able to force IPv4 by setting the socket family to AF_INET to firewall whitelisting shenanigans. No clue how common that problem is though.

That could be abstracted away of course to be more use friendly.

@bwelling
Copy link

I'm planning to look at adding source_address support to the ConnectionPool objects.

@hynek - I believe that choosing an address family would be functionally equivalent to specifying a wildcard source address in that family; that is, using AF_INET would be the same as using a source address of 0.0.0.0. This could mean that the ConnectionPool objects internally converted a family to a source address.

@bwelling
Copy link

Adding source address support is surprisingly complicated, unfortunately. Getting the address down to the various open_tcp_stream methods isn't a problem, but the backends are more complicated.

In the sync backend, the code (currently) assumes IPv4 when opening a socket. If this restriction is kept, the only change would be to optionally call sock.bind(), but not having IPv6 support seems bad. Using socket.create_connection might be an improvement.

In the trio backend, trio.open_tcp_stream has no ability to accept source address. This could be changed to explicitly create a socket, optionally bind it, connect it, and wrap it in a trio.SocketStream. This would lose the ability to try multiple connections until one works (which the sync code doesn't do), and would also require doing something to figure out which address family to use.

The asyncio backend looks like the easiest one; asyncio.open_connection takes a local_addr parameter.

@yeraydiazdiaz
Copy link
Contributor

Hey @bwelling, thanks for looking into this.

We should think about how we'd like to expose this at the top ConnectionPool level in terms of arguments:

I believe that choosing an address family would be functionally equivalent to specifying a wildcard source address in that family

That makes sense, I think we should have both a family and a source_addr arguments, not sure if that's what you had in mind as well.

In the sync backend, the code (currently) assumes IPv4 when opening a socket. If this restriction is kept

I don't think we need to keep that restriction, seems like an oversight on our part since we tend to focus on the async side of things. I see no reason we can't use the family and/or source address in the socket call.

In the trio backend, trio.open_tcp_stream has no ability to accept source address. This could be changed to explicitly create a socket, optionally bind it, connect it, and wrap it in a trio.SocketStream. This would lose the ability to try multiple connections until one works (which the sync code doesn't do), and would also require doing something to figure out which address family to use.

I'm not familiar with Trio but I think it's probably ok to lose the happy eyeballs feature as a tradeoff for ensuring the local address constraint is met. I'm curious why Trio does not support setting local address though.

The asyncio backend looks like the easiest one; asyncio.open_connection takes a local_addr parameter.

Yep, seems straightforward 🙂

BTW, there's not a lot of documentation in terms of contributing to httpcore, but generally we work on the async side and run scripts/unasync to generate the sync version, you may have already figured that out but wanted to make sure it's clear.

@bwelling
Copy link

Hey @bwelling, thanks for looking into this.

We should think about how we'd like to expose this at the top ConnectionPool level in terms of arguments:

I believe that choosing an address family would be functionally equivalent to specifying a wildcard source address in that family

That makes sense, I think we should have both a family and a source_addr arguments, not sure if that's what you had in mind as well.

I believe so. The top layer could probably convert the family into a wildcard source address, if we wanted to simplify the parameters passed to the lower layers. Similarly, the top layer could convert an address string into an address tuple.

In the sync backend, the code (currently) assumes IPv4 when opening a socket. If this restriction is kept

I don't think we need to keep that restriction, seems like an oversight on our part since we tend to focus on the async side of things. I see no reason we can't use the family and/or source address in the socket call.

Using socket.create_connection() would be another alternative, I think.

In the trio backend, trio.open_tcp_stream has no ability to accept source address. This could be changed to explicitly create a socket, optionally bind it, connect it, and wrap it in a trio.SocketStream. This would lose the ability to try multiple connections until one works (which the sync code doesn't do), and would also require doing something to figure out which address family to use.

I'm not familiar with Trio but I think it's probably ok to lose the happy eyeballs feature as a tradeoff for ensuring the local address constraint is met. I'm curious why Trio does not support setting local address though.

There's a comment in the Trio code about supporting potentially local address in the future, so I don't think it's intentional.

Getting Trio to work will not be trivial, as the current code passes the hostname to trio, which does the DNS resolution. This would need to be changed to do the DNS resolution outside of trio, so that the calling code has an address, and can open a socket in the correct address family. That is, more or less replicate a lot of the open_tcp_stream logic. I don't like this idea, but I don't have a better one.

The asyncio backend looks like the easiest one; asyncio.open_connection takes a local_addr parameter.

Yep, seems straightforward 🙂

BTW, there's not a lot of documentation in terms of contributing to httpcore, but generally we work on the async side and run scripts/unasync to generate the sync version, you may have already figured that out but wanted to make sure it's clear.

Yeah. I noticed that after making (small, fortunately) changes to the sync side and having scripts/test overwrite them.

@tomchristie
Copy link
Member Author

tomchristie commented May 27, 2020

So, we don't necessarily need to have all options supported on all backends.

I think the first point here is just determining what additional arguments we think we'd want to accept.
It looks like it'd make sense for us to add arguments for family=0 and local_addr=None right?

That'd mean:

  • Add family=0, local_addr=None to the ConnectionPool __init__ and proxy __init__ cases.
  • Add family=0, local_addr=None to the backend open_tcp_stream API.

In the sync case

Pass family through to the socket instead of socket.AF_INET. (Should we be defaulting to socket.AF_INET or 0?)

local_addr is equivalent to source_address in sync socket code our sync backend should add .bind(local_addr) if this is not None.

In the asyncio case

Pass family and local_addr through to the asyncio.open_connection call.

In the trio case

Have the backend raise an error if either argument is used, indicating that they're not currently supported. Open an issue or draft a PR with the trio project to see about getting them included in the high-level API.

Have I got that about right? Does that all seem reasonable?

@hynek
Copy link

hynek commented May 27, 2020

Wait, httpx disables IPv6 by default? Changing that is breaking but lucky you're still in 0ver yolo land. :) So yeah, 0.

@tomchristie
Copy link
Member Author

Appears to be an oversight in the sync case, yup...

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

Not for asyncio or trio.

@tomchristie
Copy link
Member Author

Resolved in #97

@njsmith
Copy link

njsmith commented May 27, 2020

Ah yeah, supporting source interfaces in trio.open_tcp_stream is just one of those things that's been bobbing gently in the todo list for the last few years: python-trio/trio#275

There are a few boring API details to figure out but it shouldn't be too hard to add.

@tomchristie
Copy link
Member Author

Closing this off, since we haz these now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants