Implement HTTP proxies and config on Client #259
Pushed a good chunk of code with no tests. Testing is going to be harder than the testing for other dispatchers. Uvicorn doesn't like either flavor of proxy request.
@sethmlarson Do we need to set any special headers such as X-Forwarded-For? |
@florimondmanca No, those types of headers are up to the proxy to set when using the "Forwarding" mode. We just pass along requests that look like this:
to the forward proxy as if it were our target.
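For illustration, here is a minimal sketch of what a forwarded request could look like on the wire. It assumes the usual HTTP/1.1 convention that a forwarding proxy receives the absolute URL as the request target. The helper name `build_forward_request` is hypothetical and is not part of httpx.

```python
from urllib.parse import urlsplit


def build_forward_request(method: str, url: str) -> bytes:
    """Serialize a request as it might be sent to a forwarding proxy.

    The request target is the absolute URL rather than just the path,
    and the Host header still names the final target, not the proxy.
    """
    parts = urlsplit(url)
    lines = [
        f"{method} {url} HTTP/1.1",
        f"Host: {parts.netloc}",
        "",  # blank line that terminates the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


raw = build_forward_request("GET", "http://example.com/path")
print(raw.decode("ascii"))
```

The bytes are written to the proxy's socket exactly as they would be written to the target server; only the request target differs.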
We probably want a dispatcher class that either just echoes back information that it saw in the request, or stores it as state that a test case can inspect. Then for testing we can do something like...

client = httpx.Client(proxies=..., dispatch=MockEchoingDispatch())
response = client.get(...)
data = response.json()
assert data["headers"] == {...}  # Whatevs

It'd be good to check if we're already doing anything like that elsewhere in the tests. At least something along these lines anyway, so that we can do something more like unit testing the proxy dispatcher class, rather than actually making network requests via a proxy.

(Thinking about it a second time, we might need to make the HTTPConnection class configurable, so that we can plug the proxy interface into something other than an actual connection.)
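A rough sketch of the echoing test double described above might look like the following. It deliberately does not use httpx's real dispatcher interface: the `send` signature and the JSON echo format are simplifying assumptions made only to show the record-and-echo idea.

```python
import json
from typing import Dict, List, Tuple


class MockEchoingDispatch:
    """Hypothetical test double: records each request and echoes it back."""

    def __init__(self) -> None:
        self.seen: List[Tuple[str, str, Dict[str, str]]] = []

    def send(self, method: str, url: str, headers: Dict[str, str]) -> str:
        # Store the request as state that a test case can inspect later.
        self.seen.append((method, url, dict(headers)))
        # Echo the request back as a JSON body, as an echo server would.
        return json.dumps({"method": method, "url": url, "headers": headers})


dispatch = MockEchoingDispatch()
body = dispatch.send("GET", "http://example.com/", {"Host": "example.com"})
data = json.loads(body)
assert data["headers"] == {"Host": "example.com"}
```

A test can then assert either on the echoed body or on `dispatch.seen`, without any network traffic.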
@encode/httpx-maintainers I'm having a hard time figuring out how to test these features. Ideally I'd like to have a |
…y-http-forwarding
Left a review as a way for me to better grasp what's going on here. :)
So, let me make sure I understand what these proxies do…
client = httpx.Client(proxies={"http://example.com": HTTPProxy("http://proxy.me")})
From the user perspective, they called

@sethmlarson For testing, I don't think we need to spin up another server. We can very well proxy

client = Client(proxies={"example.com": HTTPProxy("127.0.0.1:8000")})

Or am I missing something?
A couple more things I've just noticed.
Is this going to match the Requests framework as well? i.e.

requests.get('https://example.com', proxies={'https': 'http://123.45.67.8:8080'})

session = requests.session()
session.proxies = {'https': 'http://123.45.67.8:8080'}
session.get('https://example.com')

Because from your examples it appears we have to define specific sites that are to be proxied, according to @florimondmanca's comment with

client = Client(proxies={"example.com": HTTPProxy("127.0.0.1:8000")})
@VeNoMouS It'll work identically to Requests where configuration can be pulled from:
Coolio. One thing we could possibly choose to do would be to have the scheme/hostname mapping split out, rather than be built directly into the client. Eg:
Would probably be abstraction overkill, but perhaps it's worth considering? |
@tomchristie By intuition, I like the separation of concerns that this decomposition you suggested would allow. I think it would help keep the logic neatly isolated at each level. |
Pushed some incremental changes, got TLS tunneling to work properly but then future requests failed on that tunnel. Need to investigate. Also pushed the implementation of the |
Now that I see the code, I don’t think RoutingDispatcher adds much value — other than the bit about sharing of resources which I’m not sure I fully understand — and find the previous implementation to be clearer/more straightforward. This might be a case of abstraction overkill after all? :) |
@florimondmanca @tomchristie Should I revert back to the old implementation then? |
❤️ |
This looks great! Just a few comments from me.
httpx/dispatch/proxy_http.py
Outdated

        "Proxy-Authorization",
        build_basic_auth_header(url.username, url.password),
    )
    self.proxy_url = url.copy_with(authority=url.authority.rpartition("@")[2])
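As background for the snippet above, a Basic `Proxy-Authorization` value is conventionally the base64 encoding of `username:password`. The sketch below shows that convention only. `basic_auth_header` is a hypothetical stand-in, and it is an assumption that httpx's `build_basic_auth_header` behaves similarly.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build a header value using the HTTP Basic authentication scheme."""
    userpass = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(userpass).decode("ascii")
    return f"Basic {token}"


header_value = basic_auth_header("user", "pass")
print(header_value)
```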
Is this related to #328? Should we add a comment?
No, this is how to split userinfo from authority. The rfc3986 package doesn't provide an easy way to work with subauthority components. :(
I'm a bit confused.
If this could just read url.copy_with(authority=url.<some_named_property>) then what is <some_named_property> here? Are we missing a property that we ought to add on our URL model?
It would be the authority minus the auth section. I don't think we're missing any component, just the underlying rfc3986 library doesn't provide the tools to remove the userinfo section cleanly.
Assuming we'd be given url values such as…
- https://domain.net
- https://username:password@domain.net
- https://username@gmail.com:password@domain.net (once fixed)
Isn't this computation the same as url.host (which outputs domain.net)?
Edit: it's not, forget this. :)
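To make the difference concrete, here is a small sketch using the standard library's `urllib.parse` as a stand-in for httpx's URL model (an assumption: the real code works on `url.authority`, not `netloc`). The helper `strip_userinfo` is hypothetical. It shows that dropping everything up to the last "@" keeps the port, whereas a host-only property does not.

```python
from urllib.parse import urlsplit


def strip_userinfo(authority: str) -> str:
    """Return host[:port], dropping any "user:password@" prefix."""
    # rpartition splits on the LAST "@", so an "@" inside the username
    # (e.g. an email address) does not confuse the split.
    return authority.rpartition("@")[2]


urls = [
    "https://domain.net",
    "https://username:password@domain.net:8080",
    "https://username@gmail.com:password@domain.net",
]

for url in urls:
    parts = urlsplit(url)
    print(strip_userinfo(parts.netloc), parts.hostname)
```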
Loads of good work in here! I've taken a thorough look over, but I think it'll make more sense for me to re-review once it's stripped down to just including the proxy dispatcher class, and dealing with any other aspects as a follow up.
I guess specifically it'd end up as(?):
- Adding httpx/dispatch/proxy_http.py
- Adding any necessary corresponding imports, exceptions, and tests.
Okay, reverted a bunch of code so this PR is smaller. I kept the conftest change because it's wrong right now, not hard to review, and I'll need it in the future for proxies. |
Is this feature already usable? If yes, how? Sorry for asking again, but I need this so badly.
One very minor comment otherwise LGTM! Nice job 💯
This is truly great work! A few extra comments on top of @yeraydiazdiaz's.
Approving in advance. :)
Thanks for reviews everyone! :) @florimondmanca in reply to your url.host comment I don't think that property includes the port? If it does then it works. If yall want to you can integrate the changes and merge this. Otherwise I'll be home Sunday night. |
Ah, no,
I don't think we're able to push to your fork, are we? :) |
@florimondmanca Invited! I'll do future work on the main repo |
I just tested this out using a public proxy found here:

async with httpx.HTTPProxy(proxy_url="http://94.177.241.58:8080") as proxy:
    response = await proxy.request("GET", "https://example.com")
    await response.read()
    print(response.text)

And it works great: I get the HTML from example.com 🎉

However, I ran into an issue if I request the proxy via HTTPS. Is this expected behavior? Should I not request the proxy via HTTPS in the first place?

async with httpx.HTTPProxy(proxy_url="https://94.177.241.58:8080") as proxy:
    response = await proxy.request("GET", "https://example.com")
    await response.read()
    print(response.text)

---------------------------------------------------------------------------
SSLError Traceback (most recent call last)
<ipython-input-28-384373994987> in async-def-wrapper()
3 await response.read()
4 print(response.text)
~/Developer/python-projects/httpx/httpx/dispatch/base.py in request(self, method, url, data, params, headers, verify, cert, timeout)
38 ) -> AsyncResponse:
39 request = AsyncRequest(method, url, data=data, params=params, headers=headers)
---> 40 return await self.send(request, verify=verify, cert=cert, timeout=timeout)
41
42 async def send(
~/Developer/python-projects/httpx/httpx/dispatch/proxy_http.py in send(self, request, verify, cert, timeout)
238
239 return await super().send(
--> 240 request=request, verify=verify, cert=cert, timeout=timeout
241 )
242
~/Developer/python-projects/httpx/httpx/dispatch/connection_pool.py in send(self, request, verify, cert, timeout)
116 timeout: TimeoutTypes = None,
117 ) -> AsyncResponse:
--> 118 connection = await self.acquire_connection(origin=request.url.origin)
119 try:
120 response = await connection.send(
~/Developer/python-projects/httpx/httpx/dispatch/proxy_http.py in acquire_connection(self, origin)
87 f"tunnel_connection proxy_url={self.proxy_url!r} origin={origin!r}"
88 )
---> 89 return await self.tunnel_connection(origin)
90
91 async def tunnel_connection(self, origin: Origin) -> HTTPConnection:
~/Developer/python-projects/httpx/httpx/dispatch/proxy_http.py in tunnel_connection(self, origin)
96
97 if connection is None:
---> 98 connection = await self.request_tunnel_proxy_connection(origin)
99
100 # After we receive the 2XX response from the proxy that our
~/Developer/python-projects/httpx/httpx/dispatch/proxy_http.py in request_tunnel_proxy_connection(self, origin)
135
136 # See if our tunnel has been opened successfully
--> 137 proxy_response = await connection.send(proxy_request)
138 logger.debug(
139 f"tunnel_response "
~/Developer/python-projects/httpx/httpx/dispatch/connection.py in send(self, request, verify, cert, timeout)
57 ) -> AsyncResponse:
58 if self.h11_connection is None and self.h2_connection is None:
---> 59 await self.connect(verify=verify, cert=cert, timeout=timeout)
60
61 if self.h2_connection is not None:
~/Developer/python-projects/httpx/httpx/dispatch/connection.py in connect(self, verify, cert, timeout)
86
87 logger.debug(f"start_connect host={host!r} port={port!r} timeout={timeout!r}")
---> 88 stream = await self.backend.connect(host, port, ssl_context, timeout)
89 http_version = stream.get_http_version()
90 logger.debug(f"connected http_version={http_version!r}")
~/Developer/python-projects/httpx/httpx/concurrency/asyncio.py in connect(self, hostname, port, ssl_context, timeout)
187 stream_reader, stream_writer = await asyncio.wait_for( # type: ignore
188 asyncio.open_connection(hostname, port, ssl=ssl_context),
--> 189 timeout.connect_timeout,
190 )
191 except asyncio.TimeoutError:
~/.pyenv/versions/3.7.3/lib/python3.7/asyncio/tasks.py in wait_for(fut, timeout, loop)
414
415 if fut.done():
--> 416 return fut.result()
417 else:
418 fut.remove_done_callback(cb)
~/.pyenv/versions/3.7.3/lib/python3.7/asyncio/streams.py in open_connection(host, port, loop, limit, **kwds)
75 protocol = StreamReaderProtocol(reader, loop=loop)
76 transport, _ = await loop.create_connection(
---> 77 lambda: protocol, host, port, **kwds)
78 writer = StreamWriter(transport, protocol, reader, loop)
79 return reader, writer
~/.pyenv/versions/3.7.3/lib/python3.7/asyncio/base_events.py in create_connection(self, protocol_factory, host, port, ssl, family, proto, flags, sock, local_addr, server_hostname, ssl_handshake_timeout)
984 transport, protocol = await self._create_connection_transport(
985 sock, protocol_factory, ssl, server_hostname,
--> 986 ssl_handshake_timeout=ssl_handshake_timeout)
987 if self._debug:
988 # Get the socket from the transport because SSL transport closes
~/.pyenv/versions/3.7.3/lib/python3.7/asyncio/base_events.py in _create_connection_transport(self, sock, protocol_factory, ssl, server_hostname, server_side, ssl_handshake_timeout)
1012
1013 try:
-> 1014 await waiter
1015 except:
1016 transport.close()
~/.pyenv/versions/3.7.3/lib/python3.7/asyncio/sslproto.py in data_received(self, data)
524
525 try:
--> 526 ssldata, appdata = self._sslpipe.feed_ssldata(data)
527 except Exception as e:
528 self._fatal_error(e, 'SSL error in data received')
~/.pyenv/versions/3.7.3/lib/python3.7/asyncio/sslproto.py in feed_ssldata(self, data, only_handshake)
187 if self._state == _DO_HANDSHAKE:
188 # Call do_handshake() until it doesn't raise anymore.
--> 189 self._sslobj.do_handshake()
190 self._state = _WRAPPED
191 if self._handshake_cb:
~/.pyenv/versions/3.7.3/lib/python3.7/ssl.py in do_handshake(self)
761 def do_handshake(self):
762 """Start the SSL/TLS handshake."""
--> 763 self._sslobj.do_handshake()
764
765 def unwrap(self):
SSLError: [SSL: UNKNOWN_PROTOCOL] unknown protocol (_ssl.c:1056)
You might have to use |
Also, this is a good example of where our library could give more context on TLS issues. "Unknown protocol" is useless, but "Hey, we got bytes in the handshake we don't understand" is a lot more actionable.
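One way such context could be added is sketched below. The function name `describe_tls_failure` and the message wording are hypothetical. The choice of OpenSSL reason strings to match is also an assumption: `UNKNOWN_PROTOCOL` is taken from the traceback above, and `WRONG_VERSION_NUMBER` is what newer OpenSSL builds tend to report in the same situation.

```python
import ssl


def describe_tls_failure(exc: ssl.SSLError, host: str, port: int) -> str:
    """Return a more actionable message for a failed TLS handshake."""
    text = str(exc)
    # Assumption: these reason strings usually mean the peer answered with
    # bytes that are not a TLS handshake, for example plain HTTP.
    if "UNKNOWN_PROTOCOL" in text or "WRONG_VERSION_NUMBER" in text:
        return (
            f"TLS handshake with {host}:{port} failed: the server replied "
            f"with bytes that do not look like TLS. Is it a plain HTTP "
            f"endpoint? Try an http:// URL instead. (original error: {text})"
        )
    return f"TLS handshake with {host}:{port} failed: {text}"


error = ssl.SSLError(1, "[SSL: UNKNOWN_PROTOCOL] unknown protocol (_ssl.c:1056)")
print(describe_tls_failure(error, "proxy.example", 8080))
```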
Can we use this feature with the same API as Requests? Like this:

proxies = {
    'https': 'http://94.177.241.58:8080',
    'http': 'http://94.177.241.58:8080'
}
client = httpx.AsyncClient()
resp = await client.get('https://httpbin.org/headers', proxies=proxies)
print(resp.json())
@x0day Not quite, this PR only adds a dispatcher and none of the code required to configure proxies on the client. That PR will come right after this one lands, as the work is already complete and just needs to be reviewed.
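For context, the client-side piece that is still missing amounts to mapping each outgoing URL to a proxy. The sketch below is loosely modeled on the Requests-style `proxies` dictionary from the question. The function `select_proxy`, the key forms, and the lookup order are assumptions for illustration, and the follow-up PR may work differently.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit


def select_proxy(url: str, proxies: Dict[str, str]) -> Optional[str]:
    """Pick a proxy URL for `url`, preferring the most specific key."""
    parts = urlsplit(url)
    candidates = [
        f"{parts.scheme}://{parts.hostname}",  # e.g. "https://example.com"
        parts.scheme,  # e.g. "https"
        "all",  # catch-all entry
    ]
    for key in candidates:
        if key in proxies:
            return proxies[key]
    return None  # no match: connect directly


proxies = {
    "https": "http://94.177.241.58:8080",
    "http": "http://94.177.241.58:8080",
}
print(select_proxy("https://httpbin.org/headers", proxies))
```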
Thanks to all reviewers! |
I've been able to successfully use this dispatcher against a public HTTP forwarding proxy and it works properly. :)
Need to still figure out how I'm going to test this.