Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement HTTP proxies and config on Client #259

Merged
merged 20 commits into from Sep 15, 2019
Merged

Implement HTTP proxies and config on Client #259

merged 20 commits into from Sep 15, 2019

Conversation

sethmlarson
Copy link
Contributor

@sethmlarson sethmlarson commented Aug 21, 2019

I've been able to successfully use this dispatcher against a public HTTP forwarding proxy and it works properly. :)

Need to still figure out how I'm going to test this.

@sethmlarson sethmlarson changed the title Initial implementation of HTTP Forwarding Proxy Implement HTTP proxies and config on Client Aug 29, 2019
@sethmlarson
Copy link
Contributor Author

Pushed a good chunk of code with no tests. Testing is going to be harder than the testing for other dispatchers. Uvcorn doesn't like either flavor of proxy request.

@florimondmanca
Copy link
Member

@sethmlarson Do we need to set any special headers such as X-Forwarded-For?

@sethmlarson
Copy link
Contributor Author

@florimondmanca No those types of headers are up to the proxy to set when using the "Forwarding" mode. We just pass along a requests that look like this:

GET http://en.wikipedia.org/wiki/Proxy_server HTTP/1.1
Proxy-Authorization: Basic encoded-credentials
Accept: text/html

to the forward proxy as if it were our target.

@tomchristie
Copy link
Member

Testing is going to be harder than the testing for other dispatchers.

We probably want a dispatcher class that either just echos back information that it saw in the request, or stores it as state that a test case can inspect. Then for testing we can do something like...

client = httpx.Client(proxies=..., dispatch=MockEchoingDispatch())
response = client.get(...)
data = response.json()
assert data["headers"] == {...}  # Whatevs

It'd be good to check if we're already doing anything like that elsewhere in the tests.

At least something along these lines anyways, so that we can do something more like unit testing the proxy dispatcher class, rather than actually making network requests via a proxy.

(Thinking about it a second time, we might need to make the HTTPConnection class configurable, so that we can plug the proxy interface into something other than an actual connection)

@sethmlarson
Copy link
Contributor Author

@encode/httpx-maintainers I'm having a hard time figuring out how to test these features. Ideally I'd like to have a proxy fixture just like server or https_server but I don't know if that's possible with uvicorn. Could someone point me in the right direction or have any thoughts?

Copy link
Member

@florimondmanca florimondmanca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a review as a way for me to better grasp what's going on here. :)

httpx/dispatch/connection_pool.py Show resolved Hide resolved
httpx/client.py Outdated Show resolved Hide resolved
httpx/dispatch/proxy_http.py Outdated Show resolved Hide resolved
httpx/middleware.py Outdated Show resolved Hide resolved
httpx/utils.py Outdated Show resolved Hide resolved
@florimondmanca
Copy link
Member

florimondmanca commented Sep 2, 2019

So, let me make sure I understand what these proxies do…

  • User builds:
client = httpx.Client(proxies={"http://example.com": HTTPProxy("http://proxy.me")})
  • User calls client.get("http://example.com/resource").
  • Client finds that we have a proxy at http://proxy.com registered for http://example.com.
  • Proxy dispatcher updates the request to replace the URL with http://proxy.me/resource, and sends the request there.
  • Client receives the response from http://proxy.me/resource.

From the user perspective, they called example.com, and from example.com's perspective, they got a request from proxy.me. I think I've got all the pieces together. 😄

@sethmlarson For testing, I don't think we need to spin up another server. We can very well proxy example.com to 127.0.0.1:8000 (where the existing uvicorn server lives) and e.g. make a request to example.com/status/200 and make sure we get 200 back as a response, right?

client = Client(proxies={"example.com": HTTPProxy("127.0.0.1:8000")})

Or am I missing something?

Copy link
Member

@florimondmanca florimondmanca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple more things I've just noticed.

httpx/client.py Outdated Show resolved Hide resolved
httpx/client.py Outdated Show resolved Hide resolved
@VeNoMouS
Copy link

VeNoMouS commented Sep 2, 2019

Is this going to match requests frame work as well?

ie

requests.get('https://example.com', proxies={'https': 'http://123.45.67.8:8080'})
session = requests.session()
session.proxies = {'https': 'http://123.45.67.8:8080'}
session.get('https://example.com')

because from your examples it appears we have to define specific sites that are to be proxied?

according to @florimondmanca 's comment with

client = Client(proxies={"example.com": HTTPProxy("127.0.0.1:8000")})

@sethmlarson
Copy link
Contributor Author

@VeNoMouS It'll work identically to Requests where configuration can be pulled from:

  • (scheme)://(hostname)
  • all://(hostname)
  • (scheme)
  • all

@tomchristie
Copy link
Member

Coolio. One thing we could possibly choose to do would be to have the scheme/hostname mapping split out, rather than be built directly into the client. Eg:

  • RoutingDispatcher() - Takes a map of {scheme/hostname strings: dispatcher instance} (plus a default) and hands off the request to the correct dispatcher.
  • ProxyDispatcher() - Makes proxy requests.
  • ConnectionPool() - Makes standard requests.

Would probably be abstraction overkill, but perhaps it's worth considering?

httpx/middleware.py Outdated Show resolved Hide resolved
@florimondmanca
Copy link
Member

florimondmanca commented Sep 3, 2019

@tomchristie By intuition, I like the separation of concerns that this decomposition you suggested would allow. I think it would help keep the logic neatly isolated at each level.

@sethmlarson
Copy link
Contributor Author

Pushed some incremental changes, got TLS tunneling to work properly but then future requests failed on that tunnel. Need to investigate.

Also pushed the implementation of the RoutingDispatcher, see what you think.

@florimondmanca
Copy link
Member

florimondmanca commented Sep 4, 2019

Now that I see the code, I don’t think RoutingDispatcher adds much value — other than the bit about sharing of resources which I’m not sure I fully understand — and find the previous implementation to be clearer/more straightforward. This might be a case of abstraction overkill after all? :)

@sethmlarson
Copy link
Contributor Author

@florimondmanca @tomchristie Should I revert back to the old implementation then?

@sethmlarson sethmlarson marked this pull request as ready for review September 6, 2019 03:26
@VeNoMouS
Copy link

VeNoMouS commented Sep 8, 2019

❤️

Copy link
Contributor

@yeraydiazdiaz yeraydiazdiaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Just a few comments from me.

httpx/dispatch/proxy_http.py Show resolved Hide resolved
httpx/dispatch/proxy_http.py Outdated Show resolved Hide resolved
tests/models/test_url.py Outdated Show resolved Hide resolved
tests/dispatch/utils.py Show resolved Hide resolved
"Proxy-Authorization",
build_basic_auth_header(url.username, url.password),
)
self.proxy_url = url.copy_with(authority=url.authority.rpartition("@")[2])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this related to #328? Should we add a comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is how to split userinfo from authority. Rfc3986 package doesn't provide an easy way to work with subauthority components. :(

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused.

If this could just read url.copy_with(authority=url.<some_named_property>) then what is <some_named_property> here? Are we missing a property that we ought to add on our URL model?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be the authority minus the auth section. I don't think we're missing any component, just the underlying rfc3986 library doesn't provide the tools to remove the userinfo section cleanly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming we'd be given url values such as…

Isn't this computation the same as url.host (which outputs domain.net)?

Edit: it's not, forget this. :)

Copy link
Member

@tomchristie tomchristie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Loads of good work in here! I've taken a thorough look over, but I think it'll make more sense for me to re-review once it's stripped down to to just including the proxy dispatcher class, and dealing with any other aspects as a follow up.

I guess specifically it'd end up as(?):

  • Adding httpx/dispatch/proxy_http.py
  • Adding any neccessary corresponding imports, exceptions, and tests.

@sethmlarson
Copy link
Contributor Author

Okay, reverted a bunch of code so this PR is smaller. I kept the conftest change because it's wrong right now, not hard to review, and I'll need it in the future for proxies.

@qwerty32123
Copy link

Is these feature already usable? If yes how? Sorry for asking again but i need these so badly

Copy link
Contributor

@yeraydiazdiaz yeraydiazdiaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One very minor comment otherwise LGTM! Nice job 💯

tests/dispatch/test_proxy_http.py Outdated Show resolved Hide resolved
Copy link
Member

@florimondmanca florimondmanca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is truly great work! A few extra comments on top of @yeraydiazdiaz's.

httpx/dispatch/proxy_http.py Show resolved Hide resolved
"Proxy-Authorization",
build_basic_auth_header(url.username, url.password),
)
self.proxy_url = url.copy_with(authority=url.authority.rpartition("@")[2])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming we'd be given url values such as…

Isn't this computation the same as url.host (which outputs domain.net)?

Edit: it's not, forget this. :)

Copy link
Member

@florimondmanca florimondmanca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving in advance. :)

@sethmlarson
Copy link
Contributor Author

Thanks for reviews everyone! :)

@florimondmanca in reply to your url.host comment I don't think that property includes the port? If it does then it works.

If yall want to you can integrate the changes and merge this. Otherwise I'll be home Sunday night.

@florimondmanca
Copy link
Member

@florimondmanca in reply to your url.host comment I don't think that property includes the port? If it does then it works.

Ah, no, url.host on https://domain.net:4242 only gives domain.net. I guess we're fine then.

If yall want to you can integrate the changes and merge this.

I don't think we're able to push to your fork, are we? :)

@sethmlarson
Copy link
Contributor Author

@florimondmanca Invited! I'll do future work on the main repo

@florimondmanca
Copy link
Member

florimondmanca commented Sep 13, 2019

I just tested this out using a public proxy found here:

async with httpx.HTTPProxy(proxy_url="http://94.177.241.58:8080") as proxy: 
    response = await proxy.request("GET", "https://example.com") 
    await response.read()
    print(response.text)

And it works great — I get the HTML from example.com 🎉

However, I ran into an issue if I request the proxy via HTTPS. Is this expected behavior? Should I not request the proxy via HTTPS in the first place?

async with httpx.HTTPProxy(proxy_url="https://94.177.241.58:8080") as proxy: 
    response = await proxy.request("GET", "https://example.com") 
    await response.read()
    print(response.text)
---------------------------------------------------------------------------
SSLError                                  Traceback (most recent call last)
<ipython-input-28-384373994987> in async-def-wrapper()
      3     await response.read()
      4     print(response.text)

~/Developer/python-projects/httpx/httpx/dispatch/base.py in request(self, method, url, data, params, headers, verify, cert, timeout)
     38     ) -> AsyncResponse:
     39         request = AsyncRequest(method, url, data=data, params=params, headers=headers)
---> 40         return await self.send(request, verify=verify, cert=cert, timeout=timeout)
     41 
     42     async def send(

~/Developer/python-projects/httpx/httpx/dispatch/proxy_http.py in send(self, request, verify, cert, timeout)
    238 
    239         return await super().send(
--> 240             request=request, verify=verify, cert=cert, timeout=timeout
    241         )
    242 

~/Developer/python-projects/httpx/httpx/dispatch/connection_pool.py in send(self, request, verify, cert, timeout)
    116         timeout: TimeoutTypes = None,
    117     ) -> AsyncResponse:
--> 118         connection = await self.acquire_connection(origin=request.url.origin)
    119         try:
    120             response = await connection.send(

~/Developer/python-projects/httpx/httpx/dispatch/proxy_http.py in acquire_connection(self, origin)
     87                 f"tunnel_connection proxy_url={self.proxy_url!r} origin={origin!r}"
     88             )
---> 89             return await self.tunnel_connection(origin)
     90 
     91     async def tunnel_connection(self, origin: Origin) -> HTTPConnection:

~/Developer/python-projects/httpx/httpx/dispatch/proxy_http.py in tunnel_connection(self, origin)
     96 
     97         if connection is None:
---> 98             connection = await self.request_tunnel_proxy_connection(origin)
     99 
    100             # After we receive the 2XX response from the proxy that our

~/Developer/python-projects/httpx/httpx/dispatch/proxy_http.py in request_tunnel_proxy_connection(self, origin)
    135 
    136         # See if our tunnel has been opened successfully
--> 137         proxy_response = await connection.send(proxy_request)
    138         logger.debug(
    139             f"tunnel_response "

~/Developer/python-projects/httpx/httpx/dispatch/connection.py in send(self, request, verify, cert, timeout)
     57     ) -> AsyncResponse:
     58         if self.h11_connection is None and self.h2_connection is None:
---> 59             await self.connect(verify=verify, cert=cert, timeout=timeout)
     60 
     61         if self.h2_connection is not None:

~/Developer/python-projects/httpx/httpx/dispatch/connection.py in connect(self, verify, cert, timeout)
     86 
     87         logger.debug(f"start_connect host={host!r} port={port!r} timeout={timeout!r}")
---> 88         stream = await self.backend.connect(host, port, ssl_context, timeout)
     89         http_version = stream.get_http_version()
     90         logger.debug(f"connected http_version={http_version!r}")

~/Developer/python-projects/httpx/httpx/concurrency/asyncio.py in connect(self, hostname, port, ssl_context, timeout)
    187             stream_reader, stream_writer = await asyncio.wait_for(  # type: ignore
    188                 asyncio.open_connection(hostname, port, ssl=ssl_context),
--> 189                 timeout.connect_timeout,
    190             )
    191         except asyncio.TimeoutError:

~/.pyenv/versions/3.7.3/lib/python3.7/asyncio/tasks.py in wait_for(fut, timeout, loop)
    414 
    415         if fut.done():
--> 416             return fut.result()
    417         else:
    418             fut.remove_done_callback(cb)

~/.pyenv/versions/3.7.3/lib/python3.7/asyncio/streams.py in open_connection(host, port, loop, limit, **kwds)
     75     protocol = StreamReaderProtocol(reader, loop=loop)
     76     transport, _ = await loop.create_connection(
---> 77         lambda: protocol, host, port, **kwds)
     78     writer = StreamWriter(transport, protocol, reader, loop)
     79     return reader, writer

~/.pyenv/versions/3.7.3/lib/python3.7/asyncio/base_events.py in create_connection(self, protocol_factory, host, port, ssl, family, proto, flags, sock, local_addr, server_hostname, ssl_handshake_timeout)
    984         transport, protocol = await self._create_connection_transport(
    985             sock, protocol_factory, ssl, server_hostname,
--> 986             ssl_handshake_timeout=ssl_handshake_timeout)
    987         if self._debug:
    988             # Get the socket from the transport because SSL transport closes

~/.pyenv/versions/3.7.3/lib/python3.7/asyncio/base_events.py in _create_connection_transport(self, sock, protocol_factory, ssl, server_hostname, server_side, ssl_handshake_timeout)
   1012 
   1013         try:
-> 1014             await waiter
   1015         except:
   1016             transport.close()

~/.pyenv/versions/3.7.3/lib/python3.7/asyncio/sslproto.py in data_received(self, data)
    524 
    525         try:
--> 526             ssldata, appdata = self._sslpipe.feed_ssldata(data)
    527         except Exception as e:
    528             self._fatal_error(e, 'SSL error in data received')

~/.pyenv/versions/3.7.3/lib/python3.7/asyncio/sslproto.py in feed_ssldata(self, data, only_handshake)
    187             if self._state == _DO_HANDSHAKE:
    188                 # Call do_handshake() until it doesn't raise anymore.
--> 189                 self._sslobj.do_handshake()
    190                 self._state = _WRAPPED
    191                 if self._handshake_cb:

~/.pyenv/versions/3.7.3/lib/python3.7/ssl.py in do_handshake(self)
    761     def do_handshake(self):
    762         """Start the SSL/TLS handshake."""
--> 763         self._sslobj.do_handshake()
    764 
    765     def unwrap(self):

SSLError: [SSL: UNKNOWN_PROTOCOL] unknown protocol (_ssl.c:1056)

@sethmlarson
Copy link
Contributor Author

You might have to use http:// for your proxy URL. HTTPS still will work after the TCP tunnel is established, I'm thinking that error message for unknown protocol is coming from getting unexpected bytes during the handshake stage, might be HTTP body!

@sethmlarson
Copy link
Contributor Author

sethmlarson commented Sep 14, 2019

Also this is a good time where context can be given to TLS issues within our library. Unknown protocol is useless but "Hey we got bytes in the handshake we don't understand" ans a lot more actionable.

@x0day
Copy link

x0day commented Sep 14, 2019

@sethmlarson

can we use this feature in same api of requests?

like this,

proxies = {
    'https': 'http://94.177.241.58:8080',
    'http': 'http://94.177.241.58:8080'
}

client = httpx.AsyncClient()

resp = await client.get('https://httpbin.org/headers', proxies=proxies)
print(resp.json())

@sethmlarson
Copy link
Contributor Author

@x0day Not quite, this code only adds a dispatcher and none of the choice required to configure proxies on the client. That PR will come right after this one lands as the work is already complete, just needs to be reviewed.

@sethmlarson sethmlarson merged commit 57c27ca into encode:master Sep 15, 2019
@sethmlarson sethmlarson deleted the proxy-http-forwarding branch September 15, 2019 15:47
@sethmlarson
Copy link
Contributor Author

Thanks to all reviewers!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants