Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix tunnel proxy: HTTP requests only #57

Merged
merged 18 commits into from Apr 29, 2020

Conversation

yeraydiazdiaz
Copy link
Contributor

Fixes #54 split from #55 excluding support for tunneling HTTPS requests. Changes:

  • First establish an HTTP connection to our proxy since CONNECT is an HTTP method handled by h11.
  • Once successful discard the connection but prevent closing the socket.
  • Allow passing a socket to the AsyncHTTPConnection and create a new h11 connection to the target.

@yeraydiazdiaz yeraydiazdiaz requested a review from a team April 12, 2020 09:51
@yeraydiazdiaz yeraydiazdiaz changed the title Fix tunnel proxy Fix tunnel proxy: HTTP requests only Apr 12, 2020
httpcore/_async/http11.py Show resolved Hide resolved
httpcore/_async/http_proxy.py Outdated Show resolved Hide resolved
httpcore/_async/http_proxy.py Outdated Show resolved Hide resolved
Copy link
Member

@florimondmanca florimondmanca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass… :-)

httpcore/_async/http_proxy.py Outdated Show resolved Hide resolved
httpcore/_async/http_proxy.py Outdated Show resolved Hide resolved
httpcore/_async/http_proxy.py Outdated Show resolved Hide resolved
@florimondmanca
Copy link
Member

@yeraydiazdiaz I'm actually not sure I understand the original bug in #54. My hypothesis is that closing the proxy pool calls .aclose() on the connection which we didn't add using .add_to_pool(), so the semaphore was not acquired. But I'm not quite sure. What's your diagnosis?

@florimondmanca
Copy link
Member

@yeraydiazdiaz Okay, after playing with this myself, I ended up to the same conclusion: because h11.PAUSED moves the connection to a permanently "can't be used anymore" state (and this is now also enforced by our own state machine, which we didn't have before in HTTPX), we need to setup a new connection once CONNECT has succeeded, but without closing the socket. The socket parameter seems to be the way to go.

@yeraydiazdiaz
Copy link
Contributor Author

@yeraydiazdiaz I'm actually not sure I understand the original bug in #54. My hypothesis is that closing the proxy pool calls .aclose() on the connection which we didn't add using .add_to_pool(), so the semaphore was not acquired. But I'm not quite sure. What's your diagnosis?

So there were several problems before. The first one was an assertion in the AsyncHTTPConnection preventing sending requests to URLs different from the originally passed origin, the connection was created with the target origin but the CONNECT requests was attempted to the proxy origin.

Working around that I also found that consuming the response body closed the connection and the socket. Eventually arriving at we discussed above.

@florimondmanca
Copy link
Member

florimondmanca commented Apr 12, 2020

Eventually arriving at we discussed above.

Yep. Followed the same series of thoughts. :-)

@florimondmanca
Copy link
Member

florimondmanca commented Apr 12, 2020

@yeraydiazdiaz Are you able to proxy an HTTP request via Squid/mitmproxy with the current state of this PR?

Right now on Squid I get a 503 on the CONNECT request, not sure if it's because we're missing something here or if I have something else to setup on Squid…

[(b'server', b'squid/3.5.27'),
 (b'mime-version', b'1.0'),
 (b'date', b'Sun, 12 Apr 2020 17:03:08 GMT'),
 (b'content-type', b'text/html;charset=utf-8'),
 (b'content-length', b'3481'),
 (b'x-squid-error', b'ERR_CONNECT_FAIL 99'),
 (b'vary', b'Accept-Language'),
 (b'content-language', b'en')]
Traceback (most recent call last):
  File "debug/client.py", line 16, in <module>
    asyncio.run(main())
  File "/Users/florimond/.pyenv/versions/3.8.2/lib/python3.8/asyncio/runners.py", line 43, in run
    return loop.run_until_complete(main)
  File "/Users/florimond/.pyenv/versions/3.8.2/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "debug/client.py", line 10, in main
    response = await http.request(b"GET", url)
  File "/Users/florimond/Developer/python-projects/httpcore/httpcore/_async/http_proxy.py", line 83, in request
    return await self._tunnel_request(
  File "/Users/florimond/Developer/python-projects/httpcore/httpcore/_async/http_proxy.py", line 175, in _tunnel_request
    raise ProxyError(msg)
httpcore._exceptions.ProxyError: 503 Service Unavailable

Simplified client.py I'm using:

import asyncio
import httpcore

proxy_origin = (b"http", b"localhost", 3128)
url = (b"http", b"localhost", 8000, b"/")


async def main() -> None:
    async with httpcore.AsyncHTTPProxy(proxy_origin, proxy_mode="TUNNEL_ONLY") as http:
        response = await http.request(b"GET", url)
        print(response)
        status_code = response[1]
        assert status_code == 200, status_code


asyncio.run(main())

Uvicorn server on localhost:8000:

from starlette.responses import PlainTextResponse

app = PlainTextResponse("Hello, world!")

@yeraydiazdiaz
Copy link
Contributor Author

Your example using Starlette does not work using Squid, but it does when using mitmproxy without any options.

import asyncio
import httpcore

proxy_origin = (b"http", b"localhost", 8080)
url = (b"http", b"localhost", 8000, b"/")


async def main() -> None:
    async with httpcore.AsyncHTTPProxy(proxy_origin, proxy_mode="TUNNEL_ONLY") as http:
        response = await http.request(b"GET", url, headers=[(b"host", b"localhost")])
        print(response)
        status_code = response[1]
        assert status_code == 200, status_code


asyncio.run(main())

It also works fine when fetching example.org.

Squid doesn't really log a lot with our current configuration so it's hard to say what's wrong there.

Copy link
Member

@florimondmanca florimondmanca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second pass, bunch of nits though, I think we're on the right track :-) I was able to make it work with mitmproxy too!

About proxy Host headers: actually, examples in RFC 7321 suggest that CONNECT should also contain a Host header, so I think this is a legit constraint to put on users, and definitely not an h11 edge case we should hide (because it's not an edge case).

httpcore/_async/connection.py Outdated Show resolved Hide resolved
httpcore/_async/http_proxy.py Outdated Show resolved Hide resolved
httpcore/_async/http_proxy.py Outdated Show resolved Hide resolved
httpcore/_async/http_proxy.py Outdated Show resolved Hide resolved
httpcore/_async/http_proxy.py Outdated Show resolved Hide resolved
httpcore/_backends/asyncio.py Show resolved Hide resolved
httpcore/_async/http_proxy.py Outdated Show resolved Hide resolved
httpcore/_async/connection.py Outdated Show resolved Hide resolved
httpcore/_async/connection.py Show resolved Hide resolved
httpcore/_async/connection.py Show resolved Hide resolved
@florimondmanca
Copy link
Member

florimondmanca commented Apr 13, 2020

@yeraydiazdiaz Sorry if this is taking longer to review than you expected. 😅

This is great! stuff, and I think we're almost there.

Yes, currently proxying is broken in HTTPX, and it's a regression, but that's alright. We haven't released anything yet, and previous versions are still mostly working, so no pressure on that side.

This is why I think we should take the time to dissolve any fuzzy areas, make sure we understand everything that's going on (I'm discovering more and more of the new subtle aspects of HTTPCore as we're going through this), and that we enforce the "raw request in, raw response out"/tight-scope philosophy of HTTPCore, i.e. keeping the boundary between it and client libraries as clearly defined as possible.

Hope you're okay with this — I know long reviews can be a bit draining, so feel free to push back when I'm being too nitpicky as I know I can sometimes be. :-) Thanks again for tackling this 💖

@yeraydiazdiaz
Copy link
Contributor Author

Thank you for reviewing! 🌟

It's absolutely fine, I agree we should take time to get things right, we've only just started to make actual use of HTTPCore so I expected some "where should this live" types of discussions, we're bound to have more of these, so the sooner we can define these lines the better 🙂

ssl_context: SSLContext = None,
max_connections: int = None,
max_keepalive: int = None,
keepalive_expiry: float = None,
http2: bool = False,
):
assert proxy_mode in ("DEFAULT", "FORWARD_ONLY", "TUNNEL_ONLY")
assert proxy_mode in ("FORWARD", "TUNNEL")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the _ONLY suffix here which does not match the values in HTTPX but more semantic sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to leave these the same, at least within the scope of this pull request?
It'd be helpful if we're not introducing breaking API changes from httpx 0.12 -> 0.13 wherever possible. (excepting for example, the UDS support)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this change stems from a longer conversation with @florimondmanca. TL;DR is where do we handle the logic behind the DEFAULT mode?

I opted for removing it completely as suggested by Florimond and have httpx call into httpcore with a mode explicitly.

I think this way we reduces complexity on httpcore but I don't mind keeping the three modes and the logic behind here if you prefer it. What do you think?

@yeraydiazdiaz
Copy link
Contributor Author

@florimondmanca could you have another look at this when you have a second? Thanks!

@florimondmanca
Copy link
Member

Will try to do soon :)

@yeraydiazdiaz
Copy link
Contributor Author

I reintroduced the DEFAULT mode and *_ONLY proxy modes. The default mode is now an alias for TUNNEL_ONLY for all requests, as opposed to only on HTTPS ones. But frankly I'm not really sure how to proceed here and I'm a bit confused by the feedback in this PR.

At the moment httpx will fail to make a request via a proxy due to missing/conflicting headers.

In #59 and here we discussed httpcore should have a small as possible API and httpx should be in charge of setting the parameters are required. We discussed in this PR that it doesn't make much sense to have a DEFAULT mode here or the logic to add the proxy headers as required. Of course that means changing httpx to introduce the necessary changes there but I think it's worth it because we're moving to a smaller API surface in httpcore.

What are the next steps for this? @florimondmanca @tomchristie

@tomchristie
Copy link
Member

Okay, so I think I understand the sticking point better here now. In order to keep the conversation more tightly scoped, let me start by making one inline suggestion, and see if we can move forward from there...

httpcore/_async/http_proxy.py Outdated Show resolved Hide resolved
httpcore/_async/http_proxy.py Outdated Show resolved Hide resolved
Copy link
Member

@tomchristie tomchristie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thanks so much for your time on it.

🌟👍

@yeraydiazdiaz yeraydiazdiaz merged commit b65eaef into encode:master Apr 29, 2020
@yeraydiazdiaz
Copy link
Contributor Author

🎉 Thanks @tomchristie and @florimondmanca! 🚀

@yeraydiazdiaz yeraydiazdiaz deleted the tunnel-proxy-fix branch April 29, 2020 14:47
This was referenced Apr 29, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TUNNEL_ONLY proxy mode fails
4 participants