This repository has been archived by the owner on Jan 13, 2021. It is now read-only.

Implement HTTPS proxy support #322

Merged
merged 2 commits into python-hyper:development from KostyaEsmukov:secure_proxy on Jun 21, 2017

Conversation

@KostyaEsmukov (Contributor) commented May 6, 2017

Related issue: #304

This is very much a work in progress. It seems to work, but I would like your feedback on whether I'm on the right track.

The main concern I have is about httplib: is it OK to use it just for sending the CONNECT method to a proxy? Though the HTTP/2 standard defines the CONNECT method as well, I think it would be overkill to use it just to get a raw socket from a proxy.

TODO:

  • Requests adapter: respect custom request ports when choosing a proxy
  • Validate proxy support on h2 without TLS (h2c) (should we even think about it? I'm not sure)
  • Raise appropriate IO exceptions to distinguish a connection failure to the proxy from a general IO exception (to be done in a follow-up PR)
  • Implement proxy authorization support
  • Strip Proxy-* headers from the response when HTTP proxying (seems unneeded; requests doesn't do it)

test/server.py Outdated
@@ -78,7 +80,7 @@ def _start_server(self):
sock.close()

def _wrap_socket(self, sock):
raise NotImplementedError()
return self.cxt.wrap_socket(sock, server_side=True)
Contributor Author:

Is this OK? What was the purpose of this method? I couldn't find any references to it.

Member:

Good question, I have no idea.

sock.send(b'HTTP/1.0 200 Connection established\r\n\r\n')

# todo make this less ugly?
sock = self.server_thread._wrap_socket(sock)
Contributor Author:

I feel really bad about this. There must be a better way to wrap the server socket. Any ideas?

Member:

I think this is fine, TBH. Maybe promote _wrap_socket to wrap_socket, or even pull that wrap_socket method off the server thread and make it available as a test helper function instead.

@Lukasa (Member) left a comment:

So, the first note is: no, don't use httplib. =) The codebase has an HTTP/1.1 stack, you should use that. 😄

@@ -71,3 +71,10 @@ class MissingCertFile(Exception):
The certificate file could not be found.
"""
pass


class ProxyError(Exception):
Contributor Author:

I feel that it should extend a more generic ConnectionError exception, but there's no such common exception. There is, however, hyper.http20.exceptions.ConnectionError. Maybe we could introduce a common one here?

Member:

I'm happy to introduce a common ConnectionError that the HTTP/2 one subclasses.
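
For illustration, a minimal sketch of the hierarchy being agreed on here: a shared base in hyper.common.exceptions that the HTTP/2 error then inherits. The module layout mirrors hyper's existing packages, but the docstrings and the exact rewiring of the HTTP/2 exception are assumptions, not the final patch.

# hyper/common/exceptions.py -- shared base (sketch)
class ConnectionError(Exception):
    """
    An error occurred during connection to a host.
    """
    pass


# hyper/http20/exceptions.py -- HTTP/2 error inherits the common base (sketch)
from ..common.exceptions import ConnectionError as CommonConnectionError


class ConnectionError(CommonConnectionError):
    """
    The remote party violated the HTTP/2 connection rules.
    """
    pass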

Contributor Author:

There is a hyper.common.exceptions.SocketError, which is not referenced anywhere in the project. It's very similar to the ConnectionError I've just introduced, I guess.

I suggest leaving them both here for now, but I would like to work on the exception structure in another PR after this one.

@@ -18,6 +18,7 @@
to_host_port_tuple, to_native_string, to_bytestring, HTTPVersion
)
from ..compat import unicode, bytes
from ..http11.connection import HTTP11Connection
Contributor Author:

Is it OK to explicitly depend on HTTP11Connection here?

Member:

Yes, I think so.

Member:

Yup, totally fine.

@Lukasa (Member) left a comment:

Cool, so this looks like a fairly good start, well done! I have left a bunch of code review feedback, but with a largish patch like this that's always going to happen, especially when you are doing the right thing and putting it forward for feedback early. Well done so far, I'm looking forward to seeing how this moves forward!

hyper/contrib.py Outdated
"""
Gets an appropriate HTTP/2 connection object based on
host/port/scheme/cert tuples.
"""
# TODO take custom request ports into account
proxy = select_proxy("%s://%s" % (scheme, host), proxies)
Member:

We should probably do this at a higher scope so that we can pass request.url instead of rebuilding it ourselves. That's what the Requests version of this code does.
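
For reference, the higher-scope call would presumably use requests' own helper against the full URL, roughly as below; the surrounding variable names are assumptions for illustration.

from requests.utils import select_proxy

# `request` is the PreparedRequest handed to the adapter's send():
proxy = select_proxy(request.url, proxies or {})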

@@ -137,6 +144,33 @@ def connect(self):

return

@classmethod
Member:

So this is weird, I know, but I kind of hate classmethods for this use case. IMO, classmethods are for custom constructors, and nothing else. This should just be a function that uses HTTP11Connection directly, I think.

Contributor Author:

The reason I've made it a classmethod is to ensure that the HTTP11Connection instance created to connect to a proxy is no longer usable or alterable afterwards, because its socket is moved to the original connection instance, which picks it up.

Maybe a staticmethod would be better? I'm afraid an instance method doesn't encapsulate this fragile socket transition properly.

Member:

Sorry, I should have been clearer. I wasn't suggesting an instance method: I was suggesting a function. Don't hang it off a class at all, it doesn't need one.
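
A rough sketch of that suggestion: a plain module-level helper that drives an HTTP11Connection and hands the established socket back to the caller. The names, the ProxyError import location and the socket hand-off are illustrative; whether the regular request/get_response logic can be reused for CONNECT is discussed further below.

from hyper.common.exceptions import ProxyError
from hyper.http11.connection import HTTP11Connection


def _create_tunnel(proxy_host, proxy_port, target_host, target_port):
    """
    Send a CONNECT request to the proxy and return the tunnelled socket.
    """
    conn = HTTP11Connection(proxy_host, proxy_port)
    conn.connect()
    conn.request(b'CONNECT', '%s:%d' % (target_host, target_port))

    resp = conn.get_response()
    if resp.status != 200:
        raise ProxyError("Tunnel connection failed: %d" % resp.status)

    # Detach the socket so the throwaway proxy connection can no longer
    # touch it; the caller adopts it for the real connection.
    sock = conn._sock
    conn._sock = None
    return sock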

# http CONNECT proxy
sock = HTTP11Connection._create_tunnel(
self.proxy_host, self.proxy_port, self.host, self.port)
else:
Member:

This branch is probably simpler as two elif branches that just consist of a one-line call to socket.create_connection.
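
In other words, the connect() branching could collapse to a fragment like this; the attribute names follow the patch, while the timeout value is illustrative.

import socket

if self.proxy_host and self.secure:
    # HTTPS via the proxy: establish a CONNECT tunnel first.
    sock = _create_tunnel(
        self.proxy_host, self.proxy_port, self.host, self.port
    )
elif self.proxy_host:
    # Plain HTTP via the proxy: just connect to the proxy itself.
    sock = socket.create_connection((self.proxy_host, self.proxy_port), 5)
else:
    sock = socket.create_connection((self.host, self.port), 5)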

"""
conn = cls(proxy_host, proxy_port)
conn.connect()
conn._send_headers(to_bytestring('CONNECT'),
Member:

As this is a constant, this can just be b'CONNECT'. We only need to_bytestring to allow string formatting operations.

while response is None:
# 'encourage' the socket to receive data.
conn._sock.fill()
response = conn.parser.parse_response(conn._sock.buffer)
Member:

Is there any reason we can't just actually use the regular request/response logic, and just not preload any data?

Contributor Author:

Yes, the reason is that HTTP11Response asserts on the presence of some headers, like Connection/Content-Length, which might be absent in a proxy's response to a CONNECT.

Maybe I should add a comment here explaining this. Or should we adapt HTTP11Response to accept responses without these headers?

Member:

I think we should probably adapt them to accept responses without those headers.

def socket_handler(listener):
sock = listener.accept()[0]

# Read the CONNECT reader
Member:

s/reader/header/

Contributor Author:

Aww, what a shameful copy-pasted typo 😃

BTW, CONNECT is a method, according to RFC 7231 HTTP/1.1
I will look through my comments again, thanks!

Member:

It is a method, yes, but in this case we're reading the whole request header. 😉

def socket_handler(listener):
sock = listener.accept()[0]

# Read the CONNECT reader
Member:

s/reader/header/

def socket_handler(listener):
sock = listener.accept()[0]

# Read the CONNECT reader
Member:

s/reader/header/

def socket_handler(listener):
sock = listener.accept()[0]

# Read the CONNECT reader
Member:

s/reader/header/

KostyaEsmukov added a commit to KostyaEsmukov/hyper that referenced this pull request May 8, 2017
KostyaEsmukov added a commit to KostyaEsmukov/hyper that referenced this pull request May 8, 2017
python-hyper#322 (comment)

also promote `test.server.SocketServerThread._wrap_socket` to
`test.server.SocketServerThread.wrap_socket`
python-hyper#322 (comment)
KostyaEsmukov added a commit to KostyaEsmukov/hyper that referenced this pull request May 8, 2017
@KostyaEsmukov (Contributor Author):

Thank you for your feedback, I appreciate it!
I think I have addressed all of the notes.
I'm going to implement the last two points of my TODO today, and then we can start polishing this into its final state 😊

@Lukasa (Member) commented May 8, 2017:

Cool, in that case I'll hold off reviewing until you deal with your last few TODOs, to try to reduce the amount of back-and-forth we do. 😄

@KostyaEsmukov changed the title from "[WIP] Implement HTTPS proxy support" to "Implement HTTPS proxy support" on May 9, 2017
@KostyaEsmukov (Contributor Author):

Eww, this diff is already big! I guess I'm done with this feature. There's more I want to do in hyper, but let's leave that for other PR(s).
For now, proxy support is complete IMO. Waiting for a review 😊

hyper/contrib.py Outdated
@@ -43,28 +48,40 @@ def get_connection(self, host, port, scheme, cert=None):
if cert is not None:
ssl_context = init_context(cert=cert)

connection_key = (host, port, scheme, cert, proxy)
Contributor Author:

Calling get_connection with the same arguments but different proxy_headers would reuse an already created connection, so the new proxy_headers would be ignored.

Our proxy_headers are generated from the proxy URL, so we should be fine, but this is not strictly guaranteed by this code.

Maybe we should move the select_proxy logic in here, so that proxy_headers are generated inside this function?

Member:

Any reason not to just make proxy_headers part of the tuple?

Contributor Author:

Yes, a dict is unhashable.

Member:

Yeah, but a frozenset of tuples isn't. You could write a transform that does proxy_headers_key = frozenset((k, v) for k, v in proxy_headers.items()).

Contributor Author:

Please let me know what you think about bbf59ff.

I don't like storing the whole dict as part of the key. It seems to bloat the key without much benefit.

Member:

I don't think it resolves the core issue. If you are setting headers for the proxy (e.g. proxy-authorization) then we absolutely must respect that. We need to keep track of them. In 99%+ of cases we will never store anything other than one set of proxy headers and a None value, which is pretty minimal IMO.

Contributor Author:

Well, OK, will do. Still, I think storing the proxy headers in this key is redundant, because they are generated from the proxy, which is already part of the key.

Member:

They needn't be. Requests has a proxy_headers method expressly to allow it to be overridden, in case users want to provide other forms of proxy header handling.

Contributor Author:

Alright, that makes sense!
Should I revert that commit, or just add the dict to the key?

Member:

At this point we have enough commits that you don't need to worry about reverting: we'll need to squash before we merge to clean them up anyway. 😄 So just make the change directly to get the code looking the way you want it. 😁

@Lukasa (Member) left a comment:

Cool, got some more notes. I didn't get around to reviewing the tests yet, but there's some feedback in the code itself so let's worry about that first. 😄

hyper/contrib.py Outdated
if username:
headers['Proxy-Authorization'] = _basic_auth_str(username,
password)
return headers
Member:

Do we need this code? We inherit from the HTTPAdapter, which already provides this method. Why can't we just call it?
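
With the override removed, the inherited requests.adapters.HTTPAdapter.proxy_headers(proxy) would be picked up automatically; a one-line fragment of what the call site might look like, as an assumption:

# The parent HTTPAdapter builds Proxy-Authorization from the proxy URL.
proxy_headers = self.proxy_headers(proxy) if proxy else None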

Contributor Author:

Good point! I must have missed that

@@ -396,3 +427,23 @@ def __enter__(self):
def __exit__(self, type, value, tb):
self.close()
return False # Never swallow exceptions.


def _create_tunnel(proxy_host, proxy_port, target_host, target_port,
Member:

As a minor coding style note, it's better to put functions like these towards the top of the file rather than towards the bottom.

# Send http CONNECT method to a proxy and acquire the socket
sock = _create_tunnel(self.proxy_host, self.proxy_port,
self.host, self.port,
proxy_headers=self.proxy_headers)
Member:

The standard style in this codebase for breaking the line would be like this:

sock = _create_tunnel(
    self.proxy_host,
    self.proxy_port,
    self.host,
    self.port,
    proxy_headers=self.proxy_headers
)

elif self.proxy_host:
# Simple http proxy
sock = socket.create_connection((self.proxy_host,
self.proxy_port), 5)
Member:

Same multiline note.

@@ -80,20 +84,24 @@ def __init__(self, host, port=None, secure=None, ssl_context=None,
self._send_http_upgrade = not self.secure
self._enable_push = kwargs.get('enable_push')

# CONNECT request (to create a tunnel via proxy) has a few special
# cases which needs to be processed differently.
self._is_connect_request = False
Member:

So, I don't think this feels right. A connection is not a request: a request is just one of many things that can happen on a connection. In this case, all of the places it's used except for the response have access to the request verb, so they can just check what the request verb is. The better solution is to tell the response what the request verb was, which is good practice anyway.

Contributor Author:

OK, I think I've got the point; I will try this out tomorrow.

@@ -218,7 +247,8 @@ def get_response(self):
# https://github.com/Lukasa/hyper/issues/312.
# Connection options are case-insensitive, while upgrade tokens are
# case-sensitive: https://github.com/httpwg/http11bis/issues/8.
if (response.status == 101 and
if (not self._is_connect_request and
Member:

This shouldn't be necessary: as we don't send Upgrade headers on a CONNECT request, we're not going to get a 101 Switching Protocols response back. The standard logic is fine.

proxy_headers = self.proxy_headers or {}
for name, value in proxy_headers.items():
is_default = to_native_string(name) in default_headers
self.putheader(name, value, stream_id, replace=is_default)
Member:

It would be better to try to be a bit clever here. We can avoid duplicating the loop above by doing this:

all_headers = headers.items()
if self.proxy_host and not self.secure:
    proxy_headers = self.proxy_headers or {}
    all_headers = itertools.chain(all_headers, proxy_headers.items())

for name, value in all_headers:

That'll keep a single loop, and generally seem a lot cleaner.
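
Filled out with the original loop body, the single-loop version would read roughly as follows; it needs an import itertools at the top of the module, and the comment about CONNECT reflects the condition already used in the patch.

import itertools

all_headers = headers.items()
if self.proxy_host and not self.secure:
    # Only send proxy headers when talking to the proxy in the clear;
    # for a CONNECT tunnel they go on the CONNECT request instead.
    proxy_headers = self.proxy_headers or {}
    all_headers = itertools.chain(all_headers, proxy_headers.items())

for name, value in all_headers:
    is_default = to_native_string(name) in default_headers
    self.putheader(name, value, stream_id, replace=is_default)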

test/server.py Outdated
@@ -27,6 +28,12 @@
from hyper.tls import NPN_PROTOCOL


class SocketSecure(Enum):
Member:

I don't love this name. Maybe SocketSecuritySetting?

test/server.py Outdated
class SocketSecure(Enum):
SECURE = True
INSECURE = False
SECURE_NO_AUTO_WRAP = 'NO_AUTO_WRAP'
Member:

We don't appear to need the values to be set.

Contributor Author:

Could you please elaborate on this?
I think we actually need the values in order to convert a boolean secure into an instance of this enum.

Member:

Hrm. I suppose? It feels pretty weird though, let's put a comment in here to say that's why we're doing it.
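
A minimal sketch of the renamed enum with the explanatory comment; the values stay so a plain boolean secure flag can be mapped straight onto a member.

from enum import Enum


class SocketSecuritySetting(Enum):
    # The values are set explicitly so that a boolean `secure` argument
    # can be converted directly, e.g. SocketSecuritySetting(True) -> SECURE.
    SECURE = True
    INSECURE = False
    SECURE_NO_AUTO_WRAP = 'NO_AUTO_WRAP'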

KostyaEsmukov added a commit to KostyaEsmukov/hyper that referenced this pull request May 10, 2017
resp = conn.get_response()
if resp.status != 200:
raise ProxyError("Tunnel connection failed: %d %s" % (
resp.status, to_native_string(resp.reason)))
Member:

It might be nicer to actually return the response object in this case as well, so that users can grab it to read any response body or headers that might be there.

Contributor Author:

This is intended to be a private API, so we are the users of it :)
We don't seem to need this response for now. Maybe we could leave it as is? 🙂

The problem is distinguishing a socket from a response on the consumer side, which seems like an unnecessary complication for now, I think.

Member:

It's a private API in the sense that no-one else calls it, but presumably we allow this exception to bubble up, right?

Contributor Author:

Right. Do you mean that we should add the response as an exception attribute?

Member:

Yup.

@@ -83,17 +106,21 @@ def __init__(self, host, port=None, secure=None, ssl_context=None,
self.ssl_context = ssl_context
self._sock = None

# Keep request methods in a queue in order to be able to know
# in get_response() what was the request verb.
self._request_methods = []
Member:

Hrm, this doesn't make me much happier, as it turns out. 😞

I think the thing to do is to keep track of only the currently outstanding request method. We don't allow pipelining, really, so we shouldn't have a list: just a single value. We can then use that in get_response. Does that sound more sensible?

Contributor Author:

Yeah, I was thinking about pipelining while considering a list here.
OK, I will turn this into a single value.

KostyaEsmukov added a commit to KostyaEsmukov/hyper that referenced this pull request May 10, 2017
KostyaEsmukov added a commit to KostyaEsmukov/hyper that referenced this pull request May 10, 2017
This fixes a possible situation when calling get_connection with the
same arguments but different proxy_headers would reuse an already
created connection, so the new proxy_headers will be ignored.

python-hyper#322 (comment)
KostyaEsmukov added a commit to KostyaEsmukov/hyper that referenced this pull request May 10, 2017
@Lukasa (Member) left a comment:

Cool, this is coming along really nicely! The number of notes I have each time we go around is shrinking, which is definitely a good sign. I think we're getting very close to this being ready to go.

"""
An error occurred during connection to a proxy.
"""
def __init__(self, *args, **kwargs):
Member:

Cool, this is good, but let's save ourselves a kwargs.pop and write the declaration as def __init__(self, *args, response=None, **kwargs):. That way we save the dict pop, make it clear that we are really expecting a response argument, and just generally make everything a bit nicer. 😄

Contributor Author:

Yeah, that would have been nicer, but Python 2.7 doesn't support this syntax :(
It's in https://www.python.org/dev/peps/pep-3102/

Member:

Ugh, gross. Well, in that case I think we should just formally rewrite this as taking two arguments: message, and response.
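
Since PEP 3102 keyword-only arguments are Python 3 only, the exception could simply take the two parameters explicitly. A sketch, with the default on response being an assumption:

class ProxyError(Exception):
    """
    An error occurred during connection to a proxy.
    """
    def __init__(self, message, response=None):
        super(ProxyError, self).__init__(message)
        # Keep the proxy's response so callers can inspect the status,
        # headers or body of the failed request to the proxy.
        self.response = response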

hyper/contrib.py Outdated
# rely on proxy headers being the same for the same proxies.
proxy_headers_key = (frozenset(proxy_headers.items())
if proxy_headers else None)
connection_key = (host, port, scheme, cert, proxy, proxy_headers_key)
Member:

Let me ask: is there a reason to key on proxy, instead of proxy_netloc?

Contributor Author:

Good point! No, there's not, since we include proxy headers in the key

self.proxy_host, self.proxy_port = to_host_port_tuple(
proxy_host, default_port=8080
)
elif proxy_host and proxy_port is not None:
Member:

We can condense this: this can just be elif proxy_host, as definitionally if we didn't enter the block above but proxy_host is true, proxy_port must not be None.

def get_response(self):
"""
Returns a response object.

This is an early beta, so the response object is pretty stupid. That's
ok, we'll fix it later.
"""
method = self._current_request_method
Member:

It'd be nice to None out self._current_request_method here as well.
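
Putting the two notes together, a fragment of how the single outstanding verb might be tracked on HTTP11Connection; the attribute names follow the patch, and the CONNECT check is illustrative.

# In request():
self._current_request_method = method

# In get_response():
method = self._current_request_method
self._current_request_method = None  # cleared once the response is read
if method == b'CONNECT':
    # The 200 response to CONNECT has no body; what follows is the tunnel.
    pass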

@@ -48,7 +49,8 @@ def __init__(self, code, reason, headers, sock, connection=None):
# bother checking for content-length, we just keep reading until
# we no longer can.
self._expect_close = False
if b'close' in self.headers.get(b'connection', []):
if (b'close' in self.headers.get(b'connection', []) or
b'close' in self.headers.get(b'proxy-connection', [])):
self._expect_close = True
Member:

Do we want self._expect_close to be true for CONNECT requests? Probably not, right?

Contributor Author:

When a proxy responds with 200 to a CONNECT request, it doesn't include a proxy-connection header at all (at least I haven't seen that).
But when it responds with >=400, it does include a proxy-connection: close header, which should be respected here.

So I think there's no need to make any exception for a CONNECT request here.

UPDATE:
There's a test checking that the socket is actually closed in this case: test_integration_http11.TestHyperH11Integration.test_proxy_connection_close_is_respected.

Member:

I think we cannot rely on a proxy-connection: close header being present. There is no specification requiring it (in fact, proxy-connection is not a specified header at all, it's just something that has evolved over time and has been argued as being a "misunderstanding" of how the connection header was supposed to work). I don't think there's anything wrong with saying proxy-connection: close on a 200 OK to a CONNECT: in fact, if the response is HTTP/1.1 it might even be good practice to provide Connection: close, as the termination of the response body is nominally a connection termination.

Contributor Author:

So do you suggest ignoring the proxy-connection header entirely?

How should a response like this one be handled? It causes an assertion error if we ignore the proxy-connection header when using an insecure proxy (w/o CONNECT).

HTTP/1.0 407 Proxy Authentication Required\r\n
Proxy-Authenticate: Basic realm="proxy"\r\n
Proxy-Connection: close\r\n
\r\n

Member:

What AssertionError does it raise?

Contributor Author:

In the response above the proxy server explicitly indicated that this is an HTTP/1.0 response (not HTTP/1.1), didn't set a Content-Length header, and closed the connection after sending the body. This is a valid HTTP/1.0 response.

RFC 1945, HTTP/1.0, section 7.2.2 Length:

When an Entity-Body is included with a message, the length of that
body may be determined in one of two ways. If a Content-Length header
field is present, its value in bytes represents the length of the
Entity-Body. Otherwise, the body length is determined by the closing
of the connection by the server.

RFC 1945, HTTP/1.0, section 1.3 Overall Operation:

Except for experimental applications, current practice requires that
the connection be established by the client prior to each request and
closed by the server after sending the response.

But it is an invalid response from HTTP/1.1 perspective as you have already fairly mentioned.

RFC 7230, HTTP/1.1 Message Syntax and Routing, section 3.4 Handling Incomplete Messages:

[...] A response that has neither chunked
transfer coding nor Content-Length is terminated by closure of the
connection and, thus, is considered complete regardless of the number
of message body octets received, provided that the header section was
received intact.

RFC 2068, HTTP/1.1, section 14.10 Connection:

HTTP/1.1 applications that do not support persistent connections MUST
include the "close" connection option in every message.

In HTTP/1.0, each connection is established by the client prior to
the request and closed by the server after sending the response.
However, some implementations implement the explicitly negotiated
("Keep-Alive") version of persistent connections described in Section
19.7.1 of [RFC2068].

Also it is expected, but not strictly required, for HTTP/1.1 clients to understand valid HTTP/1.0 responses as well (as per RFC 2068, HTTP/1.1, section 19.7 Compatibility with Previous Versions).

So the key difference between HTTP/1.1 and HTTP/1.0 responses in this respect, as I see it, is that an HTTP/1.1 response must include a connection: close header if the server is about to close the connection in the absence of content-length/chunked encoding headers, while for an HTTP/1.0 response the connection should always be closed anyway (except when connection: keep-alive is present, see below).

I suggest setting self._expect_close to True for HTTP/1.0 responses which don't contain a connection header.

RFC 2068, HTTP/1.1, section 19.7.1 Compatibility with HTTP/1.0 Persistent Connections:

Some clients and servers may wish to be compatible with some previous
implementations of persistent connections in HTTP/1.0 clients and
servers. Persistent connections in HTTP/1.0 must be explicitly
negotiated as they are not the default behavior. [...]

The following describes the original HTTP/1.0 form of persistent
connections.

When it connects to an origin server, an HTTP client MAY send the
Keep-Alive connection-token in addition to the Persist connection-
token:

  Connection: Keep-Alive

An HTTP/1.0 server would then respond with the Keep-Alive connection
token and the client may proceed with an HTTP/1.0 (or Keep-Alive)
persistent connection.

Thus the assertion will not be triggered even if we ignore the proxy-connection header.

An immediate question, if you consider this proposal acceptable: should this be done in this PR, or in a different one? This change is not just proxy-related and it definitely requires adding some new tests.

Member:

So the key difference between HTTP/1.1 and HTTP/1.0 responses for that matter, as I see it, is that HTTP/1.1 response must include a connection: close header if a server is about to close connection in the absence of content-length/chunked encoding headers

This is true in the sense that the spec requires it, but the spec also requires that in the absence of all three headers (no Connection: close, no Transfer-Encoding: chunked, and no Content-Length), the response must be treated as if Connection: close were present. So my suggestion is that we take that route, rather than special-case HTTP/1.0.
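
Expressed against the existing HTTP11Response logic, that rule would look roughly like the fragment below; the header lookups mirror the ones already in the diff, and the CONNECT special case raised later in this thread still needs carving out.

self._expect_close = False
if b'close' in self.headers.get(b'connection', []):
    self._expect_close = True
elif (not self.headers.get(b'content-length', []) and
        b'chunked' not in self.headers.get(b'transfer-encoding', [])):
    # No Connection: close, no Content-Length and no chunked
    # Transfer-Encoding: treat the body as delimited by connection close.
    self._expect_close = True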

Contributor Author:

the spec also requires that in the absence of all three headers (no Connection: close, no Transfer-Encoding: chunked, and no Content-Length), the response must be treated as if Connection: close were present.

I guess this quote confirms what you say:

RFC 7230, HTTP/1.1 Message Syntax and Routing, section 3.3.3. Message Body Length

The length of a message body is determined by one of the following
(in order of precedence):

[... cases when either content-length or transfer-encoding: chunked are set ...]

  1. Otherwise, this is a response message without a declared message
    body length, so the message body length is determined by the
    number of octets received prior to the server closing the
    connection.

Since there is no way to distinguish a successfully completed,
close-delimited message from a partially received message interrupted
by network failure, a server SHOULD generate encoding or
length-delimited messages whenever possible. The close-delimiting
feature exists primarily for backwards compatibility with HTTP/1.0.

Here are some extracts that don't say the opposite, but implicitly mean that in the absence of a Connection: close header in an HTTP/1.1 response, a client should persist the connection.

RFC 7230, HTTP/1.1 Message Syntax and Routing, section 6.3. Persistence:

o If the "close" connection option is present, the connection will
not persist after the current response; else,

o If the received protocol is HTTP/1.1 (or later), the connection
will persist after the current response; else,

o If the received protocol is HTTP/1.0, the "keep-alive" connection
option is present, the recipient is not a proxy, and the recipient
wishes to honor the HTTP/1.0 "keep-alive" mechanism, the
connection will persist after the current response; otherwise,

o The connection will close after the current response.

RFC 2068, HTTP/1.1, section 8.1.2.1 Negotiation:

An HTTP/1.1 client MAY expect a connection to remain open, but would
decide to keep it open based on whether the response from a server
contains a Connection header with the connection-token close. [...]

If either the client or the server sends the close token in the
Connection header, that request becomes the last one for the
connection.

Clients and servers SHOULD NOT assume that a persistent connection is
maintained for HTTP versions less than 1.1 unless it is explicitly
signaled. See section 19.7.1 for more information on backwards
compatibility with HTTP/1.0 clients.

In sum: in the absence of content-length and transfer-encoding: chunked headers the connection should be closed, but this intent must be explicitly declared with the connection: close header. Thus I assume that such a response is not a valid HTTP/1.1 response, and I guess that the assertion we are discussing should be triggered for HTTP/1.1 responses, but not for HTTP/1.0.

Member:

Nah, we should tolerate it for HTTP/1.1 as well. 😁

Contributor Author:

Okay 😃
But there's a problem: an HTTP/1.0 200 Connection established response to CONNECT will be considered close-delimited, so the Response class will eat up the tunnel, thinking it is the body of that response.

I will think about this problem in the coming days. 🙂

self.proxy_host, self.proxy_port = to_host_port_tuple(
proxy_host, default_port=8080
)
elif proxy_host and proxy_port is not None:
Member:

Same note here about the proxy_port check.

@Lukasa (Member) left a comment:

Alright, this looks really good! Mind squashing the commits down? When you do that, I'll go ahead and merge.

@KostyaEsmukov (Contributor Author):

No problem! Should I force-push to this branch, or open a new PR?

I'm asking because I don't know how GitHub would handle the comments in this PR once these commits are gone (or maybe it's OK to lose them? 😄)

@Lukasa (Member) commented May 30, 2017:

You can force push to this branch, it'll work just fine.

kit-cat pushed a commit to Polyconseil/hyper that referenced this pull request Jun 2, 2017
Squashed python-hyper#322

# Conflicts:
#	hyper/contrib.py
@KostyaEsmukov (Contributor Author):

Hi @Lukasa, is anything missing here?

class ConnectionError(Exception):
"""
An error occurred during connection to a host.
"""
Member:

Hrm. Here's a question: on Pythons that have ConnectionError, probably we shouldn't create hyper.common.exceptions.ConnectionError as an alias of that exception but as a subclass instead. Mind having this inherit instead?

Contributor Author:

Why?

Here's the ConnectionResetError done the same way in master: https://github.com/Lukasa/hyper/blob/master/hyper/common/exceptions.py#L41

Member:

Hrm. Given we have prior art let's keep doing this then. I have mixed feelings about it, but if we're going to fix it we should fix all of it. 😁
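
For reference, the prior-art pattern being kept, mirroring how ConnectionResetError is handled in master: reuse the built-in on Python 3 and fall back to a plain class on Python 2. The exact wording in hyper may differ slightly.

try:
    # Python 3: reuse the built-in exception under hyper's name.
    ConnectionError = ConnectionError
except NameError:
    # Python 2 has no built-in ConnectionError.
    class ConnectionError(Exception):
        """
        An error occurred during connection to a host.
        """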

@Lukasa (Member) commented Jun 21, 2017:

Ok, if we bring this up-to-date with current master we should be ready to go.

@Lukasa (Member) commented Jun 21, 2017:

Very nicely done! Thanks for spending the time getting this into perfect shape: I know it wasn't easy, and I want you to know I appreciate it greatly.

@Lukasa merged commit 185254b into python-hyper:development on Jun 21, 2017
@KostyaEsmukov deleted the secure_proxy branch on July 2, 2017
@LLCampos mentioned this pull request on Oct 4, 2017