Conversation
test/server.py
Outdated
@@ -78,7 +80,7 @@ def _start_server(self): | |||
sock.close() | |||
|
|||
def _wrap_socket(self, sock): | |||
raise NotImplementedError() | |||
return self.cxt.wrap_socket(sock, server_side=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it OK? What was the purpose of this method? I couldn't find any references to it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question, I have no idea.
test/test_integration.py
Outdated
sock.send(b'HTTP/1.0 200 Connection established\r\n\r\n') | ||
|
||
# todo make this less ugly? | ||
sock = self.server_thread._wrap_socket(sock) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel really bad about this. There must be a better way to wrap server socket. Any ideas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fine, TBH. Maybe promote _wrap_socket
to wrap_socket
, or even pull that wrap_socket
method off the server thread and make it available as a test helper function instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, the first note is: no, don't use httplib. =) The codebase has a HTTP/1.1 stack, you should use that. 😄
hyper/common/exceptions.py
Outdated
@@ -71,3 +71,10 @@ class MissingCertFile(Exception): | |||
The certificate file could not be found. | |||
""" | |||
pass | |||
|
|||
|
|||
class ProxyError(Exception): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel that it should extend a more generic ConnectionError
exception, but there's no such common exception. But there's one in hyper.http20.exceptions.ConnectionError
. Maybe we could introduce it here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to introduce a common ConnectionError
that the HTTP/2 one subclasses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a hyper.common.exceptions.SocketError
, which is not referenced anywhere in the project. It's very similar to the ConnectionError
I've just introduced, I guess.
I suggest to leave them both here for now, but I would like to work on exceptions structure in another PR after this one
hyper/http20/connection.py
Outdated
@@ -18,6 +18,7 @@ | |||
to_host_port_tuple, to_native_string, to_bytestring, HTTPVersion | |||
) | |||
from ..compat import unicode, bytes | |||
from ..http11.connection import HTTP11Connection |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it OK to explicitly depend on HTTP11Connection
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, totally fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, so this looks like a fairly good start, well done! I have left a bunch of code review feedback, but with a largish patch like this that's always going to happen, especially when you are doing the right thing and putting it forward for feedback early. Well done so far, I'm looking forward to seeing how this moves forward!
hyper/contrib.py
Outdated
""" | ||
Gets an appropriate HTTP/2 connection object based on | ||
host/port/scheme/cert tuples. | ||
""" | ||
# TODO take custom request ports into account | ||
proxy = select_proxy("%s://%s" % (scheme, host), proxies) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably do this at a higher scope so that we can pass request.url
instead of rebuilding it ourselves. That's what the Requests version of this code does.
hyper/http11/connection.py
Outdated
@@ -137,6 +144,33 @@ def connect(self): | |||
|
|||
return | |||
|
|||
@classmethod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is weird, I know, but I kind of hate classmethods for this use case. IMO, classmethods are for custom constructors, and nothing else. This should just be a function that uses HTTP11Connection
directly, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I've made it a classmethod is to ensure that the HTTP11Connection
instance, created to connect to a proxy, is not usable/alterable anymore, because its socket is moved to an original connection instance, which picks it up.
Maybe a staticmethod will be better? I'm afraid that an instance method doesn't incapsulate this fragile socket transition properly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I should have been clearer. I wasn't suggesting an instance method: I was suggesting a function. Don't hang it off a class at all, it doesn't need one.
hyper/http11/connection.py
Outdated
# http CONNECT proxy | ||
sock = HTTP11Connection._create_tunnel( | ||
self.proxy_host, self.proxy_port, self.host, self.port) | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This branch is probably simpler as two elif
branches that just consist of a one-line call to socket.create_connection
.
hyper/http11/connection.py
Outdated
""" | ||
conn = cls(proxy_host, proxy_port) | ||
conn.connect() | ||
conn._send_headers(to_bytestring('CONNECT'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this is a constant, this can just be b'CONNECT'
. We only need to_bytestring
to allow string formatting operations.
hyper/http11/connection.py
Outdated
while response is None: | ||
# 'encourage' the socket to receive data. | ||
conn._sock.fill() | ||
response = conn.parser.parse_response(conn._sock.buffer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason we can't just actually use the regular request/response logic, and just not preload any data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the reason is that HTTP11Response
assert
s on presence of some headers like Connection
/Content-length
which might be absent in the proxy response to a CONNECT.
Maybe I should add a comment here explaining this. Or should we adapt the HTTP11Response
to accept responses without these headers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should probably adapt them to accept responses without those headers.
test/test_integration.py
Outdated
def socket_handler(listener): | ||
sock = listener.accept()[0] | ||
|
||
# Read the CONNECT reader |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/reader/header/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aww, what a shameful copy-pasted typo 😃
BTW, CONNECT is a method, according to RFC 7231 HTTP/1.1
I will look through my comments again, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a method, yes, but in this case we're reading the whole request header. 😉
test/test_integration.py
Outdated
sock.send(b'HTTP/1.0 200 Connection established\r\n\r\n') | ||
|
||
# todo make this less ugly? | ||
sock = self.server_thread._wrap_socket(sock) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fine, TBH. Maybe promote _wrap_socket
to wrap_socket
, or even pull that wrap_socket
method off the server thread and make it available as a test helper function instead.
test/test_integration.py
Outdated
def socket_handler(listener): | ||
sock = listener.accept()[0] | ||
|
||
# Read the CONNECT reader |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/reader/header/
test/test_integration.py
Outdated
def socket_handler(listener): | ||
sock = listener.accept()[0] | ||
|
||
# Read the CONNECT reader |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/reader/helper/
test/test_integration_http11.py
Outdated
def socket_handler(listener): | ||
sock = listener.accept()[0] | ||
|
||
# Read the CONNECT reader |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/reader/header/
python-hyper#322 (comment) also promote `test.server.SocketServerThread._wrap_socket` to `test.server.SocketServerThread.wrap_socket` python-hyper#322 (comment)
Thank you for your feedback, I appreciate it! |
Cool, in that case I'll hold off reviewing until you deal with your last few TODOs. Try to reduce the amount of back-and-forth we do. 😄 |
Eww, this diff is already big! I guess I've done with this feature. There's more I want to do in hyper, but lets hold it for another PR(s). |
hyper/contrib.py
Outdated
@@ -43,28 +48,40 @@ def get_connection(self, host, port, scheme, cert=None): | |||
if cert is not None: | |||
ssl_context = init_context(cert=cert) | |||
|
|||
connection_key = (host, port, scheme, cert, proxy) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling get_connection
with the same arguments but different proxy_headers
would reuse already created connection, so the new proxy_headers
will be ignored.
Our proxy_headers
are generated from proxy url, so we should be fine, but this is not strictly guaranteed in this code.
Maybe we should move in the select_proxy
logic here, thus generating proxy_headers
inside this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason not to just make proxy_headers
part of the tuple?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, dict is unhashable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but frozenset of tuple isn't. You could write a transform that does proxy_headers_key = frozenset((k, v) for k, v in proxy_headers.items())
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please let me know what do you think about bbf59ff
I don't like storing the whole dict as a key. It seems to bloat the key without much benefit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it resolves the core issue. If you are setting headers for the proxy (e.g. proxy-authorization) then we absolutely must respect that. We need to keep track of them. In 99%+ of cases we will never store anything other than one set of proxy headers and a None
value, which is pretty minimal IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, OK, will do, but still, I think storing proxy headers in this key is redundant, because they are generated from a proxy which is already a part of this key
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They needn't be. Requests has a proxy_headers
method expressly to allow it to be overridden, in case users want to provide other forms of proxy header handling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, that makes sense!
Should I revert that commit, or just add the dict to the key?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point we have enough commits that you don't need to worry about reverting: we'll need to squash before we merge to clean them up anyway. 😄 So just make the change directly to get the code looking the way you want it. 😁
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, got some more notes. I didn't get around to reviewing the tests yet, but there's some feedback in the code itself so let's worry about that first. 😄
hyper/contrib.py
Outdated
@@ -43,28 +48,40 @@ def get_connection(self, host, port, scheme, cert=None): | |||
if cert is not None: | |||
ssl_context = init_context(cert=cert) | |||
|
|||
connection_key = (host, port, scheme, cert, proxy) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason not to just make proxy_headers
part of the tuple?
hyper/contrib.py
Outdated
if username: | ||
headers['Proxy-Authorization'] = _basic_auth_str(username, | ||
password) | ||
return headers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this code? We inherit from the HTTPAdapter
, which already provides this method. Why can't we just call it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! I must have missed that
hyper/http11/connection.py
Outdated
@@ -396,3 +427,23 @@ def __enter__(self): | |||
def __exit__(self, type, value, tb): | |||
self.close() | |||
return False # Never swallow exceptions. | |||
|
|||
|
|||
def _create_tunnel(proxy_host, proxy_port, target_host, target_port, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a minor coding style note, it's better to put functions like these towards the top of the file rather than towards the bottom.
hyper/http11/connection.py
Outdated
# Send http CONNECT method to a proxy and acquire the socket | ||
sock = _create_tunnel(self.proxy_host, self.proxy_port, | ||
self.host, self.port, | ||
proxy_headers=self.proxy_headers) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The standard style in this codebase for breaking the line would be like this:
sock = _create_tunnel(
self.proxy_host,
self.proxy_port,
self.host,
self.port,
proxy_headers=self.proxy_headers
)
hyper/http11/connection.py
Outdated
elif self.proxy_host: | ||
# Simple http proxy | ||
sock = socket.create_connection((self.proxy_host, | ||
self.proxy_port), 5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same multiline note.
hyper/http11/connection.py
Outdated
@@ -80,20 +84,24 @@ def __init__(self, host, port=None, secure=None, ssl_context=None, | |||
self._send_http_upgrade = not self.secure | |||
self._enable_push = kwargs.get('enable_push') | |||
|
|||
# CONNECT request (to create a tunnel via proxy) has a few special | |||
# cases which needs to be processed differently. | |||
self._is_connect_request = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I don't think this feels right. A connection is not a request: a request is just one of many things that can happen on a connection. In this case, all of the places it's used except for the response have access to the request verb, so they can just check what the request verb is. The better solution is to tell the response what the request verb was, which is good practice anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I think I've got this point, will try this out tomorrow
hyper/http11/connection.py
Outdated
@@ -218,7 +247,8 @@ def get_response(self): | |||
# https://github.com/Lukasa/hyper/issues/312. | |||
# Connection options are case-insensitive, while upgrade tokens are | |||
# case-sensitive: https://github.com/httpwg/http11bis/issues/8. | |||
if (response.status == 101 and | |||
if (not self._is_connect_request and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't be necessary: as we don't send Upgrade
headers on a CONNECT
request, we're not going to get a 101 Switching Protocols
response back. The standard logic is fine.
hyper/http20/connection.py
Outdated
proxy_headers = self.proxy_headers or {} | ||
for name, value in proxy_headers.items(): | ||
is_default = to_native_string(name) in default_headers | ||
self.putheader(name, value, stream_id, replace=is_default) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to try to be a bit clever here. We can avoid duplicating the loop above by doing this:
all_headers = headers.items()
if self.proxy_host and not self.secure:
proxy_headers = self.proxy_headers or {}
all_headers = itertools.chain(all_headers, proxy_headers.items())
for name, value in all_headers:
That'll keep a single loop, and generally seem a lot cleaner.
test/server.py
Outdated
@@ -27,6 +28,12 @@ | |||
from hyper.tls import NPN_PROTOCOL | |||
|
|||
|
|||
class SocketSecure(Enum): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't love this name. Maybe SocketSecuritySetting
?
test/server.py
Outdated
class SocketSecure(Enum): | ||
SECURE = True | ||
INSECURE = False | ||
SECURE_NO_AUTO_WRAP = 'NO_AUTO_WRAP' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't appear to need the values to be set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please elaborate on this?
I think we actually need the values to be able to convert boolean secure
to an instance of this enum
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrm. I suppose? It feels pretty weird though, let's put a comment in here to say that's why we're doing it.
hyper/http11/connection.py
Outdated
resp = conn.get_response() | ||
if resp.status != 200: | ||
raise ProxyError("Tunnel connection failed: %d %s" % ( | ||
resp.status, to_native_string(resp.reason))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be nicer to actually return the response object in this case as well, so that users can grab it to read any response body or headers that might be there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is intended to be a private API, so we are the users of it :)
We don't seem to need this response as for now. Maybe we could leave it as is? 🙂
The problem is to differ socket from response on consumer side, which seems like an unnecessary complication as for now, I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a private API in the sense that no-one else calls it, but presumably we allow this exception to bubble up, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. Do you mean that we should add response as exception attribute?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup.
hyper/http11/connection.py
Outdated
@@ -83,17 +106,21 @@ def __init__(self, host, port=None, secure=None, ssl_context=None, | |||
self.ssl_context = ssl_context | |||
self._sock = None | |||
|
|||
# Keep request methods in a queue in order to be able to know | |||
# in get_response() what was the request verb. | |||
self._request_methods = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrm, this doesn't make me much happier, as it turns out. 😞
I think the thing to do is to keep track of only the currently outstanding request method. We don't allow pipelining, really, so we shouldn't have a list: just a single value. We can then use that in get_response
. Does that sound more sensible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I was thinking about pipelining while considering a list here.
OK, I will turn this to a single value
This fixes a possible situation when calling get_connection with the same arguments but different proxy_headers would reuse an already created connection, so the new proxy_headers will be ignored. python-hyper#322 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, this is coming along really nicely! The number of notes I have each time we go around is shrinking, with is definitely a good sign. I think we're getting very close to this being ready to go.
hyper/common/exceptions.py
Outdated
""" | ||
An error occurred during connection to a proxy. | ||
""" | ||
def __init__(self, *args, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, this is good, but let's save ourselves a kwargs.pop
and write the declaration as def __init__(self, *args, response=None, **kwargs):
. That way we save the dict pop, make it clear that we are really expecting a response argument, and just generally make everything a bit nicer. 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that would have been nicer, but Python 2.7 doesn't support this syntax :(
It's in https://www.python.org/dev/peps/pep-3102/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh, gross. Well, in that case I think we should just formally rewrite this as taking two arguments: message
, and response
.
hyper/contrib.py
Outdated
# rely on proxy headers being the same for the same proxies. | ||
proxy_headers_key = (frozenset(proxy_headers.items()) | ||
if proxy_headers else None) | ||
connection_key = (host, port, scheme, cert, proxy, proxy_headers_key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me ask: is there a reason to key on proxy
, instead of proxy_netloc
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! No, there's not, since we include proxy headers in the key
hyper/http11/connection.py
Outdated
self.proxy_host, self.proxy_port = to_host_port_tuple( | ||
proxy_host, default_port=8080 | ||
) | ||
elif proxy_host and proxy_port is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can condense this: this can just be elif proxy_host
, as definitionally if we didn't enter the block above but proxy_host
is true, proxy_port
must not be None
.
def get_response(self): | ||
""" | ||
Returns a response object. | ||
|
||
This is an early beta, so the response object is pretty stupid. That's | ||
ok, we'll fix it later. | ||
""" | ||
method = self._current_request_method |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be nice to None
out self._current_request_method
here as well.
hyper/http11/response.py
Outdated
@@ -48,7 +49,8 @@ def __init__(self, code, reason, headers, sock, connection=None): | |||
# bother checking for content-length, we just keep reading until | |||
# we no longer can. | |||
self._expect_close = False | |||
if b'close' in self.headers.get(b'connection', []): | |||
if (b'close' in self.headers.get(b'connection', []) or | |||
b'close' in self.headers.get(b'proxy-connection', [])): | |||
self._expect_close = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want self._expect_close
to be true for CONNECT
requests? Probably not, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When a proxy responds with 200 to a CONNECT request, it doesn't include a proxy-connection
header at all (at least I haven't seen that).
But when it responds with >=400, it does include a proxy-connection: close
header, which should be respected here.
So I think there's no need to make any exception for a CONNECT request here.
UPDATE:
There's a test checking that socket is actially closed in this case. It's test_integration_http11.TestHyperH11Integration.test_proxy_connection_close_is_respected
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we cannot rely on a proxy-connection: close
header being present. There is no specification requiring it (in fact, proxy-connection
is not a specified header at all, it's just something that has evolved over time and has been argued as being a "misunderstanding" of how the connection
header was supposed to work). I don't think there's anything wrong with saying proxy-connection: close
on a 200 OK
to a CONNECT
: in fact, if the response is HTTP/1.1 it might even be good practice to provide Connection: close
, as the termination of the response body is nominally a connection termination.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So do you suggest to ignore the proxy-connection
header at all?
How should a response like this one be handled? It causes an assertion error if we ignore the proxy-connection
header when using an insecure proxy (w/o CONNECT).
HTTP/1.0 407 Proxy Authentication Required\r\n
Proxy-Authenticate: Basic realm="proxy"\r\n
Proxy-Connection: close\r\n
\r\n
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What AssertionError
does it raise?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the response above a proxy server exclicitly set that this is a HTTP/1.0 response (not HTTP/1.1), didn't set a Content-Length
header and closed connection after sending body. This is a valid HTTP/1.0 response.
RFC 1945, HTTP/1.0, section 7.2.2 Length:
When an Entity-Body is included with a message, the length of that
body may be determined in one of two ways. If a Content-Length header
field is present, its value in bytes represents the length of the
Entity-Body. Otherwise, the body length is determined by the closing
of the connection by the server.
RFC 1945, HTTP/1.0, section 1.3 Overall Operation:
Except for experimental applications, current practice requires that
the connection be established by the client prior to each request and
closed by the server after sending the response.
But it is an invalid response from HTTP/1.1 perspective as you have already fairly mentioned.
RFC 7230, HTTP/1.1 Message Syntax and Routing, section 3.4 Handling Incomplete Messages:
[...] A response that has neither chunked
transfer coding nor Content-Length is terminated by closure of the
connection and, thus, is considered complete regardless of the number
of message body octets received, provided that the header section was
received intact.
RFC 2068, HTTP/1.1, section 14.10 Connection:
HTTP/1.1 applications that do not support persistent connections MUST
include the "close" connection option in every message.In HTTP/1.0, each connection is established by the client prior to
the request and closed by the server after sending the response.
However, some implementations implement the explicitly negotiated
("Keep-Alive") version of persistent connections described in Section
19.7.1 of [RFC2068].
Also it is expected, but not strictly required, for HTTP/1.1 clients to understand valid HTTP/1.0 responses as well (as per RFC 2068, HTTP/1.1, section 19.7 Compatibility with Previous Versions).
So the key difference between HTTP/1.1 and HTTP/1.0 responses for that matter, as I see it, is that HTTP/1.1 response must include a connection: close
header if a server is about to close connection in the absence of content-length
/chunked encoding headers, but for HTTP/1.0 response connection should always be closed anyway (except when connection: keep-alive
is present, see below).
I suggest to set self._expect_close
to True for HTTP/1.0 responses which doesn't contain connection
header.
RFC 2068, HTTP/1.1, section 19.7.1 Compatibility with HTTP/1.0 Persistent Connections:
Some clients and servers may wish to be compatible with some previous
implementations of persistent connections in HTTP/1.0 clients and
servers. Persistent connections in HTTP/1.0 must be explicitly
negotiated as they are not the default behavior. [...]The following describes the original HTTP/1.0 form of persistent
connections.When it connects to an origin server, an HTTP client MAY send the
Keep-Alive connection-token in addition to the Persist connection-
token:Connection: Keep-Alive
An HTTP/1.0 server would then respond with the Keep-Alive connection
token and the client may proceed with an HTTP/1.0 (or Keep-Alive)
persistent connection.
Thus the assertion will not be triggered even if we ignore the proxy-conenction
header.
An instant question if you consider this proposal acceptable: should this be done in this PR, or in a different one? This change is not just proxy-related and it definitely requires adding some new tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the key difference between HTTP/1.1 and HTTP/1.0 responses for that matter, as I see it, is that HTTP/1.1 response must include a connection: close header if a server is about to close connection in the absence of content-length/chunked encoding headers
This is true in the sense that the spec requires it, but the spec also requires that in the absence of all three headers (no Connection: close
, no Transfer-Encoding: chunked
, and no Content-Length
), the response must be treated as if Connection: close
were present. So my suggestion is that we take that route, rather than special-case HTTP/1.0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the spec also requires that in the absence of all three headers (no Connection: close, no Transfer-Encoding: chunked, and no Content-Length), the response must be treated as if Connection: close were present.
I guess this quote confirms what you say:
RFC 7230, HTTP/1.1 Message Syntax and Routing, section 3.3.3. Message Body Length
The length of a message body is determined by one of the following
(in order of precedence):[... cases when either
content-length
ortransfer-encoding: chunked
are set ...]
- Otherwise, this is a response message without a declared message
body length, so the message body length is determined by the
number of octets received prior to the server closing the
connection.Since there is no way to distinguish a successfully completed,
close-delimited message from a partially received message interrupted
by network failure, a server SHOULD generate encoding or
length-delimited messages whenever possible. The close-delimiting
feature exists primarily for backwards compatibility with HTTP/1.0.
Here are some extracts that doesn't say the opposite, but it's implicitly meant that in the absence of Connection: close
header for HTTP/1.1 response a client should persist connection.
RFC 7230, HTTP/1.1 Message Syntax and Routing, section 6.3. Persistence:
o If the "close" connection option is present, the connection will
not persist after the current response; else,o If the received protocol is HTTP/1.1 (or later), the connection
will persist after the current response; else,o If the received protocol is HTTP/1.0, the "keep-alive" connection
option is present, the recipient is not a proxy, and the recipient
wishes to honor the HTTP/1.0 "keep-alive" mechanism, the
connection will persist after the current response; otherwise,o The connection will close after the current response.
RFC 2068, HTTP/1.1, section 8.1.2.1 Negotiation:
An HTTP/1.1 client MAY expect a connection to remain open, but would
decide to keep it open based on whether the response from a server
contains a Connection header with the connection-token close. [...]If either the client or the server sends the close token in the
Connection header, that request becomes the last one for the
connection.Clients and servers SHOULD NOT assume that a persistent connection is
maintained for HTTP versions less than 1.1 unless it is explicitly
signaled. See section 19.7.1 for more information on backwards
compatibility with HTTP/1.0 clients.
In sum, we have that in the absence of content-length
and transfer-encoding: chunked
headers connection should be closed, but this intent must be explicitly defined with the connection: close
header. Thus I assume that it's not a valid HTTP/1.1 response and I guess that the assertion we discuss should be triggered for HTTP/1.1 responses, but should not for HTTP/1.0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nah, we should tolerate it for HTTP/1.1 as well. 😁
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay 😃
But there's a problem: HTTP/1.0 200 Connection established
response to CONNECT will be considered as close-delimited, so the Response class will eat up the tunnel thinking that this is a body of this response.
I will think about this problem in the coming days. 🙂
hyper/http20/connection.py
Outdated
self.proxy_host, self.proxy_port = to_host_port_tuple( | ||
proxy_host, default_port=8080 | ||
) | ||
elif proxy_host and proxy_port is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same note here about the proxy_port
check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, this looks really good! Mind squashing the commits down? When you do that, I'll go ahead and merge.
No problem! Should I force-push to this branch, or open a new PR? I'm asking because I don't know how would GitHub handle the comments in this PR once these commits will be gone (or maybe it is OK to lose them? 😄) |
You can force push to this branch, it'll work just fine. |
Squashed python-hyper#322 # Conflicts: # hyper/contrib.py
813bcdf
to
fa238d8
Compare
Squashed python-hyper#322 # Conflicts: # hyper/contrib.py
Hi @Lukasa , is anything missing here? |
class ConnectionError(Exception): | ||
""" | ||
An error occurred during connection to a host. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrm. Here's a question: on Pythons that have ConnectionError
, probably we shouldn't create hyper.common.exceptions.ConnectionError
as an alias of that exception but as a subclass instead. Mind having this inherit instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?
Here's the ConnectionResetError
done the same way in master: https://github.com/Lukasa/hyper/blob/master/hyper/common/exceptions.py#L41
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrm. Given we have prior art let's keep doing this then. I have mixed feelings about it, but if we're going to fix it we should fix all of it. 😁
Ok, if we bring this up-to-date with current master we should be ready to go. |
# Conflicts: # hyper/http20/connection.py
Very nicely done! Thanks for spending the time getting this into perfect shape: I know it wasn't easy, and I want you to know I appreciate it greatly. |
Related issue: #304
This is a very work in progress. It seems to work, but I would like to get your feedback whether I'm on a right track or not.The main concern I have is about httplib: is it OK to use it for just sendingThough HTTP/2 standard defines CONNECT method as well, I guess it is an overkill to use it for just getting a raw socket from a proxy.CONNECT
method to a proxy?TODO:
raise appropriate IO exceptions to distinguish between connection failure to proxy and a general IO exception(to be done in a follow-up PR)Strip(seems unneeded. requests doesn't do it)Proxy-*
headers from response on HTTP proxying