Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-69152: Add _proxy_response_headers attribute to HTTPConnection #26152

Merged
merged 7 commits into from
May 5, 2023
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 7 additions & 12 deletions Lib/http/client.py
Original file line number Diff line number Diff line change
Expand Up @@ -858,6 +858,7 @@ def __init__(self, host, port=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT,
self._tunnel_host = None
self._tunnel_port = None
self._tunnel_headers = {}
self._proxy_response_headers = None

(self.host, self.port) = self._get_hostport(host, port)

Expand Down Expand Up @@ -943,21 +944,15 @@ def _tunnel(self):
response = self.response_class(self.sock, method=self._method)
(version, code, message) = response._read_status()

self._proxy_response_headers = parse_headers(response.fp)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to how it is done in lines 337-341


if self.debuglevel > 0:
for hdr, val in self._proxy_response_headers.items():
print("header:", hdr + ":", val)

if code != http.HTTPStatus.OK:
self.close()
raise OSError(f"Tunnel connection failed: {code} {message.strip()}")
while True:
line = response.fp.readline(_MAXLINE + 1)
if len(line) > _MAXLINE:
raise LineTooLong("header line")
if not line:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition if not line: isn't captured in parse_headers call.
Additional check of _MAXHEADERS is verified.
So, this isn't a strict 1:1 replacement.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add the if not line condition in _reader_headers method for equivalence.

However, given that _tunnel method already prints headers in line 960 in print('header:', line.decode()), I am trying to understand, what additional benefit this patch brings

The discussion in #69152 seems adding additional states to the _tunnel method, and adding debug response to those states.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@orsenthil, thanks for the review!
Regarding condition if not line. It seems to me that this is now a useless part of the code. Because the only variant of the line value in which this condition will be true, if I'm not mistaken, is line = b" (because the method readline can't return None or just an empty string). And for the equality of line to the value of b'', we check a little below.

I looked at the history, these lines were added when there was no equality check for the empty binary object below. That is, at some point they made sense, but after this commit, they ceased to be necessary.

But we can add this check to the _read_headers method as a precaution. Do you think it should be done?

Copy link
Contributor Author

@nametkin nametkin Apr 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. As for the benefits of the changes being made. Firstly, headers are needed in the situations described in the comment Add tunnel CONNECT response headers to httplib / http.client #69152 (comment). And secondly, now headers gets into the debug-logs only if we were able to go beyond this line: https://github.com/python/cpython/blob/main/Lib/http/client.py#L948.
    But if we did not pass the authorization headers we need, then we will get an OSError without information about what happened, we will not be able to find out what authentication data is required. With my PR, I would like to take the first step towards solving the problems described here Add support for digest authentication with an HTTP proxy psf/requests#2526, Allow custom authentication (in particular NTLM) to proxies  psf/requests#1582, http proxy negotiate/gssapi authentication? requests/requests-kerberos#83

Maybe then it would be possible to start returning not OSError, but some custom error that can be catched in the library code (urllib3, requests) and automatically prepare data for authentication based on the information contained in _proxy_response_headers. Similar to how it is done in https://github.com/requests/requests-kerberos/pull/149/files.

Maybe you know ways to solve these problems in a different way, I will be glad of any ideas and suggestions.
How @vadmium's proposal will solve the problem of the lack of any information about what went wrong when creating the tunnel, I did not understand.

# for sites which EOF without sending a trailer
break
if line in (b'\r\n', b'\n', b''):
break

if self.debuglevel > 0:
print('header:', line.decode())

def connect(self):
"""Connect to the host and port specified in __init__."""
Expand Down
14 changes: 14 additions & 0 deletions Lib/test/test_httplib.py
Original file line number Diff line number Diff line change
Expand Up @@ -2390,6 +2390,20 @@ def test_tunnel_debuglog(self):
lines = output.getvalue().splitlines()
self.assertIn('header: {}'.format(expected_header), lines)

def test_proxy_response_headers(self):
expected_header = ('X-Dummy', '1')
response_text = (
'HTTP/1.0 200 OK\r\n'
'{0}\r\n\r\n'.format(':'.join(expected_header))
)

self.conn._create_connection = self._create_connection(response_text)
self.conn.set_tunnel('destination.com')

self.conn.request('PUT', '/', '')
headers = self.conn._proxy_response_headers
self.assertIn(expected_header, headers.items())


if __name__ == '__main__':
unittest.main(verbosity=2)
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
Added attribute '_proxy_response_headers' to HTTPConnection class. This
attribute contains the headers of the proxy server response to the CONNECT
request.