Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] Add "Host" header in CONNECT requests to HTTPS proxies #2069

Merged
merged 3 commits into from Jul 6, 2016

Conversation

@redapple
Copy link
Contributor

@redapple redapple commented Jun 21, 2016

"Host" request header is a MUST according to RFC 2616

A client MUST include a Host header field in all HTTP/1.1 request
messages . If the requested URI does not include an Internet host
name for the service being requested, then the Host header field MUST
be given with an empty value. An HTTP/1.1 proxy MUST ensure that any
request message it forwards does contain an appropriate Host header
field that identifies the service being requested by the proxy. All
Internet-based HTTP/1.1 servers MUST respond with a 400 (Bad Request)
status code to any HTTP/1.1 request message which lacks a Host header
field.

so it must also be sent for CONNECT requests.

Fixes issue raised in #1434 (comment)

Note:
HTTPS proxy tests need some love.
We could perhaps reuse https://github.com/fmoo/twisted-connect-proxy

It seems that currently the only test using CONNECT is tests/test_downloader_handlers.py::Http11ProxyTestCase::test_download_with_proxy_https_timeout,
which sends this over the wire (with this "Host" header fix):

CONNECT no-such-domain.nosuch:443 HTTP/1.1
Host: no-such-domain.nosuch:443

And test server returns

HTTP/1.1 200 OK
Server: TwistedWeb/16.2.0
Date: Tue, 21 Jun 2016 11:33:19 GMT
Content-Type: text/html
Content-Length: 25

no-such-domain.nosuch:443

I don't know where that no-such-domain.nosuch:443 line at the end comes from.

@kmike
Copy link
Member

@kmike kmike commented Jun 21, 2016

@redapple do you know why are tests failng in Python 3.5?

@redapple
Copy link
Contributor Author

@redapple redapple commented Jun 21, 2016

@kmike , hm, I don't know, the tests pass for me locally.
Maybe something to do with that "no-such-domain.nosuch:443" sent as response body

@redapple
Copy link
Contributor Author

@redapple redapple commented Jun 21, 2016

Ok, so that "no-such-domain.nosuch:443" line comes from HttpProxyTestCase where server is set up with UriResource() which returns the URI by default.

@codecov-io
Copy link

@codecov-io codecov-io commented Jun 21, 2016

Current coverage is 83.32%

Merging #2069 into master will decrease coverage by 0.01%

Powered by Codecov. Last updated by 80c296e...15d0c89

@redapple
Copy link
Contributor Author

@redapple redapple commented Jun 21, 2016

@kmike , test is passing but we should really strengthen these proxy tests

@redapple redapple changed the title Add "Host" header in CONNECT requests to HTTPS proxies [MRG] Add "Host" header in CONNECT requests to HTTPS proxies Jun 22, 2016
@nyov
Copy link
Contributor

@nyov nyov commented Jul 2, 2016

YES!
Thanks ❤️

@@ -151,23 +157,22 @@ def connect(self, protocolFactory):
return self._tunnelReadyDeferred


def tunnel_request_data(host, port, proxy_auth_header=None):
def tunnel_request_data(host, port, proxy_auth_header=None, host_header=True):
Copy link
Member

@kmike kmike Jul 6, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is host_header argument needed? It is also a bit confusing it has a boolean value, unlike proxy_auth_header.

Copy link
Contributor Author

@redapple redapple Jul 6, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really needed. Definitely an artifact from my tests, switching this on and off.
It's a boolean because it doesn't convey a string value, but tells to add host string or not, compared to proxy_auth_header which is a string.
I can remove it.

Copy link
Member

@kmike kmike Jul 6, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noticed that while reading coverage report - host_header was always True in tests. Let's drop it :) +1 to merge the PR after that.

@kmike kmike merged commit 759a555 into scrapy:master Jul 6, 2016
1 of 2 checks passed
redapple added a commit that referenced this issue Jul 8, 2016
[backport][1.1] Add "Host" header in CONNECT requests to HTTPS proxies (PR #2069)
@redapple redapple deleted the https-connect-host branch Jul 8, 2016
@redapple redapple added this to the v1.1.1 milestone Jul 13, 2016
@redapple redapple mentioned this pull request Jul 21, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

4 participants