Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] Buffer CONNECT response bytes from proxy until all HTTP headers are received #2495

Merged
merged 2 commits into from
Feb 20, 2017

Conversation

redapple
Copy link
Contributor

Fixes #2491

See #2491 (comment) for rationale.
A simple fix, but ideally, we'd want to use a proper HTTP parser.
Something for another PR I believe.

@codecov-io
Copy link

codecov-io commented Jan 13, 2017

Codecov Report

Merging #2495 into master will decrease coverage by -0.02%.
The diff coverage is 60%.

@@            Coverage Diff             @@
##           master    #2495      +/-   ##
==========================================
- Coverage   83.46%   83.44%   -0.02%     
==========================================
  Files         161      161              
  Lines        8780     8784       +4     
  Branches     1288     1289       +1     
==========================================
+ Hits         7328     7330       +2     
- Misses       1204     1205       +1     
- Partials      248      249       +1
Impacted Files Coverage Δ
scrapy/core/downloader/handlers/http11.py 91.13% <60%> (-0.71%)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4ca191e...adb180f. Read the comment docs.

@redapple redapple changed the title Buffer CONNECT response bytes from proxy until all HTTP headers are received [MRG] Buffer CONNECT response bytes from proxy until all HTTP headers are received Jan 31, 2017
@@ -121,8 +122,16 @@ def processProxyResponse(self, rcvd_bytes):
created, notifies the client that we are ready to send requests. If not
raises a TunnelError.
"""
self._connectBuffer += rcvd_bytes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could have a different algorithmic complexity in pypy; it seems using bytearray could help. I'm not sure if this is a real problem though.

@kmike kmike added this to the v1.4 milestone Feb 17, 2017
@kmike
Copy link
Member

kmike commented Feb 20, 2017

LGTM. Relevant document: http://www.freeproxy.ru/ru/free_proxy/faq/draft-luotonen-web-proxy-tunneling-01.txt:

An example of a tunneling request/response in an interleaved
   multicolumn format:

     CLIENT -> SERVER                        SERVER -> CLIENT
     --------------------------------------  -----------------------------------
     CONNECT home.netscape.com:443 HTTP/1.0
     User-agent: Mozilla/4.0
     <<< empty line >>>
                                             HTTP/1.0 200 Connection established
                                             Proxy-agent: Netscape-Proxy/1.1
                                             <<< empty line >>>
                  <<< data tunneling to both directions begins >>>

so this PR does as specified - waits for an empty line.

@kmike kmike merged commit 565d1ca into scrapy:master Feb 20, 2017
# from the proxy so that we don't send those bytes to the TLS layer
#
# see https://github.com/scrapy/scrapy/issues/2491
if b'\r\n\r\n' not in self._connectBuffer:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also can be O(N^2) if connectBuffer is extended byte-by-byte, but I'm not sure how to fix it, and likely it shouldn't be a problem as the response should be small.

@redapple redapple deleted the proxy-connect-headers-buffer branch February 23, 2017 10:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TLS connection fails through HTTPS proxy after CONNECT tunnel is established
3 participants