Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] Buffer CONNECT response bytes from proxy until all HTTP headers are received #2495

merged 2 commits into from Feb 20, 2017


Copy link

@redapple redapple commented Jan 13, 2017

Fixes #2491

See #2491 (comment) for rationale.
A simple fix, but ideally, we'd want to use a proper HTTP parser.
Something for another PR I believe.

Copy link

@codecov-io codecov-io commented Jan 13, 2017

Codecov Report

Merging #2495 into master will decrease coverage by -0.02%.
The diff coverage is 60%.

@@            Coverage Diff             @@
##           master    #2495      +/-   ##
- Coverage   83.46%   83.44%   -0.02%     
  Files         161      161              
  Lines        8780     8784       +4     
  Branches     1288     1289       +1     
+ Hits         7328     7330       +2     
- Misses       1204     1205       +1     
- Partials      248      249       +1
Impacted Files Coverage Δ
scrapy/core/downloader/handlers/ 91.13% <60%> (-0.71%)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4ca191e...adb180f. Read the comment docs.

@redapple redapple force-pushed the redapple:proxy-connect-headers-buffer branch from d44ef57 to a586243 Jan 31, 2017
@redapple redapple changed the title Buffer CONNECT response bytes from proxy until all HTTP headers are received [MRG] Buffer CONNECT response bytes from proxy until all HTTP headers are received Jan 31, 2017
@@ -121,8 +122,16 @@ def processProxyResponse(self, rcvd_bytes):
created, notifies the client that we are ready to send requests. If not
raises a TunnelError.
self._connectBuffer += rcvd_bytes

This comment has been minimized.


kmike Feb 2, 2017

This could have a different algorithmic complexity in pypy; it seems using bytearray could help. I'm not sure if this is a real problem though.

@kmike kmike added this to the v1.4 milestone Feb 17, 2017
Copy link

@kmike kmike commented Feb 20, 2017

LGTM. Relevant document:

An example of a tunneling request/response in an interleaved
   multicolumn format:

     CLIENT -> SERVER                        SERVER -> CLIENT
     --------------------------------------  -----------------------------------
     User-agent: Mozilla/4.0
     <<< empty line >>>
                                             HTTP/1.0 200 Connection established
                                             Proxy-agent: Netscape-Proxy/1.1
                                             <<< empty line >>>
                  <<< data tunneling to both directions begins >>>

so this PR does as specified - waits for an empty line.

@kmike kmike merged commit 565d1ca into scrapy:master Feb 20, 2017
1 check passed
1 check passed
continuous-integration/travis-ci/pr The Travis CI build passed
# from the proxy so that we don't send those bytes to the TLS layer
# see
if b'\r\n\r\n' not in self._connectBuffer:

This comment has been minimized.


kmike Feb 20, 2017

This also can be O(N^2) if connectBuffer is extended byte-by-byte, but I'm not sure how to fix it, and likely it shouldn't be a problem as the response should be small.

@redapple redapple deleted the redapple:proxy-connect-headers-buffer branch Feb 23, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
None yet
Linked issues

Successfully merging this pull request may close these issues.

3 participants
You can’t perform that action at this time.