[MRG] Buffer CONNECT response bytes from proxy until all HTTP headers are received #2495

Merged
merged 2 commits into from Feb 20, 2017
11 changes: 10 additions & 1 deletion scrapy/core/downloader/handlers/http11.py
@@ -105,6 +105,7 @@ def __init__(self, reactor, host, port, proxyConf, contextFactory,
self._tunneledHost = host
self._tunneledPort = port
self._contextFactory = contextFactory
self._connectBuffer = b''

def requestTunnel(self, protocol):
"""Asks the proxy to open a tunnel."""
@@ -121,8 +122,16 @@ def processProxyResponse(self, rcvd_bytes):
created, notifies the client that we are ready to send requests. If not
raises a TunnelError.
"""
self._connectBuffer += rcvd_bytes
Member review comment:

Repeated `+=` on a bytes object could have a different algorithmic complexity in PyPy; it seems using `bytearray` could help. I'm not sure this is a real problem, though.
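The `bytearray` idea from the comment above could look roughly like the following. This is a minimal sketch, not the code merged in this PR: `ConnectBuffer` and `feed` are hypothetical names, and the helper only shows in-place accumulation until the blank line that ends the HTTP headers.

```python
class ConnectBuffer:
    """Hypothetical helper: collects proxy bytes until the header block ends."""

    TERMINATOR = b"\r\n\r\n"  # blank line that closes the HTTP headers

    def __init__(self):
        # bytearray is mutable, so += extends it in place instead of
        # building a new bytes object on every call.
        self._buf = bytearray()

    def feed(self, chunk):
        """Append chunk; return all buffered bytes once headers are complete, else None."""
        self._buf += chunk
        if self.TERMINATOR not in self._buf:
            return None
        return bytes(self._buf)
```

The caller would keep passing received chunks to `feed` and only run the status-line check once it returns something other than `None`.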

# make sure that enough (all) bytes are consumed
# and that we've got all HTTP headers (ending with a blank line)
# from the proxy so that we don't send those bytes to the TLS layer
#
# see https://github.com/scrapy/scrapy/issues/2491
if b'\r\n\r\n' not in self._connectBuffer:
Member review comment:

This check can also be O(N^2) if `_connectBuffer` is extended byte by byte, because every call rescans the whole buffer. I'm not sure how to fix it, and it likely isn't a problem in practice, as the response should be small.
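One possible way to avoid rescanning, should it ever matter, is to search only the tail of the buffer that could contain a newly completed terminator. The sketch below is an assumption-laden illustration, not part of this PR; `IncrementalHeaderScanner` and `feed` are hypothetical names.

```python
TERMINATOR = b"\r\n\r\n"


class IncrementalHeaderScanner:
    """Hypothetical helper: finds the end of the headers without rescanning old bytes."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, chunk):
        """Append chunk; return the header block (terminator included) or None."""
        # The terminator may straddle the old/new boundary, so restart the
        # search len(TERMINATOR) - 1 bytes before the previous end of the buffer.
        start = max(0, len(self._buf) - (len(TERMINATOR) - 1))
        self._buf += chunk
        end = self._buf.find(TERMINATOR, start)
        if end == -1:
            return None
        return bytes(self._buf[:end + len(TERMINATOR)])
```

Each call then inspects at most `len(chunk) + 3` bytes, so the total search work stays linear in the response size even when data arrives one byte at a time.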

return
self._protocol.dataReceived = self._protocolDataReceived
- respm = TunnelingTCP4ClientEndpoint._responseMatcher.match(rcvd_bytes)
+ respm = TunnelingTCP4ClientEndpoint._responseMatcher.match(self._connectBuffer)
if respm and int(respm.group('status')) == 200:
try:
# this sets proper Server Name Indication extension
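To illustrate why the match now runs against the accumulated buffer rather than the latest chunk, here is a small standalone sketch. `RESPONSE_MATCHER` is an assumed stand-in for `_responseMatcher` (the real pattern is not shown in this diff; only its `status` group is), and `tunnel_ready` is a hypothetical function, not Scrapy code.

```python
import re

# Assumed stand-in for _responseMatcher; only the 'status' group name
# is taken from the diff, the rest of the pattern is a guess.
RESPONSE_MATCHER = re.compile(rb"HTTP/1\.. (?P<status>\d{3})")


def tunnel_ready(chunks):
    """Return True/False once the headers are complete, or None if still waiting."""
    buf = b""
    for chunk in chunks:
        buf += chunk
        if b"\r\n\r\n" not in buf:
            continue  # headers incomplete, keep buffering
        m = RESPONSE_MATCHER.match(buf)  # match the whole buffer, not just chunk
        return bool(m and int(m.group("status")) == 200)
    return None
```

With a proxy reply split as `b"HTTP/1.1 2"` followed by `b"00 Connection established\r\n\r\n"`, neither chunk matches the pattern on its own, while the buffered bytes do.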