HTTPS over a proxy, now with Proxy-Connection and CONNECT...HTTP/1.1 #45

Closed
wants to merge 4 commits into
from

Conversation

Projects
None yet
5 participants
@sabren

sabren commented Oct 6, 2011

My previous pull request worked with several free proxies (Apache+mod_proxy, fiddler, Proxoid),
but failed on the commercial proxy we tested. [400 Bad Request]

This version uses HTTP/1.1 for the CONNECT request and adds a "Proxy-Connection: keep-alive" header, which resolved the issue.

@sabren

This comment has been minimized.

Show comment
Hide comment
@sabren

sabren Oct 6, 2011

Owner

The code above actually does work correctly with HTTPS redirects through a proxy.

I said redirects weren't working because of the special case of https://paypal.com described here:

As far as I can tell, https://paypal.com/ is actually sending a bad location header when scrapy visits.

Location: https://www.paypal.comhttps://paypal.com/

I think it may be a bug in their handling of HTTP/1.0 because I can't duplicate the problem with any other client.
(Unfortunately, simply patching scrapy to send HTTP/1.1 results in 400 Bad Request errors.)

Owner

sabren commented on 98bf034 Oct 6, 2011

The code above actually does work correctly with HTTPS redirects through a proxy.

I said redirects weren't working because of the special case of https://paypal.com described here:

As far as I can tell, https://paypal.com/ is actually sending a bad location header when scrapy visits.

Location: https://www.paypal.comhttps://paypal.com/

I think it may be a bug in their handling of HTTP/1.0 because I can't duplicate the problem with any other client.
(Unfortunately, simply patching scrapy to send HTTP/1.1 results in 400 Bad Request errors.)

proxy = request.meta.get('proxy')
if proxy:
+ old_scheme, old_host, old_port = self.scheme, self.host, self.port
self.scheme, _, self.host, self.port, _ = _parse(proxy)
self.path = self.url

This comment has been minimized.

@scottyallen

scottyallen Oct 10, 2011

Thanks for the patch - I was in the process of trying to fix this, and it saved me a ton of time:) However, I don't think line 191 is quite right for the tunnel case. It results in sending a GET request with the full url to the destination webserver, which is technically wrong and some sites refuse to handle. Instead, self.path should remain unchanged for the tunnel case. I can send a patch to your patch, if you like...

@scottyallen

scottyallen Oct 10, 2011

Thanks for the patch - I was in the process of trying to fix this, and it saved me a ton of time:) However, I don't think line 191 is quite right for the tunnel case. It results in sending a GET request with the full url to the destination webserver, which is technically wrong and some sites refuse to handle. Instead, self.path should remain unchanged for the tunnel case. I can send a patch to your patch, if you like...

This comment has been minimized.

@sabren

sabren Oct 11, 2011

Thanks, Scotty!

I'm sure my client would appreciate it.

Can you make a combined pull request, or should I pull from you and open another pull request here?

@sabren

sabren Oct 11, 2011

Thanks, Scotty!

I'm sure my client would appreciate it.

Can you make a combined pull request, or should I pull from you and open another pull request here?

@nramirezuy

This comment has been minimized.

Show comment
Hide comment
@nramirezuy

nramirezuy Oct 14, 2013

Member

@dangra Can we close this ticket? on #392 you said this is deprecated.

Member

nramirezuy commented Oct 14, 2013

@dangra Can we close this ticket? on #392 you said this is deprecated.

@dangra

This comment has been minimized.

Show comment
Hide comment
@dangra

dangra Oct 14, 2013

Member

@nramirezuy: so true.

Member

dangra commented Oct 14, 2013

@nramirezuy: so true.

@dangra dangra closed this Oct 14, 2013

@dangra dangra referenced this pull request Nov 5, 2013

Closed

https request with proxies. #453

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment