Proxy CONNECT aborted on HTTPS redirection #3

Closed
Florian95 opened this issue Dec 8, 2015 · 1 comment

@Florian95

Hello,

Excellent work, but it does not seem to work with HTTP-to-HTTPS redirection, for example:
http://github.com/fabienvauchelles/scrapoxy

HTTP:

$ curl -i --proxy http://127.0.0.1:8888 http://github.com/fabienvauchelles/scrapoxy
HTTP/1.1 301 Moved Permanently
content-length: 0
location: https://github.com/fabienvauchelles/scrapoxy
connection: close
x-cache-proxyname: i-ae00b817
Date: Tue, 08 Dec 2015 23:33:56 GMT

HTTPS:

$ curl --proxy http://127.0.0.1:8888 https://github.com/fabienvauchelles/scrapoxy
curl: (56) Proxy CONNECT aborted

Regards,

@fabienvauchelles
Owner

Hello,

Thanks for your feedback :)

This is the expected behavior.

A proxy handles HTTP and HTTPS with two different methods:

Method REQUEST for HTTP

  1. A client makes an HTTP request to a proxy, in REQUEST mode;
  2. The proxy replays the same HTTP request to the target.
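As a rough illustration of REQUEST mode, here is a minimal sketch of the bytes a client sends to the proxy. The request line carries the full absolute URL, so the proxy sees the target host and every header in plain text and can replay the request itself. The function name `build_request_mode` and the default User-Agent value are hypothetical, chosen only for this example.

```python
from urllib.parse import urlsplit


def build_request_mode(url: str, user_agent: str = "example-client/1.0") -> bytes:
    """Build a REQUEST-mode (absolute-URL) request addressed to a proxy.

    Hypothetical helper: the proxy receives the whole URL in the request
    line and every header in plain text, so it can read and replay them.
    """
    parts = urlsplit(url)
    lines = [
        f"GET {url} HTTP/1.1",        # absolute URL, not just the path
        f"Host: {parts.netloc}",      # target host, readable by the proxy
        f"User-Agent: {user_agent}",  # header the proxy is free to change
        "Connection: close",
        "",                           # blank line ends the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


req = build_request_mode("http://github.com/fabienvauchelles/scrapoxy")
print(req.decode("ascii"))
```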

Method CONNECT for HTTPS

  1. A client makes an HTTP request to a proxy, in CONNECT mode;
  2. The proxy creates a TCP tunnel between the client and the target.

=> The proxy only sees a TCP tunnel. It cannot read the content, which is encrypted between the client and the target.
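For comparison, a minimal sketch of the CONNECT step, assuming the usual HTTP/1.1 form `CONNECT host:port`. Once the proxy answers with a 2xx status, the client starts its TLS handshake with the target through the tunnel, and from then on the proxy only relays opaque bytes. Both helper names are hypothetical.

```python
def build_connect(host: str, port: int = 443) -> bytes:
    """Build the CONNECT request a client sends to open a tunnel (hypothetical helper).

    Only the host and port are visible to the proxy; the real HTTP request
    (path, headers, User-Agent) travels later inside the encrypted tunnel.
    """
    authority = f"{host}:{port}"
    return (
        f"CONNECT {authority} HTTP/1.1\r\n"
        f"Host: {authority}\r\n"
        "\r\n"
    ).encode("ascii")


def tunnel_established(status_line: bytes) -> bool:
    """Return True if the proxy's status line is 2xx (tunnel is open)."""
    parts = status_line.split()
    return len(parts) >= 2 and parts[1].startswith(b"2")


print(build_connect("github.com").decode("ascii"))
print(tunnel_established(b"HTTP/1.1 200 Connection established"))
```

A proxy that refuses CONNECT simply closes the connection or answers with a non-2xx status, which is what cURL reports as "Proxy CONNECT aborted".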

This is a huge problem because Scrapoxy changes HTTP headers on the fly (e.g. the User-Agent).
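To see why REQUEST mode matters here, the following is a minimal sketch of a proxy-side User-Agent rewrite. It only works because a REQUEST-mode request arrives as readable text; inside a CONNECT tunnel there is no header line to find. The function `rewrite_user_agent` is hypothetical and is not Scrapoxy's actual implementation.

```python
def rewrite_user_agent(raw_request: bytes, new_agent: str) -> bytes:
    """Replace the User-Agent header of a plain-text HTTP request.

    Hypothetical sketch: assumes the request head is complete and
    terminated by a blank line (CRLF CRLF).
    """
    head, sep, body = raw_request.partition(b"\r\n\r\n")
    lines = head.split(b"\r\n")
    out = [lines[0]]  # keep the request line unchanged
    for line in lines[1:]:
        name = line.split(b":", 1)[0].strip().lower()
        if name == b"user-agent":
            continue  # drop the client's original User-Agent
        out.append(line)
    out.append(b"User-Agent: " + new_agent.encode("ascii"))
    return b"\r\n".join(out) + sep + body


original = (
    b"GET http://example.com/ HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"User-Agent: curl/7.0\r\n"
    b"\r\n"
)
print(rewrite_user_agent(original, "Mozilla/5.0").decode("ascii"))
```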

To bypass the problem
Scrapoxy only accepts HTTP proxy requests (REQUEST mode). For an HTTPS request, you must send the HTTPS URL to the proxy in REQUEST mode.
See Tutorial

Why doesn't cURL work?
cURL cannot send HTTPS requests in REQUEST mode. When cURL sees an HTTPS URL, it asks the proxy for a CONNECT tunnel. More information here and here.
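The client-side decision can be sketched as a small function: by default a client like cURL tunnels `https://` targets with CONNECT and sends `http://` targets in REQUEST mode. The `tunnel_https` switch is a hypothetical stand-in for a client that, unlike cURL, can be told to send HTTPS URLs in REQUEST mode. `choose_proxy_mode` is illustrative only, not cURL's code.

```python
from urllib.parse import urlsplit


def choose_proxy_mode(target_url: str, tunnel_https: bool = True) -> str:
    """Pick the proxy method a client would use for target_url.

    Hypothetical sketch: tunnel_https=True mimics cURL's default
    (HTTPS goes through CONNECT); False models a client that sends
    HTTPS URLs to the proxy in REQUEST mode instead.
    """
    scheme = urlsplit(target_url).scheme.lower()
    if scheme == "https" and tunnel_https:
        return "CONNECT"
    if scheme in ("http", "https"):
        return "REQUEST"
    raise ValueError(f"unsupported scheme: {scheme!r}")


print(choose_proxy_mode("http://github.com/fabienvauchelles/scrapoxy"))
print(choose_proxy_mode("https://github.com/fabienvauchelles/scrapoxy"))
print(choose_proxy_mode("https://github.com/fabienvauchelles/scrapoxy", tunnel_https=False))
```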

The Scrapy framework accepts HTTPS in REQUEST mode.

You must specify 'noconnect' in your proxy URL:

PROXY = 'http://127.0.0.1:8888/?noconnect'
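The `noconnect` flag is a bare query-string key with no value. The following minimal sketch shows how a client could detect it with the standard library. `parse_proxy_url` is a hypothetical helper, not Scrapy's actual code. Note that `keep_blank_values=True` is required; otherwise `parse_qs` silently drops a key that has no `=value`.

```python
from urllib.parse import parse_qs, urlsplit


def parse_proxy_url(proxy_url: str) -> tuple[str, bool]:
    """Split a proxy URL into (proxy address, noconnect flag).

    Hypothetical helper: the flag is True when the query string
    contains a 'noconnect' key, even without a value.
    """
    parts = urlsplit(proxy_url)
    # keep_blank_values=True keeps '?noconnect' (a key with no '=value')
    flags = parse_qs(parts.query, keep_blank_values=True)
    return f"{parts.scheme}://{parts.netloc}", "noconnect" in flags


print(parse_proxy_url("http://127.0.0.1:8888/?noconnect"))
print(parse_proxy_url("http://127.0.0.1:8888"))
```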

Hope this helps,
Fabien.
