SSL errors crawling https sites using proxies #1855
@Cesped,
I was able to use 2 HTTPS proxies from HMA with https://www.python.org and https://www.base.net. I pushed the Wireshark capture (pcap file) and the console logs to https://github.com/redapple/scrapy-issues/tree/master/1855/redapple for you to compare if you can. Could it be related to the HTTPS proxies you use? |
Thanks for answering @redapple. The solution was changing |
I am trying to crawl Walmart through the ProxyMesh proxy provider and I get the same error. Can I solve this by using HTTP proxies? |
Same as @yasirnazir: I get this error when using ProxyMesh. @Cesped, I don't understand what you mean by |
Never mind. I found the solution, just as @Cesped said. Here is the middleware:
|
Same problem here. I also found the solution thanks to @Cesped. |
You can alternatively use w3lib.http.basic_auth_header |
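The middleware posted earlier in the thread is not preserved in this copy, so the following is only a rough sketch of the general approach being discussed: a Scrapy downloader middleware that sets the proxy and a Basic `Proxy-Authorization` header on each request. The class name, the helper `basic_proxy_auth`, the proxy URL and the credentials are all hypothetical placeholders, not values from the thread, and this is not necessarily the exact fix the commenters used.

```python
import base64


def basic_proxy_auth(username, password):
    """Build the value of a Basic Proxy-Authorization header as bytes.

    Hypothetical helper; w3lib.http.basic_auth_header(username, password)
    produces the same kind of value if w3lib is available.
    """
    credentials = "{}:{}".format(username, password).encode("latin-1")
    # b64encode adds no trailing newline, unlike the legacy
    # base64.encodestring helper, so the header value stays on one line.
    return b"Basic " + base64.b64encode(credentials)


class ProxyAuthMiddleware(object):
    """Sketch of a downloader middleware; all values below are assumptions."""

    proxy_url = "http://proxy.example.com:8080"  # placeholder proxy endpoint
    proxy_user = "user"                          # placeholder credentials
    proxy_password = "pass"

    def process_request(self, request, spider):
        # Route the request through the proxy and attach the credentials.
        request.meta["proxy"] = self.proxy_url
        request.headers["Proxy-Authorization"] = basic_proxy_auth(
            self.proxy_user, self.proxy_password
        )
        # Returning None lets Scrapy continue processing the request.
        return None
```

A class like this would be enabled through the `DOWNLOADER_MIDDLEWARES` setting of the project. The helper is kept separate so the header value can be checked on its own before wiring it into a crawl.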
I'm unable to scrape HTTPS sites through proxies that support HTTPS. I've tried ProxyMesh as well as other proxy services. I can scrape most of these sites without proxies or using Tor.
Curl seems to work fine too:
curl -x https://xx.xx.xx.xx:xx --proxy-user user:pass -L https://www.base.net:443
This retrieves the site's HTML.
Setup:
OS X El Capitan v10.11.3
Scrapy:
Solutions tried:
1 - Installing Scrapy-1.1.0rc3
2016-03-09 12:44:59 [scrapy] ERROR: Error downloading <GET https://www.base.net/>: [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'SSL23_GET_SERVER_HELLO', 'unknown protocol')]>]
Other website:
2016-03-09 12:56:45 [scrapy] DEBUG: Retrying <GET https://es.alojadogatopreto.com/es-es/> (failed 1 times): [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'ssl23_read', 'ssl handshake failure')]>]
2 - #1764 (comment)
Using SSLv23_METHOD
2016-03-09 12:22:40 [scrapy] ERROR: Error downloading <GET https://www.base.net/>: [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'SSL23_GET_SERVER_HELLO', 'unknown protocol')]>]
Using other SSL methods
2016-03-09 12:24:11 [scrapy] ERROR: Error downloading <GET https://www.base.net/>: [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'SSL3_GET_RECORD', 'wrong version number')]>]
3 - #1227 (comment) | Get same errors as in 1 & 2.
4 - #1429 (comment) | Get same errors as in 1 & 2.