
SSL website. twisted.internet.error.ConnectionLost #2916

Closed
russian-developer opened this issue Sep 7, 2017 · 18 comments

@russian-developer

Hi everybody!
I'm hitting this error on both operating systems. This HTTPS site can't be downloaded via Scrapy (Twisted). I searched this issue tracker and couldn't find a solution.

Both: Debian 9 / macOS

$ scrapy shell "https://wwwnet1.state.nj.us/"
2017-09-07 16:23:02 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: scrapybot)
2017-09-07 16:23:02 [scrapy.utils.log] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0, 'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter'}
2017-09-07 16:23:02 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.corestats.CoreStats']
2017-09-07 16:23:02 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2017-09-07 16:23:02 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2017-09-07 16:23:03 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2017-09-07 16:23:03 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-09-07 16:23:03 [scrapy.core.engine] INFO: Spider opened
2017-09-07 16:23:03 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://wwwnet1.state.nj.us/> (failed 1 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
2017-09-07 16:23:03 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://wwwnet1.state.nj.us/> (failed 2 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
2017-09-07 16:23:04 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET https://wwwnet1.state.nj.us/> (failed 3 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
Traceback (most recent call last):
  File "scrapy", line 11, in <module>
    sys.exit(execute())
  File "/lib/python3.5/site-packages/scrapy/cmdline.py", line 149, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/lib/python3.5/site-packages/scrapy/cmdline.py", line 89, in _run_print_help
    func(*a, **kw)
  File "/lib/python3.5/site-packages/scrapy/cmdline.py", line 156, in _run_command
    cmd.run(args, opts)
  File "/lib/python3.5/site-packages/scrapy/commands/shell.py", line 73, in run
    shell.start(url=url, redirect=not opts.no_redirect)
  File "/lib/python3.5/site-packages/scrapy/shell.py", line 48, in start
    self.fetch(url, spider, redirect=redirect)
  File "/lib/python3.5/site-packages/scrapy/shell.py", line 115, in fetch
    reactor, self._schedule, request, spider)
  File "/lib/python3.5/site-packages/twisted/internet/threads.py", line 122, in blockingCallFromThread
    result.raiseException()
  File "/lib/python3.5/site-packages/twisted/python/failure.py", line 385, in raiseException
    raise self.value.with_traceback(self.tb)
twisted.web._newclient.ResponseNeverReceived: [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]

macOS:

$ scrapy version -v
Scrapy    : 1.4.0
lxml      : 3.8.0.0
libxml2   : 2.9.4
cssselect : 1.0.1
parsel    : 1.2.0
w3lib     : 1.18.0
Twisted   : 17.9.0rc1
Python    : 3.5.1 (default, Jan 22 2016, 08:54:32) - [GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)]
pyOpenSSL : 17.2.0 (OpenSSL 1.1.0f  25 May 2017)
Platform  : Darwin-16.7.0-x86_64-i386-64bit

Debian 9:

$ scrapy version -v
Scrapy    : 1.4.0
lxml      : 3.8.0.0
libxml2   : 2.9.3
cssselect : 1.0.1
parsel    : 1.2.0
w3lib     : 1.18.0
Twisted   : 17.9.0rc1
Python    : 3.4.2 (default, Oct  8 2014, 10:45:20) - [GCC 4.9.1]
pyOpenSSL : 17.2.0 (OpenSSL 1.1.0f  25 May 2017)
Platform  : Linux-3.16.0-4-amd64-x86_64-with-debian-8.7

macOS:

$ openssl s_client -connect wwwnet1.state.nj.us:443 -servername wwwnet1.state.nj.us
CONNECTED(00000003)
140736760988680:error:140790E5:SSL routines:ssl23_write:ssl handshake failure:s23_lib.c:177:
---
no peer certificate available
---
No client certificate CA names sent
---
SSL handshake has read 0 bytes and written 336 bytes
---
New, (NONE), Cipher is (NONE)
Secure Renegotiation IS NOT supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
SSL-Session:
    Protocol  : TLSv1.2
    Cipher    : 0000
    Session-ID: 
    Session-ID-ctx: 
    Master-Key: 
    Key-Arg   : None
    PSK identity: None
    PSK identity hint: None
    SRP username: None
    Start Time: 1504790705
    Timeout   : 300 (sec)
    Verify return code: 0 (ok)
---

Debian 9:

CONNECTED(00000003)
---
Certificate chain
 0 s:/C=US/ST=New Jersey/L=Trenton/O=New Jersey State Government/OU=E-Gov Services - wwwnet1.state.nj.us/CN=wwwnet1.state.nj.us
   i:/C=US/O=Symantec Corporation/OU=Symantec Trust Network/CN=Symantec Class 3 Secure Server SHA256 SSL CA
---
Server certificate
-----BEGIN CERTIFICATE-----
<cut out>
-----END CERTIFICATE-----
<cut out>
---
No client certificate CA names sent
---
SSL handshake has read 1724 bytes and written 635 bytes
---
New, TLSv1/SSLv3, Cipher is DES-CBC3-SHA
Server public key is 2048 bit
Secure Renegotiation IS NOT supported
Compression: NONE
Expansion: NONE
SSL-Session:
    Protocol  : TLSv1
    Cipher    : DES-CBC3-SHA
    Session-ID: 930F00007F5944DC3C6010F96E95E7FA63656EF5EA35508B055078CEC249DC38
    Session-ID-ctx:
    Master-Key: 27B02D427F006A57B121CCEFEAA7F33B870DE262848BB6F851242F48F051ABB77BA4ED06706766EE8EE55F6643C9FF55
    Key-Arg   : None
    PSK identity: None
    PSK identity hint: None
    SRP username: None
    Start Time: 1504790821
    Timeout   : 300 (sec)
    Verify return code: 21 (unable to verify the first certificate)
---

Thank you for your time.

@redapple (Contributor) commented Sep 7, 2017

This worked for me:

  • force TLS 1.0
  • use cryptography<2 (e.g. 1.9 in my case, built against OpenSSL older than 1.1)
$ scrapy version -v
Scrapy    : 1.4.0
lxml      : 3.8.0.0
libxml2   : 2.9.3
cssselect : 1.0.1
parsel    : 1.2.0
w3lib     : 1.18.0
Twisted   : 17.5.0
Python    : 3.6.2 (default, Aug 24 2017, 10:48:24) - [GCC 6.3.0 20170406]
pyOpenSSL : 17.2.0 (OpenSSL 1.0.2g  1 Mar 2016)


$ pip freeze
asn1crypto==0.22.0
attrs==17.2.0
Automat==0.6.0
cffi==1.10.0
constantly==15.1.0
cryptography==1.9
cssselect==1.0.1
hyperlink==17.3.1
idna==2.6
incremental==17.5.0
lxml==3.8.0
parsel==1.2.0
pyasn1==0.3.3
pyasn1-modules==0.1.1
pycparser==2.18
PyDispatcher==2.0.5
pyOpenSSL==17.2.0
queuelib==1.4.2
Scrapy==1.4.0
service-identity==17.0.0
six==1.10.0
Twisted==17.5.0
w3lib==1.18.0
zope.interface==4.4.2

$ scrapy shell "https://wwwnet1.state.nj.us/" -s DOWNLOADER_CLIENT_TLS_METHOD=TLSv1.0
2017-09-07 17:45:49 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: scrapybot)
2017-09-07 17:45:49 [scrapy.utils.log] INFO: Overridden settings: {'DOWNLOADER_CLIENT_TLS_METHOD': 'TLSv1.0', 'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter', 'LOGSTATS_INTERVAL': 0}
2017-09-07 17:45:49 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage']
2017-09-07 17:45:49 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2017-09-07 17:45:49 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2017-09-07 17:45:49 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2017-09-07 17:45:49 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-09-07 17:45:49 [scrapy.core.engine] INFO: Spider opened
2017-09-07 17:45:49 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://wwwnet1.state.nj.us/> (referer: None)
[s] Available Scrapy objects:
[s]   scrapy     scrapy module (contains scrapy.Request, scrapy.Selector, etc)
[s]   crawler    <scrapy.crawler.Crawler object at 0x7f24fb802ac8>
[s]   item       {}
[s]   request    <GET https://wwwnet1.state.nj.us/>
[s]   response   <200 https://wwwnet1.state.nj.us/>
[s]   settings   <scrapy.settings.Settings object at 0x7f24f314d9e8>
[s]   spider     <DefaultSpider 'default' at 0x7f24f24ba7b8>
[s] Useful shortcuts:
[s]   fetch(url[, redirect=True]) Fetch URL and update local objects (by default, redirects are followed)
[s]   fetch(req)                  Fetch a scrapy.Request and update local objects 
[s]   shelp()           Shell help (print this help)
[s]   view(response)    View response in a browser
>>> 

Using OpenSSL 1.1.0f (with cryptography==2.0.3) did not work for me, even when forcing TLS 1.0.
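If it helps others diagnose this, here is a minimal sketch (not from the thread; the host is just the one reported above) that probes which TLS versions the server will complete a handshake for, using pyOpenSSL directly:

# Diagnostic probe: attempt a handshake with each pinned TLS method and
# report which ones the server accepts. All names are standard pyOpenSSL
# APIs; the target host is the one from this issue.
import socket
from OpenSSL import SSL

HOST = "wwwnet1.state.nj.us"

for name, method in [("TLSv1.0", SSL.TLSv1_METHOD),
                     ("TLSv1.1", SSL.TLSv1_1_METHOD),
                     ("TLSv1.2", SSL.TLSv1_2_METHOD)]:
    sock = socket.create_connection((HOST, 443), timeout=10)
    conn = SSL.Connection(SSL.Context(method), sock)
    conn.set_tlsext_host_name(HOST.encode())  # send SNI, like -servername
    conn.set_connect_state()
    try:
        conn.do_handshake()
        print(name, "OK, cipher:", conn.get_cipher_name())
    except SSL.Error as exc:
        print(name, "failed:", exc)
    finally:
        sock.close()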

@russian-developer (Author)

@redapple thank you for your reply.
Yes, this works for me too…
By the way, how did you find out about forcing the TLS version?

@derrickmar commented Dec 3, 2017

Hmm, I also tried pip install --upgrade 'cryptography<2' but I'm still getting an error when running:
scrapy shell "https://wwwnet1.state.nj.us/" -s DOWNLOADER_CLIENT_TLS_METHOD=TLSv1.0

scrapy version -v
Scrapy    : 1.4.0
lxml      : 4.1.1.0
libxml2   : 2.9.7
cssselect : 1.0.1
parsel    : 1.2.0
w3lib     : 1.18.0
Twisted   : 17.9.0
Python    : 2.7.10 (default, Sep 23 2015, 04:34:14) - [GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.72)]
pyOpenSSL : 17.5.0 (OpenSSL 1.1.0f  25 May 2017)
Platform  : Darwin-16.7.0-x86_64-i386-64bit
pip freeze
asn1crypto==0.23.0
attrs==17.3.0
Automat==0.6.0
cffi==1.11.2
constantly==15.1.0
cryptography==1.9
cssselect==1.0.1
enum34==1.1.6
hyperlink==17.3.1
idna==2.6
incremental==17.5.0
ipaddress==1.0.18
lxml==4.1.1
parsel==1.2.0
pyasn1==0.4.2
pyasn1-modules==0.2.1
pycparser==2.18
PyDispatcher==2.0.5
pyOpenSSL==17.5.0
queuelib==1.4.2
Scrapy==1.4.0
service-identity==17.0.0
six==1.11.0
Twisted==17.9.0
w3lib==1.18.0
zope.interface==4.4.3

Error

2017-12-02 19:41:37 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: scrapybot)
2017-12-02 19:41:37 [scrapy.utils.log] INFO: Overridden settings: {'DOWNLOADER_CLIENT_TLS_METHOD': 'TLSv1.0', 'LOGSTATS_INTERVAL': 0, 'RETRY_TIMES': '0', 'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter'}
2017-12-02 19:41:37 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.corestats.CoreStats']
2017-12-02 19:41:37 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2017-12-02 19:41:37 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2017-12-02 19:41:37 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2017-12-02 19:41:37 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-12-02 19:41:37 [scrapy.core.engine] INFO: Spider opened
2017-12-02 19:41:37 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET https://www5.apply2jobs.com/jupitermed/ProfExt/index.cfm?fuseaction=mExternal.searchJobs> (failed 1 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
Traceback (most recent call last):
  File "/Users/dmar/.local/share/virtualenvs/pathwise-scrape-7G7iLF5G/bin/scrapy", line 11, in <module>
    sys.exit(execute())
  File "/Users/dmar/.local/share/virtualenvs/pathwise-scrape-7G7iLF5G/lib/python2.7/site-packages/scrapy/cmdline.py", line 149, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/Users/dmar/.local/share/virtualenvs/pathwise-scrape-7G7iLF5G/lib/python2.7/site-packages/scrapy/cmdline.py", line 89, in _run_print_help
    func(*a, **kw)
  File "/Users/dmar/.local/share/virtualenvs/pathwise-scrape-7G7iLF5G/lib/python2.7/site-packages/scrapy/cmdline.py", line 156, in _run_command
    cmd.run(args, opts)
  File "/Users/dmar/.local/share/virtualenvs/pathwise-scrape-7G7iLF5G/lib/python2.7/site-packages/scrapy/commands/shell.py", line 73, in run
    shell.start(url=url, redirect=not opts.no_redirect)
  File "/Users/dmar/.local/share/virtualenvs/pathwise-scrape-7G7iLF5G/lib/python2.7/site-packages/scrapy/shell.py", line 48, in start
    self.fetch(url, spider, redirect=redirect)
  File "/Users/dmar/.local/share/virtualenvs/pathwise-scrape-7G7iLF5G/lib/python2.7/site-packages/scrapy/shell.py", line 115, in fetch
    reactor, self._schedule, request, spider)
  File "/Users/dmar/.local/share/virtualenvs/pathwise-scrape-7G7iLF5G/lib/python2.7/site-packages/twisted/internet/threads.py", line 122, in blockingCallFromThread
    result.raiseException()
  File "<string>", line 2, in raiseException
twisted.web._newclient.ResponseNeverReceived: [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]

@aashmishra

I am also facing the same error

PS D:\fresh\zomatodata> scrapy version -v
Scrapy : 1.4.0
lxml : 4.1.1.0
libxml2 : 2.9.5
cssselect : 1.0.1
parsel : 1.2.0
w3lib : 1.18.0
Twisted : 17.9.0
Python : 3.6.2 (v3.6.2:5fd33b5, Jul 8 2017, 04:14:34) [MSC v.1900 32 bit (Intel)]
pyOpenSSL : 17.5.0 (OpenSSL 1.1.0f 25 May 2017)
Platform : Windows-10-10.0.15063-SP0

PS D:\fresh\zomatodata> python -m pip install --upgrade 'cryptography<2'
Collecting cryptography<2
Downloading cryptography-1.9-cp36-cp36m-win32.whl (1.1MB)
100% |████████████████████████████████| 1.1MB 750kB/s
Requirement already up-to-date: six>=1.4.1 in d:\python_installed\lib\site-packages (from cryptography<2)
Requirement already up-to-date: asn1crypto>=0.21.0 in d:\python_installed\lib\site-packages (from cryptography<2)
Requirement already up-to-date: cffi>=1.7 in d:\python_installed\lib\site-packages (from cryptography<2)
Requirement already up-to-date: idna>=2.1 in d:\python_installed\lib\site-packages (from cryptography<2)
Requirement already up-to-date: pycparser in d:\python_installed\lib\site-packages (from cffi>=1.7->cryptography<2)
Installing collected packages: cryptography
Found existing installation: cryptography 2.1.4
Uninstalling cryptography-2.1.4:
Successfully uninstalled cryptography-2.1.4
Successfully installed cryptography-1.9
PS D:\fresh\zomatodata> scrapy shell "https://wwwnet1.state.nj.us/" -s DOWNLOADER_CLIENT_TLS_METHOD=TLSv1.0
2017-12-10 14:35:56 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: zomatodata)
2017-12-10 14:35:56 [scrapy.utils.log] INFO: Overridden settings: {'BOT_NAME': 'zomatodata', 'DOWNLOADER_CLIENT_TLS_METHOD': 'TLSv1.0', 'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter', 'LOGSTATS_INTERVAL': 0, 'NEWSPIDER_MODULE': 'zomatodata.spiders', 'ROBOTSTXT_OBEY': True, 'SPIDER_MODULES': ['zomatodata.spiders']}
2017-12-10 14:35:56 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole']
2017-12-10 14:35:56 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2017-12-10 14:35:56 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2017-12-10 14:35:56 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2017-12-10 14:35:56 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-12-10 14:35:56 [scrapy.core.engine] INFO: Spider opened
2017-12-10 14:35:57 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://wwwnet1.state.nj.us/robots.txt> (failed 1 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
2017-12-10 14:35:57 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://wwwnet1.state.nj.us/robots.txt> (failed 2 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
2017-12-10 14:35:58 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET https://wwwnet1.state.nj.us/robots.txt> (failed 3 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
2017-12-10 14:35:58 [scrapy.downloadermiddlewares.robotstxt] ERROR: Error downloading <GET https://wwwnet1.state.nj.us/robots.txt>: [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
twisted.web._newclient.ResponseNeverReceived: [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
2017-12-10 14:35:58 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://wwwnet1.state.nj.us/> (failed 1 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
2017-12-10 14:35:59 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://wwwnet1.state.nj.us/> (failed 2 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
2017-12-10 14:35:59 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET https://wwwnet1.state.nj.us/> (failed 3 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
Traceback (most recent call last):
  File "d:\python_installed\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "d:\python_installed\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "D:\python_installed\Scripts\scrapy.exe\__main__.py", line 9, in <module>
  File "d:\python_installed\lib\site-packages\scrapy\cmdline.py", line 149, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "d:\python_installed\lib\site-packages\scrapy\cmdline.py", line 89, in _run_print_help
    func(*a, **kw)
  File "d:\python_installed\lib\site-packages\scrapy\cmdline.py", line 156, in _run_command
    cmd.run(args, opts)
  File "d:\python_installed\lib\site-packages\scrapy\commands\shell.py", line 73, in run
    shell.start(url=url, redirect=not opts.no_redirect)
  File "d:\python_installed\lib\site-packages\scrapy\shell.py", line 48, in start
    self.fetch(url, spider, redirect=redirect)
  File "d:\python_installed\lib\site-packages\scrapy\shell.py", line 115, in fetch
    reactor, self._schedule, request, spider)
  File "d:\python_installed\lib\site-packages\twisted\internet\threads.py", line 122, in blockingCallFromThread
    result.raiseException()
  File "d:\python_installed\lib\site-packages\twisted\python\failure.py", line 385, in raiseException
    raise self.value.with_traceback(self.tb)
twisted.web._newclient.ResponseNeverReceived: [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]

@raphapassini (Contributor)

@derrickmar it seems you have a different version of OpenSSL. This is the line that worked for
@redapple: pyOpenSSL : 17.2.0 (OpenSSL 1.0.2g 1 Mar 2016), and this is the line you posted: pyOpenSSL : 17.5.0 (OpenSSL 1.1.0f 25 May 2017). Try changing the OpenSSL version on your system to 1.0.x.

@ejulio (Contributor) commented Dec 17, 2018

I ran into the same issue a couple of weeks ago, and the solution was to change the TLS method.
I changed the setting https://doc.scrapy.org/en/latest/topics/settings.html#downloader-client-tls-method to 'TLS' (which maps to OpenSSL's SSLv23_METHOD).
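For reference, the same change expressed as a settings.py sketch (equivalent to passing -s DOWNLOADER_CLIENT_TLS_METHOD=... on the command line; per the Scrapy docs, 'TLS' maps to OpenSSL's SSLv23_METHOD, which negotiates the highest protocol version both sides support):

# settings.py -- let OpenSSL negotiate the protocol version instead of
# pinning one; 'TLSv1.0', 'TLSv1.1' and 'TLSv1.2' pin a specific version.
DOWNLOADER_CLIENT_TLS_METHOD = "TLS"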

@niquepa commented Feb 28, 2019

I had the same issue; in my case the solution was to set the USER_AGENT in the settings.py file:

USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'

@ejulio (Contributor) commented Feb 28, 2019

This issue seems to be related to lots of different things.
I'm not sure if we should document it somewhere to make it easier for other people to find "solutions".
Maybe Stack Overflow or the Scrapy docs...

@Gallaecio , @raphapassini , @victor-torres ideas here?

@victor-torres (Contributor)

@ejulio, I like to think of this issue as an edge case. Every time something like this happens to me, the first thing I do is copy and paste the exception's core message, which usually leads me to Stack Overflow, a mailing list, or a GitHub issue. In this case, I think users are pretty well covered by the good content in this thread.

@SachitNayak commented Feb 20, 2020

I had the same issue; in my case the solution was to set the USER_AGENT in the settings.py file:

USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'

The aforementioned solution worked.

If you are using scrapy shell:

scrapy shell -s USER_AGENT='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36' 'http://www.expedia.com'

@anapaulagomes

I tried all the suggestions above but still didn't manage to fix this problem.
URL: https://www.diariooficial.feiradesantana.ba.gov.br/

scrapy==2.0.0
Twisted==20.3.0
pyOpenSSL==19.1.0

Any words of wisdom are much appreciated. 🙏

@russian-developer (Author) commented Jun 13, 2020

I tried all the suggestions above but still didn't manage to fix this problem.
URL: https://www.diariooficial.feiradesantana.ba.gov.br/

scrapy==2.0.0
Twisted==20.3.0
pyOpenSSL==19.1.0

Any words of wisdom are much appreciated. 🙏

@anapaulagomes you have to use TLSv1.0 and the RC4-MD5 cipher.
The following command should work in the scraper environment:
curl -v --tlsv1.0 --ciphers RC4-MD5 https://www.diariooficial.feiradesantana.ba.gov.br/
You can get there by compiling OpenSSL with SSLv3 support.
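On Scrapy 1.8 or later, the equivalent can be attempted from settings alone, assuming your OpenSSL build still ships these legacy ciphers (a sketch mirroring the curl flags above, untested against that site):

# settings.py -- mirror `curl --tlsv1.0 --ciphers RC4-MD5`.
# DOWNLOADER_CLIENT_TLS_CIPHERS needs Scrapy >= 1.8, and RC4-MD5 is dropped
# from most modern OpenSSL builds, hence the custom-compiled OpenSSL above.
DOWNLOADER_CLIENT_TLS_METHOD = "TLSv1.0"
DOWNLOADER_CLIENT_TLS_CIPHERS = "RC4-MD5"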

@gbonesso commented Oct 6, 2020

I'm having the same problem with the URL
https://fnet.bmfbovespa.com.br/fnet/publico/exibirDocumento?id=88001
In my case I just removed the "s" as a workaround, and I'm able to scrape the site without using SSL. Still trying the suggestions above to support SSL...
http://fnet.bmfbovespa.com.br/fnet/publico/exibirDocumento?id=88001
@anapaulagomes, maybe this works for the Feira de Santana site...
http://www.diariooficial.feiradesantana.ba.gov.br/

@anapaulagomes

In my case, the website had changed its protocol, but after talking to them (meaning: complaining in public) they changed it again. Thanks, @gbonesso.
Also, before their latest change, we managed to run a Docker image thanks to @Laerte, using @unk2k's tips. Sharing in case someone is trapped in a problem like this. 👍🏽

@russian-developer (Author)

I had the same issue; in my case the solution was to set the USER_AGENT in the settings.py file:

USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'

This issue is about TLS problems; yours was a post-TLS connection issue.

@russian-developer (Author)

Scrapy is very sensitive to the OpenSSL version. Also keep in mind that Python, pyOpenSSL, and cryptography should all be compiled against your custom OpenSSL version, even when it's not the system-provided one.
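A quick diagnostic for that (a sketch; SSLeay_version and the cryptography backend import path vary between releases, so treat them as illustrative):

# Show which OpenSSL each layer of the stack actually links against; if
# these disagree, the compile mismatch described above is the likely culprit.
import ssl
from OpenSSL import SSL
from cryptography.hazmat.backends.openssl.backend import backend

print("stdlib ssl  :", ssl.OPENSSL_VERSION)
print("pyOpenSSL   :", SSL.SSLeay_version(SSL.SSLEAY_VERSION).decode())
print("cryptography:", backend.openssl_version_text())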

@russian-developer (Author) commented Oct 7, 2020

You can use my Dockerfile to avoid problems with broken TLS connections.

Dockerfile.base.zip

@wRAR (Member) commented Jan 29, 2023

Closing, as there is no single specific problem discussed here, the original issue is no longer reproducible, and we have many workarounds, some of which were mentioned above.

@wRAR closed this as not planned on Jan 29, 2023