
Scrapy 1.1 - exceptions.ValueError: Invalid DNS-ID. #2092

Closed
nealhnguyen opened this issue Jul 5, 2016 · 3 comments


@nealhnguyen nealhnguyen commented Jul 5, 2016

Hello, I'm crawling websites with insecure connections. When I try to crawl, I get this stack trace:

2016-07-05 15:50:17 [twisted] CRITICAL: Error during info_callback
Traceback (most recent call last):
  File "c:\python27\lib\site-packages\twisted\protocols\tls.py", line 421, in dataReceived
    self._write(bytes)
  File "c:\python27\lib\site-packages\twisted\protocols\tls.py", line 569, in _write
    sent = self._tlsConnection.send(toSend)
  File "c:\python27\lib\site-packages\OpenSSL\SSL.py", line 1270, in send
    result = _lib.SSL_write(self._ssl, buf, len(buf))
  File "c:\python27\lib\site-packages\OpenSSL\SSL.py", line 933, in wrapper
    callback(Connection._reverse_mapping[ssl], where, return_code)
--- <exception caught here> ---
  File "c:\python27\lib\site-packages\twisted\internet\_sslverify.py", line 1154, in infoCallback
    return wrapped(connection, where, ret)
  File "c:\python27\lib\site-packages\scrapy\core\downloader\tls.py", line 45, in _identityVerifyingInfoCallback
    verifyHostname(connection, self._hostnameASCII)
  File "c:\python27\lib\site-packages\service_identity\pyopenssl.py", line 45, in verify_hostname
    obligatory_ids=[DNS_ID(hostname)],
  File "c:\python27\lib\site-packages\service_identity\_common.py", line 245, in __init__
    raise ValueError("Invalid DNS-ID.")
exceptions.ValueError: Invalid DNS-ID.

Is there any way to ignore the invalid certificate?

@redapple redapple added https bug labels Jul 5, 2016

@redapple redapple commented Jul 5, 2016

@nealhnguyen, thanks for reporting.
This exception is not currently caught when verifying certificates; only VerificationError is.
But we can add something to handle this case. It would help a lot if you could tell us which website this happened with, so we can test a patch (you can send me this info by email if you don't want to disclose it).

Until then, you'll have to use a custom DOWNLOADER_CLIENTCONTEXTFACTORY which catches ValueError and ignores it.
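The shape of such a workaround can be sketched as a plain wrapper around the verification call. Everything below is illustrative, not Scrapy's actual code: in Scrapy 1.1 the real callback is `_identityVerifyingInfoCallback` in `scrapy/core/downloader/tls.py`, and you would plug a wrapped version into your own `ScrapyClientContextFactory` subclass.

```python
def make_tolerant_info_callback(verify_hostname, hostname):
    """Wrap a hostname-verification function so that a ValueError
    (such as "Invalid DNS-ID.") is logged and swallowed instead of
    aborting the TLS handshake. All names here are illustrative."""
    def info_callback(connection, where, ret):
        try:
            verify_hostname(connection, hostname)
        except ValueError as exc:
            # In a real context factory you would log this via Scrapy's
            # or Twisted's logging machinery rather than print().
            print("Ignoring certificate verification error: %s" % exc)
    return info_callback
```

The point is only that the `ValueError` is caught inside the info callback, before Twisted's `infoCallback` wrapper sees it and fails the connection.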


@redapple redapple commented Jul 6, 2016

OK, I'm able to reproduce this when using IP addresses directly, for example with the IP behind https://www.python.org:

$ nslookup www.python.org
Non-authoritative answer:
www.python.org  canonical name = python.map.fastly.net.
python.map.fastly.net   canonical name = prod.python.map.fastlylb.net.
Name:   prod.python.map.fastlylb.net
Address: 151.101.12.223
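What trips `DNS_ID` here: RFC 6125 DNS-IDs are host names, so service_identity rejects anything that parses as an IP literal. A rough sketch of that idea (the helper is hypothetical; the real validation in `service_identity/_common.py` is stricter):

```python
import ipaddress

def is_plausible_dns_id(hostname):
    """Hypothetical helper: a DNS-ID must be a host name, never an
    IP address literal, which is why DNS_ID("151.101.12.223") raises
    ValueError("Invalid DNS-ID.")."""
    try:
        ipaddress.ip_address(hostname)
        return False  # an IP literal is not a DNS-ID
    except ValueError:
        pass  # not an IP address; continue with a minimal label check
    labels = hostname.split(".")
    return all(label and not label.startswith("-") for label in labels)
```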

You get this with scrapy 1.1.0:

$ scrapy shell https://151.101.12.223
2016-07-06 11:13:57 [scrapy] INFO: Scrapy 1.1.0 started (bot: scrapybot)
(...)
2016-07-06 11:13:58 [scrapy] INFO: Spider opened
(...)

2016-07-06 11:14:00 [twisted] CRITICAL: Error during info_callback
Traceback (most recent call last):
  File "/home/paul/.virtualenvs/scrapy11/lib/python3.5/site-packages/twisted/protocols/tls.py", line 421, in dataReceived
    self._write(bytes)
  File "/home/paul/.virtualenvs/scrapy11/lib/python3.5/site-packages/twisted/protocols/tls.py", line 569, in _write
    sent = self._tlsConnection.send(toSend)
  File "/home/paul/.virtualenvs/scrapy11/lib/python3.5/site-packages/OpenSSL/SSL.py", line 1270, in send
    result = _lib.SSL_write(self._ssl, buf, len(buf))
  File "/home/paul/.virtualenvs/scrapy11/lib/python3.5/site-packages/OpenSSL/SSL.py", line 933, in wrapper
    callback(Connection._reverse_mapping[ssl], where, return_code)
--- <exception caught here> ---
  File "/home/paul/.virtualenvs/scrapy11/lib/python3.5/site-packages/twisted/internet/_sslverify.py", line 1154, in infoCallback
    return wrapped(connection, where, ret)
  File "/home/paul/.virtualenvs/scrapy11/lib/python3.5/site-packages/scrapy/core/downloader/tls.py", line 45, in _identityVerifyingInfoCallback
    verifyHostname(connection, self._hostnameASCII)
  File "/home/paul/.virtualenvs/scrapy11/lib/python3.5/site-packages/service_identity/pyopenssl.py", line 45, in verify_hostname
    obligatory_ids=[DNS_ID(hostname)],
  File "/home/paul/.virtualenvs/scrapy11/lib/python3.5/site-packages/service_identity/_common.py", line 245, in __init__
    raise ValueError("Invalid DNS-ID.")
builtins.ValueError: Invalid DNS-ID.

2016-07-06 11:14:00 [scrapy] DEBUG: Gave up retrying <GET https://151.101.12.223> (failed 3 times): [<twisted.python.failure.Failure builtins.ValueError: Invalid DNS-ID.>]
Traceback (most recent call last):
  File "/home/paul/.virtualenvs/scrapy11/bin/scrapy", line 11, in <module>
    sys.exit(execute())
  File "/home/paul/.virtualenvs/scrapy11/lib/python3.5/site-packages/scrapy/cmdline.py", line 142, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/home/paul/.virtualenvs/scrapy11/lib/python3.5/site-packages/scrapy/cmdline.py", line 88, in _run_print_help
    func(*a, **kw)
  File "/home/paul/.virtualenvs/scrapy11/lib/python3.5/site-packages/scrapy/cmdline.py", line 149, in _run_command
    cmd.run(args, opts)
  File "/home/paul/.virtualenvs/scrapy11/lib/python3.5/site-packages/scrapy/commands/shell.py", line 71, in run
    shell.start(url=url)
  File "/home/paul/.virtualenvs/scrapy11/lib/python3.5/site-packages/scrapy/shell.py", line 47, in start
    self.fetch(url, spider)
  File "/home/paul/.virtualenvs/scrapy11/lib/python3.5/site-packages/scrapy/shell.py", line 112, in fetch
    reactor, self._schedule, request, spider)
  File "/home/paul/.virtualenvs/scrapy11/lib/python3.5/site-packages/twisted/internet/threads.py", line 122, in blockingCallFromThread
    result.raiseException()
  File "/home/paul/.virtualenvs/scrapy11/lib/python3.5/site-packages/twisted/python/failure.py", line 368, in raiseException
    raise self.value.with_traceback(self.tb)
twisted.web._newclient.ResponseNeverReceived: [<twisted.python.failure.Failure builtins.ValueError: Invalid DNS-ID.>]

Compare this with Scrapy 1.0.6:

$ scrapy shell https://151.101.12.223
2016-07-06 11:14:49 [scrapy] INFO: Scrapy 1.0.6 started (bot: scrapybot)
(...)
2016-07-06 11:14:56 [scrapy] INFO: Spider opened
Error during info_callback
Traceback (most recent call last):
  File "/home/paul/.virtualenvs/scrapy10/local/lib/python2.7/site-packages/twisted/protocols/tls.py", line 421, in dataReceived
    self._write(bytes)
  File "/home/paul/.virtualenvs/scrapy10/local/lib/python2.7/site-packages/twisted/protocols/tls.py", line 569, in _write
    sent = self._tlsConnection.send(toSend)
  File "/home/paul/.virtualenvs/scrapy10/local/lib/python2.7/site-packages/OpenSSL/SSL.py", line 1270, in send
    result = _lib.SSL_write(self._ssl, buf, len(buf))
  File "/home/paul/.virtualenvs/scrapy10/local/lib/python2.7/site-packages/OpenSSL/SSL.py", line 933, in wrapper
    callback(Connection._reverse_mapping[ssl], where, return_code)
--- <exception caught here> ---
  File "/home/paul/.virtualenvs/scrapy10/local/lib/python2.7/site-packages/twisted/internet/_sslverify.py", line 1154, in infoCallback
    return wrapped(connection, where, ret)
  File "/home/paul/.virtualenvs/scrapy10/local/lib/python2.7/site-packages/twisted/internet/_sslverify.py", line 1253, in _identityVerifyingInfoCallback
    verifyHostname(connection, self._hostnameASCII)
  File "/home/paul/.virtualenvs/scrapy10/local/lib/python2.7/site-packages/service_identity/pyopenssl.py", line 45, in verify_hostname
    obligatory_ids=[DNS_ID(hostname)],
  File "/home/paul/.virtualenvs/scrapy10/local/lib/python2.7/site-packages/service_identity/_common.py", line 245, in __init__
    raise ValueError("Invalid DNS-ID.")
exceptions.ValueError: Invalid DNS-ID.

2016-07-06 11:14:56 [twisted] CRITICAL: Error during info_callback
Traceback (most recent call last):
  File "/home/paul/.virtualenvs/scrapy10/local/lib/python2.7/site-packages/twisted/protocols/tls.py", line 421, in dataReceived
    self._write(bytes)
  File "/home/paul/.virtualenvs/scrapy10/local/lib/python2.7/site-packages/twisted/protocols/tls.py", line 569, in _write
    sent = self._tlsConnection.send(toSend)
  File "/home/paul/.virtualenvs/scrapy10/local/lib/python2.7/site-packages/OpenSSL/SSL.py", line 1270, in send
    result = _lib.SSL_write(self._ssl, buf, len(buf))
  File "/home/paul/.virtualenvs/scrapy10/local/lib/python2.7/site-packages/OpenSSL/SSL.py", line 933, in wrapper
    callback(Connection._reverse_mapping[ssl], where, return_code)
--- <exception caught here> ---
  File "/home/paul/.virtualenvs/scrapy10/local/lib/python2.7/site-packages/twisted/internet/_sslverify.py", line 1154, in infoCallback
    return wrapped(connection, where, ret)
  File "/home/paul/.virtualenvs/scrapy10/local/lib/python2.7/site-packages/twisted/internet/_sslverify.py", line 1253, in _identityVerifyingInfoCallback
    verifyHostname(connection, self._hostnameASCII)
  File "/home/paul/.virtualenvs/scrapy10/local/lib/python2.7/site-packages/service_identity/pyopenssl.py", line 45, in verify_hostname
    obligatory_ids=[DNS_ID(hostname)],
  File "/home/paul/.virtualenvs/scrapy10/local/lib/python2.7/site-packages/service_identity/_common.py", line 245, in __init__
    raise ValueError("Invalid DNS-ID.")
exceptions.ValueError: Invalid DNS-ID.

From cffi callback <function infoCallback at 0x7f2bacb7d6e0>:
Traceback (most recent call last):
  File "/home/paul/.virtualenvs/scrapy10/local/lib/python2.7/site-packages/OpenSSL/SSL.py", line 933, in wrapper
    callback(Connection._reverse_mapping[ssl], where, return_code)
  File "/home/paul/.virtualenvs/scrapy10/local/lib/python2.7/site-packages/twisted/internet/_sslverify.py", line 1158, in infoCallback
    connection.get_app_data().failVerification(f)
AttributeError: 'NoneType' object has no attribute 'failVerification'
2016-07-06 11:14:56 [scrapy] DEBUG: Retrying <GET https://151.101.12.223> (failed 1 times): 500 Internal Server Error
2016-07-06 11:14:56 [scrapy] DEBUG: Retrying <GET https://151.101.12.223> (failed 2 times): 500 Internal Server Error
2016-07-06 11:14:56 [scrapy] DEBUG: Gave up retrying <GET https://151.101.12.223> (failed 3 times): 500 Internal Server Error
2016-07-06 11:14:56 [scrapy] DEBUG: Crawled (500) <GET https://151.101.12.223> (referer: None)
[s] Available Scrapy objects:
[s]   crawler    <scrapy.crawler.Crawler object at 0x7f2bb5f57090>
[s]   item       {}
[s]   request    <GET https://151.101.12.223>
[s]   response   <500 https://151.101.12.223>
[s]   settings   <scrapy.settings.Settings object at 0x7f2bae416a10>
[s]   spider     <DefaultSpider 'default' at 0x7f2bacb6b190>
[s] Useful shortcuts:
[s]   shelp()           Shell help (print this help)
[s]   fetch(req_or_url) Fetch request (or URL) and update local objects
[s]   view(response)    View response in a browser
2016-07-06 11:15:03 [root] DEBUG: Using default logger
2016-07-06 11:15:03 [root] DEBUG: Using default logger

@nealhnguyen nealhnguyen commented Jul 6, 2016

Wow, thanks so much for answering so quickly. Unfortunately, the website requires you to be logged in to a specific server, which I access via its IP address. But if creating a custom DOWNLOADER_CLIENTCONTEXTFACTORY solves this issue, I'll try that.
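For the record, pointing Scrapy at a custom context factory is a single setting; the module path below is a placeholder for wherever you define your class:

```python
# settings.py
DOWNLOADER_CLIENTCONTEXTFACTORY = 'myproject.contextfactory.TolerantContextFactory'
```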

@redapple redapple added this to the v1.1.1 milestone Jul 13, 2016
@kmike kmike closed this in #2094 Jul 13, 2016
redapple added a commit to redapple/scrapy that referenced this issue Jul 13, 2016