Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Catch and ignore certification verification exception for IP-address hosts #2094

Merged
merged 3 commits into from Jul 13, 2016

Conversation

@redapple
Copy link
Contributor

@redapple redapple commented Jul 6, 2016

Fixes GH-2092

FTR, this is what Twisted 16.3.0 gives with default Agent (for www.google.com at 216.58.208.132)

$ python
Python 2.7.9 (default, Apr  2 2015, 15:33:21) 
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from twisted.internet import reactor
>>> from twisted.web.client import Agent
>>> from twisted.web.http_headers import Headers
>>> 
>>> agent = Agent(reactor)
>>> 
>>> d = agent.request(
...     'GET',
...     'https://216.58.208.132:443/',
...     Headers({'User-Agent': ['Twisted Web Client Example']}),
...     None)
>>> 
>>> 
>>> def cbResponse(ignored):
...     print('Response received')
... 
>>> 
>>> d.addCallback(cbResponse)
<Deferred at 0x7fc71fd89e18>
>>> 
>>> def cbShutdown(ignored):
...     reactor.stop()
... 
>>> 
>>> d.addBoth(cbShutdown)
<Deferred at 0x7fc71fd89e18>
>>> 
>>> reactor.run()
Error during info_callback
Traceback (most recent call last):
  File "/home/paul/.virtualenvs/scrapy10/local/lib/python2.7/site-packages/twisted/protocols/tls.py", line 423, in dataReceived
    self._write(bytes)
  File "/home/paul/.virtualenvs/scrapy10/local/lib/python2.7/site-packages/twisted/protocols/tls.py", line 571, in _write
    sent = self._tlsConnection.send(toSend)
  File "/home/paul/.virtualenvs/scrapy10/local/lib/python2.7/site-packages/OpenSSL/SSL.py", line 1270, in send
    result = _lib.SSL_write(self._ssl, buf, len(buf))
  File "/home/paul/.virtualenvs/scrapy10/local/lib/python2.7/site-packages/OpenSSL/SSL.py", line 933, in wrapper
    callback(Connection._reverse_mapping[ssl], where, return_code)
--- <exception caught here> ---
  File "/home/paul/.virtualenvs/scrapy10/local/lib/python2.7/site-packages/twisted/internet/_sslverify.py", line 1154, in infoCallback
    return wrapped(connection, where, ret)
  File "/home/paul/.virtualenvs/scrapy10/local/lib/python2.7/site-packages/twisted/internet/_sslverify.py", line 1253, in _identityVerifyingInfoCallback
    verifyHostname(connection, self._hostnameASCII)
  File "/home/paul/.virtualenvs/scrapy10/local/lib/python2.7/site-packages/service_identity/pyopenssl.py", line 45, in verify_hostname
    obligatory_ids=[DNS_ID(hostname)],
  File "/home/paul/.virtualenvs/scrapy10/local/lib/python2.7/site-packages/service_identity/_common.py", line 245, in __init__
    raise ValueError("Invalid DNS-ID.")
exceptions.ValueError: Invalid DNS-ID.

>>> 

And curl's default (secure, i.e. without -k):

$ curl -v https://216.58.208.132:443
* Rebuilt URL to: https://216.58.208.132:443/
*   Trying 216.58.208.132...
* Connected to 216.58.208.132 (216.58.208.132) port 443 (#0)
* found 174 certificates in /etc/ssl/certs/ca-certificates.crt
* found 708 certificates in /etc/ssl/certs
* ALPN, offering http/1.1
* SSL connection using TLS1.2 / ECDHE_RSA_AES_128_GCM_SHA256
*    server certificate verification OK
*    server certificate status verification SKIPPED
* SSL: certificate subject name (www.google.com) does not match target host name '216.58.208.132'
* Closing connection 0
curl: (51) SSL: certificate subject name (www.google.com) does not match target host name '216.58.208.132'
@codecov-io
Copy link

@codecov-io codecov-io commented Jul 6, 2016

Current coverage is 83.37%

Merging #2094 into master will increase coverage by 0.03%

Powered by Codecov. Last updated by 441df4c...005cf94

@@ -48,6 +48,11 @@ def _identityVerifyingInfoCallback(self, connection, where, ret):
'Remote certificate is not valid for hostname "{}"; {}'.format(
self._hostnameASCII, e))

except ValueError as e:

This comment has been minimized.

@kmike

kmike Jul 6, 2016
Member

The comment after ScrapyClientTLSOptions definition should be updated.

except ValueError as e:
logger.warning(
'SSL/TLS verification failed for hostname "{}"; {}'.format(
self._hostnameASCII, e))

This comment has been minimized.

@kmike

kmike Jul 6, 2016
Member

It looks like the problem is that ip address is not a hostname, and Twisted passes it to verify_hostname.

So verification is failed, but it is not failed because SSL/TLS algorithm tells it should fail, but arguably because of a twisted bug (or service_identity inflexibility).

This comment has been minimized.

@redapple

redapple Jul 6, 2016
Author Contributor

It's correct that the trouble comes from passing the IP address to verify_hostname.
SubjectAltName apparently can include IP addresses but we need to check what happens when remote certificate does include it, matching what the client connected to.
The error with scrapy 1.0.6 looks identical but the agent implementation made it "work".
This PR is definitely not pretty.
Would you suggest coming up with a proper verification implementation?

This comment has been minimized.

@kmike

kmike Jul 7, 2016
Member

I'm not sure it is Scrapy's job to implement verification - this code is better fit in Twisted or service_identity. I think we should figure out where this bug belongs, and document it as a bug, not as SSL/TLS verification failure.

This comment has been minimized.

@redapple

redapple Jul 7, 2016
Author Contributor

Makes sense.

This comment has been minimized.

@kmike

kmike Jul 7, 2016
Member

I didn't mean we shouldn't catch this error; maybe just a log message should be different :)

This comment has been minimized.

@redapple

redapple Jul 8, 2016
Author Contributor

alright, what message do you suggest?

This comment has been minimized.

@redapple

redapple Jul 11, 2016
Author Contributor

@kmike , how about 'Ignoring remote certificate verification failure for hostname "{}"; {}'.format(self._hostnameASCII, e)), without the SSL mention?

This comment has been minimized.

@kmike

kmike Jul 12, 2016
Member

@redapple I have 2 reservations about the error message:

  1. 'certificate verification failure' can be read as 'wrong certificate' or a 'misconfigured remote website' or 'wrong configuration of local verification software'. But here it is not a bad certificate or old openssl or a problem with a remote website, it is just a bug in software; because of a bug the verification is not even attempted. I don't know how to express that well in error message though.
  2. it says 'hostname {}'.format(self._hostnameASCII), but _hostnameASCII is not a hostname here - it is an IP, and this is precisely what causes the error.

I think we shouldn't describe the actual bug in the error message because ValueError is too broad - it could happen other issue will cause ValueError in future, or maybe it can be triggered now by something else.

What about "Ignoring certificate verification error for {}; {}".format(self._hostnameASCII, e)? It doesn't fully address (1).

This comment has been minimized.

@redapple

redapple Jul 12, 2016
Author Contributor

OK.
What about "host" or "netloc"?
And something like "Ignoring error while verifying certificate from host {}
(exception: {})" maybe,
with target host and exception
Le 13 juil. 2016 12:28 AM, "Mikhail Korobov" notifications@github.com a
écrit :

In scrapy/core/downloader/tls.py
#2094 (comment):

@@ -48,6 +48,11 @@ def _identityVerifyingInfoCallback(self, connection, where, ret):
'Remote certificate is not valid for hostname "{}"; {}'.format(
self._hostnameASCII, e))

  •            except ValueError as e:
    
  •                logger.warning(
    
  •                    'SSL/TLS verification failed for hostname "{}"; {}'.format(
    
  •                        self._hostnameASCII, e))
    

@redapple https://github.com/redapple I have 2 reservations about the
error message:

  1. 'certificate verification failure' can be read as 'wrong
    certificate' or a 'misconfigured remote website' or 'wrong configuration of
    local verification software'. But here it is not a bad certificate or old
    openssl or a problem with a remote website, it is just a bug in software;
    because of a bug the verification is not even attempted. I don't know how
    to express that well in error message though.
  2. it says 'hostname {}'.format(self._hostnameASCII), but
    _hostnameASCII is not a hostname here - it is an IP, and this is precisely
    what causes the error.

I think we shouldn't describe the actual bug in the error message because
ValueError is too broad - it could happen other issue will cause ValueError
in future, or maybe it can be triggered now by something else.

What about "Ignoring certificate verification error for {};
{}".format(self._hostnameASCII, e)? It doesn't fully address (1).


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/scrapy/scrapy/pull/2094/files/37efdde3e3dd4efecf7dccd361fceab6c851d0ff#r70534245,
or mute the thread
https://github.com/notifications/unsubscribe/AA2GGMc4yG_0sC7jmUU2LrN5jkjsGJLGks5qVBUMgaJpZM4JGBi-
.

This comment has been minimized.

@kmike

kmike Jul 12, 2016
Member

Sounds good 👍
Sorry for holding it back!

@redapple redapple changed the title Catch and ignore TLS verification exception for IP-address hosts Catch and ignore certification verification exception for IP-address hosts Jul 13, 2016
@redapple redapple added this to the v1.1.1 milestone Jul 13, 2016
@kmike kmike merged commit 2dd1a9e into scrapy:master Jul 13, 2016
2 checks passed
2 checks passed
codecov/patch 100% of diff hit (target 100%)
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@redapple redapple removed this from the v1.1.1 milestone Jul 13, 2016
redapple added a commit that referenced this pull request Jul 13, 2016
[backport][1.1] Catch and ignore certification verification exception for IP-address hosts (PR #2094)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants