Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New SSL module doesn't seem to verify hostname against commonName in certificate #45930

Closed
ahasenack mannequin opened this issue Dec 11, 2007 · 47 comments
Closed

New SSL module doesn't seem to verify hostname against commonName in certificate #45930

ahasenack mannequin opened this issue Dec 11, 2007 · 47 comments
Labels
stdlib Python modules in the Lib dir type-security A security issue

Comments

@ahasenack
Copy link
Mannequin

ahasenack mannequin commented Dec 11, 2007

BPO 1589
Nosy @jcea, @orsenthil, @pitrou, @giampaolo, @merwok
Files
  • verisign-inc-class-3-public-primary.pem
  • sslcheck.patch
  • sslcheck2.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2010-10-08.10:38:19.891>
    created_at = <Date 2007-12-11.15:41:03.176>
    labels = ['type-security', 'library']
    title = "New SSL module doesn't seem to verify hostname against commonName in certificate"
    updated_at = <Date 2012-02-25.10:20:25.171>
    user = 'https://bugs.python.org/ahasenack'

    bugs.python.org fields:

    activity = <Date 2012-02-25.10:20:25.171>
    actor = 'Thomas.Leonard'
    assignee = 'none'
    closed = True
    closed_date = <Date 2010-10-08.10:38:19.891>
    closer = 'pitrou'
    components = ['Library (Lib)']
    creation = <Date 2007-12-11.15:41:03.176>
    creator = 'ahasenack'
    dependencies = []
    files = ['8924', '19128', '19141']
    hgrepos = []
    issue_num = 1589
    keywords = ['patch']
    message_count = 47.0
    messages = ['58434', '58444', '58472', '58491', '58508', '58535', '58547', '71405', '71443', '71586', '71682', '72574', '72935', '73006', '73036', '107895', '117606', '117613', '117635', '117636', '117639', '117640', '117641', '117643', '117647', '117936', '117942', '117963', '117968', '117970', '117972', '118068', '118073', '118076', '118080', '118084', '118086', '118176', '118841', '118844', '120107', '120122', '120920', '120945', '120946', '120947', '154230']
    nosy_count = 18.0
    nosy_names = ['jcea', 'zooko', 'janssen', 'orsenthil', 'pitrou', 'techtonik', 'giampaolo.rodola', 'vila', 'heikki', 'ahasenack', 'kiilerix', 'eric.araujo', 'debatem1', 'jsamuel', 'devin', 'asdfasdfasdfasdfasdfasdfasdf', 'Ryan.Tucker', 'Thomas.Leonard']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'security'
    url = 'https://bugs.python.org/issue1589'
    versions = ['Python 3.2']

    @ahasenack
    Copy link
    Mannequin Author

    ahasenack mannequin commented Dec 11, 2007

    (I hope I used the correct component for this report)

    http://pypi.python.org/pypi/ssl/

    I used the client example shown at
    http://docs.python.org/dev/library/ssl.html#client-side-operation to
    connect to a bank site called www.realsecureweb.com.br at
    200.208.16.101. Its certificate signed by verisign. My OpenSSL has this
    CA at /etc/pki/tls/rootcerts/verisign-inc-class-3-public-primary.pem.
    The verification works.

    If I make up a hostname called something else, like "wwws", and place it
    in /etc/hosts pointing to that IP address, the SSL connection should not
    be established because that name doesn't match the common name field in
    the server certificate. But the SSL module happily connects to it
    (excerpt below):

    cert = verisign-inc-class-3-public-primary.pem
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    ssl_sock = ssl.wrap_socket(s,
               ca_certs="/etc/pki/tls/rootcerts/%s" % cert,
               cert_reqs=ssl.CERT_REQUIRED)
    ssl_sock.connect(('wwws', 443))
    print repr(ssl_sock.getpeername())

    output:
    ('200.208.16.101', 443)
    ('RC4-MD5', 'TLSv1/SSLv3', 128)
    {'notAfter': 'Sep 10 23:59:59 2008 GMT',
    'subject': ((('countryName', u'BR'),),
    (('stateOrProvinceName', u'Sao Paulo'),),
    (('localityName', u'Sao Paulo'),),
    (('organizationName', u'Banco ABN AMRO Real SA'),),
    (('organizationalUnitName', u'TI Internet PF e PJ'),),
    (('commonName', u'www.realsecureweb.com.br'),))}

    If I now open, say, a firefox window and point it to "https://wwws", it
    gives me the expected warning that the hostname doesn't match the
    certificate.

    I'll attach the verisign CA certificate to make it easier to reproduce
    the error.

    @ahasenack ahasenack mannequin added stdlib Python modules in the Lib dir type-security A security issue labels Dec 11, 2007
    @gvanrossum
    Copy link
    Member

    gvanrossum commented Dec 11, 2007

    Bill, can you respond?

    @janssen
    Copy link
    Mannequin

    janssen mannequin commented Dec 11, 2007

    Unfortunately, hostname matching is one of those ideas that seemed
    better when it was thought up than it actually proved to be in practice.
    I've had extensive experience with this, and have found it to almost
    always an application-specific decision. I thought about this when
    designing the client-side verification, and couldn't see any automatic
    solution that doesn't get in the way. So the right way to do this with
    Python is to write some application code that looks at the verified
    identity and makes decisions based on whatever authentication algorithm
    you need.

    @ahasenack
    Copy link
    Mannequin Author

    ahasenack mannequin commented Dec 12, 2007

    At the least it should be made clear in the documentation that the
    hostname is not checked against the commonName nor the subjectAltName
    fields of the server certificate. And add some sample code to the
    documentation for doing a simple check. Something like this, to illustrate:

    def get_subjectAltName(cert):
            if not cert.has_key('subjectAltName'):
                    return []
            ret = []
            for rdn in cert['subjectAltName']:
                    if rdn[0].lower() == 'dns' or rdn[0][:2].lower() == 'ip':
                            ret.append(rdn[1])
            return ret
    
    def get_commonName(cert):
            if not cert.has_key('subject'):
                    return []
            ret = []
            for rdn in cert['subject']:
                    if rdn[0][0].lower() == 'commonname':
                            ret.append(rdn[0][1])
            return ret
    
    
    def verify_hostname(cert, host):
            cn = get_commonName(cert)
            san = get_subjectAltName(cert)
            return (host in cn) or (host in san)

    @janssen
    Copy link
    Mannequin

    janssen mannequin commented Dec 12, 2007

    Yes, I think that's reasonable. And for pseudo-standards like https, which
    calls for this, the implementation in the standard library should attempt to
    do it automatically. Unfortunately, that means that client-side certificate
    verification has to be done (it's pointless to look at the data in
    unverified certificates), and that means that the client software has to
    have an appropriate collection of root certificates to verify against. I
    think there's an argument for adding a registry of root certificates to the
    SSL module, just a module-level variable that the application can bind to a
    filename of a file containing their collection of certificates. If it's
    non-None, the https code would use it to verify the certificate, then use
    the commonName in the subject field to check against the hostname in the
    URL. If it's None, the check would be skipped.

    Bill

    On Dec 12, 2007 4:48 AM, Andreas Hasenack <report@bugs.python.org> wrote:

    Andreas Hasenack added the comment:

    At the least it should be made clear in the documentation that the
    hostname is not checked against the commonName nor the subjectAltName
    fields of the server certificate. And add some sample code to the
    documentation for doing a simple check. Something like this, to
    illustrate:

    def get_subjectAltName(cert):
    if not cert.has_key('subjectAltName'):
    return []
    ret = []
    for rdn in cert['subjectAltName']:
    if rdn[0].lower() == 'dns' or rdn[0][:2].lower() == 'ip':
    ret.append(rdn[1])
    return ret

    def get_commonName(cert):
    if not cert.has_key('subject'):
    return []
    ret = []
    for rdn in cert['subject']:
    if rdn[0][0].lower() == 'commonname':
    ret.append(rdn[0][1])
    return ret

    def verify_hostname(cert, host):
    cn = get_commonName(cert)
    san = get_subjectAltName(cert)
    return (host in cn) or (host in san)


    Tracker <report@bugs.python.org>
    <http://bugs.python.org/issue1589\>


    @ahasenack
    Copy link
    Mannequin Author

    ahasenack mannequin commented Dec 13, 2007

    do it automatically. Unfortunately, that means that client-side
    certificate
    verification has to be done (it's pointless to look at the data in
    unverified certificates), and that means that the client software has to
    have an appropriate collection of root certificates to verify against. I

    But the current API already has this feature:
    ssl_sock = ssl.wrap_socket(s, ca_certs="/etc/pki/tls/rootcerts/%s" % cert,
    cert_reqs=ssl.CERT_REQUIRED)

    So this is already taken care of with ca_certs and cert_reqs, right?

    @janssen
    Copy link
    Mannequin

    janssen mannequin commented Dec 13, 2007

    The mechanism is there for direct use of the SSL module, yes. But the
    question is, what should indirect usage, like the httplib or urllib modules,
    do? If they are going to check hostnames on use of an https: URL, they need
    some way to pass a ca_certs file through to the SSL code they use.

    Bill

    On Dec 13, 2007 7:14 AM, Andreas Hasenack <report@bugs.python.org> wrote:

    Andreas Hasenack added the comment:

    > do it automatically. Unfortunately, that means that client-side
    certificate
    > verification has to be done (it's pointless to look at the data in
    > unverified certificates), and that means that the client software has to
    > have an appropriate collection of root certificates to verify against.
    I

    But the current API already has this feature:
    ssl_sock = ssl.wrap_socket(s, ca_certs="/etc/pki/tls/rootcerts/%s" % cert,
    cert_reqs=ssl.CERT_REQUIRED)

    So this is already taken care of with ca_certs and cert_reqs, right?


    Tracker <report@bugs.python.org>
    <http://bugs.python.org/issue1589\>


    @heikki
    Copy link
    Mannequin

    heikki mannequin commented Aug 19, 2008

    I would definitely recommend providing as strict as possible hostname
    verification in the stdlib, but provide application developers a way to
    override that.

    M2Crypto (and TLS Lite, from which I copied the approach to M2Crypto),
    provide a default post connection checker. See
    http://svn.osafoundation.org/m2crypto/trunk/M2Crypto/SSL/Connection.py
    and the set_post_connection_check_callback() as well as
    http://svn.osafoundation.org/m2crypto/trunk/M2Crypto/SSL/Checker.py.

    @janssen
    Copy link
    Mannequin

    janssen mannequin commented Aug 19, 2008

    Nope. Hostname verification was never a good idea -- the "hostname" is
    just a vague notion, at best -- lots of hostnames can map to one or more
    IP addresses of the server. It's exposed to the application code, so if
    a client application wants to do it, it can. But I recommend against
    it. It's a complication that doesn't belong in the basic support, that
    is, the SSL module. I'll add a note to this effect in the documentation.

    @janssen janssen mannequin closed this as completed Aug 19, 2008
    @heikki
    Copy link
    Mannequin

    heikki mannequin commented Aug 20, 2008

    I would think most people/applications want to know to which host they
    are talking to. The reason I am advocating adding a default check to the
    stdlib is because this is IMO important for security, and it is easy to
    get it wrong (I don't think I have it 100% correct in M2Crypto either,
    although I believe it errs on the side of caution). I believe it would
    be a disservice to ship something that effectively teaches developers to
    ignore security (like the old socket.ssl does).

    A TLS extension also allows SSL vhosts, so static IPs are no longer
    strictly necessary (this is not universally supported yet, though).

    @janssen
    Copy link
    Mannequin

    janssen mannequin commented Aug 21, 2008

    checking hostnames is false security, not real security.

    On 8/20/08, Heikki Toivonen <report@bugs.python.org> wrote:

    Heikki Toivonen <hjtoi-bugzilla@comcast.net> added the comment:

    I would think most people/applications want to know to which host they
    are talking to. The reason I am advocating adding a default check to the
    stdlib is because this is IMO important for security, and it is easy to
    get it wrong (I don't think I have it 100% correct in M2Crypto either,
    although I believe it errs on the side of caution). I believe it would
    be a disservice to ship something that effectively teaches developers to
    ignore security (like the old socket.ssl does).

    A TLS extension also allows SSL vhosts, so static IPs are no longer
    strictly necessary (this is not universally supported yet, though).


    Python tracker <report@bugs.python.org>
    <http://bugs.python.org/issue1589\>


    @heikki
    Copy link
    Mannequin

    heikki mannequin commented Sep 5, 2008

    Could you clarify your comment regarding hostname check being false
    security?

    Just about all SSL texts I have read say you must do that, and that is
    what your web browser and email client does to ensure it is talking to
    the right host, for example. Without that check you are subject to a man
    in the middle attack. Or is there some other check you perform that is
    better?

    @janssen
    Copy link
    Mannequin

    janssen mannequin commented Sep 10, 2008

    Sorry to be so brief there -- I was off on vacation.

    Verifying hostnames is a prescription that someone (well, OK, Eric
    Rescorla, who knows what he's talking about) put in the https IETF RFC
    (which, by the way, is only an informational RFC, not standards-track).
    It's a good idea if you're a customer trying to talk to Wells-Fargo,
    say, over an https connection, but isn't suitable for all https traffic.
    I support putting it in the httplib Https class by default, but there
    should be a way to override it, as there is with the Java APIs for https
    connections. (Take a look at javax.net.ssl.HostnameVerifier; one of the
    more popular Java classes at PARC is a version of this that verifies any
    hostname).

    So what's wrong with it? There are two problems. The first is that
    certificates for services are all about the hostname, and that's just
    wrong. You should verify the specific service, not just the hostname.
    So a client that really cares about what they are talking to should
    have the certificate for that service, and verify that it is the service
    it's talking to, and ignore the hostname in the URL.

    But the larger problem is that hostnames are a DNS construct for humans,
    and not really well supported on computers, or by the services that run
    on those computers. Most computers have only the haziest notion of what
    their hostname is, and many have lots of different hostnames (my laptop
    has at least five hostnames that I know of, all meaning the same
    computer, but with five different PARC IP addresses). So the services
    running on that computer aren't real clear about their hostnames,
    either. If I run a service on that computer that I secure with SSL, so
    that packets going over my WiFi are encrypted, which hostname should
    that service declare itself to be in the certificate? And the services
    on that computer keep running, even when it switches its IP address (and
    thus its set of hostnames). So doing hostname matching provokes lots of
    false negatives, especially when it's not needed. I think it by and
    large isn't a good idea, though I support having it (in an overrideable
    form) for the client-side https class in httplib.

    This is all exacerbated by the fact that HTTP isn't what it was when
    Eric wrote that RFC eight years ago. The growth of Web 2.0 and
    "RESTful" services means that lots of new things are using https in a
    much less formal way, more to get encrypted packets than to verify
    endpoints. So false negatives caused by mindless hostname verification
    cause real damage.

    @heikki
    Copy link
    Mannequin

    heikki mannequin commented Sep 11, 2008

    Ok, thank you for clarifications. Now I understand why the hostname
    checking isn't the solution that fits every problem. I am still not
    completely clear how you'd do the checking otherwise, for example to
    verify the service you are talking to is what you think it is.

    But still, I think dealing with email servers is another common use case
    where hostname check is adequate most of the time. I am sure there are
    other cases like this. Therefore I am still of the opinion that the
    default should be to do the hostname check. Yes, make it overridable,
    but doing the check is safer than not doing any checking IMO because
    even if the check is incorrect for a certain purpose the developer is
    likely to notice an error quickly and inclined to do some other security
    check instead of not doing anything and thinking they have a secure system.

    If you want to continue the discussion, we should maybe take this to
    some other forum, like comp.lang.python.

    @janssen
    Copy link
    Mannequin

    janssen mannequin commented Sep 11, 2008

    I think that, where it's appropriate, you can do that. Just don't put it in
    the SSL module.

    Bill

    On Wed, Sep 10, 2008 at 11:24 PM, Heikki Toivonen <report@bugs.python.org>wrote:

    Heikki Toivonen <hjtoi-bugzilla@comcast.net> added the comment:

    Ok, thank you for clarifications. Now I understand why the hostname
    checking isn't the solution that fits every problem. I am still not
    completely clear how you'd do the checking otherwise, for example to
    verify the service you are talking to is what you think it is.

    But still, I think dealing with email servers is another common use case
    where hostname check is adequate most of the time. I am sure there are
    other cases like this. Therefore I am still of the opinion that the
    default should be to do the hostname check. Yes, make it overridable,
    but doing the check is safer than not doing any checking IMO because
    even if the check is incorrect for a certain purpose the developer is
    likely to notice an error quickly and inclined to do some other security
    check instead of not doing anything and thinking they have a secure system.

    If you want to continue the discussion, we should maybe take this to
    some other forum, like comp.lang.python.


    Python tracker <report@bugs.python.org>
    <http://bugs.python.org/issue1589\>


    @pitrou
    Copy link
    Member

    pitrou commented Jun 15, 2010

    Reopening. I think it would be nice to provide the appropriate convenience function(s) as part of the ssl module, even if the user has to call them explicitly.

    @pitrou pitrou reopened this Jun 15, 2010
    @orsenthil
    Copy link
    Member

    orsenthil commented Sep 29, 2010

    Removed this message by mistake.

    Author ahasenack
    Date 2007-12-11.21:11:53

    Ups, typo in the script:
    cert = "verisign-inc-class-3-public-primary.pem"

    @asdfasdfasdfasdfasdfasdfasdf
    Copy link
    Mannequin

    asdfasdfasdfasdfasdfasdfasdf mannequin commented Sep 29, 2010

    Welcome to 2010.
    SSL shouldn't be difficult to use anymore or support in python applications. But yet, until the changes in http://bugs.python.org/issue9983 was fixed python devs were using modules without any warning of the security implications. pycurl works ... but a *LOT* of coders are not using pycurl.

    Today they are still getting it wrong and are still vulnerable to mitm attacks against https on the client side.

    I have an example in fairly large open source project:
    bzr --> (by default due to a dependency failure ... on not depending on pycurl).
    https://bugs.edge.launchpad.net/ubuntu/+source/checkbox/+bug/625076

    Less large:
    libcloud http://github.com/apache/libcloud/issues/issue/2
    linode-python http://github.com/tjfontaine/linode-python/issues/issue/1

    I would *very* much like to see these methods fixed by default.
    You can talk about how the ssl protocol is not secure because of ca's handling certificates poorly, but until you *actually* perform proper validation you cannot say these things imho.

    I can keep on looking at python projects and reporting these issues but it is really easy, just look at anything that says and is important that mitm isn't possible against it -> then check the deps. in ubuntu /debian and pick the ones that don't use pycurl, check they don't validate the common name etc. and then you have a bunch of mitm'able apps probably ;)

    @zooko
    Copy link
    Mannequin

    zooko mannequin commented Sep 29, 2010

    This appears to be a concern for some people. Maybe the builtin ssl module should be deprecated if there isn't a lot of manpower to maintain it and instead the well-maintained pyOpenSSL package should become the recommended tool?

    Here is a letter that I just received, in my role as a developer of Tahoe-LAFS, from a concerned coder who doesn't know much about Python:

    An FYI on Python.

    I'm not sure how businesses handle this (I've always worked in Windows
    shops), but I imagine some might consider pulling Python until it is
    properly secured. Pulling Python might affect Tahoe, which I would
    like to see do well.

    Here is my reply to him:

    Thanks for the note warning me about this issue! I appreciate it.

    The Tahoe-LAFS project doesn't use the builtin "ssl" module that comes
    with the Python Standard Library and instead uses the separate
    pyOpenSSL package (and uses the separate Twisted package for HTTP and
    other networking protocols). Therefore this isn't an issue for
    Tahoe-LAFS. I agree that it is potentially a "marketing" issue in that
    people might mistakenly think that Tahoe-LAFS is vulnerable or might,
    as you suggest, blacklist Python as such and thus hit Tahoe-LAFS as
    collateral damage. There's not much I can do about that from the
    perspective of a Tahoe-LAFS developer. From the perspective of
    contributor to Python, I'm also not sure what to do, except perhaps to
    complain. :-) I guess I'll try to stir the waters a bit by suggesting
    that Python should deprecate the builtin "ssl" module and recommend
    the pyOpenSSL package instead.

    @pitrou
    Copy link
    Member

    pitrou commented Sep 29, 2010

    Here is a letter that I just received, in my role as a developer of
    Tahoe-LAFS, from a concerned coder who doesn't know much about Python:

    > An FYI on Python.
    >
    > I'm not sure how businesses handle this (I've always worked in
    Windows
    > shops), but I imagine some might consider pulling Python until it is
    > properly secured. Pulling Python might affect Tahoe, which I would
    > like to see do well.

    That sounds like an inventively outrageous kind of FUD. It's the first
    time I hear of someone writing to third-party library authors in order
    to pressure them to pressure the maintainers of a programming language
    implementation to make some "decisions".

    By the way, if "businesses" are really concerned about the security
    problems induced by this issue, they can sponsor the effort to get the
    bug fixed. It shouldn't be a lot of work.

    This appears to be a concern for some people. Maybe the builtin ssl
    module should be deprecated if there isn't a lot of manpower to
    maintain it and instead the well-maintained pyOpenSSL package should
    become the recommended tool?

    Correct me if I'm wrong, but the "well-maintained pyOpenSSL package"
    doesn't have the missing functionality (hostname checking in server
    certificates), either. M2Crypto has it, though.

    @devin
    Copy link
    Mannequin

    devin mannequin commented Sep 29, 2010

    Correct me if I'm wrong, but the "well-maintained pyOpenSSL
    package" doesn't have the missing functionality (hostname
    checking in server certificates), either.

    I'm pretty sure it's just a wrapper around the openssl library, which does not include it. That was Bill Janssen's argument for why the ssl module shouldn't do that verification. Well, that and the fact that there's no finalized standard for it yet. I believe this is the latest draft:
    http://tools.ietf.org/html/draft-saintandre-tls-server-id-check-09

    @debatem1
    Copy link
    Mannequin

    debatem1 mannequin commented Sep 29, 2010

    On Wed, Sep 29, 2010 at 11:34 AM, Antoine Pitrou <report@bugs.python.org> wrote:

    Antoine Pitrou <pitrou@free.fr> added the comment:

    > Here is a letter that I just received, in my role as a developer of
    > Tahoe-LAFS, from a concerned coder who doesn't know much about Python:
    >
    > > An FYI on Python.
    > >
    > > I'm not sure how businesses handle this (I've always worked in
    > Windows
    > > shops), but I imagine some might consider pulling Python until it is
    > > properly secured. Pulling Python might affect Tahoe, which I would
    > > like to see do well.

    That sounds like an inventively outrageous kind of FUD. It's the first
    time I hear of someone writing to third-party library authors in order
    to pressure them to pressure the maintainers of a programming language
    implementation to make some "decisions".

    Not to add fuel to the fire, but I've had a user report this behavior
    as a bug as well, so this isn't entirely outside the scope of
    plausibility to me.

    By the way, if "businesses" are really concerned about the security
    problems induced by this issue, they can sponsor the effort to get the
    bug fixed. It shouldn't be a lot of work.

    What would the approximate cost on that be, do you think? My
    understanding was that the code was pretty much written given John
    Nagle's patch and M2Crypto.

    Geremy Condra

    @pitrou
    Copy link
    Member

    pitrou commented Sep 29, 2010

    > Correct me if I'm wrong, but the "well-maintained pyOpenSSL
    > package" doesn't have the missing functionality (hostname
    > checking in server certificates), either.

    I'm pretty sure it's just a wrapper around the openssl library, which
    does not include it. That was Bill Janssen's argument for why the ssl
    module shouldn't do that verification. Well, that and the fact that
    there's no finalized standard for it yet. I believe this is the latest
    draft:
    http://tools.ietf.org/html/draft-saintandre-tls-server-id-check-09

    Well, to be clear, it shouldn't be done *automatically*. But providing a
    helper function that implements the feature and lets higher layers like
    http.client and urllib.request call it if desired would be more than
    reasonable.

    (openssl may not provide such a function, but gnutls does, by the way)

    @pitrou
    Copy link
    Member

    pitrou commented Sep 29, 2010

    What would the approximate cost on that be, do you think? My
    understanding was that the code was pretty much written given John
    Nagle's patch and M2Crypto.

    To err on the safe side and account for integration work (unit tests,
    coding style, and use in http.client / urllib), I would say a couple of
    days. Also because it's rather boring code :-)

    (but, don't assume that urllib will then be secure by default; Python
    doesn't ship with CA certificates, so existing code will still need a
    bit of work to activate cert validation and pass the location of the
    system's CA certs)

    @asdfasdfasdfasdfasdfasdfasdf
    Copy link
    Mannequin

    asdfasdfasdfasdfasdfasdfasdf mannequin commented Sep 29, 2010

    imho it would be nice to be 'secure by default' in say the next python stable releases... (or perhaps only 3.X ? ).

    @kiilerix
    Copy link
    Mannequin

    kiilerix mannequin commented Oct 4, 2010

    I added some extra verification to Mercurial (http://www.selenic.com/hg/rev/f2937d6492c5). Feel free to use the following under the Python license in Python or elsewhere. It could be a separate method/function or it could integrated in wrap_socket and controlled by a keyword. I would appreciate if you find the implementation insufficient or incorrect.

    The purpose with this function is to verify if the received and validated certificate matches the host we intended to connect to.

    I try to keep it simple and to fail to the safe side.

    "Correct" subjectAltName handling seems not to be feasible.

    Are CRLs checked by the SSL module? Otherwise it deserves a big fat warning.

    (I now assume that notBefore is handled by the SSL module and shouldn't be checked here.)

    def _verifycert(cert, hostname):
        '''Verify that cert (in socket.getpeercert() format) matches
        hostname and is valid at this time. CRLs and subjectAltName are
        not handled.
        
        Returns error message if any problems are found and None on success.
        '''
        if not cert:
            return _('no certificate received')
        notafter = cert.get('notAfter')
        if notafter and time.time() > ssl.cert_time_to_seconds(notafter):
            return _('certificate expired %s') % notafter
        dnsname = hostname.lower()
        for s in cert.get('subject', []):
            key, value = s[0]
            if key == 'commonName':
                certname = value.lower()
                if (certname == dnsname or
                    '.' in dnsname and
                    certname == '*.' + dnsname.split('.', 1)[1]):
                    return None
                return _('certificate is for %s') % certname
        return _('no commonName found in certificate')
    
    
    def check(a, b):
        if a != b:
            print (a, b)

    # Test non-wildcard certificates
    check(_verifycert({'subject': ((('commonName', 'example.com'),),)}, 'example.com'),
    None)
    check(_verifycert({'subject': ((('commonName', 'example.com'),),)}, 'www.example.com'),
    'certificate is for example.com')
    check(_verifycert({'subject': ((('commonName', 'www.example.com'),),)}, 'example.com'),
    'certificate is for www.example.com')

    # Test wildcard certificates
    check(_verifycert({'subject': ((('commonName', '*.example.com'),),)}, 'www.example.com'),
    None)
    check(_verifycert({'subject': ((('commonName', '*.example.com'),),)}, 'example.com'),
    'certificate is for *.example.com')
    check(_verifycert({'subject': ((('commonName', '*.example.com'),),)}, 'w.w.example.com'),
    'certificate is for *.example.com')

    # Avoid some pitfalls
    check(_verifycert({'subject': ((('commonName', '*.foo'),),)}, 'foo'),
    'certificate is for *.foo')
    check(_verifycert({'subject': ((('commonName', '*o'),),)}, 'foo'),
    'certificate is for *o')

    import time
    lastyear = time.gmtime().tm_year - 1
    nextyear = time.gmtime().tm_year + 1
    check(_verifycert({'notAfter': 'May  9 00:00:00 %s GMT' % lastyear}, 'example.com'),
        'certificate expired May  9 00:00:00 %s GMT' % lastyear)
    check(_verifycert({'notAfter': 'Sep 29 15:29:48 %s GMT' % nextyear, 'subject': ()}, 'example.com'),
        'no commonName found in certificate')
    check(_verifycert(None, 'example.com'),
        'no certificate received')

    @pitrou
    Copy link
    Member

    pitrou commented Oct 4, 2010

    Hello,

    I added some extra verification to Mercurial
    (http://www.selenic.com/hg/rev/f2937d6492c5). Feel free to use the
    following under the Python license in Python or elsewhere. It could be
    a separate method/function or it could integrated in wrap_socket and
    controlled by a keyword. I would appreciate if you find the
    implementation insufficient or incorrect.

    Thank you, I'll take a look!

    Are CRLs checked by the SSL module? Otherwise it deserves a big fat
    warning.

    They are not, but AFAIK most browsers don't check CRLs either...
    (or, rather they don't download updated CRLs)

    (I now assume that notBefore is handled by the SSL module and
    shouldn't be checked here.)

    I can't say for sure, but OpenSSL seems to handle both notBefore and
    notAfter as part of its cert verification routine (see interval_verify()
    and cert_check_time() in crypto/x509/x509_vfy.c).

    @pitrou
    Copy link
    Member

    pitrou commented Oct 4, 2010

    Here is a patch against py3k. It adds a single ssl.match_hostname method, with rules from RFC 2818 (that is, tailored for HTTPS). Review welcome.

    @devin
    Copy link
    Mannequin

    devin mannequin commented Oct 4, 2010

    I think it looks good except for the wildcard checking. According to the latest draft of that TLS id-checking RFC, you aren't supposed to allow the wildcard as part of a fragment. Of course this contradicts RFC 2818.

    http://tools.ietf.org/html/draft-saintandre-tls-server-id-check-09#section-4.4.3

    If this gets accepted, I'll submit a patch to http.client and urllib that makes use of it.

    @pitrou
    Copy link
    Member

    pitrou commented Oct 4, 2010

    I think it looks good except for the wildcard checking. According to
    the latest draft of that TLS id-checking RFC, you aren't supposed to
    allow the wildcard as part of a fragment. Of course this contradicts
    RFC 2818.

    Well, since it is then an "error" (according to the id-checking draft)
    in the certificate itself rather than the hostname we are trying to
    match, it seems there would be no real issue in accepting the match
    anyway. It's up to CAs to make sure that certificates conform to
    whatever standard is currently in effect.

    I'm also assuming RFC 2818 is in wider use than the id-checking draft;
    am I wrong?

    @devin
    Copy link
    Mannequin

    devin mannequin commented Oct 4, 2010

    I'm also assuming RFC 2818 is in wider use than the id-checking draft;
    am I wrong?

    Yeah, since RFC 2818 has been accepted since 2000 and the id-checking draft was started in 2009, I'd say it's a safe bet. I'm in no way authoritative though.

    @pitrou
    Copy link
    Member

    pitrou commented Oct 6, 2010

    If nobody objects, I will commit this (with docs) soon. Then I will open a separate issue for the http.client / urllib.request integration, since the discussion is already quite long here.

    @kiilerix
    Copy link
    Mannequin

    kiilerix mannequin commented Oct 6, 2010

    I'm sorry to make the discussion longer ...

    From a Python user/programmers point of view it would be nice if http://docs.python.org/library/ssl.html also clarified what "validation" means (apparently that the cert chain all the way from one of ca_certs is valid and with valid dates, except that CRLs not are checked?). It could perhaps be mentioned next to the ca_certs description. It would also be nice to see an example with subjectAltName, both with DNS and IP entries.

    Has it been tested that the way Python uses OpenSSL really checks both notBefore and notAfter?

    Some comments to the patch. Some of them are just thoughts that can be ignored, but I think some of them are relevant.

    _dnsname_to_pat:

    AFAICS * shouldn't match the empty string. I would expect "fail(cert, '.a.com')".

    I would prefer to fail to the safe side and only allow a left-most wildcard and thus not allow multiple or f* wildcards, just like draft-saintandre-tls-server-id-check-09 suggests.

    I would prefer to not use re for such an important task where clarity and correctness is so important. If we only allow left-most wildcards it shouldn't be necessary.

    match_hostname:

    I don't understand "IP addresses are not accepted for hostname". I assume that if commonName specifies an IP address then a hostname with this address is valid. So isn't it more that "subjectAltName iPAddress isn't supported"? But wouldn't it be better and simpler to simply support iPAddress - either as the only check iff hostname "looks" like an IP address, alternatively in all cases check against both DNS and IP entries?

    "dnsnames" doesn't say much about what it is. Perhaps "unmatched"?

    "if san: ... else: ..." would perhaps be a bit clearer.

    "doesn't match with either of (%s)" ... isn't the paranthesis around the list elements too pythonic for a message intended for end users?

    Separate error messages for subjectAltName and commonName could be helpful.

    I assume it should be "no appropriate _commonName_" to match "subjectAltName".

    test:

    cert for example.com is defined twice.

    Finally:

    How about unicode and/or IDN hostnames?

    @pitrou
    Copy link
    Member

    pitrou commented Oct 6, 2010

    From a Python user/programmers point of view it would be nice if
    http://docs.python.org/library/ssl.html also clarified what
    "validation" means (apparently that the cert chain all the way from
    one of ca_certs is valid and with valid dates, except that CRLs not
    are checked?). It could perhaps be mentioned next to the ca_certs
    description. It would also be nice to see an example with
    subjectAltName, both with DNS and IP entries.

    As mentioned in the patch, IP entries are not supported.
    I'm planning to do a couple of doc updates as part of the commit, but
    any doc suggestions should go in a separate issue IMO. This will make
    things more manageable.

    Has it been tested that the way Python uses OpenSSL really checks both
    notBefore and notAfter?

    I just checked and, yes, it does (but only if you specify CERT_OPTIONAL
    or CERT_REQUIRED, of course).

    AFAICS * shouldn't match the empty string. I would expect "fail(cert,
    '.a.com')".

    Good point.

    I would prefer to fail to the safe side and only allow a left-most
    wildcard and thus not allow multiple or f* wildcards, just like
    draft-saintandre-tls-server-id-check-09 suggests.

    Well, RFC 2818 allows them, and I see no point in being stricter.

    I would prefer to not use re for such an important task where clarity
    and correctness is so important. If we only allow left-most wildcards
    it shouldn't be necessary.

    I'm not convinced that manual parsing is really more readable than
    regular expressions, and wildcards are a pretty obvious use case for
    regular expressions. Perhaps it's a matter of habit.

    match_hostname:

    I don't understand "IP addresses are not accepted for hostname". I
    assume that if commonName specifies an IP address then a hostname with
    this address is valid. So isn't it more that "subjectAltName iPAddress
    isn't supported"?

    Indeed. But, strictly speaking, there are no tests for IPs, so it
    shouldn't be taken for granted that it works, even for commonName.
    The rationale is that there isn't really any point in using an IP rather
    a host name.

    But wouldn't it be better and simpler to simply support iPAddress -
    either as the only check iff hostname "looks" like an IP address,
    alternatively in all cases check against both DNS and IP entries?

    Well, that's additional logic to code. I'm not sure it's worth it,
    especially given that the function is called match_hostname in the first
    place.

    "doesn't match with either of (%s)" ... isn't the paranthesis around
    the list elements too pythonic for a message intended for end users?

    Hmm, perhaps.

    Separate error messages for subjectAltName and commonName could be
    helpful.

    That depends if you're an end user or an SSL expert, I guess. end users
    don't know what subjectAltNames and commonNames are.

    I assume it should be "no appropriate _commonName_" to match
    "subjectAltName".

    Ah, yes.

    cert for example.com is defined twice.

    Right.

    How about unicode and/or IDN hostnames?

    I haven't looked how these work in the context of certificate checking.
    I would prefer to tackle that separately if needed, but someone can
    provide a patch for this if they want.

    @pitrou
    Copy link
    Member

    pitrou commented Oct 6, 2010

    Here is a new patch with doc updates and the corrections mentioned above.

    @kiilerix
    Copy link
    Mannequin

    kiilerix mannequin commented Oct 6, 2010

    Indeed. But, strictly speaking, there are no tests for IPs, so it
    shouldn't be taken for granted that it works, even for commonName.
    The rationale is that there isn't really any point in using an IP rather
    a host name.

    I don't know if there is a point or not, but some hosts are for some
    reason intended to be connected to using IP address and their
    certificates thus contains IP addresses. I think we should support that
    too, and I find it a bit confusing to only have partial support for
    subjectAltName.

    Well, that's additional logic to code. I'm not sure it's worth it,
    especially given that the function is called match_hostname in the first
    place.

    "hostname" in Python usually refers to both IP addresses and DNS
    hostnames (just like in URLs), so I think it is a fair assumption that
    IP addresses also works in this hostname function.

    Perhaps it should be noted that CertificateError only is raised by
    match_hostname so a paranoid programmer don't start catching it
    everywhere - and also that match_hostname won't raise SSLError.

    @pitrou
    Copy link
    Member

    pitrou commented Oct 6, 2010

    I don't know if there is a point or not, but some hosts are for some
    reason intended to be connected to using IP address and their
    certificates thus contains IP addresses. I think we should support that
    too, and I find it a bit confusing to only have partial support for
    subjectAltName.

    Do you have examples? Otherwise it is difficult to implement.

    @pitrou
    Copy link
    Member

    pitrou commented Oct 8, 2010

    I wanted to go forward with this and so I've committed the patch in r85321. If you're concerned about the lack of support for IP addresses, you can open a new issue (and even provide a patch!). Thank you.

    @pitrou pitrou closed this as completed Oct 8, 2010
    @kiilerix
    Copy link
    Mannequin

    kiilerix mannequin commented Oct 15, 2010

    Can you confirm that the exception raised both on "too early" and "too late" is something like "...SSL3_GET_SERVER_CERTIFICATE:certificate verify failed"?

    (If so: It would be nice if a slightly more helpful message could be given. I don't know if that is possible.)

    @pitrou
    Copy link
    Member

    pitrou commented Oct 15, 2010

    Le vendredi 15 octobre 2010 à 22:51 +0000, Mads Kiilerich a écrit :

    Mads Kiilerich <mads@kiilerich.com> added the comment:

    Can you confirm that the exception raised both on "too early" and "too
    late" is something like "...SSL3_GET_SERVER_CERTIFICATE:certificate
    verify failed"?

    Yes.

    (If so: It would be nice if a slightly more helpful message could be
    given. I don't know if that is possible.)

    Agreed. I don't know how to do that, though.

    @asdfasdfasdfasdfasdfasdfasdf
    Copy link
    Mannequin

    asdfasdfasdfasdfasdfasdfasdf mannequin commented Nov 1, 2010

    So I know the current patch doesn't support IP addresses but I thought I would link to what mozilla considered a security problem(just for future reference):

    CVE-2010-3170: http://www.mozilla.org/security/announce/2010/mfsa2010-70.html

    "Security researcher Richard Moore reported that when an SSL certificate was created with a common name containing a wildcard followed by a partial IP address a valid SSL connection could be established with a server whose IP address matched the wildcard range by browsing directly to the IP address. It is extremely unlikely that such a certificate would be issued by a Certificate Authority."

    @kiilerix
    Copy link
    Mannequin

    kiilerix mannequin commented Nov 1, 2010

    So I know the current patch doesn't support IP addresses

    Not exactly. The committed patch do not consider IP addresses -
    especially not iPAddress entries in subjectAltName. But Python only
    distinguishes resolvable names from IP addresses at a very low level. At
    the ssl module level the name and IP is considered the same, so we
    actually do support IP addresses if specified in commonName or
    subjectAltName DNS. We are thus "vulnerable" to this issue. (AFAIK AFAICS)

    (It seems like IP in commonName isn't permitted by the RFCs, but I think
    it is quite common, especially for self-signed certificates.)

    CVE-2010-3170: http://www.mozilla.org/security/announce/2010/mfsa2010-70.html

    For reference, the actual report can be found on
    http://www.securityfocus.com/archive/1/513396

    FWIW, I don't think it is critical at all. Granted, it is a deviation
    from the specification, and that is not good in a security critical
    part. But we do not claim to implement the full specification, so I
    don't think this deviation makes any difference.

    Further, this issue will only have relevance if one the trusted CAs
    create invalid certificates. But if the trusted CAs create invalid
    certificates the user has lost anyway and things can't get much worse.

    @techtonik
    Copy link
    Mannequin

    techtonik mannequin commented Nov 10, 2010

    Should we escalate this issue to CVA for Python 2.x?

    @pitrou
    Copy link
    Member

    pitrou commented Nov 11, 2010

    Should we escalate this issue to CVA for Python 2.x?

    It's more of a missing feature than a security issue in itself, although
    the missing feature has to do with security.

    @asdfasdfasdfasdfasdfasdfasdf
    Copy link
    Mannequin

    asdfasdfasdfasdfasdfasdfasdf mannequin commented Nov 11, 2010

    On 11 November 2010 23:31, Antoine Pitrou <report@bugs.python.org> wrote:

    Antoine Pitrou <pitrou@free.fr> added the comment:

    > Should we escalate this issue to CVA for Python 2.x?

    It's more of a missing feature than a security issue in itself, although
    the missing feature has to do with security.

    Still it would be nice to see in python 2.x at some point don't you think?

    @pitrou
    Copy link
    Member

    pitrou commented Nov 11, 2010

    >
    > It's more of a missing feature than a security issue in itself, although
    > the missing feature has to do with security.

    Still it would be nice to see in python 2.x at some point don't you think?

    Well, lots of things would be nice to see in python 2.x, but that's not
    how things work :)
    That said, someone else is maintaining a backport (thank him):
    http://pypi.python.org/pypi/backports.ssl_match_hostname/

    @ThomasLeonard
    Copy link
    Mannequin

    ThomasLeonard mannequin commented Feb 25, 2012

    Just to add a couple of data points to argue in favour of a secure-by-default behaviour:

    0install.net:

    http://secunia.com/advisories/47935 (spoofing attack due to certificate names not being validated)

    Mozilla is recommending people avoid using Python's built-in SSL:

    https://github.com/mozilla/browserid/wiki/Security-Considerations-when-Implementing-BrowserID

    I find it hard to believe that anyone would be able to write an SSL client in Python currently without introducing some vulnerability. There are too many traps to fall into. Here are the three I know about:

    1. Not specifying any trusted CAs means trust everyone (where for most software it would mean either trust no-one or trust only well-known CAs).

    2. Specifiying a single trusted CA means also trust all known CAs (on MacOS X at least).

    3. Unless you manually enable hostname checking, the attacker only needs a valid SSL certificate for their own site, not for the site they're spoofing.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-security A security issue
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants