-
-
Notifications
You must be signed in to change notification settings - Fork 29.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SSL match_hostname fails for internationalized domain names #72600
Comments
In accordance with http://tools.ietf.org/html/rfc6125#section-6.4.2: I found that error appears after using ssl.SSLContext.wrap_bio, which in turn uses internal newPySSLSocket, which in turn always decode server_hostname through:
PySSLSocket *self;
...
PyObject *hostname = PyUnicode_Decode(server_hostname, strlen(server_hostname), "idna", "strict");
...
self->server_hostname = hostname;
In this way, SSLSocket always contains U-label in its server_hostname field, and ssl._dnsname_match falis with "ssl.CertificateError: hostname ... doesn't match either of ..." And i don't understand where is a bug, or is it a bug. |
Thanks for bringing this to my attention. I can confirm that the code is broken. Further more there are no tests for IDN for server_hostname.
The bug is clearly in match_hostname(). The function fails to convert the hostname U-label to A-label before it compares the certificate. I have a rough draft of a patch here https://github.com/tiran/cpython/tree/issue28414_idna_verify By the way IDNA support in Python is broken in general, bpo-17305. We still don't support the latest IDNA standard from 2008 (!). IDNA 2003 is not compatible with German, Greek, Farsi and Sinhalese domains, http://unicode.org/faq/idn.html. |
Yes, I misspelled, match_hostname() fails with ssl.CertificateError. |
Christian, thanks a lot for your comment and for patch you provide. It becomes much clearer. |
Christian, what's the status on this one? |
It's a big, complicated mess. I can't implement IDN support correctly because Python lacks UTS#46 and IDNA2008 support. I just found out that IDNA 2008 is not enough because it does not provide a case mapping. Lack of case mapping broke my fix for curl CVE-2016-8625. At the moment IDN support is broken in a sane way: it just doesn't work and fails. A partial fix will introduce security issues. http://unicode.org/reports/tr46/#Processing lists "www.sparkasse-gießen.de" as a critical example. It's the domain of a German savings and loan bank. |
Hello Christian. Is there any update about this issue ? Thank you. |
If the SSL module followed the pattern of encoding all str to bytes at the edges while leaving bytes alone, and used exclusively bytes internally (and in this case by "bytes" I mean "bytes objects containing A-labels"), then it would at least fix this bug and also make it possible for library authors to implement their own IDNA handling. Right now if you pass in a pre-encoded byte-string, exactly what ssl.py needs to compare to the certificate, then ssl.py will convert it *back* to text :-(. |
I have an idea for a different approach that can be applied to both ssl and socket module. Stay tuned to this station for a PEP broadcast! |
I endorse njs' recommended fix here. Don't try to get clever, this is a security component, it should be the dumbest it can be possibly be while being correct, because if it's smarter it will probably be wrong. |
Did I miss Christian's "PEP Broadcast"? |
This came up on m.d.s.p. today: https://groups.google.com/d/msg/mozilla.dev.security.policy/K3sk5ZMv2DE/fx6c3WWFBgAJ I haven't dug in deeply, but it sounds like we handle IDNs in CNs and SANs differently? I think we should look for a way to solve that specific problem, without biting off the whole thing -- one solution would be to simply drop support for CNs in match_hostname, as both Chrome and Firefox have already done :-) |
No -- Python's ssl module uses exactly the same hostname checking logic in both cases, and it's equally broken regardless. But, since CAs do all kinds of weird stuff with CNs, there's some chance that our brokenness and their brokenness will align and make things work by chance. Specifically, this will happen if the CA puts the U-encoded name in the CN field. Nick Lamb's concern is that CAs may be using this as justification for continuing to issue certs that are broken in this way. I don't know if that's true, but it's possible.
That would indeed fix Nick Lamb's concern, but I'm dubious about this word "simply" :-). Obviously we should do this eventually, but it's going to break a bunch of people, you'll have to have a big fight about Python 2 and Redhat will probably refuse to take the patch and etc etc. OTOH fixing match_hostname to use A-labels would provide immediate benefits to Python's users (right now Python just... can't do SSL connections to IDNs) with minimal breakage, so you can call it a bug fix, and then worry about deprecating the CN field on its own schedule. |
For the record, I'm now considering match_hostname() on U-Labels crazy level 'A sure sign of someone who wears his underpants on his head.'. Once upon a time I had some hope to make it work and keep server_hostname to be an IDN U-Label. I no longer think it feasible and safe at the same time. Pros:
Cons:
Self-quote from pyca/cryptography#3357 (comment) --- User supplied input (hostname for TCP connection, server hostname) can be specified as either IDN U-label (str), IDN A-label (aka ACE, str) or ACE bytes. Internally the socket module and SSL module use ACE bytes only. Text (str) are converted to ACE bytes using IDNA. Since ACE str are just ASCII, IDNA encoding of ACE str is equivalent to encoding with ASCII encoding. All output (SAN dNSName, SAN URI, CN, SNI callback, server_hostname attribute) are decoded as ACE strings. Since IDN is not a bijective mapping and also depends on the IDNA standard (2003, 2008, UTS46), this avoids some potential security issues. X.509 hostname verification and matching is defined on ACE, not IDN U-labels. I would rather keep them as bytes, but it wouldn't be backwards compatible. Also the aligns the SSL module with the socket module. socket.getnameinfo() decodes with ASCII, not with IDNA. The new approach will make the SSL module compatible with the external idna package and IDNA 2008. Users just have to pass in ACE bytes as server_hostname. |
As much for myself when I next run into this on my checklist as for any other readers: Despite the appearance of nothing happening PR 3010 (linked) actually has a little bit of momentum and seems likely to eventually land in Python. |
At Christian's request and considering the importance of the ssl module, I'm going to allow an extension for landing of this feature until 3.7.0b2, currently scheduled for 2018-02-26. If anyone else can help Christian get this in before b2, that would be great. I'm removing older versions for now. We can discuss potential backports after the feature lands. |
In PR #5395 I added a test to verify that most IDNA domains are now working. IDNA 2008 deviations and the fundamental issue of IDNA server callback and IDNA encoded server_hostname attribute are still open. I'll address them in another PR. |
Christian: we're less than a week out from b2. Do you need any help here? |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: