Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Malformed SANs, or SANs of type other than DNS or IP Address, may raise "UnicodeError: label too long in IDNA decode" in certs.dummy_cert() #6536

Closed
justinsteven opened this issue Dec 7, 2023 · 3 comments · Fixed by #6537

Comments

@justinsteven
Copy link

#6382 changed certs.dummy_cert()'s logic when generating a cert for a given list of string-type sans

Old behaviour:

  1. Try to ipaddress.ip_address() the san
  2. If that raises ValueError, leave the san alone

New behaviour:

  1. Try to ipaddress.ip_address() the san
  2. If that raises ValueError, do san.encode("idna").decode()

The problem is that a subjectAltName can be something other than an IP address or DNS hostname. For example, it can be a URL. This raises the possibility of a UnicodeError if the SAN is illegal per the maximum length of a DNS label (63)

This is fine:

>>> ("A"*63).encode("idna")
b'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'

This becomes a problem:

>>> ("A"*64).encode("idna")
Traceback (most recent call last):
  File "/usr/lib/python3.9/encodings/idna.py", line 167, in encode
    raise UnicodeError("label too long")
UnicodeError: label too long

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeError: encoding with 'idna' codec failed (UnicodeError: label too long)

This is fine (introducing dots separates the long name into labels):

>>> (".".join("A"*63 for _ in range(3))).encode("idna")
b'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA.AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA.AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'

Surprisingly, while longer than the maximum length of a DNS name (256), this is also fine:

>>> (".".join("A"*63 for _ in range(128))).encode("idna")
b'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA.[... SNIP ...]

This is fine:

>>> ("https://host.example/" + "a" * (63 - len("example/"))).encode("idna")
b'https://host.example/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'

This is a problem:

>>> ("https://host.example/" + "a" * (64 - len("example/"))).encode("idna")
Traceback (most recent call last):
  File "/usr/lib/python3.9/encodings/idna.py", line 167, in encode
    raise UnicodeError("label too long")
UnicodeError: label too long

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeError: encoding with 'idna' codec failed (UnicodeError: label too long)

In general, it may be problematic to just YOLO IDNA-encode the entire thing if we're not sure that it's a DNS hostname. Take for example https://host.example/bücher

>>> "https://host.example/bücher".encode("idna").decode()
'https://host.xn--example/bcher-4ob'

I don't know if Unicode characters in the path are legal, but this looks like a pretty messed up SAN to me.

Regardless, as shown above, certificate generation can at least be made to fail.

We worked around this in our lab by patching the behaviour to be:

  1. Try to ipaddress.ip_address() the san
  2. If that raises ValueError, do san.encode("idna").decode()
  3. If that raises UnicodeError, leave the san alone

This fixed the cert generation for a particular edge-case SAN we were seeing. However, this might not be the most resilient fix. The correct fix might be:

  1. Pass the types of the SANs through to dummy_cert()
  2. If a SAN is of type IP address, ipaddress.ip_address() it (and if that fails leave it alone?)
  3. Else if it's of type DNS, do san.encode("idna").decode() (and if that fails leave it alone?)
  4. Else if it's of type URL, leave it alone (?)
  5. Else handle other SAN types as needed?
@mhils
Copy link
Member

mhils commented Dec 7, 2023

Thanks for the detailed write-up! I agree with everything here, we should just handle SANs properly and represent them as cryptography.x509.GeneralNames and not a list of strings. I've just posted #6537, which should fix this. This is hard-to-get-right, so a review would be super welcome! 😃

@mhils mhils added kind/bug area/core and removed kind/triage Unclassified issues labels Dec 7, 2023
@justinsteven
Copy link
Author

justinsteven commented Dec 7, 2023

Aha, that changelog entry makes complete sense 😁

I'm not as familiar with the wider codebase as I'd like to be, so I'm not sure I'd make a great reviewer. If I can I'll take a look.

Thanks for jumping on this so quickly!

@mhils
Copy link
Member

mhils commented Dec 12, 2023

I'm not as familiar with the wider codebase as I'd like to be, so I'm not sure I'd make a great reviewer. If I can I'll take a look.

Thanks! If you can verify that this fixes the issues you were seeing that'd be splendid, but absolutely no worries if you are busy otherwise. :)

mhils added a commit that referenced this issue Dec 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants