OpenSSL::X509::Name.to_s generates ASCII-8BIT strings and double quotes UTF-8 #26

docwhat · 2015-09-02T03:23:16Z

Given a certificate with non-ASCII characters in its subject, I would expect calling .to_s on the subject to return a UTF-8 (or other appropriately encoded) string.

What I get instead is an ASCII-8BIT string in which every byte of the UTF-8 characters is escaped (for example, Ü comes out as \xC3\x9C).

Digging through the Ruby OpenSSL code, I see that when .to_s is called with an integer flag argument, it calls X509_NAME_print_ex().

The docs say I should use XN_FLAG_ONELINE & ~ASN1_STRFLGS_ESC_MSB to get UTF-8, for example. But I couldn't find ASN1_STRFLGS_ESC_MSB in the Ruby OpenSSL library.

So I'd say there are two bugs:

1. .to_s with no arguments should return a properly encoded string without escaped UTF-8 characters.
2. The ASN1_STRFLGS constants are missing.

Here is some Ruby code that shows the problem:

# coding: utf-8

require 'openssl'

### Note 1 ###
# I had to look this up in the OpenSSL include files.
# I couldn't find it in ruby anyplace.
ASN1_STRFLGS_ESC_MSB = 4

pem = <<-CERT
-----BEGIN CERTIFICATE-----
MIIEPTCCAyWgAwIBAgIBATANBgkqhkiG9w0BAQUFADCBvzE/MD0GA1UEAww2VMOc
UktUUlVTVCBFbGVrdHJvbmlrIFNlcnRpZmlrYSBIaXptZXQgU2HEn2xhecSxY8Sx
c8SxMQswCQYDVQQGEwJUUjEPMA0GA1UEBwwGQW5rYXJhMV4wXAYDVQQKDFVUw5xS
S1RSVVNUIEJpbGdpIMSwbGV0acWfaW0gdmUgQmlsacWfaW0gR8O8dmVubGnEn2kg
SGl6bWV0bGVyaSBBLsWeLiAoYykgQXJhbMSxayAyMDA3MB4XDTA3MTIyNTE4Mzcx
OVoXDTE3MTIyMjE4MzcxOVowgb8xPzA9BgNVBAMMNlTDnFJLVFJVU1QgRWxla3Ry
b25payBTZXJ0aWZpa2EgSGl6bWV0IFNhxJ9sYXnEsWPEsXPEsTELMAkGA1UEBhMC
VFIxDzANBgNVBAcMBkFua2FyYTFeMFwGA1UECgxVVMOcUktUUlVTVCBCaWxnaSDE
sGxldGnFn2ltIHZlIEJpbGnFn2ltIEfDvHZlbmxpxJ9pIEhpem1ldGxlcmkgQS7F
ni4gKGMpIEFyYWzEsWsgMjAwNzCCASIwDQYJKoZIhvcNAQEBBQADggEPADCCAQoC
ggEBAKu3PgqMyKVYFeaK7yc9SrToJdPNM8Ig3BnuiD9NYvDdE3ePYakqtdTyuTFY
KTsvP2qcb3N2Je40IIDu6rfwxArNK4aUyeNgsURSsloptJGXg9i3phQvKUmi8wUG
+7RP2qFsmmaf8EMJyupyj+sA1zU511YXRxcw9L6/P8JorzZAwan0qafoEGsIiveG
HtyaKhUG9qPw9ODHFNRRf8+0222vR5YXm3dx2KdxnSQM9pQ/hTEST7ruToK4uT6P
IzdezKKqdfcYbwnTrqdUKDT74eA7YH2gvnmJhsifLfkKS8RQouf9eRbHegsYz85M
733WB2+Y8a+xwXrXgTW4qhe04MsCAwEAAaNCMEAwHQYDVR0OBBYEFCnFkKslrxHk
Yb+j/4hhkeYO/pyBMA4GA1UdDwEB/wQEAwIBBjAPBgNVHRMBAf8EBTADAQH/MA0G
CSqGSIb3DQEBBQUAA4IBAQAQDdr4Ouwo0RSVgrESLFF6QSU2TJ/sPx+EnWVUXKgW
AkD6bho3hO9ynYYKVZ1WKKxmLNA6VpM0ByWtCLCPyA8JWcqdmBzlVPi5RX9ql2+I
aE1KBiY3iAIOtsbWcpnOa3faYjGkVh+uX4132l32iPwa2Z61gfAyuOOI0JzzaqC5
mxRZNTZPz/OOXl0XrRWV2N2y1RVuAE6zS89mlOTgzbUF2mNXi+WzqtvALhyQRNsa
XRik7r4EW5nVcV9VZWRi1aKbBFmGyGJ353yCRWo9F7/snXUMrqNvWtMvmDb08PUZ
qxFdyKbjKlhqQgnDvZImZjINXQhVdP+MmNAKpoRq0Tl9
-----END CERTIFICATE-----
CERT


cert = OpenSSL::X509::Certificate.new pem

cert.subject # => #<OpenSSL::X509::Name:0x007f85ad0019c0>

puts
puts 'The "I expected this to work" approach'
i_expected_this_to_work = cert.subject.to_s
puts i_expected_this_to_work
puts i_expected_this_to_work.encoding

# Following instructions from https://wiki.openssl.org/index.php/Manual:X509_NAME_print_ex(3)
# to get UTF-8.
puts
puts 'The "Using magic flags" approach'
oh_so_close = cert.subject.to_s(OpenSSL::X509::Name::ONELINE & ~ASN1_STRFLGS_ESC_MSB)
puts oh_so_close
puts oh_so_close.encoding

puts
puts 'The "Fix the magic flags" approach'
corrected_approach = oh_so_close.force_encoding(Encoding::UTF_8)
puts corrected_approach
puts corrected_approach.encoding

# >> 
# >> The "I expected this to work" approach
# >> /CN=T\xC3\x9CRKTRUST Elektronik Sertifika Hizmet Sa\xC4\x9Flay\xC4\xB1c\xC4\xB1s\xC4\xB1/C=TR/L=Ankara/O=T\xC3\x9CRKTRUST Bilgi \xC4\xB0leti\xC5\x9Fim ve Bili\xC5\x9Fim G\xC3\xBCvenli\xC4\x9Fi Hizmetleri A.\xC5\x9E. (c) Aral\xC4\xB1k 2007
# >> ASCII-8BIT
# >> 
# >> The "Using magic flags" approach
# >> CN = TÜRKTRUST Elektronik Sertifika Hizmet Sağlayıcısı, C = TR, L = Ankara, O = TÜRKTRUST Bilgi İletişim ve Bilişim Güvenliği Hizmetleri A.Ş. (c) Aralık 2007
# >> ASCII-8BIT
# >> 
# >> The "Fix the magic flags" approach
# >> CN = TÜRKTRUST Elektronik Sertifika Hizmet Sağlayıcısı, C = TR, L = Ankara, O = TÜRKTRUST Bilgi İletişim ve Bilişim Güvenliği Hizmetleri A.Ş. (c) Aralık 2007
# >> UTF-8


tarcieri · 2015-09-02T23:37:23Z

I would recommend against all of the suggestions in this proposal. Or rather, I think it's okay to extend the current API with new methods that enable this functionality, but the proposed approach will break many things.

The subject (i.e. the subject CN) and associated SANs are represented in certificates as IDNA names encoded in punycode. For certificate verification, verification APIs and anything that interacts with them also need to work with ASCII-8BIT/BINARY-clean punycode. I say this as the person who fixed RFC 6125 IDNA compliance for this library [1].

An API that decodes punycode and provides a UTF-8 representation would be nice, but I would argue it MUST be implemented as an extension to the current API; otherwise it will break everything that already works with IDNA names and expects them to be encoded in punycode. I think I can safely say this extends to anything that interoperates with DNS, a.k.a. practically everything that supports IDNA in Ruby today.

docwhat · 2015-09-03T00:12:18Z

It's a shame .to_s is already taken, because it's the method that idiots (read: me) expect to "just work" with non-ASCII characters.

I assume that moving the current behavior to a new method, e.g. .print_name, would break things even if the major version number is bumped?

tarcieri · 2015-09-03T00:13:40Z

I'd suggest something like #to_utf8

docwhat · 2015-09-03T00:17:42Z

Sounds acceptable. Going with the idiot use case, I'd have figured that out as soon as I saw the escaping and looked around.

Would it make sense to have a dedicated .to_something for the use case you described above, in case anyone else like me comes along and you aren't around to stop them?

adamel · 2017-02-09T14:49:21Z

@tarcieri, there is no IDNA or punycode involved here, and no one is asking for anything related to that. What we have above is a subject containing non-ASCII characters in UTF8STRING objects. See this extract from the ASN.1 dump:

   34:d=3  hl=2 l=  63 cons:    SET               
   36:d=4  hl=2 l=  61 cons:     SEQUENCE          
   38:d=5  hl=2 l=   3 prim:      OBJECT            :commonName
   43:d=5  hl=2 l=  54 prim:      UTF8STRING        :TÜRKTRUST Elektronik Sertifika Hizmet Sağlayıcısı
   99:d=3  hl=2 l=  11 cons:    SET               
  101:d=4  hl=2 l=   9 cons:     SEQUENCE          
  103:d=5  hl=2 l=   3 prim:      OBJECT            :countryName
  108:d=5  hl=2 l=   2 prim:      PRINTABLESTRING   :TR
  112:d=3  hl=2 l=  15 cons:    SET               
  114:d=4  hl=2 l=  13 cons:     SEQUENCE          
  116:d=5  hl=2 l=   3 prim:      OBJECT            :localityName
  121:d=5  hl=2 l=   6 prim:      UTF8STRING        :Ankara
  129:d=3  hl=2 l=  94 cons:    SET               
  131:d=4  hl=2 l=  92 cons:     SEQUENCE          
  133:d=5  hl=2 l=   3 prim:      OBJECT            :organizationName
  138:d=5  hl=2 l=  85 prim:      UTF8STRING        :TÜRKTRUST Bilgi İletişim ve Bilişim Güvenliği Hizmetleri A.Ş. (c) Aralık 2007

When a field contains non-ASCII characters, Ruby *should* return it as correct UTF-8.

As for IDNA/punycode-encoded strings, those would be plain ASCII in the certificate, and I fully agree that the openssl library should absolutely not try to decode them in any way.
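The per-field view described above can be sketched with Name#to_a, which returns [short_name, value, asn1_type] triples, one per SET/SEQUENCE in the dump, so the ASN.1 type is available to decide which values hold UTF-8. A minimal sketch, assuming an in-memory Name with illustrative values rather than the TÜRKTRUST certificate. It only re-tags UTF8STRING values; other types such as BMPSTRING use different byte encodings and would need a real conversion, which this sketch does not attempt.

```ruby
require 'openssl'

name = OpenSSL::X509::Name.new([
  ['CN', 'TÜRKTRUST', OpenSSL::ASN1::UTF8STRING],
  ['C',  'TR',        OpenSSL::ASN1::PRINTABLESTRING]
])

# Name#to_a yields [short_name, value, asn1_type] for each entry, in order.
entries = name.to_a.map do |short_name, value, type|
  # UTF8String values are already UTF-8 bytes; only the String's encoding tag is fixed here.
  value = value.dup.force_encoding(Encoding::UTF_8) if type == OpenSSL::ASN1::UTF8STRING
  [short_name, value, type]
end

entries.each { |short_name, value, _type| puts "#{short_name}: #{value}" }
```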

tarcieri · 2017-02-09T17:57:24Z

@adamel oh sorry, I misunderstood... I was talking about SAN values.

Interpreting the subject name and additional fields as UTF-8 is correct per RFC 4630:

https://tools.ietf.org/html/rfc4630#section-3


rhenium · 2017-09-03T08:57:38Z

#143 adds OpenSSL::X509::Name#to_utf8.

The existing #to_s does not interact well with distinguished names containing multi-byte UTF-8 characters since the OpenSSL function X509_NAME_print_ex() escapes bytes with MSB set by default. Unfortunately we can't fix it without breaking backwards compatibility. It takes options as a bit field that is directly passed to X509_NAME_print_ex(). Let's add a new method instead. Fixes: ruby#26
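The split between the two methods can be seen on a two-entry name. A minimal sketch, assuming an in-memory Name with illustrative values: the integer passed to #to_s is the XN_FLAG_* bit field that reaches X509_NAME_print_ex() unchanged, so the RFC2253 preset, which includes ASN1_STRFLGS_ESC_MSB, still escapes the UTF-8 bytes. #to_utf8 should give the same RFC 2253 layout without that escaping; it is guarded with respond_to? because it only exists in versions that include #143.

```ruby
require 'openssl'

name = OpenSSL::X509::Name.new([
  ['C',  'TR',        OpenSSL::ASN1::PRINTABLESTRING],
  ['CN', 'TÜRKTRUST', OpenSSL::ASN1::UTF8STRING]
])

# The integer goes straight to X509_NAME_print_ex(); the RFC2253 preset
# includes ASN1_STRFLGS_ESC_MSB, so the UTF-8 bytes come out escaped.
rfc2253 = name.to_s(OpenSSL::X509::Name::RFC2253)
puts rfc2253

# Name#to_utf8 only exists once #143 is part of the openssl extension.
if name.respond_to?(:to_utf8)
  utf8 = name.to_utf8
  puts utf8          # RFC 2253 layout: last RDN first, comma-separated
  puts utf8.encoding # already tagged as UTF-8
end
```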

adamel · 2017-09-04T18:13:53Z

Nice. For those stuck with older versions, the following can be used as a workaround:

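require 'openssl'

# Value of ASN1_STRFLGS_ESC_MSB from OpenSSL's headers.
ASN1_STRFLGS_ESC_MSB = 4

def name2utf8(name)
  # Clear the ESC_MSB bit so UTF-8 bytes are printed as-is, then tag the
  # result as UTF-8 (to_s returns an ASCII-8BIT string).
  name.to_s(OpenSSL::X509::Name::ONELINE & ~ASN1_STRFLGS_ESC_MSB).force_encoding(Encoding::UTF_8)
rescue StandardError
  # Fall back to the default (escaped) representation.
  name.to_s
end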
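For code that has to run against both older and newer versions of the openssl extension, one option is to prefer #to_utf8 and fall back to the flag trick only when it is missing. A minimal sketch; name_to_utf8 and ESC_MSB are hypothetical names (4 is ASN1_STRFLGS_ESC_MSB from OpenSSL's headers), and the fallback uses the RFC2253 preset on the assumption that #to_utf8 is built on that layout, so both branches should produce the same string.

```ruby
require 'openssl'

# Hypothetical constant: ASN1_STRFLGS_ESC_MSB from OpenSSL's asn1.h.
ESC_MSB = 4

# Hypothetical helper: UTF-8 distinguished name on old and new openssl extensions.
def name_to_utf8(name)
  return name.to_utf8 if name.respond_to?(:to_utf8)

  # Older extensions: clear the MSB-escaping bit, then tag the bytes as UTF-8.
  name.to_s(OpenSSL::X509::Name::RFC2253 & ~ESC_MSB).force_encoding(Encoding::UTF_8)
end

name = OpenSSL::X509::Name.new([['CN', 'TÜRKTRUST', OpenSSL::ASN1::UTF8STRING]])
puts name_to_utf8(name)
```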

rhenium mentioned this issue Sep 3, 2017

Add X509::Name#to_utf8 and #inspect #143

Merged

rhenium closed this as completed in #143 Sep 3, 2017

DrusTheAxe mentioned this issue Jan 1, 2021

Installation fails if certificate contains extended ascii or non alpha-numeric character microsoft/msix-packaging#401

Merged
