OpenSSL::X509::Name.to_s generates ASCII-8BIT strings and double quotes UTF-8 #26

Closed
docwhat opened this issue Sep 2, 2015 · 8 comments

Comments

@docwhat

docwhat commented Sep 2, 2015

Given a certificate with non-ASCII characters in subject I would expect the subject when converted to a string via .to_s would return a UTF-8 (or other appropriately encoded) string.

What I get instead is an ASCII-8BIT encoded string in which the UTF-8 bytes are escaped (e.g. \xC3\x9C for Ü).

Digging through the ruby openssl code, I see that if .to_s is called with an integer flag, it actually calls X509_NAME_print_ex().

The docs say I should use XN_FLAG_ONELINE & ~ASN1_STRFLGS_ESC_MSB to get UTF-8 (as an example). But I couldn't find ASN1_STRFLGS_ESC_MSB in the ruby OpenSSL library.

So I'd say there are two bugs:

  1. .to_s with no arguments should return a properly encoded string without escaped UTF-8 characters.
  2. The ASN1_STRFLGS constants are missing.

Here is some ruby code to help show the problem:

# coding: utf-8

require 'openssl'

### Note 1 ###
# I had to look this up in the OpenSSL include files.
# I couldn't find it in ruby anyplace.
ASN1_STRFLGS_ESC_MSB = 4

pem = <<-CERT
-----BEGIN CERTIFICATE-----
MIIEPTCCAyWgAwIBAgIBATANBgkqhkiG9w0BAQUFADCBvzE/MD0GA1UEAww2VMOc
UktUUlVTVCBFbGVrdHJvbmlrIFNlcnRpZmlrYSBIaXptZXQgU2HEn2xhecSxY8Sx
c8SxMQswCQYDVQQGEwJUUjEPMA0GA1UEBwwGQW5rYXJhMV4wXAYDVQQKDFVUw5xS
S1RSVVNUIEJpbGdpIMSwbGV0acWfaW0gdmUgQmlsacWfaW0gR8O8dmVubGnEn2kg
SGl6bWV0bGVyaSBBLsWeLiAoYykgQXJhbMSxayAyMDA3MB4XDTA3MTIyNTE4Mzcx
OVoXDTE3MTIyMjE4MzcxOVowgb8xPzA9BgNVBAMMNlTDnFJLVFJVU1QgRWxla3Ry
b25payBTZXJ0aWZpa2EgSGl6bWV0IFNhxJ9sYXnEsWPEsXPEsTELMAkGA1UEBhMC
VFIxDzANBgNVBAcMBkFua2FyYTFeMFwGA1UECgxVVMOcUktUUlVTVCBCaWxnaSDE
sGxldGnFn2ltIHZlIEJpbGnFn2ltIEfDvHZlbmxpxJ9pIEhpem1ldGxlcmkgQS7F
ni4gKGMpIEFyYWzEsWsgMjAwNzCCASIwDQYJKoZIhvcNAQEBBQADggEPADCCAQoC
ggEBAKu3PgqMyKVYFeaK7yc9SrToJdPNM8Ig3BnuiD9NYvDdE3ePYakqtdTyuTFY
KTsvP2qcb3N2Je40IIDu6rfwxArNK4aUyeNgsURSsloptJGXg9i3phQvKUmi8wUG
+7RP2qFsmmaf8EMJyupyj+sA1zU511YXRxcw9L6/P8JorzZAwan0qafoEGsIiveG
HtyaKhUG9qPw9ODHFNRRf8+0222vR5YXm3dx2KdxnSQM9pQ/hTEST7ruToK4uT6P
IzdezKKqdfcYbwnTrqdUKDT74eA7YH2gvnmJhsifLfkKS8RQouf9eRbHegsYz85M
733WB2+Y8a+xwXrXgTW4qhe04MsCAwEAAaNCMEAwHQYDVR0OBBYEFCnFkKslrxHk
Yb+j/4hhkeYO/pyBMA4GA1UdDwEB/wQEAwIBBjAPBgNVHRMBAf8EBTADAQH/MA0G
CSqGSIb3DQEBBQUAA4IBAQAQDdr4Ouwo0RSVgrESLFF6QSU2TJ/sPx+EnWVUXKgW
AkD6bho3hO9ynYYKVZ1WKKxmLNA6VpM0ByWtCLCPyA8JWcqdmBzlVPi5RX9ql2+I
aE1KBiY3iAIOtsbWcpnOa3faYjGkVh+uX4132l32iPwa2Z61gfAyuOOI0JzzaqC5
mxRZNTZPz/OOXl0XrRWV2N2y1RVuAE6zS89mlOTgzbUF2mNXi+WzqtvALhyQRNsa
XRik7r4EW5nVcV9VZWRi1aKbBFmGyGJ353yCRWo9F7/snXUMrqNvWtMvmDb08PUZ
qxFdyKbjKlhqQgnDvZImZjINXQhVdP+MmNAKpoRq0Tl9
-----END CERTIFICATE-----
CERT


cert = OpenSSL::X509::Certificate.new pem

cert.subject # => #<OpenSSL::X509::Name:0x007f85ad0019c0>

puts
puts 'The "I expected this to work" approach'
i_expected_this_to_work = cert.subject.to_s
puts i_expected_this_to_work
puts i_expected_this_to_work.encoding

# Following instructions from https://wiki.openssl.org/index.php/Manual:X509_NAME_print_ex(3)
# to get UTF-8.
puts
puts 'The "Using magic flags" approach'
oh_so_close = cert.subject.to_s(OpenSSL::X509::Name::ONELINE & ~ASN1_STRFLGS_ESC_MSB)
puts oh_so_close
puts oh_so_close.encoding

puts
puts 'The "Fix the magic flags" approach'
corrected_approach = oh_so_close.force_encoding(Encoding::UTF_8)
puts corrected_approach
puts corrected_approach.encoding

# >> 
# >> The "I expected this to work" approach
# >> /CN=T\xC3\x9CRKTRUST Elektronik Sertifika Hizmet Sa\xC4\x9Flay\xC4\xB1c\xC4\xB1s\xC4\xB1/C=TR/L=Ankara/O=T\xC3\x9CRKTRUST Bilgi \xC4\xB0leti\xC5\x9Fim ve Bili\xC5\x9Fim G\xC3\xBCvenli\xC4\x9Fi Hizmetleri A.\xC5\x9E. (c) Aral\xC4\xB1k 2007
# >> ASCII-8BIT
# >> 
# >> The "Using magic flags" approach
# >> CN = TÜRKTRUST Elektronik Sertifika Hizmet Sağlayıcısı, C = TR, L = Ankara, O = TÜRKTRUST Bilgi İletişim ve Bilişim Güvenliği Hizmetleri A.Ş. (c) Aralık 2007
# >> ASCII-8BIT
# >> 
# >> The "Fix the magic flags" approach
# >> CN = TÜRKTRUST Elektronik Sertifika Hizmet Sağlayıcısı, C = TR, L = Ankara, O = TÜRKTRUST Bilgi İletişim ve Bilişim Güvenliği Hizmetleri A.Ş. (c) Aralık 2007
# >> UTF-8
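
For readers who want to try this without the full certificate, here is a minimal sketch of the same flag trick. The `ASN1StrFlags` module is a hypothetical stand-in for the missing constants (it is not part of ruby's openssl library); the values are copied from OpenSSL's `asn1.h`. The name is built by hand with `add_entry` rather than parsed from the PEM above.

```ruby
# coding: utf-8
require 'openssl'

# Hypothetical helper module, NOT provided by ruby's openssl library.
# Values are taken from OpenSSL's <openssl/asn1.h>.
module ASN1StrFlags
  ESC_2253     = 0x01
  ESC_CTRL     = 0x02
  ESC_MSB      = 0x04 # escape bytes with the most significant bit set
  ESC_QUOTE    = 0x08
  UTF8_CONVERT = 0x10
end

# Build a small name that mirrors two fields of the certificate subject.
name = OpenSSL::X509::Name.new
name.add_entry('CN', 'TÜRKTRUST Elektronik Sertifika Hizmet Sağlayıcısı',
               OpenSSL::ASN1::UTF8STRING)
name.add_entry('C', 'TR', OpenSSL::ASN1::PRINTABLESTRING)

# Clear ESC_MSB so multi-byte characters are not escaped, then tag the
# returned bytes as UTF-8.
flags = OpenSSL::X509::Name::ONELINE & ~ASN1StrFlags::ESC_MSB
readable = name.to_s(flags).force_encoding(Encoding::UTF_8)

puts readable
puts readable.encoding
```

This only relabels the bytes; it does not transcode them, so it relies on OpenSSL having emitted UTF-8 in the first place.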
@tarcieri
Collaborator

tarcieri commented Sep 2, 2015

I would recommend against all of the suggestions in this proposal. Or rather, I think it's okay to extend the current API with new methods that enable this functionality, but the proposed approach will break many things.

The subject (i.e. subject CN) and associated SANs are represented in certificates as IDNA names encoded in punycode. For purposes of certificate verification, verification APIs and things that interact with them need to also work in ASCII-8BIT/BINARY-clean punycode. I say this as the person who fixed RFC-6125 IDNA compliance for this library 1.

An API which decodes punycode and provides a UTF-8 representation would be nice, but I would argue it MUST be implemented as an extension to the current API or it will break everything which is already capable of working with IDNA names which expects they're encoded in punycode. I think I can safely say this extends to anything which interoperates with DNS, a.k.a. practically everything that supports IDNA in Ruby today.

@docwhat
Author

docwhat commented Sep 3, 2015

It's a shame .to_s is already in use, because idiots (read: me) expect it to "just work" with non-ASCII characters.

I assume that swapping the current behavior to a new method e.g. .print_name would break things even if the major number is bumped?

@tarcieri
Collaborator

tarcieri commented Sep 3, 2015

I'd suggest something like #to_utf8

@docwhat
Author

docwhat commented Sep 3, 2015

Sounds acceptable. Following with the idiot use case, I'd have figured that out as soon as I saw the escaping and looked around.

Would it make sense to have a dedicated .to_something for the use case you had above, in case anyone else like me comes along again and you aren't around to stop them?

@adamel

adamel commented Feb 9, 2017

@tarcieri, there is no IDNA or punycode involved here, and no one is asking for anything related to that. What we have above is a subject field containing non-ASCII characters in UTF8STRING objects. See ASN.1 dump extract:

   34:d=3  hl=2 l=  63 cons:    SET               
   36:d=4  hl=2 l=  61 cons:     SEQUENCE          
   38:d=5  hl=2 l=   3 prim:      OBJECT            :commonName
   43:d=5  hl=2 l=  54 prim:      UTF8STRING        :TÜRKTRUST Elektronik Sertifika Hizmet Sağlayıcısı
   99:d=3  hl=2 l=  11 cons:    SET               
  101:d=4  hl=2 l=   9 cons:     SEQUENCE          
  103:d=5  hl=2 l=   3 prim:      OBJECT            :countryName
  108:d=5  hl=2 l=   2 prim:      PRINTABLESTRING   :TR
  112:d=3  hl=2 l=  15 cons:    SET               
  114:d=4  hl=2 l=  13 cons:     SEQUENCE          
  116:d=5  hl=2 l=   3 prim:      OBJECT            :localityName
  121:d=5  hl=2 l=   6 prim:      UTF8STRING        :Ankara
  129:d=3  hl=2 l=  94 cons:    SET               
  131:d=4  hl=2 l=  92 cons:     SEQUENCE          
  133:d=5  hl=2 l=   3 prim:      OBJECT            :organizationName
  138:d=5  hl=2 l=  85 prim:      UTF8STRING        :TÜRKTRUST Bilgi İletişim ve Bilişim Güvenliği Hizmetleri A.Ş. (c) Aralık 2007

When a field contains non-ASCII characters, Ruby *should* return it as correct UTF-8.

As for IDNA/punycode encoded strings, those would be plain ASCII in the certificate, and I fully agree that the openssl lib should absolutely not try to decode those in any way.
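
The same per-field type information can be checked from Ruby. The following is a rough sketch, assuming a hand-built name in place of the certificate subject (with the certificate from the issue you would call `cert.subject.to_a` instead). `Name#to_a` returns one `[field, value, type]` triple per entry, and the type can be compared against the `OpenSSL::ASN1` tag constants.

```ruby
# coding: utf-8
require 'openssl'

# Stand-in for cert.subject, built by hand for illustration.
name = OpenSSL::X509::Name.new
name.add_entry('CN', 'TÜRKTRUST Elektronik Sertifika Hizmet Sağlayıcısı',
               OpenSSL::ASN1::UTF8STRING)
name.add_entry('C', 'TR', OpenSSL::ASN1::PRINTABLESTRING)

# Keep only the entries stored as UTF8STRING and tag their bytes as UTF-8.
utf8_entries = name.to_a.select do |_field, _value, type|
  type == OpenSSL::ASN1::UTF8STRING
end
utf8_fields = utf8_entries.map do |field, value, _type|
  [field, value.dup.force_encoding(Encoding::UTF_8)]
end

utf8_fields.each { |field, value| puts "#{field}: #{value}" }
```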

@tarcieri
Collaborator

tarcieri commented Feb 9, 2017

@adamel oh sorry, I misunderstood... I was talking about SAN values.

Interpreting the subject name and additional fields as UTF-8 is correct per RFC 4630:

https://tools.ietf.org/html/rfc4630#section-3

rhenium added a commit to rhenium/ruby-openssl that referenced this issue Sep 3, 2017
The existing #to_s does not interact well with distinguished names
containing multi-byte UTF-8 characters since the OpenSSL function
X509_NAME_print_ex() escapes bytes with MSB set by default.

Unfortunately we can't fix it without breaking backwards compatibility.
It takes options as a bit field that is directly passed to
X509_NAME_print_ex(). Let's add a new method instead.

Fixes: ruby#26
@rhenium
Member

rhenium commented Sep 3, 2017

#143 adds OpenSSL::X509::Name#to_utf8.
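
A minimal usage sketch, assuming an openssl library version that already ships `#to_utf8` (the `respond_to?` guard is there for versions that do not). The name is built by hand for illustration, and the layout of the result may differ from the ONELINE output shown earlier in the thread.

```ruby
# coding: utf-8
require 'openssl'

# Stand-in for cert.subject, built by hand for illustration.
name = OpenSSL::X509::Name.new
name.add_entry('CN', 'TÜRKTRUST Elektronik Sertifika Hizmet Sağlayıcısı',
               OpenSSL::ASN1::UTF8STRING)

if name.respond_to?(:to_utf8)
  utf8 = name.to_utf8
  puts utf8
  puts utf8.encoding
else
  warn 'OpenSSL::X509::Name#to_utf8 is not available in this openssl version'
end
```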

@adamel

adamel commented Sep 4, 2017

Nice. For those stuck with older versions, the following can be used as a workaround:

ASN1_STRFLGS_ESC_MSB = 4

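# Returns the name with multi-byte characters left unescaped;
# falls back to the default (escaped) form if the flag form fails.
def name2utf8(name)
  name.to_s(OpenSSL::X509::Name::ONELINE & ~ASN1_STRFLGS_ESC_MSB)
rescue StandardError
  name.to_s
end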
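
Note that, as the output in the original report shows, the string returned by the flag form is still tagged ASCII-8BIT. Below is a possible variant of the workaround that also tags the result as UTF-8 and checks that the bytes are valid before returning it. The name `name_to_utf8` is hypothetical, and this is a sketch rather than a tested replacement.

```ruby
# coding: utf-8
require 'openssl'

# Value taken from OpenSSL's <openssl/asn1.h>; not defined by ruby's openssl.
ASN1_STRFLGS_ESC_MSB = 4 unless defined?(ASN1_STRFLGS_ESC_MSB)

# Hypothetical helper: returns the name as a UTF-8 tagged string, or the
# default escaped form if the unescaped bytes are not valid UTF-8.
def name_to_utf8(name)
  flags = OpenSSL::X509::Name::ONELINE & ~ASN1_STRFLGS_ESC_MSB
  s = name.to_s(flags).force_encoding(Encoding::UTF_8)
  s.valid_encoding? ? s : name.to_s
end

# Usage with a hand-built name standing in for cert.subject.
name = OpenSSL::X509::Name.new
name.add_entry('O', 'TÜRKTRUST Bilgi İletişim ve Bilişim Güvenliği',
               OpenSSL::ASN1::UTF8STRING)
result = name_to_utf8(name)
puts result
puts result.encoding
```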
