Document that PKCS#12 functions assume UTF-8 for passwords #3535

dwmw2 · 2017-05-24T11:22:24Z

Part of issue #3531

Checklist

documentation is added or updated
tests are added or updated

levitte · 2017-05-24T12:12:48Z

doc/man3/PKCS12_newpass.pod

+
+In particular, this means that passwords obtained through the default
+UI_OpenSSL() method, which are in the locale character set, should not be
+used without potentially being converted to UTF-8 first.


... unless the charset is known to be UTF8 or ISO-8859-1.

That was implicit in the "potentially". And no, not if it's known to be ISO8859-1. That's why we have the hack in apps/pkcs12.c, because if it's ISO8859-1 and just happens to be a byte sequence that would have been valid UTF-8, it'll get interpreted as UTF-8.

So having discounted ISO8859-1 which still needs conversion in the general case, you end up with a suggested wording of "...potentially being converted to UTF-8 first unless it was already known to be UTF-8.". Which is kind of redundant in its obviousness.

A ISO 8859-1 string would have to be very carefully crafted to be taken for a UTF-8 string...

Does that observation make it an acceptable bug?

levitte · 2017-05-24T12:13:22Z

doc/man3/PKCS12_parse.pod

+
+In particular, this means that passwords obtained through the default
+UI_OpenSSL() method, which are in the locale character set, should not be
+used without potentially being converted to UTF-8 first.


... unless...

richsalz · 2017-05-24T12:25:04Z

doc/man3/PKCS12_newpass.pod

+=head1 NOTES
+
+Both B<oldpass> and B<newpass> are NUL-terminated strings in the UTF-8 encoding.
+If either is not a valid UTF-8 byte sequence, then that one is interpreted


Maybe something like this: Each string first interpreted as UTF-8 encoded strings; and if not valid, it is taken to be a ISO8859-1 instead.

I was trying to be careful to make it clear that if one of them isn't a valid UTF-8 sequence, then only that one is interpreted as ISO8859-1; not both. Other than slightly reducing the clarity of that point (in my highly subjective POV) I'm not sure of intent of the changes you suggest. I don't really object to phrasing it your way, but I'm not sure of the reasoning.

I was trying to convey the same point, but thought my wording was more clear. Shrug.

How about "Each of B<oldpass> and B<newpass> is independently interpreted as a string in the UTF-8 encoding. If it is not valid UTF-8, it is assumed to be ISO8859-1 instead".

Mostly I've just added 'independently' to your suggestion...

levitte · 2017-05-24T13:21:55Z

doc/man3/PKCS12_newpass.pod

+In particular, this means that passwords obtained through the default
+UI_OpenSSL() method, which are in the locale character set, should not be
+used without potentially being converted to UTF-8 first.
+


These two paragraphs are contradicting each other. The first sort of says that ISO8859-1 is accepted, while the second seems to say that if the results of the use of UI_OpenSSL() is encoded in ISO8859-1, that would not be acceptable.

The first says that it'll be interpreted as ISO8859-1 if it can't be interpreted as UTF-8. Which means that if it's some arbitrary string in ISO8859-1... then you need to do what apps/pkcs11.c does and convert it to UTF-8. You can't just pass it through blindly and pray that it isn't one of the corner cases that gets misinterpreted.

To be clear: the "potentially" is a shorthand for something along the lines of...

A password string returned by the UI should be converted from the locale character set to UTF-8. Such a conversion may be skipped if:
• the locale character set is UTF-8, or
• the password entered happens to contain only characters for which the locale character encoding is compatible with UTF-8 (i.e. contains only 7-bit ASCII characters, for most non-EBCDIC locales). or
• the password entered happens to contain only characters for which the locale character encoding is compatible with ISO8859-1, and the resulting byte sequence cannot be interpreted as valid UTF-8.

On the whole, I was happier not writing that, and sticking with "potentially" :)

levitte · 2017-05-24T13:22:19Z

doc/man3/PKCS12_newpass.pod

+=head1 NOTES
+
+Each of B<oldpass> and B<newpass> is independently interpreted as a string in
+the UTF-8 encoding. If it is not valid UTF-8, it is assumed to be ISO9859-1


8859, not 9859

Fixed; thanks.

petrovr · 2017-05-24T13:43:36Z

dwmw2 wrote:

* Document that PKCS#12 functions assume UTF-8 for passwords

It is not acceptable documentation to state ''...in the locale character set ...". Locale make no sense here.

dwmw2 · 2017-05-24T14:05:17Z

It is not acceptable documentation to state ''...in the locale character set ...". Locale make no sense here.

Absolutely not true. See discussion in #3532

petrovr · 2017-05-24T14:09:04Z

dwmw2 wrote:

> It is not acceptable documentation to state ''...in the locale character set ...". Locale make no sense here. Absolutely not true. See discussion in #3532

Whet is scope of request - to design future or to describe current functionality ? "locale character set" is not correct for current . It is just byte sequence .

dwmw2 · 2017-05-24T14:24:01Z

Nothing is ever "just a byte sequence". That's not just wrong; it's a completely unacceptable way of thinking. If you have no clue whether the byte sequence is ASCII or EBCDIC or UCS-2, why pass it at all? We have RAND_bytes() for byte sequences with no meaning...

A string requires a byte sequence, and an encoding. Without the latter, it cannot be a string.

Often the encoding is implicit, and is indicated by the locale. And historically, in some circumstances one would ignore the encoding and just pretend it's a byte sequence. But that was always a hack. For interchange with something like PKCS#12 where the key derivation specifies a given character set, the encoding must absolutely be taken into account.

levitte · 2017-05-24T14:36:08Z

Maybe one should write "which are assumed to be in the current locale". That would be a very precise way of putting it...

On the other hand, I wonder why you want to document UI_OpenSSL here at all, since it's already documented in the UI docs.

dwmw2 · 2017-05-24T14:39:40Z

"which are assumed to be" is a good way of putting it. I'll do that; thanks.

And I want to call it out explicitly because it's unexpected. Application developers are in a situation where OpenSSL gives them pieces A and B of the jigsaw... but if they put them together then surprise! it's broken. One day we'll fix that. Right now let's make it less of a surprise.

levitte · 2017-05-24T14:43:51Z

The problem with documenting stuff in multiple places is synchronization. It's bound to be differ between places, so I would strongly suggest to avoid that and to simply refer to UI_OpenSSL(3).

petrovr · 2017-05-24T14:49:00Z

dwmw2 wrote:

Nothing is ever "just a byte sequence". That's not just wrong; it's a completely unacceptable way of thinking. If you have no clue whether the byte sequence is ASCII or EBCDIC, why pass it at all? We have RAND_bytes() for byte sequences with no meaning...

:) You did receive on my question: What is scope of request - to design future or to describe current functionality ? For current functionality documentation cannot state "...locale encoding.."

dwmw2 · 2017-05-24T14:59:21Z

The problem with documenting stuff in multiple places is synchronization. It's bound to be differ between places, so I would strongly suggest to avoid that and to simply refer to UI_OpenSSL(3).

OK, how about this then:

In particular, this means that passwords in the locale character set
(or code page on Windows) must potentially be converted to UTF-8 before
use. This may include passwords from local text files, or input from
the terminal or command line. Refer to L<UI_new(3)> for the behaviour of
the default UI_OpenSSL() UI method, for example.

@petrovr right now we are describing current functionality. And I thought I'd already dealt with your objection by pointing out that you're entirely wrong to object to "locale encoding". But as it happens I've doubly resolved it now with the above wording, since I'm no longer stating that any particular input is in the locale encoding; only that they may be.

EDIT: I've deleted my following comment; if we really must follow up that line of discussion let's do it only in #3532 (or maybe it would be better in #3531)

levitte · 2017-05-24T16:41:16Z

@dwmw2 wrote:

The problem with documenting stuff in multiple places is synchronization. It's bound to be differ between places, so I would strongly suggest to avoid that and to simply refer to UI_OpenSSL(3).

OK, how about this then:
In particular, this means that passwords in the locale character set
(or code page on Windows) must potentially be converted to UTF-8 before
use. This may include passwords from local text files, or input from
the terminal or command line. Refer to L<UI_new(3)> for the behaviour of
the default UI_OpenSSL() UI method, for example.

Much better. You can make the reference L<UI_OpenSSL(3)>, though.

Part of issue openssl#3531

dwmw2 · 2017-05-24T18:38:46Z

Much better. You can make the reference L<UI_OpenSSL(3)>, though.

Done.

dwmw2 · 2017-05-24T19:13:39Z

Strictly speaking this does now depend on #3532 now it refers to the documentation you add there

levitte · 2017-05-24T19:41:41Z

True. Let's hope another @openssl/team member wakes up and does a review...

richsalz

for master and any backports you want to do.

Part of issue #3531 Reviewed-by: Rich Salz <rsalz@openssl.org> Reviewed-by: Richard Levitte <levitte@openssl.org> (Merged from #3535)

levitte · 2017-05-24T20:17:56Z

Merged.

master: cff85f3
1.1.0: df73fcd

Part of issue #3531 Reviewed-by: Rich Salz <rsalz@openssl.org> Reviewed-by: Richard Levitte <levitte@openssl.org> (Merged from #3535) (cherry picked from commit cff85f3)

Part of issue openssl#3531 Reviewed-by: Rich Salz <rsalz@openssl.org> Reviewed-by: Richard Levitte <levitte@openssl.org> (Merged from openssl#3535) (cherry picked from commit cff85f3)

dwmw2 mentioned this pull request May 24, 2017

Internal treatment of character encoding in passwords #3531

Open

levitte requested changes May 24, 2017

View reviewed changes

richsalz reviewed May 24, 2017

View reviewed changes

dwmw2 force-pushed the pkcs12-utf8-doc branch from 186c23e to 62984f3 Compare May 24, 2017 13:05

levitte reviewed May 24, 2017

View reviewed changes

levitte requested changes May 24, 2017

View reviewed changes

dwmw2 force-pushed the pkcs12-utf8-doc branch from 62984f3 to 09fecee Compare May 24, 2017 13:34

dwmw2 force-pushed the pkcs12-utf8-doc branch from 09fecee to 57c85bd Compare May 24, 2017 14:42

dwmw2 force-pushed the pkcs12-utf8-doc branch 2 times, most recently from d2aa687 to eb5f0b4 Compare May 24, 2017 14:55

dwmw2 mentioned this pull request May 24, 2017

Clarify what character encoding is used in the returned UI strings #3532

Closed

Document that PKCS#12 functions assume UTF-8 for passwords

48d1a5e

Part of issue openssl#3531

dwmw2 force-pushed the pkcs12-utf8-doc branch from eb5f0b4 to 48d1a5e Compare May 24, 2017 18:37

levitte approved these changes May 24, 2017

View reviewed changes

richsalz approved these changes May 24, 2017

View reviewed changes

levitte pushed a commit that referenced this pull request May 24, 2017

Document that PKCS#12 functions assume UTF-8 for passwords

cff85f3

Part of issue #3531 Reviewed-by: Rich Salz <rsalz@openssl.org> Reviewed-by: Richard Levitte <levitte@openssl.org> (Merged from #3535)

levitte closed this May 24, 2017

dwmw2 deleted the pkcs12-utf8-doc branch May 24, 2017 21:28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Document that PKCS#12 functions assume UTF-8 for passwords #3535

Document that PKCS#12 functions assume UTF-8 for passwords #3535

dwmw2 commented May 24, 2017

levitte May 24, 2017

dwmw2 May 24, 2017

dwmw2 May 24, 2017

levitte May 24, 2017

dwmw2 May 24, 2017

levitte May 24, 2017

richsalz May 24, 2017

dwmw2 May 24, 2017

richsalz May 24, 2017

dwmw2 May 24, 2017 •

edited

richsalz May 24, 2017

levitte May 24, 2017

dwmw2 May 24, 2017

dwmw2 May 24, 2017

levitte May 24, 2017

dwmw2 May 24, 2017

petrovr commented May 24, 2017 via email

dwmw2 commented May 24, 2017

petrovr commented May 24, 2017 via email

dwmw2 commented May 24, 2017 •

edited

levitte commented May 24, 2017

dwmw2 commented May 24, 2017

levitte commented May 24, 2017

petrovr commented May 24, 2017 via email

dwmw2 commented May 24, 2017 •

edited

levitte commented May 24, 2017

dwmw2 commented May 24, 2017

dwmw2 commented May 24, 2017

levitte commented May 24, 2017

richsalz left a comment

levitte commented May 24, 2017

Document that PKCS#12 functions assume UTF-8 for passwords #3535

Document that PKCS#12 functions assume UTF-8 for passwords #3535

Conversation

dwmw2 commented May 24, 2017

Checklist

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dwmw2 May 24, 2017 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

petrovr commented May 24, 2017 via email

dwmw2 commented May 24, 2017

petrovr commented May 24, 2017 via email

dwmw2 commented May 24, 2017 • edited

levitte commented May 24, 2017

dwmw2 commented May 24, 2017

levitte commented May 24, 2017

petrovr commented May 24, 2017 via email

dwmw2 commented May 24, 2017 • edited

levitte commented May 24, 2017

dwmw2 commented May 24, 2017

dwmw2 commented May 24, 2017

levitte commented May 24, 2017

richsalz left a comment

Choose a reason for hiding this comment

levitte commented May 24, 2017

dwmw2 May 24, 2017 •

edited

dwmw2 commented May 24, 2017 •

edited

dwmw2 commented May 24, 2017 •

edited