New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Document that PKCS#12 functions assume UTF-8 for passwords #3535
Conversation
doc/man3/PKCS12_newpass.pod
Outdated
|
||
In particular, this means that passwords obtained through the default | ||
UI_OpenSSL() method, which are in the locale character set, should not be | ||
used without potentially being converted to UTF-8 first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... unless the charset is known to be UTF8 or ISO-8859-1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was implicit in the "potentially". And no, not if it's known to be ISO8859-1. That's why we have the hack in apps/pkcs12.c
, because if it's ISO8859-1 and just happens to be a byte sequence that would have been valid UTF-8, it'll get interpreted as UTF-8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So having discounted ISO8859-1 which still needs conversion in the general case, you end up with a suggested wording of "...potentially being converted to UTF-8 first unless it was already known to be UTF-8.". Which is kind of redundant in its obviousness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A ISO 8859-1 string would have to be very carefully crafted to be taken for a UTF-8 string...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does that observation make it an acceptable bug?
doc/man3/PKCS12_parse.pod
Outdated
|
||
In particular, this means that passwords obtained through the default | ||
UI_OpenSSL() method, which are in the locale character set, should not be | ||
used without potentially being converted to UTF-8 first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... unless...
doc/man3/PKCS12_newpass.pod
Outdated
=head1 NOTES | ||
|
||
Both B<oldpass> and B<newpass> are NUL-terminated strings in the UTF-8 encoding. | ||
If either is not a valid UTF-8 byte sequence, then that one is interpreted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe something like this: Each string first interpreted as UTF-8 encoded strings; and if not valid, it is taken to be a ISO8859-1 instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to be careful to make it clear that if one of them isn't a valid UTF-8 sequence, then only that one is interpreted as ISO8859-1; not both. Other than slightly reducing the clarity of that point (in my highly subjective POV) I'm not sure of intent of the changes you suggest. I don't really object to phrasing it your way, but I'm not sure of the reasoning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to convey the same point, but thought my wording was more clear. Shrug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about "Each of B<oldpass> and B<newpass> is independently interpreted as a string in the UTF-8 encoding. If it is not valid UTF-8, it is assumed to be ISO8859-1 instead".
Mostly I've just added 'independently' to your suggestion...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
doc/man3/PKCS12_newpass.pod
Outdated
In particular, this means that passwords obtained through the default | ||
UI_OpenSSL() method, which are in the locale character set, should not be | ||
used without potentially being converted to UTF-8 first. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two paragraphs are contradicting each other. The first sort of says that ISO8859-1 is accepted, while the second seems to say that if the results of the use of UI_OpenSSL()
is encoded in ISO8859-1, that would not be acceptable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first says that it'll be interpreted as ISO8859-1 if it can't be interpreted as UTF-8. Which means that if it's some arbitrary string in ISO8859-1... then you need to do what apps/pkcs11.c
does and convert it to UTF-8. You can't just pass it through blindly and pray that it isn't one of the corner cases that gets misinterpreted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear: the "potentially" is a shorthand for something along the lines of...
A password string returned by the UI should be converted from the locale character set to UTF-8. Such a conversion may be skipped if:
• the locale character set is UTF-8, or
• the password entered happens to contain only characters for which the locale character encoding is compatible with UTF-8 (i.e. contains only 7-bit ASCII characters, for most non-EBCDIC locales). or
• the password entered happens to contain only characters for which the locale character encoding is compatible with ISO8859-1, and the resulting byte sequence cannot be interpreted as valid UTF-8.
On the whole, I was happier not writing that, and sticking with "potentially" :)
doc/man3/PKCS12_newpass.pod
Outdated
=head1 NOTES | ||
|
||
Each of B<oldpass> and B<newpass> is independently interpreted as a string in | ||
the UTF-8 encoding. If it is not valid UTF-8, it is assumed to be ISO9859-1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
8859, not 9859
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed; thanks.
dwmw2 wrote:
* Document that PKCS#12 functions assume UTF-8 for passwords
It is not acceptable documentation to state ''...in the locale character
set ...".
Locale make no sense here.
|
Absolutely not true. See discussion in #3532 |
dwmw2 wrote:
> It is not acceptable documentation to state ''...in the locale character set ...". Locale make no sense here.
Absolutely not true. See discussion in #3532
Whet is scope of request - to design future or to describe current
functionality ? "locale character set" is not correct for current . It
is just byte sequence .
|
Nothing is ever "just a byte sequence". That's not just wrong; it's a completely unacceptable way of thinking. If you have no clue whether the byte sequence is ASCII or EBCDIC or UCS-2, why pass it at all? We have RAND_bytes() for byte sequences with no meaning... A string requires a byte sequence, and an encoding. Without the latter, it cannot be a string. Often the encoding is implicit, and is indicated by the locale. And historically, in some circumstances one would ignore the encoding and just pretend it's a byte sequence. But that was always a hack. For interchange with something like PKCS#12 where the key derivation specifies a given character set, the encoding must absolutely be taken into account. |
Maybe one should write "which are assumed to be in the current locale". That would be a very precise way of putting it... On the other hand, I wonder why you want to document UI_OpenSSL here at all, since it's already documented in the UI docs. |
"which are assumed to be" is a good way of putting it. I'll do that; thanks. And I want to call it out explicitly because it's unexpected. Application developers are in a situation where OpenSSL gives them pieces A and B of the jigsaw... but if they put them together then surprise! it's broken. One day we'll fix that. Right now let's make it less of a surprise. |
The problem with documenting stuff in multiple places is synchronization. It's bound to be differ between places, so I would strongly suggest to avoid that and to simply refer to UI_OpenSSL(3). |
dwmw2 wrote:
Nothing is ever "just a byte sequence". That's not just wrong; it's a completely unacceptable way of thinking. If you have no clue whether the byte sequence is ASCII or EBCDIC, why pass it at all? We have RAND_bytes() for byte sequences with no meaning...
:)
You did receive on my question: What is scope of request - to design
future or to describe current functionality ?
For current functionality documentation cannot state "...locale encoding.."
|
d2aa687
to
eb5f0b4
Compare
OK, how about this then:
@petrovr right now we are describing current functionality. And I thought I'd already dealt with your objection by pointing out that you're entirely wrong to object to "locale encoding". But as it happens I've doubly resolved it now with the above wording, since I'm no longer stating that any particular input is in the locale encoding; only that they may be. EDIT: I've deleted my following comment; if we really must follow up that line of discussion let's do it only in #3532 (or maybe it would be better in #3531) |
@dwmw2 wrote:
Much better. You can make the reference |
Done. |
Strictly speaking this does now depend on #3532 now it refers to the documentation you add there |
True. Let's hope another @openssl/team member wakes up and does a review... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for master and any backports you want to do.
Part of issue openssl#3531 Reviewed-by: Rich Salz <rsalz@openssl.org> Reviewed-by: Richard Levitte <levitte@openssl.org> (Merged from openssl#3535) (cherry picked from commit cff85f3)
Part of issue #3531
Checklist