Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document that PKCS#12 functions assume UTF-8 for passwords #3535

Closed
wants to merge 1 commit into from

Conversation

dwmw2
Copy link
Contributor

@dwmw2 dwmw2 commented May 24, 2017

Part of issue #3531

Checklist
  • documentation is added or updated
  • tests are added or updated


In particular, this means that passwords obtained through the default
UI_OpenSSL() method, which are in the locale character set, should not be
used without potentially being converted to UTF-8 first.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... unless the charset is known to be UTF8 or ISO-8859-1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was implicit in the "potentially". And no, not if it's known to be ISO8859-1. That's why we have the hack in apps/pkcs12.c, because if it's ISO8859-1 and just happens to be a byte sequence that would have been valid UTF-8, it'll get interpreted as UTF-8.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So having discounted ISO8859-1 which still needs conversion in the general case, you end up with a suggested wording of "...potentially being converted to UTF-8 first unless it was already known to be UTF-8.". Which is kind of redundant in its obviousness.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A ISO 8859-1 string would have to be very carefully crafted to be taken for a UTF-8 string...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that observation make it an acceptable bug?


In particular, this means that passwords obtained through the default
UI_OpenSSL() method, which are in the locale character set, should not be
used without potentially being converted to UTF-8 first.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... unless...

=head1 NOTES

Both B<oldpass> and B<newpass> are NUL-terminated strings in the UTF-8 encoding.
If either is not a valid UTF-8 byte sequence, then that one is interpreted
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe something like this: Each string first interpreted as UTF-8 encoded strings; and if not valid, it is taken to be a ISO8859-1 instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to be careful to make it clear that if one of them isn't a valid UTF-8 sequence, then only that one is interpreted as ISO8859-1; not both. Other than slightly reducing the clarity of that point (in my highly subjective POV) I'm not sure of intent of the changes you suggest. I don't really object to phrasing it your way, but I'm not sure of the reasoning.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to convey the same point, but thought my wording was more clear. Shrug.

Copy link
Contributor Author

@dwmw2 dwmw2 May 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about "Each of B<oldpass> and B<newpass> is independently interpreted as a string in the UTF-8 encoding. If it is not valid UTF-8, it is assumed to be ISO8859-1 instead".

Mostly I've just added 'independently' to your suggestion...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

In particular, this means that passwords obtained through the default
UI_OpenSSL() method, which are in the locale character set, should not be
used without potentially being converted to UTF-8 first.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two paragraphs are contradicting each other. The first sort of says that ISO8859-1 is accepted, while the second seems to say that if the results of the use of UI_OpenSSL() is encoded in ISO8859-1, that would not be acceptable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first says that it'll be interpreted as ISO8859-1 if it can't be interpreted as UTF-8. Which means that if it's some arbitrary string in ISO8859-1... then you need to do what apps/pkcs11.c does and convert it to UTF-8. You can't just pass it through blindly and pray that it isn't one of the corner cases that gets misinterpreted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear: the "potentially" is a shorthand for something along the lines of...

A password string returned by the UI should be converted from the locale character set to UTF-8. Such a conversion may be skipped if:
• the locale character set is UTF-8, or
• the password entered happens to contain only characters for which the locale character encoding is compatible with UTF-8 (i.e. contains only 7-bit ASCII characters, for most non-EBCDIC locales). or
• the password entered happens to contain only characters for which the locale character encoding is compatible with ISO8859-1, and the resulting byte sequence cannot be interpreted as valid UTF-8.

On the whole, I was happier not writing that, and sticking with "potentially" :)

=head1 NOTES

Each of B<oldpass> and B<newpass> is independently interpreted as a string in
the UTF-8 encoding. If it is not valid UTF-8, it is assumed to be ISO9859-1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

8859, not 9859

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed; thanks.

@petrovr
Copy link

petrovr commented May 24, 2017 via email

@dwmw2
Copy link
Contributor Author

dwmw2 commented May 24, 2017

It is not acceptable documentation to state ''...in the locale character set ...". Locale make no sense here.

Absolutely not true. See discussion in #3532

@petrovr
Copy link

petrovr commented May 24, 2017 via email

@dwmw2
Copy link
Contributor Author

dwmw2 commented May 24, 2017

Nothing is ever "just a byte sequence". That's not just wrong; it's a completely unacceptable way of thinking. If you have no clue whether the byte sequence is ASCII or EBCDIC or UCS-2, why pass it at all? We have RAND_bytes() for byte sequences with no meaning...

A string requires a byte sequence, and an encoding. Without the latter, it cannot be a string.

Often the encoding is implicit, and is indicated by the locale. And historically, in some circumstances one would ignore the encoding and just pretend it's a byte sequence. But that was always a hack. For interchange with something like PKCS#12 where the key derivation specifies a given character set, the encoding must absolutely be taken into account.

@levitte
Copy link
Member

levitte commented May 24, 2017

Maybe one should write "which are assumed to be in the current locale". That would be a very precise way of putting it...

On the other hand, I wonder why you want to document UI_OpenSSL here at all, since it's already documented in the UI docs.

@dwmw2
Copy link
Contributor Author

dwmw2 commented May 24, 2017

"which are assumed to be" is a good way of putting it. I'll do that; thanks.

And I want to call it out explicitly because it's unexpected. Application developers are in a situation where OpenSSL gives them pieces A and B of the jigsaw... but if they put them together then surprise! it's broken. One day we'll fix that. Right now let's make it less of a surprise.

@levitte
Copy link
Member

levitte commented May 24, 2017

The problem with documenting stuff in multiple places is synchronization. It's bound to be differ between places, so I would strongly suggest to avoid that and to simply refer to UI_OpenSSL(3).

@petrovr
Copy link

petrovr commented May 24, 2017 via email

@dwmw2 dwmw2 force-pushed the pkcs12-utf8-doc branch 2 times, most recently from d2aa687 to eb5f0b4 Compare May 24, 2017 14:55
@dwmw2
Copy link
Contributor Author

dwmw2 commented May 24, 2017

The problem with documenting stuff in multiple places is synchronization. It's bound to be differ between places, so I would strongly suggest to avoid that and to simply refer to UI_OpenSSL(3).

OK, how about this then:

In particular, this means that passwords in the locale character set
(or code page on Windows) must potentially be converted to UTF-8 before
use. This may include passwords from local text files, or input from
the terminal or command line. Refer to L<UI_new(3)> for the behaviour of
the default UI_OpenSSL() UI method, for example.

@petrovr right now we are describing current functionality. And I thought I'd already dealt with your objection by pointing out that you're entirely wrong to object to "locale encoding". But as it happens I've doubly resolved it now with the above wording, since I'm no longer stating that any particular input is in the locale encoding; only that they may be.

EDIT: I've deleted my following comment; if we really must follow up that line of discussion let's do it only in #3532 (or maybe it would be better in #3531)

@levitte
Copy link
Member

levitte commented May 24, 2017

@dwmw2 wrote:

The problem with documenting stuff in multiple places is synchronization. It's bound to be differ between places, so I would strongly suggest to avoid that and to simply refer to UI_OpenSSL(3).

OK, how about this then:

In particular, this means that passwords in the locale character set
(or code page on Windows) must potentially be converted to UTF-8 before
use. This may include passwords from local text files, or input from
the terminal or command line. Refer to L<UI_new(3)> for the behaviour of
the default UI_OpenSSL() UI method, for example.

Much better. You can make the reference L<UI_OpenSSL(3)>, though.

@dwmw2
Copy link
Contributor Author

dwmw2 commented May 24, 2017

Much better. You can make the reference L<UI_OpenSSL(3)>, though.

Done.

@dwmw2
Copy link
Contributor Author

dwmw2 commented May 24, 2017

Strictly speaking this does now depend on #3532 now it refers to the documentation you add there

@levitte
Copy link
Member

levitte commented May 24, 2017

True. Let's hope another @openssl/team member wakes up and does a review...

Copy link
Contributor

@richsalz richsalz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for master and any backports you want to do.

levitte pushed a commit that referenced this pull request May 24, 2017
Part of issue #3531

Reviewed-by: Rich Salz <rsalz@openssl.org>
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from #3535)
@levitte
Copy link
Member

levitte commented May 24, 2017

Merged.

master: cff85f3
1.1.0: df73fcd

levitte pushed a commit that referenced this pull request May 24, 2017
Part of issue #3531

Reviewed-by: Rich Salz <rsalz@openssl.org>
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from #3535)
(cherry picked from commit cff85f3)
@levitte levitte closed this May 24, 2017
@dwmw2 dwmw2 deleted the pkcs12-utf8-doc branch May 24, 2017 21:28
pracj3am pushed a commit to cdn77/openssl that referenced this pull request Aug 22, 2017
Part of issue openssl#3531

Reviewed-by: Rich Salz <rsalz@openssl.org>
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from openssl#3535)
(cherry picked from commit cff85f3)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants