OSSL_STORE 'file' loader: Try several pass phrase variants for PKCS#12 #6341

Closed

Conversation

@levitte (Member) commented May 23, 2018

Pass phrases for PKCS#12 objects are problematic: our conversion to
BMPString was very naïve in OpenSSL <1.1.0, and while it is less naïve in
OpenSSL >=1.1.0, it still doesn't account for the application locale.

OSSL_STORE, on the other hand, is supposed to be fairly agnostic to
the types of objects it loads, and thereby to all internal intricacies
concerning pass phrases. Therefore, we need to create a few variants
of the given pass phrase when trying to open a PKCS#12 object, in
the hope that this will work on PKCS#12 objects generated by OpenSSL
<1.1.0 and OpenSSL >=1.1.0, as well as by other software.

While doing this, we need to make sure the locale is set correctly.
Fortunately, that's a simple setlocale(LC_ALL, "").

Ideas largely inspired by @dwmw2

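For readers who want to see what the locale-to-UTF-8 conversion (variant 2 in this PR) boils down to, here is a minimal standalone sketch. It is illustrative only, not the PR's code, and it assumes the application has called setlocale(LC_ALL, "") and that wchar_t holds Unicode code points (true on Linux with glibc, not guaranteed on every platform):

    #include <locale.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <wchar.h>

    /* Encode one code point as UTF-8; returns bytes written (0 if out of range). */
    static size_t to_utf8(unsigned long cp, unsigned char *out)
    {
        if (cp < 0x80) {
            out[0] = (unsigned char)cp;
            return 1;
        }
        if (cp < 0x800) {
            out[0] = (unsigned char)(0xC0 | (cp >> 6));
            out[1] = (unsigned char)(0x80 | (cp & 0x3F));
            return 2;
        }
        if (cp < 0x10000) {
            out[0] = (unsigned char)(0xE0 | (cp >> 12));
            out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[2] = (unsigned char)(0x80 | (cp & 0x3F));
            return 3;
        }
        return 0;   /* outside the BMP: skipped in this sketch */
    }

    int main(void)
    {
        const char *pass = "na\xef" "ve";   /* e.g. "naïve" as ISO8859-1 bytes */
        wchar_t wpass[256];
        unsigned char utf8[1024];
        size_t wlen, i, j = 0;

        setlocale(LC_ALL, "");              /* use the user's locale for mbstowcs */

        wlen = mbstowcs(wpass, pass, 256);
        if (wlen == (size_t)-1) {
            /* input doesn't match the locale: this variant simply isn't tried */
            fprintf(stderr, "pass phrase does not match the current locale\n");
            return 1;
        }
        for (i = 0; i < wlen; i++)
            j += to_utf8((unsigned long)wpass[i], utf8 + j);
        utf8[j] = '\0';
        printf("UTF-8 form is %zu bytes long\n", j);
        return 0;
    }
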
@levitte added the labels "branch: master" (Merge to master branch) and "branch: 1.1.1" (Merge to OpenSSL_1_1_1-stable branch) on May 23, 2018
@levitte changed the title from "OSSL_STORE 'file' loader: Try several pass phrase variants for PKCS#12" to "WIP: OSSL_STORE 'file' loader: Try several pass phrase variants for PKCS#12" on May 23, 2018
@levitte (Member, Author) commented May 23, 2018

@dwmw2, if you wouldn't mind having a look at this.

@levitte (Member, Author) commented May 23, 2018

Note to everyone: this is an attempt to "do the right thing", and I'm sorry this came in so late.

* is therefore harmless.
*/
utf8_count = 0;
for (i = 0; i < wpass_len; i++)
Contributor

What happens if (despite it not being strictly permitted) someone used UTF-16 and characters outside the BMP?

On Windows I think wchar_t is 16-bit and mbstowcs() is actually going to give you UTF-16, so it'll all work out OK.

On other platforms you're actually going to get values >= 0x10000 in wpass[i] and that's not going to work out so well.

I'm not sure I care.

Member Author

Weeellllll, have you seen what our PKCS#12 routines do with >0x10000 pass phrase characters? With regard to that, I think this does the right thing as far as the platform allows.
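
For reference, a short illustration (not code from this PR) of how a code point above the BMP would have to be represented as a UTF-16 surrogate pair, which is what a platform with a 16-bit wchar_t hands back instead of a single value >= 0x10000:

    #include <stdio.h>

    int main(void)
    {
        unsigned long cp = 0x1F600UL;        /* any code point >= 0x10000 */
        unsigned v = (unsigned)(cp - 0x10000);
        unsigned hi = 0xD800 + (v >> 10);    /* high surrogate */
        unsigned lo = 0xDC00 + (v & 0x3FF);  /* low surrogate  */

        printf("U+%05lX -> 0x%04X 0x%04X\n", cp, hi, lo);
        return 0;
    }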

unsigned char *pass_utf8 = NULL;
size_t utf8_count, i, j;

if ((wpass_len = mbstowcs(wpass, (char *)tpass,
Contributor

On Windows, does mbstowcs convert from the code page used for file I/O, or the code page used for console I/O?

Contributor

... and which of those are you actually getting from the default PEM passphrase callback that OpenSSL provides? Both with and without the WIN32_UTF8 definition, or whatever that is...

Let's make sure that it's going to do the right thing.

Member Author

I'll have another think on that.

Member Author

mbstowcs converts according to the locale that's set by the application. https://msdn.microsoft.com/en-us/library/x99tb11d.aspx has this to say about the empty locale string:

setlocale( LC_ALL, "" );
Sets the locale to the default, which is the user-default ANSI code page obtained from the operating system.

The default PEM callback gives exactly what it got from the built-in UI method, i.e. the exact characters it got fed, except on Windows with OPENSSL_WIN32_UTF8 defined, where whatever was given by the user is converted to UTF-8

Contributor

On Windows

Note that MSVC doesn't define __STDC_VERSION__, so this code isn't even compiled. But apart from that: let's say you recommend applications call setlocale(LC_ALL, ""). What does Windows make out of ""? Would it at any point try to interpret the string as UTF-8? I suppose you can guess that the answer is no. "" means "use the system locale/code page", and it's never UTF-8, at least not on my computer. And what happens if you pass the notorious 0xC3 0xAF? Same thing as in OpenSSL pre-1.1.0... Again, on my [Windows] computer. I suppose on a computer with a more exotic default code page it would do it differently... But let's say I call setlocale(LC_ALL,"foo_bar"). Literally. Windows seems to accept pretty much anything with an underscore in between. And it apparently folds the sequence to 0xef as if it were a UTF-8 locale. But if you hit a meaningful string, e.g. "Greek_Greece", then you get a different reading. And if you aim for the example from passphrase-encoding, you should pass "Hungarian_Hungary" to setlocale...

does mbstowcs convert from the code page used for file I/O, or the code page used for console I/O?

It's a strange question. I mean, how is a code page used for file I/O? The application just reads data from the file as a sequence of bytes. That's it.

Member Author

This is possibly a case where we should check for OPENSSL_WIN32_UTF8 here as well, and simply assume that we can skip the mbstowcs call and the following wide-to-UTF-8 conversion in that case... does that sound right to you? (also assuming we find the right checks to get this code compiled on Windows to begin with)

I'm sorry, this suggestion is silly; it's simply doing nothing about the pass phrase, which has already been covered by variant 1.

Member Author

Now, whether or not you hit the correct password without OPENSSL_WIN32_UTF8 depends on whether or not the application called setlocale(LC_ALL, "") (provided that the code is compiled on Windows). Yes, you suggest in this request that the openssl binary does it. But us providing a library implies that it will be linked with applications we have no control over...

Yes, it's true that this requires the cooperation of the application in that manner. Do you have a suggestion as to what to do about it? Should we setlocale(LC_ALL, "") internally?

Member Author (@levitte, May 23, 2018)

(I'm off to bed, btw... be back tomorrow)

Contributor

Not calling it is equivalent to setlocale(LC_ALL, "C"), and with that, mbstowcs fails on anything non-ASCII, at least on Linux (I assume the same goes for Windows).

No. See above. Windows is happy to run mbstowcs with the "C" locale.

mbstowcs will translate the string to wide chars if it can.

Again, see above, Windows mbstowcs will always translate to something. Because setlocale(LC_ALL, "") always sets the "ANSI" locale.

without having the code page 65001

65001 is not a code page you can set as the system one. There is no code page you can set that would make setlocale(LC_ALL, "") make mbstowcs interpret input as UTF-8. Though I might be wrong about this. It might happen that Asian code pages could... You can call setlocale(LC_ALL, "some-non-empty-string") in a way that makes mbstowcs interpret input as UTF-8, but as long as it's setlocale(LC_ALL, ""), it's never UTF-8. [Again, I'm not sure how things are on Asian Windows versions.]

Member Author

I think we got a bit lost in this discussion, so let's straighten things out a bit.

The main issue is whether the application called setlocale or not, and whether the given locale matches the encoding of the pass phrase, right? This isn't unique to Windows, BTW, but I think we got that already. In the Windows case, would it be better to call MultiByteToWideChar with CP_ACP? I have no issue doing that if that seems more feasible.

* read containers protected with correctly encoded pass
* phrases (i.e. PKCS#12 files from other softwares as well)
*/
{
Contributor

This could perhaps be conditional on the locale charset not being UTF-8?

Member Author

No. I count on mbstowcs to do normalization, and in the worst-case scenario things get converted back to the same UTF-8, i.e. no harm done. I'd rather avoid adding to the pile of checks and possible corner cases.

* the corner case when the pass phrase byte sequence might
* otherwise be mistaken for a legitimate UTF-8 string.
*/
{
Contributor

And this one could be conditional on... er ... the locale charset not being UTF-8 and the string actually having chars outside the ASCII set?

Contributor

Or not being ISO8859-1. More caffeine required. Either way it's probably harmless as you only get here in the failure case.

Member Author

If it's ASCII, the resulting string will be exactly the same as the original. Like you said, this is the last fallback, the pure ASCII gets caught by the first case anyway, so I see no real reason to make extra checks.

Contributor

Going back a bit...

The default PEM callback gives exactly what it got from the built-in UI method, i.e. the exact characters it got fed, except on Windows with OPENSSL_WIN32_UTF8 defined, where whatever was given by the user is converted to UTF-8

No. The PEM callback doesn't give you characters. It gives you bytes. Octets. chars, if you must. But they are not characters, in this sense. To interpret those bytes as characters you must know which encoding they use. It might be UTF-8. It might be something else. On Windows I believe it'll be UTF-8 if OPENSSL_WIN32_UTF8 is defined, and will be the ANSI code page (er, or is it the console code page, which is different?) if OPENSSL_WIN32_UTF8 is not defined?

Member Author

Yes, sorry, bytes. Got mixed up in the terminology
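
To make the "bytes, not characters" point concrete, here is a minimal sketch of a PEM pass phrase callback. The callback shape matches OpenSSL's pem_password_cb; the function name and the use of userdata are illustrative only. Nothing in the interface says anything about the encoding of the bytes it returns:

    #include <stdio.h>
    #include <string.h>

    /* Fill buf with up to size bytes of pass phrase and return how many. */
    static int my_pass_cb(char *buf, int size, int rwflag, void *userdata)
    {
        const char *pass = (const char *)userdata;  /* bytes in *some* encoding */
        int len = (int)strlen(pass);

        (void)rwflag;                 /* same bytes whether reading or writing */
        if (len > size)
            len = size;
        memcpy(buf, pass, len);
        return len;                   /* a byte count, not a character count */
    }

    int main(void)
    {
        char buf[64];
        int n = my_pass_cb(buf, sizeof(buf), 0, "secret bytes");

        printf("callback returned %d bytes\n", n);
        return 0;
    }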

@dwmw2 (Contributor) commented May 23, 2018

Looks sane to me; thanks. Wants torture test cases, of course. I have some for OpenConnect (passwords of U+0102 U+017B which I try to open ∀ LC_CTYPE, etc.) which you're welcome to lift.
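
Those two characters make a good torture test precisely because of the encoding ambiguity discussed above: the ISO8859-2 bytes for U+0102 U+017B happen to form a valid UTF-8 sequence for a different character. A tiny illustration (not code from this PR):

    #include <stdio.h>

    int main(void)
    {
        /* "ĂŻ" (U+0102 U+017B) encoded as ISO8859-2 bytes... */
        static const unsigned char iso8859_2[] = { 0xC3, 0xAF };
        /* ...and the same two characters encoded as UTF-8. */
        static const unsigned char utf8[] = { 0xC4, 0x82, 0xC5, 0xBB };

        /* Note that 0xC3 0xAF is itself valid UTF-8, but for U+00EF ("ï"),
         * which is why the raw bytes alone cannot tell you the encoding. */
        printf("ISO8859-2: %zu bytes, UTF-8: %zu bytes\n",
               sizeof(iso8859_2), sizeof(utf8));
        return 0;
    }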

@dwmw2 (Contributor) commented May 23, 2018

We probably want to do something similar for PKCS#8 and encrypted PKCS#1 files, right? At least try the locale charset and the UTF-8 conversion, although some of the PKCS#12 complexity is not there.

@levitte (Member, Author) commented May 23, 2018

I'm not going to do anything about PKCS#5 (which is what you mean when you mention PKCS#1 and PKCS#8), because that standard sees the pass phrase as a byte sequence with no inherent meaning, i.e. it can literally be any array of bytes. Limiting it to any given charset is wrong, and would potentially cause more problems than it solves.

@levitte (Member, Author) commented May 23, 2018

Actually, let me take some time to think again about PKCS#5... My brain is still ruminating on that one....

@dwmw2 (Contributor) commented May 23, 2018

IIRC PKCS#5 suggests that ASCII or UTF-8 are good character sets to use, although it lamentably falls short of even saying SHOULD. You've seen that we're working on a new I-D which addresses that.

If the native charset doesn't work then converting to UTF-8 would certainly be a useful thing to try.

@@ -76,6 +77,8 @@ static void calculate_columns(DISPLAY_COLUMNS *dc)

static int apps_startup(void)
{
setlocale(LC_ALL, "");
Contributor

The problem with this, of course, is that you're suggesting that all library applications should do that. What if an application doesn't? Would libcrypto still work? And how?

Member Author

If an application doesn't do this, you get the "C" locale, according to the docs (the Linux man page as well as https://msdn.microsoft.com/en-us/library/x99tb11d.aspx). The Linux man page says this:

On startup of the main program, the portable "C" locale is selected as default.

The only thing in libcrypto that's directly affected by this, as far as I can tell, is the mbstowcs call that's added by this PR. Can you think of anything else? I can't recall that we use anything else that depends on the application locale...

Contributor

Well, the key question was actually "how" it would work. And not in the sense of "good" vs. "bad", but whether or not it would produce varying results depending on whether or not the application called setlocale(LC_ALL, ""). Should that be a concern, or would we be better off if the behaviour was invariant?

Member Author

If the input string to mbstowcs doesn't agree with the locale, mbstowcs will return (size_t)-1, so using that variant will simply be invalidated... however, it currently bails out when this happens, that's probably not the right move...

Member

Just note that calling setlocale() in libcrypto would definitely NOT be the right thing to do as it is not thread safe and some applications might want to set different locale. It is probably obvious but I just wanted to explicitly state that.

Member Author

The remaining question is whether it should make the assumption that the application did. [I for one would answer "no, it should not."]

Ok, so

  1. How do you detect that? Simply assume that if the current locale is "C", the application didn't set the locale?
  2. What should be done if we determine the application didn't set it?

Member Author (@levitte, May 24, 2018)

(BTW, I don't quite agree with you, I think it is reasonable to expect the locale to be correctly set, and to fail variant 2 if it isn't, i.e. if mbstowcs fails)

Contributor

I think it is reasonable to expect the locale to be correctly set

To "" or "some-other-string"? Here is fun Linux-specific question. What if non-UTF-8 locales for CLI application and xterm are different? Or one is UTF-8 and another is not...

Member Author

For xterm? That might give an ugly window title... I assume that you meant the shell running in the xterm. Either way, if whichever of them uses a locale that's different from your input, or different from what your font will display (if that applies), then you're gonna see unexpected things happen (what you wrote being misinterpreted, or weird characters being displayed).

I cannot judge what an application uses for a locale, as long as text strings are encoded to match. If someone screws up their locale, all bets are off in my book.
(and of course, that makes UTF-8 darn practical)

Contributor

I cannot judge what an application uses for a locale, as long as text strings are encoded to match.

Exactly my point. And in my book it translates to "I don't want to be dependent on the application calling setlocale, be it with "" or "some-non-empty-string"." All you can do is assume that it's all sorted out by the application and/or the user [choosing an adequate input method], and act in a manner that is not dependent on what the application code did [or didn't do]. To be more specific: don't call mbstowcs :-)

@levitte (Member, Author) commented May 23, 2018

@dwmw2, I'll quote PKCS#5 again:

Throughout this document, a password is considered to be an octet
string of arbitrary length whose interpretation as a text string is
unspecified. In the interest of interoperability, however, it is
recommended that applications follow some common text encoding rules.
ASCII and UTF-8 are two possibilities. (ASCII is a subset of
UTF-8.)

The question that goes through my head is whether OSSL_STORE should be regarded as an application or not. Regarding it as an application seems weird to me, considering it's very much a library API, but I'm not gonna pretend that my opinion is the absolute truth.

@dwmw2 (Contributor) commented May 23, 2018

Let's assume for the sake of argument that OSSL_STORE is not an application — or more to the point, should not behave by default in the ways which make sense for an application.

Now, let's realise that you've just doomed all the actual applications to tweaking and working around its "non-application" behaviour in order to Do The Right Thing. And further realise that most of them won't, and will continue to do the wrong thing.

Then let's revisit the assumption at the top, and suggest that actually, OSSL_STORE should make it really easy for applications to Do The Right Thing.

Back to the question at hand.... we have a password which, pedantry aside, is fairly much known to be a string and it's probably in the local charset. We have established that it doesn't unlock the file. Why would we not want to attempt conversion to UTF-8 and try that to see if it works? Or put another way, why would we want to force applications to have to do that additional step?

@levitte (Member, Author) commented May 23, 2018

Back to the question at hand.... we have a password which, pedantry aside, is fairly much known to be a string and it's probably in the local charset.

Considering all the -passin options, your assumption can't be verified... I would even dare say that you may draw overly hopeful conclusions. However, that aside:

We have established that it doesn't unlock the file. Why would we not want to attempt conversion to UTF-8 and try that to see if it works? Or put another way, why would we want to force applications to have to do that additional step?

This does sound compelling... "what's the harm in doing this in addition to using the bytes straight as they are?" I'm unsure, and that tells me I need to sleep on it.

@dot-asm (Contributor) commented May 24, 2018

I'd like to argue that, as far as compatibility with 1.0.x goes, the second variant is effectively redundant. Allow me to illustrate with examples.

We are on Linux. Imagine a UTF-8 input method that was used with OpenSSL 1.0.x to enter a Hungarian passphrase. Now you want to access the object with OpenSSL 1.1.x. If you use a UTF-8 input method and call setlocale, the second variant is essentially a no-op, as it would yield the same output as the input, which has already been exercised by the 1st method(*). If you use a UTF-8 input method and don't call setlocale, mbstowcs fails and the variant is not exercised. If you use a non-UTF-8 Hungarian input method and call setlocale, the 2nd method will yield the same password you would have entered with a UTF-8 input method, and it's not the right one, not for an object created with 1.0.x. If you use a non-UTF-8 Hungarian input method and don't call setlocale, mbstowcs fails and the variant is not exercised.

We are still on Linux. Imagine a non-UTF-8 Hungarian input method that was used with OpenSSL 1.0.x to enter a Hungarian passphrase, one that can be misinterpreted if attempted as UTF-8. Now you want to access the object with OpenSSL 1.1.x. If you use a UTF-8 input method, none of the three variants would actually do the job. You have no option but to use a non-UTF-8 input method. If you use a non-UTF-8 Hungarian input method and call setlocale, the 2nd method will yield the same password you would have entered with a UTF-8 input method, and it's not the right one, not for an object created with 1.0.x. If you use a non-UTF-8 Hungarian input method and don't call setlocale, mbstowcs fails and the variant is not exercised.

In other words, in no combination does the second method try something that hasn't been tried already, or produce the right answer.

(*) To be completely accurate, the 1st call to PKCS12_verify_mac would actually succeed, because it falls back to the equivalent of the 3rd method internally.

@dot-asm (Contributor) commented May 24, 2018

Now let's take Windows. In 1.0.x for Windows a UTF-8 input method was not an option, so as long as we're talking about backward compatibility for Hungarian passphrases, we should only consider a non-UTF-8 Hungarian input method used to type the password into 1.0.x. Now we want to access it with 1.1.x. If the user sets OPENSSL_WIN32_UTF8, no password is right, regardless of whether or not the application called setlocale. If the user does not set OPENSSL_WIN32_UTF8 and the application calls setlocale, the 2nd method will yield the same password you would have entered with a UTF-8 input method, and it's not the right one. If the application doesn't call setlocale, the 2nd method will yield the same password as the 3rd method. Even here the 2nd method doesn't offer any advantage.

@dot-asm (Contributor) commented May 24, 2018

Do note that the message before the last one had "as far as compatibility with 1.0.x goes" in its opening sentence. This is because there is a case where the 2nd method is kind of meaningful. But[!] it has nothing to do with backward compatibility. Imagine a standard-compliant PKCS#12 object protected with a Hungarian passphrase. What does it take to read it with 1.1.x? A UTF-8 input method would always do. The only question is what happens if the user uses a non-UTF-8 [Hungarian] input method. That's where the 2nd method can become handy. It will translate the passphrase to UTF-8, and it will be the right passphrase.

Now the question is whether this convenience feature is justified. The thing is that it will yield the correct UTF-8 passphrase only if the application called setlocale. So it all boils down to the following. Do we simply recommend that users use a UTF-8 input method with standard-compliant PKCS#12 objects? Or do we tell application developers to call setlocale? I for one would say that the former is sufficient. The overwhelming majority of Linux/Unix users are using a UTF-8 input method, so they wouldn't even notice the problem. Now one can still argue that a Windows user would tend to forget to set the OPENSSL_WIN32_UTF8 variable. All right, it might be appropriate to make life a bit easier for them. But would I use mbstowcs? No, I'd use MultiByteToWideChar with CP_ACP. Why? Because it would be independent of whether or not the application called setlocale.
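
For what it's worth, a minimal sketch of that Windows-side alternative (illustrative only, not this PR's code; the helper name is made up):

    #include <windows.h>

    /* Convert an ANSI-code-page pass phrase to wide characters without
     * depending on whether the application has called setlocale().
     * Returns the number of wide characters written, including the
     * terminating NUL, or 0 on failure. */
    static int acp_pass_to_wide(const char *pass, wchar_t *wpass, int wpass_size)
    {
        return MultiByteToWideChar(CP_ACP, 0, pass, -1, wpass, wpass_size);
    }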

@levitte (Member, Author) commented May 24, 2018

Yes, you did come to the conclusion I hoped to see (what the main use of variant 2 is). I'll quote the comment that comes with it:

                /*
                 * Check variant 2: locale to UTF-8 translation
                 * Convert the pass phrase from the current application locale
                 * via wide chars to UTF-8.  This should ensure that we can
                 * read containers protected with correctly encoded pass
                 * phrases (i.e. PKCS#12 files from other softwares as well)
                 */

Maybe that comment needs clarification?

Now, for your suggestion that this exercise (variant 2) is necessary or not, I'll let @dwmw2 and you fight it out. Me, I'm thinking that if it serves some and is essentially a no-op to the rest, what's the harm?

Re your suggestion to use MultiByteToWideChar with CP_ACP on Windows, it's already on its way, coming up as soon as it's building on my machinery.

@dwmw2 (Contributor) commented May 24, 2018

We are still on Linux. Imagine a non-UTF-8 Hungarian input method that was used with OpenSSL 1.0.x to enter a Hungarian passphrase, one that can be misinterpreted if attempted as UTF-8. Now you want to access the object with OpenSSL 1.1.x. If you use a UTF-8 input method, none of the three variants would actually do the job.

That is an uninteresting case. For backward compatibility we care about someone operating in a legacy 8-bit locale who is still operating in that same legacy 8-bit locale with an updated version of OpenSSL. But when we weren't making any attempt to deal with correct encodings, moving from one input encoding to another (e.g. taking my file created under ISO8859-1 and using it on an ISO8859-2 system, where my passphrase happens to have characters which are in both but not at the same codepoint) was never expected to work anyway. It's a red herring.

@dot-asm (Contributor) commented May 24, 2018

To summarize the three messages [of mine] above: I'd omit the 2nd method on Unix and replace mbstowcs with MultiByteToWideChar with CP_ACP on Windows. This way the library remains neutral to whether or not the application called setlocale. You may say "but doesn't that limit the user's ability?" No. The user is in control of the input method, so the amount of tried combinations would be sufficient. It's just that some combinations are performed by the library and one by the user.

@dwmw2 (Contributor) commented May 24, 2018

Now, for your suggestion that this exercise (variant 2) is necessary or not, I'll let @dwmw2 and you fight it out.

It is clearly necessary. Sure, we can all come up with examples (like plain ASCII password) where the first attempt will work. But for a PKCS#12 file with that pathological password example in an ISO8859-2 locale, there are incontrovertibly three different ways it could have been created: OpenSSL 1.0, OpenSSL 1.1, and correct. And we need to attempt all three.

@dwmw2 (Contributor) commented May 24, 2018

Now let's take Windows. In 1.0.x for Windows a UTF-8 input method was not an option, so as long as we're talking about backward compatibility for Hungarian passphrases, we should only consider a non-UTF-8 Hungarian input method used to type the password into 1.0.x. Now we want to access it with 1.1.x. If the user sets OPENSSL_WIN32_UTF8, no password is right, regardless of whether or not the application called setlocale.

Yeah. Perhaps with OPENSSL_WIN32_UTF8 we should actually be calling wcstombs() instead, because we expect the input bytes to be in UTF-8? Except we don't... that's only true if the default PEM callback was used. If it came from somewhere else it probably is in the ANSI code page?

If the user does not set OPENSSL_WIN32_UTF8 and the application calls setlocale, the 2nd method will yield the same password you would have entered with a UTF-8 input method, and it's not the right one. If the application doesn't call setlocale, the 2nd method will yield the same password as the 3rd method. Even here the 2nd method doesn't offer any advantage.

If the user doesn't set OPENSSL_WIN32_UTF8 then, as ever, they need to be operating in the same locale as before, and then the first attempt with the actual byte sequence will work. In this case, the second and third attempts aren't needed. That doesn't mean they're never needed.

@dwmw2 (Contributor) commented May 24, 2018

I think using MultiByteToWideChar() with CP_ACP on Windows probably does make sense.

@levitte (Member, Author) commented May 24, 2018

Perhaps with OPENSSL_WIN32_UTF8 we should actually be calling wcstombs() instead

That's effectively what happens in crypto/ui/ui_openssl.c (which is also called down the line by the default PEM callback, incidentally)

# else
wpass_len = mbstowcs(wpass, (char *)tpass,
OSSL_NELEM(wpass));
# endif
Member Author

Just a ping to show that MultiByteToWideChar(CP_ACP, ... is now in place.

@levitte changed the title from "WIP: OSSL_STORE 'file' loader: Try several pass phrase variants for PKCS#12" to "OSSL_STORE 'file' loader: Try several pass phrase variants for PKCS#12" on May 31, 2018
@levitte (Member, Author) commented May 31, 2018

Taking this out of WIP.

@dwmw2 (Contributor) commented May 31, 2018

Looking good. I think we just need to indicate that the passphrase input should be in the locale charset (to offset confusion about the ways in which we interpret it).

@dot-asm (Contributor) commented May 31, 2018

To clarify. I don't view it as unreasonable to say "pass UTF-8 to libcrypto". It's not unreasonable to put the burden of conversion to UTF-8 on the application (or the burden of choosing a UTF-8 input method on the user). It's really not a big deal on contemporary Unices; in the overwhelming majority of cases it's actually "do nothing special." As for "ĂŻ" with an ISO8859-2 input method: it's correct that, as it stands now, 1.1.0 will fail to produce a compliant PKCS#12 with a non-UTF-8 input method. It's just that so far(*) the assumption was that the user would use a UTF-8 input method. And the majority de facto does, and has been doing so for a while now. So chances are that a UTF-8 method was already used to enter "ĂŻ" into 1.0.x(**). But if we aim to support even non-UTF-8 input methods, that is not impossible to resolve in the openssl command line tool. Note that I'm arguing against a dependency on setlocale in libcrypto, but that doesn't mean it would be inappropriate to perform the conversion in apps/pkcs12.c. One can probably even argue that it would be reasonable to better support non-UTF-8 input methods, at least to facilitate "normalization" of legacy PKCS#12 objects. I mean, so far, in order to "normalize" an object created with 1.1.0 and a non-UTF-8 input method, the user would have to a) choose the original method and change the password to an ASCII-only pass phrase, then b) switch to a UTF-8 method and change back to the original pass phrase. One can argue that it should be possible to do this in one step.

(*) Or rather upon the 1.1.0 release.
(**) Yes, it means that it was reckoned that this would be a more common backward compatibility problem than the password in question entered with a non-UTF-8 input method. I can't recall any problem reports about accessing legacy PKCS#12 files, so... Well, this can mean two things: it was actually an adequate call, or people suffer in silence. But it's probably unreasonable to assume that literally everybody would suffer in silence...

@levitte (Member, Author) commented Jun 1, 2018

So let's see if I can summarize: if an application wants to support standard-compliant PKCS#12 objects and a non-UTF-8 input method when loading through OSSL_STORE, there's a choice between:

  1. this PR and have the application call setlocale(LC_CTYPE, "")
  2. creating a UI_METHOD in the application that converts all its input to UTF-8 encoding (which, incidentally, requires setlocale(LC_CTYPE, "")), and have the user convert any object that has been created using a non-compliant pass phrase...

In terms of the burden on application writers and on users that have to handle non-compliant (in terms of pass phrase conversion into BMPString) PKCS#12 objects, I don't agree with you about what's reasonable.

Regarding (**), I would believe that the vast majority type ASCII only in their pass phrases... which might be their way, incidentally, of "suffering in silence".

@levitte (Member, Author) commented Jun 1, 2018

Side note: you're right to point out apps/pkcs12.c, 'cause it should ensure that the pass phrase that's fed to PKCS12_create and PKCS12_set_mac is UTF-8 (or well, I personally think that PKCS12_set_mac should do that work, but since @dot-asm obviously won't approve anything in libcrypto that relies on locale being set, there ya go...)

@dot-asm (Contributor) commented Jun 1, 2018

Regarding (**), I would believe that the vast majority type ASCII only in their pass phrases...

I would too. My remark was about those who actually typed non-ASCII characters. I mean, "everybody" was not "every OpenSSL user", but "everybody who used non-ASCII characters in a passphrase." The whole request is about them, and naturally so is the rest of this remark.

there's a choice between:

  1. this PR and have the application call setlocale(LC_CTYPE, "")
  2. creating a UI_METHOD in the application that converts all its input to UTF-8 encoding (which, incidentally, requires setlocale(LC_CTYPE, "")), and have the user convert any object that has been created using a non-compliant pass phrase...

As for 2, the part about having the user convert: that is the desired direction; users should be encouraged to do that. As for the rest of 2: it would be required only if the application insists on using a non-UTF-8 input method with standard-compliant PKCS#12 objects. The question is, do we have to meet that demand? Application insists -> application fixes. Note that UTF-8 input methods are commonplace and customarily a matter of the end user's choice. And in the unlikely event a UTF-8 input method is not an option, well, such a system probably doesn't have mbstowcs either, and the application would have to come up with some custom solution anyway. Or simply adhere to non-standard PKCS#12... Yes, you can sense a contradiction. On one hand I'm saying that users should "normalize", and on the other hand I suggest that there might be situations where "non-normalized" could be an option. Well, life is not fair... But note that not making any assumptions about the locale in libcrypto aligns better with the situation where you're stuck with a non-UTF-8 input method, in the sense that it would be more appropriate to interfere as little as possible with what the application would have to do to pull it off.

@dot-asm (Contributor) commented Jun 1, 2018

@dot-asm obviously won't approve anything in libcrypto that relies on locale being set, there ya go...)

But it doesn't mean that somebody else couldn't. I merely express my opinion. It's just that me not approving it is an [unfortunate] expression of the opinion, but what can I do :-( It's not like I don't appreciate the effort (as I do!) or sentiment...

@levitte (Member, Author) commented Jun 1, 2018

And in the unlikely event a UTF-8 input method is not an option, well, such a system probably doesn't have mbstowcs either, and the application would have to come up with some custom solution anyway.

In this, you are wrong. VMS terminal drivers do not generate UTF-8, but there is locale support in DCL (you can set a locale, and you can compile a locale, but DCL itself isn't wide char aware, so editing command lines that happen to have UTF-8 characters is quirky), and the C RTL is fully C95 compliant in this regard, so it is perfectly possible to use mbstowcs.

Basically, what locale and wide char support there is on VMS currently only applies to programs written in C. You can imagine that the lack of support in the native tools that run in a terminal session (DCL itself, various native editors such as EVE, .....) makes a UTF-8 input method not entirely desirable (as soon as there is one multibyte character on a line, the position of the cursor isn't trustworthy).

@levitte (Member, Author) commented Jun 1, 2018

I've taken this to openssl-project@openssl.org (@dwmw2, are you listening to it?)

@dot-asm (Contributor) commented Jun 1, 2018

And in the unlikely event a UTF-8 input method is not an option, well, such a system probably doesn't have mbstowcs either,

In this, you are wrong.

All right. I'm utterly surprised that they actually don't support UTF-8 as an input option while providing UTF-8 locale[s]. By this they kind of say "the application is on its own when it comes to UTF-8, we just let it make something out of it, it's not our business." And so the question would be: why should our message be different? One can also put it as follows. If the target platform doesn't support UTF-8 as an input option, would it be unreasonable for us to disclaim support for UTF-8 in general and standard-compliant PKCS#12 in particular? Once again, all this is about users who actually type non-ASCII characters in their pass phrases.

@levitte (Member, Author) commented Jun 1, 2018

At one time, there were plenty of programs on Unix that didn't provide a UTF-8 input method, while UTF-8 locales did exist. xterm was one such program (this has now changed, but it took a while before that happened). VMS' native programs haven't received the same love and care for quite a few years, and VSI are currently working their collective asses off to catch up on quite a number of topics (I do hope this is one of them).

@dot-asm (Contributor) commented Jun 1, 2018

VSI are currently working their collective asses off to catch up on quite a number of topics (I do hope this [UTF-8 input method] is one of them).

So it will just have to start working when they catch up... Once again, is it actually unreasonable for us to disclaim support if the vendor doesn't provide a UTF-8 input method? Note that it doesn't prevent the application from solving the problem...

@dwmw2 (Contributor) commented Jun 1, 2018

@dot-asm I'm getting a bit lost. Can you be specific about what you're suggesting?

The STORE API should be defined to take filenames etc. in the locale-encoding like all normal APIs do except that the passphrase (and the passphrase alone?) shall be in UTF-8?

What does the default PEM callback do, in order to meet that requirement? Or do we stop providing a default? Or do we provide a default which silently does the wrong thing?

@levitte (Member, Author) commented Jun 1, 2018

Errr, the STORE API takes URIs, and the exact syntax of a URI is well defined (percent encoding for anything non-ASCII), so I don't quite see how your argument holds there, @dwmw2 (but I should look over loader_file.c with regards to percent encoding).

Otherwise, @dot-asm's message seems to be clear: the application should be made to provide UTF-8 encoded pass phrases, period, and libcrypto shouldn't care how that happens, and the user should convert any object that has used a differently encoded pass phrase.

(note that in a way, this complies with your desire for UTF-8 uniformity, just not quite the way Postel stipulated)

@dwmw2 (Contributor) commented Jun 1, 2018

Errr, the STORE API takes URIs, and the exact syntax of a URI is well defined (percent encoding for anything non-ASCII), so I don't quite see how your argument holds there, @dwmw2 (but I should look over loader_file.c with regards to percent encoding).

Yeah, let's make test cases for that. Create a PKCS#12 object in a file called ♥.p12 and access it via a URI on Windows and other platforms.
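
As an aside, a tiny illustration (not OpenSSL code; the function name is made up) of what percent-encoding the UTF-8 bytes of such a file name would look like, e.g. "♥.p12" becoming "file:%E2%99%A5.p12":

    #include <ctype.h>
    #include <stdio.h>
    #include <string.h>

    /* Print a file: URI, percent-encoding every byte that isn't unreserved. */
    static void print_file_uri(const char *utf8_path)
    {
        const unsigned char *p = (const unsigned char *)utf8_path;

        printf("file:");
        for (; *p != '\0'; p++) {
            if (isalnum(*p) || strchr("-._~/", *p) != NULL)
                putchar(*p);
            else
                printf("%%%02X", *p);
        }
        putchar('\n');
    }

    int main(void)
    {
        print_file_uri("\xE2\x99\xA5.p12");   /* "♥.p12" as UTF-8 bytes */
        return 0;
    }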

@dwmw2 (Contributor) commented Jun 1, 2018

I've taken this to openssl-project@openssl.org (@dwmw2, are you listening to it?)

Listening only, it seems

Only OMC members and committers may post directly to this list.

@richsalz (Contributor) commented Jun 1, 2018 via email

@dot-asm (Contributor) commented Jun 1, 2018

@dot-asm I'm getting a bit lost. Can you be specific about what you're suggesting?

the application should be made to provide UTF-8 encoded pass phrases, period

Right. Modulo the fact that libcrypto would still have to tolerate an application occasionally passing non-UTF-8 input for backward compatibility with 1.0.x PKCS#12. Unfortunately :-( But IMHO libcrypto shouldn't "help" the application by performing operations that depend on a call to setlocale, i.e. calling mbstowcs, for accessing standard-compliant objects. One can put it like this: demand UTF-8 input, but tolerate non-UTF-8 for backward compatibility, not forward.

@levitte (Member, Author) commented Jun 1, 2018

The trouble is that the application cannot be expected to know what kind of object the UI_METHOD is asked to provide a pass phrase for.
So either we tell application writers that they should loop around the whole loading cycle (open, load, close) with one pass phrase variant for each iteration, or we must accept one well specified variant.

I have also fantasized about making a UI retriable, and thereby having the UI_METHOD serve more than one pass phrase variant. I haven't quite figured out the mechanics for that yet. It would be a fairly practical way that still puts the problem of variants (a problem that we have created, I might add) into the application's lap. I don't think that's very fair (because we created the problem in the first place), but if our app can constitute a good source to copy from, it's still a somewhat viable option.

Yet another way to deal with this is to add a wchar_t-only UI API and only use that from OSSL_STORE. Of course, we can then forget about non-compliant objects from OpenSSL 1.0.x.

@dwmw2 (Contributor) commented Jun 2, 2018

We cannot lock ourselves in an ivory tower and pretend not to be affected by the locale character set. Every filename we pass to open(), every string we receive on the command line, every I/O from the console and most text files, will be in that character set. And most applications will use that in every char * API unless explicitly directed otherwise (and in fact even then, unless we enforce otherwise).

We have three choices:
• Continue to bury our heads in the sand and get things wrong (perhaps with the slight improvement of documenting that we get it wrong in non-UTF8 locales including Windows).
• Document that applications ought to call setlocale() if they want things to work right, and do appropriate conversions when we need a specific encoding (as we do for PKCS#12).
• Require UTF-8.

The problem with the last option is that we still have to do conversions back to the locale encoding, if we want to attempt to load PKCS#12 files created with OpenSSL <=1.0 (or even 1.1 in some cases). And choosing that because we "don't want to require setlocale()" is kind of nonsensical because how in $DEITY's name do we think they're going to manage to do that conversion if they haven't set the locale appropriately? And we'd need the locale set anyway if we want to load PKCS#12 files created with OpenSSL.

I understand the desire to keep life simple for ourselves and not properly address this rather complicated issue. I think it's entirely the wrong choice though, and is very detrimental to our users.

@levitte (Member, Author) commented Jun 2, 2018

The project consensus seems to go toward requiring UTF-8 pass phrase encoding for the STORE API, and that any file that can't be loaded with that encoding has to be converted first. For PKCS#12, this has the benefit of pushing for standard-compliant objects. It also means that the job of making sure there is a correctly encoded pass phrase depends entirely on the application, and only takes one attempt.

What's needed in this case is a simple conversion tool, designed to handle three object types:

  • legacy encrypted PEM
  • PKCS#8
  • PKCS#12

Did I forget anything?

@franc6 commented Jun 2, 2018

There seems to be an assumption here that wchar_t represents UTF-32 on most UNIX systems, like it (generally) does on Linux. It does not. On AIX, HP-UX, and Solaris, wchar_t is a locale-specific encoding. I just don't want someone to come along and see this thread, and think that using mbstowcs will convert to UTF-32 on other UNIX systems, because it won't. I really don't think the code in this PR does what the author intended on most UNIX systems. And really, the majority of the incorrect encodings for passphrases are coming from Windows, which would be expecting UTF-16 encoded passphrases, so even on Linux, the problem is not addressed appropriately.

This first link is a little out of date, as several of the systems have more locales than listed, and have added more API support. The second link includes clear information regarding wchar_t on Solaris.

https://www.tldp.org/HOWTO/Unicode-HOWTO-6.html
https://docs.oracle.com/cd/E36784_01/html/E39536/gmwkm.html

@petrovr commented Jun 2, 2018 via email

@petrovr commented Jun 2, 2018 via email

@levitte (Member, Author) commented Jun 5, 2018

So after some discussion on openssl-project, it's been concluded that the path forward is to reinforce that the PKCS12 functions really should be given UTF-8 encoded strings, period, and that the application is responsible for ensuring this. Anything else cannot be expected to be standard compliant; this could be phrased as "undefined".

Furthermore, the OSSL_STORE API is new, and we are therefore free to define exactly what can be used through that API. In the same discussion, the conclusion was that we can simply require that the pass phrase input be UTF-8 encoded, and that any object that's made accessible through a URI to OSSL_STORE must be converted in such a way that the UTF-8 encoded passphrase gets you the expected result. That's simply a matter of unpacking the object with the old encoding and repacking it with the UTF-8 encoded pass phrase.

With all that, this PR has become moot, as far as I can see. However, since we make demands on the application, I thought it wise to try and avoid creating non-compliant PKCS#12 objects (thanks @dot-asm for the inspiration), see #6392.

@levitte levitte closed this Jun 5, 2018