OSSL_STORE 'file' loader: Try several pass phrase variants for PKCS#12 #6341
Conversation
Pass phrases for PKCS#12 objects are problematic: our conversion to BMPString was very naïve in OpenSSL <1.1.0, and while it is less so in OpenSSL >=1.1.0, it still doesn't account for the application locale. OSSL_STORE, on the other hand, is supposed to be fairly agnostic to the types of objects it loads, and thereby to all internal intricacies concerning pass phrases.
Therefore, we need to create a few variants of the given pass phrase when trying to open a PKCS#12 object, with the hope that this will work on PKCS#12 objects generated by OpenSSL <1.1.0 and OpenSSL >=1.1.0 as well as by other software. While doing this, we need to make sure the locale is set correctly. Fortunately, that's a simple setlocale(LC_ALL, "").
Ideas largely inspired by David Woodhouse
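The intended control flow can be sketched roughly as follows. This is a minimal illustration only: `try_pass_variants` and `pass_matches` are hypothetical names, and the plain byte comparison stands in for the real PKCS12_verify_mac()-style check; the UTF-8 re-encoding is simplified to the ASCII range for brevity.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Illustrative stand-in for verifying a candidate pass phrase against
 * a PKCS#12 MAC: here we just compare against the byte sequence the
 * object was notionally created with.
 */
static int pass_matches(const char *candidate, size_t len,
                        const char *stored, size_t stored_len)
{
    return len == stored_len && memcmp(candidate, stored, len) == 0;
}

/*
 * Variant 1: the bytes exactly as given.
 * Variant 2: bytes decoded per the current locale (mbstowcs), then
 * re-encoded as UTF-8.  For brevity this sketch only re-encodes code
 * points below 0x80; the real conversion handles the full range.
 * If mbstowcs() fails (the bytes don't agree with the locale), the
 * variant is simply skipped rather than treated as a fatal error.
 */
static int try_pass_variants(const char *pass,
                             const char *stored, size_t stored_len)
{
    wchar_t wpass[256];
    char utf8[256];
    size_t wlen, i;

    if (pass_matches(pass, strlen(pass), stored, stored_len))
        return 1;                               /* variant 1 */

    wlen = mbstowcs(wpass, pass, 256);
    if (wlen != (size_t)-1) {                   /* variant 2 */
        for (i = 0; i < wlen && wpass[i] < 0x80; i++)
            utf8[i] = (char)wpass[i];
        if (i == wlen && pass_matches(utf8, wlen, stored, stored_len))
            return 1;
    }
    return 0;
}
```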
@dwmw2, if you wouldn't mind having a look at this.
Note to everyone: this is an attempt to "do the right thing", and I'm sorry this came in so late.
 * is therefore harmless.
 */
utf8_count = 0;
for (i = 0; i < wpass_len; i++)
What happens if (despite it not being strictly permitted) someone used UTF-16 and characters outside the BMP? On Windows I think wchar_t is 16-bit and mbstowcs() is actually going to give you UTF-16, and it'll all work out OK. On other platforms you're actually going to get values >= 0x10000 in wpass[i], and that's not going to work out so well.
I'm not sure I care.
Weeellllll, have you seen what our PKCS#12 routines do with >0x10000 pass phrase characters? With regards to that, I think this does the right thing, as far as the platform allows.
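For reference, handling values >= 0x10000 when re-encoding wide characters as UTF-8 is straightforward on platforms with a 32-bit wchar_t. A sketch (hypothetical helper, not the PR's code) that covers the full range and rejects lone surrogates:

```c
#include <stddef.h>

/*
 * Encode one wide character as UTF-8, handling code points up to
 * U+10FFFF (i.e. including values >= 0x10000, which can appear in a
 * 32-bit wchar_t).  Returns the number of bytes written, or 0 for an
 * invalid code point.
 */
static size_t wc_to_utf8(unsigned long wc, unsigned char *out)
{
    if (wc < 0x80) {
        out[0] = (unsigned char)wc;
        return 1;
    }
    if (wc < 0x800) {
        out[0] = (unsigned char)(0xC0 | (wc >> 6));
        out[1] = (unsigned char)(0x80 | (wc & 0x3F));
        return 2;
    }
    if (wc < 0x10000) {
        if (wc >= 0xD800 && wc <= 0xDFFF)
            return 0;               /* lone surrogate: not a character */
        out[0] = (unsigned char)(0xE0 | (wc >> 12));
        out[1] = (unsigned char)(0x80 | ((wc >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (wc & 0x3F));
        return 3;
    }
    if (wc <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (wc >> 18));
        out[1] = (unsigned char)(0x80 | ((wc >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((wc >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (wc & 0x3F));
        return 4;
    }
    return 0;
}
```

The torture-test characters mentioned later in the thread (U+0102 U+017B) both land in the two-byte range; the interesting cases for this discussion are the four-byte range and the rejected surrogates.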
crypto/store/loader_file.c
unsigned char *pass_utf8 = NULL;
size_t utf8_count, i, j;

if ((wpass_len = mbstowcs(wpass, (char *)tpass,
On Windows, does mbstowcs convert from the code page used for file I/O, or the code page used for console I/O?
... and which of those are you actually getting from the default PEM passphrase callback that OpenSSL provides? Both with and without the WIN32_UTF8 definition, or whatever that is...
Let's make sure that it's going to do the right thing.
I'll have another think on that.
mbstowcs converts according to the locale that's set by the application. https://msdn.microsoft.com/en-us/library/x99tb11d.aspx has this to say about the empty locale string:
setlocale( LC_ALL, "" );
Sets the locale to the default, which is the user-default ANSI code page obtained from the operating system.
The default PEM callback gives exactly what it got from the built-in UI method, i.e. the exact characters it got fed, except on Windows with OPENSSL_WIN32_UTF8 defined, where whatever was given by the user is converted to UTF-8
On Windows
Note that MSVC doesn't define __STDC_VERSION__, so this code isn't even compiled. But apart from that: let's say you recommend applications call setlocale(LC_ALL,""). What does Windows make out of ""? Would it at any point try to interpret the string as UTF-8? I suppose you can guess that the answer is no. "" means "use the system locale/code page", and it's never UTF-8, at least not on my computer. And what happens if you pass the notorious 0xC3 0xAF? The same thing as in OpenSSL pre-1.1.0... Again, on my [Windows] computer. I suppose a computer with a more exotic default code page would do it differently... But let's say I call setlocale(LC_ALL,"foo_bar"). Literally. Windows seems to accept pretty much anything with an underscore in between, and it apparently folds the sequence to 0xEF as if it were a UTF-8 locale. But if you hit a meaningful string, e.g. "Greek_Greece", then you get a different reading. And if you aim for the example from passphrase-encoding, you should pass "Hungarian_Hungary" to setlocale...
does mbstowcs convert from the code page used for file I/O, or the code page used for console I/O?
It's a strange question. I mean, how is a code page used for file I/O? The application just reads data from a file as a sequence of bytes. That's it.
This is possibly a case where we should check for OPENSSL_WIN32_UTF8 here as well, and simply assume that we can skip over the mbstowcs call and the following wide-to-UTF-8 conversion in that case... does that sound right to you? (also assuming we find the right checks to get this code compiled on Windows to begin with)
I'm sorry, this suggestion is silly, it's simply doing nothing about the pass phrase, which has already been covered by variant 1
Now, whether or not you hit the correct password without OPENSSL_WIN32_UTF8 depends on whether or not the application called setlocale(LC_ALL,"") (provided the code is compiled on Windows). Yes, you suggest in this request that the openssl binary does it. But since we provide a library, it may end up linked with applications we have no control over...
Yes, it's true that this requires the cooperation of the application in that manner. Do you have a suggestion as to what to do about it? Should we setlocale(LC_ALL, "") internally?
(I'm off to bed, btw... be back tomorrow)
Not calling it is equivalent to setlocale(LC_ALL, "C"), and with that, mbstowcs fails on anything non-ASCII, at least on Linux (I assume the same goes for Windows).
No. See above. Windows is happy to mbstowcs with "C" locale.
mbstowcs will translate the string to wide chars if it can.
Again, see above, Windows mbstowcs will always translate to something, because setlocale(LC_ALL,"") always sets the "ANSI" locale.
without having the code page 65001
65001 is not a code page you can set as the system one. There is no code page you can set that would make setlocale(LC_ALL,"") make mbstowcs interpret input as UTF-8. Though I might be wrong about this; it might happen that Asian code pages could... You can call setlocale(LC_ALL,"some-non-empty-string") in a way that makes mbstowcs interpret input as UTF-8, but as long as it's setlocale(LC_ALL,""), it's never UTF-8. [Again, I'm not sure how things are on Asian Windows versions.]
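The locale dependence under discussion is easy to demonstrate portably only for ASCII. A small sketch (hypothetical helper, not OpenSSL code): under the default "C" locale, ASCII converts one byte per wide char everywhere, while what happens to bytes >= 0x80 is exactly the platform divergence described above, so it is only noted in a comment rather than asserted.

```c
#include <locale.h>
#include <stdlib.h>

/*
 * Widen a byte string under the "C" locale, i.e. what an application
 * gets when it never calls setlocale().  ASCII succeeds identically
 * on every platform.  For bytes >= 0x80 behaviour diverges: glibc
 * fails with (size_t)-1 (EILSEQ), while the Windows CRT maps the
 * bytes through the ANSI code page and "succeeds".
 */
static int c_locale_widen_len(const char *bytes, size_t *lenp)
{
    wchar_t wbuf[64];
    size_t n;

    setlocale(LC_ALL, "C");
    n = mbstowcs(wbuf, bytes, 64);
    if (n == (size_t)-1)
        return 0;           /* bytes don't fit the locale */
    *lenp = n;
    return 1;
}
```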
I think we got a bit lost in this discussion, so let's get things a bit more straight.
The main issue is whether the application called setlocale or not, and if the given locale matches the encoding of the pass phrase, right? This isn't unique to Windows, BTW, but I think we got that already. In the Windows case, would it be better to call MultiByteToWideChar with CP_ACP? I have no issue doing that if that seems more feasible.
 * read containers protected with correctly encoded pass
 * phrases (i.e. PKCS#12 files from other software as well)
 */
{
This could perhaps be conditional on the locale charset not being UTF-8?
No. I count on mbstowcs to do normalization, and worst case scenario, things get converted back to the same UTF-8, i.e. no harm done. I'd rather avoid adding on to the pile of checks and possible corner cases.
 * the corner case when the pass phrase byte sequence might
 * otherwise be mistaken for a legitimate UTF-8 string.
 */
{
And this one could be conditional on... er ... the locale charset not being UTF-8 and the string actually having chars outside the ASCII set?
Or not being ISO8859-1. More caffeine required. Either way it's probably harmless as you only get here in the failure case.
If it's ASCII, the resulting string will be exactly the same as the original. Like you said, this is the last fallback, the pure ASCII gets caught by the first case anyway, so I see no real reason to make extra checks.
Going back a bit...
The default PEM callback gives exactly what it got from the built-in UI method, i.e. the exact characters it got fed, except on Windows with OPENSSL_WIN32_UTF8 defined, where whatever was given by the user is converted to UTF-8
No. The PEM callback doesn't give you characters. It gives you bytes. Octets. chars, if you must. But they are not characters, in this sense. To interpret those bytes as characters you must know which encoding they use. It might be UTF-8. It might be something else. On Windows I believe it'll be UTF-8 if OPENSSL_WIN32_UTF8 is defined, and will be the ANSI code page (er, or is it the console code page, which is different?) if OPENSSL_WIN32_UTF8 is not defined?
Yes, sorry, bytes. Got mixed up in the terminology
Looks sane to me; thanks. Wants torture test cases, of course. I have some for OpenConnect (passwords of U+0102 U+017B which I try to open ∀ LC_CTYPE, etc.) which you're welcome to lift.
We probably want to do something similar for PKCS#8 and encrypted PKCS#1 files, right? At least try the locale charset and the UTF-8 conversion, although some of the PKCS#12 complexity is not there.
I'm not going to do anything about PKCS#5 (which is what you mean when you mention PKCS#1 and PKCS#8), because that standard sees the pass phrase as a byte sequence with no inherent meaning, i.e. it can literally be any array of bytes. Limiting it to any given charset is wrong, and would potentially cause more problems than it solves.
Actually, let me take some time to think again about PKCS#5... My brain is still ruminating on that one....
IIRC PKCS#5 suggests that ASCII or UTF-8 are good character sets to use, although it lamentably falls short of even saying SHOULD. You've seen that we're working on a new I-D which addresses that. If the native charset doesn't work then converting to UTF-8 would certainly be a useful thing to try.
@@ -76,6 +77,8 @@ static void calculate_columns(DISPLAY_COLUMNS *dc)

 static int apps_startup(void)
 {
     setlocale(LC_ALL, "");
The problem with this, of course, is that you're suggesting that all library applications should do that. What if an application doesn't? Would libcrypto still work? And how?
If an application doesn't do this, you get the "C" locale, according to docs (linux man page as well as https://msdn.microsoft.com/en-us/library/x99tb11d.aspx). Linux man page says this:
On startup of the main program, the portable "C" locale is selected as default.
The only thing in libcrypto that's directly affected by this, as far as I can tell, is the mbstowcs call that's added by this PR. Can you think of anything else? I can't recall that we use anything else that depends on application locale...
Well, the key question was actually "how" it would work. And not in the sense of "good" vs. "bad", but whether or not it would produce varying results depending on whether or not the application called setlocale(LC_ALL,""). Should that be a concern, or would we be better off if its behaviour was invariant?
If the input string to mbstowcs doesn't agree with the locale, mbstowcs will return (size_t)-1, so using that variant will simply be invalidated... however, it currently bails out when this happens; that's probably not the right move...
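A sketch of that alternative (hypothetical helper, not the PR's code): a failed conversion invalidates just that one variant instead of aborting the whole load.

```c
#include <stdlib.h>

/*
 * Build the locale-converted variant of the pass phrase, if possible.
 * Returns the number of wide chars on success, or 0 if mbstowcs()
 * rejects the bytes, in which case the caller simply moves on to the
 * next variant rather than reporting a fatal error.  (An empty pass
 * phrase also yields 0; real code would distinguish the two cases.)
 */
static size_t build_locale_variant(const char *pass,
                                   wchar_t *wpass, size_t wpass_size)
{
    size_t wlen = mbstowcs(wpass, pass, wpass_size);

    if (wlen == (size_t)-1)
        return 0;           /* variant invalidated, not an error */
    return wlen;
}
```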
Just note that calling setlocale() in libcrypto would definitely NOT be the right thing to do as it is not thread safe and some applications might want to set different locale. It is probably obvious but I just wanted to explicitly state that.
The remaining question is whether it should assume that the application did. [I for one would answer "no, it should not."]
Ok, so
- How do you detect that? Simply assume that if the current locale is "C", the application didn't set locale?
- What should be done if we determine the application didn't set it?
(BTW, I don't quite agree with you; I think it is reasonable to expect the locale to be correctly set, and to fail variant 2 if it isn't, i.e. if mbstowcs fails)
I think it is reasonable to expect the locale to be correctly set
To "" or "some-other-string"? Here is a fun Linux-specific question. What if the non-UTF-8 locales for the CLI application and the xterm are different? Or one is UTF-8 and the other is not...
For xterm? That might give an ugly window title... I assume that you meant the shell running in the xterm. Either way, if whatever you use has a locale that's different from your input, or different from what your font will display if that applies, then you're going to see unexpected things happen (what you wrote being misinterpreted, or weird characters being displayed).
I cannot judge what an application uses for a locale, as long as text strings are encoded to match. If someone screws up their locale, all bets are off in my book.
(and of course, that makes UTF-8 darn practical)
I cannot judge what an application uses for a locale, as long as text strings are encoded to match.
Exactly my point. And in my book it translates to "I don't want to be dependent on the application to call setlocale, be it with "" or "some-non-empty-string"." All you can do is assume that it's all sorted out by the application and/or the user [choosing an adequate input method], and act in a manner that is not dependent on what the application code did [or didn't do]. To be more specific: don't call mbstowcs :-)
@dwmw2, I'll quote PKCS#5 again:
The question that goes through my head is if OSSL_STORE should be regarded as an application or not. Regarding it as an application seems weird to me, considering it's very much a library API, but I'm not gonna pretend that my opinion is the absolute truth.
Let's assume for the sake of argument that OSSL_STORE is not an application — or more to the point, should not behave by default in the ways which make sense for an application. Now, let's realise that you've just doomed all the actual applications to tweaking and working around its "non-application" behaviour in order to Do The Right Thing. And further realise that most of them won't, and will continue to do the wrong thing. Then let's revisit the assumption at the top, and suggest that actually, OSSL_STORE should make it really easy for applications to Do The Right Thing. Back to the question at hand.... we have a password which, pedantry aside, is fairly much known to be a string and it's probably in the local charset. We have established that it doesn't unlock the file. Why would we not want to attempt conversion to UTF-8 and try that to see if it works? Or put another way, why would we want to force applications to have to do that additional step?
Considering all the
This does sound compelling... "what's the harm in doing this in addition to using the bytes straight as they are?" I'm unsure, and that tells me I need to sleep on it.
I'd like [to] argue that [as far as compatibility with 1.0.x goes] the second variant is effectively redundant. Allow me to illustrate with examples.
We are on Linux. Imagine a UTF-8 input method that was used with OpenSSL 1.0.x to enter a Hungarian passphrase. Now you want to access the object with OpenSSL 1.1.x.
- If you use a UTF-8 input method and call setlocale, the second variant is essentially a no-op, as it would yield the same output as input, which has already been exercised by the 1st method(*).
- If you use a UTF-8 input method and don't call setlocale, mbstowcs fails, and the variant is not exercised.
- If you use a non-UTF-8 Hungarian input method and call setlocale, the 2nd method will yield the same password you would have entered with a UTF-8 input method, and it's not the right one, not for an object created with 1.0.x.
- If you use a non-UTF-8 Hungarian input method and don't call setlocale, mbstowcs fails, and the variant is not exercised.
We are still on Linux. Imagine a non-UTF-8 Hungarian input method that was used with OpenSSL 1.0.x to enter a Hungarian passphrase, one that can be misinterpreted if attempted as UTF-8. Now you want to access the object with OpenSSL 1.1.x.
- If you use a UTF-8 input method, none of the three variants would actually do the job. You have no option but to use a non-UTF-8 input method.
- If you use a non-UTF-8 Hungarian input method and call setlocale, the 2nd method will yield the same password you would have entered with a UTF-8 input method, and it's not the right one, not for an object created with 1.0.x.
- If you use a non-UTF-8 Hungarian input method and don't call setlocale, mbstowcs fails, and the variant is not exercised.
In other words, in no combination does the second method do anything that hasn't been tried already, or give the right answer.
(*) To be completely accurate, the 1st call to PKCS12_verify_mac would actually succeed, because it does fall back to the equivalent of the 3rd method internally.
Now let's take Windows. In 1.0.x for Windows, a UTF-8 input method was not an option, so as long as we're talking about backward compatibility for Hungarian passphrases, we should only consider a non-UTF-8 Hungarian input method used to type the password to 1.0.x. Now we want to access it with 1.1.x.
- If the user sets OPENSSL_WIN32_UTF8, no password is right, regardless of whether or not the application called setlocale.
- If the user does not set OPENSSL_WIN32_UTF8 and the application calls setlocale, the 2nd method will yield the same password you would have entered with a UTF-8 input method, and it's not the right one.
- If the application doesn't call setlocale, the 2nd method will yield the same password as the 3rd method.
Even here the 2nd method doesn't offer any advantage.
Do note that the message prior to the last one had "as far as compatibility with 1.0.x goes" in its opening sentence. This is because there is a case where the 2nd method is kind of meaningful. But[!] it has nothing to do with backward compatibility. Imagine a standard-compliant PKCS#12 object protected with a Hungarian passphrase. What does it take to read it with 1.1.x? A UTF-8 input method would always do. The only question is what happens if the user uses a non-UTF-8 [Hungarian] input method. That's where the 2nd method can become handy. It will translate the passphrase to UTF-8, and it will be the right passphrase. Now the question is whether this convenience feature is justified. The thing is that it will yield the correct UTF-8 passphrase only if the application called setlocale. So it all boils down to the following. Do we simply recommend users to use a UTF-8 input method with standard-compliant PKCS#12 objects? Or do we tell application developers to call setlocale? I for one would say that the former is sufficient. The overwhelming majority of Linux/Unix users are using a UTF-8 input method, so they wouldn't even notice the problem. Now one can still argue that Windows users would tend to forget to set the OPENSSL_WIN32_UTF8 variable. All right, it might be appropriate to make life a bit easier for them. But would I use mbstowcs? No, I'd use MultiByteToWideChar with CP_ACP. Why? Because it would be independent of whether or not the application called setlocale.
Yes, you did come to the conclusion I hoped to see (what the main use of variant 2 is). I'll quote the comment that comes with it:
Maybe that comment needs clarification? Now, as for your suggestion that this exercise (variant 2) is necessary or not, I'll let @dwmw2 and you fight it out. Me, I'm thinking that if it serves some and is essentially a no-op to the rest, what's the harm? Re your suggestion to use
That is an uninteresting case. For backward compatibility we care about someone operating in a legacy 8-bit locale who is still operating in that same legacy 8-bit locale with an updated version of OpenSSL. But when we weren't making any attempt to deal with correct encodings, moving from one input encoding to another (e.g. taking my file created under ISO8859-1 and using it on an ISO8859-2 system, where my passphrase happens to have characters which are in both but not at the same codepoint) was never expected to work anyway. It's a red herring.
To summarize the three messages [of mine] above: I'd omit the 2nd method on Unix and replace mbstowcs with MultiByteToWideChar with CP_ACP on Windows. This way the library remains neutral to whether or not the application called setlocale. You may say "but doesn't it limit the user's ability?" No. The user is in control of the input method, so the set of tried combinations would be sufficient. It's just that some combinations are performed by the library and one by the user.
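That suggestion could look roughly like this (a sketch only; `widen_pass` is a hypothetical name, and the Windows branch illustrates the MultiByteToWideChar(CP_ACP, ...) idea, which does not depend on any setlocale call, while other platforms fall back to mbstowcs):

```c
#include <stdlib.h>
#ifdef _WIN32
# include <windows.h>
#endif

/*
 * Widen the pass phrase bytes into wpass (capacity wpass_n wide
 * chars).  Returns the number of wide chars, or (size_t)-1 on
 * conversion failure.  On Windows, CP_ACP is used so the result is
 * independent of whether the application called setlocale().
 */
static size_t widen_pass(const char *tpass, wchar_t *wpass, size_t wpass_n)
{
#ifdef _WIN32
    int n = MultiByteToWideChar(CP_ACP, 0, tpass, -1, wpass, (int)wpass_n);

    return n > 0 ? (size_t)(n - 1) : (size_t)-1;  /* n counts the NUL */
#else
    return mbstowcs(wpass, tpass, wpass_n);
#endif
}
```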
It is clearly necessary. Sure, we can all come up with examples (like a plain ASCII password) where the first attempt will work. But for a PKCS#12 file with that pathological password example in an ISO8859-2 locale, there are incontrovertibly three different ways it could have been created: OpenSSL 1.0, OpenSSL 1.1, and correct. And we need to attempt all three.
Yeah. Perhaps with OPENSSL_WIN32_UTF8 we should actually be calling
If the user doesn't set
I think using
That's effectively what happens in
# else
        wpass_len = mbstowcs(wpass, (char *)tpass,
                             OSSL_NELEM(wpass));
# endif
Just a ping to show that MultiByteToWideChar(CP_ACP, ... is now in place.
Taking this out of WIP.
Looking good. I think we just need to indicate that the passphrase input should be in the locale charset (to offset confusion about the ways in which we interpret it).
To clarify: I don't view it as unreasonable to say "pass UTF-8 to libcrypto". It's not unreasonable to put the burden of conversion to UTF-8 on the application (or the burden of choosing a UTF-8 input method on the user). It's really not that big a deal on contemporary Unices; in the overwhelming majority of cases it's actually "do nothing special." As for "ĂŻ" with an ISO8859-2 input method: it's correct that as it stands now, 1.1.0 will fail to produce a compliant PKCS#12 with a non-UTF-8 input method. It's just that so far(*) the assumption was that the user will use a UTF-8 input method. And the majority de facto does, and has been doing so for a while by now. So chances are that a UTF-8 method was used to enter "ĂŻ" already in 1.0.x(**). But if we aim to support even non-UTF-8 input methods, that is not impossible to resolve in the openssl command line tool. Note that I'm arguing against a dependency on setlocale in libcrypto, but it doesn't mean that it would be inappropriate to perform the conversion in apps/pkcs12.c. One can probably even argue that it would be reasonable to better support non-UTF-8 input methods, at least to facilitate "normalization" of legacy PKCS#12 objects. I mean, so far, in order to "normalize" an object created with 1.1.0 with a non-UTF-8 input method, the user would have to a) choose the original method and change the password to an ASCII-only pass phrase, then b) switch to a UTF-8 method and change back to the original pass phrase. One can argue that it should be possible to do this in one step. (*) Or rather upon the 1.1.0 release.
So let's see if I can summarize, if an application wants to support standard compliant PKCS#12 objects and non-UTF-8 input method when loading through
In terms of the burden on application writers and on users that have to handle non-compliant (in terms of pass phrase conversion into BMPString) PKCS#12 objects, I don't agree with you about what's reasonable. Regarding (**), I would believe that the vast majority type ASCII only in their pass phrases... which might be their way, incidentally, of "suffering in silence".
Side note: you're right to point out
I would too. My remark was about those who actually typed non-ASCII characters. I mean "everybody" was not "every OpenSSL user", but "everybody who used non-ASCII characters in a passphrase." The whole request is about them, and naturally so is the rest of this remark.
As for 2, the part about having the user convert: it is the desired direction; users should be encouraged to do that. As for the rest of 2: it would be required only if the application insists on using a non-UTF-8 input method with standard-compliant PKCS#12 objects. The question is, do we have to meet that demand? Application insists -> application fixes. Note that UTF-8 input methods are commonplace and customarily a matter of the end user's choice. And in the unlikely event a UTF-8 input method is not an option, well, such a system probably doesn't have mbstowcs either, and the application would have to come up with some custom solution anyway. Or simply adhere to non-standard PKCS#12... Yes, you can sense a contradiction. On one hand I'm saying that users should "normalize", and on the other hand I suggest that there might be situations when "non-normalized" could be an option. Well, life is not fair... But note that not making any assumptions about the locale in libcrypto aligns better with the situation when you're stuck with a non-UTF-8 input method, in the sense that it would be more appropriate to interfere as little as possible with what the application would have to do to pull it off.
But it doesn't mean that somebody else couldn't. I merely express my opinion. It's just that me not approving it is an [unfortunate] expression of that opinion, but what can I do :-( It's not like I don't appreciate the effort (as I do!) or the sentiment...
In this, you are wrong. VMS terminal drivers do not generate UTF-8, but there is locale support in DCL (you can set a locale, and you can compile a locale, but DCL itself isn't wide char aware, so editing command lines that happen to have UTF-8 characters is quirky), and the C RTL is fully C95 compliant in this regard, and it is perfectly possible to use mbstowcs. Basically, what locale and wide char support there is on VMS currently only applies to programs written in C. You can imagine that the lack of support in the native tools that run in a terminal session (DCL itself, various native editors such as EVE, .....) makes a UTF-8 input method not entirely desirable (as soon as there is one multibyte character on a line, the position of the cursor isn't trustable).
I've taken this to openssl-project@openssl.org (@dwmw2, are you listening to it?)
All right. I'm utterly surprised that they actually don't support UTF-8 as an input option while providing UTF-8 locale[s]. By this they kind of say "the application is on its own when it comes to UTF-8; we just let it make something out of it, it's not our business." And so the question would be: why should our message be different? One can also put it as follows: if the target platform doesn't support UTF-8 as an input option, would it be unreasonable for us to disclaim support for UTF-8 in general, and standard-compliant PKCS#12 in particular? Once again, all this is about users who actually type non-ASCII characters in pass phrases.
At one time, there were plenty of programs on Unix that didn't provide a UTF-8 input method, while UTF-8 locales did exist. xterm was one such program (this has now changed, but it took a while before that happened). VMS' native programs haven't received the same love and care for quite a few years, and VSI are currently working their collective asses off to catch up on quite a number of topics (I do hope this is one of them).
So it just will have to start working when they catch up... Once again, is it actually unreasonable for us to disclaim support if the vendor doesn't provide a UTF-8 input method? Note that it doesn't prevent an application from solving the problem...
@dot-asm I'm getting a bit lost. Can you be specific about what you're suggesting? The STORE API should be defined to take filenames etc. in the locale-encoding like all normal APIs do except that the passphrase (and the passphrase alone?) shall be in UTF-8? What does the default PEM callback do, in order to meet that requirement? Or do we stop providing a default? Or do we provide a default which silently does the wrong thing?
Errr, the STORE API takes URIs, and the exact syntax of a URI is well defined (percent encoding for anything non-ASCII), so I don't quite see how your argument holds there, @dwmw2 (but I should look over loader_file.c with regards to percent encoding). Otherwise, @dot-asm's message seems clear: the application should be made to provide UTF-8 encoded pass phrases, period; libcrypto shouldn't care how that happens, and the user should convert any object that has used a differently encoded pass phrase. (note that in a way, this complies with your desire for UTF-8 uniformity, just not quite the way Postel stipulated)
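As a reminder of why percent encoding sidesteps the charset question: any byte value can be escaped, so the URI itself stays pure ASCII regardless of what the file name's bytes mean. A minimal decoder in the spirit of RFC 3986 (illustrative only, not loader_file.c's code):

```c
#include <stddef.h>

/* Decode one hex digit, or return -1 if c is not a hex digit. */
static int hexval(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Percent-decode src into dst.  Returns the decoded length, or
 * (size_t)-1 on a malformed %-escape.  Any byte value can be
 * represented, so non-ASCII file names survive intact.
 */
static size_t percent_decode(const char *src, unsigned char *dst)
{
    size_t n = 0;

    while (*src != '\0') {
        if (*src == '%') {
            int hi = hexval(src[1]);
            int lo = hi >= 0 ? hexval(src[2]) : -1;

            if (lo < 0)
                return (size_t)-1;
            dst[n++] = (unsigned char)(hi * 16 + lo);
            src += 3;
        } else {
            dst[n++] = (unsigned char)*src++;
        }
    }
    return n;
}
```

For example, the "notorious" bytes 0xC3 0xAF discussed earlier travel through a URI as `%C3%AF` without any locale interpretation at all.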
Yeah, let's make test cases for that. Create a PKCS#12 object in a file called |
Listening only, it seems
|
Individuals can post; it’s moderated.
|
Right. Modulo the fact that libcrypto would still have to tolerate an application occasionally passing non-UTF-8 input, for backward compatibility with 1.0.x PKCS#12. Unfortunately :-( But IMHO libcrypto shouldn't "help" the application by performing operations that depend on a call to setlocale, i.e. calling mbstowcs, when accessing standard-compliant objects. One can put it like this: demand UTF-8 input, but tolerate non-UTF-8 for backward compatibility, not forward. |
The trouble is that the application cannot be expected to know what kind of object the UI_METHOD is asked to provide a pass phrase for. I have also fantasized about making a UI retriable, and thereby having the UI_METHOD serve more than one pass phrase variant. I haven't quite figured out the mechanics for that yet. That would be a fairly practical way that still puts the problem of variants (a problem that we have created, I might add) into the application's lap. I don't think that's very fair (because we created the problem in the first place), but if our app can constitute a good source to copy from, it's still a somewhat viable option. Yet another way to deal with this is to add a wchar_t only UI API and only use that from OSSL_STORE. Of course, we can then forget about non-compliant objects from openssl 1.0.x. |
We cannot lock ourselves in an ivory tower and pretend not to be affected by the locale character set. Every filename we pass to We have three choices: The problem with the last option is that we still have to do conversions back to the locale encoding, if we want to attempt to load PKCS#12 files created with OpenSSL <=1.0 (or even 1.1 in some cases). And choosing that because we "don't want to require I understand the desire to keep life simple for ourselves and not properly address this rather complicated issue. I think it's entirely the wrong choice though, and is very detrimental to our users. |
The project consensus seems to go toward requiring UTF-8 pass phrase encoding for the STORE API, and that any file that can't be loaded with that encoding has to be converted first. For PKCS#12, this has the benefit of pushing for standard compliant objects. It also means that the job of making sure there is a correctly encoded pass phrase depends entirely on the application, and only takes one attempt. What's needed in this case is a simple conversion tool, designed to handle three object types:

- legacy encrypted PEM
- PKCS#8
- PKCS#12
Did I forget anything? |
There seems to be an assumption here that wchar_t represents UTF-32 on most UNIX systems, like it (generally) does on Linux. It does not. On AIX, HP-UX, and Solaris, wchar_t is a locale-specific encoding. I just don't want someone to come along and see this thread, and think that using mbstowcs will convert to UTF-32 on other UNIX systems, because it won't. I really don't think the code in this PR does what the author intended on most UNIX systems. And really, the majority of the incorrect encodings for passphrases are coming from Windows, which would be expecting UTF-16 encoded passphrases, so even on Linux, the problem is not addressed appropriately. This first link is a little out of date, as several of the systems have more locales than listed, and have added more API support. The second link includes clear information regarding wchar_t on Solaris. https://www.tldp.org/HOWTO/Unicode-HOWTO-6.html |
Richard Levitte wrote:
The project consensus seems to go toward requiring UTF-8 pass phrase encoding for the STORE API, and that any file that can't be loaded with that encoding has to be converted first. For PKCS#12, this has the benefit to push for standard compliant objects. It also means that the job of making sure there is a correctly encoded pass phrase depends entirely on the application, and only takes one attempt.
What's needed in this case is a simple conversion tool, designed to handle three object types:
- legacy encrypted PEM
- PKCS#8
- PKCS#12
Did I forget anything?
Perhaps the "simple conversion tool" is already implemented.
History: switching an OS from an 8-bit locale to UTF-8. The process includes
(1) renaming files and (2) converting the content of text files. It is mostly
a manual process, although shell scripts can help with "batch" renaming.
For instance, in the past I used the following:
===========
#! /bin/sh
for F in ${1+"$@"}; do
    test -e "$F" || continue
    N=`echo "$F" | iconv -f ISO-8859-5 -t UTF-8`
    test "$F" = "$N" && continue
    mv -v "$F" "$N"
done
============
The process cannot be fully automatic, simply because another set of filenames
or content was in CP1251.
So why should openssl commands differ from the above manual scenario?
This issue is about the UI method (used to get passwords), but why
implement 'conversion' in the "Store API"?
I just tested passwords on PKCS#12 files using the KOI8-R codeset. openssl
v1.0.2o and current master work fine using the -passin file:... argument, and
I could recreate the PKCS#12 file using a password in the UTF-8 codeset given
on the command line (... pass:... ).
I think that users switching from an 8-bit locale to UTF-8 must perform a manual
password "conversion" on PKCS#12 and similar files.
Regards,
Roumen Petrov
|
Richard Levitte wrote:
The trouble is that the application cannot be expected to know what kind of object the UI_METHOD is asked to provide a pass phrase for.
This is true. Moreover, an application could launch another application
that prompts for a 'password'.
So either we tell application writers that they should loop around the whole loading cycle (open, load, close) with one pass phrase variant for each iteration, or we must accept one well specified variant.
As an application writer I don't expect to perform such conversions.
PKIX-SSH supports the OpenSSL "Store-API" to load PKCS#12 files. It uses its own
UI method that launches an ASKPASS program.
I don't know how many such programs exist in the world.
The common case is that the ASKPASS program runs in the X-Window locale, not in
the locale of the ssh command.
Remark: usually both match, but in some cases they are different (connections
to old systems).
I have also fantasized around making a UI retriable, and thereby have the UI_METHOD serve more than one pass phrase variant. Haven't quite figured out the mechanics for that yet. That would be a fairly practical way that still puts the problem of variants (a problem that *we* have created, I might add) into the application's lap. I don't think that's very fair (because *we* created the problem in the first place), but if our app can constitute a good source to copy from, it's still a somewhat viable option.
I'm not sure that the issue is with the openssl implementation. This is normal
evolution: first the 7-bit standard, then 8-bit locales, then Unicode locales.
Yet another way to deal with this is to add a wchar_t only UI API and only use that from OSSL_STORE. Of course, we can then forget about non-compliant objects from openssl 1.0.x.
I think that the current documentation about pass phrase management is enough; I
mean the PR for the password callback.
On modern OSes (POSIX 2008 compatible) the OpenSSL library could always
check the user locale (code snippet):
=============
locale_t loc;

loc = newlocale(LC_CTYPE_MASK, "", (locale_t) 0);
if (loc == (locale_t) 0) {
    perror("newlocale fail");
    return 1;
}
printf("CODESET=%s\n", nl_langinfo_l(CODESET, loc));
freelocale(loc);
=============
The above code bypasses the application settings (setlocale) and returns a value
according to the application's environment.
In X terminals the password (or user input) is closer to the environment of
the X Window System than to the application locale.
Perhaps, based on CODESET, the library could reject some inputs?
What about raising an error, when the UI is called to get a password for
PKCS#12 and similar files, if the codeset is not ANSI_X3.4-1968, UTF-8 or
ISO-8859-1?
Remark: the UI method could be used for other prompts, some confirmations
for instance.
Regards,
Roumen
|
So after some discussion on openssl-project, it's been concluded that the path forward is to reinforce that the PKCS12 functions really should be given UTF-8 encoded strings, period, and that the application is responsible for ensuring this. Anything else cannot be expected to be standard compliant. This could be phrased "undefined". Furthermore, the OSSL_STORE API is new, and we are therefore free to define exactly what can be used through that API. In the same discussion, the conclusion was that we can simply require that the pass phrase input be UTF-8 encoded, and that any object that's made accessible through a URI to OSSL_STORE must be converted in such a way that the UTF-8 encoded pass phrase gets you the expected result. That's simply a matter of unpacking the object with the old encoding and repacking it with the UTF-8 encoded pass phrase. With all that, this PR has become moot, as far as I can see. However, since we make demands on the application, I thought it wise to try and avoid creating non-compliant PKCS#12 objects (thanks @dot-asm for the inspiration), see #6392. |
Pass phrases for PKCS#12 objects are problematic, as our conversion to
BMPString was very naïve in OpenSSL <1.1.0, and while being less so in
OpenSSL >=1.1.0, it still doesn't account for the application locale.
OSSL_STORE, on the other hand, is supposed to be fairly agnostic to
the types of objects it loads, and thereby all internal intricacies
concerning pass phrases. Therefore, we need to create a few variants
of the given pass phrase when trying to open a PKCS#12 object, with
the hope that this will work on PKCS#12 objects generated by OpenSSL
<1.1.0, OpenSSL >=1.1.0, and other software.
While doing this, we need to make sure the locale is set correctly.
Fortunately, that's a simple setlocale(LC_ALL, "").
Ideas largely inspired by @dwmw2