hmac.secure_compare() leaks information about length of strings #59266
Comments
The secure_compare() function immediately returns False when the two strings differ in length. With the patch, the run time of secure_compare() depends only on the length of the right-hand side, so it no longer gives away information about the length of the left-hand side. The patch should be applied in combination with the patch in issue bpo-14955. |
secure_compare() always leaks the password. Note that ord() takes a different amount of time to produce its result depending on whether the value is <= 100 or > 100, due to the caching of small integers. Such functions should be written in C. |
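For illustration, a minimal C sketch of the kind of branch-free routine being asked for here: no early exit, and a run time that depends only on the length of the right-hand side, as the patch described above does. The name constant_time_eq and the signature are illustrative, not taken from any attached patch.

```c
#include <stddef.h>

/* Branch-free comparison: no early return, no data-dependent branches.
 * The run time depends only on len_b (the right-hand, non-secret side). */
static int
constant_time_eq(const unsigned char *a, size_t len_a,
                 const unsigned char *b, size_t len_b)
{
    /* On a length mismatch, compare b against itself so the loop still
     * runs len_b times, and force the result to "not equal". */
    const unsigned char *left = (len_a == len_b) ? a : b;
    unsigned char diff = (len_a == len_b) ? 0 : 1;
    size_t i;

    for (i = 0; i < len_b; i++) {
        diff |= left[i] ^ b[i];
    }
    return diff == 0;
}
```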
We should. Adding secure functions that aren't really secure is something we should rather avoid. :) Christian, are you willing to do that? |
fijal: while I agree with you, the limit for small ints has actually been pushed up to 257 in recent CPythons, so it should still theoretically work --- of course, assuming a predictable CPU (which is a wrong assumption) and a simple interpreter. (We can probably dig enough to find a timing issue even with CPython, and of course it's clear with any of the other Python interpreters out there.) |
I don't see how the function is going to leak this information when both this patch and the patch in bpo-14955 are applied. With http://bugs.python.org/file25801/secure-compare-fix-v2.patch, ord() is no longer used, which avoids the timing difference for integers > 256 (NSMALLPOSINTS is defined as 257, not 100). |
Ah, unicode strings. Is encode('unicode-internal') independent of the string's characters? I heavily doubt it. You leak at least some information through that function alone. |
With PEP 393, unicode objects can have several representations, which makes it unlikely that *really* constant-time functions can be devised. As for this particular patch, I don't understand the point: secure_compare() is obviously meant to be used on inputs of some known length, not arbitrary data. |
IMHO it's not obvious to all users. Better safe than sorry. ;) The invariant 'known and equal length' imposes an artificial limitation. Code may need to compare outside data with internal data without exposing too many details about the structure of the internal data. The other matter should be discussed at bpo-14955. |
I don’t want to be the killjoy, but I find it highly questionable to add a function that is advertised as "secure" while we can't fully grok the complexities at play. If we can't produce a provably secure one, we should scrub the function for good, or at least rename it somehow. Unjustified perceived security is IMHO the worst possible case. |
The function is probably secure (modulo unseen bugs) in the bytes case. For unicode strings things are a bit trickier, though. Again, a C version would help. |
Antoine, seriously? You want to explore a function that's called "secure" when the only thing you know about it is "probably secure"? This is extremely tricky business and I think it should be called secure only if you can prove it's secure. Otherwise it's plain insecure and should not be named that. |
export not explore. Why can't I edit my own post? |
What's the methodology to "prove" that it's secure? We could rename "secure" to "safe" to tone it down a bit, but it's still |
For unicode, at the very least, it's not an improvement at all; with the mentioned patch that does encode, it's also not an improvement. Prove as in: reason about the function in C and make sure it does not do any conditionals depending on the input data. This is much easier than it is in Python. We did this exercise for PyPy once, just for the sake of it: we looked at the generated IR and made sure the comparison was not leaking any data. As far as the function goes right now, I don't know. Following the entire code of long_bitwise is a lot of effort; I genuinely can't say that it'll behave the same for all numbers 0-255. Can you? It's simply easier with a low-level language. (And yes, this is one of the few cases where I would argue it makes sense to implement something in C :) |
I recommend reverting the addition of the function, given that it can't be made secure. |
I've two suggestions: rename the function to total_compare(), and restrict it to bytes. |
Hi Christian. It's either secure or it's not. If it's not, there is no point in introducing it at all, as I don't think it's a good idea to have kind-of-secure-but-I-don't-know functions in the stdlib. If you restrict input to bytes it looks ok-ish, but I looked at all the code that's invoked on the C side and it's quite a lot of code. Did you or anyone else actually go and review all the C code that's called via the various operations, to check whether or not it depends on the values of the characters? I can't tell myself; it's too long. |
I don't think that's true. By that reasoning, Python is not secure so there's no point in fixing crashes or providing a hashlib module. That said, I think renaming to "total_compare" isn't really helpful. The point of the function is to be secure (as much as possible), not to do a "total" comparison. |
Maciej, please read http://mjg59.dreamwidth.org/13061.html

"Secure" vs "not secure" is not a binary state - it's about making attacks progressively more difficult. Something that is secure against a casual script kiddie scatter-gunning attacks on various sites with an automated script won't stand up to a systematic attack from a motivated attacker (also see the reporting on Flame and Stuxnet for what a *really* motivated and well-resourced attacker can achieve). The hash randomisation changes didn't make Python completely secure against hashing DoS attacks - they just made them harder, by requiring attackers to figure out the hashing seed for the currently running process first. It's protecting against scatter-gun attacks, not targeted ones.

Performing a timing attack on Python's default short-circuiting comparison operation is *relatively easy* because the timing variations can be so large (send strings of increasing length until the time stops increasing - now you know the target digest length. Then try various initial characters until the time starts increasing again - now you know the first character. Repeat for the last character, then start with the second character and work through the string. Now you have the target hash, which you can try to crack offline at your leisure). The new comparison function is designed to significantly *reduce* the variance, thus leaking *less* information about the target hash and making the attack *harder* (albeit probably still not impossible).

I agree with Christian's last two suggestions: change the name to total_compare, and only allow use on byte sequences (where the integer values are certain to be cached). Nothing should ever be called "safe" or "secure" in the standard library, because the immediate question from anyone who knows what they're talking about is "secure/safe against what level of threat and what kind of threat?" People who *don't* know what they're doing will think "secure/safe against everything", and that's dangerously misleading.

Improving protection against remote timing attacks (e.g. by reducing comparison timing variance to the point where it is small relative to network timing variance) is a *lot* easier than protecting against local timing attacks. Protecting against someone with physical access to the machine hosting the target hash is harder still. Regardless, the target needs to be *improving the status quo*. Being able to tell people "using hmac.total_compare will make you less vulnerable to timing attacks than using ordinary short-circuiting comparisons" is a *good thing*. We just need to be careful not to oversell it as making you *immune* to timing attacks. |
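For illustration, a self-contained C sketch of the attack outline above, run against a toy in-process oracle rather than a real network service. The secret value, the timed_compare() stand-in and the use of "characters examined" as a proxy for elapsed time are all assumptions made for the example; a real attacker would have to average many noisy network round trips per guess, and would first have to discover the digest length as described above.

```c
#include <stdio.h>
#include <string.h>

/* Toy stand-in for the remote check: a short-circuiting comparison against
 * a fixed secret.  "Time" is modelled as the number of characters examined. */
static const char SECRET[] = "3f7a9c04d1be58e2";

static double
timed_compare(const char *guess, size_t len)
{
    size_t i;
    for (i = 0; i < len && i < sizeof(SECRET) - 1; i++) {
        if (guess[i] != SECRET[i])
            break;              /* the early exit is the leak */
    }
    return (double)(i + 1);     /* proxy for elapsed comparison time */
}

int
main(void)
{
    static const char hexdigits[] = "0123456789abcdef";
    size_t digest_len = sizeof(SECRET) - 1;   /* assume length already found */
    char guess[64];

    memset(guess, 'a', digest_len);
    guess[digest_len] = '\0';

    /* Recover one character at a time: the candidate that makes the
     * comparison take longest matches the longest prefix of the secret. */
    for (size_t pos = 0; pos < digest_len; pos++) {
        char best = hexdigits[0];
        double best_time = -1.0;
        for (size_t d = 0; d < sizeof(hexdigits) - 1; d++) {
            guess[pos] = hexdigits[d];
            double t = timed_compare(guess, digest_len);
            if (t > best_time) {
                best_time = t;
                best = hexdigits[d];
            }
        }
        guess[pos] = best;   /* lock in the best-scoring character */
    }

    printf("recovered: %s\n", guess);
    printf("secret:    %s\n", SECRET);
    return 0;
}
```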
The problem here is that _if_ you add "secure" to the name of a method, it becomes binary, at least in the minds of the users. I know you address that, but that's the main point here.
Why not write a C function, which can be more secure than Python code? I would argue it would be a general asset for the stdlib, not just for HMAC (therefore, I'd put it elsewhere). |
The proper statement is "It's either time-independent or it's not". |
No, it's not. It's a *bad thing*. The two issues that have been raised here are real. If nobody else does, I'll revert the addition before the beta. |
Hi. This is what we did with Armin: http://bpaste.net/show/32123/ It seems there is still *some* information leaking via side channels, although it's a bit unclear what. Feel free to play with it (try swapping the arguments, using different objects, etc.). |
I've attached a header file that implements a single C function, timingsafe_eq(a, b). The file is targeted for Objects/stringlib/timingsafe.h. Please review it.
Open question: where should I place the function? hashlib would be a nice place, but there are multiple backends for hashlib, and _hashopenssl.c seems wrong. |
stringlib is for type-generic functions, so I don't think it should be placed there.
We could handle all bytes-compatible objects, using the buffer API.
It looks ok to me.
Well, if it's meant to be a private function called from hmac, where it lives doesn't matter much. |
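For illustration, a sketch of how the buffer-API idea suggested a couple of comments up could look at the CPython level: accept any two bytes-like objects with "y*", compare the flat byte contents with a branch-free loop, and release the views. The name timingsafe_eq echoes the attached header; everything else (argument parsing, layout) is illustrative rather than the actual patch.

```c
#include <Python.h>

/* Branch-free inner loop over raw bytes; run time depends only on len_b. */
static int
tscmp(const unsigned char *a, Py_ssize_t len_a,
      const unsigned char *b, Py_ssize_t len_b)
{
    const unsigned char *left = (len_a == len_b) ? a : b;
    unsigned char diff = (len_a == len_b) ? 0 : 1;
    Py_ssize_t i;

    for (i = 0; i < len_b; i++)
        diff |= left[i] ^ b[i];
    return diff == 0;
}

/* Accept any two bytes-like objects via the buffer protocol ("y*"),
 * compare their contents, release the views, return a bool. */
static PyObject *
timingsafe_eq(PyObject *module, PyObject *args)
{
    Py_buffer va, vb;
    int equal;

    if (!PyArg_ParseTuple(args, "y*y*:timingsafe_eq", &va, &vb))
        return NULL;
    equal = tscmp((const unsigned char *)va.buf, va.len,
                  (const unsigned char *)vb.buf, vb.len);
    PyBuffer_Release(&va);
    PyBuffer_Release(&vb);
    return PyBool_FromLong(equal);
}
```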
It is timing unsafe.
The user can just do timingsafe_eq(a.decode('ascii'), b.decode('ascii')). About the code: instead of (PyBytes_CheckExact(a) && PyBytes_CheckExact(b)) you could use a less strict check that also accepts bytes subclasses. |
You mean .encode()?
I agree.
What's the difference? They are the same. |
Yes, of course. timingsafe_eq(a.encode('ascii'), b.encode('ascii')).
Laziness. If "a" (a secret key) is not bytes, then PyBytes_CheckExact(b) is never even evaluated. |
How so?
I don't think that's the right answer, because people will instead e.g. |
I'm a bit rusty and I hope I got it right. The ASCII unicode case is a good idea and IMO timing safe. The buffer path is also timing safe once I have both views. The function leaks some timing information when an error occurs; since that timing reveals only minimal information about the involved types and none about the bytes, it's IMO safe. Acquiring the buffer views may leak an unknown amount of timing data, which may be an issue; the comparison itself is still safe. I've introduced a new module _hashlibfb (fb = fallback) for systems without openssl. I'm also open to a completely new module for future implementations of other digests, key derivation (PBKDF2) and password-related C code. |
The patch has another flaw: the compiler may choose to fold and optimize the code in _tscmp(). I'm going to declare the length of the right side and both char* pointers as volatile; that should stop any compiler. I could also add optimization-control pragmas for MSVC and for GCC 4.4+. |
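For illustration, a sketch of the volatile-hardened inner loop described above, assuming the two buffers already have equal length. Whether volatile is actually enough to stop a given compiler from folding or vectorising the loop is exactly the open question here; this is a sketch of the proposal, not a guarantee.

```c
#include <stddef.h>

/* Same inner loop, with volatile qualifiers on the data pointers, the
 * length and the accumulator to discourage the compiler from
 * short-circuiting or otherwise optimising the loop in a data-dependent
 * way.  Assumes equal-length inputs. */
static int
tscmp_volatile(const void *a, const void *b, size_t length)
{
    volatile const unsigned char *left  = (volatile const unsigned char *)a;
    volatile const unsigned char *right = (volatile const unsigned char *)b;
    volatile size_t len = length;
    volatile unsigned char result = 0;
    size_t i;

    for (i = 0; i < len; i++)
        result |= left[i] ^ right[i];
    return result == 0;
}
```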
I checked myself, and I see that most likely I was wrong. At least for
And what of that? It happens outside of the timingsafe_eq function. The error will be that code works for a developer from the ASCII world and then breaks on non-ASCII data. |
No, the fastest way is to check the kind attribute of the unicode object in C code. That doesn't involve any additional conversion or Python function call. The function is deliberately limited. The ASCII fallback is very useful, as most people will store the hex-encoded bytes of their passphrases in their databases; with ASCII support you can do timingsafe_compare(hex_from_db, hmac.hexdigest()). Maciej: the C function is one order of magnitude faster and the spread is one order of magnitude smaller. 1e-07 is within the noise level on my idle computer; a busy server will have a higher noise level. |
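For illustration, a sketch of the "check the kind attribute" approach: accept a str only when its PEP 393 representation is plain ASCII and hand back its 1-byte data directly, with no encode() call and no allocation. The helper name get_ascii_data is illustrative and error handling is left to the caller.

```c
#include <Python.h>

/* Return 1 and fill (data, len) if s is an ASCII-only str whose 1-byte
 * representation can be compared like a byte string; return 0 otherwise. */
static int
get_ascii_data(PyObject *s, const unsigned char **data, Py_ssize_t *len)
{
    if (!PyUnicode_Check(s))
        return 0;
    if (PyUnicode_READY(s) < 0)        /* ensure the compact form exists */
        return 0;
    if (!PyUnicode_IS_ASCII(s))        /* reject anything non-ASCII */
        return 0;
    *data = PyUnicode_1BYTE_DATA(s);
    *len = PyUnicode_GET_LENGTH(s);
    return 1;
}
```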
In order to get the patch in before the beta release I'm willing to drop the promise and document that the function may leak some information if the arguments differ in length or the arguments' types are incompatible. |
That's not a problem to me. Programming errors are not the nominal case. |
Updated patch with volatile, better error report for non-ASCII strings and updated comments |
I'm not really happy with the addition of a separate extension module for a single private function. You could just put it in the operator module, for instance. Also, the idea was not to expose timingsafe_cmp but to use it in compare_digest(). |
Me neither, but you didn't want it in the operator module in the first place (msg162882). :) Please make a decision; I'm happy to follow it. My idea is to drop the pure Python implementation of compare_digest() and just use the C implementation. |
Oh, sorry. I've changed my mind about it, but I think the operator module should only export a private function (and hmac or hashlib export it publicly). |
Doesn't belong in operator IMO. We used to have a "strop" module where it would have fitted... |
This is why I wanted to close the issue with the pure Python implementation, and punt on the question of a C accelerator for the moment. compare_digest is effectively the same as what all the Python web servers and frameworks do now for signature checking. Yes, it's more vulnerable to timing attacks than a C implementation, but it's going to take a sophisticated attacker to attack it through the noise of network jitter. It's sufficiently hardened that it's unlikely to be the weakest link in the security chain. For 3.4, I hope to see a discussion open up regarding the idea of something like a "securitytools" module that aims to provide some basic primitives for operations where Python's standard assumptions (such as flexibility and short-circuiting behaviour) are a bad fit for security reasons. That would include exposing a C-level full_compare option, as well as the core pbkdf2 algorithm. |
Again, it can be a private function in the operator module that happens to be exposed publicly through hmac. |
Yes. |
Yes, we just need a place for the function; the operator module is as good a place as any. |
Strong +1 on that one. We could even consider adding bcrypt and scrypt, as C isn't really an issue for us. Ideally we'd add a module with docs that both promote and leverage secure behavior; basically, how to realize http://www.daemonology.net/blog/2009-06-11-cryptographic-right-answers.html in Python. |
New patch. The compare_digest() function now lives in the operator module as operator._compare_digest.
I see; I missed that your version was using &. In any case, I don't |
New patch. I've removed the special handling of PyBytes_CheckExact, added support for subclasses of str and for non-compact ASCII str, and updated the docs. (Next time I'll create a sandbox and push my work to its own branch.) |
New changeset c82451eeb595 by Christian Heimes in branch 'default': |
Thanks to all for your input and assistance! |