Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent the compiler from converting "secure" memcmp() into canonical one. #102

Closed
wants to merge 1 commit into from

Conversation

Projects
None yet
5 participants
@Dmitry-Me
Copy link
Contributor

Dmitry-Me commented May 5, 2014

Current implementation of "secure" memcmp() is not that secure - the compiler can use LTO and see that the calling code only tests for zero values and then convert it into canonical, not "secure" version.

@richsalz richsalz added the reviewed label Jan 21, 2016

@richsalz

This comment has been minimized.

Copy link
Contributor

richsalz commented Jan 22, 2016

+1

@kroeckx

This comment has been minimized.

Copy link
Member

kroeckx commented Jan 23, 2016

So I wonder if this problem returns if we change return x to return x != 0

@Dmitry-Me

This comment has been minimized.

Copy link
Contributor Author

Dmitry-Me commented Jan 25, 2016

@kroeckx Do you mean the new code or the original code?

@kroeckx

This comment has been minimized.

Copy link
Member

kroeckx commented Jan 25, 2016

@briansmith

This comment has been minimized.

Copy link

briansmith commented on crypto/cryptlib.c in ad0c467 Jan 25, 2016

Why does volatile help in the face of an optimizing compiler? The compiler can see that a pointer to non-volatile memory was passed as in_a and in_b and then optimize away the volatile, right? It would be helpful to cite the part of the C standard that says that isn't allowed, otherwise.

This comment has been minimized.

Copy link
Owner Author

Dmitry-Me replied Jan 25, 2016

The compiler is not allowed to change the behavior of code if that change affects sequence of I/O library calls and reads/writes to volatile variables. This change doesn't strictly guarantee that those variables are treated as volatile (because original variables are not necessarily declared as volatile) but current compilers treat volatile* pointers as if they pointed to variables declared volatile.

@Dmitry-Me

This comment has been minimized.

Copy link
Contributor Author

Dmitry-Me commented Jan 25, 2016

@kroeckx Such change should not affect the new code. If the compiler treats volatile* as "read this stuff no matter what" then it will compile this code as expected no matter what the return statement is. If the compiler deduces that original variables are not really volatile and decides to ignore volatile* then it can break either code. The calling code checks the returned value for != 0 anyway so with LTO on the two return statements do effectively the same.

@briansmith

This comment has been minimized.

Copy link
Contributor

briansmith commented Jan 26, 2016

This change doesn't strictly guarantee that those variables are treated as volatile (because original variables are not necessarily declared as volatile) but current compilers treat volatile* pointers as if they pointed to variables declared volatile.

See http://gcc.godbolt.org/#%7B%22version%22%3A3%2C%22filterAsm%22%3A%7B%22labels%22%3Atrue%2C%22directives%22%3Atrue%2C%22commentOnly%22%3Atrue%2C%22intel%22%3Atrue%7D%2C%22compilers%22%3A%5B%7B%22sourcez%22%3A%22MQSwdgxgNgrgJgUwAQB4DOAXOiBmA6ACwD4AoEzAQwxAiQDcB7EOJAWwVbQQwH00AKRsyQAqegBokaEAC8EPDEjABKJAG8SSegyhUQUZDDDSA5mAQsIBCgCdRSAA5IAvEkE69BpEdPnL1uxFlOgBuTSQcBht%2BaTkFJBAXJAAGEITUJTSAaiyQVQ0tLQcAbRAAXSTU8IBfElqyIRYcAEZ%2BfPCfEDMLJCtbJAAPYoAWMrCtdk5uPn4ByViEBhxB5TD6xoiAJjb1DuMuv16AweKAdjHwya5eATmpWUXlgdW6kiAAA%3D%3D%22%2C%22compiler%22%3A%22g530%22%2C%22options%22%3A%22-x%20c%20-std%3Dc99%20-O3%22%7D%5D%7D.

See also http://gcc.godbolt.org/#%7B%22version%22%3A3%2C%22filterAsm%22%3A%7B%22labels%22%3Atrue%2C%22directives%22%3Atrue%2C%22commentOnly%22%3Atrue%2C%22intel%22%3Atrue%7D%2C%22compilers%22%3A%5B%7B%22sourcez%22%3A%22MQSwdgxgNgrgJgUwAQB4DOAXOiBmA6ACwD4AoEzAQwxAiQDcB7EOJAWwVbQQwH00AKRsyQAqegBokaEAC8EPDEjABKJAG8SSegyhUQUZDDDSA5mAQsIBCgCdR23dQNIADkgC8SQTr3Ojp80trOzFGR30EZToAbk0kHAYbfmk5BSQQDyQABmj01CVcgGpCkFUNLS0XAG0QAF1MnLiAXxIWsiEWHABGfjK4%2FxAzCyQrWyQADyqAFlrYrXZObj5%2BcckUhAYcCeVYto74gCZe9X7jQcCR4ImqgHZZuIWuXgFVqVkNrfGd1pIgA%3D%3D%22%2C%22compiler%22%3A%22g530%22%2C%22options%22%3A%22-x%20c%20-std%3Dc99%20-O3%22%7D%5D%7D.

@Dmitry-Me

This comment has been minimized.

Copy link
Contributor Author

Dmitry-Me commented Jan 26, 2016

@briansmith Nice!!! Here's what I found so far:

  1. It isn't reproduced in gcc 4.5.3 but is reproduced in 4.6.4 so it's a rather new change.
  2. It only happens when size of array is 1, 2 or 4 bytes and is allocated on stack.
  3. Size to overwrite doesn't matter (as long as the overwrite doesn't run past array borders).
  4. clang 3.7 doesn't exhibit this behavior.

So this is a bug in gcc which must be fixed. I'd say it's a serious bug because we have no idea what exactly causes it and so it may surface again later. However at this moment its impact isn't large.

@nicowilliams

This comment has been minimized.

Copy link

nicowilliams commented Jan 26, 2016

@briansmith Careful! You didn't have the same source code in both cases. In the first case you had this:

#include <stddef.h>

static void memset_s(void * v, size_t n) {
  volatile unsigned char * p = (volatile unsigned char *)v;
  for(size_t i = 0; i < n; ++i) {
    p[i] = 0;
  }
}


void f1() {
  unsigned char x[4];
  memset_s(x, sizeof x);
}

void f2() {
  unsigned char x[7];
  memset_s(x, sizeof x);
}

while in the second case you had this:

#include <stddef.h>

static void memset_s(void * v, size_t n) {
  volatile unsigned char * volatile p = (volatile unsigned char * volatile)v;
  for(size_t i = 0; i < n; ++i) {
    p[i] = 0;
  }
}


void f1() {
  unsigned char x[4];
  memset_s(x, sizeof x);
}

void f2() {
  unsigned char x[7];
  memset_s(x, sizeof x);
}

Subtle -but huge- difference.

@nicowilliams

This comment has been minimized.

Copy link

nicowilliams commented Jan 26, 2016

And this is the difference:

--- /tmp/x  2016-01-26 13:19:02.698960078 -0600
+++ /tmp/y  2016-01-26 13:19:10.222929880 -0600
@@ -1,7 +1,7 @@
 #include <stddef.h>

 static void memset_s(void * v, size_t n) {
-  volatile unsigned char * p = (volatile unsigned char *)v;
+  volatile unsigned char * volatile p = (volatile unsigned char * volatile)v;
   for(size_t i = 0; i < n; ++i) {
     p[i] = 0;
   }

So it seems the case has to be to volatile pointer to volatile unsigned char.

/* Must use "pointers to volatile" to tell the compiler to never
* attempt optimizing the reads. Otherwise the compiler is allowed
* to use LTO and convert this function into canonical version. */
const volatile unsigned char *a = in_a;

This comment has been minimized.

@nicowilliams

nicowilliams Jan 26, 2016

Evidently this should be const volatile unsigned char * volatile a = in_a; and const volatile unsigned char * volatile b = in_b;.

EDIT: but only because that is a workaround to known gcc bugs. Performance-wise it's best as you have it, const volatile unsigned char *a = in_a;.

This comment has been minimized.

@Dmitry-Me

Dmitry-Me Jan 27, 2016

Author Contributor
  1. This makes code larger and much slower.
  2. It doesn't actually make it any better. If memory pointed to wasn't volatile originally then it isn't any more volatile with this pointer declaration. The compiler is required to preserve the sequence of reads and writes for the pointer itself but not for the pointed to memory.

This comment has been minimized.

@nicowilliams

nicowilliams Jan 27, 2016

@Dmitry-Me Yes, it makes it slower. That's kinda the point. It makes it slower still than we'd like, but I don't see how to fix this in a C-coded memcmp_s(). As to point (2), I see that the spec makes accesses to volatile objects through non-volatile pointers undefined, but not the opposite, and it makes no reference to casts, so my reading is that the cast to a volatile-qualified type has to be honored.

@nicowilliams

This comment has been minimized.

Copy link

nicowilliams commented Jan 26, 2016

I'd not know about gcc.godbolt.org. Very nice. I especially like the colorize option:

http://goo.gl/X4QM4Q

EDIT: Wrong link. Also, gcc.godbolt.org has a shortened permalink option. Lastly, it seems that it doesn't update the window location as you change the source and options, but the permalink button does capture the latest source options.

@briansmith

This comment has been minimized.

Copy link
Contributor

briansmith commented Jan 26, 2016

So it seems the case has to be to volatile pointer to volatile unsigned char.

IMO, that is almost the opposite conclusion to draw. The correct conclusion is that we can't rely on tricks with volatile to implement this function.

These examples came from the thread on Twitter here: https://twitter.com/BRIAN_____/status/691580542207160321. The conclusions from that thread were:

  1. The C standard only requires volatile semantics for pointers-to-volatile when the pointed-at thing was originally declared/defined to be volatile. The C standard allows a pointer-to-volatile pointing at a non-volatile object to be treated as pointer-to-non-volatile. This makes sense, because it is a pointer to a non-volatile object.
  2. GCC and Clang apparently try to make volatile mean something stricter than what the C standard says it means.
  3. However, not all compilers make volatile mean the thing that people are hoping it means in order for the trick in this patch to do what it is intending to do. Some may not even try (only conforming to what the spec requires) and others might try and fail (the first GCC example above).

John's code (the second example) works, however I don't consider it reliable enough to rely on.

@kroeckx

This comment has been minimized.

Copy link
Member

kroeckx commented Jan 26, 2016

My other concern with it is that I don't think the C standard guarantees that it needs to read everything if it knows the result already, and so could in theory check that x != 0 and break in that case.

@nicowilliams

This comment has been minimized.

Copy link

nicowilliams commented Jan 27, 2016

@briansmith That's true, but nonetheless one has to do something. In the shortest term using volatile this way works for many compilers. In the medium term the alternative is assembly-coding such functions, but that takes more time. In the longer term we need the next C standard to do something useful here.

@Dmitry-Me

This comment has been minimized.

Copy link
Contributor Author

Dmitry-Me commented Jan 27, 2016

@briansmith This is not even a bug in gcc. Have you noted that the array must have size 1, 2 or 4? That makes it eligible for being allocated in a register. That's why there're no writes - no memory to overwrite in the first place. If you insert a printf() call and pass the array into printf() then the array actually is allocated in memory and writes are not eliminated.

@Dmitry-Me

This comment has been minimized.

Copy link
Contributor Author

Dmitry-Me commented Jan 27, 2016

@kroeckx If the compiler decides to treat memory pointed to by volatile* as volatile then it will not insert any x!=0 checks because breaking a loop early would change the sequence of reads.

@briansmith

This comment has been minimized.

Copy link
Contributor

briansmith commented Jan 27, 2016

In the shortest term using volatile this way works for many compilers. In the medium term the alternative is assembly-coding such functions, but that takes more time. In the longer term we need the next C standard to do something useful here.

I agree 100%.

This is not even a bug in gcc. Have you noted that the array must have size 1, 2 or 4? That makes it eligible for being allocated in a register. That's why there're no writes

Good point.

no memory to overwrite in the first place.

I'm not convinced on this point.

However, I don't want perfect to be the enemy of the good.

@Dmitry-Me

This comment has been minimized.

Copy link
Contributor Author

Dmitry-Me commented Jan 27, 2016

@briansmith How about this code?

  char x[4];
  x[0] = 0;
  printf( "%s", x );
  memset_s(x, 4);

The following is emitted:

    subq    $24, %rsp
    movl    $.LC0, %edi
    xorl    %eax, %eax
    movq    %rsp, %rsi
    movb    $0, (%rsp)
    call    printf
    movb    $0, (%rsp)
    movb    $0, 1(%rsp)
    movb    $0, 2(%rsp)
    movb    $0, 3(%rsp)
    addq    $24, %rsp
    ret

My point is that your original code which only declares an array and then overwrites it and then the array is never accessed doesn't really need much protection because how would that array happen to contain any sensitive data?

@nicowilliams

This comment has been minimized.

Copy link

nicowilliams commented Jan 27, 2016

@briansmith Looking at N1570 (so we all can, and accepting that that's just almost-C11 while OpenSSL can't use C11 yet) the spec specifically says (6.7.3) that accessing a volatile object through a pointer to non-volatile is undefined, but says nothing of the reverse:

If an attempt is made to refer to an object defined with a volatile-qualified type through use of an lvalue with non-volatile-qualified type, the behavior is undefined.

I believe this means that the compiler has to honor the cast to volatile (and even a cast that drops the volatile specifier, but then you get undefined behavior if you actually try to access the object through a pointer to non-volatile). Of course, spec lawyering wouldn't stop here, and we'd have to look at C99 too, and make sure we have every footnote and clause read and understood. But so far this is my take, and you're welcome to correct me if this is wrong (please though, with pointers to spec text).

Note that even if the object is automatic and small, the compiler would have to honor the volatile qualifier because there might be something very special going on that the compiler can't know about. For objects on the stack one might wonder what that would be... but nonetheless one could imagine it: for example, communicating with a GC that (carefully!) traverses stacks of running threads.

@nicowilliams

This comment has been minimized.

Copy link

nicowilliams commented Jan 27, 2016

I can also imagine a very aggressive LTO that optimizes away calls to assembly-coded memset_s() and memcmp_s() functions unless those are declared as taking pointers to volatile memory. (But, of course, being assembly-coded means that they can be optimized in ways that the compiler couldn't.)

The bottom-line is that this sort of change should be accepted, though assembly-coded implementations would be better in the longer term.

@nicowilliams

This comment has been minimized.

Copy link

nicowilliams commented Jan 27, 2016

Oh, and perhaps these functions' prototypes should have pointer to volatile-qualified objects as arguments, forcing the cast at the call sites precisely to disable aggressive LTO.

@briansmith

This comment has been minimized.

Copy link
Contributor

briansmith commented Jan 27, 2016

@nicowilliams The spec talks about accessing an "object defined with a volatile-qualified type" a.k.a. "volatile object". That is, all the requirements on "volatile" only apply when the original object being pointed at was qualified volatile.

but says nothing of the reverse

Exactly. That means you can't rely on it to work any particular way.

I can also imagine a very aggressive LTO that optimizes away calls to assembly-coded memset_s() and memcmp_s() functions unless those are declared as taking pointers to volatile memory. (But, of course, being assembly-coded means that they can be optimized in ways that the compiler couldn't.)
I agree with your conclusions: It's an improvement to "just add volatile" and then later do the assembly language implementations.

In the same document, you can read the definition of memset_s. It's definition is much stricter than volatile's definition. In particular, the compiler needs to treat (its own) memset_s differently from every other function. This is clear because the pointer argument to memset_s isn't even volatile. This means that it is not possible to implement memset_s in C.

The bottom-line is that this sort of change should be accepted, though assembly-coded implementations would be better in the longer term.

I agree.

@richsalz

This comment has been minimized.

Copy link
Contributor

richsalz commented Jan 27, 2016

So has everyone come to consensus here? What should openssl do?

@nicowilliams

This comment has been minimized.

Copy link

nicowilliams commented Jan 27, 2016

@richsalz The proposed patch can't make things worse, and does make things better for at least some compilers. The consensus is to take this sort of patch in the short-term; longer-term an assembly-coded function might be better.

I do believe it would be more prudent to have the prototype declare the pointer arguments as pointing to volatile, but that can be left for another day.

@briansmith I believe you're misreading the spec. Semantics point 6 refers to access to a volatile-qualified type object via non-volatile-qualified pointer types (but not the reverse) and talks about "defined", but semantics point 7 is the one that we really care about and it doesn't say or imply that the object must have been defined as volatile.

It makes sense that point 6 refers to the object having been defined as volatile... After all, consider this contrived case:

unsigned char a[4];
unsigned char *ap = a;
volatile unsigned char *vap = (volatile unsigned char *)ap;
unsigned char *ap2 = (unsigned char *)vap;

*ap2 = 'A'; /* OK, since the object is not volatile */

There the object was defined as non-volatile, therefore accessing it via ap2 should be OK because the compiler can see this. But here we'd definitely get undefined behavior:

volatile unsigned char a[4];
unsigned char *ap = (unsigned char *)&a[0]; /* OK if we don't use ap */

*ap = 'A'; /* undefined behavior! */

But also, if C was as you say, then one could not use mmap(2) to access device registers, since there would be no way to treat the mapped memory as volatile, and we'd then need an mmap_volatile(2) version. This would be inconvenient (not that it would be surprisingly inconvenient; standards can have such artifacts, such as dlsym(2) not being able to return pointers to functions...).

@nicowilliams

This comment has been minimized.

Copy link

nicowilliams commented Jan 27, 2016

@richsalz @briansmith The spec says (6.3.2.3, point 2):

For any qualifier q, a pointer to a non-q-qualified type may be converted to a pointer to the q-qualified version of the type; the values stored in the original and converted pointers shall compare equal.

In other words, casting a pointer to non-volatile type to a pointer to volatile type works.

The behavior where the pointer itself has to also be volatile is just a gcc bug.

@@ -401,8 +401,11 @@ void *OPENSSL_stderr(void) { return stderr; }
int CRYPTO_memcmp(const void *in_a, const void *in_b, size_t len)

This comment has been minimized.

@nicowilliams

nicowilliams Jan 27, 2016

The prototype needs to read:

int CRYPTO_memcmp(const volatile void *in_a, const volatile void *in_b, size_t len)

in order to defeat possibly very aggressive LTO.

@Dmitry-Me

This comment has been minimized.

Copy link
Contributor Author

Dmitry-Me commented Jan 28, 2016

@nicowilliams It's not even a bug in gcc in terms of "gcc deliberately failing to emit writes and refusing to overwrite memory". In this specific case (an array fits into a register and its address is not passed anywhere) the whole array is allocated in a register, so there's no memory which could have been overwritten.

@nicowilliams

This comment has been minimized.

Copy link

nicowilliams commented Jan 28, 2016

@Dmitry-Me The array's address is most certainly passed: because the array is passed to a function, the passed value decays to a pointer. That it fits in a register is irrelevant. That the function is inlined is also irrelevant. The pointer was declared as pointing to a volatile object, therefore the memory accesses should have been performed, therefore it was a gcc bug.

@Dmitry-Me

This comment has been minimized.

Copy link
Contributor Author

Dmitry-Me commented Jan 29, 2016

@nicowilliams Here's my guess of what happens. First the function is inlined, then the compiler sees that the variable fits into a register and also all operations can be carried out without allocating memory, then it decides that no code needs to be emitted.

@nicowilliams

This comment has been minimized.

Copy link

nicowilliams commented Jan 29, 2016

@Dmitry-Me That's a good guess, but nonetheless that would be a compiler bug, as inlining shouldn't change the arrays-decay-to-pointers semantics, and therefore the pointer-to-volatile conversion had to be honored.

We know how to work around the bug: declare the object and the pointer volatile in the function's prototype, then assign those arguments to a non-volatile pointer to volatile in the function. For some compilers declaring the argument pointer volatile itself has not impact when the argument is passed via registers anyways, but it does disable the gcc bug in question. http://goo.gl/WdOoti -- all compilers in their list don't optimize away this memset_s():

/*
 * Making the argument a pointer to volatile memory should disable LTO.
 * The extra volatile keyword here works around a bug in some version of GCC.
 */
static void memset_s(volatile void * volatile v, size_t n) {
  volatile unsigned char *p = v; /* but now use a non-volatile pointer to volatile so we can go faster */
  for(size_t i = 0; i < n; ++i) {
    p[i] = 0;
  }
}

Now, because these compiler bugs have happened and could be worse in other compilers, it'd be nice if OpenSSL could test this at build time and warn. Constructing such a test will be fun.

@richsalz

This comment has been minimized.

Copy link
Contributor

richsalz commented Jan 30, 2016

So in master and 1.0.2 we make the parameters void; see 98ab576 for example. Closing this. If there's more to do, please open a new PR. :)

@richsalz richsalz closed this Jan 30, 2016

@nicowilliams

This comment has been minimized.

Copy link

nicowilliams commented Jan 30, 2016

@richsalz Making the pointer itself in the prototype would work around that GCC bug. Otherwise it LGTM.

@richsalz

This comment has been minimized.

Copy link
Contributor

richsalz commented Jan 30, 2016

sorry, i don't know what you mean? and my comment about making "the parameters void" should have been "make them volatile" :)

@briansmith

This comment has been minimized.

Copy link
Contributor

briansmith commented Jan 30, 2016

There is no GCC bug. There is a lot of not understanding what volatile actually means, but that's not a bug in GCC if "bug" means "does not conform to ISO C".

@nicowilliams

This comment has been minimized.

Copy link

nicowilliams commented Jan 30, 2016

@richsalz Making the prototype this:

int CRYPTO_memcmp(const volatile void * volatile a, const volatile void * volatile b, size_t len);

works around a GCC bug and doesn't break with the other compilers. The in_a and in_b pointers in CRYPTO_memcmp()'s body should remain as const volatile unsigned char *.

@richsalz

This comment has been minimized.

Copy link
Contributor

richsalz commented Jan 31, 2016

And then the body doesn't match the prototype? I don't like that.

@nicowilliams

This comment has been minimized.

Copy link

nicowilliams commented Jan 31, 2016

The function's definition and declaration have to match, naturally. The arguments should be pointers to volatile, and they should be volatile pointers to volatile only to work around that gcc bug. The pointers in_a and in_b need only be pointers to volatile. There's nothing wrong with that nor not to like.

@richsalz

This comment has been minimized.

Copy link
Contributor

richsalz commented Jan 31, 2016

your initial comment confused me. submitting for review, thanks.

@richsalz

This comment has been minimized.

Copy link
Contributor

richsalz commented Jan 31, 2016

commit 769adcf thanks!

@nicowilliams

This comment has been minimized.

Copy link

nicowilliams commented Jan 31, 2016

Excellent!

@Dmitry-Me

This comment has been minimized.

Copy link
Contributor Author

Dmitry-Me commented Jan 31, 2016

Well, not really excellent. Code looks like magic an isn't proper commented. Which means it'll be broken sooner or later.

@richsalz

This comment has been minimized.

Copy link
Contributor

richsalz commented Jan 31, 2016

  • It takes an amount of time dependent on |len|, but independent of the
    • contents of |a| and |b|. Unlike memcmp, it cannot be used to put elements
    • into a defined order as the return value when a != b is undefined, other
  • * than to be non-zero.
  • * than to be non-zero. The uncommon use of volatile is to try and
  • * prevent compilers from optimizing this "down to" the standard memcmp
  • * which is probably not timing-attack safe.
    */

Ok?

@Dmitry-Me

This comment has been minimized.

Copy link
Contributor Author

Dmitry-Me commented Jan 31, 2016

@richsalz Not even close to OK. I'll submit a PR in a day or two.

@richsalz

This comment has been minimized.

Copy link
Contributor

richsalz commented Jan 31, 2016

Really? Shrug, okay, look forward to seeing it.

@Dmitry-Me

This comment has been minimized.

Copy link
Contributor Author

Dmitry-Me commented Feb 1, 2016

@richsalz I submitted #606

x64architecture added a commit to vigortls/vigortls that referenced this pull request Feb 7, 2016

dot-asm added a commit to dot-asm/openssl that referenced this pull request May 15, 2016

levitte pushed a commit that referenced this pull request May 19, 2016

Add assembly CRYPTO_memcmp.
GH: #102

Reviewed-by: Richard Levitte <levitte@openssl.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.