New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Elliptic Curve code without ASM is broken with Clang 14 #18225
Comments
|
This looks to be a problem with Clang optimising better in version 14. Compiling with |
|
Thanks @paulidale for pointing out that it was not an issue when disabling strict aliasing. I've identified |
|
The issue seems to be around the function I'm more a C++ programmer than a C programmer, so I might have missed something that is specific to C, but it can points you toward which section of the code is problematic. The rest of the I've compared the generated code against when strict aliasing is disabled and it seems that the whole copy is discarded (and additional operations from the same bn_nist_strict_aliasing.txt I've dumped the preprocessed source file for easier debugging if it can help you: Just need to pass these additional arguments to the compiler to match the OpenSSL buildsystem: |
|
Great work. I agree that function should have no aliasing problems. I've raised an issue with the LLVM folks. |
|
I'm not sure that's right. I think there is a strict aliasing violation, though you should check with language experts. C's strict aliasing and union rules, as I understand them, are quite harsh. While there is a carve-out in C (but not C++, which is even stricter) for using unions to do type-punning, that only applies when you read or write directly to a union. Writing through a pointer to an inactive arm is UB. See example 3 of 6.5.2.3 in n1256. (Whether these are the right rules is another matter. I doubt there is much C code out there that uses unions which is well-defined w.r.t. all this.) Likewise, the strict aliasing rules (6.5, clause 7) doesn't include an allowance for unions in the effective type of the object, only the type of the lvalue expression (which doesn't involve a union). |
|
Agreed, example 3 is damning. |
As of clang-14 the strict aliasing is causing code to magically disappear. By explicitly inlining the code, the aliasing problem evaporates. Fixes openssl#18225
|
This looks like the fundamental issue is if the array in the union is converted to a pointer or accessed directly as an array. If it is converted, the behaviour is undefined. The patch accesses the union as an array which solves this. Interesting is that using memcpy/memset which both convert the array to a pointer, doesn't break the code. At this point I'm satisfied that the problem is in the OpenSSL code. I'm a little miffed that there were no warnings about undefined behaviour ever. |
As of clang-14 the strict aliasing is causing code to magically disappear. By explicitly inlining the code, the aliasing problem evaporates. Fixes #18225 Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Tim Hudson <tjh@openssl.org> Reviewed-by: Matt Caswell <matt@openssl.org> (Merged from #18258) (cherry picked from commit 8712db5)
When introducing EC_SCALAR and EC_FELEM, I used unions as convenience for converting to and from the byte representation. However, type-punning with unions is not allowed in C++ and hard to use correctly in C. As I understand the rules, they are: - The abstract machine knows what member of union was last written to. - In C, reading from an inactive member is defined to type-pun. In C++, it is UB though some compilers promise the C behavior anyway. - However, if you read or write from a *pointer* to a union member, the strict aliasing rule applies. (A function passed two pointers of different types otherwise needs to pessimally assume they came from the same union.) That last rule means the type-punning allowance doesn't apply if you take a pointer to an inactive member, and it's common to abstract otherwise direct accesses of members via pointers. openssl/openssl#18225 is an example where similar union tricks have caused problems for OpenSSL. While we don't have that code, EC_SCALAR and EC_FELEM play similar tricks. We do get a second lifeline because our alternate view is a uint8_t, which we require to be unsigned char. Strict aliasing always allows the pointer type to be a character type, so pointer-indirected accesses of EC_SCALAR.bytes aren't necessarily UB. But if we ever write to EC_SCALAR.bytes directly (and we do), we'll switch the active arm and then pointers to EC_SCALAR.words become strict aliasing violations! This is all far too complicated to deal with. Ideally everyone would build with -fno-strict-aliasing because no real C code actually follows these rules. But we don't always control our downstream consumers' CFLAGS, so let's just avoid the union. This also avoids a pitfall if we ever move libcrypto to C++. For p224-64.c, I just converted the representations directly, which avoids worrying about the top 32 bits in p224_felem_to_generic. Most of the rest was words vs. bytes conversions and boils down to a cast (we're still dealing with a character type, at the end of the day). But I took the opportunity to extract some more "words"-based helper functions out of BIGNUM, so the casts would only be in one place. That too saves us from the top bits problem in the bytes-to-words direction. Bug: 301 Change-Id: I3285a86441daaf824a4f6862e825d463a669efdb Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/52505 Commit-Queue: Bob Beck <bbe@google.com> Reviewed-by: Bob Beck <bbe@google.com>
Notably, this might have caught openssl#18225, as Clang 14 wasn't - and is not yet until this commit - in OpenSSL's CI. It makes sense to ensure CI tests compilers used in newer Linux distributions: * Fedora 36 ships with GCC 12 * Ubuntu 22.04 ships with Clang 14 We switch from 'ubuntu-latest' (which can change meaning but currently points to ubuntu-20.04) to ubuntu-20.04 for the older existing compilers, and ubuntu-22.04 for the newer ones added by this commit. CLA: trivial
Notably, this might have caught #18225, as Clang 14 wasn't - and is not yet until this commit - in OpenSSL's CI. It makes sense to ensure CI tests compilers used in newer Linux distributions: * Fedora 36 ships with GCC 12 * Ubuntu 22.04 ships with Clang 14 We switch from 'ubuntu-latest' (which can change meaning but currently points to ubuntu-20.04) to ubuntu-20.04 for the older existing compilers, and ubuntu-22.04 for the newer ones added by this commit. CLA: trivial Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from #18639)
Notably, this might have caught #18225, as Clang 14 wasn't - and is not yet until this commit - in OpenSSL's CI. It makes sense to ensure CI tests compilers used in newer Linux distributions: * Fedora 36 ships with GCC 12 * Ubuntu 22.04 ships with Clang 14 We switch from 'ubuntu-latest' (which can change meaning but currently points to ubuntu-20.04) to ubuntu-20.04 for the older existing compilers, and ubuntu-22.04 for the newer ones added by this commit. Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from #18639)
|
Are there plans to fix this for the 1.1.1 release? |
As of clang-14 the strict aliasing is causing code to magically disappear. By explicitly inlining the code, the aliasing problem evaporates. Fixes openssl#18225 Backport of openssl#18528 to 1.1.1.
As of clang-14 the strict aliasing is causing code to magically disappear. By explicitly inlining the code, the aliasing problem evaporates. Fixes openssl#18225 Backport of openssl#18258 to 1.1.1.
I created a backport or the fix in #18948 |
jpboivin commentedMay 2, 2022
Hello,
It looks like on Linux (x86_64), starting with Clang 14, the code for Elliptic Curves (when assembler is not enabled) is broken when compiled with optimizations (O2/O3). It is totally fine when using assembler, but when using regular C code, some curves will fail in OpenSSL internals. It doesn't seem to be new (I tested a few older versions) and it seems to be also affecting the 3.0.x branch although I have personally experienced it on the 1.1.1 branch.
From running the tests, it seems to be only affecting a few curves (like
secp384r1,prime192v1,prime256v1) and it seems to be that some validation does not pass, potentially due to bad codegen. I don't know if OpenSSL has some undefined behaviors in that section of the code as I haven't investigated yet. But, it was working well up to Clang 13, and GCC 11 is good too.I've tested and generated logs for the following configs:
All these tests were done in a Docker container based on Ubuntu 22.04. I've also tested
1.1.1k | Clang 14 | No ASMand3.0.2 | Clang 14 | No ASMwhich reproduce the exact same issue, so didn't generated logs for them.config_1_1_1n_noasm_clang14.txt
tests_1_1_1n_noasm_clang14.txt
tests_verbose_1_1_1n_noasm_clang14.txt
config_1_1_1n_asm_clang14.txt
tests_1_1_1n_asm_clang14.txt
config_1_1_1n_noasm_gcc.txt
tests_1_1_1n_noasm_gcc.txt
The text was updated successfully, but these errors were encountered: