-
-
Notifications
You must be signed in to change notification settings - Fork 10.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimizated calculation of shared power of 2 in bn_gcd #24332
Conversation
https://www.openssl.org/policies/cla.html:
It's trivial contribution, so CLA not needed. |
This looks like it undoes constant time code.
|
Execution time of this function depends on size of input numbers, so it's not constant initially. |
This comes from #10122, everything seems to be talking about constant time. This adds a branch, making it not constant time.
Of course it depends on the size. Assuming that's public info, that's fine.
This change will probably cause a timing leak in cases where there isn't one now.
|
I don't think this is a trivial contribution from a copyright viewpoint. I'm also concerned about the loss of constant time here. Do you have any performance figures for comparison? |
Updated optimization to be consttime. I didn't do perfomance measures, but there is 64 (32 on old CPUs) times less loop iterations so it must be faster. |
crypto/bn/bn_gcd.c
Outdated
mask = r->d[i] | g->d[i]; | ||
if ((pow2_flag == 0) | (mask == 0)) { | ||
shifts += BN_BITS2 & pow2_flag; | ||
continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes everything non-constant time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Independent of input numbers (except a==0 || b==0
, but this case checked before) it will run loop to check each bit in chunk exactly once:
mask branch
0000 mask == 0, add 64 and continue
0000 mask == 0, add 64 and continue
0000 mask == 0, add 64 and continue
...
0000 mask == 0, add 64 and continue
1234 mask != 0, check each bit
5678 pow2_flag == 0, add 64&0 and continue
0000 pow2_flag == 0, add 64&0 and continue
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both the if and continue concern me for being non-constant time. I guess that you can write code so that the if is evaluated exactly the same amount of times, but the CPU will speculatively execute different code and you end up being able to measure the difference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we talk about constant time code, we're mostly talking about branch free code, where the for loop is the only branch code.
I made some measures of perfomance. New code faster ~25 times when compiled with
Test program (spoiler)#include <cstdint>
#include <cstdlib>
#include <ctime>
#include <iostream>
#define SIZE (100000)
#define TESTS (1000)
static std::uint64_t r[SIZE];
static std::uint64_t g[SIZE];
template<bool new_impl>
void test();
int main() {
std::size_t i;
std::srand(123);
for (i = 0; i < SIZE; i++) {
r[i] = (((std::uint64_t) std::rand()) << 32) | (std::uint64_t) std::rand();
g[i] = (((std::uint64_t) std::rand()) << 32) | (std::uint64_t) std::rand();
}
auto start = std::clock();
for (i = 0; i < TESTS; i++)
test<false>();
std::cout << "Old | Size: " << SIZE << " Clocks per test: " << (std::clock() - start) / TESTS << std::endl;
start = std::clock();
for (i = 0; i < TESTS; i++)
test<true>();
std::cout << "New | Size: " << SIZE << " Clocks per test: " << (std::clock() - start) / TESTS << std::endl;
}
template<bool new_impl>
[[clang::noinline, gnu::noinline]]
void test() {
if constexpr (new_impl) {
std::uint64_t pow2_temp, pow2_mask;
int pow2_flag, pow2_chunk_index, pow2_shifts, i, j;
pow2_flag = 1;
pow2_chunk_index = 0;
pow2_shifts = 0;
for (i = 0; i < SIZE; i++) {
pow2_mask = r[i] | g[i];
pow2_temp = (pow2_flag != 0) & (pow2_mask != 0);
pow2_flag &= !pow2_temp;
pow2_temp = ((~pow2_temp & (pow2_temp - 1)) >> 63) - 1;
pow2_shifts += 1uL & pow2_temp;
pow2_temp = (i ^ pow2_chunk_index) & pow2_temp;
pow2_chunk_index ^= pow2_temp;
}
pow2_shifts *= 64;
pow2_mask = r[pow2_chunk_index] | g[pow2_chunk_index];
pow2_flag = 1;
for (j = 0; j < 64; j++) {
pow2_flag &= pow2_mask;
pow2_shifts += pow2_flag;
pow2_mask >>= 1;
}
} else {
std::uint64_t mask;
int j, bit, shifts=0, i;
bit = 1;
for (i = 0; i < SIZE; i++) {
mask = ~(r[i] | g[i]);
for (j = 0; j < 64; j++) {
bit &= mask;
shifts += bit;
mask >>= 1;
}
}
}
} |
We have a lot of test failing in CI, so I assume the result of the calculation is wrong now. |
@romen: Is this something you can/want to review? |
I still have some suspicions about how the compiler might make the logical |
If this is merged, we might want to reevaluate if ossl_bn_rsa_fips186_4_derive_prime() should use gcd or not. |
I think it would be enlightening to see the min and max times. That would confirm constant time execution. |
I will try to find the time for a proper review, but I am a bit swamped at the moment. The first thing that came to mind inspecting the code changes, is that as a general rule of thumb, one generally wants to avoid mixing bitwise operations and the I might try to summon @cpereida and @bbbrumley if they can volunteer their expertise. |
Commits to be squashed when merging. |
This pull request is ready to merge |
Squashed and merged to the master branch. Thank you for your contribution. |
Reviewed-by: Paul Dale <ppzgs1@gmail.com> Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from #24332)
Reviewed-by: Paul Dale <ppzgs1@gmail.com> Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from #24332)
Reviewed-by: Paul Dale <ppzgs1@gmail.com> Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from openssl#24332)
Reviewed-by: Paul Dale <ppzgs1@gmail.com> Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from openssl#24332)
No description provided.