Optimizated calculation of shared power of 2 in bn_gcd #24332

landgrafhomyak · 2024-05-05T11:29:39Z

No description provided.

landgrafhomyak · 2024-05-05T11:44:54Z

https://www.openssl.org/policies/cla.html:

Every non-trivial contribution needs to be covered by a signed Contributor License Agreement (CLA) from all original authors.

It's trivial contribution, so CLA not needed.

kroeckx · 2024-05-05T15:49:07Z

This looks like it undoes constant time code.

landgrafhomyak · 2024-05-05T15:55:02Z

This looks like it undoes constant time code.

Execution time of this function depends on size of input numbers, so it's not constant initially.

kroeckx · 2024-05-05T16:54:13Z

This comes from #10122, everything seems to be talking about constant time. This adds a branch, making it not constant time. Of course it depends on the size. Assuming that's public info, that's fine. This change will probably cause a timing leak in cases where there isn't one now.

paulidale · 2024-05-05T22:49:46Z

I don't think this is a trivial contribution from a copyright viewpoint.

I'm also concerned about the loss of constant time here.

Do you have any performance figures for comparison?

landgrafhomyak · 2024-05-06T07:59:51Z

Updated optimization to be consttime.

I didn't do perfomance measures, but there is 64 (32 on old CPUs) times less loop iterations so it must be faster.

paulidale · 2024-05-06T08:53:32Z

crypto/bn/bn_gcd.c

+        mask = r->d[i] | g->d[i];
+        if ((pow2_flag == 0) | (mask == 0)) {
+            shifts += BN_BITS2 & pow2_flag;
+            continue;


This makes everything non-constant time.

Independent of input numbers (except a==0 || b==0, but this case checked before) it will run loop to check each bit in chunk exactly once:

mask branch 0000 mask == 0, add 64 and continue 0000 mask == 0, add 64 and continue 0000 mask == 0, add 64 and continue ... 0000 mask == 0, add 64 and continue 1234 mask != 0, check each bit 5678 pow2_flag == 0, add 64&0 and continue 0000 pow2_flag == 0, add 64&0 and continue ...

Both the if and continue concern me for being non-constant time. I guess that you can write code so that the if is evaluated exactly the same amount of times, but the CPU will speculatively execute different code and you end up being able to measure the difference.

When we talk about constant time code, we're mostly talking about branch free code, where the for loop is the only branch code.

landgrafhomyak · 2024-05-06T14:45:16Z

I made some measures of perfomance. New code faster ~25 times when compiled with gcc -O0 on WSL:

Old | Size: 100000 Clocks per test: 27043
New | Size: 100000 Clocks per test: 1251

Test program (spoiler)

#include <cstdint>
#include <cstdlib>
#include <ctime>
#include <iostream>

#define SIZE (100000)

#define TESTS (1000)

static std::uint64_t r[SIZE];
static std::uint64_t g[SIZE];

template<bool new_impl>
void test();

int main() {
  std::size_t i;
  std::srand(123);
  for (i = 0; i < SIZE; i++) {
      r[i] = (((std::uint64_t) std::rand()) << 32) | (std::uint64_t) std::rand();
      g[i] = (((std::uint64_t) std::rand()) << 32) | (std::uint64_t) std::rand();
  }

  auto start = std::clock();

  for (i = 0; i < TESTS; i++)
      test<false>();

  std::cout << "Old | Size: " << SIZE << " Clocks per test: " << (std::clock() - start) / TESTS << std::endl;

  start = std::clock();

  for (i = 0; i < TESTS; i++)
      test<true>();

  std::cout << "New | Size: " << SIZE << " Clocks per test: " << (std::clock() - start) / TESTS << std::endl;
}

template<bool new_impl>
[[clang::noinline, gnu::noinline]]
void test() {

  if constexpr (new_impl) {
      std::uint64_t pow2_temp, pow2_mask;
      int pow2_flag, pow2_chunk_index, pow2_shifts, i, j;
      pow2_flag = 1;
      pow2_chunk_index = 0;
      pow2_shifts = 0;
      for (i = 0; i < SIZE; i++) {
          pow2_mask = r[i] | g[i];
          pow2_temp = (pow2_flag != 0) & (pow2_mask != 0);
          pow2_flag &= !pow2_temp;
          pow2_temp = ((~pow2_temp & (pow2_temp - 1)) >> 63) - 1;
          pow2_shifts += 1uL & pow2_temp;
          pow2_temp = (i ^ pow2_chunk_index) & pow2_temp;
          pow2_chunk_index ^= pow2_temp;
      }
      pow2_shifts *= 64;
      pow2_mask = r[pow2_chunk_index] | g[pow2_chunk_index];
      pow2_flag = 1;
      for (j = 0; j < 64; j++) {
          pow2_flag &= pow2_mask;
          pow2_shifts += pow2_flag;
          pow2_mask >>= 1;
      }
  } else {
      std::uint64_t mask;
      int j, bit, shifts=0, i;
      bit = 1;
      for (i = 0; i < SIZE; i++) {
          mask = ~(r[i] | g[i]);
          for (j = 0; j < 64; j++) {
              bit &= mask;
              shifts += bit;
              mask >>= 1;
          }
      }
  }
}

crypto/bn/bn_gcd.c

kroeckx · 2024-05-07T07:47:44Z

We have a lot of test failing in CI, so I assume the result of the calculation is wrong now.

kroeckx · 2024-05-07T11:54:13Z

@romen: Is this something you can/want to review?

t8m · 2024-05-07T13:52:03Z

I still have some suspicions about how the compiler might make the logical ! operations non-const time. Perhaps the const_time_select...() functions could be used instead? Otherwise this looks fairly good to me.

kroeckx · 2024-05-07T14:00:33Z

If this is merged, we might want to reevaluate if ossl_bn_rsa_fips186_4_derive_prime() should use gcd or not.

paulidale · 2024-05-07T21:07:06Z

I made some measures of perfomance. New code faster ~25 times when compiled with gcc -O0 on WSL:
Old | Size: 100000 Clocks per test: 27043
New | Size: 100000 Clocks per test: 1251

I think it would be enlightening to see the min and max times. That would confirm constant time execution.

romen · 2024-05-08T00:34:29Z

I will try to find the time for a proper review, but I am a bit swamped at the moment.

The first thing that came to mind inspecting the code changes, is that as a general rule of thumb, one generally wants to avoid mixing bitwise operations and the ! operator.
Quite often the ! is processed by the compiler in a way that introduces a branch in the resulting binary code, or that could be handled as a branch by the hw optimization heuristics.
I computing this kind of bitwise masks, we tend to prefer using only bitwise operators &¦^~ as that usually prevents optimization steps to unfold the computation into branches.

I might try to summon @cpereida and @bbbrumley if they can volunteer their expertise.

t8m · 2024-05-13T09:23:26Z

Commits to be squashed when merging.

crypto/bn/bn_gcd.c

openssl-machine · 2024-05-15T08:00:23Z

This pull request is ready to merge

t8m · 2024-05-15T11:39:44Z

Squashed and merged to the master branch. Thank you for your contribution.

Reviewed-by: Paul Dale <ppzgs1@gmail.com> Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from #24332)

Reviewed-by: Paul Dale <ppzgs1@gmail.com> Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from openssl#24332)

Optimizated calculation of shared power of 2 in bn_gcd

32fe87e

openssl-machine added the hold: cla required The contributor needs to submit a license agreement label May 5, 2024

Making optimized calculation of shared power of 2 in bn_gcd consttime

38fe0c4

paulidale reviewed May 6, 2024

View reviewed changes

Removing non-loop jumps in calculation of shared power of 2 in bn_gcd

e407371

t8m reviewed May 6, 2024

View reviewed changes

crypto/bn/bn_gcd.c Outdated Show resolved Hide resolved

t8m reviewed May 6, 2024

View reviewed changes

crypto/bn/bn_gcd.c Outdated Show resolved Hide resolved

t8m added branch: master Merge to master branch tests: exempted The PR is exempt from requirements for testing triaged: performance The issue/pr reports/fixes a performance concern labels May 6, 2024

Fixes in bn_gcd

3612944

t8m requested changes May 6, 2024

View reviewed changes

crypto/bn/bn_gcd.c Outdated Show resolved Hide resolved

crypto/bn/bn_gcd.c Outdated Show resolved Hide resolved

Fixes and microoptimizations in bn_gcd

b60ace5

github-actions bot added the severity: fips change The pull request changes FIPS provider sources label May 7, 2024

landgrafhomyak added 2 commits May 7, 2024 13:03

Fixes in bn_gcd

1203da8

Update code to be compatible with C90 in bn_gcd

4b788a5

Cleaning code in bn_gcd

8345ecd

paulidale added the approval: review pending This pull request needs review by a committer label May 12, 2024

t8m added tests: present The PR has suitable tests present and removed tests: exempted The PR is exempt from requirements for testing labels May 13, 2024

t8m approved these changes May 13, 2024

View reviewed changes

t8m added approval: done This pull request has the required number of approvals and removed approval: review pending This pull request needs review by a committer approval: done This pull request has the required number of approvals labels May 13, 2024

t8m added the approval: done This pull request has the required number of approvals label May 13, 2024

bernd-edlinger reviewed May 13, 2024

View reviewed changes

crypto/bn/bn_gcd.c Outdated Show resolved Hide resolved

t8m reviewed May 13, 2024

View reviewed changes

crypto/bn/bn_gcd.c Outdated Show resolved Hide resolved

Removing redundant operation

fbbc43d

t8m approved these changes May 13, 2024

View reviewed changes

bernd-edlinger requested changes May 13, 2024

View reviewed changes

crypto/bn/bn_gcd.c Outdated Show resolved Hide resolved

crypto/bn/bn_gcd.c Outdated Show resolved Hide resolved

landgrafhomyak added 2 commits May 13, 2024 22:00

Minor fixes

418223c

Rewrited version of non-zero word selector

b244f45

paulidale approved these changes May 13, 2024

View reviewed changes

bernd-edlinger approved these changes May 14, 2024

View reviewed changes

t8m approved these changes May 14, 2024

View reviewed changes

t8m added approval: done This pull request has the required number of approvals and removed approval: done This pull request has the required number of approvals labels May 14, 2024

openssl-machine added approval: ready to merge The 24 hour grace period has passed, ready to merge and removed approval: done This pull request has the required number of approvals labels May 15, 2024

t8m closed this May 15, 2024

openssl-machine pushed a commit that referenced this pull request May 15, 2024

Additional testcases for bn_gcd

a6afe2b

Reviewed-by: Paul Dale <ppzgs1@gmail.com> Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from #24332)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimizated calculation of shared power of 2 in bn_gcd #24332

Optimizated calculation of shared power of 2 in bn_gcd #24332

landgrafhomyak commented May 5, 2024

landgrafhomyak commented May 5, 2024

kroeckx commented May 5, 2024 via email

landgrafhomyak commented May 5, 2024

kroeckx commented May 5, 2024 via email

paulidale commented May 5, 2024

landgrafhomyak commented May 6, 2024

paulidale May 6, 2024

landgrafhomyak May 6, 2024

kroeckx May 6, 2024

kroeckx May 6, 2024

landgrafhomyak commented May 6, 2024 •

edited

Loading

kroeckx commented May 7, 2024

kroeckx commented May 7, 2024

t8m commented May 7, 2024

kroeckx commented May 7, 2024

paulidale commented May 7, 2024

romen commented May 8, 2024

t8m commented May 13, 2024

openssl-machine commented May 15, 2024

t8m commented May 15, 2024

Optimizated calculation of shared power of 2 in bn_gcd #24332

Optimizated calculation of shared power of 2 in bn_gcd #24332

Conversation

landgrafhomyak commented May 5, 2024

landgrafhomyak commented May 5, 2024

kroeckx commented May 5, 2024 via email

landgrafhomyak commented May 5, 2024

kroeckx commented May 5, 2024 via email

paulidale commented May 5, 2024

landgrafhomyak commented May 6, 2024

paulidale May 6, 2024

Choose a reason for hiding this comment

landgrafhomyak May 6, 2024

Choose a reason for hiding this comment

kroeckx May 6, 2024

Choose a reason for hiding this comment

kroeckx May 6, 2024

Choose a reason for hiding this comment

landgrafhomyak commented May 6, 2024 • edited Loading

kroeckx commented May 7, 2024

kroeckx commented May 7, 2024

t8m commented May 7, 2024

kroeckx commented May 7, 2024

paulidale commented May 7, 2024

romen commented May 8, 2024

t8m commented May 13, 2024

openssl-machine commented May 15, 2024

t8m commented May 15, 2024

landgrafhomyak commented May 6, 2024 •

edited

Loading