Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OQS_USE_SHA3_OPENSSL=ON makes running tests significantly slower #1426

Open
beldmit opened this issue Apr 3, 2023 · 13 comments
Open

OQS_USE_SHA3_OPENSSL=ON makes running tests significantly slower #1426

beldmit opened this issue Apr 3, 2023 · 13 comments

Comments

@beldmit
Copy link
Contributor

beldmit commented Apr 3, 2023

Describe the bug
I build liboqs-0.8.0-dev with and without OQS_USE_SHA3_OPENSSL=ON. With OQS_USE_SHA3_OPENSSL=ON I see 9 minutes for running tests and without - just 2 min. Fedora 37, OpenSSL 3.0.x

To Reproduce
cmake -GNinja -DBUILD_SHARED_LIBS=ON -DOQS_ALGS_ENABLED=STD -DOQS_USE_SHA3_OPENSSL=ON -DCMAKE_BUILD_TYPE=Debug -DOQS_USE_AVX2_INSTRUCTIONS=OFF -DOQS_USE_AVX512_INSTRUCTIONS=OFF

make
time make test

Repeat the same without -DOQS_USE_SHA3_OPENSSL=ON

(I do it as a part of RPM build, so pure difference should be in the testing)

Expected behavior
More or less equal time.

Screenshots
If applicable, add screenshots to help explain your problem.

Environment (please complete the following information):

  • OS: Fedora 37
  • OpenSSL version: 3.0.8
  • Compiler version used: gcc (GCC) 12.2.1 20221121
  • Build variables used: -DBUILD_SHARED_LIBS=ON -DOQS_ALGS_ENABLED=STD -DOQS_USE_SHA3_OPENSSL=ON -DCMAKE_BUILD_TYPE=Debug -DOQS_USE_AVX2_INSTRUCTIONS=OFF -DOQS_USE_AVX512_INSTRUCTIONS=OFF
  • liboqs version: main branch

Additional context
We want to avoid multiple implementations of low-level algorithms (AES, SHA2, SHA3) in Fedora and want to heavily rely on OpenSSL implementation. We would like to narrow down the problem - is it the performance of low-level primitives or, say, digest fetching from openssl, or smth else.

@beldmit
Copy link
Contributor Author

beldmit commented Apr 3, 2023

User time is even worse: 27 mins vs 4

@baentsch
Copy link
Member

baentsch commented Apr 3, 2023

With OQS_USE_SHA3_OPENSSL=ON I see 9 minutes for running tests and without - just 2 min. Fedora 37, OpenSSL 3.0.x

Expected behavior
More or less equal time.

Why do you expect this? "OQS_USE_SHA3_OPENSSL=OFF" activates a local XKCP version, so entirely different code from OpenSSL's. What SHA3 code base does OpenSSL use? The same? Then indeed I'd understand your surprise.

We would like to narrow down the problem - is it the performance of low-level primitives or, say, digest fetching from openssl, or smth else.

FWIW, the speed differential is clearly visible even in the built-in artificial speed tests -- but not a factor of 6:

$ ./build/tests/speed_common 
Configuration info
==================
Target platform:  x86_64-Linux-5.19.0-38-generic
Compiler:         gcc (11.3.0)
Compile options:  [-Wa,--noexecstack;-Wstrict-overflow;-ggdb3;-Wbad-function-cast]
OQS version:      0.8.0-dev
Git commit:       d61d81c526da8bb62e363f5a75191689572151cb
OpenSSL enabled:  Yes (OpenSSL 3.0.2 15 Mar 2022)
AES:              NI
SHA-2:            OpenSSL
SHA-3:            C
OQS build flags:  BUILD_SHARED_LIBS OQS_DIST_BUILD OQS_OPT_TARGET=generic CMAKE_BUILD_TYPE=Debug 
CPU exts active:  ADX AES AVX AVX2 BMI1 BMI2 PCLMULQDQ POPCNT SSE SSE2 SSE3
Speed test
==========
Started at 2023-04-03 17:40:31
Operation                            | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean          | pop. stdev
------------------------------------ | ----------:| --------------:| ---------------:| ----------:| -------------------------:| ----------:
OQS_AES128_ECB_load+free_sch         |    4495481 |          1.000 |           0.222 |      0.416 |                       371 |         26
OQS_AES128_ECB_enc_sch               |    4961547 |          1.000 |           0.202 |      0.401 |                       294 |         24
OQS_AES128_ECB_enc                   |    2511140 |          1.000 |           0.398 |      0.490 |                       706 |         38
OQS_AES256_ECB_load+free_sch         |    5163286 |          1.000 |           0.194 |      0.396 |                       317 |         32
OQS_AES256_ECB_enc_sch               |    3699662 |          1.000 |           0.270 |      0.445 |                       400 |         45
OQS_AES256_ECB_enc                   |    2341165 |          1.000 |           0.427 |      0.495 |                       759 |         46
OQS_AES256_CTR_init+iv+free          |    4956465 |          1.000 |           0.202 |      0.402 |                       332 |         32
OQS_AES256_CTR_inc_stream_iv         |    7567083 |          1.000 |           0.132 |      0.339 |                       200 |         19
OQS_AES256_CTR_inc_stream_blks       |    7917212 |          1.000 |           0.126 |      0.332 |                       189 |         20
OQS_SHA2_sha256                      |    3303524 |          1.000 |           0.303 |      0.468 |                       524 |        165
OQS_SHA2_sha384                      |    2454523 |          1.000 |           0.407 |      0.492 |                       723 |         43
OQS_SHA2_sha512                      |    2401595 |          1.000 |           0.416 |      0.493 |                       739 |         40
OQS_SHA3_sha3_256                    |    2675547 |          1.000 |           0.374 |      0.492 |                       658 |        188
OQS_SHA3_sha3_384                    |    2671477 |          1.000 |           0.374 |      0.492 |                       659 |        190
OQS_SHA3_sha3_512                    |    2688687 |          1.000 |           0.372 |      0.491 |                       654 |        187
OQS_SHA3_shake128                    |    2577158 |          1.000 |           0.388 |      0.497 |                       685 |        213
OQS_SHA3_shake256                    |    2568989 |          1.000 |           0.389 |      0.498 |                       687 |        213
Ended at 2023-04-03 17:40:50

$ ./build-openssl/tests/speed_common 
Configuration info
==================
Target platform:  x86_64-Linux-5.19.0-38-generic
Compiler:         gcc (11.3.0)
Compile options:  [-Wa,--noexecstack;-Wstrict-overflow;-ggdb3;-Wbad-function-cast]
OQS version:      0.8.0-dev
Git commit:       d61d81c526da8bb62e363f5a75191689572151cb
OpenSSL enabled:  Yes (OpenSSL 3.0.2 15 Mar 2022)
AES:              NI
SHA-2:            OpenSSL
SHA-3:            OpenSSL
OQS build flags:  BUILD_SHARED_LIBS OQS_DIST_BUILD OQS_OPT_TARGET=generic CMAKE_BUILD_TYPE=Debug 
CPU exts active:  ADX AES AVX AVX2 BMI1 BMI2 PCLMULQDQ POPCNT SSE SSE2 SSE3
Speed test
==========
Started at 2023-04-03 17:40:58
Operation                            | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean          | pop. stdev
------------------------------------ | ----------:| --------------:| ---------------:| ----------:| -------------------------:| ----------:
OQS_AES128_ECB_load+free_sch         |    4487752 |          1.000 |           0.223 |      0.419 |                       372 |         93
OQS_AES128_ECB_enc_sch               |    5086516 |          1.000 |           0.197 |      0.400 |                       291 |         83
OQS_AES128_ECB_enc                   |    2507581 |          1.000 |           0.399 |      0.490 |                       707 |         33
OQS_AES256_ECB_load+free_sch         |    5149337 |          1.000 |           0.194 |      0.396 |                       312 |         27
OQS_AES256_ECB_enc_sch               |    3762857 |          1.000 |           0.266 |      0.442 |                       389 |         35
OQS_AES256_ECB_enc                   |    2328186 |          1.000 |           0.430 |      0.499 |                       763 |        119
OQS_AES256_CTR_init+iv+free          |    4915280 |          1.000 |           0.203 |      0.405 |                       334 |         86
OQS_AES256_CTR_inc_stream_iv         |    7548342 |          1.000 |           0.132 |      0.339 |                       200 |         20
OQS_AES256_CTR_inc_stream_blks       |    7887930 |          1.000 |           0.127 |      0.333 |                       190 |         31
OQS_SHA2_sha256                      |    3296287 |          1.000 |           0.303 |      0.471 |                       525 |        192
OQS_SHA2_sha384                      |    2417810 |          1.000 |           0.414 |      0.497 |                       734 |        125
OQS_SHA2_sha512                      |    2415321 |          1.000 |           0.414 |      0.497 |                       735 |        126
OQS_SHA3_sha3_256                    |    1901105 |          1.000 |           0.526 |      0.505 |                       946 |        136
OQS_SHA3_sha3_384                    |    1924791 |          1.000 |           0.520 |      0.506 |                       934 |        142
OQS_SHA3_sha3_512                    |    1916460 |          1.000 |           0.522 |      0.504 |                       940 |        129
OQS_SHA3_shake128                    |    1842362 |          1.000 |           0.543 |      0.499 |                       979 |         50
OQS_SHA3_shake256                    |    1882932 |          1.000 |           0.531 |      0.499 |                       958 |         44
Ended at 2023-04-03 17:41:17

@beldmit
Copy link
Contributor Author

beldmit commented Apr 3, 2023

So you have just a better implementation of SHA3, right? Is it yours implementation? (All my OpenSSL-related hats on) Could it be brought to OpenSSL? (All my OpenSSL-related hats off)

I'm aware of the following antipattern - in 3.0. using old methods like EVP_sha3_256() instead of fetching a digest object via EVP_MD_fetch and using a fetched object gives a performance penalty. See https://github.com/openssl/openssl/pull/20354/files#diff-86fbd44f96c9f78b9c4a27d8b5fb9d83aa787b9a7d92c6774d9172c5952b791f for example. If it is the case (and looks like it is), it's worth fixing

@baentsch
Copy link
Member

baentsch commented Apr 3, 2023

So you have just a better implementation of SHA3, right?

Define "better". "Faster": Probably. "More secure": Not sure.

Is it yours implementation?

Nope: This is basically https://github.com/XKCP/XKCP, right @jschanck ?

@beldmit
Copy link
Contributor Author

beldmit commented Apr 3, 2023

Sure, I mean "faster". I still think that the antipattern I mentioned is worth fixing, both for SHA3 and SHA2.

@jschanck
Copy link
Contributor

jschanck commented Apr 3, 2023

OpenSSL doesn't expose a streaming API for XOF output, which several PQ schemes require. The slowdown you're seeing is due to the XXX in our code here: https://github.com/open-quantum-safe/liboqs/blob/main/src/common/sha3/ossl_sha3.c#L146.

@beldmit
Copy link
Contributor Author

beldmit commented Apr 3, 2023

I agree, but it's a different problem.

@baentsch
Copy link
Member

baentsch commented Apr 4, 2023

it's a different problem.

Agreed -- but it's a rather significant one. There are several things, then:

Someone needs to "re-activate" the 2018 OpenSSL issue openssl/openssl#7894 then, right @beldmit ? My hunch is it explains a larger part of the speed differential (could be validated by using OpenSSL111, right?).

For the OpenSSL3 liboqs integration, just created #1427.

I'd thus suggest to close this issue then (and work on the two above).

@beldmit
Copy link
Contributor Author

beldmit commented Apr 4, 2023

I'm OK to reactivate openssl/openssl#7894, though it shouldn't be a new API (if possible). How do we reactivate it?

@baentsch
Copy link
Member

baentsch commented Apr 5, 2023

How do we reactivate it?

See openssl/openssl#7921 (comment).

@baentsch
Copy link
Member

@dstebila Can you remind me about the reason for closing this issue? When doing some spot checks before a possible 0.11.0 I noticed the problem still exists: Activating OpenSSL SHA3 slows down liboqs substantially (running "non-streaming" openssl v3.0.2 for example). Or is the "resolution" the documentation that liboqs prioritizes performance over security?

These default choices have been made to optimize the default performance of all algorithms. Changing them implies performance penalties.

@beldmit How are you dealing with this these days?

@baentsch baentsch reopened this Jul 27, 2024
@beldmit
Copy link
Contributor Author

beldmit commented Jul 30, 2024

We use OpenSSL implementation for potential FIPS's sake. But OpenSSL started proposing XOF API since then, I think that may give better performance.

On the other hand I doubt that correctly written digest code may be of any harm and it would be great to bring the faster code to OpenSSL, if possible

@SWilson4
Copy link
Member

SWilson4 commented Jul 30, 2024

After #1694, we have the XOF API integrated into liboqs, conditional on a sufficiently high OpenSSL version. It would be nice to run some comparisons and see how this impacts OpenSSL SHA3 performance in liboqs. I'll take a look later this week if time allows (of course, don't let that stop anybody else from picking it up in the meantime).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants