Discrepancies between T::provider() and T::providers() #3837

Open
securitykernel opened this issue Dec 14, 2023 · 3 comments
securitykernel (Collaborator) commented Dec 14, 2023

I was looking into adding output of the provider (= hardware acceleration) used for each algorithm benchmarked by bench.py, and noticed that the provider is only printed for cipher modes, e.g. AES-128/GCM ("clmul" in my case), but not for, e.g., hash functions. From looking into the speed CLI, hash functions and cipher modes are benchmarked differently:

  • For hash functions (as well as XOFs, block ciphers, stream ciphers, and MACs), bench_providers_of() calls T::providers() and then runs benchmarks for every provider returned
  • For all other algorithms, a single T is instantiated directly and benchmarked, letting the library choose the best available implementation; the output of T::provider() is then printed along with T's name

The first question is: what is the reasoning behind these two different types of benchmarking? For running bench.py to compare against OpenSSL, only the second option is really useful, since you want both Botan and OpenSSL to choose their best available implementation and then benchmark those against each other. I don't see much value in separately benchmarking, e.g., the clmul, pmull, and software implementations of OpenSSL and Botan against each other. What does make sense is benchmarking different implementations inside Botan against each other, for example to check how much a hardware-accelerated implementation improves on the software one. To accommodate that, we could add an optional --all-providers switch to the speed CLI implementing the first option, while the default mode would implement the second option (uniformly for all algorithms). Even the --all-providers mode wouldn't currently work as intended, though, since it would only ever run at most the {"base", "commoncrypto"} benchmarks; see below.

Looking further into why "armv8" was not printed as a provider for SHA-256 on my system by the speed CLI, I found that HashFunction::providers() only ever returns at most {"base", "commoncrypto"}. But SHA_256::provider() returns "armv8" on my system, which made me wonder: how are these two functions related? The value of SHA_256::provider() is selected at compile time based on which hardware acceleration is available, but HashFunction::providers() is hardcoded and never consults provider() at any point.

randombit (Owner) commented

This got messed up (I think in changes just prior to 2.0); previously "base" was precisely the baseline implementation, and to get e.g. "sse2" SHA-1 you would request it explicitly, and in doing so you got a completely different class than the baseline. That is, there was a 1:1 mapping between provider strings and the classes implementing that algorithm. But now "base" can in fact be implemented via hardware acceleration, so the mapping is instead 1:N.

We could address this by changing providers() to always return "base", and then adding some additional getter that exposes the implementation-specific detail (e.g. "aes_ni" vs "ssse3").

> The first question is, what is the reasoning behind these two different types of benchmarking?

Really this is a historical thing from when we had an OpenSSL-based provider. It might still be useful for someone using CommonCrypto, IDK.

We might actually consider dropping the whole provider notion entirely. Instead of a CommonCrypto provider (or OpenSSL or ...) that provides X and Y, there is a CommonCrypto implementation of algorithms X and Y which live as submodules within X and Y just as we have for hardware accel, etc. IDK.

[It's also worth looking at if CommonCrypto is actually faster than Botan on macOS and/or iOS systems, if not we could consider dropping the whole thing]

> It makes more sense to be able to benchmark different implementations inside Botan against each other in order to check improvements of hardware-accelerated implementations against software implementations for example.

You can do this with BOTAN_CLEAR_CPUID but it's a bit fiddly.
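For reference, a sketch of that approach using the environment variable mentioned above (the feature-bit name here is illustrative; the exact names depend on the CPUID bits your build recognizes):

```shell
# Benchmark the default (best available) SHA-256 implementation...
./botan speed SHA-256
# ...then clear the ARMv8 SHA-2 CPUID bit so the software fallback is used.
# "armv8sha2" is a placeholder; substitute the bit name your build uses.
BOTAN_CLEAR_CPUID=armv8sha2 ./botan speed SHA-256
```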

securitykernel (Collaborator, author) commented

> We could address this by changing the providers to always return "base", and then add some additional getter that specifies implementation specific detail (eg "aes_ni" vs "ssse3").

I'll give this a try.

> We might actually consider dropping the whole provider notion entirely. Instead of a CommonCrypto provider (or OpenSSL or ...) that provides X and Y, there is a CommonCrypto implementation of algorithms X and Y which live as submodules within X and Y just as we have for hardware accel, etc. IDK.

This would mean we would also split out all the PKCS#11 types from the one central place in src/lib/pkcs11 into all the algorithm-specific modules, yuck. Though PKCS#11 is probably a special case: since the current implementation does not allow selecting a PKCS#11 implementation via any provider string anyway, it really isn't a provider at all and could simply be a module like any other. Same with TPM, I guess.

I can at least say that, even given the massive usage of Botan in our company, I've never seen anyone use the provider interface by passing a provider string to any of the T::create() calls. But that may be totally different in other projects.

From looking at the ToDo list:

  • /dev/crypto provider (ciphers, hashes)
  • Windows CryptoNG provider (ciphers, hashes)
  • Extend Apple CommonCrypto provider (HMAC, CMAC, RSA, ECDSA, ECDH)

Are there still plans to add these?

securitykernel (Collaborator, author) commented

> [It's also worth looking at if CommonCrypto is actually faster than Botan on macOS and/or iOS systems, if not we could consider dropping the whole thing]

Benchmarks from a MacBook Pro with M2 Pro:

```
$ ./botan speed SHA-1 SHA-224 SHA-256 SHA-384 SHA-512 AES-128 AES-192 AES-256 Blowfish CAST-128 DES TripleDES
SHA-1 hash buffer size 1024 bytes: 2063.318 MiB/sec (1031.659 MiB in 500.000 ms)
SHA-1 [commoncrypto] hash buffer size 1024 bytes: 1484.645 MiB/sec (742.322 MiB in 500.000 ms)
SHA-224 hash buffer size 1024 bytes: 1999.951 MiB/sec (999.976 MiB in 500.000 ms)
SHA-224 [commoncrypto] hash buffer size 1024 bytes: 2058.053 MiB/sec (1029.026 MiB in 500.000 ms)
SHA-256 hash buffer size 1024 bytes: 1998.535 MiB/sec (999.268 MiB in 500.000 ms)
SHA-256 [commoncrypto] hash buffer size 1024 bytes: 2053.080 MiB/sec (1026.540 MiB in 500.000 ms)
SHA-384 hash buffer size 1024 bytes: 1256.502 MiB/sec (628.251 MiB in 500.000 ms)
SHA-384 [commoncrypto] hash buffer size 1024 bytes: 1150.582 MiB/sec (575.291 MiB in 500.000 ms)
SHA-512 hash buffer size 1024 bytes: 1254.117 MiB/sec (627.059 MiB in 500.000 ms)
SHA-512 [commoncrypto] hash buffer size 1024 bytes: 1176.537 MiB/sec (588.269 MiB in 500.000 ms)
AES-128 encrypt buffer size 1024 bytes: 13615.801 MiB/sec (6807.900 MiB in 500.000 ms)
AES-128 decrypt buffer size 1024 bytes: 14733.504 MiB/sec (7366.752 MiB in 500.000 ms)
AES-128 [commoncrypto] encrypt buffer size 1024 bytes: 10069.283 MiB/sec (5034.642 MiB in 500.000 ms)
AES-128 [commoncrypto] decrypt buffer size 1024 bytes: 11121.605 MiB/sec (5560.803 MiB in 500.000 ms)
AES-192 encrypt buffer size 1024 bytes: 14077.303 MiB/sec (7038.651 MiB in 500.000 ms)
AES-192 decrypt buffer size 1024 bytes: 13011.605 MiB/sec (6505.803 MiB in 500.000 ms)
AES-192 [commoncrypto] encrypt buffer size 1024 bytes: 9206.490 MiB/sec (4603.245 MiB in 500.000 ms)
AES-192 [commoncrypto] decrypt buffer size 1024 bytes: 9207.026 MiB/sec (4603.550 MiB in 500.004 ms)
AES-256 encrypt buffer size 1024 bytes: 10885.150 MiB/sec (5442.575 MiB in 500.000 ms)
AES-256 decrypt buffer size 1024 bytes: 11382.822 MiB/sec (5691.411 MiB in 500.000 ms)
AES-256 [commoncrypto] encrypt buffer size 1024 bytes: 10478.533 MiB/sec (5239.267 MiB in 500.000 ms)
AES-256 [commoncrypto] decrypt buffer size 1024 bytes: 8091.938 MiB/sec (4045.969 MiB in 500.000 ms)
Blowfish encrypt buffer size 1024 bytes: 387.832 MiB/sec (193.916 MiB in 500.000 ms)
Blowfish decrypt buffer size 1024 bytes: 392.916 MiB/sec (196.459 MiB in 500.002 ms)
Blowfish [commoncrypto] encrypt buffer size 1024 bytes: 261.384 MiB/sec (130.692 MiB in 500.001 ms)
Blowfish [commoncrypto] decrypt buffer size 1024 bytes: 260.496 MiB/sec (130.248 MiB in 500.001 ms)
CAST-128 encrypt buffer size 1024 bytes: 241.127 MiB/sec (120.564 MiB in 500.004 ms)
CAST-128 decrypt buffer size 1024 bytes: 237.987 MiB/sec (118.994 MiB in 500.002 ms)
CAST-128 [commoncrypto] encrypt buffer size 1024 bytes: 235.566 MiB/sec (117.783 MiB in 500.000 ms)
CAST-128 [commoncrypto] decrypt buffer size 1024 bytes: 234.301 MiB/sec (117.150 MiB in 500.000 ms)
DES encrypt buffer size 1024 bytes: 118.748 MiB/sec (59.374 MiB in 500.001 ms)
DES decrypt buffer size 1024 bytes: 119.705 MiB/sec (59.853 MiB in 500.001 ms)
DES [commoncrypto] encrypt buffer size 1024 bytes: 111.320 MiB/sec (55.660 MiB in 500.001 ms)
DES [commoncrypto] decrypt buffer size 1024 bytes: 111.392 MiB/sec (55.696 MiB in 500.001 ms)
TripleDES encrypt buffer size 1024 bytes: 42.575 MiB/sec (21.288 MiB in 500.011 ms)
TripleDES decrypt buffer size 1024 bytes: 42.453 MiB/sec (21.227 MiB in 500.006 ms)
TripleDES [commoncrypto] encrypt buffer size 1024 bytes: 35.774 MiB/sec (17.888 MiB in 500.026 ms)
TripleDES [commoncrypto] decrypt buffer size 1024 bytes: 35.779 MiB/sec (17.890 MiB in 500.005 ms)
```

So in all cases except SHA-224 and SHA-256, CommonCrypto is slower than Botan's implementation.
