Refactor: Final iteration on load/store #3869
Conversation
Rebased after #3888.
This refactors the load/store helpers to avoid duplicating all overloads for big-endian and little-endian functionality. We hope to reduce the complexity and maintenance burden of this code. As a side effect, this also centralizes some of the preprocessor defines into low-level helper methods; the actual logic now uses `if constexpr` based on those helpers. Also, the support for legacy overloads (using pointers or C-arrays) produces much less clutter now. Additionally, the load/store helpers can now handle strong types that wrap unsigned integer values. Co-Authored-By: Fabian Albert <fabian.albert@rohde-schwarz.com>
This allows loading and storing of entire collections of unsigned integers from/into byte ranges. That can be used to extract a digest from the hash function's internal state array, for instance. Co-Authored-By: Fabian Albert <fabian.albert@rohde-schwarz.com>
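As a rough sketch of what such a range-based store enables (the function name `store_words_le` and its signature are illustrative only, not the actual API added in this PR):

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative only: serialize a range of 64-bit words into a byte
// buffer, least-significant byte first. The real helpers in this PR
// work on ranges; plain pointers are used here to keep the sketch
// self-contained.
inline void store_words_le(uint8_t* out, const uint64_t* words, size_t count) {
   // The caller must provide exactly 8 output bytes per input word.
   for(size_t i = 0; i < count; ++i) {
      for(size_t b = 0; b < 8; ++b) {
         out[8 * i + b] = static_cast<uint8_t>(words[i] >> (8 * b));
      }
   }
}
```

A hash implementation could use something like this to copy its whole `uint64_t` state into the output digest in one call instead of word by word.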
Rebased and resolved conflicts after #3908 got merged.
I’m a little skeptical of this change simply because it’s +900/-300 just to keep feature parity, and the new code is rather more complicated than what came before, which does not seem like a simplification so much to me 🙂 but I can see the benefits re strong types and consistent `constexpr`.

As a smoke test, `botan speed` seems fine to me, but 7% for AES-NI seems quite surprising! For one, because AES-NI uses `SIMD_4x32::load_le`, which doesn’t use this load infrastructure at all; it is basically just a `_mm_loadu_si128`. (And likewise for stores.)

I’ll look into benchmarks on my machine ASAP. Approving insofar as, assuming we can clearly demonstrate no significant regressions, the change seems OK.
I'd like to argue that this adds almost 400 lines of test code. 🙂 However, admittedly, the code is denser than before, that's for sure.
I checked the timings of this PR vs. master on some algorithms that seemed likely to be affected by load/store performance (especially hash functions); nothing notable, so OK to merge.
Description
This should be the final iteration on the big/little endian helpers for now (modulo the eventual removal of the legacy ptr-based overloads).
The patch fulfills three main objectives:

1. Avoiding duplicated overloads for big- and little-endian functionality: this is achieved by introducing `detail::load_any<>` and `detail::store_any<>` with a template parameter `Endianness`. By passing this all the way down to the lowest-level overload, we can make the distinction once and implement the convenience overloads without duplicating them each time (plus further duplication for the legacy overloads).
2. More convenient handling of strong types and enums: this allows for things like `load_be<MyIntStrongType>(some_bytes)` or `store_be(some_enum_value)` when loading/storing them.
3. Loading/storing entire ranges of integers: before, one could load/store multiple variables at once (passed as variadic parameters). Now it is possible to pass ranges of integers (similarly to `typecast_copy()`). That is useful for initializing and extracting internal hash states (which are collections of `uint64_t`), for instance.

The whole range of features described above may also be used at compile time, for instance to initialize algorithm parameters that need a certain endianness.
Quick and Dirty Benchmark
Running `./botan speed` both on the feature branch and on the `master` commit it is based on, both compiled with clang-15 on Ubuntu 22.04 (running in WSL). Is that a reasonable smoke test?

Many algorithms actually seem to benefit from this patch (without modifying the load/store overloads they use). For instance, AES encryption/decryption (using the NI extensions) clocks in at a 7% speedup. Is that even realistic? 😲 Keccak seems to suffer slightly; for instance, SHAKE throughput is about 2% lower.
Here's the full data set: bench_loadstore_refactoring.txt