Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor: Final iteration on load/store #3869

Merged

Conversation

reneme
Copy link
Collaborator

@reneme reneme commented Jan 3, 2024

Description

This should be the final iteration on the big/little endian helpers for now (modulo the eventual removal of the legacy ptr-based overloads).

The patch fulfills three main objectives:

  1. Remove the inherent code duplication of the load/store overloads for big-endian and little endian
    This is achieved by introducing detail::load_any<> and detail::store_any<> with a template parameter Endianness. By passing this all the way to the lowest-level overload, we can do the distinction once and implement convenience overloads without duplicating them each time (+more duplication for legacy overloads)
  2. Support strong types and enums that wrap unsigned integers
    This allows for things like: load_be<MyIntStrongType>(some_bytes), or store_be(some_enum_value). It makes working with strong types and enums more convenient when loading/storing them.
  3. Allow loading/storing collections of unsigned integers
    Before, one could load/store multiple variables at once (passed as variadic parameters). Now, its possible to pass ranges of integers (similarly to typecast_copy()). That is useful initializing and extracting internal hash states (that are collections of uint64_t), for instance.
  4. constexpr
    The whole range of features described above may also be used at compile-time. For instance, to initialize algorithm parameters that need a certain endianness.

Quick and Dirty Benchmark

Running ./botan speed both on the feature branch and on the master commit it is based on, both compiled with clang-15 on ubuntu 22.04 (running in WSL). Is that a reasonable smoke test?

Many algorithms actually seem to benefit from this patch (without modifying the used load/store overloads). For instance, AES encryption/decryption (using NI extensions) clocks in at a 7% speedup. Is that even realistic? 😲 Keccak seems to suffer slightly. For instance, the SHAKE throughput is about 2% less.

Here's the full data set: bench_loadstore_refactoring.txt

@reneme reneme force-pushed the chore/more_loadstore_convenience branch 2 times, most recently from cab16ed to 5ad0f31 Compare January 3, 2024 15:27
@coveralls
Copy link

coveralls commented Jan 3, 2024

Coverage Status

coverage: 92.048% (+0.04%) from 92.006%
when pulling dde4cba on Rohde-Schwarz:chore/more_loadstore_convenience
into 13c7e5f on randombit:master.

@reneme reneme force-pushed the chore/more_loadstore_convenience branch from 5ad0f31 to 4c9f780 Compare January 3, 2024 17:14
@reneme reneme force-pushed the chore/more_loadstore_convenience branch from 4c9f780 to 530ad51 Compare January 4, 2024 08:53
@reneme reneme changed the title Chore: More loadstor.h modernization Chore: Replace copy_out_{le/be} where possible Jan 4, 2024
@reneme reneme changed the title Chore: Replace copy_out_{le/be} where possible Chore: Replace copy_out_{le/be} (loadstor.h) where possible Jan 4, 2024
@reneme reneme added this to the Botan 3.4.0 milestone Jan 4, 2024
@reneme reneme marked this pull request as ready for review January 4, 2024 09:59
@reneme reneme mentioned this pull request Jan 4, 2024
@reneme reneme force-pushed the chore/more_loadstore_convenience branch from cb8089a to 8aa12b3 Compare January 10, 2024 16:36
@reneme reneme marked this pull request as draft January 10, 2024 16:37
@reneme reneme force-pushed the chore/more_loadstore_convenience branch 2 times, most recently from 7c3c0f1 to bc377d6 Compare January 11, 2024 14:57
@reneme reneme changed the title Chore: Replace copy_out_{le/be} (loadstor.h) where possible Refactor: Final iteration on load/store Jan 11, 2024
@reneme reneme force-pushed the chore/more_loadstore_convenience branch 4 times, most recently from c5f52a1 to a177713 Compare January 11, 2024 17:11
@reneme reneme marked this pull request as ready for review January 11, 2024 17:22
@reneme reneme force-pushed the chore/more_loadstore_convenience branch from 675cf4c to 2bd8195 Compare January 17, 2024 07:38
@reneme
Copy link
Collaborator Author

reneme commented Jan 17, 2024

Rebased after #3888.

@reneme reneme mentioned this pull request Jan 25, 2024
14 tasks
reneme and others added 3 commits February 16, 2024 16:15
This refactors the load/store helpers to duplicating all overloads,
for big-endian and little-endian functionality. We hope to reduce the
complexity and maintenance burden of this code.

As a side-effect this also centralizes some of the pre-processor defines
into low-level helper methods. The actual logic now uses `if constexpr`
based on those helpers. Also, the support for legacy overloads (using
pointers or C-arrays) produces much less clutter now.

Additionally, the load/store helpers can now handle strong types that
wrap unsigned integer values.

Co-Authored-By: Fabian Albert <fabian.albert@rohde-schwarz.com>
This allows loading and storing of entire collections of unsigned
integers from/into byte ranges. That can be used to extract a
digest from the hash function's internal state array, for instance.

Co-Authored-By: Fabian Albert <fabian.albert@rohde-schwarz.com>
@reneme reneme force-pushed the chore/more_loadstore_convenience branch from 2bd8195 to 74012f9 Compare February 16, 2024 15:23
@reneme
Copy link
Collaborator Author

reneme commented Feb 16, 2024

Rebased and resolved conflicts after #3908 got merged.

src/lib/utils/loadstor.h Outdated Show resolved Hide resolved
Copy link
Owner

@randombit randombit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m a little skeptical of this change simply because it’s +900/-300 just to keep feature parity, and the new code is rather more complicated than what came before, which does not seem like a simplification so much to me 🙂 but I can see the benefits re strong types and consistent constexpr.

As a smoke test botan speed seems to me fine, but 7% for AES-NI seems quite surprising! For one, because AES-NI uses SIMD_4x32::load_le which doesn’t use this load infrastructure at all - it basically is just a _mm_loadu_si128. (And likewise for stores.)

I’ll look into benchmarks on my machine asap. Approving in so far as - assuming we can clearly demonstrate no significant regressions - change seems ok.

@reneme
Copy link
Collaborator Author

reneme commented Mar 19, 2024

I'd like to argue that the this adds almost 400 lines of test code. 🙂 However, admittedly, the code is denser than before, that's for sure.

@randombit
Copy link
Owner

I checked times of this PR vs master on some algorithms that seemed likely to be impacted by performance of load/store (especially hash functions), nothing notable so ok to merge.

@reneme reneme merged commit 94b1e77 into randombit:master Mar 21, 2024
39 checks passed
@reneme reneme deleted the chore/more_loadstore_convenience branch March 21, 2024 18:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants