CORE-6: Pre-fetching algorithms #17003

Merged
merged 5 commits into from
Mar 15, 2024

Conversation

michael-redpanda
Contributor

@michael-redpanda michael-redpanda commented Mar 11, 2024

This change improves the performance of the crypto library by pre-fetching the message digest (MD) and HMAC algorithms rather than relying on an implicit fetch for each operation.

See the performance section in OpenSSL's docs

Fixes: https://github.com/redpanda-data/core-internal/issues/1152
Fixes: https://github.com/redpanda-data/core-internal/issues/1154

CORE-8

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.3.x
  • v23.2.x

Release Notes

  • none

@michael-redpanda
Contributor Author

Force push 213c15e:

  • Rebased off dev due to conflict
  • Added perf tests

@michael-redpanda
Contributor Author

Results:

non-FIPS:

182: single run iterations:    0
182: single run duration:      1.000s
182: number of runs:           5
182: number of cores:          1
182: random seed:              2394275564
182:
182: test                         iterations      median         mad         min         max      allocs       tasks        inst
182: openssl_perf.md5_1k           958464000     0.974ns     0.000ns     0.972ns     0.974ns       0.003       0.000         0.0
182: openssl_perf.sha256_1k       2172928000     0.461ns     0.000ns     0.461ns     0.461ns       0.004       0.000         0.0
182: openssl_perf.sha512_1k       1044480000     0.959ns     0.001ns     0.958ns     0.964ns       0.004       0.000         0.0
182: gnutls.md5_1k                 946176000     1.056ns     0.000ns     1.056ns     1.057ns       0.002       0.000         0.0
182: gnutls.sha256_1k             2406400000     0.416ns     0.000ns     0.415ns     0.419ns       0.002       0.000         0.0
182: gnutls.sha512_1k             1029120000     0.972ns     0.000ns     0.971ns     0.972ns       0.002       0.000         0.0
182: openssl_perf.hmac_sha256_1k  1004544000     0.997ns     0.000ns     0.997ns     1.000ns       0.015       0.000         0.0
182: openssl_perf.hmac_sha512_1k   562176000     1.768ns     0.001ns     1.767ns     1.770ns       0.015       0.000         0.0
182: gnutls.hmac_sha256_1k        1613824000     0.619ns     0.000ns     0.619ns     0.620ns       0.003       0.000         0.0
182: gnutls.hmac_sha512_1k         668672000     1.500ns     0.000ns     1.499ns     1.500ns       0.003       0.000         0.0

FIPS:

183: single run iterations:    0
183: single run duration:      1.000s
183: number of runs:           5
183: number of cores:          1
183: random seed:              3628882004
183:
183: test                         iterations      median         mad         min         max      allocs       tasks        inst
183: openssl_perf.sha256_1k       2163712000     0.463ns     0.000ns     0.463ns     0.464ns       0.004       0.000         0.0
183: openssl_perf.sha512_1k       1041408000     0.961ns     0.001ns     0.957ns     0.961ns       0.004       0.000         0.0
183: gnutls.md5_1k                 943104000     1.059ns     0.000ns     1.059ns     1.060ns       0.002       0.000         0.0
183: gnutls.sha256_1k             2404352000     0.416ns     0.000ns     0.416ns     0.417ns       0.002       0.000         0.0
183: gnutls.sha512_1k             1033216000     0.971ns     0.000ns     0.967ns     0.971ns       0.002       0.000         0.0
183: openssl_perf.hmac_sha256_1k   998400000     1.010ns     0.001ns     1.009ns     1.011ns       0.015       0.000         0.0
183: openssl_perf.hmac_sha512_1k   564224000     1.778ns     0.000ns     1.777ns     1.779ns       0.015       0.000         0.0
183: gnutls.hmac_sha256_1k        1624064000     0.617ns     0.000ns     0.616ns     0.617ns       0.003       0.000         0.0
183: gnutls.hmac_sha512_1k         670720000     1.495ns     0.001ns     1.492ns     1.496ns       0.003       0.000         0.0

@michael-redpanda
Contributor Author

[CORE-6]

@michael-redpanda
Contributor Author

Force push f8af602:

  • Seeing if JIRA integration works with changing initial commit to reference CORE-6

@michael-redpanda michael-redpanda changed the title Pre-fetching algorithms CORE-6: Pre-fetching algorithms Mar 13, 2024
@mergify mergify bot mentioned this pull request Mar 13, 2024
6 tasks

namespace internal {
EVP_MD* get_md(digest_type type) {
    // Map of pre-fetched MD pointers. This replaces the older way of getting
Member

older way

this being "implicit fetch"?

Contributor Author

Correct
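
For readers following the thread: an implicit fetch resolves the algorithm implementation each time it is used, while an explicit fetch resolves it once and reuses the handle (in OpenSSL 3 the explicit calls are `EVP_MD_fetch` and `EVP_MD_free`). Below is a minimal sketch of the caching pattern only. It is not the code in this PR: `fake_md`, `fake_fetch`, `fake_free`, `md_handle` and the `thread_local` cache are assumptions, stand-ins chosen so the example compiles without OpenSSL.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-ins for an algorithm handle and its fetch/free calls.
struct fake_md {
    std::string name;
};
int fetch_count = 0; // counts how often the (expensive) lookup runs

fake_md* fake_fetch(const std::string& name) {
    ++fetch_count;
    return new fake_md{name};
}
void fake_free(fake_md* md) { delete md; }

// Owning handle that releases the algorithm with the matching free call.
struct md_deleter {
    void operator()(fake_md* md) const { fake_free(md); }
};
using md_handle = std::unique_ptr<fake_md, md_deleter>;

enum class digest_type { MD5, SHA256, SHA512 };

const char* digest_name(digest_type t) {
    switch (t) {
    case digest_type::MD5:
        return "MD5";
    case digest_type::SHA256:
        return "SHA256";
    case digest_type::SHA512:
        return "SHA512";
    }
    throw std::invalid_argument("unknown digest_type");
}

// Returns a cached handle, fetching only the first time a type is requested.
fake_md* get_md(digest_type type) {
    static thread_local std::map<digest_type, md_handle> cache;
    auto it = cache.find(type);
    if (it == cache.end()) {
        md_handle fetched(fake_fetch(digest_name(type)));
        if (!fetched) {
            throw std::runtime_error("failed to fetch algorithm");
        }
        auto res = cache.emplace(type, std::move(fetched));
        assert(res.second); // the key was absent, so this must insert
        it = res.first;
    }
    return it->second.get();
}
```

Calling `get_md(digest_type::SHA256)` twice returns the same pointer and bumps `fetch_count` only once, which is the saving the pre-fetch is after.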

if (!md_ptr) {
    throw ossl_error(fmt::format("Failed to fetch algorithm {}", alg));
}
md_map.insert_or_assign(type, EVP_MD_ptr(md_ptr));
Member

[[maybe_unused]] auto res = md_map.insert_or_assign(type, EVP_MD_ptr(md_ptr));
assert(res.second); // otherwise we'd leak the fetched algo ptr

or just a comment, tho assert won't have any performance impact in release build.
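
As background for the suggestion above, `std::map::insert_or_assign` returns a `std::pair<iterator, bool>` whose `second` is `true` when a new entry was inserted and `false` when an existing one was overwritten. A small sketch with a hypothetical helper `put` (not part of the PR), assuming a map of owning pointers:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

using string_map = std::map<int, std::unique_ptr<std::string>>;

// Returns true when `key` was newly inserted, false when an existing
// entry was overwritten by the assignment.
bool put(string_map& m, int key, std::string value) {
    auto res = m.insert_or_assign(
      key, std::make_unique<std::string>(std::move(value)));
    return res.second;
}
```

With an empty map, `put(m, 1, "a")` reports an insertion and a following `put(m, 1, "b")` reports an overwrite, which is the case the proposed assertion is meant to catch.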

case digest_type::SHA512:
    return sha512_params;
default:
    vassert(false, "NOPE");
Member

Since this isn't a covering set, the assertion is nice. But maybe print out what it is, like MD5 looks like it could be the problem.

Contributor Author

ha, completely forgot to change that message... meant to go back and fix that
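
One way to act on this is to name the offending value in the failure message. The sketch below is illustrative only: it throws an exception instead of using `vassert` so that the message can be inspected, and `digest_name` and `block_size` are hypothetical helpers, not functions from the PR.

```cpp
#include <stdexcept>
#include <string>

enum class digest_type { MD5, SHA256, SHA512 };

const char* digest_name(digest_type t) {
    switch (t) {
    case digest_type::MD5:
        return "MD5";
    case digest_type::SHA256:
        return "SHA256";
    case digest_type::SHA512:
        return "SHA512";
    }
    return "unknown";
}

// Covers only a subset of digest_type on purpose; the default branch
// reports which value was rejected.
int block_size(digest_type t) {
    switch (t) {
    case digest_type::SHA256:
        return 64; // SHA-256 block size in bytes
    case digest_type::SHA512:
        return 128; // SHA-512 block size in bytes
    default:
        throw std::invalid_argument(
          std::string("unsupported digest type: ") + digest_name(t));
    }
}
```

Passing `digest_type::MD5` now fails with a message that contains "MD5" rather than a generic string.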

@michael-redpanda
Contributor Author

Force push e6bb97f:

  • Updated per PR comments

Member

@dotnwat dotnwat left a comment

lgtm

Comment on lines 61 to 63
[[maybe_unused]] auto res = md_map.insert_or_assign(
  type, EVP_MD_ptr(md_ptr));
vassert(
  res.second, "Failed to insert/create the fetched algorithm {}", alg);
Member

this is fine. just fyi, you don't need [[maybe_unused]] if you use vassert because vassert is always compiled in so res is always used. but with vanilla assert it is compiled out in Release builds so res is not used and you need the annotation.

Contributor Author

doh!
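
The point about the annotation can be shown side by side. In this sketch `ALWAYS_ASSERT` is a hypothetical always-on macro standing in for something like `vassert`; its shape is an assumption and does not reproduce the real macro's signature.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <set>

// Hypothetical check that is compiled in for every build type.
#define ALWAYS_ASSERT(cond, msg)                                   \
    do {                                                           \
        if (!(cond)) {                                             \
            std::fprintf(stderr, "assert failed: %s\n", msg);      \
            std::abort();                                          \
        }                                                          \
    } while (0)

void insert_debug_checked(std::set<int>& s, int v) {
    // assert() expands to nothing under NDEBUG, so without the attribute
    // `res` would be an unused variable in release builds.
    [[maybe_unused]] auto res = s.insert(v);
    assert(res.second);
}

void insert_always_checked(std::set<int>& s, int v) {
    // The check always reads `res`, so no attribute is needed.
    auto res = s.insert(v);
    ALWAYS_ASSERT(res.second, "duplicate value");
}
```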

CORE-6: Fixes this issue

Signed-off-by: Michael Boquard <michael@redpanda.com>

Similar to pre-fetching the MD, this will pre-fetch the HMAC provider.

Signed-off-by: Michael Boquard <michael@redpanda.com>

Signed-off-by: Michael Boquard <michael@redpanda.com>

Signed-off-by: Michael Boquard <michael@redpanda.com>

Added perf tests to compare GnuTLS vs OpenSSL both in and out of FIPS mode.

Signed-off-by: Michael Boquard <michael@redpanda.com>

@michael-redpanda
Contributor Author

Force push 7425d18:

  • Removed unnecessary [[maybe_unused]]

Member

@oleiman oleiman left a comment

lgtm

src/v/crypto/crypto.cc (resolved)
src/v/crypto/crypto.cc (resolved)
Comment on lines +144 to +150
static size_t test_body(size_t msg_len, F n) {
    auto buffer = random_generators::gen_alphanum_string(msg_len);
    for (auto i = inner_iters; i--;) {
        auto s = n(buffer);
        perf_tests::do_not_optimize(s);
    }
    perf_tests::stop_measuring_time();
Member

q: does the time measurement include the string generation?

Contributor Author

I believe it does
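
If the setup cost ever needs to be kept out of the measurement, one option is to start the clock only after the input has been built. The sketch below uses `std::chrono` rather than the perf test framework from the PR; `timing` and `time_split` are hypothetical names, and a constant-filled string stands in for random input generation.

```cpp
#include <chrono>
#include <cstddef>
#include <string>

struct timing {
    std::chrono::nanoseconds setup; // time spent building the input
    std::chrono::nanoseconds body;  // time spent in the measured loop
    std::size_t checksum;           // sum of results, keeps the loop live
};

// Times input generation and the measured loop separately so the two
// costs can be reported (or discarded) independently.
template<typename F>
timing time_split(std::size_t msg_len, std::size_t iters, F fn) {
    using clock = std::chrono::steady_clock;
    using std::chrono::duration_cast;
    using std::chrono::nanoseconds;

    auto t0 = clock::now();
    std::string buffer(msg_len, 'a'); // stand-in for random input
    auto t1 = clock::now();

    std::size_t sum = 0;
    for (std::size_t i = 0; i < iters; ++i) {
        sum += fn(buffer);
    }
    auto t2 = clock::now();

    return timing{
      duration_cast<nanoseconds>(t1 - t0),
      duration_cast<nanoseconds>(t2 - t1),
      sum};
}
```

For a 1 KiB buffer and many inner iterations the setup term is likely small next to the loop, but splitting it out makes that visible instead of assumed.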

Member

@oleiman oleiman left a comment

lgtm

auto buffer = random_generators::gen_alphanum_string(msg_len);
for (auto i = inner_iters; i--;) {
    auto s = n(buffer);
    perf_tests::do_not_optimize(s);
Contributor

should the entire call of n(buffer) be passed to do_not_optimize? Here it seems just its return value is
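
On the general mechanism: a barrier applied to a result forces the compiler to materialise that value, so the computation that produced it cannot simply be deleted as dead code. The sketch below illustrates this with a volatile store. `keep_alive`, `checksum` and `run` are hypothetical names, and this is a portable stand-in rather than how `perf_tests::do_not_optimize` is implemented.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// A volatile store is an observable side effect, so `value` has to be
// computed before it can be written.
void keep_alive(std::uint64_t value) {
    volatile std::uint64_t sink = value;
    (void)sink;
}

// Toy workload: sum of the bytes in `s`.
std::uint64_t checksum(const std::string& s) {
    std::uint64_t sum = 0;
    for (unsigned char c : s) {
        sum += c;
    }
    return sum;
}

// Runs the workload `iters` times, pinning each result with keep_alive.
std::uint64_t run(const std::string& buffer, std::size_t iters) {
    std::uint64_t last = 0;
    for (std::size_t i = 0; i < iters; ++i) {
        last = checksum(buffer);
        keep_alive(last);
    }
    return last;
}
```

Note that pinning the result does not by itself stop a compiler from hoisting a loop-invariant call out of the loop when it can prove the call is pure, so whether the return value alone is enough depends on how opaque the measured function is to the optimiser.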

@michael-redpanda michael-redpanda merged commit f7c03fd into redpanda-data:dev Mar 15, 2024
18 checks passed