Performance: Over a third of CPU usage is unaccelerated cryptographic operations on arm64 #16105

Closed
asheldon opened this issue Apr 21, 2021 · 5 comments · Fixed by #16104

asheldon commented Apr 21, 2021

On arm64, the BoringSSL dependency used by Envoy does not use the CPU's cryptographic acceleration and falls back to its unaccelerated (nohw) software implementations.

Description

We profiled Envoy on AWS Graviton2 (arm64) and AWS c5 (amd64) instances running the same workload at the same time. On Graviton2, we spend over a third of our total CPU time in BoringSSL's nohw methods; on amd64, we spend virtually none in the corresponding methods. For example, CRYPTO_gcm128_encrypt_ctr32 accounts for 0.0091% of our CPU time on amd64, but a whopping 12.22% on Graviton2.

We believe the issue is that the fork of BoringSSL that Envoy relies on lacks direct support for the arm64 architecture.

pprof output

Top Methods, amd64

__nss_passwd_lookup
tcmalloc::SLL_TryPop
Envoy::Http::HeaderMapImpl::StaticLookupTable::lookup
operator delete

Top Methods, arm64

gcm_mul64_nohw
aes_nohw_xor
gcm_polyval_nohw
aes_nohw_sub_bytes
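
To put a number on how much of a profile falls into the unaccelerated paths, one can sum the flat percentages of every symbol whose name contains `nohw`. The sketch below assumes a hypothetical list of (symbol, percent) rows, for example copied by hand from a profiler's top-functions listing. The function name `nohw_share` is made up for illustration, and the percentages in `sample_rows` are placeholders, not values from the profiles in this issue.

```python
def nohw_share(rows):
    """Sum the flat CPU percentage of symbols whose name contains 'nohw'."""
    return sum(pct for symbol, pct in rows if "nohw" in symbol)


# Placeholder data for illustration only; these are NOT measured values.
sample_rows = [
    ("gcm_mul64_nohw", 10.0),
    ("aes_nohw_xor", 8.0),
    ("gcm_polyval_nohw", 7.0),
    ("aes_nohw_sub_bytes", 6.0),
    ("tcmalloc::SLL_TryPop", 2.0),
]

if __name__ == "__main__":
    print(f"nohw share: {nohw_share(sample_rows):.2f}%")
```

Matching on the substring `nohw` is only a naming heuristic, but it lines up with the symbol names listed above.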

Flame Graphs

AWS c5

[Screenshot: flame graph, AWS c5]

AWS Graviton2

[Screenshot: flame graph, AWS Graviton2]

AWS Graviton2, filtered to nohw

[Screenshot: flame graph, AWS Graviton2, filtered to nohw]

Repro steps:

  • Build Envoy for arm64 and run traffic through it on Graviton2
    • In our case, we compiled with -march=armv8.2-a+crypto
  • Profile Envoy
  • Observe significant CPU spent in unaccelerated cryptographic operations
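
When reproducing this, it can help to first confirm that the host actually advertises the Armv8 crypto extensions, so that a hardware limitation can be ruled out. On arm64 Linux, `/proc/cpuinfo` lists them in its `Features` line as `aes` and `pmull` (among others). A minimal sketch, assuming a Linux host; the helper names `cpu_features` and `has_crypto_extensions` are hypothetical:

```python
import os


def cpu_features(cpuinfo_text):
    """Return the set of flags from the first 'Features' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            return set(value.split())
    return set()


def has_crypto_extensions(cpuinfo_text):
    """True if both AES and PMULL (the instructions AES-GCM relies on) are advertised."""
    return {"aes", "pmull"} <= cpu_features(cpuinfo_text)


if __name__ == "__main__":
    path = "/proc/cpuinfo"
    if os.path.exists(path):
        with open(path) as f:
            print("crypto extensions present:", has_crypto_extensions(f.read()))
    else:
        print(f"{path} not found; this check assumes Linux")
```

On amd64 the corresponding line is called `flags`, so this helper reports `False` there; it is meant for the arm64 side of the comparison only.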
asheldon added the bug and triage labels on Apr 21, 2021
@jensengrey

Thanks @asheldon, folks on the Envoy team are currently testing this patch: #16104

@PiotrSikora
Contributor

@asheldon if you could build Envoy from #16104 and verify that the performance improved, that would be great!

@asheldon
Author

Do you have a branch based on Envoy 1.17.2 that I can build? We aren't on 1.18 yet, but this patch looks straightforward enough to backport and verify.

@PiotrSikora
Contributor

@asheldon I don't, but you can simply download the updated bazel/boringssl_static.patch.

@asheldon
Author

I think this patch works.

I haven't put this into production, so I can't share a comparable profile or comment on performance improvements yet. I can confirm, though, that my non-production profiles no longer contain any methods with nohw in their names, and that methods like aes_hw_ctr32_encrypt_blocks now appear.
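
This kind of check can be automated against a list of symbol names taken from a profile. A minimal sketch: the function names and the sample lists are hypothetical, and matching on the `nohw` and `_hw_` substrings is a naming heuristic based on the symbols seen in this thread, not something BoringSSL guarantees.

```python
def classify_crypto_symbols(symbols):
    """Split symbol names into software-fallback ('nohw') and hardware ('_hw_') groups."""
    nohw = sorted(s for s in symbols if "nohw" in s)
    hw = sorted(s for s in symbols if "_hw_" in s and "nohw" not in s)
    return nohw, hw


def looks_accelerated(symbols):
    """Heuristic: no nohw symbols remain and at least one _hw_ symbol appears."""
    nohw, hw = classify_crypto_symbols(symbols)
    return not nohw and bool(hw)


if __name__ == "__main__":
    # Hypothetical symbol lists standing in for "before" and "after" profiles.
    before = ["gcm_mul64_nohw", "aes_nohw_xor", "operator delete"]
    after = ["aes_hw_ctr32_encrypt_blocks", "operator delete"]
    print("before patch accelerated:", looks_accelerated(before))
    print("after patch accelerated:", looks_accelerated(after))
```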

asraa added the area/tls label and removed the triage label on Apr 23, 2021
htuch pushed a commit that referenced this issue May 2, 2021
Fixes #16105.

Signed-off-by: Piotr Sikora <piotrsikora@google.com>
gokulnair pushed a commit to gokulnair/envoy that referenced this issue May 6, 2021
Fixes envoyproxy#16105.

Signed-off-by: Piotr Sikora <piotrsikora@google.com>
Signed-off-by: Gokul Nair <gnair@twitter.com>