Reduce number of copies in ssl transport layer #14058

Closed
euroelessar opened this Issue Jan 18, 2018 · 2 comments

euroelessar commented Jan 18, 2018

What version of gRPC and what language are you using?

1.8.4, grpc-core (proprietary binding)

What operating system (Linux, Windows, …) and version?

Linux, Ubuntu 16.04

What runtime / compiler are you using (e.g. python version or version of gcc)

gcc-4.9

What did you do?

Ran a load test with the following setup (and the other way around):

  1. gRPC is used with TLS encryption enabled
  2. A single completion queue is used on both server and client
  3. Server and client run on different machines in a datacenter
  4. The client concurrently sends 16 MB requests to the server in a loop
  5. The server responds with an empty body

What did you expect to see?

Both server and client spend most of their time in the TCP stack, transferring data to and from it, and doing the actual encryption. The setup should be able to handle ~600-800 MBps per CPU core. (Non-TLS gRPC handles ~900 MBps per CPU core on the same hardware under the same load conditions.)

What did you see instead?

The server spends 80% of its CPU time inside BIO_read (according to pprof), and throughput is capped at 50 MBps (a ~20x degradation compared to running with encryption disabled).

Anything else we should know about your project / environment?

Apparently ssl_transport_security uses BIO_s_mem, which has potentially O(n^2) runtime complexity when all data is drained from the buffer through repeated BIO_read calls.
This is because the read is implemented as:

  1. read a prefix from the buffer
  2. move the leftover to the beginning of the buffer
  3. if the buffer is not empty, go to 1
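
The three steps above can be modelled with a small Python sketch that counts how many bytes get copied while draining a buffer. This is only a model of the loop as described in this issue, not OpenSSL's actual code; the function name `drain_with_shift` and the tiny sizes are made up for illustration.

```python
def drain_with_shift(total_bytes: int, chunk: int) -> int:
    """Drain a flat buffer in `chunk`-sized reads, moving the leftover to
    the front after every read. Returns the total number of bytes copied.

    Hypothetical model of the read loop described above, not OpenSSL code.
    """
    buf = bytearray(total_bytes)
    copied = 0
    while buf:
        n = min(chunk, len(buf))
        _prefix = bytes(buf[:n])  # step 1: read a prefix out of the buffer
        copied += n
        buf = buf[n:]             # step 2: leftover is copied to the front
        copied += len(buf)
        # step 3: loop again while the buffer is not empty
    return copied


if __name__ == "__main__":
    for size in (1024, 2048, 4096):
        print(size, drain_with_shift(size, 16))
```

With a fixed read size, doubling the amount of buffered data roughly quadruples the bytes copied (33280 vs 132096 for 1024 vs 2048 bytes here), which is the quadratic growth mentioned above.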

This can be mitigated by using BIO_pair instead, as it uses a ring buffer and therefore bounds the number of copies performed. With that change, the test described above reaches a throughput of 300 MBps.
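
To illustrate why a ring buffer bounds the copies, here is a minimal Python sketch. It is a generic ring buffer, not the BIO pair implementation; `RingBuffer`, `pump`, and the sizes used are hypothetical names and values chosen for the example. Each byte is copied once on the way in and once on the way out, and nothing is shifted after a read.

```python
class RingBuffer:
    """Fixed-capacity byte ring buffer that counts copied bytes.

    Hypothetical illustration only, not OpenSSL's BIO pair.
    """

    def __init__(self, capacity: int) -> None:
        self._buf = bytearray(capacity)
        self._head = 0   # index of the next byte to read
        self._size = 0   # number of bytes currently stored
        self.copied = 0  # total bytes copied in and out

    def write(self, data: bytes) -> int:
        cap = len(self._buf)
        n = min(len(data), cap - self._size)  # accept only what fits
        tail = (self._head + self._size) % cap
        first = min(n, cap - tail)            # part before the wrap-around
        self._buf[tail:tail + first] = data[:first]
        self._buf[0:n - first] = data[first:n]
        self._size += n
        self.copied += n
        return n

    def read(self, max_bytes: int) -> bytes:
        cap = len(self._buf)
        n = min(max_bytes, self._size)
        first = min(n, cap - self._head)      # part before the wrap-around
        out = bytes(self._buf[self._head:self._head + first])
        out += bytes(self._buf[0:n - first])
        self._head = (self._head + n) % cap   # advance; no leftover is moved
        self._size -= n
        self.copied += n
        return out


def pump(total_bytes: int, chunk: int, capacity: int) -> int:
    """Push `total_bytes` through the ring and return the bytes copied."""
    rb = RingBuffer(capacity)
    payload = bytes(total_bytes)
    sent = received = 0
    while received < total_bytes:
        sent += rb.write(payload[sent:sent + capacity])
        received += len(rb.read(chunk))
    return rb.copied
```

In this model `pump(n, chunk, capacity)` always copies exactly `2 * n` bytes, so the cost grows linearly with the amount of data regardless of the read size.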

Further gains can be achieved by implementing a zero-copy protector interface so that slices are passed to BoringSSL directly through a custom BIO implementation (similar to Chrome's approach).
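
A rough Python sketch of the slice-passing idea: the queue keeps references to the caller's buffers and hands out views into them, so no bytes are copied into an intermediate buffer. `SliceQueue` is a hypothetical stand-in for such a custom BIO; it is not gRPC's, BoringSSL's, or Chrome's code.

```python
from collections import deque


class SliceQueue:
    """Queue of buffer views; reads return views, not copies.

    Hypothetical stand-in for a slice-based BIO, for illustration only.
    """

    def __init__(self) -> None:
        self._slices = deque()

    def push(self, data) -> None:
        view = memoryview(data)  # references the caller's buffer, no copy
        if len(view):
            self._slices.append(view)

    def pop(self, max_bytes: int) -> list:
        """Return up to `max_bytes` as a list of views into pushed buffers."""
        out = []
        while max_bytes > 0 and self._slices:
            head = self._slices[0]
            if len(head) <= max_bytes:
                out.append(self._slices.popleft())
                max_bytes -= len(head)
            else:
                out.append(head[:max_bytes])           # split without copying
                self._slices[0] = head[max_bytes:]
                max_bytes = 0
        return out
```

A consumer that can work on a list of slices (for example, an encryption routine that accepts scattered input) would then touch each byte only once.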

hsaliak commented Feb 15, 2018

@jboeuf @jiangtaoli2016 this was brought up in the community meeting today. Could you please take a look and let us know what more we can do here to improve performance?

jboeuf removed their assignment Feb 15, 2018

jiangtaoli2016 commented May 16, 2018

Given #14060 has been merged, I think we can close this issue.

lock bot locked as resolved and limited conversation to collaborators Sep 29, 2018
