Add performance best practices guide (#790)
* Added performance best practices guide.
* Fix link nit in performance guide.
* Addresses most review comments
* Resolve remaining comments
* Fix wording of streaming RPCs and some re-formatting
1 parent 5422a52, commit 7f42f2a
Showing 1 changed file with 104 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,104 @@ | ||
--- | ||
title: Performance Best Practices | ||
description: A user guide to both general and language-specific best practices for improving performance. | |
--- | ||
|
||
### General | ||
|
||
* Always **re-use stubs and channels** when possible. | ||
|
||
* **Use keepalive pings** to keep HTTP/2 connections alive during periods of | |
inactivity so that initial RPCs can be made without delay (e.g. the | |
C++ channel arg GRPC_ARG_KEEPALIVE_TIME_MS). | |
|
||
* **Use streaming RPCs** when handling | |
a long-lived logical flow of data from client to server, | |
from server to client, or in both directions. Streams avoid repeated RPC initiation, | |
which includes connection load balancing at the client side, starting a new | |
HTTP/2 request at the transport layer, and invoking a user-defined method | |
handler on the server side. | |
|
||
Streams, however, cannot be load balanced once they have started, and stream | |
failures can be hard to debug. Streams might also increase performance at a small scale | |
but reduce scalability because of load balancing and added complexity, so they should | |
only be used when they provide a substantial performance or simplicity benefit to | |
the application logic. Use streams to optimize the application, not gRPC. | |
|
||
***Side note:*** *This does not apply to Python (see Python section for | ||
details).* | ||
|
||
* *(Special topic)* Each gRPC channel uses 0 or more HTTP/2 connections and each connection | ||
usually has a limit on the number of concurrent streams. When the number of | ||
active RPCs on the connection reaches this limit, additional RPCs are queued | ||
in the client and must wait for active RPCs to finish before they are sent. | ||
Applications with high load or long-lived streaming RPCs might see | ||
performance issues because of this queueing. There are two possible | ||
solutions: | ||
|
||
1. **Create a separate channel for each area of high load** in the | ||
application. | ||
|
||
2. **Use a pool of gRPC channels** to distribute RPCs over | |
multiple connections (channels must have different channel args to | |
prevent re-use, so define a use-specific channel arg such as a channel | |
number). | |
|
||
***Side note:*** *The gRPC team has plans to add a feature to fix these | ||
performance issues (see [grpc/grpc#21386](https://github.com/grpc/grpc/issues/21386) | ||
for more info), so any solution involving creating multiple channels | ||
is a temporary workaround that should eventually not be needed.* | ||
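
The channel-pool workaround above can be sketched roughly as follows. This is a minimal illustration, not gRPC's own API: `ChannelPool`, `make_fake_channel`, and the channel arg key `"pool.channel_id"` are hypothetical names, and the factory returns a plain dict so the example runs without gRPC installed.

```python
import itertools
import threading
from typing import Callable, Generic, List, TypeVar

C = TypeVar("C")


class ChannelPool(Generic[C]):
    """Round-robin pool; `factory(index)` must return a new channel."""

    def __init__(self, factory: Callable[[int], C], size: int) -> None:
        if size < 1:
            raise ValueError("size must be at least 1")
        # Each channel is created with its own index so that its args differ.
        self._channels: List[C] = [factory(i) for i in range(size)]
        self._counter = itertools.count()
        self._lock = threading.Lock()

    def next_channel(self) -> C:
        # The lock keeps the round-robin order consistent across threads.
        with self._lock:
            index = next(self._counter) % len(self._channels)
        return self._channels[index]


def make_fake_channel(index: int) -> dict:
    # Hypothetical stand-in for something like:
    #   grpc.insecure_channel(target, options=[("pool.channel_id", index)])
    # where "pool.channel_id" is an assumed, application-defined arg name.
    return {"options": [("pool.channel_id", index)]}


pool = ChannelPool(make_fake_channel, size=3)
picked = [pool.next_channel()["options"][0][1] for _ in range(7)]
print(picked)
```

In a real client the factory would build actual gRPC channels, and each RPC would create or reuse a stub bound to the channel returned by `next_channel()`.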
|
||
### C++ | ||
|
||
* **Do not use the Sync API for performance-sensitive servers.** If performance | |
and/or resource consumption are not concerns, use the Sync API, as it is the | |
simplest to implement for low-QPS services. | |
|
||
* **Favor the callback API over other APIs for most RPCs**, provided that the | |
application can avoid all blocking operations or that blocking operations can be | |
moved to a separate thread. The callback API is easier to use than the | |
completion-queue async API but is currently slower for truly high-QPS workloads. | |
|
||
* If you must use the async completion-queue API, the **best scalability | |
trade-off is to have as many threads as CPUs (numcpu) and one completion queue per thread**. | |
|
||
* For the async completion-queue API, make sure to **register enough server | ||
requests for the desired level of concurrency** to avoid the server | ||
continuously getting stuck in a slow path that results in essentially serial | ||
request processing. | ||
|
||
* *(Special topic)* **Enable write batching in streams** if message k + 1 does not rely on | ||
responses from message k by passing a WriteOptions argument to Write with | ||
buffer_hint set: \ | ||
`stream_writer->Write(message, WriteOptions().set_buffer_hint());` | ||
|
||
* *(Special topic)* | |
[grpc::GenericStub](https://grpc.github.io/grpc/cpp/grpcpp_2generic_2generic__stub_8h.html) | |
can be useful in certain cases when there is high contention or a lot of CPU time | |
spent on proto serialization. This class allows the application to directly | |
send a **raw grpc::ByteBuffer as data** rather than serializing from some | |
proto. This can also be helpful if the same data is being sent multiple | |
times: serialize the proto to a ByteBuffer once, explicitly, and then send that | |
ByteBuffer multiple times. | |
|
||
### Java | ||
|
||
* **Use non-blocking stubs** to parallelize RPCs. | ||
|
||
* **Provide a custom executor that limits the number of threads, based on your workload** (cached (default), fixed, forkjoin, etc.). | |
|
||
### Python | ||
|
||
* Streaming RPCs create extra threads for receiving and possibly sending the | ||
messages, which makes **streaming RPCs much slower than unary RPCs** in | ||
gRPC Python, unlike the other languages supported by gRPC. | ||
|
||
* **Using [asyncio](https://grpc.github.io/grpc/python/grpc_asyncio.html)** could improve performance. | ||
|
||
* Using the future API in the sync stack results in the creation of an extra | ||
thread. **Avoid the future API** if possible. | ||
|
||
* *(Experimental)* An experimental **single-threaded unary-stream | ||
implementation** is available via the | ||
[SingleThreadedUnaryStream channel option](https://github.com/grpc/grpc/blob/master/src/python/grpcio/grpc/experimental/__init__.py#L38), | ||
which can save up to 7% latency per message. |
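
As a rough illustration of why asyncio can help, the sketch below issues several unary calls concurrently on one thread with `asyncio.gather`. `FakeStub` and `say_hello` are hypothetical stand-ins that simulate a network round trip with `asyncio.sleep`, so the example runs without gRPC installed. It shows only the calling pattern and is not a benchmark.

```python
import asyncio
from typing import List


class FakeStub:
    """Hypothetical stand-in for an asyncio stub; tracks peak in-flight calls."""

    def __init__(self) -> None:
        self.in_flight = 0
        self.peak_in_flight = 0

    async def say_hello(self, name: str) -> str:
        self.in_flight += 1
        self.peak_in_flight = max(self.peak_in_flight, self.in_flight)
        try:
            await asyncio.sleep(0.01)  # simulated network round trip
            return f"Hello, {name}"
        finally:
            self.in_flight -= 1


async def call_many(stub: FakeStub, names: List[str]) -> List[str]:
    # All calls are started before any is awaited to completion, so they
    # overlap on a single thread instead of each needing its own thread.
    return await asyncio.gather(*(stub.say_hello(n) for n in names))


stub = FakeStub()
names = [f"user{i}" for i in range(10)]
replies = asyncio.run(call_many(stub, names))
print(replies[0], stub.peak_in_flight)
```

With the real library, the stub would instead come from generated code bound to a channel created through the `grpc.aio` module, and each stub method call would be awaited in the same way.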