This repository has been archived by the owner on Sep 21, 2023. It is now read-only.

Evaluate defining the PublishEvents() RPC as a stream #63

Open
cmacknz opened this issue Jun 21, 2022 · 0 comments
Labels: Team:Elastic-Agent-Data-Plane, v8.5.0


cmacknz commented Jun 21, 2022

The primary RPC for sending events to the shipper is PublishEvents, currently defined as:

// Publishes a list of events via the Elastic agent shipper.
// Blocks until all processing steps complete and data is written to the queue.
// The order of `PublishRequest.events` always matches `PublishReply.results`.
//
// Returns the `codes.ResourceExhausted` gRPC status code if the queue is full and none of the events
// can be accepted at the moment.
//
// If the queue could accept some events from the request, this returns a successful response
// containing results for the first K events that were accepted by the queue.
// The client is expected to retry sending the rest of the events in a separate request later.
//
// Inputs may execute multiple concurrent Produce requests for independent data streams.
// The order in which concurrent requests complete is not guaranteed. Use sequential requests to
// control ordering.
rpc PublishEvents(PublishRequest) returns (PublishReply);
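
The partial-acceptance contract described in the comment above implies a retry loop on the client. Below is a minimal sketch of that loop, not the shipper's actual client code: `Publisher`, `PublishAll`, `fakeQueue`, and `ErrQueueFull` are hypothetical names, events are plain strings, and `ErrQueueFull` stands in for the `codes.ResourceExhausted` status so the example runs without gRPC.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrQueueFull is a stand-in (assumption) for the codes.ResourceExhausted status.
var ErrQueueFull = errors.New("queue full")

// Publisher is a hypothetical stand-in for the generated gRPC client.
// Publish reports how many leading events of the request were accepted.
type Publisher interface {
	Publish(events []string) (accepted int, err error)
}

// PublishAll sends events in order and resends the unaccepted tail in
// follow-up requests, making at most maxAttempts requests in total.
func PublishAll(p Publisher, events []string, maxAttempts int) (int, error) {
	sent := 0
	for attempt := 0; attempt < maxAttempts && sent < len(events); attempt++ {
		n, err := p.Publish(events[sent:])
		if errors.Is(err, ErrQueueFull) {
			continue // nothing was accepted; a real client would back off here
		}
		if err != nil {
			return sent, err
		}
		sent += n // the first n events were accepted; retry the rest
	}
	if sent < len(events) {
		return sent, fmt.Errorf("published %d of %d events: %w", sent, len(events), ErrQueueFull)
	}
	return sent, nil
}

// fakeQueue simulates a bounded queue that accepts a few events per request.
type fakeQueue struct {
	perCall int      // how many events each request may accept
	stored  []string // events accepted so far, in arrival order
}

func (q *fakeQueue) Publish(events []string) (int, error) {
	if q.perCall == 0 {
		return 0, ErrQueueFull
	}
	n := len(events)
	if n > q.perCall {
		n = q.perCall
	}
	q.stored = append(q.stored, events[:n]...)
	return n, nil
}

func main() {
	q := &fakeQueue{perCall: 2}
	n, err := PublishAll(q, []string{"a", "b", "c", "d", "e"}, 10)
	fmt.Println(n, err, q.stored)
}
```

Because each retry is issued only after the previous request returns, the events reach the queue in their original order, which matches the "use sequential requests to control ordering" guidance in the comment.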

We should evaluate whether it would be more efficient to implement this RPC as a stream, or to provide a streaming version alongside the existing unary RPC. The gRPC performance best practices guide (https://grpc.io/docs/guides/performance/) suggests using streaming RPCs for long-lived exchanges, like those between most inputs and the shipper:

Use streaming RPCs when handling a long-lived logical flow of data from the client-to-server, server-to-client, or in both directions. Streams can avoid continuous RPC initiation, which includes connection load balancing at the client-side, starting a new HTTP/2 request at the transport layer, and invoking a user-defined method handler on the server side.

Streams, however, cannot be load balanced once they have started and can be hard to debug for stream failures. They also might increase performance at a small scale but can reduce scalability due to load balancing and complexity, so they should only be used when they provide substantial performance or simplicity benefit to application logic. Use streams to optimize the application, not gRPC.

Side note: This does not apply to Python (see Python section for details).

(Special topic) Each gRPC channel uses 0 or more HTTP/2 connections and each connection usually has a limit on the number of concurrent streams. When the number of active RPCs on the connection reaches this limit, additional RPCs are queued in the client and must wait for active RPCs to finish before they are sent. Applications with high load or long-lived streaming RPCs might see performance issues because of this queueing. There are two possible solutions:

1. Create a separate channel for each area of high load in the application.

2. Use a pool of gRPC channels to distribute RPCs over multiple connections (channels must have different channel args to prevent re-use so define a use-specific channel arg such as channel number).

Side note: The gRPC team has plans to add a feature to fix these performance issues (see grpc/grpc#21386 for more info), so any solution involving creating multiple channels is a temporary workaround that should eventually not be needed.
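
For reference, the channel-pool workaround from the guide could be sketched in Go roughly as follows. This is an illustration only: `Pool`, `NewPool`, and `Pick` are hypothetical names, and the pool is generic so the example runs without a gRPC dependency. In a real client the members would be separately dialed `*grpc.ClientConn` values. Whether separately created channels end up sharing a connection depends on the gRPC implementation, so the distinct-channel-args advice may not be needed everywhere.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// Pool hands out its members in round-robin order.
// In a real client T would be *grpc.ClientConn (assumption).
type Pool[T any] struct {
	members []T
	next    atomic.Uint64 // index of the next member to hand out
}

// NewPool copies the given members into a new pool.
func NewPool[T any](members ...T) (*Pool[T], error) {
	if len(members) == 0 {
		return nil, errors.New("pool needs at least one member")
	}
	return &Pool[T]{members: append([]T(nil), members...)}, nil
}

// Pick returns the next member. It is safe for concurrent use because
// the counter is advanced atomically.
func (p *Pool[T]) Pick() T {
	i := p.next.Add(1) - 1
	return p.members[i%uint64(len(p.members))]
}

func main() {
	pool, err := NewPool("conn-0", "conn-1", "conn-2")
	if err != nil {
		panic(err)
	}
	for i := 0; i < 5; i++ {
		fmt.Println(pool.Pick()) // each RPC would be issued on the picked connection
	}
}
```

Round-robin is the simplest distribution policy. A pool keyed by data stream would instead keep each input's requests on one connection, which may matter if ordering across requests is important.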

Note that the agent is planning to migrate the gRPC connection from TCP to named pipes/Unix domain sockets, which may affect our choices here: elastic/elastic-agent#218
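
To make the streaming option more concrete, here is a rough sketch of what a server-side handler loop for a bidirectional streaming variant might look like. Everything in it is an assumption for discussion: `PublishStream`, `ServeStream`, and `memStream` are hypothetical names, events are plain strings, and the `Recv`/`Send` pair only mirrors the general shape of the stream interfaces that grpc-go generates, where `Recv` returns `io.EOF` once the client closes its side.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// PublishStream is a hypothetical stand-in for a generated bidirectional
// stream: the client sends batches of events, the server replies per batch.
type PublishStream interface {
	Recv() ([]string, error) // next batch, or io.EOF when the client is done
	Send(accepted int) error // reply with the number of accepted events
}

// ServeStream handles one long-lived stream. The handler is invoked once
// and then loops over batches, instead of being invoked once per request.
func ServeStream(s PublishStream, enqueue func([]string) int) error {
	for {
		events, err := s.Recv()
		if errors.Is(err, io.EOF) {
			return nil // client closed the stream cleanly
		}
		if err != nil {
			return err
		}
		if err := s.Send(enqueue(events)); err != nil {
			return err
		}
	}
}

// memStream is an in-memory stream used only to exercise the loop.
type memStream struct {
	requests [][]string
	replies  []int
}

func (m *memStream) Recv() ([]string, error) {
	if len(m.requests) == 0 {
		return nil, io.EOF
	}
	batch := m.requests[0]
	m.requests = m.requests[1:]
	return batch, nil
}

func (m *memStream) Send(accepted int) error {
	m.replies = append(m.replies, accepted)
	return nil
}

func main() {
	s := &memStream{requests: [][]string{{"a", "b"}, {"c"}}}
	var queue []string
	err := ServeStream(s, func(events []string) int {
		queue = append(queue, events...)
		return len(events)
	})
	fmt.Println(s.replies, queue, err)
}
```

Replies are sent in the order batches arrive, so a single stream would preserve the request/response ordering that sequential unary calls give today. How a full queue should be signaled mid-stream, in place of the `codes.ResourceExhausted` status, is left open here.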

cmacknz added the Team:Elastic-Agent-Data-Plane and v8.4.0 labels on Jun 21, 2022
jlind23 added the v8.5.0 label and removed the v8.4.0 label on Jul 11, 2022

3 participants