Evaluate defining the PublishEvents() RPC as a stream #63

cmacknz · 2022-06-21T16:08:35Z

The primary RPC for sending events to the shipper is PublishEvents, currently defined as:

Lines 13 to 27 in 504c1e1

    // Publishes a list of events via the Elastic agent shipper.
    // Blocks until all processing steps complete and data is written to the queue.
    // The order of `PublishRequest.events` always matches `PublishReply.results`.
    //
    // Returns the `codes.ResourceExhausted` gRPC status code if the queue is full and none of the events
    // can be accepted at the moment.
    //
    // If the queue could accept some events from the request, this returns a successful response
    // containing results for the first K events that were accepted by the queue.
    // The client is expected to retry sending the rest of the events in a separate request later.
    //
    // Inputs may execute multiple concurrent Produce requests for independent data streams.
    // The order in which concurrent requests complete is not guaranteed. Use sequential requests to
    // control ordering.
    rpc PublishEvents(PublishRequest) returns (PublishReply);

We should evaluate whether it would be more efficient to implement this RPC as a stream, or to provide a streaming version of the RPC. The gRPC performance best practices guide (https://grpc.io/docs/guides/performance/) suggests using streaming RPCs for long-lived exchanges, like those between most inputs and the shipper:

Use streaming RPCs when handling a long-lived logical flow of data from the client-to-server, server-to-client, or in both directions. Streams can avoid continuous RPC initiation, which includes connection load balancing at the client-side, starting a new HTTP/2 request at the transport layer, and invoking a user-defined method handler on the server side.

Streams, however, cannot be load balanced once they have started and can be hard to debug for stream failures. They also might increase performance at a small scale but can reduce scalability due to load balancing and complexity, so they should only be used when they provide substantial performance or simplicity benefit to application logic. Use streams to optimize the application, not gRPC.

Side note: This does not apply to Python (see Python section for details).
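To make the comparison concrete, a streaming variant could be declared as rpc PublishEvents(stream PublishRequest) returns (stream PublishReply); (a hypothetical signature, not the current API). The Go sketch below models such a bidirectional stream in-process with channels rather than real gRPC. The handler starts once and then answers every request on the stream in order. PublishRequest, PublishReply, serveStream and publishOverStream are illustrative names, and queueFree is a simplified stand-in for the shipper's queue capacity:

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins for the shipper messages: a reply
// reports how many events of its request were accepted.
type PublishRequest struct{ Events []string }
type PublishReply struct{ Accepted int }

// serveStream models the server side of a hypothetical bidirectional
// PublishEvents stream: it is invoked once and answers each request on the
// stream, in order, until the client closes its send side.
func serveStream(reqs <-chan PublishRequest, replies chan<- PublishReply, queueFree int) {
	defer close(replies)
	for req := range reqs {
		k := len(req.Events)
		if k > queueFree {
			k = queueFree
		}
		queueFree -= k
		replies <- PublishReply{Accepted: k}
	}
}

// publishOverStream sends each batch on a single stream and collects one
// reply per batch. The handler is started exactly once for all batches.
func publishOverStream(batches [][]string, queueFree int) []int {
	reqs := make(chan PublishRequest)
	replies := make(chan PublishReply)
	go serveStream(reqs, replies, queueFree) // one "RPC initiation" in total
	accepted := make([]int, 0, len(batches))
	for _, b := range batches {
		reqs <- PublishRequest{Events: b}
		accepted = append(accepted, (<-replies).Accepted)
	}
	close(reqs) // analogous to CloseSend: the handler's loop ends
	return accepted
}

func main() {
	batches := [][]string{{"a", "b"}, {"c", "d", "e"}, {"f"}}
	fmt.Println(publishOverStream(batches, 4)) // [2 2 0]
}
```

In grpc-go, the equivalent server handler would loop on the stream's Recv and Send methods until Recv returns io.EOF. The trade-off from the quoted guide still applies: once started, the stream stays on one connection and cannot be load balanced.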

(Special topic) Each gRPC channel uses 0 or more HTTP/2 connections and each connection usually has a limit on the number of concurrent streams. When the number of active RPCs on the connection reaches this limit, additional RPCs are queued in the client and must wait for active RPCs to finish before they are sent. Applications with high load or long-lived streaming RPCs might see performance issues because of this queueing. There are two possible solutions:

Create a separate channel for each area of high load in the application.

Use a pool of gRPC channels to distribute RPCs over multiple connections (channels must have different channel args to prevent re-use so define a use-specific channel arg such as channel number).

Side note: The gRPC team has plans to add a feature to fix these performance issues (see grpc/grpc#21386 for more info), so any solution involving creating multiple channels is a temporary workaround that should eventually not be needed.
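If long-lived streams are adopted, the pooling workaround above could look roughly like the round-robin sketch below. Conn is a hypothetical placeholder for a gRPC channel (a *grpc.ClientConn in grpc-go). How each channel is created and kept on its own connection, which the quoted guide handles with per-channel args, is deliberately left out:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Conn is a hypothetical placeholder for a gRPC channel; each one is
// assumed to sit on its own HTTP/2 connection.
type Conn struct{ ID int }

// Pool hands out channels in round-robin order so RPCs are spread over
// several connections instead of queueing behind one connection's
// concurrent-stream limit.
type Pool struct {
	conns []*Conn
	next  uint64
}

func NewPool(n int) *Pool {
	p := &Pool{conns: make([]*Conn, n)}
	for i := range p.conns {
		p.conns[i] = &Conn{ID: i}
	}
	return p
}

// Pick returns the next channel in rotation; safe for concurrent use.
func (p *Pool) Pick() *Conn {
	i := atomic.AddUint64(&p.next, 1) - 1
	return p.conns[i%uint64(len(p.conns))]
}

func main() {
	pool := NewPool(4)
	counts := make(map[int]int)
	var mu sync.Mutex
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c := pool.Pick()
			mu.Lock()
			counts[c.ID]++
			mu.Unlock()
		}()
	}
	wg.Wait()
	fmt.Println(counts) // map[0:2 1:2 2:2 3:2]
}
```

Round-robin spreads new RPCs evenly across channels but ignores how long each stream stays open, so a long-lived stream still occupies one slot on its connection until it ends.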

Note that the agent is planning to migrate the gRPC connection from TCP to named pipes/Unix domain sockets, which may affect our choices here: elastic/elastic-agent#218


cmacknz added the Team:Elastic-Agent-Data-Plane and v8.4.0 labels Jun 21, 2022

cmacknz assigned rdner Jun 21, 2022

cmacknz mentioned this issue in Implement the shipper's gRPC API #34 (Closed, 10 tasks) Jun 21, 2022

jlind23 added v8.5.0 and removed v8.4.0 labels Jul 11, 2022
