feat: add grpc socket and flattn tx batches to allow for lower allocations#3297
feat: add grpc socket and flattn tx batches to allow for lower allocations#3297tac0turtle merged 5 commits intomainfrom
Conversation
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughAdds Unix domain socket transport for the gRPC execution endpoint and replaces per-tx repeated Changes
Sequence Diagram(s)sequenceDiagram
participant Caller as Caller
participant Client as gRPC Client
participant H2 as HTTP/2 (h2c)
participant Unix as Unix Socket
participant Server as gRPC Server
participant Exec as Executor
Caller->>Client: ExecuteTxs(txs [][]byte)
Client->>Client: encodeTxBatch(txs)
alt unix:// target
Client->>H2: create h2c transport with unix dialer
H2->>Unix: dial socket path
else tcp/http target
Client->>H2: create h2c transport over TCP
end
Client->>Server: Send ExecuteTxsRequest{tx_batch} (over HTTP/2)
Server->>Server: decodeTxBatch(req.TxBatch)
Server->>Exec: Execute transactions
Exec-->>Server: Result
Server-->>Client: ExecuteTxsResponse
Client-->>Caller: Return result
sequenceDiagram
participant Client as gRPC Client
participant Encoder as TxBatch Encoder
participant Server as gRPC Server
participant Decoder as TxBatch Decoder
Client->>Encoder: encodeTxBatch(txs)
Encoder-->>Client: TxBatch{data, tx_sizes}
Client->>Server: Send RPC with TxBatch
Server->>Decoder: decodeTxBatch(TxBatch)
Decoder-->>Server: [][]byte txs
Server->>Server: Process txs (Filter/Execute)
Server-->>Client: Response
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 0/1 reviews remaining, refill in 60 minutes.Comment |
|
Claude finished @tac0turtle's task in 2m 4s —— View job Code Review: feat: add grpc socket and flattn tx batchesTasks
OverviewThis PR adds two performance improvements to the gRPC execution layer: Unix domain socket transport to avoid TCP overhead when the executor runs on the same machine, and a contiguous 🔴 Issues1.
func decodeTxBatch(batch *pb.TxBatch) ([][]byte, error) {
if batch == nil {
return nil, nil // ← silent empty result
}When an older or buggy client omits 2.
message ExecuteTxsResponse {
bytes updated_state_root = 1;
uint64 max_bytes = 2; // ← never set by server, never read by client
}The server ( 3.
result, err := s.executor.FilterTxs(ctx, txs, req.Msg.MaxBytes, req.Msg.MaxGas, ...)
// ↑ no check that len(result) == len(txs)If the executor returns fewer statuses than transactions (e.g., a buggy implementation), the client will silently get a truncated status list. Add a post-call assertion: 🟡 Warnings4.
maxInt := uint64(int(^uint(0) >> 1))This is correct but hard to read. The equivalent 5.
mockFunc: func(ctx context.Context, t [][]byte, ...) ([]byte, error) {
return expectedStateRoot, nil // ← never inspects `t`
},The mock never checks that 6.
if len(resp.Msg.Statuses) != len(expectedStatuses) { // ← length onlyA wrong enum mapping (e.g., 7.
return http.Serve(listener, NewExecutorServiceHandler(executor, opts...))Callers must close the listener externally to stop the server. Accepting a 🟢 Positive Observations
Field reservations in the proto are correct: All removed Error wrapping is consistent: All paths use
SummaryThree actionable issues worth addressing in follow-up:
The unix socket implementation and TxBatch encoding are sound. Test coverage is adequate but the |
|
The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).
|
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #3297 +/- ##
==========================================
+ Coverage 62.50% 62.71% +0.21%
==========================================
Files 122 124 +2
Lines 13131 13248 +117
==========================================
+ Hits 8207 8309 +102
- Misses 4032 4037 +5
- Partials 892 902 +10
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (3)
proto/evnode/v1/execution.proto (1)
63-67: Clarify precedence when bothtxsandtx_batchare populated.Please document a single authoritative rule (e.g., "
tx_batchtakes precedence;txsis legacy fallback") to avoid cross-client ambiguity during migration.Also applies to: 84-85, 142-143
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@proto/evnode/v1/execution.proto` around lines 63 - 67, The proto comments for the repeated bytes field txs and the TxBatch field tx_batch are ambiguous about which to use when both are populated; update the comments for txs and tx_batch (including the other occurrences at the ranges noted) to state a single authoritative rule: tx_batch takes precedence and txs is a legacy fallback (i.e., when tx_batch is present, implementations MUST ignore txs; when tx_batch is absent, clients MAY use txs for backward compatibility). Ensure the same wording is applied to every message where txs and tx_batch appear to avoid cross-client ambiguity.execution/grpc/unix.go (1)
33-47: Consider documenting thathttp.Serveblocks until error.The function works correctly, but callers should understand this blocks. The defer order is correct (LIFO): socket cleanup runs after listener close.
Minor note: ignoring errors from
listener.Close()andremoveStaleUnixSocketin defers is acceptable for cleanup, but logging could help debugging shutdown issues.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@execution/grpc/unix.go` around lines 33 - 47, Update the ListenAndServeUnix doc comment to explicitly state that http.Serve blocks until it returns an error and that the deferred listener.Close() and removeStaleUnixSocket() run when Serve returns (LIFO order); mention callers must expect blocking behavior and handle cancellation accordingly. While optional, add a brief note that errors from listener.Close() and removeStaleUnixSocket() are currently ignored in the defers and could be logged for shutdown diagnostics; reference ListenAndServeUnix, http.Serve, listener.Close, removeStaleUnixSocket, and NewExecutorServiceHandler when making this documentation change.execution/grpc/client.go (1)
50-64: Consider validating socket path at creation time.The empty socket path check inside
DialTLSContext(line 56-58) will only trigger on the first dial attempt rather than at client creation. SinceclientTransportForTargetalready extracts the path, validation could happen there to fail fast.That said, this is defensive coding within the closure, and the path extraction in
unixSocketPath(line 74-79) already ensures a non-empty path when the prefix is present. The check here guards against misuse ifnewUnixHTTP2Client("")is called directly.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@execution/grpc/client.go` around lines 50 - 64, The client builder newUnixHTTP2Client currently defers an empty-socket check until DialTLSContext; change it to validate socketPath up-front and return an error instead of creating a client for an empty path. Update newUnixHTTP2Client signature to return (*http.Client, error), check if socketPath == "" at the start and return a clear error, and then construct the http.Client as before. Propagate the change to callers such as clientTransportForTarget (which calls newUnixHTTP2Client) so they handle the error and fail fast when unixSocketPath extraction yields an empty string.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/grpc/README.md`:
- Line 3: Update the README intro sentence to use the correct indefinite
article: replace the phrase "a Evolve node" with "an Evolve node" in the README
content so the sentence reads "This application runs an Evolve node with a
single sequencer..." (look for the README sentence containing "a Evolve node" to
make the change).
In `@execution/grpc/server.go`:
- Around line 85-87: GetTxsResponse currently only sets TxBatch and drops the
legacy txs field; update the server handler that returns GetTxsResponse (the
GetTxs implementation in grpc/server.go) to also populate the legacy txs field
with an equivalent flattened slice of transactions derived from txBatch (or a
converted form matching the old txs element type) so older clients continue to
see the expected list; ensure both fields are kept in sync (construct txs from
txBatch before returning) and preserve existing types/names used in
GetTxsResponse, tx_batch, and txs.
---
Nitpick comments:
In `@execution/grpc/client.go`:
- Around line 50-64: The client builder newUnixHTTP2Client currently defers an
empty-socket check until DialTLSContext; change it to validate socketPath
up-front and return an error instead of creating a client for an empty path.
Update newUnixHTTP2Client signature to return (*http.Client, error), check if
socketPath == "" at the start and return a clear error, and then construct the
http.Client as before. Propagate the change to callers such as
clientTransportForTarget (which calls newUnixHTTP2Client) so they handle the
error and fail fast when unixSocketPath extraction yields an empty string.
In `@execution/grpc/unix.go`:
- Around line 33-47: Update the ListenAndServeUnix doc comment to explicitly
state that http.Serve blocks until it returns an error and that the deferred
listener.Close() and removeStaleUnixSocket() run when Serve returns (LIFO
order); mention callers must expect blocking behavior and handle cancellation
accordingly. While optional, add a brief note that errors from listener.Close()
and removeStaleUnixSocket() are currently ignored in the defers and could be
logged for shutdown diagnostics; reference ListenAndServeUnix, http.Serve,
listener.Close, removeStaleUnixSocket, and NewExecutorServiceHandler when making
this documentation change.
In `@proto/evnode/v1/execution.proto`:
- Around line 63-67: The proto comments for the repeated bytes field txs and the
TxBatch field tx_batch are ambiguous about which to use when both are populated;
update the comments for txs and tx_batch (including the other occurrences at the
ranges noted) to state a single authoritative rule: tx_batch takes precedence
and txs is a legacy fallback (i.e., when tx_batch is present, implementations
MUST ignore txs; when tx_batch is absent, clients MAY use txs for backward
compatibility). Ensure the same wording is applied to every message where txs
and tx_batch appear to avoid cross-client ambiguity.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: acfb26c3-ba2a-423a-832a-163fd34eb7d3
⛔ Files ignored due to path filters (1)
types/pb/evnode/v1/execution.pb.gois excluded by!**/*.pb.go
📒 Files selected for processing (13)
apps/grpc/README.mdapps/grpc/cmd/run.goexecution/grpc/README.mdexecution/grpc/client.goexecution/grpc/client_test.goexecution/grpc/go.modexecution/grpc/server.goexecution/grpc/server_test.goexecution/grpc/tx_batch.goexecution/grpc/tx_batch_test.goexecution/grpc/unix.goexecution/grpc/unix_test.goproto/evnode/v1/execution.proto
| return connect.NewResponse(&pb.GetTxsResponse{ | ||
| Txs: txs, | ||
| TxBatch: txBatch, | ||
| }), nil |
There was a problem hiding this comment.
GetTxs currently drops legacy txs payload and can break older consumers.
GetTxsResponse still defines txs, but this response now only sets tx_batch. Any client not yet upgraded to decode tx_batch will observe an empty tx list.
🔁 Compatibility-safe response shape
return connect.NewResponse(&pb.GetTxsResponse{
+ Txs: txs, // legacy compatibility during migration
TxBatch: txBatch,
}), nil🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@execution/grpc/server.go` around lines 85 - 87, GetTxsResponse currently only
sets TxBatch and drops the legacy txs field; update the server handler that
returns GetTxsResponse (the GetTxs implementation in grpc/server.go) to also
populate the legacy txs field with an equivalent flattened slice of transactions
derived from txBatch (or a converted form matching the old txs element type) so
older clients continue to see the expected list; ensure both fields are kept in
sync (construct txs from txBatch before returning) and preserve existing
types/names used in GetTxsResponse, tx_batch, and txs.
julienrbrt
left a comment
There was a problem hiding this comment.
ACK.
We should rename the grpc service specifically (the service and the file imho), as it is only used there and because the execution/grpc is an adaption of the execution interface.
We could even move the proto definition under execution/grpc but that's more a nit that is annoying to do.
| option go_package = "github.com/evstack/ev-node/types/pb/evnode/v1"; | ||
|
|
||
| // ExecutorService defines the execution layer interface for EVNode | ||
| service ExecutorService { |
There was a problem hiding this comment.
| service ExecutionGRPCService { |
| message FilterTxsRequest { | ||
| // All transactions (force-included + mempool) | ||
| // All transactions (force-included + mempool). | ||
| repeated bytes txs = 1; |
There was a problem hiding this comment.
Let's kill the ones we don't use.
| ### Changes | ||
|
|
||
| - Optimization of mutex usage in cache for reaper [#3286](https://github.com/evstack/ev-node/pull/3286) | ||
| - Add Unix domain socket support for gRPC execution endpoints via `unix:///path/to/socket`. |
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
client/crates/types/src/proto/evnode.v1.messages.rs (1)
304-335:⚠️ Potential issue | 🟠 MajorKeep the legacy
txsencoding alive during rollout.Replacing
txswithtx_batchhere breaks mixed-version peers: olderGetTxsconsumers will see no transactions, and olderExecuteTxscallers will be parsed as empty batches by the new Go gRPC layer. Please make this change in the.protosource by carrying both encodings during migration, or gate the protocol version explicitly before deletingtxs, then regenerate.Based on learnings: "Maintain backward compatibility when modifying protobuf definitions".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@client/crates/types/src/proto/evnode.v1.messages.rs` around lines 304 - 335, GetTxsResponse and ExecuteTxsRequest now only carry tx_batch which breaks mixed-version peers; restore the legacy encoding by keeping the original txs fields alongside tx_batch in the .proto (e.g., re-add the previous txs field names/types) and regenerate the Rust types so GetTxsResponse and ExecuteTxsRequest include both txs and tx_batch; alternatively gate removal behind an explicit protocol version check so older peers still read/write the legacy txs encoding—update the proto, regenerate code, and ensure the generated structs (GetTxsResponse, ExecuteTxsRequest, TxBatch) preserve both encodings during rollout.
🧹 Nitpick comments (2)
execution/grpc/server_test.go (2)
208-218: Assert the decoded tx slice in the happy path.This case only checks the returned state root, so it still passes if
Server.ExecuteTxsdrops or reorders the decoded batch before calling the executor. Verifying thetxsargument inside the mock would pin down the newTxBatchdecode path.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@execution/grpc/server_test.go` around lines 208 - 218, The test’s happy-path mock currently only checks the returned state root, so update the mockFunc used in the "success" case to assert that the decoded transaction slice passed to the executor matches the original txs from the test: inside mockFunc(ctx, t [][]byte, ...) decode or compare t against the expected encoded tx bytes produced by mustEncodeTxBatch (or the original txs variable) using reflect.DeepEqual or your test comparator and return a non-nil error if they differ; leave the successful return of expectedStateRoot when the slice matches. This targets Server.ExecuteTxs / ExecuteTxsRequest / mockFunc to ensure the TxBatch decode path preserves order and contents.
400-420: Tighten theFilterTxsassertions around both payload and status mapping.This test currently validates batch length / limits on the request side and only
len(resp.Msg.Statuses)on the response side. It would still pass if the decoded transactions were reordered or if theexecution.FilterStatus→ protobuf enum conversion returned the wrong values.Also applies to: 473-475
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@execution/grpc/server_test.go` around lines 400 - 420, The test's assertions in the "success" case are too weak: it only checks request length/limits in mockFunc and only len(resp.Msg.Statuses) on response, so reordering or incorrect enum mapping could slip through; update the test to assert that the decoded tx payloads passed into mockFunc equal the original tx bytes (use mustEncodeTxBatch and the mockFunc txsIn parameter) and that each returned execution.FilterStatus in expectedStatuses maps to the exact protobuf enum value in resp.Msg.Statuses (compare element-by-element rather than only lengths), and apply the same stronger checks to the second case around lines noted (ensure element-wise comparison between expectedStatuses and resp.Msg.Statuses and verify the order matches).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@execution/grpc/tx_batch.go`:
- Around line 44-52: The function decodeTxBatch should not treat a nil
*pb.TxBatch as an explicit empty batch; change the handling in decodeTxBatch so
that when batch == nil it returns a non-nil error (e.g., "missing tx_batch" or
similar) instead of (nil, nil), leaving the existing behavior for an actual
empty struct (&pb.TxBatch{}) intact; update callers such as ExecuteTxs /
FilterTxs (if they rely on nil meaning empty) to propagate or handle this error
accordingly so missing/omitted tx_batch from clients is rejected rather than
treated as zero transactions.
---
Duplicate comments:
In `@client/crates/types/src/proto/evnode.v1.messages.rs`:
- Around line 304-335: GetTxsResponse and ExecuteTxsRequest now only carry
tx_batch which breaks mixed-version peers; restore the legacy encoding by
keeping the original txs fields alongside tx_batch in the .proto (e.g., re-add
the previous txs field names/types) and regenerate the Rust types so
GetTxsResponse and ExecuteTxsRequest include both txs and tx_batch;
alternatively gate removal behind an explicit protocol version check so older
peers still read/write the legacy txs encoding—update the proto, regenerate
code, and ensure the generated structs (GetTxsResponse, ExecuteTxsRequest,
TxBatch) preserve both encodings during rollout.
---
Nitpick comments:
In `@execution/grpc/server_test.go`:
- Around line 208-218: The test’s happy-path mock currently only checks the
returned state root, so update the mockFunc used in the "success" case to assert
that the decoded transaction slice passed to the executor matches the original
txs from the test: inside mockFunc(ctx, t [][]byte, ...) decode or compare t
against the expected encoded tx bytes produced by mustEncodeTxBatch (or the
original txs variable) using reflect.DeepEqual or your test comparator and
return a non-nil error if they differ; leave the successful return of
expectedStateRoot when the slice matches. This targets Server.ExecuteTxs /
ExecuteTxsRequest / mockFunc to ensure the TxBatch decode path preserves order
and contents.
- Around line 400-420: The test's assertions in the "success" case are too weak:
it only checks request length/limits in mockFunc and only len(resp.Msg.Statuses)
on response, so reordering or incorrect enum mapping could slip through; update
the test to assert that the decoded tx payloads passed into mockFunc equal the
original tx bytes (use mustEncodeTxBatch and the mockFunc txsIn parameter) and
that each returned execution.FilterStatus in expectedStatuses maps to the exact
protobuf enum value in resp.Msg.Statuses (compare element-by-element rather than
only lengths), and apply the same stronger checks to the second case around
lines noted (ensure element-wise comparison between expectedStatuses and
resp.Msg.Statuses and verify the order matches).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 07fec0ff-4531-4cc9-9ab5-a9b357898657
⛔ Files ignored due to path filters (1)
types/pb/evnode/v1/execution.pb.gois excluded by!**/*.pb.go
📒 Files selected for processing (15)
CHANGELOG.mdapps/grpc/README.mdapps/grpc/cmd/run.gobuf.yamlclient/crates/types/src/proto/evnode.v1.messages.rsclient/crates/types/src/proto/evnode.v1.services.rsexecution/grpc/README.mdexecution/grpc/client.goexecution/grpc/client_test.goexecution/grpc/server.goexecution/grpc/server_test.goexecution/grpc/tx_batch.goexecution/grpc/tx_batch_test.goexecution/grpc/unix.goproto/evnode/v1/execution.proto
✅ Files skipped from review due to trivial changes (2)
- CHANGELOG.md
- buf.yaml
🚧 Files skipped from review as they are similar to previous changes (6)
- apps/grpc/cmd/run.go
- execution/grpc/tx_batch_test.go
- proto/evnode/v1/execution.proto
- execution/grpc/README.md
- apps/grpc/README.md
- execution/grpc/unix.go
| func decodeTxBatch(batch *pb.TxBatch) ([][]byte, error) { | ||
| if batch == nil { | ||
| return nil, nil | ||
| } | ||
| if len(batch.TxSizes) == 0 { | ||
| if len(batch.Data) != 0 { | ||
| return nil, fmt.Errorf("tx batch has %d data bytes but no tx sizes", len(batch.Data)) | ||
| } | ||
| return nil, nil |
There was a problem hiding this comment.
Don't treat a missing batch as an explicit empty batch.
Returning (nil, nil) for batch == nil makes every caller interpret an omitted tx_batch as “zero transactions”. That hides mixed-version/client bugs and lets ExecuteTxs / FilterTxs proceed with empty work instead of rejecting the request. An explicit empty batch is already representable as &pb.TxBatch{}.
🛠️ Proposed fix
func decodeTxBatch(batch *pb.TxBatch) ([][]byte, error) {
if batch == nil {
- return nil, nil
+ return nil, fmt.Errorf("tx_batch is required")
}
if len(batch.TxSizes) == 0 {
if len(batch.Data) != 0 {
return nil, fmt.Errorf("tx batch has %d data bytes but no tx sizes", len(batch.Data))As per coding guidelines: "Validate all inputs from external sources in Go code".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@execution/grpc/tx_batch.go` around lines 44 - 52, The function decodeTxBatch
should not treat a nil *pb.TxBatch as an explicit empty batch; change the
handling in decodeTxBatch so that when batch == nil it returns a non-nil error
(e.g., "missing tx_batch" or similar) instead of (nil, nil), leaving the
existing behavior for an actual empty struct (&pb.TxBatch{}) intact; update
callers such as ExecuteTxs / FilterTxs (if they rely on nil meaning empty) to
propagate or handle this error accordingly so missing/omitted tx_batch from
clients is rejected rather than treated as zero transactions.
Overview
Flatten tx batch to avoid allocations and add in a unix socket over grpc to avoid tcp overhead
Summary by CodeRabbit
New Features
Breaking Changes
Documentation
Tests