[SharovBot] engineapi: fix flaky tests caused by TOCTOU port races#20328
[SharovBot] engineapi: fix flaky tests caused by TOCTOU port races#20328
Conversation
Tests in execution/engineapi were flaky due to TOCTOU port races: 1. freeport.NextFreePort() selected a port as free, but between selection and bind, another test could bind the same port → 'address already in use' 2. The server might not be ready when the first request arrives → 'empty response from JSON-RPC server' Fix: pre-create net.Listener instances on port 0 (kernel-assigned, guaranteed free and already bound) and pass them through HttpCfg → StartHTTPEndpoint so the server takes ownership of an already-bound socket. Changes: - node/endpoints.go: add optional Listener field to HttpEndpointConfig - cmd/rpcdaemon/cli/httpcfg/http_cfg.go: add HttpListener/AuthRpcListener fields - cmd/rpcdaemon/cli/config.go: wire up pre-created listeners in both RPC paths - execution/engineapi/engineapitester/engine_api_tester.go: use net.Listen(:0) for JSON-RPC and engineApi ports instead of freeport.NextFreePort() CI job: https://github.com/erigontech/erigon/actions/runs/23983504381/job/69951584262 Co-authored-by: Giulio Rebuffo <giulio.rebuffo@gmail.com>
There was a problem hiding this comment.
Pull request overview
Fixes flaky execution/engineapi tests by eliminating TOCTOU port selection races via pre-bound kernel-assigned listeners (:0) and wiring those listeners through the RPC daemon HTTP endpoint startup path.
Changes:
- Add support for passing a pre-created
net.Listenerintonode.StartHTTPEndpointto avoid re-binding races. - Extend
rpcdaemonHTTP config to optionally carry pre-bound listeners for regular HTTP RPC and Engine API (auth RPC). - Update the Engine API test harness to use
net.Listen("tcp", "127.0.0.1:0")instead offreeport.NextFreePort()for JSON-RPC and Engine API ports.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| node/endpoints.go | Allows StartHTTPEndpoint to use an already-bound listener instead of calling net.Listen. |
| cmd/rpcdaemon/cli/httpcfg/http_cfg.go | Adds optional listener fields to the HTTP config for test-driven pre-binding. |
| cmd/rpcdaemon/cli/config.go | Wires config-provided listeners into the regular RPC and Engine API HTTP endpoint startup. |
| execution/engineapi/engineapitester/engine_api_tester.go | Switches engineapi tester to pre-bind ephemeral ports and pass listeners into rpcdaemon config. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
yperbasis
left a comment
There was a problem hiding this comment.
Issues
- Sentry port still has a TOCTOU race (engine_api_tester.go:199-202)
The sentry listener is created, its port extracted, then explicitly closed before node.New binds to it:
// Close the sentry listener to release the port before the sentry binds to it.
// The listener was only held open to reserve the port and prevent TOCTOU races.
sentryListener.Close()
ethNode, err := node.New(ctx, &nodeConfig, logger)
This re-introduces the same class of race the PR aims to fix — between Close() and the sentry's internal net.Listen, another concurrent test can grab that port. The window is narrower
than freeport, but the fundamental problem remains. The comment is misleading because releasing the listener breaks the reservation it claims to provide.
This is an inherent limitation (the P2P stack doesn't accept pre-created listeners), but the comment should be honest about it:
// Close the sentry listener to release the port for the P2P stack, which doesn't
// accept pre-created listeners. A narrow TOCTOU window remains but is acceptable
// since sentry bind failures weren't observed in the flaky test reports.
- TestingEnabled field is dead code (http_cfg.go:118-119)
// TestingEnabled enables the testing_ RPC namespace. Should only be used in test/dev environments.
TestingEnabled bool
This field is added but never read or set anywhere in the PR or codebase. Should be removed — it appears to be leftover from a broader draft.
- "127.0.0.1:0" not extracted to a const (engine_api_tester.go:128,132,136)
anacrolix explicitly requested this in a review comment ("also abstract 127.0.0.1:0 into a const"). The string is repeated 3 times but was not extracted. Simple fix:
const localhostEphemeral = "127.0.0.1:0"
- Commit message hygiene
The second commit is titled save. This should be squash-merged or given a descriptive message (engineapi: address review — add t.Cleanup, pre-bind sentry, fix typo) before merge.
Minor Notes
- The t.Cleanup for sentryListener at line 130 is a no-op — the listener is already explicitly closed at line 201, and the cleanup's _ = sentryListener.Close() silently discards the
"use of closed network connection" error. Not harmful, but slightly misleading about lifecycle ownership. - The t.Cleanup closers for jsonRpcListener and engineApiListener will double-close after http.Server.Shutdown already closes them. Again harmless (errors ignored), but worth a
comment clarifying these are safety nets. - freeport is still imported by txnprovider/shutter/block_building_integration_test.go, so the package can't be removed yet. Worth noting for a follow-up.
- httpcfg: remove dead TestingEnabled field - node/endpoints: fix err shadowing in StartHTTPEndpoint else branch - engineapitester: extract "127.0.0.1:0" into localhostEphemeral const - engineapitester: rewrite sentry close comment to be honest about the remaining narrow TOCTOU window (the P2P stack doesn't accept pre-created listeners) - engineapitester: clarify that the listener t.Cleanup calls are safety nets and a no-op on the happy path Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
The sentry/P2P stack already supports kernel-assigned ports via
ListenAddr=":0" and AllowedPorts=[]uint{0}, so the pre-bind+close
dance for the sentry listener was unnecessary and left a narrow
TOCTOU window (between Close and the sentry's internal net.Listen).
Drop the sentry pre-bind/close entirely and let p2p.Server pick
the port itself. This also addresses both new Copilot review
comments in one change (the ignored-error on sentryListener.Close
becomes moot since the line is gone).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Problem
Tests in
execution/engineapiwere failing intermittently in therace-tests / tests-linux (ubuntu-latest, execution-other)CI job:TestEngineApiHighGasContractsFillBlock→listen tcp 127.0.0.1:46053: bind: address already in useTestEngineApiBlockGasOverflowSpillsToNextBlock→empty response from JSON-RPC serverObserved on commits a9db702 and acc9898 (2 out of 10 recent commits).
Root cause: The test infrastructure used
freeport.NextFreePort()to pick a free port, then later callednet.Listen()with that port. This TOCTOU (time-of-check/time-of-use) race meant another concurrent test could bind the same port in between, causingaddress already in use. Separately, the server might not be ready when the first client request arrived, causingempty response from JSON-RPC server.Fix
Pre-create
net.Listenerinstances on port:0(kernel-assigned, guaranteed free and already bound) before any server starts, then hand the bound socket directly toStartHTTPEndpoint. The kernel guarantees no other process can steal the port once it's bound.Changes:
node/endpoints.go: add optionalListenerfield toHttpEndpointConfig; if set, skip thenet.Listencall and use the pre-bound socketcmd/rpcdaemon/cli/httpcfg/http_cfg.go: addHttpListener/AuthRpcListenerfields toHttpCfgcmd/rpcdaemon/cli/config.go: wire up pre-created listeners in both the regular RPC and engine API listener pathsexecution/engineapi/engineapitester/engine_api_tester.go: replacefreeport.NextFreePort()withnet.Listen("tcp", "127.0.0.1:0")for the JSON-RPC and engine API portsTesting
go build ./...✅*_test.gofiles modified ✅go test -race ./execution/engineapi/...runs all pass ✅CI job reference: https://github.com/erigontech/erigon/actions/runs/23983504381/job/69951584262