Skip to content

[SharovBot] engineapi: fix flaky tests caused by TOCTOU port races#20328

Merged
yperbasis merged 7 commits intomainfrom
fix/engineapi-port-race
Apr 29, 2026
Merged

[SharovBot] engineapi: fix flaky tests caused by TOCTOU port races#20328
yperbasis merged 7 commits intomainfrom
fix/engineapi-port-race

Conversation

@erigon-copilot
Copy link
Copy Markdown
Contributor

@erigon-copilot erigon-copilot Bot commented Apr 4, 2026

Problem

Tests in execution/engineapi were failing intermittently in the race-tests / tests-linux (ubuntu-latest, execution-other) CI job:

  • TestEngineApiHighGasContractsFillBlocklisten tcp 127.0.0.1:46053: bind: address already in use
  • TestEngineApiBlockGasOverflowSpillsToNextBlockempty response from JSON-RPC server

Observed on commits a9db702 and acc9898 (2 out of 10 recent commits).

Root cause: The test infrastructure used freeport.NextFreePort() to pick a free port, then later called net.Listen() with that port. This TOCTOU (time-of-check/time-of-use) race meant another concurrent test could bind the same port in between, causing address already in use. Separately, the server might not be ready when the first client request arrived, causing empty response from JSON-RPC server.

Fix

Pre-create net.Listener instances on port :0 (kernel-assigned, guaranteed free and already bound) before any server starts, then hand the bound socket directly to StartHTTPEndpoint. The kernel guarantees no other process can steal the port once it's bound.

Changes:

  • node/endpoints.go: add optional Listener field to HttpEndpointConfig; if set, skip the net.Listen call and use the pre-bound socket
  • cmd/rpcdaemon/cli/httpcfg/http_cfg.go: add HttpListener / AuthRpcListener fields to HttpCfg
  • cmd/rpcdaemon/cli/config.go: wire up pre-created listeners in both the regular RPC and engine API listener paths
  • execution/engineapi/engineapitester/engine_api_tester.go: replace freeport.NextFreePort() with net.Listen("tcp", "127.0.0.1:0") for the JSON-RPC and engine API ports

Testing

  • go build ./...
  • No *_test.go files modified ✅
  • 5 consecutive go test -race ./execution/engineapi/... runs all pass ✅

CI job reference: https://github.com/erigontech/erigon/actions/runs/23983504381/job/69951584262

Tests in execution/engineapi were flaky due to TOCTOU port races:
1. freeport.NextFreePort() selected a port as free, but between selection
   and bind, another test could bind the same port → 'address already in use'
2. The server might not be ready when the first request arrives →
   'empty response from JSON-RPC server'

Fix: pre-create net.Listener instances on port 0 (kernel-assigned, guaranteed
free and already bound) and pass them through HttpCfg → StartHTTPEndpoint so
the server takes ownership of an already-bound socket.

Changes:
- node/endpoints.go: add optional Listener field to HttpEndpointConfig
- cmd/rpcdaemon/cli/httpcfg/http_cfg.go: add HttpListener/AuthRpcListener fields
- cmd/rpcdaemon/cli/config.go: wire up pre-created listeners in both RPC paths
- execution/engineapi/engineapitester/engine_api_tester.go: use net.Listen(:0)
  for JSON-RPC and engineApi ports instead of freeport.NextFreePort()

CI job: https://github.com/erigontech/erigon/actions/runs/23983504381/job/69951584262

Co-authored-by: Giulio Rebuffo <giulio.rebuffo@gmail.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes flaky execution/engineapi tests by eliminating TOCTOU port selection races via pre-bound kernel-assigned listeners (:0) and wiring those listeners through the RPC daemon HTTP endpoint startup path.

Changes:

  • Add support for passing a pre-created net.Listener into node.StartHTTPEndpoint to avoid re-binding races.
  • Extend rpcdaemon HTTP config to optionally carry pre-bound listeners for regular HTTP RPC and Engine API (auth RPC).
  • Update the Engine API test harness to use net.Listen("tcp", "127.0.0.1:0") instead of freeport.NextFreePort() for JSON-RPC and Engine API ports.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
node/endpoints.go Allows StartHTTPEndpoint to use an already-bound listener instead of calling net.Listen.
cmd/rpcdaemon/cli/httpcfg/http_cfg.go Adds optional listener fields to the HTTP config for test-driven pre-binding.
cmd/rpcdaemon/cli/config.go Wires config-provided listeners into the regular RPC and Engine API HTTP endpoint startup.
execution/engineapi/engineapitester/engine_api_tester.go Switches engineapi tester to pre-bind ephemeral ports and pass listeners into rpcdaemon config.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread node/endpoints.go Outdated
Comment thread execution/engineapi/engineapitester/engine_api_tester.go Outdated
Comment thread execution/engineapi/engineapitester/engine_api_tester.go Outdated
@yperbasis yperbasis requested a review from anacrolix April 7, 2026 12:16
Copy link
Copy Markdown
Member

@yperbasis yperbasis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issues

  1. Sentry port still has a TOCTOU race (engine_api_tester.go:199-202)

The sentry listener is created, its port extracted, then explicitly closed before node.New binds to it:

// Close the sentry listener to release the port before the sentry binds to it.
// The listener was only held open to reserve the port and prevent TOCTOU races.
sentryListener.Close()
ethNode, err := node.New(ctx, &nodeConfig, logger)

This re-introduces the same class of race the PR aims to fix — between Close() and the sentry's internal net.Listen, another concurrent test can grab that port. The window is narrower
than freeport, but the fundamental problem remains. The comment is misleading because releasing the listener breaks the reservation it claims to provide.

This is an inherent limitation (the P2P stack doesn't accept pre-created listeners), but the comment should be honest about it:

// Close the sentry listener to release the port for the P2P stack, which doesn't
// accept pre-created listeners. A narrow TOCTOU window remains but is acceptable
// since sentry bind failures weren't observed in the flaky test reports.

  1. TestingEnabled field is dead code (http_cfg.go:118-119)

// TestingEnabled enables the testing_ RPC namespace. Should only be used in test/dev environments.
TestingEnabled bool

This field is added but never read or set anywhere in the PR or codebase. Should be removed — it appears to be leftover from a broader draft.

  1. "127.0.0.1:0" not extracted to a const (engine_api_tester.go:128,132,136)

anacrolix explicitly requested this in a review comment ("also abstract 127.0.0.1:0 into a const"). The string is repeated 3 times but was not extracted. Simple fix:

const localhostEphemeral = "127.0.0.1:0"

  1. Commit message hygiene

The second commit is titled save. This should be squash-merged or given a descriptive message (engineapi: address review — add t.Cleanup, pre-bind sentry, fix typo) before merge.

Minor Notes

  • The t.Cleanup for sentryListener at line 130 is a no-op — the listener is already explicitly closed at line 201, and the cleanup's _ = sentryListener.Close() silently discards the
    "use of closed network connection" error. Not harmful, but slightly misleading about lifecycle ownership.
  • The t.Cleanup closers for jsonRpcListener and engineApiListener will double-close after http.Server.Shutdown already closes them. Again harmless (errors ignored), but worth a
    comment clarifying these are safety nets.
  • freeport is still imported by txnprovider/shutter/block_building_integration_test.go, so the package can't be removed yet. Worth noting for a follow-up.

@yperbasis yperbasis added this to the 3.5.0 milestone Apr 20, 2026
anacrolix and others added 2 commits April 29, 2026 09:51
- httpcfg: remove dead TestingEnabled field
- node/endpoints: fix err shadowing in StartHTTPEndpoint else branch
- engineapitester: extract "127.0.0.1:0" into localhostEphemeral const
- engineapitester: rewrite sentry close comment to be honest about the
  remaining narrow TOCTOU window (the P2P stack doesn't accept
  pre-created listeners)
- engineapitester: clarify that the listener t.Cleanup calls are safety
  nets and a no-op on the happy path

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread execution/engineapi/engineapitester/engine_api_tester.go Outdated
Comment thread execution/engineapi/engineapitester/engine_api_tester.go Outdated
The sentry/P2P stack already supports kernel-assigned ports via
ListenAddr=":0" and AllowedPorts=[]uint{0}, so the pre-bind+close
dance for the sentry listener was unnecessary and left a narrow
TOCTOU window (between Close and the sentry's internal net.Listen).

Drop the sentry pre-bind/close entirely and let p2p.Server pick
the port itself. This also addresses both new Copilot review
comments in one change (the ignored-error on sentryListener.Close
becomes moot since the line is gone).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@yperbasis yperbasis added this pull request to the merge queue Apr 29, 2026
Merged via the queue into main with commit 109d9d9 Apr 29, 2026
37 checks passed
@yperbasis yperbasis deleted the fix/engineapi-port-race branch April 29, 2026 15:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants