Fix test flakes from port allocation races#847
Conversation
|
🎉 This PR is now ready for review! |
b068c07 to
efd2b64
Compare
tnull
left a comment
There was a problem hiding this comment.
Okay, I think we considered that before but you were of the opinion it wouldn't work? Any case, open to test whether it actually makes a difference (I'm not positive the flakes are actually stemming from this at all).
Just one nit.
efd2b64 to
39c79bc
Compare
I think I suggested it in #718 (comment), but not sure if I was of the opinion that it wouldn't work. But yes, let's run with this for a while, see if it makes a difference. My naive repro prs failed to repro, so no quick way to test this I think. |
Integration tests allocate listening ports for test nodes by binding to 127.0.0.1:0, reading the OS-assigned port, then immediately dropping the socket. Later, Node::start() tries to bind to that same port. This has two problems: Parallel test collisions: with 35 tests running concurrently, each creating 2-4 nodes with 2 ports each, the OS can reassign a freed port to another test's node before the original node calls start(). This is a classic TOCTOU race. In CI, it caused ~50% of Rust test runs to fail with InvalidSocketAddress, always exactly one random test per run. Restart instability: when a test stops and restarts a node, the port is released during stop() and must be re-acquired during start(). Another test's node can grab it in between. The peer store still has the old address, so auto-reconnection also fails. Both problems are inherent to any scheme that allocates a port and later releases it. The only way to guarantee a port stays yours is to never release it, or to never share the port space. A deterministic atomic counter starting at a fixed base port (20000) solves both problems: each fetch_add returns a unique value, so no two nodes in the same process ever get the same port, and ports are stable across restarts because the same node keeps the same config. There is no external contention because CI runners are isolated. AI tools were used in preparing this commit.
39c79bc to
a51c2bd
Compare
No description provided.