
test: increase ReconnectFarm timeout from 30s to 60s #26583

Closed
frankmueller-msft wants to merge 1 commit into main from fix/reconnect-farm-test-timeout


@frankmueller-msft
Contributor

Summary

Increases the timeout for the ReconnectFarm stress tests from 30s to 60s. These tests run randomized merge-tree operations with reconnection and can intermittently exceed the 30s timeout on CI machines under load, causing flaky Timeout of 30000ms exceeded failures.

Observed in build 379358: MergeTree.Client > ReconnectFarm_2_0 timed out with 1554 other tests passing.

Coverage audit

No change to what the test validates — same operations, same assertions, same seed-based determinism. Only the timeout threshold changed.

Test plan

  • Change is a single constant: 30 * 1000 → 60 * 1000
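
To make the effect of the constant change concrete, here is a minimal sketch. It is not the actual test code: the names `oldTimeoutMs`, `newTimeoutMs`, `exceedsTimeout`, and the 42-second run duration are all hypothetical, chosen only to illustrate how a run that lands between the two budgets flips from failing to passing.

```typescript
// Hypothetical names: the real constant and test file are not shown on this page.
const oldTimeoutMs = 30 * 1000; // 30000 ms, matches "Timeout of 30000ms exceeded"
const newTimeoutMs = 60 * 1000; // 60000 ms, the proposed budget

/** Returns true when a run of the given duration is longer than the timeout budget. */
function exceedsTimeout(durationMs: number, timeoutMs: number): boolean {
	return durationMs > timeoutMs;
}

// Assumed example duration: a run slowed to 42s by CI load.
const slowRunMs = 42 * 1000;
console.log(exceedsTimeout(slowRunMs, oldTimeoutMs)); // true: fails the 30s budget
console.log(exceedsTimeout(slowRunMs, newTimeoutMs)); // false: fits the 60s budget

// In a Mocha suite, a per-test budget can be applied with the chainable
// .timeout() call (left commented out here because it requires the Mocha runner):
// it("ReconnectFarm", async () => { /* ... */ }).timeout(newTimeoutMs);
```

The operations and assertions inside the test are untouched in this sketch; only the budget they are measured against changes.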

🤖 Generated with Claude Code

The ReconnectFarm stress test runs randomized merge-tree operations
with reconnection and can intermittently exceed the 30s timeout on
CI machines under load. Doubling the timeout reduces flaky failures
without changing what the test validates.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings February 27, 2026 05:21
@frankmueller-msft frankmueller-msft requested a review from a team as a code owner February 27, 2026 05:21
Contributor

Copilot AI left a comment

Pull request overview

This PR reduces CI flakiness in the MergeTree reconnect stress fuzz tests by increasing the Mocha timeout threshold for ReconnectFarm from 30s to 60s, accommodating slower CI machines under load without changing test behavior.

Changes:

  • Increase ReconnectFarm test timeout from 30 * 1000 to 60 * 1000.

@anthony-murphy
Contributor

anthony-murphy commented Feb 27, 2026

What is the impetus for this change? I'm not aware of these tests being flaky in the PR build or CI. Is there an associated issue? In general, I don't like just increasing timeouts, as it can lead to performance regressions over time.

I managed to find 379358, but that is a topic branch run, not a run on main. I think we need more evidence that this is a net good before we make changes. I worry that our build machines are under-resourced for the parallelism, and that we are trading time for stability, which is not a good trade-off.

@frankmueller-msft
Contributor Author

Fair points. To give context: the timeout failure in build 379358 was on the ci/combined-pipeline-parallelization topic branch where I was developing #26586 (splitting DDS coverage shards). That PR runs the farm/fuzz tests with mocha --parallel, which adds CPU contention and could push these long-running fuzz tests closer to the 30s boundary.

I haven't seen this fail on main, so the evidence for a standalone timeout increase is weak. The right path is probably to close this and monitor after #26586 merges — if --parallel causes ReconnectFarm timeouts, the timeout increase should be part of that PR, not standalone.

Closing this.
