Fix MultiListBlockingPopTest flake on Windows CI#1851
Merged
Conversation
The 0.5s BLPOP/BRPOP timeout in MultiListBlockingPopTest was too tight for Windows CI scheduling. Both tasks issue 64 RPUSH/BLPOP pairs with independent random 20-100ms delays, accumulating expected drift of sqrt(64)*sigma ~= 160ms between them. On a slow Windows runner, drift > 500ms can cause the BLPOP to time out before the matching RPUSH arrives. When BLPOP times out, Garnet writes a null array (*-1\r\n, one token), but LightClient.SendCommand is waiting for 3 tokens (success-path *2\r\n$..\r\n$..\r\n). This mismatch makes the client spin forever in CompletePendingRequests (default infinite timeout), so the test hits its outer 60s wall and reports "Items not retrieved in allotted time." instead of an assertion mismatch. Bump the finite-timeout variants from 0.5s to 10s: well above OS scheduler noise on slow CI runners while still exercising the finite-timeout BLPOP path. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor
There was a problem hiding this comment.
Pull request overview
This PR stabilizes the RespBlockingCollectionTests.MultiListBlockingPopTest on Windows CI by increasing the finite BLPOP/BRPOP timeout used in the test cases, reducing the likelihood of scheduling drift causing the blocking pop to time out before the matching push occurs.
Changes:
- Increased the finite-timeout test cases for
BRPOPandBLPOPfrom0.5seconds to10seconds. - Left the
0(block-forever) variants unchanged to continue covering the infinite-blocking path.
vazois
approved these changes
Jun 3, 2026
TedHartMS
approved these changes
Jun 3, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
MultiListBlockingPopTest("BLPOP","RPUSH",0.5d)intermittently fails on Windows CI with:Root cause
The test runs two concurrent tasks for 64 iterations:
BLPOP key 0.5thenTask.Delay(20-100ms)RPUSH key valuethenTask.Delay(20-100ms)Both tasks drift independently — expected drift after 64 iterations is roughly
sqrt(64) * 20ms ≈ 160ms. On a slow Windows runner the drift can exceed 500 ms, causing one of the BLPOPs to actually time out before the matching RPUSH lands.When BLPOP times out, Garnet writes a null array
*-1\r\n(ListCommands.cs:301→WriteNullArray). That is one RESP token, but the test invokedLightClient.SendCommand("BLPOP …", 3), tellingnumPendingRequests += 3.LightClient.CompletePendingRequests(LightClient.cs:272) defaults to an infinite deadline and spins untilnumPendingRequests == 0, which never happens — so the test client hangs on that iteration. The outer 60 s wall fires and produces the "Items not retrieved in allotted time" failure instead of an assertion mismatch.This is racy and Windows-specific because Linux scheduling is tight enough that drift stays well under 500 ms; the test passed 3/3 times locally on Linux during repro.
Fix
Bump the finite-timeout variants from 0.5 s to 10 s:
10 seconds is comfortably above any plausible scheduler drift on slow CI runners while still exercising the finite-timeout BLPOP/BRPOP code path (vs the
0= block-forever path that the other two variants cover). The outer 60 s budget still bounds the test, so a real broker stall would still surface.Verification
All 4 test variants pass 3/3 runs locally on Linux (~16 s each).