Fix MultiListBlockingPopTest flake on Windows CI #1851

Merged
badrishc merged 1 commit into main from badrishc/fix-multilist-blocking-pop-flake
Jun 3, 2026

badrishc commented Jun 3, 2026

Problem

MultiListBlockingPopTest("BLPOP","RPUSH",0.5d) intermittently fails on Windows CI with:

Failed Garnet.test.RespBlockingCollectionTests.MultiListBlockingPopTest("BLPOP","RPUSH",0.5d) [1 m]
Error Message: Items not retrieved in allotted time.

Root cause

The test runs two concurrent tasks for 64 iterations:

  • blockingTask: BLPOP key 0.5 then Task.Delay(20-100ms)
  • releasingTask: RPUSH key value then Task.Delay(20-100ms)

The two tasks drift independently. Taking the per-iteration jitter as roughly 20 ms, the expected drift after 64 iterations is about sqrt(64) * 20 ms ≈ 160 ms. On a slow Windows runner the drift can exceed 500 ms, so one of the BLPOPs actually times out before the matching RPUSH lands.
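
The 160 ms figure is a back-of-the-envelope estimate. As a hedged refinement, the sketch below assumes each Task.Delay is drawn uniformly from [20, 100] ms and that the two tasks run freely and independently. The PR does not state the distribution, so both points are assumptions.

```latex
\begin{align*}
\sigma_{\text{delay}} &= \frac{100 - 20}{\sqrt{12}}\ \text{ms} \approx 23.1\ \text{ms} \\
\sigma_{\text{one task}} &= \sqrt{64}\,\sigma_{\text{delay}} \approx 8 \times 23.1\ \text{ms} \approx 185\ \text{ms} \\
\sigma_{\text{drift}} &= \sqrt{2}\,\sigma_{\text{one task}} \approx 1.414 \times 185\ \text{ms} \approx 261\ \text{ms}
\end{align*}
```

Under these assumptions the spread is the same order of magnitude as the rough estimate, and a 500 ms gap sits at roughly two standard deviations, which would fit an intermittent failure rather than a constant one.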

When BLPOP times out, Garnet writes a null array, *-1\r\n (ListCommands.cs:301, WriteNullArray). That is one RESP token, but the test invoked LightClient.SendCommand("BLPOP …", 3), which adds 3 to numPendingRequests. LightClient.CompletePendingRequests (LightClient.cs:272) defaults to an infinite deadline and spins until numPendingRequests == 0. That never happens, so the test client hangs on that iteration. The outer 60 s wall then fires and produces the "Items not retrieved in allotted time" failure instead of an assertion mismatch.
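
To make the token mismatch concrete, here is a minimal sketch. RespTokenSketch and CountTokens are hypothetical names used only for illustration; this is not Garnet's LightClient code. The counting rule (one token per RESP element header) is an assumption chosen to match the 1-versus-3 arithmetic described above, and the key and value payloads are made up.

```csharp
using System;

// Hypothetical illustration only: this is NOT Garnet's LightClient.
public static class RespTokenSketch
{
    // Counts RESP element headers ('*', '$', '+', '-', ':') in a complete reply.
    // Assumes bulk-string payloads contain no embedded CRLF.
    public static int CountTokens(string reply)
    {
        string[] lines = reply.Split(new[] { "\r\n" }, StringSplitOptions.None);
        int tokens = 0;
        for (int i = 0; i < lines.Length; i++)
        {
            string line = lines[i];
            if (line.Length == 0) continue; // empty piece after the final CRLF

            char type = line[0];
            if (type == '$')
            {
                tokens++;
                // A non-negative length means the next line is the payload, not a header.
                if (int.Parse(line.Substring(1)) >= 0) i++;
            }
            else if (type == '*' || type == '+' || type == '-' || type == ':')
            {
                tokens++;
            }
        }
        return tokens;
    }

    public static void Main()
    {
        const int expectedTokens = 3; // the count the test passed to SendCommand

        string success = "*2\r\n$3\r\nkey\r\n$5\r\nvalue\r\n"; // made-up key and value
        string timeout = "*-1\r\n";                            // null array on timeout

        Console.WriteLine(expectedTokens - CountTokens(success)); // prints 0
        Console.WriteLine(expectedTokens - CountTokens(timeout)); // prints 2
    }
}
```

In this sketch the success reply drains the pending count to 0, while the timeout reply leaves 2 outstanding. A wait loop with no deadline would spin on that remainder indefinitely, which matches the hang described above.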

This is racy and Windows-specific because Linux scheduling is tight enough that drift stays well under 500 ms; the test passed 3/3 times locally on Linux during repro.

Fix

Bump the finite-timeout variants from 0.5 s to 10 s:

-[TestCase("BRPOP", "LPUSH", 0.5)]
+[TestCase("BRPOP", "LPUSH", 10)]
-[TestCase("BLPOP", "RPUSH", 0.5)]
+[TestCase("BLPOP", "RPUSH", 10)]

10 seconds is comfortably above any plausible scheduler drift on slow CI runners while still exercising the finite-timeout BLPOP/BRPOP code path (as opposed to the timeout 0, block-forever path that the other two variants cover). The outer 60 s budget still bounds the test, so a real broker stall would still surface.

Verification

All 4 test variants pass 3/3 runs locally on Linux (~16 s each).

The 0.5s BLPOP/BRPOP timeout in MultiListBlockingPopTest was too tight for
Windows CI scheduling. Both tasks issue 64 RPUSH/BLPOP pairs with independent
random 20-100ms delays, accumulating expected drift of sqrt(64)*sigma ~= 160ms
between them. On a slow Windows runner, drift > 500ms can cause the BLPOP to
time out before the matching RPUSH arrives.

When BLPOP times out, Garnet writes a null array (*-1\r\n, one token), but
LightClient.SendCommand is waiting for 3 tokens (success-path *2\r\n$..\r\n$..\r\n).
This mismatch makes the client spin forever in CompletePendingRequests
(default infinite timeout), so the test hits its outer 60s wall and reports
"Items not retrieved in allotted time." instead of an assertion mismatch.

Bump the finite-timeout variants from 0.5s to 10s: well above OS scheduler
noise on slow CI runners while still exercising the finite-timeout BLPOP path.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 3, 2026 05:43

Copilot AI left a comment

Pull request overview

This PR stabilizes the RespBlockingCollectionTests.MultiListBlockingPopTest on Windows CI by increasing the finite BLPOP/BRPOP timeout used in the test cases, reducing the likelihood of scheduling drift causing the blocking pop to time out before the matching push occurs.

Changes:

  • Increased the finite-timeout test cases for BRPOP and BLPOP from 0.5 seconds to 10 seconds.
  • Left the 0 (block-forever) variants unchanged to continue covering the infinite-blocking path.

@badrishc badrishc merged commit 1bee34e into main Jun 3, 2026
283 of 284 checks passed
@badrishc badrishc deleted the badrishc/fix-multilist-blocking-pop-flake branch June 3, 2026 23:00