Fix MultiListBlockingPopTest flake on Windows CI by badrishc · Pull Request #1851 · microsoft/garnet

badrishc · 2026-06-03T05:43:39Z

Problem

MultiListBlockingPopTest("BLPOP","RPUSH",0.5d) intermittently fails on Windows CI with:

Failed Garnet.test.RespBlockingCollectionTests.MultiListBlockingPopTest("BLPOP","RPUSH",0.5d) [1 m]
Error Message: Items not retrieved in allotted time.

Root cause

The test runs two concurrent tasks for 64 iterations:

blockingTask: BLPOP key 0.5 then Task.Delay(20-100ms)
releasingTask: RPUSH key value then Task.Delay(20-100ms)

Both tasks drift independently: the expected drift after 64 iterations is roughly sqrt(64) * σ ≈ 160 ms, taking σ ≈ 20 ms as the per-iteration spread of the random delay. On a slow Windows runner the drift can exceed 500 ms, causing one of the BLPOPs to time out before the matching RPUSH lands.
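
As a rough cross-check of the drift estimate, here is a minimal sketch that treats each task's delays as independent uniform draws on 20-100 ms. It is a simplification, not a model of the real test: it ignores scheduler latency and the resynchronization that happens whenever a blocking pop waits for its push. The function names are hypothetical and do not come from the Garnet test code.

```python
import math
import random

LOW_MS, HIGH_MS = 20.0, 100.0   # delay range quoted in the PR description
ITERATIONS = 64                 # iteration count quoted in the PR description

def uniform_sigma(low, high):
    """Standard deviation of a uniform distribution on [low, high]."""
    return (high - low) / math.sqrt(12.0)

def single_task_drift_sigma(iterations, low=LOW_MS, high=HIGH_MS):
    """Std dev of one task's accumulated delay after `iterations` steps."""
    return math.sqrt(iterations) * uniform_sigma(low, high)

def relative_drift_sigma(iterations, low=LOW_MS, high=HIGH_MS):
    """Std dev of the difference between two independent tasks' accumulated delays."""
    return math.sqrt(2.0) * single_task_drift_sigma(iterations, low, high)

def simulate_relative_drift(iterations, trials, seed=0):
    """Monte Carlo estimate of relative_drift_sigma (sample std dev of the difference)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(trials):
        a = sum(rng.uniform(LOW_MS, HIGH_MS) for _ in range(iterations))
        b = sum(rng.uniform(LOW_MS, HIGH_MS) for _ in range(iterations))
        diffs.append(a - b)
    mean = sum(diffs) / trials
    return math.sqrt(sum((d - mean) ** 2 for d in diffs) / (trials - 1))

if __name__ == "__main__":
    print(f"per-iteration sigma: {uniform_sigma(LOW_MS, HIGH_MS):.1f} ms")
    print(f"one task, 64 iterations: {single_task_drift_sigma(ITERATIONS):.1f} ms")
    print(f"two tasks, relative: {relative_drift_sigma(ITERATIONS):.1f} ms")
    print(f"simulated relative: {simulate_relative_drift(ITERATIONS, 2000):.1f} ms")
```

Under these assumptions the per-iteration spread is about 23 ms, giving about 185 ms for a single task and about 261 ms for the difference between the two. That is the same order of magnitude as the 160 ms figure above, and close enough to the 500 ms timeout that extra scheduler delay on a slow runner could plausibly push it over.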

When BLPOP times out, Garnet writes a null array *-1\r\n (ListCommands.cs:301 → WriteNullArray). That is one RESP token, but the test invoked LightClient.SendCommand("BLPOP …", 3), which adds 3 to numPendingRequests because the success path returns three tokens (*2\r\n$..\r\n$..\r\n). LightClient.CompletePendingRequests (LightClient.cs:272) defaults to an infinite deadline and spins until numPendingRequests == 0, which never happens, so the test client hangs on that iteration. The outer 60 s wall-clock limit then fires and produces the "Items not retrieved in allotted time" failure instead of an assertion mismatch.
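
The token mismatch can be illustrated with a small sketch. The counting rule below (one token per RESP type header, with bulk-string payloads skipped) is an assumption chosen to match the "one token" versus "three tokens" description above. It is not LightClient's actual parser, and `count_resp_tokens` and `remaining_after_reply` are hypothetical names.

```python
def count_resp_tokens(buf: bytes) -> int:
    """Count RESP type headers in a complete reply, skipping bulk-string payloads.

    Illustrative counting rule only; assumes `buf` holds whole replies.
    """
    count = 0
    pos = 0
    while pos < len(buf):
        end = buf.index(b"\r\n", pos)      # end of the current header line
        line = buf[pos:end]
        pos = end + 2
        count += 1
        if line[:1] == b"$":               # bulk string: skip its payload
            length = int(line[1:])
            if length >= 0:                # $-1 (null bulk string) has no payload
                pos += length + 2          # payload plus trailing CRLF
    return count

def remaining_after_reply(expected_tokens: int, reply: bytes) -> int:
    """Tokens the client would still be waiting for after `reply` arrives."""
    return expected_tokens - count_resp_tokens(reply)

# Success path: array of two bulk strings (key, value).
SUCCESS_REPLY = b"*2\r\n$3\r\nkey\r\n$5\r\nvalue\r\n"
# Timeout path: null array.
TIMEOUT_REPLY = b"*-1\r\n"

if __name__ == "__main__":
    print(remaining_after_reply(3, SUCCESS_REPLY))  # counter reaches zero
    print(remaining_after_reply(3, TIMEOUT_REPLY))  # counter stays above zero
```

With three tokens expected, the success reply brings the counter to zero, while the timeout reply leaves two tokens outstanding. A wait with no deadline then never returns.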

The failure is racy and Windows-specific because Linux scheduling is tight enough that drift stays well under 500 ms; the test passed 3/3 times locally on Linux during repro.

Fix

Bump the finite-timeout variants from 0.5 s to 10 s:

-[TestCase("BRPOP", "LPUSH", 0.5)]
+[TestCase("BRPOP", "LPUSH", 10)]
-[TestCase("BLPOP", "RPUSH", 0.5)]
+[TestCase("BLPOP", "RPUSH", 10)]

10 seconds is comfortably above any plausible scheduler drift on slow CI runners while still exercising the finite-timeout BLPOP/BRPOP code path (vs the 0 = block-forever path that the other two variants cover). The outer 60 s budget still bounds the test, so a real broker stall would still surface.
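
To make the role of the outer budget concrete, here is a minimal sketch of a polling wait with a finite deadline. `wait_until` is a hypothetical helper, not part of LightClient or the Garnet test suite. The clock and sleep functions are injectable so the example can run without real delays.

```python
import time

def wait_until(predicate, timeout_s, poll_interval_s=0.001,
               clock=time.monotonic, sleep=time.sleep):
    """Poll `predicate` until it returns True or `timeout_s` elapses.

    Returns True if the predicate succeeded, False if the deadline passed.
    """
    deadline = clock() + timeout_s
    while not predicate():
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)
    return True

if __name__ == "__main__":
    pending = 2  # e.g. tokens still outstanding after a one-token timeout reply

    # Fake clock so the 60 s budget elapses instantly in this demonstration.
    now = [0.0]
    def fake_clock():
        return now[0]
    def fake_sleep(seconds):
        now[0] += seconds

    done = wait_until(lambda: pending == 0, timeout_s=60.0,
                      poll_interval_s=1.0, clock=fake_clock, sleep=fake_sleep)
    print(done)  # False: the wait gave up at the deadline instead of hanging
```

A finite deadline turns a condition that never becomes true into a bounded failure, which is what the outer 60 s budget does for the test as a whole.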

Verification

All 4 test variants pass 3/3 runs locally on Linux (~16 s each).

The 0.5s BLPOP/BRPOP timeout in MultiListBlockingPopTest was too tight for Windows CI scheduling.

Both tasks issue 64 RPUSH/BLPOP pairs with independent random 20-100ms delays, accumulating expected drift of sqrt(64)*sigma ~= 160ms between them. On a slow Windows runner, drift > 500ms can cause the BLPOP to time out before the matching RPUSH arrives.

When BLPOP times out, Garnet writes a null array (*-1\r\n, one token), but LightClient.SendCommand is waiting for 3 tokens (success-path *2\r\n$..\r\n$..\r\n). This mismatch makes the client spin forever in CompletePendingRequests (default infinite timeout), so the test hits its outer 60s wall and reports "Items not retrieved in allotted time." instead of an assertion mismatch.

Bump the finite-timeout variants from 0.5s to 10s: well above OS scheduler noise on slow CI runners while still exercising the finite-timeout BLPOP path.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

This PR stabilizes the RespBlockingCollectionTests.MultiListBlockingPopTest on Windows CI by increasing the finite BLPOP/BRPOP timeout used in the test cases, reducing the likelihood of scheduling drift causing the blocking pop to time out before the matching push occurs.

Changes:

Increased the finite-timeout test cases for BRPOP and BLPOP from 0.5 seconds to 10 seconds.
Left the 0 (block-forever) variants unchanged to continue covering the infinite-blocking path.

Copilot AI review requested due to automatic review settings June 3, 2026 05:43

Copilot started reviewing on behalf of badrishc June 3, 2026 05:43

Copilot AI reviewed Jun 3, 2026


vazois approved these changes Jun 3, 2026


TedHartMS approved these changes Jun 3, 2026


badrishc merged commit 1bee34e into main Jun 3, 2026
283 of 284 checks passed

badrishc deleted the badrishc/fix-multilist-blocking-pop-flake branch June 3, 2026 23:00
