Fix flaky pending-messages-modified E2E test across SDKs#1362
Conversation
The pending-messages-modified event-fidelity test was failing intermittently in the C# SDK CI leg. The root cause was that this test was the only one in its fixture not using the standard `SendAndWaitAsync` + event-collector pattern; it went through a custom helper that did two independently-timed awaits and used an `async void` local function for backfilling existing messages. Refactor the test to the standard pattern in C#, Node, Python, and Go, and while here, replace the `async void` (and its JS analog,`new Promise(async (...) => ...)`) with proper `async Task` / `async` functions drained deterministically. The C# helper now has zero `async void` methods. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Pull request overview
This PR fixes intermittent flakiness in the cross-SDK “pending messages modified” event-fidelity E2E test by refactoring the test flow to a consistent “subscribe/collect → sendAndWait → assert against collected events” pattern, and by removing helper implementations that could lose exceptions or race on timing.
Changes:
- Refactors the pending-messages-modified E2E test in Python/Node/Go/.NET to use
send_and_wait/sendAndWait/SendAndWaitand then inspect the collected events. - Hardens helper logic in .NET and Node by eliminating
async void/new Promise(async ...)-style patterns that can drop exceptions or create timing races. - Cleans up now-unused imports and simplifies event acquisition logic in the affected tests.
Show a summary per file
| File | Description |
|---|---|
| python/e2e/test_event_fidelity_e2e.py | Switches the flaky test to send_and_wait + event list collection, consistent with other tests in the file. |
| nodejs/test/e2e/harness/sdkTestHelper.ts | Reworks getFinalAssistantMessage to avoid new Promise(async ...) and makes the “existing vs future” logic fully async/await. |
| nodejs/test/e2e/event_fidelity.e2e.test.ts | Refactors the flaky test to sendAndWait + collected events, removing the split await pattern. |
| go/internal/e2e/event_fidelity_e2e_test.go | Refactors the flaky test to use SendAndWait and a mutex-protected event snapshot instead of a timeout channel wait. |
| dotnet/test/Harness/TestHelper.cs | Replaces an async void backfill with an observable Task and drains it deterministically to avoid outliving disposals. |
| dotnet/test/E2E/EventFidelityE2ETests.cs | Refactors the flaky test to SendAndWaitAsync + a single event collector, matching the fixture’s dominant pattern. |
Copilot's findings
Comments suppressed due to low confidence (1)
dotnet/test/Harness/TestHelper.cs:89
- GetFinalAssistantMessageAsync uses a CancellationTokenSource for timeouts, but the backfill task (CheckExistingMessagesAsync → GetExistingMessagesAsync → session.GetEventsAsync()) doesn't receive the cancellation token. Because the method now awaits
backfillinfinally, a slow/hung GetEventsAsync call can delay (or prevent) the timeout exception from surfacing and potentially hang tests. Consider threadingcts.Tokeninto GetExistingMessagesAsync and passing it to session.GetEventsAsync (and/or otherwise ensuring backfill can't outlive the timeout).
// Backfill from already-delivered messages so we don't lose events that arrived
// between SendAsync returning and the subscription being installed. Run it
// concurrently with the live subscription, but keep the Task observable so any
// exception is propagated through tcs (not the unobserved-task handler) and so
// we can drain it deterministically below.
var backfill = CheckExistingMessagesAsync();
using var registration = cts.Token.Register(
static state => ((TaskCompletionSource<AssistantMessageEvent>)state!).TrySetException(
new TimeoutException("Timeout waiting for assistant message")),
tcs);
try
{
return await tcs.Task;
}
finally
{
// Drain the backfill before our `using` scopes (cts, subscription) dispose.
// Any exception was already routed through tcs above, so swallow here.
try { await backfill.ConfigureAwait(false); } catch { }
}
async Task CheckExistingMessagesAsync()
{
try
{
var (existingFinal, existingIdle) = await GetExistingMessagesAsync(session, alreadyIdle);
lock (stateLock)
- Files reviewed: 6/6 changed files
- Comments generated: 1
- TestHelper.cs: replace bare `catch { }` in the backfill-drain `finally`
with `catch (Exception) { /* intentionally ignored: already propagated
via tcs */ }` to make the swallow intent explicit and satisfy CodeQL.
- sdkTestHelper.ts: update the comment above `getFutureFinalResponse` to
reflect that the live subscription is installed before the existing-
messages RPC fires, matching the actual call ordering.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Thread cts.Token through CheckExistingMessagesAsync into GetExistingMessagesAsync.session.GetEventsAsync so a hung backfill can't delay (or prevent) the TimeoutException from surfacing through the `finally` drain. Flagged by the Copilot PR reviewer. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Cross-SDK Consistency Review ✅This PR applies the same fix across all four SDK implementations (Node.js, Python, Go, and .NET), which is exactly the right approach for test consistency. Pattern applied uniformly:
Additional cleanups are also consistent:
No cross-SDK inconsistencies found. All changes are in test/harness code, not public API surface.
|
The
Should_Emit_Pending_Messages_Modified_Event_When_Message_Queue_Changesevent-fidelity test was failing intermittently in the C# SDK CI leg (runs/26224349644). It was the only test in its fixture not using the standardSendAndWaitAsync+ event-collector pattern, instead going through a custom helper that did two independently-timed awaits and used anasync voidlocal function to backfill existing messages -- a combination prone to timing races and lost exceptions.Approach
Refactor the test to the standard pattern (subscribe,
SendAndWait, then filter the collected events) in all four affected SDKs so the same fix doesn't have to be rediscovered language-by-language. While here, eliminate the underlying anti-patterns the bug rode in on:TestHelper.GetFinalAssistantMessageAsync): replaceasync void CheckExistingMessageswithasync Task CheckExistingMessagesAsync, drained deterministically in afinallyso the backfill cannot outlive the enclosingusingdisposals. The dotnet/ tree now has zeroasync voidmethods.sdkTestHelper.ts): replacenew Promise<T>(async (resolve, reject) => ...)-- the JS analog ofasync void, where synchronous throws inside the executor are silently lost -- with idiomaticasync/awaitingetFinalAssistantMessageandgetExistingFinalResponse.Validation
All four EventFidelity E2E suites pass after the change. Full E2E suites also clean:
pending_work_resume.e2e.test.tsare a known local-env auth race; not touched by this PR)Lints/typechecks clean in all four languages; unused imports cleaned up.
PR description generated by Copilot.