
Fix flaky pending-messages-modified E2E test across SDKs #1362

Merged
stephentoub merged 3 commits into main from stephentoub/fix-csharp-sdk-ci-test on May 21, 2026


@stephentoub
Collaborator

The Should_Emit_Pending_Messages_Modified_Event_When_Message_Queue_Changes event-fidelity test was failing intermittently in the C# SDK CI leg (runs/26224349644). It was the only test in its fixture that did not use the standard SendAndWaitAsync + event-collector pattern. Instead, it went through a custom helper that performed two independently timed awaits and used an async void local function to backfill existing messages. That combination is prone to timing races and lost exceptions.
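The following minimal TypeScript sketch shows the kind of race involved. TypeScript is used because it is the Node SDK harness language. FakeSession, its on/send methods, and the event name are hypothetical stand-ins, not the SDK's API. The sketch illustrates the window that the C# helper's backfill comment describes: any event delivered between a send returning and the subscription being installed is never seen by a late subscriber.

```typescript
// Hypothetical stand-in for an SDK session; not the real API.
type Listener = (eventType: string) => void;

class FakeSession {
  private listeners: Listener[] = [];

  // Registers a listener and returns an unsubscribe function.
  on(listener: Listener): () => void {
    this.listeners.push(listener);
    return () => {
      this.listeners = this.listeners.filter((l) => l !== listener);
    };
  }

  // Delivers the event during send, as a fast backend can before the caller resumes.
  send(): void {
    for (const l of [...this.listeners]) l("pending-messages-modified");
  }
}

// Send first, subscribe second: the event was already delivered, so it is missed.
function subscribeAfterSend(): string[] {
  const session = new FakeSession();
  const seen: string[] = [];
  session.send();
  session.on((e) => seen.push(e));
  return seen;
}

// Subscribe first, then send: the collector sees every event.
function subscribeBeforeSend(): string[] {
  const session = new FakeSession();
  const seen: string[] = [];
  const off = session.on((e) => seen.push(e));
  session.send();
  off();
  return seen;
}
```

With a real backend, delivery timing varies from run to run. That is why this kind of ordering bug tends to surface as intermittent CI failures rather than consistent ones.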

Approach

Refactor the test to the standard pattern (subscribe, SendAndWait, then filter the collected events) in all four affected SDKs so the same fix doesn't have to be rediscovered language-by-language. While here, eliminate the underlying anti-patterns the bug rode in on:

  • C# (TestHelper.GetFinalAssistantMessageAsync): replace async void CheckExistingMessages with async Task CheckExistingMessagesAsync, drained deterministically in a finally so the backfill cannot outlive the enclosing using disposals. The dotnet/ tree now has zero async void methods.
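  • Node (sdkTestHelper.ts): replace new Promise<T>(async (resolve, reject) => ...) with idiomatic async/await in getFinalAssistantMessage and getExistingFinalResponse. The async-executor pattern is the JS analog of async void: an exception thrown inside the async executor rejects the executor's own promise rather than the outer one, so the exception is lost and the outer promise may never settle.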
  • Python / Go: pure test refactor; helpers were already clean.
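Here is a hedged TypeScript sketch of the Node change. Hypothetical getFinalAntiPattern/getFinal functions stand in for the real getFinalAssistantMessage. With an async executor, a rejection from the awaited call rejects the executor's own promise, which nobody observes, instead of the promise handed back to the caller. ESLint's no-async-promise-executor rule flags this shape. The plain async function lets the same rejection reach the caller's await.

```typescript
// Hypothetical stand-in for an RPC that may already hold the final message.
type FetchExisting = () => Promise<string | undefined>;

// Anti-pattern (shown for contrast): an async Promise executor. If fetchExisting
// rejects, the executor's own promise rejects, not the one returned here, so the
// caller's await never settles and the error surfaces, if at all, as an
// unhandled rejection.
function getFinalAntiPattern(fetchExisting: FetchExisting): Promise<string> {
  // eslint-disable-next-line no-async-promise-executor
  return new Promise<string>(async (resolve) => {
    const existing = await fetchExisting();
    if (existing !== undefined) resolve(existing);
  });
}

// Idiomatic async/await: a rejection from either source propagates to the caller.
async function getFinal(
  fetchExisting: FetchExisting,
  waitForFuture: () => Promise<string>,
): Promise<string> {
  const existing = await fetchExisting();
  return existing !== undefined ? existing : await waitForFuture();
}
```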

Validation

All four EventFidelity E2E suites pass after the change. Full E2E suites also clean:

  • C#: refactored test 10/10, EventFidelity 8/8, Session + Tools 54/55 (1 unrelated skip)
  • Node: 280/289 (2 file failures in an unrelated pending_work_resume.e2e.test.ts are a known local-env auth race; not touched by this PR)
  • Python: 300 passed, 7 skipped
  • Go: all e2e tests pass

Lints/typechecks clean in all four languages; unused imports cleaned up.


PR description generated by Copilot.

The pending-messages-modified event-fidelity test was failing intermittently in the C# SDK CI leg. The root cause was that this test was the only one in its fixture not using the standard `SendAndWaitAsync` + event-collector pattern; it went through a custom helper that did two independently timed awaits and used an `async void` local function for backfilling existing messages.

Refactor the test to the standard pattern in C#, Node, Python, and Go, and while here, replace the `async void` (and its JS analog, `new Promise(async (...) => ...)`) with proper `async Task` / `async` functions drained deterministically. The C# helper now has zero `async void` methods.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
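The "drained deterministically" part can be sketched in TypeScript as follows. This is an illustrative analog of the C# change, not the harness code, and FinalSource, onFinal, and getExistingFinal are hypothetical. The sketch installs the live subscription first. The backfill then runs concurrently, and both of its outcomes are routed into a single result promise, which plays the role tcs plays in C#. Finally, the finally block awaits the backfill before unsubscribing, so nothing outlives the call.

```typescript
// Hypothetical harness shape for illustration; not the real SDK API.
interface FinalSource {
  onFinal(listener: (msg: string) => void): () => void; // returns an unsubscribe function
  getExistingFinal(): Promise<string | undefined>;
}

async function getFinalAssistantMessage(source: FinalSource): Promise<string> {
  let resolve!: (msg: string) => void;
  let reject!: (err: unknown) => void;
  const result = new Promise<string>((res, rej) => {
    resolve = res;
    reject = rej;
  });

  // Live subscription first, so nothing delivered from here on can be missed.
  const unsubscribe = source.onFinal(resolve);

  // The backfill runs concurrently but stays observable: both outcomes are routed
  // into `result`, so `backfill` itself never rejects. The first settle wins.
  const backfill = source.getExistingFinal().then(
    (msg) => {
      if (msg !== undefined) resolve(msg);
    },
    (err: unknown) => reject(err),
  );

  try {
    return await result;
  } finally {
    // Drain the backfill before tearing down, so it cannot outlive this call.
    await backfill;
    unsubscribe();
  }
}
```

Because the backfill's rejection is already routed through result, awaiting backfill in finally cannot throw. The C# version swallows the exception in its finally drain for the same reason.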
@stephentoub stephentoub requested a review from a team as a code owner May 21, 2026 15:11
Copilot AI review requested due to automatic review settings May 21, 2026 15:11
Comment thread dotnet/test/Harness/TestHelper.cs Fixed
Comment thread dotnet/test/Harness/TestHelper.cs Fixed
@github-actions

This comment has been minimized.

Contributor

Copilot AI left a comment


Pull request overview

This PR fixes intermittent flakiness in the cross-SDK “pending messages modified” event-fidelity E2E test by refactoring the test flow to a consistent “subscribe/collect → sendAndWait → assert against collected events” pattern, and by removing helper implementations that could lose exceptions or race on timing.

Changes:

  • Refactors the pending-messages-modified E2E test in Python/Node/Go/.NET to use send_and_wait / sendAndWait / SendAndWait and then inspect the collected events.
  • Hardens helper logic in .NET and Node by eliminating async void / new Promise(async ...)-style patterns that can drop exceptions or create timing races.
  • Cleans up now-unused imports and simplifies event acquisition logic in the affected tests.
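The standard pattern can be sketched as shown below. Session, SessionEvent, and the event type strings are hypothetical shapes chosen for illustration; the real SDKs expose their own types and event names. The key property is that there is exactly one wait (sendAndWait). The assertions then run against a snapshot collected by a subscriber that was installed before the send.

```typescript
// Hypothetical event and session shapes for illustration; the real SDK types differ.
interface SessionEvent {
  type: string;
}
interface Session {
  on(handler: (event: SessionEvent) => void): () => void; // returns an unsubscribe function
  sendAndWait(prompt: string): Promise<void>; // resolves once the turn has finished
}

// Subscribe -> sendAndWait -> filter the collected events.
async function eventsOfTypeDuring(
  session: Session,
  prompt: string,
  type: string,
): Promise<SessionEvent[]> {
  const collected: SessionEvent[] = [];
  const unsubscribe = session.on((event) => collected.push(event)); // 1. collector first
  try {
    await session.sendAndWait(prompt); // 2. a single round trip, no separately timed waits
  } finally {
    unsubscribe();
  }
  return collected.filter((event) => event.type === type); // 3. assert on the snapshot
}
```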
Summary per file:

  • python/e2e/test_event_fidelity_e2e.py: Switches the flaky test to send_and_wait + event list collection, consistent with other tests in the file.
  • nodejs/test/e2e/harness/sdkTestHelper.ts: Reworks getFinalAssistantMessage to avoid new Promise(async ...) and makes the “existing vs future” logic fully async/await.
  • nodejs/test/e2e/event_fidelity.e2e.test.ts: Refactors the flaky test to sendAndWait + collected events, removing the split await pattern.
  • go/internal/e2e/event_fidelity_e2e_test.go: Refactors the flaky test to use SendAndWait and a mutex-protected event snapshot instead of a timeout channel wait.
  • dotnet/test/Harness/TestHelper.cs: Replaces an async void backfill with an observable Task and drains it deterministically to avoid outliving disposals.
  • dotnet/test/E2E/EventFidelityE2ETests.cs: Refactors the flaky test to SendAndWaitAsync + a single event collector, matching the fixture’s dominant pattern.

Copilot's findings

Comments suppressed due to low confidence (1)

dotnet/test/Harness/TestHelper.cs:89

  • GetFinalAssistantMessageAsync uses a CancellationTokenSource for timeouts, but the backfill task (CheckExistingMessagesAsync → GetExistingMessagesAsync → session.GetEventsAsync()) doesn't receive the cancellation token. Because the method now awaits backfill in finally, a slow/hung GetEventsAsync call can delay (or prevent) the timeout exception from surfacing and potentially hang tests. Consider threading cts.Token into GetExistingMessagesAsync and passing it to session.GetEventsAsync (and/or otherwise ensuring backfill can't outlive the timeout).
```csharp
        // Backfill from already-delivered messages so we don't lose events that arrived
        // between SendAsync returning and the subscription being installed. Run it
        // concurrently with the live subscription, but keep the Task observable so any
        // exception is propagated through tcs (not the unobserved-task handler) and so
        // we can drain it deterministically below.
        var backfill = CheckExistingMessagesAsync();

        using var registration = cts.Token.Register(
            static state => ((TaskCompletionSource<AssistantMessageEvent>)state!).TrySetException(
                new TimeoutException("Timeout waiting for assistant message")),
            tcs);

        try
        {
            return await tcs.Task;
        }
        finally
        {
            // Drain the backfill before our `using` scopes (cts, subscription) dispose.
            // Any exception was already routed through tcs above, so swallow here.
            try { await backfill.ConfigureAwait(false); } catch { }
        }

        async Task CheckExistingMessagesAsync()
        {
            try
            {
                var (existingFinal, existingIdle) = await GetExistingMessagesAsync(session, alreadyIdle);
                lock (stateLock)
```
  • Files reviewed: 6/6 changed files
  • Comments generated: 1
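Below is a TypeScript sketch of the reviewer's suggestion, with AbortController/AbortSignal standing in for CancellationTokenSource/CancellationToken. getFinalWithTimeout, GetExisting, and Subscribe are hypothetical. On timeout, the sketch both fails the wait and aborts the backfill through the same signal. As a result, the finally drain is bounded by the timeout rather than by however long a hung backfill call takes.

```typescript
// Hypothetical shapes for illustration; not the real SDK API.
type GetExisting = (signal: AbortSignal) => Promise<string | undefined>;
type Subscribe = (onFinal: (msg: string) => void) => () => void;

async function getFinalWithTimeout(
  getExisting: GetExisting,
  subscribe: Subscribe,
  timeoutMs: number,
): Promise<string> {
  const controller = new AbortController();
  let resolve!: (msg: string) => void;
  let reject!: (err: unknown) => void;
  const result = new Promise<string>((res, rej) => {
    resolve = res;
    reject = rej;
  });

  const unsubscribe = subscribe(resolve);

  // On timeout, fail the wait AND cancel the backfill via the shared signal.
  const timer = setTimeout(() => {
    reject(new Error("Timeout waiting for assistant message"));
    controller.abort();
  }, timeoutMs);

  // The same signal is threaded into the backfill, so the timeout can cut it short.
  const backfill = getExisting(controller.signal).then(
    (msg) => {
      if (msg !== undefined) resolve(msg);
    },
    (err: unknown) => reject(err),
  );

  try {
    return await result;
  } finally {
    // The timer stays armed during the drain, so a hung backfill is still bounded
    // by the timeout instead of blocking here forever.
    await backfill;
    clearTimeout(timer);
    unsubscribe();
  }
}
```

The timer is deliberately cleared only after the drain, mirroring the C# helper, where the token is still live while the backfill is awaited in finally.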

Comment thread nodejs/test/e2e/harness/sdkTestHelper.ts Outdated
stephentoub and others added 2 commits May 21, 2026 11:18
- TestHelper.cs: replace bare `catch { }` in the backfill-drain `finally` with `catch (Exception) { /* intentionally ignored: already propagated via tcs */ }` to make the swallow intent explicit and satisfy CodeQL.
- sdkTestHelper.ts: update the comment above `getFutureFinalResponse` to reflect that the live subscription is installed before the existing-messages RPC fires, matching the actual call ordering.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Thread cts.Token through CheckExistingMessagesAsync into GetExistingMessagesAsync and on to session.GetEventsAsync, so a hung backfill can't delay (or prevent) the TimeoutException from surfacing through the `finally` drain. Flagged by the Copilot PR reviewer.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment thread dotnet/test/Harness/TestHelper.cs
@github-actions
Contributor

Cross-SDK Consistency Review ✅

This PR applies the same fix across all four SDK implementations (Node.js, Python, Go, and .NET), which is exactly the right approach for test consistency.

Pattern applied uniformly:

  • Subscribe to events with a simple collector before sending
  • Use sendAndWait / send_and_wait / SendAndWait / SendAndWaitAsync in one round trip
  • Filter the collected events after the call returns

Additional cleanups are also consistent:

  • .NET: async void → async Task (with deterministic drain in finally)
  • Node.js: new Promise(async ...) anti-pattern → idiomatic async/await
  • Python/Go: pure test refactor, no anti-patterns to clean up

No cross-SDK inconsistencies found. All changes are in test/harness code, not public API surface.

Generated by SDK Consistency Review Agent for issue #1362 · ● 108K ·

@stephentoub stephentoub merged commit a205e69 into main May 21, 2026
37 checks passed
@stephentoub stephentoub deleted the stephentoub/fix-csharp-sdk-ci-test branch May 21, 2026 15:40


2 participants