Skip to content

Fix flaky parallel tools test by removing non-deterministic synchronization#161

Merged
edburns merged 2 commits intomainfrom
copilot/investigate-failed-test-run
May 5, 2026
Merged

Fix flaky parallel tools test by removing non-deterministic synchronization#161
edburns merged 2 commits intomainfrom
copilot/investigate-failed-test-run

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented May 5, 2026

Resolves #158


Before the change?

  • testShouldExecuteMultipleCustomToolsInParallelSingleTurn used CountDownLatch barriers to force both tool handlers to overlap, then released them simultaneously. The completion order was non-deterministic — sometimes toolcall_1 result was sent to the CLI before toolcall_0. The replaying proxy performs strict message-order matching, so when tool results arrived out of snapshot order, it returned 500.

After the change?

  • Test simplified to match the reference implementation approach: tool handlers return immediately without synchronization barriers. Tool results are sent in dispatch order (deterministic). The SDK still executes tools concurrently via its executor — this test just no longer forces timing that causes ordering issues.

Pull request checklist

  • Tests for the changes have been added (for bug fixes / features)
  • Docs have been reviewed and added / updated if needed (for bug fixes / features)
  • mvn spotless:apply has been run to format the code
  • mvn clean verify passes locally

Does this introduce a breaking change?

  • Yes
  • No

…zation

The testShouldExecuteMultipleCustomToolsInParallelSingleTurn test used
CountDownLatch barriers to verify that tool handlers overlapped in
execution. This caused a race condition: both handlers completed
simultaneously after the barrier was released, and the order in which
tool results were sent back to the CLI was non-deterministic. When
results arrived in a different order than the snapshot expected
(toolcall_1 before toolcall_0), the proxy returned a 500 error.

The fix simplifies the test to match the reference implementation
approach: tools return immediately, and we verify both tools were
called and the response contains both results. The SDK still dispatches
tools concurrently via its executor; the test just no longer forces
a specific timing that causes ordering issues.

Fixes #158

Co-authored-by: edburns <75821+edburns@users.noreply.github.com>
Copilot AI changed the title [WIP] Investigate and resolve failed test run after merging #157 Fix flaky parallel tools test by removing non-deterministic synchronization May 5, 2026
Copilot finished work on behalf of edburns May 5, 2026 20:00
Copilot AI requested a review from edburns May 5, 2026 20:00
@edburns edburns marked this pull request as ready for review May 5, 2026 20:02
Copilot AI review requested due to automatic review settings May 5, 2026 20:02
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the E2E tools test suite to remove synchronization that made the parallel-tools snapshot replay flaky. It aims to keep the Java SDK aligned with the reference implementation while making the test deterministic against the strict replaying proxy.

Changes:

  • Simplifies testShouldExecuteMultipleCustomToolsInParallelSingleTurn so both tool handlers return immediately.
  • Removes latch/atomic coordination code used to force handler overlap.
  • Keeps the test focused on successful dual-tool invocation and combined assistant output.
Show a summary per file
File Description
src/test/java/com/github/copilot/sdk/ToolsTest.java Simplifies the parallel custom tools E2E test by removing forced synchronization and overlap assertions.

Copilot's findings

  • Files reviewed: 1/1 changed files
  • Comments generated: 1

Comment thread src/test/java/com/github/copilot/sdk/ToolsTest.java
@edburns edburns merged commit 3e874dc into main May 5, 2026
14 checks passed
@edburns edburns deleted the copilot/investigate-failed-test-run branch May 5, 2026 20:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[MAINT]: Investigate and resolved failed test run after merge of #157 Reference implementation sync: 2 new commits (2026-05-05)

3 participants