E2E test infrastructure improvements#26
Conversation
There was a problem hiding this comment.
Pull request overview
This PR improves E2E test infrastructure with two main enhancements: preventing corrupted snapshot writes on test failures and updating snapshots to reflect the CLI's new behavior of coalescing tool calls into single assistant messages for Anthropic extended thinking compatibility.
Changes:
- Implements test failure detection across all SDK languages to prevent writing corrupted snapshots
- Updates snapshot files to include both old (separate assistant messages) and new (coalesced tool calls) conversation formats
- Modifies proxy stop endpoints to accept a
skipWritingCacheparameter
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| test/snapshots/tools/invokes_built_in_tools.yaml | Adds new conversation format with coalesced tool calls and updates response content |
| test/snapshots/session/should_create_session_with_custom_tool.yaml | Removes intermediate assistant content message |
| test/snapshots/permissions/*.yaml | Adds new conversation snapshots with coalesced tool calls for all permission tests |
| test/harness/replayingCapiProxy.ts | Adds skipWritingCache query parameter support to /stop endpoint |
| python/e2e/testharness/proxy.py | Adds skip_writing_cache parameter to stop() method with documentation |
| python/e2e/testharness/context.py | Adds test_failed parameter to teardown() to control snapshot writing |
| python/e2e/conftest.py | Implements pytest hook to track test failures globally |
| nodejs/test/e2e/harness/sdkTestContext.ts | Implements vitest onTestFailed hook to track failures |
| nodejs/test/e2e/harness/CapiProxy.ts | Adds skipWritingCache parameter to stop() method |
| go/e2e/testharness/proxy.go | Adds StopWithOptions() method with skipWritingCache support |
| go/e2e/testharness/context.go | Uses t.Failed() to detect test failures in cleanup |
| dotnet/test/Harness/E2ETestContext.cs | Uses CI environment variable as workaround for test failure detection |
| dotnet/test/Harness/CapiProxy.cs | Adds skipWritingCache parameter to StopAsync() method |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Track if any test failed to avoid writing corrupted snapshots | ||
| _any_test_failed = False |
There was a problem hiding this comment.
The global _any_test_failed variable is not thread-safe. If pytest runs tests in parallel (e.g., with pytest-xdist), multiple test processes could have race conditions. Consider using pytest's built-in mechanisms or a thread-safe approach to track test failures.
| // Skip writing snapshots in CI to avoid corrupting them on test failures | ||
| var isCI = !string.IsNullOrEmpty(Environment.GetEnvironmentVariable("CI")); | ||
| await _proxy.StopAsync(skipWritingCache: isCI); |
There was a problem hiding this comment.
The .NET implementation uses a CI environment variable check as a workaround for lack of test failure detection, while other languages (Go, Python, Node.js) properly detect actual test failures. This means snapshots will always be skipped in CI even for passing tests, which differs from the behavior in other languages and may not align with the PR's stated goal of only skipping on failure. Consider using a more sophisticated approach like capturing test status through ITestOutputHelper or other xUnit extensibility points.
| // Skip writing snapshots in CI to avoid corrupting them on test failures | |
| var isCI = !string.IsNullOrEmpty(Environment.GetEnvironmentVariable("CI")); | |
| await _proxy.StopAsync(skipWritingCache: isCI); | |
| await _proxy.StopAsync(skipWritingCache: false); |
| options.requestOptions.path?.startsWith("/stop") && | ||
| options.requestOptions.method === "POST" | ||
| ) { | ||
| const skipWritingCache = options.requestOptions.path.includes("skipWritingCache=true"); |
There was a problem hiding this comment.
Using string .includes() to parse query parameters is fragile and could match unintended patterns (e.g., ?foo=skipWritingCache=true or ?skipWritingCache=true&bar=1). Consider using a proper URL query parameter parser like new URLSearchParams() for more robust parameter extraction.
| const skipWritingCache = options.requestOptions.path.includes("skipWritingCache=true"); | |
| let skipWritingCache = false; | |
| const path = options.requestOptions.path; | |
| const queryIndex = path?.indexOf("?"); | |
| if (queryIndex !== undefined && queryIndex !== -1) { | |
| const searchParams = new URLSearchParams(path.substring(queryIndex + 1)); | |
| skipWritingCache = searchParams.get("skipWritingCache") === "true"; | |
| } |
* Expose permissions requests in SDK client * bump copilot version * update package.json * Fix go client * fix lint and formatting * Add dotnet support * format python * Generate stubs * Make the tests work * reformat python * formatting * linting * Remove resume test * Backslashes
Add 10 Java scenario implementations covering: - callbacks: hooks (pre/post tool use, session start/end), permissions - prompts: attachments - sessions: concurrent-sessions, session-resume - tools: custom-agents, tool-overrides, mcp-servers, skills - auth: gh-app (OAuth device flow) All scenarios compile successfully with mvn compile. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Two improvements to E2E test infrastructure:
1. Skip writing snapshots on test failure
Prevents corrupted snapshots from being written when tests fail. Each language uses its native test framework hooks:
2. Update snapshots for Anthropic extended thinking
The CLI now coalesces tool calls into single assistant messages for Anthropic extended thinking compatibility. This updates all affected snapshots to match the new format.
Related
Mirrors changes from https://github.com/github/copilot-agent-runtime/pull/1508