Skip to content

Harden permission-reject E2E tests across all SDKs (#1194)#1317

Merged
stephentoub merged 1 commit into
mainfrom
stephentoub/fix-dotnet-permission-decision-serializa
May 17, 2026
Merged

Harden permission-reject E2E tests across all SDKs (#1194)#1317
stephentoub merged 1 commit into
mainfrom
stephentoub/fix-dotnet-permission-decision-serializa

Conversation

@stephentoub
Copy link
Copy Markdown
Collaborator

Issue #1194 reported that the .NET SDK serialized PermissionDecision as empty JSON because it instantiated the polymorphic base type. The user-visible bug was already fixed at the codegen layer on main (the [JsonIgnore] was removed from the generated base discriminator property), so no runtime code change is needed. However, none of the existing per-SDK E2E permission tests would have caught the regression, which is what this PR addresses.

Why the existing tests missed it

Every SDK had a "reject permission" E2E test that only asserted the target file was unchanged after a model-driven edit attempt. That is a false-positive-prone check: it passes even when the agent freezes, when the permission decision is silently dropped, or when the wrong discriminator is sent on the wire. It would not have caught the .NET serialization bug.

What changed

Each SDK's reject test now additionally asserts that the CLI emits a tool.execution_complete event whose error message contains "user rejected" (case-insensitive). The CLI emits a kind-specific message for the reject decision ("The user rejected this tool call.") that is distinct from the message used for user-not-available, so this asserts the specific reject discriminator round-tripped end-to-end - exactly the property that was broken in #1194.

The pattern (subscribe to events before sending the prompt, then wait for the matching tool.execution_complete) mirrors what each SDK's "user not available" test already does, so the new code reuses an established idiom in each language.

Files touched (test-only):

  • dotnet/test/E2E/PermissionE2ETests.cs
  • nodejs/test/e2e/permissions.e2e.test.ts
  • python/e2e/test_permissions_e2e.py
  • go/internal/e2e/permissions_e2e_test.go
  • rust/tests/e2e/permissions.rs (also adds an is_user_rejected_tool_completion helper next to the existing is_permission_denied_tool_completion helper)

Each test carries a comment referencing #1194 so future readers see why the event-level assertion exists.

Validation

Ran the targeted reject test in each SDK locally on Windows; all 5 pass:

  • .NET: dotnet test --filter "Should_Deny_Permission_When_Handler_Returns_Denied"
  • Node: npx vitest run test/e2e/permissions.e2e.test.ts -t "should deny permission when handler returns denied"
  • Python: python -m pytest e2e/test_permissions_e2e.py::TestPermissions::test_should_deny_permission_when_handler_returns_denied
  • Go: go test ./internal/e2e -run "TestPermissionsE2E/deny_permission$"
  • Rust: cargo test --features test-support --test e2e should_deny_permission_when_handler_returns_denied

Out of scope

The .NET/Python/Go/Rust public PermissionRequestResult surface still only exposes 4 basic kinds (Approved/Rejected/UserNotAvailable/NoResult); only Node exposes the full polymorphic union with extra fields (session-scope approval rules, location keys, permanent-approval domains, reject feedback). That's a separate enhancement and not addressed here.

The existing "reject permission" E2E test in every SDK asserted only that
the target file was unchanged after a model-driven edit attempt. That is a
false-positive-prone check: it passes even when the agent freezes, when the
permission decision is silently dropped, or when the wrong discriminator is
sent. Specifically, it would not have caught the .NET `PermissionDecision`
empty-JSON regression reported in #1194 (since fixed in codegen).

Strengthen each SDK's reject test to additionally assert that the CLI
emits a `tool.execution_complete` event whose error message contains
"user rejected" (case-insensitive). The CLI emits a kind-specific message
for the reject decision ("The user rejected this tool call.") vs. the
distinct message for user-not-available, so this asserts that the specific
`reject` discriminator round-tripped end-to-end - exactly the property
that was broken in #1194.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@stephentoub stephentoub requested a review from a team as a code owner May 17, 2026 02:20
Copilot AI review requested due to automatic review settings May 17, 2026 02:20
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Test-only hardening across all SDK languages to ensure the "reject permission" E2E tests catch regressions like #1194, where a permission decision could be silently dropped due to broken discriminator serialization. Each test now asserts that a tool.execution_complete event surfaces a "user rejected" error message, in addition to the existing assertion that the target file was unmodified.

Changes:

  • Add event subscription in each SDK's reject test to detect tool.execution_complete events with a user-rejection error.
  • Add a Rust helper is_user_rejected_tool_completion mirroring the existing is_permission_denied_tool_completion.
  • Comment each change with a reference to issue #1194.
Show a summary per file
File Description
dotnet/test/E2E/PermissionE2ETests.cs Subscribe to events and assert user-rejected tool completion.
go/internal/e2e/permissions_e2e_test.go Add mutex-guarded event listener asserting rejection error.
nodejs/test/e2e/permissions.e2e.test.ts Add event listener and expectation for rejected tool call.
python/e2e/test_permissions_e2e.py Add event listener via pattern match asserting rejection error.
rust/tests/e2e/permissions.rs Subscribe to events and add is_user_rejected_tool_completion helper.

Copilot's findings

  • Files reviewed: 5/5 changed files
  • Comments generated: 0

@github-actions
Copy link
Copy Markdown
Contributor

Cross-SDK Consistency Review ✅

This PR updates all five SDK implementations (Node.js, Python, Go, .NET, Rust) with the same regression check, and the approach is consistent across the board. A few notes:

Consistent across all SDKs:

  • Listener/subscription registered before send_and_wait in every SDK ✅
  • Case-insensitive "user rejected" matching in all five (toLowerCase, lower(), strings.ToLower, StringComparison.OrdinalIgnoreCase, .to_lowercase()) ✅
  • Assertion happens after the send completes ✅
  • Each implementation follows language-idiomatic patterns (Go adds a sync.Mutex for concurrent callback safety; Rust uses the established subscribe() + wait_for_event() pattern; Python uses structural pattern matching) ✅

One pre-existing asymmetry (out of scope, noted for awareness):

The Python on_event handler has more complex error extraction than the other SDKs:

msg = (
    error
    if isinstance(error, str)
    else (getattr(error, "message", None) if error is not None else None)
)

This dual-type handling (error can be a plain str or an object with a .message attribute) suggests a pre-existing Python SDK type difference in ToolExecutionCompleteData.error compared to the other SDKs, which uniformly access .Error.Message / .error?.message. This isn't introduced by this PR and is out of scope here, but it may be worth a follow-up to align the Python error type.

Overall, this PR is a high-quality, well-coordinated consistency fix. The new assertion pattern is a meaningful improvement over the previous file-unchanged-only check, and it's applied symmetrically across all languages.

Generated by SDK Consistency Review Agent for issue #1317 · ● 170.3K ·

@stephentoub stephentoub merged commit a623d07 into main May 17, 2026
47 checks passed
@stephentoub stephentoub deleted the stephentoub/fix-dotnet-permission-decision-serializa branch May 17, 2026 02:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants