Setup all: fix AADSTS errors in admin consent url + in-line SP provisioning#432
Merged
Merged
Conversation
Resolves issue #429 (a365 setup all admin-consent URL returns invalid_client in browser). Three concrete Entra error codes were surfacing from the unified /v2.0/adminconsent flow added by PR #424; each is now fixed independently. AADSTS650053 - scope mismatch - The Messaging Bot API blueprint spec hardcoded "Authorization.ReadWrite" + "user_impersonation"; the resource SP publishes exactly "AgentData.ReadWrite". PR #424's strict /v2.0/adminconsent endpoint rejected the whole URL on the drift. Spec now reads ConfigConstants.MessagingBotApiAdminConsentScope so the spec list and URL builder share one source of truth. - New ScopeAvailabilityValidator queries each resource SP's published scopes before building the unified URL and drops anything missing, with a per-resource warning. Defends against future first-party scope drift. AADSTS650052 - resource lacks an SP in tenant - Phase 1's POST /servicePrincipals needs Application.ReadWrite.All which the CLI's MSAL token does not carry; some first-party multi-tenant resource apps consequently failed to provision silently. New EnsureMissingResourceSpsAsync helper detects these after Phase 1 and shells out to 'az ad sp create --id' via the operator's az login (a GA's az token carries the role implicitly). SP id is parsed from the az JSON output - no Graph re-poll, no false "Graph still does not see the SP" warnings on slow replicas. - New --skip-sp-provisioning flag (implicit when stdin is redirected) opts out of the in-line provisioning and surfaces both the az command and a per-SP /v2.0/adminconsent URL keyed to the blueprint as numbered Action Required items, so operators complete recovery without re-running setup all. AADSTS500011 - V2 per-server audience URI not found - GetResourceIdentifierUri keyed off resourceName="Agent 365 Tools" and collapsed every V2 per-server audience onto the WorkIQ Tools URI. Routing through api://{appId} then failed because per-server SPs have identifierUris null. The canonical fallback for SPs without a published URI is the bare appId GUID (verified live: servicePrincipalNames: [<appId>]). - Per-resource URL builders (BuildAdminConsentUrls, BuildCombinedConsentUrl, PopulateAdminConsentUrls, ApplyConsentUrlsIfNeeded) take a new optional mcpScopesByAudience map so V2 audiences route per-appId instead of collapsing. UX - Unified [y/N] prompt shape (default No) is reused across S2S PowerShell fallback, delegated consent PowerShell fallback, and per-SP provisioning. Per-SP "N. ResourceName appId" upfront list + matching "N. Name - Provision?" prompts replace the previous run-on prompts. - az-CLI consent polling supports Enter-to-skip - matches the Graph polling path so operators can break out of a 180s wait when the browser tab failed. TODOs flagged in PowerShellConsentRunner.cs and PowerShellS2SRunner.cs to replace both with az rest-based runners before the PR opens - Connect-MgGraph boot is 5-10s cold and MSAL/WAM browser negotiation is unreliable. The az session is already authenticated and faster; AzRestConsentRunner and AzRestS2SRunner are tracked as follow-up. Tests: 1614 passed / 12 skipped / 0 failed. Adds: - BatchPermissionsOrchestratorMissingSpTests - 12 tests covering pre-flight, --skip-sp-provisioning, decline, accept-with-az-success, az-fail, GUID guard, JSON parser theory. - ScopeAvailabilityValidatorTests - 8 tests including swallow-Graph-error and propagate-OperationCanceled. - PowerShellConsentRunnerTests - 8 tests mirroring the S2S runner pattern. - PermissionSpecsTests - locks in Messaging Bot scope contract. - SetupHelpersConsentUrlTests - V2 audience routing in all three URL builders.
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Scanned FilesNone |
Contributor
There was a problem hiding this comment.
Pull request overview
This PR fixes multiple Entra (AADSTS) failures in a365 setup all by making admin-consent URL generation and resource service principal provisioning more robust, especially for MCP V2 per-server audiences and scope drift across first-party resources.
Changes:
- Add a scope-availability preflight (
ScopeAvailabilityValidator) to drop non-published delegated scopes before building unified/v2.0/adminconsentURLs. - Add in-line provisioning support for missing resource service principals via
az ad sp create --id <appId>, plus--skip-sp-provisioningand Action Required recovery items. - Fix MCP V2 per-server audience routing by threading
mcpScopesByAudiencethrough consent URL builders and using bareappIdresource identifiers for SPs withidentifierUris: null.
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| src/Microsoft.Agents.A365.DevTools.Cli/Commands/SetupSubcommands/AllSubcommand.cs | Adds --skip-sp-provisioning, threads per-audience MCP scope mapping through setup flow |
| src/Microsoft.Agents.A365.DevTools.Cli/Commands/SetupSubcommands/BatchPermissionsOrchestrator.cs | Filters invalid scopes, provisions missing SPs via az, adds opt-in prompts and delegated consent PowerShell fallback |
| src/Microsoft.Agents.A365.DevTools.Cli/Commands/SetupSubcommands/NonDwBlueprintSetupOrchestrator.cs | Passes per-audience MCP scope mapping into consent URL persistence logic |
| src/Microsoft.Agents.A365.DevTools.Cli/Commands/SetupSubcommands/PermissionsSubcommand.cs | Updates help/dry-run output to reflect unified consent URL + corrected Bot scope |
| src/Microsoft.Agents.A365.DevTools.Cli/Commands/SetupSubcommands/PowerShellConsentRunner.cs | New delegated-consent fallback runner using Microsoft Graph PowerShell SDK |
| src/Microsoft.Agents.A365.DevTools.Cli/Commands/SetupSubcommands/PowerShellS2SRunner.cs | Messaging tweaks and removes unused “success marker” output |
| src/Microsoft.Agents.A365.DevTools.Cli/Commands/SetupSubcommands/SetupContext.cs | Adds SkipSpProvisioning flag plumbed through setup context |
| src/Microsoft.Agents.A365.DevTools.Cli/Commands/SetupSubcommands/SetupHelpers.cs | Fixes MCP V2 audience routing and consent URL building/persistence (per-audience) |
| src/Microsoft.Agents.A365.DevTools.Cli/Commands/SetupSubcommands/SetupResults.cs | Adds MissingSpActions model for actionable end-of-run recovery steps |
| src/Microsoft.Agents.A365.DevTools.Cli/Constants/ConfigConstants.cs | Makes Messaging Bot scope a single source of truth (AgentData.ReadWrite) |
| src/Microsoft.Agents.A365.DevTools.Cli/Services/Helpers/AdminConsentHelper.cs | Allows Enter-to-skip in az-based consent polling path |
| src/Microsoft.Agents.A365.DevTools.Cli/Services/Helpers/ScopeAvailabilityValidator.cs | New helper that queries resource SP published scopes and filters spec scopes accordingly |
| src/Tests/Microsoft.Agents.A365.DevTools.Cli.Tests/Commands/BatchPermissionsOrchestratorMissingSpTests.cs | New tests for missing-SP provisioning and recovery artifact generation |
| src/Tests/Microsoft.Agents.A365.DevTools.Cli.Tests/Commands/PowerShellConsentRunnerTests.cs | New tests for delegated consent PowerShell runner script generation and guards |
| src/Tests/Microsoft.Agents.A365.DevTools.Cli.Tests/Commands/SetupSubcommands/PermissionSpecsTests.cs | Ensures Bot spec uses the shared scope constant |
| src/Tests/Microsoft.Agents.A365.DevTools.Cli.Tests/Helpers/SetupHelpersConsentUrlTests.cs | Adds MCP V2 per-audience routing tests (bare appId resource identifiers) |
| src/Tests/Microsoft.Agents.A365.DevTools.Cli.Tests/Services/Helpers/ScopeAvailabilityValidatorTests.cs | New tests for scope filtering behavior and failure/cancellation semantics |
| CHANGELOG.md | Documents new option + fixes for AADSTS650053/650052/500011 |
Follow-up to 78c8048 on issue #429. - Replace PowerShellConsentRunner / PowerShellS2SRunner with AzRestConsentRunner / AzRestS2SRunner. The pwsh + Connect-MgGraph fallback was unreliable: MSAL/WAM browser sometimes failed to open, sometimes hung for two minutes. az rest runs synchronously against the operator's existing az login; a GA's az token carries every Graph application permission implicitly via the directory role. pwsh and the Microsoft.Graph modules are no longer required. - Ctrl+C on `setup all` now exits silently instead of running DisplaySetupSummary, which had rendered not-yet-attempted phases as "failed". - Address six Copilot review comments on PR #432: prompt wording for app-role assignment, --skip-sp-provisioning help text and XML doc, and three stale api://{appId} comments in SetupHelpers. - CHANGELOG: trim verbose entries to release-notes shape, fix the 'Ca-' stray prefix, add the Ctrl+C entry.
gwharris7
previously approved these changes
May 28, 2026
Follow-up to cbb9e5a on PR #432. - Single up-front S2S prompt covers both the primary Graph API path and the az rest fallback. Previously the prompt only existed in the fallback branch (regression from when the az rest runners landed); operators on tenants where the primary path succeeded saw no confirmation for an admin-level write. - Remove Enter-to-skip from consent polling. It was solving a non-problem (polling already detects consent) and creating a spurious second prompt: after Enter, the orchestrator fell into the az rest fallback ("Grant admin consent for these permission(s)?") despite the operator having just said "move on". TryConsumeEnterKey + 4 call sites + the 250ms short-poll loops deleted. - Az rest poll: catch (OperationCanceledException) now throws instead of returning false, matching the Graph overload. Ctrl+C during polling aborts cleanly via AllSubcommand's silent OCE handler. Closes a Rule P (swallowed OCE) finding from /review-staged. - Polling timeout message points at the most likely real cause: "az login may be signed into a different tenant than the consent target. Verify with 'az account show'." (issue #430 territory). - Wording polish: "permissions" -> "permission(s)" on both prompts; drop redundant "(covers all required delegated permissions)" parens; drop redundant "in the browser" from the reassurance line. - Tests: new ConfigureAllPermissions_WhenOperatorDeclinesS2SPrompt_ NoGrantsAttempted locks the no-grant-on-decline contract. PollAdminConsentAsync_ReturnsFalse_WhenNoGrant renamed/rewritten to assert OCE propagation directly (the original was asserting cancellation behavior via the previous swallow).
Fixes 23 of 25 unresolved comments (10 unique findings, most posted 3x
across Copilot's review iterations). One finding pushed back; one held
for a follow-up commit.
Stale "PowerShell fallback" wording (3 comments)
- BatchPermissionsOrchestrator.cs:599/602: LogWarning text and
Warnings.Add text both said "PowerShell fallback" but the fallback is
az rest after this PR. Both replaced.
Stale "api://{appId}" comments (12 comments across 4 files)
- AllSubcommand.cs:899-904, NonDwBlueprintSetupOrchestrator.cs:368-371,
SetupHelpersConsentUrlTests.cs:316-321: comments still described the
pre-fix routing as "api://{appId}". The actual implemented routing is
the bare appId GUID (api:// triggered AADSTS500011 for per-server SPs
with identifierUris=null). Comments rewritten to match the code.
- BatchPermissionsOrchestrator.cs:624-633: missing-SP provisioning
comment still said "per-app admin-consent URLs" but the helper now
shells out to az ad sp create. Rewritten to describe the actual
mechanism plus the AADSTS65003 reason the URL pattern was removed.
CHANGELOG proc- typo (3 comments)
- Line 137: "proc- `setup all --dry-run`..." was missing the markdown
list marker. Fixed.
Test-class parallelism (2 comments)
- BatchPermissionsOrchestratorTests and BatchPermissionsOrchestratorMissingSpTests
both mutate static state (BypassSpProvisioningForTests,
OpenUrlOverrideForTests) without [Collection]. Added
[Collection("Sequential")] on both to serialize with other classes that
mutate the same globals.
GetResourceIdentifierUri "bare GUID for unknowns" (2 comments) - HELD
- Copilot flagged this as theoretically too broad — a custom resource
whose SP omits the bare appId from servicePrincipalNames would fail.
- I initially pushed back; the user correctly pointed out the screenshot
evidence (W365 MCP appId in ToolingManifest) and the fact that the name
signal isn't reliable (operator-typed customPermission names don't
match "Agent 365 Tools"). The follow-up will plumb the loaded
ToolingManifest's audience set as the discriminator. Tracked in a
TODO inline at GetResourceIdentifierUri.
- The trimmed comment block replaces an 18-line essay with 3 lines on
the WorkIQ check + 3 lines on the follow-up.
Skill update — three new Cross-Cutting Contract Checks
- /review-staged missed 7 of 9 unique findings on its prior runs. The
miss pattern is structural: staged-only review surface (not branch),
no comment-vs-code consistency check, no test-parallelism check.
- Added Rules S, T, U to .claude/skills/review-staged/SKILL.md to
close the gaps. Full miss analysis is at
.codereviews/why-review-staged-missed-copilot-findings-...md
(gitignored — local artifact for posterity).
Replaces the unreliable resourceName "Agent 365 Tools" heuristic with an explicit isMcpAudience bool fed by the loaded ToolingManifest audience set. Restores the pre-PR-#432 api://{appId} catch-all for unknown resources (custom permissions etc.); bare appId GUID is now emitted only when the caller signals an MCP per-server audience. - GetResourceIdentifierUri / BuildFullyQualifiedScope: drop string resourceName param, add bool isMcpAudience (default false). - V2 audience loops in BuildAdminConsentUrls / BuildCombinedConsentUrl hardcode isMcpAudience: true (the loop's existence is the signal). - ConfigureAllPermissionsAsync + GrantAdminConsentAsync + ExecuteBatchPermissionsStepAsync take optional knownMcpAudienceAppIds. AllSubcommand loads it from mcpScopesByAudience.Keys after BuildPermissionSpecsAsync. - Catch-all spec loop in GrantAdminConsentAsync passes knownMcpAudienceAppIds?.Contains(s.ResourceAppId) per spec, so customPermissions referencing an MCP audience appId still route to bare GUID while non-MCP customPermissions keep api://{appId}. - +5 tests pinning all four branches: known platform (canonical URI), WorkIQ shared (canonical URI, takes precedence over isMcpAudience), V2 MCP per-server (bare GUID), unknown (api://{appId}).
Original bug: a365 setup all --m365 (no --aiteammate) emitted api:// for V2 MCP per-server audiences in the consent URL, causing AADSTS500011. Commit 7a1e317 wired knownMcpAudienceAppIds through the DW path but missed the non-DW blueprint default path at NonDwBlueprintSetupOrchestrator.cs:362 — that caller invoked ExecuteBatchPermissionsStepAsync without the audience set, so the catch-all spec loop in GrantAdminConsentAsync defaulted to api://{appId} for every spec. Hardening - ExecuteBatchPermissionsStepAsync now takes IReadOnlyDictionary<string, string[]> mcpScopesByAudience as a required parameter (no default). The compiler refuses to build any caller that forgets to thread the manifest audience set through. - Both callers (DW + non-DW) updated to pass the dictionary directly. Copilot review comments - BatchPermissionsOrchestrator.cs:1042 — log "next steps below" was misleading in CI/stdin-redirected runs (the steps surface only in the setup summary). Now reads "steps will be listed in the setup summary". - BatchPermissionsOrchestrator.cs:1257 (GetResourceUriForBlueprintConsent) — the per-SP recovery URL emitted bare GUID for ALL resources, conflicting with the broader contract. Now takes isMcpAudience and routes: MCP audience → bare GUID, everything else → api://{appId}. EnsureMissingResourceSpsAsync, RecordMissingSpAction, and BuildPerSpBlueprintConsentUrl plumb the signal through; 5 internal call sites updated. - PermissionsSubcommand.cs:682 + :1075 — both bypassed the ExecuteBatchPermissionsStepAsync hardening by calling ConfigureAllPermissionsAsync directly. Now pass knownMcpAudienceAppIds: setup permissions mcp uses scopesByAudience.Keys; setup permissions custom loads the manifest so a customPermission entry that points at an MCP audience (e.g. the W365 case) still routes correctly. Regression tests (+3) - ConfigureAllPermissions_NonAdmin_McpAudienceUsesBareGuid_NonMcpUsesApiScheme exercises ConfigureAllPermissionsAsync end-to-end through the orchestrator catch-all spec loop with one MCP appId and one non-MCP appId; asserts both branches. - BuildPerSpBlueprintConsentUrl_DefaultIsMcpAudienceFalse_UsesApiSchemePrefix and BuildPerSpBlueprintConsentUrl_McpAudience_UsesBareAppIdGuidNotApiScheme pin the two branches of the per-SP recovery URL helper. Skill update - New Rule V (hardening-bypass detection) in .claude/skills/review-staged/SKILL.md. When a diff makes a parameter required on an entry-point helper, grep direct production callers of the lower-level function the helper delegates to and verify each passes the parameter. The PermissionsSubcommand miss above is the motivating example — exactly the shape /review-staged didn't catch.
gwharris7
previously approved these changes
May 29, 2026
Live customer test with --m365 surfaced three UX issues: - "consent likely succeeded" banner was wrong when the browser flow had failed (polling timed out, no grant landed), - every V2 MCP per-server audience displayed as a generic "Agent 365 Tools" in the spec list and consent URLs, making them indistinguishable, - phase output (messaging endpoint, project settings) ran together visually with no separators. Changes: - Fallback prompt now opens with the preamble "Consent was not detected - unclear whether it was declined in the browser or an error occurred." and reads "Add these permissions to the blueprint programmatically? [y/N]:". - Spec list and per-audience consent URLs read per-server names from the manifest (mcp_MailTools, mcp_CalendarTools, ...) while the legacy shared audience stays "Agent 365 Tools". - Phase separators added between messaging-endpoint and project-settings steps in both DW and non-DW paths. - Tenant-mismatch warning that previously followed the timeout demoted to Debug since the new preamble covers the ambiguity. - Header / fallback-prompt singular/plural form now driven by actual scope count (no more "permission(s)" hedge). Regression-risk review record at .codereviews/claude-staged-20260529T171449Z.md. Build clean, 1643 / 0 tests passing.
gwharris7
approved these changes
May 30, 2026
ajmfehr
approved these changes
May 30, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Resolves issue #429 —
a365 setup alladmin-consent URL returnsinvalid_clientin browser. Three concrete Entra error codes were surfacing from the unified/v2.0/adminconsentflow added by #424; each is fixed independently in this PR.Authorization.ReadWrite+user_impersonation; the resource SP only publishesAgentData.ReadWrite. Spec now readsConfigConstants.MessagingBotApiAdminConsentScopeso the spec list and URL builder share one source of truth. NewScopeAvailabilityValidatorqueries each resource SP's published scopes before building the URL and drops anything missing (defense in depth against future scope drift).POST /servicePrincipalsneedsApplication.ReadWrite.Allwhich the CLI's MSAL token doesn't carry. NewEnsureMissingResourceSpsAsynchelper detects unresolved SPs after Phase 1, prompts per-resource (default No), and shells out toaz ad sp create --id <appId>via the operator's GA-privileged az login. SP id is parsed from the az JSON output — no Graph re-poll, no false replica-lag warnings. New--skip-sp-provisioningflag (implicit when stdin is redirected) opts out; declined / failed resources surface in the setup summary's Action Required block as numbered items with both the az command and a per-SP/v2.0/adminconsentURL keyed to the blueprint — operators complete recovery without re-runningsetup all.GetResourceIdentifierUricollapsed V2 per-server audiences onto the WorkIQ Tools URI via theresourceName="Agent 365 Tools"check. Routing throughapi://{appId}then failed because per-server SPs haveidentifierUris: null. The canonical fallback for SPs without a published URI is the bare appId GUID (verified live:servicePrincipalNames: [<appId>]).GetResourceIdentifierUrinow takes an explicitisMcpAudiencesignal fed from the loadedToolingManifest.jsonaudience set: manifest-known appIds → bare GUID; every other unknown →api://{appId}. The three URL builders (BuildAdminConsentUrls,BuildCombinedConsentUrl,PopulateAdminConsentUrls) take anmcpScopesByAudiencemap so V2 audiences route per-appId instead of collapsing.Reliability rewrite — az rest replaces the PowerShell fallbacks.
PowerShellConsentRunner+PowerShellS2SRunnerare gone.AzRestConsentRunner+AzRestS2SRunnerissue the samePOST /oauth2PermissionGrantsandPOST /appRoleAssignmentswrites against the operator's existingaz login.Connect-MgGraph's 5-10s cold boot + MSAL/WAM browser negotiation was unreliable in practice (2-minute hangs observed);az restis synchronous and fast. The operator'saz logincarries enough privilege for both writes without any additional consent.UX polish: unified
[y/N]prompt shape (default No) across the S2S and delegated-consent az rest fallbacks and per-SP provisioning. Per-SPN. ResourceName appIdupfront list + matchingN. Name - Provision?prompts. Ctrl+C onsetup allexits silently instead of printing a misleading partial summary.Hardening:
AllSubcommand.ExecuteBatchPermissionsStepAsyncnow takesmcpScopesByAudienceas a required parameter — the compiler refuses to build any caller that forgets to thread the manifest audience set through. Prevents the same shape of V2 routing regression from recurring at a future call site.