Reduce custom app permissions and improve setup reliability #409

Merged: sellakumaran merged 8 commits into main from users/sellak/min-permissions on May 11, 2026

@sellakumaran
Contributor

Removes DelegatedPermissionGrant.ReadWrite.All and AgentIdentity.Create.All from the required CLI app permission set. Agent identity creation now uses Blueprint app-only credentials (AgentIdentity.CreateAsManager is auto-granted to Blueprint apps). Principal-scoped oauth2 grants use AgentIdentityBlueprint.UpdateAuthProperties.All. The EnsureServicePrincipalForAppIdAsync call is eliminated for agent identity SPs (id == appId for the ServiceIdentity type), which removes the Application.ReadWrite.All dependency.

Adds exponential back-off retry loops for AADSTS700016 and Authorization_IdentityNotFound propagation errors on fresh blueprint setups. All propagation-lag retry logs are downgraded to Debug because they are not user-actionable.
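A minimal sketch of the retry pattern described above, written in Python for illustration and not taken from the CLI's C# code. The `GraphError` type, the function name, and the default values are all assumptions; only the two error codes and the `attempt < max_attempts` loop convention come from this PR.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Error codes the PR description names as propagation-lag symptoms.
TRANSIENT_CODES = ("AADSTS700016", "Authorization_IdentityNotFound")


class GraphError(Exception):
    """Hypothetical error type that carries a service error code."""

    def __init__(self, code: str):
        super().__init__(code)
        self.code = code


def retry_on_propagation_lag(
    operation: Callable[[], T],
    max_attempts: int = 5,              # assumed default, not the CLI's value
    base_delay_seconds: float = 5.0,    # assumed default
    max_delay_seconds: float = 60.0,    # assumed cap
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying with capped exponential back-off on transient codes."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):  # same shape as `attempt < maxAttempts`
        try:
            return operation()
        except GraphError as err:
            is_last_attempt = attempt == max_attempts - 1
            # Non-transient errors and the final failure are surfaced to the caller.
            if err.code not in TRANSIENT_CODES or is_last_attempt:
                raise
            # Delay doubles each attempt and never exceeds the cap.
            sleep(min(base_delay_seconds * 2 ** attempt, max_delay_seconds))
    raise AssertionError("unreachable: the loop always returns or raises")
```

Passing `sleep` as a parameter lets a unit test record the delays instead of waiting for them.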

Additional fixes:

  • --authmode obo with --aiteammate warns instead of hard-erroring
  • Messaging endpoint summary shows not-configured vs failed correctly
  • Explicit null guard on AgentBlueprintClientSecret before UnprotectSecret
  • Stale error message referencing removed permissions corrected
  • Retry loop convention aligned (maxAttempts / < throughout)
  • ConfigService omits null values from ExtractDynamicProperties to prevent null-overwrite cycle on re-run (issue 408 fix)
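The last fix in the list can be illustrated with a small Python sketch (not the CLI's C# code). It assumes one plausible reading of the null-overwrite cycle: a re-run produces a null for a property that was persisted earlier, and writing that null back erases the stored value. The function names and the dictionary-based state are hypothetical.

```python
from typing import Any, Dict, Optional


def extract_dynamic_properties(state: Dict[str, Optional[Any]]) -> Dict[str, Any]:
    """Return only the properties that actually have a value."""
    return {key: value for key, value in state.items() if value is not None}


def save_state(persisted: Dict[str, Any], current: Dict[str, Optional[Any]]) -> Dict[str, Any]:
    """Merge the current run's properties over what is already on disk.

    Because nulls are filtered out first, a property that is unset in this
    run cannot overwrite a value saved by an earlier run.
    """
    merged = dict(persisted)
    merged.update(extract_dynamic_properties(current))
    return merged


# Hypothetical property names, used for illustration only.
on_disk = {"agentBlueprintClientSecret": "protected-value", "agentBlueprintId": "bp-1"}
this_run = {"agentBlueprintClientSecret": None, "agentBlueprintId": "bp-1"}
result = save_state(on_disk, this_run)
```

In this sketch `result` still carries the previously saved secret, which is the behavior the regression tests in this PR are described as pinning.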

Validated end-to-end across base, --aiteammate, --m365, and --authmode both paths as Agent ID Developer role with no Application.ReadWrite.All, DelegatedPermissionGrant.ReadWrite.All, AgentIdentity.ReadWrite.All, or AgentIdentity.Create.All on the custom app.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 8, 2026 01:19
@sellakumaran sellakumaran requested review from a team as code owners May 8, 2026 01:19
@github-actions github-actions Bot added the documentation Improvements or additions to documentation label May 8, 2026

github-actions Bot commented May 8, 2026

⚠️ Deprecation Warning: The deny-licenses option is deprecated for possible removal in the next major release. For more information, see issue 997.

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

Contributor

Copilot AI left a comment

Pull request overview

Updates a365 setup flows to reduce required permissions on the custom CLI app and improve reliability on fresh blueprint setups (replication-lag retries, clearer messaging), while aligning generated config behavior to avoid null-overwrite cycles.

Changes:

  • Switch agent identity creation to use Blueprint app-only credentials and remove DelegatedPermissionGrant.ReadWrite.All from required permission lists.
  • Add exponential backoff retries for transient Entra propagation errors and downgrade propagation-lag logs to Debug.
  • Improve setup validation/UX: --authmode obo --aiteammate warns (continues), messaging endpoint summary distinguishes “not configured”, and generated config omits null dynamic properties.

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| src/Tests/Microsoft.Agents.A365.DevTools.Cli.Tests/Services/Agent365ConfigServiceTests.cs | Adds regression tests ensuring null dynamic properties are omitted and non-null secrets persist. |
| src/Tests/Microsoft.Agents.A365.DevTools.Cli.Tests/Commands/SetupCommandTests.cs | Updates tests for --authmode + --aiteammate behavior (warning vs error) and validates incompatible modes still fail. |
| src/Microsoft.Agents.A365.DevTools.Cli/Services/GraphApiService.cs | Adds retry/backoff for blueprint token acquisition and agent identity creation; adjusts propagation-lag logging levels. |
| src/Microsoft.Agents.A365.DevTools.Cli/Services/ConfigService.cs | Filters out null values when extracting dynamic properties to avoid null-overwrite on reruns. |
| src/Microsoft.Agents.A365.DevTools.Cli/Constants/AuthenticationConstants.cs | Removes DelegatedPermissionGrant.ReadWrite.All from required permissions/scopes. |
| src/Microsoft.Agents.A365.DevTools.Cli/Commands/SetupSubcommands/SetupHelpers.cs | Improves summary output/action-required messaging for “messaging endpoint not configured”. |
| src/Microsoft.Agents.A365.DevTools.Cli/Commands/SetupSubcommands/NonDwBlueprintSetupOrchestrator.cs | Uses blueprint client secret for agent identity creation; removes agent identity SP “ensure” step. |
| src/Microsoft.Agents.A365.DevTools.Cli/Commands/SetupSubcommands/AllSubcommand.cs | Changes --authmode obo --aiteammate from hard error to warning; keeps other modes incompatible. |
| CHANGELOG.md | Documents permission reductions, retry behavior, and setup UX fixes. |
| .gitignore | Ignores docs/min-permissions/. |

Comment thread src/Microsoft.Agents.A365.DevTools.Cli/Services/GraphApiService.cs Outdated
Comment thread src/Microsoft.Agents.A365.DevTools.Cli/Services/GraphApiService.cs Outdated
Comment thread src/Microsoft.Agents.A365.DevTools.Cli/Services/GraphApiService.cs Outdated
- Reduce required delegated scopes for a365 CLI client app:
  - Use AgentIdentityBlueprint.ReadWrite.All as umbrella for blueprint ops
  - Require AgentIdentityBlueprintPrincipal.Create for SP creation
  - Replace Directory.Read.All with Application.Read.All
  - Remove User.ReadWrite.All, broad blueprint sub-scopes, and AppRoleAssignment.ReadWrite.All
- Update all code, logging, and user guidance to reference new scopes
- Role checks now decode wids claim from MSAL token (no Graph call)
- Improve token acquisition retry logic for blueprint creation
- Update tests and documentation to match new permission model
- Endpoint registration guidance now points to Teams Developer Portal
- Reduces privilege footprint; 7-permission set validated across admin and developer roles
gwharris7 previously approved these changes May 8, 2026
- Fix off-by-one in retry log {Max} argument: pass maxRetries/maxAttempts
  instead of maxRetries-1/maxAttempts-1 in three retry loops
- Assert exit code is 0 (not just non-1) in WarnsAndContinues test
- Replace brittle JSON string assertions with JsonNode parsing in
  SaveStateAsync_NonNullStringProperty_IsWrittenToJson
- Remove misleading 'id == appId' comment in NonDwBlueprintSetupOrchestrator

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 10, 2026 15:04
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 18 out of 19 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (1)

src/Microsoft.Agents.A365.DevTools.Cli/Services/GraphApiService.cs:1526

  • GetBlueprintAccessTokenAsync increased maxRetries to 12 with exponential backoff capped at 60s. With baseDelaySeconds=5 this can sleep for ~8+ minutes total (5+10+20+40+60*7), which is a large behavioral/operational change and doesn’t match the PR description’s “~60s total”. Consider reducing attempts (or lowering baseDelaySeconds) and rename maxRetries to maxAttempts for clarity since the loop condition is attempt < maxRetries (total attempts).
            const int maxRetries = 12;
            const int baseDelaySeconds = 5;

            for (int attempt = 0; attempt < maxRetries; attempt++)
            {
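The worst-case figure in the comment above can be checked with a short calculation. This is a Python sketch, not the CLI's code, and it assumes the loop sleeps after every failed attempt except the last one, which matches the eleven terms in the reviewer's sum.

```python
def backoff_delays(max_attempts: int, base_delay_seconds: int, cap_seconds: int) -> list:
    """Delays slept between attempts, assuming no sleep after the final attempt."""
    return [
        min(base_delay_seconds * 2 ** attempt, cap_seconds)
        for attempt in range(max_attempts - 1)
    ]


# Values quoted in the review comment: 12 attempts, 5 s base delay, 60 s cap.
delays = backoff_delays(12, 5, 60)
total_seconds = sum(delays)        # 5 + 10 + 20 + 40 + 60 * 7 = 495
total_minutes = total_seconds / 60  # 8.25
```

Under that assumption the loop can sleep for 495 seconds, about 8 minutes, before giving up, which is consistent with the "~8+ minutes" estimate in the comment.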

Comment thread src/Microsoft.Agents.A365.DevTools.Cli/Services/AgentBlueprintService.cs Outdated
- Fix XML comment in InteractiveGraphAuthServiceTests: AgentIdentityBlueprintPrincipal.Create
  is a separate required scope, not covered by the ReadWrite.All umbrella
- Improve Contains() guard comment in AgentBlueprintService: explicitly states agent user
  cleanup is disabled (intentional) until create-instance is re-enabled
- Document RequiredPermissionGrantScopes = [] intent: empty routes to standard AuthenticationService
  token path which already carries all required scopes via RequiredClientAppPermissions (PR #409)
- Document RequiredS2SGrantScopes = [] intent: AppRoleAssignment.ReadWrite.All removed; admins
  have bypass, developers fall back to PowerShell instructions (PR #409)
- Add detection rules E/F/G to pr-code-reviewer.md to catch these patterns in future reviews

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@sellakumaran sellakumaran enabled auto-merge (squash) May 10, 2026 18:53
The "Run tests" step on Ubuntu has hung twice on this branch (4h 50m and 1h+
respectively) with no log surfaced because GitHub publishes job logs only on
completion. Add two narrowly-scoped diagnostic guardrails so the next hang
fails fast and tells us which test is stuck:

- job-level `timeout-minutes: 20` — bounds the run to ~2x the Windows-local
  suite time instead of GitHub's 6-hour default.
- `--blame-hang --blame-hang-timeout 5min` — produces a Sequence_*.xml hang
  report naming the stuck test method (and the test before it) when any
  single test exceeds 5 minutes.

Also demote the MsalBrowserCredential "Failed to register persistent token
cache" warning to Debug. The same exception was already logged at Debug on
the line above; the warning text ("auth prompts may be repeated") was not
actionable by the user (common cause on headless Linux is no D-Bus/Keychain)
and produced noise in CI test output.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 10, 2026 19:59
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 21 out of 22 changed files in this pull request and generated 6 comments.

Comment thread src/Microsoft.Agents.A365.DevTools.Cli/Services/GraphApiService.cs Outdated
Comment thread src/Microsoft.Agents.A365.DevTools.Cli/Commands/SetupSubcommands/SetupHelpers.cs Outdated
Comment thread CHANGELOG.md Outdated
Comment thread src/Microsoft.Agents.A365.DevTools.Cli/Commands/SetupSubcommands/SetupHelpers.cs Outdated
sellakumaran and others added 2 commits May 11, 2026 06:22
Six Copilot AI comments addressed:
- GraphApiService: rewrite XML doc for CheckDirectoryRoleAsync and its two
  wrappers to describe the wids-claim implementation (no Graph call, no
  scope dependency) and document the group-assignment / PIM-eligible
  limitations. The previous doc still described the old transitiveMemberOf
  query path.
- SetupHelpers: replace the misleading "uses AgentIdentityBlueprint.ReadWrite.All
  as umbrella" comment with an explicit note that permissionGrantScopes is
  intentionally empty and that empty arrays fall through to the standard
  token path.
- AuthenticationConstants: delete unused AgentIdentityBlueprintDeleteRestoreAllScope
  and AgentIdentityBlueprintAddRemoveCredsAllScope constants — they
  contradicted the code that uses the ReadWrite.All umbrella for those
  operations, and grep confirmed no callers in src/.
- CHANGELOG: correct the retry-timing claim from "~60s total" to several
  minutes worst case (12 attempts × 60s cap ≈ 8 min for the blueprint
  token retry).
- GraphApiServiceTests: rename IsCurrentUserAdminAsync_GraphFails_ReturnsUnknown
  and IsCurrentUserAgentIdAdminAsync_GraphReturnsNull_ReturnsUnknown to
  *_TokenAcquisitionFails_ReturnsUnknown so the names match the now-token-based
  failure mode.
- MessagingEndpointFailureReasons: extract the four string literals
  ("NotOwner", "BlueprintMissing", "NotConfigured", "Other") into a shared
  constant class in Constants/, replacing 11 string-literal usages across
  AllSubcommand, SetupHelpers, TeamsGraphBackendConfigurator, and
  AllSubcommandTests.

CI fix:
- MockToolingServerSubcommandTests: remove HandleStartServer_WithValidPort_LogsStartingMessage
  and HandleStartServer_WithNullPort_UsesDefaultPort. Both started a real
  Kestrel server via Server.Start() on a fire-and-forget LongRunning task
  that the test never tore down. On Linux CI this caused two failures:
  (a) the Theory port 1 case requires root and never binds, and (b) parallel
  tests collided on the leaked port 5309 binding. --blame-hang-timeout caught
  the deadlock on the previous run. Remaining tests still cover handler logic
  (dry-run, background, invalid port, verbose) without binding any port; a
  comment documents the decision to keep the regression from coming back.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous CI run hung in
PermissionsSubcommandTests.ConfigureMcpPermissionsAsync_V1AndMetadataScopes_AreKnownAndProceed
for 5+ minutes until --blame-hang-timeout aborted the run.

Root cause: BatchPermissionsOrchestrator's pre-warm call,
  graph.GraphGetAsync(tenantId, "/v1.0/me?$select=id", ct, scopes: prewarmScopes)
now receives scopes: [] because RequiredPermissionGrantScopes was emptied
earlier on this branch. Empty scopes route EnsureGraphHeadersAsync to the
standard token path (GetGraphAccessTokenAsync), and on a partial mock that
falls through to the real MSAL AuthenticationService. On Linux CI with no
cached credentials, that blocks waiting for browser/device-code auth.
Windows masked it with cached tokens (2s test runtime).

Fix: pre-stub three virtual GraphApiService methods (GraphGetAsync,
IsCurrentUserAdminAsync, IsCurrentUserAgentIdAdminAsync) in the test class
constructor so the orchestrator gets a null pre-warm response and
short-circuits out of Phase 1/2/3 deterministically. Inline comment
documents why so a future reader hitting the same pattern in another
test class has the reasoning.

Targeted test now runs in 178 ms (was 2 s on Windows, 5+ min hang on
Linux). Full suite drops from 12.58 s to 5.18 s for 1392 passing tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 11, 2026 13:38
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 26 out of 27 changed files in this pull request and generated 2 comments.

Comment thread src/Microsoft.Agents.A365.DevTools.Cli/Services/GraphApiService.cs Outdated
Two workstreams combined in one commit per request.

A. Copilot review-comment fixes on PR #409:
- GraphApiService.CheckDirectoryRoleAsync: previously acquired the role-check
  token via AuthenticationService → PowerShell well-known clientId, which does
  NOT have the wids optional claim configured. The method always returned
  Unknown, causing BatchPermissionsOrchestrator to treat real Global Admins as
  non-admins (admin URL printed even when the signed-in user could grant
  inline). Now routes through _tokenProvider with CustomClientAppId and
  User.Read, so the JWT comes from the app that actually carries wids.
- GraphApiService.EnsureGraphHeadersAsync: empty IEnumerable<string> previously
  fell through to the same PowerShell-clientId path. Routing changed to use
  _tokenProvider whenever (hasScopes || hasCustomApp). Bootstrap escape hatch
  preserved: no scopes AND no CustomClientAppId still uses legacy
  AuthenticationService so the initial app lookup doesn't hang on a null
  clientId.
- GraphApiServiceTests: helper mocks now return the wids JWT via the token
  provider (matching the new production path). The 8 existing tests that call
  these production methods still pass.
- pr-code-reviewer.md: added Rule H — "JWT claim decoded → verify the token
  was issued by the app registration that has the claim configured." Cites
  PR #409 as the concrete example so reviewers ground future analysis.
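The wids-based role check in workstream A can be sketched as follows. This is Python for illustration, not the CLI's C# implementation: the function names and the `"HasRole"` / `"DoesNotHaveRole"` result strings are hypothetical, and the sketch decodes the token payload without validating the signature. The Global Administrator role template ID is the well-known Microsoft Entra value.

```python
import base64
import json

# Well-known role template ID for Global Administrator in Microsoft Entra ID.
GLOBAL_ADMIN_TEMPLATE_ID = "62e90394-69f5-4237-9190-012177145e10"


def decode_jwt_payload(token: str) -> dict:
    """Decode the middle (payload) segment of a JWT. No signature validation."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore the stripped base64url padding
    claims = json.loads(base64.urlsafe_b64decode(payload))
    if not isinstance(claims, dict):
        raise ValueError("payload is not a JSON object")
    return claims


def check_directory_role(token: str, role_template_id: str) -> str:
    """Return "HasRole", "DoesNotHaveRole", or "Unknown" from the wids claim."""
    try:
        claims = decode_jwt_payload(token)
    except ValueError:  # also covers base64 and JSON decoding errors
        return "Unknown"
    wids = claims.get("wids")
    if not isinstance(wids, list):
        # A token issued to an app without the wids optional claim lands here,
        # which is the failure mode Rule H is meant to catch.
        return "Unknown"
    return "HasRole" if role_template_id in wids else "DoesNotHaveRole"
```

The `"Unknown"` branch shows why the issuing app matters: if the token comes from a client that does not emit wids, every user looks the same regardless of their actual role.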

B. Remove a365 deploy references from CLI code:
- PermissionsSubcommand.cs: help text and runtime "Next step" log no longer
  reference the long-removed 'a365 deploy'. Both now point at 'a365 publish',
  the actual next a365 command in the workflow.
- PermissionsSubcommandTests.cs: assertion updated to pin "a365 publish".
- NodeBuildFailedException.cs and NodeDependencyInstallException.cs deleted:
  dead code since a365 deploy was removed (no throw sites, no test refs).
- ErrorCodes.cs: removed NodeBuildFailed and NodeDependencyInstallFailed
  (only callers were the deleted exception classes).
- design.md: removed DeployCommand.cs row, removed the five deploy-era service
  rows from the Services folder tree, replaced the entire "Multiplatform
  Deployment Architecture" section (IPlatformBuilder interface + Deployment
  Pipeline mermaid + Restart Mode) with a tight "Multiplatform Project
  Detection" section that accurately describes what PlatformDetector does
  today (used by publish, not deploy). Fixed Program.cs sketch.
- CHANGELOG.md: one bullet under [Unreleased] Fixed documenting the
  user-visible help/log change.

Validation:
- Unit suite: 1392/1392 pass, 7.2s total, no slow tests.
- End-to-end Run 2-retest2 Minimum (warm cache, 8s): all role-check tokens
  from clientId 716ae110- (test custom app).
- End-to-end Run 2-retest2 Medium (cleared cache, 1m 50s): bootstrap escape
  hatch correctly used legacy AuthenticationService, no Connect-MgGraph
  fallback; steady-state used custom app.

Doc-side a365 deploy references in docs/ai-workflows/,
docs/agent365-guided-setup/, CLAUDE.md, DEVELOPER.md, and two folder READMEs
deferred to a follow-on PR (per user's plan scope).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@sellakumaran sellakumaran merged commit 3177ae6 into main May 11, 2026
9 checks passed
@sellakumaran sellakumaran deleted the users/sellak/min-permissions branch May 11, 2026 15:31
4 participants