Capture MSAL exception details#2587
Conversation
There was a problem hiding this comment.
Pull request overview
This PR enhances error handling/telemetry in Microsoft.Mcp.Core by capturing additional (intended PII-safe) details for MSAL-related exceptions and mapping MsalServiceException to an HTTP status code, aiming to improve diagnostics when authentication flows fail.
Changes:
- Add
MsalServiceExceptionHTTP status code mapping inGlobalCommand.GetStatusCode. - Update
BaseCommand.HandleExceptionto emit structured exception telemetry (status + error code + IDs) forRequestFailedException,MsalServiceException, andMsalClientException. - Extend
BaseCommand.GetStatusCodeto mapRequestFailedExceptionandMsalServiceException.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| core/Microsoft.Mcp.Core/src/Commands/GlobalCommand.cs | Maps MsalServiceException to an HTTP status code. |
| core/Microsoft.Mcp.Core/src/Commands/BaseCommand.cs | Emits structured exception details into telemetry tags and expands status-code mapping. |
jongio
left a comment
There was a problem hiding this comment.
Clean refactoring - the shared exceptionDetails object eliminates duplication between branches and makes it straightforward to add new exception types.
The bot's point about CorrelationId vs RequestId naming is the key item to address before merge. Mixing MSAL correlation IDs with Azure service request IDs under the same "RequestId" key will make telemetry queries confusing - they're semantically different identifiers from different systems. Consider "CorrelationId" for MSAL and keep "RequestId" for ClientRequestId.
One nice side effect: commands extending BaseCommand directly (not via GlobalCommand) now get correct HTTP status codes for RequestFailedException instead of a blanket 500.
Nit: the comment on line 95 ("this is the only PII safe property") should be reworded since ErrorCode and request/correlation IDs are also added below. Something like "Start with the status code, then add PII-safe diagnostic properties per exception type" would be clearer.
jongio
left a comment
There was a problem hiding this comment.
Addresses my previous feedback. The CorrelationId rename is correct and eliminates the semantic confusion with Azure SDK's ClientRequestId.
Nit: Line 114 comment says "correlation code" - should be "correlation ID".
What does this PR do?
Capture details from
MsalServiceExceptionandMsalServiceExceptionin telemetry when they're thrown to help improve details for where the MCP server can improve.GitHub issue number?
[Link to the GitHub issue this PR addresses]Pre-merge Checklist
servers/Azure.Mcp.Server/README.mdand/orservers/Fabric.Mcp.Server/README.mddocumentationREADME.mdchanges running the script./eng/scripts/Process-PackageReadMe.ps1. See Package READMEToolDescriptionEvaluatorand obtained a score of0.4or more and a top 3 ranking for all related test promptsconsolidated-tools.jsonbreaking-changelabelservers/Azure.Mcp.Server/docs/azmcp-commands.md./eng/scripts/Update-AzCommandsMetadata.ps1to update tool metadata inazmcp-commands.md(required for CI)servers/Azure.Mcp.Server/docs/e2eTestPrompts.mdcrypto mining, spam, data exfiltration, etc.)/azp run mcp - pullrequest - liveto run Live Test Pipeline