
Add friendly error messages to health check failures #14072

Open
Copilot wants to merge 9 commits into main from copilot/display-friendly-error-messages

Conversation

Contributor

Copilot AI commented Jan 23, 2026

Description

Health checks were displaying raw exception stack traces instead of actionable error messages. Dashboard users saw multi-line exception details like System.Threading.Tasks.TaskCanceledException: The operation was canceled. ---> System.IO.IOException: Unable to read data from the transport connection... with no clear indication of what failed.

The root cause was that the third-party UriHealthCheck from AspNetCore.HealthChecks.Uris returns unhealthy results with exception information in the Description field rather than the Exception property. The initial implementation only wrapped results when an exception object was present, allowing most unhealthy results to pass through with raw stack traces.
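The fix can be illustrated with a small wrapper that rewrites every unhealthy result, whether or not it carries an exception object. This is a minimal sketch, not the PR's StaticUriHealthCheck: a hypothetical tuple stands in for HealthCheckResult so the example runs without the health checks packages, and the local function Wrap is made up for illustration.

```csharp
using System;

// Hypothetical stand-in for HealthCheckResult: (IsHealthy, Description, Exception).
// As described above, an unhealthy inner result may carry raw exception text in
// Description while Exception is null.
static (bool IsHealthy, string? Description, Exception? Exception) Wrap(
    Uri uri,
    (bool IsHealthy, string? Description, Exception? Exception) inner)
{
    // Healthy results pass through unchanged.
    if (inner.IsHealthy)
    {
        return inner;
    }

    // Every unhealthy result gets a friendly description, not only those with an
    // exception object; any exception is kept for detailed debugging.
    return (false, $"Health check failed for {uri}.", inner.Exception);
}

var target = new Uri("http://192.0.2.1/");

// Raw text in Description and no exception object: the case the first version missed.
var rawOnly = Wrap(target, (false, "System.Threading.Tasks.TaskCanceledException: The operation was canceled.", null));
var withException = Wrap(target, (false, "Unhealthy", new TimeoutException()));
var healthy = Wrap(target, (true, null, null));

Console.WriteLine(rawOnly.Description);                       // Health check failed for http://192.0.2.1/.
Console.WriteLine(withException.Exception?.GetType().Name);    // TimeoutException
Console.WriteLine(healthy.Description ?? "(no description)");  // (no description)
```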

Changes

HTTP endpoint health checks (ExternalServiceBuilderExtensions.cs):

  • Created StaticUriHealthCheck and enhanced ParameterUriHealthCheck to wrap UriHealthCheck from AspNetCore.HealthChecks.Uris
  • Fixed: Now wraps ALL unhealthy results from UriHealthCheck, not just those with exception objects
  • Provides friendly error messages for common failure scenarios:
    • Timeouts: "Request to {url} timed out."
    • Explicit cancellation: "Health check for {url} was canceled."
    • HTTP errors with status: "Request to {url} returned {code} {status}."
    • Connection failures: "Failed to connect to {url}."
    • Generic failures: "Health check failed for {url}."
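A minimal sketch of how an exception could be mapped to the messages listed above. The helper name Describe is hypothetical (this is not the PR's HttpHealthCheckHelpers), and it assumes the caller's CancellationToken is passed in so that a timeout and an explicit cancellation can be told apart, since both usually surface as TaskCanceledException, which derives from OperationCanceledException.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: maps a health check exception to one of the messages above.
// The caller's token separates an explicit cancellation from a timeout, because
// both usually surface as TaskCanceledException (an OperationCanceledException).
static string Describe(Uri uri, Exception exception, CancellationToken cancellationToken) =>
    exception switch
    {
        // Checked first so a caller cancellation is not reported as a timeout.
        OperationCanceledException when cancellationToken.IsCancellationRequested
            => $"Health check for {uri} was canceled.",
        OperationCanceledException => $"Request to {uri} timed out.",
        HttpRequestException { StatusCode: HttpStatusCode status }
            => $"Request to {uri} returned {(int)status} {status}.",
        HttpRequestException => $"Failed to connect to {uri}.",
        _ => $"Health check failed for {uri}."
    };

var target = new Uri("http://example.com/api/");
var canceledToken = new CancellationToken(canceled: true);

string timeoutMessage = Describe(target, new TaskCanceledException(), CancellationToken.None);
string canceledMessage = Describe(target, new TaskCanceledException(), canceledToken);
string statusMessage = Describe(target, new HttpRequestException("Bad gateway", null, HttpStatusCode.BadGateway), CancellationToken.None);
string connectMessage = Describe(target, new HttpRequestException("Connection refused"), CancellationToken.None);
string genericMessage = Describe(target, new InvalidOperationException(), CancellationToken.None);

Console.WriteLine(timeoutMessage);  // Request to http://example.com/api/ timed out.
Console.WriteLine(canceledMessage); // Health check for http://example.com/api/ was canceled.
Console.WriteLine(statusMessage);   // Request to http://example.com/api/ returned 502 BadGateway.
Console.WriteLine(connectMessage);  // Failed to connect to http://example.com/api/.
Console.WriteLine(genericMessage);  // Health check failed for http://example.com/api/.
```

The order of the arms matters: if the unconditional OperationCanceledException arm came first, a caller cancellation would also be reported as a timeout.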

Component health checks - Added descriptive error messages for 6 components:

  • Seq: "Request to {url} returned {statusCode}" with proper URI construction and explicit cancellation handling
  • NATS: "Failed to connect to NATS server" / "Connecting to NATS server..." (only for unhealthy/degraded states)
  • Milvus: "Failed to connect to Milvus server"
  • Qdrant: "Failed to connect to Qdrant server"
  • Azure AI Search: "Failed to connect to Azure AI Search service"
  • Azure Web PubSub: "Failed to connect to Azure Web PubSub service"
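The NATS change keeps healthy results description-free and only describes degraded and unhealthy states. A minimal sketch of that mapping, using hypothetical string state names and a (Status, Description) tuple as stand-ins for the NATS client's connection state type and HealthCheckResult:

```csharp
using System;

// Hypothetical connection-state names; the real check reads the NATS client's own state type.
static (string Status, string? Description) MapState(string state) => state switch
{
    // No description on Healthy, so existing healthy output is unchanged.
    "Open" => ("Healthy", null),
    // Transitional states are reported as degraded.
    "Connecting" or "Reconnecting" => ("Degraded", "Connecting to NATS server..."),
    // Anything else (for example, a closed connection) is unhealthy.
    _ => ("Unhealthy", "Failed to connect to NATS server")
};

foreach (var state in new[] { "Open", "Reconnecting", "Closed" })
{
    var (status, description) = MapState(state);
    Console.WriteLine($"{state}: {status} {description ?? "(no description)"}");
}
```

Leaving Healthy without a description keeps the output for healthy resources identical to before, which is the backward-compatibility concern raised in review.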

Additional improvements from PR review feedback:

  • Fixed ParameterUriHealthCheck to create fresh UriHealthCheckOptions per invocation to avoid duplicate URI accumulation
  • Simplified XML documentation for internal classes to follow Aspire conventions
  • Distinguished between timeout and explicit user cancellation in error messages
  • Removed descriptions from healthy states to maintain backward compatibility

Exception objects remain attached to HealthCheckResult for detailed debugging when needed.

Example

Before:

System.Threading.Tasks.TaskCanceledException: The operation was canceled.
---> System.IO.IOException: Unable to read data from the transport connection...

After:

Health check failed for http://192.0.2.1/.

or

Request to http://example.com/api/ timed out.

Checklist

  • Is this feature complete?
    • Yes. Ready to ship.
    • No. Follow-up changes expected.
  • Are you including unit tests for the changes and scenario tests if relevant?
    • Yes
    • No
  • Did you add public API?
    • Yes
    • No
  • Does the change make any security assumptions or guarantees?
    • Yes
    • No
  • Does the change require an update in our Aspire docs?
    • Yes
    • No
Original prompt

This section details the original issue you should resolve

<issue_title>Display more friendly error message for common health checks errors</issue_title>
<issue_description>### Is there an existing issue for this?

  • I have searched the existing issues

Describe the bug

This is what is displayed for a health check that validates whether an endpoint returns 200:


No message is returned, so the health check defaults to displaying the exception message. This doesn't look user-friendly.

The health check should have a failure message like "Request to xxx timed out" if there is a timeout, or "Request to xxx returned yyy status code".

We should do the same for other built-in checks: polling an endpoint for 200, querying a database for a good response, checking a message queue, etc. The failure message should explain what has gone wrong in a friendly way. More detail from the exception message is still available in the exception details if the user wants to view it.

Expected Behavior

No response

Steps To Reproduce

No response

Exceptions (if any)

No response

.NET Version info

No response

Anything else?

No response</issue_description>

Comments on the Issue (you are @copilot in this section)

@davidfowl Maybe we should do this [here](https://github.com/Xabaril/AspNetCore.Diagnostics.HealthChecks). That way we don't need to do anything special for these health checks.


Copilot AI and others added 3 commits January 23, 2026 00:40
Co-authored-by: maddymontaquila <12660687+maddymontaquila@users.noreply.github.com>
Co-authored-by: maddymontaquila <12660687+maddymontaquila@users.noreply.github.com>
Co-authored-by: maddymontaquila <12660687+maddymontaquila@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add friendly error messages for health check failures" to "Add friendly error messages to health check failures" Jan 23, 2026
Copilot AI requested a review from maddymontaquila January 23, 2026 00:53
Remove code duplication between StaticUriHealthCheck and ParameterUriHealthCheck
by extracting the common GetFriendlyErrorMessage method into a shared internal
HttpHealthCheckHelpers static class.

Also remove unused _expectedStatusCode field from StaticUriHealthCheck.
@github-actions
Contributor

github-actions bot commented Feb 5, 2026

🚀 Dogfood this PR with:

⚠️ WARNING: Do not do this without first carefully reviewing the code of this PR to satisfy yourself it is safe.

curl -fsSL https://raw.githubusercontent.com/dotnet/aspire/main/eng/scripts/get-aspire-cli-pr.sh | bash -s -- 14072

Or

  • Run remotely in PowerShell:
iex "& { $(irm https://raw.githubusercontent.com/dotnet/aspire/main/eng/scripts/get-aspire-cli-pr.ps1) } 14072"

maddymontaquila marked this pull request as ready for review February 5, 2026 19:03
Copilot AI review requested due to automatic review settings February 5, 2026 19:03
Contributor

Copilot AI left a comment


Pull request overview

This PR adds friendly, user-readable error messages to health check failures across Aspire components. Previously, health checks displayed raw exception stack traces which were not actionable for users. The changes provide clear, descriptive messages like "Request to {url} timed out." instead of multi-line stack traces.

Changes:

  • Implemented wrapper health checks (StaticUriHealthCheck and enhanced ParameterUriHealthCheck) for HTTP endpoint health checks with friendly error messages
  • Added descriptive error messages to 6 component health checks (Seq, NATS, Milvus, Qdrant, Azure AI Search, Azure Web PubSub)
  • Created comprehensive test coverage for the new HTTP health check error messages

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 11 comments.

| File | Description |
| --- | --- |
| tests/Aspire.Hosting.Tests/Health/FriendlyHealthCheckErrorMessagesTests.cs | New test file validating friendly error messages for static URI, parameter URI, and invalid URL scenarios |
| src/Components/Aspire.Seq/SeqHealthCheck.cs | Enhanced with friendly error messages for timeouts, connection failures, and HTTP status codes |
| src/Components/Aspire.Qdrant.Client/QdrantHealthCheck.cs | Added descriptive error message for connection failures |
| src/Components/Aspire.NATS.Net/NatsHealthCheck.cs | Added messages for all connection states including healthy, degraded, and failed states |
| src/Components/Aspire.Milvus.Client/MilvusHealthCheck.cs | Added descriptive error messages for connection failures and unhealthy server states |
| src/Components/Aspire.Azure.Search.Documents/AzureSearchIndexHealthCheck.cs | Added connection failure message |
| src/Components/Aspire.Azure.Messaging.WebPubSub/AspireWebPubSubExtensions.cs | Added connection failure message to inner health check class |
| src/Aspire.Hosting/ExternalServiceBuilderExtensions.cs | Implemented StaticUriHealthCheck, enhanced ParameterUriHealthCheck, and added HttpHealthCheckHelpers for friendly error message generation |
Comments suppressed due to low confidence (1)

src/Aspire.Hosting/ExternalServiceBuilderExtensions.cs:320

  • The XML documentation for this internal sealed class is more verbose than recommended for internal APIs. According to Aspire's documentation standards, internal classes should have brief summary tags only. Consider simplifying to just describe what the class does in one sentence.
/// <summary>
/// A health check that resolves URL from a parameter asynchronously and delegates to UriHealthCheck.
/// </summary>
internal sealed class ParameterUriHealthCheck : IHealthCheck

catch (TaskCanceledException tce) when (!cancellationToken.IsCancellationRequested)
{
return HealthCheckResult.Unhealthy($"Request to {_client.BaseAddress}health timed out.", tce);
}

Copilot AI Feb 5, 2026


The condition when (!cancellationToken.IsCancellationRequested) catches TaskCanceledException only when cancellation has NOT been requested, which is correct for timeouts. However, if the user explicitly cancels the health check (via the cancellation token), the TaskCanceledException will instead be caught by the generic Exception handler at line 38. Consider whether an explicit user cancellation should result in a different message than a general health check failure.

Suggested change
}
}
catch (TaskCanceledException tce) when (cancellationToken.IsCancellationRequested)
{
return HealthCheckResult.Unhealthy($"Health check for {_client.BaseAddress}health was canceled.", tce);
}
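The distinction this comment asks for can be sketched with exception filters. This is a minimal sketch under stated assumptions: ProbeAsync and SlowOperation are hypothetical, and Task.Delay stands in for the HTTP call. Awaiting a canceled task throws TaskCanceledException whichever token caused it, and HttpClient likewise reports its own timeout as TaskCanceledException, so the exception type alone cannot separate the two cases; checking the caller's token can.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical probe: runs an operation under its own timeout, linked to the
// caller's token, and reports which of the two ended it.
static async Task<string> ProbeAsync(
    Func<CancellationToken, Task> operation,
    TimeSpan timeout,
    CancellationToken cancellationToken)
{
    using var timeoutCts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
    timeoutCts.CancelAfter(timeout);

    try
    {
        await operation(timeoutCts.Token);
        return "Healthy";
    }
    catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)
    {
        // The caller canceled: report cancellation rather than a timeout.
        return "Health check was canceled.";
    }
    catch (OperationCanceledException)
    {
        // The caller's token was not canceled, so the local timeout fired.
        return "Request timed out.";
    }
}

// Simulated unresponsive endpoint: never succeeds, ends only when its token is canceled.
static Task SlowOperation(CancellationToken token) => Task.Delay(Timeout.Infinite, token);

string timedOut = await ProbeAsync(SlowOperation, TimeSpan.FromMilliseconds(50), CancellationToken.None);

using var callerCts = new CancellationTokenSource();
callerCts.Cancel();
string canceled = await ProbeAsync(SlowOperation, TimeSpan.FromSeconds(30), callerCts.Token);

string healthy = await ProbeAsync(_ => Task.CompletedTask, TimeSpan.FromSeconds(1), CancellationToken.None);

Console.WriteLine(timedOut); // Request timed out.
Console.WriteLine(canceled); // Health check was canceled.
Console.WriteLine(healthy);  // Healthy
```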

{
return exception switch
{
TaskCanceledException or OperationCanceledException => $"Request to {uri} timed out.",

Copilot AI Feb 5, 2026


The pattern matching in GetFriendlyErrorMessage treats both TaskCanceledException and OperationCanceledException as timeouts. However, this doesn't distinguish between actual timeouts and explicit user cancellations. When a health check is explicitly canceled by the user (e.g., during application shutdown), the message "Request to {uri} timed out." may be misleading. Consider checking whether cancellation was requested before determining the message.

Suggested change
- TaskCanceledException or OperationCanceledException => $"Request to {uri} timed out.",
+ TaskCanceledException => $"Request to {uri} timed out.",
+ OperationCanceledException => $"Request to {uri} was canceled.",

}

targetUri = uri;


Copilot AI Feb 5, 2026


The UriHealthCheckOptions instance is shared across multiple health check invocations. Each time CheckHealthAsync is called, AddUri is invoked again on line 368, potentially adding duplicate URI configurations to the same options object. This could lead to unexpected behavior if the health check is invoked multiple times. Consider creating a new UriHealthCheckOptions instance for each health check invocation, or ensure that the options are cleared before reuse.

Suggested change
_options.Uris.Clear();
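The accumulation described in this comment, and the per-invocation fix that was eventually adopted, can be sketched as follows. This is a simplified illustration, not the PR code: a plain List&lt;Uri&gt; stands in for UriHealthCheckOptions, and both local functions are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for UriHealthCheckOptions: only the list of URIs to probe.
var sharedUris = new List<Uri>();

// Buggy pattern: every invocation appends to the same shared list.
int CheckWithSharedOptions(Uri resolved)
{
    sharedUris.Add(resolved);
    return sharedUris.Count; // URIs the inner check would now probe
}

// Fixed pattern: build fresh options inside each invocation.
static int CheckWithFreshOptions(Uri resolved)
{
    var uris = new List<Uri> { resolved };
    return uris.Count;
}

var target = new Uri("http://example.com/");
int sharedCount = 0;
int freshCount = 0;

// Simulate three health check runs.
for (int i = 0; i < 3; i++)
{
    sharedCount = CheckWithSharedOptions(target);
    freshCount = CheckWithFreshOptions(target);
}

Console.WriteLine($"shared: {sharedCount}, fresh: {freshCount}"); // shared: 3, fresh: 1
```

Because ParameterUriHealthCheck resolves its URL from a parameter at check time, building the options per call also avoids probing a stale URL if that value changes.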

}
catch (HttpRequestException hre)
{
return HealthCheckResult.Unhealthy($"Failed to connect to {_client.BaseAddress}health.", hre);

Copilot AI Feb 5, 2026


The error message concatenates the base address with "health" directly. This has the same issue as lines 28 and 32.

}
catch (Exception ex)
{
return HealthCheckResult.Unhealthy($"Health check failed for {_client.BaseAddress}health.", ex);

Copilot AI Feb 5, 2026


The error message concatenates the base address with "health" directly. This has the same issue as lines 28, 32, and 36.

{
await natsConnection.ConnectAsync().ConfigureAwait(false);
- return HealthCheckResult.Healthy();
+ return HealthCheckResult.Healthy("Connected to NATS server.");

Copilot AI Feb 5, 2026


Adding a description message to a healthy result is a change in behavior. See comment on line 15.

Comment on lines +254 to +273
/// <summary>
/// Provides helper methods for HTTP health checks.
/// </summary>
internal static class HttpHealthCheckHelpers
{
    /// <summary>
    /// Returns a friendly, user-readable error message for common HTTP health check exceptions.
    /// </summary>
    public static string GetFriendlyErrorMessage(Uri uri, Exception exception)
    {
        return exception switch
        {
            TaskCanceledException or OperationCanceledException => $"Request to {uri} timed out.",
            HttpRequestException hre when hre.StatusCode.HasValue =>
                $"Request to {uri} returned {(int)hre.StatusCode.Value} {hre.StatusCode.Value}.",
            HttpRequestException => $"Failed to connect to {uri}.",
            _ => $"Health check failed for {uri}."
        };
    }
}

Copilot AI Feb 5, 2026


The XML documentation for this internal static class is more verbose than recommended for internal APIs. According to Aspire's documentation standards, internal classes should have brief summary tags only, avoiding detailed explanations. Consider simplifying to just describe what the class does in one sentence.

Copilot generated this review using guidance from repository custom instructions.
Comment on lines +275 to +278
/// <summary>
/// A health check wrapper for static URIs that provides friendly error messages.
/// </summary>
internal sealed class StaticUriHealthCheck : IHealthCheck

Copilot AI Feb 5, 2026


The XML documentation for this internal sealed class is more verbose than recommended for internal APIs. According to Aspire's documentation standards, internal classes should have brief summary tags only. Consider simplifying the summary to just describe what the class does without implementation details about wrapping or friendly messages.

Copilot generated this review using guidance from repository custom instructions.
Comment on lines +28 to +40
: HealthCheckResult.Unhealthy($"Request to {_client.BaseAddress}health returned {(int)response.StatusCode} {response.StatusCode}.");
}
catch (TaskCanceledException tce) when (!cancellationToken.IsCancellationRequested)
{
return HealthCheckResult.Unhealthy($"Request to {_client.BaseAddress}health timed out.", tce);
}
catch (HttpRequestException hre)
{
return HealthCheckResult.Unhealthy($"Failed to connect to {_client.BaseAddress}health.", hre);
}
catch (Exception ex)
{
return HealthCheckResult.Unhealthy($"Health check failed for {_client.BaseAddress}health.", ex);

Copilot AI Feb 5, 2026


The error message concatenates the base address with "health" directly, without a separator. If BaseAddress ends with a trailing slash (e.g., "http://localhost:5341/"), this produces "http://localhost:5341/health". If it doesn't (e.g., "http://localhost:5341/seq"), the result is a misleading URL-like string such as "http://localhost:5341/seqhealth". Consider using proper URI construction or ensuring the message is clear about what's being checked.

Suggested change
-     : HealthCheckResult.Unhealthy($"Request to {_client.BaseAddress}health returned {(int)response.StatusCode} {response.StatusCode}.");
- }
- catch (TaskCanceledException tce) when (!cancellationToken.IsCancellationRequested)
- {
-     return HealthCheckResult.Unhealthy($"Request to {_client.BaseAddress}health timed out.", tce);
- }
- catch (HttpRequestException hre)
- {
-     return HealthCheckResult.Unhealthy($"Failed to connect to {_client.BaseAddress}health.", hre);
- }
- catch (Exception ex)
- {
-     return HealthCheckResult.Unhealthy($"Health check failed for {_client.BaseAddress}health.", ex);
+     : HealthCheckResult.Unhealthy($"Request to {new Uri(_client.BaseAddress!, "/health")} returned {(int)response.StatusCode} {response.StatusCode}.");
+ }
+ catch (TaskCanceledException tce) when (!cancellationToken.IsCancellationRequested)
+ {
+     return HealthCheckResult.Unhealthy($"Request to {new Uri(_client.BaseAddress!, "/health")} timed out.", tce);
+ }
+ catch (HttpRequestException hre)
+ {
+     return HealthCheckResult.Unhealthy($"Failed to connect to {new Uri(_client.BaseAddress!, "/health")}.", hre);
+ }
+ catch (Exception ex)
+ {
+     return HealthCheckResult.Unhealthy($"Health check failed for {new Uri(_client.BaseAddress!, "/health")}.", ex);

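The difference between string concatenation and Uri resolution can be checked directly. A minimal sketch with hypothetical base addresses: the suggested "/health" (leading slash) is resolved against the host root, so it drops any path in BaseAddress, whereas "health" without a slash is resolved relative to the base path.

```csharp
using System;

// Hypothetical base addresses of the kind HttpClient.BaseAddress might hold.
var rootBase = new Uri("http://localhost:5341/");
var pathBase = new Uri("http://localhost:5341/seq");
var pathBaseWithSlash = new Uri("http://localhost:5341/seq/");

// Interpolation just appends text to Uri.ToString().
string concatRoot = $"{rootBase}health"; // http://localhost:5341/health
string concatPath = $"{pathBase}health"; // http://localhost:5341/seqhealth

// Uri(Uri, string) applies relative-reference resolution.
// A leading slash is root-relative: any base path is discarded.
string rootedFromPath = new Uri(pathBaseWithSlash, "/health").ToString(); // http://localhost:5341/health
// Without a leading slash, the reference replaces the last path segment,
// so only a base ending in '/' keeps its path.
string relativeNoSlash = new Uri(pathBase, "health").ToString();            // http://localhost:5341/health
string relativeWithSlash = new Uri(pathBaseWithSlash, "health").ToString(); // http://localhost:5341/seq/health

Console.WriteLine(concatRoot);
Console.WriteLine(concatPath);
Console.WriteLine(rootedFromPath);
Console.WriteLine(relativeNoSlash);
Console.WriteLine(relativeWithSlash);
```

Either form of Uri construction avoids the "seqhealth" artifact; which relative reference is correct depends on where the health endpoint lives relative to the configured base address, which this thread does not show.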
}
catch (TaskCanceledException tce) when (!cancellationToken.IsCancellationRequested)
{
return HealthCheckResult.Unhealthy($"Request to {_client.BaseAddress}health timed out.", tce);

Copilot AI Feb 5, 2026


The error message concatenates the base address with "health" directly. This has the same issue as line 28 - if BaseAddress doesn't end with a trailing slash, the concatenation might produce an unclear message.

@github-actions
Contributor

github-actions bot commented Feb 5, 2026

🎬 CLI E2E Test Recordings

The following terminal recordings are available for commit b4b974f:

Test Recording
AgentCommands_AllHelpOutputs_AreCorrect ▶️ View Recording
AgentInitCommand_MigratesDeprecatedConfig ▶️ View Recording
Banner_DisplayedOnFirstRun ▶️ View Recording
Banner_DisplayedWithExplicitFlag ▶️ View Recording
CreateAndDeployToDockerCompose ▶️ View Recording
CreateAndDeployToDockerComposeInteractive ▶️ View Recording
CreateAndPublishToKubernetes ▶️ View Recording
CreateAndRunAspireStarterProject ▶️ View Recording
CreateAndRunJsReactProject ▶️ View Recording
CreateAndRunPythonReactProject ▶️ View Recording
CreateEmptyAppHostProject ▶️ View Recording
CreateStartAndStopAspireProject ▶️ View Recording
CreateTypeScriptAppHostWithViteApp ▶️ View Recording
DoctorCommand_DetectsDeprecatedAgentConfig ▶️ View Recording
DoctorCommand_WithSslCertDir_ShowsTrusted ▶️ View Recording
DoctorCommand_WithoutSslCertDir_ShowsPartiallyTrusted ▶️ View Recording
LogsCommandShowsResourceLogs ▶️ View Recording
PsCommandListsRunningAppHost ▶️ View Recording
ResourcesCommandShowsRunningResources ▶️ View Recording

📹 Recordings uploaded automatically from CI run #21727341811

- SeqHealthCheck: Use proper URI construction for /health endpoint to avoid
  ambiguous URL concatenation
- NatsHealthCheck: Remove descriptions from Healthy results to avoid
  behavioral changes (keep only Unhealthy/Degraded messages)
- ExternalServiceBuilderExtensions:
  - Simplify XML documentation for internal classes
  - Fix ParameterUriHealthCheck to create fresh UriHealthCheckOptions per
    invocation to avoid duplicate URI accumulation
  - Remove unused fields from ParameterUriHealthCheck
@JamesNK
Member

JamesNK commented Feb 6, 2026

@maddymontaquila I tested this in TestShop and I didn't see an improvement


@maddymontaquila
Contributor

@maddymontaquila I tested this in TestShop and I didn't see an improvement


Ugh. @copilot can you take a look?

Copilot AI and others added 2 commits February 6, 2026 16:29
Co-authored-by: maddymontaquila <12660687+maddymontaquila@users.noreply.github.com>
Co-authored-by: maddymontaquila <12660687+maddymontaquila@users.noreply.github.com>
Contributor Author

Copilot AI commented Feb 6, 2026

@maddymontaquila I tested this in TestShop and I didn't see an improvement


Fixed in commit 45c5597. The issue was that the UriHealthCheck from AspNetCore.HealthChecks.Uris returns unhealthy results with exception details in the Description field rather than the Exception property. Updated both StaticUriHealthCheck and ParameterUriHealthCheck to wrap ALL unhealthy results, not just those with exceptions. This should now display friendly messages like "Health check failed for {url}." instead of the raw exception stack trace.

Warning

Firewall rules blocked me from connecting to one or more addresses

I tried to connect to the following addresses, but was blocked by firewall rules:

  • example.com
    • Triggering command: /home/REDACTED/work/aspire/aspire/artifacts/bin/Aspire.Hosting.Tests/Debug/net8.0/Aspire.Hosting.Tests /home/REDACTED/work/aspire/aspire/artifacts/bin/Aspire.Hosting.Tests/Debug/net8.0/Aspire.Hosting.Tests --internal-msbuild-node /home/REDACTED/.local/share/acd1bd6bed38487aaa142c072e4d9df4/.p --filter-not-trait category=failing --filter-class *.ExternalServiceTests --filter-not-trait quarantined=true --filter-not-trait outerloop=true (dns block)

If you need me to access, download, or install something from one of these locations, you can either:


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Display more friendly error message for common health checks errors

4 participants