Add friendly error messages to health check failures#14072
Add friendly error messages to health check failures#14072
Conversation
Co-authored-by: maddymontaquila <12660687+maddymontaquila@users.noreply.github.com>
Co-authored-by: maddymontaquila <12660687+maddymontaquila@users.noreply.github.com>
Co-authored-by: maddymontaquila <12660687+maddymontaquila@users.noreply.github.com>
…dly-error-messages
Remove code duplication between StaticUriHealthCheck and ParameterUriHealthCheck by extracting the common GetFriendlyErrorMessage method into a shared internal HttpHealthCheckHelpers static class. Also remove unused _expectedStatusCode field from StaticUriHealthCheck.
|
🚀 Dogfood this PR with:
curl -fsSL https://raw.githubusercontent.com/dotnet/aspire/main/eng/scripts/get-aspire-cli-pr.sh | bash -s -- 14072Or
iex "& { $(irm https://raw.githubusercontent.com/dotnet/aspire/main/eng/scripts/get-aspire-cli-pr.ps1) } 14072" |
There was a problem hiding this comment.
Pull request overview
This PR adds friendly, user-readable error messages to health check failures across Aspire components. Previously, health checks displayed raw exception stack traces which were not actionable for users. The changes provide clear, descriptive messages like "Request to {url} timed out." instead of multi-line stack traces.
Changes:
- Implemented wrapper health checks (
StaticUriHealthCheckand enhancedParameterUriHealthCheck) for HTTP endpoint health checks with friendly error messages - Added descriptive error messages to 6 component health checks (Seq, NATS, Milvus, Qdrant, Azure AI Search, Azure Web PubSub)
- Created comprehensive test coverage for the new HTTP health check error messages
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/Aspire.Hosting.Tests/Health/FriendlyHealthCheckErrorMessagesTests.cs | New test file validating friendly error messages for static URI, parameter URI, and invalid URL scenarios |
| src/Components/Aspire.Seq/SeqHealthCheck.cs | Enhanced with friendly error messages for timeouts, connection failures, and HTTP status codes |
| src/Components/Aspire.Qdrant.Client/QdrantHealthCheck.cs | Added descriptive error message for connection failures |
| src/Components/Aspire.NATS.Net/NatsHealthCheck.cs | Added messages for all connection states including healthy, degraded, and failed states |
| src/Components/Aspire.Milvus.Client/MilvusHealthCheck.cs | Added descriptive error messages for connection failures and unhealthy server states |
| src/Components/Aspire.Azure.Search.Documents/AzureSearchIndexHealthCheck.cs | Added connection failure message |
| src/Components/Aspire.Azure.Messaging.WebPubSub/AspireWebPubSubExtensions.cs | Added connection failure message to inner health check class |
| src/Aspire.Hosting/ExternalServiceBuilderExtensions.cs | Implemented StaticUriHealthCheck, enhanced ParameterUriHealthCheck, and added HttpHealthCheckHelpers for friendly error message generation |
Comments suppressed due to low confidence (1)
src/Aspire.Hosting/ExternalServiceBuilderExtensions.cs:320
- The XML documentation for this internal sealed class is more verbose than recommended for internal APIs. According to Aspire's documentation standards, internal classes should have brief summary tags only. Consider simplifying to just describe what the class does in one sentence.
/// <summary>
/// A health check that resolves URL from a parameter asynchronously and delegates to UriHealthCheck.
/// </summary>
internal sealed class ParameterUriHealthCheck : IHealthCheck
| catch (TaskCanceledException tce) when (!cancellationToken.IsCancellationRequested) | ||
| { | ||
| return HealthCheckResult.Unhealthy($"Request to {_client.BaseAddress}health timed out.", tce); | ||
| } |
There was a problem hiding this comment.
The condition when (!cancellationToken.IsCancellationRequested) filters TaskCanceledException only when the cancellation token is NOT requested, which is correct for timeouts. However, this means if the user explicitly cancels the health check (via the cancellation token), the TaskCanceledException will be caught by the generic Exception handler at line 38. Consider whether an explicit user cancellation should result in a different message than a general health check failure.
| } | |
| } | |
| catch (TaskCanceledException tce) when (cancellationToken.IsCancellationRequested) | |
| { | |
| return HealthCheckResult.Unhealthy($"Health check for {_client.BaseAddress}health was canceled.", tce); | |
| } |
| { | ||
| return exception switch | ||
| { | ||
| TaskCanceledException or OperationCanceledException => $"Request to {uri} timed out.", |
There was a problem hiding this comment.
The pattern matching in GetFriendlyErrorMessage treats both TaskCanceledException and OperationCanceledException as timeouts. However, this doesn't distinguish between actual timeouts and explicit user cancellations. When a health check is explicitly cancelled by the user (e.g., application shutdown), the message "Request to {uri} timed out." may be misleading. Consider checking if the cancellation was requested before determining the message.
| TaskCanceledException or OperationCanceledException => $"Request to {uri} timed out.", | |
| TaskCanceledException => $"Request to {uri} timed out.", | |
| OperationCanceledException => $"Request to {uri} was canceled.", |
| } | ||
|
|
||
| targetUri = uri; | ||
|
|
There was a problem hiding this comment.
The UriHealthCheckOptions instance is shared across multiple health check invocations. Each time CheckHealthAsync is called, AddUri is invoked again on line 368, potentially adding duplicate URI configurations to the same options object. This could lead to unexpected behavior if the health check is invoked multiple times. Consider creating a new UriHealthCheckOptions instance for each health check invocation, or ensure that the options are cleared before reuse.
| _options.Uris.Clear(); |
| } | ||
| catch (HttpRequestException hre) | ||
| { | ||
| return HealthCheckResult.Unhealthy($"Failed to connect to {_client.BaseAddress}health.", hre); |
There was a problem hiding this comment.
The error message concatenates the base address with "health" directly. This has the same issue as lines 28 and 32.
| } | ||
| catch (Exception ex) | ||
| { | ||
| return HealthCheckResult.Unhealthy($"Health check failed for {_client.BaseAddress}health.", ex); |
There was a problem hiding this comment.
The error message concatenates the base address with "health" directly. This has the same issue as lines 28, 32, and 36.
| { | ||
| await natsConnection.ConnectAsync().ConfigureAwait(false); | ||
| return HealthCheckResult.Healthy(); | ||
| return HealthCheckResult.Healthy("Connected to NATS server."); |
There was a problem hiding this comment.
Adding a description message to healthy result is a change in behavior. See comment on line 15.
| /// <summary> | ||
| /// Provides helper methods for HTTP health checks. | ||
| /// </summary> | ||
| internal static class HttpHealthCheckHelpers | ||
| { | ||
| /// <summary> | ||
| /// Returns a friendly, user-readable error message for common HTTP health check exceptions. | ||
| /// </summary> | ||
| public static string GetFriendlyErrorMessage(Uri uri, Exception exception) | ||
| { | ||
| return exception switch | ||
| { | ||
| TaskCanceledException or OperationCanceledException => $"Request to {uri} timed out.", | ||
| HttpRequestException hre when hre.StatusCode.HasValue => | ||
| $"Request to {uri} returned {(int)hre.StatusCode.Value} {hre.StatusCode.Value}.", | ||
| HttpRequestException => $"Failed to connect to {uri}.", | ||
| _ => $"Health check failed for {uri}." | ||
| }; | ||
| } | ||
| } |
There was a problem hiding this comment.
The XML documentation for this internal static class is more verbose than recommended for internal APIs. According to Aspire's documentation standards, internal classes should have brief summary tags only, avoiding detailed explanations. Consider simplifying to just describe what the class does in one sentence.
| /// <summary> | ||
| /// A health check wrapper for static URIs that provides friendly error messages. | ||
| /// </summary> | ||
| internal sealed class StaticUriHealthCheck : IHealthCheck |
There was a problem hiding this comment.
The XML documentation for this internal sealed class is more verbose than recommended for internal APIs. According to Aspire's documentation standards, internal classes should have brief summary tags only. Consider simplifying the summary to just describe what the class does without implementation details about wrapping or friendly messages.
| : HealthCheckResult.Unhealthy($"Request to {_client.BaseAddress}health returned {(int)response.StatusCode} {response.StatusCode}."); | ||
| } | ||
| catch (TaskCanceledException tce) when (!cancellationToken.IsCancellationRequested) | ||
| { | ||
| return HealthCheckResult.Unhealthy($"Request to {_client.BaseAddress}health timed out.", tce); | ||
| } | ||
| catch (HttpRequestException hre) | ||
| { | ||
| return HealthCheckResult.Unhealthy($"Failed to connect to {_client.BaseAddress}health.", hre); | ||
| } | ||
| catch (Exception ex) | ||
| { | ||
| return HealthCheckResult.Unhealthy($"Health check failed for {_client.BaseAddress}health.", ex); |
There was a problem hiding this comment.
The error message concatenates the base address with "health" directly without a separator. If BaseAddress ends with a trailing slash (e.g., "http://localhost:5341/"), this will produce "http://localhost:5341/health". However, if BaseAddress doesn't end with a slash, this produces an ambiguous URL-like string. Consider using proper URI construction or ensuring the message is clear about what's being checked.
| : HealthCheckResult.Unhealthy($"Request to {_client.BaseAddress}health returned {(int)response.StatusCode} {response.StatusCode}."); | |
| } | |
| catch (TaskCanceledException tce) when (!cancellationToken.IsCancellationRequested) | |
| { | |
| return HealthCheckResult.Unhealthy($"Request to {_client.BaseAddress}health timed out.", tce); | |
| } | |
| catch (HttpRequestException hre) | |
| { | |
| return HealthCheckResult.Unhealthy($"Failed to connect to {_client.BaseAddress}health.", hre); | |
| } | |
| catch (Exception ex) | |
| { | |
| return HealthCheckResult.Unhealthy($"Health check failed for {_client.BaseAddress}health.", ex); | |
| : HealthCheckResult.Unhealthy($"Request to {new Uri(_client.BaseAddress!, "/health")} returned {(int)response.StatusCode} {response.StatusCode}."); | |
| } | |
| catch (TaskCanceledException tce) when (!cancellationToken.IsCancellationRequested) | |
| { | |
| return HealthCheckResult.Unhealthy($"Request to {new Uri(_client.BaseAddress!, "/health")} timed out.", tce); | |
| } | |
| catch (HttpRequestException hre) | |
| { | |
| return HealthCheckResult.Unhealthy($"Failed to connect to {new Uri(_client.BaseAddress!, "/health")}.", hre); | |
| } | |
| catch (Exception ex) | |
| { | |
| return HealthCheckResult.Unhealthy($"Health check failed for {new Uri(_client.BaseAddress!, "/health")}.", ex); |
| } | ||
| catch (TaskCanceledException tce) when (!cancellationToken.IsCancellationRequested) | ||
| { | ||
| return HealthCheckResult.Unhealthy($"Request to {_client.BaseAddress}health timed out.", tce); |
There was a problem hiding this comment.
The error message concatenates the base address with "health" directly. This has the same issue as line 28 - if BaseAddress doesn't end with a trailing slash, the concatenation might produce an unclear message.
🎬 CLI E2E Test RecordingsThe following terminal recordings are available for commit
📹 Recordings uploaded automatically from CI run #21727341811 |
- SeqHealthCheck: Use proper URI construction for /health endpoint to avoid
ambiguous URL concatenation
- NatsHealthCheck: Remove descriptions from Healthy results to avoid
behavioral changes (keep only Unhealthy/Degraded messages)
- ExternalServiceBuilderExtensions:
- Simplify XML documentation for internal classes
- Fix ParameterUriHealthCheck to create fresh UriHealthCheckOptions per
invocation to avoid duplicate URI accumulation
- Remove unused fields from ParameterUriHealthCheck
|
@maddymontaquila I tested this in TestShop and I didn't see an improvement
|
Ugh. @copilot can you take a look? |
Co-authored-by: maddymontaquila <12660687+maddymontaquila@users.noreply.github.com>
Co-authored-by: maddymontaquila <12660687+maddymontaquila@users.noreply.github.com>
Fixed in commit 45c5597. The issue was that the Warning Firewall rules blocked me from connecting to one or more addresses (expand for details)I tried to connect to the following addresses, but was blocked by firewall rules:
If you need me to access, download, or install something from one of these locations, you can either:
|




Description
Health checks were displaying raw exception stack traces instead of actionable error messages. Dashboard users saw multi-line exception details like
System.Threading.Tasks.TaskCanceledException: The operation was canceled. ---> System.IO.IOException: Unable to read data from the transport connection...with no clear indication of what failed.The root cause was that the third-party
UriHealthCheckfromAspNetCore.HealthChecks.Urisreturns unhealthy results with exception information in the Description field rather than the Exception property. The initial implementation only wrapped results when an exception object was present, allowing most unhealthy results to pass through with raw stack traces.Changes
HTTP endpoint health checks (
ExternalServiceBuilderExtensions.cs):StaticUriHealthCheckand enhancedParameterUriHealthCheckto wrapUriHealthCheckfromAspNetCore.HealthChecks.UrisComponent health checks - Added descriptive error messages for 6 components:
Additional improvements from PR review feedback:
ParameterUriHealthCheckto create freshUriHealthCheckOptionsper invocation to avoid duplicate URI accumulationException objects remain attached to
HealthCheckResultfor detailed debugging when needed.Example
Before:
After:
or
Checklist
Original prompt
💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.