Change default policy to HealthyOrPanic (#2171)
Tratcher committed Jun 28, 2023
1 parent 01715d1 commit 5ce9ac6
Showing 4 changed files with 10 additions and 7 deletions.
8 changes: 4 additions & 4 deletions docs/docfx/articles/dests-health-checks.md
@@ -252,7 +252,7 @@ endpoints.MapReverseProxy(proxyPipeline =>
- `ReactivationPeriod` - period after which an unhealthy destination's passive health state is reset to `Unknown` and it starts receiving traffic again. Default value is `null` which means the period will be set by a `IPassiveHealthCheckPolicy`

### Built-in policies
There is currently one built-in passive health check policy - [`TransportFailureRateHealthPolicy`](xref:Yarp.ReverseProxy.Health.TransportFailureRateHealthPolicyOptions). It calculates the proxied requests failure rate for each destination and marks it as unhealthy if the specified limit is exceeded. Rate is calculated as a percentage of failured requests to the total number of request proxied to a destination in the given period of time. Failed and total counters are tracked in a sliding time window which means that only the recent readings fitting in the window are taken into account.
There is currently one built-in passive health check policy - [`TransportFailureRateHealthPolicy`](xref:Yarp.ReverseProxy.Health.TransportFailureRateHealthPolicyOptions). It calculates the proxied requests failure rate for each destination and marks it as unhealthy if the specified limit is exceeded. Rate is calculated as a percentage of failed requests to the total number of request proxied to a destination in the given period of time. Failed and total counters are tracked in a sliding time window which means that only the recent readings fitting in the window are taken into account.
There are two sets of policy parameters defined globally and on per cluster level.

Global parameters are set via the options mechanism using `TransportFailureRateHealthPolicyOptions` type with the following properties:
@@ -272,7 +272,7 @@ services.Configure<TransportFailureRateHealthPolicyOptions>(o =>
```

Cluster-specific parameters are set in the cluster's metadata as follows:
`TransportFailureRateHealthPolicy.RateLimit` - failure rate limit for a destination to be marked as unhealhty. The value is in range `(0,1)`. Default value is provided by the global `DefaultFailureRateLimit` parameter.
`TransportFailureRateHealthPolicy.RateLimit` - failure rate limit for a destination to be marked as unhealthy. The value is in range `(0,1)`. Default value is provided by the global `DefaultFailureRateLimit` parameter.

### Design
The main component is [PassiveHealthCheckMiddleware](xref:Yarp.ReverseProxy.Health.PassiveHealthCheckMiddleware) sitting in the request pipeline and analyzing responses returned by destinations. For each response from a destination belonging to a cluster with enabled passive health checks, `PassiveHealthCheckMiddleware` invokes an [IPassiveHealthCheckPolicy](xref:Yarp.ReverseProxy.Health.IPassiveHealthCheckPolicy) specified for the cluster. The policy analyzes the given response, evaluates a new destination's passive health state and calls [IDestinationHealthUpdater](xref:Yarp.ReverseProxy.Health.IDestinationHealthUpdater) to actually update [DestinationHealthState.Passive](xref:Yarp.ReverseProxy.Model.DestinationHealthState.Passive) value. The update happens asynchronously in the background and doesn't block the request pipeline. When a destination gets marked as unhealthy, it stops receiving new requests until it gets reactivated after a configured period. Reactivation means the destination's `DestinationHealthState.Passive` state is reset from `Unhealthy` to `Unknown` and the cluster's list of healthy destinations is rebuilt to include it. A reactivation is scheduled by `IDestinationHealthUpdater` right after setting the destination's `DestinationHealthState.Passive` to `Unhealthy`.
@@ -329,10 +329,10 @@ public class FirstUnsuccessfulResponseHealthPolicy : IPassiveHealthCheckPolicy
## Available destination collection
Destinations health state is used to determine which of them are eligible for receiving proxied requests. Each cluster maintains its own list of available destinations on `AvailableDestinations` property of the [ClusterDestinationState](xref:Yarp.ReverseProxy.Model.ClusterDestinationsState) type. That list gets rebuilt when any destination's health state changes. The [IClusterDestinationsUpdater](xref:Yarp.ReverseProxy.Health.IClusterDestinationsUpdater) controls that process and calls an [IAvailableDestinationsPolicy](xref:Yarp.ReverseProxy.Health.IAvailableDestinationsPolicy) configured on the cluster to actually choose the available destinations from the all cluster's destinations. There are the following built-in policies provided and custom ones can be implemented if necessary.

- `HealthyAndUnknown` - inspects each `DestinationState` and adds it on the available destination list if all of the following statements are TRUE. If no destinations are available then requests will get a 503 error. This is the default policy.
- `HealthyAndUnknown` - inspects each `DestinationState` and adds it on the available destination list if all of the following statements are TRUE. If no destinations are available then requests will get a 503 error.
- Active health checks are disabled on the cluster OR `DestinationHealthState.Active != DestinationHealth.Unhealthy`
- Passive health checks are disabled on the cluster OR `DestinationHealthState.Passive != DestinationHealth.Unhealthy`
- `HealthyOrPanic` - calls `HealthyAndUnknown` policy at first to get the available destinations. If none of them are returned from this call, it marks all cluster's destinations as available.
- `HealthyOrPanic` - calls `HealthyAndUnknown` policy at first to get the available destinations. If none of them are returned from this call, it marks all cluster's destinations as available. This is the default policy.

**NOTE**: An available destination policy configured on a cluster will be always called regardless of if any health check is enabled on the given cluster. The health state of a disabled health check is set to `Unknown`.

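The difference between the two built-in policies described above, and why making `HealthyOrPanic` the default avoids blanket 503 responses, can be sketched in a few lines of C#. This is a simplified illustration under stated assumptions, not YARP's implementation: destinations are modelled as hypothetical `(Id, Active, Passive)` tuples whose health values are the strings `"Healthy"`, `"Unknown"` and `"Unhealthy"` (standing in for the `DestinationHealth` enum), and a disabled health check is assumed to report `"Unknown"`, as the note above states.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified model (not YARP's types): each destination is an
// (Id, Active, Passive) tuple with health values "Healthy", "Unknown" or "Unhealthy".
// A disabled health check is assumed to report "Unknown".

// Sketch of HealthyAndUnknown: keep every destination that neither the active
// nor the passive check has marked "Unhealthy".
List<string> HealthyAndUnknown(IEnumerable<(string Id, string Active, string Passive)> destinations) =>
    destinations
        .Where(d => d.Active != "Unhealthy" && d.Passive != "Unhealthy")
        .Select(d => d.Id)
        .ToList();

// Sketch of HealthyOrPanic: apply the same filter first; if it leaves nothing,
// fall back to all destinations instead of returning an empty list.
List<string> HealthyOrPanic(IReadOnlyList<(string Id, string Active, string Passive)> destinations)
{
    var available = HealthyAndUnknown(destinations);
    return available.Count > 0 ? available : destinations.Select(d => d.Id).ToList();
}

var allUnhealthy = new List<(string Id, string Active, string Passive)>
{
    ("d1", "Unknown", "Unhealthy"),
    ("d2", "Unhealthy", "Unknown"),
};
var mixed = new List<(string Id, string Active, string Passive)>
{
    ("d1", "Healthy", "Unknown"),
    ("d2", "Unknown", "Unhealthy"),
};

Console.WriteLine(string.Join(",", HealthyAndUnknown(allUnhealthy))); // prints an empty line
Console.WriteLine(string.Join(",", HealthyOrPanic(allUnhealthy)));    // prints "d1,d2"
Console.WriteLine(string.Join(",", HealthyOrPanic(mixed)));           // prints "d1"
```

With every destination marked unhealthy, the `HealthyAndUnknown` sketch returns an empty list (the situation in which requests get a 503), while the `HealthyOrPanic` sketch falls back to the full destination list. Clusters that want the previous behavior can still set `AvailableDestinationsPolicy` to `HealthyAndUnknown` explicitly, as the updated functional test in this commit does.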
2 changes: 1 addition & 1 deletion src/ReverseProxy/Configuration/ConfigValidator.cs
@@ -482,7 +482,7 @@ private void ValidateHealthChecks(IList<Exception> errors, ClusterConfig cluster
if (string.IsNullOrEmpty(availableDestinationsPolicy))
{
// The default.
availableDestinationsPolicy = HealthCheckConstants.AvailableDestinations.HealthyAndUnknown;
availableDestinationsPolicy = HealthCheckConstants.AvailableDestinations.HealthyOrPanic;
}

if (!_availableDestinationsPolicies.ContainsKey(availableDestinationsPolicy))
4 changes: 2 additions & 2 deletions src/ReverseProxy/Health/ClusterDestinationsUpdater.cs
@@ -73,7 +73,7 @@ private void UpdateInternal(ClusterState cluster, IReadOnlyList<DestinationState
var config = cluster.Model.Config;
var destinationPolicy = _destinationPolicies.GetRequiredServiceById(
config.HealthCheck?.AvailableDestinationsPolicy,
HealthCheckConstants.AvailableDestinations.HealthyAndUnknown);
HealthCheckConstants.AvailableDestinations.HealthyOrPanic);

var availableDestinations = destinationPolicy.GetAvailalableDestinations(config, allDestinations);

@@ -85,7 +85,7 @@ private void UpdateInternal(ClusterState cluster, IReadOnlyList<DestinationState
// The first thread (T1) finished a rebuild and left the lock while still holding the semaphore. The second thread (T2)
// waiting on the lock gets awaken, proceeds under the lock and begins the next rebuild. If at this exact moment
// the third thread (T3) enters this method and tries to acquire the semaphore, it will be debounced because
// the semaphore's count is still 0. However, T2 could have already made some progress and didnt' observe updates made
// the semaphore's count is still 0. However, T2 could have already made some progress and didn't observe updates made
// by T3.
// By releasing the semaphore under the lock, we make sure that in the above situation T3 will proceed till the lock and
// its updates will be observed anyways.
3 changes: 3 additions & 0 deletions test/ReverseProxy.FunctionalTests/PassiveHealthCheckTests.cs
@@ -12,6 +12,7 @@
using Yarp.ReverseProxy.Common;
using Yarp.ReverseProxy.Configuration;
using Yarp.ReverseProxy.Forwarder;
using Yarp.ReverseProxy.Health;

namespace Yarp.ReverseProxy;

@@ -65,6 +66,7 @@ public async Task PassiveHealthChecksEnabled_MultipleDestinationFailures_ProxyRe
{
HealthCheck = new HealthCheckConfig
{
AvailableDestinationsPolicy = HealthCheckConstants.AvailableDestinations.HealthyAndUnknown,
Passive = new PassiveHealthCheckConfig
{
Enabled = true
@@ -85,6 +87,7 @@ public async Task PassiveHealthChecksEnabled_MultipleDestinationFailures_ProxyRe
using var response = await client.SendAsync(new HttpRequestMessage(HttpMethod.Get, uri));
Assert.Equal(i < 10 ? HttpStatusCode.BadGateway : HttpStatusCode.ServiceUnavailable, response.StatusCode);
await Task.Yield();
}
});
