My organization deploys a lot of Spring Boot applications in a Red Hat OpenShift cluster (based on Kubernetes). This platform uses two types of health checks: readiness and liveness probes.
The readiness probe detects whether the application has started and is ready to accept incoming requests; it controls whether an instance is added to (or removed from) the internal load balancer. The liveness probe detects whether the application is still alive; an instance that fails it is restarted.
We currently use the actuator /health endpoint for both readiness and liveness probes. Sometimes we want to choose which indicators take part in the aggregated health status. For instance, some indicators switch to DOWN but are not considered critical, so we don't want them to affect the aggregated status.
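For context, a typical deployment wires both probes to the same actuator endpoint, roughly like this (the path, port, and timings below are illustrative, not our exact setup):

```yaml
# Illustrative probe configuration: both probes hit the same /health endpoint,
# so a DOWN aggregated status fails readiness and liveness alike.
livenessProbe:
  httpGet:
    path: /actuator/health
    port: 8080
  initialDelaySeconds: 60
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /actuator/health
    port: 8080
  periodSeconds: 10
```

This shared endpoint is exactly why a single non-critical DOWN indicator can take a whole instance out of rotation.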
A concrete example is the circuit breaker status (we use resilience4j). When a circuit breaker opens, its health status switches to DOWN, and so does the global health status. This has a huge repercussion: when a remote service used by our application is in bad shape, all instances of our application open their circuit breakers. As a consequence, every health status switches to DOWN, all instances are kicked out of the load balancer, and our application becomes totally unavailable.
One could argue this behavior makes sense, but sometimes even a non-critical feature can cause a full service interruption this way. That's why we suggest offering full control over global health status aggregation.
Proposal
Currently it is possible to provide a custom bean implementing HealthAggregator, but instead of redefining the whole mechanism, it would be great to fine-tune, through simple configuration, which indicators take part in the global status aggregation.
We could provide two new config properties, such as management.endpoint.health.aggregated-status.include and management.endpoint.health.aggregated-status.exclude (much like the way endpoint exposure is configured with management.endpoints.web.exposure.include|exclude).
To preserve backward compatibility, we could set * as the default value for management.endpoint.health.aggregated-status.include and leave the exclude property empty.
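As a sketch, the defaults could look like this in application.properties (the property names are this proposal's suggestion, not existing Spring Boot properties):

```properties
# Proposed (hypothetical) properties -- not yet part of Spring Boot.
# Defaults keep the current behavior: every indicator participates.
management.endpoint.health.aggregated-status.include=*
management.endpoint.health.aggregated-status.exclude=

# Example: ignore the resilience4j circuit breaker indicator when aggregating.
# management.endpoint.health.aggregated-status.exclude=circuitBreakers
```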
Then we could easily adapt HealthIndicatorAutoConfiguration and AbstractHealthAggregator to take these new properties into account.
If someone wants to exclude some health indicators or specify the exact list of indicators to include then they are free to do so.
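To illustrate the intended semantics, here is a minimal, self-contained sketch of the filtering step (plain Java, no Spring classes; the indicator names and the `aggregate` helper are hypothetical). It mimics what Spring Boot's default aggregator does, ordering statuses by severity and picking the worst, after first dropping the excluded indicator names:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the proposed behavior, not an existing Spring Boot API.
public class FilteringStatusAggregator {

    // Severity order used by Spring Boot's default aggregation (worst first).
    private static final List<String> ORDER =
            List.of("DOWN", "OUT_OF_SERVICE", "UP", "UNKNOWN");

    /** Aggregates indicator statuses, ignoring the excluded indicator names. */
    public static String aggregate(Map<String, String> statuses, Set<String> exclude) {
        return statuses.entrySet().stream()
                .filter(entry -> !exclude.contains(entry.getKey()))
                .map(Map.Entry::getValue)
                .min((a, b) -> Integer.compare(ORDER.indexOf(a), ORDER.indexOf(b)))
                .orElse("UNKNOWN");
    }

    public static void main(String[] args) {
        Map<String, String> statuses = Map.of(
                "db", "UP",
                "diskSpace", "UP",
                "circuitBreakers", "DOWN"); // an open circuit breaker
        // Without exclusion, the open circuit breaker drags the whole status DOWN.
        System.out.println(aggregate(statuses, Set.of()));
        // Excluding the non-critical indicator keeps the instance UP.
        System.out.println(aggregate(statuses, Set.of("circuitBreakers")));
    }
}
```

The real implementation would live in AbstractHealthAggregator, with the exclude set bound from the proposed configuration property.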
If you think this kind of feature would be helpful and makes sense, I can develop it and open a pull request. I'm also happy to start a discussion.
Just keep me posted.
Thanks for the suggestion but I think this is a duplicate of #14022. Feel free to subscribe to that issue or add additional comments to it. There's clearly quite a bit of interest in this feature and we've got a few ideas that we'd like to discuss as a team before taking any specific approach. I think it would be best to wait for the outcome of those discussions (happening in January) before starting on a pull request.