My organization deploys a lot of Spring Boot applications in a Red Hat OpenShift cluster (based on Kubernetes). This platform uses two types of health checks: readiness and liveness probes.
The readiness probe detects whether the application has started and is ready to accept incoming requests; it controls whether an instance is added to (or removed from) the internal load balancer. The liveness probe detects whether the application is still alive; an instance that fails it is restarted.
We currently use the actuator /health endpoint for both readiness and liveness probes. Sometimes we want to choose which indicators take part in the aggregated health status. For instance, some indicators switch to DOWN but are not considered critical, so we don't want them to affect the aggregated status.
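For context, a typical deployment wires both probes to the same actuator endpoint, roughly like this (the path, port, and timings below are illustrative, not our exact setup):

```yaml
# Illustrative probe configuration: both probes hit the same /health endpoint,
# so a DOWN aggregated status fails readiness and liveness alike.
livenessProbe:
  httpGet:
    path: /actuator/health
    port: 8080
  initialDelaySeconds: 60
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /actuator/health
    port: 8080
  periodSeconds: 10
```

This shared endpoint is exactly why a single non-critical DOWN indicator can take a whole instance out of rotation.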
A concrete example is the circuit breaker status (we use resilience4j). When a circuit breaker opens, its health status switches to DOWN, and so does the global health status. This has a huge repercussion: when a remote service used by our application is in bad shape, all instances of our application open their circuit breakers. As a consequence, every health status switches to DOWN, all instances are kicked out of the load balancer, and our application becomes totally unavailable.
One could argue this behavior makes sense, but sometimes even a non-critical feature can cause a full service interruption this way. That's why we suggest offering full control over global health status aggregation.
Proposal
Currently it is possible to provide a custom bean implementing HealthAggregator, but instead of redefining the whole mechanism, it would be great to fine-tune, through simple configuration, which indicators take part in the global status aggregation.
We could provide two new config properties, such as management.endpoint.health.aggregated-status.include and management.endpoint.health.aggregated-status.exclude (much like the way endpoint exposure is configured with management.endpoints.web.exposure.include|exclude).
To preserve backward compatibility, we could set * as the default value for management.endpoint.health.aggregated-status.include and leave the exclude property empty.
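As a sketch, the defaults could look like this in application.properties (the property names are this proposal's suggestion, not existing Spring Boot properties):

```properties
# Proposed (hypothetical) properties -- not yet part of Spring Boot.
# Defaults keep the current behavior: every indicator participates.
management.endpoint.health.aggregated-status.include=*
management.endpoint.health.aggregated-status.exclude=

# Example: ignore the resilience4j circuit breaker indicator when aggregating.
# management.endpoint.health.aggregated-status.exclude=circuitBreakers
```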
Then we could easily adapt HealthIndicatorAutoConfiguration and AbstractHealthAggregator to take these new properties into account.
If someone wants to exclude some health indicators or specify the exact list of indicators to include then they are free to do so.
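To illustrate the intended semantics, here is a minimal, self-contained sketch of the filtering step (plain Java, no Spring classes; the indicator names and the `aggregate` helper are hypothetical). It mimics what Spring Boot's default aggregator does, ordering statuses by severity and picking the worst, after first dropping the excluded indicator names:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the proposed behavior, not an existing Spring Boot API.
public class FilteringStatusAggregator {

    // Severity order used by Spring Boot's default aggregation (worst first).
    private static final List<String> ORDER =
            List.of("DOWN", "OUT_OF_SERVICE", "UP", "UNKNOWN");

    /** Aggregates indicator statuses, ignoring the excluded indicator names. */
    public static String aggregate(Map<String, String> statuses, Set<String> exclude) {
        return statuses.entrySet().stream()
                .filter(entry -> !exclude.contains(entry.getKey()))
                .map(Map.Entry::getValue)
                .min((a, b) -> Integer.compare(ORDER.indexOf(a), ORDER.indexOf(b)))
                .orElse("UNKNOWN");
    }

    public static void main(String[] args) {
        Map<String, String> statuses = Map.of(
                "db", "UP",
                "diskSpace", "UP",
                "circuitBreakers", "DOWN"); // an open circuit breaker
        // Without exclusion, the open circuit breaker drags the whole status DOWN.
        System.out.println(aggregate(statuses, Set.of()));
        // Excluding the non-critical indicator keeps the instance UP.
        System.out.println(aggregate(statuses, Set.of("circuitBreakers")));
    }
}
```

The real implementation would live in AbstractHealthAggregator, with the exclude set bound from the proposed configuration property.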
If you think this kind of feature would be helpful and makes sense, I can develop it and open a pull request. I'm also happy to start a discussion.
Just keep me posted.
Thanks for the suggestion but I think this is a duplicate of #14022. Feel free to subscribe to that issue or add additional comments to it. There's clearly quite a bit of interest in this feature and we've got a few ideas that we'd like to discuss as a team before taking any specific approach. I think it would be best to wait for the outcome of those discussions (happening in January) before starting on a pull request.