Fine tuning of Health status aggregation #15479

Closed
jhaeyaert opened this issue Dec 16, 2018 · 1 comment
Labels
status: duplicate A duplicate of another issue

Comments


jhaeyaert commented Dec 16, 2018

Hello,

My organization deploys a lot of Spring Boot applications in a Red Hat OpenShift cluster (based on Kubernetes). This platform uses two types of health checks: readiness and liveness probes.
The readiness probe is used to detect whether the application has started and is ready to accept incoming requests; while it fails, the instance is kept out of (or kicked out of) the internal load balancer. The liveness probe is used to detect whether the application is still UP; when it fails, the container is restarted.

We currently use the actuator /health endpoint for both readiness and liveness probes. Sometimes we want to select which indicators are part of the aggregated health status. For instance, some indicators switch to DOWN but are not considered critical, so we don't want them to take part in the aggregated status.

A concrete example is circuit breaker status (we use resilience4j). When a circuit breaker opens, its corresponding health status switches to DOWN, and so does the global health status. This behavior has huge repercussions: when a remote service used by our application is in bad shape, all instances of our application open their circuit breaker. As a consequence, all health statuses switch to DOWN and all instances are kicked out of the load balancer. Our application becomes totally unavailable.

One could argue that this behavior makes sense, but it means that a failure in even a non-critical functionality can lead to a service interruption. That's why we suggest offering full control over global health status aggregation.

Proposal

Currently it is possible to provide a custom bean implementing HealthAggregator, but instead of redefining the whole mechanism it would be great if we could fine-tune which indicators take part in the global status aggregation using simple configuration.

We could provide two new configuration properties, such as management.endpoint.health.aggregated-status.include and management.endpoint.health.aggregated-status.exclude (a bit like the way we configure the exposure of endpoints using management.endpoints.web.exposure.include|exclude).

To preserve backward compatibility, we could make * the default value for management.endpoint.health.aggregated-status.include and leave the exclude property empty.

Then we could easily adapt HealthIndicatorAutoConfiguration and AbstractHealthAggregator to take these new properties into account.

If someone wants to exclude some health indicators or specify the exact list of indicators to include then they are free to do so.
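
To make the intended semantics concrete, here is a minimal sketch in plain Java. It deliberately does not use any Spring Boot types: FilteredStatusAggregator, its Status enum, and the takesPart method are hypothetical names chosen for illustration, and the include and exclude sets stand in for the two proposed properties. The severity order (DOWN, OUT_OF_SERVICE, UP, UNKNOWN) and the UNKNOWN fallback are assumptions modeled on what OrderedHealthAggregator does by default.

```java
import java.util.Comparator;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the proposed behavior; not a Spring Boot class.
final class FilteredStatusAggregator {

    // Declared from most to least severe (assumed order), so the enum's
    // natural ordering can be used to pick the worst status.
    enum Status { DOWN, OUT_OF_SERVICE, UP, UNKNOWN }

    // Stand-ins for the proposed aggregated-status.include / .exclude properties.
    private final Set<String> include;
    private final Set<String> exclude;

    FilteredStatusAggregator(Set<String> include, Set<String> exclude) {
        this.include = new HashSet<>(include);
        this.exclude = new HashSet<>(exclude);
    }

    // An indicator takes part if it is included (by name or via "*")
    // and not explicitly excluded; exclude wins over include.
    boolean takesPart(String indicatorName) {
        boolean included = include.contains("*") || include.contains(indicatorName);
        return included && !exclude.contains(indicatorName);
    }

    // Aggregates only the participating indicators: the most severe status
    // wins, and UNKNOWN is returned when no indicator takes part.
    Status aggregate(Map<String, Status> statusesByIndicator) {
        return statusesByIndicator.entrySet().stream()
                .filter(entry -> takesPart(entry.getKey()))
                .map(Map.Entry::getValue)
                .min(Comparator.naturalOrder())
                .orElse(Status.UNKNOWN);
    }
}
```

With include set to * and exclude left empty, an open circuit breaker reporting DOWN still brings the aggregated status DOWN, which is the current behavior. Adding the circuit breaker's indicator name to exclude leaves the aggregated status driven by the remaining indicators only, while the excluded indicator could still be reported in the /health details.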

If you think this kind of feature would be helpful and makes sense, I can develop it and open a pull request. I'm also happy to start a discussion.

Just keep me posted.

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Dec 16, 2018
@philwebb
Member

Thanks for the suggestion but I think this is a duplicate of #14022. Feel free to subscribe to that issue or add additional comments to it. There's clearly quite a bit of interest in this feature and we've got a few ideas that we'd like to discuss as a team before taking any specific approach. I think it would be best to wait for the outcome of those discussions (happening in January) before starting on a pull request.

@philwebb philwebb added status: duplicate A duplicate of another issue and removed status: waiting-for-triage An issue we've not yet triaged labels Dec 17, 2018