Conditional Execution of Health Checks #3441

coreyjv · 2015-07-08T12:37:42Z

I've been using the Spring Boot Actuator health check functionality for a while now and really enjoy the easy of writing custom HealthIndicators. However, there is one need that Spring Boot Actuator does not currently support (see this Stack Overflow thread for reference here) that I'd like to suggest as a new feature but first I'd like to explain our challenge.

For our applications the /health endpoint is queried very frequently (say every 5 seconds) to determine if fail over to another site needs to occur. For frequently executed queries the /heath endpoint should report the health of the application, and not execute any HealthIndicators that check the health of external dependencies such as external web services because those calls are expensive.

Additionally, there are times where health checks against external dependencies are needed during post deployment or troubleshooting scenarios. It's very easy to write custom HealthIndicators and having the ability to consolidate it all under one /health endpoint is advantageous instead of writing additional code to expose this through another endpoint.

If this feature request is accepted I'd like to propose that the implementation is opt-in so by default custom HealthIndicators are executed like they are today so users don't find themselves in a situation where a HealthIndicator is not run when they expect it to be.

Finally, if possible, I'd be willing to make the contributions to implement this feature as well.

Thank you!

The text was updated successfully, but these errors were encountered:

snicoll · 2015-07-08T12:47:11Z

Are you aware that there is a endpoints.health.time-to-livethat caches the result of the health check. Would that help?

coreyjv · 2015-07-08T13:08:23Z

@snicoll --

Yes, I'm aware of the caching capabilities of the health result but I do not think it's the right solution for this need since it unilaterally caches all the results of the health check instead of an identified subset.

Setting a high time-to-live to mitigate expensive calls by HealthIndicators may give a false sense of the actual health of the application where I think a solution that allows a developer to distinguish and execute particular health indicators conditionally may be more clear.

Thanks!

snicoll · 2015-07-08T13:10:51Z

Either way, this sounds like a nice feature. If you want to give that a try, go ahead and we can discuss more based on the code (make sure to sign the CLA).

wilkinsona · 2015-07-08T13:17:54Z

As I said on Stack Overflow, I'm not convinced that this is a good idea. The description above hasn't changed that opinion – I still think this feels like premature optimisation.

@coreyjv You've said that some calls can be "expensive". Can you define that more concretely? Checking that a service is available, even if it's remote, should have a far lower cost than actually using that service for any real work.

I also don't really understand the point of having a health check that, based on some condition, might not actually check anything. Beyond a cache with predictable TTL, aren't you in danger of believing that your app is healthy when, in fact, it's not?

coreyjv · 2015-07-08T13:33:18Z

@wilkinsona,

Thanks for taking the time to respond!

Can you define that more concretely?

Sure thing. One example would be checking connectivity to a directory service such as LDAP or Active Directory. In these cases one would want to validate they could query certain parts of the tree successfully.

Another example would be a remote call on a web service. This may not only be expensive from a network perspective since it's a remote call but it presents another challenge. These downstream dependencies also have to be able to support the load that these health check calls are placing on which could slow down actual services, and be significant if the calling application is actually multiple application instances.

Beyond a cache with predictable TTL, aren't you in danger of believing that your app is healthy when, in fact, it's not?

I think there's a distinction between the application health, and the health of external dependencies. While an external dependency such as a remote web service may report as DOWN it does not necessarily indicate a problem with the application. It may be a transient failure where the network is at fault and fail over of the application should not occur in my opinion.

I hope this provides a bit of clarity and thanks for taking a critical look at this request.

wilkinsona · 2015-07-08T13:50:30Z

I remain unconvinced by the expensive argument. A periodic health check, even if it's happening every five seconds from multiple instances, should be very lightweight in comparison to the load that those instances are producing doing real work. If those health checks are sufficient to cause a noticeable degradation in performance then you have bigger problems.

I think there's a distinction between the application health, and the health of external dependencies. While an external dependency such as a remote web service may report as DOWN it does not necessarily indicate a problem with the application. It may be a transient failure where the network is at fault and fail over of the application should not occur in my opinion.

I find this part of the reasoning more compelling. It sounds like you want the health indicators to have an error threshold such that it takes multiple errors in a given sampling window for a service to be considered down, very much like what's supported by Hystrix.

snicoll · 2015-07-08T15:53:49Z

For the record, I thought this issue was about tuning the time-to-live of certain checks, not allowing for custom conditions on a given indicator.

mvitz · 2017-01-11T09:23:32Z

FYI: We use a similar approach in our current project. If written a single class which decouples returning the health result from calculating it. All that is needed then is to schedule a trigger for the calculation method. If there is any interest I could upload an example.

philwebb · 2018-03-22T19:08:48Z

Supporting this would require a lot of additional code on our end. We're not convinced there's enough of a need. You can implement this directly by writing a custom health aggregator.

ngaya-ll · 2018-08-03T04:48:28Z

@philwebb I'd like to reopen this for consideration to support the following use case.

My organization uses Spring Boot services running in a Kubernetes environment. Kubernetes has a concept of two types of health probes: liveness and readiness probes. Liveness probes are used to detect if the service needs to be restarted, while readiness probes detect if the service is ready to be added to load balancers.

It would be great if there were a way to perform different checks for readiness and liveness. For example, if an application temporarily loses connectivity to its data source it should be taken out of load balancing, but doesn't need to be restarted. So we would like to include the DataSourceHealthIndicator in the readiness probe but not the liveness probe.

Note that #8685 already allows querying a single health indicator, but not multiple indicators. Perhaps Spring Boot could provide a way to configure named groups of indicators that could be accessed as /actuator/health/{group}.

wilkinsona · 2018-08-08T14:54:50Z

@ngaya-ll That's an interesting use case, thank you. It feels a bit different to what was proposed in this issue so can you please open a new issue so that we can track it separately?

ngaya-ll · 2018-08-08T19:31:23Z

@wilkinsona Sure, filed #14022

wilkinsona · 2018-08-08T20:23:07Z

Thanks.

philwebb added type: enhancement A general enhancement discussion labels Jul 8, 2015

philwebb removed the discussion label Jan 7, 2016

wilkinsona mentioned this issue Jul 18, 2016

Reduce backend load by using passive health indicators #6407

Closed

philwebb added the priority: low label Jan 18, 2017

pioto mentioned this issue Mar 21, 2017

Run Specific Health Checks #8685

Closed

wilkinsona mentioned this issue Oct 11, 2017

Allow exclusion of health indicators from status rollup #10550

Closed

philwebb removed the priority: low label Mar 21, 2018

philwebb closed this as completed Mar 22, 2018

philwebb added the status: declined A suggestion or change that we don't feel we should currently apply label Mar 22, 2018

wilkinsona added the for: team-attention An issue we'd like other members of the team to review label Aug 6, 2018

philwebb reopened this Aug 8, 2018

philwebb closed this as completed Aug 8, 2018

snicoll removed the for: team-attention An issue we'd like other members of the team to review label Aug 8, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Conditional Execution of Health Checks #3441

Conditional Execution of Health Checks #3441

coreyjv commented Jul 8, 2015

snicoll commented Jul 8, 2015

coreyjv commented Jul 8, 2015

snicoll commented Jul 8, 2015

wilkinsona commented Jul 8, 2015

coreyjv commented Jul 8, 2015

wilkinsona commented Jul 8, 2015

snicoll commented Jul 8, 2015

mvitz commented Jan 11, 2017

philwebb commented Mar 22, 2018

ngaya-ll commented Aug 3, 2018 •

edited

Loading

wilkinsona commented Aug 8, 2018

ngaya-ll commented Aug 8, 2018

wilkinsona commented Aug 8, 2018

Conditional Execution of Health Checks #3441

Conditional Execution of Health Checks #3441

Comments

coreyjv commented Jul 8, 2015

snicoll commented Jul 8, 2015

coreyjv commented Jul 8, 2015

snicoll commented Jul 8, 2015

wilkinsona commented Jul 8, 2015

coreyjv commented Jul 8, 2015

wilkinsona commented Jul 8, 2015

snicoll commented Jul 8, 2015

mvitz commented Jan 11, 2017

philwebb commented Mar 22, 2018

ngaya-ll commented Aug 3, 2018 • edited Loading

wilkinsona commented Aug 8, 2018

ngaya-ll commented Aug 8, 2018

wilkinsona commented Aug 8, 2018

ngaya-ll commented Aug 3, 2018 •

edited

Loading