Add support for health indicator groups #14022

ngaya-ll · 2018-08-08T19:30:24Z

My organization uses Spring Boot services running in a Kubernetes environment. Kubernetes has a concept of two types of health probes: liveness and readiness probes. Liveness probes are used to detect if the service needs to be restarted, while readiness probes detect if the service is ready to be added to load balancers.

It would be great if there were a way to perform different checks for readiness and liveness. For example, if an application temporarily loses connectivity to its data source it should be taken out of load balancing, but doesn't need to be restarted. So we would like to include the DataSourceHealthIndicator in the readiness probe but not the liveness probe.

Note that #8685 already allows querying a single health indicator, but not multiple indicators. Perhaps Spring Boot could provide a way to configure named groups of indicators that could be accessed as /actuator/health/{group}.

The text was updated successfully, but these errors were encountered:

eddumelendez · 2018-08-09T23:51:34Z

I am working in this issue. I will be sending a PR this weekend.

wilkinsona · 2018-08-10T10:09:19Z

Thank you, @eddumelendez. We had a few thoughts when we discussed this earlier this week. They are only initial thoughts so they aren't prescriptive, but hopefully they'll be of some use.

We envisaged a property-based mechanism for creating a group of health indicators. In application.properties, the value of the property would probably be a comma-separated list of the ids of the health indicators that are to be included in the group. The key would be the name of the group.

We then thought that CompositeHealthIndicator could be reused and created with all the health indicators in the group. One gotcha that @philwebb identified is that these composite health indicators that represent a group will need to be hidden from the main health check otherwise any grouped indicators would be called twice for a single check.

eddumelendez · 2018-08-10T15:03:19Z

Thanks for letting me know @wilkinsona I'll be working on this taking into account these thoughts.

mbhave · 2019-03-15T18:44:17Z

@eddumelendez Are you still interested in working on this? If not, we'll go ahead and pick it up.

eddumelendez · 2019-03-15T18:56:46Z

@mbhave I have an initial draft which I can share it later today.

mbhave · 2019-03-15T19:46:06Z

great, thanks!

eddumelendez · 2019-03-18T04:36:16Z

sorry for the delay. I have submitted #16252

snicoll · 2019-04-27T14:48:01Z

I gave this some thoughts and I am wondering if we shouldn't have a concept of "out-of-the-box groups" the same way we have logging groups. The two obvious ones are those defined in this very request.

If there is an obvious default for a given indicator, having a way to indicate a preference would give a better out-of-the-box experience for users compared to having them configure the list of indicators in every app.

Another argument is that, in this scenario, if an indicator is not specified, it will not be invoked. When new indicators are added, there's no hint and it could easily be missed if the application is only relying on one group or another.

Of course, we can start with a configurable map for a start and see how we can improve the manual solution.

See spring-projectsgh-14022

snicoll · 2019-04-28T12:09:41Z

I've started to hack on a prototype where the following configuration is applied (random indicators based on spring-boot-sample-actuator):

management.health.groups.ready=diskSpace
management.health.groups.live=example,hello,db,unknown

(Notice the unknown indicator to represent the fact a reference to an unavailable indicator). Starting the sample based on that branch gives the following for the ready group:

// http://localhost:8080/actuator/health/ready

{
  "status": "UP",
  "details": {
    "diskSpace": {
      "status": "UP",
      "details": {
        "total": 499963170816,
        "free": 271707451392,
        "threshold": 10485760
      }
    }
  }
}

The live group looks as follows:

// http://localhost:8080/actuator/health/live

{
  "status": "UP",
  "details": {
    "example": {
      "status": "UP",
      "details": {
        "counter": 42
      }
    },
    "hello": {
      "status": "UP",
      "details": {
        "hello": "world"
      }
    },
    "db": {
      "status": "UP",
      "details": {
        "database": "H2",
        "result": 1,
        "validationQuery": "SELECT 1"
      }
    }
  }
}

The main endpoint does not contain any reference to live or ready as groups are automaticallly filtered out by the registry implementation.

This spike is quite limited but brings several interesting questions:

A health indicator cannot have the same name as a group. If one attempts to create a group with the name of an indicator, it (should) fail as the current idea is to query /health/{name} for both groups and single indicators.
What should happen if a config refers to an unknown indicator? Right now it is ignored
What should happen if a group is empty? Use case: all its indicators are not available or the group does not defined any indicator. Right now it returns UNKNOWN
The current implementation is quite a hack as it relies on the fact getAll() does not return the groups. The javadoc, however, mentions that indicators should be returned. This turns the groups into an implementation detail so perhaps we'd like to give them a bit more visibility
Indicators have to be listed one by one. This is tedious and error prone if new indicators are added along the way
Given how the groups are configured, groups is a reserved word, i.e. you cannot name an indicator that way.

wilkinsona · 2019-05-01T09:32:43Z

A health indicator cannot have the same name as a group. If one attempts to create a group with the name of an indicator, it (should) fail as the current idea is to query /health/{name} for both groups and single indicators.

+1 for failing

What should happen if a config refers to an unknown indicator? Right now it is ignored

Accidental absence of an indicator from a group could lead to traffic being routed to an instance when it's unable to handle it. As such, I think the implementation should try to help users to avoid that mistake. I think we should fail if a group is configured to include an unknown indicator. We could consider a configuration option that changes this behaviour from fail to ignore.

What should happen if a group is empty? Use case: all its indicators are not available or the group does not defined any indicator. Right now it returns UNKNOWN

I think unknown indicators should result in failure (see above). I think a group that's genuinely empty (rather than only containing indicators that do no exist) should return UNKNOWN as the main endpoint does today if there are no indicators.

The current implementation is quite a hack as it relies on the fact getAll() does not return the groups. The javadoc, however, mentions that indicators should be returned. This turns the groups into an implementation detail so perhaps we'd like to give them a bit more visibility

I share you're feeling that this isn't quite right, but I'm not sure I have a better proposal right now. The two marker interfaces for the group indicators feel like a bit of a smell, but I'm not sure how else we can distinguish between a group indicator and a standard indicator.

Indicators have to be listed one by one. This is tedious and error prone if new indicators are added along the way

I wonder if we need some sort of wildcard or pattern matching support. I'd be tempted to ship a first iteration without that and gather some feedback before we do anything more sophisticated.

Given how the groups are configured, groups is a reserved word, i.e. you cannot name an indicator that way.

While this would be a breaking change, I think it's unlikely that anyone has implemented their own indicator named groups. We should avoid the potential problem by allowing the group support to be switched off if needs be.

This commit allows a user to define an arbitrary number of health indicator groups using configuration. If a given health indicator is defined in more than one group, a single invocation is ensured using a AggregateHealth. A reactive counter-part is also provided. See spring-projectsgh-14022

snicoll · 2019-06-21T14:52:36Z

Team, I've rebased my proposal in snicoll@43144b6 - if someone has some cycles to review it, I'd appreciate. I've been back and forth on several strategies to identify that an indicator is a group and hope a review would ameliorate the proposal. It is also very basic at the moment where groups are declared in a static fashion via configuration.

ckoutsouridis · 2019-06-29T15:45:35Z

This is a very interesting topic, and in our organisation we required a separation of ready, and alive endpoints as others mentioned in the above.

But we went in a bit different direction..we introduced a new @Endpoint called alive. This endpoint was calling the actual Health. If the health is different than DOWN, it was returning 200 status code.
For example we had the bellow mapping of health status and http response statuses

Health	/actuator/alive http status code
`UP`	200
`OUT_OF_SERVICE`	200
`CUSTOM_HEALTH_STATUS`	200
`DOWN`	503

and we mapped k8s endpoints such as:

livenessProbe ->/actuator/alive
readinessProbe ->/actuator/health

Even the wording of the health statuses is kind of similar to the wording of kubernetes but a bit inverted,

livenessProbe -> NOT DOWN
readinessProbe -> NOT OUT_OF_SERVICE

All core health indicators already return DOWN, e.g. datasource, jms etc when the resource is not available *. which covers our needs
This allowed us to write our own health indicators which where returning OUT_OF_SERVICE (an example is long cache loading operations).

I understand that this is completely different approach from what is requested, but i think it delivers what was requested + a couple of opportunities that might be useful.

For example, some health indicators might get written in a smarter way, to distinguish whether the app should stop running, or it should just stop serving requests. one example, although probably difficult to implement, is to distinguish transient errors on datasourceHealthIndicator and return OUT_OF_SERVICE from other errors, and return DOWN.

I don't know if this approach is more powerful or not than grouping health indicators, but the implementation of the new alive @Endpoint was dead simple, and it gave us what we needed.
A benefit that i can think of is that with this approach a single health indicator can decide whether the incident requires a stop/restart in k8s, or just un-available for usage. So for the above mentioned example of cache-loading, our rule is if the cache is not loaded return OUT_OF_SERVICE and do not serve traffic, if the cache loading fails return DOWN, restart the container, and try again.

Actually this approach, can work well in conjunction with groups, all i am saying is that although the "custom" endpoint was easy to write, i would like to see something similar in spring-boot :)

Or it can also replace the grouping approach, as instead of configuring groups, we would then just configure the statuses returned from each health indicator, for example it would be quite easy to control with a property or code, that jmsHealthIndicator should return OUT_OF_SERVICE instead of DOWN

this way, we can finally give a semantic on the OUT_OF_SERVICE AND DOWN, besides the wording used.

I am happy to submit a pull request, if this makes sense and covers what is needed.

As a side note, there are some cases maybe, where some people would like a health indicator not to participate in neither liveness or readiness results, and just use if for alerting purposes, for those cases we could just use a custom health status called WARN that results in 200 http status both for health, and ready endpoints.. but i don't know if that makes sense to more people.

To finish my long post, i feel that the best approach would be for spring boot to support a cascading set of endpoints such as

/actuator/health as it works currently
/actuator/health/alive that cares only if at least a DOWN status exists
/actuator/health/ready that cares only if at least a OUT_OF_SERVICE status exists

and a way to configure the failure status of an indicator

some other benefits i can think from this approach are:

the codebase required is much smaller than the submitted pull request
having a fixed set of endpoints (instead of /actuator/health/groupA, which is app specific) is much much easier to configure downstream tools such as load balancers, client side discovery registries etc
eliminates issues where an indicator does not participate in any group

wilkinsona · 2019-07-16T15:43:46Z

Thanks for sharing your thoughts, @ckoutsouridis.

We discussed this last week and our feeling was that custom status mapping was likely to be quite specific to your usage and not something that we feel is generally applicable. As you may already know, you can perform custom mappings with HealthAggregator and HealthStatusHttpMapper.

having a fixed set of endpoints (instead of /actuator/health/groupA, which is app specific) is much much easier to configure downstream tools such as load balancers, client side discovery registries etc

Liveness and readiness checks are just one use case that we're hoping to cover by being able to group health indicators. We'd like to provide something that's sufficiently flexible to support liveness and readiness checks with Kubernetes as well as more general grouping.

eliminates issues where an indicator does not participate in any group

If we go with something that's similar to what we currently have in mind, this won't be an issue. Every indicators will be included in /actuator/health which can be thought of as an implicit group that includes every health indicator.

ckoutsouridis · 2019-07-16T16:18:07Z

thank you @wilkinsona for the reply. I wasn't referring so much to custom statuses, as we rarely use them as well.

I was mostly referring to the fact that at the moment OUT_OF_SERVICE and DOWN health statuses have no real difference, and it would be quite powerful if spring-boot offered an endpoint that can distinguish those. and then every user/developer can pick what status each indicator should return upon failure (or leave the default down).

But yes, this suggestion has nothing to do with grouping them.

This commit allows a user to define an arbitrary number of health indicator groups using configuration. If a given health indicator is defined in more than one group, a single invocation is ensured using a AggregateHealth. A reactive counter-part is also provided. See spring-projectsgh-14022

Migrate all `HealthIndicator` configuration to `HealthContributor` configurations instead. See spring-projectsgh-14022

Update the `HealthEndpoint` to support health groups. The `HealthEndpointSettings` interface has been replaced with `HealthEndpointGroups` which provides access to the primary group as well as an optional set of additional groups. Groups can be configured via properties and may have custom `StatusAggregator` and `HttpCodeStatusMapper` settings. Closes spring-projectsgh-14022 Co-authored-by: Stephane Nicoll <snicoll@pivotal.io>

Migrate all `HealthIndicator` configuration to `HealthContributor` configurations instead. See gh-14022

Migrate all `HealthIndicator` configuration to `HealthContributor` configurations instead. See spring-projectsgh-14022

Update the `HealthEndpoint` to support health groups. The `HealthEndpointSettings` interface has been replaced with `HealthEndpointGroups` which provides access to the primary group as well as an optional set of additional groups. Groups can be configured via properties and may have custom `StatusAggregator` and `HttpCodeStatusMapper` settings. Closes spring-projectsgh-14022 Co-authored-by: Stephane Nicoll <snicoll@pivotal.io>

wilkinsona · 2019-08-22T08:14:55Z

We need to update the Actuator API documentation to reflect the new functionality. Currently, it mentions components and component instances but makes no mention of groups. We also need to figure out how to document the arbitrarily deep path that's now supported.

ngaya-ll changed the title ~~Add a mechanism to request different subsets of health indicators~~ Mechanism to request different subsets of health indicators Aug 8, 2018

ngaya-ll mentioned this issue Aug 8, 2018

Conditional Execution of Health Checks #3441

Closed

spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Aug 8, 2018

wilkinsona added type: enhancement A general enhancement and removed status: waiting-for-triage An issue we've not yet triaged labels Aug 9, 2018

philwebb added this to the General Backlog milestone Sep 11, 2018

philwebb mentioned this issue Oct 1, 2018

Exclude Beans from the Health Check #14639

Closed

philwebb mentioned this issue Oct 21, 2018

Add Kafka Health Indicator #14088

Closed

philwebb added the for: team-meeting label Dec 17, 2018

philwebb mentioned this issue Dec 17, 2018

Fine tuning of Health status aggregation #15479

Closed

philwebb added the status: pending-design-work Needs design work before any code can be developed label Dec 17, 2018

wilkinsona mentioned this issue Jan 15, 2019

Provide a way to flag DataSource beans as essential or non-essential for the dbHealthIndicator #6807

Closed

bclozel removed the for: team-meeting label Jan 16, 2019

bclozel modified the milestones: General Backlog, 2.2.x Jan 16, 2019

This was referenced Mar 19, 2019

Support resolve health groups #16252

Closed

Introduce Sticky health indicator #16275

Closed

snicoll self-assigned this Apr 27, 2019

snicoll added a commit to snicoll/spring-boot that referenced this issue Apr 28, 2019

Hacking

a9a57e5

See spring-projectsgh-14022

snicoll assigned wilkinsona May 2, 2019

snicoll added for: team-attention An issue we'd like other members of the team to review and removed for: team-attention An issue we'd like other members of the team to review labels Jul 9, 2019

philwebb mentioned this issue Jul 31, 2019

Allow actuator endpoint selectors to capture all paths #17743

Closed

philwebb changed the title ~~Mechanism to request different subsets of health indicators~~ Add support for health indicator groups Aug 21, 2019

philwebb mentioned this issue Aug 21, 2019

Configure ping (formerly application) health indicator by default #17926

Closed

philwebb added a commit to philwebb/spring-boot that referenced this issue Aug 21, 2019

Migrate HealthIndicator auto-configurations

74bd976

Migrate all `HealthIndicator` configuration to `HealthContributor` configurations instead. See spring-projectsgh-14022

philwebb added the for: team-attention An issue we'd like other members of the team to review label Aug 21, 2019

This was referenced Aug 21, 2019

Migrate from deprecated CompositeHealthIndicator spring-cloud/spring-cloud-commons#592

Merged

Migrate from deprecated CompositeHealthIndicator spring-cloud/spring-cloud-stream#1788

Closed

philwebb modified the milestones: 2.2.x, 2.2.0.M6 Aug 22, 2019

philwebb added a commit that referenced this issue Aug 22, 2019

Migrate HealthIndicator auto-configurations

f09e026

Migrate all `HealthIndicator` configuration to `HealthContributor` configurations instead. See gh-14022

philwebb closed this as completed in e03f822 Aug 22, 2019

pull bot pushed a commit to scope-demo/spring-boot that referenced this issue Aug 22, 2019

Migrate HealthIndicator auto-configurations

6053bb2

Migrate all `HealthIndicator` configuration to `HealthContributor` configurations instead. See spring-projectsgh-14022

wilkinsona removed the for: team-attention An issue we'd like other members of the team to review label Aug 22, 2019

wilkinsona reopened this Aug 22, 2019

ST-DDT mentioned this issue Aug 22, 2019

Expose service availability to actuator grpc-ecosystem/grpc-spring#110

Closed

snicoll assigned philwebb Aug 30, 2019

wilkinsona closed this as completed in 674d3dd Sep 9, 2019

ckoutsouridis mentioned this issue Oct 27, 2019

Better controls for Health Indicators #18753

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for health indicator groups #14022

Add support for health indicator groups #14022

ngaya-ll commented Aug 8, 2018

eddumelendez commented Aug 9, 2018

wilkinsona commented Aug 10, 2018 •

edited

Loading

eddumelendez commented Aug 10, 2018

mbhave commented Mar 15, 2019

eddumelendez commented Mar 15, 2019

mbhave commented Mar 15, 2019

eddumelendez commented Mar 18, 2019

snicoll commented Apr 27, 2019

snicoll commented Apr 28, 2019

wilkinsona commented May 1, 2019

snicoll commented Jun 21, 2019

ckoutsouridis commented Jun 29, 2019 •

edited

Loading

wilkinsona commented Jul 16, 2019

ckoutsouridis commented Jul 16, 2019

wilkinsona commented Aug 22, 2019

Add support for health indicator groups #14022

Add support for health indicator groups #14022

Comments

ngaya-ll commented Aug 8, 2018

eddumelendez commented Aug 9, 2018

wilkinsona commented Aug 10, 2018 • edited Loading

eddumelendez commented Aug 10, 2018

mbhave commented Mar 15, 2019

eddumelendez commented Mar 15, 2019

mbhave commented Mar 15, 2019

eddumelendez commented Mar 18, 2019

snicoll commented Apr 27, 2019

snicoll commented Apr 28, 2019

wilkinsona commented May 1, 2019

snicoll commented Jun 21, 2019

ckoutsouridis commented Jun 29, 2019 • edited Loading

wilkinsona commented Jul 16, 2019

ckoutsouridis commented Jul 16, 2019

wilkinsona commented Aug 22, 2019

wilkinsona commented Aug 10, 2018 •

edited

Loading

ckoutsouridis commented Jun 29, 2019 •

edited

Loading