Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for health indicator groups #14022

Closed
ngaya-ll opened this issue Aug 8, 2018 · 16 comments
Closed

Add support for health indicator groups #14022

ngaya-ll opened this issue Aug 8, 2018 · 16 comments
Assignees
Labels
type: enhancement A general enhancement
Milestone

Comments

@ngaya-ll
Copy link

ngaya-ll commented Aug 8, 2018

My organization uses Spring Boot services running in a Kubernetes environment. Kubernetes has a concept of two types of health probes: liveness and readiness probes. Liveness probes are used to detect if the service needs to be restarted, while readiness probes detect if the service is ready to be added to load balancers.

It would be great if there were a way to perform different checks for readiness and liveness. For example, if an application temporarily loses connectivity to its data source it should be taken out of load balancing, but doesn't need to be restarted. So we would like to include the DataSourceHealthIndicator in the readiness probe but not the liveness probe.

Note that #8685 already allows querying a single health indicator, but not multiple indicators. Perhaps Spring Boot could provide a way to configure named groups of indicators that could be accessed as /actuator/health/{group}.

@ngaya-ll ngaya-ll changed the title Add a mechanism to request different subsets of health indicators Mechanism to request different subsets of health indicators Aug 8, 2018
@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Aug 8, 2018
@wilkinsona wilkinsona added type: enhancement A general enhancement and removed status: waiting-for-triage An issue we've not yet triaged labels Aug 9, 2018
@eddumelendez
Copy link
Contributor

I am working in this issue. I will be sending a PR this weekend.

@wilkinsona
Copy link
Member

wilkinsona commented Aug 10, 2018

Thank you, @eddumelendez. We had a few thoughts when we discussed this earlier this week. They are only initial thoughts so they aren't prescriptive, but hopefully they'll be of some use.

We envisaged a property-based mechanism for creating a group of health indicators. In application.properties, the value of the property would probably be a comma-separated list of the ids of the health indicators that are to be included in the group. The key would be the name of the group.

We then thought that CompositeHealthIndicator could be reused and created with all the health indicators in the group. One gotcha that @philwebb identified is that these composite health indicators that represent a group will need to be hidden from the main health check otherwise any grouped indicators would be called twice for a single check.

@eddumelendez
Copy link
Contributor

Thanks for letting me know @wilkinsona I'll be working on this taking into account these thoughts.

@mbhave
Copy link
Contributor

mbhave commented Mar 15, 2019

@eddumelendez Are you still interested in working on this? If not, we'll go ahead and pick it up.

@eddumelendez
Copy link
Contributor

@mbhave I have an initial draft which I can share it later today.

@mbhave
Copy link
Contributor

mbhave commented Mar 15, 2019

great, thanks!

@eddumelendez
Copy link
Contributor

sorry for the delay. I have submitted #16252

@snicoll snicoll self-assigned this Apr 27, 2019
@snicoll
Copy link
Member

snicoll commented Apr 27, 2019

I gave this some thoughts and I am wondering if we shouldn't have a concept of "out-of-the-box groups" the same way we have logging groups. The two obvious ones are those defined in this very request.

If there is an obvious default for a given indicator, having a way to indicate a preference would give a better out-of-the-box experience for users compared to having them configure the list of indicators in every app.

Another argument is that, in this scenario, if an indicator is not specified, it will not be invoked. When new indicators are added, there's no hint and it could easily be missed if the application is only relying on one group or another.

Of course, we can start with a configurable map for a start and see how we can improve the manual solution.

snicoll added a commit to snicoll/spring-boot that referenced this issue Apr 28, 2019
@snicoll
Copy link
Member

snicoll commented Apr 28, 2019

I've started to hack on a prototype where the following configuration is applied (random indicators based on spring-boot-sample-actuator):

management.health.groups.ready=diskSpace
management.health.groups.live=example,hello,db,unknown

(Notice the unknown indicator to represent the fact a reference to an unavailable indicator). Starting the sample based on that branch gives the following for the ready group:

// http://localhost:8080/actuator/health/ready

{
  "status": "UP",
  "details": {
    "diskSpace": {
      "status": "UP",
      "details": {
        "total": 499963170816,
        "free": 271707451392,
        "threshold": 10485760
      }
    }
  }
}

The live group looks as follows:

// http://localhost:8080/actuator/health/live

{
  "status": "UP",
  "details": {
    "example": {
      "status": "UP",
      "details": {
        "counter": 42
      }
    },
    "hello": {
      "status": "UP",
      "details": {
        "hello": "world"
      }
    },
    "db": {
      "status": "UP",
      "details": {
        "database": "H2",
        "result": 1,
        "validationQuery": "SELECT 1"
      }
    }
  }
}

The main endpoint does not contain any reference to live or ready as groups are automaticallly filtered out by the registry implementation.

This spike is quite limited but brings several interesting questions:

  • A health indicator cannot have the same name as a group. If one attempts to create a group with the name of an indicator, it (should) fail as the current idea is to query /health/{name} for both groups and single indicators.
  • What should happen if a config refers to an unknown indicator? Right now it is ignored
  • What should happen if a group is empty? Use case: all its indicators are not available or the group does not defined any indicator. Right now it returns UNKNOWN
  • The current implementation is quite a hack as it relies on the fact getAll() does not return the groups. The javadoc, however, mentions that indicators should be returned. This turns the groups into an implementation detail so perhaps we'd like to give them a bit more visibility
  • Indicators have to be listed one by one. This is tedious and error prone if new indicators are added along the way
  • Given how the groups are configured, groups is a reserved word, i.e. you cannot name an indicator that way.

@wilkinsona
Copy link
Member

A health indicator cannot have the same name as a group. If one attempts to create a group with the name of an indicator, it (should) fail as the current idea is to query /health/{name} for both groups and single indicators.

+1 for failing

What should happen if a config refers to an unknown indicator? Right now it is ignored

Accidental absence of an indicator from a group could lead to traffic being routed to an instance when it's unable to handle it. As such, I think the implementation should try to help users to avoid that mistake. I think we should fail if a group is configured to include an unknown indicator. We could consider a configuration option that changes this behaviour from fail to ignore.

What should happen if a group is empty? Use case: all its indicators are not available or the group does not defined any indicator. Right now it returns UNKNOWN

I think unknown indicators should result in failure (see above). I think a group that's genuinely empty (rather than only containing indicators that do no exist) should return UNKNOWN as the main endpoint does today if there are no indicators.

The current implementation is quite a hack as it relies on the fact getAll() does not return the groups. The javadoc, however, mentions that indicators should be returned. This turns the groups into an implementation detail so perhaps we'd like to give them a bit more visibility

I share you're feeling that this isn't quite right, but I'm not sure I have a better proposal right now. The two marker interfaces for the group indicators feel like a bit of a smell, but I'm not sure how else we can distinguish between a group indicator and a standard indicator.

Indicators have to be listed one by one. This is tedious and error prone if new indicators are added along the way

I wonder if we need some sort of wildcard or pattern matching support. I'd be tempted to ship a first iteration without that and gather some feedback before we do anything more sophisticated.

Given how the groups are configured, groups is a reserved word, i.e. you cannot name an indicator that way.

While this would be a breaking change, I think it's unlikely that anyone has implemented their own indicator named groups. We should avoid the potential problem by allowing the group support to be switched off if needs be.

snicoll added a commit to snicoll/spring-boot that referenced this issue May 13, 2019
This commit allows a user to define an arbitrary number of health
indicator groups using configuration. If a given health indicator is
defined in more than one group, a single invocation is ensured using
a AggregateHealth. A reactive counter-part is also provided.

See spring-projectsgh-14022
snicoll added a commit to snicoll/spring-boot that referenced this issue Jun 21, 2019
This commit allows a user to define an arbitrary number of health
indicator groups using configuration. If a given health indicator is
defined in more than one group, a single invocation is ensured using
a AggregateHealth. A reactive counter-part is also provided.

See spring-projectsgh-14022
@snicoll
Copy link
Member

snicoll commented Jun 21, 2019

Team, I've rebased my proposal in snicoll@43144b6 - if someone has some cycles to review it, I'd appreciate. I've been back and forth on several strategies to identify that an indicator is a group and hope a review would ameliorate the proposal. It is also very basic at the moment where groups are declared in a static fashion via configuration.

@ckoutsouridis
Copy link

ckoutsouridis commented Jun 29, 2019

This is a very interesting topic, and in our organisation we required a separation of ready, and alive endpoints as others mentioned in the above.

But we went in a bit different direction..we introduced a new @Endpoint called alive. This endpoint was calling the actual Health. If the health is different than DOWN, it was returning 200 status code.
For example we had the bellow mapping of health status and http response statuses

Health /actuator/alive http status code
UP 200
OUT_OF_SERVICE 200
CUSTOM_HEALTH_STATUS 200
DOWN 503

and we mapped k8s endpoints such as:

  • livenessProbe ->/actuator/alive
  • readinessProbe ->/actuator/health

Even the wording of the health statuses is kind of similar to the wording of kubernetes but a bit inverted,

  • livenessProbe -> NOT DOWN
  • readinessProbe -> NOT OUT_OF_SERVICE

All core health indicators already return DOWN, e.g. datasource, jms etc when the resource is not available *. which covers our needs
This allowed us to write our own health indicators which where returning OUT_OF_SERVICE (an example is long cache loading operations).

I understand that this is completely different approach from what is requested, but i think it delivers what was requested + a couple of opportunities that might be useful.

For example, some health indicators might get written in a smarter way, to distinguish whether the app should stop running, or it should just stop serving requests. one example, although probably difficult to implement, is to distinguish transient errors on datasourceHealthIndicator and return OUT_OF_SERVICE from other errors, and return DOWN.

I don't know if this approach is more powerful or not than grouping health indicators, but the implementation of the new alive @Endpoint was dead simple, and it gave us what we needed.
A benefit that i can think of is that with this approach a single health indicator can decide whether the incident requires a stop/restart in k8s, or just un-available for usage. So for the above mentioned example of cache-loading, our rule is if the cache is not loaded return OUT_OF_SERVICE and do not serve traffic, if the cache loading fails return DOWN, restart the container, and try again.

Actually this approach, can work well in conjunction with groups, all i am saying is that although the "custom" endpoint was easy to write, i would like to see something similar in spring-boot :)

Or it can also replace the grouping approach, as instead of configuring groups, we would then just configure the statuses returned from each health indicator, for example it would be quite easy to control with a property or code, that jmsHealthIndicator should return OUT_OF_SERVICE instead of DOWN

this way, we can finally give a semantic on the OUT_OF_SERVICE AND DOWN, besides the wording used.

I am happy to submit a pull request, if this makes sense and covers what is needed.

As a side note, there are some cases maybe, where some people would like a health indicator not to participate in neither liveness or readiness results, and just use if for alerting purposes, for those cases we could just use a custom health status called WARN that results in 200 http status both for health, and ready endpoints.. but i don't know if that makes sense to more people.

To finish my long post, i feel that the best approach would be for spring boot to support a cascading set of endpoints such as

  • /actuator/health as it works currently
  • /actuator/health/alive that cares only if at least a DOWN status exists
  • /actuator/health/ready that cares only if at least a OUT_OF_SERVICE status exists

and a way to configure the failure status of an indicator

some other benefits i can think from this approach are:

  • the codebase required is much smaller than the submitted pull request
  • having a fixed set of endpoints (instead of /actuator/health/groupA, which is app specific) is much much easier to configure downstream tools such as load balancers, client side discovery registries etc
  • eliminates issues where an indicator does not participate in any group

@snicoll snicoll added for: team-attention An issue we'd like other members of the team to review and removed for: team-attention An issue we'd like other members of the team to review labels Jul 9, 2019
@wilkinsona
Copy link
Member

Thanks for sharing your thoughts, @ckoutsouridis.

We discussed this last week and our feeling was that custom status mapping was likely to be quite specific to your usage and not something that we feel is generally applicable. As you may already know, you can perform custom mappings with HealthAggregator and HealthStatusHttpMapper.

having a fixed set of endpoints (instead of /actuator/health/groupA, which is app specific) is much much easier to configure downstream tools such as load balancers, client side discovery registries etc

Liveness and readiness checks are just one use case that we're hoping to cover by being able to group health indicators. We'd like to provide something that's sufficiently flexible to support liveness and readiness checks with Kubernetes as well as more general grouping.

eliminates issues where an indicator does not participate in any group

If we go with something that's similar to what we currently have in mind, this won't be an issue. Every indicators will be included in /actuator/health which can be thought of as an implicit group that includes every health indicator.

@ckoutsouridis
Copy link

thank you @wilkinsona for the reply. I wasn't referring so much to custom statuses, as we rarely use them as well.

I was mostly referring to the fact that at the moment OUT_OF_SERVICE and DOWN health statuses have no real difference, and it would be quite powerful if spring-boot offered an endpoint that can distinguish those. and then every user/developer can pick what status each indicator should return upon failure (or leave the default down).

But yes, this suggestion has nothing to do with grouping them.

philwebb pushed a commit to philwebb/spring-boot that referenced this issue Aug 7, 2019
This commit allows a user to define an arbitrary number of health
indicator groups using configuration. If a given health indicator is
defined in more than one group, a single invocation is ensured using
a AggregateHealth. A reactive counter-part is also provided.

See spring-projectsgh-14022
@philwebb philwebb changed the title Mechanism to request different subsets of health indicators Add support for health indicator groups Aug 21, 2019
philwebb added a commit to philwebb/spring-boot that referenced this issue Aug 21, 2019
Migrate all `HealthIndicator` configuration to `HealthContributor`
configurations instead.

See spring-projectsgh-14022
philwebb added a commit to philwebb/spring-boot that referenced this issue Aug 21, 2019
Update the `HealthEndpoint` to support health groups. The
`HealthEndpointSettings` interface has been replaced with
`HealthEndpointGroups` which provides access to the primary group
as well as an optional set of additional groups.

Groups can be configured via properties and may have custom
`StatusAggregator` and `HttpCodeStatusMapper` settings.

Closes spring-projectsgh-14022

Co-authored-by: Stephane Nicoll <snicoll@pivotal.io>
@philwebb philwebb added the for: team-attention An issue we'd like other members of the team to review label Aug 21, 2019
@philwebb philwebb modified the milestones: 2.2.x, 2.2.0.M6 Aug 22, 2019
philwebb added a commit that referenced this issue Aug 22, 2019
Migrate all `HealthIndicator` configuration to `HealthContributor`
configurations instead.

See gh-14022
pull bot pushed a commit to scope-demo/spring-boot that referenced this issue Aug 22, 2019
Migrate all `HealthIndicator` configuration to `HealthContributor`
configurations instead.

See spring-projectsgh-14022
pull bot pushed a commit to scope-demo/spring-boot that referenced this issue Aug 22, 2019
Update the `HealthEndpoint` to support health groups. The
`HealthEndpointSettings` interface has been replaced with
`HealthEndpointGroups` which provides access to the primary group
as well as an optional set of additional groups.

Groups can be configured via properties and may have custom
`StatusAggregator` and `HttpCodeStatusMapper` settings.

Closes spring-projectsgh-14022

Co-authored-by: Stephane Nicoll <snicoll@pivotal.io>
@wilkinsona wilkinsona removed the for: team-attention An issue we'd like other members of the team to review label Aug 22, 2019
@wilkinsona
Copy link
Member

We need to update the Actuator API documentation to reflect the new functionality. Currently, it mentions components and component instances but makes no mention of groups. We also need to figure out how to document the arbitrarily deep path that's now supported.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: enhancement A general enhancement
Projects
None yet
Development

No branches or pull requests

9 participants