SLO-based Health meter registry #2055

Closed
jkschneider opened this issue May 4, 2020 · 0 comments
Labels
enhancement A general enhancement spring-boot change Change is needed in Spring Boot for this issue
jkschneider commented May 4, 2020

This feature adds commonly requested functionality: letting an application aggregate a set of key performance indicators down to a health indicator. In many cases, users want a load balancer to avoid instances that are failing a key performance indicator, which they do by configuring an HTTP health check on the load balancer.

It introduces a new registry implementation, micrometer-registry-health, with a HealthMeterRegistry that accepts programmatically defined service level objectives (SLOs) and can be polled for health at any time.

HealthMeterRegistry does not publish metrics anywhere or expose them as an endpoint by default. The list of SLOs returned by HealthMeterRegistry#getServiceLevelObjectives() is meant to be iterated over and bound to a health endpoint chosen by the user or web framework.

Building a HealthMeterRegistry

Use the registry builder to define service level objectives:

HealthMeterRegistry registry = HealthMeterRegistry.builder(HealthConfig.DEFAULT)
        .serviceLevelObjectives(slos)
        .build();

Predefined SLOs suitable for most Java apps

Micrometer provides a set of SLOs out of the box that should be generally applicable to a broad spectrum of Java applications.

HealthMeterRegistry registry = HealthMeterRegistry.builder(HealthConfig.DEFAULT)
        .serviceLevelObjectives(JvmServiceLevelObjectives.MEMORY)
        .serviceLevelObjectives(JvmServiceLevelObjectives.ALLOCATIONS)
        .serviceLevelObjectives(OperatingSystemServiceLevelObjectives.DISK)
        .build();

Building single indicator SLOs

Service level objectives are built from one or more meter Search criteria, aggregated together in some way. The example below builds an SLO named jvm.pool.memory that requires JvmHeapPressureMetrics to work. When polled, it inspects the value of jvm.memory.usage.after.gc and checks that it is less than 0.9.

ServiceLevelObjective
        .build("jvm.pool.memory")
        .failedMessage("Memory usage in a single memory pool exceeds 90% after garbage collection.")
        .requires(new JvmHeapPressureMetrics())
        .baseUnit("percent used")
        .value(s -> s.name("jvm.memory.usage.after.gc"))
        .isLessThan(0.9)

Arithmetic operations

Arithmetic operations are available, e.g. for ratios:

ServiceLevelObjective
        .build("jvm.memory.consumption")
        .value(s -> s.name("jvm.memory.used"))
        .dividedBy(denom -> denom.value(s -> s.name("jvm.memory.committed")))
        .isLessThan(0.9)
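
To make the ratio semantics concrete, here is a minimal plain-Java sketch of a ratio that is re-evaluated on every poll. It does not use the Micrometer API: RatioSketch, dividedBy, and isLessThan are hypothetical names for illustration, and treating a zero denominator as NaN (and therefore unhealthy) is an assumption, not documented registry behavior.

```java
import java.util.function.DoubleSupplier;

/** Hypothetical stand-in for a ratio-style indicator; not part of Micrometer. */
final class RatioSketch {
    private RatioSketch() {
    }

    /** Builds a supplier that divides the current numerator by the current denominator. */
    static DoubleSupplier dividedBy(DoubleSupplier numerator, DoubleSupplier denominator) {
        return () -> {
            double den = denominator.getAsDouble();
            // Assumption: an undefined ratio is reported as NaN rather than infinity.
            return den == 0.0 ? Double.NaN : numerator.getAsDouble() / den;
        };
    }

    /** Threshold test evaluated at poll time; NaN compares false, so it counts as unhealthy. */
    static boolean isLessThan(DoubleSupplier value, double threshold) {
        return value.getAsDouble() < threshold;
    }
}
```

Because both operands are suppliers, the ratio reflects whatever the underlying values are at the moment of the poll rather than at build time.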

Looking back over an interval

We can aggregate a value over an interval using maxOver, sumOver, and averageOver.

ServiceLevelObjective
        .build("jvm.memory.consumption")
        .value(s -> s.name("jvm.memory.used"))
        .dividedBy(denom -> denom.value(s -> s.name("jvm.memory.committed")))
        .maxOver(Duration.ofMinutes(5))
        .isLessThan(0.9)
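
As a rough illustration of what a look-back aggregation such as maxOver involves, the following plain-Java sketch keeps timestamped samples and reports the maximum over a trailing interval. MaxOverWindow is a hypothetical class written for this example; how the registry actually samples and retains values is not described here, so treat the eviction strategy as an assumption.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical helper: maximum of the samples recorded within a trailing interval. */
final class MaxOverWindow {
    private static final class Sample {
        final Instant time;
        final double value;

        Sample(Instant time, double value) {
            this.time = time;
            this.value = value;
        }
    }

    private final Duration interval;
    private final Deque<Sample> samples = new ArrayDeque<>();

    MaxOverWindow(Duration interval) {
        this.interval = interval;
    }

    /** Records a sample, then evicts samples older than the interval relative to this one. */
    void record(Instant now, double value) {
        samples.addLast(new Sample(now, value));
        Instant cutoff = now.minus(interval);
        while (!samples.isEmpty() && samples.peekFirst().time.isBefore(cutoff)) {
            samples.removeFirst();
        }
    }

    /** Returns the maximum retained value, or NaN when nothing has been recorded. */
    double max() {
        double max = Double.NaN;
        for (Sample s : samples) {
            if (Double.isNaN(max) || s.value > max) {
                max = s.value;
            }
        }
        return max;
    }
}
```

A single spike stays visible in max() until it ages out of the window, which is why a max-based objective is slower to recover than an instantaneous one.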

Threshold tests

Simple tests are provided out of the box: isLessThan, isGreaterThan, isLessThanOrEqualTo, isGreaterThanOrEqualTo, and isEqualTo. Two other methods are available for more complex user-defined thresholds:

  • test(String thresholdDescription, Predicate<Double> threshold)
  • testDuration(String thresholdDescription, Predicate<Duration> threshold)
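
The predicates these two methods accept are ordinary java.util.function.Predicate instances. The sketch below shows two that could plausibly be supplied; ThresholdExamples, within, and shorterThan are hypothetical helpers invented for this example and are not part of the proposal.

```java
import java.time.Duration;
import java.util.function.Predicate;

/** Hypothetical predicate factories for user-defined thresholds. */
final class ThresholdExamples {
    private ThresholdExamples() {
    }

    /** Passes only when the value lies in the closed range [low, high]. */
    static Predicate<Double> within(double low, double high) {
        return v -> v >= low && v <= high;
    }

    /** Passes only when the duration is strictly shorter than the limit. */
    static Predicate<Duration> shorterThan(Duration limit) {
        return d -> d.compareTo(limit) < 0;
    }
}
```

Given the signatures listed above, a call might then read .test("between 10% and 90%", ThresholdExamples.within(0.1, 0.9)) or .testDuration("under 200 ms", ThresholdExamples.shorterThan(Duration.ofMillis(200))).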

Building multi-indicator SLOs

Sometimes we want to test multiple indicators that together indicate a problem. These can be combined with the and() and or() operators, or with a user-provided combine(BinaryOperator<Boolean>).

ServiceLevelObjective
        .compose(
                "jvm.total.memory",
                ServiceLevelObjective
                        .build("jvm.gc.overhead")
                        .value(s -> s.name("jvm.gc.overhead"))
                        .isLessThan(0.2),
                ServiceLevelObjective
                        .build("jvm.memory.consumption")
                        .value(s -> s.name("jvm.memory.used"))
                        .dividedBy(denom -> denom.value(s -> s.name("jvm.memory.committed")))
                        .maxOver(Duration.ofMinutes(5))
                        .isLessThan(0.9)
        )
        .failedMessage("A message describing the combination of these two SLOs.")
        .and()
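
For the user-provided case, a BinaryOperator<Boolean> is simply a function from two health results to one. The plain-Java sketch below folds a list of component results with such an operator; CombineSketch and its members are hypothetical and only illustrate the idea, not how the registry evaluates composite SLOs.

```java
import java.util.List;
import java.util.function.BinaryOperator;

/** Hypothetical illustration of combining component health results. */
final class CombineSketch {
    /** Healthy only when both operands are healthy. */
    static final BinaryOperator<Boolean> AND = (a, b) -> a && b;

    /** Healthy when at least one operand is healthy. */
    static final BinaryOperator<Boolean> OR = (a, b) -> a || b;

    private CombineSketch() {
    }

    /** Folds the results left to right; rejects an empty list rather than guessing a default. */
    static boolean combine(List<Boolean> results, BinaryOperator<Boolean> op) {
        return results.stream()
                .reduce(op)
                .orElseThrow(() -> new IllegalArgumentException("no objectives to combine"));
    }
}
```

A custom operator such as (a, b) -> a ^ b fits the same shape, which is the flexibility a combine(...) hook offers beyond the two built-in operators.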

Testing SLOs

At a high level, we can iterate over service level objectives and test each for its healthy state. Each SLO has a getName(), getTags(), getFailedMessage(), etc. In the case of composite SLOs, there is also a getObjectives() method that lets you drill down on component SLOs. This data can be used to provide rich error messages in framework-provided health endpoints.

registry.getServiceLevelObjectives().stream().filter(slo -> !slo.healthy(registry)).findAny()
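
The same idea can be sketched without the registry: given named health checks, collect the failing ones and derive an overall status. HealthReportSketch is a hypothetical stand-in that models each SLO as a name plus a BooleanSupplier, and the "UP"/"DOWN" strings are an assumed convention for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

/** Hypothetical aggregation of named health checks into an overall status. */
final class HealthReportSketch {
    private HealthReportSketch() {
    }

    /** Returns the names of checks that currently report unhealthy, in the map's iteration order. */
    static List<String> failing(Map<String, BooleanSupplier> checks) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, BooleanSupplier> entry : checks.entrySet()) {
            if (!entry.getValue().getAsBoolean()) {
                names.add(entry.getKey());
            }
        }
        return names;
    }

    /** "UP" when no check is failing, otherwise "DOWN". */
    static String status(Map<String, BooleanSupplier> checks) {
        return failing(checks).isEmpty() ? "UP" : "DOWN";
    }
}
```

Returning the failing names rather than a bare boolean is what lets a health endpoint explain why an instance is down.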

Example of mapping SLOs to Spring Boot HealthIndicator

[Image: SLOs mapped to a Spring Boot HealthIndicator]

Controlling SLOs with MeterFilter

HealthMeterRegistry.Builder can take in MeterFilter instances to deny specific SLOs from being applied or to change their name, tags, base unit, and description.

HealthMeterRegistry registry = HealthMeterRegistry.builder(HealthConfig.DEFAULT)
        .serviceLevelObjectives(JvmServiceLevelObjectives.MEMORY)
        .serviceLevelObjectiveFilter(MeterFilter.denyNameStartsWith("jvm.pool"))
        .serviceLevelObjectiveFilter(new MeterFilter() {
            @Override
            public Meter.Id map(Meter.Id id) {
                return id.getName().equals("jvm.gc.load") ? id.withName("jvm.collection.load") : id;
            }
        })
        .build();
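
The deny-then-map behavior can be pictured with a plain-Java sketch over SLO names alone. NameFilterSketch is hypothetical and deliberately ignores tags, base units, and descriptions; the order shown (deny first, then rename the survivors) is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

/** Hypothetical name-only model of denying and renaming SLOs. */
final class NameFilterSketch {
    private NameFilterSketch() {
    }

    /** Drops names matched by the deny predicate, then applies the rename function to the rest. */
    static List<String> apply(List<String> names, Predicate<String> deny, UnaryOperator<String> rename) {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            if (deny.test(name)) {
                continue; // denied SLOs are never applied
            }
            result.add(rename.apply(name));
        }
        return result;
    }
}
```

With the names jvm.pool.memory and jvm.gc.load, a deny rule on the prefix jvm.pool and a rename of jvm.gc.load to jvm.collection.load, only jvm.collection.load remains.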
@jkschneider jkschneider added the enhancement A general enhancement label May 4, 2020
@jkschneider jkschneider added this to the 1.6.0 milestone May 4, 2020
@jkschneider jkschneider added the spring-boot change Change is needed in Spring Boot for this issue label May 4, 2020
@shakuzen shakuzen changed the title Service level objective meter registry SLO-based Health meter registry Oct 29, 2020