Skip to content
microprofile fault tolerance
Branch: master
Clone or download
Pull request Compare This branch is 343 commits behind eclipse:master.
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Type Name Latest commit message Commit time
Failed to load latest commit information.

// // Copyright (c) 2016-2017 Contributors to the Eclipse Foundation // // See the NOTICE file(s) distributed with this work for additional // information regarding copyright ownership. // // Licensed under the Apache License, Version 2.0 (the "License"); // You may not use this file except in compliance with the License. // You may obtain a copy of the License at // // // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // See the License for the specific language governing permissions and // limitations under the License. // Join the chat at

Fault Tolerance

During the review process, add the following fields as needed:


It is increasingly important to build fault tolerant micro services. Fault tolerance is about leveraging different strategies to guide the execution and result of some logic. Retry policies, bulkheads, and circuit breakers are popular concepts in this area. They dictate whether and when executions should take place, and fallbacks offer an alternative result when an execution does not complete successfully.

As mentioned above, the Fault Tolerance proposal is to focus the aspects: TimeOut, RetryPolicy, Fallback, bulkhead and circuit breaker.

  • TimeOut: Define a duration for timeout
  • RetryPolicy: Define a criteria on when to retry
  • Fallback: provide an alternative solution for a failed execution.
  • Bulkhead: isolate failures in part of the system while the rest part of the system can still function.
  • CircuitBreaker: offer a way of fail fast by automatically failing execution to prevent the system overloading and indefinite wait or timeout by the clients.

The main design is to separate execution logic from execution. The execution can be configured with fault tolerance policies, such as RetryPolicy, fallback, Bulkhead and CircuitBreaker.

Hystrix and Failsafe are two popular libraries for handling failures. This proposal is to define a standard API and approach for applications to follow in order to achieve the fault tolerance.

The requirements are as follows:

  • Loose coupling: Execution logic should not know anything about the execution status or fault tolerance.
  • Failure handling strategy should be configured when the execution takes place.
  • Support for synchronous and asynchronous execution
  • Integration with 3rd party asynchronous APIs. This is necessary to handle executions that are completed at some time in the future, where retries will need to be explicitly scheduled from within the asynchronous execution. This is common when working with various 3rd party asynchronous tools such as Netty, RxJava, Vert.x, etc.
  • Require immutable failure handling policy configuration
  • Some Failure policy configurations, e.g. CircuitBreaker, RetryPolicy, can be used stand alone. For example, it has been very useful for circuit breakers to be standalone constructs which can be plugged into and intentionally shared across multiple executions. Likewise for retry policies. Additionally, an Execution construct can be offered that allows retry policies to be applied to some logic in a standalone, manually controlled way.

Mailinglist thread: Discussion thread topic for that proposal


Currently there are at least two libraries to provide fault tolerance. It is best to uniform the technologies and define a standard so that micro service applications can adopt and the implementation of fault tolerance can be provided by the containers if possible.

Proposed solution

Separate the responsibility of executing logic (Runnables/Callables/etc) from guiding when execution should take place (through retry policies, bulkheads, circuit breakers). In this way, failure handling strategies become configuration that can influence executions, and the execution API itself is just responsible for receiving some configuration and performing executions.

By default, a failure handling strategy could assume, for example, that any exception is a failure. This all is what the RetryPolicy's retryOn, abortOn are all about - defining a failure.

Standardise the Fallback, Bulkhead and CircuitBreaker APIs and provide implementations.

  • CDI-first approach to apply RetryPolicy, Fallback, BulkHead, CircuitBreaker using annotations

Detailed design (One example of implementations)

This specification utilises CDI to simplify the programming model.

CDI-based approach

Use interceptor binding to specify the execution and policy configuration. An annotation of Asynchronous has to be specified for any asynchronous calls. Otherwise, synchronous execution is assumed.

RetryPolicy: A policy to define the retry criteria

An annotation to specify the max retries, delays, maxDuration, Duration unit, jitter, retryOn etc.

CircuitBreaker: a rule to achieve fail fast, in order to prevent from repeating timeout

An annotation to specify when to open a circuit, when to half open, close the circuit.


Define the fallback method or fallback handler for a failed execution.

Timeout to be used specifying the maximum time for an execution

Timeout to specify the maximum time for a particular execution.

Bulkhead - threadpool or semaphore style

Use this annotation without Asynchronous annotation for semaphore style. When used with Asynchronous, it means threadpool style of bulkhead.


The annotations can be applied to a bean or methods. They can be used together. For an instance, @Retry can be used with @Fallback in order to trigger the fallback when the Retry policy fails.

public class FaultToleranceBean {
   int i = 0;
   @Retry(maxRetries = 2)
   public Runnable doWork() {
      Runnable mainService = () -> serviceA(); // This unreliable service sometimes succeeds but
                                         // sometimes throws a RuntimeException
	  return mainService;								 


The annotation parameters can be configured via MicroProfile Config. In order to configure the maxRetries to be 6 for the following Retry policy, define a property org.microprofile.readme.FaultToleranceBean/doWork/Retry/maxRetries=6. Alternatively, if the maxRetries of the Retry is to be configured to 6, just specify the property of Retry/maxRetries=6.

package org.microprofile.readme
public class FaultToleranceBean {
   int i = 0;
   @Retry(maxRetries = 2)
   public Runnable doWork() {
      Runnable mainService = () -> serviceA(); // This unreliable service sometimes succeeds but
                                         // sometimes throws a RuntimeException
	  return mainService;								 

Impact on existing code


Alternatives considered


You can’t perform that action at this time.