Failsafe

Simple, sophisticated failure handling.

Introduction

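Failsafe is a lightweight, zero-dependency library for handling failures. It was designed to be as easy to use as possible, with a concise API for handling everyday use cases and the flexibility to handle everything else. Failsafe features retries, circuit breakers, fallbacks, execution context, event listeners, asynchronous API integration, CompletableFuture and functional interface integration, and execution tracking.

It supports Java 6+, though the documentation uses lambdas for simplicity.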

Setup

Add the latest Failsafe Maven dependency to your project.

Usage

Retries

One of the core Failsafe features is retries. To start, define a RetryPolicy that expresses when retries should be performed:

RetryPolicy retryPolicy = new RetryPolicy()
  .retryOn(ConnectException.class)
  .withDelay(1, TimeUnit.SECONDS)
  .withMaxRetries(3);

Then use your RetryPolicy to execute a Runnable or Callable with retries:

// Run with retries
Failsafe.with(retryPolicy).run(() -> connect());

// Get with retries
Connection connection = Failsafe.with(retryPolicy).get(() -> connect());

Java 6 and 7 are also supported:

Connection connection = Failsafe.with(retryPolicy).get(new Callable<Connection>() {
  public Connection call() {
    return connect();
  }
});

Retry Policies

Failsafe's retry policies provide flexibility in allowing you to express when retries should be performed.

A policy can allow retries on particular failures:

RetryPolicy retryPolicy = new RetryPolicy()
  .retryOn(ConnectException.class, SocketException.class)
  .retryOn(failure -> failure instanceof ConnectException);

And for particular results or conditions:

retryPolicy
  .retryWhen(null)
  .retryIf(result -> result == null);

It can add a fixed delay between retries:

retryPolicy.withDelay(1, TimeUnit.SECONDS);

Or a delay that backs off exponentially:

retryPolicy.withBackoff(1, 30, TimeUnit.SECONDS);
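
To make the backoff behavior concrete, here is a minimal plain-Java sketch of how such a delay sequence could be computed. It is not Failsafe's internal implementation; the class name `BackoffSketch`, the `delays` helper and the multiplier of 2 are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; names and the multiplier are assumptions.
class BackoffSketch {
  /**
   * Returns the delay used before each of the first {@code retries} retries,
   * starting at initialDelay and multiplying by {@code multiplier} each time,
   * never exceeding maxDelay.
   */
  static List<Long> delays(long initialDelay, long maxDelay, double multiplier, int retries) {
    List<Long> result = new ArrayList<>();
    double delay = initialDelay;
    for (int i = 0; i < retries; i++) {
      result.add((long) Math.min(delay, maxDelay));
      delay = Math.min(delay * multiplier, maxDelay);
    }
    return result;
  }

  public static void main(String[] args) {
    // Mirrors withBackoff(1, 30, TimeUnit.SECONDS) under the assumed multiplier of 2
    System.out.println(delays(1, 30, 2.0, 7)); // prints [1, 2, 4, 8, 16, 30, 30]
  }
}
```

With these inputs the delay doubles after each retry until it is capped at the 30 second maximum.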

It can add a random jitter factor to the delay:

retryPolicy.withJitter(.1);

Or a time-based jitter:

retryPolicy.withJitter(100, TimeUnit.MILLISECONDS);
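
As a rough illustration of the two jitter styles, the plain-Java sketch below varies a delay by a random amount in either direction. It assumes a factor of .1 means up to plus or minus 10% of the delay, and a time-based jitter of 100 ms means up to plus or minus 100 ms. `JitterSketch` and its methods are hypothetical names, not part of Failsafe.

```java
import java.util.Random;

// Illustrative sketch only; not Failsafe's internal implementation.
class JitterSketch {
  /** Varies delayMillis by a random amount within +/- (delayMillis * factor). */
  static long withJitterFactor(long delayMillis, double factor, Random random) {
    // 2 * nextDouble() - 1 is uniformly distributed in [-1, 1)
    double offset = delayMillis * factor * (2 * random.nextDouble() - 1);
    return Math.max(0, Math.round(delayMillis + offset));
  }

  /** Varies delayMillis by a random amount within +/- jitterMillis. */
  static long withJitterMillis(long delayMillis, long jitterMillis, Random random) {
    double offset = jitterMillis * (2 * random.nextDouble() - 1);
    return Math.max(0, Math.round(delayMillis + offset));
  }
}
```

Jitter of this kind helps keep many clients that failed at the same moment from all retrying at the same moment.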

It can add a max number of retries and a max retry duration:

retryPolicy
  .withMaxRetries(100)
  .withMaxDuration(5, TimeUnit.MINUTES);

It can also specify which results, failures or conditions to abort retries on:

retryPolicy
  .abortWhen(true)
  .abortOn(NoRouteToHostException.class)
  .abortIf(result -> result == true);

Retry policies support multiple retry or abort conditions of the same type:

retryPolicy
  .retryWhen(null)
  .retryWhen("");

And of course we can combine any of these things into a single policy.

Synchronous Retries

With a retry policy defined, we can perform a retryable synchronous execution:

// Run with retries
Failsafe.with(retryPolicy).run(this::connect);

// Get with retries
Connection connection = Failsafe.with(retryPolicy).get(this::connect);
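
For readers curious about what a synchronous retry amounts to, here is a minimal hand-rolled sketch in plain Java. It only covers a fixed delay and a max retry count, and `RetryLoopSketch` and `getWithRetries` are hypothetical names; Failsafe itself handles many more conditions than this.

```java
import java.util.concurrent.Callable;

// Illustrative sketch only; not Failsafe's internal implementation.
class RetryLoopSketch {
  /**
   * Calls the task until it succeeds or maxRetries retries have been used,
   * sleeping delayMillis between attempts. The last failure is rethrown.
   */
  static <T> T getWithRetries(Callable<T> task, int maxRetries, long delayMillis) throws Exception {
    int retries = 0;
    while (true) {
      try {
        return task.call();
      } catch (Exception e) {
        if (retries++ >= maxRetries)
          throw e; // retries exhausted: surface the last failure
        Thread.sleep(delayMillis);
      }
    }
  }
}
```

With `maxRetries` set to 3, the task is executed at most 4 times: the initial attempt plus 3 retries.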

Asynchronous Retries

Asynchronous executions can be performed and retried on a ScheduledExecutorService or custom Scheduler. They return a FailsafeFuture from which a result can be synchronously retrieved. Execution listeners can also be registered to learn when an execution completes:

Failsafe.with(retryPolicy)
  .with(executor)
  .onSuccess(connection -> log.info("Connected to {}", connection))
  .onFailure(failure -> log.error("Connection attempts failed", failure))
  .get(this::connect);
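
The general idea of retrying on a ScheduledExecutorService can be sketched in plain Java as follows. This is only an approximation under simple assumptions (fixed delay, max retry count), and `AsyncRetrySketch`, `getAsync` and `scheduleAttempt` are hypothetical names that do not correspond to Failsafe's API.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only; not Failsafe's internal implementation.
class AsyncRetrySketch {
  /** Runs the task on the executor, retrying failures after delayMillis. */
  static <T> CompletableFuture<T> getAsync(Callable<T> task, int maxRetries, long delayMillis,
      ScheduledExecutorService executor) {
    CompletableFuture<T> future = new CompletableFuture<>();
    scheduleAttempt(task, maxRetries, delayMillis, 0, executor, future);
    return future;
  }

  private static <T> void scheduleAttempt(Callable<T> task, int retriesLeft, long delayMillis,
      long initialDelayMillis, ScheduledExecutorService executor, CompletableFuture<T> future) {
    Runnable attempt = () -> {
      try {
        future.complete(task.call());
      } catch (Exception e) {
        if (retriesLeft > 0) // schedule the next attempt instead of blocking a thread
          scheduleAttempt(task, retriesLeft - 1, delayMillis, delayMillis, executor, future);
        else
          future.completeExceptionally(e);
      }
    };
    executor.schedule(attempt, initialDelayMillis, TimeUnit.MILLISECONDS);
  }
}
```

Because each retry is scheduled rather than slept on, no thread is blocked while waiting between attempts, and the returned future can be used to retrieve the result or attach completion callbacks.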

Circuit Breakers

Circuit breakers help create systems that fail fast by temporarily disabling execution to prevent system overload. Creating a CircuitBreaker is straightforward:

CircuitBreaker breaker = new CircuitBreaker()
  .withFailureThreshold(3, 10)
  .withSuccessThreshold(5)
  .withDelay(1, TimeUnit.MINUTES);

We can then execute a Runnable or Callable with the breaker:

Failsafe.with(breaker).run(this::connect);

When a configured threshold of execution failures occurs on a circuit breaker, the circuit is opened and further execution requests fail with CircuitBreakerOpenException. After a delay, the circuit is half-opened and trial executions are attempted to determine whether the circuit should be closed or opened again. If the trial executions meet a success threshold, the breaker is closed again and executions will proceed as normal.
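
The closed, open and half-open transitions described above can be sketched as a small state machine. This is a simplified illustration rather than Failsafe's implementation: it counts consecutive failures and successes only, re-opens on any failure while half-open, and takes the current time as a parameter. `BreakerSketch` and its members are hypothetical names.

```java
// Illustrative sketch only; not Failsafe's internal implementation.
class BreakerSketch {
  enum State { CLOSED, OPEN, HALF_OPEN }

  private final int failureThreshold; // consecutive failures that open the circuit
  private final int successThreshold; // consecutive trial successes that close it
  private final long delayMillis;     // how long to stay open before half-opening
  private State state = State.CLOSED;
  private int failures;
  private int successes;
  private long openedAt;

  BreakerSketch(int failureThreshold, int successThreshold, long delayMillis) {
    this.failureThreshold = failureThreshold;
    this.successThreshold = successThreshold;
    this.delayMillis = delayMillis;
  }

  /** Half-opens the circuit once the delay has elapsed, then reports whether execution is allowed. */
  synchronized boolean allowsExecution(long nowMillis) {
    if (state == State.OPEN && nowMillis - openedAt >= delayMillis) {
      state = State.HALF_OPEN;
      successes = 0;
    }
    return state != State.OPEN;
  }

  synchronized void recordSuccess() {
    failures = 0;
    if (state == State.HALF_OPEN && ++successes >= successThreshold)
      state = State.CLOSED;
  }

  synchronized void recordFailure(long nowMillis) {
    if (state == State.HALF_OPEN || ++failures >= failureThreshold) {
      state = State.OPEN;
      openedAt = nowMillis;
      failures = 0;
    }
  }

  synchronized State state() {
    return state;
  }
}
```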

Circuit Breaker Configuration

Circuit breakers can be flexibly configured to express when the circuit should be opened or closed.

A circuit breaker can be configured to open when a number of successive executions have failed:

CircuitBreaker breaker = new CircuitBreaker()
  .withFailureThreshold(5);

Or when, for example, the last 3 out of 5 executions have failed:

breaker.withFailureThreshold(3, 5);
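
A ratio threshold like "3 out of the last 5" can be pictured as a sliding window over recent outcomes. The sketch below is one simple way to model it in plain Java; `FailureWindowSketch` and `record` are hypothetical names, and Failsafe's own bookkeeping may differ.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch only; not Failsafe's internal implementation.
class FailureWindowSketch {
  private final int failureThreshold; // e.g. 3
  private final int windowSize;       // e.g. 5
  private final Deque<Boolean> window = new ArrayDeque<>();

  FailureWindowSketch(int failureThreshold, int windowSize) {
    this.failureThreshold = failureThreshold;
    this.windowSize = windowSize;
  }

  /** Records an outcome and returns true if the failure threshold is now met. */
  boolean record(boolean failed) {
    window.addLast(failed);
    if (window.size() > windowSize)
      window.removeFirst(); // forget the oldest outcome
    int failures = 0;
    for (boolean outcome : window)
      if (outcome)
        failures++;
    return failures >= failureThreshold;
  }
}
```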

After opening, a breaker is typically configured to delay before attempting to close again:

breaker.withDelay(1, TimeUnit.MINUTES);

The breaker can be configured to close again if a number of trial executions succeed, else it will re-open:

breaker.withSuccessThreshold(5);

The breaker can also be configured to close again if, for example, the last 3 out of 5 executions succeed, else it will re-open:

breaker.withSuccessThreshold(3, 5);

The breaker can be configured to only recognize certain results, exceptions or conditions as failures:

breaker
  .failWhen(true)
  .failOn(NoRouteToHostException.class)
  .failIf((result, failure) -> result == 500 || failure instanceof NoRouteToHostException);

And the breaker can be configured to recognize executions that exceed a certain timeout as failures:

breaker.withTimeout(10, TimeUnit.SECONDS);

With Retries

A CircuitBreaker can be used along with a RetryPolicy:

Failsafe.with(retryPolicy).with(breaker).get(this::connect);

Execution failures are first retried according to the RetryPolicy. If the retry policy is exceeded, the failure is then recorded by the CircuitBreaker.

Failing Together

A circuit breaker can and should be shared across code that accesses inter-dependent system components that fail together. This ensures that if the circuit is opened, executions against one component that rely on another component will not be allowed until the circuit is closed again.

Standalone Usage

A CircuitBreaker can also be manually operated in a standalone way:

breaker.open();
breaker.halfOpen();
breaker.close();

if (breaker.allowsExecution()) {
  try {
    doSomething();
    breaker.recordSuccess();
  } catch (Exception e) {
    breaker.recordFailure(e);
  }
}

Fallbacks

Fallbacks allow you to provide an alternative result for a failed execution. They can be used to suppress exceptions and provide a default result:

Failsafe.with(retryPolicy)
  .withFallback(null)
  .get(this::connect);

Throw a custom exception:

Failsafe.with(retryPolicy)
  .withFallback(failure -> { throw new CustomException(failure); })
  .get(this::connect);

Or compute an alternative result such as from a backup resource:

Failsafe.with(retryPolicy)
  .withFallback(this::connectToBackup)
  .get(this::connectToPrimary);
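
Conceptually, a fallback wraps an execution and substitutes another result when the execution fails. The following plain-Java sketch shows that idea in isolation, without retries; `FallbackSketch` and `getWithFallback` are hypothetical names and not Failsafe's API.

```java
import java.util.concurrent.Callable;
import java.util.function.Function;

// Illustrative sketch only; not Failsafe's internal implementation.
class FallbackSketch {
  /** Returns the primary result, or the fallback's result if the primary throws. */
  static <T> T getWithFallback(Callable<T> primary, Function<Throwable, T> fallback) {
    try {
      return primary.call();
    } catch (Exception e) {
      // The fallback sees the failure, so it can return a default,
      // compute an alternative, or throw a custom unchecked exception.
      return fallback.apply(e);
    }
  }
}
```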

Execution Context

Failsafe can provide an ExecutionContext containing execution related information such as the number of execution attempts as well as start and elapsed times:

Failsafe.with(retryPolicy).run(ctx -> {
  log.debug("Connection attempt #{}", ctx.getExecutions());
  connect();
});

Event Listeners

Failsafe supports a variety of execution and retry event listeners.

It can notify you when an execution completes:

Failsafe.with(retryPolicy)
  .onComplete((cxn, failure) -> {
    if (cxn != null)
      log.info("Connected to {}", cxn);
    else if (failure != null)
      log.error("Failed to create connection", failure);
  })
  .get(this::connect);

Or on an execution success or failure:

Failsafe.with(retryPolicy)
  .onSuccess(cxn -> log.info("Connected to {}", cxn))
  .onFailure(failure -> log.error("Failed to create connection", failure))
  .get(this::connect);

It can notify you when an execution attempt fails and before a retry is performed:

Failsafe.with(retryPolicy)
  .onFailedAttempt(failure -> log.error("Connection attempt failed", failure))
  .onRetry((c, f, ctx) -> log.warn("Failure #{}. Retrying.", ctx.getExecutions()))
  .get(this::connect);

And it can notify you when an execution fails and the max retries are exceeded:

Failsafe.with(retryPolicy)
  .onRetriesExceeded(ctx -> log.warn("Failed to connect. Max retries exceeded."))
  .get(this::connect);

Asynchronous listeners are also supported:

Failsafe.with(retryPolicy)
  .with(executor)
  .onFailureAsync(e -> log.error("Failed to create connection", e))
  .onSuccessAsync(cxn -> log.info("Connected to {}", cxn), anotherExecutor)
  .get(this::connect);

Java 6 and 7 users can extend the Listeners class and override individual event handlers:

Failsafe.with(retryPolicy)
  .with(new Listeners<Connection>() {
    public void onRetry(Connection cxn, Throwable failure, ExecutionContext ctx) {
      log.warn("Failure #{}. Retrying.", ctx.getExecutions());
    }
  }).get(() -> connect());

CircuitBreaker related event listeners can also be registered:

circuitBreaker.onOpen(() -> log.info("The circuit breaker was opened"));

Asynchronous API Integration

Failsafe can be integrated with asynchronous code that reports completion via callbacks. The runAsync, getAsync and futureAsync methods provide an AsyncExecution reference that can be used to manually schedule retries or complete the execution from inside asynchronous callbacks:

Failsafe.with(retryPolicy)
  .with(executor)
  .getAsync(execution -> service.connect().whenComplete((result, failure) -> {
    if (execution.complete(result, failure))
      log.info("Connected");
    else if (!execution.retry())
      log.error("Connection attempts failed", failure);
  }));

Failsafe can also perform asynchronous executions and retries on third-party schedulers via the Scheduler interface. See the Vert.x example for a more detailed implementation.

CompletableFuture Integration

Java 8 users can use Failsafe to retry CompletableFuture calls:

Failsafe.with(retryPolicy)
  .with(executor)
  .future(this::connectAsync)
  .thenApplyAsync(value -> value + "bar")
  .thenAccept(System.out::println);

Functional Interface Integration

Failsafe can be used to create retryable Java 8 functional interfaces:

Function<String, Connection> connect = address -> Failsafe.with(retryPolicy).get(() -> connect(address));

We can retry streams:

Failsafe.with(retryPolicy).run(() -> Stream.of("foo").map(value -> value + "bar").forEach(System.out::println));

Individual Stream operations:

Stream.of("foo").map(value -> Failsafe.with(retryPolicy).get(() -> value + "bar"));

Or individual CompletableFuture stages:

CompletableFuture.supplyAsync(() -> Failsafe.with(retryPolicy).get(() -> "foo"))
  .thenApplyAsync(value -> Failsafe.with(retryPolicy).get(() -> value + "bar"));

Execution Tracking

In addition to automatically performing retries, Failsafe can be used to track executions for you, allowing you to manually retry as needed:

Execution execution = new Execution(retryPolicy);
while (!execution.isComplete()) {
  try {
    doSomething();
    execution.complete();
  } catch (ConnectException e) {
    execution.recordFailure(e);
  }
}
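
To show what tracking an execution involves, here is a minimal plain-Java sketch that counts attempts and decides whether another retry is allowed. It only models a max retry count; `ExecutionSketch` and its methods are hypothetical and are not Failsafe's Execution class.

```java
// Illustrative sketch only; not Failsafe's Execution class.
class ExecutionSketch {
  private final int maxRetries;
  private int executions;
  private boolean complete;

  ExecutionSketch(int maxRetries) {
    this.maxRetries = maxRetries;
  }

  /** Marks the execution as successfully completed. */
  void complete() {
    executions++;
    complete = true;
  }

  /** Records a failed attempt and returns true if a retry is still allowed. */
  boolean recordFailure(Throwable failure) {
    executions++;
    if (executions > maxRetries)
      complete = true; // initial attempt plus maxRetries retries have all failed
    return !complete;
  }

  boolean isComplete() {
    return complete;
  }

  int getExecutions() {
    return executions;
  }
}
```

A caller loops while `isComplete()` is false, which is the same shape as the loop above.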

Execution tracking is also useful for integrating with APIs that have their own retry mechanism:

Execution execution = new Execution(retryPolicy);

// On failure
if (execution.canRetryOn(someFailure))
  service.scheduleRetry(execution.getWaitMillis(), TimeUnit.MILLISECONDS);

See the RxJava example for a more detailed implementation.

Additional Resources

Library and API Integration

For library and public API developers, Failsafe integrates nicely into existing APIs, allowing your users to configure retry policies for different operations. One integration approach is to subclass the RetryPolicy class and expose that as part of your API while the rest of Failsafe remains internal. Another approach is to use something like the Maven shade plugin to rename and relocate Failsafe classes into your project's package structure as desired.

Contribute

Failsafe is a volunteer effort. If you use it and you like it, let us know, and also help by spreading the word!

License

Copyright 2015-2016 Jonathan Halterman and friends. Released under the Apache 2.0 license.