Automatic retries and error handling

When Rebus receives a message, it keeps the message's ID in an in-memory dictionary along with a few pieces of information on when the message was received, etc.

It also keeps track of how many times it has seen the message before!

This way, Rebus can see when delivery of a message has failed a certain number of times (default: 5) - when the same message is then received one more time, it is considered "poisonous" and is forwarded to Rebus' error queue.

This way, the message is not lost - it is persisted in the error queue, from where it can be inspected and retried at a later time.
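
Conceptually, the bookkeeping works something like the sketch below. This is a simplified illustration only - Rebus' actual tracking sits behind its IErrorTracker abstraction and records more detail than this:

using System.Collections.Concurrent;

// A minimal sketch of the idea - NOT Rebus' actual error tracker
class NaiveErrorTracker
{
    readonly ConcurrentDictionary<string, int> _errorCountByMessageId = new();
    readonly int _maxDeliveryAttempts;

    public NaiveErrorTracker(int maxDeliveryAttempts = 5) => _maxDeliveryAttempts = maxDeliveryAttempts;

    // called whenever handling a message throws
    public void RegisterError(string messageId) =>
        _errorCountByMessageId.AddOrUpdate(messageId, 1, (_, count) => count + 1);

    // when this returns true, the message is considered "poisonous"
    // and should be forwarded to the error queue
    public bool HasFailedTooManyTimes(string messageId) =>
        _errorCountByMessageId.TryGetValue(messageId, out var count) && count >= _maxDeliveryAttempts;
}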

Also, Rebus can receive the message and handle it inside of a TransactionScope (an "ambient transaction"), provided that you

using Rebus.TransactionScopes;

Configure.With(...)
    .Options(b => b.HandleMessagesInsideTransactionScope())
    .(...)

some time during the configuration spell. This requires the Rebus.TransactionScopes package (and at least .NET 4.5.1, because that's required in order for the ambient transaction to flow properly to continuations when you await something).

How to diagnose what went wrong?

You can turn to Rebus' log - a full exception will be logged at the WARN level for each delivery attempt, except the last, which will be logged as an ERROR.

Moreover, to make it even easier, the full exceptions will also be included in a header inside the message, allowing Rebus Snoop, or any other tool that is capable of showing message headers, to show you what went wrong.
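
If you want to inspect those details programmatically, you can also read the header yourself. Here's a small sketch, which assumes it runs inside a Rebus handler (e.g. in an endpoint that consumes the error queue) and uses the standard rbs2-error-details header:

using System;
using Rebus.Messages;
using Rebus.Pipeline;

// read the error details header off the message currently being handled
var headers = MessageContext.Current.Headers;

if (headers.TryGetValue(Headers.ErrorDetails, out var errorDetails))
{
    Console.WriteLine($"This message previously failed: {errorDetails}");
}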

Customizing retries

With Rebus, it is assumed that you want to retry delivery when it fails - and by default, delivery will be attempted 5 times before the message is moved to the error queue.

You can configure the number of retries though - just do something like this:

using Rebus.Retry.Simple;

Configure.With(...)
    .Options(b => b.RetryStrategy(maxDeliveryAttempts: 10))
    .(...)

to increase the number of delivery attempts to 10.

Configuring which error queue to use

By default, a queue named "error" will be used and will be automatically created. If you want to change this setting, you can specify an alternative error queue like this:

using Rebus.Retry.Simple;

Configure.With(...)
    .Options(b => b.RetryStrategy(errorQueueAddress: "somewhere_else"))
    .(...)
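
The options can also be combined in a single call, e.g. to get a custom error queue AND more delivery attempts (using the same parameters as shown above):

using Rebus.Retry.Simple;

Configure.With(...)
    .Options(b => b.RetryStrategy(errorQueueAddress: "somewhere_else", maxDeliveryAttempts: 10))
    .(...)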

Second-level retries

If you want to handle errors with your own logic when a message delivery has failed too many times, you can enable second-level retries like this:

using Rebus.Retry.Simple;

Configure.With(...)
    .Options(b => b.RetryStrategy(secondLevelRetriesEnabled: true))
    .(...)

which causes a failed message to be dispatched as an IFailed<TMessage> when it has failed too many times. This way, message handlers can customize error handling like this:

using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Rebus.Bus;
using Rebus.Handlers;
using Rebus.Messages;
using Rebus.Retry.Simple;

public class SomeHandler : IHandleMessages<DoStuff>, IHandleMessages<IFailed<DoStuff>>
{
    readonly IBus _bus;

    public SomeHandler(IBus bus)
    {
        _bus = bus;
    }

    public async Task Handle(DoStuff message)
    {
        // do stuff that can fail here...
    }

    public async Task Handle(IFailed<DoStuff> failedMessage)
    {
        const int maxDeferCount = 5;

        // the rbs2-defer-count header is maintained automatically by Rebus
        var deferCount = Convert.ToInt32(failedMessage.Headers.GetValueOrDefault(Headers.DeferCount));

        if (deferCount >= maxDeferCount)
        {
            await _bus.Advanced.TransportMessage.Deadletter($"Failed after {deferCount} deferrals\n\n{failedMessage.ErrorDescription}");
            return;
        }

        await _bus.Advanced.TransportMessage.Defer(TimeSpan.FromSeconds(30));
    }
}

to retry delivery after 30 seconds when the normal delivery attempts have failed. When calling the Defer method, you can use the optional headers dictionary argument to pass along information on how many second-level delivery attempts Rebus has made - this allows you to forward the message to the error queue if you at some point decide that something is so wrong that no further attempts should be made.
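
As a sketch, a custom counter could be passed along like this from within the IFailed<TMessage> handler ("x-my-retry-count" is a made-up header name for illustration):

using System;
using System.Collections.Generic;

// pass a custom header along when deferring, so the next delivery
// of the failed message can read it back and act on it
var headers = new Dictionary<string, string>
{
    ["x-my-retry-count"] = "1"
};

await _bus.Advanced.TransportMessage.Defer(TimeSpan.FromSeconds(30), headers);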

Since this is a fairly common thing to do, Rebus will automatically maintain a special header, rbs2-defer-count (available under the Headers.DeferCount key), which is set to 1, 2, 3, ... as the transport message gets deferred. The code sample above uses it to give up after 5 deferrals, sending the message to the dead letter queue with a note that it failed too many times, along with the original error description.

Please note that the transport message API (available via bus.Advanced.TransportMessage) is used to defer the transport message in its entirety, preserving all of its original headers (including its message ID).

Exception Information

The second-level retry class IFailed<TMessage> contains information about the exception(s) causing the initial error in one or more ExceptionInfo objects. By default, these objects are easily serializable for the purpose of distributed error tracking. If you need more detailed metadata about the error(s) and/or access to the original exception, you may need to implement your own IExceptionInfoFactory. This factory controls how ExceptionInfo objects are created from Exceptions.

The easiest option is to make use of the included InMemExceptionInfoFactory, which creates InMemExceptionInfo objects that provide access to the raw exception. Configure this class with the following directive at bus creation:

Configure.With(activator)
         .Errors(e => e.UseInMemExceptionInfos())

Then, when handling an IFailed<TMessage>, access the original exceptions like this:

using System.Linq;

public async Task Handle(IFailed<DoStuff> failure)
{
    // each InMemExceptionInfo carries the original, raw exception
    var exceptions = failure.Exceptions.Select(ei => ei.ConvertTo<InMemExceptionInfo>());
}

For more customized exception handling, implement your own IExceptionInfoFactory and then register it like this at bus creation time:

Configure.With(activator)
         .Errors(e => e.OtherService<IExceptionInfoFactory>()
                       .Register(ctx => new FancyExceptionInfoFactory()))

Then in your handlers, use ConvertTo<TExceptionInfo>() as shown above.
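
For orientation, such a factory might look roughly like the sketch below. This is an assumption-laden illustration: FancyExceptionInfoFactory is a made-up name, and the exact shapes of IExceptionInfoFactory and ExceptionInfo should be verified against your Rebus version:

using System;
using Rebus.Retry;

public class FancyExceptionInfoFactory : IExceptionInfoFactory
{
    // assumed contract: map an Exception to a serializable ExceptionInfo
    public ExceptionInfo CreateInfo(Exception exception)
    {
        // build whatever representation you need here, e.g. trimming
        // stack traces or attaching extra metadata
        return ExceptionInfo.FromException(exception);
    }
}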

A word of warning

If you configure Rebus to retry messages many times, the in-memory list of exception details could grow quite large, effectively producing the symptoms of a memory leak. Therefore, Rebus will keep at most 10 pieces of full exception details around, trimming away the oldest whenever a new one arrives.

Moreover, some transports might limit the amount of information that they include in the message headers. E.g. the Azure Service Bus transport will limit the size of each header value to around 16k characters because of a limitation in the underlying transport.