Sagas SnowPloughExample

Henrik edited this page Jul 22, 2011 · 3 revisions

Sagas - Ordering a Snow Plow Example

The workflow goes like this:

  1. A snow plow is ordered.
     a. Billing listens to this and performs the billing.
     b. Inventory listens to this and schedules a driver and another truck to drive the snow plow to shipping.
  2. While the truck is driving the snow plow, the billing department sends a 'billing failed' event (which is rare, but happens). What should happen now?

No Saga

The problem now is that, in order to solve the business problem of the customer not being able to pay the invoice, we'd have to sprinkle information throughout the systems: Billing with knowledge about the failure, Sales with knowledge of how to handle that failure, and Inventory to react to it and potentially retract the shipment or choose another customer, paying less, to ship to. This is where a saga could be useful.

Bring in the Saga

The Saga's responsibility is to make sure the data each system needs to make decisions is available. The Saga should not take the role of a decider. It should route information, orchestrating the calls that fetch the data for the decision makers. The actual domain model systems (Sales, Billing and Inventory) should be doing the decision logic.

By having a correlation ID for the complete conversation between the systems, a saga implementation can instead take charge of how these messages are handled. Each message causes the saga to transition between states, so the saga is purely a state machine.
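As a minimal sketch of that idea, the Python below models a saga as a transition table and looks sagas up by correlation ID. It is not MassTransit code: the message names (`SnowPlowOrdered`, `BillingSucceeded`, `BillingFailed`), the states and the `SagaRepository` class are all hypothetical, chosen only to illustrate the pattern.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class State(Enum):
    ORDERED = auto()
    BILLED = auto()
    BILLING_FAILED = auto()


# Allowed transitions: (current state, message type) -> next state.
# Message names are hypothetical, not taken from any real system.
TRANSITIONS = {
    (State.ORDERED, "BillingSucceeded"): State.BILLED,
    (State.ORDERED, "BillingFailed"): State.BILLING_FAILED,
}


@dataclass
class SnowPlowSaga:
    correlation_id: str
    state: State = State.ORDERED
    history: list = field(default_factory=list)

    def handle(self, message_type: str) -> None:
        """Move to the next state, or reject a message that is invalid here."""
        key = (self.state, message_type)
        if key not in TRANSITIONS:
            raise ValueError(
                f"{message_type} is not valid in state {self.state.name}"
            )
        self.state = TRANSITIONS[key]
        self.history.append(message_type)


class SagaRepository:
    """Finds the saga for a correlation ID, so each message reaches
    its own conversation."""

    def __init__(self):
        self._sagas = {}

    def dispatch(self, correlation_id: str, message_type: str) -> SnowPlowSaga:
        if message_type == "SnowPlowOrdered":
            # The initiating message creates the saga instance.
            saga = SnowPlowSaga(correlation_id)
            self._sagas[correlation_id] = saga
        else:
            saga = self._sagas[correlation_id]
            saga.handle(message_type)
        return saga
```

Two orders dispatched with different correlation IDs progress independently, and a message that makes no sense in the current state is rejected instead of silently changing it.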

Let's see a saga that uses a simple request-reply pattern for communicating with the systems:

Simplistic Saga

Then, instead of turning the truck around, the saga could be responsible for getting the correct data from Shipping, e.g. "What is the pricing for a 10x10x3m lot for n days?". The response data would be sent to the correct receiver.
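The routing role described here could be sketched as follows. Everything in this block is an assumption made for illustration: the message types, the `send(destination, message)` callback, and in particular the choice of Sales as the "correct receiver", which the text above does not specify.

```python
class StoragePricingSaga:
    """Routes data between systems without making business decisions.

    `send` is a hypothetical callback taking (destination, message).
    """

    def __init__(self, correlation_id, send):
        self.correlation_id = correlation_id
        self.send = send
        self.state = "Shipping"

    def on_billing_failed(self, days):
        # Ask Shipping for the data a decision maker will need.
        self.state = "AwaitingStoragePrice"
        self.send("Shipping", {
            "type": "StoragePriceRequest",
            "correlation_id": self.correlation_id,
            "lot": "10x10x3m",
            "days": days,
        })

    def on_storage_price_reply(self, reply):
        # Forward the answer untouched; the saga does not interpret the price.
        # Assumption: Sales is the system that decides what happens next.
        self.state = "AwaitingDecision"
        self.send("Sales", {
            "type": "StoragePriceAvailable",
            "correlation_id": self.correlation_id,
            "price": reply["price"],
        })
```

The saga only moves data and tracks where the conversation is. Whether to store the plow, retract the shipment or find another customer stays with the domain systems.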

Refining the Saga

First, let's do away with the asynchronous request-reply invocations and instead move towards fire-and-forget: the saga sends a message and does nothing more until the resulting event comes in.

Sometimes systems go down. The saga should be capable of handling such failures by using an alarm scheduler or some other timeout mechanism. It could then implement the proper failure handling, such as resending the message or escalating the issue to a human operator. The picture below shows how this can look: where Bill Customer doesn't return, the Alarm Clock is responsible for waking the Saga in five minutes' time, allowing it to react to the lack of a reply message:

Saga With Timeout
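The timeout flow in the picture might look roughly like the sketch below. The `AlarmClock` here is an in-memory stand-in whose time is advanced by hand, so the example stays deterministic. The class names, the retry limit of three and the `EscalateToOperator` message are assumptions, not part of MassTransit.

```python
import heapq
import itertools


class AlarmClock:
    """In-memory stand-in for an alarm scheduler; time is advanced by hand."""

    def __init__(self):
        self.now = 0.0
        self._alarms = []
        self._seq = itertools.count()  # tie-breaker so callbacks are never compared

    def wake_in(self, seconds, callback):
        heapq.heappush(self._alarms, (self.now + seconds, next(self._seq), callback))

    def advance(self, seconds):
        self.now += seconds
        while self._alarms and self._alarms[0][0] <= self.now:
            _, _, callback = heapq.heappop(self._alarms)
            callback()


class BillingSaga:
    TIMEOUT_SECONDS = 5 * 60
    MAX_ATTEMPTS = 3  # assumed limit before a human is involved

    def __init__(self, clock, send):
        self.clock = clock
        self.send = send  # hypothetical callback taking a message name
        self.attempts = 0
        self.state = "New"

    def start(self):
        self._bill()

    def _bill(self):
        self.attempts += 1
        self.state = "AwaitingBilling"
        self.send("BillCustomer")
        # Ask to be woken up in case no reply arrives.
        self.clock.wake_in(self.TIMEOUT_SECONDS, self.on_timeout)

    def on_billing_reply(self):
        self.state = "Billed"

    def on_timeout(self):
        if self.state != "AwaitingBilling":
            return  # the reply arrived in time; this alarm is stale
        if self.attempts < self.MAX_ATTEMPTS:
            self._bill()  # resend the message
        else:
            self.state = "Escalated"
            self.send("EscalateToOperator")
```

Note that the saga checks its own state when woken. An alarm that fires after the reply has arrived is simply ignored, which avoids having to cancel alarms.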

Now, the alarm clock has two possible ways of working: 1) it calls back to wake the saga up, or 2) the saga sends the alarm clock a message that wraps another message, where the encapsulated message contains all the information needed to move the process forward.

By using option 2) we're actually externalizing the state: it travels inside the message instead of being held by the saga.
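To make the second option concrete, here is a small sketch in which the alarm clock stores an envelope and the handler that receives the encapsulated message is stateless. The envelope layout, the function names and the message types are hypothetical, and a plain list stands in for the alarm clock's queue.

```python
import json


def schedule_timeout(alarm_queue, due_at, order_id, attempt):
    """Saga side: wrap everything needed to continue inside the envelope."""
    envelope = {
        "due_at": due_at,
        "payload": {
            "type": "BillingTimedOut",
            "order_id": order_id,
            "attempt": attempt,
        },
    }
    alarm_queue.append(json.dumps(envelope))


def fire_due_alarms(alarm_queue, now):
    """Alarm clock side: return the payloads that are due, keep the rest queued."""
    due, pending = [], []
    for raw in alarm_queue:
        envelope = json.loads(raw)
        if envelope["due_at"] <= now:
            due.append(envelope["payload"])
        else:
            pending.append(raw)
    alarm_queue[:] = pending
    return due


def handle_timeout(payload, max_attempts=3):
    """Stateless handler: the next message is derived from the payload alone."""
    if payload["attempt"] < max_attempts:
        return {
            "type": "BillCustomer",
            "order_id": payload["order_id"],
            "attempt": payload["attempt"] + 1,
        }
    return {"type": "EscalateToOperator", "order_id": payload["order_id"]}
```

Because the attempt counter rides along in the payload, `handle_timeout` needs no stored saga instance. The trade-off is that every message must carry whatever later steps will need.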

That said, a lot of infrastructure for stateful sagas exists in MassTransit, and which way to take should be your reasoned choice.