Skip to content
rsumbaly edited this page Jul 17, 2011 · 19 revisions

The purpose of Hinted Handoff is to provide an additional consistency mechanism, allowing consistency to be reached before (or without) read-repair taking place on the key. The idea is to handle transient failures and partitions in situations where performing read repair at the end of the get pipeline is expensive (e.g., across data centers) or where quorums are otherwise unwanted.

How it works

  1. Any single put or delete to a node (whether a part of a quorum or not) fails due to a non-application exception i.e., timeout, machine unreacheable
    1. If the operation is part of a background request, then upon failure a hinted handoff takes place asynchronously
    2. If the operation is a foreground request (a part of ConfigureNodes where we look for a master), then we add the id of the failed node to a list
  2. For each failed operation, use a specified HintedHandoffStrategy to route the request to a node elsewhere in the cluster and perform a synchronous put to the slop store on that node, retaining the original vector clock. A slop store is a separate store that is not used for live quorums, but merely stores these hints. Each hint consist of: key, version (vector clock), original node, time handoff occurred, operation (put or a delete) and, if the operation is a put, the value. The hints are serialized using protocol buffers.
    1. When in a multi-colo environment, make sure to route hints that occur due to failures of remote nodes to nodes in the local datacenter. Route hints occuring due to failures of local nodes to local datacenter.
  3. If ultimately the put or delete is a failure (required-writes not achieved), we still return an exception to the client; if previous step succeeded, specify within the exception explanation string that a handoff has occurred
  4. Periodically the nodes holding the hints should attempt to write the original key and value (or a request to delete the specified version) to the original (once failed) node. Once a put or delete is confirmed as having succeeded to the original node, we can delete the hint. To
    preserve bandwidth and avoid round trip delays, we made an AdminClient (streaming) based implementation of hinted handoff.

Note that this is a deviation from the original Dynamo paper: there are no sloppy quorums for reads; if required-writes aren’t met by a strict quorum, the request is still considered failed (even if hinted handoff succeeds) and hints are written to random nodes rather than to neighbours in the ring to avoid cascading failures.

The strategies

Hinted handoff is not enabled by default

The strategy of which decides which live replica receives a request is a pluggable parameter. There are several implementations:

any-handoff (HandoffToAnyStrategy) — This is the original strategy. Here we choose
a random live node in the cluster and hand the request off to it. Note: there may be scalability issues with this specific pattern in larger (> 15 nodes per datacenter) clusters.

consistent-handoff (ConsistentHandoffStrategy) — Handoff to any of the replica-factor nodes adjacent to the failed node in the ring, the list being static and predetermined.

proximity-handoff (ProximityHandoffStrategy) — Like HandoffToAnyStrategy but will route the hints according to the zone proximity to the client’s zone (data-center) id. Useful if all clients in a specific zone fail, for example.

Configuration

To enable hinted handoff, specify the strategy type in a store definition – in stores.xml for the appropriate stores -

<hinted-handoff-strategy>proximity-handoff</hinted-handoff-strategy>

On the server side these are the parameters that you can use

Parameter Default What it means
slop.store.engine bdb What storage engine should we use for storing misdelivered messages that need to be rerouted?
slop.pusher.enable true Enable the slop pusher job which pushes every ‘slop.frequency.ms’ ms
slop.read.byte.per.sec 10 * 1000 * 1000 Slop max read throughput
slop.write.byte.per.sec 10 * 1000 * 1000 Slop max write throughput
pusher.type StreamingSlopPusherJob Job type to use for pushing out the slops
slop.frequency.ms 5 * 60 * 1000 Frequency at which we’ll try to push out the slops