Skip to content

RoutedStore redesign

rsumbaly edited this page Jun 16, 2011 · 8 revisions

Pipeline Routed Store – Routing layer

The RoutedStore class is an integral part of the Voldemort client that helps implement its distributed design. As per its name, RoutedStore contains the internal logic to route a client request to a given store; the request may need to be satisfied by one or more nodes. For each request, RoutedStore handles the client request communication with the appropriate nodes. For each node, RoutedStore will set up and execute the request, wait for the response, verify the correct number of responses were received, perform supplementary bookkeeping and recovery logic. It performs all of this in parallel for each node.

It has been noted that the previous thread pool-based implementation of RoutedStore suffered from two main problems:

  1. The logic was too complicated
  2. Slow responses by slow servers resulted in thread pool exhaustion

To fix the second problem, we’ve discussed implementing a non-blocking model for client requests by overhauling the socket layer to use NIO. This is detailed in the SocketStore redesign wiki page.

For the first problem, we’re looking to model the operations (get, getVersions, getAll, delete, and put) via a processing pipeline, making the discrete actions comprising the entire process clearly defined, (more) easily understandable, and reusable.

Configuration

Let’s first mention how to enable the pipeline routed store in your application. For Voldemort clients, simply set the following property in your client configuration:

enable_pipeline_routed_store=true

In the server.properties file used by each node, you can set the following property:

enable.pipeline.routed.store=true

Note: as of 0.90, enable_pipeline_routed_store (client) and enable.pipeline.routed.store (server) default to true.

Classes/Interfaces

Let’s discuss some of the classes/interfaces.

Action

An Action is a discrete portion of logic that forms part of the overall process that executes a given operation. Its interface is very simple:

public interface Action {

    public void execute(Pipeline pipeline);

}

There’s no clear standard about how much or how little logic is performed in a given Action, but there are intuitive separations in the logic that form natural boundaries.

Pipeline

A Pipeline is the main conduit through which an Action is run. An Action is executed in response to the Pipeline receiving an event. The majority of the events are self-initiated from within the Pipeline itself. The only case thus-far where external entities create events are in response to asynchronous responses from servers. A Response instance is created on completion of an asynchronous request and is fed back into the Pipeline where an appropriate ‘response handler’ action is executed.

A Pipeline instance is created per-request. This is due to the fact that it includes internal state, specific to each request (get, getAll, getVersions, put, and delete) invocation.

Pipeline includes two addEvent methods:

public void addEvent(Event event);

public void addEvent(Event event, Action action);

Both forms take an Event that describes the event that was received. The second form takes an action that should be executed in response to receiving the event.

public void execute();

This method pulls the Event instances off of an internal queue and processes them one-by-one, in order of receipt, until an event of type Event.COMPLETED is received at which point the method will exit.

Pipeline.Event

An event is simply an enum that details what sort of event occurred. Here is the definition of Event:

public enum Event {

    STARTED,
    CONFIGURED,
    COMPLETED,
    INSUFFICIENT_SUCCESSES,
    INSUFFICIENT_ZONES,
    RESPONSES_RECEIVED,
    ERROR,
    MASTER_DETERMINED,
    ABORTED,
    HANDOFF_FINISHED;

}

PipelineData

PipelineData includes a common set of data that is used to represent the state within the Pipeline as it moves from action to action. There’s a one-to-one correspondence between a Pipeline and PipelineData, though the latter is not included as an instance variable. Action implementations usually include the PipelineData as an instance variable upon creation. There are a handful of subclasses of PipelineData that are used to handle the different types of operations.

Response

Response represents a response from a call to a remote Voldemort node to perform some operation (get, put, etc.). It wraps the following values:

  • Node node
  • K key
  • V value
  • long requestTime

It includes the Node and request time as these are needed by the FailureDetector that will be used by the user of the Response. A Response is usually used in conjunction with asynchronous requests as a sort of callback mechanism, though this isn’t always the case. In the case where they are the result of asynchronous requests, the NonblockingStore will invoke the NonblockingStoreCallback instances’s requestComplete method which will in turn package up the data in a Response object that is sent to the Pipeline via an Event.RESPONSE_RECEIVED event.

Response instances are stored in the PipelineData to represent the responses to requests made during execution of the Pipeline.

This class uses generics for the value to support the return types used by the different operations. Often the key type is simply ByteArray, but in the case of the “get all” operation, the key is actually an Iterable.

Comments, Criticisms, Etc.

If you have found any problems, errors, etc. please feel free to describe them on the issues page.

Thanks.