A simple easy to use Hadoop map reduce workflow engine
Latest commit 99f3563 Mar 30, 2012 Pranab Ghosh save failed workflows for later execution


Fluxua is a simple workflow driver for Hadoop map reduce jobs. It's non intrusive. It does not require
your map reduce implementation to extend any special class and can be plugged into the workflow as is. There
are other map reduce workflow engines like oozie, cascading etc. that you can use. Fluxua is itended to be a 
simple, small foot print alternative.

The Hadoop map reduce jobs are defined as nodes in a DAG (directed acyclic graph). Edges represent the dependency
between the jobs. The workflow drives topologically orders the DAG and executes the jobs in proper order, starting
with the jobs at the root nodes of the DAG.

Each MR job is executed from a separate thread. The executing thread communicates with the driver through a blocking 
queue. All MR jobs at intermediate DAG nodes should do blocking executions of the job, so that the dependent jobs can
be launched only after the parent jobs have completed.

The jobs and the workflow are defined ina JSON file as below. The configuration has three main sections. The first
section has the system configurations. The second sections has configuration of all thge jobs. The las sectipns has 
flow definitions in terms of jobs. There is an example from a data mining project in the resource directory.

Sample Shell Script
Here is sample shell script for using the driver. The entry point is the class org.fluxua.driver.JobDriver

hadoop fs -rmr /zaal/dor/output
hadoop fs -rmr /zaal/bcl/output
hadoop jar $JAR $CL -c $CONFIG -f bayesian -i sample

Additional Info
More details can be found in my blogpost here