Fetching latest commit…
Cannot retrieve the latest commit at this time.
|Failed to load latest commit information.|
Introduction ============ Fluxua is a simple workflow driver for Hadoop map reduce jobs. It's non intrusive. It does not require your map reduce implementation to extend any special class and can be plugged into the workflow as is. There are other map reduce workflow engines like oozie, cascading etc. that you can use. Fluxua is itended to be a simple, small foot print alternative. Architecture ============ The Hadoop map reduce jobs are defined as nodes in a DAG (directed acyclic graph). Edges represent the dependency between the jobs. The workflow drives topologically orders the DAG and executes the jobs in proper order, starting with the jobs at the root nodes of the DAG. Each MR job is executed from a separate thread. The executing thread communicates with the driver through a blocking queue. All MR jobs at intermediate DAG nodes should do blocking executions of the job, so that the dependent jobs can be launched only after the parent jobs have completed. Configuration ============= The jobs and the workflow are defined ina JSON file as below. The configuration has three main sections. The first section has the system configurations. The second sections has configuration of all thge jobs. The las sectipns has flow definitions in terms of jobs. There is an example from a data mining project in the resource directory. Sample Shell Script =================== Here is sample shell script for using the driver. The entry point is the class org.fluxua.driver.JobDriver DR_JAR=/home/pranab/Projects/fluxua/target/fluxua-1.0.jar JAR=/home/pranab/Projects/zaal/target/zaal-1.0.jar CL=org.fluxua.driver.JobDriver CONFIG=/home/pranab/Projects/zaal/zaal.json export HADOOP_CLASSPATH=$DR_JAR:$JAR echo $HADOOP_CLASSPATH hadoop fs -rmr /zaal/dor/output hadoop fs -rmr /zaal/bcl/output hadoop jar $JAR $CL -c $CONFIG -f bayesian -i sample Additional Info =============== More details can be found in my blogpost here http://pkghosh.wordpress.com/2011/05/22/hadoop-orchestration